2023-03-05

Can we entrust the future of humanity to ChatGPT?

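"I am grateful to the Fräulein. Truly."
"...But even if my sister comes to hate me, I can no longer turn back. If I abandon the path of conquest here, who will restore unity and order to the universe? Shall we entrust the future of humanity to the dissolute sons of great nobles, who have nothing to boast of but their bloodlines and houses, or to demagogues who sway the ignorant masses with nothing but sophistry and promises of gain?" (Yoshiki Tanaka, Legend of the Galactic Heroes)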

ChatGPT:

As an AI language model, I can provide information and insights based on the data and programming I have been trained on, but I am not a conscious being and do not have the ability to make decisions or take actions on my own. Therefore, it would not be appropriate to entrust the future of humanity solely to me or any other AI language model.

It is important to recognize the limitations of AI and the need for human intelligence, wisdom, and ethical considerations in guiding decisions that impact the future of humanity. AI can be a valuable tool to aid decision-making, but it should always be used in conjunction with human oversight and critical thinking. Ultimately, it is up to humans to take responsibility for the future of our species and make decisions that prioritize the well-being of all individuals and the planet as a whole.

"Colossus: The Forbin Project" is based on the 1966 science fiction novel "Colossus" by Dennis Feltham Jones. What is ChatGPT's position on this?

ChatGPT:

As an AI language model, I don't have personal opinions or positions on movies or books.

Both the novel and the movie explore themes related to artificial intelligence, the limits of human control over technology, and the potential consequences of creating superintelligent machines. They also raise questions about the nature of power and control, as well as the role of human agency and free will in a world dominated by intelligent machines.

2023-03-03

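M5Stack Core2 Development for Beginners

Impressions and notes on development environments for the M5Stack Core2

The reset and power ON/OFF operations never feel quite right: sometimes the power goes off and sometimes it doesn't...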

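【Development environments】

(1) UIFlow IDE (online/offline)

I think it is ideal for the official modules and units. Above all, it is hard to make programming mistakes. However, it offers little room for growth and little fun, and program management looks likely to suffer as features are added.

(2) VS Code with PlatformIO (MicroPython)

・Font sizes in the demo software sometimes have to be changed to save memory.

・Headers such as Wire.h are sometimes not provided by <M5Core2.h>.

・Programs placed in the APP folder can be selected and run from the Core2 menu... but some work and some don't, and I can't tell why.

(3) In the end, the Arduino IDE often works best: it lets me reuse other existing code and save memory in a C-like way.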

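【Recovering a bricked device (black screen)】

After I ran an I2C scan program, the device would no longer start up.

Neither M5Burner.exe nor the Arduino IDE could access the port.

Following Switch Science's recovery procedure, I wrote "m5_test_firmware_v2" with the Espressif Flash Download Tool and the device came back to life. After that, programs could be written to it again.

【GPS unit】PORT.C (UART)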

2023-02-15

■

Because I'm so forgetful, I'm writing down a "Kaggle Basics Glossary for Jogging My Memory".

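【Regression problems: evaluation metrics】

　i) RMSE (root mean squared error)

　　　・The square root of the mean of the squared errors (predicted value - actual value) over all elements

　　　・Strongly affected by outliers, so outliers often need to be removed beforehand

　ii) RMSLE (root mean squared logarithmic error)

　　　・RMSE computed on logarithmic differences, with +1 added before taking the log to avoid log(0)

　　　・Applied to targets that become roughly normally distributed after a log transform

　　　・Suited to problems where you want to avoid under-predicting

　　　　(e.g. when you want to avoid running out of stock for items that would have sold)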
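To make the two definitions concrete, here is a minimal pure-Python sketch. The function names rmse and rmsle are my own, and the toy numbers are made up. scikit-learn offers the squared versions as sklearn.metrics.mean_squared_error and mean_squared_log_error.

```python
import math


def rmse(y_pred, y_true):
    """Root mean squared error: sqrt(mean((predicted - actual) ** 2))."""
    squared = [(p - t) ** 2 for p, t in zip(y_pred, y_true)]
    return math.sqrt(sum(squared) / len(squared))


def rmsle(y_pred, y_true):
    """RMSE on log(1 + x) values; the +1 avoids log(0). Assumes values >= 0."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for p, t in zip(y_pred, y_true)]
    return math.sqrt(sum(squared) / len(squared))


# One outlier (100 vs 4) dominates RMSE even though the other rows are exact
print(rmse([1, 2, 3, 100], [1, 2, 3, 4]))  # 48.0

# Same absolute error of 50: RMSLE penalizes under-prediction more than over-prediction
print(rmsle([50], [100]), rmsle([150], [100]))
```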

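【Binary classification: evaluation metrics】

　　　・Evaluate with the ROC curve and the area under it (AUC)

　　　　　Ideal result: an inverted-L-shaped curve, AUC = 1

　　　　　Random result: the diagonal y = x, AUC = 0.5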
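The AUC equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as half. That pairwise definition gives a short pure-Python sketch (O(n²), so only for small data). In practice sklearn.metrics.roc_auc_score does the same job. The labels and scores below are made-up toy values.

```python
def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y = [0, 0, 1, 1]
print(roc_auc(y, [0.1, 0.2, 0.8, 0.9]))  # perfect ranking
print(roc_auc(y, [0.5, 0.5, 0.5, 0.5]))  # constant scores carry no information
```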

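1. Preprocessing: handling "missing values"

　1) Assume they do not affect the analysis and use the data as is

　2) Fill with a representative value (the mean, or the median when the data are skewed)

　3) Create a new feature that records whether the value is missing

　4) Use values estimated from their relationship with other columns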
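Here is a minimal pandas sketch of options 2) to 4) on a made-up table. The column names area and rooms are hypothetical. For option 4) it uses a straight-line fit with numpy.polyfit as a stand-in for whatever estimator one would actually choose. Option 1) just means leaving the NaN in place, which some models, such as LightGBM, accept.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in each column
df = pd.DataFrame({
    "area": [50.0, np.nan, 80.0, 120.0],
    "rooms": [2, 3, np.nan, 5],
})

# 3) Record missingness as its own feature (must happen before filling)
df["area_missing"] = df["area"].isna().astype(int)

# 2) Fill with a representative value; the median is robust to skewed data
df["area"] = df["area"].fillna(df["area"].median())

# 4) Estimate from another column: a simple linear fit of rooms on area
known = df["rooms"].notna()
slope, intercept = np.polyfit(df.loc[known, "area"], df.loc[known, "rooms"], 1)
df.loc[~known, "rooms"] = slope * df.loc[~known, "area"] + intercept

print(df)
```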

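1. Preprocessing: "one-hot encoding"

　・Sets a flag for each category, producing a 2-D array with one column per category

　・Used even for numbers when the numbers have no quantitative relationship

　　　　　　　　　(e.g. class 1, class 2, class 3, where the numbers carry no numeric meaning)

　・scikit-learn and pandas provide functions for this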
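In pandas the function is pandas.get_dummies, and scikit-learn's counterpart is sklearn.preprocessing.OneHotEncoder. Here is a small sketch with a hypothetical class column:

```python
import pandas as pd

# Class numbers are labels, not quantities, so they are one-hot encoded
df = pd.DataFrame({"class": [1, 2, 3, 1]})
onehot = pd.get_dummies(df["class"], prefix="class", dtype=int)
print(onehot)  # columns class_1, class_2, class_3; exactly one 1 per row
```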

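1. Preprocessing: "normalization"

　1) Bring the features used for learning onto a common scale

　2) Match the 0 to 1.0 output range of activation functions such as the sigmoid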
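One common way to do this is min-max scaling, which maps each feature onto the 0 to 1 range. scikit-learn offers it as sklearn.preprocessing.MinMaxScaler. Here is a plain-Python sketch; the function name is my own.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_scale([10, 20, 40]))
```

In a real pipeline, the minimum and maximum should come from the training data only and then be reused for the test data.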

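2. Analysis method: the "LightGBM" model

　1) Features must be numeric (string columns must be encoded first)

　2) Missing values can be used as they are

　3) Feature scaling is not needed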

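2. Analysis method: "multilayer perceptron and neural network (convolutional, recurrent)" models

　　Neuron model: the artificial neuron (simple perceptron)

　　　　Output: $y = f\left(\sum_{i} w_i x_i + b\right)$, where $x_i$ are the input signals, $w_i$ the weights, $b$ the bias, and $f$ the activation function

　　　　(the weighted sum of the inputs plus the bias is passed through the activation function; with a step function, $y$ indicates whether the neuron fires)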

https://axa.biopapyrus.jp/deep-learning/perceptron.html
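The neuron model above fits in a few lines of Python. The weights and bias here are hand-picked assumptions that happen to make a single neuron with a step activation compute logical AND. Nothing is learned.

```python
def step(z):
    """Step activation: fire (1) if z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias):
    """y = f(sum_i w_i * x_i + b): weighted sum plus bias, then the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


# Hand-picked (hypothetical) parameters that implement logical AND
w, b = [1.0, 1.0], -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w, b))
```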

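・Activation function: the sigmoid function

　　　　　　　An activation function that outputs the prediction's confidence as a probability (a value between 0 and 1)

　・Neurons are arranged in multiple layers to compute a prediction from the input values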
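Here is a NumPy sketch of a forward pass through two layers with the sigmoid activation. The layer sizes (3 inputs, 4 hidden neurons, 1 output) and the random weights are arbitrary assumptions, so the output is untrained. It only shows each layer repeating the weighted sum, bias, and activation.

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
# Hypothetical shapes: 3 inputs -> 4 hidden neurons -> 1 output (untrained weights)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)


def mlp_forward(x):
    """Each layer: weighted sum + bias, then sigmoid; the layers are chained."""
    hidden = sigmoid(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)


x = np.array([[0.2, -1.0, 0.5]])  # one sample with three features
print(mlp_forward(x))  # a value in (0, 1), read as a probability
```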

1) Libraries: TensorFlow, Keras, PyTorch, Chainer

　2) Settings:

number of layers, activation function, loss function (objective function), batch size, number of training epochs

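3. Settings

1) Learning rate

　The coefficient that scales the gradient when the parameters (weights) are updated in gradient descent; it governs convergence (typically around 0.1 to 0.01)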
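Here is a one-parameter sketch of the update w <- w - (learning rate) * gradient. It minimizes the toy function f(w) = (w - 3)², whose minimum is at w = 3. The example shows the trade-off: a small rate converges slowly, and a rate that is too large diverges. The function names are my own.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeat the update w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def grad_f(w):
    """Gradient of f(w) = (w - 3) ** 2."""
    return 2.0 * (w - 3.0)


for lr in (0.1, 0.01, 1.1):
    print(lr, gradient_descent(grad_f, w0=0.0, learning_rate=lr, steps=100))
```

With lr=0.1 the result is essentially 3. With lr=0.01 it is still noticeably short after 100 steps. With lr=1.1 each step overshoots further and the value blows up.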

4. Analysis technique: "cross-validation"

https://fetchploaty.blogspot.com/2022/09/python.html
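Here is a plain-Python sketch of k-fold cross-validation. The data are split into k folds, each fold serves once as the validation set while the rest are used for training, and the k scores are averaged. The "model" is a deliberately trivial mean predictor. The fold layout (contiguous, no shuffling) is an assumption. scikit-learn's sklearn.model_selection.KFold provides the same kind of split, with optional shuffling.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) for k contiguous folds; the first n % k folds get one extra."""
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def cross_validate_mean_model(y, k):
    """Score a trivial model (predict the training mean) by RMSE on each fold; average them."""
    scores = []
    for train, valid in k_fold_indices(len(y), k):
        mean = sum(y[i] for i in train) / len(train)
        mse = sum((y[i] - mean) ** 2 for i in valid) / len(valid)
        scores.append(mse ** 0.5)
    return sum(scores) / len(scores)


y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0, 8.0, 7.0, 9.0]  # made-up targets
print(cross_validate_mean_model(y, k=5))
```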

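4. Analysis technique: "ensemble"

　A sum of the predictions of each candidate model, weighted by chosen proportions

　[I am not sure why such a simple formula works.]

　Example: preds = lasso_preds * 0.7 + xgb_preds * 0.3

solution = pd.DataFrame({"id": test.Id, "SalePrice": preds})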
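Here is a small NumPy sketch of the weighted blend. It reuses the variable names from the example with made-up numbers. It also hints at one common explanation for why such a simple formula can work: when two models' errors are not perfectly correlated, a weighted average lets them partly cancel. The data are contrived so that the errors point in opposite directions, so real gains are usually smaller.

```python
import numpy as np

# Made-up true values and predictions from two hypothetical models
y_true = np.array([205000.0, 145000.0, 305000.0])
lasso_preds = np.array([200000.0, 150000.0, 310000.0])
xgb_preds = np.array([210000.0, 140000.0, 300000.0])

# Weighted blend; the weights sum to 1, so the result stays on the target's scale
preds = lasso_preds * 0.7 + xgb_preds * 0.3


def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))


# The two models err in opposite directions, so their errors partly cancel
print(rmse(lasso_preds, y_true), rmse(xgb_preds, y_true), rmse(preds, y_true))
```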

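5. Analysis: parameter tuning

　・Hyperparameters: the settings that are tuned, e.g. for a multilayer perceptron

　・Automated search

　　1) Grid search (scikit-learn model_selection.GridSearchCV):

　　　　　　tries every combination exhaustively

　　2) Random search (scikit-learn model_selection.RandomizedSearchCV):

　　　　　　tries randomly sampled combinations, so it may miss the best one

　　3) Bayesian optimization:

　　　　　　builds a model from the parameters and scores in the search history and uses it to choose the next parameters to try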
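Here is a plain-Python sketch contrasting 1) and 2) on a made-up search space. validation_score is a hypothetical stand-in for training and validating a model. Grid search tries all 12 combinations. Random search here tries only 4, which is cheaper but may miss the best one. Bayesian optimization (3) is not sketched.

```python
import itertools
import random


def validation_score(lr, n_layers):
    """Hypothetical score to maximize; stands in for training and validating a model."""
    return -((lr - 0.01) ** 2) * 1e4 - (n_layers - 3) ** 2


space = {"lr": [0.001, 0.01, 0.1], "n_layers": [1, 2, 3, 4]}
all_combos = list(itertools.product(space["lr"], space["n_layers"]))

# Grid search: evaluate every combination (3 x 4 = 12 trials)
best_grid = max(all_combos, key=lambda combo: validation_score(*combo))

# Random search: evaluate only a fixed budget of randomly sampled combinations
rng = random.Random(0)
sampled = rng.sample(all_combos, k=4)
best_random = max(sampled, key=lambda combo: validation_score(*combo))

print("grid:", best_grid, "random:", best_random)
```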

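Tips: functions

1) Exponential function

　　　A function whose form does not change when differentiated or integrated ($\frac{d}{dx}e^x = e^x$)... now that it's pointed out, of course!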

2) Softmax function

　　　Because its outputs over all classes sum to 1, it is used for multiclass classification

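3) Likelihood function

　　　Given the observed results, a function of the model's parameters (treated as variables) that expresses how plausible those parameter values are

　　　Because the outputs of a neural network's hidden layers have no ground-truth values, it is used to measure how far the output estimates deviate from the expected values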
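Here are plain-Python sketches of 2) and 3). The softmax subtracts the largest input before exponentiating, a standard trick that leaves the result unchanged but avoids overflow. The likelihood example uses made-up coin-toss data (7 heads in 10 tosses) and scans candidate values of p; the most plausible value comes out at 0.7. Minimizing the negative of this log-likelihood is the cross-entropy loss commonly used to train classifiers.

```python
import math


def softmax(logits):
    """exp(z_i) / sum_j exp(z_j), computed after subtracting max(z) to avoid overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bernoulli_log_likelihood(p, outcomes):
    """log L(p): sum of log p for each 1 and log(1 - p) for each 0."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))  # probabilities over three classes, summing to 1

outcomes = [1] * 7 + [0] * 3  # made-up data: 7 heads in 10 tosses
candidates = [i / 100 for i in range(1, 100)]  # p = 0.01, 0.02, ..., 0.99
best_p = max(candidates, key=lambda p: bernoulli_log_likelihood(p, outcomes))
print(best_p)  # 0.7
```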

【Tips】

・Image resizing: TensorFlow's ImageDataGenerator class

・Freeing memory: in memory-limited competitions and the like, delete objects you no longer need with del, then free the memory with gc.collect()
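Here is a minimal sketch of the pattern; the array and variable names are made up. In CPython, del only removes a reference, and the memory is released once no references remain. gc.collect() additionally clears objects kept alive only by reference cycles. Memory freed this way may not be handed back to the operating system right away.

```python
import gc

import numpy as np

train = np.zeros((1000, 1000))  # hypothetical large intermediate (about 8 MB of float64)
column_means = train.mean(axis=0)  # keep only what later steps need

del train  # remove the name; CPython frees the array once nothing else refers to it
found = gc.collect()  # full collection; also reclaims objects stuck in reference cycles
print(column_means.shape, found)
```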

2023-02-12

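Toward my plans for fiscal 2023

I looked for a free online Gantt chart tool. "teamgantt" is limited to one editor and three viewers, but functionally it looks like a simplified MS Project. For now, I'll use it as a memo-style progress tracker.

https://app.teamgantt.com/projects/gantt?ids=3397249

　Since February began, home-invasion robberies and soaring prices, such as electricity bills, have brought to the surface problems that had long been left alone: the declining birthrate and aging population, the pay of non-regular employees, the globalization of logistics, the flood of personal information... Since the COVID-19 pandemic, social change that had been gradual has suddenly accelerated.

　Engineering challenges will likely have to shift from creative visions of the future to visions of stemming the decline. What should my theme be this year...

So that I can look back a year from now and beyond, I'm recording "Japan's population trends, age composition, and aging rate".

By 2065, Japan's total population is projected to be about 88 million, with people aged 65 and over making up 38.4% (from the Ministry of Health, Labour and Welfare website) © AERA dot.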

2023-02-03

Will time be on the side of the Nagasaki Shinkansen? But one must reap what someone else has sown.

Will time be on Nagasaki Prefecture's side against the Nagasaki Shinkansen?

Recently, I learned QGIS, a free and open-source geographic information system (GIS).

Its use has been growing in public offices with limited budgets, and many volunteers have put the data available online to various uses.

For example, they have used it to consider how public transportation and the agriculture and forestry industries should be promoted.

In addition, the data science boom has made a wide variety of data available on the Internet.
While learning to use QGIS, I looked at population data to familiarize myself with data analysis.
The data are the "Population Estimates for Small Areas in Japan" published by Professor Takashi Inoue of Aoyama Gakuin University and created and provided by the Center for Spatial Information Science at the University of Tokyo: 地域データ提供ページ (Regional Data Page, u-tokyo.ac.jp)

One may dismiss it and laugh it off as just a model, but one must reap what someone else has sown.

2023-01-20

Morphological Analysis of The Tale of Genji

* Methods of Morphological Analysis of English

Unlike Japanese, English text is already clearly divided into words (by spaces) in most places. Word segmentation therefore does not need to be as complex as in Japanese and is often based on simple rules.
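A rule-based English tokenizer can be as small as one regular expression. The sketch below uses a rule of my own simplification: it treats runs of letters, digits, and apostrophes as words and each punctuation mark as its own token.

```python
import re


def tokenize_english(text):
    """Words are runs of letters/digits/apostrophes; each other non-space char is a token."""
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


print(tokenize_english("It's a simple rule, isn't it?"))
```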

Morphological analysis, a natural language processing technique, determines the morphemes that make up a sentence by comparing sentences containing complex Japanese grammar and vocabulary against the dictionary information registered in a morphological analysis tool.
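To illustrate dictionary lookup, here is a greedy longest-match segmenter. It is a simplified stand-in, not how tools such as MeCab work internally (they build a lattice of dictionary candidates and choose the lowest-cost path). The tiny dictionary is hypothetical.

```python
def longest_match_segment(text, dictionary):
    """Greedy left-to-right segmentation: at each position take the longest dictionary word.
    Characters not covered by any entry become single-character tokens."""
    max_len = max(len(word) for word in dictionary)
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in dictionary or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


# Tiny hypothetical dictionary; real analyzers ship with far larger ones
dictionary = {"私", "は", "源氏物語", "源氏", "物語", "を", "読む"}
print(longest_match_segment("私は源氏物語を読む", dictionary))
```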

For an AI to respond to a human question, it must first understand the meaning of the words in the question. Natural language processing is the technology required for this, and morphological analysis is its most fundamental part.

Furthermore, Japanese is said to be less logical than English. This is attributed, first, to its sentence structure, in which the predicate comes only after the subject and everything else; and second, to the conclusion coming at the end of the sentence.

Although Japanese is considered ill-suited to this kind of text mining, many analysts of Japanese have made various attempts.
This time, I tried analyzing a Japanese text with a publicly available Python program. It is not intended for academic purposes.
The text analyzed was a modern translation of the Japanese classic "The Tale of Genji".

I created an analysis program for Book 5, "Wakamurasaki," by referring to the co-occurrence network program on Iruka no Box: いるかのボックス: Pythonで共起ネットワークを作成する ("Creating a co-occurrence network with Python") (irukanobox.blogspot.com)

　Next, I extended this to the other books and simply stacked the results so that they could be laid out in three dimensions. As a result, there is no correlation between consecutive books, and no links through common factors between the books.
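The core of a co-occurrence network is counting how often two words appear in the same sentence; those counts become the edge weights. Here is a minimal sketch with collections.Counter. It assumes the sentences have already been tokenized by a morphological analyzer, and the word lists are made up. This is not the code from the referenced Iruka no Box article. A graph library such as NetworkX could then draw the counted pairs as edges.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sentences):
    """Count how often each pair of distinct words appears in the same sentence."""
    counts = Counter()
    for words in sentences:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


# Hypothetical pre-tokenized sentences (nouns only), standing in for analyzer output
sentences = [["源氏", "若紫", "北山"], ["源氏", "若紫"], ["源氏", "尼君"]]
for (a, b), n in cooccurrence(sentences).most_common(2):
    print(a, b, n)
```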

2023-01-09

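Sailor Moon on CUDA

For the new year, I'm studying deep learning by copying a tutorial wholesale.

The tutorial is Chama-san's "Classifying Sailor Moon characters with transfer learning on Xception".

And since my PC has an NVIDIA GTX 1650, I'll try to speed things up with CUDA.

First, I stumbled over Python11.0 and borrowed the wisdom of those who went before me.

If you are skilled enough, it is probably better to use NVIDIA's own resources directly, but as of January 2023 I think Professor Kaneko's article describes the most up-to-date setup.

Installing and verifying the NVIDIA driver, NVIDIA CUDA Toolkit 11.8, and NVIDIA cuDNN v8.6 (on Windows) https://www.kkaneko.jp/tools/win/cuda.html

I built my environment from that article, but ran into various issues, so I'm leaving these notes as a reminder.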

Python 3.10.9

NVIDIA CUDA Toolkit 11.8, NVIDIA cuDNN v8.6

Other installations

tensorflow: pip install --user tensorflow-gpu

pytorch: pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

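Checking the setup

# The package on PyPI is called tensorflow-gpu, but you import it as "tensorflow"
import tensorflow as tf  # apparently it can be used as is
print(tf.config.list_physical_devices("GPU"))  # an empty list means TensorFlow sees no GPU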

I found advice to use the code below, but since I don't know where it came from, I'm treating it as a magic spell. I also couldn't figure out how to use "device".

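# CUDA runtime settings

import os

# Set these before CUDA is initialized (safest: before importing torch)
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # number GPUs in PCI bus order
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to this process

import torch

print(f"Using GPU: CUDA:{os.environ['CUDA_VISIBLE_DEVICES']}")

# List each visible GPU with its total memory
for i in range(torch.cuda.device_count()):
    info = torch.cuda.get_device_properties(i)
    print(f"CUDA:{i} {info.name}, {info.total_memory / 1024 ** 2:.0f}MB")

device = torch.device("cuda:0")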
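As for how to use device: in PyTorch, a model and its input tensors must live on the same device, and they are usually moved there with .to(device). Here is a minimal sketch with a hypothetical tiny linear model. The CPU fallback is my own addition so the script also runs on a machine without a GPU.

```python
import torch
from torch import nn

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)    # move the model's parameters to the device
x = torch.randn(8, 4, device=device)  # create the input batch directly on the device
y = model(x)                          # the computation runs where the tensors live
print(y.device, y.shape)
```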

Note 1

Chama-san's "Xception model" used batch_size=70 with these compile settings:

from tensorflow import keras

model.compile(
    loss=keras.losses.categorical_crossentropy,
    optimizer="adam",
    metrics=["accuracy"],
)

However, I got told off with "OOM stands for "out of memory". Your GPU is running out of memory, so it can't allocate memory for this tensor.", so I set batch_size=35.

Note 2

With test = np.array(file_list), I got "THE REQUESTED ARRAY HAS AN INHOMOGENEOUS SHAPE AFTER 1 DIMENSIONS.", so the data type had to be specified explicitly as object rather than a string type:
test = np.array(file_list, dtype=object)
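The error occurs when the nested lists have different lengths, so NumPy cannot form a rectangular array. Here is a small sketch with hypothetical file names. NumPy 1.24 and later raise ValueError for such input (older versions only warn). With dtype=object, NumPy instead builds a 1-D array whose elements are the original Python lists.

```python
import numpy as np

# Hypothetical ragged nested list: per-class file names with different lengths
file_list = [["a.jpg", "b.jpg"], ["c.jpg"]]

try:
    np.array(file_list)  # NumPy >= 1.24 raises ValueError here; older versions warn
except ValueError as err:
    print("ValueError:", err)

test = np.array(file_list, dtype=object)  # 1-D object array holding the two lists
print(test.shape, test.dtype)
```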