Contents of the data
Created: 2021-04-21
Updated: 2021-04-22


    Progress so far:
      $ source [path to venv]/venv/bin/activate
      (venv) $ python
      >>> import tensorflow as tf
      >>> tf.enable_eager_execution()
      >>> train_file_path = "[absolute path]/.keras/datasets/train.csv"
      >>> test_file_path = "[absolute path]/.keras/datasets/eval.csv"
      >>> LABEL_COLUMN = 'survived'
      >>> def get_dataset(file_path, **kwargs):
      ...     dataset = tf.data.experimental.make_csv_dataset(
      ...         file_path,
      ...         batch_size=5,
      ...         label_name=LABEL_COLUMN,
      ...         na_value="?",
      ...         num_epochs=1,
      ...         ignore_errors=True,
      ...         **kwargs)
      ...     return dataset
      ...
      >>>
      >>> raw_train_data = get_dataset(train_file_path)
      >>> raw_test_data = get_dataset(test_file_path)


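    Settings that make numeric values easier to read when data is displayed:
      >>> import numpy as np
      >>> np.set_printoptions(precision=3, suppress=True)
    Here,
      precision=3 : show floating-point values to 3 decimal places
      suppress=True : do not use exponent notation ("... e- ...")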
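    As a small check of what these two options do, the following sketch prints the same array twice: once with NumPy's default settings (spelled out explicitly) and once with the settings above. The sample values are made up for illustration, and np.printoptions, a context manager that applies the options only temporarily, is used here instead of np.set_printoptions so that the session's global settings are left alone.

```python
import numpy as np

# Made-up sample values; the tiny last element triggers exponent notation
# under NumPy's default print settings.
values = np.array([7.2292, 31.3875, 0.00001234])

# NumPy's defaults, written out explicitly: exponent notation is used.
with np.printoptions(precision=8, suppress=False):
    default_text = str(values)

# The settings from this page: 3 decimal places, no exponent notation.
with np.printoptions(precision=3, suppress=True):
    readable_text = str(values)

print(default_text)
print(readable_text)
```

    With suppress=True the very small value is shown as a plain (rounded) decimal rather than in "e-" form.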


    Display the first item of raw_train_data, that is, one batch (a group of 5 records).

    Function for displaying it:
      >>> def show_batch(dataset):
      ...     for batch, label in dataset.take(1):
      ...         for key, value in batch.items():
      ...             print("{:20s}: {}".format(key,value.numpy()))
      ...
      >>>
    (".take(1)" means "just one"; without it, every batch in the dataset is displayed.)
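    The format string "{:20s}: {}" left-aligns each column name in a field 20 characters wide, which is what lines up the colons in the output. A minimal plain-Python sketch of just that formatting step, using an ordinary dict with made-up values in place of a real batch (format_batch is a hypothetical helper, not part of the code above):

```python
def format_batch(batch):
    """Return one aligned line per column, in the style show_batch prints."""
    return ["{:20s}: {}".format(key, value) for key, value in batch.items()]

# Stand-in for one batch: column name -> list of values (made-up data).
sample_batch = {"age": [28.0, 38.0], "n_siblings_spouses": [0, 1]}

lines = format_batch(sample_batch)
for line in lines:
    print(line)
```

    Column names longer than 20 characters would not be truncated; they would simply push the colon to the right.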

    Running show_batch (see Note)
      >>> show_batch(raw_train_data)
      sex                 : [b'male' b'female' b'female' b'male' b'female']
      age                 : [28. 38. 28. 25. 26.]
      n_siblings_spouses  : [0 1 8 1 1]
      parch               : [0 5 2 0 1]
      fare                : [ 7.229 31.388 69.55  17.8   26.   ]
      class               : [b'Third' b'Third' b'Third' b'Third' b'Second']
      deck                : [b'unknown' b'unknown' b'unknown' b'unknown' b'unknown']
      embark_town         : [b'Cherbourg' b'Southampton' b'Southampton' b'Southampton' b'Southampton']
      alone               : [b'y' b'n' b'n' b'n' b'n']

    Compare with the unformatted output:
      >>> for batch in raw_train_data.take(1):
      ...     print(batch)
      ...
      (OrderedDict([
      ('sex', <tf.Tensor: shape=(5,), dtype=string, numpy=array([b'male', b'female', b'female', b'male', b'female'], dtype=object)>),
      ('age', <tf.Tensor: shape=(5,), dtype=float32, numpy=array([28., 38., 28., 25., 26.], dtype=float32)>),
      ('n_siblings_spouses', <tf.Tensor: shape=(5,), dtype=int32, numpy=array([0, 1, 8, 1, 1])>),
      ('parch', <tf.Tensor: shape=(5,), dtype=int32, numpy=array([0, 5, 2, 0, 1])>),
      ('fare', <tf.Tensor: shape=(5,), dtype=float32, numpy=array([ 7.229, 31.388, 69.55 , 17.8  , 26.   ], dtype=float32)>),
      ('class', <tf.Tensor: shape=(5,), dtype=string, numpy=array([b'Third', b'Third', b'Third', b'Third', b'Second'], dtype=object)>),
      ('deck', <tf.Tensor: shape=(5,), dtype=string, numpy=array([b'unknown', b'unknown', b'unknown', b'unknown', b'unknown'], dtype=object)>),
      ('embark_town', <tf.Tensor: shape=(5,), dtype=string, numpy=array([b'Cherbourg', b'Southampton', b'Southampton', b'Southampton', b'Southampton'], dtype=object)>),
      ('alone', <tf.Tensor: shape=(5,), dtype=string, numpy=array([b'y', b'n', b'n', b'n', b'n'], dtype=object)>)
      ]), <tf.Tensor: shape=(5,), dtype=int32, numpy=array([0, 1, 0, 0, 0])>)

    Compare with the contents of train.csv:
    Open another shell in the terminal and look at the beginning of train.csv with the head command:
      $ head ~/.keras/datasets/train.csv
      survived,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
      0,male,22.0,1,0,7.25,Third,unknown,Southampton,n
      1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
      1,female,26.0,0,0,7.925,Third,unknown,Southampton,y
      1,female,35.0,1,0,53.1,First,C,Southampton,n
      0,male,28.0,0,0,8.4583,Third,unknown,Queenstown,y
      0,male,2.0,3,1,21.075,Third,unknown,Southampton,n
      1,female,27.0,0,2,11.1333,Third,unknown,Southampton,n
      1,female,14.0,1,0,30.0708,Second,unknown,Cherbourg,n
      1,female,4.0,1,1,16.7,Third,G,Southampton,n
    The rows appear in a different order from those in raw_train_data, because make_csv_dataset shuffles the records by default (shuffle=True).
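    To see how the row-oriented CSV relates to the column-oriented batches, here is a rough plain-Python sketch; it is not how TensorFlow implements it. It parses the header and the first three data rows of train.csv, splits off the 'survived' label, and regroups the remaining values by column. The helper name rows_to_batch is hypothetical, all values stay as strings, and no shuffling is done.

```python
import csv
import io

# Header and first three data rows of train.csv.
CSV_TEXT = """survived,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,male,22.0,1,0,7.25,Third,unknown,Southampton,n
1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
1,female,26.0,0,0,7.925,Third,unknown,Southampton,y
"""

def rows_to_batch(rows, label_name):
    """Turn a list of row dicts into (features grouped by column, labels)."""
    features = {}
    labels = []
    for row in rows:
        row = dict(row)                     # copy so the input is not modified
        labels.append(row.pop(label_name))  # the label leaves the feature set
        for key, value in row.items():
            features.setdefault(key, []).append(value)
    return features, labels

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
features, labels = rows_to_batch(rows, "survived")
print(features["sex"])
print(labels)
```

    This mirrors why 'survived' never appears among the keys printed by show_batch: it is delivered separately as the label.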


    Note: If "tf.enable_eager_execution()" is not called at the start of the program, the following error occurs:
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "<stdin>", line 2, in show_batch
        File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2115, in __iter__
          return iter(self._dataset)
        File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 347, in __iter__
          raise RuntimeError("__iter__() is only supported inside of tf.function "
      RuntimeError: __iter__() is only supported inside of tf.function or when eager execution is enabled.
      >>>


    Remarks
    To use only specific columns of the dataset, for example 'age', 'n_siblings_spouses', 'class', 'deck', 'alone':
      >>> SELECT_COLUMNS = ['survived', 'age', 'n_siblings_spouses', 'class', 'deck', 'alone']
      >>>
      >>> temp_dataset = get_dataset(train_file_path, select_columns=SELECT_COLUMNS)
      >>>
      >>> show_batch(temp_dataset)
      age                 : [40. 28. 28. 53. 20.]
      n_siblings_spouses  : [1 0 0 2 0]
      class               : [b'Third' b'Third' b'Third' b'First' b'Third']
      deck                : [b'unknown' b'unknown' b'unknown' b'C' b'unknown']
      alone               : [b'n' b'y' b'y' b'n' b'y']
      >>>
    ('survived' is included in SELECT_COLUMNS because get_dataset uses it as the label column.)
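    The effect of restricting the columns can be pictured in plain Python as well. The sketch below is only an illustration under simplifying assumptions, not TensorFlow's implementation: keep_columns is a hypothetical helper, the two data rows are the first two rows of train.csv, and values are left as strings.

```python
import csv
import io

# Header and first two data rows of train.csv.
CSV_TEXT = """survived,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,male,22.0,1,0,7.25,Third,unknown,Southampton,n
1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
"""

SELECT_COLUMNS = ['survived', 'age', 'n_siblings_spouses', 'class', 'deck', 'alone']

def keep_columns(rows, columns):
    """Keep only the listed columns of each row, in the listed order."""
    return [{name: row[name] for name in columns} for row in rows]

selected = keep_columns(csv.DictReader(io.StringIO(CSV_TEXT)), SELECT_COLUMNS)
print(selected[0])
```

    Columns that are not listed, such as 'sex' and 'fare', are dropped from every row.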