Loading Data
Created: 2021-04-21
Updated: 2021-04-21


  1. Downloading the data files (train.csv, eval.csv)
    When repeating the "Load CSV with tf.data" exercise for practice, the download is needed only the first time; from the second run onward, skip this step.

    Download:
      >>> import tensorflow as tf
      >>> TRAIN_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/train.csv"
      >>> TEST_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/eval.csv"
      >>> tf.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
      Downloading data from https://storage.googleapis.com/tf-datasets/titanic/train.csv
      32768/30874 [===============================] - 0s 1us/step
      >>> tf.keras.utils.get_file("eval.csv", TEST_DATA_URL)
      Downloading data from https://storage.googleapis.com/tf-datasets/titanic/eval.csv
      16384/13049 [=====================================] - 0s 1us/step
      >>>

    At this point, a directory .keras is created under ~ (regardless of the current directory) with the following contents:

      .keras ─ datasets ┬ eval.csv
                        └ train.csv
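
    Incidentally, tf.keras.utils.get_file returns the absolute path of the cached file, so the paths needed in step 2 can also be captured from its return value instead of being written out by hand (a minimal sketch, not part of the original transcript):

      >>> # get_file() downloads on the first call and reuses the cached copy on
      >>> # later calls; either way it returns the absolute path of the file.
      >>> train_file_path = tf.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
      >>> test_file_path = tf.keras.utils.get_file("eval.csv", TEST_DATA_URL)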



  2. Loading train.csv and eval.csv
    Set the paths to the files train.csv and eval.csv:
      >>> train_file_path = "[absolute path]/.keras/datasets/train.csv"
      >>> test_file_path = "[absolute path]/.keras/datasets/eval.csv"
     Do not use "~/.keras/datasets/‥‥.csv" here (see the note below).

    Configure how the files are read:
      >>> LABELS = [0, 1]
      >>> LABEL_COLUMN = 'survived'
      >>> def get_dataset(file_path, **kwargs):
      ...   dataset = tf.data.experimental.make_csv_dataset(
      ...       file_path,
      ...       batch_size=5,
      ...       label_name=LABEL_COLUMN,
      ...       na_value="?",
      ...       num_epochs=1,
      ...       ignore_errors=True,
      ...       **kwargs)
      ...   return dataset
      ...
      >>>

      Here, batch_size=5 means that the data is returned in batches of 5 records
      at a time (artificially small to make the examples easier to show); see the
      sketch after the load below.

    Load:
      >>> raw_train_data = get_dataset(train_file_path)
      WARNING:tensorflow:From /home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/data/experimental/ops/readers.py:540: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
      Instructions for updating:
      Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
      >>> raw_test_data = get_dataset(test_file_path)
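
    To see what a batch of 5 records looks like, one batch can be pulled from the dataset and printed (a minimal sketch, not part of the original transcript; it assumes eager execution, the default in TensorFlow 2.x):

      >>> # Each element of the dataset is a (features, labels) pair, where
      >>> # `features` is an OrderedDict mapping each column name to a tensor
      >>> # holding batch_size (= 5) values.
      >>> for batch, labels in raw_train_data.take(1):
      ...     for key, value in batch.items():
      ...         print(key, value.numpy())
      ...     print(LABEL_COLUMN, labels.numpy())
      ...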



     Note: If "~/.keras/datasets/‥‥.csv" is used when setting the paths to train.csv and eval.csv,
    then get_dataset, which loads train.csv and eval.csv, raises the following error:
      >>> raw_train_data = get_dataset(train_file_path)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "<stdin>", line 9, in get_dataset
        File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/data/experimental/ops/readers.py", line 588, in make_csv_dataset_v1
          compression_type, ignore_errors))
        File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/data/experimental/ops/readers.py", line 437, in make_csv_dataset_v2
          filenames = _get_file_names(file_pattern, False)
        File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/data/experimental/ops/readers.py", line 970, in _get_file_names
          file_names = list(gfile.Glob(file_pattern))
        File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/lib/io/file_io.py", line 363, in get_matching_files
          return get_matching_files_v2(filename)
        File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/lib/io/file_io.py", line 384, in get_matching_files_v2
          compat.as_bytes(pattern))
      tensorflow.python.framework.errors_impl.NotFoundError: ~/.keras/datasets; No such file or directory
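
    The error occurs because "~" is expanded by the shell, not by Python or TensorFlow, so the literal path "~/.keras/datasets" is looked up and not found. If the path is to be written relative to the home directory anyway, one workaround (a minimal sketch, not part of the original transcript) is to expand it in Python before handing it to get_dataset:

      >>> import os
      >>> # os.path.expanduser replaces the leading "~" with the home directory,
      >>> # giving the absolute path that make_csv_dataset expects.
      >>> train_file_path = os.path.expanduser("~/.keras/datasets/train.csv")
      >>> test_file_path = os.path.expanduser("~/.keras/datasets/eval.csv")
      >>> raw_train_data = get_dataset(train_file_path)
      >>> raw_test_data = get_dataset(test_file_path)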