Creating numeric_columns
Created: 2021-04-22
Updated: 2021-04-23


    The numerical columns are 'age', 'n_siblings_spouses', 'parch', and 'fare':
      >>> CSV_COLUMNS = ['survived', 'sex', 'age', 'n_siblings_spouses',
      ...                'parch', 'fare', 'class', 'deck', 'embark_town', 'alone']
      >>> NUMERIC_FEATURES = ['age', 'n_siblings_spouses', 'parch', 'fare']

    1. Vectorize these columns and pack them into a single column:
      >>> class PackNumericFeatures(object):
      ...     def __init__(self, names):
      ...         self.names = names
      ...
      ...     def __call__(self, features, labels):
      ...         # Remove the numeric columns from the feature dict
      ...         numeric_features = [features.pop(name) for name in self.names]
      ...         # Cast each column to float32 so they can be stacked
      ...         numeric_features = [tf.cast(feat, tf.float32) for feat in numeric_features]
      ...         # Stack along the last axis: shape (batch, len(names))
      ...         numeric_features = tf.stack(numeric_features, axis=-1)
      ...         features['numeric'] = numeric_features
      ...
      ...         return features, labels
      ...
      >>>
      >>> packed_train_data = raw_train_data.map(
      ...     PackNumericFeatures(NUMERIC_FEATURES))
      ...
      >>>
      >>> packed_test_data = raw_test_data.map(
      ...     PackNumericFeatures(NUMERIC_FEATURES))
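
    To make the effect of tf.stack(..., axis=-1) concrete, here is a minimal sketch in plain Python that runs without TensorFlow. The helper name pack_numeric_features and the reduced sample batch are illustrative assumptions, not part of the original session. It pops the numeric columns from a feature dict and regroups them so that each row holds the values of one passenger.

```python
from collections import OrderedDict

NUMERIC_FEATURES = ['age', 'n_siblings_spouses', 'parch', 'fare']

def pack_numeric_features(features, names):
    """Hypothetical plain-Python analogue of PackNumericFeatures.__call__."""
    # Remove each numeric column from the dict, keeping the order of `names`.
    columns = [features.pop(name) for name in names]
    # Cast to float, as tf.cast(feat, tf.float32) does in the original.
    columns = [[float(v) for v in col] for col in columns]
    # Stack along the last axis: one row per example, one entry per feature.
    features['numeric'] = [list(row) for row in zip(*columns)]
    return features

# Sample values copied from the first batch shown in this document.
batch = OrderedDict([
    ('sex', ['male', 'female', 'male', 'female', 'male']),
    ('age', [28., 33., 20., 35., 31.]),
    ('n_siblings_spouses', [0, 1, 0, 0, 1]),
    ('parch', [0, 0, 0, 0, 0]),
    ('fare', [7.75, 53.1, 8.05, 21., 57.]),
])

packed = pack_numeric_features(batch, NUMERIC_FEATURES)
print(list(packed.keys()))      # the numeric keys are gone, 'numeric' is new
print(packed['numeric'][0])     # first row: age, n_siblings_spouses, parch, fare
```

    The four separate keys disappear and a single 'numeric' key takes their place, which is the same change visible between show_batch(raw_train_data) and show_batch(packed_train_data).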


    Both packed_train_data = ‥‥ and packed_test_data = ‥‥ return the following WARNING.
    It appears because the class was defined in the interactive shell, where its source code
    is not available; the mapping function is still run as-is:
      WARNING:tensorflow:AutoGraph could not transform > and will run it as-is.
      Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
      Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
      WARNING: AutoGraph could not transform > and will run it as-is.
      Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
      Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
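
    The cause named in the warning ("could not get source code") can be reproduced without TensorFlow. The following is a minimal sketch using only the standard library; the function pack, the helper source_is_available, and the pseudo-filename are illustrative assumptions. It compiles a function from a string, so that no file on disk backs it (roughly the situation of code typed at the interactive prompt), and then asks inspect for its source.

```python
import inspect

# Source text compiled under a pseudo-filename: no file on disk backs it,
# which is roughly the situation of code typed at the interactive prompt.
SOURCE = "def pack(features, labels):\n    return features, labels\n"
namespace = {"__name__": "interactive_sketch"}
exec(compile(SOURCE, "<interactive-sketch>", "exec"), namespace)
pack = namespace["pack"]

def source_is_available(func):
    """Return True if inspect can recover the source code of func."""
    try:
        inspect.getsource(func)
        return True
    except (OSError, TypeError):
        return False

# The function itself works; only its source text cannot be located.
print(pack({'numeric': []}, [0]))
print(source_is_available(pack))
```

    This suggests why the warning recommends defining such code in a .py source file: a file-backed definition gives source-inspection tools something to read. In TensorFlow 2.x the decorator mentioned in the warning is, as far as I know, exposed as tf.autograph.experimental.do_not_convert.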


    Checking the contents:
      >>> show_batch(raw_train_data)
      sex                 : [b'male' b'female' b'male' b'female' b'male']
      age                 : [28. 33. 20. 35. 31.]
      n_siblings_spouses  : [0 1 0 0 1]
      parch               : [0 0 0 0 0]
      fare                : [ 7.75 53.1 8.05 21. 57. ]
      class               : [b'Third' b'First' b'Third' b'Second' b'First']
      deck                : [b'unknown' b'E' b'unknown' b'unknown' b'B']
      embark_town         : [b'Queenstown' b'Southampton' b'Southampton' b'Southampton' b'Southampton']
      alone               : [b'y' b'n' b'y' b'y' b'n']
      >>> show_batch(packed_train_data)
      sex                 : [b'male' b'female' b'male' b'female' b'male']
      class               : [b'Third' b'First' b'Third' b'Second' b'First']
      deck                : [b'unknown' b'E' b'unknown' b'unknown' b'B']
      embark_town         : [b'Queenstown' b'Southampton' b'Southampton' b'Southampton' b'Southampton']
      alone               : [b'y' b'n' b'y' b'y' b'n']
      numeric             : [[28. 0. 0. 7.75]
                             [33. 1. 0. 53.1 ]
                             [20. 0. 0. 8.05]
                             [35. 0. 0. 21. ]
                             [31. 1. 0. 57. ]]
      >>> for batch in packed_train_data.take(1):
      ...     print(batch)
      ...
      (OrderedDict([
        ('sex', <tf.Tensor: shape=(5,), dtype=string, numpy=
          array([b'male', b'female', b'male', b'female', b'male'], dtype=object)>),
        ('class', <tf.Tensor: shape=(5,), dtype=string, numpy=
          array([b'Third', b'First', b'Third', b'Second', b'First'], dtype=object)>),
        ('deck', <tf.Tensor: shape=(5,), dtype=string, numpy=
          array([b'unknown', b'E', b'unknown', b'unknown', b'B'], dtype=object)>),
        ('embark_town', <tf.Tensor: shape=(5,), dtype=string, numpy=
          array([b'Queenstown', b'Southampton', b'Southampton', b'Southampton', b'Southampton'], dtype=object)>),
        ('alone', <tf.Tensor: shape=(5,), dtype=string, numpy=
          array([b'y', b'n', b'y', b'y', b'n'], dtype=object)>),
        ('numeric', <tf.Tensor: shape=(5, 4), dtype=float32, numpy=
          array([[28. , 0. , 0. , 7.75],
                 [33. , 1. , 0. , 53.1 ],
                 [20. , 0. , 0. , 8.05],
                 [35. , 0. , 0. , 21. ],
                 [31. , 1. , 0. , 57. ]], dtype=float32)>)
       ]),
       <tf.Tensor: shape=(5,), dtype=int32, numpy=array([0, 1, 0, 1, 1])>)


    2. Normalizing the data
      (This step uses the pandas module; see the Note at the end.)
      >>> import pandas as pd
      >>> desc = pd.read_csv(train_file_path)[NUMERIC_FEATURES].describe()
      >>> MEAN = np.array(desc.T['mean'])
      >>> STD = np.array(desc.T['std'])
      >>> def normalize_numeric_data(data, mean, std):
      ...     # Center the data and scale it by the standard deviation
      ...     return (data - mean)/std
      ...
      >>> import functools
      >>> # Bind mean and std, leaving a function of the data only
      >>> normalizer = functools.partial(
      ...     normalize_numeric_data, mean=MEAN, std=STD )
      >>> numeric_column = tf.feature_column.numeric_column(
      ...     'numeric', normalizer_fn = normalizer, shape = [len(NUMERIC_FEATURES)] )
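
    As a worked example of the normalization, the sketch below applies the same (data - mean)/std formula to one packed row in plain Python, assuming no NumPy or TensorFlow. The function normalize_row is a hypothetical stand-in for normalize_numeric_data, and MEAN and STD are the values printed by desc in this document.

```python
import functools

# Statistics as reported by `desc` in this document (mean and std rows).
MEAN = [29.631308, 0.545455, 0.379585, 34.385399]
STD = [12.511818, 1.151090, 0.792999, 54.597730]

def normalize_row(row, mean, std):
    """Plain-Python stand-in for (data - mean) / std on one packed row."""
    return [(x - m) / s for x, m, s in zip(row, mean, std)]

# partial fixes mean and std and leaves a one-argument callable,
# which is the form a normalizer_fn takes.
normalizer = functools.partial(normalize_row, mean=MEAN, std=STD)

first_row = [28.0, 0.0, 0.0, 7.75]   # first packed row of the sample batch
z = normalizer(first_row)
print([round(v, 3) for v in z])
```

    Every entry of this row comes out negative because each value lies below the mean of its column. Binding the statistics with functools.partial keeps them out of the call signature, so the layer only has to pass the tensor.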


    Checking the contents:
      >>> desc
                    age  n_siblings_spouses       parch        fare
      count  627.000000          627.000000  627.000000  627.000000
      mean    29.631308            0.545455    0.379585   34.385399
      std     12.511818            1.151090    0.792999   54.597730
      min      0.750000            0.000000    0.000000    0.000000
      25%     23.000000            0.000000    0.000000    7.895800
      50%     28.000000            0.000000    0.000000   15.045800
      75%     35.000000            1.000000    0.000000   31.387500
      max     80.000000            8.000000    5.000000  512.329200
      >>> numeric_column
      NumericColumn(
        key='numeric', shape=(4,), default_value=None, dtype=tf.float32,
        normalizer_fn=functools.partial(<function normalize_numeric_data at 0x64b42420>,
          mean=array([29.631, 0.545, 0.38 , 34.385]),
          std=array([12.512, 1.151, 0.793, 54.598])))
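
    The std row of describe() is the sample standard deviation (divisor n - 1), not the population one. The sketch below illustrates the difference with the standard library; it uses only the five ages of the sample batch, so its numbers are an illustration and do not match the 627-row statistics above.

```python
import statistics

# Ages from the five-row sample batch shown earlier (not the full 627 rows).
ages = [28.0, 33.0, 20.0, 35.0, 31.0]

mean = statistics.mean(ages)
sample_std = statistics.stdev(ages)        # divides by n - 1, like describe()
population_std = statistics.pstdev(ages)   # divides by n

print(mean, sample_std, population_std)
```

    With 627 rows the two values differ only slightly, so either would normalize the data to nearly the same scale.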


    3. Creating numeric_columns
      >>> numeric_columns = [numeric_column]


    Checking the contents:
      >>> numeric_columns
      [
        NumericColumn(
          key='numeric', shape=(4,), default_value=None, dtype=tf.float32,
          normalizer_fn=functools.partial(<function normalize_numeric_data at 0x5f4c68a0>,
            mean=array([29.631, 0.545, 0.38 , 34.385]),
            std=array([12.512, 1.151, 0.793, 54.598])))
      ]


    4. Testing that numeric_columns works
     Create numeric_layer:
      >>> numeric_layer = tf.keras.layers.DenseFeatures(numeric_columns)
      >>> numeric_layer
      <tensorflow.python.feature_column.dense_features.DenseFeatures object at 0x641b3710>


     Take one batch from packed_train_data and feed it to numeric_layer.
     Passing the whole batch fails, because each dataset element is a (features, labels)
     tuple while DenseFeatures expects the features dictionary:
      >>> for batch in packed_train_data.take(1):
      ...     print( numeric_layer(batch).numpy()[0] )
      ...
      Traceback (most recent call last):
        File "<stdin>", line 2, in <module>
        File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 913, in __call__
          outputs = self.call(cast_inputs, *args, **kwargs)
        File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/feature_column/dense_features.py", line 129, in call
          features)
      ValueError: ('We expected a dictionary here. Instead we got: ', (OrderedDict([
        ('sex', <tf.Tensor: shape=(5,), dtype=string, numpy=
          array([b'male', b'female', b'male', b'female', b'male'], dtype=object)>),
        ('class', <tf.Tensor: shape=(5,), dtype=string, numpy=
          array([b'Third', b'First', b'Third', b'Second', b'First'], dtype=object)>),
        ('deck', <tf.Tensor: shape=(5,), dtype=string, numpy=
          array([b'unknown', b'E', b'unknown', b'unknown', b'B'], dtype=object)>),
        ('embark_town', <tf.Tensor: shape=(5,), dtype=string, numpy=
          array([b'Queenstown', b'Southampton', b'Southampton', b'Southampton', b'Southampton'], dtype=object)>),
        ('alone', <tf.Tensor: shape=(5,), dtype=string, numpy=
          array([b'y', b'n', b'y', b'y', b'n'], dtype=object)>),
        ('numeric', <tf.Tensor: shape=(5, 4), dtype=float32, numpy=
          array([[28. , 0. , 0. , 7.75],
                 [33. , 1. , 0. , 53.1 ],
                 [20. , 0. , 0. , 8.05],
                 [35. , 0. , 0. , 21. ],
                 [31. , 1. , 0. , 57. ]], dtype=float32)>)
       ]),
       <tf.Tensor: shape=(5,), dtype=int32, numpy=array([0, 1, 0, 1, 1])>)
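
    The ValueError above comes down to passing a tuple where a dictionary is expected. The sketch below reproduces that situation in plain Python; fake_dense_features is a hypothetical stand-in that only mimics the type check, not the real DenseFeatures layer, and the element is a shortened copy of the batch printed in the error.

```python
from collections import OrderedDict

def fake_dense_features(features):
    """Hypothetical stand-in for a layer that only accepts a feature dict."""
    if not isinstance(features, dict):
        raise ValueError('We expected a dictionary here. Instead we got: ', features)
    return features['numeric']

# One dataset element, shaped like the printed batch: (features, labels).
element = (
    OrderedDict([
        ('sex', ['male', 'female']),
        ('numeric', [[28.0, 0.0, 0.0, 7.75], [33.0, 1.0, 0.0, 53.1]]),
    ]),
    [0, 1],
)

try:
    fake_dense_features(element)       # the whole tuple is rejected
    tuple_accepted = True
except ValueError:
    tuple_accepted = False

features, labels = element             # unpack the tuple first
first_row = fake_dense_features(features)[0]
print(tuple_accepted, first_row)
```

    With the real objects, the analogous loop would presumably be `for features, labels in packed_train_data.take(1): print(numeric_layer(features).numpy()[0])`. It was not run in the session above, so no output is shown for it.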



    Note: when pandas is not installed
      >>> import pandas as pd
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      ModuleNotFoundError: No module named 'pandas'

     Open another shell in a terminal and install pandas:
      $ source ./venv/bin/activate
      (venv) pi@raspi:~ $ pip show pandas
      WARNING: Package(s) not found: pandas
      (venv) pi@raspi:~ $ pip install pandas
      Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
      Collecting pandas
        Downloading https://www.piwheels.org/simple/pandas/pandas-1.2.4-cp37-cp37m-linux_armv7l.whl (32.3 MB)
           |████████████████████████████████| 32.3 MB 6.5 kB/s
      Collecting pytz>=2017.3
        Downloading https://www.piwheels.org/simple/pytz/pytz-2021.1-py2.py3-none-any.whl (510 kB)
           |████████████████████████████████| 510 kB 292 kB/s
      Requirement already satisfied: numpy>=1.16.5 in ./venv/lib/python3.7/site-packages (from pandas) (1.20.2)
      Requirement already satisfied: python-dateutil>=2.7.3 in /usr/lib/python3/dist-packages (from pandas) (2.7.3)
      Installing collected packages: pytz, pandas
      Successfully installed pandas-1.2.4 pytz-2021.1
      (venv) pi@raspi:~ $ pip show pandas
       ‥‥
      Version: 1.2.4
       ‥‥
      Location: /home/pi/venv/lib/python3.7/site-packages
       ‥‥
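
    Whether a module is available can also be checked from Python before importing it. The following is a small sketch using the standard library; the helper name is_installed is an assumption made for illustration.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None

# 'json' ships with Python; 'pandas' depends on the active environment.
for name in ('pandas', 'json'):
    print(name, 'installed' if is_installed(name) else 'missing')
```

    The check reflects the interpreter that runs it, so inside the venv it reports what pip show reports for that venv.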