The numerical columns are 'age', 'n_siblings_spouses', 'parch', and 'fare':
>>> CSV_COLUMNS = ['survived', 'sex', 'age', 'n_siblings_spouses',
... 'parch', 'fare', 'class', 'deck', 'embark_town', 'alone']
>>> NUMERIC_FEATURES = ['age','n_siblings_spouses','parch', 'fare']
1. Vectorize these and pack them into a single column:
>>> class PackNumericFeatures(object):
...     def __init__(self, names):
...         self.names = names
...
...     def __call__(self, features, labels):
...         numeric_features = [features.pop(name) for name in self.names]
...         numeric_features = [tf.cast(feat, tf.float32) for feat in numeric_features]
...         numeric_features = tf.stack(numeric_features, axis=-1)
...         features['numeric'] = numeric_features
...
...         return features, labels
...
>>>
>>> packed_train_data = raw_train_data.map(
...     PackNumericFeatures(NUMERIC_FEATURES))
...
>>>
>>> packed_test_data = raw_test_data.map(
...     PackNumericFeatures(NUMERIC_FEATURES))
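To make the effect of `PackNumericFeatures.__call__` concrete without TensorFlow, here is a minimal pure-Python sketch. The helper `pack_numeric` is hypothetical (not the tutorial's code) and models a batch as a dict of plain lists; it does the same three steps: pop the numeric columns, cast them to float, and stack them along the last axis so each row holds one passenger's four values. The sample values are taken from the first batch printed by `show_batch`.

```python
# Hypothetical pure-Python stand-in for PackNumericFeatures.__call__:
# a batch is modelled as a dict mapping column name -> list of values.
NUMERIC_FEATURES = ['age', 'n_siblings_spouses', 'parch', 'fare']

def pack_numeric(features, names):
    """Pop the named columns, cast them to float and stack along the last axis."""
    columns = [features.pop(name) for name in names]             # like features.pop(name)
    columns = [[float(v) for v in col] for col in columns]       # like tf.cast(..., tf.float32)
    features['numeric'] = [list(row) for row in zip(*columns)]   # like tf.stack(..., axis=-1)
    return features

batch = {
    'sex': ['male', 'female', 'male', 'female', 'male'],
    'age': [28., 33., 20., 35., 31.],
    'n_siblings_spouses': [0, 1, 0, 0, 1],
    'parch': [0, 0, 0, 0, 0],
    'fare': [7.75, 53.1, 8.05, 21., 57.],
}
packed = pack_numeric(batch, NUMERIC_FEATURES)
print(sorted(packed))        # remaining keys: the non-numeric column plus 'numeric'
print(packed['numeric'][0])  # first passenger's packed row
```

As in the original class, the input dict is modified in place: the four numeric keys disappear and a single `numeric` entry of shape (batch, 4) takes their place.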
Both `packed_train_data = ‥‥` and `packed_test_data = ‥‥` return the following WARNING:
WARNING:tensorflow:AutoGraph could not transform > and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING: AutoGraph could not transform > and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
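The `Original error: could not get source code` text matches the `OSError` that Python's `inspect` module raises when it cannot read a function's source, which is presumably what AutoGraph runs into for a class typed at the `>>>` prompt (an assumption based on the message). The stdlib sketch below reproduces that error by compiling a function from a string, standing in for interactive input; the name `pack` and the filename `<interactive-demo>` are arbitrary.

```python
import inspect

# Code compiled from a string (a stand-in for the interactive shell) is never
# written to a file, so inspect has no source lines to read back.
src = "def pack(features):\n    return features\n"
namespace = {}
exec(compile(src, "<interactive-demo>", "exec"), namespace)

try:
    inspect.getsource(namespace["pack"])
    error = None
except OSError as exc:   # inspect signals missing source with OSError
    error = str(exc)
print(error)
```

As the warning itself suggests, defining `PackNumericFeatures` in a `.py` file and importing it should give AutoGraph a source file to read. The warning is only a notice: the method is still run as-is, and the `show_batch` output below confirms the mapping works.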
Checking the contents:
>>> show_batch(raw_train_data)
sex : [b'male' b'female' b'male' b'female' b'male']
age : [28. 33. 20. 35. 31.]
n_siblings_spouses : [0 1 0 0 1]
parch : [0 0 0 0 0]
fare : [ 7.75 53.1 8.05 21. 57. ]
class : [b'Third' b'First' b'Third' b'Second' b'First']
deck : [b'unknown' b'E' b'unknown' b'unknown' b'B']
embark_town : [b'Queenstown' b'Southampton' b'Southampton' b'Southampton' b'Southampton']
alone : [b'y' b'n' b'y' b'y' b'n']
>>> show_batch(packed_train_data)
sex : [b'male' b'female' b'male' b'female' b'male']
class : [b'Third' b'First' b'Third' b'Second' b'First']
deck : [b'unknown' b'E' b'unknown' b'unknown' b'B']
embark_town : [b'Queenstown' b'Southampton' b'Southampton' b'Southampton' b'Southampton']
alone : [b'y' b'n' b'y' b'y' b'n']
numeric : [[28. 0. 0. 7.75]
[33. 1. 0. 53.1 ]
[20. 0. 0. 8.05]
[35. 0. 0. 21. ]
[31. 1. 0. 57. ]]
>>> for batch in packed_train_data.take(1):
...     print(batch)
...
(OrderedDict([
('sex', ),
('class', ),
('deck', ),
('embark_town', ),
('alone', ),
('numeric', )
]),
)
2. Normalize the data
(This step uses the pandas module; see the note below.)
>>> import numpy as np
>>> import pandas as pd
>>> desc = pd.read_csv(train_file_path)[NUMERIC_FEATURES].describe()
>>> MEAN = np.array(desc.T['mean'])
>>> STD = np.array(desc.T['std'])
>>> def normalize_numeric_data(data, mean, std):
...     return (data - mean)/std
...
>>> import functools
>>> normalizer = functools.partial( normalize_numeric_data, mean=MEAN, std=STD )
>>> numeric_column = tf.feature_column.numeric_column(
... 'numeric', normalizer_fn = normalizer, shape = [len(NUMERIC_FEATURES)] )
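The normalizer is a plain z-score, (x - mean) / std per feature, with `functools.partial` freezing `mean` and `std` so that `normalizer_fn` receives a one-argument function. A minimal pure-Python sketch of the same idea for a single packed row, assuming the means and standard deviations shown in the `desc` table; in the REPL code above, `data` is instead a (batch, 4) tensor and the NumPy arrays `MEAN`/`STD` broadcast across it.

```python
import functools

# Mean and std of age, n_siblings_spouses, parch, fare,
# copied from the desc table of the training CSV.
MEAN = [29.631308, 0.545455, 0.379585, 34.385399]
STD = [12.511818, 1.151090, 0.792999, 54.597730]

def normalize_numeric_data(data, mean, std):
    # Element-wise z-score of one row: (x - mean) / std for each feature.
    return [(x - m) / s for x, m, s in zip(data, mean, std)]

# partial binds mean and std, leaving a function of `data` only.
normalizer = functools.partial(normalize_numeric_data, mean=MEAN, std=STD)

row = [28.0, 0.0, 0.0, 7.75]   # first passenger's packed 'numeric' row
print([round(z, 3) for z in normalizer(row)])
```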
Checking the contents:
>>> desc
age n_siblings_spouses parch fare
count 627.000000 627.000000 627.000000 627.000000
mean 29.631308 0.545455 0.379585 34.385399
std 12.511818 1.151090 0.792999 54.597730
min 0.750000 0.000000 0.000000 0.000000
25% 23.000000 0.000000 0.000000 7.895800
50% 28.000000 0.000000 0.000000 15.045800
75% 35.000000 1.000000 0.000000 31.387500
max 80.000000 8.000000 5.000000 512.329200
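The `std` row of `describe()` is the sample standard deviation (pandas divides by n - 1 by default), which is what feeds `STD`. A small stdlib sketch of the same mean/std definitions, using only the five `age` values from the first batch rather than the full 627-row CSV, so the numbers differ from the table:

```python
import statistics

# Five 'age' values from the first batch (not the whole training file).
ages = [28.0, 33.0, 20.0, 35.0, 31.0]

mean = statistics.mean(ages)     # same definition as describe()'s 'mean'
std = statistics.stdev(ages)     # sample std, divides by n - 1 (as describe() does)
pstd = statistics.pstdev(ages)   # population std, divides by n, for comparison
print(mean, round(std, 4), round(pstd, 4))
```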
>>> numeric_column
NumericColumn(
key='numeric',
shape=(4,),
default_value=None,
dtype=tf.float32,
normalizer_fn=functools.partial(,
mean=array([29.631, 0.545, 0.38 , 34.385]),
std=array([12.512, 1.151, 0.793, 54.598])))
3. Create numeric_columns
>>> numeric_columns = [numeric_column]
Checking the contents:
>>> numeric_columns
[
NumericColumn(
key='numeric',
shape=(4,),
default_value=None,
dtype=tf.float32,
normalizer_fn=functools.partial(,
mean=array([29.631, 0.545, 0.38 , 34.385]),
std=array([12.512, 1.151, 0.793, 54.598])))
]
4. Test that numeric_columns works
Create numeric_layer:
>>> numeric_layer = tf.keras.layers.DenseFeatures(numeric_columns)
>>> numeric_layer
Take one batch from packed_train_data and feed it to numeric_layer:
>>> for batch in packed_train_data.take(1):
...     print( numeric_layer(batch).numpy()[0] )
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 913, in __call__
outputs = self.call(cast_inputs, *args, **kwargs)
File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/feature_column/dense_features.py", line 129, in call
features)
ValueError: ('We expected a dictionary here. Instead we got: ',
(OrderedDict([
('sex', ),
('class', ),
('deck', ),
('embark_town', ),
('alone', ),
('numeric', )
]),
)
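The `ValueError` arises because each element of `packed_train_data` is a `(features, labels)` tuple (the `__call__` above returns `features, labels`), while `DenseFeatures` expects only the features dictionary. With the real objects, unpacking first should avoid the error, e.g. `for features, labels in packed_train_data.take(1): print(numeric_layer(features).numpy()[0])`. The sketch below shows the idea in plain Python; `dense_features_stub` is a hypothetical function that, like `DenseFeatures`, rejects anything that is not a dict. It is not TensorFlow code.

```python
from collections import OrderedDict

# Hypothetical stand-in for one element of packed_train_data:
# a (features, labels) tuple whose first item is an OrderedDict of columns.
features_dict = OrderedDict([
    ('sex', ['male', 'female']),
    ('numeric', [[28.0, 0.0, 0.0, 7.75], [33.0, 1.0, 0.0, 53.1]]),
])
batch = (features_dict, 'labels placeholder')  # stands in for the label tensor

def dense_features_stub(features):
    # Hypothetical check mimicking DenseFeatures: only a mapping is accepted.
    if not isinstance(features, dict):
        raise ValueError('We expected a dictionary here. Instead we got: ', features)
    return features['numeric']

try:
    dense_features_stub(batch)            # the whole tuple is rejected
except ValueError:
    print('rejected:', type(batch).__name__)

features, labels = batch                  # unpack the tuple first
print(dense_features_stub(features)[0])   # first row of 'numeric'
```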
Note: if pandas is not installed
>>> import pandas as pd
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'
Open another shell in a terminal and install pandas:
$ source ./venv/bin/activate
(venv) pi@raspi:~ $ pip show pandas
WARNING: Package(s) not found: pandas
(venv) pi@raspi:~ $ pip install pandas
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting pandas
Downloading https://www.piwheels.org/simple/pandas/pandas-1.2.4-cp37-cp37m-linux_armv7l.whl (32.3 MB)
|████████████████████████████████| 32.3 MB 6.5 kB/s
Collecting pytz>=2017.3
Downloading https://www.piwheels.org/simple/pytz/pytz-2021.1-py2.py3-none-any.whl (510 kB)
|████████████████████████████████| 510 kB 292 kB/s
Requirement already satisfied: numpy>=1.16.5 in ./venv/lib/python3.7/site-packages (from pandas) (1.20.2)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/lib/python3/dist-packages (from pandas) (2.7.3)
Installing collected packages: pytz, pandas
Successfully installed pandas-1.2.4 pytz-2021.1
(venv) pi@raspi:~ $ pip show pandas
‥‥
Version: 1.2.4
‥‥
Location: /home/pi/venv/lib/python3.7/site-packages
‥‥
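To check from the venv's Python whether pandas is importable before running the normalization step, without triggering `ModuleNotFoundError`, one option is a small stdlib helper; the name `is_installed` is just illustrative.

```python
import importlib.util

def is_installed(module_name):
    # find_spec returns None when a top-level module cannot be found,
    # so availability is checked without raising ModuleNotFoundError.
    return importlib.util.find_spec(module_name) is not None

print('pandas installed:', is_installed('pandas'))
```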