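Deployment#
Deploying a machine learning model requires repeating the feature engineering steps on new data. In some cases, these steps need to be performed in near real time. Featuretools has capabilities that simplify the deployment of feature engineering.
Saving Features#
First, let's build some training and test data in the same format. We use a random seed to generate different data for the test set.
Note
Features saved in one version of Featuretools are not guaranteed to load in another version. This means the features may need to be re-created after upgrading Featuretools.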
[1]:
import featuretools as ft
es_train = ft.demo.load_mock_customer(return_entityset=True)
es_test = ft.demo.load_mock_customer(return_entityset=True, random_seed=33)
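Now let's build some feature definitions using DFS. Because we have categorical features, we also encode them with one-hot encoding based on the values in the training data.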
[2]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es_train, target_dataframe_name="customers"
)
feature_matrix_enc, features_enc = ft.encode_features(feature_matrix, feature_defs)
feature_matrix_enc
[2]:
customer_id | COUNT(sessions) | NUM_UNIQUE(sessions.device) | COUNT(transactions) | MAX(transactions.amount) | MEAN(transactions.amount) | MIN(transactions.amount) | NUM_UNIQUE(transactions.product_id) | SKEW(transactions.amount) | STD(transactions.amount) | SUM(transactions.amount) | ... | MODE(sessions.MODE(transactions.product_id)) is unknown | MODE(sessions.MONTH(session_start)) = 1 | MODE(sessions.MONTH(session_start)) is unknown | MODE(sessions.WEEKDAY(session_start)) = 2 | MODE(sessions.WEEKDAY(session_start)) is unknown | MODE(sessions.YEAR(session_start)) = 2014 | MODE(sessions.YEAR(session_start)) is unknown | MODE(transactions.sessions.device) = mobile | MODE(transactions.sessions.device) = desktop | MODE(transactions.sessions.device) is unknown |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 6 | 3 | 79 | 149.02 | 80.375443 | 7.55 | 5 | -0.025941 | 44.095630 | 6349.66 | ... | False | True | False | True | False | True | False | True | False | False |
4 | 8 | 3 | 109 | 149.95 | 80.070459 | 5.73 | 5 | -0.036348 | 45.068765 | 8727.68 | ... | False | True | False | True | False | True | False | True | False | False |
1 | 8 | 3 | 126 | 139.43 | 71.631905 | 5.81 | 5 | 0.019698 | 40.442059 | 9025.62 | ... | False | True | False | True | False | True | False | True | False | False |
3 | 6 | 3 | 93 | 149.15 | 67.060430 | 5.89 | 5 | 0.418230 | 43.683296 | 6236.62 | ... | False | True | False | True | False | True | False | False | True | False |
2 | 7 | 3 | 93 | 146.81 | 77.422366 | 8.73 | 5 | 0.098259 | 37.705178 | 7200.28 | ... | False | True | False | True | False | True | False | False | True | False |
5 rows × 121 columns
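To make the encoding step more concrete, here is a minimal pandas sketch of one-hot encoding against a category list fixed from the training data. It is not Featuretools' implementation: the helper `one_hot_with_known_values` and the sample device values are hypothetical, and the column names only mimic the pattern seen in the output above.

```python
import pandas as pd


def one_hot_with_known_values(series, known_values):
    """One-hot encode `series` against a fixed list of category values.

    Hypothetical helper for illustration; not part of Featuretools.
    """
    encoded = pd.DataFrame(index=series.index)
    for value in known_values:
        encoded[f"{series.name} = {value}"] = series == value
    # Anything not seen in training lands in a single "unknown" column.
    encoded[f"{series.name} is unknown"] = ~series.isin(known_values)
    return encoded


# Made-up sample data standing in for a categorical feature.
train_device = pd.Series(["mobile", "desktop", "mobile"], name="device")
test_device = pd.Series(["desktop", "tablet"], name="device")

# The category list is learned once, from the training data only.
known = list(train_device.value_counts().index)

encoded_test = one_hot_with_known_values(test_device, known)
print(encoded_test)
```

Fixing the category list at training time keeps the encoded columns identical on new data, even when a previously unseen value such as `tablet` appears.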
Now, we can use featuretools.save_features to save the list of features to a JSON file.
[3]:
ft.save_features(features_enc, "feature_definitions.json")
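Because saved features are not guaranteed to load across Featuretools versions, it can help to record the library version next to the feature file. The sketch below is one possible approach using only the standard library. The sidecar file name `feature_definitions.meta.json` and all three helper functions are hypothetical and not part of Featuretools.

```python
import json
from importlib import metadata


def installed_version(package="featuretools"):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def write_version_sidecar(path, version):
    """Record the library version in use when the features were saved."""
    with open(path, "w") as f:
        json.dump({"featuretools_version": version}, f)


def versions_match(path, current_version):
    """Compare the recorded version with the one running now."""
    with open(path) as f:
        recorded = json.load(f)["featuretools_version"]
    return recorded == current_version


# Hypothetical sidecar file written next to feature_definitions.json.
sidecar = "feature_definitions.meta.json"
write_version_sidecar(sidecar, installed_version())

# At load time, warn if the environment has changed since saving.
if not versions_match(sidecar, installed_version()):
    print("Featuretools version changed; consider re-creating the features.")
```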
Calculating the Feature Matrix for New Data#
We can use featuretools.load_features to read in the saved list of features and compute them for our new EntitySet.
[4]:
saved_features = ft.load_features("feature_definitions.json")
After loading the features back in, we can calculate the feature matrix.
[5]:
feature_matrix = ft.calculate_feature_matrix(saved_features, es_test)
feature_matrix
[5]:
customer_id | zip_code = 60091 | zip_code = 13244 | zip_code is unknown | COUNT(sessions) | MODE(sessions.device) = mobile | MODE(sessions.device) = desktop | MODE(sessions.device) is unknown | NUM_UNIQUE(sessions.device) | COUNT(transactions) | MAX(transactions.amount) | ... | SUM(sessions.MAX(transactions.amount)) | SUM(sessions.MEAN(transactions.amount)) | SUM(sessions.MIN(transactions.amount)) | SUM(sessions.NUM_UNIQUE(transactions.product_id)) | SUM(sessions.SKEW(transactions.amount)) | SUM(sessions.STD(transactions.amount)) | MODE(transactions.sessions.device) = mobile | MODE(transactions.sessions.device) = desktop | MODE(transactions.sessions.device) is unknown | NUM_UNIQUE(transactions.sessions.device) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | True | False | False | 6 | False | True | False | 3 | 73 | 147.64 | ... | 834.08 | 524.919674 | 198.92 | 25.0 | -1.546156 | 217.064024 | True | False | False | 3 |
4 | False | True | False | 9 | False | True | False | 3 | 126 | 147.55 | ... | 1180.90 | 733.862898 | 193.08 | 43.0 | -1.797214 | 319.497611 | False | True | False | 3 |
3 | True | False | False | 5 | True | False | False | 2 | 64 | 148.09 | ... | 715.80 | 407.390549 | 108.69 | 23.0 | 0.353061 | 215.417211 | True | False | False | 2 |
2 | False | True | False | 8 | False | True | False | 3 | 129 | 148.34 | ... | 1100.82 | 615.714934 | 136.01 | 39.0 | -0.082021 | 315.817331 | False | True | False | 3 |
5 | True | False | False | 7 | False | True | False | 3 | 108 | 149.53 | ... | 997.48 | 584.302915 | 137.50 | 33.0 | -0.595128 | 261.535265 | False | True | False | 3 |
5 rows × 121 columns
As shown above, we obtain exactly the same features as before, but computed using the test data.
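A downstream model generally expects its input columns in the same order it saw during training. As a hedged sketch in plain pandas, with made-up two-column frames standing in for the real training and test feature matrices, one way to check and enforce this is:

```python
import pandas as pd

# Stand-ins for the training and test feature matrices; values are made up.
fm_train = pd.DataFrame(
    {"COUNT(sessions)": [6, 8], "device = mobile": [True, False]},
    index=pd.Index([5, 4], name="customer_id"),
)
fm_test = pd.DataFrame(
    {"device = mobile": [False, True], "COUNT(sessions)": [6, 9]},
    index=pd.Index([1, 4], name="customer_id"),
)

# Do both matrices contain the same set of features, regardless of order?
same_features = set(fm_train.columns) == set(fm_test.columns)

# Reorder the test columns to match the training layout.
# This raises a KeyError if a training column is missing from the test matrix.
fm_test_aligned = fm_test[fm_train.columns]
print(same_features, list(fm_test_aligned.columns))
```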
Exporting the Feature Matrix#
Saving as CSV#
The feature matrix is a pandas DataFrame that we can save to disk.
[6]:
feature_matrix.to_csv("feature_matrix.csv")
We can also read it back in as follows:
[7]:
import pandas as pd
saved_fm = pd.read_csv("feature_matrix.csv", index_col="customer_id")
saved_fm
[7]:
customer_id | zip_code = 60091 | zip_code = 13244 | zip_code is unknown | COUNT(sessions) | MODE(sessions.device) = mobile | MODE(sessions.device) = desktop | MODE(sessions.device) is unknown | NUM_UNIQUE(sessions.device) | COUNT(transactions) | MAX(transactions.amount) | ... | SUM(sessions.MAX(transactions.amount)) | SUM(sessions.MEAN(transactions.amount)) | SUM(sessions.MIN(transactions.amount)) | SUM(sessions.NUM_UNIQUE(transactions.product_id)) | SUM(sessions.SKEW(transactions.amount)) | SUM(sessions.STD(transactions.amount)) | MODE(transactions.sessions.device) = mobile | MODE(transactions.sessions.device) = desktop | MODE(transactions.sessions.device) is unknown | NUM_UNIQUE(transactions.sessions.device) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | True | False | False | 6 | False | True | False | 3 | 73 | 147.64 | ... | 834.08 | 524.919674 | 198.92 | 25.0 | -1.546156 | 217.064024 | True | False | False | 3 |
4 | False | True | False | 9 | False | True | False | 3 | 126 | 147.55 | ... | 1180.90 | 733.862898 | 193.08 | 43.0 | -1.797214 | 319.497611 | False | True | False | 3 |
3 | True | False | False | 5 | True | False | False | 2 | 64 | 148.09 | ... | 715.80 | 407.390549 | 108.69 | 23.0 | 0.353061 | 215.417211 | True | False | False | 2 |
2 | False | True | False | 8 | False | True | False | 3 | 129 | 148.34 | ... | 1100.82 | 615.714934 | 136.01 | 39.0 | -0.082021 | 315.817331 | False | True | False | 3 |
5 | True | False | False | 7 | False | True | False | 3 | 108 | 149.53 | ... | 997.48 | 584.302915 | 137.50 | 33.0 | -0.595128 | 261.535265 | False | True | False | 3 |
5 rows × 121 columns
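CSV files do not store column types, so pandas re-infers them when reading. The sketch below checks that a small, made-up stand-in for the feature matrix survives the round trip unchanged. It writes to an in-memory buffer rather than to disk, and it passes `float_precision="round_trip"` as a precaution so floats are parsed back exactly; neither choice comes from the original walkthrough.

```python
import io

import pandas as pd

# Small stand-in for a feature matrix; values are made up.
fm = pd.DataFrame(
    {
        "zip_code = 60091": [True, False],
        "COUNT(sessions)": [6, 9],
        "MEAN(transactions.amount)": [80.375443, 71.631905],
    },
    index=pd.Index([1, 4], name="customer_id"),
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
fm.to_csv(buffer)
buffer.seek(0)

# float_precision="round_trip" asks the parser to reproduce floats exactly.
restored = pd.read_csv(
    buffer, index_col="customer_id", float_precision="round_trip"
)

# Raises AssertionError if values, dtypes, columns, or index differ.
pd.testing.assert_frame_equal(fm, restored)
print(restored.dtypes)
```

If the check fails on a real feature matrix (for example, on columns that mix booleans with missing values), a format that preserves dtypes may be a better fit than CSV.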