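What is Featuretools?#

Featuretools

Featuretools is a framework for automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.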

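5 Minute Quick Start#

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.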

[1]:
import featuretools as ft

Load Mock Data#

[2]:
data = ft.demo.load_mock_customer()

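Prepare data#

In this toy dataset, there are 3 DataFrames.

  • customers: unique customers who had sessions

  • sessions: unique sessions and their associated attributes

  • transactions: the events that occurred within each session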

[3]:
customers_df = data["customers"]
customers_df
[3]:
customer_id zip_code join_date birthday
0 1 60091 2011-04-17 10:48:33 1994-07-18
1 2 13244 2012-04-15 23:31:04 1986-08-18
2 3 13244 2011-08-13 15:42:34 2003-11-21
3 4 60091 2011-04-08 20:08:14 2006-08-15
4 5 60091 2010-07-17 05:27:50 1984-07-28
[4]:
sessions_df = data["sessions"]
sessions_df.sample(5)
[4]:
session_id customer_id device session_start
13 14 1 tablet 2014-01-01 03:28:00
6 7 3 tablet 2014-01-01 01:39:40
1 2 5 mobile 2014-01-01 00:17:20
28 29 1 mobile 2014-01-01 07:10:05
24 25 3 desktop 2014-01-01 05:59:40
[5]:
transactions_df = data["transactions"]
transactions_df.sample(5)
[5]:
transaction_id session_id transaction_time product_id amount
74 417 5 2014-01-01 01:20:10 1 139.20
231 229 17 2014-01-01 04:10:15 2 90.79
434 127 31 2014-01-01 07:50:10 3 62.35
420 359 30 2014-01-01 07:35:00 3 72.70
54 249 4 2014-01-01 00:58:30 4 43.59

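First, we specify a dictionary with all the DataFrames in our dataset. Each DataFrame is passed in together with its index column and, if it has one, its time index column.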

[6]:
dataframes = {
    "customers": (customers_df, "customer_id"),
    "sessions": (sessions_df, "session_id", "session_start"),
    "transactions": (transactions_df, "transaction_id", "transaction_time"),
}
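Before passing these tuples to DFS, it can help to confirm that each index column really identifies rows uniquely. The following is a minimal sketch in plain Python, independent of Featuretools; the helper name `index_is_unique` and the toy rows are hypothetical and stand in for the real DataFrames.

```python
def index_is_unique(rows, index_column):
    """Return True if no two rows share the same value in index_column."""
    values = [row[index_column] for row in rows]
    return len(values) == len(set(values))


# Hypothetical toy rows standing in for sessions_df (not the real demo data).
toy_sessions = [
    {"session_id": 1, "customer_id": 2, "device": "desktop"},
    {"session_id": 2, "customer_id": 5, "device": "mobile"},
    {"session_id": 3, "customer_id": 2, "device": "mobile"},
]

# session_id never repeats, so it can serve as the index.
sessions_ok = index_is_unique(toy_sessions, "session_id")

# customer_id repeats (customer 2 has two sessions), so it cannot.
customers_ok = index_is_unique(toy_sessions, "customer_id")

print(sessions_ok, customers_ok)  # True False
```

With a real pandas DataFrame, `sessions_df["session_id"].is_unique` answers the same question.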

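Second, we specify how the DataFrames are related. When two DataFrames have a one-to-many relationship, we call the “one” DataFrame the “parent DataFrame”. A relationship between a parent and a child is defined like this: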

(parent_dataframe, parent_column, child_dataframe, child_column)

In this dataset we have two relationships:

[7]:
relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]
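One way to read a relationship tuple is that every value in the child column should match a row in the parent. The sketch below checks this on toy data in plain Python; the helper `orphaned_keys` and the rows are hypothetical illustrations, not part of the Featuretools API.

```python
def orphaned_keys(parent_rows, parent_column, child_rows, child_column):
    """Return child key values that have no matching row in the parent."""
    parent_keys = {row[parent_column] for row in parent_rows}
    child_keys = {row[child_column] for row in child_rows}
    return sorted(child_keys - parent_keys)


# Hypothetical toy rows (not the real demo data).
toy_sessions = [{"session_id": 1}, {"session_id": 2}]
toy_transactions = [
    {"transaction_id": 10, "session_id": 1},
    {"transaction_id": 11, "session_id": 1},
    {"transaction_id": 12, "session_id": 3},  # no session 3 exists
]

# Arguments follow the order of a relationship tuple:
# (parent_dataframe, parent_column, child_dataframe, child_column)
missing = orphaned_keys(toy_sessions, "session_id", toy_transactions, "session_id")
print(missing)  # [3]
```

An empty result means each child row can be attached to exactly one parent, which is what the one-to-many definition above describes.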

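Note

To manage the setup of DataFrames and relationships, we recommend using the EntitySet class, which offers convenient APIs for managing data like this. See Representing Data with EntitySets for more information.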

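Run Deep Feature Synthesis#

The minimal input to DFS is a dictionary of DataFrames, a list of relationships, and the name of the target DataFrame whose features we want to calculate. The output of DFS is a feature matrix and the corresponding list of feature definitions.

Let's first create a feature matrix for each customer in the data: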

[8]:
feature_matrix_customers, features_defs = ft.dfs(
    dataframes=dataframes,
    relationships=relationships,
    target_dataframe_name="customers",
)
feature_matrix_customers
[8]:
zip_code COUNT(sessions) MODE(sessions.device) NUM_UNIQUE(sessions.device) COUNT(transactions) MAX(transactions.amount) MEAN(transactions.amount) MIN(transactions.amount) MODE(transactions.product_id) NUM_UNIQUE(transactions.product_id) ... STD(sessions.SKEW(transactions.amount)) STD(sessions.SUM(transactions.amount)) SUM(sessions.MAX(transactions.amount)) SUM(sessions.MEAN(transactions.amount)) SUM(sessions.MIN(transactions.amount)) SUM(sessions.NUM_UNIQUE(transactions.product_id)) SUM(sessions.SKEW(transactions.amount)) SUM(sessions.STD(transactions.amount)) MODE(transactions.sessions.device) NUM_UNIQUE(transactions.sessions.device)
customer_id
1 60091 8 mobile 3 126 139.43 71.631905 5.81 4 5 ... 0.589386 279.510713 1057.97 582.193117 78.59 40.0 -0.476122 312.745952 mobile 3
2 13244 7 desktop 3 93 146.81 77.422366 8.73 4 5 ... 0.509798 251.609234 931.63 548.905851 154.60 35.0 -0.277640 258.700528 desktop 3
3 13244 6 desktop 3 93 149.15 67.060430 5.89 1 5 ... 0.429374 219.021420 847.63 405.237462 66.21 29.0 2.286086 257.299895 desktop 3
4 60091 8 mobile 3 109 149.95 80.070459 5.73 2 5 ... 0.387884 235.992478 1157.99 649.657515 131.51 37.0 0.002764 356.125829 mobile 3
5 60091 6 mobile 3 79 149.02 80.375443 7.55 5 5 ... 0.415426 402.775486 839.76 472.231119 86.49 30.0 0.014384 259.873954 mobile 3

5 rows × 75 columns

We now have dozens of new features (75 columns in total) describing each customer's behavior.
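Column names such as `SUM(sessions.MAX(transactions.amount))` describe stacked aggregations: an inner aggregation per session, then an outer aggregation per customer. The following plain-Python sketch mimics that stacking on hypothetical toy data; it illustrates what the names mean and is not how Featuretools computes them internally.

```python
from collections import defaultdict

# Hypothetical toy data (not the real demo data).
session_customer = {1: "A", 2: "A", 3: "B"}  # session_id -> customer_id
transactions = [(1, 10.0), (1, 30.0), (2, 5.0), (3, 7.0), (3, 9.0)]  # (session_id, amount)

# Depth 1: MAX(transactions.amount) for each session.
max_per_session = {}
for session_id, amount in transactions:
    current = max_per_session.get(session_id, amount)
    max_per_session[session_id] = max(current, amount)

# Depth 2: aggregate the per-session values for each customer.
sum_of_max = defaultdict(float)    # SUM(sessions.MAX(transactions.amount))
count_sessions = defaultdict(int)  # COUNT(sessions)
for session_id, customer_id in session_customer.items():
    sum_of_max[customer_id] += max_per_session[session_id]
    count_sessions[customer_id] += 1

print(dict(sum_of_max))      # {'A': 35.0, 'B': 9.0}
print(dict(count_sessions))  # {'A': 2, 'B': 1}
```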

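Change target DataFrame#

One of the reasons DFS is so powerful is that it can create a feature matrix for any DataFrame in our EntitySet. For example, we can build features for sessions: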

[10]:
feature_matrix_sessions, features_defs = ft.dfs(
    dataframes=dataframes, relationships=relationships, target_dataframe_name="sessions"
)
feature_matrix_sessions.head(5)
[10]:
customer_id device COUNT(transactions) MAX(transactions.amount) MEAN(transactions.amount) MIN(transactions.amount) MODE(transactions.product_id) NUM_UNIQUE(transactions.product_id) SKEW(transactions.amount) STD(transactions.amount) ... customers.STD(transactions.amount) customers.SUM(transactions.amount) customers.DAY(birthday) customers.DAY(join_date) customers.MONTH(birthday) customers.MONTH(join_date) customers.WEEKDAY(birthday) customers.WEEKDAY(join_date) customers.YEAR(birthday) customers.YEAR(join_date)
session_id
1 2 desktop 16 141.66 76.813125 20.91 3 5 0.295458 41.600976 ... 37.705178 7200.28 18 15 8 4 0 6 1986 2012
2 5 mobile 10 135.25 74.696000 9.32 5 5 -0.160550 45.893591 ... 44.095630 6349.66 28 17 7 7 5 5 1984 2010
3 4 mobile 15 147.73 88.600000 8.70 1 5 -0.324012 46.240016 ... 45.068765 8727.68 15 8 8 4 1 4 2006 2011
4 1 mobile 25 129.00 64.557200 6.29 5 5 0.234349 40.187205 ... 40.442059 9025.62 18 17 7 4 0 6 1994 2011
5 4 mobile 11 139.20 70.638182 7.43 5 5 0.336381 48.918663 ... 45.068765 8727.68 15 8 8 4 1 4 2006 2011

5 rows × 44 columns
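Columns prefixed with `customers.` carry a value computed on the parent down to every child row, which is why sessions 3 and 5 (both belonging to customer 4) show the same `customers.SUM(transactions.amount)` above. A minimal sketch of that idea in plain Python, using hypothetical toy data rather than the demo dataset:

```python
from collections import defaultdict

# Hypothetical toy data (not the real demo data).
session_customer = {1: "A", 2: "A", 3: "B"}  # session_id -> customer_id
transactions = [(1, 10.0), (1, 30.0), (2, 5.0), (3, 7.0)]  # (session_id, amount)

# Aggregate on the parent: SUM(transactions.amount) for each customer.
customer_total = defaultdict(float)
for session_id, amount in transactions:
    customer_total[session_customer[session_id]] += amount

# Carry the parent value down to each session row:
# customers.SUM(transactions.amount)
sessions_feature = {
    session_id: customer_total[customer_id]
    for session_id, customer_id in session_customer.items()
}
print(sessions_feature)  # {1: 45.0, 2: 45.0, 3: 7.0}
```

Sessions 1 and 2 share the value 45.0 because they belong to the same toy customer.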

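Understanding Feature Output#

In general, Featuretools references generated features through the feature name. To make features easier to understand, Featuretools offers two additional tools, featuretools.graph_feature() and featuretools.describe_feature(), which help explain what a feature is and the steps Featuretools took to generate it. Let's look at this example feature: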

[11]:
feature = features_defs[18]
feature
[11]:
<Feature: MODE(transactions.WEEKDAY(transaction_time))>
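
Feature lineage graphs#

Feature lineage graphs visually walk through feature generation. Starting from the base data, they show step by step the primitives applied and the intermediate features generated to create the final feature.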

[12]:
ft.graph_feature(feature)
[12]:
[Figure: lineage graph for MODE(transactions.WEEKDAY(transaction_time)). Step 1 (Transform): WEEKDAY is applied to transaction_time in transactions, producing WEEKDAY(transaction_time). The result is grouped by session_id. Step 2 (Aggregation): MODE produces the final feature on the sessions (target) DataFrame.]
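
Feature descriptions#

Featuretools can also automatically generate English sentence descriptions of features. Feature descriptions help explain what a feature is, and can be further improved by including manually defined custom definitions. See Generating Feature Descriptions for more details on how to customize automatically generated feature descriptions.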

[13]:
ft.describe_feature(feature)
[13]:
'The most frequently occurring value of the day of the week of the "transaction_time" of all instances of "transactions" for each "session_id" in "sessions".'
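The description above can be reproduced by hand to check one's understanding. The sketch below follows the same two steps (a WEEKDAY transform, then a MODE aggregation per session) in plain Python on hypothetical toy transactions. It assumes weekdays are numbered with Monday = 0, as `datetime.weekday()` does, and that a simple most-common count is an adequate stand-in for MODE.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical toy transactions (not the real demo data):
# (session_id, transaction_time)
transactions = [
    (1, datetime(2014, 1, 1, 0, 0, 0)),   # Wednesday
    (1, datetime(2014, 1, 1, 0, 1, 5)),   # Wednesday
    (1, datetime(2014, 1, 2, 0, 2, 10)),  # Thursday
    (2, datetime(2014, 1, 3, 9, 0, 0)),   # Friday
]

# Step 1 (transform): WEEKDAY(transaction_time), assuming Monday = 0.
weekdays = [(session_id, ts.weekday()) for session_id, ts in transactions]

# Group the transformed values by session_id.
by_session = defaultdict(list)
for session_id, weekday in weekdays:
    by_session[session_id].append(weekday)

# Step 2 (aggregation): MODE, the most frequently occurring value per session.
mode_weekday = {
    session_id: Counter(values).most_common(1)[0][0]
    for session_id, values in by_session.items()
}
print(mode_weekday)  # {1: 2, 2: 4}
```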
