Frequently Asked Questions#
Here we try to answer some questions that come up often on GitHub and Stack Overflow.
[1]:
import pandas as pd
import woodwork as ww
import featuretools as ft
EntitySet#
How do I get a list of column names and types in an EntitySet?#
After you create an EntitySet, you may want to view the column names. An EntitySet contains multiple DataFrames, one for each table in the EntitySet.
[2]:
es = ft.demo.load_mock_customer(return_entityset=True)
es
[2]:
Entityset: transactions
DataFrames:
transactions [Rows: 500, Columns: 6]
products [Rows: 5, Columns: 3]
sessions [Rows: 35, Columns: 5]
customers [Rows: 5, Columns: 5]
Relationships:
transactions.product_id -> products.product_id
transactions.session_id -> sessions.session_id
sessions.customer_id -> customers.customer_id
If you want to view the underlying DataFrame, you can do the following:
[3]:
es["transactions"].head()
[3]:
 | transaction_id | session_id | transaction_time | product_id | amount | _ft_last_time |
---|---|---|---|---|---|---|
10 | 10 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2014-01-01 00:00:00 |
2 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2014-01-01 00:01:05 |
438 | 438 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2014-01-01 00:02:10 |
192 | 192 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2014-01-01 00:03:15 |
271 | 271 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2014-01-01 00:04:20 |
If you want to view the columns and types for the "transactions" DataFrame, you can do the following:
[4]:
es["transactions"].ww
[4]:
Column | Physical Type | Logical Type | Semantic Tag(s) |
---|---|---|---|
transaction_id | int64 | Integer | ['index'] |
session_id | int64 | Integer | ['numeric', 'foreign_key'] |
transaction_time | datetime64[ns] | Datetime | ['time_index'] |
product_id | category | Categorical | ['category', 'foreign_key'] |
amount | float64 | Double | ['numeric'] |
_ft_last_time | datetime64[ns] | Datetime | ['last_time_index'] |
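What is the difference between copy_columns and additional_columns?#
The function normalize_dataframe creates a new DataFrame and a relationship from the unique values of an existing DataFrame. It takes two similar arguments:
additional_columns removes the given columns from the base DataFrame and moves them to the new DataFrame.
copy_columns keeps the given columns in the base DataFrame, but also copies them to the new DataFrame.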
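The difference between the two arguments can be sketched in plain Python, without Featuretools. The helper `normalize_rows` and the sample rows below are hypothetical; the helper only mimics the column bookkeeping described above and is not how normalize_dataframe is implemented.

```python
def normalize_rows(rows, index, additional_columns=(), copy_columns=()):
    """Split rows (a list of dicts) into a base table and a new table keyed by index."""
    moved = list(additional_columns)
    copied = list(copy_columns)
    new_table = {}
    for row in rows:
        key = row[index]
        # Keep the first occurrence of each unique index value.
        if key not in new_table:
            new_table[key] = {index: key, **{c: row[c] for c in moved + copied}}
    # Moved columns leave the base table; copied columns stay in it.
    base_table = [{k: v for k, v in row.items() if k not in moved} for row in rows]
    return base_table, list(new_table.values())


# Made-up rows, loosely shaped like the mock customer data.
transactions = [
    {"transaction_id": 10, "session_id": 1, "device": "desktop", "join_date": "2012-04-15"},
    {"transaction_id": 2, "session_id": 1, "device": "desktop", "join_date": "2012-04-15"},
    {"transaction_id": 7, "session_id": 2, "device": "mobile", "join_date": "2010-07-17"},
]

base, sessions = normalize_rows(
    transactions,
    index="session_id",
    additional_columns=["join_date"],
    copy_columns=["device"],
)
print(base[0])   # "device" is still present, "join_date" is gone
print(sessions)  # one row per unique session_id, holding both columns
```

In this sketch the index column (`session_id`) stays in the base table, which is what lets the two tables be related afterwards.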
[5]:
data = ft.demo.load_mock_customer()
transactions_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])
products_df = data["products"]
es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
    dataframe_name="transactions",
    dataframe=transactions_df,
    index="transaction_id",
    time_index="transaction_time",
)
es = es.add_dataframe(
    dataframe_name="products", dataframe=products_df, index="product_id"
)
es = es.add_relationship("products", "product_id", "transactions", "product_id")
Before we normalize to create a new DataFrame, let's look at the base DataFrame:
[6]:
es["transactions"].head()
[6]:
 | transaction_id | session_id | transaction_time | product_id | amount | customer_id | device | session_start | zip_code | join_date | birthday |
---|---|---|---|---|---|---|---|---|---|---|---|
10 | 10 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
2 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
438 | 438 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
192 | 192 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
271 | 271 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
Notice the columns session_id, session_start, join_date, device, customer_id, and zip_code.
[7]:
es = es.normalize_dataframe(
    base_dataframe_name="transactions",
    new_dataframe_name="sessions",
    index="session_id",
    make_time_index="session_start",
    additional_columns=["join_date"],
    copy_columns=["device", "customer_id", "zip_code", "session_start"],
)
Above, we normalized the columns to create a new DataFrame.
For additional_columns, the column ['join_date'] will be removed from the transactions DataFrame and moved to the new sessions DataFrame.
For copy_columns, the columns ['device', 'customer_id', 'zip_code', 'session_start'] will be copied from the transactions DataFrame to the new sessions DataFrame.
Let's see this in the actual EntitySet.
[8]:
es["transactions"].head()
[8]:
 | transaction_id | session_id | transaction_time | product_id | amount | customer_id | device | session_start | zip_code | birthday |
---|---|---|---|---|---|---|---|---|---|---|
10 | 10 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop | 2014-01-01 | 13244 | 1986-08-18 |
2 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2 | desktop | 2014-01-01 | 13244 | 1986-08-18 |
438 | 438 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2 | desktop | 2014-01-01 | 13244 | 1986-08-18 |
192 | 192 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2 | desktop | 2014-01-01 | 13244 | 1986-08-18 |
271 | 271 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2 | desktop | 2014-01-01 | 13244 | 1986-08-18 |
Notice above how ['device', 'customer_id', 'zip_code', 'session_start'] are still in the transactions DataFrame, while ['join_date'] is not. All of these columns are now present in the sessions DataFrame, as seen below.
[9]:
es["sessions"].head()
[9]:
 | session_id | join_date | device | customer_id | zip_code | session_start |
---|---|---|---|---|---|---|
1 | 1 | 2012-04-15 23:31:04 | desktop | 2 | 13244 | 2014-01-01 00:00:00 |
2 | 2 | 2010-07-17 05:27:50 | mobile | 5 | 60091 | 2014-01-01 00:17:20 |
3 | 3 | 2011-04-08 20:08:14 | mobile | 4 | 60091 | 2014-01-01 00:28:10 |
4 | 4 | 2011-04-17 10:48:33 | mobile | 1 | 60091 | 2014-01-01 00:44:25 |
5 | 5 | 2011-04-08 20:08:14 | mobile | 4 | 60091 | 2014-01-01 01:11:30 |
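How do I update a column's description or metadata?#
You can directly update the description or metadata attributes of a column schema. However, you must use the column schema returned by DataFrame.ww.columns['col_name'], not the one returned by DataFrame.ww['col_name'].ww.schema. The column schema obtained from DataFrame.ww.columns['col_name'] is still associated with the EntitySet and propagates any attribute updates, whereas the other one does not. For example, you can update a column's description or metadata as follows: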
column_schema = df.ww.columns['col_name']
column_schema.description = 'my description'
column_schema.metadata.update(key='value')
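How do I combine two or more interesting values?#
You may want to create features that are conditioned on multiple values before they are calculated. This requires the use of interesting_values. However, since we are trying to create features with multiple conditions, we need to modify the DataFrame before we create the EntitySet.
Let's look at how you might accomplish this.
First, let's create our DataFrames.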
[12]:
data = ft.demo.load_mock_customer()
transactions_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])
products_df = data["products"]
[13]:
transactions_df.head()
[13]:
 | transaction_id | session_id | transaction_time | product_id | amount | customer_id | device | session_start | zip_code | join_date | birthday |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
1 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
2 | 438 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
3 | 192 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
4 | 271 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
[14]:
products_df.head()
[14]:
 | product_id | brand |
---|---|---|
0 | 1 | B |
1 | 2 | B |
2 | 3 | B |
3 | 4 | B |
4 | 5 | A |
Now, let's modify the transactions DataFrame to create an additional column that represents the multiple conditions for our feature.
[15]:
transactions_df["product_id_device"] = (
    transactions_df["product_id"].astype(str) + " and " + transactions_df["device"]
)
Here, we created a new column called product_id_device, which simply combines the product_id column and the device column.
Now let's create our EntitySet.
[16]:
es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
    dataframe_name="transactions",
    dataframe=transactions_df,
    index="transaction_id",
    time_index="transaction_time",
    logical_types={
        "product_id": ww.logical_types.Categorical,
        "product_id_device": ww.logical_types.Categorical,
        "zip_code": ww.logical_types.PostalCode,
    },
)
es = es.add_dataframe(
    dataframe_name="products", dataframe=products_df, index="product_id"
)
es = es.normalize_dataframe(
    base_dataframe_name="transactions",
    new_dataframe_name="sessions",
    index="session_id",
    additional_columns=["device", "product_id_device", "customer_id"],
)
es = es.normalize_dataframe(
    base_dataframe_name="sessions", new_dataframe_name="customers", index="customer_id"
)
es
[16]:
Entityset: customer_data
DataFrames:
transactions [Rows: 500, Columns: 9]
products [Rows: 5, Columns: 2]
sessions [Rows: 35, Columns: 5]
customers [Rows: 5, Columns: 2]
Relationships:
transactions.session_id -> sessions.session_id
sessions.customer_id -> customers.customer_id
Now, we are ready to add our interesting values.
First, let's see what the options for interesting values are.
[17]:
interesting_values = transactions_df["product_id_device"].unique().tolist()
interesting_values
[17]:
['5 and desktop',
'2 and desktop',
'3 and desktop',
'4 and desktop',
'1 and desktop',
'4 and mobile',
'5 and mobile',
'1 and mobile',
'3 and mobile',
'2 and mobile',
'4 and tablet',
'3 and tablet',
'2 and tablet',
'1 and tablet',
'5 and tablet']
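If you wish, you can pick a subset of these, and the where features created will only use those conditions. In our example, we will use all the possible interesting values.
Here, we set all of these values as interesting values for this specific DataFrame and column. If we wanted to, we could set interesting values for multiple columns in the same way, but we will only use this one in this example.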
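If only a subset is wanted, one option is to filter the list before building the values dictionary. The sketch below is a minimal illustration that keeps only the tablet combinations; the shortened list and the name `tablet_values` are assumptions made for the example.

```python
# A shortened copy of the combined values shown above (illustrative only).
interesting_values = [
    "5 and desktop",
    "4 and mobile",
    "4 and tablet",
    "3 and tablet",
    "5 and tablet",
]

# Keep only the combinations whose device part is "tablet".
tablet_values = [v for v in interesting_values if v.endswith(" and tablet")]

# Same shape as the dictionary passed to add_interesting_values in this guide.
values = {"product_id_device": tablet_values}
print(values)
```

The resulting dictionary has the same column-to-list shape as the one built from the full list, so it could be passed to add_interesting_values in the same way.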
[18]:
values = {"product_id_device": interesting_values}
es.add_interesting_values(dataframe_name="sessions", values=values)
Now we can run DFS.
[19]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["count"],
    where_primitives=["count"],
    trans_primitives=[],
)
feature_matrix.head()
[19]:
COUNT(sessions) | COUNT(transactions) | COUNT(sessions WHERE product_id_device = 1 and mobile) | COUNT(sessions WHERE product_id_device = 2 and mobile) | COUNT(sessions WHERE product_id_device = 4 and desktop) | COUNT(sessions WHERE product_id_device = 5 and mobile) | COUNT(sessions WHERE product_id_device = 1 and tablet) | COUNT(sessions WHERE product_id_device = 5 and desktop) | COUNT(sessions WHERE product_id_device = 2 and desktop) | COUNT(sessions WHERE product_id_device = 2 and tablet) | ... | COUNT(transactions WHERE sessions.product_id_device = 1 and desktop) | COUNT(transactions WHERE sessions.product_id_device = 3 and mobile) | COUNT(transactions WHERE sessions.product_id_device = 1 and mobile) | COUNT(transactions WHERE sessions.product_id_device = 4 and desktop) | COUNT(transactions WHERE sessions.product_id_device = 5 and mobile) | COUNT(transactions WHERE sessions.product_id_device = 5 and tablet) | COUNT(transactions WHERE sessions.product_id_device = 4 and mobile) | COUNT(transactions WHERE sessions.product_id_device = 4 and tablet) | COUNT(transactions WHERE sessions.product_id_device = 2 and tablet) | COUNT(transactions WHERE sessions.product_id_device = 3 and desktop) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
customer_id | |||||||||||||||||||||
2 | 7 | 93 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | ... | 8 | 0 | 0 | 10 | 0 | 13 | 18 | 0 | 0 | 0 |
5 | 6 | 79 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 8 | 18 | 14 | 0 | 0 | 10 | 14 | 0 | 0 |
4 | 8 | 109 | 1 | 2 | 1 | 0 | 0 | 1 | 1 | 0 | ... | 0 | 15 | 15 | 18 | 0 | 18 | 0 | 0 | 0 | 0 |
1 | 8 | 126 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 56 | 27 | 0 | 0 |
3 | 6 | 93 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | ... | 33 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 0 |
5 rows × 32 columns
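To better understand the where clause features, let's examine one of them. The feature COUNT(sessions WHERE product_id_device = 5 and tablet) tells us the number of sessions in which the customer purchased product_id 5 while on a tablet. Notice how the feature depends on multiple conditions (product_id = 5 & device = tablet).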
[20]:
feature_matrix[["COUNT(sessions WHERE product_id_device = 5 and tablet)"]]
[20]:
customer_id | COUNT(sessions WHERE product_id_device = 5 and tablet) |
---|---|
2 | 1 |
5 | 0 |
4 | 1 |
1 | 0 |
3 | 0 |
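What such a conditional count computes can be sketched in plain Python. The rows and the helper `count_where` below are made up for illustration and do not reflect how Featuretools calculates the feature internally.

```python
def count_where(rows, group_key, column, value):
    """Count rows per group whose `column` equals `value`; groups with no match get 0."""
    counts = {row[group_key]: 0 for row in rows}
    for row in rows:
        if row[column] == value:
            counts[row[group_key]] += 1
    return counts


# Made-up session rows with the combined product_id_device column.
sessions = [
    {"session_id": 1, "customer_id": 2, "product_id_device": "5 and tablet"},
    {"session_id": 2, "customer_id": 2, "product_id_device": "1 and desktop"},
    {"session_id": 3, "customer_id": 5, "product_id_device": "4 and mobile"},
    {"session_id": 4, "customer_id": 4, "product_id_device": "5 and tablet"},
]

result = count_where(sessions, "customer_id", "product_id_device", "5 and tablet")
print(result)  # {2: 1, 5: 0, 4: 1}
```

Because the two conditions were merged into one string column beforehand, a single equality test is enough to express "product 5 and tablet".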
DFS#
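Why is DFS not creating aggregation features?#
You may have created your EntitySet and then applied DFS to create features. However, you may be puzzled as to why no aggregation features were created.
This is most likely because you have a single DataFrame in your EntitySet, and DFS cannot create aggregation features with fewer than two DataFrames. Featuretools looks for a relationship and aggregates based on that relationship.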
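A quick pre-check along these lines could look like the sketch below. It assumes the EntitySet exposes `dataframes` and `relationships` attributes; the helper name `can_build_aggregations` is hypothetical, and simple stand-in objects are used so the snippet runs without Featuretools installed. Passing the check is a necessary condition only, not a guarantee that aggregation features will be produced for a given target DataFrame.

```python
from types import SimpleNamespace


def can_build_aggregations(es):
    """True if the EntitySet-like object has at least two DataFrames and one relationship."""
    return len(es.dataframes) >= 2 and len(es.relationships) >= 1


# Stand-ins for EntitySets (assumption: real ones expose the same two attributes).
single_table = SimpleNamespace(dataframes=["transactions"], relationships=[])
unrelated_tables = SimpleNamespace(dataframes=["transactions", "products"], relationships=[])
related_tables = SimpleNamespace(
    dataframes=["transactions", "sessions"],
    relationships=["transactions.session_id -> sessions.session_id"],
)

print(can_build_aggregations(single_table))      # False
print(can_build_aggregations(unrelated_tables))  # False
print(can_build_aggregations(related_tables))    # True
```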
Let's look at a simple example.
[21]:
data = ft.demo.load_mock_customer()
transactions_df = data["transactions"].merge(data["sessions"]).merge(data["customers"])
es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
    dataframe_name="transactions", dataframe=transactions_df, index="transaction_id"
)
es
[21]:
Entityset: customer_data
DataFrames:
transactions [Rows: 500, Columns: 11]
Relationships:
No relationships
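Notice how we only have one DataFrame in our EntitySet. If we try to create aggregation features on this EntitySet, it will not be possible, because DFS needs two DataFrames to generate aggregation features.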
[22]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es, target_dataframe_name="transactions"
)
feature_defs
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/synthesis/deep_feature_synthesis.py:169: UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
warnings.warn(
[22]:
[<Feature: session_id>,
<Feature: product_id>,
<Feature: amount>,
<Feature: customer_id>,
<Feature: device>,
<Feature: zip_code>,
<Feature: DAY(birthday)>,
<Feature: DAY(join_date)>,
<Feature: DAY(session_start)>,
<Feature: DAY(transaction_time)>,
<Feature: MONTH(birthday)>,
<Feature: MONTH(join_date)>,
<Feature: MONTH(session_start)>,
<Feature: MONTH(transaction_time)>,
<Feature: WEEKDAY(birthday)>,
<Feature: WEEKDAY(join_date)>,
<Feature: WEEKDAY(session_start)>,
<Feature: WEEKDAY(transaction_time)>,
<Feature: YEAR(birthday)>,
<Feature: YEAR(join_date)>,
<Feature: YEAR(session_start)>,
<Feature: YEAR(transaction_time)>]
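None of the features above are aggregation features. To fix this issue, you can add another DataFrame to your EntitySet.
Solution #1 - You can add a new DataFrame if you have additional data.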
[23]:
products_df = data["products"]
es = es.add_dataframe(
    dataframe_name="products", dataframe=products_df, index="product_id"
)
es
[23]:
Entityset: customer_data
DataFrames:
transactions [Rows: 500, Columns: 11]
products [Rows: 5, Columns: 2]
Relationships:
No relationships
Notice how we now have an additional DataFrame in our EntitySet, called products.
Solution #2 - You can normalize an existing DataFrame.
[24]:
es = es.normalize_dataframe(
    base_dataframe_name="transactions",
    new_dataframe_name="sessions",
    index="session_id",
    make_time_index="session_start",
    additional_columns=["device", "customer_id", "zip_code", "join_date"],
    copy_columns=["session_start"],
)
es
[24]:
Entityset: customer_data
DataFrames:
transactions [Rows: 500, Columns: 7]
products [Rows: 5, Columns: 2]
sessions [Rows: 35, Columns: 6]
Relationships:
transactions.session_id -> sessions.session_id
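Notice how we now have an additional DataFrame in our EntitySet, called sessions. Here, the normalization created a relationship between transactions and sessions. However, if we had only used Solution #1, we could have specified a relationship between transactions and products instead.
Now, we can generate aggregation features.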
[25]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es, target_dataframe_name="transactions"
)
feature_defs[:-10]
[25]:
[<Feature: session_id>,
<Feature: product_id>,
<Feature: amount>,
<Feature: DAY(birthday)>,
<Feature: DAY(session_start)>,
<Feature: DAY(transaction_time)>,
<Feature: MONTH(birthday)>,
<Feature: MONTH(session_start)>,
<Feature: MONTH(transaction_time)>,
<Feature: WEEKDAY(birthday)>,
<Feature: WEEKDAY(session_start)>,
<Feature: WEEKDAY(transaction_time)>,
<Feature: YEAR(birthday)>,
<Feature: YEAR(session_start)>,
<Feature: YEAR(transaction_time)>,
<Feature: sessions.device>,
<Feature: sessions.customer_id>,
<Feature: sessions.zip_code>,
<Feature: sessions.COUNT(transactions)>,
<Feature: sessions.MAX(transactions.amount)>,
<Feature: sessions.MEAN(transactions.amount)>,
<Feature: sessions.MIN(transactions.amount)>,
<Feature: sessions.MODE(transactions.product_id)>,
<Feature: sessions.NUM_UNIQUE(transactions.product_id)>,
<Feature: sessions.SKEW(transactions.amount)>]
A few of the aggregation features are:
<Feature: sessions.MAX(transactions.amount)>
<Feature: sessions.SKEW(transactions.amount)>
<Feature: sessions.MIN(transactions.amount)>
<Feature: sessions.MEAN(transactions.amount)>
<Feature: sessions.COUNT(transactions)>
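How do I speed up the runtime of DFS?#
One issue you may encounter while running ft.dfs is slow performance. While Featuretools generally has optimal default settings for calculating features, you may want to speed things up when you are calculating a large number of features.
One quick way to speed up performance is to adjust the n_jobs setting of ft.dfs or ft.calculate_feature_matrix.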
# setting n_jobs to -1 will use all cores
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    n_jobs=-1,
)
# calculate_feature_matrix returns only the feature matrix
feature_matrix = ft.calculate_feature_matrix(
    entityset=es,
    features=feature_defs,
    n_jobs=-1,
)
For more ways to speed up performance, please visit:
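How do I include only certain features when running DFS?#
When using DFS to generate features, you may wish to include only certain features. There are multiple ways to do this:
Use ignore_columns to specify columns in a DataFrame that should not be used to create features. It is a dictionary mapping DataFrame names to a list of column names to ignore.
Use drop_contains to drop features whose name contains any of the strings listed in this parameter.
Use drop_exact to drop features whose name exactly matches any of the strings listed in this parameter.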
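The name-based filtering described for drop_contains and drop_exact can be sketched in plain Python. The helper `filter_feature_names` and the sample names are hypothetical; the sketch mirrors the semantics described above rather than Featuretools' own implementation.

```python
def filter_feature_names(names, drop_contains=(), drop_exact=()):
    """Keep names that neither equal a drop_exact entry nor contain a drop_contains entry."""
    kept = []
    for name in names:
        if name in drop_exact:
            continue  # exact match: dropped
        if any(fragment in name for fragment in drop_contains):
            continue  # substring match: dropped
        kept.append(name)
    return kept


# Made-up feature names in the style of DFS output.
names = [
    "COUNT(sessions)",
    "MEAN(transactions.amount)",
    "STD(transactions.amount)",
    "SUM(transactions.amount)",
]

kept = filter_feature_names(
    names,
    drop_contains=["SUM("],
    drop_exact=["STD(transactions.amount)"],
)
print(kept)  # ['COUNT(sessions)', 'MEAN(transactions.amount)']
```

Note that drop_exact only removes a feature when the whole name matches, so a misspelled entry silently drops nothing.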
Here is an example using all three parameters:
[26]:
es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    ignore_columns={
        "transactions": ["amount"],
        "customers": ["age", "gender", "birthday"],
    },  # ignore these columns
    drop_contains=["customers.SUM("],  # drop features that contain these strings
    drop_exact=["STD(transactions.quantity)"],  # drop features that exactly match
)
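How do I specify primitives on a per column or per DataFrame basis?#
When using DFS to generate features, you may wish to use only certain columns or DataFrames for specific primitives. This can be done through the primitive_options argument. The primitive_options argument is a dictionary that maps a primitive, or a tuple of primitives, to a dictionary containing options for the primitive(s). A primitive or tuple of primitives can also be mapped to a list of option dictionaries if the primitive(s) take multiple inputs. The primitive keys can be the string names of the primitives, the primitive class, or specific instances of the primitive. Each dictionary supplies options for its respective input column. You can control how primitives are applied through these options:
Use ignore_dataframes to specify DataFrames that should not be used to create features for that primitive. It is a list of DataFrame names to ignore.
Use include_dataframes to specify the only DataFrames that should be used to create features for that primitive. It is a list of DataFrame names to include.
Use ignore_columns to specify columns in a DataFrame that should not be used to create features for that primitive. It is a dictionary mapping DataFrame names to a list of column names to ignore.
Use include_columns to specify the only columns in a DataFrame that should be used to create features for that primitive. It is a dictionary mapping DataFrame names to a list of column names to include.
You can also use primitive_options to specify which DataFrames or columns you wish to use as groupings for groupby transform primitives:
Use ignore_groupby_dataframes to specify DataFrames that should not be used to get groupings for that primitive. It is a list of DataFrame names to ignore.
Use include_groupby_dataframes to specify the only DataFrames that should be used to get groupings for that primitive. It is a list of DataFrame names to include.
Use ignore_groupby_columns to specify columns in a DataFrame that should not be used as groupings for that primitive. It is a dictionary mapping DataFrame names to a list of column names to ignore.
Use include_groupby_columns to specify the only columns in a DataFrame that should be used as groupings for that primitive. It is a dictionary mapping DataFrame names to a list of column names to include.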
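Because the option names are plain strings, a typo in one of them is easy to miss. The sketch below builds a primitive_options dictionary that also uses a groupby option, and checks its keys against the eight option names listed above. The helper `unknown_option_keys` is hypothetical and not part of Featuretools, and the choice of the "cum_sum" primitive grouped by "session_id" is an assumption made for illustration.

```python
# The eight option names described above.
KNOWN_OPTIONS = {
    "ignore_dataframes",
    "include_dataframes",
    "ignore_columns",
    "include_columns",
    "ignore_groupby_dataframes",
    "include_groupby_dataframes",
    "ignore_groupby_columns",
    "include_groupby_columns",
}


def unknown_option_keys(primitive_options):
    """Return option names that are not among the documented ones."""
    unknown = set()
    for options in primitive_options.values():
        # A primitive with several inputs may map to a list of option dictionaries.
        option_dicts = options if isinstance(options, list) else [options]
        for option_dict in option_dicts:
            unknown.update(set(option_dict) - KNOWN_OPTIONS)
    return unknown


primitive_options = {
    # Assumed example: only group the cumulative sum by session_id.
    "cum_sum": {"include_groupby_columns": {"transactions": ["session_id"]}},
    ("count", "mean"): {"include_dataframes": ["sessions", "transactions"]},
}

print(unknown_option_keys(primitive_options))  # set()
print(unknown_option_keys({"mode": {"ignore_dataframe": ["sessions"]}}))  # {'ignore_dataframe'}
```

A dictionary that passes this check would then be handed to ft.dfs through the primitive_options argument.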
Here is an example using some of these options:
[27]:
es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    primitive_options={
        "mode": {
            "ignore_dataframes": ["sessions"],
            "ignore_columns": {"products": ["brand"], "transactions": ["product_id"]},
        },
        # For mode, ignore the "sessions" DataFrame, the "brand" column in the
        # "products" DataFrame, and the "product_id" column in the "transactions" DataFrame
        ("count", "mean"): {"include_dataframes": ["sessions", "transactions"]},
        # For count and mean, only include the DataFrames "sessions" and "transactions"
    },
)
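Note that if options are given both for a specific instance of a primitive and for the primitive in general (by string name or by class), the instance that has its own options will not use the generic options. For example, in this case:
from featuretools.primitives import Mean

special_mean = Mean()
options = {
    special_mean: {'include_dataframes': ['customers']},
    'mean': {'include_dataframes': ['sessions']},
}
The primitive special_mean will not use the DataFrame sessions, because its own options include only customers. Every other instance of the Mean primitive will use the 'mean' options.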
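The precedence rule described above can be pictured with a small lookup sketch. This is not how Featuretools resolves options internally; the Mean class below is a stand-in and resolve_options is a hypothetical helper, written only to mirror the behavior described in the text.

```python
class Mean:
    """Stand-in for a primitive class; not the Featuretools implementation."""

    name = "mean"


def resolve_options(primitive, options):
    """Return the options that apply to one primitive instance."""
    # An instance that is itself a key uses only its own options.
    if primitive in options:
        return options[primitive]
    # Otherwise fall back to the options keyed by class or by string name.
    if type(primitive) in options:
        return options[type(primitive)]
    return options.get(primitive.name, {})


special_mean = Mean()
other_mean = Mean()
options = {
    special_mean: {"include_dataframes": ["customers"]},
    "mean": {"include_dataframes": ["sessions"]},
}

print(resolve_options(special_mean, options))  # {'include_dataframes': ['customers']}
print(resolve_options(other_mean, options))    # {'include_dataframes': ['sessions']}
```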
For more examples of specifying options for DFS, please visit:
If I didn't specify the cutoff_time, what date will be used for the feature calculations?#
The cutoff time will be set to the current time, using cutoff_time = datetime.now().
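The default described above amounts to a simple fallback, sketched here in plain Python. The function effective_cutoff_time is a hypothetical helper used only for illustration, not a Featuretools API.

```python
from datetime import datetime


def effective_cutoff_time(cutoff_time=None):
    """Hypothetical helper mirroring the default described above."""
    # With no cutoff time given, fall back to the current time.
    return datetime.now() if cutoff_time is None else cutoff_time


explicit = datetime(2014, 1, 1, 4, 0)
print(effective_cutoff_time(explicit))  # 2014-01-01 04:00:00
print(effective_cutoff_time())          # the time at which this line runs
```

In practice this means that, without an explicit cutoff time, any data timestamped before the moment of the call can be used in the calculation.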
How do I select a certain amount of past data when calculating features?#
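You may encounter a situation where you want to make predictions using only a certain amount of historical data. You can do this with the training_window parameter of ft.dfs. When you use training_window, Featuretools will use only the historical data between cutoff_time - training_window and cutoff_time.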
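To make the window arithmetic concrete, here is a minimal sketch using only the standard library. The helper in_training_window is hypothetical, and treating the window as open at the start and closed at the cutoff is an assumption; the exact boundary handling in Featuretools may differ depending on its settings.

```python
from datetime import datetime, timedelta


def in_training_window(timestamp, cutoff_time, training_window):
    """Return True if timestamp falls inside the training window.

    Assumption: the window is treated as (cutoff - window, cutoff]; the
    boundary handling in Featuretools may differ depending on its settings.
    """
    window_start = cutoff_time - training_window
    return window_start < timestamp <= cutoff_time


cutoff_time = datetime(2014, 1, 1, 4, 0)
training_window = timedelta(hours=1)

events = [
    datetime(2014, 1, 1, 2, 30),  # before the window
    datetime(2014, 1, 1, 3, 15),  # inside the window
    datetime(2014, 1, 1, 4, 0),   # exactly at the cutoff time
    datetime(2014, 1, 1, 4, 30),  # after the cutoff time
]

used = [t for t in events if in_training_window(t, cutoff_time, training_window)]
print(used)
```

With a cutoff of 04:00 and a one-hour window, only the 03:15 and 04:00 events are kept under these assumptions.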
To make the calculation, Featuretools checks the times in the time_index column of the target_dataframe.
[28]:
es = ft.demo.load_mock_customer(return_entityset=True)
es["customers"].ww.time_index
[28]:
'join_date'
Our target_dataframe has a time_index, which is required for the training_window calculation. Here, we create a cutoff time DataFrame so that each customer can have a unique training window.
[29]:
cutoff_times = pd.DataFrame()
cutoff_times["customer_id"] = [1, 2, 3, 1]
cutoff_times["time"] = pd.to_datetime(
    ["2014-1-1 04:00", "2014-1-1 05:00", "2014-1-1 06:00", "2014-1-1 08:00"]
)
cutoff_times["label"] = [True, True, False, True]
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
    training_window="1 hour",
)
feature_matrix.head()
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
[29]:
customer_id | time | zip_code | COUNT(sessions) | MODE(sessions.device) | NUM_UNIQUE(sessions.device) | COUNT(transactions) | MAX(transactions.amount) | MEAN(transactions.amount) | MIN(transactions.amount) | MODE(transactions.product_id) | NUM_UNIQUE(transactions.product_id) | ... | STD(sessions.SUM(transactions.amount)) | SUM(sessions.MAX(transactions.amount)) | SUM(sessions.MEAN(transactions.amount)) | SUM(sessions.MIN(transactions.amount)) | SUM(sessions.NUM_UNIQUE(transactions.product_id)) | SUM(sessions.SKEW(transactions.amount)) | SUM(sessions.STD(transactions.amount)) | MODE(transactions.sessions.device) | NUM_UNIQUE(transactions.sessions.device) | label |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2014-01-01 04:00:00 | 60091 | 1 | tablet | 1 | 12 | 139.09 | 85.469167 | 6.78 | 4 | 5 | ... | NaN | 139.09 | 85.469167 | 6.78 | 5.0 | -0.830975 | 39.825249 | tablet | 1 | True |
2 | 2014-01-01 05:00:00 | 13244 | 1 | tablet | 1 | 13 | 118.85 | 77.304615 | 21.82 | 1 | 5 | ... | NaN | 118.85 | 77.304615 | 21.82 | 5.0 | -0.314918 | 33.725036 | tablet | 1 | True |
3 | 2014-01-01 06:00:00 | 13244 | 2 | desktop | 1 | 12 | 128.26 | 81.747500 | 20.06 | 3 | 5 | ... | 563.882303 | 220.02 | 172.597273 | 111.82 | 6.0 | -0.289466 | 35.704680 | desktop | 1 | False |
1 | 2014-01-01 08:00:00 | 60091 | 1 | mobile | 1 | 16 | 126.11 | 88.755625 | 11.62 | 4 | 5 | ... | NaN | 126.11 | 88.755625 | 11.62 | 5.0 | -1.038434 | 32.324534 | mobile | 1 | True |
4 rows × 76 columns
Above, we ran DFS with the training_window argument set to 1 hour, so the features use only customer data collected during the hour leading up to each cutoff time we provided.
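To make the effect of a training window concrete, here is a minimal sketch in plain Python. It is not the Featuretools implementation: the helper rows_in_training_window and the sample transactions are hypothetical, and treating both ends of the window as inclusive is an assumption made for illustration.

```python
from datetime import datetime, timedelta


def rows_in_training_window(rows, cutoff_time, training_window):
    """Keep rows timestamped within training_window before cutoff_time.

    Assumption: both boundaries are inclusive; the library may differ.
    """
    window_start = cutoff_time - training_window
    return [row for row in rows if window_start <= row["time"] <= cutoff_time]


# Hypothetical transactions for one customer.
transactions = [
    {"time": datetime(2014, 1, 1, 2, 30), "amount": 20.0},  # older than the window
    {"time": datetime(2014, 1, 1, 3, 15), "amount": 35.5},
    {"time": datetime(2014, 1, 1, 3, 50), "amount": 12.0},
    {"time": datetime(2014, 1, 1, 4, 10), "amount": 99.0},  # after the cutoff
]

cutoff = datetime(2014, 1, 1, 4, 0)
recent = rows_in_training_window(transactions, cutoff, timedelta(hours=1))
recent_amounts = [row["amount"] for row in recent]
print(recent_amounts)
```

Only the two transactions between 03:00 and 04:00 survive the filter, so any aggregation computed for this cutoff would see just those rows.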
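Can I run DFS on a single table?#
While it is possible to run DFS on a single table, doing so does not take full advantage of what DFS can do. For example, DFS cannot use any aggregation primitives, because these require at least two tables; you can only use transform primitives. This limits the complexity of the features that DFS can generate through feature stacking. In addition, running single-table DFS on data with time columns can, in some cases, cause label leakage. When data is split across multiple tables, Featuretools can filter it by cutoff time instead of assuming it has already been flattened appropriately, but it cannot do this for a single table.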
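The difference between the two kinds of primitive can be sketched in plain Python. This is only an illustration with hypothetical data and helper names, not Featuretools code: an aggregation collapses many child rows into one value per parent row, which is why it needs a second table to group by, while a transform returns one value per input row.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child rows: each transaction points at its parent session.
transactions = [
    {"transaction_id": 1, "session_id": 1, "amount": 10.0},
    {"transaction_id": 2, "session_id": 1, "amount": 30.0},
    {"transaction_id": 3, "session_id": 2, "amount": 5.0},
]


def mean_amount_per_session(rows):
    """Aggregation: group child rows by parent key, one value per parent."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["session_id"]].append(row["amount"])
    return {session_id: mean(amounts) for session_id, amounts in grouped.items()}


def amount_is_large(rows, threshold=20.0):
    """Transform: one output value per input row, no grouping needed."""
    return [row["amount"] > threshold for row in rows]


per_session = mean_amount_per_session(transactions)
per_row = amount_is_large(transactions)
print(per_session, per_row)
```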
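If you only have a single table of data, DFS can still be useful. There are two main ways to pass a single table to DFS.
The first is simply to create an EntitySet that contains that one table.
For example: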
[30]:
transactions_df = ft.demo.load_mock_customer(return_single_table=True)
es = ft.EntitySet(id="customer_data")
es = es.add_dataframe(
    dataframe_name="transactions",
    dataframe=transactions_df,
    index="transaction_id",
    time_index="transaction_time",
)
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="transactions",
    trans_primitives=[
        "time_since",
        "day",
        "is_weekend",
        "cum_min",
        "minute",
        "weekday",
        "percentile",
        "year",
        "week",
        "cum_mean",
    ],
)
UserWarning: Only one dataframe in entityset, changing max_depth to 1 since deeper features cannot be created
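The second way is to put the dataframe in a dictionary that maps its name to a tuple of information about that dataframe. We then pass this dictionary to the dataframes argument of DFS.
In this scenario, each dictionary value is a tuple containing the dataframe, its index column, and its time index. More information about the possible arguments can be found in the DFS documentation.
For example: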
[31]:
transactions_df = ft.demo.load_mock_customer(return_single_table=True)
dataframes = {"transactions": (transactions_df, "transaction_id", "transaction_time")}
feature_matrix, feature_defs = ft.dfs(
    dataframes=dataframes,
    target_dataframe_name="transactions",
    trans_primitives=[
        "time_since",
        "day",
        "is_weekend",
        "cum_min",
        "minute",
        "weekday",
        "percentile",
        "year",
        "week",
        "cum_mean",
    ],
)
Before we examine the output, let's look at our original single table.
[32]:
transactions_df.head()
[32]:
 | transaction_id | session_id | transaction_time | product_id | amount | customer_id | device | session_start | zip_code | join_date | birthday | brand |
---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 10 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 | A |
2 | 2 | 1 | 2014-01-01 00:01:05 | 2 | 109.48 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 | B |
438 | 438 | 1 | 2014-01-01 00:02:10 | 3 | 95.06 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 | B |
192 | 192 | 1 | 2014-01-01 00:03:15 | 4 | 78.92 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 | B |
271 | 271 | 1 | 2014-01-01 00:04:20 | 3 | 31.54 | 2 | desktop | 2014-01-01 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 | B |
Now we can look at the transformations that Featuretools was able to apply to this single DataFrame to create the feature matrix.
[33]:
feature_matrix.head()
[33]:
transaction_id | session_id | product_id | amount | customer_id | device | zip_code | brand | CUM_MEAN(amount) | CUM_MEAN(customer_id) | CUM_MEAN(session_id) | ... | WEEK(session_start) | WEEK(transaction_time) | WEEKDAY(birthday) | WEEKDAY(join_date) | WEEKDAY(session_start) | WEEKDAY(transaction_time) | YEAR(birthday) | YEAR(join_date) | YEAR(session_start) | YEAR(transaction_time) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 1 | 5 | 127.64 | 2 | desktop | 13244 | A | 127.640000 | 2.0 | 1.0 | ... | 1 | 1 | 0 | 6 | 2 | 2 | 1986 | 2012 | 2014 | 2014 |
2 | 1 | 2 | 109.48 | 2 | desktop | 13244 | B | 118.560000 | 2.0 | 1.0 | ... | 1 | 1 | 0 | 6 | 2 | 2 | 1986 | 2012 | 2014 | 2014 |
438 | 1 | 3 | 95.06 | 2 | desktop | 13244 | B | 110.726667 | 2.0 | 1.0 | ... | 1 | 1 | 0 | 6 | 2 | 2 | 1986 | 2012 | 2014 | 2014 |
192 | 1 | 4 | 78.92 | 2 | desktop | 13244 | B | 102.775000 | 2.0 | 1.0 | ... | 1 | 1 | 0 | 6 | 2 | 2 | 1986 | 2012 | 2014 | 2014 |
271 | 1 | 3 | 31.54 | 2 | desktop | 13244 | B | 88.528000 | 2.0 | 1.0 | ... | 1 | 1 | 0 | 6 | 2 | 2 | 1986 | 2012 | 2014 | 2014 |
5 rows × 44 columns
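As a rough illustration of what a cumulative transform such as cum_mean computes, the sketch below derives a running mean in plain Python. The helper cumulative_mean is hypothetical and is not the primitive's actual implementation; the input is the amount column of the first five transactions.

```python
from itertools import accumulate


def cumulative_mean(values):
    """Return the running mean of values, one output per input row."""
    running_totals = accumulate(values)
    return [total / count for count, total in enumerate(running_totals, start=1)]


# The amount values of the first five rows of the single table.
amounts = [127.64, 109.48, 95.06, 78.92, 31.54]
cum_means = [round(value, 6) for value in cumulative_mean(amounts)]
print(cum_means)
```

Rounded to six decimals, the results agree with the first five entries of the CUM_MEAN(amount) column in the feature matrix.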
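How do I prevent label leakage with DFS?#
One concern you might have when using DFS is label leakage. You want to make sure that the labels in your data are not improperly used to create features and the feature matrix.
Featuretools pays particular attention to helping users avoid label leakage.
There are two ways to prevent label leakage, depending on whether your data has timestamps.
1. Data without timestamps#
Without timestamps, you can create an EntitySet using only the training data and then run ft.dfs. This creates a feature matrix from the training data alone and also returns a list of feature definitions. Next, you can create an EntitySet from the test data and recompute the same features by calling ft.calculate_feature_matrix with the list of feature definitions obtained earlier.
Here is what that flow looks like:
First, we create the training data.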
[34]:
train_data = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "age": [40, 50, 10, 20, 30],
        "gender": ["m", "f", "m", "f", "f"],
        "signup_date": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
        "labels": [True, False, True, False, True],
    }
)
train_data.head()
[34]:
 | customer_id | age | gender | signup_date | labels |
---|---|---|---|---|---|
0 | 1 | 40 | m | 2014-01-01 01:41:50 | True |
1 | 2 | 50 | f | 2014-01-01 02:06:50 | False |
2 | 3 | 10 | m | 2014-01-01 02:31:50 | True |
3 | 4 | 20 | f | 2014-01-01 02:56:50 | False |
4 | 5 | 30 | f | 2014-01-01 03:21:50 | True |
Now we can create an EntitySet for the training data.
[35]:
es_train_data = ft.EntitySet(id="customer_train_data")
es_train_data = es_train_data.add_dataframe(
    dataframe_name="customers", dataframe=train_data, index="customer_id"
)
es_train_data
[35]:
Entityset: customer_train_data
DataFrames:
customers [Rows: 5, Columns: 5]
Relationships:
No relationships
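Next, we are ready to create the features and the feature matrix for the training data. We don't want Featuretools to use the label column to build new features, so we exclude it with the ignore_columns option. Doing so would also drop the label column from the feature matrix, so we tell DFS to include it as a seed feature.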
[36]:
labels_feature = ft.Feature(es_train_data["customers"].ww["labels"])
feature_matrix_train, feature_defs = ft.dfs(
    entityset=es_train_data,
    target_dataframe_name="customers",
    ignore_columns={"customers": ["labels"]},
    seed_features=[labels_feature],
)
feature_matrix_train
[36]:
customer_id | age | labels | DAY(signup_date) | MONTH(signup_date) | WEEKDAY(signup_date) | YEAR(signup_date) |
---|---|---|---|---|---|---|
1 | 40 | True | 1 | 1 | 2 | 2014 |
2 | 50 | False | 1 | 1 | 2 | 2014 |
3 | 10 | True | 1 | 1 | 2 | 2014 |
4 | 20 | False | 1 | 1 | 2 | 2014 |
5 | 30 | True | 1 | 1 | 2 | 2014 |
We will also encode the feature matrix to make it compatible with machine learning.
[37]:
feature_matrix_train_enc, features_enc = ft.encode_features(
    feature_matrix_train, feature_defs
)
feature_matrix_train_enc.head()
[37]:
customer_id | age | labels | DAY(signup_date) = 1 | DAY(signup_date) is unknown | MONTH(signup_date) = 1 | MONTH(signup_date) is unknown | WEEKDAY(signup_date) = 2 | WEEKDAY(signup_date) is unknown | YEAR(signup_date) = 2014 | YEAR(signup_date) is unknown |
---|---|---|---|---|---|---|---|---|---|---|
1 | 40 | True | True | False | True | False | True | False | True | False |
2 | 50 | False | True | False | True | False | True | False | True | False |
3 | 10 | True | True | False | True | False | True | False | True | False |
4 | 20 | False | True | False | True | False | True | False | True | False |
5 | 30 | True | True | False | True | False | True | False | True | False |
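Notice that the whole feature matrix now contains only numeric and boolean values.
Now we can use the feature definitions to calculate a feature matrix for the test data while avoiding label leakage.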
[38]:
test_train = pd.DataFrame(
    {
        "customer_id": [6, 7, 8, 9, 10],
        "age": [20, 25, 55, 22, 35],
        "gender": ["f", "m", "m", "m", "m"],
        "signup_date": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
        "labels": [True, False, False, True, True],
    }
)
es_test_data = ft.EntitySet(id="customer_test_data")
es_test_data = es_test_data.add_dataframe(
    dataframe_name="customers",
    dataframe=test_train,
    index="customer_id",
    time_index="signup_date",
)
# Use the feature definitions from earlier
feature_matrix_enc_test = ft.calculate_feature_matrix(
    features=features_enc, entityset=es_test_data
)
feature_matrix_enc_test.head()
[38]:
customer_id | age | labels | DAY(signup_date) = 1 | DAY(signup_date) is unknown | MONTH(signup_date) = 1 | MONTH(signup_date) is unknown | WEEKDAY(signup_date) = 2 | WEEKDAY(signup_date) is unknown | YEAR(signup_date) = 2014 | YEAR(signup_date) is unknown |
---|---|---|---|---|---|---|---|---|---|---|
6 | 20 | True | True | False | True | False | True | False | True | False |
7 | 25 | False | True | False | True | False | True | False | True | False |
8 | 55 | False | True | False | True | False | True | False | True | False |
9 | 22 | True | True | False | True | False | True | False | True | False |
10 | 35 | True | True | False | True | False | True | False | True | False |
Check out the Modeling section for an example of using the encoded matrix with sklearn.
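The reason for reusing the encoded feature definitions on the test data can be sketched in plain Python: the set of encoded columns is learned from the training data only and then applied unchanged to new data, with unseen values falling into an "is unknown" column. The helpers fit_encoding and apply_encoding and the sample values are hypothetical; this is a simplified stand-in, not how ft.encode_features is implemented.

```python
def fit_encoding(train_values, top_n=10):
    """Learn which category values get their own column, from training data only."""
    counts = {}
    for value in train_values:
        counts[value] = counts.get(value, 0) + 1
    # Most frequent values first; ties keep first-seen order.
    return sorted(counts, key=lambda value: -counts[value])[:top_n]


def apply_encoding(values, categories, name):
    """One-hot encode values with the learned categories plus an unknown flag."""
    rows = []
    for value in values:
        row = {f"{name} = {category}": value == category for category in categories}
        row[f"{name} is unknown"] = value not in categories
        rows.append(row)
    return rows


train_weekdays = [2, 2, 2, 2, 2]
test_weekdays = [2, 2, 4]  # hypothetical: weekday 4 never appears in training

categories = fit_encoding(train_weekdays)
encoded_test = apply_encoding(test_weekdays, categories, "WEEKDAY(signup_date)")
print(encoded_test)
```

Because the categories come from the training data alone, the test matrix has exactly the same columns as the training matrix, and nothing about the test set influences the encoding.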
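2. Data with timestamps#
If your data has timestamps, the best way to prevent label leakage is to use a list of cutoff times, which specify the last point in time at which data may be used for each row of the resulting feature matrix. To use cutoff times, you need to set a time index for each time-sensitive DataFrame in your EntitySet.
Tip: Even if your data doesn't have timestamps, you can add a column of dummy timestamps for Featuretools to use as the time index.
When you call ft.dfs, you can provide a DataFrame of cutoff times like this: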
[39]:
cutoff_times = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "time": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
    }
)
cutoff_times.head()
[39]:
 | customer_id | time |
---|---|---|
0 | 1 | 2014-01-01 01:41:50 |
1 | 2 | 2014-01-01 02:06:50 |
2 | 3 | 2014-01-01 02:31:50 |
3 | 4 | 2014-01-01 02:56:50 |
4 | 5 | 2014-01-01 03:21:50 |
[40]:
train_test_data = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "age": [20, 25, 55, 22, 35],
        "gender": ["f", "m", "m", "m", "m"],
        "signup_date": pd.date_range("2010-01-01 01:41:50", periods=5, freq="25min"),
    }
)
es_train_test_data = ft.EntitySet(id="customer_train_test_data")
es_train_test_data = es_train_test_data.add_dataframe(
    dataframe_name="customers",
    dataframe=train_test_data,
    index="customer_id",
    time_index="signup_date",
)
feature_matrix_train_test, features = ft.dfs(
    entityset=es_train_test_data,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
)
feature_matrix_train_test.head()
[40]:
customer_id | time | age | DAY(signup_date) | MONTH(signup_date) | WEEKDAY(signup_date) | YEAR(signup_date) |
---|---|---|---|---|---|---|
1 | 2014-01-01 01:41:50 | 20 | 1 | 1 | 4 | 2010 |
2 | 2014-01-01 02:06:50 | 25 | 1 | 1 | 4 | 2010 |
3 | 2014-01-01 02:31:50 | 55 | 1 | 1 | 4 | 2010 |
4 | 2014-01-01 02:56:50 | 22 | 1 | 1 | 4 | 2010 |
5 | 2014-01-01 03:21:50 | 35 | 1 | 1 | 4 | 2010 |
Above, we created a feature matrix that uses cutoff times to avoid label leakage. We could also encode this feature matrix using ft.encode_features.
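The single-table example has no child rows to aggregate, so the filtering done by cutoff times is not very visible there. The following plain-Python sketch shows the idea on a hypothetical event log: for each customer, only rows at or before that customer's cutoff contribute to the feature. The data and the helper sum_amount_at_cutoff are assumptions for illustration and do not reflect Featuretools internals.

```python
from datetime import datetime

# Hypothetical event log; "time" plays the role of the time index.
events = [
    {"customer_id": 1, "time": datetime(2014, 1, 1, 1, 0), "amount": 10.0},
    {"customer_id": 1, "time": datetime(2014, 1, 1, 2, 0), "amount": 20.0},
    {"customer_id": 2, "time": datetime(2014, 1, 1, 1, 30), "amount": 5.0},
    {"customer_id": 2, "time": datetime(2014, 1, 1, 3, 0), "amount": 50.0},
]

# One cutoff per customer, mirroring the rows of a cutoff time DataFrame.
cutoffs = {
    1: datetime(2014, 1, 1, 1, 41, 50),
    2: datetime(2014, 1, 1, 2, 6, 50),
}


def sum_amount_at_cutoff(rows, cutoff_by_customer):
    """Sum amounts per customer using only rows at or before the cutoff."""
    totals = {customer_id: 0.0 for customer_id in cutoff_by_customer}
    for row in rows:
        cutoff = cutoff_by_customer.get(row["customer_id"])
        if cutoff is not None and row["time"] <= cutoff:
            totals[row["customer_id"]] += row["amount"]
    return totals


totals = sum_amount_at_cutoff(events, cutoffs)
print(totals)
```

Events recorded after a customer's cutoff never reach the aggregation, which is what keeps information from the future, including the label, out of the features.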
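What is the difference between passing a primitive object and passing a string to DFS?#
There are two ways to pass primitives to DFS: as a primitive object, or as a string containing the primitive's name.
We will use the transform primitive called TimeSincePrevious to illustrate the difference.
First, let's use the string containing the primitive's name.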
[41]:
es = ft.demo.load_mock_customer(return_entityset=True)
[42]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=[],
    trans_primitives=["time_since_previous"],
)
feature_matrix
[42]:
customer_id | zip_code | TIME_SINCE_PREVIOUS(join_date) |
---|---|---|
5 | 60091 | NaN |
4 | 60091 | 22948824.0 |
1 | 60091 | 744019.0 |
3 | 13244 | 10212841.0 |
2 | 13244 | 21282510.0 |
Now, let's use the primitive object.
[43]:
from featuretools.primitives import TimeSincePrevious

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=[],
    trans_primitives=[TimeSincePrevious],
)
feature_matrix
[43]:
customer_id | zip_code | TIME_SINCE_PREVIOUS(join_date) |
---|---|---|
5 | 60091 | NaN |
4 | 60091 | 22948824.0 |
1 | 60091 | 744019.0 |
3 | 13244 | 10212841.0 |
2 | 13244 | 21282510.0 |
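As we can see above, the feature matrices are identical.
However, if we need to change a configurable parameter of the primitive, we should use the primitive object. For example, let's make TimeSincePrevious return its result in hours (the default unit is seconds).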
[44]:
from featuretools.primitives import TimeSincePrevious

time_since_previous_in_hours = TimeSincePrevious(unit="hours")
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=[],
    trans_primitives=[time_since_previous_in_hours],
)
feature_matrix
[44]:
customer_id | zip_code | TIME_SINCE_PREVIOUS(join_date, unit=hours) |
---|---|---|
5 | 60091 | NaN |
4 | 60091 | 6374.673333 |
1 | 60091 | 206.671944 |
3 | 13244 | 2836.900278 |
2 | 13244 | 5911.808333 |
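What the unit parameter changes can be sketched in plain Python. The function below is a hypothetical stand-in for the primitive's calculation, not its actual code, and the timestamps are made up for illustration: it measures the gap between consecutive timestamps in seconds and divides by the size of the requested unit.

```python
from datetime import datetime

SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def time_since_previous(times, unit="seconds"):
    """Elapsed time between consecutive timestamps; the first has no predecessor."""
    divisor = SECONDS_PER_UNIT[unit]
    gaps = [
        (current - previous).total_seconds() / divisor
        for previous, current in zip(times, times[1:])
    ]
    return [None] + gaps if times else []


# Hypothetical, already sorted join dates.
join_dates = [
    datetime(2014, 1, 1, 0, 0),
    datetime(2014, 1, 1, 1, 30),
    datetime(2014, 1, 2, 1, 30),
]

in_seconds = time_since_previous(join_dates)
in_hours = time_since_previous(join_dates, unit="hours")
print(in_seconds, in_hours)
```

The same relationship holds between the two feature matrices above: dividing 22948824 seconds by 3600 gives roughly 6374.673333 hours.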
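Features#
How can I select features based on some attributes (a specific string, an explicit primitive type, a return type, a given depth)?#
You may wish to select a subset of your features based on some attributes.
Say you want to select the features whose names contain the string amount. You can check for this by calling the get_name function on each feature definition.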
[45]:
es = ft.demo.load_mock_customer(return_entityset=True)
feature_defs = ft.dfs(
    entityset=es, target_dataframe_name="customers", features_only=True
)
features_with_amount = []
for x in feature_defs:
    if "amount" in x.get_name():
        features_with_amount.append(x)
features_with_amount[0:5]
[45]:
[<Feature: MAX(transactions.amount)>,
<Feature: MEAN(transactions.amount)>,
<Feature: MIN(transactions.amount)>,
<Feature: SKEW(transactions.amount)>,
<Feature: STD(transactions.amount)>]
You might also want to select only aggregation features.
[46]:
from featuretools import AggregationFeature

features_only_aggregations = []
for x in feature_defs:
    if type(x) == AggregationFeature:
        features_only_aggregations.append(x)
features_only_aggregations[0:5]
[46]:
[<Feature: COUNT(sessions)>,
<Feature: MODE(sessions.device)>,
<Feature: NUM_UNIQUE(sessions.device)>,
<Feature: COUNT(transactions)>,
<Feature: MAX(transactions.amount)>]
You might also want to select only the features calculated at a certain depth. You can do this with the get_depth function.
[47]:
features_only_depth_2 = []
for x in feature_defs:
    if x.get_depth() == 2:
        features_only_depth_2.append(x)
features_only_depth_2[0:5]
[47]:
[<Feature: MAX(sessions.COUNT(transactions))>,
<Feature: MAX(sessions.MEAN(transactions.amount))>,
<Feature: MAX(sessions.MIN(transactions.amount))>,
<Feature: MAX(sessions.NUM_UNIQUE(transactions.product_id))>,
<Feature: MAX(sessions.SKEW(transactions.amount))>]
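Finally, you might want to return only features of a certain type. You can do this with the column_schema attribute. For more information on working with column schemas, take a look at Transitioning from Variables to Woodwork.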
[48]:
features_only_numeric = []
for x in feature_defs:
    if "numeric" in x.column_schema.semantic_tags:
        features_only_numeric.append(x)
features_only_numeric[0:5]
[48]:
[<Feature: COUNT(sessions)>,
<Feature: NUM_UNIQUE(sessions.device)>,
<Feature: COUNT(transactions)>,
<Feature: MAX(transactions.amount)>,
<Feature: MEAN(transactions.amount)>]
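Once you have your specific list of features, you can use ft.calculate_feature_matrix to generate a feature matrix for only those features.
In our example, we use only the features whose names contain the string amount.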
[49]:
feature_matrix = ft.calculate_feature_matrix(
    entityset=es, features=features_with_amount
)  # change to your specific feature list
feature_matrix.head()
[49]:
最大值(交易.金额) | 平均值(交易.金额) | 最小值(交易.金额) | 偏度(交易.金额) | 标准差(交易.金额) | 总和(交易.金额) | 最大值(会话.平均值(交易.金额)) | 最大值(会话.最小值(交易.金额)) | 最大值(会话.偏度(交易.金额)) | 最大值(会话.标准差(交易.金额)) | ... | 标准差(会话.最大值(交易.金额)) | 标准差(会话.平均值(交易.金额)) | 标准差(会话.最小值(交易.金额)) | 标准差(会话.偏度(交易.金额)) | 标准差(会话.总和(交易.金额)) | 总和(会话.最大值(交易.金额)) | 总和(会话.平均值(交易.金额)) | 总和(会话.最小值(交易.金额)) | 总和(会话.偏度(交易.金额)) | 总和(会话.标准差(交易.金额)) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
客户ID | |||||||||||||||||||||
5 | 149.02 | 80.375443 | 7.55 | -0.025941 | 44.095630 | 6349.66 | 94.481667 | 20.65 | 0.602209 | 51.149250 | ... | 7.928001 | 11.007471 | 4.961414 | 0.415426 | 402.775486 | 839.76 | 472.231119 | 86.49 | 0.014384 | 259.873954 |
4 | 149.95 | 80.070459 | 5.73 | -0.036348 | 45.068765 | 8727.68 | 110.450000 | 54.83 | 0.382868 | 54.293903 | ... | 3.514421 | 13.027258 | 16.960575 | 0.387884 | 235.992478 | 1157.99 | 649.657515 | 131.51 | 0.002764 | 356.125829 |
1 | 139.43 | 71.631905 | 5.81 | 0.019698 | 40.442059 | 9025.62 | 88.755625 | 26.36 | 0.640252 | 46.905665 | ... | 7.322191 | 13.759314 | 6.954507 | 0.589386 | 279.510713 | 1057.97 | 582.193117 | 78.59 | -0.476122 | 312.745952 |
3 | 149.15 | 67.060430 | 5.89 | 0.418230 | 43.683296 | 6236.62 | 82.109444 | 20.06 | 0.854976 | 50.110120 | ... | 10.724241 | 11.174282 | 5.424407 | 0.429374 | 219.021420 | 847.63 | 405.237462 | 66.21 | 2.286086 | 257.299895 |
2 | 146.81 | 77.422366 | 8.73 | 0.098259 | 37.705178 | 7200.28 | 96.581000 | 56.46 | 0.755711 | 47.935920 | ... | 17.221593 | 11.477071 | 15.874374 | 0.509798 | 251.609234 | 931.63 | 548.905851 | 154.60 | -0.277640 | 258.700528 |
5 rows × 37 columns
Above, notice how all the column names for our feature matrix contain the string amount.
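How do I create where features?#
Sometimes, you might want to create features that are conditioned on a second value before they are calculated. This extra filter is called a “where clause”. You can create these features using the interesting_values of a column.
If you have categorical columns in your EntitySet, you can use add_interesting_values. This function finds interesting values for your categorical columns, which can then be used to generate “where” clauses.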
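To make the idea of a where clause concrete, here is a minimal plain-Python sketch of what a feature such as COUNT(sessions WHERE device = tablet) computes: filter the child rows by a condition, then count them per parent. The rows and the helper name count_where are hypothetical and only illustrate the concept; this is not how Featuretools implements it.

```python
from collections import Counter

# Hypothetical child rows (sessions); each row points at a parent customer.
sessions = [
    {"session_id": 1, "customer_id": 1, "device": "tablet"},
    {"session_id": 2, "customer_id": 1, "device": "mobile"},
    {"session_id": 3, "customer_id": 2, "device": "tablet"},
    {"session_id": 4, "customer_id": 1, "device": "tablet"},
    {"session_id": 5, "customer_id": 2, "device": "desktop"},
]


def count_where(rows, parent_key, column, value):
    """Count child rows per parent, keeping only rows where column == value."""
    return Counter(row[parent_key] for row in rows if row[column] == value)


# Plain COUNT(sessions): every session is counted.
count_all = Counter(row["customer_id"] for row in sessions)

# COUNT(sessions WHERE device = tablet): filter first, then count.
count_tablet = count_where(sessions, "customer_id", "device", "tablet")

print(dict(count_all))     # {1: 3, 2: 2}
print(dict(count_tablet))  # {1: 2, 2: 1}
```

Each interesting value of a categorical column yields one such filtered feature, which is why a column with three values produces three where features.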
First, let's create our EntitySet.
[50]:
es = ft.demo.load_mock_customer(return_entityset=True)
es
[50]:
Entityset: transactions
DataFrames:
transactions [Rows: 500, Columns: 6]
products [Rows: 5, Columns: 3]
sessions [Rows: 35, Columns: 5]
customers [Rows: 5, Columns: 5]
Relationships:
transactions.product_id -> products.product_id
transactions.session_id -> sessions.session_id
sessions.customer_id -> customers.customer_id
Now we can add interesting values for the categorical columns.
[51]:
es.add_interesting_values()
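Now we can run DFS with the where_primitives argument to define which primitives to apply with where clauses. In this case, let's use the primitive count. For this to work, the primitive count must be present in both agg_primitives and where_primitives.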
[52]:
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["count"],
    where_primitives=["count"],
    trans_primitives=[],
)
feature_matrix.head()
[52]:
zip_code | COUNT(sessions) | COUNT(transactions) | COUNT(sessions WHERE device = desktop) | COUNT(sessions WHERE device = mobile) | COUNT(sessions WHERE device = tablet) | COUNT(sessions WHERE customers.zip_code = 13244) | COUNT(sessions WHERE customers.zip_code = 60091) | COUNT(transactions WHERE sessions.device = mobile) | COUNT(transactions WHERE sessions.device = desktop) | COUNT(transactions WHERE sessions.device = tablet) | |
---|---|---|---|---|---|---|---|---|---|---|---|
customer_id | |||||||||||
5 | 60091 | 6 | 79 | 2 | 3 | 1 | 0 | 6 | 36 | 29 | 14 |
4 | 60091 | 8 | 109 | 3 | 4 | 1 | 0 | 8 | 53 | 38 | 18 |
1 | 60091 | 8 | 126 | 2 | 3 | 3 | 0 | 8 | 56 | 27 | 43 |
3 | 13244 | 6 | 93 | 4 | 1 | 1 | 6 | 0 | 16 | 62 | 15 |
2 | 13244 | 7 | 93 | 3 | 2 | 2 | 7 | 0 | 31 | 34 | 28 |
We have now created some useful features. One example is COUNT(sessions WHERE device = tablet). This feature tells us how many sessions a customer completed on a tablet.
[53]:
feature_matrix[["COUNT(sessions WHERE device = tablet)"]]
[53]:
COUNT(sessions WHERE device = tablet) | |
---|---|
customer_id | |
5 | 1 |
4 | 1 |
1 | 3 |
3 | 1 |
2 | 2 |
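Primitives#
What is the difference between the primitive types (Transform, GroupBy Transform, & Aggregation)?#
You might be curious to know the difference between the primitive groups. Let's review the differences between transform, groupby transform, and aggregation primitives.
First, let's create a simple EntitySet.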
[54]:
import pandas as pd
import featuretools as ft
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "time_index": pd.date_range("1/1/2019", periods=6, freq="D"),
        "group": ["a", "a", "a", "a", "a", "a"],
        "val": [5, 1, 10, 20, 6, 23],
    }
)
es = ft.EntitySet()
es = es.add_dataframe(
    dataframe_name="observations", dataframe=df, index="id", time_index="time_index"
)
es = es.normalize_dataframe(
    base_dataframe_name="observations", new_dataframe_name="groups", index="group"
)
es.plot()
[54]:
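After calling normalize_dataframe, the column “group” has the semantic tag “foreign_key” because it identifies another DataFrame. Alternatively, this could be set with the semantic_tags parameter when we first call es.add_dataframe().
Transform Primitive#
The cum_sum primitive calculates the running sum in a list of numbers.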
[55]:
from featuretools.primitives import CumSum
cum_sum = CumSum()
cum_sum([1, 2, 3, 4, 5]).tolist()
[55]:
[1, 3, 6, 10, 15]
If we apply it using the trans_primitives argument, it will calculate the running sum over the entire observations DataFrame, like this:
[56]:
feature_matrix, feature_defs = ft.dfs(
    target_dataframe_name="observations",
    entityset=es,
    agg_primitives=[],
    trans_primitives=["cum_sum"],
    groupby_trans_primitives=[],
)
feature_matrix
[56]:
group | val | CUM_SUM(val) | |
---|---|---|---|
id | |||
1 | a | 5 | 5.0 |
2 | a | 1 | 6.0 |
3 | a | 10 | 16.0 |
4 | a | 20 | 36.0 |
5 | a | 6 | 42.0 |
6 | a | 23 | 65.0 |
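Groupby Transform Primitive#
If we apply it using groupby_trans_primitives, DFS will first group by any foreign key columns and then apply the transform primitive. As a result, we get the cumulative sum computed by group.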
[57]:
feature_matrix, feature_defs = ft.dfs(
    target_dataframe_name="observations",
    entityset=es,
    agg_primitives=[],
    trans_primitives=[],
    groupby_trans_primitives=["cum_sum"],
)
feature_matrix
[57]:
group | val | CUM_SUM(val) by group | |
---|---|---|---|
id | |||
1 | a | 5 | 5.0 |
2 | a | 1 | 6.0 |
3 | a | 10 | 16.0 |
4 | a | 20 | 36.0 |
5 | a | 6 | 42.0 |
6 | a | 23 | 65.0 |
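Because every row in this example belongs to group a, the transform and groupby transform results coincide. The following plain-Python sketch uses two hypothetical groups to show where they diverge. It only illustrates the two computations and is not the Featuretools implementation.

```python
from itertools import accumulate

# Hypothetical observations, already sorted by time, spread over two groups.
groups = ["a", "b", "a", "b", "a"]
vals = [5, 1, 10, 20, 6]

# Transform: a single running sum over the whole column.
cum_sum_all = list(accumulate(vals))

# Groupby transform: a separate running sum per group, kept in row order.
running = {}
cum_sum_by_group = []
for group, val in zip(groups, vals):
    running[group] = running.get(group, 0) + val
    cum_sum_by_group.append(running[group])

print(cum_sum_all)       # [5, 6, 16, 36, 42]
print(cum_sum_by_group)  # [5, 1, 15, 21, 21]
```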
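Aggregation Primitive#
Finally, there is the aggregation primitive “sum”. If we use sum, it will calculate the sum for the group at the cutoff time for each row. Because we didn't specify a cutoff time, it will use all the data for each group for each row.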
[58]:
feature_matrix, feature_defs = ft.dfs(
    target_dataframe_name="observations",
    entityset=es,
    agg_primitives=["sum"],
    trans_primitives=[],
    cutoff_time_in_index=True,
    groupby_trans_primitives=[],
)
feature_matrix
[58]:
group | val | groups.SUM(observations.val) | ||
---|---|---|---|---|
id | time | |||
1 | 2024-05-14 19:03:25.357997 | a | 5 | 65.0 |
2 | 2024-05-14 19:03:25.357997 | a | 1 | 65.0 |
3 | 2024-05-14 19:03:25.357997 | a | 10 | 65.0 |
4 | 2024-05-14 19:03:25.357997 | a | 20 | 65.0 |
5 | 2024-05-14 19:03:25.357997 | a | 6 | 65.0 |
6 | 2024-05-14 19:03:25.357997 | a | 23 | 65.0 |
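If we set the cutoff time of each row to be the time index and then use sum as an aggregation primitive, the result is the same as cum_sum (though the order may differ in the displayed DataFrame).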
[59]:
cutoff_time = df[["id", "time_index"]]
cutoff_time
[59]:
id | time_index | |
---|---|---|
1 | 1 | 2019-01-01 |
2 | 2 | 2019-01-02 |
3 | 3 | 2019-01-03 |
4 | 4 | 2019-01-04 |
5 | 5 | 2019-01-05 |
6 | 6 | 2019-01-06 |
[60]:
feature_matrix, feature_defs = ft.dfs(
    target_dataframe_name="observations",
    entityset=es,
    agg_primitives=["sum"],
    trans_primitives=[],
    groupby_trans_primitives=[],
    cutoff_time_in_index=True,
    cutoff_time=cutoff_time,
)
feature_matrix
[60]:
group | val | groups.SUM(observations.val) | ||
---|---|---|---|---|
id | time | |||
1 | 2019-01-01 | a | 5 | 5.0 |
2 | 2019-01-02 | a | 1 | 6.0 |
3 | 2019-01-03 | a | 10 | 16.0 |
4 | 2019-01-04 | a | 20 | 36.0 |
5 | 2019-01-05 | a | 6 | 42.0 |
6 | 2019-01-06 | a | 23 | 65.0 |
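The relationship between a sum at a cutoff time and a cumulative sum can be sketched in plain Python. The rows mirror the observations used above, and the helper sum_at_cutoff is hypothetical. The sketch assumes that data at exactly the cutoff time is included, which matches the output shown above.

```python
from datetime import date
from itertools import accumulate

# Rows mirroring the observations above: (time_index, val).
rows = [
    (date(2019, 1, 1), 5),
    (date(2019, 1, 2), 1),
    (date(2019, 1, 3), 10),
    (date(2019, 1, 4), 20),
    (date(2019, 1, 5), 6),
    (date(2019, 1, 6), 23),
]


def sum_at_cutoff(rows, cutoff):
    """Sum the values of all rows whose time is at or before the cutoff."""
    return sum(val for time, val in rows if time <= cutoff)


# One cutoff per row, equal to that row's own time index.
sums = [sum_at_cutoff(rows, time) for time, _ in rows]

# A single late cutoff lets every row see all of the data.
late = [sum_at_cutoff(rows, date(2024, 5, 14)) for _ in rows]

print(sums)  # [5, 6, 16, 36, 42, 65]
print(late)  # [65, 65, 65, 65, 65, 65]
```

With per-row cutoffs the sums equal the running sum of val, and with one late cutoff every row gets the group total of 65.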
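How do I get a list of all Aggregation and Transform primitives?#
You can run featuretools.list_primitives() to get all the primitives in Featuretools. It returns a DataFrame with the name, type, and description of each primitive, along with its valid inputs and return type.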
[61]:
df_primitives = ft.list_primitives()
df_primitives.head()
[61]:
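name | type | description | valid_inputs | return_type | |
---|---|---|---|---|---|
0 | num_unique_weeks | aggregation | Determines the number of unique weeks. | <ColumnSchema (Logical Type = Datetime)> | <ColumnSchema (Logical Type = Integer) (Semantic... |
1 | variance | aggregation | Calculates the variance of a list of numbers. | <ColumnSchema (Semantic Tags = ['numeric'])> | <ColumnSchema (Logical Type = Double) (Semantic... |
2 | first | aggregation | Determines the first value in a list. | <ColumnSchema> | None |
3 | num_unique | aggregation | Determines the number of distinct values, ignoring... | <ColumnSchema (Semantic Tags = ['category'])> | <ColumnSchema (Logical Type = IntegerNullable)... |
4 | count_below_mean | aggregation | Determines the number of values that are below the mean... | <ColumnSchema (Semantic Tags = ['numeric'])> | <ColumnSchema (Logical Type = IntegerNullable)... |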
[62]:
df_primitives.tail()
[62]:
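name | type | description | valid_inputs | return_type | |
---|---|---|---|---|---|
198 | day_of_year | transform | Determines the ordinal day of the year from the date... | <ColumnSchema (Logical Type = Datetime)> | <ColumnSchema (Logical Type = Ordinal: [1, 2, ... |
199 | add_numeric_scalar | transform | Adds a scalar to each value in the list. | <ColumnSchema (Semantic Tags = ['numeric'])> | <ColumnSchema (Semantic Tags = ['numeric'])> |
200 | median_word_length | transform | Determines the median word length. | <ColumnSchema (Logical Type = NaturalLanguage)> | <ColumnSchema (Logical Type = Double) (Semantic... |
201 | cumulative_time_since_last_true | transform | Determines the time (in seconds) elapsed since the last True... | <ColumnSchema (Logical Type = Boolean)>, <Colu... | <ColumnSchema (Logical Type = Double) (Semantic... |
202 | rolling_min | transform | Determines the minimum of entries over a given time window... | <ColumnSchema (Semantic Tags = ['numeric'])>, ... | <ColumnSchema (Logical Type = Double) (Semantic... |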
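How do I change the units for a TimeSince primitive?#
There are a few primitives in Featuretools that make time-based calculations. These include TimeSince, TimeSincePrevious, TimeSinceLast, and TimeSinceFirst.
You can change the units from the default of seconds to any valid time unit by doing the following: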
[63]:
from featuretools.primitives import (
    TimeSince,
    TimeSinceFirst,
    TimeSinceLast,
    TimeSincePrevious,
)
time_since = TimeSince(unit="minutes")
time_since_previous = TimeSincePrevious(unit="hours")
time_since_last = TimeSinceLast(unit="days")
time_since_first = TimeSinceFirst(unit="years")
es = ft.demo.load_mock_customer(return_entityset=True)
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=[time_since_last, time_since_first],
    trans_primitives=[time_since, time_since_previous],
)
Above, we changed the units as follows:
- minutes for TimeSince
- hours for TimeSincePrevious
- days for TimeSinceLast
- years for TimeSinceFirst
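Conceptually, a unit argument rescales an elapsed time measured in seconds. The sketch below shows that conversion in plain Python. The SECONDS_PER_UNIT table and the time_since helper are hypothetical, and treating a year as 365 days is an assumption of this sketch rather than a statement about how Featuretools defines it.

```python
from datetime import datetime

# Seconds per unit; a year is taken as 365 days here (an assumption).
SECONDS_PER_UNIT = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 86400,
    "years": 365 * 86400,
}


def time_since(then, now, unit="seconds"):
    """Return the elapsed time between two datetimes in the requested unit."""
    elapsed_seconds = (now - then).total_seconds()
    return elapsed_seconds / SECONDS_PER_UNIT[unit]


then = datetime(2014, 1, 1, 0, 0, 0)
now = datetime(2014, 1, 2, 12, 0, 0)

print(time_since(then, now))                # 129600.0
print(time_since(then, now, unit="hours"))  # 36.0
print(time_since(then, now, unit="days"))   # 1.5
```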
Now we can see that our feature matrix contains several features where the units of the TimeSince primitives have changed.
[64]:
feature_matrix.head()
[64]:
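zip_code | TIME_SINCE_FIRST(sessions.session_start, unit=years) | TIME_SINCE_LAST(sessions.session_start, unit=days) | TIME_SINCE_FIRST(transactions.transaction_time, unit=years) | TIME_SINCE_LAST(transactions.transaction_time, unit=days) | TIME_SINCE(birthday, unit=minutes) | TIME_SINCE(join_date, unit=minutes) | TIME_SINCE_PREVIOUS(join_date, unit=hours) | TIME_SINCE_FIRST(transactions.sessions.session_start, unit=years) | TIME_SINCE_LAST(transactions.sessions.session_start, unit=days) | |
---|---|---|---|---|---|---|---|---|---|---|
customer_id | ||||||||||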
5 | 60091 | 10.373429 | 3786.459267 | 10.373429 | 3786.454001 | 2.093154e+07 | 7.272816e+06 | NaN | 10.373429 | 3786.459267 |
4 | 60091 | 10.373409 | 3786.570610 | 10.373409 | 3786.563839 | 9.335223e+06 | 6.890335e+06 | 6374.673333 | 10.373409 | 3786.570610 |
1 | 60091 | 10.373378 | 3786.495378 | 10.373378 | 3786.484094 | 1.568706e+07 | 6.877935e+06 | 206.671944 | 10.373378 | 3786.495378 |
3 | 13244 | 10.373273 | 3786.429927 | 10.373273 | 3786.418642 | 1.077234e+07 | 6.707721e+06 | 2836.900278 | 10.373273 | 3786.429927 |
2 | 13244 | 10.373462 | 3786.453249 | 10.373462 | 3786.444221 | 1.985010e+07 | 6.353012e+06 | 5911.808333 | 10.373462 | 3786.453249 |
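There are now features whose time unit differs from the default of seconds, such as TIME_SINCE_LAST(sessions.session_start, unit=days) and TIME_SINCE_FIRST(sessions.session_start, unit=years).
Modeling#
How does my train & test data work with Featuretools and sklearn's train_test_split?#
You might be wondering how to properly use your train and test data with Featuretools and sklearn's train_test_split. There are a few things you must do to ensure accuracy with this workflow.
Let's imagine we have a DataFrame of training data that includes the labels.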
[65]:
train_data = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "age": [20, 25, 55, 22, 35],
        "gender": ["f", "m", "m", "m", "m"],
        "signup_date": pd.date_range("2010-01-01 01:41:50", periods=5, freq="25min"),
        "labels": [False, True, True, False, False],
    }
)
train_data.head()
[65]:
customer_id | age | gender | signup_date | labels | |
---|---|---|---|---|---|
0 | 1 | 20 | f | 2010-01-01 01:41:50 | False |
1 | 2 | 25 | m | 2010-01-01 02:06:50 | True |
2 | 3 | 55 | m | 2010-01-01 02:31:50 | True |
3 | 4 | 22 | m | 2010-01-01 02:56:50 | False |
4 | 5 | 35 | m | 2010-01-01 03:21:50 | False |
Now we can create the EntitySet for our training data and create features. To prevent label leakage, we will use cutoff times (see the earlier question).
[66]:
es_train_data = ft.EntitySet(id="customer_data")
es_train_data = es_train_data.add_dataframe(
    dataframe_name="customers", dataframe=train_data, index="customer_id"
)
cutoff_times = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "time": pd.date_range("2014-01-01 01:41:50", periods=5, freq="25min"),
    }
)
feature_matrix_train, features = ft.dfs(
    entityset=es_train_data,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
)
feature_matrix_train.head()
[66]:
age | labels | DAY(signup_date) | MONTH(signup_date) | WEEKDAY(signup_date) | YEAR(signup_date) | ||
---|---|---|---|---|---|---|---|
customer_id | time | ||||||
1 | 2014-01-01 01:41:50 | 20 | False | 1 | 1 | 4 | 2010 |
2 | 2014-01-01 02:06:50 | 25 | True | 1 | 1 | 4 | 2010 |
3 | 2014-01-01 02:31:50 | 55 | True | 1 | 1 | 4 | 2010 |
4 | 2014-01-01 02:56:50 | 22 | False | 1 | 1 | 4 | 2010 |
5 | 2014-01-01 03:21:50 | 35 | False | 1 | 1 | 4 | 2010 |
We will also encode our feature matrix to make it compatible with machine learning algorithms.
[67]:
feature_matrix_train_enc, feature_enc = ft.encode_features(
    feature_matrix_train, features
)
feature_matrix_train_enc.head()
[67]:
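age | labels | DAY(signup_date) = 1 | DAY(signup_date) is unknown | MONTH(signup_date) = 1 | MONTH(signup_date) is unknown | WEEKDAY(signup_date) = 4 | WEEKDAY(signup_date) is unknown | YEAR(signup_date) = 2010 | YEAR(signup_date) is unknown | ||
---|---|---|---|---|---|---|---|---|---|---|---|
customer_id | time | ||||||||||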
1 | 2014-01-01 01:41:50 | 20 | False | True | False | True | False | True | False | True | False |
2 | 2014-01-01 02:06:50 | 25 | True | True | False | True | False | True | False | True | False |
3 | 2014-01-01 02:31:50 | 55 | True | True | False | True | False | True | False | True | False |
4 | 2014-01-01 02:56:50 | 22 | False | True | False | True | False | True | False | True | False |
5 | 2014-01-01 03:21:50 | 35 | False | True | False | True | False | True | False | True | False |
[68]:
from sklearn.model_selection import train_test_split
X = feature_matrix_train_enc.drop(["labels"], axis=1)
y = feature_matrix_train_enc["labels"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
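Now you can use the encoded feature matrix with sklearn's train_test_split. This will allow you to train your model and tune your parameters.
How are categorical columns encoded when splitting training and testing data?#
You might be wondering what happens when categorical columns are encoded for your training and testing data. In particular, you might be curious what happens if the training data has a categorical value that is not present in the testing data.
Let's explore a simple example to see what happens during the encoding process.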
[69]:
train_data = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "product_purchased": ["coke zero", "car", "toothpaste", "coke zero", "car"],
    }
)
es_train = ft.EntitySet(id="customer_data")
es_train = es_train.add_dataframe(
    dataframe_name="customers",
    dataframe=train_data,
    index="customer_id",
    logical_types={"product_purchased": ww.logical_types.Categorical},
)
feature_matrix_train, features = ft.dfs(
    entityset=es_train, target_dataframe_name="customers"
)
feature_matrix_train
[69]:
product_purchased | |
---|---|
customer_id | |
1 | coke zero |
2 | car |
3 | toothpaste |
4 | coke zero |
5 | car |
We will use ft.encode_features to properly encode the product_purchased column.
[70]:
feature_matrix_train_encoded, features_encoded = ft.encode_features(
    feature_matrix_train, features
)
feature_matrix_train_encoded.head()
[70]:
product_purchased = coke zero | product_purchased = car | product_purchased = toothpaste | product_purchased is unknown | |
---|---|---|---|---|
customer_id | ||||
1 | True | False | False | False |
2 | False | True | False | False |
3 | False | False | True | False |
4 | True | False | False | False |
5 | False | True | False | False |
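Now, let's imagine we have some test data that doesn't contain one of the categorical values (toothpaste). The test data also contains a value that wasn't present in the training data (water).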
[71]:
test_data = pd.DataFrame(
    {
        "customer_id": [6, 7, 8, 9, 10],
        "product_purchased": ["coke zero", "car", "coke zero", "coke zero", "water"],
    }
)
es_test = ft.EntitySet(id="customer_data")
es_test = es_test.add_dataframe(
    dataframe_name="customers", dataframe=test_data, index="customer_id"
)
feature_matrix_test = ft.calculate_feature_matrix(
    entityset=es_test, features=features_encoded
)
feature_matrix_test.head()
[71]:
product_purchased = coke zero | product_purchased = car | product_purchased = toothpaste | product_purchased is unknown | |
---|---|---|---|---|
customer_id | ||||
6 | True | False | False | False |
7 | False | True | False | False |
8 | True | False | False | False |
9 | True | False | False | False |
10 | False | False | False | True |
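As seen above, we successfully handled the encoding and dealt with the following complications:
- toothpaste was present in the training data but not in the testing data
- water was present in the testing data but not in the training data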
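The behavior above follows from fixing the set of encoded columns on the training data and reusing it for the test data. Here is a minimal plain-Python sketch of that idea. The helpers fit_categories and encode are hypothetical, and the sketch keeps every distinct training value, which is a simplification and not a description of how ft.encode_features selects values.

```python
def fit_categories(values):
    """Collect the distinct training values, in order of first appearance."""
    return list(dict.fromkeys(values))


def encode(values, categories, name):
    """One-hot encode values against a fixed category list.

    Values outside the list set only the '<name> is unknown' column.
    """
    rows = []
    for value in values:
        row = {f"{name} = {cat}": value == cat for cat in categories}
        row[f"{name} is unknown"] = value not in categories
        rows.append(row)
    return rows


train = ["coke zero", "car", "toothpaste", "coke zero", "car"]
test = ["coke zero", "car", "coke zero", "coke zero", "water"]

categories = fit_categories(train)  # learned from the training data only
encoded_test = encode(test, categories, "product_purchased")

for row in encoded_test:
    print(row)
```

The test rows keep a toothpaste column that is False everywhere, and water sets only the unknown column, so the training and test matrices have identical columns.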
Errors & Warnings#
Why am I getting this error 'Index is not unique on dataframe'?#
You may be trying to create your EntitySet and run into this error.
IndexError: Index column must be unique
This is because each dataframe in your EntitySet needs a unique index.
Let's look at a simple example.
[72]:
product_df = pd.DataFrame({"id": [1, 2, 3, 4, 4], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})
product_df
[72]:
id | rating | |
---|---|---|
0 | 1 | 3.5 |
1 | 2 | 4.0 |
2 | 3 | 4.5 |
3 | 4 | 1.5 |
4 | 4 | 5.0 |
Notice how the id column has a duplicate index of 4. If you try to add this dataframe to the EntitySet, you will run into the following error.
es = ft.EntitySet(id="product_data")
es = es.add_dataframe(dataframe_name="products",
                      dataframe=product_df,
                      index="id")
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-78-854fbaf207f8> in <module>
1 es = ft.EntitySet(id="product_data")
----> 2 es = es.add_dataframe(dataframe_name="products",
3 dataframe=product_df,
4 index="id")
~/Code/featuretools/featuretools/entityset/entityset.py in add_dataframe(self, dataframe, dataframe_name, index, logical_types, semantic_tags, make_index, time_index, secondary_time_index, already_sorted)
625 index_was_created, index, dataframe = _get_or_create_index(index, make_index, dataframe)
626
--> 627 dataframe.ww.init(name=dataframe_name,
628 index=index,
629 time_index=time_index,
/usr/local/Caskroom/miniconda/base/envs/featuretools/lib/python3.8/site-packages/woodwork/table_accessor.py in init(self, index, time_index, logical_types, already_sorted, schema, validate, use_standard_tags, **kwargs)
94 """
95 if validate:
---> 96 _validate_accessor_params(self._dataframe, index, time_index, logical_types, schema, use_standard_tags)
97 if schema is not None:
98 self._schema = schema
/usr/local/Caskroom/miniconda/base/envs/featuretools/lib/python3.8/site-packages/woodwork/table_accessor.py in _validate_accessor_params(dataframe, index, time_index, logical_types, schema, use_standard_tags)
877 # We ignore these parameters if a schema is passed
878 if index is not None:
--> 879 _check_index(dataframe, index)
880 if logical_types:
881 _check_logical_types(dataframe.columns, logical_types)
/usr/local/Caskroom/miniconda/base/envs/featuretools/lib/python3.8/site-packages/woodwork/table_accessor.py in _check_index(dataframe, index)
903 # User specifies an index that is in the dataframe but not unique
--> 904 raise IndexError('Index column must be unique')
905
906
IndexError: Index column must be unique
To fix the above error, you can use either of the following solutions:
Solution #1 - You can create a unique index on your DataFrame.
[73]:
product_df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})
product_df
[73]:
id | rating | |
---|---|---|
0 | 1 | 3.5 |
1 | 2 | 4.0 |
2 | 3 | 4.5 |
3 | 4 | 1.5 |
4 | 5 | 5.0 |
Notice how we now have a unique index column called id.
[74]:
es = es.add_dataframe(dataframe_name="products", dataframe=product_df, index="id")
es
[74]:
Entityset: transactions
DataFrames:
transactions [Rows: 500, Columns: 6]
products [Rows: 5, Columns: 2]
sessions [Rows: 35, Columns: 5]
customers [Rows: 5, Columns: 5]
Relationships:
transactions.product_id -> products.product_id
transactions.session_id -> sessions.session_id
sessions.customer_id -> customers.customer_id
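As seen above, by creating a unique index in our DataFrame, we can now add the DataFrame to our EntitySet without an error.
Solution #2 - Set make_index to True in your call to add_dataframe to create a new index on that data. make_index creates a unique index for each row based only on the row's position relative to all the other rows.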
[75]:
product_df = pd.DataFrame({"id": [1, 2, 3, 4, 4], "rating": [3.5, 4.0, 4.5, 1.5, 5.0]})
es = ft.EntitySet(id="product_data")
es = es.add_dataframe(
    dataframe_name="products", dataframe=product_df, index="product_id", make_index=True
)
es["products"]
[75]:
product_id | id | rating | |
---|---|---|---|
0 | 0 | 1 | 3.5 |
1 | 1 | 2 | 4.0 |
2 | 2 | 3 | 4.5 |
3 | 3 | 4 | 1.5 |
4 | 4 | 4 | 5.0 |
As seen above, we created our dataframe for our EntitySet without an error by using the make_index argument.
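Before adding a dataframe, it can help to check which index values are duplicated. The plain-Python sketch below shows that check, along with the row-position index that make_index is described as producing. Both helper names are hypothetical. On a pandas column, Series.is_unique and Series.duplicated() provide the same kind of check.

```python
from collections import Counter


def duplicated_values(index_values):
    """Return the index values that occur more than once."""
    counts = Counter(index_values)
    return [value for value, count in counts.items() if count > 1]


def make_row_index(n_rows):
    """Build a unique index from row positions: 0, 1, ..., n_rows - 1."""
    return list(range(n_rows))


ids = [1, 2, 3, 4, 4]

print(duplicated_values(ids))              # [4]
print(duplicated_values([1, 2, 3, 4, 5]))  # []
print(make_row_index(len(ids)))            # [0, 1, 2, 3, 4]
```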
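Why am I getting the following warning 'Using training_window but last_time_index is not set'?#
If you are using a training window and you haven't set a last_time_index for your dataframe, you will get this warning. The training window attribute in Featuretools limits the amount of past data that can be used while calculating a particular feature vector.
Once you have created your EntitySet, you can automatically add a last_time_index to all your DataFrames by calling your_entityset.add_last_time_indexes(). This will remove the warning.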
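As a rough illustration of what a training window does, the sketch below keeps only the events that fall inside a window ending at the cutoff time. The event times and the helper in_training_window are hypothetical. Treating the window as open at the start and closed at the cutoff is an assumption of this sketch, and the role of last_time_index is not modeled here.

```python
from datetime import datetime, timedelta

# Hypothetical event times for one customer.
events = [
    datetime(2014, 1, 1, 2, 30),
    datetime(2014, 1, 1, 3, 15),
    datetime(2014, 1, 1, 3, 45),
    datetime(2014, 1, 1, 4, 10),
]


def in_training_window(times, cutoff, window):
    """Keep times that are after cutoff - window and not after cutoff."""
    start = cutoff - window
    return [t for t in times if start < t <= cutoff]


cutoff = datetime(2014, 1, 1, 4, 0)
kept = in_training_window(events, cutoff, timedelta(hours=1))

# The 02:30 event is too old and the 04:10 event is after the cutoff.
print(kept)
```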
[76]:
es = ft.demo.load_mock_customer(return_entityset=True)
es.add_last_time_indexes()
Now we can run DFS without getting the warning.
[77]:
cutoff_times = pd.DataFrame()
cutoff_times["customer_id"] = [1, 2, 3, 1]
cutoff_times["time"] = pd.to_datetime(
    ["2014-1-1 04:00", "2014-1-1 05:00", "2014-1-1 06:00", "2014-1-1 08:00"]
)
cutoff_times["label"] = [True, True, False, True]
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
    training_window="1 hour",
)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function max at 0x7f15d8329820> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function sum at 0x7f15d83291f0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function min at 0x7f15d8329940> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function mean at 0x7f15d832e160> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
to_merge = base_frame.groupby(
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:781: FutureWarning: The provided callable <function std at 0x7f15d832e280> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
to_merge = base_frame.groupby(
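last_time_index vs. time_index#
The time_index is the time at which an instance is first known. The last_time_index is the time at which an instance last appears in the data. For example, a customer's session can contain multiple transactions that occur at different points in time. If we want to count the number of sessions a user had in a given time period, we usually want to count every session that had any transaction during the training window. To do this, we need to know not only when a session started (time_index) but also when it ended (last_time_index). The last time an instance appears in the data is stored as the last_time_index of the DataFrame. Once last_time_index is set, Featuretools checks whether it falls after the start of the training window. Combined with the cutoff time, this lets DFS determine which data is relevant for a given training window.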
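The difference between the two filters can be illustrated with a small pandas sketch. This is a conceptual illustration only, not Featuretools' internal implementation: the `sessions` table, its column names, the timestamps, and the exact boundary conventions (strictly after the window start, up to and including the cutoff) are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sessions table (illustrative data, not from the demo EntitySet).
# `session_start` plays the role of time_index,
# `last_transaction` plays the role of last_time_index.
sessions = pd.DataFrame(
    {
        "session_id": [1, 2, 3, 4],
        "session_start": pd.to_datetime(
            [
                "2014-01-01 00:00",
                "2014-01-01 01:30",
                "2014-01-01 02:30",
                "2014-01-01 04:30",
            ]
        ),
        "last_transaction": pd.to_datetime(
            [
                "2014-01-01 00:45",
                "2014-01-01 02:15",
                "2014-01-01 03:10",
                "2014-01-01 05:00",
            ]
        ),
    }
)

cutoff_time = pd.Timestamp("2014-01-01 04:00")
training_window = pd.Timedelta(hours=2)
window_start = cutoff_time - training_window  # 2014-01-01 02:00

# Filter on time_index only: the session must *start* inside the window.
by_time_index = sessions[
    (sessions["session_start"] > window_start)
    & (sessions["session_start"] <= cutoff_time)
]

# Filter using last_time_index as well: the session must be known by the
# cutoff time, and its last activity must fall after the window start.
by_last_time_index = sessions[
    (sessions["session_start"] <= cutoff_time)
    & (sessions["last_transaction"] > window_start)
]

print(by_time_index["session_id"].tolist())
print(by_last_time_index["session_id"].tolist())
```

In this toy data, session 2 starts before the window but still has a transaction inside it, so only the second filter keeps it. Session 1 ends before the window starts and session 4 begins after the cutoff, so both filters drop them.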
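Why am I getting errors when using Featuretools on Google Colab?#
Google Colab has Featuretools 0.4.1 installed by default. With an older version of Featuretools, you may run into problems following our latest guides or documentation. We therefore recommend upgrading to the latest version of Featuretools by running the following in your Google Colab notebook: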
!pip install -U featuretools
You may need to restart the runtime via Runtime -> Restart Runtime. You can then check the installed Featuretools version as follows:
import featuretools as ft
print(ft.__version__)
You should see a version higher than 0.4.1.
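If you would rather check this programmatically than by eye, a minimal sketch using only the standard library might look like the following. The helper names `release_tuple` and `is_newer_than` are hypothetical and not part of Featuretools. The parsing is deliberately simple: it compares leading numeric components only, and stops at the first component that is not purely numeric, such as a pre-release suffix.

```python
from importlib import metadata


def release_tuple(version: str) -> tuple:
    """Turn '1.31.0' into (1, 31, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer_than(installed: str, baseline: str = "0.4.1") -> bool:
    """Compare versions numerically, so '0.10.0' counts as newer than '0.4.1'."""
    return release_tuple(installed) > release_tuple(baseline)


try:
    installed = metadata.version("featuretools")
    print(installed, is_newer_than(installed))
except metadata.PackageNotFoundError:
    print("featuretools is not installed in this environment")
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where "0.10.0" would sort before "0.4.1".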