Feature Selection

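Featuretools provides the ability to remove features that are unlikely to be useful in building an effective machine learning model. Reducing the number of features in the feature matrix can both improve model results and lower the computational cost of prediction.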

Featuretools lets users perform feature selection on the results of Deep Feature Synthesis with the following three functions:

ft.selection.remove_highly_null_features
ft.selection.remove_single_value_features
ft.selection.remove_highly_correlated_features

We will describe each of these functions in depth, but first we must create an EntitySet on which we can run ft.dfs.

[1]:

import pandas as pd

import featuretools as ft
from featuretools.demo.flight import load_flight
from featuretools.selection import (
    remove_highly_correlated_features,
    remove_highly_null_features,
    remove_single_value_features,
)

es = load_flight(nrows=50)
es

Downloading data ...

[1]:

Entityset: Flight Data
  DataFrames:
    trip_logs [Rows: 50, Columns: 21]
    flights [Rows: 6, Columns: 9]
    airlines [Rows: 1, Columns: 1]
    airports [Rows: 4, Columns: 3]
  Relationships:
    trip_logs.flight_id -> flights.flight_id
    flights.carrier -> airlines.carrier
    flights.dest -> airports.dest

Remove Highly Null Features

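Our dataset may contain columns with many null values. Deep Feature Synthesis may build features from those null columns, creating even more highly null features. In this case, we might want to remove any feature whose percentage of null values exceeds a certain threshold. Below is an example of a feature matrix where this is the case: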

[2]:

fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="trip_logs",
    cutoff_time=pd.DataFrame(
        {
            "trip_log_id": [30, 1, 2, 3, 4],
            "time": pd.to_datetime(["2016-09-22 00:00:00"] * 5),
        }
    ),
    trans_primitives=[],
    agg_primitives=[],
    max_depth=2,
)
fm

[2]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	distance	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	NaN	NaN	NaN	NaN	<NA>	NaN	600.0	NaN	NaN	NaN	NaN	NaN	<NA>	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
1	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
2	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
3	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
4	NaN	NaN	NaN	NaN	NaN	<NA>	NaN	NaN	NaN	NaN	NaN	NaN	NaN	<NA>	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

We look at the feature matrix above and decide to remove the highly null features:

[3]:

ft.selection.remove_highly_null_features(fm)

[3]:

	flight_id	distance	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	600.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
1	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
2	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
3	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

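Notice that calling remove_highly_null_features did not remove every feature that contains a null value. By default, we only remove features whose percentage of null values in the calculated feature matrix is above 95%. To lower that threshold, we can set the pct_null_threshold parameter ourselves.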

[4]:

remove_highly_null_features(fm, pct_null_threshold=0.2)

[4]:


trip_log_id
30
1
2
3
4
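The effect of the threshold can be approximated with plain pandas. The following is a minimal sketch, not the library's implementation: `toy_fm`, `trip_count`, and `drop_highly_null` are hypothetical names introduced for illustration, and the choice to also drop a column that sits exactly at the threshold is an assumption inferred from the output above, where every column with one missing row out of five disappears at `pct_null_threshold=0.2`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a feature matrix with 5 rows, like the example above.
toy_fm = pd.DataFrame(
    {
        "distance": [600.0, 1773.0, 1773.0, 1773.0, np.nan],  # 1 of 5 missing
        "dep_delay": [np.nan] * 5,  # entirely missing
        "trip_count": [1, 2, 3, 4, 5],  # hypothetical column, nothing missing
    }
)

# Fraction of missing values in each column.
null_fraction = toy_fm.isnull().mean()


def drop_highly_null(df, threshold):
    """Keep only columns whose missing fraction is strictly below `threshold`.

    Assumption: a column exactly at the threshold is dropped as well.
    """
    fractions = df.isnull().mean()
    return df.loc[:, fractions < threshold]


kept_default = list(drop_highly_null(toy_fm, 0.95).columns)
kept_lowered = list(drop_highly_null(toy_fm, 0.2).columns)

print(null_fraction.to_dict())
print(kept_default)
print(kept_lowered)
```

In this sketch only the fully missing column is dropped at 0.95, while lowering the threshold to 0.2 also drops `distance`, because one missing row out of five already reaches 20%.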

Remove Single Value Features

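Another situation we might run into is one where our calculated features have no variance. In those cases, we will likely want to remove the uninformative features. For that, we use remove_single_value_features.

Let's see what happens when we remove the single value features from the feature matrix below.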

[5]:

fm

[5]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	distance	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	NaN	NaN	NaN	NaN	<NA>	NaN	600.0	NaN	NaN	NaN	NaN	NaN	<NA>	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
1	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
2	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
3	AA-494:CLT->PHX	NaN	NaN	NaN	NaN	<NA>	NaN	1773.0	NaN	NaN	NaN	NaN	NaN	<NA>	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
4	NaN	NaN	NaN	NaN	NaN	<NA>	NaN	NaN	NaN	NaN	NaN	NaN	NaN	<NA>	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

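Note

A list of feature definitions, such as the one created by dfs, can be passed to the feature selection functions. Doing so changes the output so that it also includes the updated list of feature definitions.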

[6]:

new_fm, new_features = remove_single_value_features(fm, features=features)
new_fm

[6]:

	flight_id	distance	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	600.0	RSW	Fort Myers, FL	FL	CLT	3	Charlotte, NC	NC
1	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	Phoenix, AZ	AZ
2	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	Phoenix, AZ	AZ
3	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	Phoenix, AZ	AZ
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Now that we have the feature definitions for the updated feature matrix, we can see that the features that were removed are:

[7]:

set(features) - set(new_features)

[7]:

{<Feature: air_time>,
 <Feature: arr_delay>,
 <Feature: canceled>,
 <Feature: carrier_delay>,
 <Feature: dep_delay>,
 <Feature: diverted>,
 <Feature: flights.carrier>,
 <Feature: flights.flight_num>,
 <Feature: late_aircraft_delay>,
 <Feature: national_airspace_delay>,
 <Feature: security_delay>,
 <Feature: taxi_in>,
 <Feature: taxi_out>,
 <Feature: weather_delay>}

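With the function used as it is above, null values are not counted when computing a feature's unique values. If we would like NaN to be treated as a value of its own, we can set count_nan_as_value to True, and flights.carrier and flights.flight_num will appear in the matrix again.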

[8]:

new_fm, new_features = remove_single_value_features(
    fm, features=features, count_nan_as_value=True
)
new_fm

[8]:

	flight_id	distance	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	600.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
1	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
2	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
3	AA-494:CLT->PHX	1773.0	CLT	Charlotte, NC	NC	PHX	8	AA	494	Phoenix, AZ	AZ
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

The features that were removed are:

[9]:

set(features) - set(new_features)

[9]:

{<Feature: air_time>,
 <Feature: arr_delay>,
 <Feature: canceled>,
 <Feature: carrier_delay>,
 <Feature: dep_delay>,
 <Feature: diverted>,
 <Feature: late_aircraft_delay>,
 <Feature: national_airspace_delay>,
 <Feature: security_delay>,
 <Feature: taxi_in>,
 <Feature: taxi_out>,
 <Feature: weather_delay>}
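The difference between the two calls can be reproduced with pandas' `nunique`, whose `dropna` flag decides whether NaN counts as a distinct value. This is a rough sketch under that assumption rather than the library's own code; `toy_fm` and the `kept_*` names are hypothetical, and the column values only imitate the matrix above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the feature matrix above.
toy_fm = pd.DataFrame(
    {
        "flights.carrier": ["AA", "AA", "AA", "AA", np.nan],  # one value plus NaN
        "distance": [600.0, 1773.0, 1773.0, 1773.0, np.nan],  # two distinct values
        "dep_delay": [np.nan] * 5,  # no observed values at all
    }
)

# Unique-value counts per column, ignoring NaN and counting NaN as a value.
unique_ignoring_nan = toy_fm.nunique(dropna=True)
unique_counting_nan = toy_fm.nunique(dropna=False)

# A column is kept only if it has more than one unique value.
kept_ignoring_nan = [c for c in toy_fm.columns if unique_ignoring_nan[c] > 1]
kept_counting_nan = [c for c in toy_fm.columns if unique_counting_nan[c] > 1]

print(kept_ignoring_nan)
print(kept_counting_nan)
```

Here `flights.carrier` survives only when NaN is counted as a value, which matches the behavior shown above, while the all-NaN `dep_delay` column is dropped either way.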

Remove Highly Correlated Features

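The last feature selection function lets us remove features that would likely be redundant to the model we are trying to build, by considering the correlation between pairs of calculated features.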

When two features are determined to be highly correlated, we remove the more complex of the two. For example, say we have two features: col and -(col).

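We can see that -(col) is just the negation of col, so we can guess that these features will be highly correlated. -(col) has the Negate primitive applied to it, so it is more complex than the identity feature col. Therefore, if we only want one of col and -(col), we should keep the identity feature. For features that have no obvious difference in complexity, we discard the feature that comes later in the feature matrix.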
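The reasoning about col and -(col) can be checked with plain pandas. The sketch below is illustrative only: `toy_fm`, the `other` column, and the pairwise loop are hypothetical, comparing the absolute value of the correlation against the threshold is an assumption, and the loop simply drops the later column of each correlated pair instead of ranking features by complexity.

```python
import pandas as pd

toy_fm = pd.DataFrame({"col": [1.0, 4.0, 2.0, 8.0, 5.0]})
toy_fm["-(col)"] = -toy_fm["col"]  # what a Negate primitive would produce
toy_fm["other"] = [3.0, 1.0, 4.0, 1.0, 5.0]  # hypothetical, weakly related column

# Pairwise Pearson correlations between all numeric columns.
corr = toy_fm.corr()
threshold = 0.95

# Simplification: for every highly correlated pair, drop the later column.
to_drop = set()
columns = list(toy_fm.columns)
for i, first in enumerate(columns):
    for second in columns[i + 1:]:
        if abs(corr.loc[first, second]) >= threshold:
            to_drop.add(second)

reduced = toy_fm.drop(columns=sorted(to_drop))
print(list(reduced.columns))
```

Because -(col) is perfectly (negatively) correlated with col, it is the only column dropped in this sketch; it is also both the later and the more complex of the pair, so the simplified rule and the complexity rule agree here.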

Let's try this out on our data:

[10]:

fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="trip_logs",
    trans_primitives=["negate"],
    agg_primitives=[],
    max_depth=3,
)
fm.head()

[10]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	distance	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	-(air_time)	-(arr_delay)	-(carrier_delay)	-(dep_delay)	-(distance)	-(late_aircraft_delay)	-(national_airspace_delay)	-(security_delay)	-(taxi_in)	-(taxi_out)	-(weather_delay)	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	-11.0	12.0	10.0	-12.0	False	88.0	600.0	0.0	0.0	0.0	0.0	0.0	False	-88.0	12.0	-0.0	11.0	-600.0	-0.0	-0.0	-0.0	-10.0	-12.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
38	AA-495:ATL->PHX	-6.0	28.0	5.0	1.0	False	224.0	1587.0	0.0	0.0	0.0	0.0	0.0	False	-224.0	-1.0	-0.0	6.0	-1587.0	-0.0	-0.0	-0.0	-5.0	-28.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ
46	AA-495:CLT->ATL	-2.0	18.0	8.0	-3.0	False	50.0	226.0	0.0	0.0	0.0	0.0	0.0	False	-50.0	3.0	-0.0	2.0	-226.0	-0.0	-0.0	-0.0	-8.0	-18.0	-0.0	CLT	Charlotte, NC	NC	ATL	1	AA	495	Atlanta, GA	GA
31	AA-494:RSW->CLT	0.0	11.0	10.0	-3.0	False	87.0	600.0	0.0	0.0	0.0	0.0	0.0	False	-87.0	3.0	-0.0	-0.0	-600.0	-0.0	-0.0	-0.0	-10.0	-11.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
39	AA-495:ATL->PHX	-4.0	26.0	3.0	10.0	False	235.0	1587.0	0.0	0.0	0.0	0.0	0.0	False	-235.0	-10.0	-0.0	4.0	-1587.0	-0.0	-0.0	-0.0	-3.0	-26.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ

Note that there are some pretty clear correlations here between all of the features and their negations.

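Now we call remove_highly_correlated_features with its default correlation threshold of 95%. All of the obviously correlated features are removed, leaving only the less complex features.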

[11]:

new_fm, new_features = remove_highly_correlated_features(fm, features=features)
new_fm.head()

  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]

[11]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	-(security_delay)	-(weather_delay)	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	-11.0	12.0	10.0	-12.0	False	88.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
38	AA-495:ATL->PHX	-6.0	28.0	5.0	1.0	False	224.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ
46	AA-495:CLT->ATL	-2.0	18.0	8.0	-3.0	False	50.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	CLT	Charlotte, NC	NC	ATL	1	AA	495	Atlanta, GA	GA
31	AA-494:RSW->CLT	0.0	11.0	10.0	-3.0	False	87.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
39	AA-495:ATL->PHX	-4.0	26.0	3.0	10.0	False	235.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ

The features that were removed are:

[12]:

set(features) - set(new_features)

[12]:

{<Feature: -(carrier_delay)>,
 <Feature: -(distance)>,
 <Feature: distance>,
 <Feature: -(taxi_out)>,
 <Feature: -(taxi_in)>,
 <Feature: -(late_aircraft_delay)>,
 <Feature: -(air_time)>,
 <Feature: -(national_airspace_delay)>,
 <Feature: -(arr_delay)>,
 <Feature: -(dep_delay)>}

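Change the correlation threshold#

To be more restrictive about correlated features, we can lower the threshold at which they are removed by using the pct_corr_threshold parameter.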

[13]:

new_fm, new_features = remove_highly_correlated_features(
    fm, features=features, pct_corr_threshold=0.9
)
new_fm.head()

/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]

[13]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	carrier_delay	weather_delay	security_delay	late_aircraft_delay	canceled	-(security_delay)	-(weather_delay)	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	-11.0	12.0	10.0	-12.0	False	88.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
38	AA-495:ATL->PHX	-6.0	28.0	5.0	1.0	False	224.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ
46	AA-495:CLT->ATL	-2.0	18.0	8.0	-3.0	False	50.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	CLT	Charlotte, NC	NC	ATL	1	AA	495	Atlanta, GA	GA
31	AA-494:RSW->CLT	0.0	11.0	10.0	-3.0	False	87.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
39	AA-495:ATL->PHX	-4.0	26.0	3.0	10.0	False	235.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ

The features that were removed are:

[14]:

set(features) - set(new_features)

[14]:

{<Feature: -(carrier_delay)>,
 <Feature: -(distance)>,
 <Feature: distance>,
 <Feature: -(taxi_out)>,
 <Feature: -(taxi_in)>,
 <Feature: -(late_aircraft_delay)>,
 <Feature: -(air_time)>,
 <Feature: -(national_airspace_delay)>,
 <Feature: -(arr_delay)>,
 <Feature: national_airspace_delay>,
 <Feature: -(dep_delay)>}
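
To make the effect of the threshold concrete, here is a minimal sketch that uses only the standard library. It is not Featuretools' implementation: the helper names `pearson` and `correlated_pairs` and the toy columns are hypothetical, and the value 0.95 is assumed here as the stricter starting threshold. The sketch lists the column pairs whose absolute Pearson correlation meets a threshold, and shows that lowering the threshold flags more pairs. A constant column has zero variance, so its correlation is undefined; the sketch skips such pairs, and this is probably also what the NumPy RuntimeWarning lines in the output above reflect.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Zero variance: the correlation is undefined.
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold):
    """Return the (name, name) pairs whose |correlation| >= threshold."""
    pairs = []
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if not math.isnan(r) and abs(r) >= threshold:
            pairs.append((name_a, name_b))
    return pairs


# Toy data (made up for illustration, not taken from the flight dataset).
toy = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],   # exactly 2 * a, correlation 1.0
    "c": [1, 2, 3, 4, 6, 5],     # correlation with a is about 0.943
    "const": [0, 0, 0, 0, 0, 0], # constant, correlation undefined
}

print(correlated_pairs(toy, 0.95))  # [('a', 'b')]
print(correlated_pairs(toy, 0.9))   # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

With the stricter cut-off only the perfectly correlated pair is flagged; at 0.9 the pairs involving `c` are flagged as well, which mirrors how lowering pct_corr_threshold removed one more feature above.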

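Check a subset of features#

If we only want to check a subset of features, we can set features_to_check to the list of features whose correlation we want to check. No features outside that list will be removed.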

[15]:

new_fm, new_features = remove_highly_correlated_features(
    fm,
    features=features,
    features_to_check=["air_time", "distance", "flights.distance_group"],
)
new_fm.head()

[15]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	-(air_time)	-(arr_delay)	-(carrier_delay)	-(dep_delay)	-(distance)	-(late_aircraft_delay)	-(national_airspace_delay)	-(security_delay)	-(taxi_in)	-(taxi_out)	-(weather_delay)	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	-11.0	12.0	10.0	-12.0	False	88.0	0.0	0.0	0.0	0.0	0.0	False	-88.0	12.0	-0.0	11.0	-600.0	-0.0	-0.0	-0.0	-10.0	-12.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
38	AA-495:ATL->PHX	-6.0	28.0	5.0	1.0	False	224.0	0.0	0.0	0.0	0.0	0.0	False	-224.0	-1.0	-0.0	6.0	-1587.0	-0.0	-0.0	-0.0	-5.0	-28.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ
46	AA-495:CLT->ATL	-2.0	18.0	8.0	-3.0	False	50.0	0.0	0.0	0.0	0.0	0.0	False	-50.0	3.0	-0.0	2.0	-226.0	-0.0	-0.0	-0.0	-8.0	-18.0	-0.0	CLT	Charlotte, NC	NC	ATL	1	AA	495	Atlanta, GA	GA
31	AA-494:RSW->CLT	0.0	11.0	10.0	-3.0	False	87.0	0.0	0.0	0.0	0.0	0.0	False	-87.0	3.0	-0.0	-0.0	-600.0	-0.0	-0.0	-0.0	-10.0	-11.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
39	AA-495:ATL->PHX	-4.0	26.0	3.0	10.0	False	235.0	0.0	0.0	0.0	0.0	0.0	False	-235.0	-10.0	-0.0	4.0	-1587.0	-0.0	-0.0	-0.0	-3.0	-26.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ

The features that were removed are:

[16]:

set(features) - set(new_features)

[16]:

{<Feature: distance>}
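
The idea of restricting the check to a subset can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function `drop_correlated`, its `to_check` argument, and the toy columns are hypothetical, and the rule of dropping the later column of a correlated pair is an assumption made for this sketch. Only the columns named in `to_check` are compared with each other, so a column outside the list is never dropped, even if it is highly correlated with something.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.95, to_check=None):
    """Return the names of columns to drop, comparing only those in to_check."""
    if to_check is None:
        names = list(columns)
    else:
        # Keep the original column order, restricted to the requested subset.
        names = [name for name in columns if name in to_check]
    dropped = []
    # Walk backwards so that the later column of a correlated pair is dropped
    # (an assumption of this sketch).
    for i in range(len(names) - 1, 0, -1):
        for j in range(i - 1, -1, -1):
            r = pearson(columns[names[i]], columns[names[j]])
            if not math.isnan(r) and abs(r) >= threshold:
                dropped.append(names[i])
                break
    return dropped


# Toy data (made up for illustration).
toy = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],  # exactly 2 * a
    "c": [1, 2, 3, 4, 6, 5],    # correlation with a is about 0.943
    "d": [6, 5, 4, 3, 2, 1],    # exactly reversed a, correlation -1.0
}

print(drop_correlated(toy))                       # ['d', 'b']
print(drop_correlated(toy, to_check=["a", "d"]))  # ['d']
```

In the second call `b` survives although it is perfectly correlated with `a`, because it was not in the list of columns to check.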

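Protect features from removal#

To protect specific features from being removed from the feature matrix, we can pass a list of features_to_keep. These features will not be removed.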

[17]:

new_fm, new_features = remove_highly_correlated_features(
    fm,
    features=features,
    features_to_keep=["air_time", "distance", "flights.distance_group"],
)
new_fm.head()

/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2897: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-featuretools/envs/stable/lib/python3.9/site-packages/numpy/lib/function_base.py:2898: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]

[17]:

	flight_id	dep_delay	taxi_out	taxi_in	arr_delay	diverted	air_time	distance	carrier_delay	weather_delay	national_airspace_delay	security_delay	late_aircraft_delay	canceled	-(security_delay)	-(weather_delay)	flights.origin	flights.origin_city	flights.origin_state	flights.dest	flights.distance_group	flights.carrier	flights.flight_num	flights.airports.dest_city	flights.airports.dest_state
trip_log_id
30	AA-494:RSW->CLT	-11.0	12.0	10.0	-12.0	False	88.0	600.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
38	AA-495:ATL->PHX	-6.0	28.0	5.0	1.0	False	224.0	1587.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ
46	AA-495:CLT->ATL	-2.0	18.0	8.0	-3.0	False	50.0	226.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	CLT	Charlotte, NC	NC	ATL	1	AA	495	Atlanta, GA	GA
31	AA-494:RSW->CLT	0.0	11.0	10.0	-3.0	False	87.0	600.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	RSW	Fort Myers, FL	FL	CLT	3	AA	494	Charlotte, NC	NC
39	AA-495:ATL->PHX	-4.0	26.0	3.0	10.0	False	235.0	1587.0	0.0	0.0	0.0	0.0	0.0	False	-0.0	-0.0	ATL	Atlanta, GA	GA	PHX	7	AA	495	Phoenix, AZ	AZ

The features that were removed are:

[18]:

set(features) - set(new_features)

[18]:

{<Feature: -(carrier_delay)>,
 <Feature: -(distance)>,
 <Feature: -(taxi_out)>,
 <Feature: -(taxi_in)>,
 <Feature: -(late_aircraft_delay)>,
 <Feature: -(air_time)>,
 <Feature: -(national_airspace_delay)>,
 <Feature: -(arr_delay)>,
 <Feature: -(dep_delay)>}
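
Every feature in this set is the negation of a column that is still present in the feature matrix. The sketch below uses plain NumPy rather than Featuretools to show why such a pair counts as highly correlated. The `dep_delay` values are copied from the five rows displayed above; the all-zero column and the 0.95 cutoff are illustrative assumptions, not values read from the call that produced this output.

```python
import numpy as np

# dep_delay values from the five rows displayed above (a toy sample only).
dep_delay = np.array([-11.0, -6.0, -2.0, 0.0, -4.0])
negated = -dep_delay                 # what a negation feature such as -(dep_delay) holds
constant = np.zeros_like(dep_delay)  # hypothetical column with a single repeated value

# The Pearson correlation of a column with its own negation is -1.
r_neg = np.corrcoef(dep_delay, negated)[0, 1]

# A constant column has zero standard deviation, so its correlation with
# anything is undefined (NaN). NumPy normally reports this with
# "RuntimeWarning: invalid value encountered in divide"; it is silenced here.
with np.errstate(invalid="ignore", divide="ignore"):
    r_const = np.corrcoef(dep_delay, constant)[0, 1]

cutoff = 0.95  # assumed threshold, for illustration only
print(abs(r_neg) > cutoff)  # the negated pair exceeds the cutoff in magnitude
print(np.isnan(r_const))    # the constant column yields no usable correlation
```

Because a column and its negation carry the same information, keeping only one of the pair loses nothing for most models. A NaN correlation cannot exceed any threshold, so a constant column would not be flagged by a correlation check alone.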
