featuretools.primitives.NumUnique

class featuretools.primitives.NumUnique(use_string_for_pd_calc=True)

Determines the number of distinct values, ignoring NaN values.

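Parameters: use_string_for_pd_calc (bool) – Determines whether the string 'nunique' or the function pd.Series.nunique is used for the primitive calculation. This parameter exists to work around the bug https://github.com/pandas-dev/pandas/issues/57317. Defaults to using the string.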
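The two options named by this parameter correspond to two ways of asking pandas for the same aggregation. The sketch below is illustrative only: the DataFrame and its column names are made up, and it does not attempt to reproduce the linked pandas issue. It assumes the primitive's function ends up in a groupby aggregation, which is an assumption about usage rather than something stated above.

```python
import pandas as pd

# Hypothetical data: column names are invented for illustration.
df = pd.DataFrame(
    {
        "customer": ["a", "a", "a", "b", "b"],
        "color": ["red", "blue", "red", None, "green"],
    }
)

grouped = df.groupby("customer")["color"]

# Option 1: pass the aggregation by name (what use_string_for_pd_calc=True selects).
by_string = grouped.agg("nunique")

# Option 2: pass the pandas function itself (use_string_for_pd_calc=False).
by_function = grouped.agg(pd.Series.nunique)

print(by_string.tolist())
print(by_function.tolist())
```

On this small input both forms count the distinct non-missing colors per customer. Which form is safer on a given pandas version depends on the linked issue.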

Examples

>>> num_unique = NumUnique(use_string_for_pd_calc=False)
>>> num_unique(['red', 'blue', 'green', 'yellow'])
4

NaN values are ignored.

>>> num_unique(['red', 'blue', 'green', 'yellow', None])
4
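To make the "distinct values, ignoring NaN" semantics concrete, here is a minimal pure-Python sketch of the same counting rule. It is not the featuretools implementation (the parameter description above indicates the primitive delegates to pandas), and the helper name `count_unique_ignoring_missing` is hypothetical.

```python
import math


def count_unique_ignoring_missing(values):
    """Count distinct values, skipping None and float NaN entries."""
    seen = set()
    for value in values:
        if value is None:
            continue  # treat None as missing
        if isinstance(value, float) and math.isnan(value):
            continue  # treat NaN as missing
        seen.add(value)
    return len(seen)


colors = ["red", "blue", "green", "yellow", None, float("nan"), "red"]
unique_colors = count_unique_ignoring_missing(colors)
print(unique_colors)
```

The duplicate "red" and the two missing entries do not add to the count, so the result matches the doctest above.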

Methods

`__init__`([use_string_for_pd_calc])
`flatten_nested_input_types`(input_types)	Flattens nested column schema inputs into a single list.
`generate_name`(base_feature_names, ...)
`generate_names`(base_feature_names, ...)
`get_args_string`()
`get_arguments`()
`get_description`(input_column_descriptions[, ...])
`get_filepath`(filename)
`get_function`()

Attributes

`base_of`
`base_of_exclude`
`commutative`
`default_value`	Default value this feature returns if no data is found.
`description_template`
`input_types`	woodwork.ColumnSchema types of the inputs
`max_stack_depth`
`name`	Name of the primitive
`number_output_features`	Number of columns in the feature matrix associated with this feature
`return_type`	ColumnSchema type of the return value
`stack_on`
`stack_on_exclude`
`stack_on_self`
`uses_calc_time`