featuretools.primitives.NumberOfHashtags

class featuretools.primitives.NumberOfHashtags

Determines the number of hashtags in a string.

Description

Given a list of strings, determine the number of hashtags in each string.

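A hashtag is defined as a string that meets the following criteria:

Starts with a "#" character, followed by a sequence of alphanumeric characters containing at least one alphabetic character
Appears at the start of the string or immediately after whitespace
Is terminated by the end of the string, whitespace, or a punctuation character other than "#"
- e.g. the string "#yes-no" contains one valid hashtag ("#yes")
- e.g. the string "#yes#" contains no valid hashtag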

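This implementation handles Unicode characters.

This implementation does not impose any character limit on hashtags.

If a string is missing, NaN is returned.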
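The criteria above can be approximated with a single regular expression. The following is a minimal sketch using only the Python standard library, not the featuretools implementation: the names `HASHTAG_PATTERN` and `count_hashtags` are hypothetical, and treating `_` as an allowed (non-alphabetic) character inside a hashtag is an assumption.

```python
import math
import re

# Hypothetical pattern written from the criteria above; it is NOT taken from
# the featuretools source. Assumption: "_" may appear in a hashtag but does
# not count as an alphabetic character.
HASHTAG_PATTERN = re.compile(
    r"(?<!\S)"          # at the start of the string or right after whitespace
    r"#"                # the leading "#" character
    r"(?=\w*[^\W\d_])"  # the body must contain at least one letter
    r"\w+"              # the body: word characters (Unicode-aware for str)
    r"(?![#\w])"        # not followed by "#" or by more word characters
)


def count_hashtags(text):
    """Return the hashtag count as a float, or NaN when the string is missing."""
    if text is None:
        return math.nan
    return float(len(HASHTAG_PATTERN.findall(text)))


samples = ["#regular #expression", "this is a string", "#yes-no", "#yes#", None]
print([count_hashtags(s) for s in samples])
```

Under these assumptions the sketch counts two hashtags in "#regular #expression", one in "#yes-no", and none in "#yes#", and it returns NaN for the missing entry.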

Examples

>>> from featuretools.primitives import NumberOfHashtags
>>> x = ['#regular #expression', 'this is a string', '###__regular#1and_0#expression']
>>> number_of_hashtags = NumberOfHashtags()
>>> number_of_hashtags(x).tolist()
[2.0, 0.0, 0.0]

__init__()

Methods

`__init__`()
`flatten_nested_input_types`(input_types)	Flattens nested column schema inputs into a single list.
`generate_name`(base_feature_names)
`generate_names`(base_feature_names)
`get_args_string`()
`get_arguments`()
`get_description`(input_column_descriptions[, ...])
`get_filepath`(filename)
`get_function`()
`process_text`(text)

Attributes

`base_of`
`base_of_exclude`
`commutative`
`default_value`	Default value this feature returns if no data is found.
`description_template`
`input_types`	woodwork.ColumnSchema types of the inputs
`max_stack_depth`
`name`	Name of the primitive
`number_output_features`	Number of columns in the feature matrix associated with this feature
`return_type`	ColumnSchema type of the return value
`stack_on`
`stack_on_exclude`
`stack_on_self`
`uses_calc_time`
`uses_full_dataframe`
