featuretools.primitives.CountString#

class featuretools.primitives.CountString(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)[source]#

Counts how many times a given string occurs in a text field.

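Parameters:

string (str) – The string to count occurrences of. Defaults to the word "the".
ignore_case (bool) – Whether to ignore the case of the string when matching. Defaults to True.
ignore_non_alphanumeric (bool) – Whether to ignore non-alphanumeric characters in the search. Defaults to False.
is_regex (bool) – Whether the string argument is a regular expression. Defaults to False.
match_whole_words_only (bool) – Whether to match whole words only. For example, when searching for the among then, the, and there, only the is matched if this is True. Defaults to False.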

Examples

>>> from featuretools.primitives import CountString
>>> count_string = CountString(string="the")
>>> count_string(["The problem was difficult.",
...               "He was there.",
...               "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Match case of string
>>> count_string_ignore_case = CountString(string="the", ignore_case=False)
>>> count_string_ignore_case(["The problem was difficult.",
...                           "He was there.",
...                           "The girl went to the store."]).tolist()
[0.0, 1.0, 1.0]
>>> # Ignore non-alphanumeric characters in the search
>>> count_string_ignore_non_alphanumeric = CountString(string="the",
...                                                    ignore_non_alphanumeric=True)
>>> count_string_ignore_non_alphanumeric(["Th*/e problem was difficult.",
...                                       "He was there.",
...                                       "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Specify the string as a regex
>>> count_string_is_regex = CountString(string="t.e", is_regex=True)
>>> count_string_is_regex(["The problem was difficult.",
...                        "He was there.",
...                        "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Match whole words only
>>> count_string_match_whole_words_only = CountString(string="the",
...                                                   match_whole_words_only=True)
>>> count_string_match_whole_words_only(["The problem was difficult.",
...                                      "He was there.",
...                                      "The girl went to the store."]).tolist()
[1.0, 0.0, 2.0]
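
How the parameters interact can be approximated in plain Python with the standard `re` module. The sketch below is not the featuretools implementation: `count_string_sketch` is a hypothetical helper written for illustration, and its handling of non-alphanumeric characters (ASCII letters, digits, and whitespace are kept; everything else is dropped) is an assumption. It does reproduce the outputs of the examples above.

```python
import re


def count_string_sketch(text, string="the", ignore_case=True,
                        ignore_non_alphanumeric=False, is_regex=False,
                        match_whole_words_only=False):
    """Hypothetical stand-in for CountString, for illustration only."""
    if ignore_non_alphanumeric:
        # Assumption: keep ASCII letters, digits, and whitespace; drop the rest.
        text = re.sub(r"[^0-9A-Za-z\s]", "", text)
    # Treat the search string literally unless it is declared a regex.
    pattern = string if is_regex else re.escape(string)
    if match_whole_words_only:
        # Require a word boundary on both sides of the match.
        pattern = r"\b(?:" + pattern + r")\b"
    flags = re.IGNORECASE if ignore_case else 0
    # Count non-overlapping matches and return a float, as in the examples.
    return float(len(re.findall(pattern, text, flags)))


texts = ["The problem was difficult.",
         "He was there.",
         "The girl went to the store."]

default_counts = [count_string_sketch(t) for t in texts]
whole_word_counts = [count_string_sketch(t, match_whole_words_only=True)
                     for t in texts]
print(default_counts)
print(whole_word_counts)
```

Unlike the primitive, which is called on a whole sequence of texts at once, this sketch operates on one string at a time, which keeps each parameter's effect easy to follow.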

__init__(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)[source]#

Methods

`__init__`([string, ignore_case, ...])
`flatten_nested_input_types`(input_types)	Flattens nested column schema inputs into a single list.
`generate_name`(base_feature_names)
`generate_names`(base_feature_names)
`get_args_string`()
`get_arguments`()
`get_description`(input_column_descriptions[, ...])
`get_filepath`(filename)
`get_function`()
`process_text`(text)

Attributes

`base_of`
`base_of_exclude`
`commutative`
`default_value`	Default value this feature returns if no data is found.
`description_template`
`input_types`	woodwork.ColumnSchema types of the inputs.
`max_stack_depth`
`name`	Name of the primitive.
`number_output_features`	Number of columns in the feature matrix associated with this feature.
`return_type`	ColumnSchema type of the return value.
`stack_on`
`stack_on_exclude`
`stack_on_self`
`uses_calc_time`
`uses_full_dataframe`
