featuretools.primitives.CountString

class featuretools.primitives.CountString(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)

Determines how many times a given string occurs in a text field.

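Parameters:
  • string (str) – The string whose occurrences are counted. Defaults to the word "the".

  • ignore_case (bool) – Determines whether the case of the string is ignored when matching. Defaults to True.

  • ignore_non_alphanumeric (bool) – Determines whether non-alphanumeric characters are ignored (removed from the text) before searching. Defaults to False.

  • is_regex (bool) – Defines whether the string argument is a regular expression. Defaults to False.

  • match_whole_words_only (bool) – Determines whether only whole words are matched. For example, when searching for the word "the" in "then, the, there", only "the" is matched if this argument is True. Defaults to False.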
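The parameters above combine in a fairly mechanical way. As a rough illustration, the sketch below reproduces per-value counting with Python's `re` module. It is an assumption-based approximation, not the featuretools implementation: `count_occurrences` and `_strip_non_alphanumeric` are hypothetical names, case-insensitivity is done with `re.IGNORECASE`, and "non-alphanumeric" is assumed to mean anything other than ASCII letters, digits and spaces.

```python
import re


def _strip_non_alphanumeric(text: str) -> str:
    # Assumption: keep only ASCII letters, digits and spaces.
    return re.sub(r"[^0-9a-zA-Z ]+", "", text)


def count_occurrences(text: str, string: str = "the", ignore_case: bool = True,
                      ignore_non_alphanumeric: bool = False, is_regex: bool = False,
                      match_whole_words_only: bool = False) -> int:
    """Hypothetical stand-in for CountString's per-value logic (not the library code)."""
    if ignore_non_alphanumeric:
        text = _strip_non_alphanumeric(text)
    if is_regex:
        pattern = string  # used as-is: the caller supplies a regular expression
    else:
        literal = _strip_non_alphanumeric(string) if ignore_non_alphanumeric else string
        pattern = re.escape(literal)  # treat the search string literally
    if match_whole_words_only:
        # Word boundaries on both sides; the group keeps any alternation intact.
        pattern = r"\b(?:" + pattern + r")\b"
    flags = re.IGNORECASE if ignore_case else 0
    # findall returns non-overlapping matches, so its length is the count.
    return len(re.findall(pattern, text, flags))


texts = ["The problem was difficult.", "He was there.", "The girl went to the store."]
print([count_occurrences(t, match_whole_words_only=True) for t in texts])  # [1, 0, 2]
```

Because the sketch counts with `re.findall`, matches never overlap; this page does not say how the primitive handles overlapping patterns.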

Examples

>>> from featuretools.primitives import CountString
>>> count_string = CountString(string="the")
>>> count_string(["The problem was difficult.",
...               "He was there.",
...               "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Match case of string
>>> count_string_ignore_case = CountString(string="the", ignore_case=False)
>>> count_string_ignore_case(["The problem was difficult.",
...                           "He was there.",
...                           "The girl went to the store."]).tolist()
[0.0, 1.0, 1.0]
>>> # Ignore non-alphanumeric characters in the search
>>> count_string_ignore_non_alphanumeric = CountString(string="the",
...                                                    ignore_non_alphanumeric=True)
>>> count_string_ignore_non_alphanumeric(["Th*/e problem was difficult.",
...                                       "He was there.",
...                                       "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Specify the string as a regex
>>> count_string_is_regex = CountString(string="t.e", is_regex=True)
>>> count_string_is_regex(["The problem was difficult.",
...                        "He was there.",
...                        "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]
>>> # Match whole words only
>>> count_string_match_whole_words_only = CountString(string="the",
...                                                   match_whole_words_only=True)
>>> count_string_match_whole_words_only(["The problem was difficult.",
...                                      "He was there.",
...                                      "The girl went to the store."]).tolist()
[1.0, 0.0, 2.0]
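The doctest outputs above are floats (1.0 rather than 1). One plausible reason, assumed here rather than stated on this page, is that entries which are not strings (for example missing values) need a NaN result, and NaN only exists as a float. The sketch below applies a simplified, case-insensitive literal count over a list and maps non-string entries to NaN; `count_string_column` is a hypothetical helper, not part of featuretools.

```python
import math
import re
from typing import Iterable, List


def count_string_column(values: Iterable[object], string: str = "the") -> List[float]:
    """Hypothetical helper: case-insensitive literal count for each entry.

    Non-string entries (e.g. None) yield NaN, so every result is a float.
    """
    pattern = re.escape(string)
    results: List[float] = []
    for value in values:
        if not isinstance(value, str):
            results.append(math.nan)  # missing or non-text input -> NaN
            continue
        results.append(float(len(re.findall(pattern, value, re.IGNORECASE))))
    return results


print(count_string_column(["The problem was difficult.", None, "The girl went to the store."]))
# [1.0, nan, 2.0]
```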
__init__(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)

Methods

__init__([string, ignore_case, ...])

flatten_nested_input_types(input_types)

Flattens nested column schema inputs into a single list.

generate_name(base_feature_names)

generate_names(base_feature_names)

get_args_string()

get_arguments()

get_description(input_column_descriptions[, ...])

get_filepath(filename)

get_function()

process_text(text)

Attributes

base_of

base_of_exclude

commutative

default_value

Default value this feature returns if no data is found.

description_template

input_types

woodwork.ColumnSchema types of the input data

max_stack_depth

name

Name of the primitive

number_output_features

Number of columns in the feature matrix associated with this feature

return_type

ColumnSchema type of the return value

stack_on

stack_on_exclude

stack_on_self

uses_calc_time

uses_full_dataframe