featuretools.primitives.CountString
- class featuretools.primitives.CountString(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)
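Counts the number of times a given string occurs in a text field.
- Parameters:
string (str) – The string whose occurrences are counted. Defaults to the word "the".
ignore_case (bool) – Whether to ignore case when matching the string. Defaults to True.
ignore_non_alphanumeric (bool) – Whether to ignore non-alphanumeric characters in the search. Defaults to False.
is_regex (bool) – Whether the string argument is a regular expression. Defaults to False.
match_whole_words_only (bool) – Whether to match whole words only. For example, when searching for the word the among then, the, and there, only the is matched if this argument is True. Defaults to False.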
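As a rough illustration of how these options could interact, here is a minimal sketch built on the standard `re` module. It is not the featuretools implementation: the helper name `count_occurrences` is hypothetical, and the exact rule for what counts as a non-alphanumeric character is an assumption.

```python
import re


def count_occurrences(text, string="the", ignore_case=True,
                      ignore_non_alphanumeric=False, is_regex=False,
                      match_whole_words_only=False):
    """Hypothetical helper that approximates the documented options."""
    if ignore_non_alphanumeric:
        # Assumption: drop everything except ASCII letters, digits, and whitespace.
        text = re.sub(r"[^0-9a-zA-Z\s]", "", text)
    # Treat the search string literally unless it is declared to be a regex.
    pattern = string if is_regex else re.escape(string)
    if match_whole_words_only:
        # Word boundaries keep "the" from matching inside "there" or "then".
        pattern = r"\b(?:" + pattern + r")\b"
    flags = re.IGNORECASE if ignore_case else 0
    # Return a float to mirror the float counts shown in the documented examples.
    return float(len(re.findall(pattern, text, flags)))


texts = ["The problem was difficult.",
         "He was there.",
         "The girl went to the store."]
default_counts = [count_occurrences(t) for t in texts]
whole_word_counts = [count_occurrences(t, match_whole_words_only=True) for t in texts]
```

With the default options, "there" counts as a match because it contains "the". With `match_whole_words_only=True`, the word-boundary anchors exclude it.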
Examples
>>> from featuretools.primitives import CountString
>>> count_string = CountString(string="the")
>>> count_string(["The problem was difficult.",
...               "He was there.",
...               "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]

>>> # Match case of string
>>> count_string_ignore_case = CountString(string="the", ignore_case=False)
>>> count_string_ignore_case(["The problem was difficult.",
...                           "He was there.",
...                           "The girl went to the store."]).tolist()
[0.0, 1.0, 1.0]

>>> # Ignore non-alphanumeric characters in the search
>>> count_string_ignore_non_alphanumeric = CountString(string="the",
...                                                    ignore_non_alphanumeric=True)
>>> count_string_ignore_non_alphanumeric(["Th*/e problem was difficult.",
...                                       "He was there.",
...                                       "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]

>>> # Specify the string as a regex
>>> count_string_is_regex = CountString(string="t.e", is_regex=True)
>>> count_string_is_regex(["The problem was difficult.",
...                        "He was there.",
...                        "The girl went to the store."]).tolist()
[1.0, 1.0, 2.0]

>>> # Match whole words only
>>> count_string_match_whole_words_only = CountString(string="the",
...                                                   match_whole_words_only=True)
>>> count_string_match_whole_words_only(["The problem was difficult.",
...                                      "He was there.",
...                                      "The girl went to the store."]).tolist()
[1.0, 0.0, 2.0]
- __init__(string='the', ignore_case=True, ignore_non_alphanumeric=False, is_regex=False, match_whole_words_only=False)
Methods
__init__([string, ignore_case, ...])
flatten_nested_input_types(input_types)
Flattens nested column schema inputs into a single list.
generate_name(base_feature_names)
generate_names(base_feature_names)
get_args_string()
get_arguments()
get_description(input_column_descriptions[, ...])
get_filepath(filename)
get_function()
process_text(text)
Attributes
base_of
base_of_exclude
commutative
default_value
The default value this feature returns if no data is found.
description_template
input_types
woodwork.ColumnSchema types of the inputs
max_stack_depth
name
Name of the primitive
number_output_features
Number of columns in the feature matrix associated with this feature
return_type
ColumnSchema type of the return value
stack_on
stack_on_exclude
stack_on_self
uses_calc_time
uses_full_dataframe