Count strings in a corpus
Usage
cas_count(
corpus,
pattern,
text = text,
group_by = date,
ignore_case = TRUE,
drop_na = TRUE,
fixed = FALSE,
full_words_only = FALSE,
pattern_column_name = pattern,
n_column_name = n,
locale = "en"
)
Arguments
- corpus
A textual corpus as a data frame.
- pattern
A character vector of one or more words or strings to be counted.
- text
Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.
- group_by
Defaults to date, as shown in the usage above. The unquoted name of the column to be used for grouping (e.g. date, doc_id, or source).
- ignore_case
Defaults to TRUE. If TRUE, matching is case-insensitive.
- drop_na
Defaults to TRUE. If TRUE, all rows where either the text or the group_by column is NA are removed before further processing.
- full_words_only
Defaults to FALSE. If FALSE, the string is counted even when it is found in the middle of a word (e.g. "ratio" would be counted as a match in the word "irrational").
- pattern_column_name
Defaults to pattern, as shown in the usage above. The unquoted name of the column to be used for the word in the output.
- n_column_name
Defaults to n. The unquoted name of the column to be used for the count in the output.
- locale
Locale to be used when ignore_case is set to TRUE. Passed to stringr::str_to_lower, defaults to "en".
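The effect of ignore_case, locale and full_words_only can be sketched with stringr directly. This is only an illustration of the matching behaviour described above, not the internal implementation of cas_count; the sample strings and the use of regex word boundaries for whole-word matching are assumptions made for the example.

```r
library(stringr)

# Sample strings, made up for illustration
text <- c("An irrational ratio", "Ratio of ratios")

# ignore_case = TRUE, sketched as locale-aware lower-casing before matching
lowered <- str_to_lower(text, locale = "en")

# full_words_only = FALSE: matches inside longer words are counted too
n_substring <- str_count(lowered, "ratio")

# full_words_only = TRUE, sketched here with regex word boundaries (assumption)
n_whole_word <- str_count(lowered, "\\bratio\\b")

n_substring
n_whole_word
```

In this sketch "irrational" and "ratios" add to the substring count but not to the whole-word count.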
Examples
if (FALSE) { # \dontrun{
# `corpus` is a data frame with at least a `text` and a `date` column
cas_count(
  corpus = corpus,
  pattern = c("dogs", "cats", "horses"),
  text = text,
  group_by = date,
  n_column_name = n
)
} # }
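To give a rough idea of what a grouped count involves, the following sketch counts each pattern per date using only stringr and base R. It does not call cas_count and makes no claim about its exact output; the toy corpus and the output column names (date, pattern, n) are assumptions chosen to mirror the defaults in the usage above.

```r
library(stringr)

# Toy corpus, made up for illustration: one row per document
corpus <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02")),
  text = c("Dogs chase cats.", "Cats sleep.", "Horses and dogs."),
  stringsAsFactors = FALSE
)

patterns <- c("dogs", "cats", "horses")

# For each pattern, count matches per row, then sum the counts by date
counts <- do.call(rbind, lapply(patterns, function(p) {
  n_per_row <- str_count(str_to_lower(corpus$text, locale = "en"), fixed(p))
  totals <- aggregate(n_per_row, by = list(date = corpus$date), FUN = sum)
  data.frame(date = totals$date, pattern = p, n = totals$x)
}))

counts
```

The result has one row per combination of date and pattern, which is the general shape that grouping by a column such as date implies.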