Count total words in a dataset
Usage
cas_count_total_words(
corpus,
pattern = "\\w+",
text = text,
group_by = date,
ignore_case = TRUE,
n_column_name = n,
locale = "en"
)
Arguments
- corpus
A textual corpus as a data frame.
- pattern
Defaults to "\\w+", a regular expression commonly used to match words.
- text
Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.
- group_by
Defaults to date, as shown in the usage above. The unquoted name of the column to be used for grouping (e.g. date, doc_id, or source).
- ignore_case
Defaults to TRUE.
- n_column_name
Defaults to n. The unquoted name of the column to be used for the count in the output.
- locale
Locale to be used when ignore_case is set to TRUE. Passed to stringr::str_to_lower, defaults to "en".
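As a rough illustration of what the arguments above describe, the sketch below counts matches of the default pattern in each text and sums them per group, using base R only. It is an approximation under stated assumptions, not the function's actual implementation: the toy corpus and the helper `count_matches()` are hypothetical, and the exact shape of the output returned by cas_count_total_words() may differ.

```r
# Toy corpus (hypothetical); the column names `date` and `text`
# mirror the defaults shown in the usage section.
corpus <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02")),
  text = c("Hello world", "One more short text", "Counting words, again!"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of non-overlapping matches of `pattern`
# in each element of `x`. With "\\w+" this is a simple word count.
count_matches <- function(x, pattern = "\\w+") {
  lengths(regmatches(x, gregexpr(pattern, x)))
}

# Per-document counts, stored in a column named `n`
# (the default n_column_name).
corpus$n <- count_matches(corpus$text)

# Sum the per-document counts within each group (here: by date).
totals <- aggregate(n ~ date, data = corpus, FUN = sum)
totals

# With the package installed, the corresponding call would presumably be
# (not run here):
# castarter::cas_count_total_words(corpus = corpus)
```

Lower-casing the text first, as ignore_case = TRUE does via stringr::str_to_lower, would not change these counts, because "\\w+" matches upper- and lower-case letters alike.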