Count total words in a dataset

Usage

cas_count_total_words(
  corpus,
  pattern = "\\w+",
  text = text,
  group_by = date,
  ignore_case = TRUE,
  n_column_name = n,
  locale = "en"
)

Arguments

corpus: A textual corpus as a data frame.
pattern: Defaults to "\\w+", a regular expression commonly used to count words.
text: Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.
group_by: Defaults to date, as in the signature above. The unquoted name of one or more columns to be used for grouping (e.g. date, doc_id, or source).
ignore_case: Defaults to TRUE. If TRUE, text is converted to lower case before matching, using the given locale.
n_column_name: Defaults to n. The unquoted name of the column to be used for the count in the output.
locale: Locale to be used when ignore_case is set to TRUE. Passed to stringr::str_to_lower, defaults to "en".
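
The following is a minimal sketch of the kind of computation the arguments above describe. It is not the package's actual implementation. The toy corpus is hypothetical, and the dplyr/stringr pipeline is an assumption about how such a count can be reproduced: lower-case the text, count matches of the pattern in each row, then sum the counts within each group.

```r
library(dplyr)
library(stringr)

# Hypothetical toy corpus; the column names mirror the defaults (text, date)
corpus <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02")),
  text = c("One two three", "Four five", "Six")
)

word_counts <- corpus |>
  # Mirrors ignore_case = TRUE with locale = "en"
  mutate(text = str_to_lower(text, locale = "en")) |>
  # Mirrors group_by = date
  group_by(date) |>
  # Count matches of the default pattern per row, then sum per group;
  # the output column is named n, as with n_column_name = n
  summarise(n = sum(str_count(text, pattern = "\\w+")), .groups = "drop")

word_counts
```

With the default pattern "\\w+", lower-casing does not change the count; it only matters for patterns that contain letters. Assuming the package is installed, a call such as cas_count_total_words(corpus) with the defaults shown in the usage should group by date in the same way.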