Count total words in a dataset

Usage

cas_count_total_words(
  corpus,
  pattern = "\\w+",
  text = text,
  group_by = date,
  ignore_case = TRUE,
  n_column_name = n,
  locale = "en"
)

Arguments

corpus

A textual corpus as a data frame.

pattern

Defaults to "\\w+", a regular expression commonly used to count words.

text

Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.

group_by

Defaults to date, as shown in the usage above. The unquoted name of the column to be used for grouping (e.g. date, doc_id, or source).

ignore_case

Defaults to TRUE. If TRUE, text is converted to lower case before matching (see locale).

n_column_name

Defaults to n. The unquoted name of the column to be used for the count in the output.

locale

Locale to be used when ignore_case is set to TRUE. Passed to stringr::str_to_lower, defaults to "en".
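
To illustrate how the arguments fit together, here is a minimal sketch that approximates the documented defaults with dplyr and stringr. It is not necessarily how cas_count_total_words() is implemented; the toy corpus and the object name word_counts are hypothetical, and the native pipe assumes R 4.1 or later.

```r
library(dplyr)
library(stringr)

# Hypothetical toy corpus: one row per document, with the default
# `date` and `text` columns expected by the arguments above.
corpus <- tibble(
  date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02")),
  text = c("Hello world", "One two three", "Count THESE words please")
)

# Assumed equivalent of the defaults, not the package's own code:
word_counts <- corpus |>
  mutate(text = str_to_lower(text, locale = "en")) |>  # ignore_case = TRUE, locale = "en"
  group_by(date) |>                                    # group_by = date
  summarise(
    n = sum(str_count(text, pattern = "\\w+")),        # pattern = "\\w+", n_column_name = n
    .groups = "drop"
  )

word_counts
```

In this sketch the two documents dated 2024-01-01 contribute 2 + 3 = 5 words and the one dated 2024-01-02 contributes 4. With the package installed, the corresponding call would presumably be cas_count_total_words(corpus, group_by = date).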