Count strings in a corpus

Usage

cas_count(
  corpus,
  pattern,
  text = text,
  group_by = date,
  ignore_case = TRUE,
  drop_na = TRUE,
  fixed = FALSE,
  full_words_only = FALSE,
  pattern_column_name = pattern,
  n_column_name = n,
  locale = "en"
)

Arguments

corpus: A textual corpus as a data frame.
pattern: A character vector of one or more words or strings to be counted.
text: Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.
group_by: Defaults to date. The unquoted name of one or more columns to be used for grouping (e.g. date, doc_id, or source).
ignore_case: Defaults to TRUE. If TRUE, matching is case-insensitive.
drop_na: Defaults to TRUE. If TRUE, all rows where either text or group_by column is NA are removed before further processing.
full_words_only: Defaults to FALSE. If FALSE, the string is counted even when it is found in the middle of a word (e.g. "ratio" would be counted as a match in the word "irrational"). If TRUE, only whole-word matches are counted.
pattern_column_name: Defaults to pattern. The unquoted name of the column to be used for the pattern in the output.
n_column_name: Defaults to n. The unquoted name of the column to be used for the count in the output.
locale: Locale to be used when ignore_case is set to TRUE. Passed to stringr::str_to_lower, defaults to "en".
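
The effect of ignore_case and full_words_only can be illustrated with a minimal base R sketch. This is not the cas_count() implementation: count_matches() is a hypothetical helper, and the use of "\\b" word boundaries for whole-word matching is an assumption made for illustration.

```r
# Hypothetical helper (not part of the package): counts regex matches per string.
count_matches <- function(text, pattern,
                          full_words_only = FALSE,
                          ignore_case = TRUE) {
  if (full_words_only) {
    # Assumption: whole-word matching is approximated with word boundaries.
    pattern <- paste0("\\b", pattern, "\\b")
  }
  hits <- gregexpr(pattern, text, ignore.case = ignore_case, perl = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive positions.
  vapply(hits, function(h) sum(h > 0), integer(1))
}

texts <- c("An irrational ratio", "Ratio of cats to dogs")

# "ratio" inside "irrational" is counted when full_words_only is FALSE.
partial <- count_matches(texts, "ratio")

# Only the standalone word is counted when full_words_only is TRUE.
whole <- count_matches(texts, "ratio", full_words_only = TRUE)

# With case-sensitive matching, "Ratio" in the second string is not counted.
case_sensitive <- count_matches(texts, "ratio", ignore_case = FALSE)
```

Here partial differs from whole only for the first string, where "irrational" contributes an extra match.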

Value

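A data frame with the count of each pattern, with the pattern and count columns named according to pattern_column_name and n_column_name.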

Examples

if (FALSE) { # \dontrun{
cas_count(
  corpus = corpus,
  pattern = c("dogs", "cats", "horses"),
  text = text,
  group_by = date,
  n_column_name = n
)
} # }
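
To make the idea of grouped counting concrete, the following base R sketch builds a toy corpus and counts each pattern per date. The corpus contents are invented for illustration, and the counting logic is only an approximation of what a grouped count looks like; the row order, the handling of zero counts, and other details of the actual cas_count() output may differ.

```r
# Hypothetical toy corpus; column names match the function defaults (text, date).
corpus <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02")),
  text = c("Dogs and cats", "More dogs", "Horses, no dogs or cats"),
  stringsAsFactors = FALSE
)

patterns <- c("dogs", "cats", "horses")

# For each pattern, count matches per row, then sum the counts within each date.
rows <- lapply(patterns, function(p) {
  hits <- gregexpr(p, corpus$text, ignore.case = TRUE)
  n <- vapply(hits, function(h) sum(h > 0), integer(1))
  totals <- tapply(n, corpus$date, sum)
  data.frame(
    date = as.Date(names(totals)),
    pattern = p,
    n = as.integer(totals)
  )
})

# One row per date and pattern combination.
counts <- do.call(rbind, rows)
print(counts)
```

With the package installed, passing this corpus and these patterns to cas_count() as in the example above should give a comparable per-date summary.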