Count strings in a corpus relative to the number of words

Usage

cas_count_relative(
  corpus,
  pattern,
  text = text,
  group_by = date,
  ignore_case = TRUE,
  fixed = FALSE,
  full_words_only = FALSE,
  pattern_column_name = pattern,
  n_column_name = n,
  locale = "en"
)

Arguments

corpus: A textual corpus as a data frame.
pattern: A character vector of one or more words or strings to be counted.
text: Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.
group_by: Defaults to date. If given, the unquoted name of the column to be used for grouping (e.g. date, doc_id, or source).
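ignore_case: Defaults to TRUE. If TRUE, matching is case-insensitive.
fixed: Defaults to FALSE. If TRUE, pattern is matched as a fixed string rather than as a regular expression.
full_words_only: Defaults to FALSE. If FALSE, a string is counted even when it is found in the middle of a word (e.g. "ratio" would be counted as a match in the word "irrational").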
pattern_column_name: Defaults to pattern. The unquoted name of the column to be used for the pattern in the output.
n_column_name: Defaults to n. The unquoted name of the column to be used for the count in the output.
locale: Locale to be used when ignore_case is set to TRUE. Passed to stringr::str_to_lower, defaults to "en".
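The difference between substring and full-word matching (full_words_only) combined with case-insensitive matching (ignore_case) can be illustrated with a minimal base R sketch. It assumes that full-word matching behaves like wrapping the pattern in regular-expression word boundaries (\b); the helper count_matches() is hypothetical and is not part of castarter, whose implementation may differ.

```r
# Hypothetical helper, not part of castarter: counts matches of `pattern`
# in each element of `text`. Full-word matching is approximated here
# with regex word boundaries (\\b), which is an assumption.
count_matches <- function(text, pattern,
                          full_words_only = FALSE,
                          ignore_case = TRUE) {
  if (full_words_only) {
    pattern <- paste0("\\b", pattern, "\\b")
  }
  matches <- gregexpr(pattern, text, ignore.case = ignore_case, perl = TRUE)
  # gregexpr() returns -1 for elements with no match, so count only
  # positive match positions
  vapply(matches, function(m) sum(m > 0), integer(1))
}

# Substring matching: "ratio" inside "irrational" plus "Ratio" -> 2
count_matches("An irrational Ratio", "ratio")
# Full-word matching: only the standalone "Ratio" -> 1
count_matches("An irrational Ratio", "ratio", full_words_only = TRUE)
```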

Value

A data frame

Examples

if (FALSE) { # \dontrun{
cas_count_relative(
  corpus = corpus,
  pattern = c("dogs", "cats", "horses"),
  text = text,
  group_by = date,
  n_column_name = n
)
} # }
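To make "relative to the number of words" concrete, the following base R sketch counts pattern matches per date and divides them by the total number of words for that date. The toy corpus and the function count_relative_sketch() are hypothetical; splitting on whitespace to count words, and reporting the result as a plain proportion, are assumptions, and castarter may tokenise, normalise, or scale the result differently.

```r
# Hypothetical toy corpus with the default `text` and `date` columns
corpus <- data.frame(
  date = c("2024-01-01", "2024-01-01", "2024-01-02"),
  text = c("Dogs and cats", "Cats chase dogs", "Horses run"),
  stringsAsFactors = FALSE
)

# Hypothetical sketch, not castarter's implementation
count_relative_sketch <- function(corpus, pattern) {
  # Words per row, approximated by splitting on runs of whitespace
  n_words <- lengths(strsplit(corpus$text, "\\s+"))
  rows <- lapply(pattern, function(p) {
    # Case-insensitive substring matches per row
    hits <- lengths(regmatches(corpus$text,
                               gregexpr(p, corpus$text, ignore.case = TRUE)))
    # Sum matches and words within each date, then divide
    hits_by_date <- tapply(hits, corpus$date, sum)
    words_by_date <- tapply(n_words, corpus$date, sum)
    data.frame(date = names(hits_by_date),
               pattern = p,
               n = as.numeric(hits_by_date / words_by_date),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

count_relative_sketch(corpus, c("dogs", "cats"))
```

On 2024-01-01 "dogs" occurs twice among six words, so its relative count is 2/6; on 2024-01-02 it does not occur, so the value is 0.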