
Count strings in a corpus relative to the number of words

Usage

cas_count_relative(
  corpus,
  pattern,
  text = text,
  group_by = date,
  ignore_case = TRUE,
  fixed = FALSE,
  full_words_only = FALSE,
  pattern_column_name = pattern,
  n_column_name = n,
  locale = "en"
)

Arguments

corpus

A textual corpus as a data frame.

pattern

A character vector of one or more words or strings to be counted.

text

Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.

group_by

Defaults to date. The unquoted name of the column to be used for grouping (e.g. date, doc_id, or source).

ignore_case

Defaults to TRUE. If TRUE, matching is case-insensitive.

fixed

Defaults to FALSE.

full_words_only

Defaults to FALSE. If FALSE, the string is counted even when it is found in the middle of a word (e.g. "ratio" would be counted as a match in the word "irrational"). If TRUE, only full-word matches are counted.

pattern_column_name

Defaults to pattern. The unquoted name of the column to be used for the matched pattern in the output.

n_column_name

Defaults to 'n'. The unquoted name of the column to be used for the count in the output.

locale

Locale to be used when ignore_case is set to TRUE. Passed to stringr::str_to_lower, defaults to "en".
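To illustrate the effect of the `full_words_only` argument described above, the following base R sketch compares substring matching with full-word matching. It assumes that full-word matching relies on `\b` word boundaries; this is an illustration of the idea, not necessarily how `cas_count_relative()` implements it.

```r
# Sketch: substring vs. full-word matching in base R.
# Assumption: full-word matching is implemented with \b word boundaries.
x <- "An irrational ratio"

# Substring matching: "ratio" is also found inside "irrational"
substring_hits <- lengths(regmatches(x, gregexpr("ratio", x)))

# Full-word matching: only the standalone word "ratio" is counted
word_hits <- lengths(regmatches(x, gregexpr("\\bratio\\b", x, perl = TRUE)))

c(substring = substring_hits, full_word = word_hits)
```

With substring matching "ratio" is found twice; with word boundaries it is found once.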

Value

A data frame
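As a rough illustration of what "relative" means here, the base R sketch below counts the matches of each pattern within each group and divides them by the total number of words in that group. The helper `count_relative_sketch()` is hypothetical and not part of castarter: it takes column names as strings rather than unquoted names, assumes that words are runs of non-whitespace characters, and names its output columns `group`, `pattern`, `n`, `total_words` and `f`. The actual function's tokenisation and output columns may differ.

```r
# Hypothetical helper, NOT part of castarter: a minimal base-R sketch of
# counting pattern matches relative to the total number of words per group.
# Assumptions: words are runs of non-whitespace characters, and
# full-word matching uses \b word boundaries.
count_relative_sketch <- function(corpus, pattern, text = "text",
                                  group_by = "date", ignore_case = TRUE,
                                  full_words_only = FALSE) {
  regex <- if (full_words_only) paste0("\\b", pattern, "\\b") else pattern
  # One character vector of texts per group (e.g. per date)
  texts_by_group <- split(corpus[[text]], corpus[[group_by]])
  # Denominator: total number of words in each group
  total_words <- vapply(texts_by_group, function(txt) {
    sum(lengths(strsplit(trimws(txt), "\\s+")))
  }, numeric(1))
  rows <- lapply(seq_along(pattern), function(i) {
    # Numerator: number of matches of pattern i in each group
    n <- vapply(texts_by_group, function(txt) {
      hits <- gregexpr(regex[i], txt, ignore.case = ignore_case, perl = TRUE)
      sum(vapply(hits, function(h) sum(h > 0), numeric(1)))
    }, numeric(1))
    data.frame(group = names(texts_by_group), pattern = pattern[i],
               n = unname(n), total_words = unname(total_words),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out$f <- out$n / out$total_words  # relative frequency
  out
}

# Toy corpus for illustration only
corpus <- data.frame(
  date = c("2024-01-01", "2024-01-01", "2024-01-02"),
  text = c("Dogs chase cats", "cats sleep", "Horses and dogs and more dogs"),
  stringsAsFactors = FALSE
)
res <- count_relative_sketch(corpus, pattern = c("dogs", "cats"))
print(res)
```

In this toy corpus, "dogs" occurs once among the 5 words dated 2024-01-01, so its relative frequency for that date is 1/5 = 0.2. Dividing by the word count makes groups of different sizes comparable.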

Examples

if (FALSE) { # \dontrun{
cas_count_relative(
  corpus = corpus,
  pattern = c("dogs", "cats", "horses"),
  text = text,
  group_by = date,
  n_column_name = n
)
} # }