
Count strings in a corpus

Usage

cas_count(
  corpus,
  pattern,
  text = text,
  group_by = date,
  ignore_case = TRUE,
  drop_na = TRUE,
  fixed = FALSE,
  full_words_only = FALSE,
  pattern_column_name = pattern,
  n_column_name = n,
  locale = "en"
)

Arguments

corpus

A textual corpus as a data frame.

pattern

A character vector of one or more words or strings to be counted.

text

Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.

group_by

Defaults to date. The unquoted name of the column to be used for grouping (e.g. date, doc_id, or source).

ignore_case

Defaults to TRUE. If TRUE, matching is case-insensitive.

drop_na

Defaults to TRUE. If TRUE, all rows where either text or group_by column is NA are removed before further processing.

fixed

Defaults to FALSE.

full_words_only

Defaults to FALSE. If FALSE, the string is counted even when it is found in the middle of a word (e.g. "ratio" would be counted as a match in the word "irrational"). If TRUE, only matches of the full word are counted.
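
The difference between the two settings can be illustrated with a minimal base R sketch. This is not the function's own implementation: count_matches() is a hypothetical helper written for this illustration, and the use of the regex word boundary \b for full-word matching is an assumption.

```r
# Toy input: "ratio" appears once as a standalone word and once inside "irrational".
txt <- "The ratio looked irrational."

# Hypothetical helper (illustration only): count regex matches in a single string.
count_matches <- function(x, rx) {
  m <- gregexpr(rx, x, ignore.case = TRUE, perl = TRUE)[[1]]
  # gregexpr() returns -1 when there is no match at all
  if (m[1] == -1L) 0L else length(m)
}

anywhere   <- count_matches(txt, "ratio")        # substring matching: 2
whole_word <- count_matches(txt, "\\bratio\\b")  # word-boundary matching: 1
```

With substring matching both occurrences are counted; with word boundaries only the standalone word is.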

pattern_column_name

Defaults to pattern. The unquoted name of the column to be used for the pattern in the output.

n_column_name

Defaults to n. The unquoted name of the column to be used for the count in the output.

locale

Locale to be used when ignore_case is set to TRUE. Passed to stringr::str_to_lower, defaults to "en".
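
The locale matters because lower-casing rules differ between languages. A short sketch calling stringr::str_to_lower() directly (how cas_count() uses the result internally is not shown here) compares English and Turkish rules for the capital letter "I":

```r
library(stringr)

# Under English rules "I" lower-cases to "i";
# under Turkish rules it lower-cases to the dotless "ı" (U+0131).
lower_en <- str_to_lower("I", locale = "en")
lower_tr <- str_to_lower("I", locale = "tr")

# The two results differ, so a case-insensitive match can depend on the locale.
identical(lower_en, lower_tr)
```

For corpora in languages with such rules, setting locale to the language of the text is likely the safer choice.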

Value

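A data frame of counts, with the pattern in the column named by pattern_column_name and the count in the column named by n_column_name.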

Examples

if (FALSE) {
cas_count(
  corpus = corpus,
  pattern = c("dogs", "cats", "horses"),
  text = text,
  group_by = date,
  n_column_name = n
)
}
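
To make the example above concrete, the following base R sketch builds a small toy corpus and computes the same kind of grouped, case-insensitive substring count by hand. It is an illustration only and does not call cas_count(). The toy data, the helper count_one(), and the output layout (date, pattern, n) are all assumptions chosen to mirror the default column names.

```r
# Toy corpus (made-up data): one row per document, with a date and a text column.
corpus <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02")),
  text = c("Dogs and cats.", "No horses, only dogs.", "Cats, cats, cats."),
  stringsAsFactors = FALSE
)
patterns <- c("dogs", "cats", "horses")

# Hypothetical helper: number of case-insensitive matches of rx in each element of x.
count_one <- function(x, rx) {
  vapply(gregexpr(rx, x, ignore.case = TRUE),
         function(m) sum(m > 0L),  # no match is encoded as -1, so it contributes 0
         integer(1))
}

# For each pattern, count matches per document, then sum the counts within each date.
toy_counts <- do.call(rbind, lapply(patterns, function(p) {
  per_doc <- count_one(corpus$text, p)
  per_day <- tapply(per_doc, corpus$date, sum)
  data.frame(date = as.Date(names(per_day)),
             pattern = p,
             n = as.integer(per_day))
}))

toy_counts
```

The same toy corpus data frame could be passed as the corpus argument in the example above.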