Summarise word counts for a given time period, typically calculated with cas_count()

Usage

cas_summarise(
  count_df,
  date_column_name = date,
  n_column_name = n,
  pattern_column_name = pattern,
  period = NULL,
  f = mean,
  period_summary_function = sum,
  every = 1L,
  before = 0L,
  after = 0L,
  complete = FALSE,
  auto_convert = FALSE
)

Arguments

count_df

A data frame. Must include at least a date or date-time column and a column with the number of occurrences for the given time.

period

Defaults to NULL. A string describing the time unit to be used for summarising. Possible values include "year", "quarter", "month", "day", "hour", "minute", "second", "millisecond".

f

Defaults to mean. Function to be applied over n for all the values in a given time period. A common alternative would be median.

period_summary_function

Defaults to sum. This is applied when grouping by period (e.g. when period is set to year). When calculating absolute word frequency, the default (sum) is fine. When calculating relative frequencies, mean would be more appropriate, but consider the implications carefully if a rolling average is then applied.
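To illustrate the difference between sum and mean when grouping by period, here is a minimal base R sketch. It does not call cas_summarise(); the data frame `daily` and its values are made up for illustration, and the grouping is done with aggregate() rather than with the package's own internals.

```r
# Hypothetical daily counts (illustrative values only)
daily <- data.frame(
  date = as.Date(c("2021-03-01", "2021-09-15", "2022-01-10", "2022-06-20")),
  n = c(2, 4, 10, 30)
)

# Derive the period (here: year) from the date column
daily$year <- as.integer(format(daily$date, "%Y"))

# Absolute frequencies: adding up counts within each year
yearly_sum <- aggregate(n ~ year, data = daily, FUN = sum)

# Relative frequencies would usually be averaged instead
yearly_mean <- aggregate(n ~ year, data = daily, FUN = mean)

print(yearly_sum)
print(yearly_mean)
```

Summing relative frequencies within a year would produce values that depend on how many observations the year contains, which is why mean is the more natural choice in that case.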

every

[positive integer(1)]

The number of periods to group together.

For example, if the period was set to "year" with an every value of 2, then the years 1970 and 1971 would be placed in the same group.
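The grouping described above can be sketched in base R as follows. This assumes groups are anchored at 1970, which is consistent with the example in the text but is not confirmed here as the function's actual behaviour; the variable names are illustrative.

```r
years <- 1970:1975
every <- 2L

# Assumption: groups are counted from 1970 in blocks of `every` years
group <- (years - 1970L) %/% every

# Years sharing a group index would be summarised together
grouped_years <- split(years, group)
print(grouped_years)
```

With these inputs, 1970 and 1971 share a group, as do 1972 and 1973, and 1974 and 1975.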

before, after

[integer(1) / Inf]

The number of values before or after the current element to include in the sliding window. Set to Inf to select all elements before or after the current element. Negative values are allowed, which allows you to "look forward" from the current element if used as the before value, or "look backwards" if used as after.

complete

[logical(1)]

Should the function be evaluated on complete windows only? If FALSE, the default, then partial computations will be allowed.
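As a rough illustration of how before, after and complete interact, here is a small base R sketch of a sliding window. The helper `rolling()` is hypothetical and not part of the package; it only mimics the behaviour described above and makes no claim about how cas_summarise() implements it.

```r
# Hypothetical helper: apply `f` over a sliding window around each element
rolling <- function(x, f = mean, before = 0L, after = 0L, complete = FALSE) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    start <- i - before
    end <- i + after
    # With complete = TRUE, windows that run past either end give NA
    if (complete && (start < 1L || end > n)) {
      return(NA_real_)
    }
    lo <- max(start, 1L)
    hi <- min(end, n)
    # Can happen with negative before/after near the edges: empty window
    if (lo > hi) {
      return(NA_real_)
    }
    f(x[lo:hi])
  }, numeric(1))
}

counts <- c(2, 4, 6, 8)

# Each value averaged with the one before it; the first window is partial
partial <- rolling(counts, before = 1L)

# Same window, but incomplete windows are not evaluated
strict <- rolling(counts, before = 1L, complete = TRUE)

print(partial)
print(strict)
```

With complete = FALSE the first element is averaged on its own, while with complete = TRUE it is returned as NA because no preceding value exists.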

auto_convert

Defaults to FALSE. If FALSE, the date column is returned using the same format as the input; the minimum value in the given group is used for reference (e.g. all values for January 2022 are summarised as 2022-01-01 if the data were originally given as dates). If TRUE, it tries to adapt the output to the most intuitive corresponding type: for year, a numeric column with only the year number; for quarter, the format 2022.1; for month, the format 2022-01.
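The output formats mentioned above can be reproduced in base R as a rough guide to what each one looks like. This sketch only mirrors the formats named in the description; the variable names are illustrative, and whether the function returns the quarter as text or as a number is an assumption not confirmed here.

```r
dates <- as.Date(c("2022-01-05", "2022-01-20", "2022-05-31"))

# auto_convert = FALSE with period = "month": same class as the input,
# each date represented by the first day of its month
month_start <- as.Date(format(dates, "%Y-%m-01"))

# auto_convert = TRUE, as described for each period
year_num <- as.numeric(format(dates, "%Y"))   # year: numeric, e.g. 2022
month_chr <- format(dates, "%Y-%m")           # month: e.g. "2022-01"

# quarter: year and quarter number, e.g. "2022.1" (shown here as text)
quarter_no <- (as.integer(format(dates, "%m")) - 1L) %/% 3L + 1L
quarter_chr <- paste0(format(dates, "%Y"), ".", quarter_no)

print(month_start)
print(year_num)
print(month_chr)
print(quarter_chr)
```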

date_column_name

Defaults to date. Unquoted name of a column having either date or date-time as class.

n_column_name

Defaults to n. Unquoted name of a column having number of occurrences per time unit.

Value

A data frame with two columns: the name of the period, and the same name originally used for n.

Examples

if (FALSE) { # \dontrun{
# this assumes dates are provided in a column called date
corpus_df %>%
  cas_count(
    pattern = "example",
    group_by = date
  ) %>%
  cas_summarise(period = "year")
} # }