
Adds columns with the n words found before and after the selected pattern, to show keywords in context.
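The core idea can be sketched in base R. The helper kwic_sketch() below is hypothetical and not part of castarter. It splits a single string on whitespace, finds the words that contain the pattern, and collects up to words_before and words_after neighboring words. It ignores sentence boundaries and punctuation, so it only approximates what cas_kwic() does on a whole corpus.

```r
# Hypothetical helper, not part of castarter: keyword-in-context for a single
# string, using simple whitespace tokenization.
kwic_sketch <- function(text, pattern, words_before = 5, words_after = 5,
                        ignore_case = TRUE) {
  words <- strsplit(text, "\\s+")[[1]]
  # Positions of words containing the pattern (partial matches allowed)
  hits <- which(grepl(pattern, words, ignore.case = ignore_case))
  before <- vapply(hits, function(i) {
    # Up to words_before positions immediately preceding the match
    idx <- i - rev(seq_len(min(words_before, i - 1)))
    paste(words[idx], collapse = " ")
  }, character(1))
  after <- vapply(hits, function(i) {
    # Up to words_after positions immediately following the match
    idx <- i + seq_len(min(words_after, length(words) - i))
    paste(words[idx], collapse = " ")
  }, character(1))
  data.frame(before = before, pattern = words[hits], after = after,
             stringsAsFactors = FALSE)
}

kwic_sketch("Mr Putin met the leader of China in Beijing on Tuesday",
            pattern = "china", words_before = 3, words_after = 2)
```

This returns a single row with before set to "the leader of", pattern set to "China" and after set to "in Beijing". A match at the very start or end of the text simply yields an empty before or after string.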

Usage

cas_kwic(
  corpus,
  pattern,
  text = text,
  words_before = 5,
  words_after = 5,
  same_sentence = TRUE,
  period_at_end_of_sentence = TRUE,
  ignore_case = TRUE,
  regex = TRUE,
  full_words_only = FALSE,
  full_word_with_partial_match = TRUE,
  pattern_column_name = pattern
)

Arguments

corpus

A textual corpus as a data frame.

pattern

A pattern, typically of one or more words, to be matched in the text. Should be of length 1 or of length equal to the number of rows.

text

Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.

words_before

Integer, defaults to 5. Number of words to include in the before column.

words_after

Integer, defaults to 5. Number of words to include in the after column.

same_sentence

Logical, defaults to TRUE. If TRUE, the before and after columns include only words from the sentence that contains the matched pattern.
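A rough base R illustration of the sentence restriction, assuming that sentences end with ".", "!" or "?" followed by whitespace. This splitting rule is a simplification chosen for the example; castarter may detect sentences differently. With a five-word window, only four words are available before "China", because the window stops at the sentence boundary instead of reaching into "Mr Putin spoke."

```r
text <- "Mr Putin spoke. He then flew to China for talks. The visit lasted two days."
# Split after sentence-final punctuation followed by whitespace (simplistic rule)
sentences <- strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
# Keep only the first sentence containing the pattern
sentence <- sentences[grepl("china", sentences, ignore.case = TRUE)][1]
words <- strsplit(sentence, "\\s+")[[1]]
i <- grep("china", words, ignore.case = TRUE)[1]
# Five-word windows, clipped to the boundaries of this sentence
before <- paste(words[i - rev(seq_len(min(5, i - 1)))], collapse = " ")
after <- paste(words[i + seq_len(min(5, length(words) - i))], collapse = " ")
before  # "He then flew to"
after   # "for talks."
```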

period_at_end_of_sentence

Logical, defaults to TRUE. If TRUE, a period (".") is always included at the end of a sentence. Relevant only if same_sentence is set to TRUE.

ignore_case

Logical, defaults to TRUE. If TRUE, matching ignores the difference between upper and lower case.

regex

Logical, defaults to TRUE. If TRUE, pattern is treated as a regular expression.
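The practical difference can be seen with base R's grepl(), used here only as an analogy and not as a description of how cas_kwic() is implemented: as a regular expression, "." matches any character, while literal matching (fixed = TRUE) treats it as a plain period.

```r
# Regular expression: "." is a wildcard, so "a.c" matches "abc"
as_regex <- grepl("a.c", "abc")
# Literal matching: "." is a period, so "a.c" does not match "abc"...
as_literal <- grepl("a.c", "abc", fixed = TRUE)
# ...but it does match the literal string "a.c"
as_literal_exact <- grepl("a.c", "a.c", fixed = TRUE)
c(as_regex, as_literal, as_literal_exact)  # TRUE FALSE TRUE
```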

full_words_only

Logical, defaults to FALSE. If FALSE, pattern is matched even when it is found in the middle of a word (e.g. "ratio" would be counted as a match in the word "irrational").
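The "ratio"/"irrational" case can be reproduced in base R. Wrapping the pattern in \b word boundaries is one common way to require full-word matches; it is shown here as an assumption for illustration, not as the approach cas_kwic() necessarily uses.

```r
# Partial matching: "ratio" is found inside "irrational"
partial <- grepl("ratio", "irrational", ignore.case = TRUE)
# Full-word matching with \b word boundaries: no match inside "irrational"
full <- grepl("\\bratio\\b", "irrational", ignore.case = TRUE, perl = TRUE)
# A standalone occurrence of the word still matches
full_standalone <- grepl("\\bratio\\b", "a high ratio of", ignore.case = TRUE,
                         perl = TRUE)
c(partial, full, full_standalone)  # TRUE FALSE TRUE
```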

full_word_with_partial_match

Logical, defaults to TRUE. If TRUE and the pattern matches only part of a word, the pattern column includes the full word in which the match was found. Relevant only when full_words_only is set to FALSE.
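As a base R sketch of the idea (an illustration, not necessarily how cas_kwic() does it), padding the pattern with \w* on both sides extends a partial match to the whole word that contains it.

```r
text <- "an irrational decision"
# The bare pattern matches only the substring "ratio"
matched <- regmatches(text, regexpr("ratio", text, perl = TRUE))
# Padding with \w* on both sides captures the whole surrounding word
full_word <- regmatches(text, regexpr("\\w*ratio\\w*", text, perl = TRUE))
matched    # "ratio"
full_word  # "irrational"
```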

pattern_column_name

Defaults to pattern. The unquoted name of the column to be used for the matched word in the output.

Value

A data frame (a tibble) with the same columns as the input, plus three columns: before, pattern, and after. Only rows where the pattern is found are included.

Examples


cas_kwic(
  corpus = cas_demo_corpus,
  pattern = c("china", "india")
)
#> # A tibble: 27 × 10
#>    doc_id       text  date       title location link     id before pattern after
#>    <chr>        <chr> <date>     <chr> <chr>    <chr> <dbl> <chr>  <chr>   <chr>
#>  1 president_r… Acti… 2000-01-18 Acti… The Kre… http… 37781 Minis… China's Mini…
#>  2 president_r… Acti… 2000-01-18 Acti… The Kre… http… 37781 the C… China   Janu…
#>  3 president_r… Mr P… 2000-01-18 Acti… The Kre… http… 37781 for t… China   .    
#>  4 president_r… He a… 2000-01-18 Acti… The Kre… http… 37781 He al… China   had …
#>  5 president_r… Acti… 2000-02-28 Acti… The Kre… http… 38018 to th… China   sche…
#>  6 president_r… As t… 2000-03-01 Vlad… The Kre… http… 38761 and c… China   were…
#>  7 president_r… Mr T… 2000-03-01 Vlad… The Kre… http… 38761 to Mr… China   with…
#>  8 president_r… The … 2000-07-12 Pres… The Kre… http… 38330 to Mr… China   and …
#>  9 president_r… Pres… 2001-02-21 Pres… The Kre… http… 40921 the C… China   Febr…
#> 10 president_r… Mr P… 2001-04-29 Pres… The Kre… http… 41069 Putin… China   had …
#> # ℹ 17 more rows