Adds a column with n words before and after the selected pattern to see keywords in context
Source: R/cas_kwic.R
Usage
cas_kwic(
  corpus,
  pattern,
  text = text,
  words_before = 5,
  words_after = 5,
  same_sentence = TRUE,
  period_at_end_of_sentence = TRUE,
  ignore_case = TRUE,
  regex = TRUE,
  full_words_only = FALSE,
  full_word_with_partial_match = TRUE,
  pattern_column_name = pattern
)
Arguments
- corpus
A textual corpus as a data frame.
- pattern
A pattern, typically of one or more words, to be used to break text. Should be of length 1 or length equal to the number of rows.
- text
Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.
- words_before
Integer, defaults to 5. Number of words to include in the before column.
- words_after
Integer, defaults to 5. Number of words to include in the after column.
- same_sentence
Logical, defaults to TRUE. If TRUE, before and after include only words found in the sentence that contains the matched pattern.
- period_at_end_of_sentence
Logical, defaults to TRUE. If TRUE, a period (".") is always included at the end of a sentence. Relevant only if same_sentence is set to TRUE.
- ignore_case
Defaults to TRUE.
- regex
Defaults to TRUE. If TRUE, pattern is treated as a regular expression.
- full_words_only
Defaults to FALSE. If FALSE, the pattern is matched even when it is found in the middle of a word (e.g. "ratio" would be counted as a match in the word "irrational").
- full_word_with_partial_match
Defaults to TRUE. If TRUE, when the pattern matches only part of a word, the pattern column still includes the full word where the match was found. Relevant only when full_words_only is set to FALSE.
- pattern_column_name
Defaults to pattern. The unquoted name of the column to be used for the word in the output.
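The difference between partial and whole-word matching described for full_words_only can be illustrated with plain base R regular expressions. This is only a sketch of the general idea, not castarter's internal code; the word list and the variable names are made up for the illustration.

```r
# Illustrative only: base R regex, not castarter's implementation.
words <- c("ratio", "irrational", "Ratio", "rational")

# Partial matching: the pattern may sit in the middle of a word
# (comparable in spirit to full_words_only = FALSE).
partial <- grepl("ratio", words, ignore.case = TRUE)

# Whole-word matching: the pattern must be delimited by word boundaries
# (comparable in spirit to full_words_only = TRUE).
whole <- grepl("\\bratio\\b", words, ignore.case = TRUE, perl = TRUE)

data.frame(words, partial, whole)
```

With partial matching all four words match; with word boundaries only "ratio" and "Ratio" do.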
Value
A data frame (a tibble), with the same columns as input, plus three columns: before, pattern, and after. Only rows where the pattern is found are included.
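To make the meaning of the before, pattern, and after columns concrete, here is a minimal base R sketch of a keyword-in-context window over a single sentence. The helper kwic_sketch is hypothetical and written only for this illustration; it is not part of castarter, it ignores sentence splitting and punctuation handling, and it always returns the full matched word.

```r
# Hypothetical helper for illustration only; not part of castarter.
kwic_sketch <- function(sentence, pattern, words_before = 5, words_after = 5) {
  # Split on whitespace; punctuation stays attached to words in this sketch.
  words <- strsplit(sentence, "\\s+")[[1]]
  # Positions of words that contain the pattern (case-insensitive).
  hits <- grep(pattern, words, ignore.case = TRUE)
  # One output row per match; returns NULL when there is no match.
  do.call(rbind, lapply(hits, function(i) {
    before_idx <- tail(seq_len(i - 1), words_before)
    after_idx <- if (i < length(words)) seq(i + 1, length(words)) else integer(0)
    after_idx <- head(after_idx, words_after)
    data.frame(
      before = paste(words[before_idx], collapse = " "),
      pattern = words[i],
      after = paste(words[after_idx], collapse = " "),
      stringsAsFactors = FALSE
    )
  }))
}

res <- kwic_sketch(
  "The delegation travelled to China for trade talks",
  pattern = "china",
  words_before = 2,
  words_after = 2
)
res
```

Here the single match yields before = "travelled to", pattern = "China", and after = "for trade".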
Examples
cas_kwic(
  corpus = cas_demo_corpus,
  pattern = c("china", "india")
)
#> # A tibble: 27 × 10
#> doc_id text date title location link id before pattern after
#> <chr> <chr> <date> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
#> 1 president_r… Acti… 2000-01-18 Acti… The Kre… http… 37781 Minis… China's Mini…
#> 2 president_r… Acti… 2000-01-18 Acti… The Kre… http… 37781 the C… China Janu…
#> 3 president_r… Mr P… 2000-01-18 Acti… The Kre… http… 37781 for t… China .
#> 4 president_r… He a… 2000-01-18 Acti… The Kre… http… 37781 He al… China had …
#> 5 president_r… Acti… 2000-02-28 Acti… The Kre… http… 38018 to th… China sche…
#> 6 president_r… As t… 2000-03-01 Vlad… The Kre… http… 38761 and c… China were…
#> 7 president_r… Mr T… 2000-03-01 Vlad… The Kre… http… 38761 to Mr… China with…
#> 8 president_r… The … 2000-07-12 Pres… The Kre… http… 38330 to Mr… China and …
#> 9 president_r… Pres… 2001-02-21 Pres… The Kre… http… 40921 the C… China Febr…
#> 10 president_r… Mr P… 2001-04-29 Pres… The Kre… http… 41069 Putin… China had …
#> # ℹ 17 more rows