Adds a column with n words before and after the selected pattern to see keywords in context
Source: R/cas_kwic.R
cas_kwic_single_pattern.Rd
Usage
cas_kwic_single_pattern(
  corpus,
  pattern,
  text = text,
  words_before = 5,
  words_after = 5,
  same_sentence = TRUE,
  period_at_end_of_sentence = TRUE,
  ignore_case = TRUE,
  regex = TRUE,
  full_words_only = FALSE,
  full_word_with_partial_match = TRUE,
  pattern_column_name = pattern
)
Arguments
- corpus
A textual corpus as a data frame.
- pattern
A pattern, typically of one or more words, to be used to break text. Should be of length 1 or length equal to the number of rows.
- text
Defaults to text. The unquoted name of the column of the corpus data frame to be used for matching.
- words_before
Integer, defaults to 5. Number of words to include in the before column.
- words_after
Integer, defaults to 5. Number of words to include in the after column.
- same_sentence
Logical, defaults to TRUE. If TRUE, before and after include only words found in the sentence including the matched pattern.
- period_at_end_of_sentence
Logical, defaults to TRUE. If TRUE, a period (".") is always included at the end of a sentence. Relevant only if same_sentence is set to TRUE.
- ignore_case
Logical, defaults to TRUE. If TRUE, case is ignored when matching the pattern.
- regex
Logical, defaults to TRUE. If TRUE, pattern is treated as a regular expression.
- full_words_only
Logical, defaults to FALSE. If FALSE, the pattern is counted as a match even when it is found in the middle of a word (e.g. "ratio" would be counted as a match in the word "irrational").
- full_word_with_partial_match
Logical, defaults to TRUE. If TRUE, when the pattern matches only part of a word, the pattern column still includes the full word in which the match was found. Relevant only when full_words_only is set to FALSE.
- pattern_column_name
Defaults to pattern. The unquoted name of the column to be used for the matched word in the output.
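The difference between partial and whole-word matching, and the effect of ignoring case, can be illustrated with plain base R regular expressions. This is only a sketch of the matching concepts behind full_words_only and ignore_case; it does not reproduce the function's internal code, and the `words` vector is made up for illustration.

```r
# Illustrative only: base R regex, not the package's internal implementation.
words <- c("ratio", "irrational", "Ratio", "rational")

# Partial matching, analogous to full_words_only = FALSE:
# "ratio" is found inside "irrational" and "rational" as well.
partial <- grepl("ratio", words, ignore.case = TRUE, perl = TRUE)

# Whole-word matching, analogous to full_words_only = TRUE:
# word boundaries (\\b) exclude matches inside longer words.
whole <- grepl("\\bratio\\b", words, ignore.case = TRUE, perl = TRUE)

# Case-sensitive matching, analogous to ignore_case = FALSE:
# "Ratio" no longer matches the lower-case pattern.
case_sensitive <- grepl("ratio", words, ignore.case = FALSE, perl = TRUE)

data.frame(words, partial, whole, case_sensitive)
```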
Value
A data frame (a tibble), with the same columns as input, plus three columns: before, pattern, and after. Only rows where the pattern is found are included.
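To make the before, pattern, and after columns concrete, here is a deliberately simplified base R sketch of the keywords-in-context idea for a single sentence. The helper `kwic_sketch()` is hypothetical and written only for this illustration; it splits on whitespace and ignores sentence boundaries, punctuation handling, and the other options documented above, so it should not be read as the package's implementation.

```r
# Simplified illustration of keywords in context; kwic_sketch() is a
# hypothetical helper, not part of the package.
kwic_sketch <- function(sentence, pattern, words_before = 5, words_after = 5) {
  # Split the sentence into words on whitespace
  words <- strsplit(sentence, "\\s+")[[1]]
  # Positions of the words that contain the pattern (case-insensitive)
  hits <- grep(pattern, words, ignore.case = TRUE)
  # Build one row per match with the surrounding words
  do.call(rbind, lapply(hits, function(i) {
    before_idx <- tail(seq_len(i - 1), words_before)
    after_idx <- if (i < length(words)) (i + 1):length(words) else integer(0)
    after_idx <- head(after_idx, words_after)
    data.frame(
      before = paste(words[before_idx], collapse = " "),
      pattern = words[i],
      after = paste(words[after_idx], collapse = " "),
      stringsAsFactors = FALSE
    )
  }))
}

result <- kwic_sketch(
  "Relations between Russia and the West have changed over time",
  pattern = "West",
  words_before = 2,
  words_after = 2
)
result
```

With these inputs, the single resulting row has "and the" in before, "West" in pattern, and "have changed" in after.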
Examples
cas_kwic_single_pattern(
  corpus = tifkremlinen::kremlin_en,
  pattern = "West"
)
#> # A tibble: 1,844 × 11
#> doc_id text date title location link id term before pattern after
#> <chr> <chr> <date> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 presi… Prim… 2000-01-12 Prim… Petroza… http… 37746 Puti… Counc… West Asso…
#> 2 presi… Prim… 2000-01-12 Prim… Petroza… http… 37746 Puti… by th… northw… Russ…
#> 3 presi… In P… 2000-01-12 Prim… NA http… 37745 Puti… Counc… West Asso…
#> 4 presi… Howe… 2000-01-23 Acti… NA http… 37821 Puti… that … West woul…
#> 5 presi… He u… 2000-01-28 Acti… The Kre… http… 38049 Puti… He un… Western poin…
#> 6 presi… Acti… 2000-02-08 Acti… The Kre… http… 37887 Puti… for b… Western Euro…
#> 7 presi… The … 2000-02-10 Acti… The Gov… http… 37905 Puti… suppl… Western Euro…
#> 8 presi… Vlad… 2000-03-03 Vlad… Surgut http… 38771 Puti… oil-p… West Surg…
#> 9 presi… Vlad… 2000-05-18 Vlad… NA http… 38189 Puti… Pleni… West Fede…
#> 10 presi… Pres… 2000-05-22 Pres… The Kre… http… 38214 Puti… Presi… West Fede…
#> # ℹ 1,834 more rows