Get path to folder where the corpus is stored.
Source: R/cas_get_corpus_path.R
Usage
cas_get_corpus_path(
  ...,
  corpus_folder = "corpus",
  file_format = "parquet",
  partition = NULL,
  token = "full_text"
)
Arguments
- ...
Passed to cas_get_db_file().
- file_format
Defaults to "parquet". Currently, other options are not implemented.
- partition
Defaults to NULL. If NULL, the parquet file is not partitioned. "year" is a common alternative: if set to "year", the parquet file is partitioned by year. If a year column does not exist, it is created on the assumption that a date column exists and is (or can be coerced to) a vector of class Date.
- token
Defaults to "full_text", which does not tokenise the text column. Any other value is passed to tidytext::unnest_tokens(). Accepted values include "words", "sentences", and "paragraphs". See ?tidytext::unnest_tokens for details.
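
The fallback described under partition can be illustrated with a minimal base R sketch. It is not the package's own implementation: the data frame corpus_df and its contents are hypothetical, and the only assumption carried over from the text is that a date column exists and can be coerced to class Date.

```r
# Hypothetical corpus with a character `date` column and no `year` column.
corpus_df <- data.frame(
  date = c("2021-03-15", "2022-11-02"),
  text = c("First article.", "Second article."),
  stringsAsFactors = FALSE
)

# If `year` is missing, derive it from `date` after coercing to class Date.
# This mirrors the documented behaviour; the package may do it differently.
if (!"year" %in% names(corpus_df)) {
  corpus_df$year <- as.integer(format(as.Date(corpus_df$date), "%Y"))
}

corpus_df$year
```

A call such as cas_get_corpus_path(partition = "year", token = "sentences") uses only the arguments listed under Usage; what it returns depends on the project configuration and is not shown here.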