Get path to folder where the corpus is stored.

Usage

cas_get_corpus_path(
  ...,
  corpus_folder = "corpus",
  file_format = "parquet",
  partition = NULL,
  token = "full_text"
)

Arguments

...

Passed to cas_get_db_file().

corpus_folder

Defaults to "corpus". Name of the folder where the corpus is stored.

file_format

Defaults to "parquet". Currently, other options are not implemented.

partition

Defaults to NULL. If NULL, the parquet file is not partitioned. "year" is a common alternative: if set to "year", the parquet file is partitioned by year. If a year column does not exist, it is created from the date column, which is assumed to exist and to be (or be coercible to) a vector of class Date.
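
As a rough illustration of the fallback described above, the sketch below derives a year column from a date column and writes a dataset partitioned by year. It is not castarter's actual implementation: the data frame `corpus_df`, its contents, and the output location under `tempdir()` are made up for this example, and the use of `arrow::write_dataset()` is an assumption about how such partitioning could be done.

```r
# Hypothetical corpus with a date column stored as character.
corpus_df <- data.frame(
  id = 1:3,
  date = c("2021-03-15", "2022-07-01", "2022-12-31"),
  text = c("First text.", "Second text.", "Third text."),
  stringsAsFactors = FALSE
)

# Coerce to Date, then extract the year to use as the partitioning key.
corpus_df$date <- as.Date(corpus_df$date)
corpus_df$year <- as.integer(format(corpus_df$date, "%Y"))

# If the arrow package is installed, write one parquet partition per year.
# Assumption: castarter may handle this step differently.
if (requireNamespace("arrow", quietly = TRUE)) {
  arrow::write_dataset(
    corpus_df,
    path = file.path(tempdir(), "corpus"),
    format = "parquet",
    partitioning = "year"
  )
}
```

If the date column cannot be coerced with `as.Date()`, this sketch fails at the coercion step, which matches the assumption stated in the argument description.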

token

Defaults to "full_text", which does not tokenise the text column. If different from "full_text", it is passed to tidytext::unnest_tokens(). Accepted values include "words", "sentences", and "paragraphs". See ?tidytext::unnest_tokens for details.
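
To give a sense of what a value other than "full_text" implies, the sketch below calls tidytext::unnest_tokens() directly on a small data frame. The data frame `corpus_df` and the output column names `word` and `sentence` are illustrative choices for this example. How castarter names the resulting column is not stated here.

```r
library(tidytext)

# Hypothetical two-document corpus.
corpus_df <- data.frame(
  id = 1:2,
  text = c(
    "One sentence here. Another one follows.",
    "A single sentence."
  ),
  stringsAsFactors = FALSE
)

# One row per word; by default tokens are lower-cased and the
# original text column is dropped.
words_df <- unnest_tokens(corpus_df, output = word, input = text, token = "words")

# One row per sentence.
sentences_df <- unnest_tokens(corpus_df, output = sentence, input = text, token = "sentences")
```

Tokenising multiplies the number of rows, so a corpus stored with token = "words" is considerably longer than the same corpus stored as full text.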

Examples

if (FALSE) {
  # Path to the corpus folder, using all default arguments
  cas_get_corpus_path()
}