Delete previously stored corpora written with cas_write_corpus()
Source: R/cas_delete_corpus.R
Typically used for file maintenance, especially when datasets are routinely updated.
Usage
cas_delete_corpus(
  keep = 1,
  ask = TRUE,
  file_format = "parquet",
  partition = "year",
  token = "full_text",
  corpus_folder = "corpus",
  path = NULL,
  ...
)
Arguments
- keep
Numeric, defaults to 1. Number of corpus files to keep. Only the most recent files are kept.
- file_format
Defaults to "parquet". Currently, other options are not implemented.
- partition
Defaults to "year", in which case the parquet file is partitioned by year. If NULL, the parquet file is not partitioned. If a year column does not exist, it is created on the assumption that a date column exists and is (or can be coerced to) a vector of class Date.
- token
Defaults to "full_text", which does not tokenise the text column. Any other value is passed to tidytext::unnest_tokens(); accepted values include "words", "sentences", and "paragraphs". See ?tidytext::unnest_tokens() for details.
- path
Defaults to NULL. If NULL, path is set to the project/website/export/dataset/file_format folder.
- ...
Passed to cas_get_db_file().
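The retention rule described by keep and ask ("only the most recent files are kept") can be sketched in base R. This is a minimal sketch, not castarter's implementation: the helper delete_old_corpus_files() is hypothetical, and it assumes that "most recent" means most recent modification time and that each stored corpus is a single file or folder inside one directory.

```r
# Hypothetical helper, NOT part of castarter: illustrates keeping only the
# `keep` most recent corpus files/folders in a directory and deleting the rest.
# Assumption: recency is judged by file modification time (mtime).
delete_old_corpus_files <- function(folder, keep = 1, ask = TRUE) {
  entries <- list.files(folder, full.names = TRUE)
  # Nothing to delete if there are no more entries than we want to keep
  if (length(entries) <= keep) {
    return(invisible(character(0)))
  }
  # Sort newest first, then select everything after the first `keep` entries
  entries <- entries[order(file.info(entries)$mtime, decreasing = TRUE)]
  to_delete <- entries[seq.int(keep + 1, length(entries))]
  if (ask) {
    # Refuse to delete silently when no one can answer the prompt
    if (!interactive()) {
      stop("Confirmation needed: set ask = FALSE to delete non-interactively.")
    }
    answer <- readline(sprintf("Delete %d old corpus file(s)? [y/N] ", length(to_delete)))
    if (!tolower(trimws(answer)) %in% c("y", "yes")) {
      message("Nothing deleted.")
      return(invisible(character(0)))
    }
  }
  # recursive = TRUE so that partitioned (folder-based) datasets are removed too
  unlink(to_delete, recursive = TRUE)
  invisible(to_delete)
}
```

For instance, delete_old_corpus_files("corpus", keep = 2, ask = FALSE) would remove everything except the two newest entries of a hypothetical "corpus" folder. Requiring an explicit ask = FALSE in non-interactive sessions mirrors the cautious default of ask = TRUE in the Usage above.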