Delete previously stored corpora written with cas_write_corpus().
Source: R/cas_delete_corpus.R
cas_delete_corpus.Rd
Typically used for file maintenance, especially when datasets are routinely updated.
Usage
cas_delete_corpus(
  keep = 1,
  ask = TRUE,
  file_format = "parquet",
  partition = "year",
  token = "full_text",
  corpus_folder = "corpus",
  path = NULL,
  ...
)
Arguments
- keep
Numeric, defaults to 1. Number of corpus files to keep. Only the most recent files are kept.
- file_format
Defaults to "parquet". Currently, other options are not implemented.
- partition
Defaults to "year". If NULL, the parquet file is not partitioned. If set to "year", the parquet file is partitioned by year. If a year column does not exist, it is created on the assumption that a date column exists and is (or can be coerced to) a vector of class Date.
- token
Defaults to "full_text", which does not tokenise the text column. If different from "full_text", it is passed to tidytext::unnest_tokens() (see its help for details). Accepted values include "words", "sentences", and "paragraphs".
- path
Defaults to NULL. If NULL, path is set to the project/website/export/dataset/file_format folder.
- ...
Passed to cas_get_db_file().
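The effect of the keep argument can be illustrated with a minimal base R sketch. This is not castarter's implementation: prune_old_files() is a hypothetical helper introduced only for this illustration, it assumes that "most recent" is judged by file modification time, and it works on plain files in a temporary folder rather than on a real corpus folder.

```r
# Hypothetical helper, not part of castarter: keep the `keep` most recently
# modified files in `folder` and delete all the others.
prune_old_files <- function(folder, keep = 1) {
  files <- list.files(folder, full.names = TRUE)
  info <- file.info(files)
  # Order files from newest to oldest by modification time
  ordered <- files[order(info$mtime, decreasing = TRUE)]
  # Everything after the first `keep` entries is marked for deletion
  to_delete <- ordered[seq_along(ordered) > keep]
  file.remove(to_delete)
  invisible(to_delete)
}

# Demo in a fresh temporary folder with three empty placeholder files
demo_folder <- tempfile("corpus_demo")
dir.create(demo_folder)
demo_files <- file.path(
  demo_folder,
  c("corpus_a.parquet", "corpus_b.parquet", "corpus_c.parquet")
)
file.create(demo_files)

# Set modification times one day apart so the ordering is unambiguous
base_time <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
for (i in seq_along(demo_files)) {
  Sys.setFileTime(demo_files[i], base_time + (i - 1) * 86400)
}

deleted <- prune_old_files(demo_folder, keep = 1)
remaining <- list.files(demo_folder)
print(remaining) # only the newest file, corpus_c.parquet, is left
```

With keep = 1 the two older files are removed and only the newest one survives. A larger keep value would retain more of the recent files.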