
Export the textual dataset for the current website


  corpus = NULL,
  arrange_by = NULL,
  to_lower = FALSE,
  drop_na = TRUE,
  date = date,
  text = text,
  tif_compliant = TRUE,
  file_format = "parquet",
  partition = "year",
  token = "full_text",
  corpus_folder = "corpus",
  path = NULL,
  db_connection = NULL,
  db_folder = NULL,



corpus
Defaults to NULL. If NULL, the corpus is retrieved from the current website with cas_read_db_contents_data(). If given, it is expected to be a data frame with a corresponding structure.


arrange_by
Defaults to NULL. If given, expected to be an unquoted column name, such as date or id.


to_lower
Defaults to FALSE. Whether to convert tokens to lowercase. Passed to tidytext::unnest_tokens() when token is not "full_text".


drop_na
Defaults to TRUE. If TRUE, items that have NA in their text or date columns are dropped. This is often useful, because such items frequently have other issues or cause inconsistencies in further analyses.


date
Unquoted date column, defaults to date.


text
Unquoted text column, defaults to text. If tif_compliant is set to TRUE, the column is renamed to "text" even if it originally had a different name.


tif_compliant
Defaults to TRUE. If TRUE, ensures that the first column is a character vector named "doc_id" and that the second column is a character vector named "text". See the Text Interchange Format (tif) documentation for details.
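
The column layout described above can be illustrated with a minimal base R sketch. The input data frame, its id and contents columns, and the choice to derive doc_id from id are all assumptions made for illustration; the function itself may build doc_id differently.

```r
# Hypothetical input: an id column, a date column, and a text column
# that is not yet named "text"
corpus_df <- data.frame(
  id = 1:2,
  date = as.Date(c("2023-01-05", "2023-02-10")),
  contents = c("First article.", "Second article."),
  stringsAsFactors = FALSE
)

# First two columns: character "doc_id" and character "text"
# (assumption: doc_id is taken from the id column)
tif_df <- data.frame(
  doc_id = as.character(corpus_df$id),
  text = as.character(corpus_df$contents),
  stringsAsFactors = FALSE
)

# Remaining columns follow as document-level metadata
other_cols <- setdiff(names(corpus_df), c("id", "contents"))
tif_df <- cbind(tif_df, corpus_df[other_cols])

str(tif_df)
```

Keeping "doc_id" and "text" as the two leading character columns is what lets other text analysis packages that follow the same convention read the data frame without further renaming.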


file_format
Defaults to "parquet". Currently, other options are not implemented.


partition
Defaults to NULL. If NULL, the parquet file is not partitioned. "year" is a common alternative: if set to "year", the parquet file is partitioned by year. If a year column does not exist, it is created from the date column, which must be (or be coercible to) a vector of class Date.
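
As a rough sketch of what partitioning by year involves, the snippet below derives a year column from a date column and, if the arrow package is installed, writes a partitioned parquet dataset. The sample data, the output folder, and the use of arrow::write_dataset() are assumptions for illustration; the function's actual internals may differ.

```r
# Hypothetical corpus with dates stored as character strings
corpus_df <- data.frame(
  doc_id = c("a", "b", "c"),
  text = c("One.", "Two.", "Three."),
  date = c("2021-03-01", "2022-07-15", "2022-12-31"),
  stringsAsFactors = FALSE
)

# Create the year column only if it is missing,
# coercing date to class Date first
if (!"year" %in% names(corpus_df)) {
  corpus_df$year <- as.integer(format(as.Date(corpus_df$date), "%Y"))
}

# Optional: write one parquet partition per year (requires arrow)
if (requireNamespace("arrow", quietly = TRUE)) {
  out_dir <- file.path(tempdir(), "corpus_by_year")
  arrow::write_dataset(
    corpus_df,
    path = out_dir,
    format = "parquet",
    partitioning = "year"
  )
}
```

Partitioning by year places each year's rows in a separate subfolder, so later queries that filter on year can skip the files they do not need.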


token
Defaults to "full_text", which does not tokenise the text column. Any other value is passed to tidytext::unnest_tokens(). Accepted values include "words", "sentences", and "paragraphs". See ?tidytext::unnest_tokens for details.
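
To show what a value other than "full_text" implies, here is a small sketch that calls tidytext::unnest_tokens() directly on a toy data frame. The sample data and the output column name token are assumptions for illustration; how the function wires its own arguments into this call is not shown in this page.

```r
# Hypothetical two-document corpus
corpus_df <- data.frame(
  doc_id = c("a", "b"),
  text = c("First sentence here. Another one follows.", "Short text."),
  stringsAsFactors = FALSE
)

# Runs only if tidytext is installed
if (requireNamespace("tidytext", quietly = TRUE)) {
  tokens_df <- tidytext::unnest_tokens(
    tbl = corpus_df,
    output = token,    # name of the new column (illustrative)
    input = text,      # column to tokenise
    token = "words",   # or "sentences", "paragraphs"
    to_lower = FALSE   # keep original case
  )
  print(tokens_df)
}
```

The result has one row per token rather than one row per document, so the exported dataset grows accordingly when token is not "full_text".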


path
Defaults to NULL. If NULL, path is set to the project/website/export/dataset/file_format folder.


db_connection
Defaults to NULL. If NULL, a local SQLite database is used. If given, must be a connection object or a list with relevant connection settings (see example).


db_folder
Passed to cas_get_db_file().