
Downloads files systematically and stores details about each download in a local database


  download_df = NULL,
  index = FALSE,
  index_group = NULL,
  file_format = "html",
  overwrite_file = FALSE,
  create_folder_if_missing = NULL,
  wait = 1,
  pause_base = 2,
  pause_cap = 256,
  pause_min = 4,
  sample = FALSE,
  retry_times = 8,
  terminate_on = 404,
  user_agent = NULL,
  download_again_if_status_is_not = NULL,



index

Logical, defaults to FALSE. If TRUE, downloaded files are treated as index files; otherwise, they are treated as contents files. See the Readme for a more extensive explanation.


overwrite_file

Logical, defaults to FALSE. If TRUE, files are downloaded again even if already present, overwriting the previously downloaded copies.
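The effect of overwrite_file can be illustrated with a minimal base R sketch. The helper `paths_to_download()` is hypothetical and not part of the package; it only checks the file system, whereas the real function may also consult its local database.

```r
# Hypothetical helper: decide which target paths still need downloading.
paths_to_download <- function(paths, overwrite_file = FALSE) {
  if (isTRUE(overwrite_file)) {
    return(paths)                 # download everything again
  }
  paths[!file.exists(paths)]      # skip files already on disk
}

# Demonstration with temporary files
existing <- tempfile(fileext = ".html")
absent <- tempfile(fileext = ".html")
writeLines("<html></html>", existing)

paths_to_download(c(existing, absent))                         # only `absent`
paths_to_download(c(existing, absent), overwrite_file = TRUE)  # both paths
```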


wait

Defaults to 1. Number of seconds to wait between downloading one page and the next. Can be increased to reduce server load, or can be set to 0 when this is not an issue.


sample

Defaults to FALSE. If TRUE, the download order is randomised. If a number is given, the download order is randomised and at most that many items are downloaded.
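The three cases of sample (FALSE, TRUE, or a number) can be sketched in base R as follows. `apply_sample()` is a hypothetical helper written for illustration only; the package's internal implementation may differ.

```r
# Hypothetical helper mirroring the documented behaviour of `sample`.
apply_sample <- function(ids, sample = FALSE) {
  if (isFALSE(sample)) {
    return(ids)                                  # keep the original order
  }
  # Random permutation of the items (also safe for zero-length input)
  shuffled <- ids[order(stats::runif(length(ids)))]
  if (is.numeric(sample)) {
    shuffled <- utils::head(shuffled, sample)    # keep at most `sample` items
  }
  shuffled
}

ids <- 1:10
apply_sample(ids)                 # unchanged order
apply_sample(ids, sample = TRUE)  # all ten ids, shuffled
apply_sample(ids, sample = 3)     # three ids, picked at random
```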


retry_times

Defaults to 8. Number of times to retry the download in case of errors.
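The pause_base, pause_cap and pause_min arguments in the usage listing are not described on this page. Their names match the exponential back-off arguments of httr::RETRY(), so the sketch below assumes similar semantics: before each retry, the pause is drawn at random up to pause_base * 2^attempt seconds, capped at pause_cap and never shorter than pause_min. This is an assumption for illustration, and both helpers are hypothetical.

```r
# Upper bound (seconds) of the pause before a given retry attempt,
# assuming exponential back-off capped at `pause_cap`.
backoff_upper <- function(attempt, pause_base = 2, pause_cap = 256) {
  pmin(pause_cap, pause_base * 2^attempt)
}

# One random pause: uniform between 0 and the upper bound,
# but never shorter than `pause_min`.
backoff_pause <- function(attempt, pause_base = 2, pause_cap = 256, pause_min = 4) {
  upper <- backoff_upper(attempt, pause_base, pause_cap)
  max(pause_min, stats::runif(1, min = 0, max = upper))
}

# Upper bounds for eight retries with the defaults shown in the usage listing
backoff_upper(1:8)
```

Under these assumptions, the upper bound doubles with each attempt until it reaches pause_cap, so late retries wait far longer than early ones.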


user_agent

Defaults to NULL. If given, it is passed to the download method.


Passed to cas_get_db_file().


download_df

A data frame with at least two columns named id and url. Typically generated with cas_build_urls() for index files. If a character vector is given instead, identifiers are assigned automatically.
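The expected shape of download_df can be sketched in base R. `as_download_df()` is a hypothetical helper, and numbering identifiers sequentially is an assumption about how automatic identifiers are assigned.

```r
# Hypothetical helper: coerce the input into a data frame with `id` and `url`.
as_download_df <- function(x) {
  if (is.character(x)) {
    # Assumption: automatic identifiers are sequential integers
    return(data.frame(id = seq_along(x), url = x, stringsAsFactors = FALSE))
  }
  stopifnot(is.data.frame(x), all(c("id", "url") %in% names(x)))
  x
}

urls <- c("https://example.com/page/1", "https://example.com/page/2")
as_download_df(urls)
```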