
Create a data frame of files that have not yet been downloaded


  urls = NULL,
  index = FALSE,
  index_group = NULL,
  desc_id = FALSE,
  batch = NULL,
  create_folder_if_missing = NULL,
  custom_folder = NULL,
  custom_path = NULL,
  file_format = "html",
  db_connection = NULL,
  download_again = FALSE,
  download_again_if_status_is_not = NULL,



urls: Defaults to NULL. If given, it should be a data frame with at least two columns, named id and url. If not given, an attempt is made to load it from the local database.
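The following is a minimal base R sketch of the input shape expected for urls. It assumes only what is stated above: a data frame with at least the columns id and url, with extra columns allowed. The helper check_urls_df() is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package API:
# checks that `urls` is a data frame with at least `id` and `url` columns.
check_urls_df <- function(urls) {
  if (!is.data.frame(urls)) {
    stop("`urls` must be a data frame")
  }
  missing_cols <- setdiff(c("id", "url"), colnames(urls))
  if (length(missing_cols) > 0) {
    stop("`urls` is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Example input: `id` and `url` are required, other columns are allowed.
urls <- data.frame(
  id = 1:3,
  url = c(
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c"
  ),
  title = c("A", "B", "C")
)

check_urls_df(urls)
```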


index: Logical, defaults to FALSE. If TRUE, downloaded files are considered index files; otherwise, they are considered contents files. See the Readme for a more extensive explanation.


desc_id: Logical, defaults to FALSE. If TRUE, results are returned with the highest id first.
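The effect of desc_id can be pictured as a sort on the id column. This base R sketch uses a hypothetical stand-in data frame; treating the FALSE case as ascending order is an assumption, since the documentation only specifies the TRUE case.

```r
# Hypothetical stand-in for the rows returned by the function.
files <- data.frame(
  id = c(2L, 5L, 3L),
  url = c(
    "https://example.com/2",
    "https://example.com/5",
    "https://example.com/3"
  )
)

desc_id <- TRUE

# desc_id = TRUE: highest id first.
# desc_id = FALSE: ascending id (assumption, not stated in the docs).
files <- files[order(files$id, decreasing = desc_id), , drop = FALSE]
files$id
```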


batch: An integer, defaults to NULL. If not given, the database is checked for previous downloads; if any are found, the current batch is by default one unit higher than the highest batch number recorded in the database.
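The default batch numbering described above can be sketched in base R as follows. The helper next_batch() and its previous_batches argument are hypothetical; starting at 1 when no previous downloads exist is an assumption, as the documentation does not state what happens in that case.

```r
# Hypothetical helper illustrating the default batch numbering.
# `previous_batches`: batch numbers already recorded for earlier downloads.
next_batch <- function(previous_batches, batch = NULL) {
  # An explicitly given batch is used as is.
  if (!is.null(batch)) {
    return(as.integer(batch))
  }
  # Assumption: with no previous downloads, numbering starts at 1.
  if (length(previous_batches) == 0) {
    return(1L)
  }
  # Otherwise: one unit higher than the highest batch found.
  as.integer(max(previous_batches)) + 1L
}

next_batch(c(1L, 2L, 2L))          # returns 3
next_batch(integer(0))             # returns 1
next_batch(c(1L, 2L), batch = 7L)  # returns 7
```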


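download_again_if_status_is_not: Defaults to NULL. If given, it must be a status code given as an integer, typically 200L, or a vector of such codes, such as c(200L, 404L). Files whose previous download returned a status not among these codes are downloaded again.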
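A minimal base R sketch of how such a status filter selects files to download again, assuming a hypothetical download log with one row per id and columns id and status. The real package may store and compare statuses differently.

```r
# Hypothetical download log: one row per previously downloaded id.
download_log <- data.frame(
  id     = c(1L, 2L, 3L, 4L),
  status = c(200L, 404L, 500L, 200L)
)

download_again_if_status_is_not <- c(200L, 404L)

# Ids whose recorded status is NOT among the accepted codes
# are selected to be downloaded again.
to_retry <- download_log$id[
  !download_log$status %in% download_again_if_status_is_not
]
to_retry  # returns 3
```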


...: Additional arguments passed on to cas_get_urls_df and cas_get_base_folder.


Defaults to NULL. If given, all other parameters and settings are ignored, and folder is set to this value.


Returns a data frame with four columns: id, url, path, and type.
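The core idea, keeping only urls whose id has not been downloaded yet and returning id, url, path, and type, can be sketched in base R as below. Everything except the four column names is hypothetical: the list of already downloaded ids, the base folder, the file naming scheme, and the use of type to record "index" or "contents" are assumptions for illustration, not the package's implementation.

```r
# Hypothetical inputs.
urls <- data.frame(
  id  = 1:4,
  url = paste0("https://example.com/page/", 1:4)
)
downloaded_ids <- c(1L, 3L)                  # ids already downloaded (hypothetical)
folder <- file.path(tempdir(), "contents")   # hypothetical base folder
file_format <- "html"
index <- FALSE

# Keep only rows whose id has not been downloaded yet (an anti-join on id).
files <- urls[!urls$id %in% downloaded_ids, c("id", "url")]

# Hypothetical path scheme: one file per id, named after the id.
files$path <- file.path(folder, paste0(files$id, ".", file_format))

# Assumption: `type` records whether these are index or contents files.
files$type <- if (index) "index" else "contents"

rownames(files) <- NULL
files
```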