Create a data frame of files that have not yet been downloaded

Usage

cas_get_files_to_download(
  urls = NULL,
  index = FALSE,
  index_group = NULL,
  desc_id = FALSE,
  batch = NULL,
  create_folder_if_missing = NULL,
  custom_folder = NULL,
  custom_path = NULL,
  file_format = "html",
  db_connection = NULL,
  download_again = FALSE,
  download_again_if_status_is_not = NULL,
  ...
)

Arguments

urls

Defaults to NULL. If given, it should be a data frame with at least two columns named id and url. If not given, an attempt will be made to load it from the local database.
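As a minimal sketch of the expected input shape, the base R snippet below builds a data frame with the two required columns and checks that they are present. The URLs are placeholders and the object names `urls_df` and `has_required_columns` are hypothetical; neither comes from the package.

```r
# Hypothetical input: the URLs are placeholders, not taken from any real project.
urls_df <- data.frame(
  id = 1:3,
  url = c(
    "https://example.com/page/1",
    "https://example.com/page/2",
    "https://example.com/page/3"
  ),
  stringsAsFactors = FALSE
)

# Check for the two columns the `urls` argument is documented to require.
has_required_columns <- all(c("id", "url") %in% names(urls_df))
has_required_columns
```

A data frame of this shape is what would be supplied as `urls = urls_df`; additional columns are allowed, since only id and url are required.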

index

Logical, defaults to FALSE. If TRUE, downloaded files will be considered index files. If not, they will be considered contents files. See Readme for a more extensive explanation.

desc_id

Logical, defaults to FALSE. If TRUE, results are returned with highest id first.

batch

An integer, defaults to NULL. If not given, a check is performed in the database to find if previous downloads have taken place. If so, by default, the current batch will be one unit higher than the highest batch number found in the database.
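The default batch rule described above can be illustrated with a short base R sketch. `next_batch()` is a hypothetical helper written for this example, not a function of the package, and numbering the first batch as 1 when no previous downloads exist is an assumption.

```r
# Hypothetical helper illustrating the documented default: one unit higher
# than the highest batch number recorded so far.
next_batch <- function(previous_batches) {
  if (length(previous_batches) == 0) {
    # Assumption: with no previous downloads, the first batch is numbered 1.
    return(1L)
  }
  max(as.integer(previous_batches)) + 1L
}

next_batch(c(1L, 2L, 2L, 3L)) # highest recorded batch is 3, so the result is 4
next_batch(integer(0))        # no previous downloads
```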

download_again_if_status_is_not

Defaults to NULL. If given, it must be a status code given as an integer, typically 200L, or a vector of status codes such as c(200L, 404L).
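One way to read this argument is as a filter on previously recorded download attempts: files whose status is not among the given codes are queued again. The base R sketch below illustrates that reading only; the `previous_downloads` data frame and its `status` column are hypothetical and do not reflect the package's actual database schema.

```r
# Hypothetical record of earlier download attempts.
previous_downloads <- data.frame(
  id = 1:4,
  status = c(200L, 404L, 500L, 403L)
)

# Status codes treated as final, as in c(200L, 404L) above.
accepted_status <- c(200L, 404L)

# Keep only the rows whose status is NOT among the accepted codes.
to_download_again <- previous_downloads[
  !(previous_downloads$status %in% accepted_status),
]

to_download_again$id
```

Here the items with status 500 and 403 are selected for another attempt, while those with status 200 or 404 are left alone.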

...

Arguments passed on to cas_get_urls_df, cas_get_base_folder

custom_path

Defaults to NULL. If given, all other parameters and settings are ignored, and folder is set to this value.

Value

A data frame with four columns: id, url, path, and type.