Downloads files systematically, and stores details about the download in a local database
Source: R/cas_download.R
Usage
cas_download(
  download_df = NULL,
  index = FALSE,
  index_group = NULL,
  file_format = "html",
  overwrite_file = FALSE,
  create_folder_if_missing = NULL,
  ignore_id = TRUE,
  wait = 1,
  pause_base = 2,
  pause_cap = 256,
  pause_min = 4,
  sample = FALSE,
  retry_times = 3,
  terminate_on = NULL,
  user_agent = NULL,
  download_again_if_status_is_not = NULL,
  ...
)
Arguments
- index
Logical, defaults to FALSE. If TRUE, downloaded files will be considered index files. If not, they will be considered contents files. See Readme for a more extensive explanation.
- overwrite_file
Logical, defaults to FALSE. If TRUE, files are downloaded again even if already present, overwriting previously downloaded items.
- wait
Defaults to 1. Number of seconds to wait between downloading one page and the next. Can be increased to reduce server load, or can be set to 0 when this is not an issue.
- sample
Defaults to FALSE. If TRUE, the download order is randomised. If a numeric is given, the download order is randomised and at most the given number of items is downloaded.
- retry_times
Defaults to 3. Number of times to retry download in case of errors.
- user_agent
Defaults to NULL. If given, passed to download method.
- ...
Passed to cas_get_db_file().
- urls_df
A data frame with at least two columns named id and url. Typically generated with cas_build_urls() for index files. If a character vector is given instead, identifiers will be given automatically.
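Examples
The sketch below uses base R only and does not call cas_download() itself. It illustrates the input shape described for urls_df (a data frame with id and url columns) and one plausible reading of the sample argument. The URLs, the sequential identifiers, and the helper pick_rows() are hypothetical and introduced purely for illustration; how the package actually assigns identifiers or randomises the order internally is not stated above.

```r
# Hypothetical URLs, used only for illustration.
urls <- c(
  "https://example.com/archive/page-1",
  "https://example.com/archive/page-2",
  "https://example.com/archive/page-3",
  "https://example.com/archive/page-4"
)

# A data frame with the two required columns, id and url.
# Sequential identifiers are an assumption made for this sketch.
urls_df <- data.frame(
  id = seq_along(urls),
  url = urls,
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented behaviour of `sample`:
# FALSE keeps the original order, TRUE shuffles the rows, and a number
# shuffles the rows and then keeps at most that many of them.
pick_rows <- function(df, sample = FALSE) {
  if (isFALSE(sample)) {
    return(df)
  }
  shuffled <- df[base::sample.int(nrow(df)), , drop = FALSE]
  if (is.numeric(sample)) {
    shuffled <- utils::head(shuffled, n = sample)
  }
  shuffled
}

set.seed(1)
subset_df <- pick_rows(urls_df, sample = 2)
nrow(subset_df) # 2
```

Passing a number larger than the number of rows simply returns all rows in random order, which matches the "at most" wording in the description of sample.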