Create a data frame with not yet downloaded files
Source: R/cas_download.R
Usage
cas_get_files_to_download(
urls = NULL,
index = FALSE,
index_group = NULL,
ignore_id = TRUE,
desc_id = FALSE,
batch = NULL,
create_folder_if_missing = NULL,
custom_folder = NULL,
custom_path = NULL,
file_format = "html",
db_connection = NULL,
download_again = FALSE,
download_again_if_status_is_not = NULL,
...
)
Arguments
- urls
Defaults to NULL. If given, it should be a data frame with at least two columns named id and url. If not given, an attempt will be made to load it from the local database.
- index
Logical, defaults to FALSE. If TRUE, downloaded files will be considered index files. If not, they will be considered contents files. See Readme for a more extensive explanation.
- desc_id
Logical, defaults to FALSE. If TRUE, results are returned with highest id first.
- batch
An integer, defaults to NULL. If not given, a check is performed in the database to find if previous downloads have taken place. If so, by default, the current batch will be one unit higher than the highest batch number found in the database.
- file_format
Defaults to html. Used for storing files in dedicated folders, but also for determining processing options. For example, if a sitemap is downloaded as an index with file_format set to xml, it will be processed accordingly. If it is stored as xml.gz, it will be automatically decompressed for correct processing.
- download_again_if_status_is_not
Defaults to NULL. If given, it must be a status code as an integer, typically 200L, or a vector of status codes such as c(200L, 404L).
- ...
Arguments passed on to cas_get_urls_df, cas_get_base_folder
  - custom_path
  Defaults to NULL. If given, all other parameters and settings are ignored, and this value is returned.
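
The selection logic described by the arguments above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: files_to_download_sketch() and the previous_downloads data frame (with columns id, batch and status) are hypothetical, the example URLs are made up, and starting from batch 1 when nothing has been downloaded yet is an assumption. The reading of download_again_if_status_is_not (an id is downloaded again unless an earlier attempt returned one of the given status codes) is inferred from the argument name.

```r
# Hypothetical helper, not part of the package: it mirrors the documented
# argument semantics on plain data frames instead of the local database.
files_to_download_sketch <- function(urls,
                                     previous_downloads,
                                     desc_id = FALSE,
                                     batch = NULL,
                                     download_again_if_status_is_not = NULL) {
  # urls must have at least the columns id and url
  stopifnot(all(c("id", "url") %in% names(urls)))

  if (is.null(download_again_if_status_is_not)) {
    # Any earlier attempt counts as downloaded
    done_ids <- previous_downloads$id
  } else {
    # Only attempts that returned an accepted status count as downloaded
    done_ids <- previous_downloads$id[
      previous_downloads$status %in% download_again_if_status_is_not
    ]
  }
  pending <- urls[!(urls$id %in% done_ids), , drop = FALSE]

  if (is.null(batch)) {
    batch <- if (nrow(previous_downloads) == 0) {
      1L # assumption: the first batch is numbered 1
    } else {
      # one unit higher than the highest batch number found so far
      max(previous_downloads$batch) + 1L
    }
  }

  # desc_id = TRUE returns the highest id first
  pending <- pending[order(pending$id, decreasing = desc_id), , drop = FALSE]
  pending$batch <- rep(as.integer(batch), nrow(pending))
  rownames(pending) <- NULL
  pending
}

# Made-up input data for illustration
urls <- data.frame(
  id = 1:4,
  url = paste0("https://example.com/page/", 1:4),
  stringsAsFactors = FALSE
)
previous_downloads <- data.frame(
  id = c(1L, 2L, 3L),
  batch = c(1L, 1L, 2L),
  status = c(200L, 404L, 500L)
)

# Only id 4 has never been attempted; it is assigned to batch 3
not_yet_downloaded <- files_to_download_sketch(urls, previous_downloads)

# Retry everything that did not return status 200, highest id first
retry_non_200 <- files_to_download_sketch(
  urls,
  previous_downloads,
  desc_id = TRUE,
  download_again_if_status_is_not = 200L
)
```

In this sketch, passing c(200L, 404L) instead of 200L would also treat id 2 as done, leaving only ids 3 and 4 to be fetched.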