Downloads one file at a time with chromote

Usage

cas_download_chromote(
  download_df = NULL,
  index = FALSE,
  index_group = NULL,
  overwrite_file = FALSE,
  ignore_id = TRUE,
  wait = 1,
  db_connection = NULL,
  sample = FALSE,
  file_format = "html",
  download_again = FALSE,
  disconnect_db = FALSE,
  ...
)

Arguments

download_df

A data frame with four columns: id, url, path, type.
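As a minimal sketch of the expected input, the data frame below has the four columns named above. The URLs, file paths, and the "contents" value in the type column are assumptions made for illustration; they are not taken from the package documentation.

```r
# Illustrative input only: URLs, paths and `type` values are made up.
download_df <- data.frame(
  id = c(1, 2),
  url = c("https://example.com/page-1", "https://example.com/page-2"),
  path = file.path(tempdir(), "html", c("1.html", "2.html")),
  type = c("contents", "contents"),
  stringsAsFactors = FALSE
)

# Check that the four documented columns are present, in order.
stopifnot(identical(names(download_df), c("id", "url", "path", "type")))

# The call itself needs castarter, chromote and a local Chrome/Chromium
# install, so it is left commented out here:
# castarter::cas_download_chromote(download_df = download_df, wait = 1)
```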

index

Logical, defaults to FALSE. If TRUE, downloaded files will be considered index files. If not, they will be considered contents files. See Readme for a more extensive explanation.

overwrite_file

Logical, defaults to FALSE.

wait

Defaults to 1. Number of seconds to wait between downloading one page and the next. Can be increased to reduce server load, or can be set to 0 when this is not an issue.
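The pause described above can be pictured with the sketch below. process_with_pause() and its fetch argument are hypothetical stand-ins, not castarter functions, and skipping the pause after the last item is one reading of "between one page and the next".

```r
# Hypothetical helper, not part of castarter: handle items one at a time,
# pausing `wait` seconds between consecutive items.
process_with_pause <- function(urls, wait = 1, fetch = function(url) nchar(url)) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    # `fetch` stands in for the real download step
    results[[i]] <- fetch(urls[[i]])
    # pause only between items, not after the last one
    if (i < length(urls)) {
      Sys.sleep(wait)
    }
  }
  results
}

urls <- c("https://example.com/a", "https://example.com/bb")
lengths_seen <- process_with_pause(urls, wait = 0)
```

With wait = 0 the loop runs without pausing, which matches the case where server load is not a concern.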

db_connection

Defaults to NULL. If NULL, a local SQLite database is used. If given, must be a connection object or a list with the relevant connection settings (see example).

sample

Defaults to FALSE. If TRUE, the download order is randomised. If a numeric is given, the download order is randomised and at most the given number of items is downloaded.
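The three cases can be sketched in base R as follows. apply_sample() is a hypothetical helper written only to illustrate the documented behaviour; it is not how castarter implements the argument.

```r
# Hypothetical helper, not part of castarter: mimics the documented
# behaviour of `sample` on a data frame of pending downloads.
apply_sample <- function(download_df, sample = FALSE) {
  n <- nrow(download_df)
  if (isTRUE(sample)) {
    # TRUE: keep every row, shuffle the order
    download_df[base::sample.int(n), , drop = FALSE]
  } else if (is.numeric(sample)) {
    # numeric: shuffle, then keep at most `sample` rows
    keep <- min(n, as.integer(sample))
    download_df[base::sample.int(n, size = keep), , drop = FALSE]
  } else {
    # FALSE: leave the order untouched
    download_df
  }
}

pending <- data.frame(id = 1:10, url = paste0("https://example.com/", 1:10))
three_rows <- apply_sample(pending, sample = 3)
all_rows <- apply_sample(pending, sample = 50)
shuffled <- apply_sample(pending, sample = TRUE)
```

A numeric value larger than the number of pending items returns all of them, in random order.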

file_format

Defaults to "html". Determines the dedicated folder in which files are stored, as well as how they are processed. For example, a sitemap downloaded as an index with file_format set to "xml" is processed as XML. If it is stored as "xml.gz", it is decompressed automatically so that it can be processed correctly.
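To illustrate what automatic decompression of a gzip-compressed sitemap involves, the base R sketch below writes a small gzip-compressed XML file and reads it back. The sitemap content and file name are made up, and this is not necessarily how castarter handles such files internally.

```r
# Write a tiny gzip-compressed sitemap to a temporary file (made-up content).
sitemap_xml <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<urlset><url><loc>https://example.com/a</loc></url></urlset>'
)
gz_path <- file.path(tempdir(), "sitemap.xml.gz")
con <- gzfile(gz_path, open = "wt")
writeLines(sitemap_xml, con)
close(con)

# Reading through gzfile() decompresses transparently.
con <- gzfile(gz_path, open = "rt")
restored <- readLines(con)
close(con)
```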

disconnect_db

Logical, defaults to FALSE (as in the usage above). If FALSE, leaves the connection to the database open.

...

Passed to cas_get_db_file().