Consider using long waiting times and a high number of retries. Retries are
handled gracefully through httr::RETRY, which respects the waiting time given
by the server when it returns error 429 "too many requests". The process is
still likely to take a long time.
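To get a rough sense of how long a run may take, the arithmetic can be sketched as follows. This is only a back-of-the-envelope lower bound: it assumes that `wait` is a pause in seconds applied once per URL, and it ignores request time, retries and follow-up checks. The number of URLs is a made-up figure for illustration.

```r
# Hypothetical number of URLs to archive (illustrative only).
n_urls <- 1000

# Default value of `wait`, assumed to be a per-URL pause in seconds.
wait <- 32

# Lower bound on total run time, ignoring requests, retries and checks.
min_seconds <- n_urls * wait
min_hours <- min_seconds / 3600

round(min_hours, 1) # 8.9
```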
Usage
cas_ia_save(
url = NULL,
wait = 32,
retry_times = 3,
pause_base = 16,
pause_cap = 1024,
pause_min = 64,
only_if_unavailable = TRUE,
ia_check = TRUE,
ia_check_wait = 2,
db_connection = NULL,
check_db = TRUE,
write_db = TRUE,
...
)
Arguments
- url
A character vector of length one, a URL.
- wait
Defaults to 32. I have found no information online about what wait time Archive.org itself considers suitable, but I've noticed that with a wait time shorter than 10 seconds the server stops returning positive replies very soon.
- retry_times
Defaults to 3. Number of times to retry download in case of errors.
- pause_base, pause_cap
This method uses exponential back-off with full jitter - this means that each request will randomly wait between pause_min and pause_base * 2 ^ attempt seconds, up to a maximum of pause_cap seconds.
- pause_min
Minimum time to wait in the backoff; generally only necessary if you need pauses less than one second (which may not be kind to the server, use with caution!).
- only_if_unavailable
Defaults to TRUE. If TRUE, checks for availability of URLs before attempting to save them.
- ia_check
Defaults to TRUE. If TRUE, checks the URL again after saving it and records the result in the local database.
- ia_check_wait
Defaults to 2, passed to cas_ia_check(). Can generally be kept low, as this is a light API.
- check_db
Defaults to TRUE. If TRUE, checks if the given URL has already been checked in the local database, and queries APIs only for URLs that have not been previously checked.
- write_db
Defaults to TRUE. If TRUE, writes result to a local database.
- ...
Passed to cas_get_db_file().
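The back-off behaviour described for pause_base, pause_cap and pause_min can be illustrated with a minimal sketch. The two helper functions below are hypothetical and written only for illustration; they are not part of this package, and they are not httr's internal code. The sketch assumes that attempts are counted from 1 and that the random wait is drawn uniformly and then raised to pause_min when it falls below it.

```r
# Hypothetical helper: upper bound (in seconds) of the random wait
# for a given attempt, using this function's default values.
backoff_upper_bound <- function(attempt, pause_base = 16, pause_cap = 1024) {
  pmin(pause_cap, pause_base * 2^attempt)
}

# Hypothetical helper: draw a single wait time for a given attempt.
backoff_wait <- function(attempt,
                         pause_base = 16,
                         pause_cap = 1024,
                         pause_min = 64) {
  upper <- backoff_upper_bound(attempt, pause_base, pause_cap)
  # Draw uniformly up to the upper bound, then enforce the minimum pause.
  max(pause_min, stats::runif(1, min = 0, max = upper))
}

attempts <- 1:3
upper_bounds <- backoff_upper_bound(attempts)
upper_bounds # 32 64 128

waits <- vapply(attempts, backoff_wait, numeric(1))
```

Under these assumptions, and with the defaults shown in Usage, the upper bound for the first two attempts does not exceed pause_min, so those waits are always 64 seconds; only from the third attempt onwards does the random component have any effect.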
Examples
if (FALSE) { # \dontrun{
if (interactive()) {
# Once the usual parameters are set with `cas_set_options()` it is generally
# ok to just let it get urls from the database and let it run without any
# additional parameter.
cas_ia_save()
}
} # }