Currently supports updating only when re-downloading index URLs is expected to bring in new articles. It starts from the first URLs of each index group and continues downloading further index pages as long as new links are found in each page. As soon as a page contains no new links, it stops downloading and moves on to the next index group.
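The stopping rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: `update_index_sketch()`, `index_pages`, and `known_links` are hypothetical names, and the "pages" are hard-coded vectors of links rather than real downloads.

```r
# Hypothetical stand-in for downloaded index pages: each index group holds an
# ordered list of pages, and each page is a character vector of links.
index_pages <- list(
  news = list(
    c("/news/5", "/news/4"),   # page 1: both links are new
    c("/news/3", "/news/2"),   # page 2: only links that are already known
    c("/news/1")               # page 3: never reached
  ),
  press = list(
    c("/press/2", "/press/1")  # page 1: nothing new
  )
)

# Links assumed to be already stored from a previous run.
known_links <- c("/news/3", "/news/2", "/news/1", "/press/2", "/press/1")

update_index_sketch <- function(index_pages, known_links) {
  new_links <- character(0)
  pages_downloaded <- 0L
  for (group in names(index_pages)) {
    for (page_links in index_pages[[group]]) {
      pages_downloaded <- pages_downloaded + 1L
      fresh <- setdiff(page_links, c(known_links, new_links))
      if (length(fresh) == 0) {
        break  # no new link on this page: move on to the next index group
      }
      new_links <- c(new_links, fresh)
    }
  }
  list(new_links = new_links, pages_downloaded = pages_downloaded)
}

result <- update_index_sketch(index_pages, known_links)
result$new_links
result$pages_downloaded
```

In this toy setup the third page of the `news` group is never visited, because the second page already contains no new links.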
Usage
cas_update(
  extract_links_partial,
  extractors,
  post_processing = NULL,
  wait = 3,
  user_agent = NULL,
  ...
)
Arguments
- extract_links_partial
A partial function, typically created with purrr::partial(.f = cas_extract_links), followed by the parameters originally used by cas_extract_links(). See examples.
- extractors
A named list of functions. See examples for details.
- post_processing
Defaults to NULL. If given, it must be a function that takes a data frame as input (logically, a row of the dataset) and returns it with additional or modified columns.
- wait
Defaults to 3. Number of seconds to wait between downloading one page and the next. Can be increased to reduce server load, or set to 0 when this is not an issue.
- user_agent
Defaults to NULL. If given, it is passed to the download method.
- ...
Passed to cas_get_db_file().
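As a rough illustration of the shape of the first three arguments, the sketch below builds each of them without calling the package itself. Everything in it is an assumption made for demonstration: `extract_links_demo()` is a hypothetical stand-in for cas_extract_links(), its `container` and `domain` arguments are invented, and the extractors operate on a plain HTML string rather than on whatever object the package actually passes to them. Only `purrr::partial()` is a real call.

```r
# Hypothetical stand-in for cas_extract_links(); argument names are illustrative.
extract_links_demo <- function(container, domain) {
  paste0("extracting <", container, "> links from ", domain)
}

# A partial function: the original arguments are pre-filled once, so the
# function can later be called without repeating them.
extract_links_partial <- purrr::partial(
  .f = extract_links_demo,
  container = "h2",
  domain = "https://example.com"
)
extract_links_partial()

# A named list of functions: each name becomes a column, and each function
# extracts one field from a page (here, a plain HTML string).
extractors <- list(
  title = function(x) sub(".*<h1>(.*)</h1>.*", "\\1", x),
  text = function(x) sub(".*<p>(.*)</p>.*", "\\1", x)
)

page <- "<h1>Hello</h1><p>Body text</p>"
row <- as.data.frame(
  lapply(extractors, function(f) f(page)),
  stringsAsFactors = FALSE
)

# A post-processing function: takes a one-row data frame and returns it with
# an additional column.
post_processing <- function(df) {
  df$title_length <- nchar(df$title)
  df
}

row_processed <- post_processing(row)
row_processed
```

The named list keeps extraction logic per field in one place, while the post-processing step is free to derive new columns from fields that have already been extracted.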