Searches common locations (namely, example.com/sitemap.xml and
example.com/sitemap_index.xml), then robots.txt, and returns the URL of
the sitemap, along with the contents of the sitemap itself, if found.
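The lookup order described above can be sketched in base R. This is an illustrative sketch only, not castarter's actual implementation: common_sitemap_urls() and robots_sitemap_urls() are hypothetical helpers, and the robots.txt content is passed in as a character vector so that nothing is downloaded.

```r
# Hypothetical helpers for illustration; they are not part of castarter.

# Build the two common sitemap locations for a base URL.
common_sitemap_urls <- function(base_url) {
  base_url <- sub("/+$", "", base_url) # drop trailing slashes
  paste0(base_url, c("/sitemap.xml", "/sitemap_index.xml"))
}

# Extract the URLs declared on "Sitemap:" lines of a robots.txt file,
# given its content as a character vector with one element per line.
robots_sitemap_urls <- function(robots_lines) {
  pattern <- "^[[:space:]]*sitemap[[:space:]]*:[[:space:]]*"
  hits <- grep(pattern, robots_lines, ignore.case = TRUE, value = TRUE)
  trimws(sub(pattern, "", hits, ignore.case = TRUE))
}

# Candidates to try first
common_sitemap_urls("https://example.com/")

# Fallback: look for a declaration in robots.txt
robots_sitemap_urls(c(
  "User-agent: *",
  "Disallow: /admin",
  "Sitemap: https://example.com/sitemap_index.xml"
))
```

A real lookup would request each candidate in turn and read robots.txt only if none of them responds successfully; the sketch leaves out the HTTP requests on purpose.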
Usage
cas_get_sitemap(
domain = NULL,
sitemap_url = NULL,
check_robots = TRUE,
check_common = TRUE,
read_from_db = TRUE,
write_to_db = FALSE,
db_connection = NULL,
disconnect_db = FALSE,
...
)
Arguments
- domain
Defaults to NULL, but required unless sitemap_url is given. Expected to be a full domain name. If the input does not start with http, then https:// is prepended automatically.
- sitemap_url
Defaults to NULL. If given, domain is ignored.
- db_connection
Defaults to NULL. If NULL, uses a local SQLite database. If given, must be a connection object or a list with relevant connection settings (see example).
- ...
Passed to cas_get_db_file().
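The rule stated for domain (prepend https:// when the input does not start with http) can be written as a minimal sketch. normalise_domain() is a hypothetical helper, not a castarter function, and the package's internal handling may differ.

```r
# Hypothetical helper for illustration; it is not part of castarter.
normalise_domain <- function(domain) {
  if (is.null(domain)) {
    stop("`domain` is required unless `sitemap_url` is given.")
  }
  # Prepend the scheme only when the input does not already start with "http"
  if (!startsWith(domain, "http")) {
    domain <- paste0("https://", domain)
  }
  domain
}

normalise_domain("www.europeandatajournalism.eu")
#> [1] "https://www.europeandatajournalism.eu"
```

Under this sketch, an input that already carries a scheme, such as "http://example.com", is returned unchanged.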
Value
A data frame including a sitemap_url column, the response as an
httr2 object, and the body of the XML.
Examples
if (interactive()) {
cas_get_sitemap(domain = "https://www.europeandatajournalism.eu/")
}
#> ℹ Folder /tmp/RtmpSOhy4K/R/castarter_data for storing project and website files
#> created.
#> # A tibble: 1 × 1
#> sitemap_url
#> <chr>
#> 1 https://www.europeandatajournalism.eu/sitemap_index.xml