Searches in common locations (namely, example.com/sitemap.xml and example.com/sitemap_index.xml) and then in robots.txt, and returns a URL to the sitemap, along with the contents of the sitemap itself, if found.
Usage
cas_get_sitemap(
domain = NULL,
sitemap_url = NULL,
check_robots = TRUE,
check_common = TRUE,
read_from_db = TRUE,
write_to_db = FALSE,
db_connection = NULL,
disconnect_db = FALSE,
...
)
Arguments
- domain
Defaults to NULL, but required unless sitemap_url is given. Expected to be a full domain name. If the input does not start with http, then https:// is prepended automatically.
- sitemap_url
Defaults to NULL. If given, domain is ignored.
- db_connection
Defaults to NULL. If NULL, uses local SQLite database. If given, must be a connection object or a list with relevant connection settings (see example).
- ...
Passed to cas_get_db_file().
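The handling of domain described above can be sketched in base R. This is only an illustration of the documented behaviour, not the package's actual code: the helper name candidate_sitemap_urls() is hypothetical, and stripping trailing slashes before joining paths is an assumption made here so that the resulting URLs contain a single "/".

```r
# Hypothetical helper, not part of castarter: lists the locations that
# would be checked for a given domain, following the description above.
candidate_sitemap_urls <- function(domain) {
  # Prepend https:// when the input does not start with "http"
  if (!startsWith(domain, "http")) {
    domain <- paste0("https://", domain)
  }
  # Assumption: drop trailing slashes so paths are joined with one "/"
  domain <- sub("/+$", "", domain)
  c(
    sitemap = paste0(domain, "/sitemap.xml"),
    sitemap_index = paste0(domain, "/sitemap_index.xml"),
    robots = paste0(domain, "/robots.txt")
  )
}

candidate_sitemap_urls("example.com")
```

The first two entries correspond to the common locations tried when check_common is TRUE, and the last to the robots.txt lookup presumably controlled by check_robots.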
Value
A data frame, including a sitemap_url column, the response as an httr2 object, and the body of the XML.
Examples
if (interactive()) {
cas_get_sitemap(domain = "https://www.europeandatajournalism.eu/")
}
#> ℹ Folder /tmp/RtmpSOhy4K/R/castarter_data for storing project and website files
#> created.
#> # A tibble: 1 × 1
#> sitemap_url
#> <chr>
#> 1 https://www.europeandatajournalism.eu/sitemap_index.xml