Read index from local database
Usage
cas_read_db_download(
index = FALSE,
id = NULL,
batch = "latest",
status = 200L,
db_connection = NULL,
db_folder = NULL,
...
)
Arguments
- batch
Defaults to "latest": returns only the path to the file with the highest batch identifier available. Valid values are "latest", "all", or a numeric identifier corresponding to the desired batch.
- status
Defaults to 200. Keeps only files downloaded with the given status; more than one status can be given as a vector. If NULL, no filtering by status is applied.
- db_connection
Defaults to NULL. If NULL, uses local SQLite database. If given, must be a connection object or a list with relevant connection settings (see example).
- ...
Passed to cas_get_db_file().
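How batch and status combine can be illustrated with a minimal base R sketch on a made-up data frame. This is not castarter's implementation: the columns (id, batch, status, path), the hypothetical helper filter_downloads(), and the choices to apply the status filter first and to resolve "latest" separately for each id are all assumptions made for illustration.

```r
# Hypothetical sketch of the batch/status filtering described above.
# NOT castarter code: the columns and filter_downloads() are assumptions.
downloads <- data.frame(
  id     = c(1, 2, 3, 1, 2),
  batch  = c(1, 1, 1, 2, 2),
  status = c(200L, 404L, 200L, 200L, 200L),
  path   = c("1_1.html", "2_1.html", "3_1.html", "1_2.html", "2_2.html"),
  stringsAsFactors = FALSE
)

filter_downloads <- function(df, batch = "latest", status = 200L) {
  # status = NULL disables the filter; a vector keeps rows matching any value
  if (!is.null(status)) {
    df <- df[df$status %in% status, , drop = FALSE]
  }
  if (identical(batch, "latest")) {
    # assumption: for each id, keep only the row with the highest batch
    latest <- ave(df$batch, df$id, FUN = max)
    df <- df[df$batch == latest, , drop = FALSE]
  } else if (is.numeric(batch)) {
    # keep only the requested batch identifier(s)
    df <- df[df$batch %in% batch, , drop = FALSE]
  } else if (!identical(batch, "all")) {
    stop('`batch` must be "latest", "all", or numeric')
  }
  df
}

filter_downloads(downloads)                                # latest successful file per id
filter_downloads(downloads, batch = "all", status = NULL)  # no filtering at all
filter_downloads(downloads, batch = 1, status = c(200L, 404L))
```

Under these assumptions, the default call keeps one row per id: the most recent batch among downloads with status 200.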
Value
A data frame with three columns containing the data stored in the index_id
table of the local database. The data frame has zero rows if the database
does not exist or if no data has been stored in it yet.
Examples
cas_set_options(
base_folder = fs::path(tempdir(), "R", "castarter_data"),
db_folder = fs::path(tempdir(), "R", "castarter_data"),
project = "example_project",
website = "example_website"
)
cas_enable_db()
urls_df <- cas_build_urls(
url = "https://www.example.com/news/",
start_page = 1,
end_page = 10
)
cas_write_db_index(urls = urls_df)
#> ✔ Urls added to index_id table: 10
cas_read_db_index()
#> # Source: table<`index_id`> [10 x 3]
#> # Database: sqlite 3.46.0 [/tmp/Rtmp1QODEK/R/castarter_data/cas_example_project_example_website_db.sqlite]
#> id url index_group
#> <dbl> <chr> <chr>
#> 1 1 https://www.example.com/news/1 index
#> 2 2 https://www.example.com/news/2 index
#> 3 3 https://www.example.com/news/3 index
#> 4 4 https://www.example.com/news/4 index
#> 5 5 https://www.example.com/news/5 index
#> 6 6 https://www.example.com/news/6 index
#> 7 7 https://www.example.com/news/7 index
#> 8 8 https://www.example.com/news/8 index
#> 9 9 https://www.example.com/news/9 index
#> 10 10 https://www.example.com/news/10 index