Read index from local database

Usage

cas_read_db_download(
  index = FALSE,
  id = NULL,
  batch = "latest",
  status = 200L,
  db_connection = NULL,
  db_folder = NULL,
  ...
)

Arguments

batch

Defaults to "latest": returns only the path to the file with the highest batch identifier available. Valid values are "latest", "all", or a numeric identifier corresponding to the desired batch.

status

Defaults to 200. Keeps only files downloaded with the given status; pass a vector to accept more than one status. If NULL, no status filter is applied.

db_connection

Defaults to NULL. If NULL, uses local SQLite database. If given, must be a connection object or a list with relevant connection settings (see example).

...

Passed to cas_get_db_file().
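
The way batch and status narrow the result can be illustrated on a toy table. This is only a sketch in base R, not castarter's implementation: the `downloads` data frame, its column names, and the helper `filter_downloads()` are hypothetical, and the order of the two filters (status first, then batch) is an assumption.

```r
# Hypothetical download log: one row per download attempt.
downloads <- data.frame(
  id     = c(1, 1, 2, 2, 3),
  batch  = c(1, 2, 1, 2, 1),
  status = c(404L, 200L, 200L, 500L, 200L)
)

# Hypothetical helper, NOT part of castarter: mimics the documented
# meaning of the `batch` and `status` arguments on a plain data frame.
filter_downloads <- function(df, batch = "latest", status = 200L) {
  # status = NULL disables the status filter; a vector allows several codes
  if (!is.null(status)) {
    df <- df[df$status %in% status, , drop = FALSE]
  }
  if (identical(batch, "latest")) {
    # for each id, keep the row with the highest batch identifier
    latest <- ave(df$batch, df$id, FUN = max)
    df <- df[df$batch == latest, , drop = FALSE]
  } else if (!identical(batch, "all")) {
    # a numeric value selects that batch only
    df <- df[df$batch %in% as.numeric(batch), , drop = FALSE]
  }
  rownames(df) <- NULL
  df
}

# Defaults: only status 200, latest batch per id.
filter_downloads(downloads)

# Every attempt, whatever its status.
filter_downloads(downloads, batch = "all", status = NULL)
```

With the defaults, the toy table keeps id 1 from batch 2 and ids 2 and 3 from batch 1, because the batch 2 attempt for id 2 did not return status 200.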

Value

A data frame with three columns and data stored in the index_id table of the local database. The data frame has zero rows if the database does not exist or no data was previously stored there.

Examples

cas_set_options(
  base_folder = fs::path(tempdir(), "R", "castarter_data"),
  db_folder = fs::path(tempdir(), "R", "castarter_data"),
  project = "example_project",
  website = "example_website"
)
cas_enable_db()


urls_df <- cas_build_urls(
  url = "https://www.example.com/news/",
  start_page = 1,
  end_page = 10
)

cas_write_db_index(urls = urls_df)
#>  Urls added to index_id table: 10

cas_read_db_index()
#> # Source:   table<`index_id`> [10 x 3]
#> # Database: sqlite 3.46.0 [/tmp/Rtmp1QODEK/R/castarter_data/cas_example_project_example_website_db.sqlite]
#>       id url                             index_group
#>    <dbl> <chr>                           <chr>      
#>  1     1 https://www.example.com/news/1  index      
#>  2     2 https://www.example.com/news/2  index      
#>  3     3 https://www.example.com/news/3  index      
#>  4     4 https://www.example.com/news/4  index      
#>  5     5 https://www.example.com/news/5  index      
#>  6     6 https://www.example.com/news/6  index      
#>  7     7 https://www.example.com/news/7  index      
#>  8     8 https://www.example.com/news/8  index      
#>  9     9 https://www.example.com/news/9  index      
#> 10    10 https://www.example.com/news/10 index
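
The examples above stop at the index table, so a call to cas_read_db_download() itself may be useful. The following is a minimal sketch that uses only the arguments listed under Usage and repeats the setup from the examples. It assumes the castarter package is installed, and that the function returns an empty result when nothing has been downloaded yet in a fresh temporary folder; no output is shown because none was verified.

```r
# Placeholders, so the objects exist even if castarter is not installed.
downloads_default <- NULL
downloads_all <- NULL

if (requireNamespace("castarter", quietly = TRUE)) {
  library(castarter)

  # Same temporary project setup as in the examples above.
  cas_set_options(
    base_folder = fs::path(tempdir(), "R", "castarter_data"),
    db_folder = fs::path(tempdir(), "R", "castarter_data"),
    project = "example_project",
    website = "example_website"
  )
  cas_enable_db()

  # Defaults: index = FALSE, latest batch, status 200 only.
  downloads_default <- cas_read_db_download()

  # Records for index pages, all batches, no status filter.
  downloads_all <- cas_read_db_download(
    index = TRUE,
    batch = "all",
    status = NULL
  )
}
```

Setting index = TRUE here follows the argument name in the signature; what exactly it selects is not described in the Arguments section above, so treat that reading as an assumption.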