Write index URLs to local database — cas_write_db_index

If some of the URLs are already in the database, only the new ones are appended. URLs are expected to be unique.
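The append-only behaviour described above can be sketched in base R. This is only an illustration of the idea, not castarter's actual implementation: `append_new_urls()` is a hypothetical helper, and the two data frames stand in for the stored table and a new batch of URLs.

```r
# Hypothetical helper, not part of castarter: keep only the rows of
# `incoming` whose URL is not already present in `existing`.
append_new_urls <- function(existing, incoming) {
  # Drop duplicates within the incoming batch, then drop URLs already stored
  incoming <- incoming[!duplicated(incoming$url), , drop = FALSE]
  new_rows <- incoming[!(incoming$url %in% existing$url), , drop = FALSE]
  list(
    table = rbind(existing, new_rows), # stored rows plus the new ones
    added = new_rows                   # only the rows that were new
  )
}

existing <- data.frame(
  id = 1:3,
  url = paste0("https://www.example.com/news/", 1:3),
  index_group = "index",
  stringsAsFactors = FALSE
)
incoming <- data.frame(
  id = 1:5,
  url = paste0("https://www.example.com/news/", 1:5),
  index_group = "index",
  stringsAsFactors = FALSE
)

result <- append_new_urls(existing, incoming)
nrow(result$added) # 2: only pages 4 and 5 were new
nrow(result$table) # 5: each URL is stored once
```

Matching on the `url` column alone mirrors the expectation that URLs are unique; the other columns play no part in deciding what counts as new in this sketch.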

Usage

cas_write_db_index(
  urls,
  overwrite = FALSE,
  db_connection = NULL,
  disconnect_db = FALSE,
  ...
)

Arguments

urls: A data frame with three columns, with the same names and types as casdb_empty_index_id, or a character vector.
overwrite: Logical, defaults to FALSE. If TRUE, checks whether matching data are already held in the table and overwrites them. Use with caution, as it may completely overwrite the selected table.
db_connection: Defaults to NULL. If NULL, uses the local SQLite database. If given, must be a connection object or a list with relevant connection settings (see example).
disconnect_db: Logical, defaults to FALSE (see Usage). If FALSE, leaves the connection to the database open; if TRUE, closes it.
...: Passed to cas_get_db_file().

Value

Invisibly returns only the rows newly added to the database.
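Because the value is returned invisibly, nothing is printed when the function is called on its own, but the result can still be captured by assignment. A minimal base R sketch of that pattern follows; `write_rows()` is a hypothetical stand-in, not a castarter function.

```r
# Hypothetical stand-in for a function that returns its result invisibly
write_rows <- function(rows) {
  invisible(rows)
}

rows <- data.frame(
  url = "https://www.example.com/news/11",
  stringsAsFactors = FALSE
)

write_rows(rows)          # prints nothing at top level
added <- write_rows(rows) # assignment still captures the returned rows
nrow(added)               # 1

# withVisible() reports whether a value would have been auto-printed
check <- withVisible(write_rows(rows))
check$visible             # FALSE
```

With the function documented here, the equivalent would presumably be `new_rows <- cas_write_db_index(urls = urls_df)`, followed by inspecting `new_rows`.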

Examples


cas_set_options(
  base_folder = fs::path(fs::path_temp(), "R", "castarter_data"),
  db_folder = fs::path(fs::path_temp(), "R", "castarter_data"),
  project = "example_project",
  website = "example_website"
)
cas_enable_db()


urls_df <- cas_build_urls(
  url = "https://www.example.com/news/",
  start_page = 1,
  end_page = 10
)

cas_write_db_index(urls = urls_df)
#> ✔ Urls added to index_id table: 10

cas_read_db_index()
#> # Source:   table<`index_id`> [?? x 3]
#> # Database: sqlite 3.47.1 [/tmp/RtmpSOhy4K/R/castarter_data/cas_example_project_example_website_db.sqlite]
#>       id url                             index_group
#>    <dbl> <chr>                           <chr>      
#>  1     1 https://www.example.com/news/1  index      
#>  2     2 https://www.example.com/news/2  index      
#>  3     3 https://www.example.com/news/3  index      
#>  4     4 https://www.example.com/news/4  index      
#>  5     5 https://www.example.com/news/5  index      
#>  6     6 https://www.example.com/news/6  index      
#>  7     7 https://www.example.com/news/7  index      
#>  8     8 https://www.example.com/news/8  index      
#>  9     9 https://www.example.com/news/9  index      
#> 10    10 https://www.example.com/news/10 index