If some of the URLs are already in the database, only the new ones are appended, since URLs are expected to be unique.

Usage

cas_write_db_index(
  urls,
  overwrite = FALSE,
  db_connection = NULL,
  disconnect_db = FALSE,
  ...
)

Arguments

urls

A data frame with three columns, with the same names and types as casdb_empty_index_id, or a character vector of URLs.

overwrite

Logical, defaults to FALSE. If TRUE, checks whether matching data are already held in the table and overwrites them. Use with caution, as this may completely overwrite the selected table.

db_connection

Defaults to NULL. If NULL, uses the local SQLite database. If given, must be a connection object or a list with relevant connection settings (see example).

disconnect_db

Defaults to FALSE. If FALSE, leaves the connection to the database open.

...

Passed to cas_get_db_file().

Value

Invisibly returns only the newly added rows.
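
The append-only behaviour and the invisible return value can be illustrated with a minimal base R sketch. It is not castarter's implementation: `select_new_urls()` is a hypothetical helper that works on an in-memory data frame instead of a database table, and the column names are assumed from the printed index in the examples.

```r
# Hypothetical helper, not part of castarter: works out which URLs are not
# yet in an in-memory index and invisibly returns the rows that would be added.
select_new_urls <- function(index_df, urls) {
  # Drop duplicates within the input, then keep URLs missing from the index
  new_urls <- setdiff(unique(urls), index_df$url)

  if (length(new_urls) == 0) {
    message("No new url to add to the index.")
    return(invisible(index_df[0, , drop = FALSE]))
  }

  # Continue numbering after the highest existing id
  start_id <- if (nrow(index_df) == 0) 0 else max(index_df$id)
  new_rows <- data.frame(
    id = start_id + seq_along(new_urls),
    url = new_urls,
    index_group = "index",
    stringsAsFactors = FALSE
  )
  invisible(new_rows)
}

# An index that already holds pages 1 and 2
existing <- data.frame(
  id = c(1, 2),
  url = paste0("https://www.example.com/news/", 1:2),
  index_group = "index",
  stringsAsFactors = FALSE
)

# Assign the result to inspect it, since it is returned invisibly
added <- select_new_urls(existing, paste0("https://www.example.com/news/", 1:4))
added$url
```

In this sketch, `added` holds only the rows for pages 3 and 4. Calling the helper without assigning the result prints nothing, which is what an invisible return value means in practice.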

Examples


cas_set_options(
  base_folder = fs::path(tempdir(), "R", "castarter_data"),
  db_folder = fs::path(tempdir(), "R", "castarter_data"),
  project = "example_project",
  website = "example_website"
)
cas_enable_db()


urls_df <- cas_build_urls(
  url = "https://www.example.com/news/",
  start_page = 1,
  end_page = 10
)

cas_write_db_index(urls = urls_df)
#>  No new url added to index_id table.

cas_read_db_index()
#> # Source:   table<`index_id`> [?? x 3]
#> # Database: sqlite 3.46.0 [/tmp/Rtmp1QODEK/R/castarter_data/cas_example_project_example_website_db.sqlite]
#>       id url                             index_group
#>    <dbl> <chr>                           <chr>      
#>  1     1 https://www.example.com/news/1  index      
#>  2     2 https://www.example.com/news/2  index      
#>  3     3 https://www.example.com/news/3  index      
#>  4     4 https://www.example.com/news/4  index      
#>  5     5 https://www.example.com/news/5  index      
#>  6     6 https://www.example.com/news/6  index      
#>  7     7 https://www.example.com/news/7  index      
#>  8     8 https://www.example.com/news/8  index      
#>  9     9 https://www.example.com/news/9  index      
#> 10    10 https://www.example.com/news/10 index      
#> # ℹ more rows