If some of the URLs are already in the database, only the new ones are appended, since URLs are expected to be unique.
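The append-only-new behaviour can be illustrated with a minimal base R sketch. It is not castarter's implementation: `append_new_urls()` is a hypothetical helper. It assumes the index table has the `id`, `url` and `index_group` columns shown in the example output below, and that new rows take ids continuing after the largest existing one.

```r
# Hypothetical helper (not part of castarter): append only URLs that are
# not already present in an existing index table, keeping URLs unique.
append_new_urls <- function(existing, new) {
  # Drop duplicates within the new batch, then drop URLs already stored
  new <- new[!duplicated(new$url), , drop = FALSE]
  to_add <- new[!(new$url %in% existing$url), , drop = FALSE]
  if (nrow(to_add) == 0) {
    message("No new URLs to append.")
    return(existing)
  }
  # Assumption: new ids continue after the largest existing id
  start_id <- if (nrow(existing) == 0) 0 else max(existing$id)
  to_add$id <- start_id + seq_len(nrow(to_add))
  rbind(existing, to_add[, names(existing), drop = FALSE])
}

existing <- data.frame(
  id = c(1, 2),
  url = c("https://www.example.com/news/1", "https://www.example.com/news/2"),
  index_group = "index"
)
new <- data.frame(
  url = c("https://www.example.com/news/2", "https://www.example.com/news/3"),
  index_group = "index"
)
updated <- append_new_urls(existing, new)
```

Here only `https://www.example.com/news/3` is appended. Calling the helper again with the same `new` data frame leaves the table unchanged.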
Usage
cas_write_db_index(
  urls,
  overwrite = FALSE,
  db_connection = NULL,
  disconnect_db = FALSE,
  ...
)
Arguments
- urls
A data frame with three columns, with the same names and types as casdb_empty_index_id, or a character vector.
- overwrite
Logical, defaults to FALSE. If TRUE, checks whether matching data are already held in the table and overwrites them. Use with caution, as this may completely overwrite the selected table.
- db_connection
Defaults to NULL. If NULL, uses the local SQLite database. If given, must be a connection object or a list with the relevant connection settings (see example).
- disconnect_db
Defaults to FALSE, as shown in the usage above. If TRUE, disconnects from the database after writing; if FALSE, leaves the connection to the database open.
- ...
Passed to cas_get_db_file().
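Because `urls` may be either a data frame or a character vector, the input has to be brought into a common shape before it is written. The following base R sketch shows one way to do that. `as_index_df()` is a hypothetical helper, not a castarter function. It assumes that a plain character vector gets sequential ids and the default group `"index"` seen in the example output, and that a data frame must already carry the `id`, `url` and `index_group` columns.

```r
# Hypothetical helper (not part of castarter): accept either a character
# vector of URLs or a data frame, and return a data frame with the
# id / url / index_group columns seen in the index_id table.
as_index_df <- function(urls, index_group = "index") {
  if (is.character(urls)) {
    # Assumption: plain vectors get sequential ids and a default group
    return(data.frame(
      id = as.numeric(seq_along(urls)),
      url = urls,
      index_group = rep(index_group, length(urls))
    ))
  }
  # Data frames must already provide the three expected columns
  stopifnot(
    is.data.frame(urls),
    all(c("id", "url", "index_group") %in% names(urls))
  )
  urls[, c("id", "url", "index_group"), drop = FALSE]
}

from_vector <- as_index_df(c(
  "https://www.example.com/news/1",
  "https://www.example.com/news/2"
))
```

Keeping the normalisation separate from the database write makes it easy to check the input before anything touches the table.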
Examples
cas_set_options(
  base_folder = fs::path(tempdir(), "R", "castarter_data"),
  db_folder = fs::path(tempdir(), "R", "castarter_data"),
  project = "example_project",
  website = "example_website"
)
cas_enable_db()
urls_df <- cas_build_urls(
  url = "https://www.example.com/news/",
  start_page = 1,
  end_page = 10
)
cas_write_db_index(urls = urls_df)
#> ℹ No new url added to index_id table.
cas_read_db_index()
#> # Source: table<`index_id`> [?? x 3]
#> # Database: sqlite 3.46.0 [/tmp/Rtmp1QODEK/R/castarter_data/cas_example_project_example_website_db.sqlite]
#> id url index_group
#> <dbl> <chr> <chr>
#> 1 1 https://www.example.com/news/1 index
#> 2 2 https://www.example.com/news/2 index
#> 3 3 https://www.example.com/news/3 index
#> 4 4 https://www.example.com/news/4 index
#> 5 5 https://www.example.com/news/5 index
#> 6 6 https://www.example.com/news/6 index
#> 7 7 https://www.example.com/news/7 index
#> 8 8 https://www.example.com/news/8 index
#> 9 9 https://www.example.com/news/9 index
#> 10 10 https://www.example.com/news/10 index
#> # ℹ more rows