If some URLs are already stored in the database, only the new ones are appended. URLs are expected to be unique.

Usage

cas_write_db_contents_id(
  urls,
  overwrite = FALSE,
  db_connection = NULL,
  disconnect_db = FALSE,
  quiet = FALSE,
  check_previous = TRUE,
  ...
)

Arguments

urls

A data frame with five columns, such as casdb_empty_contents_id, or a character vector.

overwrite

Logical, defaults to FALSE. If TRUE, checks whether matching data are already held in the table and overwrites them. Use with caution, as this may completely overwrite the selected table.

db_connection

Defaults to NULL. If NULL, uses the local SQLite database. If given, must be a connection object or a list with the relevant connection settings (see example).

disconnect_db

Defaults to FALSE. If FALSE, leaves the connection to the database open.

quiet

Defaults to FALSE. If set to TRUE, messages reporting the number of rows added are not shown.

check_previous

Defaults to TRUE. If set to FALSE, the given input is stored in the database without checking whether the same URL has already been stored.

...

Passed to cas_get_db_file().

Value

Invisibly returns only the newly added rows.
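
The append-only behaviour controlled by check_previous, and the fact that only new rows are returned, can be pictured with a small base R sketch. This is not castarter's implementation: `append_new_urls()` and the sample URLs are hypothetical, and plain character vectors stand in for the database table.

```r
# Hypothetical helper, not part of castarter: returns the URLs that would be
# appended, given what is already stored.
append_new_urls <- function(stored, incoming, check_previous = TRUE) {
  if (check_previous) {
    # Drop every incoming URL that is already present in the stored set.
    incoming <- incoming[!(incoming %in% stored)]
  }
  incoming
}

# Illustrative data only.
stored <- c(
  "https://www.example.com/news/1",
  "https://www.example.com/news/2"
)
incoming <- c(
  "https://www.example.com/news/2",
  "https://www.example.com/news/3"
)

# With the check enabled, only the URL not yet stored is kept.
new_urls <- append_new_urls(stored, incoming)

# With the check disabled, the input passes through unchanged.
unchecked_urls <- append_new_urls(stored, incoming, check_previous = FALSE)
```

In this sketch, skipping the check returns the duplicate as well, which is why check_previous = FALSE only makes sense when the input is already known to contain new URLs.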

Examples


cas_set_options(
  base_folder = fs::path(tempdir(), "R", "castarter_data"),
  db_folder = fs::path(tempdir(), "R", "castarter_data"),
  project = "example_project",
  website = "example_website"
)
cas_enable_db()


urls_df <- cas_build_urls(
  url = "https://www.example.com/news/",
  start_page = 1,
  end_page = 10
)

cas_write_db_contents_id(urls = urls_df)
#> Error in cas_write_to_db(df = links_to_add_df, table = "contents_id",     overwrite = overwrite, db_connection = db, disconnect_db = FALSE,     ...): Incompatible data frame passed to `contents_id`.

cas_read_db_contents_id()
#> # A tibble: 0 × 5
#> # ℹ 5 variables: id <dbl>, url <chr>, link_text <chr>, source_index_id <dbl>,
#> #   source_index_batch <dbl>