Package index • castarter

Settings

cas_set_options(): Set key project parameters that determine the folder used for storing project files

cas_set_db(): Set database connection settings for the session

cas_set_db_folder() cas_get_db_folder(): Set folder for storing the database

cas_get_db_settings(): Get database connection settings from the environment

Compress, archive, and backup files

cas_archive(): Archive originals of downloaded files in compressed folders

cas_backup_gd(): Backup files to Google Drive

Caching

cas_check_db_folder(): Checks if database folder exists; if not, returns an informative message

cas_check_use_db(): Check caching status in the current session, and override it upon request

cas_connect_to_db(): Return a connection to be used for caching

cas_create_db_folder(): Creates the base folder where castarter stores the project database.

cas_disable_db(): Disable caching for the current session

cas_disconnect_from_db(): Ensure that connection to database is disconnected consistently

cas_enable_db(): Enable caching for the current session

cas_get_db_settings(): Get database connection settings from the environment

cas_read_from_db(): Reads data from local database

cas_set_db(): Set database connection settings for the session

cas_set_db_folder() cas_get_db_folder(): Set folder for storing the database

cas_write_to_db(): Generic function for writing to database

casdb_empty_index_id: Empty data frame with the same format as data stored in the index_id table

Shiny modules

cass_build_urls(): Helps you define the parameters you need for building index urls

cass_combine_into_pattern(): Combines a vector of words into a string to be used for regex matching.

cass_download_csv_app(): A minimal shiny app that demonstrates the functioning of related modules

cass_highlight(): Takes a character vector and returns it with matches of pattern wrapped in html tags used for highlighting

cass_show_ts_dygraph_app(): A minimal shiny app that demonstrates the functioning of related modules

cass_split_string(): Split string into multiple inputs

Other functions

cas_archive(): Archive originals of downloaded files in compressed folders

cas_backup_gd(): Backup files to Google Drive

cas_browse(): Open in a browser a URL stored in the local database

cas_build_urls(): URL builder

cas_check_corpus(): Checks if given corpus exists and, optionally, updates it

cas_check_db_folder(): Checks if database folder exists; if not, returns an informative message

cas_check_read_db_contents_data(): Returns a corpus from the contents_data table in the database; if corpus is given, it just returns that instead.

cas_check_use_db(): Check caching status in the current session, and override it upon request

cas_check_website_folder(): Checks if current website folder exists

cas_connect_to_db(): Return a connection to be used for caching

cas_convert_db_type(): Convert database type, e.g. from DuckDB to SQLite

cas_count(): Count strings in a corpus

cas_count_relative(): Count strings in a corpus relative to the number of words

cas_count_total_words(): Count total words in a dataset

cas_create_db_folder(): Creates the base folder where castarter stores the project database.

cas_delete_corpus(): Delete previously stored corpora written with cas_write_corpus().

cas_delete_from_db(): Delete rows from selected database table

cas_disable_db(): Disable caching for the current session

cas_disconnect_from_db(): Ensure that connection to database is disconnected consistently

cas_download(): Downloads files systematically, and stores details about the download in a local database

cas_download_chromote(): Downloads one file at a time with chromote

cas_download_httr(): Downloads one file at a time with httr

cas_download_index(): Downloads index files systematically, and stores details about the download in a local database

cas_download_internal(): Downloads one file at a time with readLines

cas_download_legacy(): Downloads html pages based on a vector of links

cas_enable_db(): Enable caching for the current session

cas_explorer(): Run the Shiny Application

cas_explorer_legacy(): Run the Shiny Application

cas_export_tables(): Export database tables to another format such as csv

cas_extract(): Extract fields and contents from downloaded files

cas_extract_html(): Facilitates extraction of contents from an html file

cas_extract_links(): Extract direct links to individual content pages from index pages

cas_extract_script(): Extracts scripts from an html page

cas_find_extractor(): Facilitate finding extractors, typically to be used with cas_extract_html()

cas_generate_metadata(): Generate basic metadata about the corpus, including start and end date and total number of items available.

cas_get_base_folder(): Get base folder under which files will be stored.

cas_get_base_path(): Build full path to base working folder

cas_get_corpus_path(): Get path to folder where the corpus is stored.

cas_get_db(): Get connection to database with details about current website

cas_get_db_file(): Gets location of database file

cas_get_db_settings(): Get database connection settings from the environment

cas_get_files_to_download(): Create a data frame with not yet downloaded files

cas_get_options(): Get key project parameters that determine the folder used for storing project files

cas_get_path_to_files(): Get path to locally downloaded files

cas_get_urls_df(): Checks that a given input corresponds to the format expected of a download data frame, and consistently returns the expected format

cas_get_website_folder(): Get folder where files and data related to the current website are stored

cas_ia_check(): Gets an Archive.org Wayback Machine URL

cas_ia_save(): Save a URL to the Internet Archive's Wayback Machine

cas_kwic(): Adds a column with n words before and after the selected pattern to see keywords in context

cas_kwic_single_pattern(): Adds a column with n words before and after the selected pattern to see keywords in context

cas_read_corpus(): Read corpora created with cas_write_corpus()

cas_read_db_contents_data(): Read contents data from local database

cas_read_db_contents_id(): Read contents from local database

cas_read_db_download(): Read index from local database

cas_read_db_ia(): Read status on the Internet Archive of given URLs

cas_read_db_ignore_id(): Read identifiers to be ignored from the local database

cas_read_db_index(): Read index from local database

cas_read_db_urls(): Read urls stored in the local database

cas_read_from_db(): Reads data from local database

cas_reset_db(): Delete a specific table from database

cas_reset_db_contents_data(): Removes from the local database the folder where extracted data are stored

cas_reset_db_contents_id(): Removes from the local database the folder where links to contents associated with their id are stored

cas_reset_db_ignore_id(): Removes from the local database all identifiers included in the ignore list

cas_reset_db_index_id(): Removes from the local database the table where links to index urls are stored

cas_reset_download_contents(): Delete all files and database records for the contents pages of the current website

cas_reset_download_index(): Delete all files and database records for the index pages of the current website

cas_restore(): Restore files from compressed files

cas_set_db(): Set database connection settings for the session

cas_set_db_folder() cas_get_db_folder(): Set folder for storing the database

cas_set_options(): Set key project parameters that determine the folder used for storing project files

cas_show_barchart_ggiraph(): Creates interactive barchart with ggiraph

cas_show_barchart_ggplot2(): Creates barchart with ggplot2

cas_show_gg_base(): Creates base ggplot2 object to be used by ggplot or ggiraph

cas_show_ts_dygraph(): Create dygraphs based on a data frame typically generated with cas_count()

cas_summarise(): Summarise word counts for a given time period, typically calculated with cas_count()

cas_update(): Update corpus

cas_write_corpus(): Export the textual dataset for the current website

cas_write_db_contents_data(): Write extracted contents to local database

cas_write_db_contents_id(): Write contents URLs to local database

cas_write_db_ignore_id() cas_ignore_id(): Ignore a set of ids from the download or processing step

cas_write_db_index(): Write index URLs to local database

cas_write_db_urls(): Write index or contents urls directly to the local database

cas_write_to_db(): Generic function for writing to database

Functions interacting with the Internet Archive’s Wayback Machine

cas_ia_check(): Gets an Archive.org Wayback Machine URL

cas_ia_save(): Save a URL to the Internet Archive's Wayback Machine