Settings

cas_set_options()
Set key project parameters that determine the folder used for storing project files
cas_set_db()
Set database connection settings for the session
cas_set_db_folder() cas_get_db_folder()
Set folder for storing the database
cas_get_db_settings()
Get database connection settings from the environment

Compress, archive, and backup files

cas_archive()
Archive originals of downloaded files in compressed folders
cas_backup_gd()
Backup files to Google Drive

Caching

cas_check_db_folder()
Checks if the database folder exists; if not, returns an informative message
cas_check_use_db()
Check caching status in the current session, and override it upon request
cas_connect_to_db()
Return a connection to be used for caching
cas_create_db_folder()
Creates the base folder where castarter stores the project database.
cas_disable_db()
Disable caching for the current session
cas_disconnect_from_db()
Ensure that connection to database is disconnected consistently
cas_enable_db()
Enable caching for the current session
cas_get_db_settings()
Get database connection settings from the environment
cas_read_from_db()
Reads data from local database
cas_set_db()
Set database connection settings for the session
cas_set_db_folder() cas_get_db_folder()
Set folder for storing the database
cas_write_to_db()
Generic function for writing to database
casdb_empty_index_id
Empty data frame with the same format as data stored in the index_id table

Shiny modules

cass_build_urls()
Helps you define the parameters you need for building index urls
cass_combine_into_pattern()
Combines a vector of words into a string to be used for regex matching.
cass_download_csv_app()
A minimal shiny app that demonstrates the functioning of related modules
cass_highlight()
Takes a character vector and returns it with matches of pattern wrapped in html tags used for highlighting
cass_show_ts_dygraph_app()
A minimal shiny app that demonstrates the functioning of related modules
cass_split_string()
Split string into multiple inputs
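
As a rough illustration of what combining words into a regex pattern and highlighting matches can involve, here is a minimal base R sketch. The helper names `combine_into_pattern_sketch()` and `highlight_sketch()` are hypothetical, the choice of `<mark>` tags is an assumption, and none of this should be read as the package's actual implementation.

```r
# Hypothetical helpers for illustration only; not part of castarter.

# Collapse a character vector of words into a single alternation pattern.
# Note: words are used as-is, so regex metacharacters are not escaped.
combine_into_pattern_sketch <- function(words) {
  paste(words, collapse = "|")
}

# Wrap every case-insensitive match of `pattern` in <mark> tags
# (the tag choice is an assumption made for this sketch).
highlight_sketch <- function(x, pattern) {
  gsub(
    pattern = paste0("(", pattern, ")"),
    replacement = "<mark>\\1</mark>",
    x = x,
    ignore.case = TRUE
  )
}

pattern <- combine_into_pattern_sketch(c("europe", "union"))
highlighted <- highlight_sketch("The European Union met today.", pattern)
print(highlighted)
```

Because the pattern is a plain alternation, partial-word matches such as "Europe" inside "European" are highlighted too; adding word boundaries would be one way to restrict this.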

Other functions

cas_archive()
Archive originals of downloaded files in compressed folders
cas_backup_gd()
Backup files to Google Drive
cas_browse()
Open in a browser a URL stored in the local database
cas_build_urls()
URL builder
cas_check_corpus()
Checks if a given corpus exists and, optionally, updates it
cas_check_db_folder()
Checks if the database folder exists; if not, returns an informative message
cas_check_read_db_contents_data()
Returns a corpus from the contents_data table in the database; if a corpus is given, it just returns that instead.
cas_check_use_db()
Check caching status in the current session, and override it upon request
cas_check_website_folder()
Checks if current website folder exists
cas_connect_to_db()
Return a connection to be used for caching
cas_convert_db_type()
Convert database type, e.g. from DuckDB to SQLite
cas_count()
Count strings in a corpus
cas_count_relative()
Count strings in a corpus relative to the number of words
cas_count_total_words()
Count total words in a dataset
cas_create_db_folder()
Creates the base folder where castarter stores the project database.
cas_delete_corpus()
Delete previously stored corpora written with cas_write_corpus().
cas_delete_from_db()
Delete rows from selected database table
cas_disable_db()
Disable caching for the current session
cas_disconnect_from_db()
Ensure that connection to database is disconnected consistently
cas_download()
Downloads files systematically, and stores details about the download in a local database
cas_download_chromote()
Downloads one file at a time with chromote
cas_download_httr()
Downloads one file at a time with httr
cas_download_index()
Downloads index files systematically, and stores details about the download in a local database
cas_download_internal()
Downloads one file at a time with readLines
cas_download_legacy()
Downloads html pages based on a vector of links
cas_enable_db()
Enable caching for the current session
cas_explorer()
Run the Shiny Application
cas_explorer_legacy()
Run the Shiny Application
cas_export_tables()
Export database tables to another format such as csv
cas_extract()
Extract fields and contents from downloaded files
cas_extract_html()
Facilitates extraction of contents from an html file
cas_extract_links()
Extract direct links to individual content pages from index pages
cas_extract_script()
Extracts scripts from an html page
cas_find_extractor()
Facilitate finding extractors, typically to be used with cas_extract_html()
cas_generate_metadata()
Generate basic metadata about the corpus, including start and end date and total number of items available.
cas_get_base_folder()
Get base folder under which files will be stored.
cas_get_base_path()
Build full path to base working folder
cas_get_corpus_path()
Get path to folder where the corpus is stored.
cas_get_db()
Get connection to database with details about current website
cas_get_db_file()
Gets location of database file
cas_get_db_settings()
Get database connection settings from the environment
cas_get_files_to_download()
Create a data frame with not yet downloaded files
cas_get_options()
Get key project parameters that determine the folder used for storing project files
cas_get_path_to_files()
Get path to locally downloaded files
cas_get_urls_df()
Checks that a given input corresponds to the format expected of a download data frame, consistently returns expected format
cas_get_website_folder()
Get folder where files and data related to the current website are stored
cas_ia_check()
Gets an Archive.org Wayback Machine URL
cas_ia_save()
Save a URL to the Internet Archive's Wayback Machine
cas_kwic()
Adds a column with n words before and after the selected pattern to see keywords in context
cas_kwic_single_pattern()
Adds a column with n words before and after the selected pattern to see keywords in context
cas_read_corpus()
Read datasets created with cas_write_corpus()
cas_read_db_contents_data()
Read contents data from local database
cas_read_db_contents_id()
Read contents from local database
cas_read_db_download()
Read index from local database
cas_read_db_ia()
Read status on the Internet Archive of given URLs
cas_read_db_ignore_id()
Read identifiers to be ignored from the local database
cas_read_db_index()
Read index from local database
cas_read_db_urls()
Read urls stored in the local database
cas_read_from_db()
Reads data from local database
cas_reset_db()
Delete a specific table from database
cas_reset_db_contents_data()
Removes from the local database the table where extracted data are stored
cas_reset_db_contents_id()
Removes from the local database the table where links to contents associated with their id are stored
cas_reset_db_ignore_id()
Removes from the local database all identifiers included in the ignore list
cas_reset_db_index_id()
Removes from the local database the table where links to index urls are stored
cas_reset_download_contents()
Delete all files and database records for the contents pages of the current website
cas_reset_download_index()
Delete all files and database records for the index pages of the current website
cas_restore()
Restore files from compressed files
cas_set_db()
Set database connection settings for the session
cas_set_db_folder() cas_get_db_folder()
Set folder for storing the database
cas_set_options()
Set key project parameters that determine the folder used for storing project files
cas_show_barchart_ggiraph()
Creates interactive barchart with ggiraph
cas_show_barchart_ggplot2()
Creates barchart with ggplot2
cas_show_gg_base()
Creates base ggplot2 object to be used by ggplot or ggiraph
cas_show_ts_dygraph()
Create dygraphs based on a data frame typically generated with cas_count()
cas_summarise()
Summarise word counts for a given time period, typically calculated with cas_count()
cas_update()
Update corpus
cas_write_corpus()
Export the textual dataset for the current website
cas_write_db_contents_data()
Write extracted contents to local database
cas_write_db_contents_id()
Write contents URLs to local database
cas_write_db_ignore_id() cas_ignore_id()
Ignore a set of ids from the download or processing step
cas_write_db_index()
Write index URLs to local database
cas_write_db_urls()
Write index or contents urls directly to the local database
cas_write_to_db()
Generic function for writing to database
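
The counting entries above (cas_count(), cas_count_relative(), cas_count_total_words()) describe counting strings in a corpus, optionally relative to the number of words. The base R sketch below shows the underlying arithmetic on a toy data frame. The column names, the helper functions `count_matches()` and `count_words()`, and the whitespace-based word definition are all assumptions made for illustration; the package's own functions may tokenise and structure their output differently.

```r
# Hypothetical toy corpus; column names are assumptions for this sketch.
corpus <- data.frame(
  id = 1:3,
  text = c(
    "Energy prices rise as energy demand grows",
    "No mention here",
    "Energy policy debated"
  ),
  stringsAsFactors = FALSE
)

# Count case-insensitive occurrences of a pattern in each text.
# gregexpr() returns -1 when there is no match, so only positive
# match positions are counted.
count_matches <- function(text, pattern) {
  matches <- gregexpr(pattern, text, ignore.case = TRUE)
  vapply(matches, function(m) sum(m > 0), integer(1))
}

# Count words by splitting each text on runs of whitespace.
count_words <- function(text) {
  lengths(strsplit(trimws(text), "\\s+"))
}

corpus$n <- count_matches(corpus$text, "energy")
corpus$n_words <- count_words(corpus$text)

# Relative frequency: matches divided by the number of words in each text.
corpus$relative <- corpus$n / corpus$n_words
print(corpus[, c("id", "n", "n_words", "relative")])
```

Dividing by the word count makes texts of different lengths comparable, which is presumably the motivation for offering a relative count alongside the absolute one.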

Functions interacting with the Internet Archive’s Wayback Machine

cas_ia_check()
Gets an Archive.org Wayback Machine URL
cas_ia_save()
Save a URL to the Internet Archive's Wayback Machine
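
For orientation, the sketch below builds the two kinds of Wayback Machine URLs that checking and saving a page typically involve: a query against the public availability endpoint and a "Save Page Now" address. The helper names are hypothetical, no network request is made, and whether the package relies on exactly these endpoints is an assumption.

```r
# Hypothetical helpers for illustration only; not part of castarter.

# Build a query URL for the Wayback Machine availability API.
# The target URL is percent-encoded so it can be passed as a parameter.
ia_availability_url_sketch <- function(url) {
  paste0(
    "https://archive.org/wayback/available?url=",
    utils::URLencode(url, reserved = TRUE)
  )
}

# Build the "Save Page Now" address for a given page.
ia_save_url_sketch <- function(url) {
  paste0("https://web.archive.org/save/", url)
}

target <- "https://example.com/news?id=1"
check_url <- ia_availability_url_sketch(target)
save_url <- ia_save_url_sketch(target)
print(check_url)
print(save_url)
```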