Compress, archive, and backup files
-
cas_archive()
- Archive originals of downloaded files in compressed folders
-
cas_backup_gd()
- Backup files to Google Drive
-
cas_browse()
- Open in a browser a URL stored in the local database
-
cas_build_urls()
- URL builder
-
cas_check_corpus()
- Checks if a given corpus exists and, optionally, updates it
-
cas_check_db_folder()
- Checks if the database folder exists; if not, returns an informative message
-
cas_check_read_db_contents_data()
- Returns a corpus from the contents_data table in the database; if a corpus is given, returns that instead.
-
cas_check_use_db()
- Check caching status in the current session, and override it upon request
-
cas_check_website_folder()
- Checks if current website folder exists
-
cas_connect_to_db()
- Return a connection to be used for caching
-
cas_convert_db_type()
- Convert database type, e.g. from DuckDB to SQLite
-
cas_count()
- Count strings in a corpus
-
cas_count_relative()
- Count strings in a corpus relative to the number of words
-
cas_count_total_words()
- Count total words in a dataset
-
cas_create_db_folder()
- Creates the base folder where castarter stores the project database.
-
cas_delete_corpus()
- Delete previously stored corpora written with cas_write_corpus().
-
cas_delete_from_db()
- Delete rows from selected database table
-
cas_disable_db()
- Disable caching for the current session
-
cas_disconnect_from_db()
- Ensure that connection to database is disconnected consistently
-
cas_download()
- Downloads files systematically, and stores details about the download in a local database
-
cas_download_chromote()
- Downloads one file at a time with chromote
-
cas_download_httr()
- Downloads one file at a time with httr
-
cas_download_index()
- Downloads index files systematically, and stores details about the download in a local database
-
cas_download_internal()
- Downloads one file at a time with readLines
-
cas_download_legacy()
- Downloads html pages based on a vector of links
-
cas_enable_db()
- Enable caching for the current session
-
cas_explorer()
- Run the Shiny Application
-
cas_explorer_legacy()
- Run the Shiny Application
-
cas_export_tables()
- Export database tables to another format such as csv
-
cas_extract()
- Extract fields and contents from downloaded files
-
cas_extract_html()
- Facilitates extraction of contents from an html file
-
cas_extract_links()
- Extract direct links to individual content pages from index pages
-
cas_extract_script()
- Extracts scripts from an html page
-
cas_find_extractor()
- Facilitate finding extractors, typically to be used with cas_extract_html()
-
cas_generate_metadata()
- Generate basic metadata about the corpus, including start and end date and total number of items available.
-
cas_get_base_folder()
- Get base folder under which files will be stored.
-
cas_get_base_path()
- Build full path to base working folder
-
cas_get_corpus_path()
- Get path to folder where the corpus is stored.
-
cas_get_db()
- Get connection to database with details about current website
-
cas_get_db_file()
- Gets location of database file
-
cas_get_db_settings()
- Get database connection settings from the environment
-
cas_get_files_to_download()
- Create a data frame with not yet downloaded files
-
cas_get_options()
- Get key project parameters that determine the folder used for storing project files
-
cas_get_path_to_files()
- Get path to locally downloaded files
-
cas_get_urls_df()
- Checks that a given input matches the format expected of a download data frame, and consistently returns the expected format
-
cas_get_website_folder()
- Get folder where files and data related to the current website are stored
-
cas_ia_check()
- Gets an Archive.org Wayback Machine URL
-
cas_ia_save()
- Save a URL to the Internet Archive's Wayback Machine
-
cas_kwic()
- Adds a column with n words before and after the selected pattern to see keywords in context
-
cas_kwic_single_pattern()
- Adds a column with n words before and after the selected pattern to see keywords in context
-
cas_read_corpus()
- Read datasets created with cas_write_corpus()
-
cas_read_db_contents_data()
- Read contents data from local database
-
cas_read_db_contents_id()
- Read contents from local database
-
cas_read_db_download()
- Read index from local database
-
cas_read_db_ia()
- Read status on the Internet Archive of given URLs
-
cas_read_db_ignore_id()
- Read identifiers to be ignored from the local database
-
cas_read_db_index()
- Read index from local database
-
cas_read_db_urls()
- Read urls stored in the local database
-
cas_read_from_db()
- Reads data from local database
-
cas_reset_db()
- Delete a specific table from database
-
cas_reset_db_contents_data()
- Removes from the local database the folder where extracted data are stored
-
cas_reset_db_contents_id()
- Removes from the local database the folder where links to contents associated with their id are stored
-
cas_reset_db_ignore_id()
- Removes from the local database all identifiers included in the ignore list
-
cas_reset_db_index_id()
- Removes from the local database the table where links to index urls are stored
-
cas_reset_download_contents()
- Delete all files and database records for the contents pages of the current website
-
cas_reset_download_index()
- Delete all files and database records for the index pages of the current website
-
cas_restore()
- Restore files from compressed files
-
cas_set_db()
- Set database connection settings for the session
-
cas_set_db_folder()
cas_get_db_folder()
- Set folder for storing the database
-
cas_set_options()
- Set key project parameters that determine the folder used for storing project files
-
cas_show_barchart_ggiraph()
- Creates interactive barchart with ggiraph
-
cas_show_barchart_ggplot2()
- Creates barchart with ggplot2
-
cas_show_gg_base()
- Creates base ggplot2 object to be used by ggplot or ggiraph
-
cas_show_ts_dygraph()
- Create dygraphs based on a data frame typically generated with cas_count()
-
cas_summarise()
- Summarise word counts for a given time period, typically calculated with cas_count()
-
cas_update()
- Update corpus
-
cas_write_corpus()
- Export the textual dataset for the current website
-
cas_write_db_contents_data()
- Write extracted contents to local database
-
cas_write_db_contents_id()
- Write contents URLs to local database
-
cas_write_db_ignore_id()
cas_ignore_id()
- Ignore a set of ids from the download or processing step
-
cas_write_db_index()
- Write index URLs to local database
-
cas_write_db_urls()
- Write index or contents urls directly to the local database
-
cas_write_to_db()
- Generic function for writing to database
Functions interacting with the Internet Archive’s Wayback Machine