Facilitates extraction of contents from an html file
Source: R/cas_extract_html_custom.R
Usage
cas_extract_html_custom(
html_document,
container,
container_type,
container_match,
attribute = NULL,
sub_element = NULL
)
Arguments
- html_document
An html document parsed with xml2::read_html() or rvest::read_html().
- container
Defaults to NULL. Type of html container from where links are to be extracted, such as "div", "ul", and others. Either container_class or container_id must also be provided.
- container_match
String to be used for filtering nodes in combination with container_type.
- attribute
Defaults to NULL. If given, type of attribute to extract. Typically used in combination with container, as in cas_extract_html(container = "time", attribute = "datetime").
- sub_element
Defaults to NULL. If provided, container must also be given. Only text within elements of given type under the chosen combination of container/containerClass will be extracted. When given, it will typically be "p", to extract all p elements inside the selected div.
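
The way these arguments might combine can be sketched with plain xml2 calls. This is an illustration only, not the package's implementation: the helper extract_sketch() and the toy page are hypothetical, and it assumes that container_type names an attribute (such as "class" or "id") whose value must equal container_match exactly.

```r
library(xml2)

# Toy page; the contents are made up for illustration.
html_document <- read_html('
<html><body>
  <div class="article"><p>First paragraph.</p><p>Second paragraph.</p></div>
  <div class="footer"><p>Footer text.</p></div>
  <time class="published" datetime="2023-05-01">1 May 2023</time>
</body></html>')

# Hypothetical helper mirroring the documented arguments.
# Assumption: container_type is an attribute name matched exactly
# against container_match.
extract_sketch <- function(html_document,
                           container,
                           container_type,
                           container_match,
                           attribute = NULL,
                           sub_element = NULL) {
  # Select containers of the given type whose attribute equals the match
  xpath <- sprintf("//%s[@%s='%s']", container, container_type, container_match)
  nodes <- xml_find_all(html_document, xpath)

  # Optionally narrow down to sub-elements inside the selected containers
  if (!is.null(sub_element)) {
    nodes <- xml_find_all(nodes, paste0(".//", sub_element))
  }

  # Return the requested attribute if given, otherwise the text content
  if (!is.null(attribute)) {
    return(xml_attr(nodes, attribute))
  }
  xml_text(nodes, trim = TRUE)
}

# Text of all p elements inside <div class="article">
extract_sketch(html_document, "div", "class", "article", sub_element = "p")

# Value of the datetime attribute of <time class="published">
extract_sketch(html_document, "time", "class", "published", attribute = "datetime")
```

Under these assumptions, the first call returns the two paragraph strings from the "article" div and skips the footer, while the second returns "2023-05-01". The actual function may match containers differently (for example, partial class matches), so check its source before relying on this behaviour.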