Pdfsearch

Brandon LeBeau

11 minute read

I’m happy to formally announce a major update to the development version of the pdfsearch R package. In brief, this version adds support for splitting the PDF by sentences instead of by lines of text. In addition, initial testing of splitting PDFs that are laid out in multiple columns has been promising. This functionality attempts to rearrange the multiple columns into a single column, so that the keyword searching performed by pdfsearch is done in context and should be more reliable.
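To give a feel for why sentence-based splitting can help, here is a minimal base R sketch. It is an illustration only and not pdfsearch's internal implementation: the sample text, the variable names, and the simple sentence-splitting regular expression are all assumptions made up for this example.

```r
# Illustration only: base R, not how pdfsearch does this internally.
# Hypothetical text as it might come out of a PDF, with a phrase of
# interest broken across a line boundary.
page_text <- paste(
  "Results depend on the amount of measurement",
  "error in the data. A second sentence follows",
  "here without the keyword.",
  sep = "\n"
)

# Line-based view: one element per line of text.
by_line <- strsplit(page_text, "\n", fixed = TRUE)[[1]]

# Sentence-based view: join the lines, then split on whitespace that
# follows ., ! or ?. Real text (abbreviations, decimals) needs more care.
joined <- gsub("\n", " ", page_text, fixed = TRUE)
by_sentence <- strsplit(joined, "(?<=[.!?])\\s+", perl = TRUE)[[1]]

# Search both views for a phrase that spans a line break.
line_hits <- grep("measurement error", by_line, fixed = TRUE)
sentence_hits <- grep("measurement error", by_sentence, fixed = TRUE)
```

In this toy case the phrase is not found in any single line, but it is found in the first sentence once the lines are joined, which is the kind of in-context match the sentence-based approach is meant to support.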

Brandon LeBeau

8 minute read

I’m happy to introduce an add-on package, pdfsearch, that adds the ability to do keyword searches on PDF files. This add-on package uses the excellent pdftools package from the rOpenSci project to read in PDF files and perform keyword searches based on character strings of interest.

Installation

The package is currently only hosted on GitHub and can be installed with the devtools library.

devtools::install_github('lebebr01/pdfsearch')

Basic Example

Doing a simple keyword search on a single PDF file uses the keyword_search function.
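A call to keyword_search might look like the sketch below. It assumes pdfsearch is already installed and that the package ships at least one example PDF in its pdf folder. The keyword "error" and the variable names are chosen purely for illustration, and the code does nothing if the package or an example file is missing.

```r
# Sketch only: assumes pdfsearch is installed and bundles an example PDF.
has_pdfsearch <- requireNamespace("pdfsearch", quietly = TRUE)

if (has_pdfsearch) {
  # Look for example PDFs shipped with the package (assumed location).
  pdf_dir <- system.file("pdf", package = "pdfsearch")
  pdf_files <- list.files(pdf_dir, pattern = "\\.pdf$", full.names = TRUE)

  if (length(pdf_files) > 0) {
    # path = TRUE tells keyword_search that x is a file path to a PDF.
    result <- pdfsearch::keyword_search(
      pdf_files[1],
      keyword = "error",  # illustrative keyword
      path = TRUE
    )
    print(head(result))
  }
}
```

To search your own document, replace pdf_files[1] with the path to that PDF.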