Pdf

Brandon LeBeau


I’m happy to formally announce a major update to the development version of the pdfsearch R package. In brief, this version brings two changes. First, it adds support for splitting the PDF by sentences instead of by lines of text. Second, initial testing of splitting PDFs whose text is laid out in multiple columns has been promising. This functionality attempts to realign the multiple columns into a single column, so that the keyword search performed by pdfsearch runs in context and can return stronger results.
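To make the difference between line-based and sentence-based splitting concrete, here is a minimal base R sketch. It does not call pdfsearch and is not the package's implementation; the `pdf_lines` vector is made-up text standing in for lines extracted from a PDF, and the sentence splitter is a simple regular expression chosen only for illustration.

```r
# Hypothetical lines, standing in for text extracted from a PDF page.
pdf_lines <- c(
  "Keyword searching works best when the",
  "surrounding context is kept. Splitting by",
  "lines can cut a sentence in half."
)

# Join the lines into one string, then split on the whitespace that
# follows sentence-ending punctuation (a deliberately simple rule).
full_text <- paste(pdf_lines, collapse = " ")
sentences <- unlist(strsplit(full_text, "(?<=[.!?])\\s+", perl = TRUE))

# Search for the same keyword in both representations.
line_hits <- grep("context", pdf_lines, value = TRUE)
sentence_hits <- grep("context", sentences, value = TRUE)

print(line_hits)      # the matching line: a fragment of two sentences
print(sentence_hits)  # the matching sentence: the full thought
```

With line-based splitting, the match comes back as a fragment that spans the end of one sentence and the start of the next. With sentence-based splitting, the match is the whole sentence containing the keyword, which is the "in context" behavior described above.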