Introduction of the pdfsearch package

Pdfsearch

Package

Author: Brandon LeBeau

Published: December 2, 2016

I’m happy to introduce pdfsearch, an add-on R package for running keyword searches on pdf files. It uses the excellent pdftools package from the rOpenSci project to read in pdf files, then searches the extracted text for character strings of interest.

Installation

The package is currently hosted only on GitHub and can be installed with the devtools package.

devtools::install_github('lebebr01/pdfsearch')

Basic Example

A simple keyword search on a single pdf file uses the keyword_search function. The following example uses a pdf from arXiv that ships with the package.

library(pdfsearch)

file <- system.file('pdf', '1501.00450.pdf', package = 'pdfsearch')

key_res <- keyword_search(file, 
                          keyword = c('repeated measures', 'mixed effects'),
                          path = TRUE)

In the example above, keyword_search takes two required arguments: the path to the pdf file and the keyword(s) to search for in the pdf. The optional argument shown above, path, tells the function to read in the raw pdf using the pdftools package.

data.frame(key_res)

            keyword page_num line_num
1 repeated measures        1        9
2 repeated measures        2       31
3 repeated measures        2       58
4 repeated measures        2       60
5 repeated measures        3       70
6 repeated measures        6      169
7 repeated measures        6      180
8 repeated measures        6      185
9 repeated measures        9      315
  line_text
1 Not introduce more sophisticated experimental designs, specifi-           only would we miss potentially beneficial effects, we may also cally the repeated measures design, including the crossover           get false confidence about lack of negative effects.
2 We also discuss practical considfast iterations and testing many ideas can reap the most           erations to repeated measures design, with variants to the rewards.                                                           crossover design to study the carry over effect, including the “re-randomized” design (row 5 in table 1).
3 To facilitate our illustration, in all the derivation repeated measures design in different stages of treatment          in this section we assume all users appear in all periods, assignment.
4 We also restrict ourselves ing the repeated measures analysis, reporting a “per week”         to metrics that are defined as simple average and assume treatment effect, as show in row 3 “parallel” design in ta-        treatment and control have the same sample size.
5 This way, various In fact, the crossover design is a type of repeated measures       designs considered can be examined in the same framework design commonly used in biomedical research to control for         and easily compared.
6 FLEXIBLE AND SCALABLE REPEATED One way to see measurements are not missing at random is                MEASURES ANALYSIS VIA FORME to realize infrequent users are more likely to have missing         5.1 Review of Existing Methods values and the absence in a specific time window can still          It is common to analyze data from repeated measures design provide information on the user behavior and in reality there       with the repeated measures ANOVA model and the F-test, might be other factors causing user to be missing that are          under certain assumptions, such as normality, sphericity (honot even observed.
7 \022P            P            \023          In our cases they are indicators of treatment assignment, k Xik Pk0 Xi k 0 0 Cov(Xi , Xi ) = Cov 0          P        ,                         periods of the measurement, user id, and any other covariate. k Iik     k 0 Ii k 0 0 \022           \023                         As an example, one possible model for repeated measures Xi Xi0                              using lme4’s formula syntax (Bates et al. 2012a;b) is = Cov       , Ii Ii0                                   Y ∼ 1 + IsT reatment + P eriod + (1|U serID), where the last equality is by dividing both numerator and de-       where the only difference of this model to the usual linnominator by the same total number of users who have ever           ear model behind two sample test is the extra random efappeared in the experiments.
8 In repeated measures data, users might appear in book treatment of the delta-method.                                 multiple periods, represented as multiple rows in the dataset.
9 PRACTICAL CONSIDERATIONS                                          ryover effect, the re-randomized design enables us to At the design stage, we face a few choices under the same               measure it directly and should be used here. framework of repeated measures design.
  token_text
1 not, introduce, more, sophisticated, experimental, designs, specifi, only, would, we, miss, potentially, beneficial, effects, we, may, also, cally, the, repeated, measures, design, including, the, crossover, get, false, confidence, about, lack, of, negative, effects
2 we, also, discuss, practical, considfast, iterations, and, testing, many, ideas, can, reap, the, most, erations, to, repeated, measures, design, with, variants, to, the, rewards, crossover, design, to, study, the, carry, over, effect, including, the, re, randomized, design, row, 5, in, table, 1
3 to, facilitate, our, illustration, in, all, the, derivation, repeated, measures, design, in, different, stages, of, treatment, in, this, section, we, assume, all, users, appear, in, all, periods, assignment
4 we, also, restrict, ourselves, ing, the, repeated, measures, analysis, reporting, a, per, week, to, metrics, that, are, defined, as, simple, average, and, assume, treatment, effect, as, show, in, row, 3, parallel, design, in, ta, treatment, and, control, have, the, same, sample, size
5 this, way, various, in, fact, the, crossover, design, is, a, type, of, repeated, measures, designs, considered, can, be, examined, in, the, same, framework, design, commonly, used, in, biomedical, research, to, control, for, and, easily, compared
6 flexible, and, scalable, repeated, one, way, to, see, measurements, are, not, missing, at, random, is, measures, analysis, via, forme, to, realize, infrequent, users, are, more, likely, to, have, missing, 5.1, review, of, existing, methods, values, and, the, absence, in, a, specific, time, window, can, still, it, is, common, to, analyze, data, from, repeated, measures, design, provide, information, on, the, user, behavior, and, in, reality, there, with, the, repeated, measures, anova, model, and, the, f, test, might, be, other, factors, causing, user, to, be, missing, that, are, under, certain, assumptions, such, as, normality, sphericity, honot, even, observed
7 p, p, in, our, cases, they, are, indicators, of, treatment, assignment, k, xik, pk0, xi, k, 0, 0, cov, xi, xi, cov, 0, p, periods, of, the, measurement, user, id, and, any, other, covariate, k, iik, k, 0, ii, k, 0, 0, as, an, example, one, possible, model, for, repeated, measures, xi, xi0, using, lme4, s, formula, syntax, bates, et, al, 2012a, b, is, cov, ii, ii0, y, 1, ist, reatment, p, eriod, 1, u, serid, where, the, last, equality, is, by, dividing, both, numerator, and, de, where, the, only, difference, of, this, model, to, the, usual, linnominator, by, the, same, total, number, of, users, who, have, ever, ear, model, behind, two, sample, test, is, the, extra, random, efappeared, in, the, experiments
8 in, repeated, measures, data, users, might, appear, in, book, treatment, of, the, delta, method, multiple, periods, represented, as, multiple, rows, in, the, dataset
9 practical, considerations, ryover, effect, the, re, randomized, design, enables, us, to, at, the, design, stage, we, face, a, few, choices, under, the, same, measure, it, directly, and, should, be, used, here, framework, of, repeated, measures, design

head(key_res$line_text, n = 2)

[[1]]
[1] "Not introduce more sophisticated experimental designs, specifi-           only would we miss potentially beneficial effects, we may also cally the repeated measures design, including the crossover           get false confidence about lack of negative effects. "

[[2]]
[1] "We also discuss practical considfast iterations and testing many ideas can reap the most           erations to repeated measures design, with variants to the rewards.                                                           crossover design to study the carry over effect, including the “re-randomized” design (row 5 in table 1). "
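Because the result converts to a data frame, ordinary base R tools can be used to summarise it. As a small sketch, the block below rebuilds the keyword, page_num, and line_num columns from the printed result by hand, so it runs without pdfsearch installed, and then counts matches per page. The names hits, hits_per_page, and page_six are illustrative only and are not part of the package.

```r
# Columns copied by hand from the printed keyword_search result,
# so this sketch does not need pdfsearch or the pdf itself.
hits <- data.frame(
  keyword  = rep("repeated measures", 9),
  page_num = c(1, 2, 2, 2, 3, 6, 6, 6, 9),
  line_num = c(9, 31, 58, 60, 70, 169, 180, 185, 315),
  stringsAsFactors = FALSE
)

# Number of matches found on each page.
hits_per_page <- table(hits$page_num)
hits_per_page

# Keep only the matches found on page 6.
page_six <- hits[hits$page_num == 6, ]
page_six$line_num
```

The same subsetting and tabulating should work on data.frame(key_res) directly, since it carries the same three columns.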

The keyword_search output includes the keyword, the page number on which it was found, the line number of the match, the matching line of text, and a tokenized version of that line (token_text). All nine rows here match 'repeated measures'; no rows were returned for 'mixed effects'. By default, only the line matching the keyword is returned. If more context is desired, the optional argument surround_lines includes the given number of lines before and after the matching line.

key_res <- keyword_search(file, 
                          keyword = c('repeated measures', 'mixed effects'),
                          path = TRUE, 
                          surround_lines = 2)
head(key_res$line_text, n = 2)

[[1]]
[1] "This limits the number of candidate variations              is, we wish to be able to detect the effect when there is any. to be evaluated, and the speed new feature iterations. "                                                                                  
[2] "We             Running under powered experiments have many perils. "                                                                                                                                                                                                 
[3] "Not introduce more sophisticated experimental designs, specifi-           only would we miss potentially beneficial effects, we may also cally the repeated measures design, including the crossover           get false confidence about lack of negative effects. "
[4] "Statistical design and related variants, to increase KPI sensitivity with         power increases with larger effect size, and smaller variances. the same traffic size and duration of experiment. "                                                                
[5] "In this pa-         Let us look at these aspects in turn. per we present FORME (Flexible Online Repeated Measures Experiment), a flexible and scalable framework for these de-          While the actual effect size from a potential new feature may signs. "       

[[2]]
[1] "This poses"                                                                                                                                                                                                                                                                                                                                 
[2] "a limitation to any online experimentation platform, where         within-subject variation. "                                                                                                                                                                                                                                              
[3] "We also discuss practical considfast iterations and testing many ideas can reap the most           erations to repeated measures design, with variants to the rewards.                                                           crossover design to study the carry over effect, including the “re-randomized” design (row 5 in table 1). "
[4] "1.1    Motivation To improve sensitivity of measurement, apart from accurate         1.2     Main Contributions implementation and increase sample size and duration, we           In this paper, we propose a framework called FORME (Flexcan employ statistical methods to reduce variance. "                                             
[5] "Using           ible Online Repeated Measures Experiment). "
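To make the idea behind surround_lines concrete, here is a minimal base R sketch of matching a keyword against a vector of lines and returning each match with its neighbours. It is not how pdfsearch is implemented: find_with_context is a hypothetical helper, the input lines are toy text rather than output from a real pdf, and the case-insensitive literal matching is a choice made for this sketch.

```r
# Hypothetical helper, not part of pdfsearch: find the lines containing a
# keyword and return each match with `surround` lines on either side.
find_with_context <- function(lines, keyword, surround = 0) {
  # Case-insensitive, literal (non-regex) match of the keyword in each line.
  hit_idx <- which(grepl(tolower(keyword), tolower(lines), fixed = TRUE))
  lapply(hit_idx, function(i) {
    # Clamp the window so it never runs past the first or last line.
    from <- max(1, i - surround)
    to   <- min(length(lines), i + surround)
    lines[from:to]
  })
}

# Toy text standing in for the lines extracted from a pdf.
lines <- c(
  "Online experiments are widely used.",
  "We study the repeated measures design.",
  "Variance reduction improves sensitivity.",
  "Crossover designs are one variant.",
  "Repeated measures data need mixed models."
)

context <- find_with_context(lines, "repeated measures", surround = 1)
context[[1]]  # first match plus one line before and one line after
```

As in the output above, each list element holds the matching line in the middle of its window, except where the window is cut short at the start or end of the text.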

Directory Search

This package can also search every pdf file in a directory in a single run, using the keyword_directory function. Most of the arguments are the same as for keyword_search, except that a directory is specified instead of the path to a single pdf file.

# find directory
directory <- system.file('pdf', package = 'pdfsearch')

# do search over two files
head(keyword_directory(directory, 
       keyword = c('repeated measures', 'measurement error'),
       surround_lines = 1, full_names = TRUE), n = 12)

   ID       pdf_name           keyword page_num line_num
1   1 1501.00450.pdf repeated measures        1        9
2   1 1501.00450.pdf repeated measures        2       31
3   1 1501.00450.pdf repeated measures        2       58
4   1 1501.00450.pdf repeated measures        2       60
5   1 1501.00450.pdf repeated measures        3       70
6   1 1501.00450.pdf repeated measures        6      169
7   1 1501.00450.pdf repeated measures        6      180
8   1 1501.00450.pdf repeated measures        6      185
9   1 1501.00450.pdf repeated measures        9      315
10  2 1610.00147.pdf measurement error        1        2
11  2 1610.00147.pdf measurement error        1       10
12  2 1610.00147.pdf measurement error        1       12
   line_text
1  We             Running under powered experiments have many perils. , Not introduce more sophisticated experimental designs, specifi-           only would we miss potentially beneficial effects, we may also cally the repeated measures design, including the crossover           get false confidence about lack of negative effects. , Statistical design and related variants, to increase KPI sensitivity with         power increases with larger effect size, and smaller variances. the same traffic size and duration of experiment.
2  a limitation to any online experimentation platform, where         within-subject variation. , We also discuss practical considfast iterations and testing many ideas can reap the most           erations to repeated measures design, with variants to the rewards.                                                           crossover design to study the carry over effect, including the “re-randomized” design (row 5 in table 1). , 1.1    Motivation To improve sensitivity of measurement, apart from accurate         1.2     Main Contributions implementation and increase sample size and duration, we           In this paper, we propose a framework called FORME (Flexcan employ statistical methods to reduce variance.
3  In the Table 1: Repeated Measures Designs                        following section we assume the minimum experimentation “period” to be one full week, and may extend to up to two In this paper we extend the idea further by employing the          weeks. , To facilitate our illustration, in all the derivation repeated measures design in different stages of treatment          in this section we assume all users appear in all periods, assignment. , The traditional A/B test can be analyzed us-           i.e. no missing measurement.
4  The traditional A/B test can be analyzed us-           i.e. no missing measurement. , We also restrict ourselves ing the repeated measures analysis, reporting a “per week”         to metrics that are defined as simple average and assume treatment effect, as show in row 3 “parallel” design in ta-        treatment and control have the same sample size. , We furble 1.
5  This way              average treatment effect (ATE) δ = µT − µC which is a each user serves as his/her own control in the measurement.        fixed effects in the model in this section. , This way, various In fact, the crossover design is a type of repeated measures       designs considered can be examined in the same framework design commonly used in biomedical research to control for         and easily compared., We will proceed to show, with theoretical derivations, that        2.1    Two Sample T-test given the same total traffic                                       Let X denote the observed average metric value in control group and Y denote that in the treatment group.
6  5.  , FLEXIBLE AND SCALABLE REPEATED One way to see measurements are not missing at random is                MEASURES ANALYSIS VIA FORME to realize infrequent users are more likely to have missing         5.1 Review of Existing Methods values and the absence in a specific time window can still          It is common to analyze data from repeated measures design provide information on the user behavior and in reality there       with the repeated measures ANOVA model and the F-test, might be other factors causing user to be missing that are          under certain assumptions, such as normality, sphericity (honot even observed. , Instead of throwing away data points             mogeneity of variances in differences between each pair of where user appeared in only one period and is exposed to            within-subject values), equal time points between subjects, only one of the two treatments, in practice, we included an         and no missing data.
7  X and Z are covariates in the model. , \022P            P            \023          In our cases they are indicators of treatment assignment, k Xik Pk0 Xi k 0 0 Cov(Xi , Xi ) = Cov 0          P        ,                         periods of the measurement, user id, and any other covariate. k Iik     k 0 Ii k 0 0 \022           \023                         As an example, one possible model for repeated measures Xi Xi0                              using lme4’s formula syntax (Bates et al. 2012a;b) is = Cov       , Ii Ii0                                   Y ∼ 1 + IsT reatment + P eriod + (1|U serID), where the last equality is by dividing both numerator and de-       where the only difference of this model to the usual linnominator by the same total number of users who have ever           ear model behind two sample test is the extra random efappeared in the experiments. , Thanks to the central limit            fect(clustered by UserID) to model user “baseline”.
8  (2013, Appendix B) for            Random effect makes modeling within-subject variability a similar example; also see (Van der Vaart 2000) for a text         possible. , In repeated measures data, users might appear in book treatment of the delta-method.                                 multiple periods, represented as multiple rows in the dataset. , As a result, rows of the dataset are not independent but 4.2    Metrics Beyond Average                                       with dependencies clustered by user.
9  • Re-randomized: If we suspect the presence of car7. , PRACTICAL CONSIDERATIONS                                          ryover effect, the re-randomized design enables us to At the design stage, we face a few choices under the same               measure it directly and should be used here. framework of repeated measures design. , Experimenters should           • Wash-out and decide: If we have little informause domain knowledge and past experiments to inform the                 tion to judge carry over effect, we can run the first design.
10 Data Fusion for Correcting Measurement Errors Tracy Schifeling, Jerome P. , Reiter, Maria DeYoreo∗ arXiv:1610.00147v1 [stat.ME] 1 Oct 2016 Abstract Often in surveys, key items are subject to measurement errors. , Given just the data, it can be difficult to determine the distribution of this error process, and hence to obtain accurate inferences that involve the error-prone variables.
11 In doing so, we account for the informative sampling design used to select the National Survey of College Graduates. , We also present a process for assessing the sensitivity of various analyses to different choices for the measurement error models. , Supplemental material is available online.
12 Supplemental material is available online. , KEY WORDS: fusion, imputation, measurement error, missing, survey. , ∗ This research was supported by The National Science Foundation under award SES-11-31897.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          token_text
1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        we, running, under, powered, experiments, have, many, perils, not, introduce, more, sophisticated, experimental, designs, specifi, only, would, we, miss, potentially, beneficial, effects, we, may, also, cally, the, repeated, measures, design, including, the, crossover, get, false, confidence, about, lack, of, negative, effects, statistical, design, and, related, variants, to, increase, kpi, sensitivity, with, power, increases, with, larger, effect, size, and, smaller, variances, the, same, traffic, size, and, duration, of, experiment
2                                                                                                                                                                                                                                                                                                                                         a, limitation, to, any, online, experimentation, platform, where, within, subject, variation, we, also, discuss, practical, considfast, iterations, and, testing, many, ideas, can, reap, the, most, erations, to, repeated, measures, design, with, variants, to, the, rewards, crossover, design, to, study, the, carry, over, effect, including, the, re, randomized, design, row, 5, in, table, 1, 1.1, motivation, to, improve, sensitivity, of, measurement, apart, from, accurate, 1.2, main, contributions, implementation, and, increase, sample, size, and, duration, we, in, this, paper, we, propose, a, framework, called, forme, flexcan, employ, statistical, methods, to, reduce, variance
3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  in, the, table, 1, repeated, measures, designs, following, section, we, assume, the, minimum, experimentation, period, to, be, one, full, week, and, may, extend, to, up, to, two, in, this, paper, we, extend, the, idea, further, by, employing, the, weeks, to, facilitate, our, illustration, in, all, the, derivation, repeated, measures, design, in, different, stages, of, treatment, in, this, section, we, assume, all, users, appear, in, all, periods, assignment, the, traditional, a, b, test, can, be, analyzed, us, i.e, no, missing, measurement
4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    the, traditional, a, b, test, can, be, analyzed, us, i.e, no, missing, measurement, we, also, restrict, ourselves, ing, the, repeated, measures, analysis, reporting, a, per, week, to, metrics, that, are, defined, as, simple, average, and, assume, treatment, effect, as, show, in, row, 3, parallel, design, in, ta, treatment, and, control, have, the, same, sample, size, we, furble, 1
5                                                                                                                                                                                                                                                                                                                                   this, way, average, treatment, effect, ate, δ, µt, µc, which, is, a, each, user, serves, as, his, her, own, control, in, the, measurement, fixed, effects, in, the, model, in, this, section, this, way, various, in, fact, the, crossover, design, is, a, type, of, repeated, measures, designs, considered, can, be, examined, in, the, same, framework, design, commonly, used, in, biomedical, research, to, control, for, and, easily, compared, we, will, proceed, to, show, with, theoretical, derivations, that, 2.1, two, sample, t, test, given, the, same, total, traffic, let, x, denote, the, observed, average, metric, value, in, control, group, and, y, denote, that, in, the, treatment, group
6  5, flexible, and, scalable, repeated, one, way, to, see, measurements, are, not, missing, at, random, is, measures, analysis, via, forme, to, realize, infrequent, users, are, more, likely, to, have, missing, 5.1, review, of, existing, methods, values, and, the, absence, in, a, specific, time, window, can, still, it, is, common, to, analyze, data, from, repeated, measures, design, provide, information, on, the, user, behavior, and, in, reality, there, with, the, repeated, measures, anova, model, and, the, f, test, might, be, other, factors, causing, user, to, be, missing, that, are, under, certain, assumptions, such, as, normality, sphericity, honot, even, observed, instead, of, throwing, away, data, points, mogeneity, of, variances, in, differences, between, each, pair, of, where, user, appeared, in, only, one, period, and, is, exposed, to, within, subject, values, equal, time, points, between, subjects, only, one, of, the, two, treatments, in, practice, we, included, an, and, no, missing, data
7                                                                                                                                                                     x, and, z, are, covariates, in, the, model, p, p, in, our, cases, they, are, indicators, of, treatment, assignment, k, xik, pk0, xi, k, 0, 0, cov, xi, xi, cov, 0, p, periods, of, the, measurement, user, id, and, any, other, covariate, k, iik, k, 0, ii, k, 0, 0, as, an, example, one, possible, model, for, repeated, measures, xi, xi0, using, lme4, s, formula, syntax, bates, et, al, 2012a, b, is, cov, ii, ii0, y, 1, ist, reatment, p, eriod, 1, u, serid, where, the, last, equality, is, by, dividing, both, numerator, and, de, where, the, only, difference, of, this, model, to, the, usual, linnominator, by, the, same, total, number, of, users, who, have, ever, ear, model, behind, two, sample, test, is, the, extra, random, efappeared, in, the, experiments, thanks, to, the, central, limit, fect, clustered, by, userid, to, model, user, baseline
8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             2013, appendix, b, for, random, effect, makes, modeling, within, subject, variability, a, similar, example, also, see, van, der, vaart, 2000, for, a, text, possible, in, repeated, measures, data, users, might, appear, in, book, treatment, of, the, delta, method, multiple, periods, represented, as, multiple, rows, in, the, dataset, as, a, result, rows, of, the, dataset, are, not, independent, but, 4.2, metrics, beyond, average, with, dependencies, clustered, by, user
9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         re, randomized, if, we, suspect, the, presence, of, car7, practical, considerations, ryover, effect, the, re, randomized, design, enables, us, to, at, the, design, stage, we, face, a, few, choices, under, the, same, measure, it, directly, and, should, be, used, here, framework, of, repeated, measures, design, experimenters, should, wash, out, and, decide, if, we, have, little, informause, domain, knowledge, and, past, experiments, to, inform, the, tion, to, judge, carry, over, effect, we, can, run, the, first, design
10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      data, fusion, for, correcting, measurement, errors, tracy, schifeling, jerome, p, reiter, maria, deyoreo, arxiv, 1610.00147v1, stat.me, 1, oct, 2016, abstract, often, in, surveys, key, items, are, subject, to, measurement, errors, given, just, the, data, it, can, be, difficult, to, determine, the, distribution, of, this, error, process, and, hence, to, obtain, accurate, inferences, that, involve, the, error, prone, variables
11                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         in, doing, so, we, account, for, the, informative, sampling, design, used, to, select, the, national, survey, of, college, graduates, we, also, present, a, process, for, assessing, the, sensitivity, of, various, analyses, to, different, choices, for, the, measurement, error, models, supplemental, material, is, available, online
12                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          supplemental, material, is, available, online, key, words, fusion, imputation, measurement, error, missing, survey, this, research, was, supported, by, the, national, science, foundation, under, award, ses, 11, 31897

Two relevant arguments for the keyword_directory function are full_names and recursive. These arguments control whether the full path to each pdf file in the directory is used and whether subfolders within the directory are also searched.
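As a rough illustration of what these two settings mean, the sketch below uses base R's list.files(), whose full.names and recursive arguments play the same role. That keyword_directory handles full_names and recursive in the same way is an assumption here, and the temporary directory and placeholder file names are made up for the example.

```r
# Build a small temporary directory tree with empty placeholder files
# (not real pdfs), purely to show which files each setting picks up.
root <- file.path(tempdir(), "pdfsearch_demo")
dir.create(file.path(root, "sub"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "a.pdf"),
                      file.path(root, "sub", "b.pdf")))

# File names only, top-level directory only.
top_only <- list.files(root, pattern = "\\.pdf$",
                       full.names = FALSE, recursive = FALSE)

# Full paths, including files found in subfolders.
all_full <- list.files(root, pattern = "\\.pdf$",
                       full.names = TRUE, recursive = TRUE)

print(top_only)
print(basename(all_full))

# With real pdfs, the corresponding package call might look like this
# (left commented out because it needs pdfsearch and actual pdf files):
# directory <- system.file('pdf', package = 'pdfsearch')
# key_dir <- keyword_directory(directory,
#                              keyword = 'repeated measures',
#                              full_names = TRUE, recursive = TRUE)
```

Setting recursive = TRUE is mainly useful when articles are sorted into subfolders, for example one folder per journal or per search database.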

Uses for pdfsearch

This package may be extremely useful when conducting research syntheses or meta-analyses, particularly when screening articles for inclusion. I hope to explore this use in more depth later.

Limitations

The quality of the text matches depends on the pdfs being searched. For example, words that wrap across lines (i.e., hyphenated words) will not be matched, because the search currently looks for entire words.
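The hyphenation issue can be reproduced with base R alone. The sketch below is not how pdfsearch works internally; the two text lines are invented, and the pre-processing step (joining lines and dropping a hyphen followed by whitespace) is only one possible workaround that a user could apply to extracted text.

```r
# Two hypothetical lines of extracted text where "linear" wraps across lines.
text_lines <- c("the usual lin-", "ear model behind the two sample test")

# Searching line by line misses the word entirely.
found_by_line <- grepl("linear", text_lines)

# Join the lines, then remove a hyphen that is followed by whitespace.
joined <- gsub("-\\s+", "", paste(text_lines, collapse = " "))
found_joined <- grepl("linear", joined)

print(found_by_line)
print(joined)
print(found_joined)
```

The trade-off is that this also merges any genuine hyphen that happens to be followed by whitespace, so the cleaned text should be used for searching rather than for quoting.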

Moving Forward

The package will be submitted to CRAN next week. Any bugs or problems can be reported on the GitHub site: https://github.com/lebebr01/pdfsearch/issues.