<h1>Evolution of Statistical Software and Quantitative Methods</h1>
<h2>Brandon LeBeau & Ariel M. Aloe</h2>
<h3>University of Iowa</h3>
# Rationale
- Extension of work done by Robert Muenchen (http://r4stats.com/articles/popularity/).
+ Focus is on statistical software
+ Addition of quantitative methods
+ Particularly interested in the interaction.
- Questions of Interest:
+ Exploring which software is popular in published research.
+ How many empirical analyses cite software?
+ Any patterns in software use with quantitative methods?
# Methods
- Research synthesis methods were used
- Web of Knowledge was used to pull in citations for 12 social science journals
- 1995 to 2018
- EndNote's "Find Text" feature was used to pull in PDFs of all articles from the journals
- `pdfsearch` R package was used to perform keyword searching.
# `pdfsearch`
- Converts lines of text into sentences
- Removes multiple columns into a single column
- Identifies location of keyword found within document.
# `pdfsearch`
- Short live code demo
# Software Keywords
| Keyword Group | Keywords |
|---------------|----------|
| AMOS | AMOS |
| IRT | BILOG; BILOG-MG; IRT PRO; MULTILOG; PARSCALE |
| R | CRAN; R-project; R Project; R Core Team; R software; RStudio |
| HLM | HLM [0-9]; HLM[0-9] |
| Java | Java |
| SAS | SAS; SAS Institute; JMP |
| LISREL | LISREL |
| Mplus | Mplus; M-Plus |
| Python | Python |
| SPSS | SPSS; SPSS Statistics |
| STATA | STATA |
| Other | Matlab; Scala; Systat; Statistica; Tableau; Minitab |
# Method Keywords
| Keyword Group | Keywords |
|---------------|----------|
| ANOVA | Analysis of Variance; ANOVA; Analysis of Covariance; ANCOVA; MANOVA; Multivariate Analysis of Variance; Repeated Measures Analysis of Variance; RM-ANOVA |
| IRT | IRT; Item Response Theory |
| CFA | CFA; Confirmatory Factor Analysis |
| Chi-Square | Chi-square( analysis)?; Nonparametric Analysis |
| Cluster Analysis | Cluster Analysis; Hierarchical Cluster Analysis |
| t-test | Dependent Samples t-test; one-sample t-test; two-sample t-test |
| EFA | EFA; Exploratory Factor Analysis |
| GAM | GAM; Generalized additive models |
# Method Keywords Continued
| Keyword Group | Keywords |
|---------------|----------|
| Linear Mixed Model | LMM; HLM; Multilevel Model; Multi-level Model; Hierarchical Linear Model; General(ized)? Linear Mixed Model |
| Linear Model | Linear Regression; Multiple Linear Regression; Multiple Regression; General(ized)? Linear Model |
| Growth | Growth Model; Latent Growth Model; LGM |
| SEM | Latent Variable Modeling; SEM; Structural Equation Modeling |
| Logistic Regression | Logistic Regression; Multinomial Regression; Multinomial Logistic Regression; Ordinal Regression |
| meta-analysis | meta-analysis; meta analysis |
| Non-Linear Regression | Non-Linear Regression; Nonlinear Regression |
| Propensity Score | Propensity Score Analysis; Propensity score matching |
# Journals Sampled
* American Economic Journal (AEJ)
* American Educational Research Journal (AERJ)
* American Journal of Political Science (AJPS)
* Economic Journal (EJ)
* Educational Evaluation and Policy Analysis (EEPA)
* Educational Researcher (ER)
* Higher Education (HE)
* Journal of Experimental Education (JEE)
* Journal of Public Policy (JPP)
* Political Science Quarterly (PSQ)
* Public Policy Administration (PPA)
* Sociology of Education (SE)
# How many articles obtained?
![](/figs/pdf-time-1.png)
# Software Counts
![](/figs/count-software-1.png)
# Software Keywords by Discipline
![](/figs/discipline-software-1.png)
# Number of software keywords
<table>
<thead>
<tr>
<th style="text-align:left;"> Discipline </th>
<th style="text-align:right;"> Avg Keywords </th>
<th style="text-align:right;"> Min Keywords </th>
<th style="text-align:right;"> Max Keywords </th>
<th style="text-align:right;"> Prop. Uniq </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> Economics </td>
<td style="text-align:right;"> 1.08 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 3 </td>
<td style="text-align:right;"> 0.16 </td>
</tr>
<tr>
<td style="text-align:left;"> Education </td>
<td style="text-align:right;"> 1.67 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 7 </td>
<td style="text-align:right;"> 0.21 </td>
</tr>
<tr>
<td style="text-align:left;"> Political Science </td>
<td style="text-align:right;"> 1.14 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 4 </td>
<td style="text-align:right;"> 0.10 </td>
</tr>
<tr>
<td style="text-align:left;"> Public Policy </td>
<td style="text-align:right;"> 1.00 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0.19 </td>
</tr>
<tr>
<td style="text-align:left;"> Sociology </td>
<td style="text-align:right;"> 1.22 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 6 </td>
<td style="text-align:right;"> 0.28 </td>
</tr>
</tbody>
</table>
# Analysis keywords
![](/figs/model-discipline-1.png)
# Number of analysis keywords
<table>
<thead>
<tr>
<th style="text-align:left;"> Discipline </th>
<th style="text-align:right;"> Avg Keywords </th>
<th style="text-align:right;"> Min Keywords </th>
<th style="text-align:right;"> Max Keywords </th>
<th style="text-align:right;"> Prop. Uniq </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> Economics </td>
<td style="text-align:right;"> 1.38 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 4 </td>
<td style="text-align:right;"> 0.59 </td>
</tr>
<tr>
<td style="text-align:left;"> Education </td>
<td style="text-align:right;"> 3.36 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 12 </td>
<td style="text-align:right;"> 0.27 </td>
</tr>
<tr>
<td style="text-align:left;"> Political Science </td>
<td style="text-align:right;"> 1.62 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 5 </td>
<td style="text-align:right;"> 0.23 </td>
</tr>
<tr>
<td style="text-align:left;"> Public Policy </td>
<td style="text-align:right;"> 1.43 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 4 </td>
<td style="text-align:right;"> 0.25 </td>
</tr>
<tr>
<td style="text-align:left;"> Sociology </td>
<td style="text-align:right;"> 2.39 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 7 </td>
<td style="text-align:right;"> 0.74 </td>
</tr>
</tbody>
</table>
# General Software Keyword Percentages by Year - Education
![](/figs/software-year-at1-1.png)
# Specialty Software Keyword Percentages by Year - Education
![](/figs/software-year-spec-1.png)
# Analysis Keyword Percentages by Year - Education
![](/figs/model-year-at1-1.png)
# Analysis Keyword Percentages by Year - Education
![](/figs/model-year-other-1.png)
# Interaction between Software and Analysis - Education
![s](/figs/software-statmethods-1.png)
# Interaction between Software and Analysis by Year - Education
![](/figs/software-statmethods-year-1.png)
# Interaction between Software and Analysis by Year - Education
![](/figs/software-statmethods-year2-1.png)
# Conclusions
- Cite the software you use!
+ It benefits the software developer
+ It benefits the reproducibility
+ It benefits the replicability
- There are discipline/journal differences in methods and software used.
# Connect
- slides: https://brandonlebeau.org/slides/canam2020/
- twitter: blebeau11