<h1>Tidy Meta-Analytic Data</h1>
<h2>Brandon LeBeau & Ariel M. Aloe</h2>
<h3>University of Iowa</h3>
# Rationale
- Data entry is an important component for quantitative studies
+ Often neglected in courses
- Messy data can make data manipulation much more difficult
+ Substantial time could be lost due to poor data entry procedures
- Strong data entry procedures are particularly important in evidence synthesis
# Rationale 2
- Data Organization in Spreadsheets - Broman and Woo (2018), *The American Statistician*, https://doi.org/10.1080/00031305.2017.1375989
- Tidy Data - Wickham (2014), *Journal of Statistical Software*, https://www.jstatsoft.org/index.php/jss/article/view/v059i10
- Nine simple ways to make it easier to (re)use your data - White et al. (2013), *Ideas in Ecology and Evolution*, https://ojs.library.queensu.ca/index.php/IEE/article/view/4608
# Tidy Meta-Analytic Data
- A series of data entry rules
- Promotes more consistent evidence synthesis data
- Promotes reproducible analyses (see *Reproducible Analyses in Education Research* by LeBeau, Ellison, & Aloe, 2021, in *Review of Research in Education*)
- Promotes reusable analyses across evidence synthesis projects
- Facilitates a split-apply-combine data analysis framework (Wickham, 2011; https://www.jstatsoft.org/v40/i01/)
# Data Entry Rules
- These rules are mostly agnostic to data storage mode/program
+ Text based (csv, tsv, etc.)
+ SQL databases of all types
+ Excel (but please avoid)
- Rows should be the unit of analysis
- Columns are attributes/characteristics about a unit of analysis
- Data should be rectangular
# Column Rules
1. Avoid placing attributes/characteristics in column names
2. Do not use spaces in names
3. Columns should contain one attribute/characteristic
4. Use one row for column names
5. Ensure appropriate ID columns
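
Rules 1 and 3 can be sketched in base R. The data frame `wide`, its column names, and the effect size values below are hypothetical and serve only as an illustration: the measurement occasion starts out hidden in the column names and ends up in its own column.

```r
# Hypothetical wide data: the occasion (pre/post) is stored in the column names
wide <- data.frame(
  study_id = c(1, 2, 3),
  es_pre   = c(0.10, 0.25, -0.05),
  es_post  = c(0.30, 0.40, 0.12)
)

# Move the occasion out of the column names and into a column of its own
long <- reshape(
  wide,
  direction = "long",
  varying   = c("es_pre", "es_post"),
  v.names   = "es",
  timevar   = "occasion",
  times     = c("pre", "post"),
  idvar     = "study_id"
)
rownames(long) <- NULL

long
```

Each row of `long` is now one effect size, identified by `study_id` and `occasion`.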
# Column Example(s)
![](/figs/data-screen.png)
# Row Rules
1. Have deliberate missing data codes:
+ Codes such as -99, 99, -9999, or N/A are problematic
+ NA is a good option
2. No calculations in the data file
3. Use the ISO 8601 standard for dates: YYYY-MM-DD (https://xkcd.com/1179/)
+ Particularly needed when using Excel
4. Don't use highlighting as data
+ TRUE/FALSE attributes can be helpful here
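
A minimal base R sketch of these row rules follows. The file contents, column names, and values are made up for illustration; `peer_reviewed` stands in for information that might otherwise be recorded with cell highlighting.

```r
# Hypothetical raw file contents; all values are made up for illustration
csv_text <- "study_id,n,effect_size,pub_date,peer_reviewed
1,120,0.31,2019-04-12,TRUE
2,NA,0.12,2020-11-03,FALSE
3,85,NA,2021-01-27,TRUE"

# NA is declared as the only missing data code
studies <- read.csv(text = csv_text, na.strings = "NA")

# ISO 8601 dates (YYYY-MM-DD) parse without a format string
studies$pub_date <- as.Date(studies$pub_date)

# A numeric code such as -99 is treated as a real value and distorts summaries
mean(c(120, -99, 85))

# With NA, missing values have to be excluded deliberately
mean(studies$n, na.rm = TRUE)

str(studies)
```

The TRUE/FALSE column is read as a logical vector, so it can be used directly for filtering or as a moderator.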
# Date Conversion Excel
![](/figs/date-convert.gif)
# Highlighting Cells
![](/figs/data-highlighting.gif)
# Split-Apply-Combine
- Process of splitting a hard task into smaller manageable tasks
- This framework is particularly powerful in languages that support functional programming, such as R, Python, Julia, Scala, Mathematica, and JavaScript
- Split-Apply-Combine in Data Analysis
+ Split observations into similar type
+ Apply a function (often a computation)
+ Combine function results across observations
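
The three steps can be sketched in base R. The values are copied from the `becker09` columns shown in the Correlation Data Example slide; the sample-size-weighted mean is only an illustrative computation for the "apply" step and is not the model fitted later in these slides.

```r
# Values copied from the becker09 columns shown in the Correlation Data Example
studies <- data.frame(
  ID   = c(1, 3, 6, 10, 17, 22, 26, 28, 36, 38),
  N    = c(142, 37, 16, 14, 45, 100, 51, 128, 70, 30),
  Team = c("I", "I", "T", "I", "I", "I", "T", "T", "T", "I"),
  Cognitive_Performance = c(-0.55, 0.53, 0.44, -0.39, 0.10,
                            0.23, -0.52, 0.14, -0.01, -0.27)
)

# Split: one data frame per value of Team
by_team <- split(studies, studies$Team)

# Apply: number of studies and N-weighted mean correlation within each group
team_means <- lapply(by_team, function(d) {
  data.frame(
    Team   = d$Team[1],
    k      = nrow(d),
    mean_r = weighted.mean(d$Cognitive_Performance, w = d$N)
  )
})

# Combine: stack the per-group results into one data frame
result <- do.call(rbind, team_means)
result
```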
# Synthesis of Correlation Matrices
- One particular application of tidy meta-analytic data is the synthesis of correlation matrices.
- See *Meta-Analysis of Correlations, Correlation Matrices, and Their Functions* by Becker, Aloe, & Cheung (2020) in the *Handbook of Meta-Analysis*
- We will use an R package developed in tandem with this chapter, `metaRmat`.
# Install `metaRmat`
```r
remotes::install_github("lebebr01/metaRmat")
```
```r
library(metaRmat)
```
# Correlation Data Example
```r
becker09[, 1:6]
```
```
##    ID   N Team Cognitive_Performance Somatic_Performance
## 1   1 142    I                 -0.55               -0.48
## 2   3  37    I                  0.53               -0.12
## 3   6  16    T                  0.44                0.46
## 4  10  14    I                 -0.39               -0.17
## 5  17  45    I                  0.10                0.31
## 6  22 100    I                  0.23                0.08
## 7  26  51    T                 -0.52               -0.43
## 8  28 128    T                  0.14                0.02
## 9  36  70    T                 -0.01               -0.16
## 10 38  30    I                 -0.27               -0.13
##    Selfconfidence_Performance
## 1                        0.66
## 2                        0.03
## 3                          NA
## 4                        0.19
## 5                       -0.17
## 6                        0.51
## 7                        0.16
## 8                        0.13
## 9                        0.42
## 10                       0.15
```
# Split Correlation Matrices
```r
# Split the wide data into a list of correlation matrices, one per study ID
becker09_list <- df_to_corr(becker09,
                            variables =
                              c('Cognitive_Performance',
                                'Somatic_Performance',
                                'Selfconfidence_Performance',
                                'Somatic_Cognitive',
                                'Selfconfidence_Cognitive',
                                'Selfconfidence_Somatic'),
                            ID = 'ID')
```
# View Split Correlation Matrices
```r
becker09_list[1:3]
```
```
## $`1`
##                Performance Cognitive Somatic Selfconfidence
## Performance           1.00     -0.55   -0.48           0.66
## Cognitive            -0.55      1.00    0.47          -0.38
## Somatic              -0.48      0.47    1.00          -0.46
## Selfconfidence        0.66     -0.38   -0.46           1.00
##
## $`3`
##                Performance Cognitive Somatic Selfconfidence
## Performance           1.00      0.53   -0.12           0.03
## Cognitive             0.53      1.00    0.52          -0.48
## Somatic              -0.12      0.52    1.00          -0.40
## Selfconfidence        0.03     -0.48   -0.40           1.00
##
## $`6`
##                Performance Cognitive Somatic Selfconfidence
## Performance           1.00      0.44    0.46             NA
## Cognitive             0.44      1.00    0.67             NA
## Somatic               0.46      0.67    1.00             NA
## Selfconfidence          NA        NA      NA              1
```
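
To make the split step concrete, here is a base R sketch of how one row of pairwise correlations maps onto a symmetric matrix. `row_to_corr()` is a hypothetical helper written for this slide, not a `metaRmat` function, and it assumes the `A_B` column naming used in `becker09`.

```r
# Hypothetical helper, not part of metaRmat: rebuild one study's matrix
row_to_corr <- function(values, var_order) {
  m <- diag(1, length(var_order))
  dimnames(m) <- list(var_order, var_order)
  for (pair in names(values)) {
    # "Cognitive_Performance" names the Cognitive-Performance correlation
    vars <- strsplit(pair, "_", fixed = TRUE)[[1]]
    m[vars[1], vars[2]] <- values[[pair]]
    m[vars[2], vars[1]] <- values[[pair]]
  }
  m
}

# Correlations for study ID 1, as shown in the output above
study1 <- c(Cognitive_Performance      = -0.55,
            Somatic_Performance        = -0.48,
            Selfconfidence_Performance =  0.66,
            Somatic_Cognitive          =  0.47,
            Selfconfidence_Cognitive   = -0.38,
            Selfconfidence_Somatic     = -0.46)

m1 <- row_to_corr(study1,
                  c("Performance", "Cognitive", "Somatic", "Selfconfidence"))
m1
```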
# Correlations as Tidy Meta-Analytic Data
```r
# Reshape the correlations into long format: one row per correlation per study
input_metafor <- prep_data(becker09,
                           becker09$N,
                           type = 'weighted', missing = FALSE,
                           variable_names =
                             c('Cognitive_Performance',
                               'Somatic_Performance',
                               'Selfconfidence_Performance',
                               'Somatic_Cognitive',
                               'Selfconfidence_Cognitive',
                               'Selfconfidence_Somatic'),
                           ID = 'ID')
```
# View Tidy Meta-Analytic Correlations
```r
head(input_metafor$data, n = 15)
```
```
##      Variable1      Variable2    yi outcome study
## 1  Performance      Cognitive -0.55       1     1
## 2  Performance        Somatic -0.48       2     1
## 3  Performance Selfconfidence  0.66       3     1
## 4    Cognitive        Somatic  0.47       4     1
## 5    Cognitive Selfconfidence -0.38       5     1
## 6      Somatic Selfconfidence -0.46       6     1
## 7  Performance      Cognitive  0.53       1     2
## 8  Performance        Somatic -0.12       2     2
## 9  Performance Selfconfidence  0.03       3     2
## 10   Cognitive        Somatic  0.52       4     2
## 11   Cognitive Selfconfidence -0.48       5     2
## 12     Somatic Selfconfidence -0.40       6     2
## 13 Performance      Cognitive  0.44       1     3
## 14 Performance        Somatic  0.46       2     3
## 15 Performance Selfconfidence    NA       3     3
```
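
The long layout above can be reproduced for a single study in base R. This is a sketch of the reshaping idea only, not the code inside `prep_data()`; the object names `r1`, `pair_idx`, and `long` are illustrative.

```r
vars <- c("Performance", "Cognitive", "Somatic", "Selfconfidence")

# Correlation matrix for study ID 1, copied from the earlier output
r1 <- matrix(c( 1.00, -0.55, -0.48,  0.66,
               -0.55,  1.00,  0.47, -0.38,
               -0.48,  0.47,  1.00, -0.46,
                0.66, -0.38, -0.46,  1.00),
             nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# All unique variable pairs, one (row, column) index pair per row
pair_idx <- t(combn(length(vars), 2))

# One row per unique correlation
long <- data.frame(
  Variable1 = vars[pair_idx[, 1]],
  Variable2 = vars[pair_idx[, 2]],
  yi        = r1[pair_idx],
  outcome   = seq_len(nrow(pair_idx)),
  study     = 1
)
long
```

Stacking such data frames across studies gives one row per correlation per study, which is the unit of analysis for the model that follows.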
# Fit a Random-Effects Meta-Analytic Model
```r
random_model <- fit_model(data = input_metafor, effect_size = 'yi',
                          var_cor = 'V', moderators = ~ -1 + factor(outcome),
                          random_params = ~ factor(outcome) | factor(study))
```
```r
round(random_model$tau2, 3) # between studies variance
```
```
## [1] 0.126 0.060 0.062 0.002 0.011 0.006
```
```r
round(random_model$b, 3) # estimated average correlation for each outcome
```
```
##                    [,1]
## factor(outcome)1 -0.034
## factor(outcome)2 -0.071
## factor(outcome)3  0.233
## factor(outcome)4  0.544
## factor(outcome)5 -0.453
## factor(outcome)6 -0.397
```
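
To show what a between-studies variance represents, here is a simplified univariate sketch for one outcome (Cognitive-Performance) in base R. It swaps in the DerSimonian-Laird moment estimator and the usual large-sample variance of a correlation; both are assumptions made for illustration, so the numbers need not match the multivariate model output above.

```r
# Cognitive_Performance correlations and sample sizes from becker09
r <- c(-0.55, 0.53, 0.44, -0.39, 0.10, 0.23, -0.52, 0.14, -0.01, -0.27)
n <- c(142, 37, 16, 14, 45, 100, 51, 128, 70, 30)

v <- (1 - r^2)^2 / (n - 1)   # large-sample sampling variance of r (assumed)
w <- 1 / v                   # inverse-variance (fixed-effect) weights

# DerSimonian-Laird moment estimator of the between-studies variance
fixed_mean <- sum(w * r) / sum(w)
Q          <- sum(w * (r - fixed_mean)^2)
C          <- sum(w) - sum(w^2) / sum(w)
tau2       <- max(0, (Q - (length(r) - 1)) / C)

# Random-effects weights add tau2 to each study's sampling variance
w_re    <- 1 / (v + tau2)
mean_re <- sum(w_re * r) / sum(w_re)

round(c(tau2 = tau2, mean_r = mean_re), 3)
```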
# Average Correlation Matrix
```r
model_out_random <- extract_model(random_model,
                                  variable_names = c('Cognitive_Performance',
                                                     'Somatic_Performance',
                                                     'Selfconfidence_Performance',
                                                     'Somatic_Cognitive',
                                                     'Selfconfidence_Cognitive',
                                                     'Selfconfidence_Somatic'))
round(model_out_random$beta_matrix, 3)
```
```
##                Performance Cognitive Somatic Selfconfidence
## Performance          1.000    -0.034  -0.071          0.233
## Cognitive           -0.034     1.000   0.544         -0.453
## Somatic             -0.071     0.544   1.000         -0.397
## Selfconfidence       0.233    -0.453  -0.397          1.000
```
# Fit Path Model
```r
model <- "## Regression paths
Performance ~ Cognitive + Somatic + Selfconfidence
Selfconfidence ~ Cognitive + Somatic
"
path_output <- path_model(data = model_out_random, model = model,
                          num_obs = sum(becker09$N))
```
# Extract Some Results
```r
path_output$parameter_estimates
```
```
## [[1]]
##                                    predictor     outcome    estimate
## Cognitive -> Performance           Cognitive Performance  0.09757045
## Somatic -> Performance               Somatic Performance -0.01663048
## Selfconfidence -> Performance Selfconfidence Performance  0.27041818
##
## [[2]]
##                             predictor        outcome   estimate
## Cognitive -> Selfconfidence Cognitive Selfconfidence -0.3359884
## Somatic -> Selfconfidence     Somatic Selfconfidence -0.2146362
```
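
For this model, the standardized path coefficients of each equation can be recovered from the average correlation matrix by solving R_xx beta = r_xy. The base R sketch below shows that arithmetic only; `std_paths()` is a hypothetical helper, not how `path_model()` is necessarily implemented. Because the input matrix is rounded to three decimals, the results agree with the estimates above only approximately.

```r
vars <- c("Performance", "Cognitive", "Somatic", "Selfconfidence")

# Average correlation matrix, copied from the rounded output shown earlier
R <- matrix(c( 1.000, -0.034, -0.071,  0.233,
              -0.034,  1.000,  0.544, -0.453,
              -0.071,  0.544,  1.000, -0.397,
               0.233, -0.453, -0.397,  1.000),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Hypothetical helper: standardized coefficients for one regression equation
std_paths <- function(R, outcome, predictors) {
  beta <- solve(R[predictors, predictors], R[predictors, outcome])
  setNames(as.numeric(beta), predictors)
}

b_perf <- std_paths(R, "Performance",
                    c("Cognitive", "Somatic", "Selfconfidence"))
b_conf <- std_paths(R, "Selfconfidence", c("Cognitive", "Somatic"))

b_perf
b_conf
```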
# Summary
- Be mindful and plan for data entry - this is hard!
- Do not assume that data entry will "take care of itself"
- Think "long" instead of wide
- Ensure attributes contain one piece of information
- Name attributes clearly, but do not store data values in the column names
- Use text based or database systems rather than Excel
# Connect
- slides: https://brandonlebeau.org/slides/canam2021/
- twitter: blebeau11