Rationale

Data entry is an important component of quantitative studies
- Often neglected in courses
Messy data can make data manipulation much more difficult
- Substantial time could be lost due to poor data entry procedures
Strong data entry procedures are particularly important in evidence synthesis

Rationale 2

Data Organization in Spreadsheets - Broman and Woo (2018), The American Statistician, https://doi.org/10.1080/00031305.2017.1375989
Tidy Data - Wickham (2014), Journal of Statistical Software, https://www.jstatsoft.org/index.php/jss/article/view/v059i10
Nine simple ways to make it easier to (re)use your data - White et al. (2013), Ideas in Ecology and Evolution, https://ojs.library.queensu.ca/index.php/IEE/article/view/4608

Tidy Meta Analytic Data

A series of data entry rules
Promotes more consistent evidence synthesis data
Promotes reproducible analyses (see Reproducible Analyses in Education Research by LeBeau, Ellison, and Aloe (2021) in Review of Research in Education)
Promotes reusable analyses across evidence synthesis projects
Facilitates a split-apply-combine data analysis framework (Wickham, 2011; https://www.jstatsoft.org/v40/i01/)

Data Entry Rules

These rules are mostly agnostic to data storage mode/program
- Text based (csv, tsv, etc.)
- SQL databases of all types
- Excel (but please avoid)
Each row should be one unit of analysis
Columns are attributes/characteristics of a unit of analysis
Data should be rectangular

Column Rules

Avoid placing attributes/characteristics in column names
Do not use spaces in names
Columns should contain one attribute/characteristic
Use one row for column names
Ensure appropriate ID columns

Column Example(s)
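
A minimal sketch of the column rules in base R. The data frame, its column names, and its values are hypothetical and only illustrate moving an attribute (the measurement occasion) out of the column names and into its own column.

```r
# Hypothetical wide data: the measurement occasion is hidden in the column names
wide <- data.frame(
  study_id = c(1, 2),
  mean_pre = c(10.2, 11.5),
  mean_post = c(12.1, 12.0)
)

# Tidy version: one row per study and occasion, with the occasion stored as data
long <- data.frame(
  study_id = rep(wide$study_id, times = 2),
  occasion = rep(c("pre", "post"), each = nrow(wide)),
  mean = c(wide$mean_pre, wide$mean_post)
)
long
```

Each column of `long` now holds exactly one attribute, and `study_id` serves as the ID column.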

Row Rules

Have deliberate missing data codes:
- Codes such as -99, 99, -9999, or N/A are problematic
- NA is a good option
No calculations in the data file
Use the ISO 8601 standard for dates: YYYY-MM-DD (https://xkcd.com/1179/)
- Particularly needed when using Excel
Don't use highlighting as data
- TRUE/FALSE attributes can be helpful here
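
A small illustration, with made-up effect sizes, of why a numeric missing data code is risky while NA is not. Nothing here comes from a real data set.

```r
# Hypothetical effect sizes; the third value is missing
coded_99 <- c(0.20, 0.35, -99, 0.10)   # missing value entered as -99
coded_na <- c(0.20, 0.35, NA, 0.10)    # missing value entered as NA

mean(coded_99)                 # silently includes -99 in the average
mean(coded_na)                 # returns NA, which signals the missing value
mean(coded_na, na.rm = TRUE)   # averages only the observed values
```

With NA, the analyst has to decide explicitly how missing values are handled; with -99, a wrong result can pass unnoticed.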

Date Conversion Excel
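
Spreadsheet programs may reinterpret date-like entries. A minimal base R sketch of why an unambiguous format matters once the data are read in; the date string is made up for illustration.

```r
# The same text read under two conventions gives two different dates
ambiguous <- "03/04/2021"
as_us <- as.Date(ambiguous, format = "%m/%d/%Y")   # read as month/day/year
as_eu <- as.Date(ambiguous, format = "%d/%m/%Y")   # read as day/month/year

# An ISO 8601 string has only one reading and needs no format argument
iso <- as.Date("2021-03-04")

format(as_us, "%Y-%m-%d")
format(as_eu, "%Y-%m-%d")
as_us == iso
```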

Highlighting Cells
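
A sketch of replacing cell highlighting with a TRUE/FALSE attribute. The coding sheet and the column name `needs_review` are hypothetical.

```r
# Hypothetical coding sheet: instead of highlighting rows that need a second
# review, record that information in its own TRUE/FALSE column
studies <- data.frame(
  study_id = c(1, 3, 6, 10),
  effect_size = c(0.31, 0.12, NA, 0.45),
  needs_review = c(FALSE, TRUE, TRUE, FALSE)
)

# The flag can now be used in code, which cell colours cannot
studies[studies$needs_review, ]
sum(studies$needs_review)
```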

Split-Apply-Combine

Process of splitting a hard task into smaller manageable tasks
This framework is particularly powerful in languages that support functional programming, such as R, Python, Julia, Scala, Mathematica, and JavaScript
Split-Apply-Combine in Data Analysis
- Split observations into groups of similar type
- Apply a function (i.e., often a computation) to each group
- Combine function results across observations
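
The three steps can be sketched in base R. The data frame and its values are hypothetical; `split()`, `lapply()`, and `unlist()` are standard base R functions.

```r
# Hypothetical correlations from two types of studies
es <- data.frame(
  design = c("I", "I", "T", "T", "T"),
  r = c(0.10, 0.30, 0.20, 0.40, 0.60)
)

pieces <- split(es$r, es$design)        # split: one vector per design
means <- lapply(pieces, mean)           # apply: compute a mean within each piece
result <- data.frame(                   # combine: one row per design
  design = names(means),
  mean_r = unlist(means, use.names = FALSE)
)
result
```

Long, tidy data make the split step a single call, because the grouping attribute is already stored in its own column.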

Synthesis of Correlation Matrices

One particular implementation of tidy meta-analytic data is the synthesis of correlation matrices.
See Meta-Analysis of Correlations, Correlation Matrices, and Their Functions by Becker, Aloe, and Cheung (2020) in Handbook of Meta-Analysis
We will use an R package developed in tandem with this chapter, metaRmat.

Install `metaRmat`

# requires the remotes package: install.packages("remotes")
remotes::install_github("lebebr01/metaRmat")

library(metaRmat)

Correlation Data Example

becker09[, 1:6]

##    ID   N Team Cognitive_Performance Somatic_Performance
## 1   1 142    I                 -0.55               -0.48
## 2   3  37    I                  0.53               -0.12
## 3   6  16    T                  0.44                0.46
## 4  10  14    I                 -0.39               -0.17
## 5  17  45    I                  0.10                0.31
## 6  22 100    I                  0.23                0.08
## 7  26  51    T                 -0.52               -0.43
## 8  28 128    T                  0.14                0.02
## 9  36  70    T                 -0.01               -0.16
## 10 38  30    I                 -0.27               -0.13
##    Selfconfidence_Performance
## 1                        0.66
## 2                        0.03
## 3                          NA
## 4                        0.19
## 5                       -0.17
## 6                        0.51
## 7                        0.16
## 8                        0.13
## 9                        0.42
## 10                       0.15

Split Correlation Matrices

becker09_list <- df_to_corr(becker09, 
                           variables = 
                             c('Cognitive_Performance',
                               'Somatic_Performance',
                               'Selfconfidence_Performance', 
                               'Somatic_Cognitive',
                               'Selfconfidence_Cognitive',
                               'Selfconfidence_Somatic'),
                           ID = 'ID')

View Split Correlation Matrices

becker09_list[1:3]

## $`1`
##                Performance Cognitive Somatic Selfconfidence
## Performance           1.00     -0.55   -0.48           0.66
## Cognitive            -0.55      1.00    0.47          -0.38
## Somatic              -0.48      0.47    1.00          -0.46
## Selfconfidence        0.66     -0.38   -0.46           1.00
## 
## $`3`
##                Performance Cognitive Somatic Selfconfidence
## Performance           1.00      0.53   -0.12           0.03
## Cognitive             0.53      1.00    0.52          -0.48
## Somatic              -0.12      0.52    1.00          -0.40
## Selfconfidence        0.03     -0.48   -0.40           1.00
## 
## $`6`
##                Performance Cognitive Somatic Selfconfidence
## Performance           1.00      0.44    0.46             NA
## Cognitive             0.44      1.00    0.67             NA
## Somatic               0.46      0.67    1.00             NA
## Selfconfidence          NA        NA      NA              1
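
To make the split step concrete, here is a base R sketch that rebuilds the matrix for study ID 1 from its six correlations. This is not the metaRmat implementation; it only assumes that each column name encodes a variable pair as "A_B", and the object names are illustrative.

```r
# Correlations for study ID 1, copied from the output above
r <- c(Cognitive_Performance = -0.55,
       Somatic_Performance = -0.48,
       Selfconfidence_Performance = 0.66,
       Somatic_Cognitive = 0.47,
       Selfconfidence_Cognitive = -0.38,
       Selfconfidence_Somatic = -0.46)

vars <- c("Performance", "Cognitive", "Somatic", "Selfconfidence")
mat <- diag(1, length(vars))            # start from an identity matrix
dimnames(mat) <- list(vars, vars)

# Fill both triangles so the matrix is symmetric
for (nm in names(r)) {
  pair <- strsplit(nm, "_", fixed = TRUE)[[1]]
  mat[pair[1], pair[2]] <- r[[nm]]
  mat[pair[2], pair[1]] <- r[[nm]]
}
mat
```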

Correlations as Tidy Meta Analytic Data

input_metafor <- prep_data(becker09, 
                           becker09$N, 
                           type = 'weighted', missing = FALSE, 
                           variable_names =
                             c('Cognitive_Performance',
                               'Somatic_Performance',
                               'Selfconfidence_Performance', 
                               'Somatic_Cognitive',
                               'Selfconfidence_Cognitive',
                               'Selfconfidence_Somatic'),
                           ID = 'ID')

View Tidy Meta Analytic Correlations

head(input_metafor$data, n = 15)

##      Variable1      Variable2    yi outcome study
## 1  Performance      Cognitive -0.55       1     1
## 2  Performance        Somatic -0.48       2     1
## 3  Performance Selfconfidence  0.66       3     1
## 4    Cognitive        Somatic  0.47       4     1
## 5    Cognitive Selfconfidence -0.38       5     1
## 6      Somatic Selfconfidence -0.46       6     1
## 7  Performance      Cognitive  0.53       1     2
## 8  Performance        Somatic -0.12       2     2
## 9  Performance Selfconfidence  0.03       3     2
## 10   Cognitive        Somatic  0.52       4     2
## 11   Cognitive Selfconfidence -0.48       5     2
## 12     Somatic Selfconfidence -0.40       6     2
## 13 Performance      Cognitive  0.44       1     3
## 14 Performance        Somatic  0.46       2     3
## 15 Performance Selfconfidence    NA       3     3

Fit a random effects meta analytic model

random_model <- fit_model(data = input_metafor, effect_size = 'yi', 
                          var_cor = 'V', moderators = ~ -1 + factor(outcome), 
                          random_params = ~ factor(outcome) | factor(study))

round(random_model$tau2, 3) # between-study variances, one per correlation

## [1] 0.126 0.060 0.062 0.002 0.011 0.006

round(random_model$b, 3) # average correlations under the random effects model

##                    [,1]
## factor(outcome)1 -0.034
## factor(outcome)2 -0.071
## factor(outcome)3  0.233
## factor(outcome)4  0.544
## factor(outcome)5 -0.453
## factor(outcome)6 -0.397

Average correlation matrix

model_out_random <- extract_model(random_model, 
                      variable_names = c('Cognitive_Performance',
                                         'Somatic_Performance',
                                         'Selfconfidence_Performance', 
                                         'Somatic_Cognitive',
                                         'Selfconfidence_Cognitive',
                                         'Selfconfidence_Somatic'))
round(model_out_random$beta_matrix, 3)

##                Performance Cognitive Somatic Selfconfidence
## Performance          1.000    -0.034  -0.071          0.233
## Cognitive           -0.034     1.000   0.544         -0.453
## Somatic             -0.071     0.544   1.000         -0.397
## Selfconfidence       0.233    -0.453  -0.397          1.000

Fit path model

model <- "## Regression paths
Performance ~ Cognitive + Somatic + Selfconfidence
Selfconfidence ~ Cognitive + Somatic
"
path_output <- path_model(data = model_out_random, model = model, 
                          num_obs = sum(becker09$N))

Extract some results

path_output$parameter_estimates

## [[1]]
##                                    predictor     outcome    estimate
## Cognitive -> Performance           Cognitive Performance  0.09757045
## Somatic -> Performance               Somatic Performance -0.01663048
## Selfconfidence -> Performance Selfconfidence Performance  0.27041818
## 
## [[2]]
##                             predictor        outcome   estimate
## Cognitive -> Selfconfidence Cognitive Selfconfidence -0.3359884
## Somatic -> Selfconfidence     Somatic Selfconfidence -0.2146362

Summary

Be mindful and plan for data entry - this is hard!
Do not assume that data entry will "take care of itself"
Think "long" instead of wide
Ensure attributes contain one piece of information
Ensure attributes are named well, but do not store data values in the names
Use text based or database systems rather than Excel

Connect

slides: https://brandonlebeau.org/slides/canam2021/
twitter: blebeau11

Tidy Meta-Analytic Data

Brandon LeBeau & Ariel M. Aloe

University of Iowa
