R

Brandon LeBeau

7 minute read

The simglm package has been updated on CRAN, bumping the version to 0.6.0. This update adds the ability to simulate count data (Poisson) and also fixes (I think) the Shiny app that comes with the package. As I have not posted about this package since the first CRAN release (v 0.5.0), I plan to give an overview of everything the package offers, including the new features.
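To make the idea of simulating count data concrete, here is a minimal base R sketch. It does not use the simglm API; the sample size, the coefficient values, and the names `count_data` and `count_fit` are assumptions chosen only for illustration.

```r
# Simulate Poisson counts from a log-linear model using base R only.
set.seed(2017)

n <- 500
x <- rnorm(n)

# Assumed true coefficients on the log scale (illustrative values).
beta0 <- 0.5
beta1 <- 0.3

# The log link means the expected count is exp(linear predictor).
lambda <- exp(beta0 + beta1 * x)
y <- rpois(n, lambda)

count_data <- data.frame(x = x, y = y)

# Fit a Poisson regression to check that the estimates land near the
# assumed coefficients.
count_fit <- glm(y ~ x, family = poisson(link = "log"), data = count_data)
coef(count_fit)
```

With a reasonably large sample, the estimated slope should fall close to the assumed value of 0.3.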

Brandon LeBeau

3 minute read

This is a quick note asking for any further feedback on the simglm package prior to CRAN submission, which I plan for Thursday or Friday this week. The last few finishing touches on the documentation are happening now, working toward a version 0.5.0 release on CRAN. For those who have not seen this package yet, the aim is to simulate regression models (single level and multilevel models) as well as perform empirical power analyses based on Monte Carlo simulation.
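An empirical power analysis by Monte Carlo simulation can be sketched in a few lines of base R: simulate many data sets under an assumed effect, fit the model to each, and record how often the effect is detected. This is not the simglm interface; the helper `simulate_pvalue()`, the effect size, and the sample size are hypothetical choices for illustration.

```r
# Empirical power for the slope in a simple linear regression.
set.seed(42)

# Hypothetical helper: simulate one data set and return the p-value
# for the slope on x.
simulate_pvalue <- function(n, beta1, sigma = 1) {
  x <- rnorm(n)
  y <- 1 + beta1 * x + rnorm(n, sd = sigma)
  fit <- lm(y ~ x)
  summary(fit)$coefficients["x", "Pr(>|t|)"]
}

n_reps <- 1000
p_values <- replicate(n_reps, simulate_pvalue(n = 50, beta1 = 0.4))

# Power is the proportion of replications that reject at alpha = 0.05.
empirical_power <- mean(p_values < 0.05)
empirical_power
```

Increasing `n_reps` reduces the Monte Carlo error of the power estimate at the cost of run time.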

Brandon LeBeau

4 minute read

Markdown (and R Markdown) is a great way to quickly develop material without worrying about the formatting. The documents can then be compiled using the knitr or rmarkdown packages to output formats such as HTML, LaTeX, or even Word. The main drawback of this approach is that formatting of documents is limited to italics, bold, or strikethrough. Markdown does support inline HTML, so you can add your own formatting inline using CSS or other HTML attributes; however, this moves away from the quick markdown flavor.

Brandon LeBeau

8 minute read

I’m happy to introduce an add-on package, pdfsearch, that adds the ability to do keyword searches on PDF files. This add-on package uses the excellent pdftools package from the rOpenSci project to read in PDF files and perform keyword searches based on character strings of interest.

Installation

The package is currently only hosted on GitHub and can be installed with the devtools library.

```r
devtools::install_github('lebebr01/pdfsearch')
```

Basic Example

Doing a simple keyword search on a single PDF file uses the keyword_search function.

Brandon LeBeau

3 minute read

I have a simulation package that allows for the simulation of regression models, including nested data structures. You can see the package on GitHub here: simReg. Over the weekend I updated the package to allow for the simulation of unbalanced designs. I’m hoping to put together a new vignette soon highlighting the functionality. I am working on a simulation that uses the unbalanced functionality, and while simulating longitudinal data I’ve found the function is much slower than its cross-sectional counterparts (and balanced designs).

Brandon LeBeau

5 minute read

While working on a three-variable interaction plot for a paper, I recently wanted to remove the leading 0’s in the x-axis text labels using ggplot2. This was primarily due to some space concerns I had for the x-axis labels. Unfortunately, I did not find an obvious way to do this on my first attempt. After tinkering a bit, I’ve found a workaround. The process is walked through below.
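One way to drop leading zeros from axis labels is to pass a labelling function to the scale. The sketch below is an assumption about how this could be done, not necessarily the workaround described in the post; `drop_leading_zero()` and `plot_data` are hypothetical names.

```r
# Hypothetical helper: format numbers to two decimals, then strip the
# zero before the decimal point (keeping any minus sign).
drop_leading_zero <- function(x) {
  labels <- formatC(x, format = "f", digits = 2)
  sub("^(-?)0\\.", "\\1.", labels)
}

drop_leading_zero(c(0.25, 0.5, -0.75))

# The helper can be supplied as the labels argument of a ggplot2 scale.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plot_data <- data.frame(x = seq(0, 1, by = 0.25), y = c(2, 3, 5, 4, 6))
  p <- ggplot(plot_data, aes(x = x, y = y)) +
    geom_point() +
    scale_x_continuous(labels = drop_leading_zero)
}
```

Values of 1 or greater keep their leading digit, so only labels between -1 and 1 are shortened.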

Brandon LeBeau

4 minute read

I’d like to introduce a package that simulates regression models. This includes both single level and multilevel (i.e. hierarchical or linear mixed) models with up to two levels of nesting. The package provides a unified framework to simulate all types of continuous regression models. In the future, I’d like to add the ability to simulate generalized linear models. This package is an extension of the functions I used to simulate data for my dissertation.
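To illustrate what simulating a multilevel model involves, here is a minimal base R sketch of a random-intercept model with observations nested in clusters. It does not use the package's own functions; the parameter values and the name `multilevel_data` are assumptions for illustration.

```r
# Simulate continuous outcomes with a cluster-level random intercept.
set.seed(123)

n_clusters <- 30
n_per_cluster <- 10
n_total <- n_clusters * n_per_cluster

cluster_id <- rep(seq_len(n_clusters), each = n_per_cluster)

# Cluster-level random intercepts (assumed standard deviation of 2).
u0 <- rnorm(n_clusters, sd = 2)

# Observation-level covariate and residual error.
x <- rnorm(n_total)
e <- rnorm(n_total, sd = 1)

# Assumed fixed effects: intercept 4, slope 1.5.
y <- 4 + 1.5 * x + u0[cluster_id] + e

multilevel_data <- data.frame(cluster_id = cluster_id, x = x, y = y)
head(multilevel_data)
```

Data generated this way could then be fit with a mixed model package such as lme4 to check that the assumed parameters are recovered.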

Brandon LeBeau

3 minute read

A friend who was looking into his Google location data emailed me to ask if I had ever used a JSON file in R. I said I had not, but I knew there were packages to do such things. The suggestions I sent were things he had already tried, so what did I decide to do? I went ahead and downloaded my own Google location data.

Brandon LeBeau

8 minute read

I saw a post recently about the likelihood of a baseball team winning based on runs, hits, and other baseball statistics. I liked the idea and thought of applying it to college football. In particular, I’m interested in knowing whether scoring more points or having a stout defense improves the likelihood of becoming bowl eligible. I use data scraped from the cfbDatawarehouse to figure out how likely a team is to be bowl eligible based on the number of points it scores.
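A natural way to model a yes/no outcome such as bowl eligibility is logistic regression. The sketch below uses simulated data, not the cfbDatawarehouse data, and the variable names, coefficients, and scoring distribution are all assumptions made for illustration.

```r
# Logistic regression of a binary outcome on points scored (simulated data).
set.seed(2014)

n_teams <- 400
points_per_game <- round(rnorm(n_teams, mean = 27, sd = 7), 1)

# Assumed relationship on the log-odds scale, for illustration only.
true_log_odds <- -5 + 0.2 * points_per_game
bowl_eligible <- rbinom(n_teams, size = 1, prob = plogis(true_log_odds))

team_data <- data.frame(points_per_game, bowl_eligible)

eligibility_fit <- glm(bowl_eligible ~ points_per_game,
                       family = binomial, data = team_data)

# Predicted probability of bowl eligibility at a few scoring levels.
predicted <- predict(eligibility_fit,
                     newdata = data.frame(points_per_game = c(20, 30, 40)),
                     type = "response")
predicted
```

A defensive measure such as points allowed could be added as a second predictor to compare the two effects.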

Brandon LeBeau

3 minute read

I often see graphs that are poorly implemented in that they do not achieve their goal. One such type of graph is the dodged bar chart. Here is an example of a dodged bar chart summarizing the number of All-Star players by team (focusing specifically on the AL Central division) and year from the Lahman R package:

```r
library(Lahman)
library(dplyr)
library(ggplot2)
library(RColorBrewer)

AllstarFull$selected <- 1

numAS <- AllstarFull %>%
  filter(yearID > 2006, lgID == 'AL',
         teamID %in% c('MIN', 'CLE', 'DET', 'CHA', 'KCA')) %>%
  group_by(teamID, yearID) %>%
  summarise(number = sum(selected))

b <- ggplot(numAS, aes(x = teamID, y = number, fill = factor(yearID))) +
  theme_bw()
b + geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer("Year", palette = "Dark2")
```

Note: If you are curious from the above graph, there appear to be two typos in the teamIDs, where CHA should be CHW (Chicago White Sox) and KCA should be KCR (Kansas City Royals).