R

Brandon LeBeau

3 minute read

I often see graphs that are poorly implemented in that they do not achieve their goal. One such type is the dodged bar chart. Here is an example of a dodged bar chart summarizing the number of all-star players by team (focusing specifically on the AL Central division) and year, using data from the Lahman R package:

    library(Lahman)
    library(dplyr)
    library(ggplot2)
    library(RColorBrewer)

    # Flag every all-star appearance so that it can be summed
    AllstarFull$selected <- 1

    # Count all stars by team and year for the AL Central since 2007
    numAS <- AllstarFull %>%
      filter(yearID > 2006, lgID == 'AL',
             teamID %in% c('MIN', 'CLE', 'DET', 'CHA', 'KCA')) %>%
      group_by(teamID, yearID) %>%
      summarise(number = sum(selected))

    # Dodged bar chart: one bar per year within each team
    b <- ggplot(numAS, aes(x = teamID, y = number, fill = factor(yearID))) +
      theme_bw()
    b + geom_bar(stat = "identity", position = "dodge") +
      scale_fill_brewer("Year", palette = "Dark2")

Note: If you are curious about the graph above, there appear to be two typos in the teamIDs, where CHA should be CHW (Chicago White Sox) and KCA should be KCR (Kansas City Royals).

Brandon LeBeau

3 minute read

Have you ever used a markdown file to create an HTML file? Have you ever wanted to quickly format the resulting HTML file to add some color or other styling? If your answer is yes to both of those questions, this package may be of interest to you. The highlightHTML package aims to provide a flexible approach to adding formatting to an HTML document by injecting CSS into the file. To do this, you place tags in the markdown document that tell the R routine where to apply the formatting.

Brandon LeBeau

3 minute read

The American Educational Research Association (AERA) annual conference is this weekend in Philadelphia. I was lucky to have a paper accepted into the conference. I am presenting a meta-analysis that I have been working on for the past two years or so titled: Model misspecification and assumption violations with the linear mixed model: A meta analysis. In this paper, I have compiled numerous Monte Carlo studies to perform a quantitative synthesis of the literature.

Brandon LeBeau

3 minute read

Recently, while scraping some data from the college football data warehouse site, I started to notice the evolution of my code. To preface this, I am definitely not a trained programmer, just a self-taught junkie who enjoys doing it when I have time. I’ve slowly expanded my programming skills from purely statistical languages like R or SPSS to other languages like LaTeX, HTML, CSS, and JavaScript, and I’ve started to work through some Python.

Brandon LeBeau

3 minute read

I’ve added new functionality to my highlightHTML package. This package post-processes HTML files, injecting CSS and adding tags to allow further customization (for example, highlighting cells of an HTML table). This is most useful when writing a document in markdown and converting it into an HTML document using a tool like knitr, slidify, or even pandoc. Up to now, my package only worked with tables; see my old post that talks about this if you are interested: http://educate-r.

Brandon LeBeau

7 minute read

In my last post I talked about using rCharts to create interactive graphics for my presentation. They seemed to go over pretty well in my interviews and helped me greatly, as I did not need to remember or write down specific numbers to talk about. I use slidy to create my HTML slideshows, and there was some interest in seeing exactly how I got these charts into a slidy HTML presentation.

Brandon LeBeau

3 minute read

Recently I decided to switch the statistical program used for the master’s level introductory statistics course I teach here at the University of Arkansas. Historically this course has been taught with SPSS, but I am attempting the switch to R this semester. My reason for having students use the GUI is primarily their lack of programming experience. A brief initial poll revealed that only one student had prior programming or code-writing experience.

Brandon LeBeau

4 minute read

In my last post I talked about how I use the data.table package for aggregating and removing duplicate observations. Although I use the data.table package quite often, there are many times when I use the plyr (and now the new dplyr) package, primarily because of its easy, intuitive syntax.

Arrange

One of my personal favorite functions in the plyr suite of basic functions is the arrange function. The base functions for sorting/ordering are more difficult to use.
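To make the comparison concrete, here is a minimal sketch of the same sort done with base R's order and with arrange. It assumes the dplyr version of arrange and uses a made-up data frame (scores and its columns are hypothetical, not data from the post).

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
scores <- data.frame(
  team = c("MIN", "DET", "CLE", "MIN", "DET"),
  year = c(2012, 2011, 2012, 2011, 2012),
  wins = c(66, 95, 68, 63, 88)
)

# Base R: build a row index with order() and subset by it;
# the minus sign reverses the numeric sort on wins
base_sorted <- scores[order(scores$team, -scores$wins), ]

# dplyr: name the columns directly; desc() reverses the sort
dplyr_sorted <- arrange(scores, team, desc(wins))
```

Both calls sort by team and then by descending wins. The arrange version avoids repeating the data frame name and reads closer to the intent, which is likely the kind of convenience described above.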

Brandon LeBeau

6 minute read

When I started to use the data.table package I was primarily using it to aggregate. I had read about data.table and its blazing speed compared to the other options from base R or the plyr package, especially with large amounts of data. As an example, I remember calculating averages or percentages while at Saint Paul Public Schools and walking away for 5 minutes while the calculations ran.
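As a rough illustration of that kind of aggregation, here is a minimal sketch of group-wise averages and percentages in data.table. The students table and its columns are hypothetical stand-ins, not the actual school district data.

```r
library(data.table)

# Hypothetical student-level data, invented purely for illustration
students <- data.table(
  school = c("A", "A", "B", "B", "B"),
  score = c(80, 90, 70, 75, 95),
  proficient = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Aggregate by school: .() builds the summary columns and
# by = groups the rows before the summaries are computed
school_summary <- students[, .(
  mean_score = mean(score),
  pct_proficient = 100 * mean(proficient)
), by = school]
```

With only five rows the speed advantage is invisible, but the same syntax applies unchanged to tables with millions of rows, which is where the difference from base R or plyr would show up.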

Brandon LeBeau

3 minute read

The first statistical software package I used as an undergrad was SPSS. I was fortunate during my senior year at Luther College to be introduced to R. Apart from the pretty graphics, I did not realize at the time that this was the start of something big for me. Fast forward a year to graduate school at the University of Minnesota, and the majority of my program was again using SPSS.