I recently saw a post about the likelihood of a baseball team winning based on runs, hits, and other baseball statistics. I liked the idea and thought of applying it to college football. In particular, I'm interested in whether scoring more points or having a stout defense improves the likelihood of becoming bowl eligible. I used some data scraped from the cfbDatawarehouse to figure out how likely a team is to be bowl eligible based on the number of points they score.
I often see graphs that are poorly implemented in that they do not achieve their goal. One type of graph where I see this often is the dodged bar chart. Here is an example of a dodged bar chart summarizing the number of all star players by team (focusing specifically on the AL central division) and year from the Lahman R package:

    library(Lahman)
    library(dplyr)
    library(ggplot2)
    library(RColorBrewer)

    # Flag each all star selection so the selections can be summed per team and year
    AllstarFull$selected <- 1
    numAS <- AllstarFull %>%
      filter(yearID > 2006, lgID == 'AL',
             teamID %in% c('MIN', 'CLE', 'DET', 'CHA', 'KCA')) %>%
      group_by(teamID, yearID) %>%
      summarise(number = sum(selected))

    # Dodged bar chart: one bar per year within each team
    b <- ggplot(numAS, aes(x = teamID, y = number, fill = factor(yearID))) + theme_bw()
    b + geom_bar(stat = "identity", position = "dodge") +
      scale_fill_brewer("Year", palette = "Dark2")

Note: If you are curious from the above graph, there appear to be two typos in the teamIDs, where CHA should be CHW (Chicago White Sox) and KCA should be KCR (Kansas City Royals).
Have you ever used a markdown file to create an HTML file? Have you ever wanted to quickly format the resulting HTML file to add some color or other styling? If your answer is yes to both of those questions, this package may be of interest to you. The highlightHTML package aims to provide a flexible approach to adding formatting to an HTML document by injecting CSS into the file. To do this, tags are placed within the markdown document, and the R routine looks for these tags to decide where the formatting applies.
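To make the idea of tag-driven CSS injection concrete, here is a minimal base R sketch. It is not the highlightHTML implementation: the `inject_css` helper, the `#bgblue` marker syntax, and the sample HTML are all hypothetical and only illustrate the general approach of turning a marker in the text into an id and adding a matching style rule.

```r
# Hypothetical sketch of tag-driven CSS injection; NOT the highlightHTML code.
# The sample HTML and the "#bgblue" marker are assumptions for illustration.
html <- c(
  "<html><head></head><body>",
  "<table><tr><td>3.2 #bgblue</td><td>4.1</td></tr></table>",
  "</body></html>"
)

# CSS rule to inject; the id name "bgblue" is an illustrative choice.
css <- "#bgblue {background-color: #ADD8E6;}"

inject_css <- function(lines, css, tag) {
  # Move the marker out of the cell text and into an id attribute.
  pattern <- paste0("<td>([^<]*?)\\s*#", tag, "</td>")
  replacement <- paste0("<td id=\"", tag, "\">\\1</td>")
  lines <- gsub(pattern, replacement, lines, perl = TRUE)
  # Add the style block just before the closing head tag.
  sub("</head>", paste0("<style>", css, "</style></head>"), lines, fixed = TRUE)
}

out <- inject_css(html, css, "bgblue")
cat(out, sep = "\n")
```

The marker is stripped from the visible cell text, so the rendered table shows only the value with the styling applied.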
The American Educational Research Association (AERA) annual conference is this weekend in Philadelphia. I was lucky to have a paper accepted into the conference. I am presenting a meta-analysis that I have been working on for the past two years or so titled: Model misspecification and assumption violations with the linear mixed model: A meta analysis. In this paper, I have compiled numerous Monte Carlo studies to perform a quantitative synthesis of the literature.
I’ve added new functionality to my highlightHTML package. This package post-processes HTML files, injecting CSS and adding tags to allow further customization (for example, highlighting cells of an HTML table). This is most useful when writing a document in markdown and converting it into an HTML document using a tool like knitr, slidify, or even pandoc. Up to now, my package only worked with tables; see my old post that talks about this if you are interested: http://educate-r.
In my last post I talked about using rCharts to create interactive graphics for my presentation. They seemed to go over pretty well in my interviews and helped me greatly, as I did not need to remember or write down specific numbers to talk about. I use slidy to create my HTML slideshows, and there was some interest in seeing exactly how I got these charts into a slidy HTML presentation.
Recently I decided to switch the statistical program used for the master’s level introductory statistics course I teach here at the University of Arkansas. Historically this course has been taught with SPSS, but I am attempting the switch to R this semester. My reason for having students use a GUI is primarily their lack of programming experience. A brief initial poll revealed that only one student had prior programming/code writing experience.
In my last post I talked about how I use the data.table package for aggregating and removing duplicate observations. Although I use the data.table package quite often, there are many times when I use the plyr (and now the new dplyr) package, primarily because of its easy, intuitive syntax.

Arrange

One of my personal favorite functions in the plyr suite of basic functions is the arrange function. The base functions for sorting/ordering are more difficult to use.
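As a quick illustration of that difference, here is a small sketch comparing base R's `order()` with `arrange()` from dplyr. The `scores` data frame is made up for this example.

```r
library(dplyr)

# Illustrative data, made up for this sketch.
scores <- data.frame(
  school = c("A", "B", "A", "B"),
  grade  = c(4, 3, 3, 4),
  score  = c(88, 92, 75, 81)
)

# Base R: order() returns row indices that are then used to subset.
base_sorted <- scores[order(scores$school, -scores$score), ]

# dplyr: arrange() takes column names directly; desc() reverses the order.
dplyr_sorted <- arrange(scores, school, desc(score))

print(dplyr_sorted)
```

Both calls sort by school and then by descending score, but `arrange()` avoids repeating the data frame name and the negation trick for descending order.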
When I started to use the data.table package I was primarily using it to aggregate. I had read about data.table and its blazing speed compared to the other options from base or the plyr package, especially with large amounts of data. As an example, I remember calculating averages or percentages while at Saint Paul Public Schools; while the calculations were running, I would walk away for 5 minutes to wait for them to finish.
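For readers who have not seen the syntax, here is a minimal sketch of a grouped aggregation with data.table. The `dt` table and its column names are made up for this example and say nothing about the data or timings mentioned above.

```r
library(data.table)

# Illustrative data, made up for this sketch.
dt <- data.table(
  school = c("A", "A", "B", "B", "B"),
  score  = c(80, 90, 70, 75, 95)
)

# Grouped aggregation: j computes the summaries, by defines the groups.
avg <- dt[, .(mean_score = mean(score), n = .N), by = school]

print(avg)
```

The same `dt[i, j, by]` form scales to much larger tables, which is where the speed difference is most noticeable.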