<h1>Data Visualization - Static and Interactive Graphics using R</h1>
<h2>Brandon LeBeau</h2>
<h3>University of Iowa</h3>
# About Me
- I'm an Assistant Professor in the College of Education
+ I enjoy model building, particularly longitudinal models, and statistical programming.
- I've used R for over 10 years.
+ I have 4 R packages: 3 on CRAN and 1 on GitHub
* simglm
* pdfsearch
* highlightHTML
* SPSStoR
- GitHub Repository for this workshop: <https://github.com/lebebr01/iowa_data_science>
# Why teach the tidyverse
- The tidyverse is a series of packages developed by Hadley Wickham and his team at RStudio. <https://www.tidyverse.org/>
- I teach/use the tidyverse for 3 major reasons:
+ Simple functions that do one thing well
+ Consistent implementations across functions within the tidyverse (i.e., common APIs)
+ Provides a framework for data manipulation
# Course Setup
```r
install.packages("tidyverse")
```
```r
library(tidyverse)
```
# Explore Data
![plot of chunk data](/figure/data-1.png)
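The figure above previews the `midwest` data. A minimal sketch for inspecting it at the console, assuming only that the tidyverse is installed (`midwest` ships with ggplot2, one row per county):

```r
library(tidyverse)

# glimpse() lists every column with its type and first few values
glimpse(midwest)

# Number of counties in each state
counties_per_state <- count(midwest, state)
counties_per_state
```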
# First ggplot
```r
ggplot(data = midwest) +
geom_point(mapping = aes(x = popdensity, y = percollege))
```
![plot of chunk plot1](/figure/plot1-1.png)
# Equivalent Code
```r
ggplot(midwest) +
geom_point(aes(x = popdensity, y = percollege))
```
![plot of chunk plot1_reduced](/figure/plot1_reduced-1.png)
# Your Turn
1. Try plotting `popdensity` by `state`.
2. Try plotting `county` by `state`.
+ Does this plot work?
3. Bonus: Try running just `ggplot(data = midwest)` from above.
+ What do you get?
+ Does this make sense?
# Add Aesthetics
```r
ggplot(midwest) +
geom_point(aes(x = popdensity, y = percollege, color = state))
```
![plot of chunk aesthetic](/figure/aesthetic-1.png)
# Set Aesthetics Manually
```r
ggplot(midwest) +
geom_point(aes(x = popdensity, y = percollege), color = 'pink')
```
![plot of chunk global_aes](/figure/global_aes-1.png)
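Mapped aesthetics (inside `aes()`) and manually set aesthetics (outside `aes()`) can be combined in the same layer. In this sketch `color` still varies by state while `size` and `alpha` are fixed for every point; the values 2 and 0.5 are arbitrary choices for illustration.

```r
library(ggplot2)

p_mixed <- ggplot(midwest) +
  geom_point(aes(x = popdensity, y = percollege, color = state),  # mapped: one color per state
             size = 2, alpha = 0.5)                               # set: same for every point
p_mixed
```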
# Your Turn
1. Instead of using colors, make the shape of the points different for each state.
2. Use `alpha` instead of color.
+ What does this do to the plot?
3. Try the following command: `colors()`.
+ Try a few colors to find your favorite.
4. What happens if you use the following code:
```r
ggplot(midwest) +
geom_point(aes(x = popdensity, y = percollege, color = 'green'))
```
# Additional Geoms
```r
ggplot(midwest) +
geom_smooth(aes(x = popdensity, y = percollege))
```
![plot of chunk smooth](/figure/smooth-1.png)
# Add more Aesthetics
```r
ggplot(midwest) +
geom_smooth(aes(x = popdensity, y = percollege, linetype = state),
se = FALSE)
```
![plot of chunk smooth_states](/figure/smooth_states-1.png)
# Your Turn
1. It is possible to combine geoms, which we will do next, but try it first. Try to recreate this plot.
![plot of chunk combine](/figure/combine-1.png)
# Layered ggplot
```r
ggplot(midwest) +
geom_point(aes(x = popdensity, y = percollege, color = state)) +
geom_smooth(aes(x = popdensity, y = percollege, color = state),
se = FALSE)
```
![plot of chunk combine_geoms](/figure/combine_geoms-1.png)
# Remove duplicate aesthetics
```r
ggplot(midwest,
aes(x = popdensity, y = percollege, color = state)) +
geom_point() +
geom_smooth(se = FALSE)
```
![plot of chunk two_geoms](/figure/two_geoms-1.png)
# Your Turn
1. Can you recreate the following figure?
![plot of chunk differ_aes](/figure/differ_aes-1.png)
# Brief plot customization
```r
ggplot(midwest,
aes(x = popdensity, y = percollege, color = state)) +
geom_point() +
scale_x_continuous("Population Density",
breaks = seq(0, 80000, 20000)) +
scale_y_continuous("Percent College Graduates") +
scale_color_discrete("State")
```
# Brief plot customization Output
![plot of chunk breaks_x2](/figure/breaks_x2-1.png)
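Axis and legend titles can also be set with `labs()`, leaving the scale functions for settings such as `breaks`. A sketch of the same plot written that way:

```r
library(ggplot2)

p_labs <- ggplot(midwest,
                 aes(x = popdensity, y = percollege, color = state)) +
  geom_point() +
  scale_x_continuous(breaks = seq(0, 80000, 20000)) +  # breaks only; title comes from labs()
  labs(x = "Population Density",
       y = "Percent College Graduates",
       color = "State")
p_labs
```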
# Change plot theme
```r
ggplot(midwest,
aes(x = popdensity, y = percollege, color = state)) +
geom_point() +
geom_smooth(se = FALSE) +
theme_bw()
```
![plot of chunk theme_bw](/figure/theme_bw-1.png)
# More themes
+ Themes in ggplot2: <http://ggplot2.tidyverse.org/reference/ggtheme.html>
+ Themes from ggthemes package: <https://cran.r-project.org/web/packages/ggthemes/vignettes/ggthemes.html>
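A complete theme can be combined with `theme()` to adjust individual elements. A sketch using only ggplot2: `theme_minimal()` with a larger base font, then the legend moved below the plot (both choices are illustrative).

```r
library(ggplot2)

p_theme <- ggplot(midwest,
                  aes(x = popdensity, y = percollege, color = state)) +
  geom_point() +
  theme_minimal(base_size = 14) +     # complete theme, larger base text
  theme(legend.position = "bottom")   # then override a single element
p_theme
```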
# Base plot for reference
```r
p1 <- ggplot(midwest,
aes(x = popdensity, y = percollege, color = state)) +
geom_point() +
scale_x_continuous("Population Density",
breaks = seq(0, 80000, 20000)) +
scale_y_continuous("Percent College Graduates") +
theme_bw()
```
# Add plot title or subtitle
```r
p1 +
labs(title = "Percent College Educated by Population Density",
subtitle = "County level data for five midwest states")
```
![plot of chunk title_subtitle_ggplot2](/figure/title_subtitle_ggplot2-1.png)
# Color Options
```r
p1 + scale_color_grey("State")
```
![plot of chunk grey_color](/figure/grey_color-1.png)
# Using colorbrewer2.org
+ <http://colorbrewer2.org>
```r
p1 + scale_color_brewer("State", palette = 'Dark2')
```
![plot of chunk color_brewer](/figure/color_brewer-1.png)
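To browse the ColorBrewer palettes without leaving R, the RColorBrewer package can draw them or return the hex codes of a single palette. A sketch, assuming RColorBrewer is installed (it normally comes in as a dependency of ggplot2):

```r
library(RColorBrewer)

# Draw every qualitative palette (suited to unordered groups such as state)
display.brewer.all(type = "qual")

# Hex codes for five Dark2 colors, one per state in midwest
dark2 <- brewer.pal(5, "Dark2")
dark2
```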
# Two additional color options
+ viridis: <https://github.com/sjmgarnier/viridis>
+ scico: <https://github.com/thomasp85/scico>
# viridis colors
```r
library(viridis)
p1 + scale_color_viridis(discrete = TRUE)
```
![plot of chunk viridis](/figure/viridis-1.png)
# viridis colors
```r
p1 + scale_color_viridis(option = 'cividis', discrete = TRUE)
```
![plot of chunk viridis2](/figure/viridis2-1.png)
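Recent versions of ggplot2 also include viridis scales, so the same palettes are available without loading the viridis package. A sketch with `scale_color_viridis_d()`, the discrete variant:

```r
library(ggplot2)

p_cividis <- ggplot(midwest,
                    aes(x = popdensity, y = percollege, color = state)) +
  geom_point() +
  scale_color_viridis_d("State", option = "cividis") +  # _d = discrete data
  theme_bw()
p_cividis
```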
# Zoom in on a plot
```r
ggplot(data = midwest,
aes(x = popdensity, y = percollege, color = state)) +
geom_point() +
scale_x_continuous("Population Density") +
scale_y_continuous("Percent College Graduates") +
scale_color_discrete("State") +
coord_cartesian(xlim = c(0, 15000))
```
# Zoom in on a plot output
![plot of chunk zoom_out](/figure/zoom_out-1.png)
# Zoom using `scale_x_continuous` - Bad Practice
```r
ggplot(data = midwest,
aes(x = popdensity, y = percollege, color = state)) +
geom_point() +
geom_smooth(se = FALSE) +
scale_x_continuous("Population Density", limits = c(0, 15000)) +
scale_y_continuous("Percent College Graduates") +
scale_color_discrete("State")
```
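# Comparing output
Setting `limits` in `scale_x_continuous()` turns values outside the range into `NA` before any statistics are computed. Those counties are dropped from the plot (hence the warnings below) and the loess smooths are fit without them. `coord_cartesian()` keeps every observation and only zooms the view.
```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (stat_smooth).
## Warning: Removed 16 rows containing missing values (geom_point).
```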
![plot of chunk zoom_x_output](/figure/zoom_x_output-1.png)
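For comparison, a sketch of the same smoothed plot zoomed with `coord_cartesian()`: every county is kept, so no rows are removed and the loess curves are fit to the full data before the view is cropped.

```r
library(ggplot2)

p_zoom <- ggplot(midwest,
                 aes(x = popdensity, y = percollege, color = state)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  scale_x_continuous("Population Density") +
  scale_y_continuous("Percent College Graduates") +
  scale_color_discrete("State") +
  coord_cartesian(xlim = c(0, 15000))  # zooms the view only; data untouched
p_zoom
```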
# Lord of the Rings Data
- Data from Jenny Bryan: <https://github.com/jennybc/lotr>
```r
lotr <- read_tsv('https://raw.githubusercontent.com/jennybc/lotr/master/lotr_clean.tsv')
```
```
## Parsed with column specification:
## cols(
## Film = col_character(),
## Chapter = col_character(),
## Character = col_character(),
## Race = col_character(),
## Words = col_integer()
## )
```
```r
head(lotr)
```
```
## # A tibble: 6 x 5
## Film Chapter Character Race Words
## <chr> <chr> <chr> <chr> <int>
## 1 The Fellowship Of The Ring 01: Prologue Bilbo Hobbit 4
## 2 The Fellowship Of The Ring 01: Prologue Elrond Elf 5
## 3 The Fellowship Of The Ring 01: Prologue Galadriel Elf 460
## 4 The Fellowship Of The Ring 02: Concerning Hobbits Bilbo Hobbit 214
## 5 The Fellowship Of The Ring 03: The Shire Bilbo Hobbit 70
## 6 The Fellowship Of The Ring 03: The Shire Frodo Hobbit 128
```
# Geoms for single variables
```r
ggplot(lotr, aes(x = Words)) +
geom_histogram() +
theme_bw()
```
```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```
![plot of chunk histogram](/figure/histogram-1.png)
# Customize histogram
```r
ggplot(lotr, aes(x = Words)) +
geom_histogram(bins = 20) +
theme_bw()
```
![plot of chunk cust_hist](/figure/cust_hist-1.png)
# Customize histogram 2
```r
ggplot(lotr, aes(x = Words)) +
geom_histogram(binwidth = 25) +
theme_bw()
```
![plot of chunk cust_hist2](/figure/cust_hist2-1.png)
# Histograms by other variables - likely not useful
```r
ggplot(lotr, aes(x = Words, color = Film)) +
geom_histogram(binwidth = 25) +
theme_bw()
```
![plot of chunk hist_film](/figure/hist_film-1.png)
# Histograms by other variables - one alternative
```r
ggplot(lotr, aes(x = Words)) +
geom_histogram(binwidth = 25) +
theme_bw() +
facet_wrap(~ Film)
```
![plot of chunk hist_film_alt](/figure/hist_film_alt-1.png)
# Your Turn
1. With more than two groups, histograms are difficult to interpret due to overlap. Instead, use `geom_density()` to create a density plot of `Words` for each film.
2. Using `geom_boxplot()`, create boxplots with `Words` as the y variable and `Film` as the x variable. Bonus: facet this plot by the variable `Race`. Bonus 2: zoom in on the bulk of the data.
# Rotation of axis labels
```r
ggplot(lotr, aes(x = Film, y = Words)) +
geom_boxplot() +
facet_wrap(~ Race) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90))
```
![plot of chunk rotate](/figure/rotate-1.png)
# Often `coord_flip` is better
```r
ggplot(lotr, aes(x = Film, y = Words)) +
geom_boxplot() +
facet_wrap(~ Race) +
theme_bw() +
coord_flip()
```
![plot of chunk flip](/figure/flip-1.png)
# Bar graphs
```r
ggplot(lotr, aes(x = Race)) +
geom_bar() +
theme_bw()
```
![plot of chunk simple_bar](/figure/simple_bar-1.png)
# Add aesthetic
```r
ggplot(lotr, aes(x = Race)) +
geom_bar(aes(fill = Film)) +
theme_bw()
```
![plot of chunk bar_fill](/figure/bar_fill-1.png)
# Stacked Bars Relative
```r
ggplot(lotr, aes(x = Race)) +
geom_bar(aes(fill = Film), position = 'fill') +
theme_bw() +
ylab("Proportion")
```
![plot of chunk stacked](/figure/stacked-1.png)
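With `position = 'fill'`, each bar is rescaled so that its segments sum to 1; each segment is the share of that race's rows coming from one film. A sketch computing the same proportions directly with dplyr (re-reading the data so the block stands alone), which can be handy for checking or annotating the plot:

```r
library(tidyverse)

lotr <- read_tsv('https://raw.githubusercontent.com/jennybc/lotr/master/lotr_clean.tsv')

race_film_prop <- lotr %>%
  count(Race, Film) %>%            # rows per Race x Film (what geom_bar counts)
  group_by(Race) %>%
  mutate(prop = n / sum(n)) %>%    # share within each Race; sums to 1 per bar
  ungroup()
race_film_prop
```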
# Dodged Bars
```r
ggplot(lotr, aes(x = Race)) +
geom_bar(aes(fill = Film), position = 'dodge') +
theme_bw()
```
![plot of chunk unnamed-chunk-1](/figure/unnamed-chunk-1-1.png)
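If a race does not speak in every film, its dodged group has fewer bars, and by default ggplot2 widens them to fill the space. Assuming ggplot2 3.0.0 or later, `position_dodge(preserve = "single")` keeps every bar the same width:

```r
library(tidyverse)

lotr <- read_tsv('https://raw.githubusercontent.com/jennybc/lotr/master/lotr_clean.tsv')

p_dodge <- ggplot(lotr, aes(x = Race)) +
  geom_bar(aes(fill = Film),
           position = position_dodge(preserve = "single")) +  # equal bar widths
  theme_bw()
p_dodge
```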
# Change Bar Color
```r
ggplot(lotr, aes(x = Race)) +
geom_bar(aes(fill = Film), position = 'fill') +
theme_bw() +
ylab("Proportion") +
scale_fill_viridis(option = 'cividis', discrete = TRUE)
```
![plot of chunk bar_color](/figure/bar_color-1.png)
# Your Turn
1. Using the `gss_cat` data, create a bar chart of the variable `partyid`.
2. Add the variable `marital` to the bar chart created in step 1. Do you prefer a stacked or dodged version?
3. Take steps to make one of the plots above close to publication quality.
# Additional ggplot2 resources
+ ggplot2 website: <http://docs.ggplot2.org/current/index.html>
+ ggplot2 book: <http://www.springer.com/us/book/9780387981413>
+ R graphics cookbook: <http://www.cookbook-r.com/Graphs/>
# Additional R Resources
+ R for Data Science: <http://r4ds.had.co.nz/>
# Moving to Interactive Graphics
* Why interactive graphics?
+ Created specifically for the web.
+ Can focus, explore, zoom, or remove data at will.
+ Allows users to customize their experience.
+ It is fun!
# Interactive graphics with plotly
```r
install.packages("plotly")
```
# First Interactive Plot
```r
library(plotly)
p <- ggplot(data = midwest) +
geom_point(mapping = aes(x = popdensity, y = percollege))
print(ggplotly(p))
```
# Customized Interactive Plot
```r
p <- ggplot(midwest,
aes(x = popdensity, y = percollege, color = state)) +
geom_point() +
scale_x_continuous("Population Density",
breaks = seq(0, 80000, 20000)) +
scale_y_continuous("Percent College Graduates") +
scale_color_discrete("State") +
theme_bw()
print(ggplotly(p))
```
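Two options that are often useful once a plot is interactive: `tooltip` limits which mapped aesthetics appear on hover, and `htmlwidgets::saveWidget()` writes the widget to an HTML file that can be shared. A sketch; the file name is arbitrary, and `selfcontained = FALSE` is used so pandoc is not required (the JavaScript dependencies are copied into a folder next to the file).

```r
library(ggplot2)
library(plotly)

p <- ggplot(midwest,
            aes(x = popdensity, y = percollege, color = state)) +
  geom_point() +
  theme_bw()

# Show only the x and y values on hover
p_interactive <- ggplotly(p, tooltip = c("x", "y"))

# Write an HTML page; dependencies go in a folder beside it
htmlwidgets::saveWidget(p_interactive, "midwest_plotly.html",
                        selfcontained = FALSE)
```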
# Your Turn
1. Using the `starwars` data, create a static ggplot and use the `ggplotly` function to turn it interactive.
# Lord of the Rings Data
- Data from Jenny Bryan: <https://github.com/jennybc/lotr>
```r
lotr <- read_tsv('https://raw.githubusercontent.com/jennybc/lotr/master/lotr_clean.tsv')
```
```
## Parsed with column specification:
## cols(
## Film = col_character(),
## Chapter = col_character(),
## Character = col_character(),
## Race = col_character(),
## Words = col_integer()
## )
```
```r
lotr
```
```
## # A tibble: 682 x 5
## Film Chapter Character Race Words
## <chr> <chr> <chr> <chr> <int>
## 1 The Fellowship Of The Ring 01: Prologue Bilbo Hobb~ 4
## 2 The Fellowship Of The Ring 01: Prologue Elrond Elf 5
## 3 The Fellowship Of The Ring 01: Prologue Galadriel Elf 460
## 4 The Fellowship Of The Ring 02: Concerning Hobbits Bilbo Hobb~ 214
## 5 The Fellowship Of The Ring 03: The Shire Bilbo Hobb~ 70
## 6 The Fellowship Of The Ring 03: The Shire Frodo Hobb~ 128
## 7 The Fellowship Of The Ring 03: The Shire Gandalf Wiza~ 197
## 8 The Fellowship Of The Ring 03: The Shire Hobbit K~ Hobb~ 10
## 9 The Fellowship Of The Ring 03: The Shire Hobbits Hobb~ 12
## 10 The Fellowship Of The Ring 04: Very Old Friends Bilbo Hobb~ 339
## # ... with 672 more rows
```
# Create plotly by hand
```r
plot_ly(lotr, x = ~Words) %>% add_histogram() %>% print()
```
# Subplots
```r
one_plot <- function(d) {
plot_ly(d, x = ~Words) %>%
add_histogram() %>%
add_annotations(
~unique(Film), x = 0.5, y = 1,
xref = "paper", yref = "paper", showarrow = FALSE
)
}
lotr %>%
split(.$Film) %>%
lapply(one_plot) %>%
subplot(nrows = 1, shareX = TRUE, titleX = FALSE) %>%
hide_legend() %>% print()
```
# Grouped bar plot
```r
plot_ly(lotr, x = ~Race, color = ~Film) %>% add_histogram() %>% print()
```
# Plot of proportions
```r
# number of rows by Race and Film (n)
lotr_count <- count(lotr, Race, Film)
# total number of rows by Race (nn), joined back onto each Race x Film count
lotr_prop <- left_join(lotr_count, count(lotr_count, Race, wt = n), by = "Race")
lotr_prop %>%
mutate(prop = n / nn) %>%
plot_ly(x = ~Race, y = ~prop, color = ~Film) %>%
add_bars() %>%
layout(barmode = "stack") %>% print()
```
# Your Turn
1. Using the `gss_cat` data, create a histogram for the `tvhours` variable.
2. Using the `gss_cat` data, create a bar chart showing the `partyid` variable by the `marital` status.
# Scatterplots by Hand
```r
plot_ly(midwest, x = ~popdensity, y = ~percollege) %>%
add_markers() %>% print()
```
# Change symbol
```r
plot_ly(midwest, x = ~popdensity, y = ~percollege) %>%
add_markers(symbol = ~state) %>% print()
```
# Change color
```r
plot_ly(midwest, x = ~popdensity, y = ~percollege) %>%
add_markers(color = ~state, colors = viridis::viridis(5)) %>% print()
```
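Hover text can be customized when building by hand. In this sketch `text` supplies the county name and `hoverinfo = "text"` shows only that; both are plotly.js trace attributes passed through `add_markers()`. It uses `viridisLite::viridis()` (a plotly dependency) in place of the viridis package so the block stands alone.

```r
library(ggplot2)  # for the midwest data
library(plotly)

p_hover <- plot_ly(midwest, x = ~popdensity, y = ~percollege) %>%
  add_markers(color = ~state, colors = viridisLite::viridis(5),
              text = ~county, hoverinfo = "text")  # hover shows county only
p_hover
```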
# Line Graph
```r
storms_yearly <- storms %>%
group_by(year) %>%
summarise(num = n_distinct(name))  # distinct storm names per year
plot_ly(storms_yearly, x = ~year, y = ~num) %>%
add_lines() %>% print()
```
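Axis titles on a hand-built plotly graph are set through `layout()`. A sketch adding titles to the yearly storm counts (the title text is an illustrative choice):

```r
library(dplyr)
library(plotly)

# Number of distinct storm names recorded each year (storms ships with dplyr)
storms_yearly <- storms %>%
  group_by(year) %>%
  summarise(num = n_distinct(name))

p_lines <- plot_ly(storms_yearly, x = ~year, y = ~num) %>%
  add_lines() %>%
  layout(xaxis = list(title = "Year"),
         yaxis = list(title = "Number of storms"))
p_lines
```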
# Your Turn
1. Using the `gss_cat` data, create a scatterplot showing the `age` and `tvhours` variables.
2. Compute the average time spent watching TV by year and marital status, then plot these averages by year and marital status.
# Highcharter; Highcharts for R
```r
devtools::install_github("jbkunst/highcharter")
```
# `hchart` function
```r
library(highcharter)
lotr_count <- lotr %>%
count(Film, Race, wt = Words)  # total words spoken (not rows) per Film x Race
hchart(lotr_count, "column", hcaes(x = Race, y = n, group = Film)) %>% print()
```
# A second `hchart`
```r
hchart(midwest, "scatter", hcaes(x = popdensity, y = percollege, group = state)) %>% print()
```
# Histogram
```r
hchart(lotr$Words) %>% print()
```
# Your Turn
1. Using the `hchart` function, create a bar chart or histogram with the `gss_cat` data.
2. Using the `hchart` function, create a scatterplot with the `gss_cat` data.
# Build Highcharts from scratch
```r
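# Add explicit zero counts so every film has a value for every race,
# keeping each series aligned with the x-axis categories
lotr_full <- lotr_count %>%
  complete(Film, Race, fill = list(n = 0)) %>%
  arrange(Film, Race)
hc <- highchart() %>%
  hc_xAxis(categories = unique(lotr_full$Race)) %>%
  hc_add_series(name = 'The Fellowship Of The Ring',
                data = filter(lotr_full, Film == 'The Fellowship Of The Ring')$n) %>%
  hc_add_series(name = 'The Two Towers',
                data = filter(lotr_full, Film == 'The Two Towers')$n) %>%
  hc_add_series(name = 'The Return Of The King',
                data = filter(lotr_full, Film == 'The Return Of The King')$n)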
hc %>% print()
```
# Change Chart type
```r
hc <- hc %>%
hc_chart(type = 'column')
hc %>% print()
```
# Change Colors
```r
hc <- hc %>%
hc_colors(substr(viridis(3), 1, 7))  # keep #RRGGBB, dropping the alpha suffix
hc %>% print()
```
# Modify Axes
```r
hc <- hc %>%
hc_xAxis(title = list(text = "Race")) %>%
hc_yAxis(title = list(text = "Number of Words Spoken"),
showLastLabel = FALSE)
hc %>% print()
```
# Add title, subtitle, move legend
```r
hc <- hc %>%
hc_title(text = 'Number of Words Spoken in Lord of the Rings Films',
align = 'left') %>%
hc_subtitle(text = 'Broken down by <i>Film</i> and <b>Race</b>',
align = 'left') %>%
hc_legend(align = 'right', verticalAlign = 'top', layout = 'vertical',
x = 0, y = 80) %>%
hc_exporting(enabled = TRUE)
hc %>% print()
```
# Your Turn
1. Build up a plot from scratch, getting the figure close to publication quality using the `gss_cat` data.
# Correlation Matrices
```r
select(storms, wind, pressure, ts_diameter, hu_diameter) %>%
cor(use = "pairwise.complete.obs") %>%
hchart() %>% print()
```
# Leaflet Example
```r
library(leaflet)
storms %>%
filter(name %in% c('Ike', 'Katrina'), year > 2000) %>%
leaflet() %>%
addTiles() %>%
addCircles(lng = ~long, lat = ~lat, popup = ~name, weight = 1,
radius = ~wind*1000) %>% print()
```
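Circles can also be colored by storm, with a matching legend, using leaflet's `colorFactor()` and `addLegend()`. A sketch; the two colors are arbitrary choices. Note that `radius` in `addCircles()` is measured in meters.

```r
library(dplyr)
library(leaflet)

ike_katrina <- storms %>%
  filter(name %in% c('Ike', 'Katrina'), year > 2000)

# Map each storm name to a color
pal <- colorFactor(palette = c("firebrick", "steelblue"),
                   domain = ike_katrina$name)

storm_map <- leaflet(ike_katrina) %>%
  addTiles() %>%
  addCircles(lng = ~long, lat = ~lat, popup = ~name, weight = 1,
             radius = ~wind * 1000,   # meters
             color = ~pal(name)) %>%
  addLegend(pal = pal, values = ~name, title = "Storm")
storm_map
```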
# gganimate
```r
install.packages("gganimate")
```
# gganimate example
```r
library(gganimate)
ggplot(storms, aes(x = pressure, y = wind, color = status)) +
geom_point(show.legend = FALSE) +
xlab("Pressure") +
ylab("Wind Speed (knots)") +
facet_wrap(~status) +
theme_bw(base_size = 14) +
labs(title = 'Year: {frame_time}') +
transition_time(as.integer(year)) +
ease_aes('linear')
```
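The object above only describes the animation; it is rendered with `animate()` and written to disk with `anim_save()`. A sketch, assuming the gifski package is installed for GIF output; the frame count, frame rate, and file name are illustrative choices.

```r
library(ggplot2)
library(dplyr)      # storms data
library(gganimate)

p_anim <- ggplot(storms, aes(x = pressure, y = wind, color = status)) +
  geom_point(show.legend = FALSE) +
  xlab("Pressure") +
  ylab("Wind Speed (knots)") +
  facet_wrap(~status) +
  theme_bw(base_size = 14) +
  labs(title = 'Year: {frame_time}') +
  transition_time(as.integer(year)) +
  ease_aes('linear')

# Render 50 frames at 10 frames per second, then save as a GIF
anim <- animate(p_anim, nframes = 50, fps = 10,
                renderer = gifski_renderer())
anim_save("storms.gif", animation = anim)
```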
# gganimate output
![](/figure/storms.gif)
# Additional Resources
* plotly for R book: <https://plotly-book.cpsievert.me/>
* plotly: <https://plot.ly/>
* highcharter: <http://jkunst.com/highcharter/index.html>
* highcharts: <https://www.highcharts.com/>
* htmlwidgets: <https://www.htmlwidgets.org/>
* gganimate: <https://gganimate.com/>