Brandon LeBeau

5 minute read

I recently had an occasion while working on a three variable interaction plot for a paper where I wanted to remove the leading 0’s in the x-axis text labels using ggplot2. This was primarily due to some space concerns I had for the x-axis labels. Unfortunately, I did not find an obvious way to do this in my first go around. After tickering a bit, I’ve found a workaround. The process is walked through below.

First, some simulated data.

# Sim some data
simdata <- data.frame(x = runif(2400, min = .032, max = .210),
                      y = c(rnorm(2000, mean = 0, sd = .1), 
                            rnorm(400, mean = 1, sd = .25)),
                      group = c(sample(1:2, 1600, replace = TRUE),
                                rep(1, 400), 
                                rep(2, 400)),
                      facet = rep(1:3, each = 800))

As shown below, initially there is no group differences, but there are facet differences. Exploring the interaction between the grouping variables shows there is a two variable interaction. Note: This example is not identical to the three variable interaction I originally described above, but assume here that the x variable is also important.

with(simdata, tapply(y, group, mean))
##            1            2 
## -0.003350651  0.329433558
with(simdata, tapply(y, facet, mean))
##            1            2            3 
##  0.002838181 -0.007796735  0.504066440
with(simdata, tapply(y, interaction(group, facet), mean))
##          1.1          2.1          1.2          2.2          1.3 
## -0.003598492  0.009022435 -0.002953537 -0.012267379 -0.003488997 
##          2.3 
##  1.011621878

In the example in the paper, I aggregated the unique x values to the third decimal place. That is done with the following dplyr code. Note: The data did not need to be aggregated, but it is a bit easier to work with when plotting later.

# round value to .001 and aggregate
simdata$x_rd <- round(simdata$x, 3)

# aggregate
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
simdata_agg <- simdata %>%
  group_by(x_rd, group, facet) %>%
  summarise(y = mean(y))
simdata_agg 
## # A tibble: 945 x 4
## # Groups:   x_rd, group [?]
##     x_rd group facet            y
##    <dbl> <dbl> <int>        <dbl>
##  1 0.032     1     1  0.089798776
##  2 0.032     1     2 -0.060177407
##  3 0.032     1     3 -0.172629239
##  4 0.032     2     1  0.002964474
##  5 0.032     2     2  0.102498122
##  6 0.032     2     3  1.008216699
##  7 0.033     1     1  0.301937939
##  8 0.033     1     2 -0.014438654
##  9 0.033     1     3  0.112685915
## 10 0.033     2     1  0.014449176
## # ... with 935 more rows

Now that the data is aggregated, it can be directly plotted with ggplot2. This is the base plot that contains the leading 0’s by default and treats the x variable as continuous (which it really is continuous).

library(ggplot2)
p <- ggplot(simdata_agg, aes(x = x_rd, y = y, shape = factor(group))) + 
  theme_bw()
p + geom_point(size = 3) + facet_grid(. ~ facet) + 
  scale_x_continuous("x", limits = c(0, .25), 
                     breaks = seq(0, .25, .05))

The plot above is a good start, but I was worried about the x-axis labels being too close together and ultimately being difficult to read. I decided I wanted to omit the leading 0’s to omit some space. This was useful in my scenario as the variable on the x-axis could only take on values between 0 and 1, therefore the leading 0 is not important.

One way to remove the leading 0 is to convert the continuous variable into a character variable and use a simple regular expression (with gsub) to remove the 0 at the beginning of the character string. Below is the code to do that and also the resulting plot. The key point of the plotting code below is the use of the breaks argument to scale_x_discrete. Without this all the unique character values will be plotted, not good.

simdata_agg$x_char <- as.character(simdata_agg$x_rd)
simdata_agg$x_char <- gsub("^0", "", simdata_agg$x_char)
p <- ggplot(simdata_agg, aes(x = x_char, y = y, shape = factor(group))) + 
  theme_bw()
p + geom_point(size = 3) + facet_grid(. ~ facet) + 
  scale_x_discrete("x", breaks = c('.00', '.05', '.1', '.15', '.2', '.25'))

The plot above has a few flaws. First, there are values at the edge of each facet. This could be fixed with the expand argument to scale_x_discrete, but I still wanted to include the value of .00 on the x-axis. Secondly, the x-axis text labels are not uniformly formatted which is not ideal (e.g. .1 should be .10).

To fix this, some made up data needs to be added to the data frame. Some care needs to be done here as well as a value of .00 can not just be added to the x variable plotted. This would place a non-uniform gap between .00 and .05 (not shown, but try it for yourself by adapting the code below). Therefore, all values between 0 and .031 need to be manually added to the data frame to keep the spacing uniform. Finally, to not plot the made up values, I created a transparency variable called alpha. This variable was used to set the alpha values to 0 for the made up values and 1 for the real values. scale_alpha_discrete was used to specify the range of alpha values, this is important otherwise the made up numbers will show up as a light gray. The final code to manually add the new data is shown below. Anyone have a less workaround procedure?

# Reset aggregation vector
simdata_agg <- simdata %>%
  group_by(x_rd, group, facet) %>%
  summarise(y = mean(y))

# Create x_char variable again
simdata_agg$x_char <- as.character(simdata_agg$x_rd)
simdata_agg$x_char <- gsub("^0", "", simdata_agg$x_char)

# Needed formatting
simdata_agg$x_char <- ifelse(simdata_agg$x_char == '', '.00',
                             simdata_agg$x_char)
simdata_agg$x_char <- ifelse(simdata_agg$x_char == '.2', '.20',
                             simdata_agg$x_char)
simdata_agg$x_char <- ifelse(simdata_agg$x_char == '.1', '.10',
                             simdata_agg$x_char)

# Final plot
p <- ggplot(simdata_agg, aes(x = x_char, y = y, shape = factor(group))) + 
  theme_bw()
p + geom_point(size = 3) + 
  facet_grid(. ~ facet) + 
  scale_x_discrete("x", breaks = c('.00', '.05', '.10', '.15', '.20'),
                   expand = c(.05, .05))

comments powered by Disqus