Where we define a dataset, choose which variables map to which aspects of the plot, and then choose the geom() or type of plot to draw.
Let’s plug the NHANES dataset into a plot.
ggplot(nhanes_processed)
data and mapping are positional arguments in the ggplot function, so we don’t have to name them. However, it can be good practice to include the argument names so that it’s immediately obvious what each argument is.
Since we haven’t told ggplot what we want to display, we just get a blank plot. If we add some mappings for the x and y axes:
ggplot(nhanes_processed, aes(x = hba1c, y = plasma.glucose))
We now get labeled axes and scales based on the variable range. Finally, we can add a geom(). Let’s make a scatterplot, created with geom_point() in ggplot.
ggplot(nhanes_processed, aes(x = hba1c, y = plasma.glucose)) +
  geom_point()
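As mentioned above, data and mapping can be passed by position or by name. The sketch below builds the same scatterplot both ways. The `toy` data frame is a hypothetical stand-in for nhanes_processed with invented values, used only so the example runs on its own.

```r
library(ggplot2)

# Hypothetical stand-in for nhanes_processed (values invented for illustration)
toy <- data.frame(
  hba1c = c(5.1, 5.6, 6.2, 7.4, 8.0),
  plasma.glucose = c(88, 97, 110, 152, 171)
)

# Positional form, as used in the text
p_positional <- ggplot(toy, aes(x = hba1c, y = plasma.glucose)) +
  geom_point()

# Same plot with the data and mapping arguments named explicitly
p_named <- ggplot(data = toy, mapping = aes(x = hba1c, y = plasma.glucose)) +
  geom_point()

p_named
```

Both objects describe the same plot; the named form is simply easier to read at a glance.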
Now, let’s make a boxplot showing how BMI varies by subject ethnicity. Inside the geom_boxplot() function, we’ll also set the varwidth parameter to TRUE so that the width of each box scales with the square root of the number of observations in that category.
ggplot(nhanes_processed, aes(x = ethnicity, y = bmi)) +
  geom_boxplot(varwidth = TRUE)
Instead of a boxplot, try making a bee swarm plot or a violin plot. Note that the beeswarm plot is in a separate package, ggbeeswarm. You might need to vary the cex argument in the beeswarm plot to increase the spacing between the strips.
# install.packages("ggbeeswarm")
library(ggbeeswarm)

# TODO your plot here

# Beeswarm plot
ggplot(nhanes_processed, aes(x = ethnicity, y = bmi)) +
  geom_beeswarm(cex = 0.5)
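For the violin plot half of the exercise, one possible sketch uses geom_violin(), which ships with ggplot2 itself. The `toy` data frame and its group labels are hypothetical stand-ins for nhanes_processed, with invented values.

```r
library(ggplot2)

# Hypothetical stand-in for nhanes_processed (values invented for illustration)
toy <- data.frame(
  ethnicity = rep(c("Group A", "Group B"), each = 6),
  bmi = c(21, 23, 24, 26, 28, 31, 22, 25, 27, 29, 30, 35)
)

# Violin plot: shows the estimated density of bmi within each category
p_violin <- ggplot(toy, aes(x = ethnicity, y = bmi)) +
  geom_violin()

p_violin
```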
Note that we can also easily make boxplots using R’s built-in boxplot() function.
boxplot(bmi ~ ethnicity, data = nhanes_processed)
Mapping Variables
Beyond the axes, we can use mappings to encode variables as other aspects of a plot. Some of the most commonly used mapping types are shape, fill, color, size, and linetype.
For instance, let’s take our scatterplot from before and color the points by diabetic status.
ggplot(nhanes_processed, aes(x = hba1c, y = plasma.glucose, color = diabetes)) +
  geom_point()
It is difficult to tell how many diabetic participants are in this plot, as it’s possible that the red diabetic points have been covered by the blue and green points. We can alter the transparency of the points by changing alpha. Remember that anything set outside of aes() is a fixed value applied to the whole layer, so it does not depend on any variable.
ggplot(nhanes_processed, aes(x = hba1c, y = plasma.glucose, color = diabetes)) +
  geom_point(alpha = 0.6)
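To make the difference between setting and mapping concrete, here is a small sketch contrasting a fixed alpha (outside aes()) with an alpha mapped to a variable (inside aes()). The `toy` data frame and its diabetes labels are hypothetical, not taken from nhanes_processed.

```r
library(ggplot2)

# Hypothetical stand-in for nhanes_processed (values and labels invented)
toy <- data.frame(
  hba1c = c(5.1, 5.6, 6.2, 7.4, 8.0),
  plasma.glucose = c(88, 97, 110, 152, 171),
  diabetes = c("no", "no", "prediabetic", "yes", "yes")
)

# alpha set outside aes(): every point gets the same transparency
p_fixed <- ggplot(toy, aes(x = hba1c, y = plasma.glucose, color = diabetes)) +
  geom_point(alpha = 0.6)

# alpha mapped inside aes(): transparency now varies with plasma.glucose
p_mapped <- ggplot(
  toy,
  aes(x = hba1c, y = plasma.glucose, color = diabetes, alpha = plasma.glucose)
) +
  geom_point()

p_fixed
```

In the mapped version ggplot also adds a legend for alpha, since transparency now carries information about the data.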
Try coloring your boxplot from before by age.years. What happens? What about when you use age.cat? Remember to use fill instead of color for shapes like boxplots.
# We can't fill by age.years since it's numeric: ggplot drops the fill aesthetic and gives a warning.
ggplot(nhanes_processed, aes(x = ethnicity, y = bmi, fill = age.years)) +
  geom_boxplot()
Warning: The following aesthetics were dropped during statistical transformation: fill
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
ggplot(nhanes_processed, aes(x = ethnicity, y = bmi, fill = age.cat)) +
  geom_boxplot()
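The warning suggests converting the numeric variable into a factor. How age.cat was actually built is not shown here, but one way to derive a similar categorical variable is with base R's cut(). In this sketch the `toy` data frame, the break points, and the `age.band` column name are all assumptions chosen for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for nhanes_processed (values invented for illustration)
toy <- data.frame(
  ethnicity = rep(c("Group A", "Group B"), each = 6),
  bmi = c(21, 23, 24, 26, 28, 31, 22, 25, 27, 29, 30, 35),
  age.years = c(19, 34, 52, 67, 25, 71, 45, 38, 60, 22, 58, 80)
)

# Bin the numeric ages into a factor; right = FALSE makes each interval
# include its lower bound, e.g. [40, 60)
toy$age.band <- cut(
  toy$age.years,
  breaks = c(0, 40, 60, Inf),
  labels = c("under 40", "40 to 59", "60 and over"),
  right = FALSE
)

# A factor can be used for fill, so each ethnicity gets one box per age band
p_banded <- ggplot(toy, aes(x = ethnicity, y = bmi, fill = age.band)) +
  geom_boxplot()

p_banded
```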
Taking a figure all the way to publication quality can require careful fine-tuning. ggplot has a variety of useful themes and other ways to improve a figure’s appearance and readability.
Here’s an example of some of what you can do. Note that changing the fig.width setting for the code block will not affect how the image looks when exported.
# Maybe we want a color scheme from a Wes Anderson movie:
library(wesanderson)

pal <- wes_palette("Zissou1", 2, type = "continuous")

ggplot(nhanes_processed, aes(x = ethnicity, y = bmi, fill = dental.caries)) +
  geom_boxplot() +
  theme_minimal() +
  ggtitle("BMI by ethnicity and dental caries") +
  xlab("Ethnicity") +
  ylab("BMI") +
  scale_fill_manual(values = pal, name = "Dental Caries") +
  theme(
    text = element_text(size = 14),
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)
  )
Use a figure you’ve already created, or choose a new plot to create. Try to get that figure to as close to publication-ready as possible. The R Graph Gallery is a great place to look for inspiration on what you can do to improve your plots.
Formats for Figures
If your electronic artwork is created in a Microsoft Office application (Word, PowerPoint, Excel) then please supply “as is” in the native document format. Otherwise, regardless of the application used to create figures, the final artwork should be saved as or converted to 1 of the following formats:
TIFF, JPEG, or PPT: Color or grayscale photographs (halftones): always use a minimum of 300 dpi.
TIFF, JPEG, or PPT: Bitmapped line drawings: use a minimum of 1,000 dpi.
TIFF, JPEG, or PPT: Combinations bitmapped line/halftone (color or grayscale): a minimum of 500 dpi is required.
Layered Photoshop (PSD) or TIFF format (high resolution, 300–600 dots per inch (dpi)) for photographic images. In Photoshop, it is possible to create images with separate components on different layers. This is particularly useful for placing text labels or arrows over an image, as it allows them to be edited later. If you have done this, please send the Photoshop file (.psd) with the layers intact.
Adobe Illustrator (AI), Postscript, Vector EPS or PDF format for figures containing line drawings and graphs, including figures combining text and line art with photographs or scans.
If these formats are not possible, we can also accept the following formats: JPEG (high-resolution, 300–600 dpi), CorelDraw (up to version 8), Microsoft Word, Excel or PowerPoint.
Export your figure using ggsave to comply with one of these sets of guidelines.
# let's assume this is the plot we want to save; we will save the most recently created plot
ggplot(nhanes_processed, aes(x = ethnicity, y = bmi, fill = age.cat)) +
  geom_boxplot()
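A minimal sketch of the export step with ggsave() follows. The `toy` data frame, the file names, the 7 x 5 inch size, and the use of tempdir() as the output folder are all assumptions for illustration; substitute your own plot, path, and the dimensions your target journal asks for. ggsave() picks the output device from the file extension, and if no plot argument is given it saves the most recently displayed plot.

```r
library(ggplot2)

# Hypothetical stand-in for nhanes_processed (values invented for illustration)
toy <- data.frame(
  ethnicity = rep(c("Group A", "Group B"), each = 6),
  bmi = c(21, 23, 24, 26, 28, 31, 22, 25, 27, 29, 30, 35),
  age.cat = rep(c("younger", "older"), times = 6)
)

p <- ggplot(toy, aes(x = ethnicity, y = bmi, fill = age.cat)) +
  geom_boxplot()

# Assumed output location; replace with your own folder
out_dir <- tempdir()

# Vector PDF: suits the guideline for line drawings and graphs
pdf_path <- file.path(out_dir, "bmi_by_ethnicity.pdf")
ggsave(pdf_path, plot = p, width = 7, height = 5, units = "in")

# Raster TIFF: dpi controls the resolution. 300 is the halftone minimum quoted
# above; raise it (e.g. to 1000) if the plot counts as a bitmapped line drawing.
# Guarded because TIFF output needs either the ragg package or TIFF support in R.
if (requireNamespace("ragg", quietly = TRUE) || capabilities("tiff")) {
  tiff_path <- file.path(out_dir, "bmi_by_ethnicity.tiff")
  ggsave(tiff_path, plot = p, width = 7, height = 5, units = "in", dpi = 300)
}
```

Passing the plot explicitly with plot = p avoids accidentally saving a different figure if something else was drawn in between.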