Appendix B — An Overview of Plotting Data in R
Introduction
Plots can provide a useful visual summary of the data. Sometimes, a nice plot or two is all that is needed for a statistical analysis. In this document, we give a basic overview of creating some common plots in R.
Here’s a link to a more thorough coverage of plotting in R: https://r-graph-gallery.com/index.html.
Help Documentation
The plotting functions introduced in this document have robust help documentation with lots of options to customize your plots. If you want to view help documentation for any of the functions used in this document, run commands such as ?hist, ?plot, ?table, and so on.
# access help documentation for hist
# a side panel should open with the help doc
?hist
What Are Packages in R?
R packages are collections of functions, sample data, and/or other code scripts. R installs a set of default packages during installation.
Run the code cell below to get a list of all R packages that are already installed, including the default packages.
# See a list of installed default packages
allpack <- installed.packages()
rownames(allpack)
Loading Packages with the library() Command
Each time we start or restart a session and want to access the functions and data in a package, we need to load the package with the library() command.
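Installing a package and loading it are separate steps. The minimal sketch below uses only base R and checks both steps for the stats package, which ships with R. To check another package, such as "dplyr", swap in its name. The variable names `installed` and `has_dplyr` are just for illustration.

```r
# installed.packages() lists every package on disk
installed <- rownames(installed.packages())
"stats" %in% installed          # TRUE: stats is installed with R

# search() lists the packages that are currently loaded (attached)
"package:stats" %in% search()   # TRUE in a default R session

# requireNamespace() returns TRUE or FALSE instead of throwing an error,
# so it is a safe way to ask whether a package such as dplyr is available
has_dplyr <- requireNamespace("dplyr", quietly = TRUE)
has_dplyr
```

If has_dplyr is FALSE, the package is not installed, and library(dplyr) would stop with an error.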
To demonstrate how to create common statistical plots in R, we will use the storms data set, which is located in the package dplyr.
- The dplyr package is already installed in Google Colaboratory.
- We still need to use a library() command to load the package.
- Run the code cell below to load the dplyr package.
# load the library of functions and data in dplyr
library(dplyr)
Reloading Packages When Restarting a Session
If we take a break in our work, it is possible our R session will time out and close. Each time we restart an R session, we need to rerun the library() commands in order to reload any packages we plan to use.
The same caution applies to any objects, vectors, or data frames we create or edit in an R session. If a session times out and we want to use an object x that we previously created, we need to rerun the code cell(s) that create x before we can refer to x in the current session.
BE SURE YOU RUN THE COMMAND library(dplyr) BEFORE ATTEMPTING TO RUN ANY OF THE CODE CELLS BELOW!
Summarizing storms Data
The package dplyr contains a data set called storms. Let’s find some useful information about this data.
- The first code cell below will open the help manual for storms in a side bar. Feel free to close the help side bar.
- The second code cell below will provide a numeric summary of all variables in the storms data.
- Recall we need to first run the command library(dplyr) in the code cell above to be able to access storms.
# be sure to run the code cell above first
# so you have loaded the dplyr package
?storms
# See a summary of all variables
summary(storms)
name year month day
Length:19066 Min. :1975 Min. : 1.000 Min. : 1.00
Class :character 1st Qu.:1993 1st Qu.: 8.000 1st Qu.: 8.00
Mode :character Median :2004 Median : 9.000 Median :16.00
Mean :2002 Mean : 8.699 Mean :15.78
3rd Qu.:2012 3rd Qu.: 9.000 3rd Qu.:24.00
Max. :2021 Max. :12.000 Max. :31.00
hour lat long status
Min. : 0.000 Min. : 7.00 Min. :-109.30 tropical storm :6684
1st Qu.: 5.000 1st Qu.:18.40 1st Qu.: -78.70 hurricane :4684
Median :12.000 Median :26.60 Median : -62.25 tropical depression:3525
Mean : 9.094 Mean :26.99 Mean : -61.52 extratropical :2068
3rd Qu.:18.000 3rd Qu.:33.70 3rd Qu.: -45.60 other low :1405
Max. :23.000 Max. :70.70 Max. : 13.50 subtropical storm : 292
(Other) : 408
category wind pressure tropicalstorm_force_diameter
Min. :1.000 Min. : 10.00 Min. : 882.0 Min. : 0.0
1st Qu.:1.000 1st Qu.: 30.00 1st Qu.: 987.0 1st Qu.: 0.0
Median :1.000 Median : 45.00 Median :1000.0 Median : 110.0
Mean :1.898 Mean : 50.02 Mean : 993.6 Mean : 146.3
3rd Qu.:3.000 3rd Qu.: 65.00 3rd Qu.:1007.0 3rd Qu.: 220.0
Max. :5.000 Max. :165.00 Max. :1024.0 Max. :1440.0
NA's :14382 NA's :9512
hurricane_force_diameter
Min. : 0.00
1st Qu.: 0.00
Median : 0.00
Mean : 14.81
3rd Qu.: 0.00
Max. :300.00
NA's :9512
One Quantitative Variable
Often a graph is a better way to summarize a variable than summary statistics alone. The sections below explain how we can graphically summarize the quantitative variable pressure.
Histograms
The hist function can be used to create a histogram of a numerical vector.
- See histogram documentation: https://r-graph-gallery.com/histogram.html
- Like making colorful plots? Here’s a guide to colors in R.
- We use a $ symbol to indicate the name of the variable in storms we will access in the plot.
hist(storms$pressure, # plot pressure variable in storms data
xlab = "storm pressure (in millibars)", # x-axis label
main = "Distribution of Storm Pressure", # main title
breaks = 10, # suggested number of bins (R picks pretty breakpoints)
col = "aquamarine4") # color of bars
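hist() treats a single breaks value as a suggestion and rounds the breakpoints to "pretty" values. One way to see how much the bins change is to call hist() with plot = FALSE, which returns the breakpoints and counts without drawing anything. The simulated vector x below stands in for storms$pressure, so the sketch runs without dplyr.

```r
set.seed(1)
x <- rnorm(500, mean = 990, sd = 20)   # simulated pressures (millibars)

h3  <- hist(x, breaks = 3,  plot = FALSE)  # compute bins, do not draw
h20 <- hist(x, breaks = 20, plot = FALSE)

h3$breaks            # a few wide bins with rounded breakpoints
length(h3$counts)    # number of bins actually used
length(h20$counts)   # many more, narrower bins
sum(h20$counts)      # every observation lands in exactly one bin: 500
```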
Density plots
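The shape of a histogram is sensitive to its options, especially the number of bins. For example, a histogram with 3 breaks may tell a different story than a histogram of the same data with 20 breaks.
Thus, we may prefer to use a density plot, which summarizes the distribution with a smooth curve instead of bars.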
- First compute the density of pressure.
  - For more information, see the density help documentation.
- The plot() function will then create a density plot.
  - For more advanced density plots see https://r-graph-gallery.com/density-plot.html.
  - If a variable is categorical, plot() will create a different plot, namely a bar chart. plot() can also be used to generate a plot to compare two different variables.
  - The output of plot() depends on the type and number of variables that we input in the function.
# approximate densities and then plot
plot(density(storms$pressure),
xlab = "storm pressure (in millibars)", # horizontal axis label
main = "Distribution of Storm Pressure") # main title
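density() returns an object that holds a grid of x values and the estimated heights y, which plot() then draws. The sketch below uses simulated stand-in data again. It shows the bandwidth that density() chose and checks that the area under the curve is close to 1.

A density plot involves a choice too: the bandwidth controls how smooth the curve is, and the adjust argument scales it. If a variable has missing values, density() also needs na.rm = TRUE.

```r
set.seed(1)
x <- rnorm(500, mean = 990, sd = 20)   # simulated pressures (millibars)

d <- density(x)    # Gaussian kernel; bandwidth picked by the default rule
d$bw               # bandwidth that was used
length(d$x)        # number of grid points (512 by default)

# approximate the area under the curve with a Riemann sum
area <- sum(d$y) * diff(d$x[1:2])
area               # close to 1, as expected for a density

# a quarter of the default bandwidth gives a wigglier curve
d_rough <- density(x, adjust = 0.25)
```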
Boxplots
Boxplots are another useful plot for presenting the distribution of a quantitative variable using quartiles and the five number summary.
- See boxplot documentation at https://r-graph-gallery.com/boxplot.html.
- Run the command ?boxplot to see more options.
# create boxplot of quantitative variable
boxplot(storms$pressure,
ylab = "storm pressure (in millibars)", # vertical axis label
col = "gold", # color of box
main = "Distribution of Storm Pressure") # main title
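A boxplot is drawn from the five number summary, with one twist. Points farther than 1.5 times the box length (the distance between the hinges, roughly the IQR) from the box are plotted separately as outliers, and the whiskers stop at the most extreme remaining values. The base R functions fivenum() and boxplot.stats() show the numbers behind the picture. The toy vector is made up for illustration.

```r
x <- c(1:9, 50)          # toy data with one unusually large value

fivenum(x)               # 1.0 3.0 5.5 8.0 50.0 (min, hinges, median, max)

bs <- boxplot.stats(x)   # the statistics boxplot() actually draws
bs$stats                 # 1.0 3.0 5.5 8.0 9.0: upper whisker stops at 9
bs$out                   # 50: beyond 8 + 1.5 * (8 - 3) = 15.5
```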
Changing the Layout of Boxplots
# horizontally aligned boxplot
boxplot(storms$pressure,
horizontal = TRUE, # display horizontally
xlab = "storm pressure (in millibars)", # horizontal axis label
main = "Distribution of Storm Pressure", # main title
col = "azure3") # color
One Qualitative Variable
Qualitative (also called categorical) variables require other types of plots. For example, we cannot create a density plot or boxplot for a qualitative variable. Qualitative variables may be stored as text labels (such as the status variable) or as numbers (such as the category variable). This brings up a good question:
How can we tell whether a variable is stored as a numerical variable, a categorical variable, or perhaps as a string of characters?
Checking the Data Type
The typeof() command reports how R stores a variable internally, which helps identify its data type.
typeof(storms$status)
[1] "integer"
typeof(storms$category)
[1] "double"
Data Types
From the output above, we see:
- The variable status is reported as an integer.
  - The individual values are strings of characters such as “tropical storm” or “hurricane”. status is stored as a factor, and R stores each factor label as an integer code, which is why typeof() returns "integer".
  - The summary statistics of status are counts of each label.
- The variable category is initially read as a double, or decimal values.
  - The individual values are the ordinal integers 1, 2, 3, 4, and 5 for the category of hurricane.
  - There are 14,382 NA (or missing) values corresponding to the observations that are not hurricanes.
  - The summary statistics of category (such as the mean) are stored as decimals.
  - However, we would like to treat category as a qualitative variable and plot how many storms fall into each category.
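typeof() reports the storage type, while class() reports how R treats a variable. For a factor the two differ, which is why status shows up as "integer" above. The small factor f below is a hypothetical stand-in for storms$status.

```r
f <- factor(c("hurricane", "tropical storm", "hurricane"))

typeof(f)       # "integer": factors store integer codes
class(f)        # "factor": R treats f as a qualitative variable
levels(f)       # "hurricane" "tropical storm": the labels, sorted
as.integer(f)   # 1 2 1: the code behind each value
```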
Caution with Data Types and Using plot()
If we try to use the general plot() function, R will give its best guess at which plot makes the most sense to display the data. If the data is stored as the wrong data type, plot() will not work as we might expect.
- Run the two code cells below, and notice the following:
  - The output of plot(storms$status) looks like a reasonable bar chart.
  - The output of plot(storms$category) does not nicely summarize the counts of how many storms are in each category. Since category is stored as numbers, plot() draws each value against its position (index) in the vector.
plot(storms$status) # plot of status
plot(storms$category) # plot of category
Creating Bar Charts From Tables
The table() function will count the number of times a value (or string of characters) occurs in a vector or variable.
One way to improve the initial plot of categories above is as follows:
- First use the table() command to count how many storms are in each category.
- Then create a bar chart using the barplot() function.
cat.table <- table(storms$category) # create table of counts
cat.table # print table to screen
1 2 3 4 5
2478 973 579 539 115
# create bar chart from table counts
barplot(cat.table, # input table from previous code cell
main = "Distribution of Hurricane Categories", # main title
xlab = "Hurricane Category", # horizontal axis label
ylab = "Frequency", # vertical axis label
col = "steelblue") # fill color of bars
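By default, table() drops missing values. That is why the NA values of category (the storms that are not hurricanes) do not appear in cat.table. The useNA argument adds them back as their own count. The toy vector x below stands in for storms$category.

```r
x <- c(1, 3, 1, NA, 2, 3, 3)             # toy categories with one missing value

counts <- table(x)                       # NA is dropped: 1 -> 2, 2 -> 1, 3 -> 3
counts_na <- table(x, useNA = "ifany")   # adds a count for NA

sum(counts)      # 6 observed values
sum(counts_na)   # 7, including the NA
```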
Relative Frequency Tables and Bar Charts
If instead of plotting the number of hurricanes in each category we wish to plot the proportion of all hurricanes in each category, we can use the prop.table() function to convert the table counts to proportions relative to the grand total.
Run the two code cells below to create a relative frequency bar chart.
- We input our previous table of counts, cat.table, into the function prop.table() to convert counts to proportions.
- Then we create a bar chart of the resulting proportions.
cat.prop <- prop.table(cat.table) # create table of proportions
barplot(cat.prop, # input table of proportions
main = "Relative Frequency of Hurricane Categories", # main title
xlab = "Hurricane Category", # horizontal axis label
ylab = "Proportion", # vertical axis label
col = "steelblue") # fill color of bars
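prop.table() divides each count by the grand total, so the proportions always add up to 1. A toy table makes the arithmetic easy to check by hand.

```r
counts <- table(c(1, 1, 2, 3, 3, 3, 3, 3))  # toy counts: 2, 1, and 5 (total 8)
props <- prop.table(counts)                  # 2/8, 1/8, 5/8
props
sum(props)                                   # 1
```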
prop.table()
- The input into prop.table() must be a table rather than a vector or data frame column.
- The code cell below does not return a relative frequency table as we might expect, since we did not first create a table of counts from storms$category.
temp <- prop.table(storms$category) # do not input a vector
head(temp) # print first several entries of result
[1] NA NA NA NA NA NA
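The NA output has a simple cause. Given a plain vector, prop.table() divides each value by the sum of the values, and a single NA makes that sum NA. Two toy vectors show both effects.

```r
no_na <- prop.table(c(2, 3, 5))     # each value / 10: 0.2 0.3 0.5
with_na <- prop.table(c(2, NA, 5))  # sum is NA, so every entry is NA
no_na
with_na
```

Since storms$category contains NA values, every entry of temp above is NA. Even without missing values, the result would be category values divided by their sum, not proportions of storms.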
Pie Charts with pie()
Pie charts can also be used to illustrate the distribution of one qualitative variable.
- See https://r-graph-gallery.com/pie-plot.html.
- For help and a list of options, you can run ?pie.
?pie
# create pie chart of one qualitative variable
pie(cat.table, # input table
main = "Distribution of Hurricane Categories") # main title
Converting to a factor() and Then plot()
One common issue with a qualitative variable is that it is often stored as the wrong data type.
- Qualitative data should typically be stored as a factor.
Another way we can create a bar chart of the counts in each category is to:
- First convert the qualitative variable to a factor.
- Then use plot() to create an appropriate plot.
Run the code cell below to first see the summary output of the category variable after converting it to a factor.
# creates a copy of storms data set
# so we don't overwrite original storms
storms2 <- storms
storms2$category <- factor(storms$category) # convert category to factor
summary(storms2$category) # get new summary of categories
1 2 3 4 5 NA's
2478 973 579 539 115 14382
Notice the summary is a table of counts in each hurricane category.
- Once the variable category is converted to a factor, the plot() function will know to use a bar chart to give a summary display.
# create bar chart from counts of a factor
plot(storms2$category, # input a factor
main = "Distribution of Hurricane Category", # main title
xlab = "Hurricane Category", # horizontal axis label
ylab = "Frequency", # vertical axis label
col = "steelblue") # fill color of bars
- Recall that without first changing category to a factor, plot() will create a different graph.
# default plot of category when not first converted to factor
plot(storms$category)
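When numbers are converted to a factor, factor() creates levels only for values that actually occur. Setting levels explicitly keeps every category, even one with no observations, so it still appears (with a count of 0) in tables and bar charts. The toy vector x is hypothetical; for storms, levels = 1:5 would play the same role.

```r
x <- c(1, 1, 3, 5)                           # toy categories; 2 and 4 never occur

observed <- table(factor(x))                 # levels 1, 3, 5 only
all_cats <- table(factor(x, levels = 1:5))   # levels 1 through 5

observed
all_cats     # counts 2 0 1 0 1
```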
Plotting One Quantitative and One Qualitative Variable
Imagine we would like to compare the wind speeds of storms by status. In this case, we would like to compare a quantitative variable (wind) for different classes of a qualitative variable (status).
Side by Side Boxplots
There are many classes of storm status in storms.
In the storms data:
- wind is a quantitative variable.
- status is a qualitative variable.
- We can use the default plot() function to create side-by-side boxplots.
# create a vector of fill colors
# one color for each status type.
my.colors <- c("green", "purple", "grey", "red",
"blue", "gold", "cyan", "pink", "orange")
plot(wind ~ status, # quantitative first ~ categorical second
data = storms, # name of data frame
col = my.colors, # fill colors
main = "Wind Speeds of Storms by Status") # main title
Adding a Legend to Plots
- There are many different storm statuses.
- It is difficult (or impossible) to tell which boxplot corresponds to which storm status.
- Adding a legend to the plot will help!
# create a table of status counts
# we will pull off the row names of the table
# as the labels in the legend
status.table <- table(storms$status)
plot(wind ~ status, # quantitative first ~ categorical second
data = storms, # name of data frame
col = my.colors, # fill colors
ylab = "Wind speed in knots", # vertical axis label
main = "Wind Speeds of Storms by Status") # main title
# we can add a legend to identify which plot is which storm status
legend(x = "topright", # place legend in top right corner
legend=rownames(status.table), # each row of table is label in legend
fill = my.colors) # fill colors
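The legend labels line up with the boxes because both follow the order of the factor levels. The formula plot draws one box per level, and table() lists the levels in that same order. The sketch below checks this with the built-in ToothGrowth data, an assumption made so it runs without dplyr. It calls boxplot() with plot = FALSE to get the box names without drawing.

```r
# ToothGrowth ships with R; supp is a factor with levels "OJ" and "VC"
bp <- boxplot(len ~ supp, data = ToothGrowth, plot = FALSE)

bp$names                              # order of the boxes
rownames(table(ToothGrowth$supp))     # order of the legend labels
levels(ToothGrowth$supp)              # both follow the factor levels
```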
Subsetting Data by Category
There are many classes of storm status in storms. Often, we want to focus on only a smaller subset of classes. We can restrict our attention to comparing the wind speeds of three of the classes: “tropical storm”, “tropical depression”, and “hurricane”.
- We can subset the storms data frame into three separate data frames, one for each status of storm, using the subset() function.
- Curious to learn more about subset? Run ?subset in a code cell to access help documentation.
- Then we can create three separate boxplots of the wind speeds for each status.
# split data by storm status
hur <- subset(storms, # data frame name
status == "hurricane", # logical test to select observations
select = wind) # which quantitative variable(s) to select
trop.storm <- subset(storms,
status == "tropical storm", # tropical storms
select = wind)
trop.dep <- subset(storms,
status == "tropical depression", # tropical depressions
select = wind)
# create side by side boxplot
# for each of the three subsets
boxplot(hur$wind, trop.storm$wind, trop.dep$wind,
main = "Windspeed of Storms",
names = c("Hurricanes", "Tropical Storms", "Tropical Depressions"),
col = c("red", "blue", "green"),
xlab = "Wind speed in knots",
horizontal = TRUE)
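subset() is shorthand for logical indexing with [ ]. One difference matters here: subset() drops rows where the logical test is NA, which affects variables such as category that contain missing values. The sketch uses the built-in ToothGrowth data as a stand-in for storms. The small data frame df is hypothetical.

```r
# rows where supp is "VC", keeping only the len column
vc1 <- subset(ToothGrowth, supp == "VC", select = len)
vc2 <- ToothGrowth[ToothGrowth$supp == "VC", "len", drop = FALSE]
nrow(vc1)               # 30 of the 60 rows
all.equal(vc1, vc2)     # same rows and column

# with a missing value, subset() drops the NA row but [ ] keeps an NA row
df <- data.frame(cat = c(5, NA, 3), wind = c(140, 30, 100))
nrow(subset(df, cat == 5))   # 1
nrow(df[df$cat == 5, ])      # 2: the second row is all NA
```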
Relationship Between Two Qualitative Variables
Imagine we would like to compare the number of hurricanes of each category that occurred in each month. In this case, we would like to compare two qualitative variables, namely category and month.
Creating a Contingency (Two-Way) Table
The command table(x) will count the number of times a value (or string of characters) occurs in a vector x.
The command table(x, y) will similarly create a contingency (or two-way) table to jointly compare counts of x and y.
# create a contingency table for category and month
con.table <- table(storms$category, storms$month)
con.table # print output to screen
1 4 5 6 7 8 9 10 11 12
1 5 0 0 18 140 581 1099 462 140 33
2 0 0 0 0 25 198 571 150 29 0
3 0 0 0 0 18 113 346 86 16 0
4 0 0 0 0 18 114 295 88 24 0
5 0 0 0 0 1 32 69 13 0 0
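In a two-way table, the first vector becomes the rows and the second becomes the columns. In con.table, the rows are hurricane categories and the columns are months. addmargins(), from the stats package, appends row and column totals. The toy vectors cat_x and month_y are hypothetical stand-ins.

```r
cat_x   <- c(1, 1, 2, 3, 1, 2)      # toy categories
month_y <- c(8, 9, 9, 9, 10, 9)     # toy months

tab <- table(cat_x, month_y)   # rows: categories, columns: months
tab
addmargins(tab)                # adds a "Sum" row and a "Sum" column
```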
Creating Grouped Frequency Bar Charts
After creating a two-way table, we can present the results visually in a grouped bar chart.
- See documentation at https://r-graph-gallery.com/211-basic-grouped-or-stacked-barplot.html.
# create a vector of colors
my.colors2 <- c("green", "purple", "grey", "red", "blue")
# create side by side bar chart
barplot(con.table, # use counts from contingency table
beside = TRUE, # groups side-by-side
main = "Category Hurricanes By Month", # main title
xlab = "Month", # horizontal axis label
col = my.colors2, # fill color of bars
ylab = "Frequency") # vertical axis label
# add a legend to plot
legend(x="topleft", # place legend in top left
legend=rownames(con.table), # get labels from row name in contingency table
fill = my.colors2) # use same fill colors
Stacked Frequency Bar Charts
- Note beside = FALSE is the default.
- If we do not specify a beside option, a stacked bar chart is created instead.
- In the second code cell, we also add a legend to the plot.
########################################################
# Note this has already been run in a previous section
# You do not need to run again if already created
#######################################################
# create a contingency table for category and month
con.table <- table(storms$category, storms$month)
con.table # print output to screen
# create a vector of colors
my.colors2 <- c("green", "purple", "grey", "red", "blue")
# create stacked bar chart
barplot(con.table, # use counts from contingency table
main = "Category Hurricanes By Month", # main title
xlab = "Month", # horizontal axis label
col = my.colors2, # color of bars
ylab = "Frequency") # vertical axis label
# add legend to plot
legend(x="topleft", # place legend in top left
legend=rownames(con.table), # get labels
fill = my.colors2) # use same colors
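In a stacked bar chart, each bar is one column of the contingency table piled up, so the total height of a bar equals its column total. colSums() gives those heights directly. barplot() itself returns the bar midpoints, not the heights. The small table is a made-up example.

```r
tab <- table(c("a", "b", "a", "a", "b"),   # rows (stacked pieces)
             c("x", "x", "y", "y", "y"))   # columns (bars)
tab
colSums(tab)          # bar heights: x = 2, y = 3

mids <- barplot(tab)  # draws the stacked chart; returns the bar midpoints
```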
Stacked Bar Charts Relative to Grand Total
- First we create a contingency table using table(x, y).
- Then we use prop.table([table_name]) to convert frequencies to proportions out of the grand total.
- Finally, we can create a stacked bar chart of relative frequencies.
# create two-way table of counts
con.table <- table(storms$category, storms$month)
# convert counts to proportions
con.prop <- prop.table(con.table)
# create a vector of colors
my.colors2 <- c("green", "purple", "grey", "red", "blue")
# create stacked bar chart
barplot(con.prop, # use proportions from contingency table
main = "Category Hurricanes By Month", # main title
xlab = "Month", # horizontal axis label
col = my.colors2, # color of bars
ylab = "Relative Frequency (of grand total)") # vertical axis label
legend(x="topleft", # place legend in top left
legend=rownames(con.table), # get labels
fill = my.colors2) # use same fill colors
Stacked Bar Chart Relative to Column Totals
Often, we would like the proportions in the table to be computed out of the total in each column (instead of the grand total).
- We add the option 2 inside prop.table() so that each count is divided by its column total.
- In this example, each column is a different month.
# create two-way table of counts
con.table <- table(storms$category, storms$month)
# convert counts to proportions
# note the option 2 added to command below
con.prop.column <- prop.table(con.table, 2)
# create a vector of colors
my.colors2 <- c("green", "purple", "grey", "red", "blue")
# create stacked bar chart
barplot(con.prop.column, # use column proportions from contingency table
main = "Category Hurricanes By Month", # main title
xlab = "Month", # horizontal axis label
col = my.colors2, # color of bars
ylab = "Relative Frequency (to month totals)") # vertical axis label
legend(x="topleft", # place legend in top left
legend=rownames(con.table), # get labels
fill = my.colors2) # use same fill colors
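The margin argument of prop.table() decides which totals are used: 1 divides by row totals and 2 by column totals. With margin 2, every column of the result sums to 1, which is why every stacked bar in the plot reaches the same height. The toy table is hypothetical.

```r
cats   <- c(1, 1, 2, 1, 2, 2)
months <- c("Aug", "Aug", "Aug", "Sep", "Sep", "Sep")
tab <- table(cats, months)

col_props <- prop.table(tab, 2)   # divide each column by its column total
col_props                         # Aug: 2/3, 1/3; Sep: 1/3, 2/3
colSums(col_props)                # 1 1

row_props <- prop.table(tab, 1)   # divide each row by its row total instead
rowSums(row_props)                # 1 1
```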
Relationship Between Two Quantitative Variables
Imagine we would like to compare the wind speeds (wind) to the pressure (pressure). In this case, we would like to compare two quantitative variables.
A scatter plot can be used to identify the relationship between two quantitative variables.
If both variables are quantitative, the plot() function by default will create a scatter plot to compare the two variables. For other types of scatter plots, see documentation: https://r-graph-gallery.com/scatterplot.html.
# create a scatter plot
# first variable wind is response (y-axis)
# second variable pressure is predictor (x-axis)
plot(wind ~ pressure, # response ~ predictor(s)
data = storms, # data frame name
main = "Relation of Pressure and Wind Speed of Storms", # main title
xlab = "Pressure (in millibars)", # horizontal axis label
ylab = "Wind Speed (in knots)") # vertical axis label
Arranging Multiple Plots in an Array
The command par(mfrow = c(n, m)) creates an array of \(n\) rows and \(m\) columns. Plots will fill the array row by row in the order they are plotted.
See https://bookdown.org/ndphillips/YaRrr/arranging-plots-with-parmfrow-and-layout.html for more info.
See previous sections for further information about each of the plots created below.
par(mfrow = c(2, 2)) # create a 2 x 2 array of plots
# the next 4 plots fill the array; a 5th plot starts a new page
boxplot(storms$wind) # create boxplot of wind speed
# code below creates a histogram of wind speed
# we can add many options to customize
hist(storms$wind, xlab = "wind speed (in knots)", # x-axis label
ylab = "Frequency", # y-axis label
main = "Distribution of Storm Wind Speed 1975-2021", # main label
col = "steelblue") # change color of bars
plot(storms$status, col = "gold") # plots status, which is categorical
plot(wind ~ pressure, data = storms) # plots two numerical variables
# create a table of status counts
# we will pull off the row names of the table
# as the labels in the legend
status.table <- table(storms$status)
plot(wind ~ status, # quantitative first ~ categorical second
data = storms, # name of data frame
col = my.colors, # fill colors
ylab = "Wind speed in knots", # vertical axis label
main = "Wind Speeds of Storms by Status") # main title
# we can add a legend to identify which plot is which storm status
legend(x = "topright", # place legend in top right corner
legend=rownames(status.table), # each row of table is label in legend
fill = my.colors) # fill colors
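Settings made with par() stay in effect for every later plot in the session. When it changes a setting, par() returns the old values, so a common pattern is to save them and restore them afterward. The random data here is only for illustration.

```r
op <- par(mfrow = c(1, 2))   # 1 row, 2 columns; op holds the old setting
hist(rnorm(100))             # fills the left panel
boxplot(rnorm(100))          # fills the right panel
par(op)                      # restore the previous layout
par("mfrow")                 # back to the layout in effect before this cell
```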
More Advanced Plots with ggplot2
The previous plots were created using R’s base graphics system.
- Base graphics are fast and simple to produce, and the results look professional.
A fancier alternative is to construct plots using the ggplot2 package.
- The gg stands for Grammar of Graphics.
In its simplest form, to construct a (useful) plot in ggplot2, you need to provide:
- A ggplot object.
  - This is usually the object that holds your data frame.
  - The data frame is passed to ggplot via the first argument, data.
- A geometry object.
  - Roughly speaking, this is the kind of plot you want.
  - e.g., geom_histogram for a histogram, geom_point for a scatter plot, geom_density for a density plot.
- An aesthetic mapping.
  - Aesthetic mappings describe how variables in the data are mapped to visual properties of a geometry.
  - This is where you specify which variable will be the x variable, the y variable, which variable will control color in the plots, etc.
- See https://ggplot2.tidyverse.org for documentation.
- Download a ggplot2 cheatsheet.
Loading ggplot2
The ggplot2 package is already installed in Google Colaboratory. However, recall that each time we start or restart a session and want to access the functions and data in a package, we need to load the package with the library command. Run the first code cell below to load the ggplot2 package. If restarting a new session, you also need to reload the dplyr package to access the storms data.
library(ggplot2) # make sure you have installed ggplot2 package
# may need to reload
library(dplyr)
Plotting One Numerical Variable with ggplot2
To create various types of plots for one quantitative variable, such as wind:
- The ggplot object is the data frame storms.
- The aesthetic is the variable wind that we will plot on the x-axis.
- The geometric objects histogram, density, and boxplot are specified in each of the three code cells below.
- There are numerous options we can include as well.
# create a histogram
ggplot(storms, aes(x = wind)) +
geom_histogram(fill = "steelblue", color="black")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# create a density plot
ggplot(storms, aes(x = wind)) +
geom_density(color="red") +
theme_bw() # adding theme_bw() makes white background
# create a boxplot
ggplot(storms, aes(x = wind)) +
geom_boxplot(color="black", fill="blueviolet")
Scatter Plots with ggplot2
To create a scatter plot to compare two quantitative variables such as wind speed and pressure of storms:
- The ggplot object is the data frame storms.
- The aesthetics are the variables:
  - pressure is the predictor plotted on the x-axis.
  - wind is the response plotted on the y-axis.
- The geometric object is a scatter plot of points (geom_point).
# create a scatter plot
ggplot(storms) +
geom_point(aes(x = pressure, y = wind))
Scaling ggplot2 plots
In general, scaling is the process by which ggplot2 maps the values of a variable to visual values such as positions, colors, symbols, or sizes. When this is done for discrete numeric or qualitative variables, ggplot2 will often map each level of the variable to a distinct color, symbol, or size, depending on the aesthetic mapped.
In the example below, we map the status variable to the color aesthetic, which is then scaled to different colors for the different status levels.
# scatter plot with scaling
ggplot(storms) +
geom_point(aes(x = pressure, y = wind, color = status))
Scaling by Shape
Alternatively, we can map the status variable to the shape aesthetic, which creates a plot with a different shape for each observation based on its status level.
- By default, ggplot2 uses at most 6 shapes.
- There are 9 different storm statuses.
- The last line manually sets a shape for each status; otherwise ggplot2 warns and drops the points in the extra statuses.
# scaling by shape
ggplot(storms) +
geom_point(aes(x = pressure, y = wind, shape = status)) +
scale_shape_manual(values=0:8) # manually setting shapes
Applying Multiple Scales
We can even combine these two aesthetic mappings in a single plot: color shows month and shape shows status. Since month is stored as a number, ggplot2 uses a continuous color gradient for it rather than one distinct color per month.
- As in the previous plot, the last line manually sets a shape for each of the 9 statuses, since ggplot2 uses at most 6 shapes by default.
# scaling by month and status
ggplot(storms) +
geom_point(aes(x = pressure, y = wind, color = month, shape = status)) +
scale_shape_manual(values=0:8) # manually setting shapes for status
Faceting in ggplot2
Faceting creates separate panels (facets) of a data frame based on one or more faceting variables.
To create separate scatter plots (one for each category) comparing two quantitative variables such as wind speed and pressure of storms, we can add a facet_grid() layer.
- Note the NA plot corresponds to the storms that are not hurricanes.
# faceting by category
ggplot(storms) +
geom_point(aes(x = pressure, y = wind)) +
facet_grid(~ category)
Bar Charts with ggplot2
Imagine we would like to compare the number of different types of storms (status) that occurred in each month.
Stacked Bar Charts of Counts with ggplot2
To create a stacked bar chart of counts for one or more qualitative variables:
- The ggplot object is the data frame storms.
- The geometric object is geom_bar.
- The aesthetics are specified as:
  - The x-axis variable (x) is month.
  - The fill color (fill) is status.
- The height of each bar is the summarizing statistic (stat), which is "count".
- The option position="stack" creates a stacked bar chart of counts.
# stacks bars on top of each other
ggplot(storms, aes(x=month)) +
geom_bar(aes(fill=status), stat = "count", position="stack") +
ggtitle("Occurrence of Storms by Month")
Stacked Relative Frequency Bar Charts with ggplot2
To create a stacked bar chart of relative frequencies for two qualitative variables:
- The ggplot object is the data frame storms.
- The geometric object is geom_bar.
- The aesthetics are specified as:
  - The x-axis variable (x) is month.
  - The fill color (fill) is status.
- The height of each bar is the summarizing statistic (stat), which is "count".
- The option position="fill" stacks the bars and rescales each stack to a total height of 1, which creates a stacked bar chart of relative frequencies within each month.
# stacks bars and standardizing each stack
ggplot(storms, aes(x=month)) +
geom_bar(aes(fill=status), stat = "count", position="fill") +
ggtitle("Occurrence of Storms by Month")
Grouped Bar Charts of Counts with ggplot2
To create a grouped bar chart of counts for one or more qualitative variables:
- The ggplot object is the data frame storms.
- The geometric object is geom_bar.
- The aesthetics are specified as:
  - The x-axis variable (x) is month.
  - The fill color (fill) is status.
- The height of each bar is the summarizing statistic (stat), which is "count".
- The option position="dodge" places the bars side by side, which creates a grouped bar chart.
# creates grouped bar chart
ggplot(storms, aes(x=month)) +
geom_bar(aes(fill=status), stat = "count", position="dodge") +
ggtitle("Occurrence of Storms by Month")
Spatial Plots with mapview
Load Library
library(mapview) # load spatial mapping package
Mapping All Storms by Status
mapview(storms, xcol = "long", ycol = "lat",
zcol = "status",
crs = 4269, grid = FALSE)
Mapping Category 5 Hurricanes
First we keep only the observations with category equal to 5.
cat5 <- subset(storms, category == 5) # keep only category 5
mapview(cat5, xcol = "long", ycol = "lat", cex = "wind", crs = 4269, grid = FALSE)
mapview(cat5, xcol = "long", ycol = "lat", zcol = "name", cex = "wind", crs = 4269, grid = FALSE)
Creative Commons License Information
Statistical Methods: Exploring the Uncertain by Adam Spiegler is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.