2.4: Common Discrete Distributions
In statistics, we typically use data from a sample to make inferences about characteristics of a population. We may have claims or questions to test. We may want to estimate some characteristics or parameters of the population. By identifying patterns in the sample data and applying probability, we can build theoretical models to inform predictions about the population. These predictions carry uncertainty due to sampling, since a sample may be flawed, biased, or simply unlikely. Data and models come in all shapes and levels of complexity.
We will take a look at several of the most common and useful distributions when working with discrete random variables, such as counting the frequency of occurrences in certain classes of categorical variables. There are many more distributions for discrete random variables, some of which we will see later!
A Case Study in Randomness: Jury Duty
The jury plays the most crucial role in a trial. They determine how cases are ultimately resolved! The jury serves as an impartial reviewer of the facts presented in criminal and civil cases. The goal of randomly selecting the jury is to remove the bias in the jury selection process.
How can we measure whether a jury that has been selected is “representative”?
A random sample of 12 adults cannot be perfectly representative of the population in all ways. No two distinct random samples of 12 adults are going to be the same, yet we hope different juries would rule similarly based on the same set of facts if they are truly impartial. The jury system plays a vital role in the system of justice in the United States, and randomness plays a central role in helping reduce bias in court rulings.
How Jurors Are Selected
The federal judicial branch in the United States decides the constitutionality of federal laws and resolves disputes about federal laws. The federal court system is divided into 94 district courts. The US District Court in the District of Colorado randomly selects jurors from voter registration lists, driver license records, and state-issued adult identification records, by a computerized method. Below is an explanation of how juries are chosen in the District of Colorado. See The District of Colorado Juror Information for full details.
“This selection process creates the court’s ‘Master Jury Wheel’. (This term originated in the days when names were placed into a large barrel-type wheel and turned around to mix them up. Today, computers are used to select names randomly.) From the ‘Master Jury Wheel’, jurors are randomly selected for a one month term of service or occasionally longer depending on the court’s jury needs.”
Question 1
Consider the selection of a jury of 5 people. We want to see whether the jury is representative in terms of political party. We start with a jury of 5 people so that a pattern is easier to recognize. After identifying a pattern, we can extend our results to juries of 12 people. Let Y denote a juror who identifies as an independent and N denote a juror who does not.
- There is one possible outcome in the event: 0 people on the jury identify as an independent.
- We can represent that outcome as NNNNN.
Question 1a
List all possible outcomes in the event: Exactly 1 out of 5 jurors is independent.
Solution to Question 1a
Question 1b
List all possible outcomes in the event: Exactly 2 out of 5 jurors are independent.
Solution to Question 1b
Counting Outcomes
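Consider a district with $n$ people from which we select $k$ people for a jury, one at a time and without replacement.
- There are $n$ possible people that can be chosen first.
- Then there are $n-1$ remaining possible people that can be selected as person 2.
- Then there are $n-2$ remaining possible people that can be selected as person 3.
- And so on until there are $n-k+1$ remaining people to select person $k$.
Taking the order of selection into account, there are $n(n-1)(n-2)\cdots(n-k+1) = \frac{n!}{(n-k)!}$ possible selections.
Note the factorial of a non-negative integer $n$ is $n! = n(n-1)(n-2)\cdots(2)(1)$, with $0! = 1$.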
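The step-by-step count above can be checked numerically in R. The sketch below assumes a hypothetical district of 10 people and an ordered selection of 3 people; these values are chosen only for illustration and do not come from the case study.

```r
# Hypothetical values chosen only for illustration
n <- 10  # people in the district
k <- 3   # people selected, one at a time

# Multiply the number of choices at each step: n * (n - 1) * ... * (n - k + 1)
ordered_by_steps <- prod(n:(n - k + 1))

# The same count written with factorials: n! / (n - k)!
ordered_by_factorial <- factorial(n) / factorial(n - k)

ordered_by_steps      # 720
ordered_by_factorial  # 720
```

Both expressions count selections where the order matters, so the same three people chosen in a different order are counted again.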
Ignoring the Order of Selection: N Choose K
If we want to count the total number of possible jury pools, and we do not care about the order in which the people are selected, then we want to count the number of combinations.
- First count the possible outcomes as we illustrated above, taking into account the order in which the jury is selected.
- Then we ignore the order that people are selected by considering the same set of people (chosen in any order) as the same. We use the result that there are $k!$ ways of permuting the order of the $k$ people selected.
Thus, the number of combinations of $k$ people chosen from $n$ people is $\binom{n}{k} = \frac{n!}{k!(n-k)!}$.
See the Appendix: Why Divide By $k!$?
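As a rough check of the two-step argument, the sketch below counts ordered selections first and then divides by the number of orderings. The values 6 and 3 are hypothetical and are deliberately different from the jury sizes used in the questions.

```r
# Hypothetical values chosen only for illustration
n <- 6  # people available
k <- 3  # people selected

# Step 1: count selections where order matters, n! / (n - k)!
ordered <- factorial(n) / factorial(n - k)

# Step 2: each set of k people appears k! times, so divide by k!
unordered <- ordered / factorial(k)

ordered       # 120
unordered     # 20
choose(n, k)  # 20, the built-in "n choose k"
```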
Question 2
Use R to calculate the total number of outcomes in the event: Exactly 2 out of 5 jurors are independent voters.
The function factorial(x) calculates $x!$.
Solution to Question 2
An Even Better Tip: The function choose(n, k) calculates $\binom{n}{k}$.
# solution using choose(n, k)
Defining a Random Variable
Recall random variables map outcomes in a sample space to a subset of the real numbers. In this example, an outcome is 5 randomly selected people for the jury. We define random variable $X$ to be the number of jurors (out of 5) that identify as independent.
- The outcome NNNNN is mapped to $X = 0$.
- The outcome YNNNN is mapped to $X = 1$.
- The outcome NNNYN is also mapped to $X = 1$.
- The outcome YNNYN is mapped to $X = 2$.
We map each outcome to an integer $x = 0, 1, 2, 3, 4, 5$.
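The mapping from outcomes to values of a random variable can be sketched in R. To keep the output short, the example below uses a hypothetical jury of 3 people rather than 5, and it assumes Y labels a juror who identifies as independent and N labels one who does not. The column names are illustrative.

```r
# All outcomes for a hypothetical jury of 3 people (smaller than the jury of 5 in the text)
outcomes <- expand.grid(juror1 = c("Y", "N"),
                        juror2 = c("Y", "N"),
                        juror3 = c("Y", "N"),
                        stringsAsFactors = FALSE)

# X maps each outcome to the number of jurors labeled "Y"
outcomes$X <- rowSums(outcomes[, c("juror1", "juror2", "juror3")] == "Y")

outcomes           # one row per outcome, with its value of X
table(outcomes$X)  # how many outcomes are mapped to X = 0, 1, 2, 3
```

Several different outcomes can be mapped to the same value of X, which is why counting outcomes matters when computing probabilities.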
Question 3
According to the article “Why Independent Voters Are Key To Winning Colorado” (PBS News, Oct 3, 2020), approximately 42% of Colorado’s voters identify as independent.
In solving Question 2, there is a general pattern we can model to compute the probability of choosing exactly $x$ independent voters for a jury.
Question 3a
What is the probability of randomly selecting a jury of 12 people with exactly 0 jurors that identify as an independent voter?
Solution to Question 3a
# Use R to help compute P(X=0)
Question 3b
What is the probability of randomly selecting a jury of 12 people with exactly 1 juror that identifies as an independent voter?
Solution to Question 3b
# Use R to help compute P(X=1)
Question 3c
What is the probability of randomly selecting a jury of 12 people with exactly 2 jurors that identify as independent voters?
Solution to Question 3c
# Use R to help compute P(X=2)
Question 3d
What is the probability of randomly selecting a jury of 12 people with at most 2 jurors that identify as independent voters?
Solution to Question 3d
# Use R to help compute probability X is at most 2
Question 3e
What is the probability of randomly selecting a jury of 12 people with at least 2 jurors that identify as independent voters?
Solution to Question 3e
# Use R to help compute probability X is at least 2
Question 4
A jury of 12 people is randomly selected from a population that is 42% independent voters. Let random variable $X$ be the number of jurors, out of 12, that identify as independent voters.
Question 4a
Using your answers to Question 3, give the values for
Solution to Question 4a
?? ?? ?? ??
Question 4b
Give a possible formula for the pmf of $X$.
Solution to Question 4b
A formula has been partially completed. Fill in the missing parts.
The Binomial Distribution
A Trial with Two Possible Outcomes
A Bernoulli trial is an experiment that has exactly two possible outcomes:
- The probability that the outcome of a trial is a success ($X = 1$) is denoted $p$.
- Otherwise, the probability of a failure ($X = 0$) is $1 - p$.
The random variable $X$ has a Bernoulli Distribution with probability mass function $P(X = x) = p^x (1-p)^{1-x}$ for $x = 0, 1$.
A Formula for the Binomial Distribution
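Let random variable $X$ be the number of successes in $n$ independent and identical Bernoulli trials, each with probability of success $p$.
- $X$ has a Binomial Distribution, written $X \sim \mbox{Binom}(n, p)$.
- The probability mass function is
$$P(X = x) = \left\{ \begin{array}{ll} \binom{n}{x} p^x (1-p)^{n-x} & x = 0, 1, \ldots, n \\ 0 & \mbox{otherwise} \end{array} \right.$$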
See Appendix: Typesetting Arrays in LaTeX for more information on how to typeset formulas such as the piecewise function above using LaTeX.
Expected Value and Variance of Binomial Distributions
- The expected value can be calculated with the shortcut $E(X) = np$.
- The variance can be calculated with the shortcut $\mbox{Var}(X) = np(1-p)$.
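One way to see that these shortcuts agree with the definitions of expected value and variance is to compute both directly from the pmf. The sketch below assumes hypothetical parameters of 10 trials and success probability 0.3, which are not taken from any question in this section.

```r
# Hypothetical parameters chosen only for illustration
n <- 10
p <- 0.3

x <- 0:n
pmf <- dbinom(x, size = n, prob = p)  # P(X = x) for every possible x

# Definitions: E(X) = sum of x * P(X = x), Var(X) = sum of (x - E(X))^2 * P(X = x)
mean_from_pmf <- sum(x * pmf)
var_from_pmf <- sum((x - mean_from_pmf)^2 * pmf)

c(definition = mean_from_pmf, shortcut = n * p)
c(definition = var_from_pmf, shortcut = n * p * (1 - p))
```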
Binomial Distribution Functions in R
In R, we can use the functions:
- dbinom(x, n, p) calculates the probability of exactly $x$ successes out of $n$ trials, $P(X = x)$.
- pbinom(x, n, p) calculates the probability of at most $x$ successes out of $n$ trials, $P(X \leq x)$.
- rbinom(m, n, p) randomly samples $m$ values from $\mbox{Binom}(n, p)$ (with replacement).
- qbinom(q, n, p) computes the qth quantile.
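The sketch below shows one call to each function, assuming hypothetical parameters of 10 trials and success probability 0.3. In R's own documentation the second and third arguments are named size and prob; passing them by position, as in the text, works the same way.

```r
# Hypothetical parameters chosen only for illustration
n <- 10
p <- 0.3

exactly_3 <- dbinom(3, size = n, prob = p)  # P(X = 3)
at_most_3 <- pbinom(3, size = n, prob = p)  # P(X <= 3)
at_least_4 <- 1 - pbinom(3, n, p)           # P(X >= 4), by the complement rule

set.seed(1)                    # makes the random sample reproducible
draws <- rbinom(5, n, p)       # five simulated counts of successes
median_x <- qbinom(0.5, n, p)  # smallest x with P(X <= x) >= 0.5

exactly_3
at_most_3
at_least_4
draws
median_x
```

Note that pbinom() accumulates from 0 up to and including x, so an "at least" probability needs the complement of the cdf at one less than the target value.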
Plotting a Binomial Distribution
The figure below gives the graphs of the pmf and cdf for a binomial distribution with
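Graphs of a binomial pmf and cdf can be drawn with base R graphics. The sketch below is one possible approach and assumes hypothetical parameters of 10 trials and success probability 0.3, which are not necessarily the parameters used in the original figure.

```r
# Hypothetical parameters chosen only for illustration
n <- 10
p <- 0.3

x <- 0:n
pmf <- dbinom(x, n, p)  # heights of the pmf spikes
cdf <- pbinom(x, n, p)  # heights of the cdf steps

par(mfrow = c(1, 2))  # two plots side by side

# pmf: vertical spikes at each possible value of x
plot(x, pmf, type = "h", lwd = 3,
     main = "Binomial pmf", xlab = "x", ylab = "P(X = x)")

# cdf: a step function that jumps at each possible value of x
plot(x, cdf, type = "s",
     main = "Binomial cdf", xlab = "x", ylab = "P(X <= x)")

par(mfrow = c(1, 1))  # restore the default single-plot layout
```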
Question 5
Suppose a manufacturer of electronic chargers knows that 5% of the chargers it produces are defective. The manufacturer sends a shipment of 20 randomly selected chargers to a customer. Write an R command to compute the probability of each event.
Question 5a
All 20 chargers sent are good (not defective).
Solution to Question 5a
# solution to 5a
Question 5b
At most 18 chargers are good.
Solution to Question 5b
# solution to 5b
Question 5c
At most 2 chargers are defective.
Solution to Question 5c
# solution to 5c
The Discrete Uniform Distribution
Question 6
Let $X$ denote the outcome of rolling a fair six-sided die.
Question 6a
Write out the probability mass function
Solution to Question 6a
Question 6b
What is the expected value of rolling a fair six-sided die?
Solution to Question 6b
Question 6c
What is the expected value of rolling a fair die with sides numbered $1, 2, \ldots, n$?
Recall the sum of the first $n$ positive integers is $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$.
Solution to Question 6c
See Appendix: The Discrete Uniform Distribution for a summary of key formulas, graphs, shortcuts, and R functions for working with a Discrete Uniform Distribution.
The Geometric Distribution
If we repeat a Bernoulli trial that has probability of success
See Appendix: The Geometric Distribution for a summary of key formulas, graphs, shortcuts, and R functions for working with a Geometric Distribution.
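When using R for geometric probabilities, it helps to know that dgeom(x, p) and pgeom(x, p) count the number of failures before the first success, not the total number of trials. The sketch below assumes a hypothetical success probability of 0.25 and compares R's output with a direct calculation.

```r
# Hypothetical success probability chosen only for illustration
p <- 0.25

# First success on the 3rd trial means 2 failures followed by 1 success
first_on_third <- dgeom(2, prob = p)
by_hand <- (1 - p)^2 * p

# First success within the first 3 trials means at most 2 failures beforehand
within_three <- pgeom(2, prob = p)
by_complement <- 1 - (1 - p)^3  # complement of "3 failures in a row"

c(first_on_third, by_hand)
c(within_three, by_complement)
```

Under this convention, a question about the first success occurring on trial number t corresponds to x = t - 1 failures.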
The Poisson Distribution
The Poisson distribution applies when the average frequency of occurrences in a given time period is known to be $\lambda$.
See Appendix: The Poisson Distribution for a summary of key formulas, graphs, shortcuts, and R functions for working with a Poisson Distribution.
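A short sketch of Poisson calculations in R follows, assuming a hypothetical average of 3 occurrences per time period. It compares dpois() with the Poisson pmf formula and shows two equivalent ways to compute an upper-tail probability.

```r
# Hypothetical average rate chosen only for illustration
lambda <- 3

# P(X = 2), from R and from the pmf formula exp(-lambda) * lambda^x / x!
exactly_2 <- dpois(2, lambda)
by_formula <- exp(-lambda) * lambda^2 / factorial(2)

# P(X <= 4) and its complement P(X >= 5)
at_most_4 <- ppois(4, lambda)
at_least_5 <- 1 - ppois(4, lambda)
at_least_5_alt <- ppois(4, lambda, lower.tail = FALSE)  # P(X > 4), the same event

c(exactly_2, by_formula)
at_most_4
c(at_least_5, at_least_5_alt)
```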
Practice with Discrete Random Variables
For each situation, identify which distribution best describes the distribution of the random variable. Then calculate the probability.
Question 7
A sports marketer for the Denver Nuggets randomly calls people in the Denver area until they encounter someone who attended a Nuggets game last season. Suppose we know that 10% of the population attended a Nuggets game last season, and we consider a call to a person who did attend a game last season as a success. What is the probability that the marketer has their first success on their eighth call?
Solution to Question 7
# solution to question 7
Question 8
Suppose we know that on average twelve cars cross a certain bridge each minute during rush hour. Find the probability that seventeen or more cars cross the bridge during a one minute span of time in rush hour.
Solution to Question 8
# solution to question 8
Question 9
Recently, a nurse commented that when a patient calls the medical advice line claiming to have the flu, the chance that he or she truly has the flu (and not just a nasty cold) is only about 4%. Of the next 25 patients calling in claiming to have the flu, what is the probability that exactly 4 patients will have the flu?
Solution to Question 9
# solution to question 9
Question 10
An online retailer sells an average of 5 big screen TVs on a given day. What is the probability they sell 9 TVs in a day?
Solution to Question 10
# solution to question 10
Question 11
It is known that 3% of airbags manufactured by a certain car company are defective. What is the probability that the first defective airbag occurs when the fifth item is inspected?
Solution to Question 11
# solution to question 11
Appendix A: Summary of Common Discrete Random Variables
The Binomial Distribution
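- The expected value is $E(X) = np$.
- The variance is $\mbox{Var}(X) = np(1-p)$.
- Use dbinom(x, n, p) to calculate $P(X = x)$.
- Use pbinom(x, n, p) to calculate $P(X \leq x)$.
- Use rbinom(m, n, p) to generate a random sample of size $m$ from population $X \sim \mbox{Binom}(n, p)$.
- Use qbinom(q, n, p) to find the qth quantile of $X \sim \mbox{Binom}(n, p)$, where $0 \leq q \leq 1$.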
The Discrete Uniform Distribution
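There are $n$ equally likely outcomes, $x = 1, 2, \ldots, n$, each with probability $\frac{1}{n}$.
- The expected value is $E(X) = \frac{n+1}{2}$.
- The variance is $\mbox{Var}(X) = \frac{n^2-1}{12}$.
- Note functions such as dunif(), punif(), runif(), and qunif() are for the continuous (not discrete) uniform distribution.
- It is typically easier to calculate probabilities directly rather than use technology.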
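If R is used anyway, a discrete uniform distribution can be handled with basic vector arithmetic and sample(). The sketch below assumes a hypothetical fair 8-sided die with sides numbered 1 through 8, so that it does not overlap with the six-sided die in Question 6.

```r
# Hypothetical fair die with sides 1, 2, ..., 8
n <- 8
x <- 1:n
pmf <- rep(1 / n, n)  # every side has probability 1/n

at_most_3 <- sum(pmf[x <= 3])  # P(X <= 3) = 3/8
expected <- sum(x * pmf)       # E(X) from the definition

set.seed(1)                                    # makes the simulation reproducible
rolls <- sample(x, size = 10, replace = TRUE)  # ten simulated rolls

at_most_3
expected
rolls
```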
The Geometric Distribution
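- The expected value is $E(X) = \frac{1-p}{p}$ when $X$ counts the number of failures before the first success, which is the convention used by the R functions below.
- The variance is $\mbox{Var}(X) = \frac{1-p}{p^2}$.
- Use dgeom(x, p) to calculate $P(X = x)$.
- Use pgeom(x, p) to calculate $P(X \leq x)$.
- Use rgeom(m, p) to generate a random sample of size $m$ from population $X \sim \mbox{Geom}(p)$.
- Use qgeom(q, p) to find the qth quantile of $X \sim \mbox{Geom}(p)$, where $0 \leq q \leq 1$.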
The Poisson Distribution
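- The expected value is $E(X) = \lambda$.
- The variance is $\mbox{Var}(X) = \lambda$.
- Use dpois(x, lambda) to calculate $P(X = x)$.
- Use ppois(x, lambda) to calculate $P(X \leq x)$.
- Use rpois(m, lambda) to generate a random sample of size $m$ from population $X \sim \mbox{Pois}(\lambda)$.
- Use qpois(q, lambda) to find the qth quantile of $X \sim \mbox{Pois}(\lambda)$, where $0 \leq q \leq 1$.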
Appendix B: Additional Notes
Why Divide By $k!$?
Where does the $k!$ come from?
- The symmetric group $S_k$ on $k$ items consists of all the possible ways of permuting the order of the $k$ items.
- Each element corresponds to a permutation applied to the group of $k$ items.
- For example, $S_3$ is the symmetric group on three items.
  - The identity element is $e$, which does not change the ordering of items $1, 2, 3$.
  - The element $(2\ 3)$ is the transposition of items 2 and 3.
  - The element $(1\ 2\ 3)$ is a composition of two transpositions:
    - First transpose items 1 and 2, that is element $(1\ 2)$.
    - Then transpose items 1 and 3, that is element $(1\ 3)$, to get $(1\ 2\ 3)$.
- The number of elements in the group is called the order of the group and denoted $|S_k|$.
- For example, $S_3$ has $3! = 6$ elements that can be represented as $e$, $(1\ 2)$, $(1\ 3)$, $(2\ 3)$, $(1\ 2\ 3)$, $(1\ 3\ 2)$.
- In fact, for any symmetric group we have $|S_k| = k!$.
- Thus, there are $k!$ ways of permuting the order of $k$ items.
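The count of orderings can also be checked by brute force. Base R has no built-in function for listing permutations, so the sketch below defines a small hypothetical helper, permutations(), that builds every ordering recursively and compares the total with factorial().

```r
# Hypothetical helper (not part of base R): list every ordering of a vector
permutations <- function(items) {
  if (length(items) <= 1) {
    return(list(items))  # zero or one item has exactly one ordering
  }
  result <- list()
  for (i in seq_along(items)) {
    # Put items[i] first, then append every ordering of the remaining items
    for (rest in permutations(items[-i])) {
      result[[length(result) + 1]] <- c(items[i], rest)
    }
  }
  result
}

perms <- permutations(1:3)
perms          # all orderings of the items 1, 2, 3
length(perms)  # 6
factorial(3)   # 6
```

This approach is only practical for a handful of items, since the number of orderings grows as fast as the factorial itself.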
Typesetting Arrays in LaTeX
- The array environment is started with \begin{array}.
- Next we indicate how many columns and how each column is aligned.
  - {ll} means two columns aligned to the left.
  - {lrc} would be three columns: the first aligned left, then right, then the last column is centered.
- We use the & symbol to indicate a column break.
- We use \\ to indicate a row break.
- If we want a big curly brace on the left of the array, enter \left\{ before beginning the array.
- We do not want a brace on the other side of the array, so we use \right. to close off the left brace without using any matching symbol on the right.
- We use \mbox{otherwise} to type the text otherwise inside the equation.
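Putting these pieces together, a piecewise function such as a binomial pmf could be typeset as in the sketch below. The notation $P(X = x)$ is one possible choice, and \binom assumes the amsmath package (or a renderer such as MathJax that supports it). The \end{array} command closes the environment opened by \begin{array}.

```latex
P(X = x) = \left\{
\begin{array}{ll}
\binom{n}{x} p^x (1-p)^{n-x} & x = 0, 1, \ldots, n \\
0 & \mbox{otherwise}
\end{array}
\right.
```

Each row holds a formula in the first column and its condition in the second, separated by &, and the first row ends with \\.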
Statistical Methods: Exploring the Uncertain by Adam Spiegler is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.