<- read.csv("https://raw.githubusercontent.com/CU-Denver-MathStats-OER/Statistical-Theory/main/Data/spotify-hits.csv") hits
2.3: Expected Value and Variance
Additional Reference: See Introduction to Random Variables where we discovered some important properties for a random variable \(X\) that are summarized in the Appendix.
How Much is the Raffle Ticket Worth?
A raffle sells 1,000 tickets with the following payouts:
- There is one winning randomly selected ticket that will win the grand prize of $5,000.
- Two tickets will be randomly selected to win a second prize each worth $1,000.
- Ten tickets will be randomly selected to win a third prize each worth $200.
- The remaining tickets do not win a prize.
Question 1
Let random variable \(X\) be the amount of money won by a randomly selected raffle ticket. We let \(p(x)\) denote the corresponding probability mass function of \(X\).
Question 1a
Fill in the values of \(x\) and \(p(x)\) in the table below.
Solution to Question 1a
Fill in the blanks to complete the table.
\(x\) | ?? | ?? | ?? | ?? |
---|---|---|---|---|
\(p(x)\) | ?? | ?? | ?? | ?? |
Question 1b
If somebody offered to buy a raffle ticket from you, what do you think fair value is for the ticket?
Solution to Question 1b
Question 1c
Consider another ticket raffle. There are 1,000 tickets with the following payouts:
- 500 randomly selected tickets will win a grand prize of $15.
- The other 500 tickets do not win a prize.
Let random variable \(Y\) be the amount of money won by a randomly selected raffle ticket.
Somebody offers one free ticket to either raffle \(X\) or raffle \(Y\) (but not both!), which would you prefer and why?
Solution to Question 1c
The Expected Value of a Discrete Random Variable
The expected value for a discrete random variable \(X\) is denoted \(\color{dodgerblue}{E(X)}\). We compute
\[\color{dodgerblue}{E(X) = \mu_{X} = \sum_x \left( x \cdot p(x) \right)}. \]
- \(E(X)\) is the average or mean value of random variable \(X\).
- The expected value of \(X\) is denoted \(\color{dodgerblue}{E(X)}\) or \(\color{dodgerblue}{\mu_X}\).
- The expected value of \(X\) might not be a possible value of \(X\).
The Variance of a Discrete Random Variable
The variance for a discrete random variable \(X\) is one common way to measure how spread out (in relation to the expected value) are the values of \(X\). We compute
\[\color{dodgerblue}{\mbox{Var}(X) = \sigma_X^2 = E\big( (x-\mu_X)^2 \big) = \sum_x \left( (x-\mu_X)^2 \cdot p(x) \right)}. \]
- \(\mbox{Var}(X)\) is the expected value of the squared distance from the mean.
- The variance is denoted \(\color{dodgerblue}{\mbox{Var}(X)}\) or \(\color{dodgerblue}{\sigma_X^2}\).
The Standard Deviation of a Discrete Random Variable
The standard deviation for a discrete random variable \(X\) is the square root of the variance,
\[ \mbox{SD}(X) = \sigma_X = \sqrt{\mbox{Var}(X)}. \]
- The standard deviation of random variable \(X\) is denoted \(\color{dodgerblue}{\sigma_X}\).
- The standard deviation essentially measures the average distance of the values of \(X\) from its mean \(\mu_X\).
- The units of \(\sigma_X\) and \(X\) are the same, and thus standard deviation is often a practical way to describe the spread of \(X\).
Question 2
Using properties of discrete random variables, show that for any discrete random variable \(X\) with pmf \(p(X)\) and expected value \(E(X) = \mu\), we have
\[\mbox{Var}(X) = E(X^2) - \mu^2.\]
Solution to Question 2
Finish the proof below!
Let \(X\) be a discrete random variable with pmf \(p(x)\) and expected value \(E(X)=\mu\). Then we have
\[\begin{array}{rcll} \mbox{Var}(X) &=& \sum_x \left( (x-\mu)^2 \cdot p(x) \right) & \mbox{(by definition)}\\ &=& \sum_x \left( (x^2 -2 \mu x + \mu^2) \cdot p(x) \right) & \mbox{(distributing squared term)}\\ &=& \sum_x \left( x^2 \cdot p(x) -2 \mu x \cdot p(x) + \mu^2 \cdot p(x) \right) & \mbox{(distributing p(x))}\\ &=& \sum_x \left( x^2 \cdot p(x) \right) - \sum_x \left(2 \mu x \cdot p(x) \right) + \sum_x \left( \mu^2 \cdot p(x) \right) & \mbox{(properties of summation)}\\ &=& ?? & \mbox{(??)} \\ &=& ?? & \mbox{(??)} \\ &=& ?? & \mbox{(??)} \\ &=& ?? & \mbox{(??)} \\ \end{array}\]
Therefore, we see that \(\mbox{Var}(X) = E(X^2) - \mu^2\).
Question 3
Let \(X\) and \(Y\) denote the raffle ticket random variables from Question 1.
Question 3a
Calculate \(E(X)\) and \(\mbox{Var}(X)\).
Solution to Question 3a
Question 3b
Calculate \(E(Y)\) and \(\mbox{Var}(Y)\).
Solution to Question 3b
Mean, Median, and Variance of Continuous Random Variables
Let \(f(x)\) and \(F(x)\) denote the probability density function and cumulative distribution function for a continuous random variable \(X\).
- The expected value or mean is
\[E(X) = \mu_X = \int_{-\infty}^{\infty} x \cdot f(x) \, dx.\]
- The variance is
\[\mbox{Var}(X) = E\big[ (X-\mu_X)^2 \big] = E(X^2) - \mu_X^2.\]
- The median is the value \(x\) such that \(P(X < x) = 0.5\). Thus, to find the median we solve the equations below for \(x\),
\[\int_{-\infty}^x f(t) \, dt = 0.5 \qquad \mbox{or equivalently} \qquad F(x) =0.5.\]
Question 4
Consider the random variable \(X\) with pdf
\[ f_X(x) = \left\{ \begin{array}{ll} \frac{x}{8}, & 0 \leq x \leq 4 \\ 0, & \mbox{otherwise} \end{array} \right. .\]
Question 4a
On a separate piece of paper, sketch a graph of the pdf, \(f_X\).
Solution to Question 4a
Sketch a graph on a separate piece of paper.
Question 4b
Enter the formula for \(F_X\) below. Then on a separate piece of paper sketch the graph \(F_X\).
Solution to Question 4b
\[F_X(x) = \left\{ \begin{array}{ll} 0 & x < 0\\ ?? & 0 \leq x \leq 4 \\ 1 & x > 4 \end{array} \right.\]
Sketch a graph on a separate piece of paper.
Question 4c
Calculate \(P(X < 1)\) and illustrate this value on each of your graphs in the solutions to Question 4a and Question 4b.
Solution to Question 4c
Question 4d
Calculate \(E(X)\) and illustrate this on your graph in the solution to Question 4a.
Solution to Question 4d
Question 4e
Give the median value and illustrate this value on both of your graphs in the solutions to Question 4a and Question 4b.
Solution to Question 4e
Question 4f
Compute \(\mbox{Var}(X)\).
Solution to Question 4f
Question 5
Let \(X\) and \(Y\) denote the raffle ticket random variables from Question 1.
Question 5a
Do you believe random variables \(X\) and \(Y\) are independent random variables? Explain why or why not.
Solution to Question 5a
Question 5b
If you purchase 3 raffle tickets from raffle \(X\) and 2 raffle tickets from raffle \(Y\), what is the expected value of your winnings?
Solution to Question 5b
Question 5c
If you purchase 3 raffle tickets from raffle \(X\) and 2 raffle tickets from raffle \(Y\), what is the variance of your winnings?
Solution to Question 5c
Properties of Expected Value
Let \(X\) and \(Y\) denote a random variables, and let \(a\) and \(b\) denote constants. Then we have the following properties.
- \(E(a) = a\)
- \(E(aX + bY) = aE(X) + bE(Y)\)
Properties of Variance
For any random variable \(X\), we have \(\mbox{Var}(X) = E(X^2) - \mu_X^2\).
If \(X\) and \(Y\) are independent random variables and \(a\) and \(b\) constants, then \(\mbox{Var}(aX + bY) = a^2 \mbox{Var}(X) + b^2 \mbox{Var}(Y)\).
Question 6
The data set spotify-hits.csv1 is stored online and contains audio statistics of the top 2000 tracks on Spotify from 2000-2019. The data is stored in a comma separated file (csv).
- We can use the function
read.csv()
to import the csv file into an R data frame we callhits
.
In the code cell below:
- We convert
artist
,song
, andgenre
to categorical variables using thefactor()
function. - Extract the variables
artist
,song
,energy
,acousticness
, andgenre
(ignoring the rest). - Print the first 6 rows to screen to get a glimpse of the resulting data frame.
$artist <- factor(hits$artist) # artist is categorical
hits$song <- factor(hits$song) # song is categorical
hits$genre <- factor(hits$genre) # genre is categorical
hits<- hits[,c("artist", "song", "energy", "acousticness", "genre")]
hits head(hits) # display first 6 rows of data frame
artist song energy acousticness genre
1 Britney Spears Oops!...I Did It Again 0.834 0.3000 pop
2 blink-182 All The Small Things 0.897 0.0103 rock, pop
3 Faith Hill Breathe 0.496 0.1730 pop, country
4 Bon Jovi It's My Life 0.913 0.0263 rock, metal
5 *NSYNC Bye Bye Bye 0.928 0.0408 pop
6 Sisqo Thong Song 0.888 0.1190 hip hop, pop, R&B
- Energy: A measure of how energetic a song is from \(0.0\) to \(1.0\) (least to most energy) of. Typically, energetic songs are fast, loud, and noisy.
- Acousticness: A measure from \(0.0\) to \(1.0\) (least to most acoustic) of depending on how significant the use of acoustic instruments are in the song.
Let \(X\) denote the energy of a randomly selected song, and let \(Y\) denote the acousticness of a randomly selected song. We define a new song metric, \(Z\), that is a weighted mean of score \(X\) and \(Y\):
\[Z = \frac{3X + 2Y}{5}\]
Question 6a
Do you believe \(X\) and \(Y\) are independent random variables? Explain why or why not.
Solution to Question 6a
Question 6b
Use R to compute \(E(X)\), \(E(Y)\), and \(E(Z)\). Check whether or not the property \(E(aX + bY) = aE(X) + bE(Y)\) holds in this context.
- Hint: Recall R, the function
mean(x)
calculates the mean (expected value) ofx
.
Solution to Question 6b
<- hits$energy # random variable x
x <- hits$acousticness # random variable y
y <- (3*x + 2*y) / 5 # random variable z z
# use code cell to compare expected values
Question 6c
Use R to compute \(\mbox{Var}(X)\), \(\mbox{Var}(Y)\), and \(\mbox{Var}(Z)\). Check whether or not the property \(\mbox{Var}(aX + bY) = a^2\mbox{Var}(X) + b^2\mbox{Var}(Y)\) holds in this context.
- Hint: The function
var(x)
calculates the variance ofx
.
Solution to Question 6c
# use code cell to compare variances
Question 6d
Determine whether each of the statements below are true or false. If false, explain why.
For any two random variables \(X\) and \(Y\) and constants \(a\) and \(b\):
- It always follows that \(E(aX + bY) = aE(X) + bE(Y)\).
- It always follows that \(\mbox{Var}(aX + bY) = a^2\mbox{Var}(X) + b^2\mbox{Var}(Y)\).
Question 6d
Appendix: Properties of Random Variables
In Introduction to Random Variables we discovered the following properties for a random variable \(X\).
Properties of Discrete Random Variables
For a discrete random variable \(X\), let \(p(x)\) and \(F(x)\) denote the pmf and cdf, respectively, we have:
\(0 \leq p(x) \leq 1\) for all \(x\)
\(\displaystyle \sum_{\rm{all}\ x} p(x) = 1\)
\(F(x) = \displaystyle P(X \leq x) = \sum_{k= x_{\rm min}}^x p(k)\)
\(0 \leq F(x) \leq 1\) for all \(x\)
\(\displaystyle \lim_{x \to \infty} F(x) = 1\)
\(F(x)\) is nondecreasing.
Properties of Continuous Random Variables
For a continuous random variable \(X\), let \(f(x)\) and \(F(x)\) denote the pdf and cdf, respectively, we have:
\(f(x) \geq 0\) for all \(x\)
\(\displaystyle \int_{-\infty}^{\infty} f(x) = 1\)
\(\displaystyle P(a < x < b) = \int_a^b f(x) \, dx\)
\(\displaystyle F(x) = \int_{-\infty}^x f(t) \, dt\)
The \(F(x)\) is an antiderivative of \(f\).
The \(f(x)\) is the derivative of \(F(x)\).
\(0 \leq F(x) \leq 1\) for all \(x\).
\(\displaystyle \lim_{x \to \infty} F(x) = 1\).
\(F(x)\) is nondecreasing.
Statistical Methods: Exploring the Uncertain by Adam Spiegler is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.