hits <- read.csv("https://raw.githubusercontent.com/CU-Denver-MathStats-OER/Statistical-Theory/main/Data/spotify-hits.csv")
2.3: Expected Value and Variance
Additional Reference: See Introduction to Random Variables, where we discovered some important properties of random variables.
How Much is the Raffle Ticket Worth?
A raffle sells 1,000 tickets with the following payouts:
- One randomly selected ticket will win the grand prize of $5,000.
- Two tickets will be randomly selected to win a second prize each worth $1,000.
- Ten tickets will be randomly selected to win a third prize each worth $200.
- The remaining tickets do not win a prize.
Question 1
Let random variable $X$ denote the prize amount (in dollars) won by one randomly selected raffle ticket.
Question 1a
Fill in the values of
Solution to Question 1a
Fill in the blanks to complete the table.
| $x$ | ?? | ?? | ?? | ?? |
|---|---|---|---|---|
| $p(x)$ | ?? | ?? | ?? | ?? |
Question 1b
If somebody offered to buy a raffle ticket from you, what do you think a fair value for the ticket is?
Solution to Question 1b
Question 1c
Consider another ticket raffle. There are 1,000 tickets with the following payouts:
- 500 randomly selected tickets will win a grand prize of $15.
- The other 500 tickets do not win a prize.
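Let random variable $Y$ denote the prize amount (in dollars) won by one randomly selected ticket in this second raffle.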
Somebody offers one free ticket to either raffle
Solution to Question 1c
The Expected Value of a Discrete Random Variable
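The expected value for a discrete random variable $X$ with probability mass function $p(x) = P(X = x)$ is

$$E(X) = \sum_{x} x \, p(x).$$

- $E(X)$ is the average or mean value of random variable $X$.
- The expected value of $X$ is denoted $E(X)$ or $\mu_X$.
- The expected value of $X$ might not be a possible value of $X$.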
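As a minimal sketch of the definition, the R code below computes an expected value as a probability-weighted sum. The fair six-sided die is a hypothetical example chosen for illustration and is not taken from the raffle questions; the names `x`, `p`, and `ev` are likewise assumptions.

```r
# Hypothetical example: X is the result of rolling a fair six-sided die.
x <- 1:6              # possible values of X
p <- rep(1/6, 6)      # pmf: P(X = x) for each possible value

# A valid pmf must sum to 1.
stopifnot(isTRUE(all.equal(sum(p), 1)))

# E(X) is the sum of each value times its probability.
ev <- sum(x * p)
ev                    # 3.5
```

Note that the result, 3.5, is not a value the die can actually show, which illustrates the last bullet point.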
The Variance of a Discrete Random Variable
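The variance for a discrete random variable $X$ with mean $\mu_X = E(X)$ is

$$\mathrm{Var}(X) = E\big[(X - \mu_X)^2\big] = \sum_{x} (x - \mu_X)^2 \, p(x).$$

- $\mathrm{Var}(X)$ is the expected value of the squared distance from the mean.
- The variance is denoted $\mathrm{Var}(X)$ or $\sigma_X^2$.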
The Standard Deviation of a Discrete Random Variable
The standard deviation for a discrete random variable $X$ is $\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$.

- The standard deviation of random variable $X$ is denoted $\sigma_X$.
- The standard deviation roughly measures the typical distance of the values of $X$ from its mean $\mu_X$.
- The units of $\sigma_X$ and $X$ are the same, and thus standard deviation is often a practical way to describe the spread of $X$.
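The variance and standard deviation of a discrete random variable can be computed directly from a pmf. The sketch below again uses a hypothetical fair six-sided die (an assumption for illustration, not one of the raffles), and also checks numerically that the definition agrees with the well-known shortcut $E(X^2) - \big(E(X)\big)^2$.

```r
# Hypothetical example: X is the result of rolling a fair six-sided die.
x <- 1:6
p <- rep(1/6, 6)

mu <- sum(x * p)                      # mean, E(X) = 3.5

# Variance from the definition: expected squared distance from the mean.
variance <- sum((x - mu)^2 * p)       # 35/12, about 2.9167

# Same quantity from the shortcut E(X^2) - (E(X))^2.
shortcut <- sum(x^2 * p) - mu^2

# Standard deviation: square root of the variance, in the units of X.
sigma <- sqrt(variance)               # about 1.7078

all.equal(variance, shortcut)         # TRUE
```

A numerical check like this does not replace a proof, but it is a quick way to catch algebra mistakes.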
Question 2
Using properties of discrete random variables, show that for any discrete random variable $X$, $\mathrm{Var}(X) = E(X^2) - \big(E(X)\big)^2$.
Solution to Question 2
Finish the proof below!
Let
Therefore, we see that $\mathrm{Var}(X) = E(X^2) - \big(E(X)\big)^2$.
Question 3
Let
Question 3a
Calculate
Solution to Question 3a
Question 3b
Calculate
Solution to Question 3b
Mean, Median, and Variance of Continuous Random Variables
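Let $X$ be a continuous random variable with pdf $f(x)$ and cdf $F(x)$.
- The expected value or mean is $\mu_X = E(X) = \int_{-\infty}^{\infty} x \, f(x) \, dx$.
- The variance is $\sigma_X^2 = \mathrm{Var}(X) = E\big[(X - \mu_X)^2\big] = \int_{-\infty}^{\infty} (x - \mu_X)^2 \, f(x) \, dx$.
- The median is the value $m$ such that $P(X \leq m) = 0.5$. Thus, to find the median we solve the equations below for $m$,

$$F(m) = \int_{-\infty}^{m} f(x) \, dx = 0.5.$$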
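The mean, variance, and median of a continuous random variable can be approximated numerically with base R's `integrate()` and `uniroot()`. The sketch below assumes a hypothetical pdf $f(x) = 2x$ on $[0, 1]$, chosen only for illustration; it is not the pdf from Question 4.

```r
# Hypothetical pdf: f(x) = 2x on the interval [0, 1].
f <- function(x) 2 * x

# A valid pdf integrates to 1 over its support.
total <- integrate(f, lower = 0, upper = 1)$value

# Mean: integral of x * f(x).
mu <- integrate(function(x) x * f(x), lower = 0, upper = 1)$value

# Variance: integral of (x - mu)^2 * f(x).
variance <- integrate(function(x) (x - mu)^2 * f(x),
                      lower = 0, upper = 1)$value

# cdf at m: area under the pdf from 0 to m.
cdf <- function(m) integrate(f, lower = 0, upper = m)$value

# Median: solve F(m) - 0.5 = 0 for m on the support.
med <- uniroot(function(m) cdf(m) - 0.5,
               lower = 0, upper = 1, tol = 1e-9)$root

c(mean = mu, variance = variance, median = med)
```

For this pdf the exact answers are a mean of $2/3$, a variance of $1/18$, and a median of $\sqrt{0.5} \approx 0.7071$, so the numerical results can be compared against hand calculations.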
Question 4
Consider the random variable
Question 4a
On a separate piece of paper, sketch a graph of the pdf, $f(x)$.
Solution to Question 4a
Sketch a graph on a separate piece of paper.
Question 4b
Enter the formula for
Solution to Question 4b
Sketch a graph on a separate piece of paper.
Question 4c
Calculate
Solution to Question 4c
Question 4d
Calculate
Solution to Question 4d
Question 4e
Give the median value and illustrate this value on both of your graphs in the solutions to Question 4a and Question 4b.
Solution to Question 4e
Question 4f
Compute
Solution to Question 4f
Question 5
Let
Question 5a
Do you believe random variables
Solution to Question 5a
Question 5b
If you purchase 3 raffle tickets from raffle
Solution to Question 5b
Question 5c
If you purchase 3 raffle tickets from raffle
Solution to Question 5c
Properties of Expected Value
Let $X$ and $Y$ be random variables and $a$ and $b$ constants. Then $E(aX + bY) = aE(X) + bE(Y)$.
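Linearity of expectation, $E(aX + bY) = aE(X) + bE(Y)$, holds even when $X$ and $Y$ are dependent. The sketch below checks this exactly for a hypothetical pair in which $Y = X^2$ is completely determined by a fair die roll $X$; the die and the constants `a` and `b` are assumptions chosen for illustration.

```r
# Hypothetical example: X is a fair die roll and Y = X^2, so Y depends on X.
x <- 1:6
p <- rep(1/6, 6)
y <- x^2

a <- 3
b <- 2

# Left side: expected value of the combination aX + bY, computed directly.
lhs <- sum((a * x + b * y) * p)

# Right side: combine the separate expected values.
rhs <- a * sum(x * p) + b * sum(y * p)

all.equal(lhs, rhs)   # TRUE, even though X and Y are dependent
```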
Properties of Variance
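- For any random variable $X$, we have $\mathrm{Var}(X) = E(X^2) - \big(E(X)\big)^2$.
- If $X$ and $Y$ are independent random variables and $a$ and $b$ constants, then $\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y)$.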
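The rule $\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y)$ relies on independence. The sketch below checks it exactly for two hypothetical independent fair dice, then shows that it fails when $Y$ is replaced by a copy of $X$. The helper `pmf_var()` and all variable names are assumptions introduced for this illustration.

```r
# Helper (hypothetical): variance of a discrete random variable from its pmf.
pmf_var <- function(values, probs) {
  mu <- sum(values * probs)
  sum((values - mu)^2 * probs)
}

# Two independent fair dice.
x <- 1:6; px <- rep(1/6, 6)
y <- 1:6; py <- rep(1/6, 6)
a <- 3
b <- 2

# Independent case: the joint pmf is the product of the marginal pmfs.
w  <- outer(a * x, b * y, "+")   # every possible value of aX + bY
pw <- outer(px, py)              # matching joint probabilities
var_indep <- pmf_var(as.vector(w), as.vector(pw))

rule <- a^2 * pmf_var(x, px) + b^2 * pmf_var(y, py)
all.equal(var_indep, rule)       # TRUE (both about 37.9167)

# Dependent case: Y is an exact copy of X, so aX + bY = (a + b)X.
var_dep <- pmf_var(a * x + b * x, px)
var_dep                          # about 72.9167, not equal to rule
```

The dependent case is larger here because the two terms always move together, which the independence rule does not account for.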
Question 6
The data set spotify-hits.csv is stored online and contains audio statistics of the top 2000 tracks on Spotify from 2000-2019. The data is stored in a comma separated file (csv).
- We can use the function `read.csv()` to import the csv file into an R data frame we call `hits`.
In the code cell below:
- We convert `artist`, `song`, and `genre` to categorical variables using the `factor()` function.
- Extract the variables `artist`, `song`, `energy`, `acousticness`, and `genre` (ignoring the rest).
- Print the first 6 rows to screen to get a glimpse of the resulting data frame.
hits$artist <- factor(hits$artist) # artist is categorical
hits$song <- factor(hits$song) # song is categorical
hits$genre <- factor(hits$genre) # genre is categorical
hits <- hits[, c("artist", "song", "energy", "acousticness", "genre")] # keep only these variables
head(hits) # display first 6 rows of data frame
artist song energy acousticness genre
1 Britney Spears Oops!...I Did It Again 0.834 0.3000 pop
2 blink-182 All The Small Things 0.897 0.0103 rock, pop
3 Faith Hill Breathe 0.496 0.1730 pop, country
4 Bon Jovi It's My Life 0.913 0.0263 rock, metal
5 *NSYNC Bye Bye Bye 0.928 0.0408 pop
6 Sisqo Thong Song 0.888 0.1190 hip hop, pop, R&B
- Energy: A measure from $0$ to $1$ (least to most energy) of how energetic a song is. Typically, energetic songs are fast, loud, and noisy.
- Acousticness: A measure from $0$ to $1$ (least to most acoustic) of how significant the use of acoustic instruments is in the song.
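Let $X$ denote the energy of a song, let $Y$ denote the acousticness of a song, and let $Z = \dfrac{3X + 2Y}{5}$.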
Question 6a
Do you believe
Solution to Question 6a
Question 6b
Use R to compute
- Hint: Recall that in R, the function `mean(x)` calculates the mean (expected value) of `x`.
Solution to Question 6b
x <- hits$energy # random variable x
y <- hits$acousticness # random variable y
z <- (3*x + 2*y) / 5 # random variable z
# use code cell to compare expected values
Question 6c
Use R to compute
- Hint: The function `var(x)` calculates the (sample) variance of `x`.
Solution to Question 6c
# use code cell to compare variances
Question 6d
Determine whether each of the statements below is true or false. If false, explain why.
For any two random variables $X$ and $Y$:
- It always follows that $E(X + Y) = E(X) + E(Y)$.
- It always follows that $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
Solution to Question 6d
Appendix: Properties of Random Variables
In Introduction to Random Variables we discovered the following properties for a random variable $X$.
Properties of Discrete Random Variables
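For a discrete random variable $X$ with pmf $p(x)$ and cdf $F(x)$:
- $0 \leq p(x) \leq 1$ for all $x$.
- $\sum_{x} p(x) = 1$.
- $0 \leq F(x) \leq 1$ for all $x$.
- $F(x)$ is nondecreasing.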
Properties of Continuous Random Variables
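For a continuous random variable $X$ with pdf $f(x)$ and cdf $F(x)$:
- $f(x) \geq 0$ for all $x$.
- The cdf $F(x)$ is an antiderivative of the pdf $f(x)$.
- The pdf $f(x)$ is the derivative of the cdf $F(x)$.
- $0 \leq F(x) \leq 1$ for all $x$.
- $\int_{-\infty}^{\infty} f(x) \, dx = 1$.
- $F(x)$ is nondecreasing.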
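The relationships between a pdf and its cdf can be checked numerically. The sketch below assumes a hypothetical exponential distribution with rate 1, using base R's `dexp()` (pdf) and `pexp()` (cdf); the grid and step size `h` are arbitrary choices made for illustration.

```r
# Hypothetical example: exponential distribution with rate 1.
grid <- seq(0.1, 5, by = 0.1)

# The pdf is the derivative of the cdf: compare a central difference
# of the cdf with the pdf at each grid point.
h <- 1e-6
num_deriv <- (pexp(grid + h) - pexp(grid - h)) / (2 * h)
deriv_gap <- max(abs(num_deriv - dexp(grid)))

# The cdf is an antiderivative of the pdf: the area under the pdf
# from 0 to b should match the cdf at b.
area <- sapply(grid, function(b) integrate(dexp, lower = 0, upper = b)$value)
area_gap <- max(abs(area - pexp(grid)))

# Total area under the pdf is 1, and the cdf never decreases.
total <- integrate(dexp, lower = 0, upper = Inf)$value
nondecreasing <- all(diff(pexp(grid)) >= 0)

c(deriv_gap = deriv_gap, area_gap = area_gap, total = total)
nondecreasing
```

Both gaps should be close to zero, with any remaining difference due to numerical approximation.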
Statistical Methods: Exploring the Uncertain by Adam Spiegler is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.