2.6: Joint Distributions
Situations with Multiple Random Variables
Thus far we have been studying probability distributions for a single random variable. In many situations, we would like to incorporate multiple random variables in our analysis.
Question 1
A large insurance agency services a number of customers who have purchased both a homeowner’s policy and an automobile policy from the agency. For each type of policy, a deductible amount must be specified. For an automobile policy, the choices are \(\$100\) and \(\$250\), whereas for a homeowner’s policy, the choices are \(\$0\), \(\$100\), and \(\$200\).
Suppose an individual with both types of policy is selected at random from the agency’s files. Let \(A\) be the deductible amount on the auto policy and \(H\) the deductible amount on the homeowner’s policy. The joint probability mass function for \(A\) and \(H\) is denoted \(\color{dodgerblue}{p(a,h)=P(A=a \mbox{ and } H=h)}\) and can be summarized in two-way tables.
\(p(a,h)\) | \(H=0\) | \(H=100\) | \(H=200\) | Total |
---|---|---|---|---|
\(a=100\) | \(0.20\) | \(0.10\) | \(0.20\) | ?? |
\(a=250\) | \(0.05\) | \(0.15\) | \(0.30\) | ?? |
Total | ?? | ?? | ?? | \(1\) |
Question 1a
Interpret the meaning of the value \(p(100,0)=0.2\) in this context.
Solution to Question 1a
Question 1b
Compute \(P(A=250)\) and interpret the meaning in this context.
Solution to Question 1b
Question 1c
Compute \(P(H=100)\) and interpret the meaning in this context.
Solution to Question 1c
Marginal Probability Mass Functions
- The marginal probability mass function of \(X\) is given by
\[p_X(x) = P(X=x) = \sum_y p(x,y). \]
- The marginal probability mass function of \(Y\) is given by
\[p_Y(y) = P(Y=y) = \sum_x p(x,y).\]
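The marginal pmfs are just the row and column totals of the two-way table. Below is a minimal Python sketch of this summation; the joint pmf values and supports are made up purely for illustration (they are not the insurance table above).

```python
from collections import defaultdict

# Hypothetical joint pmf p(x, y) stored as a dictionary (x, y) -> probability.
# These values are invented for illustration and sum to 1.
joint_pmf = {
    (1, 1): 0.10, (1, 2): 0.20, (1, 3): 0.10,
    (2, 1): 0.20, (2, 2): 0.30, (2, 3): 0.10,
}

p_X = defaultdict(float)  # marginal pmf of X: sum p(x, y) over y (row totals)
p_Y = defaultdict(float)  # marginal pmf of Y: sum p(x, y) over x (column totals)

for (x, y), p in joint_pmf.items():
    p_X[x] += p
    p_Y[y] += p

print(dict(p_X))  # approximately {1: 0.4, 2: 0.6}
print(dict(p_Y))  # approximately {1: 0.3, 2: 0.5, 3: 0.2} (floating-point rounding)
```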
Question 2
Using the pmf from the insurance example Question 1, write a piecewise formula for \(p_A(a)\) and \(p_H(h)\).
Solution to Question 2
\[p_A(a) = \left\{ \begin{array}{ll} ?? & a=100 \\ ?? & a=250 \\ 0 & \mbox{otherwise} \end{array} \right.\]
\[p_H(h) = \left\{ \begin{array}{ll} ?? & h=0 \\ ?? & h=100 \\ ?? & h=200 \\ 0 & \mbox{otherwise} \end{array} \right.\]
Joint and Marginal Probability Density Functions
Let \(X\) and \(Y\) be continuous random variables with joint probability density function \(\color{dodgerblue}{f(x,y)}\).
- The marginal probability density function of \(X\) is given by
\[\color{dodgerblue}{f_X(x) = \int_{-\infty}^{\infty} f(x,y) \, dy}. \]
- The marginal probability density function of \(Y\) is given by
\[\color{dodgerblue}{f_Y(y) = \int_{-\infty}^{\infty} f(x,y) \, dx}. \]
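For continuous random variables the same idea applies with integrals in place of sums: integrate out the variable you do not care about. Here is a minimal sympy sketch, using the hypothetical joint pdf \(f(x,y)=x+y\) on the unit square (a valid pdf there, but not the pdf from any question in this section).

```python
import sympy as sp

x, y = sp.symbols('x y')

# Hypothetical joint pdf on the unit square 0 <= x <= 1, 0 <= y <= 1.
f = x + y

# Marginal pdfs: integrate out the other variable over its support.
f_X = sp.integrate(f, (y, 0, 1))   # x + 1/2
f_Y = sp.integrate(f, (x, 0, 1))   # y + 1/2

# Sanity check: each marginal should integrate to 1 over [0, 1].
print(f_X, sp.integrate(f_X, (x, 0, 1)))   # x + 1/2   1
print(f_Y, sp.integrate(f_Y, (y, 0, 1)))   # y + 1/2   1
```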
Question 3
A grocery store has two types of checkout lines:
- Self-checkout registers where customers scan items, pay, and bag their groceries on their own.
- Full-service registers where a cashier scans and bags items for the customer.
On a randomly selected day, let \(X\) be the proportion of time that the self-checkout registers are in use, and let \(Y\) be the proportion of time that the full-service cashiers are in use. Random variables \(X\) and \(Y\) are continuous, each taking values between 0 and 1, so the set of possible values for the pair \((X, Y)\) is the rectangle \(A= \left\{ (x, y): 0 \leq x \leq 1, 0 \leq y \leq 1 \right\}\) in \(\mathbb{R}^2\). Suppose the joint pdf of \((X,Y)\) is given by
\[ f(x,y) = \left\{ \begin{array}{ll} \dfrac{3}{4} \left( 2x+y^2 \right), & 0 \leq x \leq 1, 0 \leq y \leq 1\\ 0 , & \mbox{otherwise} \end{array} \right. .\]
Question 3a
Give a formula for \(f_X(x)\) (using integrals).
Solution to Question 3a
Question 3b
Use your answer from Question 3a to calculate and interpret the value of \(P \left( 0 \leq X \leq \frac{1}{4} \right)\).
Solution to Question 3b
Question 3c
Give a formula for \(f_Y(y)\).
Solution to Question 3c
Question 3d
Calculate and interpret the value of \(\displaystyle P \left( 0 \leq X \leq \frac{1}{4} , \ 0 \leq Y \leq \frac{1}{2} \right)\).
Solution to Question 3d
Question 3e
Set up and evaluate a double integral to compute the probability that the self-checkout registers are in use more than the full-service registers.
Solution to Question 3e
Computing Probabilities with Continuous Joint PDFs
Let \(X\) and \(Y\) be continuous random variables with joint pdf \(f(x,y)\). Then for any two-dimensional region \(A \subseteq \mathbb{R}^2\),
\[ P \big( (X,Y) \in A \big) = \iint_A f(x,y) \, dx \, dy .\]
- In particular if \(A\) is a rectangular region \(A= \left\{ (x, y): a \leq x \leq b, c \leq y \leq d \right\}\), then \[ P( a \leq X \leq b, \ c \leq Y \leq d )= \int_c^d \int_a^b f(x,y) \, dx \, dy .\]
- If the region \(A\) is not rectangular (as in Question 3e), then the limits of the inner integral will not all be constants.
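The sketch below evaluates one rectangular and one non-rectangular probability numerically with scipy, again using the hypothetical pdf \(f(x,y)=x+y\) on the unit square rather than the pdf from Question 3.

```python
from scipy.integrate import dblquad

# Hypothetical joint pdf on the unit square; dblquad expects the inner
# variable (y) as the first argument of the integrand.
def f(y, x):
    return x + y

# Rectangular region: P(0 <= X <= 1/2, 0 <= Y <= 1/2).
p_rect, _ = dblquad(f, 0, 0.5, lambda x: 0, lambda x: 0.5)
print(p_rect)   # 0.125

# Non-rectangular region: P(X > Y), so the inner y-limits depend on x.
p_tri, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: x)
print(p_tri)    # 0.5 (by the symmetry of x + y)
```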
Independent Random Variables
Two random variables \(X\) and \(Y\) are said to be independent if for every pair of \(x\) and \(y\) values,
\[\begin{aligned} \color{dodgerblue}{f(x,y) = f_X(x) \cdot f_Y(y)} & \qquad \mbox{when } X \mbox{ and } Y \mbox{ are continuous, or}\\ \color{dodgerblue}{p(x,y) = p_X(x) \cdot p_Y(y)} & \qquad \mbox{when } X \mbox{ and } Y \mbox{ are discrete.} \end{aligned}\]
Notice this definition mirrors the multiplication rule for independent events: \(A\) and \(B\) are independent events exactly when \(\color{dodgerblue}{P(A \cap B) = P(A)P(B)}\).
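For a discrete table, independence can be checked mechanically: compare every joint probability with the product of the corresponding marginals. Here is a minimal Python sketch, reusing the hypothetical joint pmf from the earlier marginal-pmf sketch (which turns out to be dependent).

```python
import math

# Hypothetical joint pmf (same made-up values as the earlier sketch).
joint_pmf = {
    (1, 1): 0.10, (1, 2): 0.20, (1, 3): 0.10,
    (2, 1): 0.20, (2, 2): 0.30, (2, 3): 0.10,
}

# Marginals by summing over the other variable.
p_X = {x: sum(p for (xx, _), p in joint_pmf.items() if xx == x) for x in (1, 2)}
p_Y = {y: sum(p for (_, yy), p in joint_pmf.items() if yy == y) for y in (1, 2, 3)}

# Independent only if p(x, y) = p_X(x) * p_Y(y) for EVERY pair (x, y).
independent = all(math.isclose(p, p_X[x] * p_Y[y]) for (x, y), p in joint_pmf.items())
print(independent)  # False: p(1, 1) = 0.10 but p_X(1) * p_Y(1) = 0.4 * 0.3 = 0.12
```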
Question 4
In the insurance example in Question 1, are random variables \(A\) and \(H\) independent? Explain how you determined your answer, and then interpret the practical significance of your answer.
Solution to Question 4
Question 5
In the grocery store example in Question 3, are random variables \(X\) and \(Y\) independent? Explain how you determined your answer, and then interpret the practical significance of your answer.
Solution to Question 5
Expected Values with Joint Distributions
Let \(X\) and \(Y\) be two random variables with joint pdf \(f(x,y)\) (or joint pmf \(p(x,y)\) in the discrete case). If \(\color{dodgerblue}{Z=h(X,Y)}\), then
\[E( {\color{dodgerblue}{Z}} ) = E( {\color{dodgerblue}{h(X,Y)}} ) = \left\{ \begin{array}{ll} \displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} {\color{dodgerblue}{h(x,y)}} \cdot f(x,y) \, dx \, dy , & \mbox{if } X \mbox{ and } Y \mbox{ are continuous} \\ & \\ \displaystyle \sum_y \sum_x {\color{dodgerblue}{h(x,y)}} \cdot p(x,y) , & \mbox{if } X \mbox{ and } Y \mbox{ are discrete} \end{array} \right. .\]
This is often referred to as the Law of the Unconscious Statistician since we do not need to know the distribution \(f_Z(z)\) in order to compute \(E(Z)\).
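As a small continuous illustration (once more with the hypothetical pdf \(f(x,y)=x+y\) on the unit square, not a distribution from the questions above), the sketch below applies this formula with \(h(x,y)=xy\).

```python
import sympy as sp

x, y = sp.symbols('x y')

# Hypothetical joint pdf on the unit square.
f = x + y

# Law of the Unconscious Statistician with h(x, y) = x*y:
# E(XY) = double integral of x*y*f(x, y) over the support.
E_XY = sp.integrate(x * y * f, (x, 0, 1), (y, 0, 1))
print(E_XY)   # 1/3
```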
Question 6
Let \(X\) and \(Y\) be the values (\(1, 2, \ldots ,6\)) rolled on each of two dice. Assume that \(X\) and \(Y\) are independent, and define the random variable \(Z=h(X,Y)=XY\), the product of the two rolls. Calculate \(E(Z)\), the expected value of the product of the two rolls.
Solution to Question 6
Linear Combinations of Random Variables
Let \(X\) and \(Y\) be two random variables and consider the linear combination \(aX+bY\), where \(a\) and \(b\) are constants. Then
\[\color{dodgerblue}{E(aX+bY)=aE(X)+bE(Y)}.\]
The property above is true regardless of whether \(X\) and \(Y\) are independent or dependent.
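For example, if \(X\) and \(Y\) are the faces showing on two fair six-sided dice, so that \(E(X)=E(Y)=3.5\), then
\[E(2X+3Y) = 2E(X) + 3E(Y) = 2(3.5) + 3(3.5) = 17.5,\]
and this computation is valid whether the two dice are rolled independently or, say, \(Y=X\) is the same roll counted twice.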
Question 7
Prove the expected value property stated above.
Solution to Question 7
Let \(X\) and \(Y\) be two continuous random variables and let \(a\) and \(b\) denote two constants. Then we have
\[\begin{aligned} E(aX+bY) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (ax+by)f(x,y) \, dx \, dy & \mbox{Law of the Unconscious Statistician} \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} {\color{dodgerblue}{axf(x,y)}} \, dx \, dy + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} {\color{dodgerblue}{byf(x,y)}} \, dx \, dy & \mbox{Explain step 1} \\ &= {\color{dodgerblue}{a}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xf(x,y) \, dx \, dy + {\color{dodgerblue}{b}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} yf(x,y) \, dx \, dy & \mbox{Explain step 2} \\ &= a \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xf(x,y) \, {\color{dodgerblue}{dy \, dx}} + b \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} yf(x,y) \, dx \, dy & \mbox{Explain step 3} \\ &= a \int_{-\infty}^{\infty} {\color{dodgerblue}{x}} \left( \int_{-\infty}^{\infty} f(x,y) \, dy \right) \, dx + b \int_{-\infty}^{\infty} {\color{dodgerblue}{y}} \left( \int_{-\infty}^{\infty} f(x,y) \, dx \right) \, dy & \mbox{Explain step 4} \\ &= a \int_{-\infty}^{\infty} x {\color{dodgerblue}{f_X(x)}} \, dx + b \int_{-\infty}^{\infty} y {\color{dodgerblue}{f_Y(y)}} \, dy & \mbox{Explain step 5} \\ &= a {\color{dodgerblue}{E(X)}} + b {\color{dodgerblue}{E(Y)}} & \mbox{Explain step 6} \\ \end{aligned}\]
Explanation of Steps of Proof:
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Step 6:
Products of Independent Random Variables
A special case for products: let \(X\) and \(Y\) be two independent random variables. Then we additionally have the following properties.
- Expected value: \(\color{dodgerblue}{E(XY) = E(X) \cdot E(Y)}\)
- Variance of linear combination: \(\color{dodgerblue}{\mbox{Var}(aX+bY)=a^2\mbox{Var}(X)+b^2\mbox{Var}(Y)}\)
- Variance of product: \(\color{dodgerblue}{\mbox{Var}(XY) = E(X^2Y^2) - \big( E(X)E(Y) \big)^2}\).
In general these properties do NOT hold if X and Y are dependent. We can only use the properties above if we know \(X\) and \(Y\) are independent random variables.
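The sketch below checks the linear-combination variance property by direct enumeration for a pair of small hypothetical independent discrete random variables (the supports, pmfs, and constants are made up for illustration).

```python
from itertools import product

# Hypothetical independent RVs: X uniform on {1, 2, 3}, Y uniform on {0, 1}.
x_vals, y_vals = [1, 2, 3], [0, 1]
pmf = {(x, y): (1/3) * (1/2) for x, y in product(x_vals, y_vals)}

def E(h):
    """Expected value of h(X, Y) by direct enumeration (discrete LOTUS)."""
    return sum(h(x, y) * p for (x, y), p in pmf.items())

a, b = 2, 3
var_X = E(lambda x, y: x**2) - E(lambda x, y: x)**2
var_Y = E(lambda x, y: y**2) - E(lambda x, y: y)**2
var_lin = E(lambda x, y: (a*x + b*y)**2) - E(lambda x, y: a*x + b*y)**2

# With independence, Var(aX + bY) should equal a^2 Var(X) + b^2 Var(Y).
print(var_lin, a**2 * var_X + b**2 * var_Y)   # both approximately 4.9167
```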
Question 8
Let \(X\) and \(Y\) be two independent random variables. Prove \(\mbox{Var}(aX+bY)=a^2\mbox{Var}(X)+b^2\mbox{Var}(Y)\).
An outline of the proof is provided below. Fill in the missing details.
Solution to Question 8
Let \(X\) and \(Y\) be two continuous, independent random variables. Then using the property \(\mbox{Var}(X) = E(X^2) - \big( E(X) \big)^2\), we have
\[\mbox{Var}(aX+bY) = E \left( {\color{tomato}{??}} \right) - \left( {\color{tomato}{??}} \right)^2.\]
Using the previous result, we have
\[\left( E(aX+bY) \right)^2 = {\color{tomato}{??}}\]
Next we simplify \(E \left( (aX+bY)^2 \right)\) as follows
\[E \left( (aX+bY)^2 \right) = {\color{tomato}{??}}\]
Since \(X\) and \(Y\) are independent, we can apply the property that \({\color{mediumseagreen}{E(XY) = E(X) E(Y)}}\). This gives
\[\begin{aligned} \mbox{Var}(aX+bY) &= E \left( (aX+bY)^2 \right) - \left( E(aX+bY) \right)^2\\ &= \left( {\color{dodgerblue}{a^2 E(X^2)}} + {\color{mediumseagreen}{2ab E(X)E(Y)}} + {\color{tomato}{b^2 E(Y^2)}} \right) - \left( {\color{dodgerblue}{a^2 \big( E(X) \big)^2}} + {\color{mediumseagreen}{2abE(X)E(Y)}} + {\color{tomato}{b^2 \big( E(Y) \big)^2}} \right) \\ &= {\color{dodgerblue}{\left( a^2 E(X^2) - a^2 \big( E(X) \big)^2 \right)}} + {\color{tomato}{\left( b^2 E(Y^2) - b^2 \big( E(Y) \big)^2 \right)}}\\ &= a^2 {\color{dodgerblue}{\left(E(X^2) - \big( E(X) \big)^2 \right)}} + b^2 {\color{tomato}{\left( E(Y^2) - \big( E(Y) \big)^2 \right)}}\\ &= a^2 {\color{dodgerblue}{\mbox{Var}(X)}} + b^2 {\color{tomato}{\mbox{Var}(Y)}} \end{aligned}\]
We DID need the assumption that \(X\) and \(Y\) are independent: without it, the cross term in \(E \left( (aX+bY)^2 \right)\) is \(2abE(XY)\) rather than \(2abE(X)E(Y)\), so the cross terms would not cancel and the proof above would not work.
Statistical Methods: Exploring the Uncertain by Adam Spiegler is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.