2.6: Joint Distributions


Situations with Multiple Random Variables


Thus far we have been studying probability distributions for a single random variable. In many situations, we would like to incorporate multiple random variables in our analysis.

Question 1


A large insurance agency services a number of customers who have purchased both a homeowner’s policy and an automobile policy from the agency. For each type of policy, a deductible amount must be specified. For an automobile policy, the choices are \(\$100\) and \(\$250\), whereas for a homeowner’s policy, the choices are \(\$0\), \(\$100\), and \(\$200\).

Suppose an individual with both types of policy is selected at random from the agency’s files. Let \(A\) be the deductible amount on the auto policy and \(H\) the deductible amount on the homeowner’s policy. The joint probability mass function for \(A\) and \(H\) is denoted \(\color{dodgerblue}{p(a,h)=P(A=a \mbox{ and } H=h)}\) and can be summarized in two-way tables.

| \(p(a,h)\) | \(H=0\) | \(H=100\) | \(H=200\) | Total |
|:---:|:---:|:---:|:---:|:---:|
| \(A=100\) | \(0.20\) | \(0.10\) | \(0.20\) | ?? |
| \(A=250\) | \(0.05\) | \(0.15\) | \(0.30\) | ?? |
| Total | ?? | ?? | ?? | \(1\) |

Question 1a


Interpret the meaning of the value \(p(100,0)=0.2\) in this context.

Solution to Question 1a







Question 1b


Compute \(P(A=250)\) and interpret the meaning in this context.

Solution to Question 1b







Question 1c


Compute \(P(H=100)\) and interpret the meaning in this context.

Solution to Question 1c







Marginal Probability Mass Functions


  • The marginal probability mass function of X is given by

\[p_X(x) = P(X=x) = \sum_y p(x,y). \]

  • The marginal probability mass function of Y is given by

\[p_Y(y) = P(Y=y) = \sum_x p(x,y).\]
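For example, the marginal pmfs of \(A\) and \(H\) in Question 1 are just the row and column totals of the two-way table. Below is a minimal NumPy sketch of that computation (the array layout and variable names are illustrative choices, not part of the original example).

```python
import numpy as np

# Joint pmf from the two-way table in Question 1:
# rows correspond to a = 100, 250 and columns to h = 0, 100, 200.
p = np.array([[0.20, 0.10, 0.20],
              [0.05, 0.15, 0.30]])

# Marginal pmfs are the row sums and column sums of the joint pmf.
p_A = p.sum(axis=1)   # p_A(100), p_A(250)
p_H = p.sum(axis=0)   # p_H(0), p_H(100), p_H(200)

print("p_A:", p_A)
print("p_H:", p_H)
print("total probability:", p.sum())   # should be 1
```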

Question 2


Using the pmf from the insurance example in Question 1, write piecewise formulas for \(p_A(a)\) and \(p_H(h)\).

Solution to Question 2


\[p_A(a) = \left\{ \begin{array}{ll} ?? & a=100 \\ ?? & a=250 \\ 0 & \mbox{otherwise} \end{array} \right.\]

\[p_H(h) = \left\{ \begin{array}{ll} ?? & h=0 \\ ?? & h=100 \\ ?? & h=200 \\ 0 & \mbox{otherwise} \end{array} \right.\]


Joint and Marginal Probability Density Functions


Let \(X\) and \(Y\) be continuous random variables with joint probability density function \(\color{dodgerblue}{f(x,y)}\).

  • The marginal probability density function of X is given by

\[\color{dodgerblue}{f_X(x) = \int_{-\infty}^{\infty} f(x,y) \, dy}. \]

  • The marginal probability density function of Y is given by

\[\color{dodgerblue}{f_Y(y) = \int_{-\infty}^{\infty} f(x,y) \, dx}. \]
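As a quick illustration of these definitions, the SymPy sketch below integrates out one variable of a hypothetical joint pdf \(f(x,y) = x + y\) on the unit square. This pdf is made up purely for illustration and is unrelated to the examples in this section.

```python
import sympy as sp

x, y = sp.symbols('x y')

# Hypothetical joint pdf on the unit square 0 <= x <= 1, 0 <= y <= 1 (it integrates to 1).
f = x + y

# Marginal pdfs: integrate out the other variable over its support.
f_X = sp.integrate(f, (y, 0, 1))   # x + 1/2, valid for 0 <= x <= 1
f_Y = sp.integrate(f, (x, 0, 1))   # y + 1/2, valid for 0 <= y <= 1

print(f_X, f_Y)
print(sp.integrate(f, (x, 0, 1), (y, 0, 1)))   # total probability, should be 1
```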

Question 3


A grocery store has two types of checkout lines:

  • Self-checkout registers where customers scan items, pay, and bag their groceries on their own.
  • Full-service registers where a cashier scans and bags items for the customer.

On a randomly selected day, let \(X\) be the proportion of time that the self-checkout registers are in use, and let \(Y\) be the proportion of time that the full-service registers are in use. Random variables \(X\) and \(Y\) are continuous, each taking values between 0 and 1, so the set of possible values for the pair \((X, Y)\) is the rectangle \(A= \left\{ (x, y): 0 \leq x \leq 1, 0 \leq y \leq 1 \right\}\) in \(\mathbb{R}^2\). Suppose the joint pdf of \((X,Y)\) is given by

\[ f(x,y) = \left\{ \begin{array}{ll} \dfrac{3}{4} \left( 2x+y^2 \right), & 0 \leq x \leq 1, 0 \leq y \leq 1\\ 0 , & \mbox{otherwise} \end{array} \right. .\]
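Before working with this pdf, it is reasonable to check that it is a valid joint pdf, that is, it is nonnegative and integrates to 1 over the unit square. A short SymPy sketch of that check:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Joint pdf of (X, Y) from the grocery store example.
f = sp.Rational(3, 4) * (2*x + y**2)

# Integrate over the region of possible values, the unit square; the result should be 1.
print(sp.integrate(f, (x, 0, 1), (y, 0, 1)))
```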

Question 3a


Give a formula for \(f_X(x)\) (using integrals).

Solution to Question 3a







Question 3b


Use your answer from Question 3a to calculate and interpret the value of \(P \left( 0 \leq X \leq \frac{1}{4} \right)\).

Solution to Question 3b







Question 3c


Give a formula for \(f_Y(y)\).

Solution to Question 3c







Question 3d


Calculate and interpret the value of \(\displaystyle P \left( 0 \leq X \leq \frac{1}{4} , \ 0 \leq Y \leq \frac{1}{2} \right)\).

Solution to Question 3d







Question 3e


Set up and evaluate a double integral to compute the probability that the self-checkout registers are in use a greater proportion of the time than the full-service registers.

Solution to Question 3e







Computing Probabilities with Continuous Joint PDFs


Let \(X\) and \(Y\) be continuous random variables with joint pdf \(f(x,y)\). Then for any two-dimensional subset \(A \subseteq \mathbb{R}^2\),

\[ P \big( (X,Y) \in A \big) = \iint_A f(x,y) \, dx \, dy .\]

  • In particular if \(A\) is a rectangular region \(A= \left\{ (x, y): a \leq x \leq b, c \leq y \leq d \right\}\), then \[ P( a \leq X \leq b, \ c \leq Y \leq d )= \int_c^d \int_a^b f(x,y) \, dx \, dy .\]
  • If the region \(A\) is not rectangular (as in Question 3e), then the limits of the inner integral will not all be constants; see the sketch below.
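The sketch below illustrates both cases with scipy.integrate.dblquad, using the hypothetical joint pdf \(f(x,y) = x + y\) on the unit square from the earlier SymPy sketch (not the grocery store pdf): first a rectangular region, then the non-rectangular region where \(Y < X\), for which the inner limits depend on \(x\).

```python
from scipy.integrate import dblquad

# Hypothetical joint pdf on the unit square (same one as the earlier SymPy sketch).
def f(y, x):                     # dblquad passes the inner variable (y) as the first argument
    return x + y

# Rectangular region: P(0 <= X <= 1/2, 0 <= Y <= 1/2). Inner limits are constants.
rect, _ = dblquad(f, 0, 0.5, lambda x: 0.0, lambda x: 0.5)

# Non-rectangular region: P(Y < X). The inner (y) limits run from 0 up to x.
tri, _ = dblquad(f, 0, 1, lambda x: 0.0, lambda x: x)

print(rect)   # 0.125
print(tri)    # 0.5
```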

Independent Random Variables


Two random variables \(X\) and \(Y\) are said to be independent if for every pair of \(x\) and \(y\) values,

\[\begin{aligned} \color{dodgerblue}{f(x,y) = f_X(x) \cdot f_Y(y)} & \qquad \mbox{when } X \mbox{ and } Y \mbox{ are continuous, or}\\ \color{dodgerblue}{p(x,y) = p_X(x) \cdot p_Y(y)} & \qquad \mbox{when } X \mbox{ and } Y \mbox{ are discrete.} \end{aligned}\]

Notice this definition parallels the definition of independent events: events \(A\) and \(B\) are independent when \(\color{dodgerblue}{P(A \cap B) = P(A)P(B)}\).
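For a discrete joint pmf given as a table, independence can be checked by comparing every entry \(p(x,y)\) with the product of the corresponding marginals. The NumPy sketch below uses a small hypothetical pmf that is independent by construction; the same comparison can be applied to the two-way table in Question 1.

```python
import numpy as np

# Hypothetical joint pmf (not from the examples above), built as the outer product
# of two marginal pmfs, so the two variables are independent by construction.
p_X = np.array([0.3, 0.7])
p_Y = np.array([0.2, 0.5, 0.3])
p = np.outer(p_X, p_Y)

# Independence check: p(x, y) must equal p_X(x) * p_Y(y) for EVERY pair of values.
independent = np.allclose(p, np.outer(p.sum(axis=1), p.sum(axis=0)))
print(independent)   # True
```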

Question 4


In the insurance example in Question 1, are random variables \(A\) and \(H\) independent? Explain how you determined your answer, and then interpret the practical significance of your answer.

Solution to Question 4







Question 5


In the grocery store example in Question 3, are random variables \(X\) and \(Y\) independent? Explain how you determined your answer, and then interpret the practical significance of your answer.

Solution to Question 5







Expected Values with Joint Distributions


Let \(X\) and \(Y\) be two random variables with joint pdf \(f(x,y)\) (if continuous) or joint pmf \(p(x,y)\) (if discrete). If \(\color{dodgerblue}{Z=h(X,Y)}\), then

\[E( {\color{dodgerblue}{Z}} ) = E( {\color{dodgerblue}{h(X,Y)}} ) = \left\{ \begin{array}{ll} \displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} {\color{dodgerblue}{h(x,y)}} \cdot f(x,y) \, dx \, dy , & \mbox{if X and Y are continuous} \\ & \\ \displaystyle \sum_y \sum_x {\color{dodgerblue}{h(x,y)}} \cdot p(x,y) , & \mbox{if X and Y are discrete} \end{array} \right. .\]

This is often referred to as the Law of the Unconscious Statistician since we do not need to know the distribution \(f_Z(z)\) in order to compute \(E(Z)\).
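As a small illustration of the discrete case, the sketch below applies the double sum to two independent fair dice with \(h(x,y) = x + y\) (a different function \(h\) than the one in Question 6 below).

```python
import numpy as np

# Two independent fair dice: the joint pmf assigns 1/36 to every pair (x, y).
vals = np.arange(1, 7)
p = np.full((6, 6), 1/36)

# Discrete Law of the Unconscious Statistician with h(x, y) = x + y:
# E[h(X, Y)] = sum over all x and y of h(x, y) * p(x, y).
h = vals[:, None] + vals[None, :]
print(np.sum(h * p))   # 7.0
```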

Question 6


Let \(X\) and \(Y\) be the values (\(1, 2, \ldots ,6\)) rolled on each of two dice. Assume that \(X\) and \(Y\) are independent, and define the random variable \(Z=h(X,Y)=XY\), the product of the two rolls. Calculate \(E(Z)\), the expected value of the product of the two rolls.

Solution to Question 6







Linear Combinations of Random Variables


Let \(X\) and \(Y\) be two random variables, and consider the linear combination \(aX+bY\), where \(a\) and \(b\) are constants. Then

\[\color{dodgerblue}{E(aX+bY)=aE(X)+bE(Y)}.\]

Note

The property above is true regardless of whether \(X\) and \(Y\) are independent or dependent.
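A quick simulation sketch of this point: below, \(Y\) is built directly from \(X\), so the two variables are strongly dependent, yet linearity of expectation still holds. The particular distributions are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2, -3

# Deliberately dependent random variables: Y is a function of X plus noise.
X = rng.exponential(scale=1.0, size=1_000_000)
Y = X**2 + rng.normal(size=X.size)

# Linearity of expectation does not require independence; these should agree closely.
print(np.mean(a*X + b*Y))
print(a*np.mean(X) + b*np.mean(Y))
```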

Question 7


Prove the expected value property above.

Solution to Question 7


Let \(X\) and \(Y\) be two continuous random variables and let \(a\) and \(b\) denote two constants. Then we have

\[\begin{aligned} E(aX+bY) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (ax+by)f(x,y) \, dx \, dy & \mbox{Law of the Unconscious Statistician} \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} {\color{dodgerblue}{axf(x,y)}} \, dx \, dy + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} {\color{dodgerblue}{byf(x,y)}} \, dx \, dy & \mbox{Explain step 1} \\ &= {\color{dodgerblue}{a}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xf(x,y) \, dx \, dy + {\color{dodgerblue}{b}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} yf(x,y) \, dx \, dy & \mbox{Explain step 2} \\ &= a \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xf(x,y) \, {\color{dodgerblue}{dy \, dx}} + b \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} yf(x,y) \, dx \, dy & \mbox{Explain step 3} \\ &= a \int_{-\infty}^{\infty} {\color{dodgerblue}{x}} \left( \int_{-\infty}^{\infty} f(x,y) \, dy \right) \, dx + b \int_{-\infty}^{\infty} {\color{dodgerblue}{y}} \left( \int_{-\infty}^{\infty} f(x,y) \, dx \right) \, dy & \mbox{Explain step 4} \\ &= a \int_{-\infty}^{\infty} x {\color{dodgerblue}{f_X(x)}} \, dx + b \int_{-\infty}^{\infty} y {\color{dodgerblue}{f_Y(y)}} \, dy & \mbox{Explain step 5} \\ &= a {\color{dodgerblue}{E(X)}} + b {\color{dodgerblue}{E(Y)}} & \mbox{Explain step 6} \\ \end{aligned}\]

Explanation of Steps of Proof:

Step 1:

Step 2:

Step 3:

Step 4:

Step 5:

Step 6:

Products of Independent Random Variables


A special case for products: Let \(X\) and \(Y\) be two independent random variables. Then additionally we have the following properties.

  • Expected value: \(\color{dodgerblue}{E(XY) = E(X) \cdot E(Y)}\)
  • Variance of linear combination: \(\color{dodgerblue}{\mbox{Var}(aX+bY)=a^2\mbox{Var}(X)+b^2\mbox{Var}(Y)}\)
  • Variance of product: \(\color{dodgerblue}{\mbox{Var}(XY) = E(X^2Y^2) - \big( E(X)E(Y) \big)^2}\).
Warning

In general, these properties do NOT hold if \(X\) and \(Y\) are dependent. We can only use the properties above if we know \(X\) and \(Y\) are independent random variables.
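As an informal check of the variance property (not a proof; see Question 8 below), the simulation sketch here draws independent \(X\) and \(Y\) from two arbitrarily chosen distributions and compares both sides of the formula. Agreement is only up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2, -3

# Independent X and Y drawn from arbitrary (hypothetical) distributions.
X = rng.normal(loc=1, scale=2, size=1_000_000)
Y = rng.exponential(scale=3, size=1_000_000)

# For independent X and Y, these two quantities should agree up to simulation error.
print(np.var(a*X + b*Y))
print(a**2 * np.var(X) + b**2 * np.var(Y))
```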

Question 8


Let \(X\) and \(Y\) be two independent random variables. Prove \(\mbox{Var}(aX+bY)=a^2\mbox{Var}(X)+b^2\mbox{Var}(Y)\).

An outline of the proof is provided below. Fill in the missing details.

Solution to Question 8


Let \(X\) and \(Y\) be two continuous, independent random variables. Then using the property \(\mbox{Var}(X) = E(X^2) - \big( E(X) \big)^2\), we have

\[\mbox{Var}(aX+bY) = E \left( {\color{tomato}{??}} \right) - \left( {\color{tomato}{??}} \right)^2.\]

Using the previous result, we have

\[\left( E(aX+bY) \right)^2 = {\color{tomato}{??}}\]

Next we simplify \(E \left( (aX+bY)^2 \right)\) as follows

\[E \left( (aX+bY)^2 \right) = {\color{tomato}{??}}\]

Since \(X\) and \(Y\) are independent, we can apply the property that \({\color{mediumseagreen}{E(XY) = E(X) E(Y)}}\). This gives

\[\begin{aligned} \mbox{Var}(aX+bY) &= E \left( (aX+bY)^2 \right) - \left( E(aX+bY) \right)^2\\ &= \left( {\color{dodgerblue}{a^2 E(X^2)}} + {\color{mediumseagreen}{ab E(X)E(Y)}} + {\color{tomato}{b^2 E(Y^2)}} \right) - \left( {\color{dodgerblue}{a^2 \big( E(X) \big)^2}} + {\color{mediumseagreen}{abE(X)E(Y)}} + {\color{tomato}{b^2 \big( E(Y) \big)^2}} \right) \\ &= {\color{dodgerblue}{\left( a^2 E(X^2) - a^2 \big( E(X) \big)^2 \right)}} + {\color{tomato}{\left( b^2 E(Y^2) - b^2 \big( E(Y) \big)^2 \right)}}\\ &= a^2 {\color{dodgerblue}{\left(E(X^2) - \big( E(X) \big)^2 \right)}} + b^2 {\color{tomato}{\left( E(Y^2) - \big( E(Y) \big)^2 \right)}}\\ &= a^2 {\color{dodgerblue}{\mbox{Var}(X)}} + b^2 {\color{tomato}{\mbox{Var}(Y)}} \end{aligned}\]

Note

We DID need to use the assumption that \(X\) and \(Y\) are independent! Without the assumption of independence, the proof above would not work.



Statistical Methods: Exploring the Uncertain by Adam Spiegler is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.