6.5: Error Types and Power of Tests


Potential Errors with Hypothesis Tests


Figure 26.1: Type I (on left) and Type II (on right) Errors. Left panel: a doctor with a patient, captioned "A False Positive?". Right panel: a doctor with a pregnant patient, captioned "A False Negative?".

A hypothesis test has one of two possible outcomes:

  • If p-value \(\leq \alpha\), we reject \(H_0\), and we have enough evidence to support the claim in \(H_a\).
  • If p-value \(> \alpha\), we fail to reject \(H_0\). However, in this case we do not accept \(H_0\). The test is inconclusive.

As with confidence intervals, it is possible to do all of our analysis perfectly, without any mistakes, and still reach an incorrect conclusion because of the randomness in sampling. Occasionally, by chance alone, a random sample is unrepresentative of the population, and we arrive at an incorrect conclusion.

In this section, we explore the following questions:

  • What types of errors are possible with hypothesis testing?
  • What are the practical implications of making errors?
  • How can we calculate the probability of correctly rejecting \(H_0\)?

Type I and Type II Errors


There are two possible errors in a hypothesis test:

  • A type I error occurs if we incorrectly reject \(H_0\) when it is true.
    • This is known as a false positive.
    • For example, a jury falsely convicts an innocent person.
  • A type II error occurs if we incorrectly fail to reject \(H_0\) when it is false.
    • This is known as a false negative.
    • For example, a jury fails to convict a guilty person.

For example, when a jury is deciding a case in court, the hypotheses would be:

  • \(H_0\): The accused person is innocent (we assume the person is innocent).
  • \(H_a\): The accused person is guilty (requires evidence beyond a reasonable doubt).

A jury can make two possible errors:

  • If they falsely convict an innocent person, they have made a type I error.
  • If they do not convict a guilty person, they have made a type II error.
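Both error rates can be estimated by simulation. The sketch below is only an illustration under assumed values that do not come from this section: a two-sided exact binomial test of \(H_0: p = 0.5\) with a hypothetical sample size of \(n = 100\), a significance level of 5%, and a hypothetical true proportion of \(0.6\) when \(H_0\) is false.

```r
set.seed(1)        # arbitrary seed so the sketch is reproducible
n      <- 100      # hypothetical sample size
alpha  <- 0.05     # hypothetical significance level
n_sims <- 2000     # number of simulated studies

# p-value of a two-sided exact binomial test of H0: p = 0.5
p_value <- function(x) binom.test(x, n, p = 0.5)$p.value

# Case 1: H0 is true, so every rejection is a type I error
x_null     <- rbinom(n_sims, size = n, prob = 0.5)
type1_rate <- mean(sapply(x_null, p_value) <= alpha)

# Case 2: H0 is false (assumed true p = 0.6), so every
# failure to reject is a type II error
x_alt      <- rbinom(n_sims, size = n, prob = 0.6)
type2_rate <- mean(sapply(x_alt, p_value) > alpha)

type1_rate   # estimated probability of a false positive
type2_rate   # estimated probability of a false negative
```

The estimated type I error rate should come out no larger than roughly \(\alpha\), while the type II error rate depends on how far the assumed true proportion is from the null value.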

Question 1


A hospital is testing to see whether a donated organ is a match for a recipient in need of an organ transplant.

  • \(H_0\): The organ is not a match (boring).
  • \(H_a\): The organ is a match (interesting).

Describe the type I and type II errors in this context. What are the practical consequences of making these errors?

Solution to Question 1







Question 2


A lab runs viral tests to see whether a person is currently infected with COVID-19.

  • \(H_0\): The person is not currently infected with COVID-19 (boring).
  • \(H_a\): The person is currently infected with COVID-19 (interesting).

Describe the type I and type II errors in this context. What are the practical consequences of making these errors?

Solution to Question 2







Question 3


The cholesterol level of healthy men is normally distributed with a mean of 180 mg/dL and a standard deviation of 20 mg/dL, whereas men predisposed to heart disease have a mean cholesterol level of 300 mg/dL with a standard deviation of 30 mg/dL. The cholesterol level 225 mg/dL is used to demarcate healthy from predisposed men.

Question 3a


Given that a man is healthy, what is the probability they are diagnosed as predisposed?

Solution to Question 3a


# code cell to help with calculations




Question 3b


Given that a man is predisposed to heart disease, what is the probability they are not diagnosed as predisposed?

Solution to Question 3b


# code cell to help with calculations




Question 3c


Which of the previous answers gives the probability of a type I error and which is for a type II error? Explain.

Solution to Question 3c







Question 4


Suppose we want to test whether a ten-sided die is fair (with sides numbered 0 to 9). Let \(p\) be the proportion of all rolls that land on an even number.

Question 4a


Set up the hypotheses to test our claim.

Solution to Question 4a


  • \(H_0\):

  • \(H_a\):




Question 4b


Roll the die 20 times, and record how many times it lands on an even number (0, 2, 4, 6, or 8). If you do not have a ten-sided die, use the code cell below to simulate rolling a fair, ten-sided die \(n=20\) times.

Solution to Question 4b


# run code cell if you do not have a ten-sided die
sample(0:9, 20, replace = TRUE)




Question 4c


Calculate the p-value of your sample.

Solution to Question 4c


# code cell to help with calculations




Question 4d


What (if anything) can you conclude about the hypotheses at a 10% significance level?

Solution to Question 4d







The Significance Level Revisited


The significance level of a hypothesis test, denoted \(\mathbf{\alpha}\), is the largest probability of a type I error that we find acceptable.
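When a test is defined by a decision rule rather than a p-value, the probability of a type I error is the probability that the rule rejects \(H_0\) when \(H_0\) is true. The following is a minimal sketch with made-up numbers (not taken from any question in this section): we test \(H_0: p = 0.5\) against \(H_a: p < 0.5\) with \(n = 30\) trials and agree in advance to reject \(H_0\) if we observe \(X \leq 10\) successes.

```r
n      <- 30    # hypothetical number of trials
p0     <- 0.5   # value of p claimed in H0
cutoff <- 10    # hypothetical rule: reject H0 if X <= 10

# Probability of a type I error:
# P(X <= cutoff) computed assuming H0 is true, so X ~ Binom(n, p0)
type1_prob <- pbinom(cutoff, size = n, prob = p0)
type1_prob

# The rule is acceptable at a 5% significance level only if
# its type I error probability does not exceed 0.05
type1_prob <= 0.05
```

If the computed probability exceeded the chosen significance level, the rule would need a stricter cutoff.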

Question 5


Image: Social Sharing. Credit: Seobility, CC BY-SA 4.0

A company claims that only 3% of people who use their facial lotion develop an allergic reaction (a rash). You are suspicious of their claim based on hearing some of your friends had an allergic reaction, and you believe it is more than 3%. You pick a random sample of 50 people and have them try the lotion. If more than 3 out of the 50 people develop the rash, you will blow up social media with posts about the dishonesty of the company’s claim.

Question 5a


Set up hypotheses for this test.

Solution to Question 5a


  • \(H_0\):

  • \(H_a\):




Question 5b


Explain what type I and type II errors are in this case. Make sure you explain in the context of this example.

Solution to Question 5b







Question 5c


What is the probability of making a type I error?

Solution to Question 5c


# code cell to help with calculations






Question 5d


If you were to perform the hypothesis test at a 5% significance level, and you observe \(X=4\), what would be the result of the test?

Solution to Question 5d







Question 5e


For what values of \(X\) would you reject \(H_0\) at a 5% significance level?

Solution to Question 5e







Rejection Regions


When performing a hypothesis test at a significance level of \(\alpha\), the rejection or critical region, denoted \(\mathcal{R}\), is the set of all values of the test statistic for which we reject \(H_0\). The endpoint(s) of the region are called critical values.
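As an illustration of the definition, here is a minimal sketch that finds a critical value for a right-tailed binomial test. The setup is hypothetical and chosen only for this example: \(H_0: p = 0.10\) versus \(H_a: p > 0.10\), with \(n = 40\) trials and \(\alpha = 0.05\). The variable names are likewise just for illustration.

```r
n     <- 40     # hypothetical number of trials
p0    <- 0.10   # value of p claimed in H0
alpha <- 0.05   # significance level

x <- 0:n        # all possible values of the test statistic X

# P(X >= x) under H0 for each possible x
upper_tail <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)

# Critical value: the smallest x whose upper-tail probability is at most alpha
crit <- min(x[upper_tail <= alpha])

# Rejection region: all values of X at least as large as the critical value
rejection_region <- crit:n

crit
```

Because \(X\) is discrete, the type I error probability of the resulting region is usually somewhat smaller than \(\alpha\) rather than exactly equal to it.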

Question 6


In Question 4 we tested whether or not a ten-sided die is fair by rolling it 20 times and counting the number of rolls that land on an even number. If \(p\) is the proportion of all rolls that land on an even number, then we have

\[H_0: p = 0.5 \qquad \mbox{vs.} \qquad H_a: p \ne 0.5.\]

Question 6a


If you found only \(X=7\) rolls landed on an even number, what is the p-value?

Solution to Question 6a


# code cell to help with calculations




Question 6b


Find the critical values and rejection region if we use a significance level of 10%.

Solution to Question 6b


# code cell to help with calculations




The Power of a Test


Question 7


Suppose you are interested in the lengths of a certain species of snake in an ecosystem. Assume the lengths (in cm) are normally distributed with unknown mean \(\mu\), but the standard deviation of the population is known to be \(\sigma = 4\) cm. It has been claimed that the mean length of this species is 25 cm. You believe the actual mean length is greater than 25 cm. You collect a random sample of 30 snakes. You will test using a significance level of \(\alpha = 0.05\).

Question 7a


Set up hypotheses for the test.

Solution to Question 7a


  • \(H_0\):

  • \(H_a\):




Question 7b


Find the critical value, and give the rejection region.

Solution to Question 7b


# code cell to help with calculations




Question 7c


If in fact \(\mu = 27\) cm, what is the probability of making a type II error?

Solution to Question 7c


# code cell to help with calculations




Question 7d


What is the probability of correctly rejecting \(H_0\) when \(H_a\) is true?

Solution to Question 7d







Definition of the Power of a Test


The power of a test is the probability of correctly rejecting \(H_0\) when \(H_a\) is true.

\[{\color{dodgerblue}{\mbox{power} = P(\mbox{Reject } H_0 \ | \ H_a \mbox{ is true}) = 1 - {\color{tomato}{\beta}}}},\]

where \(\beta\) denotes the probability of a type II error.
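The sketch below shows one way the definition could be applied to a right-tailed test for a mean with known \(\sigma\). All of the numbers are assumptions made up for this example: \(H_0: \mu = 100\) versus \(H_a: \mu > 100\), \(\sigma = 15\), \(n = 36\), \(\alpha = 0.05\), and a hypothetical true mean of \(105\).

```r
mu0     <- 100    # mean claimed in H0 (hypothetical)
mu_true <- 105    # assumed true mean under Ha (hypothetical)
sigma   <- 15     # known population standard deviation (hypothetical)
n       <- 36     # sample size (hypothetical)
alpha   <- 0.05   # significance level

se <- sigma / sqrt(n)   # standard error of the sample mean

# Critical value for the sample mean: reject H0 if x-bar > crit
crit <- mu0 + qnorm(1 - alpha) * se

# beta = P(fail to reject H0 | mu = mu_true) = P(x-bar <= crit)
beta <- pnorm(crit, mean = mu_true, sd = se)

# power = P(reject H0 | mu = mu_true) = 1 - beta
power <- pnorm(crit, mean = mu_true, sd = se, lower.tail = FALSE)

c(beta = beta, power = power)
```

Power is always computed for a specific alternative value of the parameter. Repeating the calculation with a true mean farther from \(100\), or with a larger \(n\), gives a larger power.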

Additional Practice


Question 8


Let \(X_1\), \(X_2\), \(\ldots\) , \(X_{12}\) be a random sample from a Bernoulli distribution with unknown probability of success \(p\). We test

\[H_0: p=0.3 \qquad \mbox{versus} \qquad H_a: p < 0.3.\]

We will reject the null hypothesis if the number of successes, \(Y= X_1 + X_2 + \ldots + X_{12}\), satisfies \(Y \leq 1\).

Question 8a


Find the probability of a type I error.

Solution to Question 8a


# code cell to help with calculations




Question 8b


If the alternative hypothesis is true, find an expression for the power, \(1-\beta\), as a function of \(p\).

Solution to Question 8b


# code cell to help with calculations




Question 9


You draw a random sample \(X_1, X_2, \ldots , X_{10}\) from an exponential distribution \(f(x; \lambda) = \lambda e^{-\lambda x}\) (recall \(\mu = 1/\lambda\)). You will test

\[H_0: \lambda = 0.25 \qquad \mbox{versus} \qquad H_a: \lambda < 0.25.\]

You decide you will reject the null hypothesis if at least 3 of the values of \(X_i\) are greater than 9.

Question 9a


Compute the probability of a type I error.

Solution to Question 9a


# code cell to help with calculations




Question 9b


If actually \(\lambda = 0.15\), what is the power of this test?

Solution to Question 9b


# code cell to help with calculations




Appendix: Summary of Hypothesis Testing


  1. State the hypotheses and identify (from the alternative claim in \(H_a\)) whether it is a one-tailed or two-tailed test.

    • \(H_0\) is the “boring” claim. Express using an equal sign \(=\).
    • \(H_a\) is the claim we want to show is likely true. Use an inequality sign (\(>\), \(<\), or \(\ne\)).
    • State both \(H_0\) and \(H_a\) in terms of population parameters such as \(\mu_1-\mu_2\) and \(p_1-p_2\).
  2. Compute the test statistic.

    • If the observed sample is sufficiently inconsistent with the null claim, the result is significant.
    • A standardized test statistic measures how many standard errors (SEs) the observed statistic is from the null claim.
    • A standardized test statistic with a large absolute value is supporting evidence to reject \(H_0\).
  3. Using the null distribution, compute the p-value. The p-value is the probability of getting a sample with a test statistic as or more extreme than the observed sample assuming \(H_0\) is true.

    • The p-value is the area in one or both tails beyond the test statistic.
    • The p-value is a probability, so we have \(0 \leq \mbox{p-value} \leq 1\).
    • The smaller the p-value, the stronger the evidence to reject \(H_0\).
  4. Based on the significance level, \(\alpha\), make a decision to reject or not reject the null hypothesis.

    • If p-value \(\leq \alpha\), we reject \(H_0\).
    • If p-value \(> \alpha\), we do not reject \(H_0\).
  5. Summarize the results in practical terms, in the context of the example.

    • If we reject \(H_0\), this means there is enough evidence to support the claim in \(H_a\).
    • If we do not reject \(H_0\), this means there is not enough evidence to reject \(H_0\) or to support \(H_a\). The test is inconclusive.
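The five steps above can be sketched in R for a single proportion. The data here are invented for illustration (62 successes in 100 trials), and the hypotheses \(H_0: p = 0.5\) versus \(H_a: p > 0.5\) and \(\alpha = 0.05\) are assumptions of the example, not results from this section.

```r
# Step 1: hypotheses H0: p = 0.5 vs Ha: p > 0.5 (a right-tailed test)
x     <- 62     # hypothetical number of successes observed
n     <- 100    # hypothetical sample size
p0    <- 0.5    # value of p claimed in H0
alpha <- 0.05   # significance level

# Step 2: standardized test statistic, the number of standard errors
# the sample proportion lies from the null value
se <- sqrt(p0 * (1 - p0) / n)
z  <- (x / n - p0) / se

# Step 3: p-value from the exact binomial null distribution,
# P(X >= x) assuming H0 is true
p_val <- binom.test(x, n, p = p0, alternative = "greater")$p.value

# Step 4: compare the p-value to the significance level
decision <- if (p_val <= alpha) "reject H0" else "do not reject H0"

# Step 5: report the ingredients needed for a summary in context
list(z = z, p_value = p_val, decision = decision)
```

The sketch uses the exact binomial distribution for the p-value and reports the standardized statistic only as a description of how far the sample is from the null claim. A normal approximation would be another reasonable choice for a sample of this size.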


Statistical Methods: Exploring the Uncertain by Adam Spiegler is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.