Conditional Probability

Week 1

Author

Maghfira Ramadhani

Published

Aug 20, 2025

Plan

In today’s class you will:

define marginal and conditional probability
understand the multiplicative rule of probability
understand the law of total probability
define and check the independence of events
understand and apply Bayes’ Theorem

Textbook Reference: SDG Chapter 2.1-2.3; JA Chapter 3

Conditional Probability

Often we wish to know the probability that an event will occur given that another event has occurred.

For example, instead of the marginal probability of contracting COVID (regardless of vaccination status), we may wish to know the probability that someone will contract COVID given that they have been vaccinated, or the probability someone will contract COVID given that they have not been vaccinated.

These are examples of conditional probability. As another example, the conditional probability that someone is a smoker (A) given that they identify as female (B) is denoted \(P(A|B)\), which we read as “probability of A given B.”

Simple Example

Suppose we have a small set containing 3 female non-smokers, 1 female smoker, 4 non-female non-smokers, and 4 non-female smokers.

If we treat this set as a representative sample of a population of interest, we can estimate the probability someone is a smoker as \(\frac{1+4}{3+1+4+4}=\frac{5}{12}\approx 0.42\).

We may be interested in the conditional probability someone is a smoker given that they identify as female, which we can estimate as \(\frac{1}{3+1}=0.25\). We just change the denominator to correspond to our smaller population of interest.
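
The two estimates above can be reproduced with a few lines of Python. This is only a minimal sketch; the dictionary layout and variable names are our own choices for illustration.

```python
# Counts from the small example set, keyed by (gender identity, smoking status).
counts = {
    ("female", "smoker"): 1,
    ("female", "non-smoker"): 3,
    ("non-female", "smoker"): 4,
    ("non-female", "non-smoker"): 4,
}

total = sum(counts.values())
smokers = sum(n for (_, status), n in counts.items() if status == "smoker")
females = sum(n for (group, _), n in counts.items() if group == "female")

# Marginal probability: smokers out of everyone.
p_smoker = smokers / total

# Conditional probability: same numerator idea, but the denominator
# shrinks to the subgroup we are conditioning on (females).
p_smoker_given_female = counts[("female", "smoker")] / females

print(f"P(smoker)          = {p_smoker:.2f}")
print(f"P(smoker | female) = {p_smoker_given_female:.2f}")
```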

Conditional Probability

More formally, we define conditional probability as \(P(A|B)=\frac{P(A \cap B)}{P(B)}\), provided \(P(B)>0\) (verify this on your own using the simple example on the prior slide).

Manipulating this formula, we get the multiplicative rule of probability: \(P(A \cap B)=P(B)P(A|B)\).

One more helpful rule is the law of total probability:

\(P(A)=P(A \mid B)P(B) + P(A \mid B^c)P(B^c) = P(A \cap B)+P(A \cap B^c)\), which translates to the obvious statement that the probability that A occurs is equal to the sum of the probabilities that A occurs with B and that A occurs without B.
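
As a sanity check, the sketch below verifies the definition, the multiplicative rule, and the law of total probability on the simple smoker example (A = smoker, B = female). It uses exact fractions so the equalities hold without rounding; the variable names are ours.

```python
from fractions import Fraction

n = 12                            # size of the small example set
p_B = Fraction(4, n)              # P(B): female
p_Bc = 1 - p_B                    # P(B^c): non-female
p_A_and_B = Fraction(1, n)        # P(A and B): female smokers
p_A_and_Bc = Fraction(4, n)       # P(A and B^c): non-female smokers

# Definition of conditional probability.
p_A_given_B = p_A_and_B / p_B
p_A_given_Bc = p_A_and_Bc / p_Bc

# Multiplicative rule: P(A and B) = P(B) * P(A | B).
multiplicative_ok = (p_B * p_A_given_B == p_A_and_B)

# Law of total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).
p_A = p_A_given_B * p_B + p_A_given_Bc * p_Bc

print("P(A | B) =", p_A_given_B)
print("Multiplicative rule holds:", multiplicative_ok)
print("P(A) =", p_A)
```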

Hypothetical Example: Vaccine Hesitancy Cohort

Ethnicity	Vaccine Hesitant	Not Hesitant
White British or Irish	1362	7368
Other white background	71	199
Mixed	55	115
Asian or Asian British - Indian	37	143
Asian or Asian British - Pakistani/Bangladeshi	85	115
Asian or Asian British - other	15	95
Black or Black British	136	54
Other Ethnic Group or Not Specified	31	119

Three Probabilities

Define events A = vaccine hesitant and B = Asian or Asian British - Indian. Calculate the following probabilities for a randomly selected person in this cohort.

Marginal probability of vaccine hesitancy, \(P(A)\)
Joint probability of vaccine hesitancy and Indian ethnicity, \(P(A \cap B)\)
Conditional probability of vaccine hesitancy given a person is of Indian ethnicity, \(P(A \mid B)\)
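
If you would like to check your hand calculations, here is one possible sketch. The counts are copied from the cohort table; the dictionary structure and names are assumptions made for illustration.

```python
# (vaccine hesitant, not hesitant) counts from the cohort table.
cohort = {
    "White British or Irish": (1362, 7368),
    "Other white background": (71, 199),
    "Mixed": (55, 115),
    "Asian or Asian British - Indian": (37, 143),
    "Asian or Asian British - Pakistani/Bangladeshi": (85, 115),
    "Asian or Asian British - other": (15, 95),
    "Black or Black British": (136, 54),
    "Other Ethnic Group or Not Specified": (31, 119),
}

n_total = sum(h + nh for h, nh in cohort.values())
n_hesitant = sum(h for h, _ in cohort.values())
h_indian, nh_indian = cohort["Asian or Asian British - Indian"]

p_A = n_hesitant / n_total                      # marginal: hesitant
p_A_and_B = h_indian / n_total                  # joint: hesitant and Indian
p_A_given_B = h_indian / (h_indian + nh_indian) # conditional: hesitant given Indian

print(f"P(A)       = {p_A:.4f}")
print(f"P(A and B) = {p_A_and_B:.4f}")
print(f"P(A | B)   = {p_A_given_B:.4f}")
```

Note how the joint and conditional probabilities share a numerator and differ only in the denominator.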

Independence

Events \(A\) and \(B\) are independent when \(P(A \mid B)=P(A)\) or \(P(B \mid A)=P(B)\).

In other words, knowing that one event has occurred does not lead to any change in the probability we assign to another event.

Checking Independence

We can use the multiplicative rule to check if two events are independent.

Events \(A\) and \(B\) are independent if and only if \[P(A \cap B)=P(A)\times P(B).\]

Check it out: Are vaccine hesitancy and Indian ethnicity independent in our cohort?
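
One way to run this check in code is sketched below. The four counts are totals taken from the cohort table (10,000 people overall, 1,792 vaccine hesitant, 180 of Indian ethnicity, 37 in both groups); the variable names are ours. The check treats the cohort as the whole population of interest. Whether a difference of this size could arise from sampling variation is a separate question that this sketch does not address.

```python
import math

n_total = 10_000     # everyone in the cohort
n_hesitant = 1_792   # event A: vaccine hesitant
n_indian = 180       # event B: Asian or Asian British - Indian
n_both = 37          # A and B together

p_A = n_hesitant / n_total
p_B = n_indian / n_total
p_A_and_B = n_both / n_total

# Independence requires P(A and B) to equal P(A) * P(B).
product = p_A * p_B
independent = math.isclose(p_A_and_B, product)

print(f"P(A and B)  = {p_A_and_B:.6f}")
print(f"P(A) * P(B) = {product:.6f}")
print("Independent in this cohort:", independent)
```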

Independent vs Disjoint Events

For independent events \(A\) and \(B\), \(P(A \mid B)=P(A)\) and \(P(B \mid A)=P(B)\), so knowing one event occurred tells us nothing about the chances the other event will occur.

For two disjoint or mutually exclusive events, knowing that one event has occurred tells us that the other event definitely has not occurred, i.e. \(P(A \cap B)=0\).

Disjoint events (each with nonzero probability) are therefore not independent!
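
To see why, suppose \(A\) and \(B\) are disjoint and \(P(A)>0\), \(P(B)>0\). Then \[P(A \cap B)=0 \neq P(A)\times P(B),\] since the right-hand side is strictly positive, so the independence criterion fails. Equivalently, \[P(A \mid B)=\frac{P(A \cap B)}{P(B)}=0 \neq P(A).\]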

Example: Breast Cancer Screening

Let \(A\) be the event that a woman has breast cancer, so that \(P(A)\) is the prevalence in the population for her age group. Say \(P(A)=0.01\) for a 40-year-old woman.

Let \(B\) be the event that a screening mammogram is positive.

Once a person has a positive mammogram, our mental estimate of the probability she has breast cancer, now \(P(A \mid B)\), has increased. How much should it increase? Are we certain she has cancer, i.e. \(P(\text{cancer} \mid \text{mammo positive})=1\), or is there some chance the test is wrong?

Sensitivity and Specificity

A = has cancer, B = mammogram positive

A diagnostic test like a mammogram is often characterized by its quality – we want a test to have good sensitivity (picking up cancer when a person really has it) and specificity (ruling out cancer when a person is cancer-free).

Sensitivity is \(P(B|A)\), and specificity is \(P(B^c \mid A^c)\).

A typical screening mammogram has sensitivity of 85% and specificity of 90%.

Bayes’ Theorem

Bayes’ theorem gives us a formal way to update our beliefs based on new information. It says \[P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)}=\frac{P(A \cap B)}{P(B)}\].

In this example, a 40-year-old woman with a positive screening mammogram may wish to know her chances of having cancer. Several papers have shown that even doctors tend to strongly overestimate this probability.

We’ll consider two ways to solve this problem: one using Bayes’ formula directly, and another based on a “hypothetical 10,000” table, which applies known probabilities to a hypothetical population of 10,000 40-year-old women.

Hypothetical 10,000

We really are still using Bayes’ theorem here, but it is hidden behind the scenes.

Here’s what we know.

1. The prevalence of breast cancer among 40-year-old women is 1% or 0.01.
2. The sensitivity of a screening mammogram for diagnosing cancer is 85% or 0.85.
3. The specificity is 90% or 0.90.

Let’s construct a 2x2 table comparing true cancer status and mammogram results in our hypothetical population of 10,000 women.

	Cancer	No Cancer	Total
Mammo +
Mammo -
Total			10000

Hypothetical Population

1. The prevalence of breast cancer among 40-year-old women is 1% or 0.01.
2. The sensitivity of a screening mammogram for diagnosing cancer is 85% or 0.85.
3. The specificity is 90% or 0.90.

Item 1 says the prevalence in this group is 1%, so then we expect to have \(10000\times0.01=100\) cases and \(10000\times0.99=9900\) cancer-free women.

	Cancer	No Cancer	Total
Mammo +
Mammo -
Total	100	9900	10000

Hypothetical Population

2. The sensitivity of a screening mammogram for diagnosing cancer is 85% or 0.85.
3. The specificity is 90% or 0.90.

Item 2 gives the sensitivity, so \(P(\text{mammo +} \mid \text{cancer})=0.85\).

Thus in the group of 100 women with cancer, the mammogram should pick up \(100\times0.85=85\) of them, and miss the remaining \(100-85=15\).

	Cancer	No Cancer	Total
Mammo +	85
Mammo -	15
Total	100	9900	10000

Hypothetical Population

3. The specificity is 90% or 0.90.

Item 3 gives the specificity, so \(P(\text{mammo -} \mid \text{no cancer})=0.90\).

Thus in the group of 9900 women without cancer, the mammogram should correctly identify \(9900\times0.90=8910\) of them as being cancer-free, and it will mistakenly identify \(9900-8910=990\) as having cancer (false positives).

	Cancer	No Cancer	Total
Mammo +	85	990
Mammo -	15	8910
Total	100	9900	10000

Hypothetical Population

Now we complete the table by filling in the row totals.

	Cancer	No Cancer	Total
Mammo +	85	990	1075
Mammo -	15	8910	8925
Total	100	9900	10000

At this point, it’s easy to calculate the conditional probability of cancer given a positive mammogram as \(\frac{85}{1075}=0.079\).
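
The table-building steps can also be written out in code. The following is a minimal sketch of the “hypothetical 10,000” approach using the three known quantities; the variable names are our own.

```python
population = 10_000
prevalence = 0.01     # P(cancer)
sensitivity = 0.85    # P(mammo + | cancer)
specificity = 0.90    # P(mammo - | no cancer)

# Column totals: split the population by true cancer status.
cancer = round(population * prevalence)
no_cancer = population - cancer

# Cancer column: sensitivity splits it into true positives and false negatives.
true_pos = round(cancer * sensitivity)
false_neg = cancer - true_pos

# No-cancer column: specificity splits it into true negatives and false positives.
true_neg = round(no_cancer * specificity)
false_pos = no_cancer - true_neg

# Row totals.
positives = true_pos + false_pos
negatives = false_neg + true_neg

# Conditional probability of cancer given a positive mammogram.
ppv = true_pos / positives

print(f"Mammo +  {true_pos:>5} {false_pos:>6} {positives:>6}")
print(f"Mammo -  {false_neg:>5} {true_neg:>6} {negatives:>6}")
print(f"Total    {cancer:>5} {no_cancer:>6} {population:>6}")
print(f"P(cancer | mammo +) = {ppv:.3f}")
```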

Bayes’ Theorem in Action

Alternatively, we could just use Bayes’ Theorem directly.

Baseline probability of cancer \(P(A)=0.01\) (prevalence)
She wants to know \(P(A \mid B)\), or her chances of having cancer given that her mammogram is positive (also called positive predictive value).
Bayes’ Theorem: \(P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)}\).
Sensitivity is \(P(B \mid A)=0.85\).

Bayes’ Theorem in Action

How do we get \(P(B)\)? We can get this using the law of total probability: \(P(B)=P(B \mid A)P(A) + P(B \mid A^c)P(A^c)\).
- We can get \(P(B \mid A^c)\) using the specificity \(P(B^c \mid A^c)=0.90\) and the fact that \(P(B^c \mid A^c)+P(B \mid A^c)=1\). So \(P(B \mid A^c)=1-0.9=0.1\).
- Then \(P(B)=P(B \mid A)P(A) + P(B \mid A^c)P(A^c)=0.85\times0.01+0.1\times0.99=0.1075\)
Then \(P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)}=\frac{0.85\times0.01}{0.1075}=0.079\).
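
The same calculation can be wrapped in a small function. This is a sketch only; the function name `posterior_given_positive` and its parameter names are hypothetical, chosen here for illustration.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return P(A | B): probability of disease given a positive test.

    prior        -- P(A), the baseline prevalence
    sensitivity  -- P(B | A)
    specificity  -- P(B^c | A^c)
    """
    false_positive_rate = 1 - specificity            # P(B | A^c)
    # Law of total probability for P(B).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem.
    return sensitivity * prior / p_positive


ppv = posterior_given_positive(prior=0.01, sensitivity=0.85, specificity=0.90)
print(f"P(A | B) = {ppv:.3f}")
```

Calling the function with a different `prior` shows how strongly the answer depends on baseline prevalence.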

Bayes’ Theorem and Baseline Prevalence

Here, we can think of the 1% prevalence of breast cancer among 40-year-old women as our prior probability a woman has cancer, and 7.9% as the posterior probability she has cancer after we see the data that her screening mammogram is positive.

Cancer would be confirmed or ruled out by subsequent testing, such as a diagnostic mammogram, ultrasound, or biopsy. At the time of the subsequent testing, we’d have an updated baseline probability of 7.9% for our 40-year-old woman.

You Try It!

Most people who have a negative test result (e.g., mammogram looks good or COVID test negative) don’t worry any longer about whether they really do have disease. Are they right not to worry? Suppose our 40-year-old woman with a baseline 1% breast cancer risk instead had a negative (all clear) mammogram. What is the updated probability she has breast cancer given this test result?
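
Once you have worked this out by hand, you can check your answer with a sketch like the one below. It applies Bayes’ theorem to the event \(B^c\) (negative mammogram) in place of \(B\); the function name `posterior_given_negative` is hypothetical.

```python
def posterior_given_negative(prior, sensitivity, specificity):
    """Return P(A | B^c): probability of disease given a negative test."""
    false_negative_rate = 1 - sensitivity            # P(B^c | A)
    # Law of total probability for P(B^c).
    p_negative = false_negative_rate * prior + specificity * (1 - prior)
    # Bayes' theorem with B^c in place of B.
    return false_negative_rate * prior / p_negative


p_cancer_given_negative = posterior_given_negative(
    prior=0.01, sensitivity=0.85, specificity=0.90
)
print(f"P(cancer | mammo -) = {p_cancer_given_negative:.5f}")
```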

You Try It!

For more practice, go back to the vaccine hesitancy example.
1. Calculate the percentage of each ethnic group in the population provided. We can view this as the marginal probability of each ethnicity in that population.
2. Given that someone is vaccine hesitant, calculate the conditional probability that the person belongs to each ethnic group, in turn. This information would be useful in helping to target policies and outreach campaigns.
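
A possible starting point for both parts is sketched below, again with counts copied from the cohort table. The dictionary names `marginal` and `conditional` are our own.

```python
# (vaccine hesitant, not hesitant) counts from the cohort table.
cohort = {
    "White British or Irish": (1362, 7368),
    "Other white background": (71, 199),
    "Mixed": (55, 115),
    "Asian or Asian British - Indian": (37, 143),
    "Asian or Asian British - Pakistani/Bangladeshi": (85, 115),
    "Asian or Asian British - other": (15, 95),
    "Black or Black British": (136, 54),
    "Other Ethnic Group or Not Specified": (31, 119),
}

n_total = sum(h + nh for h, nh in cohort.values())
n_hesitant = sum(h for h, _ in cohort.values())

# Part 1: marginal probability of each ethnic group.
marginal = {g: (h + nh) / n_total for g, (h, nh) in cohort.items()}

# Part 2: conditional probability of each group given vaccine hesitancy.
conditional = {g: h / n_hesitant for g, (h, _) in cohort.items()}

for group in cohort:
    print(f"{group:<48} {marginal[group]:6.1%} {conditional[group]:6.1%}")
```

Comparing the two columns shows which groups are over- or under-represented among the vaccine hesitant relative to their share of the cohort.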

Thank you!