Conditional Probability

Week 1

Author

Maghfira Ramadhani

Published

Aug 20, 2025

Plan

In today’s class you will:

  1. define marginal and conditional probability
  2. understand the multiplicative rule of probability
  3. understand the law of total probability
  4. define and check the independence of events
  5. understand and apply Bayes’ Theorem

Textbook Reference: SDG Chapter 2.1-2.3; JA Chapter 3

Conditional Probability

Often we wish to know the probability that an event will occur given that another event has occurred.

For example, instead of the marginal probability of contracting COVID (regardless of vaccination status), we may wish to know the probability that someone will contract COVID given that they have been vaccinated, or the probability someone will contract COVID given that they have not been vaccinated.

These are examples of conditional probability. The conditional probability that someone is a smoker (A) given that they identify as female (B) is denoted \(P(A|B)\), which we say as “probability of A given B.”


Simple Example

Suppose we have a small set containing 3 female non-smokers, 1 female smoker, 4 non-female non-smokers, and 4 non-female smokers.

If we feel this set is a sample representative of a population of interest, we can estimate the probability someone is a smoker as \(\frac{1+4}{3+1+4+4}=\frac{5}{12}\approx 0.42\).

We may be interested in the conditional probability someone is a smoker given that they identify as female, which we can estimate as \(\frac{1}{3+1}\) – we just change the denominator to correspond to our smaller population of interest.
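The two estimates above can be reproduced by simple counting. The sketch below is only an illustration; the dictionary layout and variable names are assumptions, not part of the original slides.

```python
from fractions import Fraction

# Counts from the small set above, keyed by (sex, smoking status).
counts = {
    ("female", "smoker"): 1,
    ("female", "non-smoker"): 3,
    ("non-female", "smoker"): 4,
    ("non-female", "non-smoker"): 4,
}

total = sum(counts.values())
smokers = sum(n for (sex, status), n in counts.items() if status == "smoker")
females = sum(n for (sex, status), n in counts.items() if sex == "female")
female_smokers = counts[("female", "smoker")]

# Marginal probability: smokers out of everyone in the set.
p_smoker = Fraction(smokers, total)

# Conditional probability: smokers out of females only (smaller denominator).
p_smoker_given_female = Fraction(female_smokers, females)

print(p_smoker, round(float(p_smoker), 2))  # 5/12 0.42
print(p_smoker_given_female)                # 1/4
```

Only the denominator changes between the two calculations: the whole set for the marginal probability, the females for the conditional one.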


Conditional Probability

More formally, we define conditional probability as \(P(A|B)=\frac{P(A \cap B)}{P(B)}\) (verify this on your own using the simple example on the prior slide).

Manipulating this formula, we get the multiplicative rule of probability: \(P(A \cap B)=P(B)P(A|B)\).

One more helpful rule is the law of total probability:

\(P(A)=P(A \mid B)P(B) + P(A \mid B^c)P(B^c) = P(A \cap B)+P(A \cap B^c)\), which translates to the obvious statement that the probability that A occurs is equal to the sum of the probabilities that A occurs with B and that A occurs without B.
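As a quick check, the definition, the multiplicative rule, and the law of total probability can all be verified on the simple smoker example (A = smoker, B = identifies as female). This is a minimal sketch with illustrative variable names, using exact fractions to avoid rounding.

```python
from fractions import Fraction

# Probabilities estimated from the simple example (12 people in total).
p_B = Fraction(4, 12)         # 3 + 1 females
p_A_and_B = Fraction(1, 12)   # 1 female smoker
p_A_and_Bc = Fraction(4, 12)  # 4 non-female smokers

# Definition of conditional probability.
p_A_given_B = p_A_and_B / p_B
p_A_given_Bc = p_A_and_Bc / (1 - p_B)

# Multiplicative rule: P(A and B) = P(B) * P(A | B).
assert p_B * p_A_given_B == p_A_and_B

# Law of total probability: P(A) = P(A | B) P(B) + P(A | B^c) P(B^c).
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

print(p_A_given_B)  # 1/4
print(p_A)          # 5/12
```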


Hypothetical Example: Vaccine Hesitancy Cohort

| Ethnicity | Vaccine Hesitant | Not Hesitant |
|---|---|---|
| White British or Irish | 1362 | 7368 |
| Other white background | 71 | 199 |
| Mixed | 55 | 115 |
| Asian or Asian British - Indian | 37 | 143 |
| Asian or Asian British - Pakistani/Bangladeshi | 85 | 115 |
| Asian or Asian British - other | 15 | 95 |
| Black or Black British | 136 | 54 |
| Other Ethnic Group or Not Specified | 31 | 119 |

Three Probabilities

Define events A = vaccine hesitant and B = Asian or Asian British - Indian. Calculate the following probabilities for a randomly selected person in this cohort.

  • Marginal probability of vaccine hesitancy, \(P(A)\)

  • Joint probability of vaccine hesitancy and Indian ethnicity, \(P(A \cap B)\)

  • Conditional probability of vaccine hesitancy given a person is of Indian ethnicity, \(P(A \mid B)\)
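Once you have worked these out by hand, a short script is one way to check your answers. The sketch below copies the counts from the cohort table; the dictionary structure and variable names are illustrative assumptions.

```python
# (hesitant, not hesitant) counts copied from the cohort table.
cohort = {
    "White British or Irish": (1362, 7368),
    "Other white background": (71, 199),
    "Mixed": (55, 115),
    "Asian or Asian British - Indian": (37, 143),
    "Asian or Asian British - Pakistani/Bangladeshi": (85, 115),
    "Asian or Asian British - other": (15, 95),
    "Black or Black British": (136, 54),
    "Other Ethnic Group or Not Specified": (31, 119),
}

n_total = sum(hes + not_hes for hes, not_hes in cohort.values())
n_hesitant = sum(hes for hes, _ in cohort.values())
hes_B, not_hes_B = cohort["Asian or Asian British - Indian"]

p_A = n_hesitant / n_total                # marginal: hesitant out of everyone
p_A_and_B = hes_B / n_total               # joint: hesitant AND Indian out of everyone
p_A_given_B = hes_B / (hes_B + not_hes_B) # conditional: hesitant out of Indian only

print(f"P(A)       = {p_A:.4f}")
print(f"P(A and B) = {p_A_and_B:.4f}")
print(f"P(A | B)   = {p_A_given_B:.4f}")
```

Note that the joint and conditional probabilities share a numerator and differ only in the denominator.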


Independence

Events \(A\) and \(B\) are independent when \(P(A \mid B)=P(A)\) or \(P(B \mid A)=P(B)\).

In other words, knowing that one event has occurred does not lead to any change in the probability we assign to another event.


Checking Independence

We can use the multiplicative rule to check if two events are independent.

Events \(A\) and \(B\) are independent if and only if \[P(A \cap B)=P(A)\times P(B).\]

Check it out: Are vaccine hesitancy and Indian ethnicity independent in our cohort?
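The check can also be sketched in code by comparing \(P(A \cap B)\) with \(P(A)\times P(B)\). Counts are copied from the cohort table. Exact equality is rare in sample data, so the sketch compares within a tolerance; the 1% relative tolerance is an arbitrary assumption for illustration, not a formal statistical test.

```python
import math

# (hesitant, not hesitant) counts copied from the cohort table.
cohort = {
    "White British or Irish": (1362, 7368),
    "Other white background": (71, 199),
    "Mixed": (55, 115),
    "Asian or Asian British - Indian": (37, 143),
    "Asian or Asian British - Pakistani/Bangladeshi": (85, 115),
    "Asian or Asian British - other": (15, 95),
    "Black or Black British": (136, 54),
    "Other Ethnic Group or Not Specified": (31, 119),
}

n_total = sum(hes + not_hes for hes, not_hes in cohort.values())
n_hesitant = sum(hes for hes, _ in cohort.values())
hes_B, not_hes_B = cohort["Asian or Asian British - Indian"]

p_A = n_hesitant / n_total           # P(vaccine hesitant)
p_B = (hes_B + not_hes_B) / n_total  # P(Indian ethnicity)
p_A_and_B = hes_B / n_total          # P(hesitant and Indian)

# Independence requires P(A and B) == P(A) * P(B).
# The tolerance below is an illustrative assumption.
independent = math.isclose(p_A_and_B, p_A * p_B, rel_tol=0.01)

print(f"P(A and B)  = {p_A_and_B:.6f}")
print(f"P(A) * P(B) = {p_A * p_B:.6f}")
print("Independent within tolerance?", independent)
```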

Independent vs Disjoint Events

For independent events \(A\) and \(B\), \(P(A \mid B)=P(A)\) and \(P(B \mid A)=P(B)\), so knowing one event occurred tells us nothing about the chances the other event will occur.

For two disjoint or mutually exclusive events, knowing that one event has occurred tells us that the other event definitely has not occurred, i.e. \(P(A \cap B)=0\).

Disjoint events with nonzero probabilities are therefore not independent!
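A toy example can make the contrast concrete. The sketch below uses one roll of a fair die, which is a hypothetical example not taken from the slides, and checks the product rule for one independent pair and one disjoint pair of events.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (hypothetical illustration).
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

def independent(a, b):
    """Check the product rule P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
low = {1, 2}
high = {5, 6}

print(independent(even, low))  # True:  P = 1/6 and 1/2 * 1/3 = 1/6
print(independent(low, high))  # False: disjoint, so P = 0 but 1/3 * 1/3 = 1/9
```

The disjoint pair fails the product rule because both events have positive probability while their intersection is empty.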

Example: Breast Cancer Screening

Let \(A\) be the event that a woman has breast cancer, so that \(P(A)\) is the prevalence in the population for a given age group. Say \(P(A)=0.01\) for a 40-year-old woman.

Let \(B\) be the event that a screening mammogram is positive.

Once a person has a positive mammogram, our mental estimate of the probability she has breast cancer, now \(P(A \mid B)\), has increased. How much should it increase? Are we certain she has cancer, i.e. \(P(\text{cancer} \mid \text{mammo positive})=1\), or is there some chance the test is wrong?

Sensitivity and Specificity

A = has cancer, B = mammogram positive

A diagnostic test like a mammogram is often characterized by its quality – we want a test to have good sensitivity (picking up cancer when a person really has it) and specificity (ruling out cancer when a person is cancer-free).

Sensitivity is \(P(B|A)\), and specificity is \(P(B^c \mid A^c)\).

A typical screening mammogram has sensitivity of 85% and specificity of 90%.


Bayes’ Theorem

Bayes’ theorem gives us a formal way to update our beliefs based on new information. It says \[P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)}=\frac{P(A \cap B)}{P(B)}\].

In this example, a 40-year-old woman with a positive screening mammogram may wish to know her chances of having cancer. Several papers have shown that even doctors tend to strongly overestimate her chances of having cancer.

We’ll consider two ways to solve this problem: one way using Bayes’ formula directly, and another based on a “hypothetical 10,000” table, which applies known probabilities to a hypothetical population of 10,000 40-year-old women.


Hypothetical 10,000

We really are still using Bayes’ theorem here, but it is hidden behind the scenes.

Here’s what we know.

  1. The prevalence of breast cancer among 40-year-old women is 1% or 0.01.
  2. The sensitivity of a screening mammogram for diagnosing cancer is 85% or 0.85.
  3. The specificity is 90% or 0.90.

Let’s construct a 2x2 table comparing true cancer status and mammogram results in our hypothetical population of 10,000 women.

|         | Cancer | No Cancer | Total |
|---------|--------|-----------|-------|
| Mammo + |        |           |       |
| Mammo - |        |           |       |
| Total   |        |           | 10000 |

Hypothetical Population

  1. The prevalence of breast cancer among 40-year-old women is 1% or 0.01.
  2. The sensitivity of a screening mammogram for diagnosing cancer is 85% or 0.85.
  3. The specificity is 90% or 0.90.

Item 1 says the prevalence in this group is 1%, so then we expect to have \(10000\times0.01=100\) cases and \(10000\times0.99=9900\) cancer-free women.

|         | Cancer | No Cancer | Total |
|---------|--------|-----------|-------|
| Mammo + |        |           |       |
| Mammo - |        |           |       |
| Total   | 100    | 9900      | 10000 |

Hypothetical Population

  2. The sensitivity of a screening mammogram for diagnosing cancer is 85% or 0.85.

  3. The specificity is 90% or 0.90.

Item 2 gives the sensitivity, so \(P(\text{mammo +} \mid \text{cancer})=0.85\).

Thus in the group of 100 women with cancer, the mammogram should pick up \(100\times0.85=85\) of them, and miss the remaining \(100-85=15\).

|         | Cancer | No Cancer | Total |
|---------|--------|-----------|-------|
| Mammo + | 85     |           |       |
| Mammo - | 15     |           |       |
| Total   | 100    | 9900      | 10000 |

Hypothetical Population

  3. The specificity is 90% or 0.90.

Item 3 gives the specificity, so \(P(\text{mammo -} \mid \text{no cancer})=0.90\).

Thus in the group of 9900 women without cancer, the mammogram should correctly identify \(9900\times0.90=8910\) of them as being cancer-free, and it will mistakenly identify \(9900-8910=990\) as having cancer (false positives).

|         | Cancer | No Cancer | Total |
|---------|--------|-----------|-------|
| Mammo + | 85     | 990       |       |
| Mammo - | 15     | 8910      |       |
| Total   | 100    | 9900      | 10000 |

Hypothetical Population

Now we complete the table by filling in the row totals.

|         | Cancer | No Cancer | Total |
|---------|--------|-----------|-------|
| Mammo + | 85     | 990       | 1075  |
| Mammo - | 15     | 8910      | 8925  |
| Total   | 100    | 9900      | 10000 |

At this point, it’s easy to calculate the conditional probability of cancer given a positive mammogram as \(\frac{85}{1075}\approx 0.079\).
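The table-building steps can be sketched as a small function. The function name, dictionary keys, and use of exact fractions are illustrative choices rather than anything prescribed by the slides.

```python
from fractions import Fraction

def hypothetical_table(prevalence, sensitivity, specificity, n=10_000):
    """Fill in the 2x2 table of expected counts for a hypothetical population."""
    cancer = n * prevalence                # column total: women with cancer
    no_cancer = n - cancer                 # column total: cancer-free women
    true_pos = cancer * sensitivity        # cancer, mammo +
    false_neg = cancer - true_pos          # cancer, mammo -
    true_neg = no_cancer * specificity     # no cancer, mammo -
    false_pos = no_cancer - true_neg       # no cancer, mammo +
    return {
        "true_pos": true_pos, "false_pos": false_pos,
        "false_neg": false_neg, "true_neg": true_neg,
        "mammo_pos_total": true_pos + false_pos,
        "mammo_neg_total": false_neg + true_neg,
    }

# Exact decimal inputs avoid floating-point rounding in the counts.
table = hypothetical_table(Fraction("0.01"), Fraction("0.85"), Fraction("0.90"))

# P(cancer | mammo +): true positives out of all positive mammograms.
ppv = table["true_pos"] / table["mammo_pos_total"]

print(table["true_pos"], table["false_pos"], table["mammo_pos_total"])  # 85 990 1075
print(round(float(ppv), 3))                                             # 0.079
```

Conditioning on a positive mammogram amounts to restricting attention to the "Mammo +" row of the table.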


Bayes’ Theorem in Action

Alternatively, we could just use Bayes’ Theorem directly.

  • Baseline probability of cancer \(P(A)=0.01\) (prevalence)

  • She wants to know \(P(A \mid B)\), or her chances of having cancer given that her mammogram is positive (also called positive predictive value).

  • Bayes’ Theorem: \(P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)}\).

  • Sensitivity is \(P(B \mid A)=0.85\).


Bayes’ Theorem in Action

  • How do we get \(P(B)\)? We can get this using the law of total probability: \(P(B)=P(B \mid A)P(A) + P(B \mid A^c)P(A^c)\).

    • We can get \(P(B \mid A^c)\) using the specificity \(P(B^c \mid A^c)=0.90\) and the fact that \(P(B^c \mid A^c)+P(B \mid A^c)=1\). So \(P(B \mid A^c)=1-0.9=0.1\).

    • Then \(P(B)=P(B \mid A)P(A) + P(B \mid A^c)P(A^c)=0.85\times0.01+0.1\times0.99=0.1075\)

  • Then \(P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)}=\frac{0.85\times0.01}{0.1075}\approx 0.079\).
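The direct calculation above can be written as a short function. This is a minimal sketch; the function and argument names are assumptions made for illustration.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(A | B) via Bayes' theorem, where B is a positive test result."""
    p_b_given_a = sensitivity          # P(B | A)
    p_b_given_not_a = 1 - specificity  # P(B | A^c), the false positive rate
    # Law of total probability for the denominator P(B).
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b

ppv = posterior_given_positive(prior=0.01, sensitivity=0.85, specificity=0.90)
print(round(ppv, 3))  # 0.079
```

The result agrees with the hypothetical 10,000 table, since \(\frac{0.0085}{0.1075}=\frac{85}{1075}\).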


Bayes’ Theorem and Baseline Prevalence

Here, we can think of the 1% prevalence of breast cancer among 40-year-old women as our prior probability a woman has cancer, and 7.9% as the posterior probability she has cancer after we see the data that her screening mammogram is positive.

Cancer would be confirmed or ruled out by subsequent testing, such as a diagnostic mammogram, ultrasound, or biopsy. At the time of the subsequent testing, we’d have an updated baseline probability of 7.9% for our 40-year-old woman.
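To see how strongly the posterior depends on the baseline prevalence, the sketch below applies the same sensitivity (0.85) and specificity (0.90) to a few different priors. Only the 0.01 prior comes from the example; the other prior values are hypothetical and chosen purely for illustration.

```python
def posterior_given_positive(prior, sensitivity=0.85, specificity=0.90):
    """P(A | B) from Bayes' theorem for a positive test result."""
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_pos

# 0.01 is the prevalence from the example; the other priors are hypothetical.
priors = (0.001, 0.01, 0.10)
posteriors = [posterior_given_positive(p) for p in priors]

for prior, post in zip(priors, posteriors):
    print(f"prior = {prior:.3f}  ->  posterior = {post:.3f}")
```

With the test characteristics held fixed, a higher prior yields a higher posterior after a positive result.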


You Try It!

Most people who have a negative test result (e.g., mammogram looks good or COVID test negative) don’t worry any longer about whether they really do have the disease. Are they right not to worry? Suppose our 40-year-old woman with a baseline 1% breast cancer risk instead had a negative (all clear) mammogram. What is the updated probability she has breast cancer given this test result?


You Try It!

For more practice, go back to the vaccine hesitancy example.

  1. Calculate the percentage of each ethnic group in the population provided. We can view this as the marginal probability of each ethnicity in that population.

  2. Given that someone is vaccine hesitant, calculate the conditional probability that the person belongs to each ethnic group, in turn. This information would be useful in helping to target policies and outreach campaigns.

Thank you!