Estimation and Confidence Intervals

Week 10

Author

Maghfira Ramadhani

Published

Oct 22, 2025

Plan

In today’s lecture, we will learn about:

  1. Estimators and their desired properties
  2. Confidence intervals for the mean of normal random variables (finite sample)
    1. One-sided confidence intervals
    2. Two-sided confidence intervals
  3. Confidence intervals for the mean from a large sample

Textbook Reference: JA 14


Estimator

| Quantity           | Estimand \(\theta\)             | Estimator \(\hat{\theta}_X\)          | Estimate \(\hat{\theta}_x\)           |
|--------------------|---------------------------------|---------------------------------------|---------------------------------------|
| Mean               | \(\mu_X\)                       | \(\bar{X}\)                           | \(\bar{x}\)                           |
| Variance           | \(\sigma_X^2\)                  | \(s_X^2\)                             | \(s_x^2\)                             |
| Standard Deviation | \(\sigma_X\)                    | \(s_X\)                               | \(s_x\)                               |
| Median             | \(\tau_{X,0.5}\)                | \(\tilde{X}_{0.5}\)                   | \(\tilde{x}_{0.5}\)                   |
| Quantile           | \(\tau_{X,q}\)                  | \(\tilde{X}_{q}\)                     | \(\tilde{x}_{q}\)                     |
| IQR                | \(\tau_{X,0.75}-\tau_{X,0.25}\) | \(\tilde{X}_{0.75}-\tilde{X}_{0.25}\) | \(\tilde{x}_{0.75}-\tilde{x}_{0.25}\) |
| Correlation        | \(\rho_{XY}\)                   | \(r_{XY}\)                            | \(r_{xy}\)                            |
  • Estimand = the parameter or quantity of interest
  • Estimator = a random variable, defined before we observe our sample
  • Estimate = the realization of the estimator after the sample is observed, i.e., the calculated statistic (computed in the sketch below)
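
As a concrete illustration (my own sketch, not from the textbook), the following Python snippet computes each estimate in the last column of the table from one observed sample; the sample itself and the variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10, scale=2, size=50)   # one observed sample of X
y = 0.5 * x + rng.normal(size=50)          # a second variable, to illustrate correlation

x_bar = np.mean(x)                                    # estimate of the mean mu_X
s2_x  = np.var(x, ddof=1)                             # estimate of the variance sigma_X^2 (n-1 divisor)
s_x   = np.std(x, ddof=1)                             # estimate of the standard deviation sigma_X
med_x = np.median(x)                                  # estimate of the median tau_{X,0.5}
iqr_x = np.quantile(x, 0.75) - np.quantile(x, 0.25)   # estimate of the IQR
r_xy  = np.corrcoef(x, y)[0, 1]                       # estimate of the correlation rho_XY

print(x_bar, s2_x, s_x, med_x, iqr_x, r_xy)
```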

Property of estimator

  1. Unbiased: \(E(\hat{\theta}_X)=\theta\) for any sample size \(n\)

  2. Consistent: \(\hat{\theta}_X\underset{p}{\rightarrow} \theta \quad \text{ as } n\rightarrow\infty.\)

  3. Asymptotically normal: \(\hat{\theta}_X\overset{a}{\sim} N(\theta,\frac{V}{n}) \quad \text{ as } n\rightarrow\infty.\)

  4. Efficient: an estimator is more efficient than another when its asymptotic variance is smaller.

We’ve discussed some examples of estimators:

  • The sample mean \(\bar{X}=\frac{1}{n}\sum_i X_i\) is an unbiased estimator of the population mean \(\mu_X=E(X)\)

  • The sample variance \(s_X^2=\frac{1}{n-1}\sum_i (X_i-\bar{X})^2\) is an unbiased estimator of the population variance \(\sigma_X^2=Var(X)=E((X-\mu_X)^2)\) (the simulation sketch below shows why the \(n-1\) divisor matters)
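
To make unbiasedness concrete, here is a minimal simulation sketch (my own illustration, not from the lecture): it draws many samples, computes the sample variance with the \(n-1\) divisor and with the naive \(n\) divisor, and compares their averages to the true \(\sigma^2\). All parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(123)
mu, sigma, n, reps = 10.0, 2.0, 8, 100_000

s2_unbiased = np.empty(reps)   # divisor n - 1
s2_naive = np.empty(reps)      # divisor n
for r in range(reps):
    x = rng.normal(mu, sigma, size=n)
    s2_unbiased[r] = np.var(x, ddof=1)
    s2_naive[r] = np.var(x, ddof=0)

print("true variance:     ", sigma**2)            # 4.0
print("mean of s^2 (n-1): ", s2_unbiased.mean())  # close to 4.0
print("mean of s^2 (n):   ", s2_naive.mean())     # close to 4.0 * (n-1)/n = 3.5
```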


Confidence intervals

  • Confidence interval: an interval of plausible values for a parameter/estimand \(\theta\), based on an estimator \(\hat{\theta}_X\).

  • A 95% confidence interval for a parameter is an interval constructed so that, before observing the sample, there is a 95% probability that the interval produced by the estimation procedure contains the parameter.


Confidence intervals: Normal

  • Assume \(X_i\overset{iid}{\sim} N(\mu,\sigma^2)\)

  • Goal: Estimate the confidence interval for \(\mu\) based on the sample mean estimator \(\bar{X}\)

  • We start from a result from a previous class: because the \(X_i\) are normal, the sample mean is exactly normal in finite samples, \[\bar{X}\sim N\left(\mu,\frac{\sigma^2}{n}\right)\quad\text{ or }\quad \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N\left(0,1\right).\]

  • Assuming we know \(\mu\) and \(\sigma\), the 95% probability interval for \(\bar{X}\) is \[\left(\mu-1.96\frac{\sigma}{\sqrt{n}},\mu+1.96\frac{\sigma}{\sqrt{n}}\right)\] where -1.96 and 1.96 are the 2.5% and 97.5% quantiles of the standard normal (see the sketch below).
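
A minimal sketch (my own, with made-up values \(\mu=10\), \(\sigma=2\), \(n=25\)) showing where \(\pm 1.96\) comes from and checking that \(\bar{X}\) lands inside this interval about 95% of the time:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n = 10.0, 2.0, 25

z = stats.norm.ppf(0.975)            # 97.5% quantile of N(0,1), roughly 1.96
lo = mu - z * sigma / np.sqrt(n)     # 95% probability interval for X-bar
hi = mu + z * sigma / np.sqrt(n)

x_bars = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)
print("interval:", (lo, hi))
print("share of X-bar inside:", np.mean((x_bars > lo) & (x_bars < hi)))  # about 0.95
```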


Confidence intervals: Normal

  • We now rearrange this probability statement so that we obtain an interval for \(\mu\)

  • The 95% probability interval for \(\mu\) is \[P\left(\bar{X}-1.96\frac{\sigma}{\sqrt{n}}< \mu<\bar{X}+1.96\frac{\sigma}{\sqrt{n}}\right)=0.95\] where -1.96 and 1.96 are the 2.5% and 97.5% quantiles of the standard normal.

  • Replacing the estimator with our estimate, the 95% confidence interval for \(\mu\) is \[P\left(\bar{x}-1.96\frac{\sigma}{\sqrt{n}}< \mu<\bar{x}+1.96\frac{\sigma}{\sqrt{n}}\right)=0.95\] with the same quantiles as above (see the sketch after this list).

  • Can we use this directly? We know \(n\), but do we know \(\sigma\)?
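
A minimal sketch (my own, with a made-up sample and \(\sigma=2\) treated as known) of the interval above; the next slides drop the assumption that \(\sigma\) is known.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
sigma, n = 2.0, 25                           # sigma assumed known here
x = rng.normal(loc=10, scale=sigma, size=n)  # made-up observed sample

z = stats.norm.ppf(0.975)                    # roughly 1.96
half = z * sigma / np.sqrt(n)
print(f"95% CI for mu (known sigma): ({x.mean() - half:.3f}, {x.mean() + half:.3f})")
```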


Confidence intervals: Normal, Unknown Variance

  • What if we use an estimator for \(\sigma\), i.e., our sample standard deviation \(s_X\)?

  • Going back to the previous result: \[\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N\left(0,1\right)\quad \text{ but }\quad\frac{\bar{X}-\mu}{s_X/\sqrt{n}}\sim \text{?}\]

  • We know that \(\bar{X}-\mu\) is normal, but what is the distribution of \(s_X\)?

  • Recall \(s_X^2=\frac{1}{n-1}\sum_i (X_i-\bar{X})^2\); what is the distribution of a sum of squared normals?


Confidence intervals: Normal, Unknown Variance

  • It turns out that when we have two independent random variables \(Y\) and \(Z\) such that \(Y\sim\chi_m^2\) and \(Z\sim N(0,1)\), the random variable \[W=\frac{Z}{\sqrt{Y/m}}\sim t_m\] where \(t_m\) is the \(t\)-distribution with \(m\) degrees of freedom

  • With some math (omitted), one can show that \(\frac{(n-1)s_X^2}{\sigma^2}\sim\chi_{n-1}^2\) independently of \(\bar{X}\); hence, if we assume \(X_i\overset{iid}{\sim} N(\mu,\sigma^2)\), then \[\frac{\bar{X}-\mu}{s_X/\sqrt{n}}\sim t_{n-1}\] (the simulation sketch below checks this numerically)
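
As a numerical check (my own sketch, not part of the lecture), the simulation below studentizes the sample mean with \(s_X\) and compares the empirical 97.5% quantile against \(t_{n-1}\) and \(N(0,1)\); the sample size and parameters are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mu, sigma, n, reps = 10.0, 2.0, 8, 200_000

x = rng.normal(mu, sigma, size=(reps, n))
t_stat = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

# The empirical quantile should match t_{n-1}, not the standard normal
print("empirical 97.5% quantile:", np.quantile(t_stat, 0.975))
print("t_{n-1} 97.5% quantile:  ", stats.t.ppf(0.975, df=n - 1))   # about 2.36
print("N(0,1) 97.5% quantile:   ", stats.norm.ppf(0.975))          # about 1.96
```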


Confidence intervals: Normal, Unknown Variance

The \(t\)-distribution approximates the normal for large \(n\).

Confidence intervals: Normal, Unknown Variance

  • The 95% probability interval for \(\mu\) is \[P\left(\bar{X}-t_{n-1,0.025}\frac{s_X}{\sqrt{n}}< \mu<\bar{X}+t_{n-1,0.025}\frac{s_X}{\sqrt{n}}\right)=0.95\] where \(t_{n-1,0.025}\) is the 97.5% (\(1-0.025\)) quantile of the \(t\)-distribution with \(n-1\) degrees of freedom

Confidence intervals: Normal, Unknown Variance

  • Now we can plug in our estimates to get the confidence interval.

  • The 95% confidence interval for \(\mu\) is \[P\left(\bar{x}-t_{n-1,0.025}\frac{s_x}{\sqrt{n}}< \mu<\bar{x}+t_{n-1,0.025}\frac{s_x}{\sqrt{n}}\right)=0.95\] where \(t_{n-1,0.025}\) is the 97.5% (\(1-0.025\)) quantile of the \(t\)-distribution with \(n-1\) degrees of freedom

  • More generally, the (1-\(\alpha\)) confidence interval for \(\mu\) is \[P\left(\bar{x}-t_{n-1,\alpha/2}\frac{s_x}{\sqrt{n}}< \mu<\bar{x}+t_{n-1,\alpha/2}\frac{s_x}{\sqrt{n}}\right)=1-\alpha\] where \(t_{n-1,\alpha/2}\) (a.k.a. the critical value) is the \(1-\alpha/2\) quantile of the \(t\)-distribution with \(n-1\) degrees of freedom (see the sketch below)
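
As a minimal sketch (my own, not the lecture's worked example), here is a small helper that computes this \((1-\alpha)\) \(t\)-based confidence interval from a sample; the function name `t_confidence_interval` and the sample values are made up.

```python
import numpy as np
from scipy import stats

def t_confidence_interval(x, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for the mean, using t_{n-1} critical values."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x_bar = x.mean()
    s_x = x.std(ddof=1)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)   # critical value t_{n-1, alpha/2}
    half_width = t_crit * s_x / np.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Example with made-up data
rng = np.random.default_rng(42)
sample = rng.normal(loc=10, scale=2, size=8)
print(t_confidence_interval(sample, alpha=0.05))
```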


Confidence intervals: Intuition

Simulation of random samples of size \(n=8\) from \(N(10,4)\)
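
As a rough sketch of the kind of simulation described in the caption (my own; the number of replications and the use of 95% \(t\)-based intervals are assumptions), one can draw many samples of size 8 from \(N(10,4)\), build a confidence interval for each, and count how often it covers the true mean \(\mu=10\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2025)
mu, sigma, n, reps = 10.0, 2.0, 8, 10_000   # N(10, 4): variance 4, so sigma = 2

t_crit = stats.t.ppf(0.975, df=n - 1)
covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)
    half = t_crit * x.std(ddof=1) / np.sqrt(n)
    if x.mean() - half < mu < x.mean() + half:
        covered += 1

print("coverage:", covered / reps)   # should be close to 0.95
```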


Confidence intervals: One sided

  • What if we’re interested in a one-sided confidence interval for \(\mu\)?

  • Using the same method as before, we can get the one-sided probability intervals for \(\mu\): \[P\left(\mu>\bar{X}-t_{n-1,\alpha}\frac{s_X}{\sqrt{n}}\right)=1-\alpha\] and similarly \[P\left(\mu<\bar{X}+t_{n-1,\alpha}\frac{s_X}{\sqrt{n}}\right)=1-\alpha\] where \(t_{n-1,\alpha}\) is the \(1-\alpha\) quantile of the \(t\)-distribution with \(n-1\) degrees of freedom


Confidence intervals: One sided

  • The associated \(1-\alpha\) confidence interval for \(\mu\) is given by \[\left(\bar{x}-t_{n-1,\alpha}\frac{s_x}{\sqrt{n}},\infty\right)\] and similarly \[\left(-\infty,\bar{x}+t_{n-1,\alpha}\frac{s_x}{\sqrt{n}}\right)\]
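
A minimal sketch (my own, with made-up data) of the one-sided bounds, using the \(1-\alpha\) rather than the \(1-\alpha/2\) quantile:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=10, scale=2, size=8)   # made-up sample
n, alpha = x.size, 0.05

t_crit = stats.t.ppf(1 - alpha, df=n - 1)              # note: alpha, not alpha/2
lower = x.mean() - t_crit * x.std(ddof=1) / np.sqrt(n)
upper = x.mean() + t_crit * x.std(ddof=1) / np.sqrt(n)

print(f"95% one-sided CIs: ({lower:.3f}, inf) and (-inf, {upper:.3f})")
```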

Confidence intervals: Large \(n\)

  • Leveraging the asymptotic normality of the sample mean in large samples, we can use standard normal quantiles instead of \(t\)-distribution quantiles.
  • Two-sided confidence interval for \(\mu\): the (1-\(\alpha\)) confidence interval for \(\mu\) is \[P\left(\bar{x}-z_{\alpha/2}\frac{s_x}{\sqrt{n}}< \mu<\bar{x}+z_{\alpha/2}\frac{s_x}{\sqrt{n}}\right)=1-\alpha\] where \(z_{\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard normal (see the sketch after this list)
  • One-sided confidence interval for \(\mu\): the \(1-\alpha\) confidence interval for \(\mu\) is given by \[\left(\bar{x}-z_{\alpha}\frac{s_x}{\sqrt{n}},\infty\right)\quad\text{and}\quad\left(-\infty,\bar{x}+z_{\alpha}\frac{s_x}{\sqrt{n}}\right)\] where \(z_{\alpha}\) is the \(1-\alpha\) quantile of the standard normal
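
A minimal sketch (my own, with a made-up large sample) of the large-sample two-sided interval, swapping `stats.t.ppf` for `stats.norm.ppf`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=10, scale=2, size=500)   # made-up large sample
n, alpha = x.size, 0.05

z_crit = stats.norm.ppf(1 - alpha / 2)      # standard normal critical value, roughly 1.96
half = z_crit * x.std(ddof=1) / np.sqrt(n)
print(f"95% large-sample CI: ({x.mean() - half:.3f}, {x.mean() + half:.3f})")
```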

Up next

  • Research topics due tonight
  • HW4 assigned tomorrow
  • Hypothesis testing next week