Math rules
This page contains a summary of the mathematical rules we’ll use in this course.
Counting Methods
Permutations
The number of distinct orderings of \(k\) items selected without replacement from a collection of \(n\) different items \((0\leq k \leq n)\) is \[P_{n,k}=\frac{n!}{(n-k)!}.\]
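As a quick check of the formula, the Python sketch below computes \(P_{n,k}\) from factorials and compares it with a brute-force count of ordered selections and with the standard library's `math.perm` (Python 3.8+). The helper name `num_permutations` is ours, chosen for illustration.

```python
import math
from itertools import permutations

def num_permutations(n: int, k: int) -> int:
    """P_{n,k} = n! / (n-k)!: ordered selections of k items from n distinct items."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return math.factorial(n) // math.factorial(n - k)

# Example: ordered selections of 3 items from 5 distinct items.
n, k = 5, 3
formula = num_permutations(n, k)
# Brute force: enumerate every ordered selection without replacement and count them.
brute = sum(1 for _ in permutations(range(n), k))
print(formula, brute, math.perm(n, k))  # 60 60 60
```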
Combinations
The number of distinct subsets of size \(k\) that can be chosen from a set of size \(n\) \((0\leq k \leq n)\) is \[C_{n,k}=\frac{P_{n,k}}{k!}=\frac{n!}{k!(n-k)!}.\]
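Dividing by \(k!\) removes the \(k!\) orderings of each subset. A minimal Python sketch (helper name `num_combinations` is ours, for illustration) checks the formula against a brute-force count and the standard library's `math.comb`:

```python
import math
from itertools import combinations

def num_combinations(n: int, k: int) -> int:
    """C_{n,k} = n! / (k! (n-k)!): number of k-element subsets of an n-element set."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

n, k = 5, 3
# Brute force: itertools.combinations yields each unordered subset exactly once.
brute = sum(1 for _ in combinations(range(n), k))
print(num_combinations(n, k), brute, math.comb(n, k))  # 10 10 10
```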
Conditional Probability
Multiplication Rule for Conditional Probability
Let \(A\) and \(B\) be events. If \(P(B)>0\), then \[P(A\cap B)=P(B)P(A\mid B).\] If \(P(A)>0\), then \[P(A\cap B)=P(A)P(B\mid A).\]
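To illustrate the multiplication rule, the sketch below uses a hypothetical example that is not in these notes: draw two cards without replacement from a 52-card deck with 4 aces, and let \(B\) be "the first card is an ace" and \(A\) be "the second card is an ace". It compares \(P(A\cap B)\) counted by brute-force enumeration with \(P(B)P(A\mid B)\), using exact fractions.

```python
from fractions import Fraction
from itertools import permutations

# Toy deck: 4 aces ("A") and 48 other cards ("x"); cards are distinct by position.
deck = ["A"] * 4 + ["x"] * 48

# Enumerate all equally likely ordered draws of two cards without replacement.
draws = list(permutations(range(len(deck)), 2))
both_aces = sum(1 for i, j in draws if deck[i] == "A" and deck[j] == "A")
p_joint = Fraction(both_aces, len(draws))

# Multiplication rule: P(A ∩ B) = P(B) P(A | B)
p_B = Fraction(4, 52)          # first card is an ace
p_A_given_B = Fraction(3, 51)  # second card is an ace, given the first one was
print(p_joint, p_B * p_A_given_B)  # 1/221 1/221
```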
(A more general version is available as Theorem 2.1.2 (SDG))
Law of Total Probability
Suppose that events \(B_1,\ldots,B_k\) form a partition of the space \(\mathcal{S}\) and \(P(B_j)>0\) for \(j=1,\ldots,k.\) Then, for every event \(A\) in \(\mathcal{S}\), \[P(A)=\sum_{j=1}^kP(B_j)P(A\mid B_j).\]
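A short numerical sketch of the law of total probability, using a hypothetical example with made-up numbers: three machines produce 50%, 30% and 20% of all items (the partition \(B_1, B_2, B_3\)), with defect rates 1%, 2% and 3% (the values \(P(A\mid B_j)\)). The helper name `total_probability` is ours.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum_j P(B_j) P(A | B_j) for a partition B_1, ..., B_k."""
    if sum(priors) != 1:
        raise ValueError("P(B_1), ..., P(B_k) must sum to 1 for a partition")
    return sum(p * l for p, l in zip(priors, likelihoods))

priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]        # P(B_j)
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]  # P(A | B_j)
print(total_probability(priors, likelihoods))  # 17/1000
```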
Bayes’ Theorem
Let the events \(B_1,\ldots,B_k\) form a partition of the space \(\mathcal{S}\) such that \(P(B_j)>0\) for \(j=1,\ldots,k,\) and let \(A\) be an event such that \(P(A)>0.\) Then, for \(i=1,\ldots,k,\) \[P(B_i\mid A)=\frac{P(B_i)P(A\mid B_i)}{\sum_{j=1}^k P(B_j)P(A\mid B_j)}.\]
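The denominator of Bayes' theorem is exactly the law of total probability. The sketch below (helper name `bayes_posterior` is ours) assumes a hypothetical example: three machines produce 50%, 30% and 20% of all items (\(B_1, B_2, B_3\)) with defect rates 1%, 2% and 3%, and \(A\) is "a randomly chosen item is defective". It returns \(P(B_i\mid A)\) for each machine.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return [P(B_i | A) for each i] using Bayes' theorem."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B_i) P(A | B_i)
    p_A = sum(joint)                                       # law of total probability
    if p_A == 0:
        raise ValueError("Bayes' theorem requires P(A) > 0")
    return [j / p_A for j in joint]

priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]
posterior = bayes_posterior(priors, likelihoods)
print(posterior)  # [Fraction(5, 17), Fraction(6, 17), Fraction(6, 17)]
```

Note that the posteriors sum to 1, as they must for a partition.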
Random Variables and Distributions
Probability Density Function
If \(X\) has a continuous distribution, a function \(f\) is called the probability density function (p.d.f.) of \(X\) if, for every real number \(a\),
\[P(X\geq a)=\int_a^{\infty} f(x)dx,\] or, equivalently, for every real number \(b\),
\[P(X\leq b)=\int_{-\infty}^b f(x)dx.\] Every p.d.f. \(f\) must satisfy:
- \(f(x)\geq 0\) for all \(x\), and 
- \(\int_{-\infty}^{\infty}f(x)dx=1\). 
The closure of the set \(\{x:f(x)>0\}\) is called the support of \(X\).
These ideas may be easier to understand with the graphical illustrations given in class.
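The two p.d.f. conditions and the tail-probability integral can be checked numerically. The sketch below assumes an example not taken from these notes: an exponential distribution with rate 2, whose p.d.f. is \(f(x)=2e^{-2x}\) for \(x\geq 0\) and 0 otherwise (so its support is \([0,\infty)\) and \(P(X\geq a)=e^{-2a}\)). The integrals use a simple midpoint rule on a finite interval, which is an approximation.

```python
import math

def exp_pdf(x: float, rate: float = 2.0) -> float:
    """Exponential p.d.f. (assumed example): rate * e^(-rate x) for x >= 0, else 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def integrate(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# f is 0 for x < 0 and negligible beyond x = 30 when rate = 2,
# so the finite interval [0, 30] stands in for (-inf, inf).
total = integrate(exp_pdf, 0.0, 30.0)   # should be close to 1
a = 1.0
tail = integrate(exp_pdf, a, 30.0)      # P(X >= a), should be close to e^(-2a)
print(total, tail, math.exp(-2.0 * a))
```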
Cumulative Distribution Function
The cumulative distribution function (c.d.f.) \(F\) of a random variable \(X\) is the function
\[F(x)=P(X\leq x) \text{ for } -\infty<x<\infty.\]
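The c.d.f. is defined for every real \(x\), not only at values \(X\) can take. A minimal sketch, assuming a fair six-sided die as the example (not from these notes), evaluates \(F(x)=P(X\leq x)\) exactly:

```python
from fractions import Fraction

def die_cdf(x: float) -> Fraction:
    """F(x) = P(X <= x) for X the outcome of a fair six-sided die (assumed example)."""
    # Count the faces 1..6 that are <= x; each face has probability 1/6.
    return Fraction(sum(1 for face in range(1, 7) if face <= x), 6)

for x in [-1, 0.5, 1, 2.7, 6, 10]:
    print(x, die_cdf(x))
```

Here \(F\) is a nondecreasing step function that is 0 below 1, jumps by \(1/6\) at each face, and equals 1 from 6 onward.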
The c.d.f. of a Continuous Distribution
Let \(X\) have a continuous distribution with p.d.f. \(f(x)\) and c.d.f. \(F(x)\). Then \(F\) is continuous at every \(x\), \[F(x)=\int_{-\infty}^x f(t)dt,\] and \[\frac{dF(x)}{dx}=f(x)\] at every \(x\) at which \(f\) is continuous.
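Both relations can be checked numerically. The sketch assumes an exponential distribution with rate 2 as the example (not from these notes), for which \(F(x)=1-e^{-2x}\) when \(x\geq 0\). It integrates the p.d.f. with a midpoint rule and differentiates the c.d.f. with a central difference; both are approximations.

```python
import math

RATE = 2.0  # assumed example: exponential distribution with rate 2

def pdf(x: float) -> float:
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def cdf_numeric(x: float, n: int = 20_000) -> float:
    """F(x) as the integral of f from -inf to x; f is 0 below 0, so integrate over [0, x]."""
    if x <= 0:
        return 0.0
    h = x / n
    return h * sum(pdf((i + 0.5) * h) for i in range(n))  # midpoint rule

def cdf_exact(x: float) -> float:
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

x, eps = 0.8, 1e-5
deriv = (cdf_exact(x + eps) - cdf_exact(x - eps)) / (2 * eps)  # dF/dx by central difference
print(cdf_numeric(x), cdf_exact(x))
print(deriv, pdf(x))
```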
Quantile Function
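Informally, the quantile function is the inverse of a c.d.f.: given a probability \(p\), it returns the value \(x\) such that \(P(X\leq x)=p\).
More formally, let \(X\) have c.d.f. \(F\). For each \(p\) strictly between 0 and 1, the \(p\) quantile of \(X\) is the smallest value \(x\) such that \(F(x)\geq p\), written \[F^{-1}(p)=\min\{x: F(x)\geq p\}.\] The function \(F^{-1}\), defined on the open interval \((0,1)\), is called the quantile function of \(X\).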
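A sketch of the quantile function in Python, using the standard definition \(F^{-1}(p)\) = smallest \(x\) with \(F(x)\geq p\). Two assumed examples (not from these notes): a fair die, where \(F\) is a step function and cannot be inverted literally, and an exponential distribution with rate 2, where \(F(x)=1-e^{-2x}\) is strictly increasing and can be solved in closed form. Helper names are ours.

```python
import math
from fractions import Fraction

def discrete_quantile(values, probs, p):
    """Smallest support point x with F(x) >= p, for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    cumulative = 0
    for x, prob in sorted(zip(values, probs)):
        cumulative += prob  # F(x) at this support point
        if cumulative >= p:
            return x
    raise ValueError("probabilities must sum to 1")

def exponential_quantile(p, rate=2.0):
    """Solve F(x) = 1 - exp(-rate x) = p for x (F is strictly increasing on [0, inf))."""
    return -math.log(1 - p) / rate

faces, probs = range(1, 7), [Fraction(1, 6)] * 6  # a fair die; exact fractions avoid rounding
print(discrete_quantile(faces, probs, Fraction(1, 2)))    # 3
print(discrete_quantile(faces, probs, Fraction(51, 100)))  # 4
print(exponential_quantile(0.5))  # the median, log(2)/2
```

For the die, \(F(3)=1/2\) exactly, so the 0.5 quantile is 3, while any \(p\) slightly above \(1/2\) jumps to 4.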
Bivariate Distribution
Marginal Distribution
Expectation
Expected value of random variable \(X\)
The expected value of a random variable \(X\) is a weighted average: the mean of the possible values of \(X\), each weighted by its probability.
Let \(f_X(x)\) be the p.d.f. of \(X\) if \(X\) is continuous, or its probability function \(P(X=x)\) if \(X\) is discrete. If \(X\) is continuous, then
\[ E(X) = \int_{-\infty}^{\infty}xf_X(x)dx \]
If \(X\) is discrete, then, summing over all possible values \(x\) of \(X\),
\[ E(X) = \sum_{x}xf_X(x) = \sum_{x}xP(X = x) \]
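Both formulas can be evaluated directly. The sketch assumes two examples not taken from these notes: a fair die (discrete, exact arithmetic) and an exponential distribution with rate 2 (continuous, \(E(X)=1/2\)), whose integral is approximated with a midpoint rule on a finite interval.

```python
import math
from fractions import Fraction

# Discrete case: a fair die, E(X) = sum over x of x P(X = x).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
e_discrete = sum(x * p for x, p in pmf.items())

# Continuous case: exponential with rate 2 (assumed example), E(X) = integral of x f(x) dx.
# f(x) = 0 for x < 0 and x f(x) is negligible beyond x = 30, so integrate over [0, 30].
rate, hi, n = 2.0, 30.0, 100_000
h = hi / n

def integrand(x: float) -> float:
    return x * rate * math.exp(-rate * x)

e_continuous = h * sum(integrand((i + 0.5) * h) for i in range(n))  # midpoint rule

print(e_discrete)    # 7/2
print(e_continuous)  # close to 1 / rate = 0.5
```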
Expected value of vector \(\mathbf{z}\)
Let \(\mathbf{z} = \begin{bmatrix}z_1 \\ \vdots \\z_p\end{bmatrix}\) be a \(p \times 1\) vector of random variables.
Then \(E(\mathbf{z}) = E\begin{bmatrix}z_1 \\ \vdots \\ z_p\end{bmatrix} = \begin{bmatrix}E(z_1) \\ \vdots \\ E(z_p)\end{bmatrix}\)
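The expectation of a random vector is taken entry by entry, using each component's own distribution. A minimal sketch, assuming a hypothetical three-point joint distribution for a \(2\times 1\) vector \(\mathbf{z}\), stored as (probability, outcome) pairs:

```python
from fractions import Fraction

# Hypothetical joint distribution for z = (z_1, z_2): (probability, outcome) pairs.
dist = [
    (Fraction(1, 4), (0, 1)),
    (Fraction(1, 4), (1, 3)),
    (Fraction(1, 2), (2, 2)),
]

def expected_vector(dist):
    """E(z) entry by entry: the i-th entry is E(z_i) = sum over outcomes of P * z_i."""
    p = len(dist[0][1])
    return [sum(prob * z[i] for prob, z in dist) for i in range(p)]

print(expected_vector(dist))  # [Fraction(5, 4), Fraction(2, 1)]
```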
Expected value of vector \(\mathbf{Az}\)
Let \(\mathbf{A}\) be an \(n \times p\) matrix of constants and \(\mathbf{z}\) a \(p \times 1\) vector of random variables. Then
\[ E(\mathbf{Az}) = \mathbf{A}E(\mathbf{z}) \]
Expected value of \(\mathbf{Az} + \mathbf{C}\)
Let \(\mathbf{A}\) be an \(n \times p\) matrix of constants, \(\mathbf{C}\) an \(n \times 1\) vector of constants, and \(\mathbf{z}\) a \(p \times 1\) vector of random variables. Then
\[ E(\mathbf{Az} + \mathbf{C}) = E(\mathbf{Az}) + E(\mathbf{C}) = \mathbf{A}E(\mathbf{z}) + \mathbf{C} \]
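Linearity can be checked exactly on a small example. The sketch assumes a hypothetical three-point distribution for a \(2\times 1\) vector \(\mathbf{z}\), a \(3\times 2\) matrix \(\mathbf{A}\) and a \(3\times 1\) vector \(\mathbf{C}\) (all made up for illustration). It computes \(E(\mathbf{Az}+\mathbf{C})\) by transforming every outcome first, then compares with \(\mathbf{A}E(\mathbf{z})+\mathbf{C}\). Setting \(\mathbf{C}\) to zero gives the rule for \(E(\mathbf{Az})\).

```python
from fractions import Fraction

def matvec(A, v):
    """Product of a matrix A (list of rows) with a vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def expect(dist):
    """Entry-wise expectation of a random vector given as (probability, outcome) pairs."""
    dim = len(dist[0][1])
    return [sum(prob * z[i] for prob, z in dist) for i in range(dim)]

# Hypothetical example: z is 2 x 1, A is 3 x 2, C is 3 x 1.
dist = [(Fraction(1, 4), (0, 1)), (Fraction(1, 4), (1, 3)), (Fraction(1, 2), (2, 2))]
A = [[1, 2], [0, -1], [3, 1]]
C = [5, 0, -2]

# Left side: transform every outcome to Az + C, then take the expectation.
lhs = expect([(prob, [y + c for y, c in zip(matvec(A, z), C)]) for prob, z in dist])
# Right side: A E(z) + C.
rhs = [y + c for y, c in zip(matvec(A, expect(dist)), C)]
print(lhs == rhs)  # True
```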
Expected value of \(\mathbf{AXA}^\mathsf{T}\)
Let \(\mathbf{A}\) be an \(n\times p\) matrix of constants and \(\mathbf{X}\) a \(p \times p\) matrix of random variables. Then
\[ E(\mathbf{AXA}^\mathsf{T}) = \mathbf{A}E(\mathbf{X})\mathbf{A}^\mathsf{T} \]
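The same entry-wise reasoning extends to random matrices. A sketch with exact fractions, assuming a hypothetical random \(2\times 2\) matrix \(\mathbf{X}\) that takes two values and a \(3\times 2\) matrix \(\mathbf{A}\); matrices are plain lists of rows so that only the standard library is needed.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def expect_matrix(dist):
    """Entry-wise expectation of a random matrix given as (probability, outcome) pairs."""
    rows, cols = len(dist[0][1]), len(dist[0][1][0])
    return [[sum(prob * X[i][j] for prob, X in dist) for j in range(cols)] for i in range(rows)]

# Hypothetical example: X is a random 2 x 2 matrix taking two values; A is 3 x 2.
dist = [
    (Fraction(1, 3), [[1, 0], [2, 1]]),
    (Fraction(2, 3), [[0, 4], [1, -1]]),
]
A = [[1, 2], [0, 1], [-1, 3]]

# Left side: E(A X A^T), averaging the transformed outcomes.
lhs = expect_matrix([(prob, matmul(matmul(A, X), transpose(A))) for prob, X in dist])
# Right side: A E(X) A^T.
rhs = matmul(matmul(A, expect_matrix(dist)), transpose(A))
print(lhs == rhs)  # True
```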
Variance
Variance of random variable \(X\)
The variance of a random variable \(X\) is a measure of the spread of a distribution about its mean.
\[ Var(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2 \]
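Both expressions give the same number. An exact check, assuming a fair die as the example (not from these notes):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # a fair die, as an example
mean = sum(x * p for x, p in pmf.items())

var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())    # E[(X - E(X))^2]
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2   # E(X^2) - [E(X)]^2
print(var_definition, var_shortcut)  # 35/12 35/12
```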
Variance of vector \(\mathbf{z}\)
Let \(\mathbf{z} = \begin{bmatrix}z_1 \\ \vdots \\z_p\end{bmatrix}\) be a \(p \times 1\) vector of random variables. Then
\[ Var(\mathbf{z}) = E[(\mathbf{z} - E(\mathbf{z}))(\mathbf{z} - E(\mathbf{z}))^\mathsf{T}] \]
This produces the \(p \times p\) variance-covariance matrix
\[Var(\mathbf{z}) = \begin{bmatrix}Var(z_1) & Cov(z_1, z_2) & \dots & Cov(z_1, z_p)\\ Cov(z_2, z_1) & Var(z_2) & \dots & Cov(z_2, z_p) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(z_p, z_1) & Cov(z_p, z_2) & \dots & Var(z_p)\end{bmatrix}\]
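Entry \((i,j)\) of \(E[(\mathbf{z}-E(\mathbf{z}))(\mathbf{z}-E(\mathbf{z}))^\mathsf{T}]\) is \(E[(z_i-E(z_i))(z_j-E(z_j))]=Cov(z_i,z_j)\), and the diagonal entries are \(Var(z_i)\). A sketch that builds the matrix from that definition, assuming a hypothetical three-point joint distribution for a \(2\times 1\) vector \(\mathbf{z}\):

```python
from fractions import Fraction

# Hypothetical joint distribution for z = (z_1, z_2): (probability, outcome) pairs.
dist = [(Fraction(1, 4), (0, 1)), (Fraction(1, 4), (1, 3)), (Fraction(1, 2), (2, 2))]

def covariance_matrix(dist):
    """Var(z) = E[(z - E(z))(z - E(z))^T], built entry by entry."""
    p = len(dist[0][1])
    mu = [sum(prob * z[i] for prob, z in dist) for i in range(p)]
    # Entry (i, j) is E[(z_i - mu_i)(z_j - mu_j)] = Cov(z_i, z_j); the diagonal gives Var(z_i).
    return [[sum(prob * (z[i] - mu[i]) * (z[j] - mu[j]) for prob, z in dist)
             for j in range(p)] for i in range(p)]

S = covariance_matrix(dist)
print(S)
```

The result is symmetric, since \(Cov(z_i,z_j)=Cov(z_j,z_i)\).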
Variance of \(\mathbf{Az}\)
Let \(\mathbf{A}\) be an \(n \times p\) matrix of constants and \(\mathbf{z}\) a \(p \times 1\) vector of random variables. Then
\[ \begin{aligned} Var(\mathbf{Az}) &= E[(\mathbf{Az} - E(\mathbf{Az}))(\mathbf{Az} - E(\mathbf{Az}))^\mathsf{T}] \\ & = \mathbf{A}Var(\mathbf{z})\mathbf{A}^\mathsf{T} \end{aligned} \]
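The identity can be verified exactly on a small example. The sketch assumes a hypothetical three-point distribution for a \(2\times 1\) vector \(\mathbf{z}\) and a \(3\times 2\) matrix \(\mathbf{A}\). It computes \(Var(\mathbf{Az})\) from the definition, using the distribution of \(\mathbf{Az}\), and compares it with \(\mathbf{A}Var(\mathbf{z})\mathbf{A}^\mathsf{T}\).

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def cov(dist):
    """Var(z) = E[(z - E(z))(z - E(z))^T] for (probability, outcome) pairs."""
    p = len(dist[0][1])
    mu = [sum(prob * z[i] for prob, z in dist) for i in range(p)]
    return [[sum(prob * (z[i] - mu[i]) * (z[j] - mu[j]) for prob, z in dist)
             for j in range(p)] for i in range(p)]

# Hypothetical example: z is 2 x 1 with a three-point distribution; A is 3 x 2.
dist = [(Fraction(1, 4), (0, 1)), (Fraction(1, 4), (1, 3)), (Fraction(1, 2), (2, 2))]
A = [[1, 2], [0, -1], [3, 1]]

# Left side: the distribution of Az, then its covariance matrix from the definition.
Az_dist = [(prob, [sum(a * x for a, x in zip(row, z)) for row in A]) for prob, z in dist]
lhs = cov(Az_dist)
# Right side: A Var(z) A^T.
rhs = matmul(matmul(A, cov(dist)), transpose(A))
print(lhs == rhs)  # True
```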
Probability distributions
Multivariate normal distribution
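Let \(\mathbf{z}\) be a \(p \times 1\) vector of random variables that follows a multivariate normal distribution with mean vector \(\boldsymbol{\mu}\) and variance-covariance matrix \(\boldsymbol{\Sigma}\), where \(\boldsymbol{\Sigma}\) is positive definite (so that \(|\boldsymbol{\Sigma}|>0\) and \(\boldsymbol{\Sigma}^{-1}\) exists). Then the probability density function of \(\mathbf{z}\) is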
\[f(\mathbf{z}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\Big\{-\frac{1}{2}(\mathbf{z} - \boldsymbol{\mu})^\mathsf{T}\boldsymbol{\Sigma}^{-1}(\mathbf{z}- \boldsymbol{\mu})\Big\}\]
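To keep the sketch stdlib-only, it handles only \(p=2\), where \(|\boldsymbol{\Sigma}|\) and \(\boldsymbol{\Sigma}^{-1}\) have closed forms; general \(p\) needs a linear-algebra library (SciPy's `scipy.stats.multivariate_normal` is one option). As a sanity check, with a diagonal \(\boldsymbol{\Sigma}\) the density should factor into a product of univariate normal densities. The point, mean and covariance values are made up for illustration.

```python
import math

def mvn_pdf_2d(z, mu, Sigma):
    """Bivariate (p = 2) normal density using an explicit 2 x 2 determinant and inverse.

    Sigma must be symmetric positive definite (det > 0 and Sigma[0][0] > 0).
    """
    (a, b), (c, d) = Sigma
    det = a * d - b * c
    if det <= 0 or a <= 0:
        raise ValueError("Sigma must be positive definite")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    r = [z[0] - mu[0], z[1] - mu[1]]  # z - mu
    quad = sum(r[i] * inv[i][j] * r[j] for i in range(2) for j in range(2))
    p = 2
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (p / 2) * math.sqrt(det))

def normal_pdf(x, m, s2):
    """Univariate normal density with mean m and variance s2."""
    return math.exp(-(x - m) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

# With a diagonal Sigma the components are independent, so the joint density factorises.
z, mu = (0.3, -1.2), (0.0, -1.0)
diag = [[2.0, 0.0], [0.0, 0.5]]
print(mvn_pdf_2d(z, mu, diag), normal_pdf(0.3, 0.0, 2.0) * normal_pdf(-1.2, -1.0, 0.5))
```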
Linear transformation of normal random variable
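Suppose \(\mathbf{z}\) is a \(p \times 1\) multivariate normal random vector with mean \(\boldsymbol{\mu}\) and variance \(\boldsymbol{\Sigma}\), and let \(\mathbf{A}\) be an \(n \times p\) matrix of constants and \(\mathbf{B}\) an \(n \times 1\) vector of constants. A linear transformation of \(\mathbf{z}\) is also multivariate normal, such that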
\[ \mathbf{A}\mathbf{z} + \mathbf{B} \sim N(\mathbf{A}\boldsymbol{\mu} + \mathbf{B}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^\mathsf{T}) \]
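A Monte Carlo sketch of this result for \(p=n=2\), with made-up parameters: \(\boldsymbol{\mu}=(1,-1)^\mathsf{T}\), \(\boldsymbol{\Sigma}=\begin{bmatrix}1&0.5\\0.5&2\end{bmatrix}\), \(\mathbf{A}=\begin{bmatrix}1&1\\2&-1\end{bmatrix}\), \(\mathbf{B}=(0,3)^\mathsf{T}\), so the theory gives mean \(\mathbf{A}\boldsymbol{\mu}+\mathbf{B}=(0,6)^\mathsf{T}\) and variance \(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^\mathsf{T}=\begin{bmatrix}4&0.5\\0.5&4\end{bmatrix}\). Samples of \(\mathbf{z}\) are drawn as \(\boldsymbol{\mu}+\mathbf{L}\mathbf{e}\) with \(\mathbf{L}\mathbf{L}^\mathsf{T}=\boldsymbol{\Sigma}\) (a \(2\times 2\) Cholesky factor) and independent standard normal \(\mathbf{e}\). This checks only the mean and variance of \(\mathbf{Az}+\mathbf{B}\), not its normality.

```python
import math
import random

def sample_mvn_2d(mu, Sigma, rng):
    """Draw one z ~ N(mu, Sigma) for p = 2 via z = mu + L e, where L L^T = Sigma (Cholesky)."""
    (a, b), (_, d) = Sigma
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mu[0] + l11 * e1, mu[1] + l21 * e1 + l22 * e2)

# Hypothetical parameters: z ~ N(mu, Sigma), then y = A z + B.
mu, Sigma = (1.0, -1.0), [[1.0, 0.5], [0.5, 2.0]]
A, B = [[1.0, 1.0], [2.0, -1.0]], (0.0, 3.0)

rng = random.Random(0)  # fixed seed for reproducibility
ys = []
for _ in range(100_000):
    z = sample_mvn_2d(mu, Sigma, rng)
    ys.append(tuple(sum(A[i][k] * z[k] for k in range(2)) + B[i] for i in range(2)))

n = len(ys)
mean = [sum(y[i] for y in ys) / n for i in range(2)]
cov = [[sum((y[i] - mean[i]) * (y[j] - mean[j]) for y in ys) / (n - 1) for j in range(2)]
       for i in range(2)]

# Theory: A mu + B = [0, 6] and A Sigma A^T = [[4, 0.5], [0.5, 4]].
print(mean)
print(cov)
```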
