Economists use data to empirically test economic theory, or, going the other way, to infer economic behavior from the data.
One goal of statistics is to answer a research question by making inferences about a population based on data from one or more samples.
Population: the entire group we would like to make conclusions about (e.g., all people aged 15 and up in the US, all university students worldwide)
Sample: the specific group we have collected data from (e.g., a random sample of people aged 15 and up in sampled households, from sampled villages from sampled counties; university students at Georgia Tech)
The validity of our inferences depends on a variety of factors, including the representativeness of our sample
Question: How representative of worldwide students would a sample from Georgia Tech be?
Drawing Conclusions (Inference)
In order to draw principled conclusions from our data, we rely on a formal probabilistic framework that allows us to quantify uncertainty.
Statistical inference is built upon the foundation of probability theory.
Motivating example: Email campaign
The website Widgets.com has 3,000 total registered users and wants to test the effectiveness of two possible e-mail campaigns. E-mail A is sent to 300 users at random, e-mail B is sent to 300 users at random, and the other 2,400 users receive no e-mail. For each registered user, Widgets.com has the following information at some point (say, one week) after the e-mail campaigns:
\[
\text{Purchase}_i=\begin{cases}Y & \text{if user made a purchase in the last week}\\
N & \text{if user didn't make a purchase in the last week}
\end{cases}
\]
What do we mean when we say an e-mail is sent at random?
Motivating example: Email campaign
There are several ways. Here’s one way:
Draw 300 user ids out of the 3,000 user ids to receive email A (sampling without replacement)
Draw 300 user ids out of the remaining 2,700 user ids to receive email B (sampling without replacement)
The remaining 2,400 user ids receive no email
Motivating example: Email campaign
In practice, we can enlist R’s help to do this:
random_draw <- sample(1:3000, 600)        # Randomly pick 600 ids from 1 to 3000
emailA_recipient <- random_draw[1:300]    # Assign the first 300 draws to receive email A
emailB_recipient <- random_draw[301:600]  # Assign the next 300 draws to receive email B
noemail <- setdiff(1:3000, union(emailA_recipient, emailB_recipient))
cat(" First 5 user ids to receive email A:", head(emailA_recipient, 5))
First 5 user ids to receive email A: 2324 1374 2728 2964 1618
cat("\n First 5 user ids to receive email B:", head(emailB_recipient, 5))
First 5 user ids to receive email B: 2721 761 2156 504 1313
cat("\n First 5 user ids that receive no email:", head(noemail, 5))
First 5 user ids that receive no email: 3 6 7 8 9
Motivating example: Email campaign
Before the campaign, there are some known probabilities and some unknown probabilities
For each user, the known probabilities are:
Probability of receiving email A: 300/3000 or 10%,
Probability of receiving email B: 300/3000 or 10%,
Probability of receiving no email: 2400/3000 or 80%
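These known probabilities follow from the random assignment itself. As a sanity check, here is a Python sketch (the slides otherwise use R) that repeats the draw many times and tracks how often one fixed user, say user 1, lands in each group:

```python
import random

def assign_groups(n_users=3000, n_each=300, seed=None):
    """Randomly split user ids 1..n_users into email A, email B, and no-email groups."""
    rng = random.Random(seed)
    draw = rng.sample(range(1, n_users + 1), 2 * n_each)  # sampling without replacement
    return set(draw[:n_each]), set(draw[n_each:])         # (email A ids, email B ids)

# Estimate, by simulation, the probability that one fixed user (user 1)
# lands in each group; it should be close to 10% / 10% / 80%.
trials = 10_000
counts = {"A": 0, "B": 0, "none": 0}
for t in range(trials):
    group_a, group_b = assign_groups(seed=t)
    if 1 in group_a:
        counts["A"] += 1
    elif 1 in group_b:
        counts["B"] += 1
    else:
        counts["none"] += 1

print({k: round(v / trials, 2) for k, v in counts.items()})  # ≈ {'A': 0.1, 'B': 0.1, 'none': 0.8}
```

By symmetry the same frequencies hold for every user id, which is what makes the three assignment probabilities known in advance.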
What about the unknown probabilities?
For any user receiving email A, the probability that she makes a purchase is unknown
For any user receiving email B, the probability that she makes a purchase is unknown
For any user receiving no email, the probability that she makes a purchase is unknown
Motivating example: Email campaign
The goal of Widgets.com is to determine which campaign is more effective. For them, these unknown probabilities are of interest. They may want to know the answers to the following:
Is campaign A more (or less) effective than campaign B? In terms of probability, is the probability of purchase by an e-mail A recipient higher (or lower) than the probability of purchase by an e-mail B recipient?
Is campaign A more (or less) effective than no campaign?
Is campaign B more (or less) effective than no campaign?
Or maybe they’re more interested in the intensive margin:
Whether e-mail A recipients spend more than e-mail B recipients, and so on.
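For illustration only, here is a Python sketch of how these unknown probabilities would be estimated once outcome data are in. The purchase counts below are invented, not real Widgets.com data:

```python
# Hypothetical outcome counts one week after the campaign.  These numbers are
# made up purely to illustrate the estimation step.
outcomes = {
    "A":    {"Y": 45,  "N": 255},   # the 300 e-mail A recipients
    "B":    {"Y": 36,  "N": 264},   # the 300 e-mail B recipients
    "none": {"Y": 240, "N": 2160},  # the 2,400 users who received no e-mail
}

# Estimate each unknown purchase probability by the sample proportion of purchasers.
est = {g: c["Y"] / (c["Y"] + c["N"]) for g, c in outcomes.items()}
print(est)  # {'A': 0.15, 'B': 0.12, 'none': 0.1}
```

With real data, Widgets.com would then ask whether a gap like 0.15 versus 0.12 is larger than chance alone would produce, which is exactly the inference problem this course builds toward.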
Probability
The probability of an event tells us how likely an event is to occur, and it can take values from 0 to 1, inclusive.
It can be viewed as the proportion of times the event would occur if it could be observed an infinite number of times (frequentist interpretation)
It can also be viewed as our degree of belief that an event will happen (subjective, or Bayesian, interpretation)
Formalizing Probability
Experiment
Definition 2.1 (JA) An experiment is any process whose outcome is subject to uncertainty
Example:
tossing a coin multiple times
obtaining income information from one or more people
measuring revenue of firms in a given year
We then define the possible outcomes for this experiment as the sample space
Sample space
Definition 2.2 (JA) The sample space, \(\mathcal{S}\), is the set of all possible outcomes for an experiment
Example:
tossing a coin multiple times: e.g. toss coin two times, \(\small\mathcal{S}=\{HH,HT,TH,TT\}\)
obtaining income information from one or more people: e.g. \(\small\mathcal{S}=\{<\text{30k USD per year}, \geq \text{30k USD per year} \}\), or \(\small\mathcal{S}=\{<\text{100k USD per year} ,\geq \text{100k USD per year} \}\), or \(\small\mathcal{S}=\mathbb{R}^{+}\)
Events
Definition 2.3 (JA) An event is any subset of outcomes within \(\mathcal{S}\). A simple event is exactly one outcome from \(\mathcal{S}\), whereas a composite event consists of more than one outcome from \(\mathcal{S}\).
Definition 2.4 (JA) For any event \(E\) with a finite # of outcomes, let \(|E|\) denote the # of outcomes in event \(E\).
Thus, \(E\) is a simple event if \(|E|=1\), whereas \(E\) is a composite event if \(|E|>1\).
Example Suppose we roll a 6-sided die. \(A\) is the event we roll a 6, while \(B\) is the event we roll at least 4, i.e., either 4, 5, or 6 from \(\mathcal{S}=\{1,2,3,4,5,6\}\). Thus \(A\) is a simple event, while \(B\) is a composite event.
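The die example, checked as a quick Python sketch (the slides otherwise use R):

```python
# Sample space for one roll of a six-sided die, and the two events from the example.
S = {1, 2, 3, 4, 5, 6}
A = {6}          # event: we roll a 6
B = {4, 5, 6}    # event: we roll at least 4

# |A| = 1, so A is a simple event; |B| = 3 > 1, so B is a composite event.
print(len(A), len(B))  # 1 3
assert A <= S and B <= S  # both events are subsets of the sample space
```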
Operations on Events
Let’s use a running example: \(A\) is the event that a person is a cigarette smoker, and \(B\) is the event that a person identifies as female.
The union of \(A\) and \(B\), denoted \(A\cup B\), is the event that \(A\), or \(B\), or both \(A\) and \(B\), occur. Here, \(A\cup B\) is the event that a person is a smoker, or identifies as female, or both.
The intersection of \(A\) and \(B\), denoted \(A\cap B\), is the event that both \(A\) and \(B\) occur. Here, \(A\cap B\) is the event that a person both smokes and identifies as female. \(A\) and \(B\) are disjoint or mutually exclusive if \(A\cap B=\emptyset\) (\(A\) and \(B\) can’t occur simultaneously). That’s clearly not the case for smoking and identifying as female.
The complement of \(A\), denoted \(A^C\) or \(\overline{A}\), is the event \(A\) does not occur (nonsmoker). \(A\) and \(A^C\) are mutually exclusive (\(A\cap A^C=\emptyset\)).
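A quick Python sketch of these operations on a made-up five-person population (the ids and group memberships are invented for illustration):

```python
# A toy population of five people, identified by ids 1..5.
S = {1, 2, 3, 4, 5}    # sample space: everyone in the toy population
A = {1, 2}             # smokers (invented)
B = {2, 3, 4}          # people who identify as female (invented)

union = A | B          # smoker, or female, or both
intersection = A & B   # smokes AND identifies as female
complement_A = S - A   # nonsmokers

print(union, intersection, complement_A)
# {1, 2, 3, 4} {2} {3, 4, 5}

# A and its complement are mutually exclusive and together cover S.
assert A & complement_A == set()
assert A | complement_A == S
```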
More Operations
Venn Diagram
\(E\) is contained in \(D\) if every element of \(E\) is also in \(D\), i.e. \(E\subset D\).
If \(E\subset D\) and \(F\subset E\), then \(F\subset D\).
If \(E\subset D\) and \(D\subset E\), then \(D=E\).
Intuitively, we can think of the probability of an outcome (or set of outcomes) as the proportion of times the outcome (or set of outcomes) would occur if we observed the random process infinitely many times.
If all the outcomes in our random process (sample space \(\mathcal{S}\)) are equally likely, then for some event \(E\),
\[P(E)=\frac{|E|}{|\mathcal{S}|}.\]
We’ll talk more about counting methods in Week 3.
Example: Coin toss
Consider our previous example of tossing a coin two times. Assume heads and tails are equally likely.
Let’s compute the probability of the event \(A\) of drawing two heads.
Here \(A=\{HH\}\), so \(|A|=1\) and \(|\mathcal{S}|=4\).
So the probability of drawing two heads is \(P(A)=|A|/|\mathcal{S}|=1/4\).
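We can verify this by brute-force enumeration of the sample space, here as a Python sketch (the slides otherwise use R):

```python
from itertools import product

# Enumerate the sample space for two tosses of a coin.
S = [''.join(toss) for toss in product("HT", repeat=2)]
print(S)  # ['HH', 'HT', 'TH', 'TT']

# Event A: both tosses are heads.  With equally likely outcomes, P(A) = |A| / |S|.
A = [outcome for outcome in S if outcome == "HH"]
print(len(A) / len(S))  # 0.25
```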
Properties of Probability
Axiom of Probability
Axiom 1 The probability of any event \(A\) in the sample space is \(0\leq P(A)\leq 1\)
Axiom 2 The probability of the entire sample space is \(P(\mathcal{S})= 1\)
Axiom 3 For every finite (or countably infinite) sequence of disjoint events \(A_1,A_2,...,A_k\)\[P(A_1\cup A_2 \cup ... \cup A_k)=P\left(\bigcup_{i=1}^{k}A_i\right)=\sum_{i=1}^{k}P(A_i)\]
If we know the probability of \(A\), i.e. \(P(A)\), it is easy to calculate the probability of \(A^C\) as \(P(A^C)=1-P(A)\). This is called the complement rule: \(P(A)+P(A^C)=1\). It follows from Axioms 2 and 3, since \(A\) and \(A^C\) are disjoint and \(A\cup A^C=\mathcal{S}\).
Additive Rule of Probability
When events are mutually exclusive (cannot occur together), \[P(A\cup B)=P(A)+P(B)\]
When two events can occur simultaneously (think about the overlapping sections in the Venn diagrams), then we need to avoid double-counting when calculating the probability either of two events will occur.
The general additive rule of probability is therefore \[P(A\cup B)=P(A)+P(B)-P(A\cap B)\] Because \(A\cap B\) is part of both event \(A\) and event \(B\), we need to subtract it once to avoid double-counting.
If two events \(A\) and \(B\) are mutually exclusive, then \(P(A\cap B)=0\).
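Both the additive rule and the complement rule are easy to verify by enumeration when outcomes are equally likely. A Python sketch (the slides otherwise use R), using exact fractions to avoid rounding:

```python
from fractions import Fraction

# One roll of a fair die: every outcome in S is equally likely.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event under equally likely outcomes: |E| / |S|."""
    return Fraction(len(event), len(S))

A = {2, 4, 6}   # event: roll is even
B = {4, 5, 6}   # event: roll is at least 4

# General additive rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)  # 2/3 2/3
assert lhs == rhs

# Complement rule: P(A) + P(A^C) = 1
assert prob(A) + prob(S - A) == 1
```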
Example: Three website visitors
Consider these events on the total purchases made by the first three website visitors:
Event that no purchases are made: \(A_0=\{NNN\}\)
Event that one purchase is made: \(A_1=\{YNN,NYN,NNY\}\)
Event that two purchases are made: \(A_2=\{YYN,YNY,NYY\}\)
Event that three purchases are made: \(A_3=\{YYY\}\)
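Assuming for illustration that each visitor independently purchases with probability 1/2 (an assumption for this sketch, not something the slides claim), all eight outcomes are equally likely and we can enumerate the events above in Python:

```python
from itertools import product

# Sample space for the first three visitors: each either purchases (Y) or not (N).
S = [''.join(p) for p in product("YN", repeat=3)]

# Events A_0..A_3: exactly k purchases among the three visitors.
A = {k: [s for s in S if s.count("Y") == k] for k in range(4)}
print(A[1])  # ['YNN', 'NYN', 'NNY']

# The four events are disjoint and together cover S, so under the
# equally-likely assumption their probabilities |A_k| / |S| sum to 1.
probs = {k: len(A[k]) / len(S) for k in A}
print(probs)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
assert sum(probs.values()) == 1.0
```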
Example: Picking a ball from a box
One ball is to be selected from a box containing red, white, blue, yellow, and green balls. The probability that the selected ball will be red is 1/5, and the probability that it will be white is 2/5.
What is the probability that it will be red or white?
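Answer: a single selected ball cannot be both red and white, so the two events are mutually exclusive and the additive rule for disjoint events applies:
\[P(\text{red}\cup\text{white})=P(\text{red})+P(\text{white})=\frac{1}{5}+\frac{2}{5}=\frac{3}{5}.\]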
Example: Student stats midterm
The probability that student \(A\) will fail a certain statistics examination is 0.5, the probability that student \(B\) will fail it is 0.2, and the probability that both students \(A\) and \(B\) will fail it is 0.1.
What is the probability that at least one of these two students will fail the examination?
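Answer: here the two events can occur together (both students fail with probability 0.1), so we use the general additive rule:
\[P(A\cup B)=P(A)+P(B)-P(A\cap B)=0.5+0.2-0.1=0.6.\]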