Welcome to ECON 2250!
Week 2
Welcome!
Meet Afi Ramadhani!
- Education and career journey
- BS and MS in Petroleum Engineering from Institute of Technology Bandung (QS Top 100 in Petroleum Engineering)
- Academic professional and consultant for think tanks in Indonesia
- MS in Economics from Georgia Tech
- PhD Candidate in Economics at Georgia Tech
- I am an energy and environmental economist interested in examining the broad impact of climate change and energy transition 🙂
- You can call me Afi, Professor, Prof. Afi, or Prof. Ramadhani (no Dr. yet)
Meet the Teaching Assistant (TA)!
- Rohit Borah: Head TA
Short Survey
Scan the QR code to fill out the survey!
https://gatech.instructure.com/courses/482358/quizzes/717963

Topics
Introduction to the course
Syllabus activity
Reproducibility
What is Statistics?
Statistics is the branch of mathematics that deals with the collection, analysis, interpretation, presentation, and organization of data. It helps us understand and describe patterns, relationships, and trends in data, and it allows us to make informed decisions in the presence of uncertainty.
Source: ChatGPT (with modification)
What is Statistics?
There are two main branches of statistics:
Descriptive Statistics – Summarizes and organizes data using measures such as the mean, median, and standard deviation, along with tables and graphs.
Inferential Statistics – Makes inferences or generalizations about a population based on a sample of data.
Source: ChatGPT (with modification)
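To make the two branches concrete, here is a minimal sketch. It is written in Python purely for illustration (the course itself works in R), and the wage numbers are hypothetical. The first part describes the sample; the second part uses the sample to make a rough statement about the population.

```python
import math
import statistics

# Hypothetical sample of hourly wages (illustrative numbers, not real data).
wages = [18.5, 22.0, 19.75, 30.0, 25.25, 21.0, 27.5, 23.0]

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(wages)
sample_sd = statistics.stdev(wages)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: use the sample to say something about the population.
# This is a rough 95% interval using the normal critical value 1.96, which is
# only an approximation; a t critical value would give a wider interval for a
# sample this small.
n = len(wages)
standard_error = sample_sd / math.sqrt(n)
ci_lower = sample_mean - 1.96 * standard_error
ci_upper = sample_mean + 1.96 * standard_error

print(f"mean = {sample_mean:.2f}, sd = {sample_sd:.2f}")
print(f"approximate 95% CI: ({ci_lower:.2f}, {ci_upper:.2f})")
```

The mean and standard deviation only describe these eight observations; the interval is an inferential statement, and it is only as good as the assumption that the sample is representative of the population.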
Statistics in practice
Statistics in practice: Economics
Economic measurement
GDP = C + I + G + (X - M), Unemployment rate, Inflation, Consumer Spending, Median Income, Demographics
Typical process to publish these data:
Collect data from a sample of the population (census, survey, etc.),
Process the data to create a measure, then publish it,
Economists use the data to inform policymaking.
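As a quick worked example of the GDP expenditure identity listed on this slide, take purely hypothetical values (in trillions of dollars, chosen only for illustration): C = 14, I = 4, G = 4, X = 3, M = 4.

\[
\text{GDP} = C + I + G + (X - M) = 14 + 4 + 4 + (3 - 4) = 21
\]

Because imports exceed exports in this example, net exports are negative and pull GDP below C + I + G.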
Statistics in practice: Economics
Testing Economic Theory
Do men and women have the same average wage?
Do people with college degrees earn more than those without?
Does increasing minimum wage reduce employment?
Are poverty rates the same in urban and rural areas?
Does a conservation nudge reduce electricity consumption?
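Questions like these usually come down to comparing two group averages relative to the noise in the data. The sketch below illustrates one common way to do that, Welch's t statistic, using hypothetical wage data. It is written in Python for illustration only (the course uses R), the function name is made up for this example, and it is not necessarily the exact procedure the course will use.

```python
import math
import statistics

# Hypothetical hourly wages for two groups (illustrative numbers only).
group_a = [21.0, 24.5, 19.0, 26.0, 23.5, 22.0]
group_b = [20.0, 22.5, 18.5, 24.0, 21.0, 19.5]


def welch_t_statistic(x, y):
    """Return Welch's t statistic for the difference in means of x and y."""
    mean_diff = statistics.mean(x) - statistics.mean(y)
    # Each sample variance is divided by its own sample size, so the two
    # groups are not assumed to share a common variance.
    standard_error = math.sqrt(
        statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    )
    return mean_diff / standard_error


t_stat = welch_t_statistic(group_a, group_b)
mean_gap = statistics.mean(group_a) - statistics.mean(group_b)
print(f"difference in means = {mean_gap:.2f}")
print(f"Welch t statistic = {t_stat:.2f}")
```

A large absolute t statistic suggests the gap in averages is hard to explain by sampling noise alone; turning that into a formal decision requires the hypothesis testing tools covered later in the course.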
Statistics in practice: Data Science
ECON2250
What is ECON2250?
MATH (probability theory) + DATA (sampling and uncertainty) = STATISTICS (descriptive and inferential)
Course goal: As students’ first statistics course, ECON2250 will equip students with the fundamental concepts and tools needed for more advanced courses, e.g. Econometrics, Data Science, or Machine Learning
Prerequisites: No prerequisites, but some math courses are encouraged, e.g. MATH1551 and MATH1552
Course learning objectives
By the end of the semester, you will be able to…
- master the language and the fundamental concepts of probability theory.
- apply basic statistical inference techniques in empirical research settings with an understanding of their utility and limitations.
- implement a reproducible workflow using R for statistical analysis and simulation, Quarto to write reports and GitHub for version control and collaboration.
- understand data in economics/social science, including data types, data generating processes, data analysis, and how to communicate with data (data literacy).
- acquire foundational knowledge of statistical concepts to prepare for more advanced data analysis or econometrics courses.
Course topics
Probability
- Probability
- Conditional probability and independence
- Counting methods
- Random variables
- Expectation and moments
Data and Distributions
- Data and sampling
- Descriptive statistics
- Discrete random variables
- Continuous random variables
Statistical Inference
- Sampling distributions
- Estimation and confidence intervals
- Hypothesis testing
- Multiple hypothesis testing
- Simple linear regression
- Advanced topics (optional)
General topics
- Computing using R and GitHub
- Presenting statistical results
- Collaboration and teamwork
Course overview
Course toolkit
- Website: https://maghfiraer.github.io/Stats-F25
- Central hub for the course!
- Tour of the website
- Canvas: https://gatech.instructure.com/courses/482358
- Gradebook
- Office hours
- Announcements
- Gradescope
- Ed Discussion
- GitHub: https://github.com/Stats-F25
- Distribute assignments
- Platform for version control and collaboration
Hardware requirement
- Computer:
- Individual computer with the following software installed:
- R, RStudio, Git, and GitHub
- Access through the IAC VLab is available
- Textbook:
- Schervish, Mark J., and Morris H. DeGroot (SDG). Probability and Statistics, 4th Edition.
- Jason Abrevaya (JA). Probability and Statistics for Economics and Business: An Introduction Using R.
Computing toolkit
All analyses using R, a statistical programming language
Write reproducible reports in Quarto
Access RStudio through IAC VLab
Access assignments on GitHub
GitHub facilitates version control and collaboration
All work lives in the Stats-F25 course organization (tentative)
Classroom community
It is my intent that students from all diverse backgrounds and perspectives be well-served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that the students bring to this class be viewed as a resource, strength and benefit.
If you have a name that differs from those that appear in your official Tech records, please let me know.
Please let me know your preferred pronouns, if you are comfortable sharing.
If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with me. If you prefer to speak with someone outside of the course, your advisers and deans are excellent resources.
I (like many people) am still in the process of learning about diverse perspectives and identities. If something was said or done in class (by anyone) that made you feel uncomfortable, please talk to me about it.
Accessibility
The Office of Disability Services (ODS) is available to ensure that students are able to engage with their courses and related assignments.
If you have documented accommodations from ODS, please send the documentation as soon as possible.
I am committed to making all course activities and materials accessible. If any course component is not accessible to you in any way, please don’t hesitate to let me know.
Syllabus activity
- Read the portion of the syllabus assigned to your group.
- Discuss the key points and any questions you may have with your neighbors.
- We’ll ask for volunteers to share a summary with the class.
Syllabus activity assignments
Group 1: What to expect in lectures and labs
Group 2: Homework and lab assignments
Group 3: Exams and project
Group 4: Participation
Group 5: Academic honesty (except AI policy)
Group 6: Artificial intelligence policy
Group 8: Late work and regrade request
Syllabus activity report out
Group 1: What to expect in lectures and labs
Group 2: Homework and lab assignments
Group 3: Exams and project
Group 4: Participation
Group 5: Academic honesty (except AI policy)
Group 6: Artificial intelligence policy
Group 8: Late work and regrade request
Grading
Category | Percentage |
---|---|
Homework | 15% |
Final project | 20% |
Lab | 15% |
Exams (2 midterms) | 40% |
Participation (AEs + Teamwork) | 10% |
Total | 100% |
Five tips for success in ECON2250
Complete all the preparation work before class.
Ask questions in class, office hours, and on Ed Discussion.
Do the homework and labs; get started on homework early when possible.
Don’t procrastinate and don’t let a week pass by with lingering questions.
Stay up to date on announcements posted on Ed Discussion and sent via email.
Questions?
Reproducible workflow
The perils of bad data cleaning
Published in the American Economic Review (2007):
DG’s baseline climate measure (dd89_7000) has a value of zero degree days for 163 counties. If correct, this measure implies temperatures do not exceed 8°C (46.4°F) in those counties during the growing season of April through September. Temperatures this low would seem implausible in any state, yet many of these counties are in warm southern states such as Texas.
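A simple plausibility check before running any analysis can surface this kind of problem. The sketch below is in Python for illustration only (the course uses R); the records, field names, and function name are hypothetical and are not taken from the dataset discussed here. It just flags observations whose degree-day measure is exactly zero so a human can look at them.

```python
# Hypothetical county records; the field names and values are illustrative
# and are not taken from the actual dataset discussed above.
counties = [
    {"county": "A", "state": "TX", "degree_days": 0.0},
    {"county": "B", "state": "TX", "degree_days": 2150.0},
    {"county": "C", "state": "MN", "degree_days": 1320.0},
    {"county": "D", "state": "GA", "degree_days": 0.0},
]


def flag_zero_degree_days(records):
    """Return the records whose degree-day measure is exactly zero."""
    return [row for row in records if row["degree_days"] == 0.0]


suspicious = flag_zero_degree_days(counties)
for row in suspicious:
    print(f"Check county {row['county']} ({row['state']}): zero degree days")
```

The check does not decide whether a zero is wrong; it only makes sure implausible values are seen before they feed into results.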
Contrary to the results in DG (2007), the corrected data suggest that an immediate shift to the projected end-of-the-century climate would reduce agricultural profits.
Another example
Originally reported “the intervention, compared with usual care, resulted in a fewer number of mean COPD-related hospitalizations and emergency department visits at 6 months per participant.”
There were actually more COPD-related hospitalizations and emergency department visits in the intervention group than in the control group.
The intervention and control groups were mixed up because of the “0/1” coding.
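One way to guard against this kind of mix-up is to store group membership as explicit labels rather than bare 0/1 values. The sketch below is in Python for illustration only (the course uses R); the data are hypothetical, do not reproduce the study above, and the names `Arm` and `mean_by_arm` are invented for this example.

```python
from enum import Enum
import statistics


class Arm(Enum):
    """Study arms stored as named labels instead of 0/1 codes."""
    CONTROL = "control"
    INTERVENTION = "intervention"


# Hypothetical records: (study arm, number of hospitalizations).
records = [
    (Arm.CONTROL, 1), (Arm.CONTROL, 0), (Arm.CONTROL, 2),
    (Arm.INTERVENTION, 2), (Arm.INTERVENTION, 3), (Arm.INTERVENTION, 1),
]


def mean_by_arm(rows):
    """Return the mean outcome for each arm, keyed by the arm's label."""
    grouped = {arm: [] for arm in Arm}
    for arm, visits in rows:
        grouped[arm].append(visits)
    return {arm.value: statistics.mean(values) for arm, values in grouped.items()}


means = mean_by_arm(records)
print(means)
```

Because every summary is keyed by "control" or "intervention", a reader never has to remember which group a 0 or a 1 stood for.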
Transparency and reproducibility
Avoiding errors is only the first step. It’s also critical to make your work reproducible.
In the private sector, the benefits may be more obvious.
Your code has to work together with other people’s code.
Eventually, someone else will take over your code.
In academic research, it’s equally important.
To trust the results – many research findings fail to replicate.
To build on your work and collaborate with others.
Many journals now require a full “replication package” of data and code.
The push for transparency and reproducibility is known as the open science movement.
Reproducibility: Can someone else run your code and get the exact same results?
Replication: If another analyst attempts the same question, do they get the same answer?
Transparency: Can everyone see what choices you made and how you got your results?
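For analyses that involve random numbers, reproducibility in this sense usually requires fixing the random seed. A minimal sketch, in Python for illustration only (in R, the analogous tool is `set.seed()`); the function name and the seed value are arbitrary choices for this example.

```python
import random
import statistics


def simulate_sample_mean(seed, n=1000):
    """Draw n standard normal values with a fixed seed and return their mean."""
    rng = random.Random(seed)  # local generator, so the result depends only on the seed
    draws = [rng.gauss(0, 1) for _ in range(n)]
    return statistics.mean(draws)


first_run = simulate_sample_mean(seed=2250)
second_run = simulate_sample_mean(seed=2250)
print(first_run == second_run)  # prints True: same seed, same draws, same mean
```

Without a fixed seed, each run would produce slightly different numbers, and nobody else could reproduce a reported table or figure exactly.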
Reproducibility checklist
What does it mean for an analysis to be reproducible?
Near term goals:
✔️ Can the tables and figures be exactly reproduced from the code and data?
✔️ Does the code actually do what you think it does?
✔️ In addition to what was done, is it clear why it was done?
Long term goals:
✔️ Can the code be used for other data?
✔️ Can you extend the code to do other things?
Toolkit
Scriptability \(\rightarrow\) R
Literate programming (code, narrative, output in one place) \(\rightarrow\) Quarto
Version control \(\rightarrow\) Git / GitHub
R and RStudio
R is a statistical programming language
RStudio is a convenient interface for R (an integrated development environment, IDE)
RStudio IDE
Quarto
Fully reproducible reports – the analysis is run from the beginning each time you render
Code goes in chunks and narrative goes outside of chunks
Visual editor to make document editing experience similar to a word processor (Google docs, Word, Pages, etc.)
Quarto
How will we use Quarto?
Every application exercise and assignment is written in a Quarto document
You’ll have a template Quarto document to start with
The amount of scaffolding in the template will decrease over the semester
Version control with git and GitHub
What is versioning?
Versioning with human-readable messages
Why do we need version control?
Provides a clear record of how the analysis methods evolved. This makes analysis auditable and thus more trustworthy and reliable. (Ostblom and Timbers 2022)
git and GitHub
- git is a version control system – like the “Track Changes” feature in Microsoft Word.
- GitHub is the home for your git-based projects on the internet (like Dropbox but much better).
- There are a lot of git commands and very few people know them all. 99% of the time you will use git to add, commit, push, and pull.
Caveat
Before next class
Complete JA Chapter 4
Review the updated syllabus
Office hours start today, Monday, August 25 (5-6pm)