Introduction
Statistical modeling is used for two things:
Predictive modeling
Explanatory modeling/Causal inference
There is no sure way to do it; causal inference is an unsolved problem.
Doing causal inference can be uncomfortable, but the benefits are great.
We are interested in what happens with variable \(Y\) if variable \(X\) changes.
In a way, we are predicting an alternative reality, where everything is the same, except for the value of \(X\).
How would math skills change if we started using project-based learning?
How likely would people be to purchase a product if they were offered a sale?
What would the GDP of Czechia be if we had never entered the European Union?
Name | Gender | Wages as man | Wages as woman |
---|---|---|---|
Scott | Man | 32 000 CZK | - |
Ramona | Woman | - | 31 000 CZK |
Wallace | Man | 36 000 CZK | - |
Kim | Woman | - | 35 000 CZK |
Niel | Man | 34 000 CZK | - |
Knives | Woman | - | 33 000 CZK |
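The table can be written down as data to make the missing half explicit. The sketch below is only an illustration: the field names and the `individual_effect` helper are made up for this example. Every person has exactly one observed wage, so no individual effect can be computed, and only a comparison of group averages is available.

```python
from statistics import mean

# Rows mirror the table above; None marks the wage we never get to observe.
people = [
    {"name": "Scott", "wage_as_man": 32_000, "wage_as_woman": None},
    {"name": "Ramona", "wage_as_man": None, "wage_as_woman": 31_000},
    {"name": "Wallace", "wage_as_man": 36_000, "wage_as_woman": None},
    {"name": "Kim", "wage_as_man": None, "wage_as_woman": 35_000},
    {"name": "Niel", "wage_as_man": 34_000, "wage_as_woman": None},
    {"name": "Knives", "wage_as_man": None, "wage_as_woman": 33_000},
]


def individual_effect(person):
    """Wage as man minus wage as woman, or None when one of them is unobserved."""
    as_man, as_woman = person["wage_as_man"], person["wage_as_woman"]
    if as_man is None or as_woman is None:
        return None
    return as_man - as_woman


# One entry per person; each is None because one wage is always missing.
individual_effects = [individual_effect(p) for p in people]

# The only comparison the data allow: average observed wage per group.
observed_as_man = [p["wage_as_man"] for p in people if p["wage_as_man"] is not None]
observed_as_woman = [p["wage_as_woman"] for p in people if p["wage_as_woman"] is not None]
group_gap = mean(observed_as_man) - mean(observed_as_woman)

print(individual_effects)
print(group_gap)
```

The gap between the two group averages is 1 000 CZK. Whether that number says anything causal depends on assumptions, not on the arithmetic.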
Estimand - theoretical quantity of interest.
Estimator - the way we estimate the estimand.
Estimate - the outcome of our analysis.
Theoretical quantity of interest
The number we want to get an unbiased estimate of.
Is attending pre-election debates worth it to politicians?
Are the police unfair towards minorities?
Is having children economically beneficial?
Would we make more money by selling our products at a lower price?
Is being a woman disadvantageous in academia?
Two parts:
Unit specific quantity
Target population of interest
Unit specific quantity
The number that will represent the effect of interest.
It should be clear whether we are after causal effects or correlations.
Target population of interest
Research question: Is attending pre-election debates worth it to politicians?
Important and interesting question, but very vague.
A more focused one: What’s the effect of watching a pre-election debate on prospective voters?
Estimand | Advantage | Disadvantage |
---|---|---|
Difference in probability of voting for a candidate, if the voter saw vs didn't see the pre-election debate | Highly relevant to both basic and applied research | We can't directly observe voting patterns for individuals |
Difference in probability of reported vote for a candidate, if the voter saw vs didn't see the pre-election debate | Reported votes can be easily observed | Reported and actual votes are not the same |
Difference in logits of reported vote for a candidate, if the voter saw vs didn't see the pre-election debate | lol | Getting unbiased estimates for logits is super hard |
Difference in reported preferences on a 5-point Likert scale for a candidate, if the voter saw vs didn't see the pre-election debate | More granular measurement | Likert scales are less intuitive |
Population of TV viewers.
Population of likely voters in Czechia.
Population of voters in Czechia.
Population of voters.
Research question | Unit specific quantity | Target population of interest |
---|---|---|
Is being a felon worse for Black people than for White people? | Difference in whether application i would be called back if it signaled White with a felony vs. Black without | Applications to jobs in Milwaukee |
Are the police more likely to target minorities? | Difference in whether person i would be stopped if perceived as Black vs. White | Those stopped by police |
Does having rich parents increase income in adulthood? | Adult income that person i would realize if childhood income took a particular value | U.S. population |
The difference in expected probability of voting for a candidate between voters who saw vs didn’t see the debate.
\(P(vote = x|debate= 1) - P(vote = x|debate= 0)\)
The difference in means on a 5-point Likert scale between voters who saw vs didn’t see the debate.
\(E(Y|debate = 1) - E(Y|debate = 0)\)
The difference in means on a 5-point Likert scale between voters who saw vs didn’t see the debate, conditional on variables \(X\).
\(E(Y|debate = 1, X) - E(Y|debate = 0, X)\)
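To make the last two expressions concrete, here is a small sketch that computes them on made-up data. The records, the age-group covariate standing in for \(X\), and the `difference_in_means` helper are all assumptions for illustration; none of the numbers come from a real survey.

```python
from statistics import mean

# Hypothetical records: (debate, x, y), where debate is 1 if the voter saw the
# debate, x is an illustrative covariate, and y is a 5-point Likert rating.
records = [
    (1, "under_40", 4), (1, "under_40", 5), (0, "under_40", 3), (0, "under_40", 3),
    (1, "over_40", 2), (1, "over_40", 3), (0, "over_40", 2), (0, "over_40", 1),
]


def difference_in_means(rows):
    """E(Y | debate = 1) - E(Y | debate = 0) computed on the given rows."""
    saw = [y for debate, _, y in rows if debate == 1]
    did_not_see = [y for debate, _, y in rows if debate == 0]
    return mean(saw) - mean(did_not_see)


# E(Y | debate = 1) - E(Y | debate = 0)
unconditional = difference_in_means(records)

# E(Y | debate = 1, X = x) - E(Y | debate = 0, X = x), one value per level of X
conditional = {
    x_value: difference_in_means([row for row in records if row[1] == x_value])
    for x_value in sorted({row[1] for row in records})
}

print(unconditional)
print(conditional)
```

The conditional version returns one difference per value of \(X\), which is why it needs either a stratum to report or a rule for averaging over strata.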
Linear regression
Propensity score matching
Difference-in-differences
Fixed effects
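With a single binary treatment and no covariates, the first estimator on the list reduces to something familiar: the OLS slope equals the difference in group means. The sketch below checks this on made-up data; the `ols_slope` helper and all numbers are illustrative assumptions, and the other estimators on the list are not shown.

```python
from statistics import mean

# Hypothetical data: 1 = saw the debate, 0 = did not; ratings on a 5-point scale.
debate = [1, 1, 1, 1, 0, 0, 0, 0]
rating = [4, 5, 2, 3, 3, 3, 2, 1]


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    covariation = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    variation = sum((xi - x_bar) ** 2 for xi in x)
    return covariation / variation


slope = ols_slope(debate, rating)

# The same quantity computed directly as a difference in group means.
saw = [y for d, y in zip(debate, rating) if d == 1]
did_not_see = [y for d, y in zip(debate, rating) if d == 0]
difference = mean(saw) - mean(did_not_see)

print(slope, difference)
```

The estimator is just a computation. Whether its result is a good estimate of the estimand depends on the assumptions behind it.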
At the beginning, you need to define two things:
What you are estimating (unit specific quantity)
For which population (target population of interest)
Then you define how to represent it mathematically (Empirical Estimand)
Then you define a strategy to get the estimate.
Developed by Donald Rubin & bros.
The OG framework for causal inference.
Most of the natural/technical sciences are based on it.
Basis for experimental design
Potential outcome - outcome given a value of treatment.
Treatment - the focal independent variable
We cannot observe individual effects.
But (under certain conditions), we can observe the average treatment effect!
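One condition under which this works is random assignment of the treatment. The simulation below is a sketch under stated assumptions: the distributions, the sample size, and the seed are arbitrary choices, and both potential outcomes are visible only because we generate them ourselves.

```python
import random
from statistics import mean

random.seed(42)
n_units = 10_000

# Both potential outcomes for every unit (never available in real data).
y0 = [random.gauss(30, 5) for _ in range(n_units)]            # outcome without treatment
y1 = [baseline + random.gauss(2, 1) for baseline in y0]       # outcome with treatment

# True average treatment effect, computable only inside a simulation.
true_ate = mean([with_t - without_t for without_t, with_t in zip(y0, y1)])

# Random assignment: a coin flip that ignores the potential outcomes.
treated = [random.random() < 0.5 for _ in range(n_units)]

# We observe y1 for treated units and y0 for control units, never both.
observed_treated = [y1[i] for i in range(n_units) if treated[i]]
observed_control = [y0[i] for i in range(n_units) if not treated[i]]
estimated_ate = mean(observed_treated) - mean(observed_control)

print(round(true_ate, 2), round(estimated_ate, 2))
```

The individual effects stay hidden in the observed data, yet the difference in group means lands close to the true average effect.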
Two primary assumptions:
Ignorability
SUTVA
The probability of receiving treatment has to be independent of the potential outcomes.
Simpler: The probability of being in the treatment group shouldn’t depend on the outcome.
Broken when:
Schools with a high rate of bullying are more likely to receive anti-bullying training.
People more susceptible to changing their voting preferences are more likely to see pre-election debates.
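The school example can be simulated to show what goes wrong. This is a sketch under made-up assumptions: the bullying scores, the assumed true effect of -2 points, and the assignment probabilities are all invented for illustration.

```python
import random
from statistics import mean

random.seed(7)
n_schools = 10_000
true_effect = -2.0  # assumed: training lowers the bullying score by 2 points

# Potential outcomes for every school (visible only because we simulate them).
bullying_without = [random.gauss(30, 5) for _ in range(n_schools)]
bullying_with = [score + true_effect for score in bullying_without]

# Ignorability is violated: schools with more bullying are more likely to be trained.
trained = [
    random.random() < (0.8 if score > 30 else 0.2)
    for score in bullying_without
]

# Observed data: one potential outcome per school, depending on treatment status.
observed_trained = [bullying_with[i] for i in range(n_schools) if trained[i]]
observed_untrained = [bullying_without[i] for i in range(n_schools) if not trained[i]]
naive_estimate = mean(observed_trained) - mean(observed_untrained)

print(true_effect, round(naive_estimate, 2))
```

In this setup the naive comparison comes out positive, so the training appears to increase bullying even though it reduces it in every single school.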
Stable Unit Treatment Value Assumption.
Two-parter:
Everyone receives the same treatment.
One unit/respondent receiving treatment doesn’t influence others.
Broken when:
Teacher who received anti-bullying training starts teaching their colleagues.
People who saw the debate start to talk with people who didn’t.
Nonprofit supporting the European Union.
Communication experts think framing messages as “European” is more effective than framing as “EU”.
We want to test this empirically.
Survey experiment
Formulate a set of statements
ID | Statement |
---|---|
1 | Cooperation within [the European Union/Europe] benefits the Czech economy. |
2 | The Czech Republic is a world power thanks to [the European Union/Europe]. |
3 | Thanks to acting together within [the European Union/Europe], we are stronger. |
4 | Within [the European Union/Europe], we need to strengthen resilience. |
5 | When [the European Union/Europe] is strong on security, Czechia is strong too. |
6 | Cooperation within [the European Union/Europe] will help keep energy prices under control. |
7 | [The European Union/Europe] should strive to become a world leader in green technologies. |
8 | [The European Union/Europe] should give up part of its prosperity in favor of more rigorous environmental protection. |
Ignorability:
SUTVA:
Respondents don’t know each other -> No influence
Statements are the same for everyone -> One version of treatment
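A sketch of how the framing could be assigned, assuming it is randomized across respondents, which is the usual design for a survey experiment but is not spelled out here. The respondent IDs, the seed, the `assign_frames` helper, and the English template are hypothetical.

```python
import random

FRAMES = ("the European Union", "Europe")
# English paraphrase of one statement, used as a template (hypothetical wording).
TEMPLATE = "Cooperation within {frame} benefits the Czech economy."


def assign_frames(respondent_ids, seed=2024):
    """Randomly split respondents into two equally sized framing groups."""
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment = {rid: FRAMES[0] for rid in shuffled[:half]}
    assignment.update({rid: FRAMES[1] for rid in shuffled[half:]})
    return assignment


# Hypothetical respondent IDs r001 ... r100.
respondents = [f"r{i:03d}" for i in range(1, 101)]
assignment = assign_frames(respondents)

# Every respondent sees the same statement, differing only in the framing.
shown = {rid: TEMPLATE.format(frame=frame) for rid, frame in assignment.items()}

print(shown["r001"])
```

Because the split depends only on the shuffle, the framing a respondent gets cannot be related to how they would have answered, and every respondent in a group sees the identical wording.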