Business Data Science by Matt Taddy
Author: Matt Taddy
Language: eng
Format: epub
Publisher: McGraw-Hill Education
Published: 2019-03-15T16:00:00+00:00
Significance disappears! The direction of the effect also changes—now abortion causes more murders—but really what is going on is that we’ve added so many variables to the model that n ≈ p. We just don’t have enough data to get a decent estimate.
The point here is not that the Freakonomics authors are “wrong” or that they’ve made a statistical error. Indeed, in their original paper they caution about the weakness we’ve exploited:
That abortion is only one factor influencing crime . . . points out the caution required in drawing any conclusions regarding an abortion–crime link based on time-series evidence alone.
Rather, this example illustrates that whenever your analysis is premised on conditional ignorability, you are always susceptible to others introducing additional controls until the model is nearly saturated and you can’t measure anything. This is the weakness of using OLS—a low-dimensional method—as your regression tool.
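The saturation problem can be seen in a small simulation. The sketch below is illustrative and not taken from the text: the sample size, the data generating process, and the helper name `treatment_coef_sd` are all assumptions. It fits OLS with a growing number of controls and reports how much the estimated treatment coefficient varies across repeated samples. The one true confounder is always included, so the estimate stays unbiased and only its precision changes.

```python
import numpy as np

def treatment_coef_sd(n, p, gamma=1.0, reps=300, seed=0):
    """Sampling sd of the OLS treatment coefficient with p controls (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=(n, p))                    # p candidate controls
        d = x[:, 0] + rng.normal(size=n)               # treatment depends on x[:, 0]
        y = gamma * d + x[:, 0] + rng.normal(size=n)   # outcome, confounded by x[:, 0]
        design = np.column_stack([np.ones(n), d, x])   # intercept, treatment, controls
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates[r] = coef[1]                         # coefficient on d
    return estimates.std()

n = 50
sd_by_p = {p: treatment_coef_sd(n, p) for p in (2, 20, 40)}
for p, sd in sd_by_p.items():
    print(f"n={n}, p={p}: sd of treatment estimate = {sd:.3f}")
```

As p approaches n, the spread of the estimate grows, which is why a significant effect can vanish once enough controls are added.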
YOU CAN BETTER CONTROL FOR CONFOUNDERS BY MODELING THE TREATMENT ASSIGNMENT PROCESS. That is, you need to take seriously and estimate the second “treatment regression” line of the LTE system in Equation 6.2. You can do this using ML tools such as the lasso and cross validation. The process involves many of the same ideas from how we used regularization to improve predictions in the face of many potential inputs. However, the fact that we are now estimating a causal treatment effect means that we need to completely rework our model-building recipes around this goal. A naive application of ML tools for causal inference can lead to a mess of incorrect results.
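One way to put a treatment regression to work is sketched below, under stated assumptions. Equation 6.2 is not reproduced here, so the linear system in the code (with coefficients named `gamma`, `tau`, and `beta`) is an assumed stand-in, and the simulated data are made up. The estimator shown is residual-on-residual regression (partialling out) with scikit-learn's `LassoCV`. It is a swapped-in technique chosen for brevity, not necessarily the recipe the text goes on to develop, and it omits refinements such as sample splitting.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, gamma = 500, 50, 1.0             # gamma: assumed true treatment effect

x = rng.normal(size=(n, p))            # many potential confounders
tau = np.zeros(p); tau[:3] = 1.0       # only three of them drive the treatment
beta = np.zeros(p); beta[:3] = 2.0     # the same three also drive the outcome

d = x @ tau + rng.normal(size=n)               # treatment regression
y = gamma * d + x @ beta + rng.normal(size=n)  # outcome regression

# Naive estimate: slope of y on d alone, ignoring the confounders.
naive = np.cov(d, y)[0, 1] / np.var(d, ddof=1)

# Treatment regression: predict d from x with a cross-validated lasso,
# and keep the part of d that x cannot explain.
d_resid = d - LassoCV(cv=5, random_state=0).fit(x, d).predict(x)

# Do the same for the outcome, then regress residual on residual.
y_resid = y - LassoCV(cv=5, random_state=0).fit(x, y).predict(x)
gamma_hat = (d_resid @ y_resid) / (d_resid @ d_resid)

print(f"naive slope: {naive:.2f}")
print(f"partialled-out estimate: {gamma_hat:.2f} (assumed truth: {gamma})")
```

The residual `d_resid` is the variation in treatment that the controls do not predict, so the second regression measures how y moves with d once the influence of x has been taken out of both.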
Going back to Chapter 3, we’ve followed a simple model-building recipe: use a path of penalties to create a set of candidate models and then use predictive performance, measured either via cross validation or an information criterion, as the metric for choosing the best model among this set. A key characteristic here is the focus on unstructured prediction as the basis for model evaluation: you are seeking to do the best job forecasting y at new x drawn from the same distribution as the inputs in the training sample. That is, you are choosing models to do well in predicting new data drawn from p(x, y), the same joint data generating process (DGP) that provided the training data (think about the CV algorithm to convince yourself of this). However, you now have a special input d, the treatment, and you want to know the treatment effect on y when d moves independently of all other influences. That is, you no longer want to do a good job predicting y under the existing DGP but rather under the DGPs that arise when you change d yourself.
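The difference between predicting under the existing DGP and predicting under a DGP where you set d yourself can be checked numerically. The following is a minimal sketch with made-up data: the outcome equation, the single confounder `x`, and the helper names are assumptions for illustration. A slope tuned for prediction on observational data is compared with the assumed causal slope `gamma`, first on held-out observational data and then on data where d is drawn independently of x.

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma = 20000, 1.0                     # gamma: assumed causal effect of d on y

def outcome(d, x):
    """Assumed structural equation: y depends on d and on a confounder x."""
    return gamma * d + 2.0 * x + rng.normal(size=d.size)

def mse(slope, d, y):
    return float(np.mean((y - slope * d) ** 2))

# Observational DGP: d is correlated with the confounder x.
x_obs = rng.normal(size=n)
d_obs = x_obs + rng.normal(size=n)
y_obs = outcome(d_obs, x_obs)

# Fit a prediction-oriented slope on the first half (all variables have
# mean zero, so no intercept is needed); x is left out of the model.
half = n // 2
slope_pred = (d_obs[:half] @ y_obs[:half]) / (d_obs[:half] @ d_obs[:half])

# Interventional DGP: d is set independently of x, with the same variance.
x_int = rng.normal(size=n)
d_int = rng.normal(scale=np.sqrt(2.0), size=n)
y_int = outcome(d_int, x_int)

mse_obs = {"pred": mse(slope_pred, d_obs[half:], y_obs[half:]),
           "causal": mse(gamma, d_obs[half:], y_obs[half:])}
mse_int = {"pred": mse(slope_pred, d_int, y_int),
           "causal": mse(gamma, d_int, y_int)}

print("fitted slope:", round(slope_pred, 2), "| assumed causal slope:", gamma)
print("held-out MSE, observational DGP: ", mse_obs)
print("held-out MSE, interventional DGP:", mse_int)
```

In this setup the prediction-tuned slope wins on held-out observational data, which is exactly what cross validation rewards, yet it loses once d is moved independently of x.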
The idea behind structural or counterfactual prediction is to remove from the treatment effect estimate the effect of other influences that are correlated with d. As in the earlier discussions, these outside influences are called controls or confounders, and they can pollute your treatment effect estimate if their effect is confused with that of d. Again, this is simple when you have a low-dimensional set of potential confounders: you just include them in your regression.