Business Data Science by Matt Taddy

Author: Matt Taddy
Publisher: McGraw-Hill Education
Published: 2019-03-15


Significance disappears! The direction of the effect also changes—now abortion causes more murders—but really what is going on is that we’ve added so many variables to the model that n ≈ p. We just don’t have enough data to get a decent estimate.
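To see the n ≈ p breakdown concretely, consider a minimal simulation. This is a hedged sketch on purely synthetic data (it has nothing to do with the crime panel, and the control counts 5/50/95 are arbitrary), written in Python with statsmodels rather than the book's R code:

```python
# Toy illustration: OLS estimates degrade as the number of
# controls p approaches the sample size n (synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
d = rng.normal(size=n)             # "treatment"
y = 0.5 * d + rng.normal(size=n)   # true effect is 0.5

for p in (5, 50, 95):              # few controls -> nearly saturated
    X = rng.normal(size=(n, p))    # pure-noise controls
    design = sm.add_constant(np.column_stack([d, X]))
    fit = sm.OLS(y, design).fit()
    print(f"p={p:3d}  gamma_hat={fit.params[1]: .2f}  se={fit.bse[1]:.2f}")

# As p -> n, the standard error on the treatment coefficient blows up:
# the same data no longer support any conclusion about the effect.
```

The controls here are pure noise, so nothing about the true effect changes as p grows; only our ability to estimate it collapses.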

The point here is not that the Freakonomics authors are “wrong” or that they’ve made a statistical error. Indeed, in their original paper they caution about the weakness we’ve exploited:

That abortion is only one factor influencing crime . . . points out the caution required in drawing any conclusions regarding an abortion–crime link based on time-series evidence alone.

Rather, this example illustrates that whenever your analysis is premised on conditional ignorability, you are always susceptible to others introducing additional controls until the model is nearly saturated and you can’t measure anything. This is the weakness of using OLS—a low-dimensional method—as your regression tool.

YOU CAN BETTER CONTROL FOR CONFOUNDERS BY MODELING THE TREATMENT ASSIGNMENT PROCESS. That is, you need to take seriously and estimate the second “treatment regression” line of the LTE system in Equation 6.2. You can do this using ML tools such as the lasso and cross validation. The process draws on many of the same ideas we used when regularizing to improve predictions in the face of many potential inputs. However, because we are now estimating a causal treatment effect, we need to completely rework our model-building recipes around this goal. A naive application of ML tools for causal inference can lead to a mess of incorrect results.
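As a rough sketch of what "model the treatment assignment" looks like in code, here is one standard variant of the idea: use a CV lasso to predict d from the controls, and identify the treatment effect from the part of d the controls cannot predict. This residual-on-residual (partialling-out) recipe is our illustration, not the book's exact procedure; the function name lte_lasso is hypothetical, the data are synthetic, and a careful implementation would add cross-fitting.

```python
# Hedged sketch: estimate a treatment effect by first modeling
# the treatment assignment process with lasso + cross validation.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def lte_lasso(y, d, X, seed=0):
    """Estimate the effect of d on y, controlling for X."""
    # Stage 1: the "treatment regression" d ~ x (second line of the
    # LTE system); keep only the unpredictable part of d.
    d_hat = LassoCV(cv=5, random_state=seed).fit(X, d).predict(X)
    # Residualize the outcome on the controls as well.
    y_hat = LassoCV(cv=5, random_state=seed).fit(X, y).predict(X)
    # Stage 2: regress residual on residual; the slope is the
    # treatment effect identified from independent variation in d.
    fit = LinearRegression().fit((d - d_hat).reshape(-1, 1), y - y_hat)
    return fit.coef_[0]

# Synthetic check: d is confounded with x[0], true effect is 0.5.
rng = np.random.default_rng(1)
n, p = 200, 150
X = rng.normal(size=(n, p))
d = X[:, 0] + rng.normal(size=n)             # treatment driven by a confounder
y = 0.5 * d + X[:, 0] + rng.normal(size=n)   # confounder also moves y
print(lte_lasso(y, d, X))                    # close to 0.5 even with p near n
```

Note that a naive lasso of y on [d, x] could shrink or drop d itself; modeling the assignment process first is what protects the treatment effect from that regularization.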

Going back to Chapter 3, we’ve followed a simple model-building recipe: use a path of penalties to create a set of candidate models and then use predictive performance—measured either via cross validation or an information criterion—as the metric for choosing the best model among this set. A key characteristic here is the focus on unstructured prediction as the basis for model evaluation: you are seeking to do the best job forecasting y at new x drawn from the same distribution as the inputs in the training sample. That is, you are choosing models to do well in predicting new data drawn from p(x, y), the same joint data generating process (DGP) that provided the training data (think about the CV algorithm to convince yourself of this). However, you now have a special input d, the treatment, and you want to know the treatment effect on y when d moves independently of all other influences. That is, you no longer want to do a good job predicting ŷ under the existing DGP but rather under the DGPs that arise when you change d yourself.
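The Chapter 3 recipe, in miniature, looks like the sketch below (scikit-learn on synthetic data; the penalty grid and DGP are illustrative, not from the book):

```python
# Fit a path of penalties, score each candidate model by
# out-of-sample predictive performance, and keep the winner.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n, p = 200, 100
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

penalties = np.logspace(0, -3, 25)   # the path of lambda values
cv_mse = [
    -cross_val_score(Lasso(alpha=a), X, y,
                     scoring="neg_mean_squared_error", cv=5).mean()
    for a in penalties
]
best = penalties[int(np.argmin(cv_mse))]
print(f"CV-selected penalty: {best:.4f}")

# Note what CV rewards: predicting new (x, y) draws from the same DGP.
# Nothing in this loop asks whether any one coefficient is causal.
```

The closing comment is the crux: this recipe is tuned entirely for prediction under p(x, y), which is exactly why it must be reworked before it can answer a causal question about d.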

The idea behind structural or counterfactual prediction is to remove from the treatment effect estimate the effect of other influences that are correlated with d. As in the earlier discussions, these outside influences are called controls or confounders, and they can pollute your treatment effect estimate if their effect is confused with that of d. Again, this is simple when you have a low-dimensional set of potential confounders—you just include them in your regression.
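In the low-dimensional case, "just include them" is the whole story. A toy example with a single observed confounder (synthetic data, statsmodels):

```python
# With few confounders, adding them to the regression removes the bias.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)                   # observed confounder
d = x + rng.normal(size=n)               # treatment depends on x
y = 0.5 * d + 2.0 * x + rng.normal(size=n)

naive = sm.OLS(y, sm.add_constant(d)).fit()
controlled = sm.OLS(y, sm.add_constant(np.column_stack([d, x]))).fit()
print(f"naive:      {naive.params[1]:.2f}")       # biased upward (about 1.5)
print(f"controlled: {controlled.params[1]:.2f}")  # near the true 0.5
```

The trouble, as the Freakonomics example showed, starts when the set of potential controls is large: then plain OLS runs out of data, and you need the ML machinery above.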


