Errors, Blunders, and Lies: How to Tell the Difference (ASA-CRC Series on Statistical Reasoning in Science and Society) by David Salsburg

Author: David Salsburg [Salsburg, David]
Language: eng
Format: azw3
Publisher: CRC Press
Published: 2017-05-18T04:00:00+00:00


8.1 SUMMARY

A large data set has two dimensions: N, the number of statistically independent units generating the data, and p, the number of items generated by each independent unit. Traditional statistical methods require that N be much greater than p. In our modern era of big data generated by use of the Internet, p is often almost as large as N, or even larger.

Since it is necessary to reduce the size of p in order to run useful multilinear regressions, methods have been developed to identify the elements of each independent unit that are useful in predicting observations of a variable, y.

Because it is theoretically possible to “fit” the data exactly with enough elements in the regression, the criterion for leaving elements in or out is the adjusted R², where the calculated R² is adjusted downward as the number of elements in the regression increases.
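The summary does not state a formula for the adjustment. As a minimal sketch, the code below assumes the commonly used form, adjusted R² = 1 - (1 - R²)(N - 1)/(N - p - 1), for a regression with an intercept and p elements; the function name and the example numbers are illustrative only.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a regression with an intercept and p elements.

    Assumes the common textbook form of the adjustment; the source text
    does not say which variant it has in mind.
    """
    if n - p - 1 <= 0:
        raise ValueError("need N > p + 1 for the adjustment to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# The same raw R^2 of 0.90 on N = 20 units, with few versus many elements.
few = adjusted_r2(0.90, n=20, p=2)
many = adjusted_r2(0.90, n=20, p=15)
print(round(few, 3), round(many, 3))
```

With these made-up numbers, the adjusted value drops from about 0.89 with two elements to about 0.53 with fifteen, even though the raw R² is identical. That penalty is what keeps the criterion from rewarding an exact but meaningless fit.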

Within any large data set are what Anscombe called “will o’ the wisps,” apparently strong relationships that are unique to that data set and have no predictive value. Bonferroni bounds are often used to reduce the chance of finding one.
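A rough sketch of how a Bonferroni bound works in practice: when m relationships are tested, each is judged against alpha / m rather than alpha. The function names and p-values below are hypothetical, and the false-find calculation assumes the m tests are independent and that no real relationship exists.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance cutoff when m tests share an overall level alpha."""
    return alpha / m


def chance_of_false_find(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests,
    each run at level alpha, when no real relationship exists."""
    return 1.0 - (1.0 - alpha) ** m


def survivors(p_values, alpha=0.05):
    """Indices of the tests whose p-values pass the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [i for i, pv in enumerate(p_values) if pv <= cutoff]


# Made-up p-values for four candidate relationships.
p_values = [0.001, 0.02, 0.04, 0.0004]
kept = survivors(p_values)

# Searching 100 candidate relationships without and with the correction.
uncorrected = chance_of_false_find(0.05, 100)
corrected = chance_of_false_find(bonferroni_threshold(0.05, 100), 100)
print(kept, round(uncorrected, 3), round(corrected, 3))
```

Under these assumptions, an uncorrected search through 100 candidates is almost certain to turn up at least one will o' the wisp, while the corrected search keeps that chance below the nominal 5 percent.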

A better procedure for avoiding will o’ the wisps, and the spurious conclusions they produce, is to select a small subset of the data at random and use that subset for exploratory analysis. The rest of the data are left for formal statistical analyses to identify those elements that have predictive value.
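The split described above can be sketched as follows. The 10 percent exploratory fraction, the fixed seed, and the function name are assumptions made for illustration; the text says only that the subset should be small and chosen at random.

```python
import random


def split_units(units, explore_fraction=0.1, seed=0):
    """Randomly split independent units into exploratory and confirmatory sets.

    The split is made over whole units (the N dimension), so every item
    generated by a given unit stays on the same side.
    """
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(units)
    rng.shuffle(shuffled)
    n_explore = max(1, round(len(shuffled) * explore_fraction))
    return shuffled[:n_explore], shuffled[n_explore:]


# Hypothetical identifiers for 1,000 independent units.
unit_ids = list(range(1000))
explore, confirm = split_units(unit_ids)
print(len(explore), len(confirm))
```

Relationships are hunted for only in `explore`. Any that look promising are then tested formally on `confirm`, which played no part in suggesting them.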


