Statistical and Machine-Learning Data Mining by Bruce Ratner
Author: Ratner, Bruce
Language: eng
Format: epub
Published: 2014-08-14T22:01:36+00:00
a. Worthy of note is that statistics textbooks refer to multicollinearity as a “data problem, but not a weakness in the model.” Students are taught that model performance is not affected by the condition of multicollinearity. Multicollinearity is a data problem because it affects only the clear assignment of each predictor variable's contribution to the dependent variable: the assigned contribution of each predictor variable is muddied. Unfortunately, students are not taught that model performance is unaffected by multicollinearity only as long as the condition of multicollinearity remains the same as when the model was initially built. If the condition is the same as when the model was first built, then implementation of the model should yield good performance. However, for every reimplementation of the model after the first, the condition of multicollinearity has been shown in practice not to remain the same. Hence, I uphold and defend that multicollinearity is a data problem, and that multicollinearity does affect model performance. (A short simulation sketch of how correlated predictors muddy the assigned contributions appears after the rules of thumb below.)
2. Average correlation values in the range of 0.35 or less are desirable.
In this situation, a soundly honest assessment of the contributions of
the predictor variables to the performance of the model can be made.
3. Average correlation values that are greater than 0.35 and less than 0.55 are moderately desirable. In this situation, a somewhat honest assessment of the contributions of the predictor variables to the performance of the model can be made.
4. Average correlation values that are greater than 0.55 are not desirable, as they indicate the predictor variables are excessively redundant. In this situation, a questionably honest assessment of the contributions of the predictor variables to the performance of the model can be made.
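The instability described in note (a) can be illustrated with a small simulation. The following is a minimal sketch (not from the book): two highly correlated predictors give a stable fit, yet the fitted coefficients, the "assigned contributions," wander from sample to sample.

import numpy as np

rng = np.random.default_rng(0)

def fit_ols(n=200, rho=0.95):
    """Simulate y = x1 + x2 + noise with corr(x1, x2) = rho, then fit OLS."""
    cov = [[1.0, rho], [rho, 1.0]]
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = X[:, 0] + X[:, 1] + rng.normal(size=n)
    Xd = np.column_stack([np.ones(n), X])           # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # ordinary least squares
    yhat = Xd @ beta
    r2 = 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return beta[1:], r2

for _ in range(3):
    coefs, r2 = fit_ols()
    # The fit (R-squared) stays stable across samples, but the two coefficients
    # drift around their true value of 1.0 because the predictors overlap.
    print(np.round(coefs, 2), round(r2, 3))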
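To make the rules of thumb above concrete, here is a minimal sketch, assuming the average correlation is the mean of the absolute pairwise correlations among a model's predictor variables; the function and column names are illustrative assumptions, not the book's code.

import numpy as np
import pandas as pd

def average_correlation(df: pd.DataFrame, predictors: list) -> float:
    """Mean of the absolute pairwise correlations among the predictors."""
    corr = df[predictors].corr().abs().to_numpy()
    upper = np.triu_indices_from(corr, k=1)   # off-diagonal pairs only
    return float(corr[upper].mean())

def classify_average_correlation(avg: float) -> str:
    """Apply the rules of thumb above to an average correlation value."""
    if avg <= 0.35:
        return "desirable"
    if avg < 0.55:
        return "moderately desirable"
    return "not desirable (predictors excessively redundant)"

# Hypothetical usage:
# avg = average_correlation(model_data, ["VAR1", "VAR2", "VAR3"])
# print(round(avg, 5), classify_average_correlation(avg))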
As long as the average correlation value is acceptable (less than 0.40), the second proposed item of assessing competing models (every modeler builds several models and must choose the best one) is in play. If a project session brings forth models within the acceptable range of average correlation values, the model builder uses both the average correlation value and the set of individual correlations of the predictor variables with the dependent variable. The individual correlations indicate the content validity of the model. Rules of thumb for the values of the individual correlation coefficients are as follows:
1. Values between 0.0 and 0.3 (0.0 and -0.3) indicate poor validity.
2. Values between 0.3 and 0.7 (-0.3 and -0.7) indicate moderate validity.
3. Values between 0.7 and 1.0 (-0.7 and -1.0) indicate strong validity.
In sum, the model builder uses the average correlation and the individual correlations to assess competing predictive models and the importance of the predictor variables. I continue with the illustration of the LTV5 model to make sense of these discussions and rules of thumb in the next section.
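In the same hedged spirit, a companion sketch for the individual correlations of the predictors with the dependent variable, applying the validity rules of thumb above; the names are again illustrative assumptions rather than the book's code.

import pandas as pd

def individual_validity(df: pd.DataFrame, predictors: list, dependent: str) -> pd.DataFrame:
    """Correlate each predictor with the dependent variable and label its validity."""
    rows = []
    for var in predictors:
        r = df[var].corr(df[dependent])
        if abs(r) < 0.3:
            label = "poor validity"
        elif abs(r) < 0.7:
            label = "moderate validity"
        else:
            label = "strong validity"
        rows.append({"predictor": var, "correlation": r, "validity": label})
    return pd.DataFrame(rows)

# Hypothetical usage with an LTV5-style dataset:
# print(individual_validity(model_data, ["VAR1", "VAR2", "VAR3"], "LTV5"))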
13.5.2 Continuing with the Illustration of the Average Correlation with an LTV5 Model
The average correlation of the LTV5 model is 0.33502. The individual correlations of the predictor variables with LTV5 (Table 13.3) indicate the variables have moderate to strong validity, except for VAR2. The combination of the 0.33502 average correlation and the values in Table 13.3 is compelling: any modeler would be pleased with the reliability and validity of the LTV5 model.
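As a quick check with the hypothetical classify_average_correlation helper sketched earlier, an average correlation of 0.33502 falls in the desirable range (0.35 or less), consistent with the assessment of the LTV5 model above.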
13.5.3 Continuing with the illustration with a