Data Mining Concepts and Techniques: Complete Guide to a Comprehensive Understanding of Data Mining by Cameron Zak

Author: Cameron Zak [Zak, Cameron]
Language: eng
Format: azw3, epub
Published: 2020-09-09T16:00:00+00:00


Table 5.1 Decision Matrix for Model Assessment

Classification is a common approach because it is simple to understand, it closely matches what most people think of as the "best" model, and it tests the model fit across all values. If the proportions of events and non-events are not roughly equal, the values have to be adjusted before they can support the right decisions. (See Figure 5.2.)

Table 5.2 Formulas to Calculate Classification Measures
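As a rough illustration of how such measures follow from the four cells of a decision matrix, here is a minimal Python sketch. It uses the commonly cited definitions of accuracy, misclassification rate, sensitivity, specificity, and precision, which are assumed here and may not match the rows of Table 5.2 exactly. The function name and the counts are hypothetical.

```python
def classification_measures(tp, fp, fn, tn):
    """Return common measures from the four cells of a 2x2 decision matrix.

    tp/fn: events predicted as event / non-event
    fp/tn: non-events predicted as event / non-event
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,           # share classified correctly
        "misclassification": (fp + fn) / total,  # share classified wrongly
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "precision": tp / (tp + fp),             # share of predicted events that are real
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
measures = classification_measures(tp=40, fp=10, fn=20, tn=30)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With unbalanced data, accuracy alone can look good even when sensitivity is poor, which is one reason to read the measures together.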

Receiver Operating Characteristics

The receiver operating characteristic (ROC) is calculated at every classification cutoff and displayed graphically for analysis. The axes of the ROC plot are sensitivity and 1 − specificity, both computed from the classification rates.
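One way to see where the points of an ROC plot come from is to sweep a cutoff over the model scores and record the two rates at each step. The sketch below assumes each observation has a score and a 0/1 label. The function name and the data are hypothetical and use only the standard library.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per distinct cutoff.

    An observation is classified as an event when its score >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cutoff above every score: nothing flagged
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores and true labels (1 = event, 0 = non-event).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
for x, y in points:
    print(f"1 - specificity = {x:.2f}, sensitivity = {y:.2f}")
```

Plotting these pairs with 1 − specificity on the horizontal axis and sensitivity on the vertical axis gives the ROC curve. A model no better than random would lie near the diagonal.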

LIFT

Lift is the ratio of the percentage of correctly identified responders to the percentage expected at the baseline response level. To measure lift, it must be reported at a given percentile of the data. This percentile is generally referred to as the depth of file, and the first or second decile is usually picked. In the food drive example, suppose we measure lift at the first decile (10% of the data). The campaign has 2,500 respondents in total, so a baseline (or random) model would place 250 of them (2,500 × 0.1) in the first decile. Our model is good: it captures 300 respondents in the first decile, so the lift at the first decile is 1.2 (300/2,500 = 12 percent of the response captured, divided by 10 percent of the file). I like to use cumulative lift for my model evaluation because it is monotonic. In practice, campaigns should sort a list by the likelihood of responding and then market down the list until a natural break is detected or the campaign budget is depleted.
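The food drive arithmetic can be checked with a short Python sketch. The first function reproduces the first-decile figure from the text. The second is an assumed, simplified way to compute cumulative lift at successive depths from a list of 0/1 outcomes already sorted by descending model score. The function names and the small outcome list are hypothetical.

```python
def lift_at_depth(captured, total_responders, depth):
    """Lift = share of all responders captured / share of the file contacted."""
    return (captured / total_responders) / depth


def cumulative_lift(sorted_outcomes, n_bins=10):
    """Cumulative lift at each depth for outcomes sorted by descending score."""
    n = len(sorted_outcomes)
    total = sum(sorted_outcomes)
    lifts = []
    for b in range(1, n_bins + 1):
        cut = round(n * b / n_bins)            # records contacted so far
        captured = sum(sorted_outcomes[:cut])  # responders among them
        lifts.append((captured / total) / (cut / n))
    return lifts


# Figures from the text: 300 of 2,500 respondents fall in the top 10%.
first_decile_lift = lift_at_depth(captured=300, total_responders=2500, depth=0.10)
print(round(first_decile_lift, 2))

# Hypothetical outcomes (1 = responded), sorted by descending model score.
outcomes = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
lifts = cumulative_lift(outcomes, n_bins=5)
print([round(v, 3) for v in lifts])
```

At full depth the cumulative lift is always 1, since contacting the whole file captures every responder regardless of the model.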

The Akaike Information Criterion

The Akaike information criterion (AIC) is a statistical measure of goodness of fit for a specific model. It is computed as AIC = −2(LL − k) = 2k − 2LL, where

k = number of estimated parameters (for linear regression, the number of terms in the model)

LL = maximized value of the log-likelihood function for the model in question.
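As a small worked example, the sketch below computes AIC with the standard definition AIC = 2k − 2LL and compares two candidate models. The log-likelihood values and parameter counts are hypothetical, chosen only to show how the penalty on k can outweigh a slightly better fit.

```python
def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2LL (standard definition)."""
    return 2 * k - 2 * log_likelihood


# Hypothetical fitted models: (maximized log-likelihood, number of parameters).
aic_small = aic(log_likelihood=-120.5, k=3)  # fewer terms, slightly worse fit
aic_large = aic(log_likelihood=-118.0, k=6)  # more terms, slightly better fit

print(aic_small, aic_large)
preferred = "small" if aic_small < aic_large else "large"
print("preferred model:", preferred)
```

Here the larger model has the higher log-likelihood, but its three extra parameters cost more than the fit improves, so the smaller model has the lower AIC.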

The smaller the AIC, the better the model fits the data. Since


