Predictive Analytics, Data Mining and Big Data by Steven Finlay

Author: Steven Finlay
Publisher: PALGRAVE MACMILLAN


The difference in the predictive accuracy of different models is usually pretty small.24 This is true even for problems that are described as being highly non-linear or as having a lot of interactions between variables,25 as long as suitable data transformations have been applied (e.g. binning and the use of indicator variables26). A classic case is fraud detection. A very widely expressed belief is that you have to use a neural network or support vector machine if you want to produce a decent model, because of the complexity of the relationships in fraud data. This is a misconception, based on the fact that one of the earliest fraud detection systems just happened to be based on a neural network model. I have come across more than one example of an industry-leading fraud detection system based on linear models and/or rule sets that has performed as well as or better than competitors based on more advanced methods. Having said this, one should be careful not to confuse general and specific findings.27 There is a lot of evidence that a wide range of algorithms yield very similar levels of performance on average, but for some specific problems one method may be substantially better than another – and you can't tell if this is the case until you've built the models. Therefore it often makes sense to develop a number of competing models using different methodologies to see which one works best for your particular problem.
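To make the "build competing models and compare" idea concrete, here is a minimal sketch, assuming Python with scikit-learn (an illustrative toolkit, not one the book prescribes). The synthetic dataset stands in for a real development sample; the three candidates are assumptions chosen to mirror the model families discussed above.

```python
# Minimal sketch: benchmark competing model types on the same data using
# cross-validation. Dataset and candidate models are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real development sample.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

candidates = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)),
}

# Compare out-of-sample discrimination (AUC) via 5-fold cross-validation.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```

On many problems the printed AUCs will sit within a whisker of each other, which is exactly the "differences are usually pretty small" point; occasionally one candidate will pull clearly ahead, which is why the comparison is worth running at all.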

One drawback of neural networks is that it is notoriously easy to over-fit them to the data, making them appear to perform much better than they really do; that is, their performance in real-world usage is inferior to their performance on the data used to develop them. They also require far more computing power to generate than linear models constructed using linear or logistic regression, or decision trees built using C4.5 or CHAID (often 10–100 times more), which can cause problems when one is dealing with large samples and lots of predictor variables.
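Over-fitting can be made visible by comparing performance on the development sample with performance on a holdout sample. A minimal sketch, again assuming scikit-learn and a deliberately over-sized network (both illustrative assumptions):

```python
# Sketch: expose over-fitting by comparing development-sample and
# holdout-sample performance for an over-parameterized neural network.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Deliberately over-sized network, trained to convergence.
net = MLPClassifier(hidden_layer_sizes=(200, 200), max_iter=2000,
                    random_state=1)
net.fit(X_dev, y_dev)

dev_auc = roc_auc_score(y_dev, net.predict_proba(X_dev)[:, 1])
hold_auc = roc_auc_score(y_hold, net.predict_proba(X_hold)[:, 1])
print(f"development AUC: {dev_auc:.3f}")   # typically near-perfect
print(f"holdout AUC:     {hold_auc:.3f}")  # noticeably lower: over-fitting
```

A large gap between the two figures is the warning sign: the model has memorized the development data rather than learned relationships that generalize.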

Decision trees, like neural networks, are prone to over-fitting and have some other drawbacks. In particular:

Popular algorithms for deriving decision trees are not very efficient at utilizing data. Consequently, their performance is sometimes (although not always) marginally worse than that of other types of predictive model for a development sample of a given size.28 This is particularly true when small and medium-sized samples are used to construct the model.29

For classification, you need roughly equal numbers of cases that do and do not display the behavior to build good decision trees. If the development sample contains far more examples of one outcome than the other – for example, the results of a mailing campaign where only 1% of those targeted respond – then model performance will be poor. The greater the degree of imbalance, the worse the model will be. Decision tree algorithms are more sensitive to imbalance than almost any other type of model construction method.30 There are, however, ways of getting around this problem,31 one of which is illustrated in the sketch following this list.

The range of scores produced is smaller than for many other types of model, resulting in score distributions that are "clumpy" – something the sketch following this list also demonstrates.
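The following sketch illustrates the last two points together, again assuming scikit-learn. Re-weighting the rare class is one illustrative remedy for imbalance among several (over-sampling and under-sampling are others the footnoted literature covers), and counting distinct scores makes the "clumpy" distribution tangible:

```python
# Sketch: (a) one common remedy for class imbalance - re-weighting the
# rare class - and (b) the "clumpy" score distribution of a decision
# tree, which can only output one score per leaf node.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Imbalanced sample: roughly 1% "responders", like a mailing campaign.
X, y = make_classification(n_samples=20000, n_features=15,
                           weights=[0.99], random_state=2)

# class_weight="balanced" weights cases inversely to class frequency,
# so the rare behavior is not simply ignored when the tree is grown.
tree = DecisionTreeClassifier(max_depth=5, class_weight="balanced",
                              random_state=2).fit(X, y)
logit = LogisticRegression(max_iter=1000,
                           class_weight="balanced").fit(X, y)

tree_scores = tree.predict_proba(X)[:, 1]
logit_scores = logit.predict_proba(X)[:, 1]

# The tree yields one score per leaf; the regression yields a
# near-continuous range - hence the tree's "clumpy" distribution.
print("distinct tree scores:    ", len(np.unique(tree_scores)))
print("distinct logistic scores:", len(np.unique(logit_scores)))
```

The tree produces at most a few dozen distinct scores (one per leaf), while the logistic regression produces thousands, which is why tree-based score distributions look clumpy when plotted.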


