Business Intelligence and Data Mining by Anil K. Maheshwari PhD


Author: Anil K. Maheshwari, PhD
Language: eng
Format: epub
Tags: Business Expert Press, Data Analytics, Data Mining, Business Intelligence, Decision Trees, Regression, Neural Networks, Cluster analysis, Association rules.
Published: 2014-12-29T11:06:47+00:00


DECISION TREES

4. Split the data into mutually exclusive subsets along the lines of the specific split.

5. Repeat Steps 2 and 3 for each and every leaf node until the stopping criteria are met.
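The tree-growing procedure in these steps can be sketched as a short recursive function. This is a minimal illustration, not the method used by C5, CART, or CHAID. It assumes categorical attributes, one branch per attribute value, information gain (entropy reduction) to choose the splitting attribute, and stopping when a node is pure, no attributes remain, or a hypothetical `max_depth` is reached. The names `build_tree` and `predict` and the sample data are hypothetical.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0 for a pure node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, attributes, depth=0, max_depth=3):
    """Grow a tree recursively. Each row is a dict of categorical attribute values.

    An internal node is a dict; a leaf is simply a class label.
    """
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: pure node, no attributes left, or depth limit reached.
    if len(set(labels)) == 1 or not attributes or depth >= max_depth:
        return majority

    def child_entropy(attr):
        # Size-weighted entropy of the subsets produced by splitting on attr.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    # Pick the attribute with the highest information gain,
    # i.e. the lowest weighted entropy of the resulting subsets.
    best = min(attributes, key=child_entropy)
    remaining = [a for a in attributes if a != best]
    node = {"attribute": best, "branches": {}, "default": majority}
    # Step 4: one branch per observed value, each holding a mutually exclusive subset.
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Step 5: repeat the procedure on each child node.
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            remaining, depth + 1, max_depth)
    return node


def predict(node, row):
    """Follow branches to a leaf; unseen values fall back to the node's majority class."""
    while isinstance(node, dict):
        node = node["branches"].get(row[node["attribute"]], node["default"])
    return node


# Tiny illustrative data set (hypothetical).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["no", "no", "yes", "no", "yes"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree["attribute"])                                   # outlook
print(predict(tree, {"outlook": "rain", "windy": "no"}))   # yes
```

Each internal node stores the chosen attribute, its branches, and the majority class at that node, which `predict` falls back on for attribute values not seen in training.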

There are many algorithms for making decision trees. The most popular ones are C5, CART, and CHAID. They differ in three key elements:

1. Splitting criteria

a. Which variable to use for the first split? How should one determine the most important variable for the first branch, and subsequently, for each subtree? There are many measures, such as least error, information gain, and the Gini index.

b. What values to use for the split? If the variables have continuous values, such as age or blood pressure (BP), what value ranges should be used to make bins?

c. How many branches should be allowed for each node? Trees can be binary, with just two branches at each node, or they can allow more branches per node.

2. Stopping criteria

a. When to stop building the tree? There are two major ways to make that determination. Tree building could be stopped when the branches reach a certain depth, beyond which the tree becomes unreadable. It could also be stopped when the error level at a node falls within a predefined tolerance.

3. Pruning

a. Prepruning and postpruning: The tree could be trimmed to make it more balanced and easier to use. Pruning is often done after the tree is constructed, to balance out the tree and improve usability.
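The splitting measures in item 1a, and the value-range question in item 1b, can be made concrete with a few small functions. This is a sketch under stated assumptions: the function names are hypothetical, "least error" is read as the misclassification error of a node, and candidate cut points for a continuous variable are taken as midpoints between adjacent distinct values, each giving a binary split (a common convention, not necessarily what any particular algorithm does).

```python
from collections import Counter
import math


def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1 for a 50/50 two-class node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def misclassification_error(labels):
    """Share of records that are not in the node's majority class."""
    return 1.0 - Counter(labels).most_common(1)[0][1] / len(labels)


def impurity_decrease(parent, children, measure):
    """Parent impurity minus size-weighted child impurity.

    With measure=entropy this is the information gain of the split.
    """
    n = len(parent)
    return measure(parent) - sum(len(ch) / n * measure(ch) for ch in children)


def best_threshold(values, labels, measure=gini):
    """Find the best binary cut point for a continuous variable.

    Candidate cuts are midpoints between adjacent distinct values (an assumed convention).
    Returns (cut, impurity_decrease), or None if there is only one distinct value.
    """
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= cut]
        right = [y for x, y in zip(values, labels) if x > cut]
        score = impurity_decrease(labels, [left, right], measure)
        if best is None or score > best[1]:
            best = (cut, score)
    return best


# Hypothetical example: where should "age" be cut to separate risk classes?
ages = [25, 32, 47, 51, 62, 70]
risk = ["low", "low", "low", "high", "high", "high"]
print(best_threshold(ages, risk))           # (49.0, 0.5)  using the Gini index
print(best_threshold(ages, risk, entropy))  # (49.0, 1.0)  using information gain
```

Both measures pick the same cut here because it separates the classes perfectly; on noisier data the Gini index and information gain can disagree.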

In trying to maximize predictive accuracy, a decision tree may grow until it fits the training data completely, which makes the tree very deep. Such a tree will show high accuracy on the training data but may show much lower accuracy on test data. The symptoms of an overfitted tree are excessive depth and too many branches, some of which may reflect anomalies due to noise or outliers. Such a tree should therefore be pruned. There are two approaches to avoiding overfitting.


- Prepruning: Halt tree construction early, when certain criteria are met. The downside is that it is difficult to decide which criteria to use for halting construction, because we do not know what might happen later if we kept growing the tree.

- Postpruning: Remove branches or subtrees from a “fully grown” tree. This method is commonly used. The C4.5 algorithm uses a statistical method to estimate the errors at each node for pruning. A validation set may also be used for pruning (Table 5.2).
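Both approaches can be sketched on a tree stored as nested dictionaries, where an internal node holds its splitting attribute, one branch per value, and a majority-class `default`, and a leaf is just a class label. This is a minimal sketch under those assumptions: `should_stop` illustrates prepruning with hypothetical thresholds, and `prune` implements reduced-error postpruning with a validation set. It is not the statistical error estimate that C4.5 uses, and all names and data here are hypothetical.

```python
def should_stop(labels, depth, gain, max_depth=4, min_samples=5, min_gain=0.01):
    """Prepruning check with hypothetical thresholds: halt growth if any is met."""
    return depth >= max_depth or len(labels) < min_samples or gain < min_gain


def predict(node, row):
    """Walk from a node to a leaf; leaves are plain class labels."""
    while isinstance(node, dict):
        node = node["branches"].get(row[node["attribute"]], node["default"])
    return node


def accuracy(node, rows, labels):
    """Fraction of records classified correctly by the (sub)tree rooted at node."""
    return sum(predict(node, r) == y for r, y in zip(rows, labels)) / len(labels)


def prune(node, rows, labels):
    """Reduced-error postpruning, working bottom-up with a validation set.

    A subtree is replaced by its majority-class leaf ("default") whenever that
    does not lower accuracy on the validation records that reach the subtree.
    """
    if not isinstance(node, dict) or not labels:
        return node  # a leaf, or no validation evidence for this subtree
    # Prune the children first, each with only the records routed to it.
    for value, child in list(node["branches"].items()):
        idx = [i for i, r in enumerate(rows) if r[node["attribute"]] == value]
        node["branches"][value] = prune(
            child, [rows[i] for i in idx], [labels[i] for i in idx])
    # Then collapse this node into a leaf if that is at least as accurate.
    if accuracy(node["default"], rows, labels) >= accuracy(node, rows, labels):
        return node["default"]
    return node


# Hypothetical fully grown tree whose "rain" subtree fits noise in the training data.
tree = {"attribute": "outlook", "default": "no", "branches": {
    "sunny": "no",
    "overcast": "yes",
    "rain": {"attribute": "windy", "default": "yes",
             "branches": {"no": "yes", "yes": "no"}},
}}
val_rows = [{"outlook": "rain", "windy": "yes"}, {"outlook": "rain", "windy": "no"},
            {"outlook": "sunny", "windy": "no"}, {"outlook": "overcast", "windy": "no"}]
val_labels = ["yes", "yes", "no", "yes"]
pruned = prune(tree, val_rows, val_labels)
print(pruned["branches"]["rain"])  # yes  (the subtree collapsed to a leaf)
```

Working bottom-up means a subtree is only judged after its own children have been simplified, so the validation set decides at every level whether the extra branches earn their keep.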

Table 5.2 Comparing popular decision tree algorithms

| Decision tree | C4.5 | CART | CHAID |
|---|---|---|---|
| Full name | Successor to Iterative Dichotomiser 3 (ID3) | Classification and regression trees | Chi-square automatic interaction detector |
| Basic algorithm | Hunt’s algorithm | Hunt’s algorithm | Adjusted significance testing |
| Developer | Ross Quinlan | Leo Breiman et al. | Gordon Kass |
| When developed | 1986 (ID3); 1993 (C4.5) | 1984 | 1980 |
| Types of trees | Classification | Classification and regression trees | Classification and regression |
| Serial implementation | Tree growth and tree pruning | Tree growth and tree pruning | Tree growth and tree pruning |
| Type of data | Discrete and continuous; incomplete data | Discrete and continuous | Non-normal data also accepted |
| Types of splits | Multiway splits | Binary splits only; clever surrogate splits to reduce tree depth | Multiway splits as default |
| Splitting criteria | Information gain | Gini index and others | Chi-square test |
| Pruning criteria | Clever bottom-up technique avoids overfitting | Remove weakest links first | Trees can become very large |
| Implementation | Publicly available | Publicly available in most packages | Popular in market research, for segmentation |
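To make CHAID's splitting criterion in the table concrete, the sketch below computes the Pearson chi-square statistic and its degrees of freedom for the cross-tabulation of one candidate predictor against the target. It assumes categorical data; full CHAID additionally merges similar categories and compares Bonferroni-adjusted p-values, which this sketch omits. The function name `chi_square` and the sample data are hypothetical.

```python
from collections import Counter


def chi_square(predictor, target):
    """Pearson chi-square statistic and degrees of freedom for predictor x target."""
    n = len(target)
    observed = Counter(zip(predictor, target))  # missing cells count as 0
    row_totals = Counter(predictor)
    col_totals = Counter(target)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical data: which predictor is more strongly related to buying?
buys = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
region = ["north"] * 4 + ["south"] * 4
channel = ["web", "store"] * 4
print(chi_square(region, buys))   # (2.0, 1)
print(chi_square(channel, buys))  # (0.0, 1)
```

Because both predictors here have the same degrees of freedom, their statistics can be compared directly, and region is the stronger candidate for the split. In general the comparison should be made on p-values, which can be obtained, for example, with `scipy.stats.chi2.sf(stat, dof)` if SciPy is installed.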


