Business Intelligence and Data Mining by Anil K. Maheshwari PhD

Author: Anil K. Maheshwari, PhD
Language: eng
Format: epub
Tags: Business Expert Press, Data Analytics, Data Mining, Business Intelligence, Decision Trees, Regression, Neural Networks, Cluster analysis, Association rules.
Published: 2014-12-29T11:06:47+00:00


DECISION TREES

4. Split the data into mutually exclusive subsets along the lines of the specific split.

5. Repeat Steps 2 and 3 for each and every leaf node until the stopping criteria are reached.
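Step 4 can be illustrated with a minimal sketch, assuming each record is a dictionary and the split is on a categorical variable. The function name `split_records` and the toy records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def split_records(records, variable):
    """Partition records into mutually exclusive subsets, one per value of `variable`."""
    subsets = defaultdict(list)
    for record in records:
        # Each record lands in exactly one subset, keyed by its value.
        subsets[record[variable]].append(record)
    return dict(subsets)


# Hypothetical toy data, for illustration only.
records = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
]
subsets = split_records(records, "outlook")
```

Because every record is assigned by its single value of the split variable, the subsets cannot overlap, and together they cover all the records.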

There are many algorithms for making decision trees. The most popular ones are C5, CART, and CHAID. They differ on three key elements:

1. Splitting criteria

a. Which variable to use for the first split? How should one determine the most important variable for the first branch, and subsequently, for each subtree? There are many measures like least errors, information gain, and Gini coefficient.

b. What values to use for the split? If the variables have continuous values, such as for age or BP, what value-ranges should be used to make bins?

c. How many branches should be allowed for each node? There could be binary trees, with just two branches at each node. Or there could be more branches allowed.

2. Stopping criteria

a. When to stop building the tree? There are two major ways to make that determination. The tree building could be stopped when a certain depth of the branches has been reached and the tree becomes unreadable after that. The tree could also be stopped when the error level at any node is within predefined tolerable levels.

3. Pruning

a. Prepruning and postpruning: The tree could be trimmed to make it more balanced and more easily usable. The pruning is often done after the tree is constructed, to balance out the tree and improve usability.
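Two of the splitting measures named under item 1, information gain and the Gini coefficient, can be sketched in a few lines, along with one simple way to choose a cut point for a continuous variable. This is an illustrative sketch, not the procedure of any particular algorithm: the function names, the midpoint-based threshold search, and the toy age data are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity: chance that two random draws have different classes."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, subsets):
    """Entropy of the parent minus the size-weighted entropy of the subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, gain)."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = information_gain(labels, [left, right])
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical toy data: age as the continuous variable, risk as the class.
ages = [22, 25, 30, 35, 40, 50]
risk = ["low", "low", "low", "high", "high", "high"]
threshold, gain = best_threshold(ages, risk)
```

On this toy data the search settles on a cut between 30 and 35, which separates the two classes completely. The same loop could score candidate splits with `gini` instead of `entropy`.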

In pursuit of predictive accuracy, a decision tree may be grown until it fits the training data completely, which makes the tree deep. Such a tree will show good accuracy on the training data, but it may not show such good accuracy on test data. The symptoms of an overfitted tree are a tree that is too deep, with too many branches, some of which may reflect anomalies due to noise or outliers. Thus, the tree should be pruned. There are two approaches to avoid overfitting.

- Prepruning means to halt the tree construction early, when certain criteria are met. The downside is that it is difficult to decide what criteria to use for halting the construction, because we do not know what may happen subsequently, if we keep growing the tree.

- Postpruning: Remove branches or subtrees from a “fully grown” tree. This method is commonly used. The C4.5 algorithm uses a statistical method to estimate the errors at each node for pruning. A validation set may be used for pruning as well (Table 5.2).
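The two approaches can be contrasted in a small sketch. It assumes numeric features, binary threshold splits scored by Gini impurity, and, for postpruning, the validation-set variant (often called reduced-error pruning) rather than the statistical error estimate used by C4.5. All function names, parameters, and data are hypothetical.

```python
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def error_rate(labels):
    # Share of records that disagree with the node's majority class.
    return 1.0 - Counter(labels).most_common(1)[0][1] / len(labels)


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def take(items, indexes):
    return [items[i] for i in indexes]


def build(rows, labels, depth=0, max_depth=10, tolerance=0.0):
    node = {"label": majority(labels)}
    # Prepruning: halt at the depth limit or once the node error is tolerable.
    if depth >= max_depth or error_rate(labels) <= tolerance:
        return node
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (len(left) * gini(take(labels, left))
                     + len(right) * gini(take(labels, right)))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every feature is constant here, so no split exists
        return node
    _, f, t, left, right = best
    node["feature"], node["threshold"] = f, t
    node["left"] = build(take(rows, left), take(labels, left), depth + 1, max_depth, tolerance)
    node["right"] = build(take(rows, right), take(labels, right), depth + 1, max_depth, tolerance)
    return node


def predict(node, row):
    while "feature" in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]


def errors(node, rows, labels):
    return sum(predict(node, r) != y for r, y in zip(rows, labels))


def postprune(node, rows, labels):
    """Bottom-up: collapse a subtree into a leaf when the leaf makes no more
    errors than the subtree on the validation records that reach it."""
    if "feature" not in node:
        return node
    f, t = node["feature"], node["threshold"]
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    pruned = dict(node,
                  left=postprune(node["left"], take(rows, left), take(labels, left)),
                  right=postprune(node["right"], take(rows, right), take(labels, right)))
    leaf_errors = sum(y != node["label"] for y in labels)
    if leaf_errors <= errors(pruned, rows, labels):
        return {"label": node["label"]}
    return pruned


# Hypothetical data: one feature, with one noisy training label at x = 5.
train_rows = [(1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,)]
train_labels = ["a", "a", "a", "b", "a", "b", "b", "b"]
val_rows = [(1.2,), (2.7,), (4.4,), (5.2,), (6.6,), (7.9,)]
val_labels = ["a", "a", "b", "b", "b", "b"]

full = build(train_rows, train_labels)                # grown until every leaf is pure
stump = build(train_rows, train_labels, max_depth=1)  # prepruned by a depth limit
pruned = postprune(full, val_rows, val_labels)        # postpruned on validation data
```

The fully grown tree memorizes the noisy training record, while the depth limit and the validation-based pruning both yield smaller trees. Choosing `max_depth` or `tolerance` in advance is the difficulty noted above for prepruning; postpruning defers that decision until the whole tree can be inspected.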

Table 5.2 Comparing popular decision tree algorithms

| Decision Tree | C4.5 | CART | CHAID |
|---|---|---|---|
| Full name | Iterative Dichotomiser (ID3) | Classification and regression trees | Chi-square automatic interaction detector |
| Basic algorithm | Hunt’s algorithm | Hunt’s algorithm | Adjusted significance testing |
| Developer | Ross Quinlan | Breiman | Gordon Kass |
| When developed | 1986 | 1984 | 1980 |
| Types of trees | Classification | Classification and regression trees | Classification and regression |
| Serial implementation | Tree growth and tree pruning | Tree growth and tree pruning | Tree growth and tree pruning |
| Type of data | Discrete and continuous; incomplete data | Discrete and continuous | Non-normal data also accepted |
| Types of splits | Multiway splits | Binary splits only; clever surrogate splits to reduce tree depth | Multiway splits as default |
| Splitting criteria | Information gain | Gini coefficient, and others | Chi-square test |
| Pruning criteria | Clever bottom-up technique avoids overfitting | Remove weakest links first | Trees can become very large |
| Implementation | Publicly available | Publicly available in most packages | Popular in market research, for segmentation |
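The chi-square test listed as CHAID's splitting criterion can be illustrated with a minimal sketch that computes the Pearson chi-square statistic for a candidate split. This covers only the raw statistic; the adjusted significance testing named in the table is not shown. The function name and the counts are hypothetical, and the sketch assumes every row and column total is nonzero.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows: one row per branch of the candidate split,
    one column per class. Assumes all row and column totals are nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if branch and class were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts for two candidate split variables (branches x classes).
by_region = [[30, 10], [10, 30]]
by_gender = [[20, 20], [20, 20]]
```

Here the split on region shows a strong association with the class while the split on gender shows none, so a chi-square-based criterion would prefer region. Comparing raw statistics this way is only fair when the tables have the same shape; otherwise the comparison should be made on p-values.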


