Business Intelligence and Data Mining by Anil K. Maheshwari PhD
Author: Anil K. Maheshwari, PhD
Language: eng
Format: epub
Tags: Business Expert Press, Data Analytics, Data Mining, Business Intelligence, Decision Trees, Regression, Neural Networks, Cluster analysis, Association rules.
Published: 2014-12-29T11:06:47+00:00
DECISION TREES
4. Split the data into mutually exclusive subsets according to the chosen split.
5. Repeat Steps 2 and 3 for each and every leaf node until the stopping criterion is reached.
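The tree-building steps above can be sketched as a short recursive procedure. This is a minimal illustration under stated assumptions, not an implementation of C5, CART, or CHAID: it assumes numeric attributes, binary splits at midpoint thresholds scored by the Gini index, and two hypothetical stopping parameters, `max_depth` and `min_size`. The function names and the small age and blood-pressure data set are made up for the example.

```python
from collections import Counter

# Hypothetical helper names; a sketch of recursive tree growth, not a named algorithm.

def gini(labels):
    """Gini index of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Try every attribute and every midpoint threshold; return the split with
    the lowest weighted Gini index, or None if no split is possible."""
    best = None
    n = len(rows)
    for attr in range(len(rows[0])):
        values = sorted(set(r[attr] for r in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[attr] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[attr] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, attr, threshold)
    return best

def build_tree(rows, labels, depth=0, max_depth=3, min_size=2):
    """Grow the tree recursively until a stopping criterion is met."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: node is pure, too small, or maximum depth reached.
    if len(set(labels)) == 1 or len(labels) < min_size or depth >= max_depth:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:  # all rows are identical on every attribute
        return {"leaf": majority}
    _, attr, threshold = split
    # Split the data into mutually exclusive subsets along the chosen split.
    left = [(r, y) for r, y in zip(rows, labels) if r[attr] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[attr] > threshold]
    # Repeat the procedure on each child node.
    return {
        "attr": attr,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_size),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_size),
    }

def predict(tree, row):
    """Follow the branches from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["attr"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Tiny hypothetical data set: [age, blood pressure] -> risk class
X = [[25, 120], [30, 130], [45, 150], [50, 160], [60, 170], [35, 125]]
y = ["low", "low", "high", "high", "high", "low"]
tree = build_tree(X, y)
print(tree)
print(predict(tree, [55, 155]))
```

With this toy data, the first split falls on age (attribute 0) at a threshold of 40, which separates the two classes completely, so both children become leaves and a patient aged 55 is classified as high risk.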
There are many algorithms for making decision trees. The most popular ones are C5, CART, and CHAID. They differ in three key elements:
1. Splitting criteria
a. Which variable should be used for the first split? How should one determine the most important variable for the first branch, and subsequently for each subtree? There are many measures, such as least errors, information gain, and the Gini index.
b. What values should be used for the split? If a variable has continuous values, such as age or blood pressure (BP), what value ranges should be used to make bins?
c. How many branches should be allowed for each node? Trees could be binary, with just two branches at each node, or more branches could be allowed.
2. Stopping criteria
a. When should tree building stop? There are two major ways to make that determination. Tree building could be stopped when the branches reach a certain depth, beyond which the tree becomes unreadable. Growth could also be stopped when the error level at a node falls within predefined tolerable levels.
3. Pruning
a. Prepruning and postpruning: The tree could be trimmed to make it more balanced and easier to use. Pruning is often done after the tree is constructed, to balance out the tree and improve usability.
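To make the splitting criteria concrete, the sketch below scores one candidate split with two of the measures named above, information gain (based on entropy) and the Gini index, and lists the thresholds one might try for a continuous variable such as age. Using midpoints between consecutive distinct values as thresholds is an assumption of this sketch, and the yes/no counts and ages are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted(measure, subsets):
    """Impurity of the child nodes, weighted by each child's share of the rows."""
    n = sum(len(s) for s in subsets)
    return sum(len(s) / n * measure(s) for s in subsets)

def information_gain(parent, subsets):
    """Reduction in entropy achieved by splitting the parent into subsets."""
    return entropy(parent) - weighted(entropy, subsets)

def candidate_thresholds(values):
    """Midpoints between consecutive distinct values of a continuous variable."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]  # a hypothetical two-way split
print(round(information_gain(parent, split), 3))
print(round(weighted(gini, split), 3))
print(candidate_thresholds([23, 35, 35, 41, 58]))
```

Here the split yields an information gain of about 0.278 bits and lowers the Gini index from 0.5 at the parent to a weighted 0.32 in the children; a higher gain or a lower weighted Gini index indicates a better split. For the ages 23, 35, 35, 41, and 58, the candidate thresholds are 29.0, 38.0, and 49.5.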
In trying to maximize predictive accuracy, a decision tree may fit the training data completely, making the tree very deep. Such a tree shows good accuracy on the training data but may not do as well on test data. An overfitted tree is typically too deep, with too many branches, some of which may reflect anomalies caused by noise or outliers. Such a tree should be pruned. There are two approaches to avoid overfitting:
- Prepruning: Halt tree construction early, when certain criteria are met. The downside is that it is difficult to decide which criteria to use for halting construction, because we do not know what might happen if we kept growing the tree.
- Postpruning: Remove branches or subtrees from a “fully grown” tree. This method is commonly used. The C4.5 algorithm uses a statistical method to estimate the errors at each node for pruning. A validation set may also be used for pruning (Table 5.2).
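Postpruning with a validation set can be sketched as reduced-error pruning: working bottom-up, replace a subtree with a leaf whenever doing so does not lower accuracy on held-out validation data. This is one simple postpruning method chosen for illustration; it is not the statistical error estimate that C4.5 uses. The overfitted tree is hand-built, the data are hypothetical, and the `majority` field stored at each internal node (the class a leaf would predict if the subtree were removed) is an assumption of this sketch.

```python
def predict(node, row):
    """Walk from the root to a leaf and return its class."""
    while "leaf" not in node:
        node = node["left"] if row[node["attr"]] <= node["threshold"] else node["right"]
    return node["leaf"]

def accuracy(tree, rows, labels):
    """Share of validation rows the tree classifies correctly."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

def prune(node, root, rows, labels):
    """Reduced-error postpruning: prune the children first (bottom-up), then try
    collapsing this node into a leaf; keep the change unless accuracy drops."""
    if "leaf" in node:
        return
    prune(node["left"], root, rows, labels)
    prune(node["right"], root, rows, labels)
    before = accuracy(root, rows, labels)
    saved = dict(node)
    node.clear()
    node["leaf"] = saved["majority"]  # assumed: majority class of the training rows here
    if accuracy(root, rows, labels) < before:
        node.clear()
        node.update(saved)  # pruning hurt, so restore the subtree

# Hypothetical overfitted tree on one attribute (age) predicting risk.
tree = {
    "attr": 0, "threshold": 40, "majority": "high",
    "left": {"leaf": "low"},
    "right": {
        "attr": 0, "threshold": 52, "majority": "high",
        "left": {"leaf": "high"},
        "right": {
            "attr": 0, "threshold": 55, "majority": "high",
            "left": {"leaf": "low"},  # branch fitted to a noisy training point
            "right": {"leaf": "high"},
        },
    },
}

val_X = [[30], [35], [45], [50], [54], [60]]
val_y = ["low", "low", "high", "high", "high", "high"]
print("before:", accuracy(tree, val_X, val_y))
prune(tree, tree, val_X, val_y)
print("after:", accuracy(tree, val_X, val_y))
print(tree)
```

On this validation data, accuracy rises from about 0.83 to 1.0: the two lower subtrees collapse into a single "high" leaf, while the root split at age 40 is kept because removing it would lower accuracy.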
Table 5.2 Comparing popular decision tree algorithms

Decision tree         | C4.5                                          | CART                                                           | CHAID
----------------------|-----------------------------------------------|----------------------------------------------------------------|------------------------------------------
Full name             | Successor to Iterative Dichotomiser 3 (ID3)   | Classification and regression trees                            | Chi-square automatic interaction detector
Basic algorithm       | Hunt's algorithm                              | Hunt's algorithm                                               | Adjusted significance testing
Developer             | Ross Quinlan                                  | Breiman et al.                                                 | Gordon Kass
When developed        | 1993 (ID3: 1986)                              | 1984                                                           | 1980
Types of trees        | Classification                                | Classification and regression trees                            | Classification and regression
Serial implementation | Tree growth and tree pruning                  | Tree growth and tree pruning                                   | Tree growth and tree pruning
Type of data          | Discrete and continuous; incomplete data      | Discrete and continuous                                        | Non-normal data also accepted
Types of splits       | Multiway splits                               | Binary splits only; clever surrogate splits to reduce tree depth | Multiway splits as default
Splitting criteria    | Information gain                              | Gini index and others                                          | Chi-square test
Pruning criteria      | Clever bottom-up technique avoids overfitting | Remove weakest links first                                     | Trees can become very large
Implementation        | Publicly available                            | Publicly available in most packages                            | Popular in market research, for segmentation
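The chi-square test listed as CHAID's splitting criterion can be illustrated on a 2 × 2 contingency table that cross-tabulates the branches of a candidate split against the target class. This sketch computes only Pearson's chi-square statistic and compares it with 3.841, the critical value for one degree of freedom at the 5 percent level; the category merging and adjusted significance testing that CHAID also performs are omitted, and the counts are made up.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if branch and class were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: the two branches of a hypothetical split; columns: counts of class "yes" and "no".
split_table = [[30, 10],
               [15, 25]]
stat = chi_square(split_table)
CRITICAL_1DF_5PCT = 3.841  # df = (2 - 1) * (2 - 1) = 1, alpha = 0.05
print(round(stat, 3), stat > CRITICAL_1DF_5PCT)
```

For these counts the statistic is about 11.43, well above 3.841, so the split is significantly associated with the class at the 5 percent level. Among several candidate splits, the one with the most significant association would be preferred.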