Tree-based Machine Learning Algorithms: Decision Trees, Random Forests, and Boosting by Clinton Sheppard
Author: Clinton Sheppard
Language: eng
Format: azw3
Tags: ensemble learning, python, decision trees, random forests, machine learning
Published: 2017-08-27T07:00:00+00:00
test.py can now be changed to get its data from the file like this:
test.py
data = dtree.read_csv('census.csv')
data = dtree.prepare_data(data, ['Age'])
...
testData = ['Elizabeth', 'female', 'Married', 19, 'Daughter']
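The `dtree` module's own loader is not shown in this excerpt. As a rough idea of what reading the census file involves, here is a minimal sketch using Python's standard `csv` module. `load_rows` is a hypothetical name, and converting the listed columns to integers is only an assumption about what passing `['Age']` to `prepare_data` is for.

```python
import csv
import io


def load_rows(lines, numeric_columns):
    """Parse CSV lines into (header, rows), converting named columns to int.

    Hypothetical stand-in for illustration only; this is not dtree's code.
    `lines` can be an open file or any iterable of CSV text lines.
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Positions of the columns to treat as numbers (e.g. 'Age').
    numeric_idx = [header.index(name) for name in numeric_columns]
    rows = []
    for raw in reader:
        row = list(raw)
        for i in numeric_idx:
            row[i] = int(row[i])
        rows.append(row)
    return header, rows


# Made-up sample standing in for a real file.
sample = io.StringIO("Name,Gender,Age\nAugust,male,17\nElizabeth,female,19\n")
header, rows = load_rows(sample, ["Age"])
print(header)  # ['Name', 'Gender', 'Age']
print(rows)    # [['August', 'male', 17], ['Elizabeth', 'female', 19]]
```

With a real file, the same function would be called as `with open('census.csv', newline='') as f: header, rows = load_rows(f, ['Age'])`.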
When this code is run, it produces a tree with 27 nodes. Notice its structure.
Of the 13 branch nodes, 7 use age, 4 use name, 1 uses gender, and 1 uses marital status. The first three branches split the data almost evenly each time. The problem is that after the fourth branch, the tree starts to fan out into very small nodes and uses the Name attribute to determine the birth places of the remaining people.
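The node counts are consistent with each other: if every branch node has exactly two children, as these numbers suggest, a tree always has one more leaf than it has branch nodes, so 13 branches plus 14 leaves gives 27 nodes. The sketch below tallies which attribute each branch splits on. It assumes a hypothetical representation in which a branch is an `(attribute, left, right)` tuple and anything else is a leaf; this is not necessarily how the `dtree` module stores its nodes.

```python
from collections import Counter


def count_nodes(node):
    """Return (Counter of branch nodes per attribute, number of leaves).

    Assumed representation: a branch is (attribute, left, right); any
    other value is a leaf holding an outcome such as a birth place.
    """
    if not isinstance(node, tuple):
        return Counter(), 1
    attribute, left, right = node
    left_counts, left_leaves = count_nodes(left)
    right_counts, right_leaves = count_nodes(right)
    counts = left_counts + right_counts
    counts[attribute] += 1
    return counts, left_leaves + right_leaves


# A small made-up tree: 4 branch nodes and 5 leaves.
tree = ("Age", ("Name", "Ohio", "Iowa"), ("Gender", "Ohio", ("Age", "Texas", "Iowa")))
counts, leaves = count_nodes(tree)
branches = sum(counts.values())
print(counts["Age"], counts["Name"], counts["Gender"])  # 2 1 1
print(branches, leaves, branches + leaves)              # 4 5 9
```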
This would be a great decision tree if we only planned to apply it to the data that was used to build it. But it is not such a good tree if we plan to use it to predict the birth places of people in new data. The reason is that the values in the fifth- and sixth-level branches are too granular. They also rely on characteristics that are too specific to the data used to build the tree, like having the name August or being between 16 and 18 years old. As a result, the tree performs substantially better on the initial data than it would on future data. This is called overfitting the data.
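To see why splitting on Name generalizes poorly, here is a deliberately extreme sketch: a hypothetical "model" that simply memorizes each training name's birth place, evaluated on made-up rows. It scores perfectly on the rows it was built from and fails on people whose names it has never seen, which is the gap between training and future performance described above.

```python
def train_memorizer(rows):
    """'Learn' by memorizing the birth place of every training Name.

    Mimics deep branches that split on Name: perfect on the data the
    model was built from, no answer for a name it has never seen.
    """
    lookup = {name: place for name, _age, place in rows}

    def predict(row):
        return lookup.get(row[0])  # None for an unseen name

    return predict


def accuracy(predict, rows):
    """Fraction of rows whose predicted birth place (last field) is correct."""
    correct = sum(1 for row in rows if predict(row) == row[-1])
    return correct / len(rows)


# Made-up rows: (Name, Age, BirthPlace).
train = [("August", 17, "Ohio"), ("Elizabeth", 19, "Iowa"), ("Mary", 40, "Iowa")]
new = [("Anna", 18, "Ohio"), ("John", 42, "Iowa")]

model = train_memorizer(train)
print(accuracy(model, train))  # 1.0
print(accuracy(model, new))    # 0.0
```

A real overfit tree is rarely this extreme, but the pattern is the same: the more a split depends on details unique to the training rows, the wider the gap between the two scores.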
There are three common methods used to reduce overfitting in order to improve a decision tree’s ability to predict future data:
prune while building the tree, or top-down,
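Pruning top-down means deciding, before each split is made, whether the node should become a leaf instead. A minimal sketch of such a check follows. The `should_stop` and `make_leaf` names, the `max_depth` and `min_rows` thresholds, and the convention that each row's outcome is its last field are illustrative assumptions, not the book's code. Limits on depth and node size target exactly the small fifth- and sixth-level nodes described above.

```python
from collections import Counter


def should_stop(rows, depth, max_depth=4, min_rows=5):
    """Pre-pruning check made before a node is split.

    Make a leaf when the node is already pure, when the tree has reached
    max_depth, or when too few rows remain to justify another split.
    Thresholds are illustrative assumptions, not values from the book.
    """
    outcomes = {row[-1] for row in rows}
    return len(outcomes) <= 1 or depth >= max_depth or len(rows) < min_rows


def make_leaf(rows):
    """A leaf predicts the most common outcome among its rows."""
    return Counter(row[-1] for row in rows).most_common(1)[0][0]


# Made-up rows: (Name, Age, BirthPlace).
rows = [("August", 17, "Ohio"), ("Elizabeth", 19, "Iowa"), ("Mary", 40, "Iowa")]
print(should_stop(rows, depth=1))      # True: only 3 rows, fewer than min_rows
print(should_stop(rows * 2, depth=1))  # False: 6 mixed rows at a shallow depth
print(make_leaf(rows))                 # Iowa
```

A tree builder would call `should_stop` at each node and return `make_leaf(rows)` when it is true, rather than searching for another split.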
