The Applied Data Science Workshop, Second Edition by Alex Galea
Author: Alex Galea
Language: eng
Format: epub
Publisher: Packt Publishing Pvt. Ltd.
Published: 2020-07-21T00:00:00+00:00
Figure 4.17: A decision tree from a Random Forest ensemble, where max_depth=5
From the preceding graph, we can see that each path is limited to five consecutive nodes as a result of setting max_depth=5. At each branch, scikit-learn's decision tree algorithm has decided on the feature split that maximizes the separability of classes in the training data. Consider the following section of the tree:
Figure 4.18: A section of the decision tree where a split is made on the last_evaluation ≤ 0.445 condition
Here, we can see that 1,926 training samples from the top node have been split on the last_evaluation ≤ 0.445 condition, resulting in a child node that's pure (on the left) with 208 "no" samples, and a child node that's mixed (on the right) with 1,544 "no" samples and 1,149 "yes" samples. Recall that "no" corresponds to employees who are still working at the company, while "yes" corresponds to those who have left.
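The "purity" of a node can be quantified with the Gini impurity, which scikit-learn's tree algorithm uses by default. As a hedged sketch (the helper function below is illustrative, not from the book), we can compute it for the two child nodes described above, using the class counts shown in Figure 4.18:

```python
# Illustrative sketch: Gini impurity quantifies how mixed a node's classes are.
# Class counts are taken from the child nodes described in Figure 4.18.

def gini_impurity(class_counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

# The pure left child (208 "no" samples, 0 "yes") has zero impurity.
print(gini_impurity([208, 0]))                 # 0.0

# The mixed right child (1,544 "no", 1,149 "yes") is near the two-class
# maximum impurity of 0.5.
print(round(gini_impurity([1544, 1149]), 3))   # 0.489
```

At each branch, the algorithm chooses the split that most reduces the weighted impurity of the resulting children, which is why the tree tends to produce pure nodes like the left child here.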
The orange boxes represent nodes where the majority of samples are labeled "no", and the blue boxes represent nodes where the majority of samples are "yes". The shade of each box (light, dark, and so on) indicates the confidence level, which is related to the purity of that node.
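A tree like the one in Figure 4.17 can be extracted directly from a trained Random Forest via its `estimators_` attribute. The snippet below is a minimal sketch using a synthetic dataset as a stand-in for the Human Resource Analytics data (the feature names and parameters here are illustrative, not the book's):

```python
# Minimal sketch: train a Random Forest with max_depth=5 on synthetic data
# (a stand-in for the Human Resource Analytics dataset), then inspect one
# of its constituent decision trees as text.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)

# max_depth=5 limits every path in every tree to five consecutive splits.
rf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
rf.fit(X, y)

# Print the first tree's splits; each indentation level is one node deeper.
first_tree = rf.estimators_[0]
print(export_text(first_tree,
                  feature_names=[f"feature_{i}" for i in range(5)]))
print("depth:", first_tree.get_depth())  # never exceeds 5
```

For a graphical rendering with the colored, shaded boxes described above, `sklearn.tree.plot_tree(first_tree, filled=True)` produces a figure in the same style, with node shading tied to majority class and purity.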
Note
To access the source code for this specific section, please refer to https://packt.live/30FSdOZ.
You can also run this example online at https://packt.live/2ACdbUc.
This concludes our exercise on Random Forests and takes us to the end of our initial modeling research on the Human Resource Analytics dataset. In this exercise, we learned how to train Random Forests and explored how their decision tree constituents are composed.
Although we trained a variety of models in this section, we only worked through one end-to-end example where data was loaded, split into training and testing sets, used to train a model, and then scored. After that, we relied on previous work to make our modeling process simple.
In the next section, you'll have the opportunity to work through a full modeling activity, from loading the preprocessed dataset to scoring and comparing the final results.