Data Smart by Jordan Goldmeier
Author: Jordan Goldmeier
Language: eng
Format: epub
ISBN: 9781119931393
Publisher: Wiley
Published: 2023-09-20
SPLITTING A FEATURE WITH MORE THAN TWO VALUES
In the RetailMart example, all the independent variables are binary. You never have to decide how to split the training data when you create a decision tree: the 1s go one way, and the 0s go the other. But what if you have a feature that has all kinds of values?
For example, let's say you work for a large email service, and you want to know if an email address is alive and can receive mail. One of the metrics used to do this is how many days have elapsed since someone sent an email to that address.
This feature isn't anywhere close to being binary! So if you train a decision tree that uses this feature, how do you determine what value to split it on?
It's actually really easy.
There are only a finite number of values you can split on. At most, it's one unique value per record in your training set. And there are probably some addresses in your training set that have the same number of days since you last sent to them.
You need to consider only these values. If you have four unique values to split on from your training records (say 10 days, 20 days, 30 days, and 40 days), splitting on 35 is no different from splitting on 30, because both put exactly the same records on each side. So, you just check the impurity scores you get if you choose each value to split on, and you pick the one that gives you the least impurity. Done!
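The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text says "impurity" without naming a measure here, so the sketch uses Gini impurity as one common choice; the function names `gini` and `best_split`, the "value <= threshold" convention, and the sample data are all made up for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold' split.

    Only the unique values seen in the training data are tried. The largest
    one is skipped because it would send every record to the left side.
    Returns None if there is no valid split (all values identical).
    """
    n = len(values)
    candidates = sorted(set(values))[:-1]
    best = None
    for t in candidates:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Weight each side's impurity by the share of records it holds.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical training data: days since the last send to each address,
# and whether the address turned out to be alive (1) or dead (0).
days = [10, 10, 20, 30, 30, 40]
alive = [1, 1, 1, 0, 1, 0]

threshold, impurity = best_split(days, alive)
print(threshold, round(impurity, 4))
```

With this made-up data there are only three candidate thresholds (10, 20, and 30), even though there are six records, because duplicate values collapse into a single candidate.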
