Machine Learning With Python For Beginners: A Step-By-Step Guide with Hands-On Projects (Learn Coding Fast with Hands-On Project Book 7) by Chan Jamie
Author: Chan, Jamie
Language: eng
Format: epub
Published: 2021-08-01T00:00:00+00:00
5.5 Model Selection with Scikit-Learn
The section above discussed various metrics for model evaluation. When working on a machine learning project, we commonly build more than one machine learning model and use the metrics above to select the best-performing model.
There are several approaches to model selection.
5.5.1 Train Test Split
One approach is to split our dataset into training and test subsets and train different models using the training set. We first evaluate the models on the training set and select the best-performing model. This model is then further evaluated on the test set to determine if it generalizes well to data not used in its training.
To split our dataset into training and test subsets, we can use the train_test_split() function in the sklearn.model_selection module.
This function accepts one or more arrays of the same length (such as lists, NumPy arrays, or pandas DataFrames) as input and splits each one into training and test subsets. It returns a training subset and a test subset for every input, so passing one array gives two outputs and passing two arrays gives four.
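The relationship between the number of inputs and the number of outputs can be checked with a minimal sketch. The names data and labels, their values, and the random_state seed are illustrative assumptions and do not come from the book.

```python
from sklearn.model_selection import train_test_split

# Illustrative inputs (assumed for this sketch, not from the book)
data = list(range(10))
labels = [v * v for v in data]  # a second array of the same length

# One input array gives two outputs: a training part and a test part
one_input = train_test_split(data, test_size=0.2, random_state=0)
print(len(one_input))

# Two input arrays give four outputs: train and test parts for each input
two_inputs = train_test_split(data, labels, test_size=0.2, random_state=0)
print(len(two_inputs))
```

The outputs come back in the order train, test for each input in turn, which is why they are usually unpacked into names such as X_train, X_test, y_train, y_test.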
Let's use a simple dataset to illustrate how the function works. Suppose we pass the following arrays to the function:
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [23, 11, 31, 45, 12, 65, 43, 69, 13, 12]
The function randomly splits the two arrays and returns four arrays. If we do an 80-20 split, the function may return the following arrays:
Training Subset
X = [1, 2, 4, 5, 6, 7, 9, 10]
y = [23, 11, 45, 12, 65, 43, 13, 12]
Test Subset
X = [3, 8]
y = [31, 69]
We typically use the training subset to train different models and evaluate them. The best model is then selected and evaluated on the test subset. We'll demonstrate how to use the train_test_split() function in subsequent chapters when we work with actual datasets.
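As a preview, the 80-20 split of the small X and y arrays above can be sketched as follows. The test_size value of 0.2 corresponds to the 80-20 split. The random_state seed is an assumption added here so the split is repeatable, and the exact elements that land in each subset depend on that seed, so they may differ from the arrays listed above.

```python
from sklearn.model_selection import train_test_split

X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [23, 11, 31, 45, 12, 65, 43, 69, 13, 12]

# test_size=0.2 holds out 20% of the samples for testing.
# random_state=42 is an assumed seed; any integer makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training subset:", X_train, y_train)
print("Test subset:", X_test, y_test)
```

Because both arrays are shuffled with the same random order, each value in X stays paired with its original value in y after the split.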
