Machine Learning With Python For Beginners: A Step-By-Step Guide with Hands-On Projects (Learn Coding Fast with Hands-On Project Book 7) by Chan Jamie

Author: Chan, Jamie
Language: eng
Format: epub
Published: 2021-08-01T00:00:00+00:00


5.5 Model Selection with Scikit-Learn

The section above discussed various metrics for model evaluation. When working on a machine learning project, we commonly build more than one machine learning model and use the metrics above to select the best-performing model.

There are several approaches to model selection.

5.5.1 Train Test Split

One approach is to split our dataset into training and test subsets and train different models using the training set. We first evaluate the models on the training set and select the best-performing model. This model is then further evaluated on the test set to determine if it generalizes well to data not used in its training.

To split our dataset into training and test subsets, we can use the train_test_split() function in the sklearn.model_selection module.

This function accepts one or more arrays of the same length (such as lists, NumPy arrays, or pandas DataFrames) as input and splits the array(s) into training and test subsets. After splitting, it returns a list containing a training subset and a test subset for each input array, so one input array yields two outputs and two input arrays yield four.
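To make the input and output shapes concrete, here is a minimal pure-Python sketch of the idea. The helper simple_split(), its parameters test_fraction and seed, and the sample lists are all hypothetical names introduced for illustration; this is not how scikit-learn implements train_test_split(), only a rough picture of what comes back for a given number of inputs.

```python
import random


def simple_split(*arrays, test_fraction=0.2, seed=None):
    """Hypothetical helper, NOT scikit-learn's implementation.

    Splits every input list using the same shuffled indices and returns
    [train_0, test_0, train_1, test_1, ...]: two lists per input.
    """
    if not arrays:
        raise ValueError("at least one array is required")
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")

    # Shuffle the positions once so every array is split the same way.
    indices = list(range(n))
    random.Random(seed).shuffle(indices)

    # Assumption: reserve at least one element for the test subset.
    n_test = max(1, round(n * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    result = []
    for a in arrays:
        result.append([a[i] for i in train_idx])
        result.append([a[i] for i in test_idx])
    return result


features = ["a", "b", "c", "d", "e"]
labels = [0, 1, 0, 1, 1]

one_input = simple_split(features, seed=0)
two_inputs = simple_split(features, labels, seed=0)

print(len(one_input))   # 2 (train and test for one array)
print(len(two_inputs))  # 4 (train and test for each of two arrays)
```

Because the same shuffled positions are applied to every input, an element of features and its matching element of labels always land in the same subset.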

Let’s use a simple dataset to illustrate how the function works. Suppose we pass the following arrays to the function:

X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

y = [23, 11, 31, 45, 12, 65, 43, 69, 13, 12]

The function randomly splits the two arrays, keeping each element of X paired with its corresponding element of y, and returns four arrays. If we do an 80-20 split, the function may return the following arrays:

Training Subset

X = [1, 2, 4, 5, 6, 7, 9, 10]

y = [23, 11, 45, 12, 65, 43, 13, 12]

Test Subset

X = [3, 8]

y = [31, 69]
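A split like this might be produced with a call along the following lines. This is only a sketch: the variable names X_train, X_test, y_train and y_test and the random_state value are arbitrary choices made here, and because the split is random, the elements that end up in each subset will generally differ from the ones listed above.

```python
from sklearn.model_selection import train_test_split

X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [23, 11, 31, 45, 12, 65, 43, 69, 13, 12]

# test_size=0.2 requests an 80-20 split; random_state (an arbitrary value
# chosen for this sketch) makes the random split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))  # 8 2

# Each X value should still sit next to its original y value.
original_pairs = dict(zip(X, y))
print(all(original_pairs[x] == t for x, t in zip(X_train, y_train)))
print(all(original_pairs[x] == t for x, t in zip(X_test, y_test)))
```

The outputs come back in the order training subset, then test subset, for each input array in turn, which is why X_train and X_test precede y_train and y_test.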

We typically use the training subset to train different models and evaluate them. The best model is then selected and evaluated on the test subset. We’ll demonstrate how to use the train_test_split() function in subsequent chapters when we work with actual datasets.


