Python Feature Engineering Cookbook by Soledad Galli

Author: Soledad Galli
Language: eng
Format: epub
Tags: COM062000 - COMPUTERS / Data Modeling and Design, COM018000 - COMPUTERS / Data Processing, COM021030 - COMPUTERS / Databases / Data Mining
Publisher: Packt Publishing
Published: 2020-01-21T11:26:50+00:00


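import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version

# Load the Boston House Prices data into a DataFrame and add the target as MEDV.
boston_dataset = load_boston()
data = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)
data['MEDV'] = boston_dataset.target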

The boundaries for the intervals should be learned using variables in the train set only, and then used to discretize the variables in train and test sets.

Let's divide the data into train and test sets and their targets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    data.drop('MEDV', axis=1), data['MEDV'], test_size=0.3,
    random_state=0)

We will divide the LSTAT continuous variable into 10 intervals. The width of the intervals is given by the value range divided by the number of intervals.

Let's calculate the range of the LSTAT variable, that is, the difference between its maximum and minimum values:
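The calculation described above can be sketched as follows. This is a minimal illustration, not the book's own code: it uses a synthetic stand-in for the LSTAT column so that it runs without the Boston dataset, and the names `values`, `train`, `test`, `n_bins`, `width`, and `edges` are assumptions made for the example. Opening the outer boundaries to infinity is one possible way to handle test values that fall outside the range seen in the train set.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for LSTAT (assumption: this is not the Boston data).
rng = np.random.default_rng(0)
values = pd.Series(rng.uniform(1.0, 40.0, size=500), name="LSTAT")

# Simple 70/30 split by position, standing in for train_test_split.
train, test = values.iloc[:350], values.iloc[350:]

n_bins = 10

# Range and interval width, learned from the train set only.
lstat_range = train.max() - train.min()
width = lstat_range / n_bins

# n_bins + 1 equally spaced boundaries starting at the train minimum.
edges = train.min() + width * np.arange(n_bins + 1)

# Open the outer boundaries so values outside the train range still get a bin.
edges[0], edges[-1] = -np.inf, np.inf

# Apply the same train-derived boundaries to both sets.
train_binned = pd.cut(train, bins=edges, labels=False)
test_binned = pd.cut(test, bins=edges, labels=False)
```

Because the boundaries come from `train` alone, the test set plays no part in defining the intervals; it is only sorted into them afterwards.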


