Python for Data Science: Comprehensive Guide to Data Science with Python by Campbell Alex


Author: Campbell, Alex
Language: eng
Format: epub, azw3
Published: 2021-06-17T16:00:00+00:00


        New York   California   Florida
3.5     1          0            0
2.0     0          1            0
6.7     0          0            1

As we said earlier, a dummy variable indicates whether a category is present (1) or absent (0). Dummy variables are commonly used as stand-ins for categorical data, allowing us to run quantitative analysis on qualitative data. From the table, we can see that the row with the value 3.5 (New York) is encoded as follows:

New York – 1

California – 0

Florida – 0

This is a simple way of converting text categories into numeric values.
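As a rough sketch of this encoding, the snippet below applies pandas' `get_dummies` to a small made-up `State` column. The column name and rows are illustrative assumptions, not the book's data.

```python
import pandas as pd

# Hypothetical toy data: one categorical column holding state names
df = pd.DataFrame({"State": ["New York", "California", "Florida", "New York"]})

# One 0/1 dummy column per distinct state;
# dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(df["State"], dtype=int)
print(dummies)
```

Each row has exactly one 1, marking its state. Note that pandas sorts the categories, so the columns come out as California, Florida, New York rather than in the order they first appear.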

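However, you need to watch out for the dummy variable trap: including an extra dummy variable that could have been eliminated because the other dummy variables already predict it. In the example above, when the New York and California columns are both 0, you automatically know the state is Florida, so two dummy variables are enough to identify the state. Keeping all three makes the dummy columns perfectly collinear (they always sum to 1), which is a problem for linear regression models that include an intercept.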
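The redundancy can be checked numerically. This is a small sketch with made-up dummy columns: it shows that Florida equals 1 minus the other two, and that a design matrix with an intercept plus all three dummies has lower rank than its column count, while dropping one dummy (New York here, chosen only for illustration) restores full column rank.

```python
import numpy as np

# Hypothetical dummy columns for four rows (New York, California, Florida)
ny = np.array([1, 0, 0, 1])
ca = np.array([0, 1, 0, 0])
fl = np.array([0, 0, 1, 0])

# Florida is fully determined by the other two columns
assert np.array_equal(fl, 1 - ny - ca)

# Design matrix with an intercept column plus all three dummies
intercept = np.ones(4)
full = np.column_stack([intercept, ny, ca, fl])

# Dropping one dummy removes the redundancy
reduced = np.column_stack([intercept, ca, fl])

print(np.linalg.matrix_rank(full), full.shape[1])        # 3 4: collinear
print(np.linalg.matrix_rank(reduced), reduced.shape[1])  # 3 3: full column rank
```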

Going back to our Startups example, we can avoid this trap by dropping one of the dummy columns:

X = X[:, 1:]  # drop the first dummy column to avoid the dummy variable trap
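The encoding step itself isn't shown at this point, so the slice above presumably assumes the dummy columns sit at the front of `X`. That is where, for example, scikit-learn's `ColumnTransformer` with `remainder='passthrough'` places its encoded columns. Under that assumption, the toy array below (made-up values) shows what the slice does.

```python
import numpy as np

# Hypothetical encoded feature matrix: three dummy columns first,
# followed by two numeric feature columns
X = np.array([
    [0, 0, 1, 10.0, 20.0],
    [1, 0, 0, 11.0, 21.0],
    [0, 1, 0, 12.0, 22.0],
])

# Keep every row and every column from index 1 onward,
# i.e. drop the first dummy column
X = X[:, 1:]
print(X.shape)  # (3, 4)
```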

Here's what we have so far:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')

# Features: every column except the last
X = dataset.iloc[:, :-1].values

# Target: the fifth column (index 4)
y = dataset.iloc[:, 4].values

Have a look at the data:

dataset.head()
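Without the CSV at hand, the loading and splitting steps can be tried on an in-memory stand-in. This sketch assumes a five-column layout with the categorical State in the fourth position and the target in the fifth (index 4), which is what `iloc[:, 4]` implies. The column names and values are made-up placeholders. The last lines show a swapped-in pandas alternative to the array slicing: encoding State and dropping one dummy in a single `get_dummies` call.

```python
import io

import pandas as pd

# Made-up stand-in for 50_Startups.csv (assumed layout: three numeric
# columns, a categorical State column, then the target)
csv_text = """Spend1,Spend2,Spend3,State,Target
100.0,50.0,20.0,New York,300.0
80.0,40.0,10.0,California,250.0
60.0,30.0,5.0,Florida,200.0
"""
dataset = pd.read_csv(io.StringIO(csv_text))
print(dataset.head())

# Features: every column except the last
X = dataset.iloc[:, :-1].values
# Target: the fifth column (index 4)
y = dataset.iloc[:, 4].values
print(X.shape, y.shape)  # (3, 4) (3,)

# pandas alternative: one-hot encode State and drop the first
# (alphabetically, California) dummy column in one step
X_encoded = pd.get_dummies(
    dataset.iloc[:, :-1], columns=["State"], drop_first=True, dtype=int
)
print(X_encoded.columns.tolist())
```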


