Python Machine Learning: The Ultimate Guide for Beginners to Machine Learning with Python, Programming and Deep Learning, Artificial Intelligence, Neural Networks, and Data Science by Moore Richard & Moore Richard

Author:Moore, Richard & Moore, Richard [Moore, Richard]
Language: eng
Format: epub
Published: 2019-10-05T16:00:00+00:00


Now that the data is loaded, we can begin preprocessing it with Pandas. The first thing we might consider is applying a mask so that an operation touches only a specific subset of rows. A mask in this case is a series of Booleans (true or false values), one per row, that marks whether that row is selected. Let’s see this step in code and things will become clearer:

In: mask_feature = iris['sepal_length'] > 6.0

In: mask_feature

0 False

1 False

2 False

...

146 True

147 True

148 True

149 True

Our objective during this step is to select all the iris items with a sepal length greater than 6. You’ll notice that rows holding a value of 6 or less are marked False and therefore don’t match our request. Next, we are going to use a mask in order to relabel one of our targets. Type the following lines:

In: mask_target = iris['target'] == 'Iris-virginica'

In: iris.loc[mask_target, 'target'] = 'New label'

We have selected the rows whose target is “Iris-virginica” and replaced that label with “New label”. We can verify this by asking for the list of distinct labels in the column:

In: iris['target'].unique()

Out: array(['Iris-setosa', 'Iris-versicolor', 'New label'], dtype=object)
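The two masking steps above can be put together in one self-contained sketch. Since the loading step is not shown here, the small data frame below is a hypothetical stand-in for the iris data with illustrative values; only the column names follow the text.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data frame (illustrative values).
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "target": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
               "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# A mask is a Boolean Series with one True/False entry per row.
mask_feature = iris["sepal_length"] > 6.0

# Indexing with the mask keeps only the rows where it is True.
long_sepals = iris[mask_feature]

# A second mask picks the rows to relabel; .loc writes to those rows only.
mask_target = iris["target"] == "Iris-virginica"
iris.loc[mask_target, "target"] = "New label"

# unique() returns the distinct labels left in the column.
labels = iris["target"].unique()

print(long_sepals)
print(labels)
```

Building the mask first and passing it to the data frame afterwards keeps the condition readable and lets the same mask be reused for both selection and assignment.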

You’ll notice that we’re using the “unique” method. It returns the distinct values in the column, which lets us inspect the labels after the update. In addition, we can group the rows by target in order to compare statistics for each group. Let’s look at the example:

In: grouped_targets_mean = iris.groupby(['target']).mean()

In: grouped_targets_mean

Out:

In: grouped_targets_var = iris.groupby(['target']).var()

In: grouped_targets_var

Out:

First we group the rows of our table by label using the groupby function. If you have experience working with SQL databases, this function works much like “GROUP BY” in SQL. In the next step we calculate the mean of each group by using the mean function. Keep in mind that you can use this operation on any number of columns. Pandas provides other aggregations in the same way, such as sum and variance (var), which is what the second example uses. Because each result is still a data frame, we can chain a number of operations together. So far we used the “groupby” function to collect all of our observations by label and then check whether certain values differ between the groups.
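As a minimal sketch of the grouping steps described above, the following uses a small hypothetical data frame with made-up measurements in place of the real iris data. The `agg` call at the end is an addition that is not part of the original example; it shows one way to compute several statistics for a column in a single step.

```python
import pandas as pd

# Hypothetical stand-in for the iris data frame (made-up measurements).
iris = pd.DataFrame({
    "sepal_length": [5.0, 5.2, 6.0, 6.4, 6.5, 7.1],
    "petal_length": [1.4, 1.6, 4.0, 4.4, 5.5, 6.1],
    "target": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
               "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# Group the rows by label, much like GROUP BY in SQL.
grouped = iris.groupby("target")

# One row per label, one column per numeric feature.
grouped_targets_mean = grouped.mean()
grouped_targets_var = grouped.var()

# Several aggregations at once for a single column.
summary = grouped["sepal_length"].agg(["mean", "var", "sum"])

print(grouped_targets_mean)
print(grouped_targets_var)
print(summary)
```

Comparing the mean and variance tables side by side is a quick way to see which features separate the groups well.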

When you work with real data instead of nice, clean examples such as the iris dataset, you will sometimes also have to deal with time series. In machine learning, or data science, a time series is a sequence of data points listed or plotted in chronological order. In the simplest case, the points are equally spaced in time. This kind of data has a wide range of applications, anywhere from statistical analysis to counting sunspots. Take note that datasets with time series are often quite noisy, so you may have to smooth the data points. The easiest way to do this is to use a rolling mean, here over a window of 5 observations, like this:

In: smooth_time_series = time_series.rolling(window=5).mean()

Keep in mind that you can also use a rolling median instead, which is less sensitive to outliers.
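The smoothing step can be sketched on a short hypothetical series as follows. The values and dates are invented for illustration, and the sketch uses the `rolling` method of a Pandas Series with a window of 5 observations.

```python
import pandas as pd

# Hypothetical daily series with one obvious spike (9.0) as noise.
time_series = pd.Series(
    [1.0, 2.0, 9.0, 4.0, 5.0, 6.0, 7.0],
    index=pd.date_range("2019-01-01", periods=7, freq="D"),
)

# Each output value summarizes the current point and the 4 before it.
# The first 4 entries are NaN because their window is incomplete.
smooth_mean = time_series.rolling(window=5).mean()
smooth_median = time_series.rolling(window=5).median()

print(smooth_mean)
print(smooth_median)
```

In this sketch the first complete window contains the spike, so its mean (4.2) is pulled upward more than its median (4.0), which illustrates why the median can be the safer choice for data with outliers.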