Machine Learning For Absolute Beginners: A Plain English Introduction (Second Edition) by Oliver Theobald
Author: Oliver Theobald
Language: eng
Format: epub, pdf, mobi
Publisher: Scatterplot Press
Published: 2017-06-20T21:00:00+00:00
Figure 1: An example of k-NN clustering used to predict the class of a new data point
As seen in Figure 1, the scatterplot enables us to compute the distance between any two data points. The data points on the scatterplot have already been categorized into two classes (Class A and Class B). Next, a new data point whose class is unknown is added to the plot. We can predict the category of the new data point based on its relationship to existing data points.
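The excerpt does not say which distance measure is used. Assuming the common choice of straight-line (Euclidean) distance, the distance between two points p = (x_1, y_1) and q = (x_2, y_2) on a scatterplot can be written as follows. The symbols p, q, and d are introduced here for illustration only.

$$
d(p, q) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
$$

For example, for the hypothetical points p = (1, 2) and q = (4, 6):

$$
d(p, q) = \sqrt{(4 - 1)^2 + (6 - 2)^2} = \sqrt{9 + 16} = 5
$$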
First, though, we must set “k” to determine how many data points we wish to nominate to classify the new data point. If we set k to 3, k-NN analyzes the new data point’s relationship to only its three closest data points (neighbors). In Figure 1, the three closest neighbors are two Class B data points and one Class A data point. With k set to 3, the model therefore predicts Class B for the new data point, because Class B accounts for two of its three nearest neighbors.
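The voting step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's own code. The coordinates, the function name knn_predict, and the use of Euclidean distance are all assumptions made for the example. The points are chosen so that the three nearest neighbors are two Class B points and one Class A point, as in the text.

```python
import math
from collections import Counter

# Hypothetical, hand-picked coordinates (not taken from Figure 1).
points = [
    (1.0, 2.0), (2.0, 1.5), (4.5, 4.0), (2.5, 3.0),   # Class A
    (5.5, 5.5), (6.0, 5.0), (7.0, 6.0), (6.5, 7.0),   # Class B
]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
new_point = (5.0, 5.0)  # the data point whose class is unknown


def knn_predict(points, labels, new_point, k=3):
    """Return the majority class among the k points closest to new_point."""
    # Pair each existing point's Euclidean distance with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(p, new_point), label) for p, label in zip(points, labels)
    )
    # Keep only the labels of the k nearest neighbors.
    nearest = [label for _, label in distances[:k]]
    # The most common label among those neighbors is the prediction.
    return Counter(nearest).most_common(1)[0][0]


prediction = knn_predict(points, labels, new_point, k=3)
print(prediction)  # → B (neighbors are B, B, A)
```

Every prediction measures the distance to all stored points, which is why the approach is simple to write but becomes costly as the dataset grows.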
The number of neighbors, defined by k, is crucial in determining the results. In Figure 1, you can see that the classification will change depending on whether k is set to “3” or “7.” It is therefore recommended that you test a range of k values to find the best fit and avoid setting k too low or too high. Setting k to an odd number also helps to avoid a tied vote between two classes, which would leave the prediction undecided. The default number of neighbors is five when using Scikit-learn.
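One way to test several values of k is to score each one with cross-validation and compare the results. The sketch below uses Scikit-learn's KNeighborsClassifier and cross_val_score. The Iris dataset, the candidate k values, the five folds, and the helper name score_k_values are all assumptions chosen for illustration, not part of the original text.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Example dataset bundled with Scikit-learn (an assumption for this sketch).
X, y = load_iris(return_X_y=True)


def score_k_values(X, y, k_values, folds=5):
    """Return the mean cross-validated accuracy for each candidate k."""
    scores = {}
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k)
        # cross_val_score fits and scores the model once per fold.
        scores[k] = cross_val_score(model, X, y, cv=folds).mean()
    return scores


# Odd values only, to reduce the chance of tied votes.
scores = score_k_values(X, y, k_values=[1, 3, 5, 7, 9, 11])
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k:2d}  mean accuracy={score:.3f}")
print("best k:", best_k)

# Leaving n_neighbors unset uses Scikit-learn's default of five.
print(KNeighborsClassifier().n_neighbors)
```

The best k depends on the dataset, so the ranking printed here should not be assumed to carry over to other data.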
Although k-NN is generally an accurate technique and simple to learn, storing an entire dataset and calculating the distance between each new data point and all existing data points places a heavy burden on computing resources. Thus, k-NN is generally not recommended for use with large datasets.
Another potential downside is that it can be challenging to apply k-NN to high-dimensional data (3-D and 4-D) with multiple features. Measuring multiple distances between data points in a three- or four-dimensional space is taxing on computing resources, and it also makes accurate classification more difficult. Reducing the total number of dimensions, through a dimensionality reduction algorithm such as Principal Component Analysis (PCA) or by merging variables, is a common strategy to simplify and prepare a dataset for k-NN analysis.
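As a rough illustration of reducing dimensions before applying k-NN, the sketch below chains Scikit-learn's PCA and KNeighborsClassifier in a pipeline. The digits dataset (64 features), the choice of 10 components, the 75/25 split, and the variable names are assumptions made for the example. Suitable settings depend on the data at hand.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Example dataset with 64 features per sample (an assumption for this sketch).
X, y = load_digits(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: PCA compresses the 64 features into 10 components.
# Step 2: k-NN classifies using distances in the reduced space.
model = make_pipeline(
    PCA(n_components=10),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
reduced = model.named_steps["pca"].transform(X_test)

print("original features:", X.shape[1])
print("reduced features: ", reduced.shape[1])
print(f"test accuracy: {accuracy:.3f}")
```

Fitting PCA inside the pipeline means the components are learned from the training data only, so the test data does not influence the reduction step.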
