Machine Learning for Beginners, Part 6: K-Nearest Neighbors

On January 4, 2018, I released another video in my Data Science in 90 Seconds series, which explained the k-Nearest Neighbors machine learning algorithm. Last summer, I also did a five-part series on types of machine learning, which covered K-means clustering, Singular Value Decomposition, Principal Component Analysis, Apriori and Frequent Pattern-Growth in more detail. Today I want to expand on the ideas presented in the k-Nearest Neighbors video and continue the discussion in plain language for beginners.

If you recall from earlier discussions, unsupervised machine learning is the ‘task of inferring a function to describe hidden structure from unlabeled data’. In unsupervised machine learning, the computer takes observations of data that do not have a predetermined class or category and looks for structure in them. The k-Nearest Neighbors (kNN) algorithm is different: it is a supervised method, because it starts from observations whose class or category is already known (labeled data). To classify a new data point, kNN finds the k labeled points that are closest to it, its nearest neighbors, and uses their labels to make a prediction. Unlike k-Means Clustering, kNN does not form groups or partition the observations into k sets; in kNN, k is simply the number of neighbors consulted.
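To make “closest” concrete, here is a minimal Python sketch of the neighbor-finding step. It assumes straight-line (Euclidean) distance, which is a common choice but not the only one. The function name `k_nearest` and the coordinates are hypothetical, made up for illustration, and each labeled point is stored as an `((x, y), label)` pair.

```python
import math

def k_nearest(labeled_points, query, k):
    """Return the k labeled points closest to `query`.

    Assumption: distance is Euclidean (math.dist, Python 3.8+).
    Each item in labeled_points is ((x, y), label).
    """
    # Sort every labeled point by its straight-line distance to the query point.
    by_distance = sorted(labeled_points, key=lambda item: math.dist(item[0], query))
    # Keep only the k closest ones: these are the "nearest neighbors".
    return by_distance[:k]

# Hypothetical (x, y) coordinates, invented for this example.
points = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((4.0, 4.0), "o"), ((5.0, 4.5), "o")]
print(k_nearest(points, (4.5, 4.0), k=2))  # [((4.0, 4.0), 'o'), ((5.0, 4.5), 'o')]
```

Here k only controls how many neighbors come back; nothing is grouped or partitioned the way k-Means would do it.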

Let’s say we’re trying to predict whether a drought will occur at our location, labeled “c” on the graph below. Points labeled “a” are locations that have drought, and points labeled “o” are locations that do not. Obviously this is a clean and simple example, and real data and analysis are more complex, but this is for beginners, so stick with me here.

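[Figure kNN-1: locations with drought (“a”), locations without drought (“o”), and the unknown location “c”]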

If we let k = 3, a purple circle is drawn around our unknown location “c”, as shown below, so that it takes in the three points closest to “c”. Those points are the “nearest neighbors”: one “a” (drought) and two “o”s (no drought). Since two of the three nearest neighbors are “o”s, which represent no drought, the majority vote predicts that our location “c” will have no drought.
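The majority vote can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name `knn_predict` is hypothetical, distance is assumed to be Euclidean, and the coordinates are invented so that the outcome matches the graph (one “a” and two “o”s among the three nearest neighbors), since the graph's actual positions are not given.

```python
import math
from collections import Counter

def knn_predict(labeled_points, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbors.

    Assumption: Euclidean distance (math.dist, Python 3.8+).
    Each item in labeled_points is ((x, y), label).
    """
    # Rank the labeled points by distance to the query and keep the k closest.
    neighbors = sorted(labeled_points, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among those neighbors; the most common label wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical layout: "a" = drought, "o" = no drought.
locations = [
    ((2.0, 3.0), "a"),
    ((0.5, 0.5), "a"),
    ((3.5, 2.5), "o"),
    ((3.0, 4.0), "o"),
    ((6.0, 6.0), "a"),
]
c = (3.0, 3.0)  # the unknown location
print(knn_predict(locations, c, k=3))  # o
```

On the same made-up data, `knn_predict(locations, c, k=5)` returns `"a"`, because all five points then vote and three of them are drought locations. The prediction can depend on the choice of k, and with only two classes an odd k avoids tied votes.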

[Figure kNN-2: with k = 3, a purple circle around “c” takes in its three nearest neighbors]

In my next few blog posts, I plan to cover the Naive Bayes, Support Vector Machine, Decision Tree and Random Forest machine learning methods.
