On January 4, 2018, I released another video in my Data Science in 90 Seconds series, which explained the k-Nearest Neighbors machine learning algorithm. Last summer, I did a five-part series on types of machine learning. That series included more details about k-Means Clustering, Singular Value Decomposition, Principal Component Analysis, Apriori and Frequent Pattern-Growth. Today I want to expand on the ideas presented in the k-Nearest Neighbors video and continue the discussion in plain language for beginners.
If you recall from earlier discussions, unsupervised machine learning is the ‘task of inferring a function to describe hidden structure from unlabeled data’. In unsupervised machine learning, the computer takes observations of data that do not have a predetermined class or category and looks for structure in them. The k-Nearest Neighbors (kNN) algorithm works differently. It is a supervised method: it starts from data points that already have labels and classifies a new point by looking at which labeled points are closest to it, its nearest neighbors. Unlike k-Means Clustering, kNN does not make groups or partition the observations into k sets. In kNN, k is the number of neighbors we consult.
Let’s say we’re trying to predict if a drought will occur at our location, labeled “c” on the graph below. Points labeled “a” are locations that have drought and points labeled “o” are locations that do not have drought. Obviously this is a clean and simple example, and actual data and analysis are more complex, but this is for beginners, so stick with me here.
If we let k = 3, a purple circle is drawn around our unknown location, “c”, as shown below. The circle is just large enough to reach the three closest points, and those points are the “nearest neighbors.” The three nearest points are “a” (drought), “o” (no drought) and “o” (no drought). Since two of the three nearest neighbors are “o”s that represent no drought, we predict our location “c” will have no drought.
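For readers who like to see the idea in code, here is a minimal Python sketch of the drought example. The coordinates are made up for illustration (the graph has no numeric values), and the function name knn_predict is hypothetical. It assumes straight-line (Euclidean) distance, which is a common choice but not the only one.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical coordinates, invented for this sketch.
# "a" = location with drought, "o" = location without drought.
labeled_points = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((2.0, 1.0), "a"),
    ((4.0, 4.0), "a"),
    ((5.0, 6.0), "o"),
    ((6.0, 5.0), "o"),
    ((7.0, 7.0), "o"),
]

# Our unknown location "c" (also an assumed position).
c = (5.0, 5.0)


def knn_predict(query, points, k=3):
    """Return the majority label among the k points closest to query."""
    # Sort every labeled point by its distance to the query, keep the first k.
    nearest = sorted(points, key=lambda item: dist(query, item[0]))[:k]
    # Each neighbor casts one vote for its own label.
    votes = Counter(label for _, label in nearest)
    # The label with the most votes is the prediction.
    return votes.most_common(1)[0][0], nearest


label, nearest = knn_predict(c, labeled_points, k=3)
print(label)  # prints "o", meaning no drought is predicted
```

With these made-up points, the three nearest neighbors of “c” are two “o”s and one “a”, so the vote comes out 2 to 1 for no drought, matching the graph. Choosing an odd k for a two-class problem like this one avoids tied votes.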
In my next few blog posts, I plan to talk about the Naive Bayes, Support Vector Machine, Decision Tree and Random Forest machine learning methods.