(Image by Quora)
In a February 6 blog post, I discussed the supervised machine learning Naive Bayes algorithm with an example that was hopefully easy for beginners to understand. During the summer of 2017, I began a five-part series on types of machine learning. That series included more details about k-Nearest Neighbor, K-means clustering, Singular Value Decomposition, Principal Component Analysis, Apriori, Frequent Pattern-Growth and more. Today I want to expand on the ideas presented in my Support Vector “Data Science in 90 Seconds” YouTube video and continue the discussion in plain language.
If you recall from earlier discussions, supervised machine learning is the task of inferring a function from labeled training data. Unlike unsupervised machine learning, where the data has no labels, in supervised machine learning the computer takes observations of data that have a predetermined class or category label. The algorithm then tries to predict the labels of future observations.
Support Vector Machine, or SVM, is a relatively simple way to classify data into categories. A linear SVM can be trained quickly even on very large data sets (about 100,000 or more observations), while an SVM with a non-linear kernel is usually a better fit for small and medium-sized data sets, because its training time grows quickly with the number of observations. The SVM uses something called the kernel trick to handle data that a straight line cannot separate. The kernel trick measures how similar two data points would be in a higher-dimensional space, where a flat boundary may separate the classes, without ever computing the points' coordinates in that space. Like the Naive Bayes algorithm, the SVM can outperform more sophisticated classification methods and is widely used among data scientists, and the basic idea behind it is easy to explain to a non-technical audience.
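To make the kernel trick a little more concrete, here is a minimal sketch in plain Python. It is not taken from any SVM library. The points `a` and `b`, the helper names and the choice of a degree-2 polynomial kernel are all assumptions made for illustration. It compares two routes to the same number: mapping each 2-D point into 3-D and taking a dot product there, versus applying the kernel directly in 2-D.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(p):
    """Explicit map from 2-D to 3-D: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(u, v):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return dot(u, v) ** 2

# Two made-up data points (illustrative values only).
a = (1.0, 2.0)
b = (3.0, 0.5)

explicit = dot(phi(a), phi(b))  # do the work in the 3-D space
implicit = poly_kernel(a, b)    # same similarity, without leaving 2-D

print(explicit, implicit)
```

Both routes agree up to floating-point rounding. The saving looks small here, but for kernels whose matching higher-dimensional space is huge, the shortcut is what keeps the computation practical.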
Now let’s look at an example of SVM I found on another blog. Say we are a farmer trying to figure out the best location to put up a fence to protect our cows from wolves or other predators. Let’s use SVM to answer this classification problem. We know the natural locations where the cows and wolves like to be. Let’s represent the cows with Xs and the wolves with Os. The SVM makes the best estimate of where the fence should be by finding the boundary with the maximum separation between the area with the Xs and the area with the Os.
The brown area is where we might expect wolves to be in the future. The blue area is where we might expect cows to be in the future, with or without the fence. In our case, the SVM produces a non-linear, curved boundary around the blue area in the image above, which is where the farmer might consider building the fence to keep the cows safe. In this way, machine learning has been used to classify and predict the animals’ locations.
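The "maximum separation" idea behind the fence can be sketched in a few lines of plain Python. This is a simplified illustration and not how a real SVM solver works. It tries a fixed number of straight fence directions and keeps the one that leaves the widest empty strip between the two groups, whereas a real SVM solves an optimization problem and, with a kernel, can produce a curved fence like the one in the image. The animal coordinates and the function names are made up for this example.

```python
import math

# Made-up field coordinates (illustrative only).
cows = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.5)]    # the Xs
wolves = [(5.0, 5.0), (6.0, 4.0), (5.0, 6.0), (7.0, 5.0)]  # the Os

def best_straight_fence(cows, wolves, steps=360):
    """Try `steps` fence directions; keep the one with the widest empty strip."""
    best = None
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        n = (math.cos(theta), math.sin(theta))  # unit vector pointing across the fence
        # Position of each animal along that direction.
        nearest_cow = max(n[0] * x + n[1] * y for x, y in cows)
        nearest_wolf = min(n[0] * x + n[1] * y for x, y in wolves)
        margin = (nearest_wolf - nearest_cow) / 2  # half-width of the empty strip
        if best is None or margin > best[0]:
            offset = (nearest_wolf + nearest_cow) / 2  # fence sits mid-strip
            best = (margin, n, offset)
    return best

margin, normal, offset = best_straight_fence(cows, wolves)

def predict(point):
    """Label a new location by the side of the fence it falls on."""
    score = normal[0] * point[0] + normal[1] * point[1] - offset
    return "wolf" if score > 0 else "cow"

print(round(margin, 3), predict((1.5, 1.5)), predict((6.0, 5.0)))
```

The animals closest to the winning fence are the ones that pin it in place. In SVM terms those points are the support vectors, which is where the method gets its name.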