###### (Image from Microsoft)

Last week I began this four-part series on unsupervised machine learning concepts by talking about K-means clustering. Today I want to switch gears and describe singular value decomposition, another unsupervised machine learning technique. According to Kirk Baker in his Singular Value Decomposition (SVD) Tutorial, the basic idea is ‘taking a high dimensional, highly variable set of data points and reducing it to a lower dimensional space; in other words, SVD can be seen as a method for data reduction.’ Dimensionality reduction is usually done to get a smaller, more informative set of features when you’re trying to classify data for machine learning tasks.

A great non-math-heavy example comes from Stack Overflow. Suppose you have a list of 100 movies and 1000 people, and for each person you know whether they like or dislike each of the 100 movies. Since machine learning processes numbers, we convert each like or dislike into a numerical representation. So for each person, or instance, you have a vector with a length of 100. The vector is binary: a 0 means the person dislikes the movie and a 1 means the person likes the movie.

You can perform a machine learning task on these vectors directly, but instead you could choose 5 genres of movies and, using the data you already have, figure out whether the person likes or dislikes each entire genre. In this way, you reduce your data from a vector of 100 movies to a vector of 5 genres. The vector of length 5 can be thought of as a good summary of the vector of length 100, because most people tend to like movies only in their preferred genres.
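To make the genre idea concrete, here is a minimal sketch in plain Python. It is an illustration only and rests on assumptions that are not part of the Stack Overflow example: the 100 movies are split into 5 equal genres of 20 (movie `i` belongs to genre `i // 20`), and a person "likes" a genre when they like more than half of its movies. The names `genre_of` and `reduce_to_genres` are made up for this sketch.

```python
N_MOVIES = 100
N_GENRES = 5
MOVIES_PER_GENRE = N_MOVIES // N_GENRES  # assumption: 5 equal genres of 20 movies


def genre_of(movie_index):
    """Hypothetical genre assignment: movies 0-19 are genre 0, 20-39 genre 1, etc."""
    return movie_index // MOVIES_PER_GENRE


def reduce_to_genres(likes):
    """Collapse a 100-long 0/1 vector into a 5-long 0/1 vector by majority vote."""
    liked_per_genre = [0] * N_GENRES
    for movie_index, liked in enumerate(likes):
        liked_per_genre[genre_of(movie_index)] += liked
    # A genre counts as "liked" when more than half of its movies are liked.
    return [1 if count * 2 > MOVIES_PER_GENRE else 0 for count in liked_per_genre]


# A made-up person who likes every movie in genres 0 and 3 and nothing else.
person = [1 if genre_of(i) in (0, 3) else 0 for i in range(N_MOVIES)]
reduced = reduce_to_genres(person)
print(len(person), "->", len(reduced), reduced)
```

Note that this hand-built version needs someone to pick the genres and label every movie up front. SVD is attractive because it derives its own reduced dimensions from the like/dislike data itself.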

Note that by reducing the vector dimensionality from 100 to 5, we’re not going to get an exact representation of the person’s behavior, because there might be cases where a person hates all movies of a genre except one. Still, the reduced vector conveys most of the information in the larger one while consuming a lot less space and being faster to process in machine learning tasks.
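Here is a rough sketch of the same reduction done with an actual SVD, using NumPy's `np.linalg.svd`. The data is synthetic and entirely made up for illustration: each of the 1000 people is assigned one preferred genre and likes a movie with probability 0.9 inside that genre and 0.1 outside it. Keeping the 5 largest singular values is an assumption chosen to match the 5 genres in the example; on real data you would have to choose that number yourself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_movies, n_genres = 1000, 100, 5

# Synthetic data (assumption): 20 movies per genre, one preferred genre per person.
movie_genre = np.repeat(np.arange(n_genres), n_movies // n_genres)
person_genre = rng.integers(0, n_genres, size=n_people)
p_like = np.where(person_genre[:, None] == movie_genre[None, :], 0.9, 0.1)
ratings = (rng.random((n_people, n_movies)) < p_like).astype(float)  # 1000 x 100 of 0/1

# Thin SVD: ratings = U @ diag(s) @ Vt, singular values sorted largest first.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

k = 5                          # number of dimensions to keep
reduced = U[:, :k] * s[:k]     # each person is now a vector of length 5
approx = reduced @ Vt[:k, :]   # best rank-5 approximation of the original matrix

# Relative size of what the 5-dimensional version fails to capture.
rel_error = np.linalg.norm(ratings - approx) / np.linalg.norm(ratings)
print("reduced shape:", reduced.shape)
print("relative reconstruction error:", rel_error)
```

The reconstruction error is greater than zero, which is the information loss described above: the one movie someone loves in a genre they otherwise hate gets smoothed away. The 5 columns SVD finds are not guaranteed to line up with human-named genres; they are simply the directions that capture the most variation in the data.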

Some other SVD examples are from the Make it Easy (Python) and UCLA Institute for Digital Research (R) blogs. Some of the best white papers are from Balabit and Abidin, Stanford University still has one of the best YouTube video explanations, and some great GitHub source code is from J2kun (Python) and Benli11 (R). Other technology applications include image compression, recommender systems, numerical weather forecasting, natural language processing, and facial recognition.

Next week we’ll take a look at Principal Component Analysis.