In my last blog post, I discussed using the support vector machine, a supervised machine learning algorithm, to determine where to fence in my cows. During the summer of 2017 I began a five-part series on types of machine learning. That series went into more detail on k-nearest neighbors, k-means clustering, singular value decomposition, principal component analysis, Apriori, frequent pattern growth (FP-Growth) and other algorithms. Today I want to expand on the ideas presented in my Decision Tree “Data Science in 90 Seconds” YouTube video and continue the discussion in plain language.
If you recall from earlier discussions, supervised machine learning is the task of inferring a function from labeled data. Unlike unsupervised machine learning, which looks for hidden structure in unlabeled data, supervised machine learning gives the computer observations that already carry a predetermined class or category label. The algorithm then uses these labeled observations to predict the labels of future observations.
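To make the idea of learning from labeled data concrete, here is a minimal Python sketch. The observations, the feature names (has_fur, lays_eggs) and the helper functions are all made up for illustration. The sketch learns a one-question tree by picking the yes/no feature whose split misclassifies the fewest labeled observations. That selection rule is a simplifying assumption; real decision tree libraries typically use measures such as Gini impurity or entropy.

```python
from collections import Counter

# Hypothetical labeled observations: (yes/no features, class label).
observations = [
    ({"has_fur": True,  "lays_eggs": False}, "mammal"),
    ({"has_fur": True,  "lays_eggs": False}, "mammal"),
    ({"has_fur": False, "lays_eggs": True},  "not mammal"),
    ({"has_fur": False, "lays_eggs": True},  "not mammal"),
    ({"has_fur": True,  "lays_eggs": True},  "mammal"),  # e.g. a platypus
]

def majority_label(rows):
    """Return the most common label among the rows."""
    return Counter(label for _, label in rows).most_common(1)[0][0]

def learn_one_question(rows):
    """Pick the yes/no feature whose split misclassifies the fewest rows."""
    best = None
    for feature in rows[0][0]:
        yes = [r for r in rows if r[0][feature]]
        no = [r for r in rows if not r[0][feature]]
        if not yes or not no:
            continue  # this feature does not split the data at all
        # Each side of the split predicts its own majority label.
        answers = {True: majority_label(yes), False: majority_label(no)}
        errors = sum(answers[feats[feature]] != label for feats, label in rows)
        if best is None or errors < best[0]:
            best = (errors, feature, answers)
    if best is None:
        raise ValueError("no feature splits the data")
    return best[1], best[2]

feature, answers = learn_one_question(observations)

def predict(features):
    """Predict a label for a new, unlabeled observation."""
    return answers[features[feature]]

print(feature)
print(predict({"has_fur": False, "lays_eggs": True}))
```

The labels are what make this supervised: the algorithm can count its own mistakes only because each observation comes with the right answer attached.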
Prashant Gupta, in the “Towards Data Science” blog, describes the decision tree as a simple and relatively fast way to classify data into categories when you have a very large data set (about 100,000 or more observations). A decision tree has the root at the top and branches out wherever a decision is made. As you move down the tree, you can draw conclusions about what’s called the target variable, the value you are trying to predict. Like the Naive Bayes algorithm, the decision tree can sometimes outperform more sophisticated classification methods, is widely used among data scientists, and is easy to interpret and explain to a non-technical audience.
Now let’s look at an example by Joel Grus. Let’s pretend we’re playing a game of 20 questions to try to figure out what kind of animal we have based on the animal’s features. At the top of the decision tree is the root question “Is it a mammal?” Underneath the root are two branches, one for each response to this question: “yes” or “no”. Underneath each response is a second level that asks a follow-up question, such as “Do people commonly keep it as a pet?” or “Is it a kind of bird?”. Each level in the decision tree further segments the data.
Continue down the tree, asking questions until there are no more questions left to ask. When you reach the bottom of the decision tree, you’ll be able to predict whether the animal is a goat, elephant, squid or spider.
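The walk from the root question to a final answer can be sketched in a few lines of Python. This is only an illustration under assumptions: the post does not say which animal sits under which answer, so the placement of the four animals is my guess, and the question “Does it live in the sea?” is a hypothetical one added so the example has a way to separate the squid from the spider.

```python
# Each internal node is (question, yes_branch, no_branch); each leaf is an animal name.
tree = (
    "Is it a mammal?",
    # Assumed placement of the two mammals.
    ("Do people commonly keep it as a pet?", "goat", "elephant"),
    # Hypothetical question, not taken from the original example.
    ("Does it live in the sea?", "squid", "spider"),
)

def classify(node, answers):
    """Walk from the root to a leaf, following the yes/no answer at each question."""
    while not isinstance(node, str):
        question, yes_branch, no_branch = node
        node = yes_branch if answers[question] else no_branch
    return node

answers = {"Is it a mammal?": False, "Does it live in the sea?": True}
print(classify(tree, answers))  # prints "squid"
```

Only the questions along one path are ever asked, which is part of why a decision tree is easy to explain: every prediction comes with the short list of answers that led to it.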
In this example, the decision tree, a supervised machine learning algorithm, has been used to predict what type of animal we are looking at when a new one comes along. Let me know if you have used decision trees for other use cases.