The Data Scientist uses Machine Learning

Image from University of Texas-Austin

Today is part three of my mini-series on the Anatomy of a Data Scientist. I spent the first two weeks talking about the problem-solving and analytical skills a data scientist needs, and about how the data scientist uses statistics on the job. Today we’re going to learn about the role of machine learning in the skill set of an effective data scientist, and look at where statistics and coding each fit within this emerging field.

Data science and machine learning are closely intertwined. One of the best definitions I have seen is that the Data Scientist generally determines which machine learning approach to use, models the algorithm, and then prototypes and tests it in a coding language such as R or Python. Machine learning is a way to find patterns in past data in order to predict what might happen in the future. I like to think the Data Scientist is responsible more for data strategy, in that they decide which algorithm to use to solve the problem, while the machine learning engineer implements that algorithm in production at large scale.

(Image by Drew Conway)

There are two main types of machine learning: supervised (predictive) and unsupervised (descriptive). Five main steps are used in machine learning: collecting the data, preparing the data, training a model, evaluating the model and improving its performance. Keep in mind that the key point of machine learning is to quantitatively answer a business problem. I think it can be easy for us, aspiring and actual data scientists alike, to forget that we’re using all of these tools to answer a problem and derive value for the organization. We do this by gaining knowledge from the data.
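To make the five steps a little more concrete, here is a minimal sketch of a supervised (predictive) workflow in plain Python. Everything in it is an assumption made for illustration: the data points are synthetic and made up, the class names "A" and "B" are hypothetical, and the model is a simple k-nearest-neighbours classifier that I picked because it fits in a few lines. It is not the only way, or the standard way, to carry out these steps.

```python
import random
from collections import Counter
from math import dist


def collect_data(seed=0, n_per_class=40):
    """Step 1: collect. Synthetic, made-up rows of ((x, y), label)."""
    rng = random.Random(seed)
    centers = {"A": (1.0, 1.0), "B": (4.0, 4.0)}  # hypothetical classes
    return [((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)), label)
            for label, (cx, cy) in centers.items()
            for _ in range(n_per_class)]


def prepare_data(rows, seed=0):
    """Step 2: prepare. Shuffle, then split 60/20/20 into train/validation/test."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    a, b = int(0.6 * len(rows)), int(0.8 * len(rows))
    return rows[:a], rows[a:b], rows[b:]


def train(train_rows, k):
    """Step 3: train. k-nearest neighbours simply memorises the training rows."""
    def predict(point):
        # Take the k closest training rows and return their majority label.
        nearest = sorted(train_rows, key=lambda row: dist(row[0], point))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def evaluate(predict, rows):
    """Step 4: evaluate. Fraction of rows whose label is predicted correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def improve(train_rows, val_rows, candidates=(1, 3, 5)):
    """Step 5: improve. Keep the k that scores best on the validation rows."""
    return max(candidates, key=lambda k: evaluate(train(train_rows, k), val_rows))


train_rows, val_rows, test_rows = prepare_data(collect_data())
best_k = improve(train_rows, val_rows)
test_accuracy = evaluate(train(train_rows, best_k), test_rows)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

The one design choice worth pointing out is the three-way split: the value of k is chosen on the validation rows, and the test rows are only touched once at the end, so the final accuracy is measured on data the model has never been tuned against.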

One of the first machine learning problems most people work through when learning this discipline uses the iris data set, which asks: given certain characteristics of a flower, which type of iris is it? There are many tutorials on this: http://machinelearningmastery.com/machine-learning-in-python-step-by-step/ and http://scikit-learn.org/stable/tutorial/basic/tutorial.html. The gist of machine learning is to describe characteristics in numerical terms so that the future can be predicted. Next week I’ll conclude the ‘Anatomy of a Data Scientist’ series by looking at the soft skills needed to become a data science ninja.
