Machine Learning: Decision Trees Finalized
By Dr. Jason M. Pittman, D.Sc.
Welcome back to our ongoing journey into the world of machine learning. Our goal in this series is to help demystify the topic and break down the processes that make it possible not only for humans to teach machines but also for machines to teach themselves.
In a previous blog post, we took a small side step in our discussion of decision trees to explore terminology and concepts a bit more deeply. From time to time we'll need to do that. Now, though, it is time to get back to hands-on work. Let’s start by finalizing our understanding of decision trees.
Up to this point, we've been examining a binary classification algorithm that will allow us to predict whether a student will like a programming course. The algorithm is supervised, meaning that we have labeled data to use as a training set from which the algorithm learns how to make our prediction.
The specific use case we have been developing is a classifier for programming courses. That is, will students like a programming class, based on their prior experience with programming and with course scheduling (i.e., time of day)?
Our Course Classifier
We already outlined our training dataset. That leaves the secret sauce of traversing a decision tree: splitting. Splitting is the action taken when we reach an interior node, which represents a feature. When we split, we sort the data into yes and no subsets, represented by the tree's leaves (end nodes). More precisely, we split into subsets that correspond to the binary classification categories in our question: the implicit yes or no answer to "Will a given student like a specific programming class?"
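To make splitting a little more concrete, here is a minimal Python sketch. The five training rows are invented for illustration only (they are not the dataset from the earlier post), and the helper names `split` and `label_counts` are hypothetical. The sketch partitions the rows on the value of one feature and then counts the yes and no labels that land in each subset.

```python
from collections import Counter

# Hypothetical training rows: (features, label). These are made-up
# illustrative records, NOT the dataset from the earlier post.
TRAINING = [
    ({"prior_experience": "yes", "time": "day"}, "yes"),
    ({"prior_experience": "yes", "time": "night"}, "no"),
    ({"prior_experience": "no", "time": "day"}, "no"),
    ({"prior_experience": "yes", "time": "day"}, "yes"),
    ({"prior_experience": "no", "time": "night"}, "no"),
]


def split(rows, feature):
    """Partition rows into subsets keyed by the value of `feature`."""
    subsets = {}
    for features, label in rows:
        subsets.setdefault(features[feature], []).append((features, label))
    return subsets


def label_counts(rows):
    """Count how many 'yes' and 'no' labels fall into a subset."""
    return Counter(label for _, label in rows)


if __name__ == "__main__":
    # Split on prior experience and report the label mix in each subset.
    for value, subset in split(TRAINING, "prior_experience").items():
        print(value, dict(label_counts(subset)))
```

Run as a script, it prints `yes {'yes': 2, 'no': 1}` followed by `no {'no': 2}`. The `no` subset holds a single label, so it could become a leaf. The `yes` subset is still mixed; splitting it again on `time` would separate it cleanly in these made-up rows.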
Let's look at a general pseudocode example.
function training returns tree
if prior_experience = 'yes' AND course = 'programming' THEN label = 'yes'
if time = 'night' THEN label = 'no'
if prior_experience = 'no' AND course = 'programming' THEN label = 'no'
Such a function clearly isn't complete, and it represents only training. Still, I think it demonstrates the simple logic involved. Something else should be made explicit too: we, as the programmers, are defining the rules that govern how the algorithm makes decisions (that is, how it splits and classifies). While this demystifies the supposed intelligence of such a program, I don't think it diminishes the technological power of decision trees. Knowing in advance whether a student is likely to enjoy a class could benefit students and instructors alike. Do you agree?
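For readers who want to run the pseudocode rules, here is one possible Python translation. The function name `classify` is hypothetical; the parameter names come from the pseudocode. The pseudocode does not say what happens when more than one rule matches, so this sketch assumes the rules run top to bottom and each match overwrites the label, which means the `night` rule wins over the first rule. It also assumes `None` when no rule matches.

```python
def classify(prior_experience, course, time):
    """Apply the hand-written rules from the pseudocode, in order.

    Assumption: each matching rule overwrites the label, so a later
    match (e.g. the 'night' rule) wins over an earlier one.
    Returns None when no rule matches.
    """
    label = None
    if prior_experience == "yes" and course == "programming":
        label = "yes"
    if time == "night":
        label = "no"
    if prior_experience == "no" and course == "programming":
        label = "no"
    return label


if __name__ == "__main__":
    print(classify("yes", "programming", "day"))    # yes
    print(classify("yes", "programming", "night"))  # no
    print(classify("no", "art", "day"))             # None
```

Note that nothing is learned here: every rule is typed in by hand, which is exactly the point made above about programmers defining how the algorithm decides.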
That, my friends, is a basic supervised binary classification decision tree. Certainly, there is more to the programming implementation than what we've looked at here. However, I think we've seen enough to dispel some of the mystery surrounding how a machine can appear intelligent when making a simple categorical prediction. Fortunately, there is a great deal more to reveal about machine learning. Join me next time as we explore how our algorithms calculate things like the cost of a decision and information gain.