Machine Learning: The Cost of Decisions

February 13, 2019

By Jason M. Pittman, DSc

Welcome back to another instance of demystifying machine learning.

Recently, we wrapped up the overarching logic associated with decision trees. You'll recall that decision trees are just one of many machine learning algorithms, each mapping to a specific type of problem. Thus, we ought to be able to articulate what a decision tree is in both classification and regression forms, as well as what a simple classifier looks like in pseudocode.

You’ll also recall that the essence, the secret sauce, of decision trees exists in the splitting between options. How exactly is a split computed? Let's find out!

Overview of Split Costs

At first glance, split cost computation seems harder to grasp than the ground we've covered so far. Let's build a little clarity first: just as algorithms pair with problems, cost computations pair with algorithm types. That said, there are some common attributes across different cost functions.

For instance, all cost functions seek to find the lowest cost associated with a given split. Our algorithms will tend toward computing the cost of every candidate split in the decision tree and then learning by mapping the connections between the lowest-cost nodes. As always, there are some critical differences too.
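To make that concrete, here is a minimal sketch in Python of the exhaustive search (the helper names are my own, not any library's API, and the per-group cost is the Gini impurity we turn to next):

```python
# A sketch of exhaustive split search: score every attribute/value split
# and keep the one with the lowest cost. Hypothetical helpers, not a library API.

def gini_impurity(labels):
    """Impurity of one group: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def split_cost(left, right):
    """Cost of a split: each side's impurity, weighted by its share of samples."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

def best_split(rows, labels):
    """Try every attribute/value pair and return the lowest-cost split found."""
    best_cost, best_rule = float("inf"), None
    for attr in range(len(rows[0])):
        for value in {row[attr] for row in rows}:
            left = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            right = [lab for row, lab in zip(rows, labels) if row[attr] != value]
            cost = split_cost(left, right)
            if cost < best_cost:
                best_cost, best_rule = cost, (attr, value)
    return best_cost, best_rule
```

Real libraries perform this search with far more efficient bookkeeping, but the logic is the same: score every candidate, keep the cheapest.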

Let's start with classification and Gini functions. Gini functions are common split cost calculations in machine learning. Notice the plural? That's right -- there is one form associated with binary decisions; it handles dichotomous outcomes (e.g., yes or no, true or false, and so forth). Thus, splitting occurs at the junction of such decisions. There is another form associated with non-binary decisions, of course.

The first, binary classification form is the Gini coefficient. Related in name only, the non-binary form is Gini impurity, which handles multiclass situations. It is important to know that these forms are not equivalent or interchangeable.
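To see the impurity form in action, here is a tiny illustration with made-up labels. Note that for exactly two classes, the impurity value reduces to 2p(1 - p), where p is the proportion of either class:

```python
def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1 - sum((labels.count(c) / total) ** 2 for c in set(labels))

# Made-up label sets, purely for illustration.
binary = ["yes", "yes", "no", "no", "no"]               # two classes
multiclass = ["math", "art", "art", "science", "math"]  # three classes

print(gini_impurity(binary))      # 0.48, the same as 2 * 0.4 * 0.6
print(gini_impurity(multiclass))  # 0.64; the formula handles any class count
```

Either way, lower values mean purer groups, which is exactly what a good split is after.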

By the way, Gini is pronounced like genie.

Split Cost Illustrated

As mentioned earlier, split cost computation may seem like a more difficult part of our machine learning work. I'm going to illustrate why that's not actually true, despite involving...yes, that's right - math. Let's use a slightly modified table of student course preferences as our data.

[Table: student course preferences -- ten samples with prior experience and time attributes, each labeled yes or no for liked]

Broadly defined, the equation for a Gini function used in a classifier looks something like:

Gini = 1 - Σ (objects-in-class / total-objects)²
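Translated directly into Python (a sketch of the formula above, nothing more; the names are mine):

```python
def gini(class_counts, total_objects):
    """Gini = 1 - sum of (objects_in_class / total_objects)^2 over the classes."""
    return 1 - sum((count / total_objects) ** 2 for count in class_counts)
```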

To illustrate, we have two classes for the liked label (yes and no), two attributes (prior experience and time), and a total of ten samples. There are 5 instances of the yes class in the sample and 5 in the no class. Accordingly, we can encode this into the following.

1 - [(5 / 10)² + (5 / 10)²] = ?

What do you think the result is here? What does the value represent? We'll dig deeper into the answers next time, along with some alternative cost functions.
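In the meantime, if you want to check your hand calculation, the whole example encodes in a couple of lines (a quick sketch using the counts above):

```python
# Counts from the table: 5 "yes" and 5 "no" out of 10 samples.
print(1 - ((5 / 10) ** 2 + (5 / 10) ** 2))  # try working it out by hand first!
```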