Decision Tree and Random Forest

KIRAN TANWANI
5 min read · Apr 2, 2021


There are various machine learning algorithms that we interact with in our day-to-day lives and use without even knowing it. One of the algorithms we use most often in daily life is, in effect, the decision tree. The question that now arises is: how?

We humans make hundreds of decisions every day, whether it is choosing a YouTube video to watch, purchasing groceries, or placing an order on an online shopping site. We gather several thoughts and recommendations, weigh them all in our minds, and finally make a decision. Sounds easy, right? Now let us understand how decisions are made by machine learning algorithms, what that actually means, and why it is needed.

Table of Contents:

1.) Decision Tree

  • Types of nodes
  • Important Terminologies
  • Algorithm

2.) Random Forest

  • Ensemble Techniques
  • Algorithm

3.) Comparison

  • Random Forest vs Decision Tree

Decision Tree:

According to Wikipedia “A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.”

In general, a decision tree asks a series of questions and, based on the answers, classifies the object, person, or feature vector. The predicted output can be categorical (classification) or numeric (regression). For the most part, decision trees are pretty intuitive to work with: you start at the top and follow the answers downward until you reach a point where you cannot go any further; the node you end up at is the classification for that sample.

Let us take an example to understand what a decision tree looks like. Suppose we have to identify whether a given animal is a Hawk, a Penguin, a Dolphin, or a Bear. We start by asking whether it has feathers. If it does, it is a bird, so it is either a Hawk or a Penguin; if it does not, it is either a Dolphin or a Bear. At the next level, for the birds we ask whether it can fly: if yes, it is a Hawk; if no, it is a Penguin. For the other branch we ask whether it has fins: if yes, it is a Dolphin; if no, it is a Bear. So a simple decision tree is formed, as shown below.

Structure of a simple Decision Tree
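To make the flow concrete, the same tree can be written as plain nested conditionals. This is only an illustrative sketch: the function name and the boolean inputs are made up for this example.

def classify_animal(has_feathers, can_fly, has_fins):
    # First split: feathers separate the birds from the other animals
    if has_feathers:
        # Second split for birds: flight separates Hawk from Penguin
        if can_fly:
            return "Hawk"
        return "Penguin"
    # Second split for non-birds: fins separate Dolphin from Bear
    if has_fins:
        return "Dolphin"
    return "Bear"

# Example: no feathers, cannot fly, has fins -> Dolphin
print(classify_animal(has_feathers=False, can_fly=False, has_fins=True))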

Types of nodes in a decision tree:

⇒ The top of the tree is called the “Root node”, or just “the Root”. It only has arrows pointing away from it.

⇒ Then there are the “Internal nodes”, also called “Decision nodes”. These nodes have arrows pointing towards them as well as arrows pointing away from them. They represent a decision to be made.

⇒ Lastly, there are the leaf nodes, also called terminal nodes. They have arrows pointing towards them but none pointing away from them. They represent the outcome of a decision path.

Types of Nodes in a Decision Tree

Important Terminologies:

⇒ Entropy: A measure of randomness; in a decision tree it measures how “mixed” the class labels in a column (or node) are.

⇒ Information Gain: The amount by which entropy goes down after a split. It is calculated as (entropy before the split − weighted entropy of the sub-nodes after the split).

⇒ Impurity: How mixed the classes are within a node after a split; it is used to check the homogeneity of the result.

Note: A pure node (one containing only a single class) has 0 entropy. A short code sketch of these quantities follows below.
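Here is a minimal sketch of how entropy and information gain can be computed for a column of class labels. The helper names are my own, and the hard-coded split is only there to show the arithmetic.

import math
from collections import Counter

def entropy(labels):
    # Entropy = sum over classes of -p * log2(p), where p is the class proportion
    total = len(labels)
    return sum((count / total) * -math.log2(count / total)
               for count in Counter(labels).values())

def information_gain(parent, left, right):
    # Entropy before the split minus the weighted entropy after the split
    n = len(parent)
    weighted_child_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_child_entropy

labels = ["Hawk", "Hawk", "Penguin", "Penguin", "Dolphin", "Bear"]
left, right = labels[:4], labels[4:]          # split on "has feathers?"
print(entropy(labels))                        # mixed node -> high entropy
print(information_gain(labels, left, right))  # how much the split reduced entropy
print(entropy(["Hawk", "Hawk"]))              # pure node -> zero entropy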

Now let’s see how this algorithm is applied by a machine (a code sketch follows the list):

  • Pick a candidate attribute for the first split (branching).
  • Check the impurity of the sub-nodes produced by splitting on that attribute.
  • Calculate the entropy of each sub-node and the resulting information gain.
  • Repeat this for the other candidate attributes and choose the split with the highest information gain.
  • Dig down into each sub-node, again splitting and calculating entropy and information gain.
  • Keep choosing the best split at each step of this recursive partitioning until the nodes are pure (or another stopping condition is reached).
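In practice, a library usually handles this recursive partitioning for us. Below is a minimal sketch using scikit-learn’s DecisionTreeClassifier; the tiny feature matrix (has_feathers, can_fly, has_fins) is made up to mirror the animal example above.

from sklearn.tree import DecisionTreeClassifier

# Columns: has_feathers, can_fly, has_fins (1 = yes, 0 = no)
X = [
    [1, 1, 0],  # Hawk
    [1, 0, 0],  # Penguin
    [0, 0, 1],  # Dolphin
    [0, 0, 0],  # Bear
]
y = ["Hawk", "Penguin", "Dolphin", "Bear"]

# criterion="entropy" makes the splitter use information gain, as described above
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)

print(tree.predict([[1, 1, 0]]))  # expected: ['Hawk']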

Random Forest:

As the name suggests, a random forest is a collection of many decision trees built together over a large data set. Wikipedia gives a more formal definition: “Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes or mean/average prediction of the individual trees.”

A random forest basically uses two techniques (a rough code sketch of both follows the list below).

a.) Bootstrapping: When we have a very large data set, a single decision tree may not give a good result. So we repeatedly draw random samples of the observations (with replacement) along with random subsets of the features, and build N different decision trees, one per sample. Across many trees, nearly every feature and observation ends up being used at least once.

b.) Bagging: This technique is used when we want to produce a decision from the random forest we created. We pass our data through each of the n decision trees and then aggregate their outputs, typically by majority vote for classification or by averaging for regression, to make the final decision.
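A rough sketch of both steps using plain Python and scikit-learn trees; the toy data, the number of trees, and the majority-vote rule are illustrative assumptions rather than fixed choices.

import random
from collections import Counter

from sklearn.tree import DecisionTreeClassifier

def bootstrap_sample(X, y):
    # Sample rows with replacement to get one bootstrapped training set
    indices = [random.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in indices], [y[i] for i in indices]

# Toy data: has_feathers, can_fly, has_fins
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]] * 5
y = ["Hawk", "Penguin", "Dolphin", "Bear"] * 5

# Bootstrapping: train several trees, each on its own resampled data,
# with a random subset of features considered at each split
trees = []
for _ in range(10):
    Xb, yb = bootstrap_sample(X, y)
    trees.append(DecisionTreeClassifier(max_features="sqrt").fit(Xb, yb))

# Bagging (aggregation): majority vote over the individual tree predictions
sample = [1, 1, 0]
votes = [t.predict([sample])[0] for t in trees]
print(Counter(votes).most_common(1)[0][0])  # the most frequent class wins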

Structure of a Random Forest

Algorithm for the random forest:

1. For b = 1 to B:

  (a) Draw a bootstrap sample Z* of size N from the training data.
  (b) Grow a random-forest tree Tb on the bootstrapped data by recursively repeating the following steps for each terminal node:
    i. Randomly select m variables from the p variables.
    ii. Pick the best variable/split point among the m.
    iii. Split the node into two daughter nodes.

2. Output the ensemble of trees {Tb}, b = 1 to B.
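The same recipe is packaged in scikit-learn’s RandomForestClassifier, where n_estimators plays the role of B and max_features the role of m. The synthetic dataset and the parameter values below are arbitrary stand-ins, not recommendations.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# B = 100 trees; each split considers roughly sqrt(p) randomly chosen features (m)
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))  # mean accuracy on held-out data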

When to choose a Decision tree or Random Forest?

Decision trees can be used when the dataset is comparatively small; for larger datasets, say 1,000,000 records with 10,000 attributes, a single tree can take hours or even days to build. A random forest, on the other hand, can handle such a large dataset because it breaks the data into smaller pieces and then:

⇒ builds multiple decision trees

⇒ averages the values of the decision trees

⇒ performs bagging

⇒ produces a more accurate estimate.

Random forests are sometimes difficult to interpret, so if we want the output to be interpretable we should choose a decision tree as it is comparatively simple to interpret.
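For instance, a single fitted tree can be printed as readable if/else-style rules with scikit-learn’s export_text, which is much harder to do meaningfully for a hundred trees at once. The small iris dataset and the shallow depth here are only for illustration.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Prints the learned splits as indented, human-readable rules
print(export_text(tree, feature_names=list(iris.feature_names)))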

Advantages and Disadvantages Comparison
