Question - Running a binary classification tree algorithm is a simple task. But how does tree splitting take place? How does the tree decide which variable to split on at the root node and which at its child nodes?
Answer -
The Gini index and node entropy help the binary classification tree make these decisions. Basically, the tree algorithm looks for the feature that divides the data into the purest (most homogeneous) child nodes.
The Gini index rests on this idea: if we randomly pick two items from a population, they should be of the same class, and if the population is pure, the probability of this event is 1.
The following are the steps to compute the Gini index:
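- Compute Gini for each sub-node with the formula: the sum of the squares of the probabilities of success and failure (p^2 + q^2). With this formula a higher value means a purer node; the Gini impurity used by many libraries is 1 - (p^2 + q^2), where lower is better.
- Compute Gini for the split as the weighted average of the Gini scores of its sub-nodes, weighting each sub-node by its share of the samples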
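The two steps above can be sketched in Python as follows. This is a minimal illustration, not a library implementation: the function names `gini_score` and `split_gini` and the sample counts are made up for the example, and every sub-node is assumed to be non-empty.

```python
def gini_score(positives, total):
    """Gini score of one sub-node: p^2 + q^2 (higher means purer)."""
    p = positives / total   # probability of success in the node
    q = 1 - p               # probability of failure in the node
    return p ** 2 + q ** 2


def split_gini(children):
    """Weighted Gini of a split.

    children is a list of (positives, total) tuples, one per sub-node.
    Each sub-node is weighted by its share of all samples in the split.
    """
    n = sum(total for _, total in children)
    return sum(total / n * gini_score(pos, total) for pos, total in children)


# Hypothetical split: 10 samples (2 successes) go left, 20 samples (13 successes) go right.
left_right = [(2, 10), (13, 20)]
weighted = split_gini(left_right)
```

When this p^2 + q^2 score is used, the candidate split with the highest weighted value is the preferred one.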
Now, Entropy is the degree of impurity (disorder) in a node, and is given by the following:
Entropy = -a*log2(a) - b*log2(b)
Where a and b are the probabilities of success and failure in the node
When Entropy = 0, the node is homogeneous (all samples belong to one class)
When Entropy reaches its maximum of 1, both classes are present at 50–50 percent in the node.
Finally, to choose the variable for the root node (and, in the same way, for each child node), the tree prefers the split whose sub-nodes have the lowest weighted entropy.
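As a rough sketch of the entropy-based choice, the snippet below computes the entropy of a binary node with the standard formula -a*log2(a) - b*log2(b) and picks the candidate split with the lowest weighted entropy. The names `entropy` and `split_entropy`, the feature names, and the counts are all hypothetical, chosen only for illustration; each sub-node is assumed to be non-empty.

```python
import math


def entropy(a):
    """Entropy of a binary node, where a is the probability of success."""
    b = 1 - a  # probability of failure
    # Skip zero probabilities: 0 * log2(0) is treated as 0 by convention.
    return sum(-x * math.log2(x) for x in (a, b) if x > 0)


def split_entropy(children):
    """Weighted entropy of a split given (positives, total) per sub-node."""
    n = sum(total for _, total in children)
    return sum(total / n * entropy(pos / total) for pos, total in children)


# Two hypothetical candidate splits of the same 30 samples (15 successes).
candidates = {
    "feature_A": [(2, 10), (13, 20)],
    "feature_B": [(9, 10), (6, 20)],
}

# The variable whose split has the lowest weighted entropy is preferred.
best = min(candidates, key=lambda name: split_entropy(candidates[name]))
```

Here `best` ends up as `"feature_B"`, because its sub-nodes are purer on average than those produced by `feature_A`.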