What is the Gini Index in a decision tree?


The Gini Index (also called Gini impurity) is a measure of impurity used in decision trees. It gives the probability that a randomly chosen instance from a node would be misclassified if it were labeled at random according to the class distribution in that node. The lower the Gini Index, the purer the node and the better the likelihood of correct classification. It is calculated by subtracting the sum of the squared probabilities of each class from one: Gini = 1 - (p_1^2 + p_2^2 + ... + p_k^2), where p_i is the proportion of instances belonging to class i. The Gini Index has a minimum of 0 (highest level of purity, all instances in one class). Its maximum is 0.5 for a two-class problem and, more generally, 1 - 1/k when k classes are equally likely.

When building a decision tree, the attribute/feature whose split gives the lowest weighted Gini Index across the resulting child nodes is preferred, starting with the root node. The Gini Index is used to create binary splits in non-parametric decision tree learning techniques that produce classification or regression trees, depending on whether the dependent variable is categorical or numerical, respectively; the Gini Index itself applies to the classification case, where the target is categorical. It is simple to implement and tends to favour larger partitions.

An alternative to the Gini Index is Information Entropy, which also measures the degree of impurity or uncertainty in a node. Splits are chosen to decrease the level of entropy from the root node toward the leaf nodes of the decision tree.
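As a rough illustration of the calculation described above, here is a minimal Python sketch that computes the Gini Index of a node and the size-weighted Gini Index of a binary split. The function names `gini` and `weighted_gini` and the sample labels are hypothetical, chosen only for this example; real libraries may organise the computation differently.

```python
from collections import Counter


def gini(labels):
    """Gini Index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty node as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Gini Index of a binary split: child impurities weighted by child size."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


# Hypothetical sample data for illustration only.
pure_node = ["yes", "yes", "yes", "yes"]
mixed_node = ["yes", "yes", "no", "no"]

print(gini(pure_node))   # 0.0 (all instances in one class)
print(gini(mixed_node))  # 0.5 (two classes, evenly mixed)

# A split that separates the classes perfectly has weighted Gini 0.0,
# so it would be preferred over a split that leaves both children mixed.
print(weighted_gini(["yes", "yes"], ["no", "no"]))  # 0.0
print(weighted_gini(["yes", "no"], ["yes", "no"]))  # 0.5
```

Comparing `weighted_gini` across candidate splits and keeping the lowest value mirrors the selection rule described in the answer.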