What is entropy in a decision tree?


In the context of decision trees, entropy is a measure of disorder or impurity in a node: the more mixed the class labels in the node, the higher its entropy. It is used to choose splits when growing the tree, starting with the root node, by calculating the entropy for each variable and its potential splits. For each candidate split, the algorithm computes the size-weighted average entropy across the resulting child nodes and compares it with the entropy of the parent node. This reduction in entropy is termed Information Gain and represents how much information a feature provides about the target. Entropy is a logarithmic measure; for a binary target with base-2 logarithms, the maximum entropy (maximum disorder, a 50/50 class split) is 1 and the minimum entropy (a pure node) is 0. The attribute with the highest information gain should be selected as the root node, child nodes should be built for every value of the selected attribute, and the same procedure is then repeated recursively on each child.
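For concreteness, the entropy of a node with class proportions p_i is H = -Σ_i p_i log2(p_i), and information gain is the parent's entropy minus the size-weighted average entropy of the children. Below is a minimal Python sketch of both calculations; the function names and the example split data are made up for illustration (the class counts happen to match the classic 14-row PlayTennis dataset), not taken from any particular library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent_labels, child_label_groups):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = len(parent_labels)
    weighted_child_entropy = sum(
        (len(child) / total) * entropy(child) for child in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Hypothetical example: a binary target split on some feature into two children
parent = ["yes"] * 9 + ["no"] * 5   # 9 yes, 5 no
left   = ["yes"] * 6 + ["no"] * 2   # one branch of the split
right  = ["yes"] * 3 + ["no"] * 3   # the other branch (50/50, entropy = 1.0)

print(entropy(parent))                          # ≈ 0.940
print(information_gain(parent, [left, right]))  # ≈ 0.048
```

When comparing candidate features, this information gain would be computed for each one, and the feature with the largest value would be chosen for the split.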