A decision tree model has a root node, which is associated with the entire training data set used to build the tree. Each tree node is either a decision node or a leaf node.
A decision node represents a split in the data based on the value of a single input, or predictor, variable. A decision node has descendant nodes.
A leaf node represents a subset of the data that is assigned a particular value of the predicted variable (for example, the resulting class) together with a measure of accuracy. A leaf node has no descendant nodes.
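The node structure described above can be sketched in code. This is a minimal illustration, not a production implementation; the field names and the binary, threshold-based split are assumptions for the example:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    # Predicted class for records reaching this node, plus a measure
    # of accuracy (e.g. the fraction of training records in this
    # subset that actually belong to the predicted class).
    predicted_class: str
    accuracy: float

@dataclass
class DecisionNode:
    # Split on a single predictor variable: records whose value is
    # below the threshold go to the left descendant, the rest right.
    feature: str
    threshold: float
    left: "Node"
    right: "Node"

Node = Union[DecisionNode, Leaf]

def classify(node: Node, record: dict) -> str:
    """Route a record from the root down to a leaf node."""
    while isinstance(node, DecisionNode):
        node = node.left if record[node.feature] < node.threshold else node.right
    return node.predicted_class
```

For example, a one-split tree on a hypothetical `age` predictor would be `DecisionNode("age", 30.0, Leaf("young", 0.9), Leaf("old", 0.8))`, and `classify` walks each new record to the appropriate leaf.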
Building a decision tree model involves three decisions:
- How to split the data at each decision node
- When to stop splitting a decision node and make it a leaf
- Which class to assign to each leaf node
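The first decision above, how to split, is commonly made by choosing the threshold that minimizes an impurity measure of the resulting subsets. The sketch below uses Gini impurity as one such measure; the source does not specify a particular criterion, so this is an assumed example:

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    # Zero means the subset is pure (a single class).
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """For one numeric predictor, find the threshold that minimizes
    the weighted Gini impurity of the two resulting subsets."""
    n = len(values)
    best_threshold, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v < t]
        right = [lab for v, lab in zip(values, labels) if v >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score
```

A full tree builder would apply this search across every predictor at each decision node, then recurse on the two subsets until a stopping rule turns the node into a leaf.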
Typically, you use a decision tree to score or classify new data, producing a new table that contains key fields and the predicted value or class identifier. However, if the new data also includes actual values of the predicted variable, you can compare them against the predictions to measure the effectiveness of the decision tree.
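The scoring step can be sketched as a small helper that builds the output table and, when actual values are available, reports the fraction predicted correctly. The field names and the toy prediction rule in the usage note are hypothetical:

```python
def score_records(predict, records, key_field, actual_field=None):
    """Score new records with a trained classifier.

    Returns a table of (key, predicted class) rows; if actual_field
    names a column holding the true values of the predicted variable,
    also returns the fraction of records predicted correctly.
    """
    table = [(r[key_field], predict(r)) for r in records]
    accuracy = None
    if actual_field is not None:
        hits = sum(1 for r, (_, pred) in zip(records, table)
                   if r[actual_field] == pred)
        accuracy = hits / len(records)
    return table, accuracy
```

For instance, with a hypothetical rule such as `lambda r: "yes" if r["income"] > 50 else "no"` standing in for a trained tree, scoring records that include the actual `buys` column yields both the prediction table and a simple accuracy figure.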