Decision trees are conceptually simple models. Suppose, for example, that you want to predict the value of a variable y from two predictor variables, x1 and x2; that is, you want to model y as a function of x1 and x2 (y = f(x1, x2)).
You can visualize x1 and x2 as forming a plane, with the value of y at each coordinate (x1, x2) rising out of the plane in the third dimension. A decision tree partitions the plane into rectangles and assigns each rectangle a constant predicted value of y, usually the average of the y values for the training observations that fall in that region. This two-dimensional picture extends to arbitrarily many dimensions, so the same idea fits models with large numbers of predictors.
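To make the partitioning concrete, here is a minimal sketch using scikit-learn's DecisionTreeRegressor on synthetic data; the data-generating rule, the depth limit, and the test point are assumptions chosen to keep the partition coarse, not part of the example above:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: y depends on two predictors, x1 and x2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))      # columns are x1 and x2
y = np.where(X[:, 0] > 5, 3.0, 1.0) + 0.1 * rng.standard_normal(200)

# Restricting the depth keeps the partition coarse: each leaf corresponds
# to one rectangle in the x1-x2 plane, and its prediction is the mean y
# of the training points that fall inside that rectangle.
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(tree.predict([[7.0, 2.0]]))          # constant value for that rectangle
```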
In this example, the x1-x2 plane is partitioned into four regions, R1, R2, R3, and R4. The predicted value of y for any test observation that falls in R1 is the average value of y over all training observations in R1, and likewise for the other regions.
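The region-mean rule amounts to nothing more than masking the training points that fall inside a rectangle and averaging their y values. In this short sketch, the data and the boundaries chosen for R1 are hypothetical:

```python
import numpy as np

# Toy training data (illustrative); columns of X are x1 and x2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = X[:, 0] + X[:, 1] + rng.standard_normal(200)

# Hypothetical boundaries for R1: x1 <= 5 and x2 <= 4.
in_R1 = (X[:, 0] <= 5) & (X[:, 1] <= 4)

# The tree's prediction for any test point landing in R1 is simply the
# average y of the training observations that fell in R1.
prediction_R1 = y[in_R1].mean()
print(prediction_R1)
```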
The same partition can be represented as a decision tree:
Prediction starts at the root node. If a data point's x1 value is greater than 5, the algorithm follows the right branch; if x1 is less than or equal to 5, it follows the left branch. At each subsequent node it applies the same kind of test to choose a branch, until it reaches a leaf node, whose stored constant becomes the predicted value.
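The traversal itself is easy to express directly. Below is a minimal sketch of a hand-built regression tree and its prediction loop; the node layout, thresholds, and leaf values are illustrative assumptions matching the splits described above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One node of a regression tree: internal nodes split, leaves predict."""
    feature: int = 0                  # index of the predictor tested (0 -> x1, 1 -> x2)
    threshold: float = 0.0            # split point for that predictor
    left: Optional["Node"] = None     # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None    # branch taken when x[feature] > threshold
    value: float = 0.0                # constant prediction stored at a leaf

def predict(node: Node, x: list[float]) -> float:
    # Walk from the root to a leaf, choosing a branch at each split.
    while node.left is not None and node.right is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# Illustrative tree: split on x1 at 5, then on x2 at 4 in the left subtree.
root = Node(feature=0, threshold=5.0,
            left=Node(feature=1, threshold=4.0,
                      left=Node(value=1.0), right=Node(value=2.0)),
            right=Node(value=3.0))
print(predict(root, [3.0, 6.0]))      # goes left, then right -> 2.0
```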