The Scale function normalizes the input data set. It shifts the input data and then scales it to produce the normalized values.
Scaling is a preprocessing step for many data sets. Many data sets consist of variables that are measured in different units or that have very different ranges or variances. If the data set is not scaled, some columns can have a much greater influence on the results than others, which can produce misleading results. Functions whose input typically benefits from scaling include KMeans, Principal Component Analysis (PCA), and VARMAX.
For example, an insurance company wants to use the KMeans function to cluster customers according to their house data, which includes square feet (sqft), number of rooms (numrooms), and price. As the following table shows, the numeric values of these variables differ significantly because they are measured in different units. If the input data is not scaled, the effect of house price dominates the clustering due to its larger variance.
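The dominance of price can be checked numerically. The following Python sketch is only an illustration: the house records are hypothetical values invented for this example, and the distance calculation is a plain Euclidean distance of the kind a KMeans-style algorithm typically relies on, not the KMeans function itself.

```python
from statistics import pvariance

# Hypothetical house records, invented for illustration only:
# (sqft, numrooms, price)
houses = [
    (1500.0, 3.0, 250000.0),
    (2000.0, 4.0, 400000.0),
    (1000.0, 2.0, 200000.0),
]
columns = ("sqft", "numrooms", "price")

# Population variance of each column on its original scale.
variances = {
    name: pvariance([row[i] for row in houses])
    for i, name in enumerate(columns)
}

# Share of the squared Euclidean distance between the first two houses
# that each column contributes.
first, second = houses[0], houses[1]
squared_diffs = [(x - y) ** 2 for x, y in zip(first, second)]
total = sum(squared_diffs)
shares = {name: d / total for name, d in zip(columns, squared_diffs)}

for name in columns:
    print(f"{name}: variance={variances[name]:.2f}, distance share={shares[name]:.6f}")
```

With these made-up values, price accounts for nearly all of the squared distance, so sqft and numrooms have almost no say in which houses count as close to each other.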
The following table shows the data after normalization with the Scale function using the MAXABS method. All three columns are now on a similar scale and price no longer dominates the KMeans clustering.
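As a rough illustration of what MAXABS-style scaling does, the Python sketch below divides each column by its maximum absolute value, with no shift. This is an assumption about the method's behavior written in plain Python, not the Scale function's implementation, and the helper name `maxabs_scale` and the house values are hypothetical.

```python
def maxabs_scale(column):
    """Divide every value by the column's maximum absolute value.

    Hypothetical helper for illustration; assumes MAXABS applies no shift.
    """
    max_abs = max(abs(v) for v in column)
    if max_abs == 0:
        # An all-zero column has nothing to scale.
        return [0.0 for _ in column]
    return [v / max_abs for v in column]


# Hypothetical house data, invented for illustration only.
houses = {
    "sqft": [1500.0, 2000.0, 1000.0],
    "numrooms": [3.0, 4.0, 2.0],
    "price": [250000.0, 400000.0, 200000.0],
}

scaled = {name: maxabs_scale(values) for name, values in houses.items()}

for name, values in scaled.items():
    print(name, values)
# sqft     -> [0.75, 1.0, 0.5]
# numrooms -> [0.75, 1.0, 0.5]
# price    -> [0.625, 1.0, 0.5]
```

After scaling, every column lies in the range [-1, 1], so no single column dominates a distance calculation purely because of its unit of measurement.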