Scatter Plot

Teradata Warehouse Miner User Guide - Volume 1 Introduction and Profiling

Scatter plots are useful for identifying relationships and outliers across combinations of two or three variables. They are used to investigate the possible relationship between two or three variables that all relate to the same “event”. Inferences can often be drawn from how the points cluster within the scatter plot. For example:

  • There may be a positive correlation if the points are clustered in a band running from lower left (low values of both variables) to upper right (high values of both).
  • There may be a negative correlation if the points are clustered in a band running from upper left to lower right.
  • If a straight line or curve is drawn through the data so that it “fits” as well as possible, the more closely the points cluster around that imaginary line of best fit, the stronger the relationship between the variables.
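The visual reading described in the list above can be backed by a number. The following is a minimal sketch, not part of Teradata Warehouse Miner, that computes the Pearson correlation coefficient for two variables; the function name `pearson_r` and the sample points are hypothetical. A value near +1 corresponds to a tight band rising from lower left to upper right, a value near -1 to a tight band falling from upper left to lower right. Note that this coefficient measures only straight-line relationships, so it can understate a strong relationship that follows a curve.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample points, for illustration only.
rising = pearson_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])   # close to +1
falling = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])           # close to -1
```

The closer the absolute value is to 1, the more tightly the points cluster around a straight line of best fit.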

The Scatter Plot analysis can be applied to any type of numeric data. It uses the Teradata SAMPLE extension to plot a random selection of data points across two or three dimensions. The axes are scaled to the range of the data returned for the specified sample size. A practical limit of 30,000 data points has been set.
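As a rough illustration of the sampling and axis scaling described above, the sketch below builds a query that uses the Teradata SAMPLE clause and derives axis ranges from the rows returned. It is not the SQL that Teradata Warehouse Miner generates. The helper names, the example database, table, and column names, and the choice to clamp an oversized request to the limit (rather than reject it) are all assumptions made for illustration.

```python
MAX_POINTS = 30000  # practical limit on plotted points stated in this guide


def build_sample_query(database, table, columns, sample_size):
    """Build a SELECT that draws a random sample of rows via Teradata SAMPLE.

    Hypothetical helper: identifiers are inserted as given, without quoting
    or validation, so they must come from a trusted source.
    """
    if len(columns) not in (2, 3):
        raise ValueError("a scatter plot needs two or three columns")
    if sample_size < 1:
        raise ValueError("sample size must be at least 1")
    # Assumption: requests above the limit are clamped, not rejected.
    n = min(sample_size, MAX_POINTS)
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {database}.{table} SAMPLE {n}"


def axis_ranges(rows):
    """Return one (min, max) pair per column of the sampled rows."""
    if not rows:
        raise ValueError("no rows were returned by the sample")
    return [(min(col), max(col)) for col in zip(*rows)]


# Hypothetical names and values, for illustration only.
query = build_sample_query("demo_db", "customer", ["income", "age"], 50000)
ranges = axis_ranges([(1, 10), (5, 2), (3, 7)])
```

Because the ranges come from the sampled rows rather than the full table, two runs with different samples may produce slightly different axis scales.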

The Scatter Plot analysis is parameterized by specifying the databases, tables, and columns to analyze, the options unique to the Scatter Plot analysis, the desired results, and the SQL or Expert Options.