Canopy - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

8.00

1.0

Published

May 2019

Language

English (United States)

Last Update

2019-11-22

dita:mapPath

blj1506016597986.ditamap

dita:ditavalPath

blj1506016597986.ditaval

dita:id

B700-4003

lifecycle

Product Category

Teradata Vantage™

Canopy clustering is a simple, fast, accurate method for grouping objects into preliminary clusters. It is often performed in preparation for more rigorous clustering techniques, such as k-means clustering.

The canopy clustering algorithm uses a fast approximate distance metric and two distance thresholds, T1 (loose distance) and T2 (tight distance), with T1 greater than T2. A point is assigned to a canopy if the distance from the point to the canopy center is less than T2, and can be assigned to a canopy if the distance from the point to the canopy center is less than T1. A point can be assigned to more than one canopy.

For distance measurement, the Canopy function uses Euclidean distance.
If there are more than 10,000 canopy centers, the function fails. Run the function again, increasing the value of T2 (specified by the TightDistance argument).
The canopy clustering algorithm is nondeterministic, and the randomness of the canopy assignments cannot be controlled by a seed argument (for more information, see Nondeterministic Results).