Nondeterministic Results - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™
Some functions are nondeterministic; that is, repeated runs using the same input tables and argument values might produce different results. Nondeterministic results can occur for the following reasons:
  • Different cluster configurations

    The same function call run on clusters with different numbers of workers can produce different results because the data is distributed differently across workers. An example is DecisionForest, where each worker builds a set of trees based on its data partition. If the data is partitioned differently, as it might be on a different cluster, the resulting set of trees differs from one configuration to another.

  • Nondeterministic data transfer

    Data transfer from the database to the ML Engine is nondeterministic; that is, rows are transferred in random order and the data is distributed differently among workers across function runs. Nondeterministic data transfer affects functions for which data distribution and row-processing order are important.

    If the function has a partition key, you can make results repeatable by specifying the partition key together with an ORDER BY clause, so that rows are grouped and processed in the same order on every run.

  • Algorithms with a random component

    If a function is based on an algorithm that has a random component, results differ from run to run because of the random nature of the algorithm.
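The effect of worker count, row arrival order, and the partition key with ORDER BY can be illustrated with a small simulation. The following is a minimal sketch in plain Python, not ML Engine code: the round-robin distribution, the `ema` aggregate, and the names `run_unordered` and `run_partitioned_ordered` are assumptions made for illustration and do not describe how the ML Engine actually distributes or processes rows.

```python
import random

def ema(values, alpha=0.5):
    """Order-sensitive aggregate: exponential moving average over a sequence."""
    state = values[0]
    for v in values[1:]:
        state = alpha * v + (1 - alpha) * state
    return state

def run_unordered(rows, n_workers):
    """Hypothetical run without a partition key: rows are dealt to workers
    round-robin in arrival order, and each worker processes them as received."""
    parts = [[] for _ in range(n_workers)]
    for i, (_key, _seq, value) in enumerate(rows):
        parts[i % n_workers].append(value)
    return [ema(p) for p in parts if p]

def run_partitioned_ordered(rows):
    """Hypothetical run with a partition key and an ordering column: rows are
    grouped by key and sorted by seq before the aggregate is applied."""
    groups = {}
    for key, seq, value in rows:
        groups.setdefault(key, []).append((seq, value))
    return {
        key: ema([value for _seq, value in sorted(pairs)])
        for key, pairs in sorted(groups.items())
    }

# Each row is (partition key, ordering column, value).
rows = [("a", i, float(i)) for i in range(1, 5)]
rows += [("b", i, float(i + 4)) for i in range(1, 5)]

shuffled = rows[:]
random.Random(0).shuffle(shuffled)  # stands in for a different arrival order

# Without partitioning and ordering, results depend on worker count and arrival order.
print(run_unordered(rows, 2))
print(run_unordered(rows, 4))
print(run_unordered(rows[::-1], 2))

# With a partition key and an ordering column, arrival order no longer matters.
print(run_partitioned_ordered(rows))
print(run_partitioned_ordered(shuffled))
```

In this sketch the last two calls print the same dictionary, because grouping by key and sorting by `seq` removes any dependence on the order in which rows arrive.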

Some ML Engine functions have a Seed argument that their algorithms use for repeatable results. However, because data transfer between the database and the ML Engine is nondeterministic, using the Seed argument alone does not guarantee repeatable results. If results vary significantly across runs, contact Teradata Customer Support for assistance.
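Why a seed alone is not sufficient can be seen in a small simulation. The following is a minimal sketch in plain Python, not ML Engine code: `seeded_sample` is a hypothetical stand-in for a seeded algorithm that picks rows by position, and the reversed list stands in for the same data arriving in a different order.

```python
import random

def seeded_sample(rows, seed, k=4):
    """Hypothetical seeded algorithm: draw k rows using a fresh seeded generator.
    The seed fixes which positions are drawn, not which rows sit at them."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

table_rows = list(range(10))   # ten distinct rows
arrival_a = table_rows         # one arrival order
arrival_b = table_rows[::-1]   # same rows, different arrival order

# Same seed, same arrival order: identical samples.
print(seeded_sample(arrival_a, seed=42) == seeded_sample(arrival_a, seed=42))

# Same seed, different arrival order: the samples differ.
print(seeded_sample(arrival_a, seed=42) == seeded_sample(arrival_b, seed=42))

# Same seed, rows put into a fixed order first: identical samples again.
print(seeded_sample(sorted(arrival_a), seed=42) == seeded_sample(sorted(arrival_b), seed=42))
```

The seed makes the random choices repeatable, but those choices are applied to rows in whatever order they arrive, so the output repeats only when the row order is also fixed.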