Nondeterministic Results and UniqueID Syntax Element

Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.1, 8.10
Published: October 2019
Content Type: Programming Reference
Publication ID: B700-4003-079K
Language: English (United States)
Last Update: 2019-12-31
Some ML Engine functions are nondeterministic; that is, repeated runs using the same input tables and syntax element values might produce different results. Nondeterministic results can occur for the following reasons:
  • Different cluster configurations

    The same function call, run on clusters with different numbers of vworkers (that is, different worker pod configurations), can produce different results because the data is distributed differently across the workers. An example is DecisionForest, in which each worker builds a set of trees from its own data partition. If the data is partitioned differently, as it might be on a different cluster, the set of trees produced differs from one configuration to another.

  • Nondeterministic data transfer

    Data transfer from Advanced SQL Engine to ML Engine is nondeterministic; that is, rows are transferred in random order, and the data can be distributed differently among workers from one function run to the next. Nondeterministic data transfer affects functions whose results depend on data distribution and row-processing order.

    If the function has a partition key, you can ensure repeatable results with the PARTITION BY and ORDER BY clauses.

  • The function is based on an algorithm that has a random component.

    Results can differ from run to run because of the random nature of the algorithm. Some ML Engine functions have a Seed syntax element that their algorithms use to produce repeatable results. However, because data transfer between Advanced SQL Engine and ML Engine is nondeterministic, specifying the Seed syntax element alone does not guarantee repeatable results.
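The cluster-configuration effect (different numbers of vworkers producing different results) can be illustrated with a toy Python simulation. This is not DecisionForest or any ML Engine API. The round-robin distribution in `partition_rows` and the median "model" in `build_ensemble` are hypothetical stand-ins, chosen only so that each worker's output depends solely on its own partition, as each DecisionForest worker's trees do.

```python
from statistics import median


def partition_rows(rows, num_workers):
    """Deal rows out to workers round-robin. This is a hypothetical stand-in
    for the engine's real distribution scheme, which is not modeled here."""
    partitions = [[] for _ in range(num_workers)]
    for i, row in enumerate(rows):
        partitions[i % num_workers].append(row)
    return partitions


def build_ensemble(rows, num_workers):
    """Each worker builds one local 'model' from its own partition only.
    Here a model is just the median of the partition, and the ensemble's
    prediction is the average of the local models."""
    local_models = [median(p) for p in partition_rows(rows, num_workers) if p]
    prediction = sum(local_models) / len(local_models)
    return local_models, prediction


rows = [1, 2, 3, 10, 20, 30]
for workers in (1, 3):
    models, prediction = build_ensemble(rows, workers)
    print(f"{workers} worker(s): local models={models}, prediction={prediction}")
```

With one worker the ensemble has a single local model and predicts 6.5; with three workers it has three local models ([5.5, 11.0, 16.5]) and predicts 11.0, even though the input rows and the "function call" are identical. Both the number of local models and the combined result change with the worker count.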
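The effect of nondeterministic data transfer, and why PARTITION BY with ORDER BY restores repeatability, can be sketched with a small Python simulation. It is hypothetical and is not ML Engine code: `transfer` shuffles rows to mimic random arrival order, `smooth` stands in for any order-sensitive computation, and `partitioned_ordered` mimics grouping rows by a partition key and sorting them by an ordering column before processing.

```python
import random
from collections import defaultdict

# Hypothetical input: (partition_key, order_column, value)
ROWS = [("a", 3, 10.0), ("a", 1, 4.0), ("a", 2, 7.0),
        ("b", 2, 5.0), ("b", 1, 1.0)]


def transfer(rows, rng):
    """Simulate nondeterministic transfer: rows arrive in random order."""
    arrived = list(rows)
    rng.shuffle(arrived)
    return arrived


def smooth(values, alpha=0.5):
    """An order-sensitive computation: exponential smoothing."""
    s = values[0]
    for v in values[1:]:
        s = alpha * v + (1 - alpha) * s
    return s


def naive(rows):
    """Process rows in arrival order, so the result depends on that order."""
    return smooth([value for _, _, value in rows])


def partitioned_ordered(rows):
    """Analogue of PARTITION BY key ORDER BY order_column: group rows by key,
    sort each group by the ordering column, then process each group."""
    groups = defaultdict(list)
    for key, order, value in rows:
        groups[key].append((order, value))
    return {key: smooth([value for _, value in sorted(group)])
            for key, group in sorted(groups.items())}


for seed in range(3):
    arrived = transfer(ROWS, random.Random(seed))
    print(naive(arrived), partitioned_ordered(arrived))
```

The first column can change with the arrival order (for example, `naive(ROWS)` is 3.5 while `naive(list(reversed(ROWS)))` is 7.25), whereas `partitioned_ordered` returns `{'a': 7.75, 'b': 3.0}` for every arrival order, because the grouping and sorting fix the processing order independently of how the rows arrived.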
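Why a Seed syntax element alone may not give repeatable results can be shown with a hypothetical Python sketch (not the ML Engine Seed implementation). A seeded generator chooses row *positions*, so the same seed selects different rows when the rows arrive in a different order. The functions `seeded_sample` and `seeded_sample_canonical` are illustrative names; the fix shown, sorting rows by a unique key before sampling, is one assumed way to put the input into a canonical order.

```python
import random


def seeded_sample(rows, k, seed):
    """Draw k rows with a fixed seed. The generator picks positions in the
    list, so which rows come back depends on the order the rows arrived in."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


def seeded_sample_canonical(rows, k, seed):
    """Sort rows by a unique key first, so the sample no longer depends on
    arrival order; only the seed and the row contents matter."""
    return seeded_sample(sorted(rows), k, seed)


rows = list(range(10))            # ten rows with unique ids 0..9
arrival_1 = rows
arrival_2 = list(reversed(rows))  # same rows, different arrival order

print(seeded_sample(arrival_1, 3, seed=7))  # same seed ...
print(seeded_sample(arrival_2, 3, seed=7))  # ... but different rows selected
print(seeded_sample_canonical(arrival_1, 3, seed=7))
print(seeded_sample_canonical(arrival_2, 3, seed=7))  # identical to the line above
```

Running `seeded_sample` twice on the same arrival order gives the same rows, so the seed does make the algorithm itself repeatable; only the change in input order breaks repeatability.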