Nondeterministic Results from Functions | Teradata Vantage - Nondeterministic Results and UniqueID Syntax Element

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 9.02, 9.01, 2.0, 1.3
Published: February 2022
Language: English (United States)
Last Update: 2022-02-10

Some ML Engine functions are nondeterministic; that is, repeated runs using the same input tables and syntax element values might produce different results. Nondeterministic results can occur for the following reasons:

Different cluster configurations
The same function call, run on clusters with different numbers of vworkers (that is, different worker pod configurations), can produce different results because the data is distributed differently across workers. An example is DecisionForest, where each worker builds a set of trees from its own data partition. If the data is partitioned differently, as it might be on a different cluster, the resulting set of trees differs from one configuration to another.
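The effect of worker count can be illustrated with a minimal Python sketch. This is a toy simulation, not ML Engine code: the round-robin `distribute` helper and the per-worker "model" (simply the mean of each partition) are assumptions made for illustration and do not reflect how ML Engine actually distributes rows or builds trees.

```python
from statistics import mean


def distribute(rows, num_workers):
    """Assign rows to workers round-robin (hypothetical distribution scheme)."""
    partitions = [[] for _ in range(num_workers)]
    for i, row in enumerate(rows):
        partitions[i % num_workers].append(row)
    return partitions


def train_per_worker(rows, num_workers):
    """Each worker fits a toy 'model': the mean of its own partition."""
    return [mean(partition) for partition in distribute(rows, num_workers)]


rows = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Same input, same "function", different cluster sizes.
models_2 = train_per_worker(rows, 2)
models_3 = train_per_worker(rows, 3)

print(models_2)
print(models_3)
```

The input is identical in both calls, yet the per-worker models differ because each worker sees a different slice of the data.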
Nondeterministic data transfer
Data transfer from Advanced SQL Engine to ML Engine is nondeterministic; that is, rows are transferred in random order and the data is distributed differently among workers across function runs. Nondeterministic data transfer affects functions for which data distribution and row-processing order are important.

If the function has a partition key, you can ensure repeatable results with the PARTITION BY and ORDER BY clauses.
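The following Python sketch illustrates why fixing both the grouping and the row order makes an order-sensitive computation repeatable. It is only an analogy for PARTITION BY and ORDER BY, not ML Engine syntax: the `running_score` function, the `process` helper, and the `(key, seq, value)` row layout are all hypothetical.

```python
from collections import defaultdict


def running_score(values, alpha=0.5):
    """Order-dependent computation: an exponentially weighted average."""
    score = 0.0
    for v in values:
        score = alpha * v + (1 - alpha) * score
    return score


def process(rows, ordered):
    """Group rows by key (PARTITION BY analogue) and, if requested,
    sort each group by its sequence number (ORDER BY analogue)."""
    groups = defaultdict(list)
    for key, seq, value in rows:
        groups[key].append((seq, value))
    result = {}
    for key, items in groups.items():
        if ordered:
            items = sorted(items)
        result[key] = running_score([value for _, value in items])
    return result


rows = [("a", 1, 10.0), ("a", 2, 20.0), ("a", 3, 40.0),
        ("b", 1, 5.0), ("b", 2, 15.0)]

# Two runs in which the same rows arrive in different orders.
arrival_1 = list(rows)
arrival_2 = list(reversed(rows))

unordered_1 = process(arrival_1, ordered=False)
unordered_2 = process(arrival_2, ordered=False)
ordered_1 = process(arrival_1, ordered=True)
ordered_2 = process(arrival_2, ordered=True)
```

Without the sort, the two runs disagree; with grouping and sorting applied, both runs yield the same result regardless of arrival order.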
Algorithm with a random component
When a function is based on an algorithm that has a random component, results differ from run to run because of that randomness. Some ML Engine functions have a Seed syntax element that their algorithms use for repeatable results. However, because data transfer between Advanced SQL Engine and ML Engine is nondeterministic, using the Seed syntax element alone may not guarantee repeatable results.
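As a rough illustration of why a seed alone may not be enough, the Python sketch below draws a seeded random sample from the same rows arriving in two different orders. The `seeded_sample` helper is hypothetical and stands in for any seeded algorithm; it is not an ML Engine API.

```python
import random


def seeded_sample(rows, seed, k=3):
    """Draw k rows with a fixed seed. The seed fixes which positions are
    chosen, so the rows returned still depend on the input order."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


rows = list(range(10))

# Same rows, same seed, different arrival order.
arrival_1 = rows
arrival_2 = rows[::-1]

sample_1 = seeded_sample(arrival_1, seed=42)
sample_2 = seeded_sample(arrival_2, seed=42)

# Fixing the row order first makes the seeded result repeatable.
stable_1 = seeded_sample(sorted(arrival_1), seed=42)
stable_2 = seeded_sample(sorted(arrival_2), seed=42)
```

With the same seed, the two unsorted runs select the same positions but return different rows; once the input order is fixed, the seeded results match.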