Nondeterministic Results - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00, 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22

dita:id

B700-4003

Some functions are nondeterministic; that is, repeated runs using the same input tables and argument values might produce different results. Nondeterministic results can occur for the following reasons:

Different cluster configurations
The same function call run on clusters with different numbers of workers can produce different results, because the data is distributed differently across workers. An example is DecisionForest, where each worker builds a set of trees from its data partition. If the data is partitioned differently, as it might be on a different cluster, the resulting set of trees differs.
Nondeterministic data transfer
Data transfer from the database to the ML Engine is nondeterministic; that is, rows are transferred in random order, and the data is distributed differently among workers from one function run to the next. Nondeterministic data transfer affects functions for which data distribution and row-processing order are important. If the function has a partition key, you can ensure repeatable results by specifying the partition key together with an ORDER BY clause.
Algorithms with a random component
If the function is based on an algorithm that has a random component, results differ from run to run because of the random nature of the algorithm.
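
The effect of row order, and the way a partition key plus an ordering column removes it, can be illustrated with a small simulation. This is only a sketch in plain Python, not ML Engine code: the functions `transfer` and `smoothed_by_key`, the row layout `(key, ts, value)`, and the running-average computation are all hypothetical stand-ins chosen because the result depends on the order in which rows are processed.

```python
import random
from collections import defaultdict

def transfer(rows, rng):
    """Simulate nondeterministic transfer: rows arrive in an arbitrary order."""
    arrived = list(rows)
    rng.shuffle(arrived)
    return arrived

def smoothed_by_key(rows, order_rows):
    """Order-sensitive toy computation, one result per partition key.

    Each row is (key, ts, value). The running average depends on the
    order in which the rows of a partition are processed.
    """
    partitions = defaultdict(list)
    for key, ts, value in rows:
        partitions[key].append((ts, value))
    result = {}
    for key, items in partitions.items():
        if order_rows:
            items = sorted(items)  # plays the role of ORDER BY ts
        avg = items[0][1]
        for _, value in items[1:]:
            avg = 0.5 * avg + 0.5 * value
        result[key] = avg
    return result

# Hypothetical input: two partition keys, six rows each.
rows = [(key, ts, float(ts * ts)) for key in ("a", "b") for ts in range(1, 7)]

# Run the same computation on 20 simulated transfers of the same table.
unordered = {
    tuple(sorted(smoothed_by_key(transfer(rows, random.Random(s)), False).items()))
    for s in range(20)
}
ordered = {
    tuple(sorted(smoothed_by_key(transfer(rows, random.Random(s)), True).items()))
    for s in range(20)
}

print("distinct results without ordering:", len(unordered))
print("distinct results with ordering:", len(ordered))
```

Without ordering, the simulated runs produce several different results from identical input; with rows sorted inside each partition, every run produces the same result regardless of arrival order.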

Some ML Engine functions have a Seed argument that their algorithms use for repeatable results. However, because data transfer between the database and the ML Engine is nondeterministic, using the Seed argument does not guarantee repeatable results. If results vary significantly from run to run, contact Teradata Customer Support for assistance.
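
Why a seed alone is not sufficient can be sketched the same way. The example below is an illustrative assumption, not the behavior of any particular ML Engine function: `seeded_sample` is a hypothetical algorithm that uses its seed to choose row positions, so the same seed applied to rows that arrived in a different order selects different rows.

```python
import random

def seeded_sample(rows, k, seed):
    """Toy algorithm with a seed: pick k rows by seeded random position."""
    return random.Random(seed).sample(rows, k)

rows = list(range(100))

# Ten simulated runs; each run receives the same rows in a different order.
arrivals = []
for run in range(10):
    arrived = list(rows)
    random.Random(run).shuffle(arrived)  # hypothetical arrival order for this run
    arrivals.append(arrived)

# Same seed, arrival order left as is: the selected rows vary across runs.
same_seed_unordered = {tuple(seeded_sample(a, 5, seed=42)) for a in arrivals}

# Same seed, rows put into a fixed order first: the selection is repeatable.
same_seed_ordered = {tuple(seeded_sample(sorted(a), 5, seed=42)) for a in arrivals}

print("distinct samples, seed only:", len(same_seed_unordered))
print("distinct samples, seed plus fixed order:", len(same_seed_ordered))
```

In this sketch the seed fixes the random choices, but repeatability also requires that the input reach the algorithm in the same order on every run.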