Optional Syntax Elements for TD_WordEmbeddings - Analytics Database
Database Analytic Functions
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Analytics Database
- Release Number
- 17.20
- Published
- June 2022
- Last Edition
- 2025-07-09
- Product Category
- Teradata Vantage™
- SecondaryColumn
- Name of the input table column that contains the second text to compare with the input text column. Applicable only to the token2token-similarity and doc2doc-similarity operations.
- Accumulate
- Names of the input table columns to copy to the output table. Not applicable to the token-embedding operation.
- Operation
- Operation to be performed on the data. Options are:
- token-embedding: Emits a vector for each token in the text column. Each token is mapped to a vector of real numbers that represents its semantic meaning. For example, the word "dog" might be represented by the vector [0.1, 0.2, 0.3, 0.4, 0.5], whose values jointly capture aspects of the word's meaning.
- doc-embedding: Vectorizes each token in the document and combines the token vectors into a single vector for the whole document. For example, the document "The dog ran across the street" might be represented by a vector such as [0.2, 0.1, 0.4, 0.3, 0.5], which has the same number of dimensions as the token vectors.
- token2token-similarity: Computes a similarity score between the token in the input text column and the token in SecondaryColumn, based on their word embeddings. The closer the two embeddings are in the multidimensional embedding space, the higher the score, indicating that the tokens are semantically related. For example, the similarity between "dog" and "cat" is higher than the similarity between "dog" and "table".
- doc2doc-similarity: Computes a similarity score between the document in the input text column and the document in SecondaryColumn. The score compares the two document embeddings, which are created as in the doc-embedding operation, and reflects how similar the documents are in content. For example, the similarity between "The dog ran across the street" and "The cat sat on the mat" is higher than the similarity between "The dog ran across the street" and "The apple fell from the tree".
- Default value: token-embedding
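The four operations above can be illustrated with a small Python sketch. It is not TD_WordEmbeddings itself: the embedding table `EMBEDDINGS` holds invented toy vectors, doc-embedding is assumed to average the token vectors, and similarity is assumed to be cosine similarity. This page does not state the exact combination rule or similarity measure the function uses, and all function names here are hypothetical.

```python
import math

# Hypothetical toy embedding table (token -> vector). TD_WordEmbeddings reads
# real vectors from a model; these 3-dimensional values are invented.
EMBEDDINGS = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "table": [0.1, 0.2, 0.9],
    "ran": [0.5, 0.1, 0.3],
}


def token_embedding(token):
    """token-embedding: look up one token's vector (None if the token is unknown)."""
    return EMBEDDINGS.get(token)


def doc_embedding(tokens):
    """doc-embedding, ASSUMED here to be the mean of the known token vectors."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """ASSUMED similarity measure: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def token2token_similarity(token_a, token_b):
    """token2token-similarity: compare two token vectors (None if either is unknown)."""
    a, b = token_embedding(token_a), token_embedding(token_b)
    if a is None or b is None:
        return None
    return cosine_similarity(a, b)


def doc2doc_similarity(doc_a, doc_b):
    """doc2doc-similarity: compare two document vectors built by doc_embedding."""
    a, b = doc_embedding(doc_a), doc_embedding(doc_b)
    if a is None or b is None:
        return None
    return cosine_similarity(a, b)


# With these toy vectors, "dog"/"cat" scores higher than "dog"/"table".
print(token2token_similarity("dog", "cat"), token2token_similarity("dog", "table"))
print(doc2doc_similarity(["dog", "ran"], ["cat", "ran"]))
```

Averaging keeps a document vector the same length as the token vectors, so document and token embeddings live in the same space and can be compared with the same similarity measure.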
- RemoveStopWords
- Specifies whether to remove stop words from the input table text before any operation is performed. English stop words include words such as "the", "and", "in", "of", "to", "is", "it", "on", and "at". Applicable to all operations except token2token-similarity. Default is False.
- ConvertToLowerCase
- Specifies whether to convert the input table text to lowercase before any operation is performed. Default is True.
- StemTokens
- Specifies whether to reduce each word in the input table text to its root form (stem), such as converting "going" to "go". Default is False.
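The three preprocessing arguments can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the stop-word set contains only the examples listed under RemoveStopWords, `stem` is a hypothetical naive suffix stripper rather than the stemming algorithm TD_WordEmbeddings actually uses (this page does not name one), tokens are split on whitespace, and the order lowercase, then stop-word removal, then stemming is assumed. `preprocess` and its keyword arguments are illustrative names, not the function's SQL syntax.

```python
# Stop words from the RemoveStopWords description; a real stop-word list is
# much longer (this short set is an illustrative assumption).
STOP_WORDS = {"the", "and", "in", "of", "to", "is", "it", "on", "at"}

# Hypothetical toy suffix rules, checked in order; NOT the stemmer used by
# TD_WordEmbeddings.
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Naive suffix stripping, e.g. "going" -> "go", "dogs" -> "dog"."""
    for suffix in SUFFIXES:
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words such as "across" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def preprocess(text, convert_to_lower_case=True, remove_stop_words=False,
               stem_tokens=False):
    """Apply the optional preprocessing steps before an operation runs.

    Defaults mirror the documented defaults: True, False, False.
    """
    if convert_to_lower_case:
        text = text.lower()
    tokens = text.split()  # assumed whitespace tokenization
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem_tokens:
        tokens = [stem(t) for t in tokens]
    return tokens


print(preprocess("The dog is going to the park",
                 remove_stop_words=True, stem_tokens=True))
# ['dog', 'go', 'park']
```

Lowercasing first means the stop-word check and stemming only have to handle lowercase forms; with lowercasing turned off, a capitalized "The" would not match the lowercase stop-word set in this sketch.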