Optional Syntax Elements for TD_WordEmbeddings - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-10-04
Product Category
Teradata Vantage™
SecondaryColumn
Name of the input table column that contains the text. Applicable only to the token2token-similarity and doc2doc-similarity operations.
Accumulate
List of input table columns to copy to the output. Not applicable to the token-embedding operation.
Operation
Operation to be performed on the data. Options are:
  • token-embedding: Emits a vector for each token in the column. Each token present in the specified text column is mapped to a vector of real numbers that represents the semantic meaning of that token. For example, the word "dog" might be represented by the vector [0.1, 0.2, 0.3, 0.4, 0.5], where each number represents a different aspect of the meaning of the word.
  • doc-embedding: Vectorizes each token in the document and combines the token vectors into a single document vector. For example, the document "The dog ran across the street" might be represented by the vector [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], where each number represents a different aspect of the meaning of the document.
  • token2token-similarity: Computes a numeric similarity score between tokens. The score measures how similar or related two tokens are based on their word embeddings. If the word embeddings of two tokens are close in the multi-dimensional space, the similarity value is higher, indicating that the tokens are semantically similar. For example, the similarity between the words "dog" and "cat" would be higher than the similarity between the words "dog" and "table".
  • doc2doc-similarity: Computes a numeric similarity score between documents. The score compares the embeddings of two entire documents, built in the same way as in the doc-embedding operation, and reflects how similar or related the documents are in terms of their content. For example, the doc2doc-similarity between the documents "The dog ran across the street" and "The cat sat on the mat" would be higher than the doc2doc-similarity between the documents "The dog ran across the street" and "The apple fell from the tree".
Default value: token-embedding
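The four operations can be illustrated outside the database with a short Python sketch. This is not the TD_WordEmbeddings implementation: the toy model values, the function names, the choice to skip tokens missing from the model, the use of averaging to combine token vectors, and the use of cosine similarity as the similarity measure are all assumptions made for illustration.

```python
import math

# Hypothetical toy model: token -> vector (values invented for illustration).
MODEL = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "table": [0.0, 0.1, 0.9],
    "ran": [0.3, 0.7, 0.1],
}

def token_embedding(text):
    """Return (token, vector) pairs; tokens absent from the model are skipped (assumption)."""
    return [(tok, MODEL[tok]) for tok in text.split() if tok in MODEL]

def doc_embedding(text):
    """Combine token vectors into one document vector by averaging (assumed rule)."""
    vectors = [vec for _, vec in token_embedding(text)]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def token2token_similarity(tok_a, tok_b):
    """Similarity between the embeddings of two tokens."""
    return cosine_similarity(MODEL[tok_a], MODEL[tok_b])

def doc2doc_similarity(doc_a, doc_b):
    """Similarity between the embeddings of two documents."""
    return cosine_similarity(doc_embedding(doc_a), doc_embedding(doc_b))

if __name__ == "__main__":
    print(token_embedding("dog ran"))
    print(doc_embedding("dog ran"))
    print(token2token_similarity("dog", "cat"))
    print(token2token_similarity("dog", "table"))
    print(doc2doc_similarity("dog ran", "cat ran"))
```

With this toy model, "dog" scores closer to "cat" than to "table", which mirrors the ordering described for token2token-similarity. The document vector has the same number of dimensions as the token vectors because averaging preserves the dimension.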
RemoveStopWords
Specifies whether to remove all stop words from the input table text before any operation is performed. Stop words in English include words such as "the", "and", "in", "of", "to", "is", "it", "on", "at", and so on. Applicable to all operations except token2token-similarity. Default is False.
ConvertToLowerCase
Specifies whether to convert the input table text to lowercase letters before any operation is performed. Default is True.
StemTokens
Specifies whether to convert each word in the input table text to its root word, such as converting "going" to "go". Default is False.
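The effect of the three text-preparation elements can be sketched in Python as follows. This is an illustrative approximation, not the function's actual logic: the function names, the order of the steps, the whitespace tokenization, and the crude suffix-stripping stemmer are assumptions, and the stop word list contains only the examples named above. The keyword defaults mirror the documented defaults.

```python
# Stop words limited to the examples listed in the text; the real list is longer.
STOP_WORDS = {"the", "and", "in", "of", "to", "is", "it", "on", "at"}

def naive_stem(token):
    """Very rough suffix stripping, a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token

def preprocess(text, remove_stop_words=False, convert_to_lower_case=True,
               stem_tokens=False):
    """Return the tokens of text after the selected preparation steps."""
    if convert_to_lower_case:
        text = text.lower()
    tokens = text.split()  # whitespace tokenization (assumption)
    if remove_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    if stem_tokens:
        tokens = [naive_stem(t) for t in tokens]
    return tokens

if __name__ == "__main__":
    print(preprocess("The dog is going", remove_stop_words=True, stem_tokens=True))
    # ['dog', 'go']
    print(preprocess("The Dog"))
    # ['the', 'dog']
```

Preparing text this way before looking tokens up in the model means that "Dog", "dog", and "dogs" can resolve to the same model entry, at the cost of losing distinctions the model might otherwise capture.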