TD_WordEmbeddings Function

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-06
Product Category: Teradata Vantage™

Word embedding is the representation of a word/token in a multidimensional space such that words/tokens with similar meanings have similar embeddings. Each word/token is mapped to a vector of real numbers that represents it. The Analytics Database function TD_WordEmbeddings produces a vector for each piece of text and can compute the similarity between texts. The function supports four operations: token-embedding, doc-embedding, token2token-similarity, and doc2doc-similarity.
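The four operations can be illustrated outside the database with a small Python sketch. It is not Teradata's implementation: the toy model, the function names, averaging token vectors to form a document embedding, and cosine similarity as the similarity measure are all assumptions made for illustration.

```python
import math

# Hypothetical toy model: each token maps to a 3-dimensional vector.
# Pretrained models such as GloVe typically use 50 to 300 dimensions.
MODEL = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
    "fruit": [0.2, 0.1, 0.8],
}


def token_embedding(token):
    """token-embedding: look up a token's vector (None if out of vocabulary)."""
    return MODEL.get(token.lower())


def doc_embedding(text):
    """doc-embedding (assumption): average the vectors of in-vocabulary tokens."""
    vectors = [v for v in map(token_embedding, text.split()) if v is not None]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def token2token_similarity(t1, t2):
    """token2token-similarity; assumes both tokens are in the model."""
    return cosine_similarity(token_embedding(t1), token_embedding(t2))


def doc2doc_similarity(d1, d2):
    """doc2doc-similarity; assumes each text has at least one known token."""
    return cosine_similarity(doc_embedding(d1), doc_embedding(d2))


print(round(token2token_similarity("king", "queen"), 2))  # 0.99
print(round(token2token_similarity("king", "apple"), 2))  # 0.31
```

With this toy model, tokens with related meanings ("king", "queen") score close to 1, while unrelated tokens ("king", "apple") score much lower. The same pattern holds for whole texts, such as "king queen" compared with "apple fruit".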

The ModelTable contains pretrained words/tokens and their corresponding vectors in multidimensional space. You can use predefined vectors from Word Vectors, or train your own using packages such as GloVe or Word2Vec. The ModelTable expects vectors in GloVe format (one word/token and its vector per row). To convert a Word2Vec file in text format, delete its first row, which contains the number of words/tokens and the number of dimensions.
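The header-removal step can be sketched in Python. This minimal sketch assumes a Word2Vec file in text format (binary Word2Vec files need a different reader). `word2vec_to_glove` and `parse_glove_row` are hypothetical helpers, and the row-count and dimension checks are extra sanity checks, not documented requirements of TD_WordEmbeddings. Loading the resulting rows into a ModelTable is not shown.

```python
def word2vec_to_glove(lines):
    """Convert Word2Vec text-format lines to GloVe-format lines.

    A Word2Vec text file starts with a header "<word count> <dimensions>";
    a GloVe file has no header, so the header is dropped. The remaining rows
    are checked against the header as a sanity check.
    """
    lines = [line.strip() for line in lines if line.strip()]
    if not lines:
        return []
    word_count, dims = (int(x) for x in lines[0].split())
    rows = lines[1:]
    for row in rows:
        fields = row.split()
        if len(fields) != dims + 1:
            raise ValueError(f"row for {fields[0]!r} does not have {dims} values")
    if len(rows) != word_count:
        raise ValueError(f"header says {word_count} rows, found {len(rows)}")
    return rows


def parse_glove_row(row):
    """Split one GloVe row into (word/token, vector of floats)."""
    token, *values = row.split()
    return token, [float(v) for v in values]


w2v_file = """2 3
cat 0.1 0.2 0.3
dog 0.4 0.5 0.6
"""
glove_rows = word2vec_to_glove(w2v_file.splitlines())
print(glove_rows)                      # ['cat 0.1 0.2 0.3', 'dog 0.4 0.5 0.6']
print(parse_glove_row(glove_rows[0]))  # ('cat', [0.1, 0.2, 0.3])
```

Because the conversion works on any iterable of lines, the same function accepts an open file handle as well as an in-memory string.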

  • This function supports CHARACTER SET LATIN.
  • This function does not support CHARACTER SET UNICODE.