Example: How to Use TD_WordEmbeddings

Database Analytic Functions

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-06
Product Category: Teradata Vantage™

The InputTable, wordEmb_inputTable, used for the token-embedding, doc-embedding, and doc2doc-similarity operation examples, is as follows:

doc_id doc1                              doc2
------ ----                              ----
1      I like pizza                      I love pizza
2      single_token                      token
3      food is delicious                 dinner is yummy
4      tokyo hosting olympics            food is delicious
5      person xyz was assisted by nurses few medics helped person xyz

The model table, wordEmbedModel, used for all of the operation examples, is as follows:

token      v1        v2        v3        v4
-----      --        --        --        --
assisted    0.10058   0.1914    0.28125   0.17382
by         -0.11572  -0.03149   0.15917   0.13867
delicious  -0.18164  -0.13281   0.03906   0.31445
dinner     -0.06152  -0.08496  -0.15039   0.42382
few         0.13867   0.02941  -0.18652   0.15039
food       -0.18164   0.16503  -0.16601   0.35742
hosting    -0.06396   0.25585   0.04321   0.01721
i          -0.22558  -0.01953   0.09082   0.2373
is          0.00704  -0.07324   0.17187   0.02258
helped      0.12695   0.09033   0.26367   0.08544
like        0.10351   0.13769  -0.00297   0.18164
love        0.10302  -0.15234   0.02587   0.16503
nurses     -0.04638  -0.14257  -0.34179   0.21582
olympics   -0.39648   0.02038   0.07275   0.24414
person      0.27539   0.24707   0.01721   0.16796
pizza      -0.12597   0.02539   0.16699   0.55078
medics      0.05981   0.26171   0.16894   0.60156
token       0.04174   0.2041   -0.26757   0.29882
tokyo      -0.05664  -0.05029  -0.0075    0.23828
was         0.026    -0.00189   0.18554  -0.05175
xyz        -0.01574  -0.13476   0.1582    0.11328
yummy      -0.18945   0.06591  -0.00417   0.43359

Example: TD_WordEmbeddings SQL Call Using token-embedding Operation

SELECT * FROM TD_WordEmbeddings (
  ON wordEmb_inputTable AS InputTable
  ON wordEmbedModel AS ModelTable DIMENSION
  USING
  IDColumn('doc_id')
  ModelVectorColumns('[1:4]')
  PrimaryColumn('doc1')
  Operation('token-embedding')
  ModelTextColumn('token')
) AS dt ORDER BY doc_id ASC;

TD_WordEmbeddings Output Table Using token-embedding Operation

doc_id token         v1        v2       v3       v4
------ -----         --        --       --       --
1      i             -0.22558  -0.01953  0.09082  0.2373
1      like           0.10351   0.13769 -0.00297  0.18164
1      pizza         -0.12597   0.02539  0.16699  0.55078
2      single_token   0         0        0        0
3      delicious     -0.18164  -0.13281  0.03906  0.31445
3      is             0.00704  -0.07324  0.17187  0.02258
3      food          -0.18164   0.16503 -0.16601  0.35742
4      olympics      -0.39648   0.02038  0.07275  0.24414
4      hosting       -0.06396   0.25585  0.04321  0.01721
4      tokyo         -0.05664  -0.05029 -0.0075   0.23828
5      nurses        -0.04638  -0.14257 -0.34179  0.21582
5      person         0.27539   0.24707  0.01721  0.16796
5      assisted       0.10058   0.1914   0.28125  0.17382
5      was            0.026    -0.00189  0.18554 -0.05175
5      by            -0.11572  -0.03149  0.15917  0.13867
5      xyz           -0.01574  -0.13476  0.1582   0.11328
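
The output above reads like a per-token lookup against the model table: each token of doc1 appears with its model vector, and single_token, which has no row in wordEmbedModel, appears with all zeros. The following Python sketch imitates that behavior outside the database. It is an illustration only: the helper name token_embeddings is hypothetical, and lowercasing plus whitespace splitting is an assumed stand-in for whatever tokenization TD_WordEmbeddings applies.

```python
# Subset of the wordEmbedModel rows shown above (token -> v1..v4).
MODEL = {
    "i":     [-0.22558, -0.01953,  0.09082, 0.2373],
    "like":  [ 0.10351,  0.13769, -0.00297, 0.18164],
    "pizza": [-0.12597,  0.02539,  0.16699, 0.55078],
}


def token_embeddings(text, model, dims=4):
    """Return (token, vector) pairs; tokens missing from the model get zeros.

    Assumption: lowercasing and whitespace splitting approximate the
    function's tokenization. This is not taken from the product documentation.
    """
    rows = []
    for token in text.lower().split():
        rows.append((token, model.get(token, [0.0] * dims)))
    return rows


rows_doc1 = token_embeddings("I like pizza", MODEL)
rows_doc2 = token_embeddings("single_token", MODEL)

for token, vector in rows_doc1 + rows_doc2:
    print(token, vector)
```

The sketch emits tokens in document order, while the sample output above does not list them in that order within each document.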

Example: TD_WordEmbeddings SQL Call Using doc-embedding Operation

SELECT * FROM TD_WordEmbeddings (
  ON wordEmb_inputTable AS InputTable
  ON wordEmbedModel AS ModelTable DIMENSION
  USING
  IDColumn('doc_id')
  ModelVectorColumns('[1:4]')
  PrimaryColumn('doc1')
  Operation('doc-embedding')
  ModelTextColumn('token')
  Accumulate('doc1')
) AS dt ORDER BY doc_id ASC;

TD_WordEmbeddings Output Table Using doc-embedding Operation

doc_id v1        v2       v3      v4      doc1
------ --        --       --      --      ----
1      -0.08268  0.04785  0.08494 0.32324 i like pizza
2       0        0        0       0       single_token
3      -0.11874 -0.01367  0.01497 0.23148 food is delicious
4      -0.17236  0.07531  0.03615 0.16654 tokyo hosting olympics
5       0.03735  0.02129  0.07659 0.1263  person xyz was assisted by nurses
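
The sample values above are consistent with an element-wise average of the token vectors in each document. That is an observation drawn from this data, not a statement about the function's internals. A minimal Python sketch, with the hypothetical helper doc_embedding, reproduces the doc_id 1 row from the model vectors for i, like, and pizza:

```python
# wordEmbedModel rows for the three tokens of "i like pizza".
MODEL = {
    "i":     [-0.22558, -0.01953,  0.09082, 0.2373],
    "like":  [ 0.10351,  0.13769, -0.00297, 0.18164],
    "pizza": [-0.12597,  0.02539,  0.16699, 0.55078],
}


def doc_embedding(tokens, model):
    """Average the token vectors column by column (assumed aggregation)."""
    vectors = [model[t] for t in tokens]
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


doc1 = doc_embedding(["i", "like", "pizza"], MODEL)
reported = [-0.08268, 0.04785, 0.08494, 0.32324]  # doc_id 1 row above

# Largest absolute gap between the averaged and the reported components.
max_gap = max(abs(a - b) for a, b in zip(doc1, reported))
print([round(x, 5) for x in doc1], max_gap)
```

The averaged vector matches the reported row to within about 0.00001 per component; the small gaps suggest that the displayed values are truncated or that the stored model values carry more digits than shown.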

The InputTable, wordEmb_inputTable2, used for the token2token-similarity operation example, is as follows:

doc_id Token1    Token2
------ ------    ------
1      food      delicious
2      pizza     food
3      love      like
4      nurses    olympics

Example: TD_WordEmbeddings SQL Call Using token2token-similarity Operation

SELECT * FROM TD_WordEmbeddings (
  ON wordEmb_inputTable2 AS InputTable
  ON wordEmbedModel AS ModelTable DIMENSION
  USING
  IDColumn('doc_id')
  ModelVectorColumns('[1:4]')
  PrimaryColumn('token1')
  SecondaryColumn('token2')
  Operation('token2token-similarity')
  ModelTextColumn('token')
  Accumulate('token1','token2')
) AS dt ORDER BY doc_id ASC;

TD_WordEmbeddings Output Table Using token2token-similarity Operation

doc_id Similarity Token1    Token2
------ ---------- ------    ------
1      0.64836    food      delicious
2      0.71667    pizza     food
3      0.31491    love      like
4      0.21295    nurses    olympics
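
The Similarity values above match the cosine similarity of the two token vectors. This is inferred from the sample numbers and is not claimed here as the documented definition. A short Python sketch, with the hypothetical helper cosine, checks the food/delicious pair:

```python
import math


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors as listed in the token-embedding output table
# (food's v2 is 0.16503 there).
food = [-0.18164, 0.16503, -0.16601, 0.35742]
delicious = [-0.18164, -0.13281, 0.03906, 0.31445]

similarity = cosine(food, delicious)
print(round(similarity, 5))
```

With these inputs the result agrees with the reported 0.64836 for doc_id 1 to within 0.0001.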

Example: TD_WordEmbeddings SQL Call Using doc2doc-similarity Operation

SELECT * FROM TD_WordEmbeddings (
  ON wordEmb_inputTable AS InputTable
  ON wordEmbedModel AS ModelTable DIMENSION
  USING
  IDColumn('doc_id')
  ModelVectorColumns('[1:4]')
  PrimaryColumn('doc1')
  SecondaryColumn('doc2')
  Operation('doc2doc-similarity')
  ModelTextColumn('token')
  Accumulate('doc1','doc2')
) AS dt ORDER BY doc_id ASC;

TD_WordEmbeddings Output Table Using doc2doc-similarity Operation

doc_id Similarity doc1                              doc2
------ ---------- ----                              ----
1      0.96055    i like pizza                      i love pizza
2      0          single_token                      token
3      0.97761    food is delicious                 dinner is yummy
4      0.88368    tokyo hosting olympics            food is delicious
5      0.94299    person xyz was assisted by nurses few medics helped person xyz
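
The doc2doc values are consistent with first averaging the token vectors of each document and then taking the cosine similarity of the two averages. As before, this is a reading of the sample data, not a description of the implementation. The Python sketch below, using the hypothetical helpers mean_vector and cosine, checks doc_id 1 ("i like pizza" against "i love pizza"):

```python
import math

# wordEmbedModel rows for the tokens in both documents of doc_id 1.
MODEL = {
    "i":     [-0.22558, -0.01953,  0.09082, 0.2373],
    "like":  [ 0.10351,  0.13769, -0.00297, 0.18164],
    "love":  [ 0.10302, -0.15234,  0.02587, 0.16503],
    "pizza": [-0.12597,  0.02539,  0.16699, 0.55078],
}


def mean_vector(tokens, model):
    """Column-wise average of the token vectors (assumed doc embedding)."""
    vectors = [model[t] for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


doc1 = mean_vector(["i", "like", "pizza"], MODEL)
doc2 = mean_vector(["i", "love", "pizza"], MODEL)

doc_similarity = cosine(doc1, doc2)
print(round(doc_similarity, 4))
```

The result is close to the reported 0.96055. For doc_id 2, single_token has no model row, so its embedding is all zeros and the table reports a similarity of 0; the sketch does not handle that case and would divide by zero for an all-zero vector.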