The InputTable, wordEmb_inputTable, used for the token-embedding, doc-embedding, and doc2doc-similarity operation examples, is as follows:
doc_id | doc1 | doc2 |
---|---|---|
1 | I like pizza | I love pizza |
2 | single_token | token |
3 | food is delicious | dinner is yummy |
4 | tokyo hosting olympics | food is delicious |
5 | person xyz was assisted by nurses | few medics helped person xyz |
The model table, wordEmbedModel, used for all of the operation examples, is as follows:
token | v1 | v2 | v3 | v4 |
---|---|---|---|---|
assisted | 0.10058 | 0.1914 | 0.28125 | 0.17382 |
by | -0.11572 | -0.03149 | 0.15917 | 0.13867 |
delicious | -0.18164 | -0.13281 | 0.03906 | 0.31445 |
dinner | -0.06152 | -0.08496 | -0.15039 | 0.42382 |
few | 0.13867 | 0.02941 | -0.18652 | 0.15039 |
food | -0.18164 | 0.16503 | -0.16601 | 0.35742 |
hosting | -0.06396 | 0.25585 | 0.04321 | 0.01721 |
i | -0.22558 | -0.01953 | 0.09082 | 0.2373 |
is | 0.00704 | -0.07324 | 0.17187 | 0.02258 |
helped | 0.12695 | 0.09033 | 0.26367 | 0.08544 |
like | 0.10351 | 0.13769 | -0.00297 | 0.18164 |
love | 0.10302 | -0.15234 | 0.02587 | 0.16503 |
nurses | -0.04638 | -0.14257 | -0.34179 | 0.21582 |
olympics | -0.39648 | 0.02038 | 0.07275 | 0.24414 |
person | 0.27539 | 0.24707 | 0.01721 | 0.16796 |
pizza | -0.12597 | 0.02539 | 0.16699 | 0.55078 |
medics | 0.05981 | 0.26171 | 0.16894 | 0.60156 |
token | 0.04174 | 0.2041 | -0.26757 | 0.29882 |
tokyo | -0.05664 | -0.05029 | -0.0075 | 0.23828 |
was | 0.026 | -0.00189 | 0.18554 | -0.05175 |
xyz | -0.01574 | -0.13476 | 0.1582 | 0.11328 |
yummy | -0.18945 | 0.06591 | -0.00417 | 0.43359 |
Example: TD_WordEmbeddings SQL Call Using token-embedding Operation
SELECT * FROM TD_WordEmbeddings (
  ON wordEmb_inputTable AS InputTable
  ON wordEmbedModel AS ModelTable DIMENSION
  USING
  IDColumn('doc_id')
  ModelVectorColumns('[1:4]')
  PrimaryColumn('doc1')
  Operation('token-embedding')
  ModelTextColumn('token')
) AS dt ORDER BY doc_id ASC;
TD_WordEmbeddings Output Table Using token-embedding Operation
doc_id | token | v1 | v2 | v3 | v4 |
---|---|---|---|---|---|
1 | i | -0.22558 | -0.01953 | 0.09082 | 0.2373 |
1 | like | 0.10351 | 0.13769 | -0.00297 | 0.18164 |
1 | pizza | -0.12597 | 0.02539 | 0.16699 | 0.55078 |
2 | single_token | 0 | 0 | 0 | 0 |
3 | delicious | -0.18164 | -0.13281 | 0.03906 | 0.31445 |
3 | is | 0.00704 | -0.07324 | 0.17187 | 0.02258 |
3 | food | -0.18164 | 0.16503 | -0.16601 | 0.35742 |
4 | olympics | -0.39648 | 0.02038 | 0.07275 | 0.24414 |
4 | hosting | -0.06396 | 0.25585 | 0.04321 | 0.01721 |
4 | tokyo | -0.05664 | -0.05029 | -0.0075 | 0.23828 |
5 | nurses | -0.04638 | -0.14257 | -0.34179 | 0.21582 |
5 | person | 0.27539 | 0.24707 | 0.01721 | 0.16796 |
5 | assisted | 0.10058 | 0.1914 | 0.28125 | 0.17382 |
5 | was | 0.026 | -0.00189 | 0.18554 | -0.05175 |
5 | by | -0.11572 | -0.03149 | 0.15917 | 0.13867 |
5 | xyz | -0.01574 | -0.13476 | 0.1582 | 0.11328 |
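The token-embedding output suggests a simple lookup: each document is lowercased and split into tokens, every token is matched against the model table, and a token missing from the model (here `single_token`) comes back as a zero vector. The following Python sketch mimics that behavior for two input rows. It is an illustration only, not the function's implementation; the lowercase-and-split tokenization and the names `MODEL` and `token_embeddings` are assumptions.

```python
# Subset of wordEmbedModel: token -> (v1, v2, v3, v4)
MODEL = {
    "i": (-0.22558, -0.01953, 0.09082, 0.2373),
    "like": (0.10351, 0.13769, -0.00297, 0.18164),
    "pizza": (-0.12597, 0.02539, 0.16699, 0.55078),
}
DIM = 4


def token_embeddings(text):
    """Return (token, vector) pairs; unknown tokens get a zero vector."""
    # Assumption: lowercasing and whitespace splitting approximate
    # the tokenization that TD_WordEmbeddings applies.
    tokens = text.lower().split()
    return [(tok, MODEL.get(tok, (0.0,) * DIM)) for tok in tokens]


rows_doc1 = token_embeddings("I like pizza")   # doc_id 1
rows_doc2 = token_embeddings("single_token")   # doc_id 2, not in the model

for tok, vec in rows_doc1 + rows_doc2:
    print(tok, *vec)
```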
Example: TD_WordEmbeddings SQL Call Using doc-embedding Operation
SELECT * FROM TD_WordEmbeddings (
  ON wordEmb_inputTable AS InputTable
  ON wordEmbedModel AS ModelTable DIMENSION
  USING
  IDColumn('doc_id')
  ModelVectorColumns('[1:4]')
  PrimaryColumn('doc1')
  Operation('doc-embedding')
  ModelTextColumn('token')
  Accumulate('doc1')
) AS dt ORDER BY doc_id ASC;
TD_WordEmbeddings Output Table Using doc-embedding Operation
doc_id | v1 | v2 | v3 | v4 | doc |
---|---|---|---|---|---|
1 | -0.08268 | 0.04785 | 0.08494 | 0.32324 | i like pizza |
2 | 0 | 0 | 0 | 0 | single_token |
3 | -0.11874 | -0.01367 | 0.01497 | 0.23148 | food is delicious |
4 | -0.17236 | 0.07531 | 0.03615 | 0.16654 | tokyo hosting olympics |
5 | 0.03735 | 0.02129 | 0.07659 | 0.1263 | person xyz was assisted by nurses |
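The doc-embedding values are consistent with a component-wise arithmetic mean of the token vectors in each document. For doc_id 1, averaging the vectors for `i`, `like`, and `pizza` reproduces the row above to within about 1e-5 in the last digit. The sketch below shows that arithmetic; the tokenization and the names `MODEL` and `doc_embedding` are assumptions made for illustration, and the function's actual internals are not documented here.

```python
# Subset of wordEmbedModel: token -> (v1, v2, v3, v4)
MODEL = {
    "i": (-0.22558, -0.01953, 0.09082, 0.2373),
    "like": (0.10351, 0.13769, -0.00297, 0.18164),
    "pizza": (-0.12597, 0.02539, 0.16699, 0.55078),
}


def doc_embedding(text, dim=4):
    """Average the token vectors of a document, component by component."""
    # Assumption: lowercase + whitespace split; unknown tokens count as zeros.
    vectors = [MODEL.get(tok, (0.0,) * dim) for tok in text.lower().split()]
    if not vectors:
        return (0.0,) * dim
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


emb = doc_embedding("I like pizza")
print([round(x, 5) for x in emb])

# Row for doc_id 1 in the output table above.
expected = (-0.08268, 0.04785, 0.08494, 0.32324)
print(all(abs(a - b) < 2e-5 for a, b in zip(emb, expected)))
```

The same averaging applied to the other rows gives, for example, a zero vector for doc_id 2, because `single_token` has no entry in the model.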
The InputTable, wordEmb_inputTable2, used for the token2token-similarity operation example, is as follows:
doc_id | Token1 | Token2 |
---|---|---|
1 | food | delicious |
2 | pizza | food |
3 | love | like |
4 | nurses | olympics |
Example: TD_WordEmbeddings SQL Call Using token2token-similarity Operation
SELECT * FROM TD_WordEmbeddings (
  ON wordEmb_inputTable2 AS InputTable
  ON wordEmbedModel AS ModelTable DIMENSION
  USING
  IDColumn('doc_id')
  ModelVectorColumns('[1:4]')
  PrimaryColumn('token1')
  SecondaryColumn('token2')
  Operation('token2token-similarity')
  ModelTextColumn('token')
  Accumulate('token1','token2')
) AS dt ORDER BY doc_id ASC;
TD_WordEmbeddings Output Table Using token2token-similarity Operation
doc_id | Similarity | Token1 | Token2 |
---|---|---|---|
1 | 0.64836 | food | delicious |
2 | 0.71667 | pizza | food |
3 | 0.31491 | love | like |
4 | 0.21295 | nurses | olympics |
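The Similarity values above match the cosine similarity of the two token vectors. The sketch below recomputes the first two rows. It is an illustration under assumptions, not the function's code: the helper name `cosine_similarity` is hypothetical, and the `food` vector is taken as it appears in the token-embedding output table (v2 = 0.16503), which is the value that reproduces the published similarities.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Vectors (v1..v4) for the tokens in rows 1 and 2 of wordEmb_inputTable2.
food = (-0.18164, 0.16503, -0.16601, 0.35742)
delicious = (-0.18164, -0.13281, 0.03906, 0.31445)
pizza = (-0.12597, 0.02539, 0.16699, 0.55078)

sim_food_delicious = cosine_similarity(food, delicious)
sim_pizza_food = cosine_similarity(pizza, food)

# Compare against the Similarity column (0.64836 and 0.71667).
print(abs(sim_food_delicious - 0.64836) < 5e-5)
print(abs(sim_pizza_food - 0.71667) < 5e-5)
```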
Example: TD_WordEmbeddings SQL Call Using doc2doc-similarity Operation
SELECT * FROM TD_WordEmbeddings (
  ON wordEmb_inputTable AS InputTable
  ON wordEmbedModel AS ModelTable DIMENSION
  USING
  IDColumn('doc_id')
  ModelVectorColumns('[1:4]')
  PrimaryColumn('doc1')
  SecondaryColumn('doc2')
  Operation('doc2doc-similarity')
  ModelTextColumn('token')
  Accumulate('doc1','doc2')
) AS dt ORDER BY doc_id ASC;
TD_WordEmbeddings Output Table Using doc2doc-similarity Operation
doc_id | Similarity | doc1 | doc2 |
---|---|---|---|
1 | 0.96055 | i like pizza | i love pizza |
2 | 0 | single_token | token |
3 | 0.97761 | food is delicious | dinner is yummy |
4 | 0.88368 | tokyo hosting olympics | food is delicious |
5 | 0.94299 | person xyz was assisted by nurses | few medics helped person xyz |
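The doc2doc-similarity values are consistent with averaging the token vectors of each document and then taking the cosine similarity of the two averages. The sketch below recomputes rows 1 and 2 that way. The names `MODEL`, `doc_embedding`, and `cosine_similarity` are hypothetical, and the lowercase-and-split tokenization is an assumption; the sketch only illustrates the arithmetic.

```python
import math

# Subset of wordEmbedModel: token -> (v1, v2, v3, v4)
MODEL = {
    "i": (-0.22558, -0.01953, 0.09082, 0.2373),
    "like": (0.10351, 0.13769, -0.00297, 0.18164),
    "love": (0.10302, -0.15234, 0.02587, 0.16503),
    "pizza": (-0.12597, 0.02539, 0.16699, 0.55078),
    "token": (0.04174, 0.2041, -0.26757, 0.29882),
}


def doc_embedding(text, dim=4):
    """Mean of the token vectors; unknown tokens contribute zeros (assumption)."""
    vectors = [MODEL.get(tok, (0.0,) * dim) for tok in text.lower().split()]
    if not vectors:
        return (0.0,) * dim
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def cosine_similarity(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


sim_row1 = cosine_similarity(doc_embedding("I like pizza"),
                             doc_embedding("I love pizza"))
sim_row2 = cosine_similarity(doc_embedding("single_token"),
                             doc_embedding("token"))

print(abs(sim_row1 - 0.96055) < 1e-4)  # row 1 of the table above
print(sim_row2)                        # row 2: single_token is not in the model
```

Row 2 comes out as 0 because `single_token` has no model entry, so its document vector is all zeros and the similarity falls back to 0.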