Before you begin, select a pre-trained model from the Hugging Face model hub. In the following example, we use the BAAI/bge-small-en-v1.5 model with the Optimum utility to convert the model into the ONNX (Open Neural Network Exchange) format.
optimum-cli export onnx --opset 16 --trust-remote-code -m BAAI/bge-small-en-v1.5 bge-small-en-v1.5-onnx
After the conversion to ONNX, perform the following:
- Replace the dynamic (symbolic) dimensions on the input and output with fixed sizes; ONNXEmbeddings is not compatible with symbolic dimensions on input.
- Fix the opset version in the ONNX file for compatibility with the ONNX runtime.
- Remove the token_embeddings output from the model to save I/O during processing and make the model run more efficiently.
You can use the following Python code to perform these updates:
import getpass

import onnx
import onnxruntime as rt
import transformers
import teradataml as tdml
from onnxruntime.tools.onnx_model_utils import *
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Pin the opset and IR version so the model is compatible with the ONNX runtime.
op = onnx.OperatorSetIdProto()
op.version = 16
model = onnx.load('bge-small-en-v1.5-onnx/model.onnx')
model_ir8 = onnx.helper.make_model(model.graph, ir_version=8, opset_imports=[op])

# Replace the symbolic dimension sizes in our model with fixed values.
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "batch_size", 1)
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "sequence_length", 512)
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "Divsentence_embedding_dim_1", 384)

# Remove the unused token_embeddings output from the model.
# Iterate over a copy so removing an element does not skip entries.
for node in list(model_ir8.graph.output):
    if node.name == "token_embeddings":
        model_ir8.graph.output.remove(node)

# Save the fixed model.
onnx.save(model_ir8, 'bge-small-en-v1.5-onnx/model_fixed.onnx')
Once you have saved your model, you can load it into the database as you would any other model. The conversion with optimum-cli also produces a tokenizer.json file, which must be loaded into the database alongside the model, just like in the previous example, using the following table definition:
CREATE SET TABLE embedding_tokenizers (
    tokenizer_id VARCHAR(30),
    tokenizer BLOB
)
PRIMARY INDEX (tokenizer_id);

Once both the model and the tokenizer are loaded, you can start running queries against your text input.