If you plan to use ONNXEmbeddings or ONNXSeq2Seq, cufconfig provides an additional tuning parameter, IntraOpThreads. It can be set to a value between 1 and the number of AMPs per node.
This parameter manages system resource utilization while the ONNXEmbeddings and ONNXSeq2Seq functions run. The default value is 25% of the number of AMPs per node. To change it, first determine the number of AMPs per node, and then set IntraOpThreads to the desired number of threads, up to that maximum. Restart the JVM for the new value to take effect.
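The default and the allowed range can be expressed as a small check before editing the configuration. This is a minimal sketch with hypothetical helper names; the text only states "25% of the number of AMPs per node", so rounding down with a floor of 1 is an assumption, not documented behavior.

```python
def default_intra_op_threads(amps_per_node: int) -> int:
    """Hypothetical helper: 25% of the AMPs per node.

    Assumption: the result is rounded down and never drops below 1;
    the actual rounding rule is not stated in the documentation.
    """
    if amps_per_node < 1:
        raise ValueError("amps_per_node must be at least 1")
    return max(1, amps_per_node // 4)


def validate_intra_op_threads(value: int, amps_per_node: int) -> int:
    """Hypothetical helper: enforce the documented range 1..amps_per_node."""
    if not 1 <= value <= amps_per_node:
        raise ValueError(
            f"IntraOpThreads must be between 1 and {amps_per_node}, got {value}"
        )
    return value


# Example: a node with 40 AMPs would default to 10 threads under this assumption.
print(default_intra_op_threads(40))
```

The range check mirrors the stated limits only; it does not query the system, so the AMPs-per-node figure must be supplied by you.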
For example, to set the value to 10 threads, change the current JVMOptions from:
JVMOptions: -XX:+UseBCFIPS
to:
JVMOptions: -XX:+UseBCFIPS -DintraOpThreads=10
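If you script this change, the new JVMOptions value can be derived from the current one so that any earlier -DintraOpThreads setting is replaced rather than duplicated. This is a minimal sketch with a hypothetical helper name; it only manipulates the option string and does not apply it through cufconfig or restart the JVM.

```python
def with_intra_op_threads(jvm_options: str, threads: int) -> str:
    """Hypothetical helper: return jvm_options with -DintraOpThreads set.

    Any existing -DintraOpThreads=... token is dropped first, so the
    property appears exactly once at the end of the option string.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    tokens = [
        t for t in jvm_options.split()
        if not t.startswith("-DintraOpThreads=")
    ]
    tokens.append(f"-DintraOpThreads={threads}")
    return " ".join(tokens)


print(with_intra_op_threads("-XX:+UseBCFIPS", 10))
```

For the example above, this yields `-XX:+UseBCFIPS -DintraOpThreads=10`, matching the target JVMOptions value.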
IntraOpThreads affects the performance of the BYOM ONNXEmbeddings and ONNXSeq2Seq functions. For higher performance, you can increase the value up to the number of AMPs per node. Be aware that a higher value may degrade the performance of other workloads running at the same time as the BYOM function.