Production Deployment

Teradata® VantageCloud Lake
Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

The workflow described in Sample Use Cases in Open Analytics Framework focuses on the personas who conduct data science development by creating their own user environments to hold their scripts and machine learning models. These tasks precede and enable machine learning model scoring and the validation of scores. You can then continually refine the models and iterate on the scoring tasks to select champion models, and to monitor and improve model health.

Once one or more models are selected for deployment to production, Teradata recommends creating a separate user account dedicated to running the qualified scoring jobs. This account can exist either on the same system where the models were tested or on a different VantageCloud Lake system. In this production user account, you repeat the same processes as needed: creating a user environment and uploading the tested, qualified scoring scripts and their associated machine learning models.
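
For example, from the production user account these steps might look like the following teradataml sketch. The host name, credentials, environment name, base environment, and file names are placeholders for illustration only; confirm the available base environments with list_base_envs() and the exact method options in the teradataml documentation for your release.

    # A minimal sketch, assuming placeholder credentials, environment, and file names.
    from teradataml import create_context, create_env

    # Connect as the dedicated production user.
    create_context(host="prod-lake.example.com",
                   username="prod_scoring_user",
                   password="********")

    # Create a user environment for the qualified scoring job
    # (base_env name is an assumption; list_base_envs() shows the valid choices).
    env = create_env(env_name="prod_scoring_env",
                     base_env="python_3.10",
                     desc="Production scoring environment")

    # Upload the tested, qualified scoring script and the serialized champion model.
    env.install_file(file_path="scoring/score.py")
    env.install_file(file_path="scoring/champion_model.joblib")

    # Install the Python packages the scoring script depends on.
    env.install_lib(["pandas", "scikit-learn"])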

To run a scoring script, you have two options, both illustrated in the sketch after this list:
  • Issue a SQL command to call the APPLY table operator.
  • Run the Python script from a client platform, using an application that can connect to a Vantage system and invoke teradataml APIs.
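
The sketch below shows both options from a teradataml session, reusing the hypothetical table, environment, and script names from the previous sketch; the output column names are also placeholders. The APPLY clause names and the Apply parameter names reflect the Open Analytics Framework interfaces as commonly documented, but verify them against your release before relying on them.

    # A minimal sketch; table, column, environment, and script names are placeholders.
    from collections import OrderedDict
    from teradataml import DataFrame, Apply, execute_sql, in_schema

    # Option 1: issue the SQL call to the APPLY table operator.
    execute_sql("""
        SELECT * FROM APPLY(
            ON prod_db.transactions PARTITION BY ANY
            RETURNS (account_id VARCHAR(20), score FLOAT)
            USING
                APPLY_COMMAND('python3 score.py')
                ENVIRONMENT('prod_scoring_env')
                STYLE('csv')
                DELIMITER(',')
        ) AS scored
    """)

    # Option 2: invoke the same scoring script through the teradataml Apply API.
    tdf = DataFrame(in_schema("prod_db", "transactions"))
    scoring_job = Apply(data=tdf,
                        apply_command="python3 score.py",
                        returns=OrderedDict([("account_id", "VARCHAR(20)"),
                                             ("score", "FLOAT")]),
                        env_name="prod_scoring_env",
                        delimiter=",")
    scores = scoring_job.execute_script()   # returns a teradataml DataFrame of scored rows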

VantageCloud Lake lets you run the scoring job on analytic compute group clusters of various sizes: extra small (1 node), small (2 nodes), medium (4 nodes), large (8 nodes), extra large (16 nodes), double extra large (32 nodes), and triple extra large (64 nodes). This flexibility matters because a larger cluster completes the scoring job faster than a smaller one, so an organization can reduce the cost and improve the efficiency of a regularly scheduled job by choosing the optimum cluster size. You can determine that optimum through trial runs on clusters of different sizes, or derive it from previous runs performed by the personas who carry out the data science tasks.
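
As one way to structure such trial runs, the sketch below times the same scoring job against compute groups assumed to be backed by clusters of different sizes. The compute group names are hypothetical, and the SET SESSION COMPUTE GROUP statement and its routing behavior should be confirmed for your VantageCloud Lake release; scoring_job is reused from the previous sketch.

    # A minimal sketch of trial runs; compute group names are hypothetical, and the
    # SET SESSION COMPUTE GROUP statement is assumed to route work to those groups.
    import time
    from teradataml import execute_sql

    for group in ["scoring_xsmall", "scoring_medium", "scoring_xlarge"]:
        execute_sql(f"SET SESSION COMPUTE GROUP {group}")
        start = time.time()
        scores = scoring_job.execute_script()   # re-run the scoring job from the previous sketch
        row_count = scores.shape[0]             # force execution so the timing covers the full job
        print(f"{group}: {row_count} rows scored in {time.time() - start:.1f} s")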