UDCF Usage Notes - Teradata VantageCloud Lake

Lake - Using Queries, UDFs, and External Stored Procedures

Deployment: VantageCloud
Edition: Lake
Product: Teradata VantageCloud Lake
Published: February 2025
Last edition: 2025-08-12
  • AWS Lambda is a regional service. Teradata recommends selecting the region nearest to your VantageCloud Lake system. Also consider the location of any other services that your Lambda function uses. Using services across multiple regions can increase your function’s latency and cost.
  • Amazon CloudWatch has built-in metrics that you can use to monitor the latency of your AWS Lambda function. Currently, UDCFs support only synchronous invocations of AWS Lambda functions.

    Refer to the Function metrics section of the AWS Lambda Developer Guide.
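
As a minimal sketch of the latency monitoring described above, the following Python builds the parameters for a CloudWatch query on the Lambda Duration metric. The function name my-udcf-function, the region us-west-2, and the helper name duration_metric_query are hypothetical placeholders. The boto3 call appears only in comments because it needs AWS credentials to run.

```python
from datetime import datetime, timedelta, timezone


def duration_metric_query(function_name, hours=1, period_seconds=300):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    Requests the average and maximum Duration of one Lambda function
    over the last `hours` hours, in buckets of `period_seconds`.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average", "Maximum"],
        "Unit": "Milliseconds",
    }


# With boto3 installed and credentials configured, the query could be run as:
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")  # assumed region
#   response = cloudwatch.get_metric_statistics(
#       **duration_metric_query("my-udcf-function")  # hypothetical function name
#   )
#   for point in response["Datapoints"]:
#       print(point["Timestamp"], point["Average"], point["Maximum"])
```

Create the CloudWatch client in the same region as the Lambda function, because CloudWatch metrics are regional as well.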

Database concurrency

Teradata is a massively parallel database. Multiple database processes execute a UDCF concurrently; a process participates only if it has rows associated with it in the query.

Concurrency in Scalar UDCFs

Scalar UDCFs are invoked concurrently by each process that has data (for example, a data literal or data in the FROM clause). The data may be distributed evenly among these processes or may be skewed. Data skew can limit the number of processes that participate in the scalar UDCF execution. Non-participating processes do not invoke the AWS Lambda function. See Indexes and Maps for more information about distributing data in a query.

Scalar UDCFs are invoked per row. Each scalar UDCF invocation in turn makes a single AWS Lambda invocation to process the row. Because each row is processed by exactly one process, the total number of requests made to the AWS Lambda service equals the number of rows in the query, regardless of how many processes participate.

Concurrency in Table Operator UDCFs

Similar to scalar UDCFs, table operator UDCFs are also invoked concurrently by processes that have data. The ON clause of the table operator specifies the data and how that data is distributed. Typically, rows are distributed as evenly as possible to maximize the work done by each process. Data skew limits the number of processes participating in the table operator UDCF execution. See Indexes and Maps for more information about distributing data in a query.

A process executing the table operator UDCF batches rows from its share of the data in the ON clause, then sends a request containing those rows to the AWS Lambda function. This significantly reduces the number of AWS Lambda invocations compared to a scalar UDCF. Non-participating processes, which have no data, do not invoke the AWS Lambda function.

Use the following information to estimate how many requests (on average) are made to the AWS Lambda endpoint in a table operator UDCF query:
  • Number of rows in the table or subquery specified in the ON clause.
  • The average byte size of a row in the table or subquery specified in the ON clause. The average byte size of a row is the sum of the average byte size of each column in the row.
  • The buffer size, which is currently an internal parameter. It specifies the number of bytes that a process sends in a single request to the AWS Lambda service. This value depends on the AWS Lambda quota for a synchronous request (refer to the Lambda quotas section in the AWS Lambda Developer Guide). The buffer size is currently half that quota, about 3 MB at the time of writing, but it may change in the future.

The equation to estimate the average number of AWS Lambda invocations follows:

ceil((number of rows * average input row size) / buffer size)

This equation does not depend on the number of processes executing the table operator.
  • If data is distributed evenly, each participating process makes approximately (average number of AWS Lambda invocations) / (number of participating processes) invocations.
  • If data is skewed, the number of Lambda invocations per participating process depends on the number of rows assigned to the process and the average row size of the rows assigned to that process. The buffer size remains constant.
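
The estimate above can be worked through in a short Python sketch. The 3 MiB buffer value is an assumption based on the approximate figure given above (the actual buffer size is internal and may change), and the row counts, row size, and function names are hypothetical.

```python
import math

# Assumption: buffer of about 3 MB, expressed here as 3 MiB.
# The real value is an internal parameter and may differ.
BUFFER_SIZE_BYTES = 3 * 1024 * 1024


def estimate_invocations(num_rows, avg_row_bytes, buffer_bytes=BUFFER_SIZE_BYTES):
    """ceil((number of rows * average input row size) / buffer size)."""
    return math.ceil((num_rows * avg_row_bytes) / buffer_bytes)


def per_process_invocations(rows_per_process, avg_row_bytes,
                            buffer_bytes=BUFFER_SIZE_BYTES):
    """Apply the same estimate to each process's own share of the rows.

    A process with no rows does not participate and makes no invocations.
    """
    return [estimate_invocations(rows, avg_row_bytes, buffer_bytes)
            for rows in rows_per_process]


# Hypothetical query: 1,000,000 input rows averaging 200 bytes each.
total = estimate_invocations(1_000_000, 200)
print(total)  # 64

# The same rows skewed across four processes, two of which hold no data.
skewed = per_process_invocations([700_000, 300_000, 0, 0], 200)
print(skewed)  # [45, 20, 0, 0]
```

In the skewed case the per-process counts sum to 65 rather than 64, because each process rounds up its own final partial buffer. The overall formula is an average estimate, not an exact count.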

Error handling

  • Lambda functions are expected to throw exceptions when they encounter an error at runtime. Use CloudWatch logs in your AWS account to investigate and debug the issue in your Lambda function code. Errors received from the Lambda endpoint are generalized so that no secrets, PII, or PHI are stored in the database through an error message.

    Query failures are returned as Failure 7504, followed by an error code prefixed with SQLSTATE U0D or SQLSTATE U0C, followed by an error message. The U0D error code means the error is potentially due to the AWS Lambda function. For example, no response is returned (perhaps because an exception was raised), the response does not follow the Data Format, or the response does not match the output types of the UDCF.

    Refer to the Troubleshooting section in the AWS Lambda Developer Guide.
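
The pattern described above (log the details to CloudWatch, then let the exception propagate) might look like the following Python handler sketch. The event shape with a "rows" key, the "text" column, and the process_rows helper are hypothetical illustrations only; the actual request and response layout is defined by the UDCF Data Format and is not reproduced here.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def process_rows(rows):
    """Hypothetical business logic: return the length of each row's text."""
    return [{"length": len(str(row["text"]))} for row in rows]


def lambda_handler(event, context):
    """Entry point invoked synchronously by the Lambda service."""
    try:
        # Assumed event shape; the real layout follows the UDCF Data Format.
        rows = event["rows"]
        return {"rows": process_rows(rows)}
    except Exception:
        # Write the full traceback to the function's CloudWatch log stream,
        # then re-raise so the invocation is reported as failed instead of
        # returning a malformed response.
        logger.exception("UDCF Lambda invocation failed")
        raise
```

Because error messages are generalized before they reach the database, the CloudWatch log entry written by logger.exception is where the detailed diagnosis is available.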