Introduction to Flow (AWS)

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03
Flow is available only on AWS and does not support Bring Your Own IdP (BYOIDP).

The Flow service loads data into VantageCloud Lake for initial loads, continuous loads, and ad hoc data exploration. You can also run flows on a schedule.

You can use Flow from the VantageCloud Lake Console.

A flow is a process that performs autonomous and continuous data loading and recovers from network and database failures. Flows run on the primary cluster. Flow guarantees that data is loaded exactly once.
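This page does not describe how Flow implements recovery internally. As a generic illustration of what exactly-once loading with recovery means, the following minimal Python sketch records each committed file in a ledger, so a retry after a simulated failure skips files that were already loaded instead of loading them twice. All names (`Ledger`, `load_files`, the file keys) are hypothetical, and this is not Flow's implementation.

```python
class Ledger:
    """Keys of files whose load has been committed (hypothetical)."""

    def __init__(self):
        self.committed = set()


def load_files(keys, ledger, table, fail_on=None):
    """Load each file exactly once, skipping files committed by an earlier attempt.

    `fail_on` simulates a network or database failure when that key is reached.
    """
    for key in sorted(keys):  # lexicographical order
        if key in ledger.committed:
            continue  # loaded by a previous attempt: skipping prevents duplicates
        if key == fail_on:
            raise ConnectionError(f"simulated failure while loading {key}")
        # In a real system, writing the rows and recording the key would happen
        # in one transaction, so neither can succeed without the other.
        table.append(key)  # stand-in for inserting the file's rows
        ledger.committed.add(key)


ledger = Ledger()
table = []
files = ["b.csv", "a.csv", "c.csv"]
try:
    load_files(files, ledger, table, fail_on="b.csv")
except ConnectionError:
    pass  # recover, then retry the whole batch
load_files(files, ledger, table)
print(table)  # ['a.csv', 'b.csv', 'c.csv']
```

Because the retry consults the ledger, `a.csv` is not loaded a second time even though the whole batch is replayed.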

Source Data

Source data for a flow comes from one or more object storage paths. A source is a path in external object storage that contains files. Supported file formats are CSV and Parquet.
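This page does not say how Flow detects a file's format. Assuming, for illustration only, that the format is recognized by file extension, this Python sketch splits the object keys under a source path into CSV, Parquet, and other files. The `SUPPORTED` mapping, the `classify` function, and the example keys are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: the format is identified by file extension (hypothetical rule).
SUPPORTED = {".csv": "CSV", ".parquet": "Parquet"}


def classify(keys):
    """Split object keys into {key: format} for supported files and a list of the rest."""
    supported, unsupported = {}, []
    for key in keys:
        fmt = SUPPORTED.get(PurePosixPath(key).suffix.lower())
        if fmt is None:
            unsupported.append(key)
        else:
            supported[key] = fmt
    return supported, unsupported


keys = ["sales/2024/01.csv", "sales/2024/02.parquet", "sales/README.txt"]
ok, other = classify(keys)
print(ok)     # {'sales/2024/01.csv': 'CSV', 'sales/2024/02.parquet': 'Parquet'}
print(other)  # ['sales/README.txt']
```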

You cannot join data from multiple sources in a flow.

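Flow loads source files in lexicographical order. For data sources that continuously add files to your object storage bucket, consider including a sortable timestamp (for example, a zero-padded date and time) in the file path so that lexicographical order matches the order in which the files arrive.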
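The effect of lexicographical ordering can be checked with plain string sorting. This Python sketch compares two hypothetical naming schemes for files that arrive in chronological order: a counter that is not zero-padded, and a zero-padded timestamp in the path. The file names and arrival times are made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical arrival times, in chronological order.
base = datetime(2024, 1, 1, 8, 0)
arrivals = [base + timedelta(hours=h) for h in (1, 2, 10, 11)]

# Scheme A: a counter that is not zero-padded.
counter_names = [f"events/part-{i}.csv" for i in (1, 2, 10, 11)]

# Scheme B: a zero-padded, ISO 8601-style timestamp in the path.
ts_names = [f"events/{t:%Y-%m-%dT%H%M%S}.csv" for t in arrivals]

# Lexicographical order puts part-2 last, although it arrived second.
print(sorted(counter_names))
# ['events/part-1.csv', 'events/part-10.csv', 'events/part-11.csv', 'events/part-2.csv']

# With fixed-width timestamps, lexicographical order equals arrival order.
print(sorted(ts_names) == ts_names)  # True
```

Fixed-width fields are what make this work: every character position compares like with like, so string order and time order agree.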

Target Tables

If you do not specify an existing target table, Flow creates a target table, inferring the schema from the source data (see Schema Inference).
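Flow's actual inference rules are documented under Schema Inference and are not reproduced here. To show the general idea only, this minimal Python sketch (hypothetical `infer_type` and `infer_schema` helpers, CSV input only) assigns each column the narrowest of INTEGER, FLOAT, or VARCHAR that parses every value in a sample.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of INTEGER, FLOAT, VARCHAR that parses every value."""
    if not values:
        return "VARCHAR"  # no evidence: fall back to text

    def all_parse(parse):
        try:
            for v in values:
                parse(v)
        except (ValueError, TypeError):
            return False
        return True

    if all_parse(int):
        return "INTEGER"
    if all_parse(float):
        return "FLOAT"
    return "VARCHAR"


def infer_schema(csv_text):
    """Map each CSV column (taken from the header row) to an inferred type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {col: infer_type([row[col] for row in rows]) for col in reader.fieldnames}


sample = "id,price,zip\n1,9.99,02134\n2,12.50,94105\n"
schema = infer_schema(sample)
print(schema)  # {'id': 'INTEGER', 'price': 'FLOAT', 'zip': 'INTEGER'}
```

The `zip` result shows a weakness of purely value-based inference: stored as an INTEGER, `02134` would lose its leading zero.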

If you do specify an existing target table, Teradata recommends using schema hints to override the schema that Flow infers from the source data and help prevent load errors (see Schema Hints).
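The actual schema hint syntax is defined under Schema Hints and is not shown here. This Python sketch models the idea with plain dictionaries: hinted types replace inferred ones, and the result is compared with the existing target table's column types, where each mismatch marks a potential load error. The helpers `apply_hints` and `mismatches`, the column names, and the types are all hypothetical.

```python
def apply_hints(inferred, hints):
    """Return the inferred schema with hinted column types taking precedence."""
    return {**inferred, **hints}


def mismatches(schema, target):
    """Columns whose type differs from the existing target table's type."""
    return {c: (schema[c], target.get(c)) for c in schema if schema[c] != target.get(c)}


target_table = {"id": "INTEGER", "amount": "DECIMAL(10,2)", "zip": "VARCHAR(5)"}
inferred = {"id": "INTEGER", "amount": "FLOAT", "zip": "INTEGER"}

before = mismatches(inferred, target_table)
print(before)  # {'amount': ('FLOAT', 'DECIMAL(10,2)'), 'zip': ('INTEGER', 'VARCHAR(5)')}

hints = {"amount": "DECIMAL(10,2)", "zip": "VARCHAR(5)"}
after = mismatches(apply_hints(inferred, hints), target_table)
print(after)  # {}
```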

Source-Target Relationships

A flow can have up to five sources, but each source can have only one target. However, multiple sources in a flow can reference the same target table. For example, sources A, B, and C can have one target table, D.
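The source-target rules above can be expressed as a small validation step. In this Python sketch (hypothetical `Source` class and `validate_flow` helper; the `s3://example-bucket/...` paths are placeholders), a flow holds at most five sources, each source names a single target, and several sources may share a target, as in the A, B, and C example loading into D. Giving `Source` one `target` field rather than a list enforces the one-target rule by construction. Requiring at least one source is an assumption of the sketch.

```python
from dataclasses import dataclass

MAX_SOURCES = 5  # limit stated on this page


@dataclass(frozen=True)
class Source:
    path: str    # object storage path (placeholder values below)
    target: str  # a single target table per source


def validate_flow(sources):
    """Check the source count and group source paths by their target table."""
    # Assumption: a flow needs at least one source.
    if not 1 <= len(sources) <= MAX_SOURCES:
        raise ValueError(f"a flow needs 1 to {MAX_SOURCES} sources, got {len(sources)}")
    by_target = {}
    for src in sources:
        by_target.setdefault(src.target, []).append(src.path)
    return by_target


flow = [Source("s3://example-bucket/a/", "D"),
        Source("s3://example-bucket/b/", "D"),
        Source("s3://example-bucket/c/", "D")]
result = validate_flow(flow)
print(result)
# {'D': ['s3://example-bucket/a/', 's3://example-bucket/b/', 's3://example-bucket/c/']}
```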