Flow Introduction | VantageCloud Lake - Introduction to Flow (AWS)

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

Flow is available only on AWS and does not support Bring Your Own IdP (BYOIDP).

The Flow service supplies data to VantageCloud Lake for initial loads, continuous loads, and ad hoc data exploration. You can schedule flows.

You can use Flow from the VantageCloud Lake Console.

A flow is a process that performs autonomous and continuous data loading and recovers from network and database failures. Flows run on the primary cluster. Flow guarantees that data is loaded exactly once.

Source Data

Source data for a flow comes from one or more object storage paths. A source is a path in external object storage that contains files. Supported file formats are CSV and Parquet.

You cannot join data from multiple sources in a flow.

Flow loads source files in lexicographical order. For data sources that continuously add files to your object storage bucket, consider including a timestamp in the file path.
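To illustrate why a timestamp in the file path helps, the following Python sketch compares plain string sorting of unpadded file names with sorting of timestamped paths. It is only an illustration under stated assumptions: the `sales` prefix, the file names, and the `timestamped_key` helper are hypothetical, and Python's string ordering stands in for the lexicographical ordering that Flow applies.

```python
from datetime import datetime, timezone


def timestamped_key(prefix: str, created: datetime, name: str) -> str:
    """Build a path whose string order matches creation order.

    Hypothetical helper: the layout (prefix/YYYY/MM/DD/HHMMSS_name) is an
    assumption for illustration, not a layout required by Flow.
    """
    # Zero-padded UTC fields, most significant first, sort correctly as text.
    stamp = created.astimezone(timezone.utc).strftime("%Y/%m/%d/%H%M%S")
    return f"{prefix}/{stamp}_{name}"


# Pitfall: unpadded counters do not sort in numeric order as strings.
unpadded = sorted(["file10.csv", "file2.csv", "file1.csv"])

# Timestamped paths sort in the order the files were created.
events = [
    (datetime(2024, 1, 10, 9, 30, 0, tzinfo=timezone.utc), "orders.csv"),
    (datetime(2024, 1, 2, 23, 5, 0, tzinfo=timezone.utc), "orders.csv"),
    (datetime(2023, 12, 31, 8, 0, 0, tzinfo=timezone.utc), "orders.csv"),
]
load_order = sorted(timestamped_key("sales", ts, name) for ts, name in events)

print(unpadded)
for key in load_order:
    print(key)
```

In this sketch, `file10.csv` sorts before `file2.csv`, while the timestamped paths sort oldest first regardless of the order in which they were generated.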

Target Tables

If you do not specify an existing target table, Flow creates a target table, inferring the schema from the source data (see Schema Inference).

If you specify an existing target table, Teradata recommends using schema hints to override the schema that Flow infers from the source data, which helps prevent load errors (see Schema Hints).

Source-Target Relationships

A flow can have up to five sources, and each source can have only one target. Multiple sources in a flow can, however, reference the same target table. For example, sources A, B, and C can all load into one target table, D.
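The source and target rules above can be sketched as a small Python check. This is a minimal model for illustration, not the Flow API: the `Source` class, the `validate_flow` function, and the `s3://example-bucket/...` paths are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

MAX_SOURCES_PER_FLOW = 5  # limit stated in the text above


@dataclass(frozen=True)
class Source:
    """Hypothetical model of one flow source."""
    path: str    # object storage path
    target: str  # a single field, so each source has exactly one target


def validate_flow(sources: list[Source]) -> dict[str, list[str]]:
    """Check the source count and return a mapping of target -> source paths."""
    if not 1 <= len(sources) <= MAX_SOURCES_PER_FLOW:
        raise ValueError(
            f"a flow needs 1 to {MAX_SOURCES_PER_FLOW} sources, got {len(sources)}"
        )
    targets: dict[str, list[str]] = {}
    for source in sources:
        # Several sources may share one target table.
        targets.setdefault(source.target, []).append(source.path)
    return targets


# Sources A, B, and C all load into target table D.
mapping = validate_flow([
    Source("s3://example-bucket/a/", "D"),
    Source("s3://example-bucket/b/", "D"),
    Source("s3://example-bucket/c/", "D"),
])
print(mapping)
```

Modeling the target as a single field on each source makes the one-target-per-source rule structural, so only the five-source limit needs an explicit check.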