Inferred Schemas - Parallel Transporter

Teradata Parallel Transporter User Guide

Product
Parallel Transporter
Release Number
15.10
Language
English (United States)
Last Update
2018-10-07
dita:id
B035-2445
lifecycle
previous
Product Category
Teradata Tools and Utilities

Inferred Schemas

When a producer operator template reference does not include a schema specification, that is, when it does not include either a schema name or a Teradata Database table name, Teradata PT may be able to infer the appropriate schema and, if not a script-defined schema, generate a DEFINE SCHEMA statement for it.

Teradata PT analyzes all operator usage in the job step containing the template reference and attempts to determine what the schema of the template should be in order for the APPLY statement of the job set to make sense.

Example 1

Consider this example.

   STEP LOAD_2
   (
     APPLY  <DML statement(s)> 
     
     TO OPERATOR( $INSERTER() )
     
     SELECT * FROM OPERATOR( $EXPORT() ) 
   
     UNION ALL
 
     SELECT * FROM OPERATOR( EXPORT_OPER2() );
   );       
      

The script-defined producer operator EXPORT_OPER_2 must have been defined previously in the job script, with a schema specification of the name of a script-defined schema. Since the source data extracted by EXPORT_OPER_2 is merged into a single input data stream with the source data extracted by producer template operator $EXPORT, Teradata PT can infer that the schema for both producer operators must be the same and can then use the name of the schema for EXPORT_OPER_2 in the copy of the $EXPORT template that it imports into the job script.

Example 2

In this example, an SQL SELECT statement is inferred as the basis for generating the schema for the $EXPORT producer template:

   STEP INSERT_DAILY_TRANS
   (
     APPLY  <DML statement(s)> 
     
     TO OPERATOR( $INSERTER() )
     
     SELECT *
       
     FROM OPERATOR
     (
       $EXPORT() 
       ATTR
       (
         PrivateLogName = 'daily_trans.log',
         SelectStmt     = 'Select * from Daily_Trans;'
       )
     );
   );       
      

In the above job step, the SelectStmt attribute of the $EXPORT template has been assigned the value Select * from Daily_Trans that, according to the definition of this attribute, must be an SQL SELECT statement. By querying the Teradata Database, Teradata PT can generate a schema based on the columns of the result table of the SELECT statement since that will accurately describe the source data produced by the Export operator invoked via template $EXPORT.

Basing the generated schema on the value of the SelectStmt attribute works for producer templates $EXPORT and $SELECTOR, since both of their underlying operators require this attribute.

Example 3

In this example, Teradata PT determines the identity of the target table of a consumer operator in the job step and infers that a schema based on this table will be the basis for generating a schema for the producer template(s) in the job step:

   STEP INSERT_MONTHLY_SHIPMENTS
   (
     APPLY  <DML statement(s)>
     
     TO OPERATOR
     (
       $LOAD() 
       ATTRIBUTES
       (
         PrivateLogName = 'monthly_ship.log',
         TargetTable    = 'Monthly_Shipments'
       )
     )
    
     SELECT * FROM OPERATOR( $FILE_READER() );
   );   

In the above example, Teradata PT can determine, from the TargetTable attribute of the $LOAD operator template, that the target for data to be loaded in this job step is Teradata Database table Monthly_Shipments.

In the absence of any script specification of the schema from the producer template, and the absence of any script indicators of a possible basis for generating this source schema, Teradata PT makes the assumption that source data will be loaded unchanged into the target table and thus that the target Teradata Database table can be inferred to be the basis for generating the source schema for the producer template, $FILE_READER.

Since this assumption may not be correct in all cases in which only the target table is known to Teradata PT, some inferred schemas may not accurately describe the source data from a producer template and the resulting schema mismatch will be detected by the Teradata PT and cause the job to fail. This is a limitation to the inferred schema feature: for the schema for the producer template to be correctly inferred from a Teradata Database target table, the source data must move unchanged through Teradata PT to the target. To be more specific: target columns may not be a projection of source columns nor include any derived columns. Only SELECT * can describe the rows from the producer template operator in order for Teradata PT to be able to infer the correct schema for that producer template.

Teradata PT will not attempt to infer a schema for a producer template from a target table when the job step contains multiple target tables, such as syntax that employs the multiple APPLY feature, or when the step consumer operator is the Update operator (or $UPDATE template) and it is going to update more than one target table.

Job steps with no basis for Teradata PT to infer a schema for a producer template, as well as job steps that result in schema mismatch errors when Teradata PT infers an incorrect schema for a template can always be remedied through the use of a set of special job variables, as described in the next section.