Scenario 1: Pre-Production Data Loading - Aster Client

Teradata Aster® Client Guide

Product
Aster Client
Release Number
7.00
Published
May 2017
Language
English (United States)
Last Update
2018-04-13
dita:mapPath
hki1475000360386.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
B700-2005
lifecycle
previous
Product Category
Software
To validate your loading process and data before deploying them to your production system, it is accepted industry best practice to load to a pre-production or testing system first. You can perform test data loading with the standard Aster Database command-line bulk loading tool, ncluster_loader. For a complete overview of the tool’s command-line options, refer to Argument Flags.

The tool’s error logging features are useful for debugging your loading process. If your input data might contain malformed rows, consider using error logging for bulk load operations to accomplish the following goals:

  1. Ensure that a load operation succeeds even in the presence of malformed rows.

    To do this, enable error logging without setting an error logging limit (set --el-enabled but do not set --el-limit). If you are not interested in inspecting the errors, discard malformed input rows so that they are not stored in the cluster (--el-enabled --el-discard-errors).

    Tip: A UNIQUE or PRIMARY KEY violation in the data being loaded always causes the load to abort. Therefore, if the table you are loading into has these constraints and your load fails, check that the data you are loading complies with them.
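As an illustrative sketch only, a tolerant load of this kind might look like the following. The --el-* flags come from this section; the target table name, input file name, and exact argument syntax are placeholders (connection options are omitted; see Argument Flags for the actual invocation syntax):

```shell
# Hypothetical invocation: enable error logging with no error limit,
# discarding malformed rows so they are not stored in the cluster.
# "my_target_table" and "input_data.txt" are placeholder names.
ncluster_loader --el-enabled --el-discard-errors my_target_table input_data.txt
```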
  2. Store malformed rows into a separate table for later inspection.

    Specify a custom error logging table to store malformed rows, using the appropriate option (--el-enabled --el-table = 'my_errorlogging_table'). If error logging is enabled without using --el-table to specify an error logging table, malformed rows are stored in the default system error logging tables (nc_errortable_part or nc_errortable_repl). This is undesirable because rows from different loads are mixed together in the default tables. Note that any custom error logging table must inherit from the default system error logging table.

    Custom error logging tables are not supported on a system running Aster Execution Engine. On such a system, use a label instead to identify the malformed rows for a specific load job. See the step called "Label malformed rows in the error logging table" below.
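A sketch of a load that routes malformed rows to a custom error logging table might look like this (the error table must already exist and inherit from the default system error logging table; the table and file names here are placeholders, and the argument syntax is assumed):

```shell
# Hypothetical invocation: store malformed rows in a custom error
# logging table instead of the default system tables.
# "my_errorlogging_table", "my_target_table", and "input_data.txt"
# are placeholder names.
ncluster_loader --el-enabled --el-table 'my_errorlogging_table' my_target_table input_data.txt
```

Keeping each load’s errors in its own table avoids the row mixing that occurs when every load writes to nc_errortable_part or nc_errortable_repl.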
  3. Abort data load operation in the presence of too many malformed rows.

    This is particularly useful if you want a given load operation to abort when too many malformed rows are present in the input data (--el-enabled --el-limit = 100). To preserve atomicity for bulk load operations, the load operation fails as a transaction when the error limit is reached. When the operation fails, any rows already written by the transaction to the target table and the error logging table are deleted.
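As a sketch, a load that aborts after too many malformed rows might be written as follows (the limit of 100 matches the example in this section; the table and file names are placeholders, and the argument syntax is assumed):

```shell
# Hypothetical invocation: abort the load transaction once more than
# 100 malformed rows are encountered; already-written rows are rolled
# back. "my_target_table" and "input_data.txt" are placeholder names.
ncluster_loader --el-enabled --el-limit 100 my_target_table input_data.txt
```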

  4. Label malformed rows in the error logging table.

    If multiple operations load data into Aster Database at the same time (for example, the pre-production system is testing the integration of two separate data sources), you can label each load operation to identify which rows belong to which data source (--el-enabled --el-label = 'my_data_source').

    The following types of errors in input data files can be detected:

    1. Malformed datatypes (for example, a text value for an integer column);
    2. Missing column values (for example, the input data file provides data values for the first two columns, but not for the third one);
    3. Additional column values (for example, the input data file contains a row with more values than there are columns in the destination table);
    4. Check constraint violations (for example, the integer value of an imported data field exceeds the range allowed by the check constraint on the target table);
    5. Character set conversion errors (for example, the input data file contains invalid UTF-8 characters);
    6. Missing or invalid distribution key (for example, for your distribution key, you have specified a column that has a distribution-key-disallowed datatype).
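The labeling described in step 4 above might be sketched as two concurrent loads, each tagged with its own label so their malformed rows can be told apart in the error logging table. The source, table, and file names are placeholders, and the argument syntax is assumed:

```shell
# Hypothetical invocations: two loads into the same target table, each
# labeling its malformed rows with the name of its data source.
# All table, file, and label names are placeholders.
ncluster_loader --el-enabled --el-label 'source_a' my_target_table source_a_data.txt
ncluster_loader --el-enabled --el-label 'source_b' my_target_table source_b_data.txt
```

After the loads complete, filtering the error logging table by label shows which data source produced which malformed rows.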