BTEQ reader (Windows only)
Purpose
BTEQ Reader provides the user the ability to extract metadata from a BTEQ script into CWM/XMI XML. The user may then import that metadata into the CWM_Metamodel of the MDS repository via the XML Bridge.
This functionality is similar to that supplied by various 3rd-party tools, which provide the ability to export their metadata to CWM/XMI XML, and in general, it is provided to help track ETL activities within the enterprise.
The BTEQ Reader utility searches for transformation-type SQL found in a given BTEQ script file, and produces an XML file in CWM XML/XMI format. This file contains the tables and columns from both the “source” and “target” database schema found in the BTEQ script. BTEQ Reader uses only the information found in the BTEQ script, and does not query the Teradata system(s) involved, or the MDS repository. Therefore, the source and target database schema will be whatever it finds in the script, which usually means the tables and columns, but also the database(s) when known. It also extracts relevant BTEQ session directives; for example, tdp, charset, transaction, and sessions.
For the purpose of the BTEQ Reader, transformation SQL has “source” tables/columns and “target” table(s)/column(s), where the source tables/columns are copied (and possibly changed while doing so) to the target table(s).
As of MDS 14.10, two types of transformation SQL were defined: CREATE-SELECT (a.k.a. COPY-CREATE) and INSERT-SELECT.
MDS release 15.00 added MERGE-INTO SQL as the third type of transformation SQL tracked by BTEQ Reader. At this time, all other SQL in the BTEQ script is ignored, with the exception of the “Database <database>” command, which is just used to track the current default database.
Syntax
The actual functionality is provided by a Java JAR file in the METAHOME bin directory, but is more easily accessed by a BAT file in the same directory called bteqrdr. The BAT file takes the same parameters as the JAR file, and since it is in the METAHOME “bin” directory, it can be called from any location on your PC. Note that since the functionality is provided by Java, a JRE must be found in your PATH. The JRE must be version 1.6.0_04 or later.
When the utility is run the resulting XML file is created in the same directory as the BTEQ script file, and has the same name as that file, but with xml as the file extension.
The syntax for the BAT command is:
bteqrdr <bteq-script file-name> [-d <default_DB>] [-q <qualifier>] [-v]
If you experience a stack or heap overflow error with the BTEQ Reader, you can increase the default values by modifying a line in the bteqrdr.bat file, as follows:
java -Xss2048 -Xms512m -Xmx1024m -jar "%METAHOME%bin\metabteqrdr.jar" %1 %2 %3 %4 %5 %6
Where:
Parameter |
Description |
<bteq-script file-name> |
The name (with the path if needed) to the BTEQ script file. This parameter is required. |
-d <default_DB> |
The name of the default database used, but not specified, in the BTEQ script. This parameter is optional. |
-q <qualifier> |
The name of an arbitrary string, which the user can use to aid in searches of the metadata. This parameter is optional. |
-v |
A request to get more output than normally provided. The actual output may change from release to release. This parameter is optional. |
Examples
The following command line example produces an etl_1.xml file in CMW/XMI format:
bteqrdr etl_1.bteq
The following command line example produces an etl_2.xml file in CWM/XMI format, and uses “dbc” as the default database whenever a database is not otherwise specified in the script (such as with a Database command or explicitly with the table name, for example, dbname.tablename:
bteqrdr etl_2.bteq -d dbc
The following command line example produces the same result as the previous example, but with more output to the screen (the XML file is exactly the same):
bteqrdr etl_2.bteq -d dbc -v
The following command line example produces an etl_3.xml file in the c:\Temp Files\bteq directory:
bteqrdr "c:\Temp Files\bteq\etl_3.bteq"
ETL SQL Statements
The following types of SQL provide extract, transform, and load (ETL) functionality, and these are the INSERT-SELECT statement and the COPY-CREATE statement.
An INSERT-SELECT SQL statement is any statement such as:
INSERT [INTO] tbl... SELECT... FROM....
A COPY-CREATE SQL statement is any statement such as:
CREATE TABLE tbl_1 [(...)] AS (SELECT ... FROM tbl_2 ...) [WITH DATA]
A MERGE-INTO SQL statement is any statement such as:
MERGE INTO tbl_1 USING (SELECT * from tbl_2) ON col1=50 WHEN NOT MATCHED THEN INSERT VALUES (10)
Each time one of these statements is encountered in the script, it is seen as a transformation and ultimately, when the XML is imported into the repository, a new transformation object is created (see the information for the CWMTFM_Transformation class in “BTEQ Script Metadata to CWM_Metamodel Mapping” on page 123).
No other type of SQL metadata is sent to the XML file.
BTEQ Directives
Several BTEQ directives, which are considered relevant to the SQL metadata, are saved along with the imported objects. See “BTEQ Script Metadata to CWM_Metamodel Mapping” on page 123 for more information on the CWMTFM_TransformationTask and CWMTFM_Transformation classes, and their associated CWM_TaggedValue classes.
Importing the XML into the MDS Repository
The XML file produced by the bteqrdr can be imported in to the CWM_Metamodel of the MDS repository, using the XML Bridge. This simply means that you will use MetaBrowse (or MetaManager) and select the Tools > Import > XML menu items. Select the desired XML file and click Open. This causes the XML to be transformed into MetaXML, and then call the metaxml utility to import that XML into the repository.
BTEQ Script Metadata to CWM_Metamodel Mapping
This section describes where the metadata, which is extracted from the BTEQ script, goes in the repository.
It is assumed that the user will use the XML Bridge to do the translation/import of the XML generated by BTEQ reader into the MDS repository. While this is not required (the user may do anything they wish with the CMW/XMI XML produced by the BTEQ reader), the main purpose of creating CWM/XMI XML is in order to use the MDS XML Bridge to import the metadata into the CWM_Metamodel classes of the MDS repository.
The following discussion does not mention the actual contents of the XMI XML, but rather focuses on the classes of the repository CWM_Metamodel, which are the targets of the import.
There are four CWM_Metamodel classes used for the import:
Class |
Description |
CWMTFM_TransformationTask class |
The top-level class of the import hierarchy; it has a collection of “transformations” (see CWMTFM_Transformation class). The “root” object created in this class has the same name as the BTEQ script that was read. |
CWMTFM_Transformation class |
There is one object of this class for each transformation in the transformation task (BTEQ script). A “transformation” is any ETL-type SQL statement. The “expression” property of these objects contains the full text of the SQL statement. |
CWMTFM_DataObjectSet class |
Each transformation object can have a collection of source data-set objects and a collection of target data-set objects. The “source” objects typically identify the columns of the table(s) where the data is coming from, and the “target” objects typically identify the columns of the table where the data is going. The name of these objects is normally the column name. |
CWM_TaggedValue class |
Each of the above classes may be extended by objects of this class to identify properties or attributes which are not part of the CWM/XMI standard. |
The Relationships joining these classes follow the usual MDS naming convention for relationships, and are as follows:
Class |
Relationship(s) |
CWMTFM_TransformationTask class |
Associates the transformations with the transformation-task object. Extends the transformation-task object with the following tags: |
CWMTFM_Transformation class |
Associates source DataObjectSet objects with the transformation object. These objects contain information about where the data is coming from; for example, from which columns of what tables. Associates target DataObjectSet objects with the transformation object. These objects contain information about where the data is going; for example, to which columns of what table. Extends the transformation object with the following tags: Only “transType” and “sequenceNumber” are guaranteed to be present; the other tags appear only if they are specified in the BTEQ script. |
CWMTFM_DataObjectSet class |
Extends the data-set object with the following tags: So for this case, this tag value will be the position of the column in the list of columns, starting with 1. Note that for any particular source column, there may be more than one “real” column associated with it, in which case each of the “real” columns will have a separate data-object but with the same “colPos” number. With the exception of the “objType” tag, which is always present, each of these will be included only if their values are known (either explicitly stated or can be inferred). And if the “objType” is not a “table”, the “colPos” will be present, numbered starting from 1. |
Limitations
The following identify the current limitations for the BTEQ reader: