HASH BY Clause - Teradata Database

SQL Data Manipulation Language

Product
Teradata Database
Release Number
15.00
Language
English (United States)
Last Update
2018-09-28
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata® Database

HASH BY Clause

Purpose  

Hashes rows across the AMPs in preparation to be input to a table function or table operator.

Syntax  

where:

 

Syntax element …

Specifies …

column_name

the column name set by which rows are to be partitioned across the AMPs before being input to a table function or table operator.

ANSI Compliance

HASH BY is a Teradata extension to the ANSI SQL:2011 standard.

Rules and Restrictions

The rules and restrictions are:

  • You can only specify a HASH BY clause for a statement whose result is to be input to a table function.
  • HASH BY must be specified as part of a FROM clause (see “FROM Clause” on page 96).
  • The scope of input to the HASH BY clause is limited to:
  • Derived tables
  • Views
  • WITH clause objects
  • You cannot specify a HASH BY clause with a derived table, view, or WITH clause object that specifies set operations.
  • You cannot specify more than one HASH BY clause per statement.
  • You can specify a HASH BY clause by itself or with a LOCAL ORDER BY clause. If you specify both, the HASH BY clause must precede the LOCAL ORDER BY clause.
  • When you use a multiple input table operator that has multiple hash by clauses, the following restrictions apply:
  • All columns must have the same number of partitioning attributes.
  • The corresponding attributes must be the same type or must be types that are compatible so that they can be implicitly cast.
  • If you specify a LOCAL ORDER BY clause with HASH BY input, the following is required:
  • All of the ON clauses must have the same number of LOCAL ORDER BY columns.
  • The data types of the columns must be the same or matched using an implicit cast.
  • Example : Hash Ordering Input Parameters to a Table Function

    This example shows the use of the HASH BY clause for a table UDF. The table function add2int takes two integer values as input and returns both of them and their sum.

    The query in this example selects all columns from add2int, which specifies that its input, dt, is hashed by dt.y1. The specified hashing might not be relevant to the add2int function and is only used for illustration.

    The expected processing of the query is that dt is first spooled and then hashed by y1 among the AMPs. The final hashed spool is used as the input to add2int.

         CREATE TABLE t1 (
           a1 INTEGER,
           b1 INTEGER);
     
         CREATE TABLE t2 (
           a2 INTEGER,
           b2 INTEGER);
     
         CREATE FUNCTION add2int (
           a INTEGER, 
           b INTEGER)
         RETURNS TABLE (
           addend1 INTEGER, 
           addend2 INTEGER, 
           mysum INTEGER)
         SPECIFIC add2int  
         LANGUAGE C  
         NO SQL  
         PARAMETER STYLE SQL
         NOT DETERMINISTIC
         CALLED ON NULL INPUT
         EXTERNAL NAME 'CS!add3int!add2int.c';
     
         WITH dt(x1,y1) AS (SELECT a1,b1
                            FROM t1)
         SELECT *  
         FROM TABLE (add2int(dt.x1,dt.y1) 
         HASH BY y1) AS aa;

    Example  2: HASH BY and LOCAL ORDER BY Clauses in the Same Statement

    This example shows the use of the HASH BY and LOCAL ORDER BY clauses for sorting the input to a table UDF.

    Some applications need to enforce the ordering of input to table UDFs. Rather than sorting the input arguments in the application or table UDF, you can specify the HASH BY and LOCAL ORDER BY clauses when you invoke the table UDF in the FROM clause of a SELECT request. The scope of input includes derived tables, views, and WITH objects.

    Consider this table definition:

         CREATE TABLE tempdata (
           tid INTEGER, 
           tts TIMESTAMP, 
           x   INTEGER, 
           y   INTEGER);

    Suppose the data in tempdata looks like this:

     

    Now consider a table UDF named char_from_rows that produces a text string that represents all of the timestamp, x, and y values in rows that have the same value for tid. Furthermore, the values in the text string are ordered by timestamp. Here is the definition of the table UDF:

       CREATE FUNCTION char_from_rows(tid INTEGER,
                                      tts TIMESTAMP,
                                      x   INTEGER,
                                      y   INTEGER)
       RETURNS TABLE(outID   INTEGER, 
                     outCHAR VARCHAR(64000))
       LANGUAGE C
       NO SQL
       EXTERNAL NAME 'CS!charfromrows!udfsrc/charfromrows.c'
       PARAMETER STYLE SQL;

    The following statement invokes the char_from_rows table UDF, using a nonrecursive WITH clause to hash the input by tid and value order the input on each AMP by tts:

       WITH wq (tID1, tTS1, x1, y1) AS
          (SELECT tID, tTS, x, y 
           FROM tempData)
       SELECT * 
       FROM TABLE (char_from_rows(wq.tID1, wq.tTS1, wq.x1, wq.y1)
       HASH BY tID1 LOCAL ORDER BY tTS1) AS tudf;

    The output looks like this:

       outID outCHAR
       ----- ---------------------------------------------------------------------
       1001  2008-02-03 14:33:1510112008-02-03 14:44:2020242008-02-03 14:59:083130
       1002  2008-02-04 11:02:1910112008-02-04 11:33:0422182008-02-04 11:48:272929

    For More Information

    For more information about the use of the HASH BY clause, see:

  • “FROM Clause” on page 96
  • “LOCAL ORDER BY Clause” on page 113
  • “CREATE FUNCTION (Table Form)” in SQL Data Definition Language