HASH BY Clause

Purpose

Hashes rows across the AMPs in preparation to be input to a table function or table operator.

Syntax

where:

Syntax element …	Specifies …
column_name	the column name set by which rows are to be partitioned across the AMPs before being input to a table function or table operator.

ANSI Compliance

HASH BY is a Teradata extension to the ANSI SQL:2011 standard.

Rules and Restrictions

The rules and restrictions are:

You can only specify a HASH BY clause for a statement whose result is to be input to a table function.

HASH BY must be specified as part of a FROM clause (see “FROM Clause” on page 96).

The scope of input to the HASH BY clause is limited to:

Derived tables

Views

WITH clause objects

You cannot specify a HASH BY clause with a derived table, view, or WITH clause object that specifies set operations.

You cannot specify more than one HASH BY clause per statement.

You can specify a HASH BY clause by itself or with a LOCAL ORDER BY clause. If you specify both, the HASH BY clause must precede the LOCAL ORDER BY clause.

When you use a multiple input table operator that has multiple hash by clauses, the following restrictions apply:

All columns must have the same number of partitioning attributes.

The corresponding attributes must be the same type or must be types that are compatible so that they can be implicitly cast.

If you specify a LOCAL ORDER BY clause with HASH BY input, the following is required:

All of the ON clauses must have the same number of LOCAL ORDER BY columns.

The data types of the columns must be the same or matched using an implicit cast.

Example : Hash Ordering Input Parameters to a Table Function

This example shows the use of the HASH BY clause for a table UDF. The table function add2int takes two integer values as input and returns both of them and their sum.

The query in this example selects all columns from add2int, which specifies that its input, dt, is hashed by dt.y1. The specified hashing might not be relevant to the add2int function and is only used for illustration.

The expected processing of the query is that dt is first spooled and then hashed by y1 among the AMPs. The final hashed spool is used as the input to add2int.

     CREATE TABLE t1 (

       a1 INTEGER,

       b1 INTEGER);

     CREATE TABLE t2 (

       a2 INTEGER,

       b2 INTEGER);

     CREATE FUNCTION add2int (

       a INTEGER,

       b INTEGER)

     RETURNS TABLE (

       addend1 INTEGER,

       addend2 INTEGER,

       mysum INTEGER)

     SPECIFIC add2int

     LANGUAGE C

     NO SQL

     PARAMETER STYLE SQL

     NOT DETERMINISTIC

     CALLED ON NULL INPUT

     EXTERNAL NAME 'CS!add3int!add2int.c';

     WITH dt(x1,y1) AS (SELECT a1,b1

                        FROM t1)

     SELECT *

     FROM TABLE (add2int(dt.x1,dt.y1)

     HASH BY y1) AS aa;

Example 2: HASH BY and LOCAL ORDER BY Clauses in the Same Statement

This example shows the use of the HASH BY and LOCAL ORDER BY clauses for sorting the input to a table UDF.

Some applications need to enforce the ordering of input to table UDFs. Rather than sorting the input arguments in the application or table UDF, you can specify the HASH BY and LOCAL ORDER BY clauses when you invoke the table UDF in the FROM clause of a SELECT request. The scope of input includes derived tables, views, and WITH objects.

Consider this table definition:

     CREATE TABLE tempdata (

       tid INTEGER,

       tts TIMESTAMP,

       x   INTEGER,

       y   INTEGER);

Suppose the data in tempdata looks like this:

`tid` `---`	`tts` `---`	`x` `-`	`y` `-`
`1001`	`'2008-02-03 14:33:15'`	`10`	`11`
`1001`	`'2008-02-03 14:44:20'`	`20`	`24`
`1001`	`'2008-02-03 14:59:08'`	`31`	`30`
`1002`	`'2008-02-04 11:02:19'`	`10`	`11`
`1002`	`'2008-02-04 11:33:04'`	`22`	`18`
`1002`	`'2008-02-04 11:48:27'`	`29`	`29`

Now consider a table UDF named char_from_rows that produces a text string that represents all of the timestamp, x, and y values in rows that have the same value for tid. Furthermore, the values in the text string are ordered by timestamp. Here is the definition of the table UDF:

   CREATE FUNCTION char_from_rows(tid INTEGER,

                                  tts TIMESTAMP,

                                  x   INTEGER,

                                  y   INTEGER)

   RETURNS TABLE(outID   INTEGER,

                 outCHAR VARCHAR(64000))

   LANGUAGE C

   NO SQL

   EXTERNAL NAME 'CS!charfromrows!udfsrc/charfromrows.c'

   PARAMETER STYLE SQL;

The following statement invokes the char_from_rows table UDF, using a nonrecursive WITH clause to hash the input by tid and value order the input on each AMP by tts:

   WITH wq (tID1, tTS1, x1, y1) AS

      (SELECT tID, tTS, x, y

       FROM tempData)

   SELECT *

   FROM TABLE (char_from_rows(wq.tID1, wq.tTS1, wq.x1, wq.y1)

   HASH BY tID1 LOCAL ORDER BY tTS1) AS tudf;

The output looks like this:

   outID outCHAR

   ----- ---------------------------------------------------------------------

   1001  2008-02-03 14:33:1510112008-02-03 14:44:2020242008-02-03 14:59:083130

   1002  2008-02-04 11:02:1910112008-02-04 11:33:0422182008-02-04 11:48:272929

For More Information

For more information about the use of the HASH BY clause, see:

“FROM Clause” on page 96

“LOCAL ORDER BY Clause” on page 113

“CREATE FUNCTION (Table Form)” in SQL Data Definition Language