Aster Database SQL-MapReduce - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product: Aster Analytics
Release Number: 7.00.02
Published: September 2017
Language: English (United States)
Last Update: 2018-04-17
dita:id: B700-1022
lifecycle: previous
Product Category: Software

The Aster Database in-database MapReduce framework, SQL-MapReduce, lets you write functions in Java or C, save them in the cluster, and make them available to analysts, who run them in parallel on Aster Database for efficient data analysis.

Analysts invoke a SQL-MapReduce function in a SELECT query and receive function output as if the function were a table. A SQL-MapReduce function takes as input one or more sets of rows from tables or views (for example, the contents of a table in the database, the output of a SQL SELECT statement, or the output of another SQL-MapReduce function) and produces a set of rows as output.

SQL-MapReduce functions can accept multiple inputs. For more information, see SQL-MapReduce with Multiple Inputs.

Because a call to a SQL-MapReduce function results in a set of parallel tasks being run across the cluster, the input data provided to a SQL-MapReduce function must be divided across the parallel tasks.
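As a rough illustration of what dividing input across parallel tasks means, the following Java sketch assigns rows to a fixed number of workers by hashing a key. The class name, the `distribute` method, and the hash-based assignment are assumptions made for this example; they are not part of the Aster Database API, and the guide does not state how the database itself distributes rows.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical helper, not part of the Aster Database API. */
class PartitionAcrossWorkers {

    /** Assigns each row key to one of workerCount buckets by hashing the key. */
    static List<List<String>> distribute(List<String> keys, int workerCount) {
        List<List<String>> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            workers.add(new ArrayList<>());
        }
        for (String key : keys) {
            // floorMod keeps the index non-negative even when hashCode() is negative.
            int target = Math.floorMod(key.hashCode(), workerCount);
            workers.get(target).add(key);
        }
        return workers;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("alice", "bob", "alice", "carol", "bob");
        List<List<String>> workers = distribute(keys, 3);
        for (int i = 0; i < workers.size(); i++) {
            System.out.println("worker " + i + ": " + workers.get(i));
        }
    }
}
```

In this sketch every row is assigned to exactly one worker, and rows with the same key always land on the same worker, which is the property that key-based partitioning depends on.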

SQL-MapReduce Function Types

The following table summarizes the SQL-MapReduce function types.

Single-input row
  How function takes input: One row at a time, in any order.
  Aster Database SQL-MapReduce API interface for function: RowFunction
  SQL statement that calls function includes: ON input_table
  Further description: Operates on individual rows. Corresponds to a map function in traditional map-reduce systems.

Single-input partition
  How function takes input: One partition at a time. In a partition, rows are grouped by a specified key of one or more columns.
  Aster Database SQL-MapReduce API interface for function: PartitionFunction
  SQL statement that calls function includes: ON input_table PARTITION BY attributes
  Further description: Operates on rows that share a partition. Has simultaneous access to all rows in a partition, enabling more complex processing than is possible with row-wise input. Within each partition, you can sort rows with an ORDER BY clause. Corresponds to a reduce function in traditional map-reduce systems.

Multiple-input
  How function takes input: From multiple sources. Inputs can include a cogroup operation, in which inputs from multiple sources are partitioned and combined before being processed; a dimension operation, in which all rows of one or more inputs are replicated to each vworker; or a combination of both.
  Aster Database SQL-MapReduce API interface for function: MultipleInputFunction
  SQL statement that calls function includes: A combination of the following, to specify each input and how to distribute its rows:
  • ON input_table PARTITION BY attributes

    for each input whose rows are to be partitioned among vworkers using the specified columns

  • ON input_table PARTITION BY ANY

    for each input whose rows can be processed wherever they are stored when the function is called

  • ON input_table DIMENSION

    for each input whose rows are all to be replicated to all vworkers

  See Rules for Number of Inputs by Type.
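To make the difference between row-wise and partition-wise input concrete, here is a minimal Java sketch. It deliberately does not use the RowFunction or PartitionFunction interfaces, whose signatures are not given in this section; `applyRowWise` and `applyPartitionWise` are hypothetical helpers, and rows are modeled as plain string arrays whose first element is assumed to be the partitioning key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Illustrative only: hypothetical helpers that do not use the Aster Database API. */
class FunctionTypesSketch {

    /** Row-wise (map-like): each input row is handled independently of the others. */
    static List<String> applyRowWise(List<String[]> rows, Function<String[], String> fn) {
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            out.add(fn.apply(row));
        }
        return out;
    }

    /** Partition-wise (reduce-like): rows sharing column 0 are passed to fn together. */
    static List<String> applyPartitionWise(List<String[]> rows,
                                           BiFunction<String, List<String[]>, String> fn) {
        // TreeMap keeps partitions in key order so the output is deterministic.
        Map<String, List<String[]>> partitions = new TreeMap<>();
        for (String[] row : rows) {
            partitions.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : partitions.entrySet()) {
            out.add(fn.apply(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"u1", "3"},
            new String[] {"u2", "5"},
            new String[] {"u1", "4"});

        // Row-wise: one output per input row.
        System.out.println(applyRowWise(rows, r -> r[0] + ":" + r[1]));

        // Partition-wise: one output per key, computed from all rows with that key.
        System.out.println(applyPartitionWise(rows, (key, group) -> {
            int sum = 0;
            for (String[] r : group) {
                sum += Integer.parseInt(r[1]);
            }
            return key + "=" + sum;
        }));
    }
}
```

The row-wise call never sees more than one row at a time, whereas the partition-wise call receives every row for a key at once, which is what allows aggregates such as the per-key sum.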

Summary

A SQL-MapReduce function:

  • Uses the Aster Database API (which supports the languages Java and C).
  • Is compiled outside the database.
  • Is installed (uploaded to the cluster) using Aster Database ACT.
  • Is invoked with a SQL statement, whose ON clauses specify inputs.
  • Receives, as input, rows of one or more database tables or views, pre-existing trained models, or the result of another SQL-MapReduce function.
  • Receives, as arguments, zero or more argument clauses (parameters), which can modify function behavior.
  • Returns output rows to the database.
  • Is polymorphic.

    During initialization, the function gets its input schema (for example, (key, value)) and instructions for returning its output schema.

  • Is designed to run on an MPP system by allowing the user to specify the slice of the data (partition) that a particular instance of the function can access.
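
The polymorphism noted in the summary, where a function receives its input schema at initialization and reports the output schema it will return, can be pictured with the following Java sketch. The class and the `outputSchemaFor` method are hypothetical and stand in for whatever contract the actual API defines; the appended `row_count` column is likewise an invented example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical contract, not the Aster Database API. */
class PolymorphicSchemaSketch {

    /** Derives the output schema by appending one column to whatever input schema arrives. */
    static List<String> outputSchemaFor(List<String> inputSchema) {
        List<String> output = new ArrayList<>(inputSchema);
        output.add("row_count"); // hypothetical extra column produced by the function
        return output;
    }

    public static void main(String[] args) {
        // The same function adapts to two different input schemas.
        System.out.println(outputSchemaFor(List.of("key", "value")));
        System.out.println(outputSchemaFor(List.of("user_id", "page", "ts")));
    }
}
```

Because the output schema is computed from the input schema rather than fixed in advance, the same function can be applied to tables with different column layouts.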