Aster Database SQL-MapReduce

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14

dita:id: B700-1021
Product Category: Software

SQL-MapReduce, the Aster Database in-database MapReduce framework, lets you write functions in Java or C, install them on the cluster, and have analysts run them in parallel on Aster Database for efficient data analysis.

Analysts invoke a SQL-MapReduce function in a SELECT query and receive function output as if the function were a table. A SQL-MapReduce function takes as input one or more sets of rows from tables or views (for example, the contents of a table in the database, the output of a SQL SELECT statement, or the output of another SQL-MapReduce function) and produces a set of rows as output.

SQL-MapReduce functions can accept multiple inputs. For more information, see SQL-MapReduce with Multiple Inputs.

Because a call to a SQL-MapReduce function results in a set of parallel tasks being run across the cluster, the input data provided to a SQL-MapReduce function must be divided across the parallel tasks.
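
The idea of dividing input rows across parallel tasks can be illustrated with a small plain-Python simulation. This is only a sketch, not Aster Database code: the helper names distribute_any and distribute_by_key, the sample rows, and the use of round-robin dealing and Python's built-in hash are all assumptions made for illustration, and the source does not describe how the database distributes rows internally.

```python
def distribute_any(rows, num_tasks):
    """Deal rows out round-robin; acceptable when each row is processed independently."""
    tasks = [[] for _ in range(num_tasks)]
    for i, row in enumerate(rows):
        tasks[i % num_tasks].append(row)
    return tasks


def distribute_by_key(rows, key, num_tasks):
    """Send every row with the same key value to the same task."""
    tasks = [[] for _ in range(num_tasks)]
    for row in rows:
        tasks[hash(row[key]) % num_tasks].append(row)
    return tasks


# Hypothetical sample input.
rows = [
    {"user_id": 1, "page": "home"},
    {"user_id": 2, "page": "home"},
    {"user_id": 1, "page": "cart"},
    {"user_id": 3, "page": "search"},
    {"user_id": 2, "page": "cart"},
]

by_any = distribute_any(rows, num_tasks=2)
by_key = distribute_by_key(rows, key="user_id", num_tasks=2)

# Show which user_id values each simulated task received.
for task_number, task_rows in enumerate(by_key):
    print(task_number, sorted({r["user_id"] for r in task_rows}))
```

In both cases every input row reaches exactly one task. The difference is that key-based distribution guarantees that all rows for a given user_id are visible to the same task, which matters for functions that need to see related rows together.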

SQL-MapReduce Function Types

The following table summarizes the SQL-MapReduce function types.

SQL-MapReduce Function Types
Function Type	How Function Takes Input	Aster Database SQL-MapReduce API Interface for Function	SQL Statement That Calls Function Includes	Further Description
Single-input row	One row at a time, in any order.	RowFunction	ON input_table	Operates on individual rows. Corresponds to a map function in traditional map-reduce systems.
Single-input partition	One partition at a time. In a partition, rows are grouped by a specified key of one or more columns.	PartitionFunction	ON input_table PARTITION BY attributes	Operates on rows that share a partition. Has simultaneous access to all rows in a partition, enabling more complex processing than possible with row-wise input. Within each partition, you can sort rows with an ORDER BY clause. Corresponds to a reduce function in traditional map-reduce systems.
Multiple-input	From multiple sources. Inputs can include a cogroup operation, in which inputs from multiple sources are partitioned and combined before being processed; a dimension operation, in which all rows of one or more inputs are replicated to each vworker; or a combination of both.	MultipleInputFunction	A combination of the following, to specify each input and how to distribute its rows: (1) ON input_table PARTITION BY attributes, for each input whose rows are to be partitioned among vworkers using the specified columns; (2) ON input_table PARTITION BY ANY, for each input whose rows can be processed where they were stored when the function was called; (3) ON input_table DIMENSION, for each input whose rows are all to be replicated to all vworkers.	See Rules for Number of Inputs by Type.
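
The three ways of taking input in the table above can be contrasted with a plain-Python simulation. This is a conceptual sketch only: it does not use the Aster Database API, and the runner functions, the sample functions (tag_row, summarize_user, label_pages), and the sample data are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def run_row_function(rows, fn):
    """Row-wise input: fn sees one row at a time and emits zero or more rows."""
    output = []
    for row in rows:
        output.extend(fn(row))
    return output


def run_partition_function(rows, key, fn, order_by=None):
    """Partition-wise input: fn sees all rows that share a key value at once."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    output = []
    for key_value in sorted(partitions):
        partition = partitions[key_value]
        if order_by is not None:
            partition = sorted(partition, key=lambda r: r[order_by])
        output.extend(fn(key_value, partition))
    return output


def run_multiple_input_function(partitioned_rows, dimension_rows, key, fn):
    """Multiple inputs: one input is partitioned by key; the other is handed
    in full to every invocation, as a dimension input would be."""
    partitions = defaultdict(list)
    for row in partitioned_rows:
        partitions[row[key]].append(row)
    output = []
    for key_value in sorted(partitions):
        output.extend(fn(key_value, partitions[key_value], dimension_rows))
    return output


# Hypothetical sample inputs.
clicks = [
    {"user": "ann", "ts": 3, "page": "cart"},
    {"user": "bob", "ts": 1, "page": "home"},
    {"user": "ann", "ts": 1, "page": "home"},
    {"user": "ann", "ts": 2, "page": "search"},
]
page_types = [
    {"page": "home", "type": "landing"},
    {"page": "search", "type": "browse"},
    {"page": "cart", "type": "purchase"},
]


def tag_row(row):
    # Map-like: needs nothing beyond the current row.
    return [{**row, "is_cart": row["page"] == "cart"}]


def summarize_user(user, partition):
    # Reduce-like: needs every row for the user, in timestamp order.
    return [{"user": user, "clicks": len(partition),
             "first_page": partition[0]["page"],
             "last_page": partition[-1]["page"]}]


def label_pages(user, partition, dimension):
    # Uses the partitioned input plus the full dimension input as a lookup.
    lookup = {d["page"]: d["type"] for d in dimension}
    return [{"user": user, "page": r["page"], "type": lookup[r["page"]]}
            for r in partition]


tagged = run_row_function(clicks, tag_row)
summaries = run_partition_function(clicks, "user", summarize_user, order_by="ts")
labeled = run_multiple_input_function(clicks, page_types, "user", label_pages)
```

The row-wise runner never groups anything, so it could process rows in any order. The partition-wise runner must gather all rows for a key before calling the function, which is what makes per-user summaries possible. The multiple-input runner combines a partitioned input with a small lookup input that every invocation sees in full.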

Summary

A SQL-MapReduce function:

Uses the Aster Database API (which supports the languages Java and C).
Is compiled outside the database.
Is installed (uploaded to the cluster) using Aster Database ACT.
Is invoked with a SQL statement, whose ON clauses specify inputs.
Receives, as input, rows of one or more database tables or views, pre-existing trained models, or the result of another SQL-MapReduce function.
Receives, as arguments, zero or more argument clauses (parameters), which can modify function behavior.
Returns output rows to the database.
Is polymorphic: during initialization, the function gets its input schema (for example, (key, value)) and instructions for returning its output schema.
Is designed to run on an MPP system by allowing the user to specify the slice of the data (partition) that a particular instance of the function can access.
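
The polymorphism described in the list above, where a function learns its input schema at initialization and reports an output schema in return, might be pictured with the following plain-Python sketch. It is not the Aster Database API: the class AppendLength, its operate_on_row method, the text_column argument, and the representation of a schema as (name, type) pairs are assumptions chosen for illustration.

```python
class AppendLength:
    """Hypothetical function that adds a length column for a chosen text column."""

    def __init__(self, input_schema, text_column):
        # input_schema is a list of (column_name, column_type) pairs.
        column_names = [name for name, _ in input_schema]
        if text_column not in column_names:
            raise ValueError(f"input has no column named {text_column!r}")
        self.text_column = text_column
        self.length_column = text_column + "_length"
        # The output schema is derived from whatever input schema was supplied.
        self.output_schema = list(input_schema) + [(self.length_column, "integer")]

    def operate_on_row(self, row):
        # Emit the input row unchanged, plus the computed length column.
        return {**row, self.length_column: len(row[self.text_column])}


# The same function definition adapts to two different input schemas.
kv = AppendLength([("key", "varchar"), ("value", "varchar")], text_column="value")
docs = AppendLength(
    [("id", "integer"), ("title", "varchar"), ("body", "varchar")],
    text_column="body",
)

print(kv.output_schema)
print(docs.output_schema)
```

Here text_column plays the role of an argument clause: it changes the function's behavior without changing its code, and the output schema is worked out once, before any rows are processed.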