Teradata Package for R Function Reference | 17.00 - 17.00 - tdBinning - Teradata Package for R

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B700-4007-090K
Language
English (United States)

Description

tdBinning() allows user to perform bin coding to replaces continuous numeric column with a categorical one to produce ordinal values (for example, numeric categorical values where order is meaningful). Binning uses the same techniques used in Histogram analysis, allowing you to choose between:

  1. equal-width bins

  2. equal-width bins with a user-specified minimum and maximum range

  3. bins with a user-specified width

  4. evenly distributed bins

  5. bins with user-specified boundaries

If the minimum and maximum are specified, all values less than the minimum are put into bin 0, while all values greater than the maximum are put into bin N+1. The same is true when the boundary option is specified.

tdBinning() supports numeric and date type columns. If date values are entered, the keyword DATE must precede the date value, and can not be enclosed in single quotes.
Note:

  • Output of this function is passed to "bins" argument of td_transform_valib().

Usage

tdBinning(columns, datatype=NULL, style="bins",
          value=10, lbound=NULL, ubound=NULL,
          fillna=NULL, ...)

Arguments

columns

Required Argument.
Specifies name(s) of column(s) containing the input and output column names, where key is the name of the column to perform transformation on and value contains the name of the transformed output column. When only key is specified then output column name is the name of input column.
Types: character OR list of Strings (character)

datatype

Optional Argument.
Specifies the name of the intended datatype of the output column.
Intended data types for the output column can be specified using the permitted strings below:

------------------------------------ ---------------------------------------
If intended SQL Data Type is Permitted Value to be passed is
------------------------------------ ---------------------------------------
bigint bigint
byteint byteint
char(n) char,n
date date
decimal(m,n) decimal,m,n
float float
integer integer
number(*) number
number(n) number,n
number(*,n) number,*,n
number(n,n) number,n,n
smallint smallint
time(p) time,p
timestamp(p) timestamp,p
varchar(n) varchar,n

Notes:

  1. Argument is ignored if "columns" argument is not used.

  2. char without a size is not supported.

  3. number(*) does not include the * in its datatype format.

Examples:

  1. If intended datatype for the output column is 'bigint', then pass string 'bigint' to the argument as shown below:
    datatype="bigint"

  2. If intended datatype for the output column is 'decimal(3,5)', then pass string 'decimal,3,5' to the argument as shown below:
    datatype="decimal,3,5"

Types: character

style

Optional Argument.
Specifies the bin style to use.
Permitted Values:

  1. 'bins': This style allows user to specify equal-width bins without any boundaries. Argument "values" must be used when this style of binning is used.

  2. 'binswithboundaries': This style allows user to specify equal-width bins with minimum and maximum range. Arguments "values", "lbound" and "ubound" must be used when this style of binning is used. All values less than the minimum are put in to bin 0, while all values greater than the maximum are put into bin N+1.

  3. 'boundaries': This style allows user to specify bins with boundaries. To specify boundaries one should use keyword arguments as:
    b1 –> To specify first boundary.
    b2 –> To specify second boundary.
    b3 –> To specify third boundary.
    ...
    bN –> To specify Nth boundary.
    All values less than the first boundary value are put into bin 0, while all values greater than the last boundary value are put into bin N+1.
    See "..." description below for more details on how these arguments must be used.

  4. 'quantiles': This style allows user to specify evenly-distributed bins. Argument "values" must be used when this style of binning is used.

  5. 'width': This style allows user to specify bins with widths. Argument "values" must be used when this style of binning is used.

Default Value: 'bins'
Types: character

value

Optional Argument.
Specifies the value to be used for Bincode transformations.
When bin style is:

  1. 'bins' or 'binswithboundaries' argument specifies the number of bins.

  2. 'quantiles' argument specifies the number of quantiles.

  3. 'width' argument specifies the bin width.

Note:

  • Ignored when style is 'boundaries'.

Default Value: 10
Types: integer

lbound

Optional Argument.
Specifies the minimum boundary value for 'binswithboundaries' style.
Notes:

  1. Ignored when style is not 'binswithboundaries'.

  2. If date values are entered, the keyword 'DATE' must precede the date value, and can not be enclosed in single quotes.
    For example,
    value='DATE 1987-06-09'

Types: integer, numeric, character

ubound

Optional Argument.
Specifies the maximum boundary value for 'binswithboundaries' style.
Notes:

  1. Ignored when style is not 'binswithboundaries'.

  2. If date values are entered, the keyword 'DATE' must precede the date value, and do not enclose in single quotes.
    For example,
    value='DATE 1987-06-09'

Types: integer, numeric, character

fillna

Optional Argument.
Specifies whether the null replacement/missing value treatment should be performed with binning or not. Output of tdFillNa() can be passed to this argument.
Note:

  • If the tdFillNa object is created with its arguments "columns", and "datatype", then values passed in tdFillNa() arguments are ignored. Only nullstyle information is captured from the same.

Types: tdFillNa

...

Optional Argument. Required if style is 'boundaries'.
Specifies the keyword arguments to provide the boundaries required for binning with bin style 'boundaries'.
To specify boundaries one should use keyword arguments as:
b1 –> To specify first boundary.
b2 –> To specify second boundary.
b3 –> To specify third boundary.
...
bN –> To specify Nth boundary.
Notes:

  1. When keyword arguments are used, make sure to specify boundaries in sequence, i.e., b1, b2, b3, ..
    In case a sequential keyword argument is missing an error is raised.

  2. Keyword arguments specified for the boundaries must start with 'b'.

  3. First boundary must always be specified with "b1" argument.

Types: integer, numeric, character

Value

An object of tdBinning class.

Examples

Notes:
# 1. To run any transformation, user needs to use td_transform_valib()
#    function.
# 2. To do so set option 'val.install.location' to the database name where
#    Vantage analytic library functions are installed.
# 3. Datasets used in these examples can be loaded using Vantage Analytic
#    Library installer.

# Get the current context/connection
con <- td_get_context()$connection

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Create object(s) of class "tbl_teradata".
ibm_stock <- tbl(con, "ibm_stock")
ibm_stock

# Example 1: Binning is carried out with 'bins' style, i.e. equal-width
#            binning, with 5 number of bins. Null replacement is also
#            combined with binning. "key.columns" argument must be used
#            with td_transform_valib() function, when null
#            replacement is being done.
#            Perform the binning transformation using
#            td_transform_valib() function from Vantage Analytic
#            Library.

# Create tdFillNa object.
fn <- tdFillNa(style="literal", value=0)

# Create tdBinning object.
bins <- tdBinning(style="bins", value=5, columns="stockprice", fillna=fn)

# Perform the binning transformation using td_transform_valib() function.
obj <- td_transform_valib(data=ibm_stock, bins=bins, key.columns="id")
obj$result

# Example 2: Binning is carried out with multiple styles.
#            Perform the binning transformation using
#            td_transform_valib() function from Vantage Analytic
#            Library.
# 'binswithboundaries' style:
#     Equal-width bins with a user-specified minimum and maximum range on
#     'period' column. Resultant output return the value with the same
#     column name. Number of bins created are 5.

# Create tdBinning object.
bins_1 <- tdBinning(style="binswithboundaries", value=5,
                    lbound="DATE 1962-01-01",
                    ubound="DATE 1962-06-01",
                    columns="period")

# 'boundaries' style:
#     Bins created with user specified boundaries on 'period' column.
#     Resultant column is names as 'period2'. Three boundaries are
#     specified with arguments "b1", "b2" and "b3".
#     When using this style, keyword argument names must start with
#     'b' and they should be in sequence b1, b2, ..., bN.

# Create tdBinning object.
bins_2 <- tdBinning(style="boundaries", b1="DATE 1962-01-01",
                    b2="DATE 1962-06-01", b3="DATE 1962-12-31",
                    columns=list("period"="period2"))

# Perform the binning transformation using td_transform_valib() function.
obj <- td_transform_valib(data=ibm_stock, bins=c(bins_1, bins_2))
obj$result

# Example 3: Binning is carried out with multiple styles 'quantiles' and
#            'width'.
#            Perform the binning transformation using
#            td_transform_valib() function from Vantage Analytic
#            Library.
# 'quantiles' style:
#     Evenly distributed bins on 'stockprice' column. Resultant output
#     return the column with name 'stockprice_q'. Number of quantiles
#     considered here are 4.

# Create tdBinning object.
bins_1 <- tdBinning(style="quantiles", value=4,
                    columns=list("stockprice"="stockprice_q"))

# 'width' style:
#     Bins with user specified width on 'stockprice' column.
#     Resultant output returns the column with name 'stockprice_w'.
#     Width considered for binning is 5.

# Create tdBinning object.
bins_2 <- tdBinning(style="width", value=5,
                    columns=list("stockprice"="stockprice_w"))

# Perform the binning transformation using td_transform_valib() function.
obj <- td_transform_valib(data=ibm_stock, bins=c(bins_1, bins_2))
obj$result