Teradata Package for Python Function Reference | 17.10 - Binning - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- lifecycle
- previous
- Product Category
- Teradata Vantage
- teradataml.analytics.Transformations.Binning.__init__ = __init__(self, columns, style='bins', value=10, lbound=None, ubound=None, out_columns=None, datatype=None, fillna=None, **kwargs)
- DESCRIPTION:
Binning allows user to perform bin coding to replace continuous numeric
column with a categorical one to produce ordinal values (for example,
numeric categorical values where order is meaningful). Binning uses the
same techniques used in Histogram analysis, allowing you to choose between:
1. equal-width bins,
2. equal-width bins with a user-specified minimum and maximum range,
3. bins with a user-specified width,
4. evenly distributed bins, or
5. bins with user-specified boundaries.
If the minimum and maximum are specified, all values less than the minimum
are put into bin 0, while all values greater than the maximum are put into
bin N+1. The same is true when the boundary option is specified.
Bin Coding supports numeric and date type columns. If date values are entered,
the keyword DATE must precede the date value, and do not enclose in single
quotes.
Note:
Output of this function is passed to "bins" argument of "Transform"
function from Vantage Analytic Library.
PARAMETERS:
columns:
Required Argument.
Specifies the names of the columns to perform transformation on.
Types: str or list of str
style:
Optional Argument.
Specifies the bin style to use.
Permitted Values:
* "bins":
This style allows user to specify equal-width bins without any
boundaries. Argument "values" must be used when this style of
binning is used.
* "binswithboundaries":
This style allows user to specify equal-width bins with minimum
and maximum range. Arguments "values", "lbound" and "ubound" must
be used when this style of binning is used.
All values less than the minimum are put into bin 0, while all
values greater than the maximum are put into bin N+1.
* "boundaries":
This style allows user to specify bins with boundaries.
To specify boundaries one should use keyword arguments as:
b1 --> To specify first boundary.
b2 --> To specify second boundary.
b3 --> To specify third boundary.
...
bN --> To specify Nth boundary.
All values less than the first boundary value are put into bin 0,
while all values greater than the last boundary value are put into
bin N+1.
See "kwargs" description below for more details on how these
arguments must be used.
* "quantiles":
This style allows user to specify evenly-distributed bins.
Argument "values" must be used when this style of binning is used.
* "width":
This style allows user to specify bins with widths. Argument
"values" must be used when this style of binning is used.
Default Value: 'bins'
Types: str
value:
Optional Argument.
Specifies the value to be used for bin code transformations.
When bin style is:
* 'bins' or 'binswithboundaries' argument specifies the number of bins.
* 'quantiles' argument specifies the number of quantiles.
* 'width' argument specifies the bin width.
Note:
Ignored when style is 'boundaries'.
Default Value: 10
Types: int
lbound:
Optional Argument.
Specifies the minimum boundary value for 'binswithboundaries' style.
Notes:
1. Ignored when style is not 'binswithboundaries'.
2. If date values are entered as string, the keyword 'DATE' must precede
the date value, and do not enclose in single quotes OR
pass a datetime.date object.
For example,
value='DATE 1987-06-09'
value=date(1987, 6, 9)
Types: int, float, str, datetime.date
ubound:
Optional Argument.
Specifies the maximum boundary value for 'binswithboundaries' style.
Notes:
1. Ignored when style is not 'binswithboundaries'.
2. If date values are entered as string, the keyword 'DATE' must precede
the date value, and do not enclose in single quotes OR
pass a datetime.date object.
For example,
value='DATE 1987-06-09'
value=date(1987, 6, 9)
Types: int, float, str, datetime.date
out_columns:
Optional Argument.
Specifies the names of the output columns.
Note:
Number of elements in "columns" and "out_columns" must be same.
Types: str or list of str
datatype:
Optional Argument.
Specifies the name of the intended datatype of the output column.
Intended data types for the output column can be specified using either the
teradatasqlalchemy types or the permitted strings mentioned below:
-------------------------------------------------------------------
| If intended SQL Data Type is | Permitted Value to be passed is |
|-------------------------------------------------------------------|
| bigint | bigint |
| byteint | byteint |
| char(n) | char,n |
| date | date |
| decimal(m,n) | decimal,m,n |
| float | float |
| integer | integer |
| number(*) | number |
| number(n) | number,n |
| number(*,n) | number,*,n |
| number(n,n) | number,n,n |
| smallint | smallint |
| time(p) | time,p |
| timestamp(p) | timestamp,p |
| varchar(n) | varchar,n |
--------------------------------------------------------------------
Notes:
1. Argument is ignored if "columns" argument is not used.
2. char without a size is not supported.
3. number(*) does not include the * in its datatype format.
Examples:
1. If intended datatype for the output column is "bigint", then
pass string "bigint" to the argument as shown below:
datatype="bigint"
2. If intended datatype for the output column is "decimal(3,5)", then
pass string "decimal,3,5" to the argument as shown below:
datatype="decimal,3,5"
Types: str, BIGINT, BYTEINT, CHAR, DATE, DECIMAL, FLOAT, INTEGER, NUMBER, SMALLINT, TIME,
TIMESTAMP, VARCHAR.
fillna:
Optional Argument.
Specifies whether the null replacement/missing value treatment should
be performed with binning or not. Output of FillNa() can be passed to
this argument.
Note:
If the FillNa object is created with its arguments "columns",
"out_columns" and "datatype", then values passed in FillNa() arguments
are ignored. Only nullstyle information is captured from the same.
Types: FillNa
kwargs:
Specifies the keyword arguments to provide the boundaries required
for binning with bin style 'boundaries'.
To specify boundaries one should use keyword arguments as:
b1 --> To specify first boundary.
b2 --> To specify second boundary.
b3 --> To specify third boundary.
...
bN --> To specify Nth boundary.
Notes:
1. When keyword arguments are used, make sure to specify boundaries
in sequence, i.e., b1, b2, b3, ...
In case a sequential keyword argument is missing an error is
raised.
2. Keyword arguments specified for the boundaries must start with 'b'.
3. First boundary must always be specified with "b1" argument.
Types: int, float, str, datetime.date
RETURNS:
An instance of Binning class.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLE:
# Note:
# To run any transformation, user needs to use Transform() function from
# Vantage Analytic Library.
# To do so import valib first and set the "val_install_location".
>>> from teradataml import configure, DataFrame, Binning, FillNa, load_example_data, valib
>>> configure.val_install_location = "SYSLIB"
>>>
# Load example data.
>>> load_example_data("movavg", "ibm_stock")
>>>
# Create the required teradataml DataFrame.
>>> ibm_stock = DataFrame.from_table("ibm_stock")
>>> ibm_stock
name period stockprice
id
183 ibm 62/02/07 552.0
202 ibm 62/03/07 548.0
181 ibm 62/02/05 551.0
242 ibm 62/05/02 482.0
364 ibm 62/10/25 331.0
221 ibm 62/04/03 513.0
38 ibm 61/07/11 473.0
366 ibm 62/10/29 352.0
326 ibm 62/08/30 387.0
61 ibm 61/08/11 497.0
>>>
# Example 1: Binning is carried out with "bins" style, i.e. equal-width
# binning, with 5 number of bins. Null replacement is also combined
# with binning.
# "key_columns" argument must be used with Transform() function,
# when null replacement is being done.
>>> fn = FillNa(style="literal", value=0)
>>> bins = Binning(style="bins", value=5, columns="stockprice", fillna=fn)
# Execute Transform() function.
>>> obj = valib.Transform(data=ibm_stock,
... bins=bins,
... key_columns="id")
>>> obj.result
id stockprice
0 263 1
1 324 2
2 303 2
3 99 5
4 36 3
5 97 5
6 160 5
7 59 4
8 19 4
9 122 5
>>>
# Example 2: Binning is carried out with multiple styles.
# 'binswithboundaries' style:
# Equal-width bins with a user-specified minimum and maximum range on 'period'
# column. Resultant output return the value with the same column name. Number
# of bins created are 5.
>>> bins_1 = Binning(style="binswithboundaries",
... value=5,
... lbound="DATE 1962-01-01",
... ubound="DATE 1962-06-01",
... columns="period")
>>>
# 'boundaries' style:
# Bins created with user specified boundaries on 'period' column. Resultant
# column is names as 'period2'. Three boundaries are specified with arguments
# "b1", "b2" and "b3". When using this style, keyword argument names must
# start with 'b' and they should be in sequence b1, b2, ..., bN.
>>> bins_2 = Binning(style="boundaries",
... b1="DATE 1962-01-01",
... b2="DATE 1962-06-01",
... b3="DATE 1962-12-31",
... columns="period",
... out_columns="period2")
>>>
# Execute Transform() function.
>>> obj = valib.Transform(data=ibm_stock,
... bins=[bins_1, bins_2])
>>> obj.result
id period period2
0 223 4 1
1 345 6 2
2 120 0 0
3 343 6 2
4 57 0 0
5 118 0 0
6 200 3 1
7 80 0 0
8 162 1 1
9 40 0 0
>>>
# Example 3: Binning is carried out with multiple styles 'quantiles' and
# 'width'.
# 'quantiles' style :
# Evenly distributed bins on 'stockprice' column. Resultant output returns
# the column with name 'stockprice_q'. Number of quantiles considered here
# are 4.
>>> bins_1 = Binning(style="quantiles",
... value=4,
... out_columns="stockprice_q",
... columns="stockprice")
>>>
# 'width' style :
# Bins with user specified width on 'stockprice' column. Resultant output
# returns the column with name 'stockprice_w'. Width considered for binning
# is 5.
>>> bins_2 = Binning(style="width",
... value=5,
... out_columns="stockprice_w",
... columns="stockprice")
>>>
# Execute Transform() function.
>>> obj = valib.Transform(data=ibm_stock,
... bins=[bins_1, bins_2])
>>> obj.result
id stockprice_q stockprice_w
0 183 4 50
1 202 3 49
2 181 4 50
3 242 2 36
4 364 1 6
5 221 3 42
6 38 2 34
7 366 1 10
8 326 1 17
9 61 3 39
>>>