Teradata® Package for Python Function Reference - 20.00 - width_bucket
- Deployment: VantageCloud, VantageCore
- Edition: Enterprise, IntelliFlex, VMware
- Product: Teradata Package for Python
- Release Number: 20.00.00.03
- Published: December 2024
- Product Category: Teradata Vantage
- teradataml.dataframe.sql.DataFrameColumn.width_bucket = width_bucket(min, max, numBucket)
DESCRIPTION:
Returns the number of the partition to which each value in a column
is assigned.
The following rules apply to width_bucket:
* If any value is null, then the result is also null.
* If "min" < "max", then the following rules apply:
    * If value in column < "min", then 0 is returned.
    * If value in column >= "max", then "numBucket" + 1 is returned.
      If the result cannot be represented by the data type specified for the
      result, then an error is returned.
    * Else the greatest exact numeric value with scale 0 that is less than or
      equal to the following expression is returned:
      (("numBucket") * (value in column - "min") / ("max" - "min")) + 1
* If "min" > "max", then the following rules apply:
    * If value in column > "min", then 0 is returned.
    * If value in column <= "max", then "numBucket" + 1 is returned.
      If the result cannot be represented by the data type specified for the
      result, then an error is returned.
    * Else the greatest exact numeric value with scale 0 that is less than or
      equal to the following expression is returned:
      (("numBucket") * ("min" - value in column) / ("min" - "max")) + 1
* An error is reported in the following cases:
    * If "numBucket" <= 0 or if "numBucket" > 2147483646
    * If "min" = "max"
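The rules above can be modelled in plain Python. This is only a sketch for reasoning about results without a Vantage connection: the function name width_bucket_model and the use of ValueError are assumptions made for illustration, not part of teradataml, and the database may use different numeric types when it evaluates the expression.

```python
import math

def width_bucket_model(value, lo, hi, num_bucket):
    """Illustrative model of the documented rules (hypothetical helper)."""
    # Rule: if any value is null, the result is null.
    if None in (value, lo, hi, num_bucket):
        return None
    # Rule: error cases.
    if num_bucket <= 0 or num_bucket > 2147483646:
        raise ValueError("numBucket must be between 1 and 2147483646")
    if lo == hi:
        raise ValueError("min and max must differ")
    if lo < hi:
        # Ascending range.
        if value < lo:
            return 0
        if value >= hi:
            return num_bucket + 1
        return math.floor(num_bucket * (value - lo) / (hi - lo)) + 1
    # Descending range ("min" > "max").
    if value > lo:
        return 0
    if value <= hi:
        return num_bucket + 1
    return math.floor(num_bucket * (lo - value) / (lo - hi)) + 1

print(width_bucket_model(3.0, 2.5, 3.5, 3))  # in range, ascending
print(width_bucket_model(3.0, 3.5, 2.5, 3))  # in range, descending
```

Under this model a value below an ascending range maps to partition 0 and a value at or above "max" maps to "numBucket" + 1, which matches the overflow partitions described in the rules.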
PARAMETERS:
min:
Required Argument.
Specifies the lower boundary for the range of values to be partitioned equally.
Types: float, int
max:
Required Argument.
Specifies the upper boundary for the range of values to be partitioned equally.
Types: float, int
numBucket:
Required Argument.
Specifies the number of equal-width partitions to create between "min" and
"max". Each partition therefore spans |"max" - "min"| / "numBucket". The total
number of partitions is "numBucket" + 2, because partition 0 and partition
"numBucket" + 1 hold column values that fall outside the lower and upper boundaries.
Types: float, int
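To make the relationship between "min", "max" and "numBucket" concrete, the following plain-Python sketch lists the interior partitions for an ascending range. The helper name bucket_ranges is hypothetical, and the half-open intervals [low, high) are an inference from the floor-based formula in the description rather than a statement from the teradataml documentation.

```python
def bucket_ranges(lo, hi, num_bucket):
    """Return (partition number, low, high) for each interior partition.

    Assumes lo < hi. Partition 0 (values below lo) and partition
    num_bucket + 1 (values at or above hi) are not listed.
    """
    width = (hi - lo) / num_bucket
    return [(i + 1, lo + i * width, lo + (i + 1) * width)
            for i in range(num_bucket)]

for number, low, high in bucket_ranges(2.5, 3.5, 4):
    print(f"partition {number}: [{low}, {high})")
```

With "min" = 2.5, "max" = 3.5 and "numBucket" = 4, each interior partition is 0.25 wide, and six partitions exist in total once the two overflow partitions are counted.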
RAISES:
TypeError, ValueError, TeradataMlException
RETURNS:
DataFrameColumn
EXAMPLES:
# Load the data to run the example (assumes an active connection to Vantage).
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data("dataframe", "admissions_train")
# Create a DataFrame on 'admissions_train' table.
>>> df = DataFrame("admissions_train").iloc[:4]
>>> print(df)
   masters   gpa     stats programming  admitted
id
3       no  3.70    Novice    Beginner         1
4      yes  3.50  Beginner      Novice         1
2      yes  3.76  Beginner    Beginner         0
1      yes  3.95  Beginner    Beginner         0
# Example 1: Execute the function and pass it as input to DataFrame.assign().
>>> res = df.assign(col = df.gpa.width_bucket(2.5, 3.5, 3))
>>> print(res)
   masters   gpa     stats programming  admitted  col
id
3       no  3.70    Novice    Beginner         1    4
4      yes  3.50  Beginner      Novice         1    4
2      yes  3.76  Beginner    Beginner         0    4
1      yes  3.95  Beginner    Beginner         0    4
# Example 2: Execute width_bucket() on the "gpa" column and keep only the rows
# whose computed value equals 4.
>>> print(df[df.gpa.width_bucket(2.5, 3.5, 3) == 4])
   masters   gpa     stats programming  admitted
id
3       no  3.70    Novice    Beginner         1
4      yes  3.50  Beginner      Novice         1
2      yes  3.76  Beginner    Beginner         0
1      yes  3.95  Beginner    Beginner         0
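Both examples use the range 2.5 to 3.5, so every gpa value is at or above "max" and lands in the overflow partition 4. The sketch below applies the ascending-range formula from the description in plain Python, with no Vantage connection, to the same four gpa values using a hypothetical range of 3.0 to 4.0 and 4 partitions, so that the values fall into interior partitions. The helper name interior_bucket is an assumption for illustration, and Vantage may evaluate the expression with different numeric types.

```python
import math

def interior_bucket(value, lo, hi, num_bucket):
    # Ascending-range formula from the description.
    # Only valid when lo <= value < hi.
    return math.floor(num_bucket * (value - lo) / (hi - lo)) + 1

# gpa values keyed by id, taken from the example DataFrame.
gpas = {3: 3.70, 4: 3.50, 2: 3.76, 1: 3.95}
buckets = {row_id: interior_bucket(gpa, 3.0, 4.0, 4)
           for row_id, gpa in gpas.items()}
print(buckets)
```

The equivalent teradataml call would be df.gpa.width_bucket(3.0, 4.0, 4), passed to DataFrame.assign() as in Example 1.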