| |
- width_bucket(column_expression, lower_bound, upper_bound, partition_count)
- DESCRIPTION:
The function returns the number of the partition to which column_expression
is assigned.
The following rules apply to width_bucket:
* If any argument is null, then the result is also null.
* If lower_bound < upper_bound, then the following rules apply:
    * If column_expression < lower_bound, then 0 is returned.
    * If column_expression >= upper_bound, then partition_count + 1 is returned.
      If the result cannot be represented by the data type specified for the
      result, then an error is returned.
    * Else the greatest exact numeric value with scale 0 that is less than or
      equal to the following expression is returned:
      ((partition_count * (column_expression - lower_bound)) / (upper_bound - lower_bound)) + 1
* If lower_bound > upper_bound, then the following rules apply:
    * If column_expression > lower_bound, then 0 is returned.
    * If column_expression <= upper_bound, then partition_count + 1 is returned.
      If the result cannot be represented by the data type specified for the
      result, then an error is returned.
    * Else the greatest exact numeric value with scale 0 that is less than or
      equal to the following expression is returned:
      ((partition_count * (lower_bound - column_expression)) / (lower_bound - upper_bound)) + 1
* An error is reported in the following cases:
    * If partition_count <= 0 or if partition_count > 2147483646
    * If lower_bound = upper_bound
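The rules above can be modelled in plain Python. The sketch below is only an illustration: width_bucket_sketch is a hypothetical helper, not part of the library, and it uses Python floats and ValueError where the database uses exact numerics and its own error reporting, so results at partition boundaries may differ.

```python
import math

def width_bucket_sketch(value, lower_bound, upper_bound, partition_count):
    """Pure-Python model of the width_bucket rules (hypothetical helper)."""
    # A null (None) argument yields a null result.
    if any(arg is None for arg in (value, lower_bound, upper_bound, partition_count)):
        return None
    # Error cases listed in the rules (ValueError is an assumption of this sketch).
    if partition_count <= 0 or partition_count > 2147483646:
        raise ValueError("partition_count must be between 1 and 2147483646")
    if lower_bound == upper_bound:
        raise ValueError("lower_bound and upper_bound must differ")
    if lower_bound < upper_bound:
        # Ascending range: partition numbers grow with the value.
        if value < lower_bound:
            return 0
        if value >= upper_bound:
            return partition_count + 1
        ratio = (value - lower_bound) / (upper_bound - lower_bound)
    else:
        # Descending range: partition numbers grow as the value falls.
        if value > lower_bound:
            return 0
        if value <= upper_bound:
            return partition_count + 1
        ratio = (lower_bound - value) / (lower_bound - upper_bound)
    # Greatest integer less than or equal to partition_count * ratio + 1.
    return math.floor(partition_count * ratio) + 1

print(width_bucket_sketch(3.0, 2.5, 3.5, 3))  # 2 (ascending bounds)
print(width_bucket_sketch(3.0, 3.5, 2.5, 3))  # 2 (descending bounds)
print(width_bucket_sketch(4.0, 3.5, 2.5, 3))  # 0 (above lower_bound of a descending range)
```

With descending bounds the numbering is mirrored: values above lower_bound fall in partition 0 and values at or below upper_bound fall in partition partition_count + 1.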
PARAMETERS:
column_expression:
Required Argument.
Specifies a ColumnExpression of a column for which a partition number is to be returned.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
lower_bound:
Required Argument.
Specifies the lower boundary for the range of values to be partitioned equally.
Types: float or int
upper_bound:
Required Argument.
Specifies the upper boundary for the range of values to be partitioned equally.
Types: float or int
partition_count:
Required Argument.
Specifies the number of partitions to be created between lower_bound and
upper_bound. Together with the bounds, this value determines the width of each
partition: (upper_bound - lower_bound) / partition_count. The total number of
partitions created is partition_count + 2. Partition 0 and partition
partition_count + 1 account for values of column_expression that are outside
the lower and upper boundaries.
Types: float or int
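To make the partition_count + 2 partitions concrete, the following sketch lists the range covered by each partition for ascending bounds. bucket_ranges is a hypothetical helper written for this illustration and assumes lower_bound < upper_bound and an integer partition_count.

```python
def bucket_ranges(lower_bound, upper_bound, partition_count):
    """Return (partition_number, range_description) pairs (hypothetical helper)."""
    # Every in-range partition has the same width.
    width = (upper_bound - lower_bound) / partition_count
    # Partition 0 collects values below the range.
    ranges = [(0, f"value < {lower_bound}")]
    for i in range(1, partition_count + 1):
        start = lower_bound + (i - 1) * width
        end = lower_bound + i * width
        ranges.append((i, f"{start:.3f} <= value < {end:.3f}"))
    # Partition partition_count + 1 collects values at or above the range.
    ranges.append((partition_count + 1, f"value >= {upper_bound}"))
    return ranges

for number, description in bucket_ranges(2.5, 3.5, 3):
    print(number, description)
```

For bounds 2.5 and 3.5 with partition_count 3, this lists five partitions, numbered 0 through 4.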
NOTE:
The function accepts positional arguments only.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>>
# Import func from sqlalchemy to execute width_bucket() function.
>>> from sqlalchemy import func
# Create a sqlalchemy Function object and pass the Function object as input to DataFrame.assign().
>>> df = admissions_train.assign(bucket_gpa_ = func.Width_bucket(admissions_train.gpa.expression, 2.5, 3.5, 3))
>>> print(df)
masters gpa stats programming admitted bucket_gpa_
id
22 yes 3.46 Novice Beginner 0 3
36 no 3.00 Advanced Novice 0 2
15 yes 4.00 Advanced Advanced 1 4
38 yes 2.65 Advanced Beginner 1 1
5 no 3.44 Novice Novice 0 3
17 no 3.83 Advanced Advanced 1 4
34 yes 3.85 Advanced Beginner 0 4
13 no 4.00 Advanced Novice 1 4
26 yes 3.57 Advanced Advanced 1 4
19 yes 1.98 Advanced Advanced 0 0
>>>
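The bucket_gpa_ column above can be checked by hand with the ascending-range formula. The sketch below copies the id, gpa and bucket_gpa_ values from the output and recomputes each bucket in plain Python; expected_bucket is a hypothetical helper, and Python floats stand in for the database's numeric types.

```python
import math

# (id, gpa, bucket_gpa_) copied from the example output.
rows = [(22, 3.46, 3), (36, 3.00, 2), (15, 4.00, 4), (38, 2.65, 1), (5, 3.44, 3),
        (17, 3.83, 4), (34, 3.85, 4), (13, 4.00, 4), (26, 3.57, 4), (19, 1.98, 0)]
lower, upper, count = 2.5, 3.5, 3  # arguments used in the example call

def expected_bucket(gpa):
    """Ascending-range rule only (hypothetical helper)."""
    if gpa < lower:
        return 0
    if gpa >= upper:
        return count + 1
    return math.floor(count * (gpa - lower) / (upper - lower)) + 1

# Collect any row whose recomputed bucket differs from the reported one.
mismatches = [(row_id, gpa) for row_id, gpa, bucket in rows
              if expected_bucket(gpa) != bucket]
print(mismatches)  # an empty list means the arithmetic agrees with the output
```

For instance, gpa 3.00 gives 3 * (3.00 - 2.5) / (3.5 - 2.5) + 1 = 2.5, whose floor is 2, while gpa 1.98 lies below lower_bound and lands in partition 0.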
|