- hashrow(expression1, expression2, expressionN)
- DESCRIPTION:
Function returns the hexadecimal row hash value for an expression or sequence of
expressions. If no expression is specified, function returns the maximum hash
code value.
Function is particularly useful for identifying the statistical properties of the
current primary index, or to evaluate these properties for other columns to determine
their suitability as a future primary index. You can also use these statistics to
help minimize hash synonyms and enhance the uniformity of data distribution.
Hash codes available in the system range from '00000000'XB to 'FFFFFFFF'XB, that is,
from 0 to 4,294,967,295 (4,294,967,296 possible values).
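The arithmetic behind this range can be checked with a short plain-Python sketch. It does
not call the database, and the variable names are illustrative only.

```python
# Endpoints of the hash code range quoted above, parsed as hexadecimal.
lowest = int("00000000", 16)
highest = int("FFFFFFFF", 16)

# Number of values in the inclusive range [lowest, highest].
distinct_values = highest - lowest + 1

print(highest)          # 4294967295
print(distinct_values)  # 4294967296
```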
PARAMETERS:
expression1:
Optional Argument.
Specifies a ColumnExpression of a column that makes up a (potential) index.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
expression2:
Optional Argument.
Specifies a ColumnExpression of a column that makes up a (potential) index.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
expressionN:
Optional Argument.
Specifies a ColumnExpression of a column that makes up a (potential) index.
N here represents any number, i.e., the user can pass as many expressions
as required.
Format for the argument: '<dataframe>.<dataframe_column>.expression'.
Notes:
1. If no arguments are passed, then function returns the maximum hash code value.
2. If only one expression is passed and it evaluates to NULL, then function returns '00000000'XB.
3. If all expressions evaluate to NULL, then function returns '00000000'XB.
4. If an expression that evaluates to 0, empty string, whitespace, or a similar
value is passed, then function returns '00000000'XB.
5. If a valid, non-NULL expression is passed, then function evaluates expression or
the list of expressions and applies the hash function on the result. Function
returns the resulting row hash value.
6. If multiple expressions are passed, where some expressions can evaluate to NULL,
then function evaluates expression or the list of expressions and applies the hash
function on the result. Function returns the resulting row hash value.
NOTE:
Function accepts positional arguments only.
EXAMPLES:
# Load the data to run the example.
>>> load_example_data("dataframe", "admissions_train")
>>>
# Create a DataFrame on 'admissions_train' table.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
38 yes 2.65 Advanced Beginner 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
26 yes 3.57 Advanced Advanced 1
19 yes 1.98 Advanced Advanced 0
>>>
# Import func from sqlalchemy to execute hashrow() function.
>>> from sqlalchemy import func
# Example 1: Return the maximum hash code value. To do so, execute function without any argument.
# Create a sqlalchemy Function object.
>>> hashrow_func_ = func.hashrow()
>>>
# Pass the Function object as input to DataFrame.assign().
>>> df = admissions_train.assign(hashrow_max_=hashrow_func_)
>>> print(df)
masters gpa stats programming admitted hashrow_max_
id
5 no 3.44 Novice Novice 0 b'-1'
34 yes 3.85 Advanced Beginner 0 b'-1'
13 no 4.00 Advanced Novice 1 b'-1'
40 yes 3.95 Novice Beginner 0 b'-1'
22 yes 3.46 Novice Beginner 0 b'-1'
19 yes 1.98 Advanced Advanced 0 b'-1'
36 no 3.00 Advanced Novice 0 b'-1'
15 yes 4.00 Advanced Advanced 1 b'-1'
7 yes 2.33 Novice Novice 1 b'-1'
17 no 3.83 Advanced Advanced 1 b'-1'
>>>
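# The output above shows b'-1', while the description gives the maximum hash code as
# 'FFFFFFFF'XB. One plausible reading, which is an assumption and is not stated in this
# documentation, is that the client displays the 4-byte hash as a signed hexadecimal
# integer. Under that assumption, a minimal plain-Python sketch for recovering the
# 8-digit unsigned form could look like this. The helper name to_unsigned_hex is
# hypothetical and is not part of any library.

```python
def to_unsigned_hex(signed_hex: str) -> str:
    """Map a signed hex string such as '-1' to 8 unsigned hex digits."""
    # ASSUMPTION: the value is a 32-bit two's-complement integer printed in hex.
    value = int(signed_hex, 16)  # int() accepts a leading minus sign
    # Masking with 0xFFFFFFFF reinterprets the value as unsigned 32-bit.
    return format(value & 0xFFFFFFFF, "08X")


print(to_unsigned_hex("-1"))  # FFFFFFFF
```

# Values shown in the later examples can be passed to the same helper after dropping
# the b'...' wrapper.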
# Example 2: Return the hexadecimal row hash value corresponding to values in "stats" column.
# Create a sqlalchemy Function object.
>>> hashrow_func_ = func.hashrow(admissions_train.stats.expression)
>>>
# Pass the Function object as input to DataFrame.assign().
>>> df = admissions_train.assign(hashrow_stats_=hashrow_func_)
>>> print(df)
masters gpa stats programming admitted hashrow_stats_
id
5 no 3.44 Novice Novice 0 b'C8D3967'
34 yes 3.85 Advanced Beginner 0 b'-69A5305C'
13 no 4.00 Advanced Novice 1 b'-69A5305C'
40 yes 3.95 Novice Beginner 0 b'C8D3967'
22 yes 3.46 Novice Beginner 0 b'C8D3967'
19 yes 1.98 Advanced Advanced 0 b'-69A5305C'
36 no 3.00 Advanced Novice 0 b'-69A5305C'
15 yes 4.00 Advanced Advanced 1 b'-69A5305C'
7 yes 2.33 Novice Novice 1 b'C8D3967'
17 no 3.83 Advanced Advanced 1 b'-69A5305C'
>>>
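# The description mentions using row hash values to judge data distribution and hash
# synonyms. As a rough local sketch of that idea, the rows printed in Example 2 can be
# tallied in plain Python. The values are copied from the output above; the variable
# names are illustrative, and in practice this analysis would run over the whole table
# rather than a ten-row sample.

```python
from collections import Counter, defaultdict

# (id, stats, hashrow_stats_) triples copied from the Example 2 output,
# with the b'...' wrapper dropped.
rows = [
    (5, "Novice", "C8D3967"),
    (34, "Advanced", "-69A5305C"),
    (13, "Advanced", "-69A5305C"),
    (40, "Novice", "C8D3967"),
    (22, "Novice", "C8D3967"),
    (19, "Advanced", "-69A5305C"),
    (36, "Advanced", "-69A5305C"),
    (15, "Advanced", "-69A5305C"),
    (7, "Novice", "C8D3967"),
    (17, "Advanced", "-69A5305C"),
]

# Rows per hash value: a rough view of how evenly the column would distribute.
rows_per_hash = Counter(hash_value for _, _, hash_value in rows)

# Distinct column values behind each hash value. More than one distinct value
# for the same hash would indicate a hash synonym.
values_per_hash = defaultdict(set)
for _, stats, hash_value in rows:
    values_per_hash[hash_value].add(stats)
synonyms = {h: v for h, v in values_per_hash.items() if len(v) > 1}

print(rows_per_hash)
print(synonyms)
```

# For these ten rows, six share the hash of 'Advanced' and four share the hash of
# 'Novice', and no hash value is shared by two different "stats" values. A column with
# only two distinct values therefore maps to only two row hashes.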
# Example 3: Return the hexadecimal row hash value computed over multiple columns of the
# "admissions_train" DataFrame (all columns except "gpa").
# Create a sqlalchemy Function object.
>>> hashrow_func_ = func.hashrow(admissions_train.id.expression,
... admissions_train.masters.expression,
... admissions_train.stats.expression,
... admissions_train.programming.expression,
... admissions_train.admitted.expression)
>>>
# Pass the Function object as input to DataFrame.assign().
>>> df = admissions_train.assign(hashrow_all_col_=hashrow_func_)
>>> print(df)
masters gpa stats programming admitted hashrow_all_col_
id
5 no 3.44 Novice Novice 0 b'2EF3CB74'
34 yes 3.85 Advanced Beginner 0 b'-2B8F2EE7'
13 no 4.00 Advanced Novice 1 b'5EDB072F'
40 yes 3.95 Novice Beginner 0 b'-44A7BF48'
22 yes 3.46 Novice Beginner 0 b'11DFF574'
19 yes 1.98 Advanced Advanced 0 b'-2763FD49'
36 no 3.00 Advanced Novice 0 b'-178A3958'
15 yes 4.00 Advanced Advanced 1 b'682DBBCD'
7 yes 2.33 Novice Novice 1 b'4C9EA097'
17 no 3.83 Advanced Advanced 1 b'-3A2234F5'
>>>