Example setup
Create a Python 3.10 environment with the given name and description in Analytics Database. This example uses the 'admissions_train' dataset to increase the 'gpa' by a given percentage.
>>> env = create_env('testenv', 'python_3.10', 'Test environment')
User environment testenv created.

# Packages dill and pandas must be installed in the remote user environment.
>>> env.install_lib(['pandas','dill'])
Request to install libraries initiated successfully in the remote user environment testenv. Check the status using status() with the claim id 'ef255030-1be2-4d4a-9d47-12cd4365a003'.

# Check the status of installation.
>>> env.status('ef255030-1be2-4d4a-9d47-12cd4365a003')
                               Claim Id     File/Libs  Method Name     Stage             Timestamp  Additional Details
0  ef255030-1be2-4d4a-9d47-12cd4365a003  pandas, dill  install_lib   Started  2022-08-04T04:27:56Z
1  ef255030-1be2-4d4a-9d47-12cd4365a003  pandas, dill  install_lib  Finished  2022-08-04T04:29:12Z
>>>
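The status output above moves from 'Started' to 'Finished', so a script usually has to wait before calling apply(). The following is a minimal polling sketch, not part of the library: wait_for_stage and its get_stages argument are hypothetical names, and get_stages is assumed to be a small wrapper you write around env.status(claim_id) that returns the stage names reported so far. A stand-in is used here so the sketch runs without a database connection.

```python
import time


def wait_for_stage(get_stages, target="Finished", timeout=600.0, interval=5.0):
    """Poll get_stages() until `target` appears or `timeout` seconds elapse.

    get_stages is a hypothetical zero-argument callable that returns the
    stage names reported so far (for example, the values of the 'Stage'
    column taken from the status output).
    Returns True if the target stage was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        stages = list(get_stages())
        if target in stages:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for the real status call, for illustration only: it reports
# 'Started' on the first poll and adds 'Finished' on the second.
_responses = iter([["Started"], ["Started", "Finished"]])
finished = wait_for_stage(lambda: next(_responses), interval=0.01)
print(finished)
```

The timeout keeps the loop from running forever if the installation never reaches the target stage.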
# Load the example data.
>>> load_example_data("dataframe", "admissions_train")
>>> df = DataFrame('admissions_train')
>>> print(df)
    masters   gpa     stats  programming  admitted
id
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
15      yes  4.00  Advanced     Advanced         1
38      yes  2.65  Advanced     Beginner         1
5        no  3.44    Novice       Novice         0
17       no  3.83  Advanced     Advanced         1
34      yes  3.85  Advanced     Beginner         0
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
19      yes  1.98  Advanced     Advanced         0
Example 1: Create the user defined function to increase the 'gpa' by the percentage provided
Note that both the input to and the output from the function are pandas Series objects.
>>> def increase_gpa(row, p=20):
...     row['gpa'] = row['gpa'] + row['gpa'] * p/100
...     return row
...
>>>
# Apply the user defined function to the DataFrame.
# Note that since the user defined function returns the same
# columns with the same types as its input, we can skip
# passing the 'returns' argument.
>>> increase_gpa_20 = df.apply(increase_gpa, env_name='testenv')
>>>
# Print the result.
>>> print(increase_gpa_20)
    masters    gpa     stats  programming  admitted
id
22      yes  4.152    Novice     Beginner         0
36       no  3.600  Advanced       Novice         0
15      yes  4.800  Advanced     Advanced         1
38      yes  3.180  Advanced     Beginner         1
5        no  4.128    Novice       Novice         0
17       no  4.596  Advanced     Advanced         1
34      yes  4.620  Advanced     Beginner         0
13       no  4.800  Advanced       Novice         1
26      yes  4.284  Advanced     Advanced         1
19      yes  2.376  Advanced     Advanced         0
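The arithmetic of the user defined function can be checked locally before it is sent to the remote environment. The sketch below is an illustration only: plain dicts stand in for the pandas Series rows (an assumption made so the example needs no database or extra packages), and the two rows are copied from the 'admissions_train' output above.

```python
def increase_gpa(row, p=20):
    # Same body as the user defined function in Example 1.
    row['gpa'] = row['gpa'] + row['gpa'] * p/100
    return row


# Plain dicts stand in for the pandas Series rows (illustration only).
rows = [
    {'id': 22, 'masters': 'yes', 'gpa': 3.46, 'stats': 'Novice',
     'programming': 'Beginner', 'admitted': 0},
    {'id': 36, 'masters': 'no', 'gpa': 3.00, 'stats': 'Advanced',
     'programming': 'Novice', 'admitted': 0},
]

# The function modifies its argument in place, so pass a copy of each row
# to leave the original data untouched.
results = [increase_gpa(dict(r)) for r in rows]

for before, after in zip(rows, results):
    print(before['id'], before['gpa'], '->', round(after['gpa'], 3))
```

The function returns every key it received, which mirrors the point that the output has the same columns as the input. The rounded values can be compared with the rows for ids 22 and 36 in the printed result above.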
Example 2: Use the same user defined function with lambda notation to pass the percentage, 'p = 40'
>>> increase_gpa_40 = df.apply(lambda row: increase_gpa(row,
...                                                    p = 40),
...                            env_name='testenv')
>>>
>>> print(increase_gpa_40)
    masters    gpa     stats  programming  admitted
id
22      yes  4.844    Novice     Beginner         0
36       no  4.200  Advanced       Novice         0
15      yes  5.600  Advanced     Advanced         1
38      yes  3.710  Advanced     Beginner         1
5        no  4.816    Novice       Novice         0
17       no  5.362  Advanced     Advanced         1
34      yes  5.390  Advanced     Beginner         0
13       no  5.600  Advanced       Novice         1
26      yes  4.998  Advanced     Advanced         1
19      yes  2.772  Advanced     Advanced         0
Example 3: Use the same user defined function with functools.partial to pass the percentage, 'p = 50'
>>> from functools import partial
>>> increase_gpa_50 = df.apply(partial(increase_gpa, p = 50),
...                            env_name='testenv')
>>>
>>> print(increase_gpa_50)
    masters    gpa     stats  programming  admitted
id
13       no  6.000  Advanced       Novice         1
26      yes  5.355  Advanced     Advanced         1
5        no  5.160    Novice       Novice         0
19      yes  2.970  Advanced     Advanced         0
15      yes  6.000  Advanced     Advanced         1
40      yes  5.925    Novice     Beginner         0
7       yes  3.495    Novice       Novice         1
22      yes  5.190    Novice     Beginner         0
36       no  4.500  Advanced       Novice         0
38      yes  3.975  Advanced     Beginner         1
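Examples 2 and 3 are two ways of fixing the keyword argument p so that the result is a callable taking only the row. The local sketch below suggests that both forms compute the same value; the dict row is a trimmed stand-in for a pandas Series and is an assumption made for illustration.

```python
from functools import partial


def increase_gpa(row, p=20):
    # Same body as the user defined function in Example 1.
    row['gpa'] = row['gpa'] + row['gpa'] * p/100
    return row


# Trimmed stand-in for one row (illustration only).
row = {'id': 13, 'gpa': 4.00}

# Two ways to bind p=50 and obtain a one-argument callable.
via_lambda = lambda r: increase_gpa(r, p=50)
via_partial = partial(increase_gpa, p=50)

# Pass copies, since increase_gpa modifies its argument in place.
a = via_lambda(dict(row))
b = via_partial(dict(row))
print(a['gpa'], b['gpa'])

# A partial object records the bound keyword, which can help when debugging.
print(via_partial.keywords)
```

Which form to use is largely a matter of taste: partial keeps the bound arguments inspectable, while a lambda can also reorder or transform arguments.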
Example 4: Use a lambda function to double 'gpa' and return a numpy ndarray
>>> from numpy import asarray
>>> inc_gpa_lambda = lambda row, p=20: asarray([row['id'],
...                                             row['masters'],
...                                             row['gpa'] + row['gpa'] * p/100,
...                                             row['stats'],
...                                             row['programming'],
...                                             row['admitted']])
>>> increase_gpa_100 = df.apply(lambda row: inc_gpa_lambda(row,
...                                                        p=100),
...                             env_name='testenv')
>>>
>>> print(increase_gpa_100)
    masters   gpa     stats  programming  admitted
id
13       no  8.00  Advanced       Novice         1
26      yes  7.14  Advanced     Advanced         1
5        no  6.88    Novice       Novice         0
19      yes  3.96  Advanced     Advanced         0
15      yes  8.00  Advanced     Advanced         1
40      yes  7.90    Novice     Beginner         0
7       yes  4.66    Novice       Novice         1
22      yes  6.92    Novice     Beginner         0
36       no  6.00  Advanced       Novice         0
38      yes  5.30  Advanced     Beginner         1
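An ndarray carries no column names, so the values in Example 4 are listed in a fixed order. The sketch below shows only what NumPy does locally with such a mixed list. How the library maps the array back to typed columns is not shown in this section, so the positional pairing with column names here is an assumption, and the dict row is a stand-in for a pandas Series.

```python
import numpy as np

# Column order assumed to match the order used inside the lambda.
columns = ['id', 'masters', 'gpa', 'stats', 'programming', 'admitted']

# Stand-in for one row (illustration only).
row = {'id': 13, 'masters': 'no', 'gpa': 4.00, 'stats': 'Advanced',
       'programming': 'Novice', 'admitted': 1}

inc_gpa_lambda = lambda row, p=20: np.asarray([row['id'],
                                               row['masters'],
                                               row['gpa'] + row['gpa'] * p/100,
                                               row['stats'],
                                               row['programming'],
                                               row['admitted']])

out = inc_gpa_lambda(row, p=100)

# Mixing numbers and strings makes NumPy build a single string-typed array,
# so the numeric values are held as text inside the array.
print(out.dtype.kind)

# Pair each position with its assumed column name to inspect the result.
print(dict(zip(columns, out)))
```

Because the array holds every value as a string, the numeric types seen in the printed result have to be restored outside the lambda; the order of the values is what ties each one to its column.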