Example 1: Create a UDF that adds the values in column 'Jan' to the values in column 'Feb' and stores the result in an INTEGER type column
>>> from teradatasqlalchemy.types import INTEGER
>>> from teradataml.dataframe.functions import udf
>>> @udf(returns=INTEGER())
... def sum(x, y):
...     if not x:
...         x = 0
...     return x + y
...
>>>
>>> # Assign the Column Expression returned by user defined function
>>> # to the DataFrame.
>>> res = df.assign(len_sum = sum('Jan', 'Feb'))
>>> res
              Feb    Jan    Mar    Apr  datetime  len_sum
accounts
Alpha Co    210.0  200.0  215.0  250.0  17/01/04      410
Blue Inc     90.0   50.0   95.0  101.0  17/01/04      140
Yellow Inc   90.0    NaN    NaN    NaN  17/01/04       90
Jones LLC   200.0  150.0  140.0  180.0  17/01/04      350
Orange Inc  210.0    NaN    NaN  250.0  17/01/04      210
Red Inc     200.0  150.0  140.0    NaN  17/01/04      350
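The addition logic of Example 1 can be checked locally before it is registered as a UDF. The following is a minimal sketch in plain Python with no database connection. The name add_months is hypothetical, and the sketch assumes that a NULL column value reaches the function as None, which is not confirmed here.

```python
def add_months(x, y):
    """Add two monthly values, treating a missing value as 0."""
    # Assumption: a NULL in the column arrives as None.
    x = 0 if x is None else x
    y = 0 if y is None else y
    # Cast to int so the result fits an INTEGER return type.
    return int(x + y)


print(add_months(200.0, 210.0))  # 410
print(add_months(None, 90.0))    # 90
```

Unlike the function in Example 1, this variant also guards the second argument. That is a design choice for the sketch, not something the example requires.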
Example 2: Create a function that converts the values in 'accounts' to upper case and pass it to udf() as a parameter to create a UDF
>>> from teradataml.dataframe.functions import udf
>>> def to_upper(s):
...     if s is not None:
...         return s.upper()
...
>>> upper_case = udf(to_upper)
>>>
>>> # Assign the Column Expression returned by user defined function
>>> # to the DataFrame.
>>> res = df.assign(upper_stats = upper_case('accounts'))
>>> res
              Feb    Jan    Mar    Apr  datetime  upper_stats
accounts
Alpha Co    210.0  200.0  215.0  250.0  17/01/04     ALPHA CO
Blue Inc     90.0   50.0   95.0  101.0  17/01/04     BLUE INC
Yellow Inc   90.0    NaN    NaN    NaN  17/01/04   YELLOW INC
Jones LLC   200.0  150.0  140.0  180.0  17/01/04    JONES LLC
Orange Inc  210.0    NaN    NaN  250.0  17/01/04   ORANGE INC
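Because udf() in Example 2 wraps an ordinary Python function, the function can be exercised on its own first. The sketch below repeats to_upper in plain Python and makes the None case explicit. It runs locally and does not call teradataml.

```python
def to_upper(s):
    """Return the upper-case form of s, or None when s is missing."""
    if s is not None:
        return s.upper()
    # Made explicit here; the original function returns None implicitly.
    return None


print(to_upper("Alpha Co"))  # ALPHA CO
print(to_upper(None))        # None
```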
Example 3: Create a UDF that adds 4 days to the 'datetime' column and stores the result in a DATE type column
When working with date and time data types, format the values in one of the supported formats. See Requisite Input and Output Structures in Open Analytics Framework for more details.
>>> from teradataml.dataframe.functions import udf
>>> from teradatasqlalchemy.types import DATE
>>> @udf(returns=DATE())
... def add_date(x, y):
...     import datetime
...     return (datetime.datetime.strptime(x, "%y/%m/%d")+datetime.timedelta(y)).strftime("%y/%m/%d")
...
>>>
>>> # Assign the Column Expression returned by user defined function
>>> # to the DataFrame.
>>> res = df.assign(new_date = add_date('datetime', 4))
>>> res
              Feb    Jan    Mar    Apr  datetime  new_date
accounts
Alpha Co    210.0  200.0  215.0  250.0  17/01/04  17/01/08
Blue Inc     90.0   50.0   95.0  101.0  17/01/04  17/01/08
Jones LLC   200.0  150.0  140.0  180.0  17/01/04  17/01/08
Orange Inc  210.0    NaN    NaN  250.0  17/01/04  17/01/08
Yellow Inc   90.0    NaN    NaN    NaN  17/01/04  17/01/08
Red Inc     200.0  150.0  140.0    NaN  17/01/04  17/01/08
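The parse, shift, and format steps used in Example 3 can be tried in plain Python to confirm the "%y/%m/%d" round trip. This is a minimal local sketch. The name shift_date and its fmt parameter are hypothetical and are not part of teradataml.

```python
import datetime


def shift_date(text, days, fmt="%y/%m/%d"):
    """Parse text with fmt, add the given number of days, and format it back."""
    parsed = datetime.datetime.strptime(text, fmt)
    shifted = parsed + datetime.timedelta(days=days)
    return shifted.strftime(fmt)


print(shift_date("17/01/04", 4))  # 17/01/08
print(shift_date("17/01/30", 4))  # 17/02/03
```

Returning a string in the same format as the input keeps the value in a form the example already uses for the DATE column.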
Example 4: Create a user defined function 'to_upper' that converts the values in the 'accounts' column to upper case, using a UDF that runs in a non-default environment
Create a Python 3.10.5 environment with the given name and description in the Analytics Database.
>>> from teradataml import create_env
>>> env = create_env('test_udf', 'python_3.10.5', 'Test environment for UDF')
User environment 'test_udf' created.
>>>
>>> # Create a user defined function 'to_upper' to get the values in upper case
>>> # and pass the user environment to run it in.
>>> from teradataml.dataframe.functions import udf
>>> @udf(env_name = env)
... def to_upper(s):
...     if s is not None:
...         return s.upper()
...
>>>
>>> # Assign the Column Expression returned by user defined function
>>> # to the DataFrame.
>>> df.assign(upper_stats = to_upper('accounts'))
              Feb    Jan    Mar    Apr  datetime  upper_stats
accounts
Alpha Co    210.0  200.0  215.0  250.0  17/01/04     ALPHA CO
Blue Inc     90.0   50.0   95.0  101.0  17/01/04     BLUE INC
Yellow Inc   90.0    NaN    NaN    NaN  17/01/04   YELLOW INC
Jones LLC   200.0  150.0  140.0  180.0  17/01/04    JONES LLC
Orange Inc  210.0    NaN    NaN  250.0  17/01/04   ORANGE INC
Red Inc     200.0  150.0  140.0    NaN  17/01/04      RED INC
Example 5: Create a UDF with required functions inside the UDF itself
Define a function 'inner_add_date' inside the UDF that creates a date object from the year, month, and day passed to it and adds 1 day to that date. Call this function inside the UDF.
>>> from teradataml.dataframe.functions import udf
>>> @udf
... def add_date(y,m,d):
...     import datetime
...     def inner_add_date(y,m,d):
...         return datetime.date(y,m,d) + datetime.timedelta(1)
...     return inner_add_date(y,m,d)
...
>>> # Assign the Column Expression returned by user defined function
>>> # to the DataFrame.
>>> res = df.assign(new_date = add_date(2021, 10, 5))
>>> res
              Feb    Jan    Mar    Apr  datetime    new_date
accounts
Jones LLC   200.0  150.0  140.0  180.0  17/01/04  2021-10-06
Blue Inc     90.0   50.0   95.0  101.0  17/01/04  2021-10-06
Yellow Inc   90.0    NaN    NaN    NaN  17/01/04  2021-10-06
Orange Inc  210.0    NaN    NaN  250.0  17/01/04  2021-10-06
Alpha Co    210.0  200.0  215.0  250.0  17/01/04  2021-10-06
Red Inc     200.0  150.0  140.0    NaN  17/01/04  2021-10-06
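The nested-helper pattern in Example 5 keeps everything the function needs, including the import and the helper, inside one function body. The sketch below shows the same pattern in plain Python so that the date arithmetic can be verified locally. The name add_one_day is hypothetical, and the sketch does not use the udf decorator.

```python
def add_one_day(y, m, d):
    """Build a date from year, month, and day, then return the next day."""
    # Import inside the function so the body is self-contained.
    import datetime

    def inner_add_date(y, m, d):
        # Helper defined inside the outer function, as in Example 5.
        return datetime.date(y, m, d) + datetime.timedelta(days=1)

    return inner_add_date(y, m, d)


print(add_one_day(2021, 10, 5))   # 2021-10-06
print(add_one_day(2021, 12, 31))  # 2022-01-01
```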