Example 2: Data Normalization with the DataFrame.apply Method (Open Analytics Framework)

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11

In this example, the "data" dictionary contains lists of numbers that become the columns X and Y in the uploaded table and in the subsequent "normalize_test" teradataml DataFrame.

  1. Define the "data" dictionary.
    data = { 
            'X':[1, 2, 3],  
            'Y':[45, 65, 89] } 
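  2. Convert the dictionary into a pandas DataFrame and copy it to the "normalize_test" table.
    import pandas as pd
    from teradataml import copy_to_sql
    df = pd.DataFrame(data)
    copy_to_sql(df=df, table_name='normalize_test', if_exists='replace')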
  3. Print the original DataFrame.
    print("Original DataFrame:\n", df)

    Out:

    Original DataFrame:
        X   Y
    0   1  45
    1   2  65
    2   3  89
  4. Create a teradataml DataFrame from the uploaded table.
    from teradataml import DataFrame
    normalize_test = DataFrame.from_table("normalize_test")
  5. Define a function "normalize" that is applied to each row and normalizes its X value.
    import numpy as np
    def normalize(row):
        # (X - mean(X, Y)) / (max(X, Y) - min(X, Y))
        x_new = ((row['X'] - np.mean([row['X'], row['Y']])) /
                 (max(row['X'], row['Y']) - min(row['X'], row['Y'])))
        # The second value is row['X']; step 6 labels it 'Y' in the output.
        return np.asarray([x_new, row['X']])
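  6. Call the teradataml.DataFrame.apply method on the "normalize_test" teradataml DataFrame with the function built in the previous step. The "returns" argument declares the output column names and types, and "testenv" is a user environment that is not defined in this example. The results are stored in the "output" variable.
    from collections import OrderedDict
    from teradatasqlalchemy.types import FLOAT, INTEGER
    output = normalize_test.apply(normalize, env_name=testenv,
                                  returns=OrderedDict([('X_NEW', FLOAT()),
                                                       ('Y', INTEGER())]))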
  7. Print the normalized data.
    print('\nNormalized:\n', output)

    Out:

    Normalized:
       X_NEW  Y
    0   -0.5  1
    1   -0.5  2
    2   -0.5  3
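
Every X_NEW value in the output is -0.5. This follows from the formula in "normalize": when X is less than Y, the numerator X - (X + Y)/2 equals (X - Y)/2 and the denominator max - min equals Y - X, so the ratio is -0.5 regardless of the actual values. The following is a minimal sketch for checking that arithmetic locally. It assumes plain pandas and NumPy only and does not connect to Vantage or call teradataml.DataFrame.apply; the name "local_output" and the use of pandas DataFrame.apply with axis=1 are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd

def normalize(row):
    # Same arithmetic as the example: (X - mean(X, Y)) / (max(X, Y) - min(X, Y))
    x_new = ((row['X'] - np.mean([row['X'], row['Y']])) /
             (max(row['X'], row['Y']) - min(row['X'], row['Y'])))
    # As in the example, the second value is row['X'], reported under the name 'Y'.
    return pd.Series({'X_NEW': x_new, 'Y': row['X']})

# Same input data as the "data" dictionary in the example.
df = pd.DataFrame({'X': [1, 2, 3], 'Y': [45, 65, 89]})

# Local stand-in (assumption): pandas applies the function row by row.
local_output = df.apply(normalize, axis=1)
print(local_output)
```

Because each row is returned as a single pandas Series, the local "Y" column is printed as floating-point values, whereas the teradataml call declares it as INTEGER through the "returns" argument.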