Teradata Package for Python Function Reference | 17.10 - LabelEncoder - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product: Teradata Package for Python
Release Number: 17.10
Published: April 2022
Language: English (United States)
Last Update: 2022-08-19
Product Category: Teradata Vantage

teradataml.analytics.Transformations.LabelEncoder.__init__ = __init__(self, values, columns, default=None, out_columns=None, datatype=None, fillna=None):
    DESCRIPTION:
        Label encoding a categorical data column is done to re-express
        existing values of a column (variable) into a new coding scheme, or
        to correct data quality problems and focus an analysis on a
        particular value. It allows for mapping individual values, NULL
        values, or any number of remaining values (ELSE option) to a new
        value, a NULL value, or the same value.
        Label encoding supports character, numeric, and date type columns.
        Note:
            Output of this function is passed to "label_encode" argument of
            "Transform" function from Vantage Analytic Library.

    PARAMETERS:
        values:
            Required Argument.
            Specifies the values to be label encoded. Values can be specified
            in two formats:
                1. A list of two-tuples, where the first value in the tuple
                   is an old value and the second value is a new value.
                   For example,
                       values = [(old_val1, new_val1), (old_val2, new_val2)]
                2. A dictionary with key as old value and value as new value.
                   For example,
                       values = {old_val1: new_val1, old_val2: new_val2}
            Note:
                1. If date values are entered as strings, the keyword 'DATE'
                   must precede the date value, and the date value must not
                   be enclosed in single quotes. Alternatively, pass a
                   datetime.date object.
                   For example,
                       value='DATE 1987-06-09'
                       value=date(1987, 6, 9)
                2. To keep the old value as is, pass 'same' as its new value.
                3. To use NULL for an old or new value, pass either the
                   string 'null' or None.
            Types: two-tuple, list of two-tuples, dict

        columns:
            Required Argument.
            Specifies the names of the columns containing values to be
            label encoded.
            Types: str or list of str

        default:
            Optional Argument.
            Specifies the value assumed for all other cases.
            Permitted Values: None, 'SAME', 'NULL', a literal
            Default Value: None
            Types: bool, float, int, str

        out_columns:
            Optional Argument.
            Specifies the names of the output columns.
            Value passed to this argument also plays a crucial role in
            determining the output column name.
            Note:
                Number of elements in "columns" and "out_columns" must be
                the same.
            Types: str or list of str

        datatype:
            Optional Argument.
            Specifies the name of the intended datatype of the output column.
            Intended data types for the output column can be specified using
            either the teradatasqlalchemy types or the permitted strings
            mentioned below:
            ------------------------------------------------------------------
            | If intended SQL Data Type is | Permitted Value to be passed is |
            |------------------------------|---------------------------------|
            | bigint                       | bigint                          |
            | byteint                      | byteint                         |
            | char(n)                      | char,n                          |
            | date                         | date                            |
            | decimal(m,n)                 | decimal,m,n                     |
            | float                        | float                           |
            | integer                      | integer                         |
            | number(*)                    | number                          |
            | number(n)                    | number,n                        |
            | number(*,n)                  | number,*,n                      |
            | number(n,n)                  | number,n,n                      |
            | smallint                     | smallint                        |
            | time(p)                      | time,p                          |
            | timestamp(p)                 | timestamp,p                     |
            | varchar(n)                   | varchar,n                       |
            ------------------------------------------------------------------
            Notes:
                1. Argument is ignored if "columns" argument is not used.
                2. char without a size is not supported.
                3. number(*) does not include the * in its datatype format.
            Examples:
                1. If intended datatype for the output column is "bigint",
                   then pass string "bigint" to the argument as shown below:
                       datatype="bigint"
                2. If intended datatype for the output column is
                   "decimal(3,5)", then pass string "decimal,3,5" to the
                   argument as shown below:
                       datatype="decimal,3,5"
            Types: str, BIGINT, BYTEINT, CHAR, DATE, DECIMAL, FLOAT, INTEGER,
                   NUMBER, SMALLINT, TIME, TIMESTAMP, VARCHAR.

        fillna:
            Optional Argument.
            Specifies whether the null replacement/missing value treatment
            should be performed with recoding or not. Output of FillNa() can
            be passed to this argument.
            Note:
                If the FillNa object is created with its arguments "columns",
                "out_columns" and "datatype", then values passed in FillNa()
                arguments are ignored.
                Only nullstyle information is captured from the FillNa object.
            Types: FillNa

    RETURNS:
        An instance of LabelEncoder class.

    RAISES:
        TeradataMlException, TypeError, ValueError

    EXAMPLE:
        # Note:
        #   To run any transformation, user needs to use Transform() function
        #   from Vantage Analytic Library.
        #   To do so, import valib first and set the "val_install_location".
        >>> from teradataml import configure, DataFrame, LabelEncoder, FillNa, load_example_data, valib
        >>> configure.val_install_location = "SYSLIB"
        >>>

        # Load example data.
        >>> load_example_data("dataframe", "admissions_train")
        >>>

        # Create the required DataFrame.
        >>> admissions_train = DataFrame("admissions_train")
        >>> admissions_train
           masters   gpa     stats programming  admitted
        id
        13      no  4.00  Advanced      Novice         1
        26     yes  3.57  Advanced    Advanced         1
        5       no  3.44    Novice      Novice         0
        19     yes  1.98  Advanced    Advanced         0
        15     yes  4.00  Advanced    Advanced         1
        40     yes  3.95    Novice    Beginner         0
        7      yes  2.33    Novice      Novice         1
        22     yes  3.46    Novice    Beginner         0
        36      no  3.00  Advanced      Novice         0
        38     yes  2.65  Advanced    Beginner         1
        >>>

        # Example 1: Recode all values 'Novice', 'Advanced', and 'Beginner'
        #            in "programming" and "stats" columns.
        #            The mapping is passed to the "values" argument as a dictionary.
        >>> rc = LabelEncoder(values={"Novice": 1, "Advanced": 2, "Beginner": 3},
        ...                   columns=["stats", "programming"])

        # Execute Transform() function.
        >>> obj = valib.Transform(data=admissions_train, label_encode=rc)
        >>> obj.result
           id  stats  programming
        0  22      1            3
        1  36      2            1
        2  15      2            2
        3  38      2            3
        4   5      1            1
        5  17      2            2
        6  34      2            3
        7  13      2            1
        8  26      2            2
        9  19      2            2
        >>>

        # Example 2: Recode value 'Novice' as 1, passed as a tuple to the
        #            "values" argument, and label encode all other values as 0
        #            by passing 0 to the "default" argument, in "programming"
        #            and "stats" columns.
        >>> rc = LabelEncoder(values=("Novice", 1), columns=["stats", "programming"],
        ...                   default=0)

        # Execute Transform() function.
        >>> obj = valib.Transform(data=admissions_train, label_encode=rc)
        >>> obj.result
           id  stats  programming
        0  15      0            0
        1   7      1            1
        2  22      1            0
        3  17      0            0
        4  13      0            1
        5  38      0            0
        6  26      0            0
        7   5      1            1
        8  34      0            0
        9  40      1            0
        >>>

        # Example 3: In this example we encode values differently for
        #            multiple columns.

        # For values in "programming" column, recoding will be done as follows:
        #   Novice             --> 0
        #   Advanced           --> 1 and
        #   Rest of the values --> NULL
        >>> rc_prog = LabelEncoder(values=[("Novice", 0), ("Advanced", 1)],
        ...                        columns="programming", default=None)
        >>>

        # For values in "stats" column, recoding will be done as follows:
        #   Novice   --> 0
        #   Advanced --> keep it as is and
        #   Beginner --> NULL
        >>> rc_stats = LabelEncoder(values={"Novice": 0, "Advanced": "same", "Beginner": None},
        ...                         columns="stats")
        >>>

        # For values in "masters" column, recoding will be done as follows:
        #   yes --> 1 and other as 0
        >>> rc_yes = LabelEncoder(values=("yes", 1), columns="masters", default=0,
        ...                       out_columns="masters_yes")
        >>>

        # For values in "masters" column, label encoding will be done as follows:
        #   no --> 1 and other as 0
        >>> rc_no = LabelEncoder(values=("no", 1), columns="masters", default=0,
        ...                      out_columns="masters_no")
        >>>

        # Execute Transform() function.
        >>> obj = valib.Transform(data=admissions_train,
        ...                       label_encode=[rc_prog, rc_stats, rc_yes, rc_no])
        >>> obj.result
           id programming     stats  masters_yes  masters_no
        0  13           0  Advanced            0           1
        1  26           1  Advanced            1           0
        2   5           0         0            0           1
        3  19           1  Advanced            1           0
        4  15           1  Advanced            1           0
        5  40        None         0            1           0
        6   7           0         0            1           0
        7  22        None         0            1           0
        8  36           0  Advanced            0           1
        9  38        None  Advanced            1           0
        >>>
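
The recoding rules described under "values" and "default" can be illustrated without a Vantage connection. The following is a minimal plain-Python sketch of those rules as this page describes them; it is not teradataml's implementation and does not call Transform(). The helper name recode() is hypothetical, and treating the 'same' and 'null' keywords case-insensitively is an assumption made for the sketch. Matching an old value written as the string 'null' is not modeled.

```python
from typing import Any, Hashable, Mapping


def recode(value: Hashable, mapping: Mapping[Hashable, Any], default: Any = None) -> Any:
    """Hypothetical helper: apply the documented recoding rules to one value.

    - A value found in `mapping` is replaced by its mapped new value.
    - Any other value is replaced by `default`.
    - A new value of None or 'null' stands for NULL (returned as None).
    - A new value of 'same' keeps the old value unchanged.
    """

    def resolve(new_value: Any, old_value: Any) -> Any:
        if new_value is None:
            return None
        if isinstance(new_value, str):
            # Assumption: keywords are matched case-insensitively.
            keyword = new_value.lower()
            if keyword == "null":
                return None
            if keyword == "same":
                return old_value
        return new_value

    if value in mapping:
        return resolve(mapping[value], value)
    return resolve(default, value)


# Mirrors the "stats" mapping used in Example 3 above.
stats_mapping = {"Novice": 0, "Advanced": "same", "Beginner": None}
encoded = [recode(v, stats_mapping) for v in ["Novice", "Advanced", "Beginner"]]
print(encoded)  # prints [0, 'Advanced', None]
```

With default=0, as in Example 2, recode("Beginner", {"Novice": 1}, default=0) returns 0, while the default of None leaves unmatched values as NULL, matching the "programming" column in Example 3.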