Teradata® Package for R Function Reference | 17.00 - td_text_analyzer_valib

Product: Teradata Package for R
Release Number: 17.00
Published: July 2021
Last Update: August 2022
Content Type: Programming Reference
Language: English (United States)
Descriptive Statistics Function: Text Field Analyzer


Description

When working with character data, it is useful to determine the underlying data type and what data can be stored in the database. td_text_analyzer_valib() analyzes character data and determines whether each field holds numeric, date, time, timestamp, or character data. It returns two results: one containing the analysis results, and a second containing the column data type matrix, which records the progression of each column's data type through the series of steps described below.

The function runs a series of tests to determine the correct underlying type of each selected column.

  1. The first test performed on the column is the MIN and the MAX test. The MIN and MAX values of a column are retrieved from the database and tested to determine what type the values are.

  2. The next test is a sample test, which retrieves a small sample of data for each column and again assesses what type the values should be.

  3. The next test, extended Numeric analysis, is for fields that were determined to be numeric and it tries to classify them in a more specific category if possible. For instance, a column that is considered a FLOAT type after the first two tests might really be a DECIMAL type with 2 decimal places.

  4. In the next test, a date type is validated to make sure all values in that column are truly dates.

  5. If requested, Unicode columns are tested to see if they contain only Latin characters. This test is called extended Unicode analysis.

The function can be used on columns of any character data type. Columns of non-character data type are passed along to the output as defined in the input data.
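The first, second, and fourth tests above can be illustrated locally in base R. The following is only a sketch of the idea and not how Vantage Analytic Library implements it: guess_type() and type_progression() are hypothetical helpers, the patterns they use are assumptions, and the "sample" is simply the first few values so that the result is reproducible.

```r
# Local base-R illustration; this is NOT the Vantage Analytic Library implementation.

# Hypothetical helper: guess a single type that fits every string in x.
guess_type <- function(x) {
  if (all(grepl("^[+-]?[0-9]+$", x))) return("INTEGER")
  if (all(grepl("^[+-]?[0-9]*\\.?[0-9]+$", x))) return("FLOAT")
  if (all(grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x))) return("DATE")
  "CHARACTER"
}

# Hypothetical helper: record the guessed type after each step for one column.
type_progression <- function(x, sample_size = 3) {
  x <- x[!is.na(x)]
  extremes <- c(min(x), max(x))
  # Step 1: MIN and MAX test.
  after_min_max <- guess_type(extremes)
  # Step 2: sample test (first values instead of a random sample).
  after_sample <- guess_type(c(extremes, head(x, sample_size)))
  # Step 4: a DATE guess is kept only if every value parses as a calendar date.
  after_validation <- after_sample
  if (after_sample == "DATE") {
    parsed <- as.Date(x, format = "%Y-%m-%d")
    if (anyNA(parsed)) after_validation <- "CHARACTER"
  }
  c(min_max = after_min_max, sample = after_sample, validated = after_validation)
}

# "2021-02-30" looks like a date but is not a valid calendar date.
dates <- type_progression(c("2021-01-15", "2021-02-30", "2021-03-01"))
# MIN ("100") and MAX ("300") look numeric, but the sample contains "2x0".
mixed <- type_progression(c("100", "2x0", "300"))
print(dates)
print(mixed)
```

Each returned vector plays a role similar to one row of a data type matrix: it shows how the guessed type of a column can change from one step to the next.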

  • This function is available in Vantage Analytic Library or later.


Usage

td_text_analyzer_valib(data, columns, ...)



Arguments

data
Required Argument.
Specifies the input data on which to perform text analysis.
Types: tbl_teradata


columns
Required Argument.
Specifies the name(s) of the column(s) to analyze. It also accepts the pre-defined strings listed below to select all columns or all character columns.
Permitted Values:

  1. Name(s) of the column(s) in "data".

  2. Pre-defined strings:

    1. 'all' - all columns

    2. 'allcharacter' - all character columns

Types: character OR vector of Strings (character)


...
Specifies other arguments supported by the function, as described in the 'Other Arguments' section.


Value

The function returns an object of class "td_text_analyzer_valib", which is a named list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator using the following names:

  1. result

  2. data.type.matrix

Other Arguments


exclude.columns
Optional Argument.
Specifies the name(s) of the column(s) to exclude from the analysis when a column specifier such as 'all' or 'allcharacter' is used in the "columns" argument.
Types: character OR vector of Strings (character)


analyze.numerics
Optional Argument.
Specifies whether to process specific numeric types. If TRUE, the function processes numeric types.
Default Value: TRUE
Types: logical
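The idea behind extended Numeric analysis, where a column first considered FLOAT turns out to fit a DECIMAL type with a fixed number of decimal places, can be sketched locally in base R. This is an illustration under stated assumptions, not the library's algorithm: refine_numeric() is a hypothetical helper, and it assumes the values are already known to be numeric and are written in plain decimal notation (no exponents).

```r
# Local base-R illustration; this is NOT the Vantage Analytic Library implementation.

# Hypothetical helper: suggest a narrower type for strings already known to be numeric.
refine_numeric <- function(x) {
  x <- trimws(x)
  if (all(grepl("^[+-]?[0-9]+$", x))) return("INTEGER")
  # Drop any sign, then split each value into integer and fractional digits.
  digits <- sub("^[+-]", "", x)
  parts <- strsplit(digits, ".", fixed = TRUE)
  int_len <- max(vapply(parts, function(p) nchar(p[1]), integer(1)))
  frac_len <- max(vapply(parts,
                         function(p) if (length(p) > 1) nchar(p[2]) else 0L,
                         integer(1)))
  # Precision is the total digit count; scale is the fractional digit count.
  sprintf("DECIMAL(%d,%d)", int_len + frac_len, frac_len)
}

prices <- refine_numeric(c("12.50", "3.75", "100.00"))
counts <- refine_numeric(c("1", "22", "-333"))
print(prices)
print(counts)
```

Here the widest integer part has three digits and the widest fractional part has two, so the sketch suggests a precision of 5 and a scale of 2 for the first vector.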


analyze.unicode
Optional Argument.
Specifies whether to check if a column declared to contain Unicode characters actually contains only Latin characters. If TRUE, extended Unicode analysis is performed.
Default Value: FALSE
Types: logical
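A comparable check can be written locally in base R to show what "contains only Latin characters" might mean in practice. This is a sketch, not the library's test: is_latin_only() is a hypothetical helper, and it assumes that "Latin" means code points up to U+00FF (the Latin-1 range), which may not match the Teradata LATIN server character set exactly.

```r
# Local base-R illustration; this is NOT the Vantage Analytic Library implementation.

# Hypothetical helper: TRUE when every character of every string has a
# code point in the range U+0000 to U+00FF (assumed meaning of "Latin").
is_latin_only <- function(x) {
  all(vapply(enc2utf8(x),
             function(s) all(utf8ToInt(s) <= 255L),
             logical(1)))
}

# "caf\u00e9" contains e-acute (U+00E9), which is inside the assumed range.
latin_col <- is_latin_only(c("abc", "caf\u00e9"))
# "\u6771\u4eac" contains CJK characters, which are outside the assumed range.
mixed_col <- is_latin_only(c("abc", "\u6771\u4eac"))
print(latin_col)
print(mixed_col)
```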


Examples

# Notes:
#   1. To execute Vantage Analytic Library functions, set option 
#      'val.install.location' to the database name where Vantage analytic 
#      library functions are installed.
#   2. Datasets used in these examples can be loaded using Vantage Analytic 
#      Library installer.

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Get remote data source connection.
con <- td_get_context()$connection

# Create an object of class "tbl_teradata".
df <- tbl(con, "customer")

# Example 1: Perform text analysis on all columns with only required 
#            arguments and default values for "analyze.numerics" and 
#            "analyze.unicode" arguments.
obj <- td_text_analyzer_valib(data=df,
                              columns="all")

# Print the results.
print(obj$result)
print(obj$data.type.matrix)

# Example 2: Perform text analysis, including numeric and unicode analysis, 
#            on columns 'cust_id', 'gender' and 'marital_status'.
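obj <- td_text_analyzer_valib(data=df,
                              columns=c("cust_id", "gender", "marital_status"),
                              analyze.numerics=TRUE,
                              analyze.unicode=TRUE)

# Print the results.
print(obj$result)
print(obj$data.type.matrix)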