NERTrainer Feature Template - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
The ML Engine does not support the creation of new extractor classes. However, it does support existing JAR files. For installation instructions, see Teradata Vantage™ User Guide, B700-4002.
The feature template file specifies:
  • The characteristics of a token, or word, to use to build the model.

    For example, the template can specify features such as whether a word begins with an uppercase letter, whether all its letters are uppercase, whether it contains a hyphen or a number, and so on.

  • Which nearby tokens to consider when classifying a token.
The feature template has two parts:
  1. The first part declares the Java classes that the function uses to extract the features with which it builds the model.

    Seven predefined classes are included with the function (see the following table). Because the ML Engine does not support the creation of new extractor classes, any additional classes must come from an existing JAR file (see part 1 of the example template file).

  2. The second part specifies which extractor classes to apply to which tokens to build the model.

The following table lists the predefined extractor classes and describes the features that they extract.

NERTrainer Predefined Extractor Classes and Features

Extractor Class        Feature
---------------------  ------------------------------------------------------------
Defaul_Token           The token itself.
Begin_with_Uppercase   "T" (true) if the token begins with an uppercase letter, "F" (false) otherwise.
All_Uppercase          "T" (true) if all characters of the token are uppercase, "F" (false) otherwise.
Is_Digital             "T" (true) if the token represents a digit (for example, 1 or '2'), "F" (false) otherwise.
Has_Hyphen             "T" (true) if the token has a hyphen, "F" (false) otherwise.
Prefix_n               The first n characters of the token, where n is a positive integer.
Suffix_n               The last n characters of the token, where n is a positive integer.
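
To make the table concrete, the following Python sketch mimics what each predefined extractor class returns for a single token. It is an illustration only: the actual extractors are Java classes inside the ML Engine, and the Python function names here are hypothetical. Two readings are assumptions: All_Uppercase is taken literally (every character must be an uppercase letter, so a token such as "U.S." yields "F"), and Is_Digital is approximated as "the token consists only of digit characters".

```python
# Illustrative re-creation of the predefined extractor features.
# These are NOT the ML Engine's Java classes; names and edge-case
# behavior are assumptions based on the table above.

def default_token(token: str) -> str:
    """Defaul_Token: the token itself."""
    return token

def begin_with_uppercase(token: str) -> str:
    """Begin_with_Uppercase: "T" if the first character is uppercase."""
    return "T" if token[:1].isupper() else "F"

def all_uppercase(token: str) -> str:
    """All_Uppercase: "T" if every character is an uppercase letter (assumed literal reading)."""
    return "T" if token and all(ch.isupper() for ch in token) else "F"

def is_digital(token: str) -> str:
    """Is_Digital: "T" if the token is made up only of digits (assumed reading)."""
    return "T" if token.isdigit() else "F"

def has_hyphen(token: str) -> str:
    """Has_Hyphen: "T" if the token contains a hyphen."""
    return "T" if "-" in token else "F"

def prefix(token: str, n: int) -> str:
    """Prefix_n: the first n characters of the token."""
    return token[:n]

def suffix(token: str, n: int) -> str:
    """Suffix_n: the last n characters of the token."""
    return token[-n:]
```

For instance, under these assumptions `prefix("Teradata", 2)` returns `"Te"` and `has_hyphen("e-mail")` returns `"T"`.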

Here is an example of a feature template file, template_1.txt:

#part 1: extractor classes
0: Defaul_Token
1: Begin_with_Uppercase
2: Prefix_2
#part 2: templates
%x[0,0]
%x[0,1]
%x[0,2]
%x[-1,0]
%x[1,0]
%x[-1,1]%x[0,1]
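
The following Python sketch shows one way to read template_1.txt. It is not the NERTrainer implementation. It assumes that each `%x[row,col]` macro follows the convention used by CRF-style template files: `row` is the token position relative to the current token (0 is the current token, -1 the previous one, 1 the next one) and `col` is the index of an extractor class declared in part 1. Under that assumption, a line with two macros, such as `%x[-1,1]%x[0,1]`, combines two features into one. The function names, the `/` separator, and the `_PAD_` placeholder for positions outside the sentence are hypothetical.

```python
import re

TEMPLATE = """\
#part 1: extractor classes
0: Defaul_Token
1: Begin_with_Uppercase
2: Prefix_2
#part 2: templates
%x[0,0]
%x[0,1]
%x[0,2]
%x[-1,0]
%x[1,0]
%x[-1,1]%x[0,1]
"""

# Stand-ins for the Java extractor classes named in the template (assumed behavior).
EXTRACTORS = {
    "Defaul_Token": lambda tok: tok,
    "Begin_with_Uppercase": lambda tok: "T" if tok[:1].isupper() else "F",
    "Prefix_2": lambda tok: tok[:2],
}

MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

def parse_template(text):
    """Split the template into {index: class name} and a list of rules.

    Each rule is a list of (row, col) pairs taken from one template line.
    """
    classes, rules, part = {}, [], 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):      # "#part 1 ..." / "#part 2 ..." headers
            part += 1
            continue
        if part == 1:
            index, name = line.split(":", 1)
            classes[int(index)] = name.strip()
        else:
            rules.append([(int(r), int(c)) for r, c in MACRO.findall(line)])
    return classes, rules

def expand(tokens, i, classes, rules):
    """Return one feature string per rule for the token at position i."""
    features = []
    for rule in rules:
        parts = []
        for row, col in rule:
            j = i + row
            if 0 <= j < len(tokens):
                parts.append(EXTRACTORS[classes[col]](tokens[j]))
            else:
                parts.append("_PAD_")  # hypothetical out-of-range marker
        features.append("/".join(parts))
    return features

classes, rules = parse_template(TEMPLATE)
print(expand(["Visit", "New", "York"], 1, classes, rules))
# prints ['New', 'T', 'Ne', 'Visit', 'York', 'T/T']
```

Read this way, the six template lines describe the current token, whether it is capitalized, its two-character prefix, the previous token, the next token, and the capitalization pattern of the previous and current tokens taken together.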