1.0 - 8.00 - Part 1 of the Example Template File - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 1.0, 8.00
Release Date: May 2019
Content Type: Programming Reference
Publication ID: B700-4003-098K
Language: English (United States)

Part 1 of the example template file declares three extractor classes (Defaul_Token, Begin_with_Uppercase, and com.asterdata.ner.SuffixExtractor) with serial numbers 0, 1, and 2, respectively. (Serial numbers must start at 0 and increase by 1 for each subsequent class.)

Defaul_Token and Begin_with_Uppercase are predefined extractor classes.

The third class, com.asterdata.ner.SuffixExtractor, is an example of a user-defined class. User-defined classes must be created in Java and must implement the Extractor interface. This is the Extractor interface:

package com.asterdata.sqlmr.text_analysis.ner;
import java.io.Serializable;
import java.util.List;

/**
 * Implement this interface to define a
 * function that generates features from a sequence
 */

public interface Extractor extends Serializable
{
  /**
   * Extracts the feature of a token.
   * @param sequence the sequence of tokens
   * @param i the index of the current token in the sequence
   * @return the feature flag
   */
  String extract(List<String> sequence, int i);
}

The Java class SuffixExtractor in this example returns the last character of the current token. This is the code for SuffixExtractor:

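package com.asterdata.ner;

import java.util.List;
import com.asterdata.sqlmr.text_analysis.ner.Extractor;

public class SuffixExtractor implements Extractor
{
  @Override
  public String extract(List<String> sequence, int i)
  {
    // Return the last character of the token at index i.
    String token = sequence.get(i);
    return token.substring(token.length() - 1);
  }
}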
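
To see how an extractor is called, the following is a minimal sketch that applies the SuffixExtractor logic to a short token sequence outside the function. The class name SuffixExtractorDemo, the helper method suffixes, and the nested copy of the Extractor interface are assumptions made only so that the sketch compiles on its own. In a real deployment the class would implement the interface from com.asterdata.sqlmr.text_analysis.ner.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SuffixExtractorDemo {
    // Local stand-in for the Extractor interface (assumption: declared here
    // only so the sketch is self-contained).
    public interface Extractor extends Serializable {
        String extract(List<String> sequence, int i);
    }

    // Same logic as SuffixExtractor: the last character of the token at index i.
    public static class SuffixExtractor implements Extractor {
        @Override
        public String extract(List<String> sequence, int i) {
            String token = sequence.get(i);
            return token.substring(token.length() - 1);
        }
    }

    // Hypothetical helper: calls the extractor once for each position in the sequence.
    public static List<String> suffixes(List<String> tokens) {
        Extractor extractor = new SuffixExtractor();
        List<String> features = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            features.add(extractor.extract(tokens, i));
        }
        return features;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("More", "restaurants", "open");
        System.out.println(suffixes(tokens)); // prints [e, s, n]
    }
}
```

Because extract receives the whole sequence along with the index, an extractor could also look at neighboring tokens. SuffixExtractor uses only the token at index i.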

Suppose that the function applies the extractor classes in the example template file to the input text "More restaurants open in San Diego." For the token "More":

  • Defaul_Token returns the token itself, "More".
  • Begin_with_Uppercase returns "T" because the token begins with an uppercase letter.
  • com.asterdata.ner.SuffixExtractor returns "e", the last character of the token.

This table shows the features that each extractor class returns for the entire input text:

Defaul_Token  Begin_with_Uppercase  com.asterdata.ner.SuffixExtractor
More          T                     e
restaurants   F                     s
open          F                     n
in            F                     n
San           T                     n
Diego         T                     o
.             F                     .
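
The table can be reproduced with a short sketch that applies three extractors, in serial-number order, to each token. This is an illustration under stated assumptions, not the function's actual implementation. The lambdas TOKEN and UPPERCASE are hypothetical stand-ins that mimic the described behavior of the predefined classes Defaul_Token and Begin_with_Uppercase. The token list is hard-coded rather than produced by the function's tokenizer, and the names ExtractorTableDemo and featureRows are invented for the example.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExtractorTableDemo {
    // Local stand-in for the Extractor interface (assumption: declared here
    // only so the sketch is self-contained).
    public interface Extractor extends Serializable {
        String extract(List<String> sequence, int i);
    }

    // Hypothetical stand-in for Defaul_Token: returns the token itself.
    public static final Extractor TOKEN = (seq, i) -> seq.get(i);

    // Hypothetical stand-in for Begin_with_Uppercase: "T" if the first
    // character is an uppercase letter, otherwise "F".
    public static final Extractor UPPERCASE =
        (seq, i) -> Character.isUpperCase(seq.get(i).charAt(0)) ? "T" : "F";

    // Same logic as SuffixExtractor: the last character of the token.
    public static final Extractor SUFFIX = (seq, i) -> {
        String token = seq.get(i);
        return token.substring(token.length() - 1);
    };

    // Builds one space-separated row of features per token.
    public static List<String> featureRows(List<String> tokens) {
        // List order mirrors serial numbers 0, 1, and 2.
        List<Extractor> extractors = Arrays.asList(TOKEN, UPPERCASE, SUFFIX);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            List<String> features = new ArrayList<>();
            for (Extractor extractor : extractors) {
                features.add(extractor.extract(tokens, i));
            }
            rows.add(String.join(" ", features));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed tokenization of "More restaurants open in San Diego."
        List<String> tokens =
            Arrays.asList("More", "restaurants", "open", "in", "San", "Diego", ".");
        for (String row : featureRows(tokens)) {
            System.out.println(row);
        }
    }
}
```

Each printed row corresponds to one row of the table, with the features in the same column order as the serial numbers.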