TextChunker Example 2: SentenceExtractor and POSTagger Output as Input - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product: Teradata Vantage
Release Number: 8.00, 1.0
Published: May 2019
Language: English (United States)
Last Update: 2019-11-22

Input

paragraphs_input
paraid paratopic paratext
1 Decision Trees Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value. It is one of the predictive modelling approaches used in statistics, data mining and machine learning. Tree models where the target variable can take a finite set of values are called classification trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees.
2 Simple Regression In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model (that is, vertical distances between the points of the data set and the fitted line) as small as possible.
3 Logistic Regression Logistic regression was developed by statistician David Cox in 1958[2][3] (although much work was done in the single independent variable case almost two decades earlier). The binary logistic model is used to estimate the probability of a binary response based on one or more predictor (or independent) variables (features). As such it is not a classification method. It could be called a qualitative response/discrete choice model in the terminology of economics.
4 Cluster analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis itself is not one specific algorithm, but the general task to solve. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them.
5 Association rule learning Association rule learning is a method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, Rakesh Agrawal et al.[2] introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule {onions, potatoes} => {burger} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat.
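The call in the next section assumes that paragraphs_input already exists and is loaded with the rows above. As a rough sketch only (the column names come from the table above, but the data types, lengths, and any distribution or load options are assumptions that depend on your environment), a compatible definition could look like this:

CREATE TABLE paragraphs_input (
  paraid    INTEGER,
  paratopic VARCHAR(50),
  paratext  VARCHAR(10000)
);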

SQL Call

TextChunker requires each sentence to have a unique identifier, and the input to TextChunker must be partitioned by that identifier. Because SentenceExtractor numbers sentences only within each input paragraph, the following call constructs a unique identifier with the expression paraid*1000+sentence_sn, and then partitions the POSTagger output by that sentence_id.

SELECT * FROM TextChunker (
  ON (SELECT * FROM POSTagger (
    ON (SELECT paraid*1000+sentence_sn AS sentence_id, sentence FROM SentenceExtractor (
      ON paragraphs_input
      USING
      TextColumn ('paratext')
      Accumulate ('paraid')
    ) AS dt1 )
    USING
    TextColumn ('sentence')
    Accumulate ('sentence_id')
  ) AS dt2 ) PARTITION BY sentence_id ORDER BY word_sn
  USING
  WordColumn ('word')
  POSColumn ('pos_tag')
) AS dt;
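If the chunking results are not what you expect, it can help to inspect the intermediate stages separately. For example, the inner SentenceExtractor query from the call above can be run on its own to confirm that every sentence receives a unique sentence_id (this is simply the inner query repeated, with an ORDER BY added for readability):

SELECT paraid*1000+sentence_sn AS sentence_id, sentence
FROM SentenceExtractor (
  ON paragraphs_input
  USING
  TextColumn ('paratext')
  Accumulate ('paraid')
) AS dt1
ORDER BY sentence_id;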

Output

In the output, partition_key is the sentence identifier constructed in the query (for example, 1001 is paragraph 1, sentence 1), chunk_sn is the position of the chunk within its sentence, and chunk_tag gives the phrase type (NP, VP, PP, ADVP); the tag O marks tokens that fall outside any chunk, such as the sentence-final period.

partition_key chunk_sn chunk chunk_tag
1001 1 Decision tree learning NP
1001 2 uses VP
1001 3 a decision tree NP
1001 4 as PP
1001 5 a predictive model NP
1001 6 which NP
1001 7 maps VP
1001 8 observations NP
1001 9 about PP
1001 10 an item NP
1001 11 to PP
1001 12 conclusions NP
1001 13 about PP
1001 14 the items target value NP
1001 15 . O
1002 1 It NP
1002 2 is VP
1002 3 one NP
1002 4 of PP
1002 5 the predictive modelling approaches NP
1002 6 used VP
1002 7 in PP
1002 8 statistics , data mining and machine learning NP
1002 9 . O
1003 1 Tree models NP
1003 2 where ADVP
1003 3 the target variable NP
1003 4 can take VP
1003 5 a finite set NP
1003 6 of PP
1003 7 values NP
1003 8 are called VP
1003 9 classification trees NP
1003 10 . O
... ... ... ...
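Because the sentence identifier was built as paraid*1000+sentence_sn, integer division of partition_key by 1000 recovers the original paraid, so chunks can be related back to their source paragraphs. As a sketch, assuming the TextChunker result has been materialized into a table named chunker_out (a hypothetical name, not created by this example) and that partition_key is carried through as an integer:

SELECT p.paratopic,
       c.partition_key,
       c.chunk_sn,
       c.chunk,
       c.chunk_tag
FROM chunker_out c
JOIN paragraphs_input p
  ON p.paraid = c.partition_key / 1000   -- integer division recovers paraid
ORDER BY c.partition_key, c.chunk_sn;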