# TextChunker Example: SentenceExtractor and POSTagger Output as Input

## Teradata Vantage™ - Machine Learning Engine Analytic Function Reference

- Release Number: 1.1, 8.10
- Published: October 2019
- Content Type: Programming Reference
- Publication ID: B700-4003-079K
- Language: English (United States)

## Input

paragraphs_input

| paraid | paratopic | paratext |
|--------|-----------|----------|
| 1 | Decision Trees | Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value. It is one of the predictive modeling approaches used in statistics, data mining and machine learning. Tree models where the target variable can take a finite set of values are called classification trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. |
| 2 | Simple Regression | In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model (that is, vertical distances between the points of the data set and the fitted line) as small as possible. |
| 3 | Logistic Regression | Logistic regression was developed by statistician David Cox in 1958 (although much work was done in the single independent variable case almost two decades earlier). The binary logistic model is used to estimate the probability of a binary response based on one or more predictor (or independent) variables (features). As such it is not a classification method. It could be called a qualitative response/discrete choice model in the terminology of economics. |
| 4 | Cluster analysis | Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis itself is not one specific algorithm, but the general task to solve. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. |
| 5 | Association rule learning | Association rule learning is a method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, Rakesh Agrawal et al. introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule {onions, potatoes} => {burger} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. |

## SQL Call

TextChunker requires each sentence to have a unique identifier, and the input to TextChunker must be partitioned by that identifier.

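```sql
-- Outer call: chunk the tagged words of each sentence.
SELECT * FROM TextChunker (
  ON (
    -- Middle call: tag each word of each sentence with its part of speech.
    SELECT * FROM POSTagger (
      ON (
        -- Inner call: split each paragraph into sentences and build a
        -- sentence ID from the paragraph ID and the sentence number.
        SELECT paraid*1000+sentence_sn AS sentence_id, sentence
        FROM SentenceExtractor (
          ON paragraphs_input
          USING
          TextColumn ('paratext')
          Accumulate ('paraid')
        ) AS dt1
      )
      USING
      TextColumn ('sentence')
      Accumulate ('sentence_id')
    ) AS dt2
  ) PARTITION BY sentence_id ORDER BY word_sn
  USING
  WordColumn ('word')
  POSColumn ('pos_tag')
) AS dt;
```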
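
The innermost query builds `sentence_id` as `paraid*1000+sentence_sn`, which packs the paragraph ID and the sentence sequence number into one integer. The sketch below shows one way such an ID could be split back into its two parts. It is an illustration only, and it rests on assumptions that this example does not state: that both values are integers (so division truncates), and that no paragraph yields 1000 or more sentences, since otherwise IDs from different paragraphs could collide. The literal `1001` is simply a sample value of the kind this example produces.

```sql
-- Sketch only: split a composite sentence_id back into its two parts.
-- Assumes integer operands and fewer than 1000 sentences per paragraph.
SELECT
    1001 / 1000   AS paraid,       -- paragraph part of the ID
    1001 MOD 1000 AS sentence_sn;  -- sentence sequence part of the ID
```

If a paragraph could contain more sentences than the multiplier allows, a larger multiplier would be needed to keep the identifiers unique.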

## Output

```
partition_key chunk_sn chunk                                                                                                chunk_tag
------------- -------- ---------------------------------------------------------------------------------------------------- ---------
1001        1 decision tree learning                                                                               NP
1001        2 uses                                                                                                 VP
1001        3 a decision tree                                                                                      NP
1001        4 as                                                                                                   PP
1001        5 a predictive model                                                                                   NP
1001        6 which                                                                                                NP
1001        7 maps                                                                                                 VP
1001        8 observations                                                                                         NP
1001        9 about                                                                                                PP
1001       10 an item                                                                                              NP
1001       11 to                                                                                                   PP
1001       12 conclusions                                                                                          NP
1001       13 about                                                                                                PP
1001       14 the items target value                                                                               NP
1001       15 .                                                                                                    O
1001       16 it                                                                                                   NP
1001       17 is                                                                                                   VP
1001       18 one                                                                                                  NP
1001       19 of                                                                                                   PP
1001       20 the predictive modelling approaches                                                                  NP
1001       21 used                                                                                                 VP
1001       22 in                                                                                                   PP
1001       23 statistics , data mining and machine learning . tree models                                          NP
1001       24 where                                                                                                ADVP
1001       25 the target variable                                                                                  NP
1001       26 can take                                                                                             VP
1001       27 a finite set                                                                                         NP
1001       28 of                                                                                                   PP
1001       29 values                                                                                               NP
1001       30 are called                                                                                           VP
1001       31 classification trees                                                                                 NP
1001       32 .                                                                                                    O
1001       33 in                                                                                                   PP
1001       34 these tree structures                                                                                NP
1001       35 ,                                                                                                    O
1001       36 leaves                                                                                               VP
1001       37 represent class labels and branches                                                                  NP
1001       38 represent                                                                                            VP
1001       39 conjunctions                                                                                         NP
1001       40 of                                                                                                   PP
1001       41 features                                                                                             NP
1001       42 that                                                                                                 NP
1001       43 lead                                                                                                 VP
1001       44 to                                                                                                   PP
1001       45 those class labels . decision trees                                                                  NP
1001       46 where                                                                                                ADVP
1001       47 the target variable                                                                                  NP
1001       48 can take                                                                                             VP
1001       49 continuous values                                                                                    NP
1001       50 ( typically real numbers                                                                             NP
1001       51 )                                                                                                    NP
1001       52 are called                                                                                           VP
1001       53 regression trees                                                                                     NP
1001       54 .                                                                                                    O
2001        1 in                                                                                                   PP
2001        2 statistics                                                                                           NP
2001        3 ,                                                                                                    O
2001        4 simple linear regression                                                                             NP
2001        5 is                                                                                                   VP
2001        6 the least squares estimator                                                                          NP
2001        7 of                                                                                                   PP
2001        8 a linear regression model                                                                            NP
2001        9 with                                                                                                 PP
2001       10 a single explanatory variable .                                                                      NP
2001       11 in                                                                                                   PP
2001       12 other words                                                                                          NP
2001       13 ,                                                                                                    O
2001       14 simple linear regression                                                                             NP
2001       15 fits                                                                                                 VP
2001       16 a straight line                                                                                      NP
2001       17 through                                                                                              PP
2001       18 the set                                                                                              NP
2001       19 of                                                                                                   PP
2001       20 n points                                                                                             NP
2001       21 in                                                                                                   PP
2001       22 such a way                                                                                           NP
2001       23 that                                                                                                 NP
2001       24 makes                                                                                                VP
2001       25 the sum                                                                                              NP
2001       26 of                                                                                                   PP
2001       27 squared residuals                                                                                    NP
2001       28 of                                                                                                   PP
2001       29 the model (                                                                                          NP
2001       30 that                                                                                                 NP
2001       31 is                                                                                                   VP
2001       32 , vertical distances                                                                                 NP
2001       33 between                                                                                              PP
2001       34 the points                                                                                           NP
2001       35 of                                                                                                   PP
2001       36 the data                                                                                             NP
2001       37 set                                                                                                  VP
2001       38 and                                                                                                  O
2001       39 the fitted line                                                                                      NP
2001       40 )                                                                                                    VP
2001       41 as small                                                                                             ADJP
2001       42 as                                                                                                   PP
2001       43 possible                                                                                             ADJP
2001       44 .                                                                                                    O
3001        1 logistic regression                                                                                  NP
3001        2 was developed                                                                                        VP
3001        3 by                                                                                                   PP
3001        4 statistician david cox                                                                               NP
3001        5 in                                                                                                   PP
3001        6 1958(although much work                                                                        NP
3001        7 was done                                                                                             VP
3001        8 in                                                                                                   PP
3001        9 the single independent variable case                                                                 NP
3001       10 almost                                                                                               ADVP
3001       11 two decades                                                                                          NP
3001       12 earlier)                                                                                             VP
3001       13 .                                                                                                    O
3001       14 the binary logistic model                                                                            NP
3001       15 is used to estimate                                                                                  VP
3001       16 the probability                                                                                      NP
3001       17 of                                                                                                   PP
3001       18 a binary response                                                                                    NP
3001       19 based                                                                                                VP
3001       20 on                                                                                                   PP
3001       21 one or more predictor ( or independent ) variables ( features) .                                     NP
3001       22 as                                                                                                   PP
3001       23 such                                                                                                 ADJP
3001       24 it                                                                                                   NP
3001       25 is                                                                                                   VP
3001       26 not                                                                                                  O
3001       27 a classification method                                                                              NP
3001       28 .                                                                                                    VP
3001       29 it                                                                                                   NP
3001       30 could be called                                                                                      VP
3001       31 a qualitative response/discrete choice model                                                         NP
3001       32 in                                                                                                   PP
3001       33 the terminology                                                                                      NP
3001       34 of                                                                                                   PP
3001       35 economics                                                                                            NP
3001       36 .                                                                                                    O
4001        1 cluster analysis or clustering                                                                       NP
4001        2 is                                                                                                   VP
4001        3 the task                                                                                             NP
4001        4 of                                                                                                   PP
4001        5 grouping                                                                                             VP
4001        6 a set                                                                                                NP
4001        7 of                                                                                                   PP
4001        8 objects                                                                                              NP
4001        9 in                                                                                                   PP
4001       10 such a way                                                                                           NP
4001       11 that                                                                                                 NP
4001       12 objects                                                                                              VP
4001       13 in                                                                                                   PP
4001       14 the same group                                                                                       NP
4001       15 ( called                                                                                             VP
4001       16 a cluster )                                                                                          NP
4001       17 are                                                                                                  VP
4001       18 more similar                                                                                         ADJP
4001       19 (                                                                                                    O
4001       20 in                                                                                                   PP
4001       21 some sense                                                                                           NP
4001       22 or                                                                                                   O
4001       23 another )                                                                                            NP
4001       24 to                                                                                                   PP
4001       25 each other                                                                                           NP
4001       26 than                                                                                                 PP
4001       27 to                                                                                                   PP
4001       28 those                                                                                                NP
4001       29 in                                                                                                   PP
4001       30 other groups                                                                                         NP
4001       31 ( clusters)                                                                                          NP
4001       32 .                                                                                                    O
4001       33 it                                                                                                   NP
4001       34 is                                                                                                   VP
4001       35 a main task                                                                                          NP
4001       36 of                                                                                                   PP
4001       37 exploratory data mining                                                                              NP
4001       38 ,                                                                                                    O
4001       39 and                                                                                                  O
4001       40 a common technique                                                                                   NP
4001       41 for                                                                                                  PP
4001       42 statistical data analysis                                                                            NP
4001       43 , used                                                                                               VP
4001       44 in                                                                                                   PP
4001       45 many fields                                                                                          NP
4001       46 ,                                                                                                    O
4001       47 including                                                                                            PP
4001       48 machine learning                                                                                     NP
4001       49 ,                                                                                                    O
4001       50 pattern recognition , image analysis , information retrieval , and bioinformatics . cluster analysis NP
4001       51 itself                                                                                               NP
4001       52 is                                                                                                   VP
4001       53 not                                                                                                  O
4001       54 one specific algorithm                                                                               NP
4001       55 ,                                                                                                    O
4001       56 but                                                                                                  O
4001       57 the general task                                                                                     NP
4001       58 to be solved                                                                                         VP
4001       59 .                                                                                                    O
4001       60 it                                                                                                   NP
4001       61 can be achieved                                                                                      VP
4001       62 by                                                                                                   PP
4001       63 various algorithms                                                                                   NP
4001       64 that                                                                                                 NP
4001       65 differ                                                                                               VP
4001       66 significantly                                                                                        ADVP
4001       67 in                                                                                                   PP
4001       68 their notion                                                                                         NP
4001       69 of                                                                                                   PP
4001       70 what                                                                                                 NP
4001       71 constitutes                                                                                          VP
4001       72 a cluster                                                                                            NP
4001       73 and                                                                                                  O
4001       74 how                                                                                                  ADVP
4001       75 to efficiently find                                                                                  VP
4001       76 them                                                                                                 NP
4001       77 .                                                                                                    O
5001        1 association rule learning                                                                            NP
5001        2 is                                                                                                   VP
5001        3 a method                                                                                             NP
5001        4 for                                                                                                  PP
5001        5 discovering                                                                                          VP
5001        6 interesting relations                                                                                NP
5001        7 between                                                                                              PP
5001        8 variables                                                                                            NP
5001        9 in                                                                                                   PP
5001       10 large databases                                                                                      NP
5001       11 .                                                                                                    O
5001       12 it                                                                                                   NP
5001       13 is intended to identify                                                                              VP
5001       14 strong rules                                                                                         NP
5001       15 discovered                                                                                           VP
5001       16 in                                                                                                   PP
5001       17 databases                                                                                            NP
5001       18 using                                                                                                VP
5001       19 different measures                                                                                   NP
5001       20 of                                                                                                   PP
5001       21 interestingness                                                                                      NP
5001       22 . based                                                                                              VP
5001       23 on                                                                                                   PP
5001       24 the concept                                                                                          NP
5001       25 of                                                                                                   PP
5001       26 strong rules                                                                                         NP
5001       27 ,                                                                                                    O
5001       28 rakesh agrawal et al.[2 ] introduced association rules                                               NP
5001       29 for                                                                                                  PP
5001       30 discovering regularities                                                                             NP
5001       31 between                                                                                              PP
5001       32 products                                                                                             NP
5001       33 in                                                                                                   PP
5001       34 large-scale transaction data                                                                         NP
5001       35 recorded                                                                                             VP
5001       36 by                                                                                                   PP
5001       37 point-of-sale ( pos ) systems                                                                        NP
5001       38 in                                                                                                   PP
5001       39 supermarkets                                                                                         NP
5001       40 .                                                                                                    O
5001       41 for                                                                                                  PP
5001       42 example                                                                                              NP
5001       43 ,                                                                                                    O
5001       44 the rule { onions , potatoes}=>{burger                                                               NP
5001       45 } found                                                                                              VP
5001       46 in                                                                                                   PP
5001       47 the sales data                                                                                       NP
5001       48 of                                                                                                   PP
5001       49 a supermarket                                                                                        NP
5001       50 would indicate                                                                                       VP
5001       51 that                                                                                                 SBAR
5001       52 if                                                                                                   SBAR
5001       53 a customer                                                                                           NP
5001       54 buys                                                                                                 VP
5001       55 onions                                                                                               NP
5001       56 and                                                                                                  O
5001       57 potatoes                                                                                             VP
5001       58 together                                                                                             ADVP
5001       59 ,                                                                                                    O
5001       60 they                                                                                                 NP
5001       61 are                                                                                                  VP
5001       62 likely                                                                                               ADJP
5001       63 to also buy                                                                                          VP
5001       64 hamburger meat                                                                                       NP
5001       65 .                                                                                                    O
```

A zip file of all examples, together with a SQL script file that creates their input tables, is available for download as an attachment in the left sidebar.