Input
paraid | paratopic | paratext |
---|---|---|
1 | Decision Trees | Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the items target value. It is one of the predictive modeling approaches used in statistics, data mining and machine learning. Tree models where the target variable can take a finite set of values are called classification trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. |
2 | Simple Regression | In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model (that is, vertical distances between the points of the data set and the fitted line) as small as possible. |
3 | Logistic Regression | Logistic regression was developed by statistician David Cox in 1958[2][3] (although much work was done in the single independent variable case almost two decades earlier). The binary logistic model is used to estimate the probability of a binary response based on one or more predictor (or independent) variables (features). As such it is not a classification method. It could be called a qualitative response/discrete choice model in the terminology of economics. |
4 | Cluster analysis | Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis itself is not one specific algorithm, but the general task to solve. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. |
5 | Association rule learning | Association rule learning is a method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, Rakesh Agrawal et al.[2] introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule {onions, potatoes} => {burger} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. |
SQL Call
TextChunker requires each sentence to have a unique identifier, and the input to TextChunker must be partitioned by that identifier.
SELECT * FROM TextChunker (
  ON (
    SELECT * FROM POSTagger (
      ON (
        SELECT paraid*1000+sentence_sn AS sentence_id, sentence
        FROM SentenceExtractor (
          ON paragraphs_input
          USING
          TextColumn ('paratext')
          Accumulate ('paraid')
        ) AS dt1
      )
      USING
      TextColumn ('sentence')
      Accumulate ('sentence_id')
    ) AS dt2
  ) PARTITION BY sentence_id ORDER BY word_sn
  USING
  WordColumn ('word')
  POSColumn ('pos_tag')
) AS dt;
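The call builds the identifier as paraid*1000+sentence_sn. The following Python sketch is illustrative only and is not part of TextChunker. It shows how that arithmetic packs a paragraph number and a sentence number into one value, and how the two parts can be recovered. The function names are hypothetical, and the sketch assumes sentence_sn starts at 1. Under that assumption, the scheme yields unique identifiers only while a paragraph has fewer than 1000 sentences.

```python
def make_sentence_id(paraid: int, sentence_sn: int) -> int:
    """Combine a paragraph ID and a sentence number, as the SQL expression does."""
    # Assumption: sentence_sn is 1-based. Values of 1000 or more would
    # collide with identifiers belonging to the next paragraph.
    if not 0 < sentence_sn < 1000:
        raise ValueError("sentence_sn must be between 1 and 999")
    return paraid * 1000 + sentence_sn


def split_sentence_id(sentence_id: int) -> tuple[int, int]:
    """Recover (paraid, sentence_sn) from a combined identifier."""
    return divmod(sentence_id, 1000)


first_id = make_sentence_id(1, 1)
decoded = split_sentence_id(5001)
```

Here `first_id` is 1001 and `decoded` is `(5, 1)`, matching the partition_key values that appear in the output.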
Output
partition_key | chunk_sn | chunk | chunk_tag |
---|---|---|---|
1001 | 1 | decision tree learning | NP |
1001 | 2 | uses | VP |
1001 | 3 | a decision tree | NP |
1001 | 4 | as | PP |
1001 | 5 | a predictive model | NP |
1001 | 6 | which | NP |
1001 | 7 | maps | VP |
1001 | 8 | observations | NP |
1001 | 9 | about | PP |
1001 | 10 | an item | NP |
1001 | 11 | to | PP |
1001 | 12 | conclusions | NP |
1001 | 13 | about | PP |
1001 | 14 | the items target value | NP |
1001 | 15 | . | O |
1001 | 16 | it | NP |
1001 | 17 | is | VP |
1001 | 18 | one | NP |
1001 | 19 | of | PP |
1001 | 20 | the predictive modelling approaches | NP |
1001 | 21 | used | VP |
1001 | 22 | in | PP |
1001 | 23 | statistics , data mining and machine learning . tree models | NP |
1001 | 24 | where | ADVP |
1001 | 25 | the target variable | NP |
1001 | 26 | can take | VP |
1001 | 27 | a finite set | NP |
1001 | 28 | of | PP |
1001 | 29 | values | NP |
1001 | 30 | are called | VP |
1001 | 31 | classification trees | NP |
1001 | 32 | . | O |
1001 | 33 | in | PP |
1001 | 34 | these tree structures | NP |
1001 | 35 | , | O |
1001 | 36 | leaves | VP |
1001 | 37 | represent class labels and branches | NP |
1001 | 38 | represent | VP |
1001 | 39 | conjunctions | NP |
1001 | 40 | of | PP |
1001 | 41 | features | NP |
1001 | 42 | that | NP |
1001 | 43 | lead | VP |
1001 | 44 | to | PP |
1001 | 45 | those class labels . decision trees | NP |
1001 | 46 | where | ADVP |
1001 | 47 | the target variable | NP |
1001 | 48 | can take | VP |
1001 | 49 | continuous values | NP |
1001 | 50 | ( typically real numbers | NP |
1001 | 51 | ) | NP |
1001 | 52 | are called | VP |
1001 | 53 | regression trees | NP |
1001 | 54 | . | O |
2001 | 1 | in | PP |
2001 | 2 | statistics | NP |
2001 | 3 | , | O |
2001 | 4 | simple linear regression | NP |
2001 | 5 | is | VP |
2001 | 6 | the least squares estimator | NP |
2001 | 7 | of | PP |
2001 | 8 | a linear regression model | NP |
2001 | 9 | with | PP |
2001 | 10 | a single explanatory variable . | NP |
2001 | 11 | in | PP |
2001 | 12 | other words | NP |
2001 | 13 | , | O |
2001 | 14 | simple linear regression | NP |
2001 | 15 | fits | VP |
2001 | 16 | a straight line | NP |
2001 | 17 | through | PP |
2001 | 18 | the set | NP |
2001 | 19 | of | PP |
2001 | 20 | n points | NP |
2001 | 21 | in | PP |
2001 | 22 | such a way | NP |
2001 | 23 | that | NP |
2001 | 24 | makes | VP |
2001 | 25 | the sum | NP |
2001 | 26 | of | PP |
2001 | 27 | squared residuals | NP |
2001 | 28 | of | PP |
2001 | 29 | the model ( | NP |
2001 | 30 | that | NP |
2001 | 31 | is | VP |
2001 | 32 | , vertical distances | NP |
2001 | 33 | between | PP |
2001 | 34 | the points | NP |
2001 | 35 | of | PP |
2001 | 36 | the data | NP |
2001 | 37 | set | VP |
2001 | 38 | and | O |
2001 | 39 | the fitted line | NP |
2001 | 40 | ) | VP |
2001 | 41 | as small | ADJP |
2001 | 42 | as | PP |
2001 | 43 | possible | ADJP |
2001 | 44 | . | O |
3001 | 1 | logistic regression | NP |
3001 | 2 | was developed | VP |
3001 | 3 | by | PP |
3001 | 4 | statistician david cox | NP |
3001 | 5 | in | PP |
3001 | 6 | 1958[2][3](although much work | NP |
3001 | 7 | was done | VP |
3001 | 8 | in | PP |
3001 | 9 | the single independent variable case | NP |
3001 | 10 | almost | ADVP |
3001 | 11 | two decades | NP |
3001 | 12 | earlier) | VP |
3001 | 13 | . | O |
3001 | 14 | the binary logistic model | NP |
3001 | 15 | is used to estimate | VP |
3001 | 16 | the probability | NP |
3001 | 17 | of | PP |
3001 | 18 | a binary response | NP |
3001 | 19 | based | VP |
3001 | 20 | on | PP |
3001 | 21 | one or more predictor ( or independent ) variables ( features) . | NP |
3001 | 22 | as | PP |
3001 | 23 | such | ADJP |
3001 | 24 | it | NP |
3001 | 25 | is | VP |
3001 | 26 | not | O |
3001 | 27 | a classification method | NP |
3001 | 28 | . | VP |
3001 | 29 | it | NP |
3001 | 30 | could be called | VP |
3001 | 31 | a qualitative response/discrete choice model | NP |
3001 | 32 | in | PP |
3001 | 33 | the terminology | NP |
3001 | 34 | of | PP |
3001 | 35 | economics | NP |
3001 | 36 | . | O |
4001 | 1 | cluster analysis or clustering | NP |
4001 | 2 | is | VP |
4001 | 3 | the task | NP |
4001 | 4 | of | PP |
4001 | 5 | grouping | VP |
4001 | 6 | a set | NP |
4001 | 7 | of | PP |
4001 | 8 | objects | NP |
4001 | 9 | in | PP |
4001 | 10 | such a way | NP |
4001 | 11 | that | NP |
4001 | 12 | objects | VP |
4001 | 13 | in | PP |
4001 | 14 | the same group | NP |
4001 | 15 | ( called | VP |
4001 | 16 | a cluster ) | NP |
4001 | 17 | are | VP |
4001 | 18 | more similar | ADJP |
4001 | 19 | ( | O |
4001 | 20 | in | PP |
4001 | 21 | some sense | NP |
4001 | 22 | or | O |
4001 | 23 | another ) | NP |
4001 | 24 | to | PP |
4001 | 25 | each other | NP |
4001 | 26 | than | PP |
4001 | 27 | to | PP |
4001 | 28 | those | NP |
4001 | 29 | in | PP |
4001 | 30 | other groups | NP |
4001 | 31 | ( clusters) | NP |
4001 | 32 | . | O |
4001 | 33 | it | NP |
4001 | 34 | is | VP |
4001 | 35 | a main task | NP |
4001 | 36 | of | PP |
4001 | 37 | exploratory data mining | NP |
4001 | 38 | , | O |
4001 | 39 | and | O |
4001 | 40 | a common technique | NP |
4001 | 41 | for | PP |
4001 | 42 | statistical data analysis | NP |
4001 | 43 | , used | VP |
4001 | 44 | in | PP |
4001 | 45 | many fields | NP |
4001 | 46 | , | O |
4001 | 47 | including | PP |
4001 | 48 | machine learning | NP |
4001 | 49 | , | O |
4001 | 50 | pattern recognition , image analysis , information retrieval , and bioinformatics . cluster analysis | NP |
4001 | 51 | itself | NP |
4001 | 52 | is | VP |
4001 | 53 | not | O |
4001 | 54 | one specific algorithm | NP |
4001 | 55 | , | O |
4001 | 56 | but | O |
4001 | 57 | the general task | NP |
4001 | 58 | to be solved | VP |
4001 | 59 | . | O |
4001 | 60 | it | NP |
4001 | 61 | can be achieved | VP |
4001 | 62 | by | PP |
4001 | 63 | various algorithms | NP |
4001 | 64 | that | NP |
4001 | 65 | differ | VP |
4001 | 66 | significantly | ADVP |
4001 | 67 | in | PP |
4001 | 68 | their notion | NP |
4001 | 69 | of | PP |
4001 | 70 | what | NP |
4001 | 71 | constitutes | VP |
4001 | 72 | a cluster | NP |
4001 | 73 | and | O |
4001 | 74 | how | ADVP |
4001 | 75 | to efficiently find | VP |
4001 | 76 | them | NP |
4001 | 77 | . | O |
5001 | 1 | association rule learning | NP |
5001 | 2 | is | VP |
5001 | 3 | a method | NP |
5001 | 4 | for | PP |
5001 | 5 | discovering | VP |
5001 | 6 | interesting relations | NP |
5001 | 7 | between | PP |
5001 | 8 | variables | NP |
5001 | 9 | in | PP |
5001 | 10 | large databases | NP |
5001 | 11 | . | O |
5001 | 12 | it | NP |
5001 | 13 | is intended to identify | VP |
5001 | 14 | strong rules | NP |
5001 | 15 | discovered | VP |
5001 | 16 | in | PP |
5001 | 17 | databases | NP |
5001 | 18 | using | VP |
5001 | 19 | different measures | NP |
5001 | 20 | of | PP |
5001 | 21 | interestingness | NP |
5001 | 22 | . based | VP |
5001 | 23 | on | PP |
5001 | 24 | the concept | NP |
5001 | 25 | of | PP |
5001 | 26 | strong rules | NP |
5001 | 27 | , | O |
5001 | 28 | rakesh agrawal et al.[2 ] introduced association rules | NP |
5001 | 29 | for | PP |
5001 | 30 | discovering regularities | NP |
5001 | 31 | between | PP |
5001 | 32 | products | NP |
5001 | 33 | in | PP |
5001 | 34 | large-scale transaction data | NP |
5001 | 35 | recorded | VP |
5001 | 36 | by | PP |
5001 | 37 | point-of-sale ( pos ) systems | NP |
5001 | 38 | in | PP |
5001 | 39 | supermarkets | NP |
5001 | 40 | . | O |
5001 | 41 | for | PP |
5001 | 42 | example | NP |
5001 | 43 | , | O |
5001 | 44 | the rule { onions , potatoes}=>{burger | NP |
5001 | 45 | } found | VP |
5001 | 46 | in | PP |
5001 | 47 | the sales data | NP |
5001 | 48 | of | PP |
5001 | 49 | a supermarket | NP |
5001 | 50 | would indicate | VP |
5001 | 51 | that | SBAR |
5001 | 52 | if | SBAR |
5001 | 53 | a customer | NP |
5001 | 54 | buys | VP |
5001 | 55 | onions | NP |
5001 | 56 | and | O |
5001 | 57 | potatoes | VP |
5001 | 58 | together | ADVP |
5001 | 59 | , | O |
5001 | 60 | they | NP |
5001 | 61 | are | VP |
5001 | 62 | likely | ADJP |
5001 | 63 | to also buy | VP |
5001 | 64 | hamburger meat | NP |
5001 | 65 | . | O |
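Each output row pairs a chunk with a tag. In the usual chunking convention NP marks a noun phrase, VP a verb phrase, PP a prepositional phrase, and O a token outside any chunk. A common next step is to collect the chunks with one tag for each partition_key. The following Python sketch shows one way to do that on the client side. It assumes the result set has already been fetched as (partition_key, chunk_sn, chunk, chunk_tag) tuples, the sample rows are copied from the output above, and the helper name chunks_by_tag is hypothetical.

```python
from collections import defaultdict

# A few rows copied from the TextChunker output above.
rows = [
    (1001, 3, "a decision tree", "NP"),
    (1001, 1, "decision tree learning", "NP"),
    (1001, 2, "uses", "VP"),
    (1001, 4, "as", "PP"),
    (1001, 5, "a predictive model", "NP"),
    (2001, 4, "simple linear regression", "NP"),
    (2001, 5, "is", "VP"),
    (2001, 6, "the least squares estimator", "NP"),
]


def chunks_by_tag(rows, tag):
    """Group chunks carrying the given tag by partition_key, in chunk_sn order."""
    grouped = defaultdict(list)
    # Sort first so chunks come out in their original order even if the
    # rows were fetched unordered.
    for partition_key, chunk_sn, chunk, chunk_tag in sorted(rows, key=lambda r: (r[0], r[1])):
        if chunk_tag == tag:
            grouped[partition_key].append(chunk)
    return dict(grouped)


noun_phrases = chunks_by_tag(rows, "NP")
```

With the sample rows, `noun_phrases[1001]` holds the three noun phrases of the first partition in order, starting with "decision tree learning".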
Download a zip file of all examples and a SQL script file that creates their input tables from the attachment in the left sidebar.