Description
The TextTagger function tags text documents according to user-defined rules that use text-processing and logical operators.
Usage
td_text_tagger_mle (
    data = NULL,
    rules.data = NULL,
    language = "en",
    rules = NULL,
    tokenize = FALSE,
    outputby.tag = FALSE,
    tag.delimiter = ",",
    accumulate = NULL,
    data.sequence.column = NULL,
    rules.data.sequence.column = NULL
)
Arguments
data
    Required Argument.
rules.data
    Optional Argument.
language
    Optional Argument.
rules
    Optional Argument.
tokenize
    Optional Argument.
outputby.tag
    Optional Argument.
tag.delimiter
    Optional Argument.
accumulate
    Optional Argument.
data.sequence.column
    Optional Argument.
rules.data.sequence.column
    Optional Argument.
Value
The function returns an object of class "td_text_tagger_mle", which is a named
list containing a Teradata tbl object.
The named list member can be referenced directly with the "$" operator
using the name: result.
Examples
# Get the current context/connection
con <- td_get_context()$connection

# Load example data.
loadExampleData("texttagger_example", "text_inputs", "rule_inputs")

# Create remote tibble objects.
text_inputs <- tbl(con, "text_inputs")
rule_inputs <- tbl(con, "rule_inputs")

# Example 1 - Specifying rules as an argument
td_text_tagger_out1 <- td_text_tagger_mle(data = text_inputs,
    outputby.tag = TRUE,
    rules=c('contain(content, "floods",1,) or contain(content,"tsunamis",1,) AS Natural-Disaster',
            'contain(content,"Roger",1,) and contain(content,"Nadal",1,) AS Tennis-Rivalry',
            'contain(titles,"Tennis",1,) and contain(content,"Roger",1,) AS Tennis-Greats',
            'contain(content,"India",1,) and contain(content,"Pakistan",1,) AS Cricket-Rivalry',
            'contain(content,"Australia",1,) and contain(content,"England",1,) AS The-Ashes'),
    accumulate = c("id")
    )

# Example 2 - Using rules specified in a table
td_text_tagger_out2 <- td_text_tagger_mle(data = text_inputs,
    rules.data = rule_inputs,
    accumulate = c("id")
    )

# Example 3 - Specify dictionary file in rules argument
td_text_tagger_out3 <- td_text_tagger_mle(data = text_inputs,
    rules=c('dict(content, "keywords.txt", 1,) and equal(titles, "Chennai Floods") AS Natural-Disaster',
            'dict(content, "keywords.txt", 2,) and equal(catalog, "sports") AS Great-Sports-Rivalry '),
    accumulate = c("id")
    )

# Example 4 - Specify superdist in rules argument
td_text_tagger_out4 <- td_text_tagger_mle(data = text_inputs,
    rules=c('superdist(content,"Chennai","floods",sent,,) AS Chennai-Flood-Disaster',
            'superdist(content,"Roger","titles",para, "Nadal",para) AS Roger-Champion',
            'superdist(content,"Roger","Nadal",para,,) AS Tennis-Rivalry',
            'contain(content,regex"[A|a]shes",2,) AS Aus-Eng-Cricket',
            'superdist(content,"Australia","won",nw5,,) AS Aus-victory'),
    accumulate = c("id")
    )
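
The rule strings in Example 1 follow a regular pattern: one or more contain(...) conditions joined by a logical operator, then AS and a tag name. When there are many keywords, it may be easier to build these strings in R than to type them by hand. The sketch below uses only base R. The helpers contain_condition and build_rule are hypothetical names introduced for illustration and are not part of tdplyr. The third field of contain is assumed to be a lower bound on the match count, as the examples above suggest.

```r
# Hypothetical helper (not part of tdplyr): builds one or more
# contain(column,"keyword",lower,) conditions in the style of Example 1.
# Assumption: `lower` is the minimum number of matches required.
contain_condition <- function(column, keyword, lower = 1L) {
  sprintf('contain(%s,"%s",%d,)', column, keyword, as.integer(lower))
}

# Hypothetical helper (not part of tdplyr): joins conditions with a
# logical operator ("and" / "or") and appends the tag name after AS.
build_rule <- function(conditions, tag, operator = "or") {
  joined <- paste(conditions, collapse = paste0(" ", operator, " "))
  paste(joined, "AS", tag)
}

# Build two rules resembling those in Example 1.
rules <- c(
  build_rule(contain_condition("content", c("floods", "tsunamis")),
             tag = "Natural-Disaster", operator = "or"),
  build_rule(contain_condition("content", c("Roger", "Nadal")),
             tag = "Tennis-Rivalry", operator = "and")
)

print(rules)
```

With a live connection, the resulting character vector could be passed as rules = rules to td_text_tagger_mle, in the same way as the hand-written vector in Example 1.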