The input training table, ner_sports_train, contains 500 rows of sports news text. Named entities in the content column are marked with <START:type> and <END> tags.
id | content |
---|---|
2 | CRICKET - <START:ORG> LEICESTERSHIRE <END> TAKE OVER AT TOP AFTER INNINGS VICTORY . |
3 | <START:LOC> LONDON <END> 1996-08-30 |
4 | West Indian all-rounder <START:PER> Phil Simmons <END> took four for 38 on Friday as <START:ORG> Leicestershire <END> beat <START:ORG> Somerset <END> by an innings and 39 runs in two days to take over at the head of the county championship . |
5 | Their stay on top |
6 | After bowling <START:ORG> Somerset <END> out for 83 on the opening morning at <START:LOC> Grace Road <END> |
7 | Trailing by 213 |
8 | <START:ORG> Essex <END> |
9 | <START:PER> Hussain <END> |
10 | By the close <START:ORG> Yorkshire <END> had turned that into a 37-run advantage but off-spinner <START:PER> Such <END> had scuttled their hopes |
... | ... |
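
To make the annotation format in the table concrete, the following minimal sketch pulls the tagged entities out of one content value. It is not part of the function itself. The names `ENTITY_RE`, `extract_entities`, and `strip_markup` are hypothetical helpers introduced only for illustration, and the sketch assumes that tags are never nested and that the type label consists of word characters only.

```python
import re

# Matches one annotated span such as "<START:ORG> Somerset <END>".
# Assumption: tags are not nested and the type label is alphanumeric.
ENTITY_RE = re.compile(r"<START:(\w+)>\s*(.*?)\s*<END>")


def extract_entities(content):
    """Return (entity_type, entity_text) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in ENTITY_RE.finditer(content)]


def strip_markup(content):
    """Return the text with the annotation tags removed."""
    return ENTITY_RE.sub(lambda m: m.group(2), content)


row = "<START:LOC> LONDON <END> 1996-08-30"
print(extract_entities(row))  # [('LOC', 'LONDON')]
print(strip_markup(row))      # LONDON 1996-08-30
```

The same helpers applied to row 4 of the table would yield one PER entity followed by two ORG entities.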
The example uses the feature template file template_1.txt:
    #part 1: extractor classes
    0: Defaul_Token
    1: Begin_with_Uppercase
    2: Prefix_2
    #part 2: templates
    %x[0,0]
    %x[0,1]
    %x[0,2]
    %x[-1,0]
    %x[1,0]
    %x[-1,1]%x[0,1]
The function applies the rules defined in the feature template file to the input data and generates a model file, ner_model.bin.
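
As a rough illustration of how the templates in template_1.txt might expand into features, the sketch below assumes a CRF++-style reading of `%x[row,col]`: `row` is a token offset relative to the current position, and `col` is the index of an extractor class from part 1. The three extractor functions, the `_PAD_` placeholder, and the `/` separator for combined templates are all assumptions made for this example. The source does not describe how the function encodes features internally, so this is not the function's actual implementation.

```python
import re

# Hypothetical stand-ins for the extractor classes listed in part 1.
def default_token(tok):
    return tok                                   # class 0: the token itself

def begin_with_uppercase(tok):
    return "T" if tok[:1].isupper() else "F"     # class 1: uppercase initial?

def prefix_2(tok):
    return tok[:2]                               # class 2: first two characters

EXTRACTORS = [default_token, begin_with_uppercase, prefix_2]

# Part 2 of the template file, one template per entry.
TEMPLATES = ["%x[0,0]", "%x[0,1]", "%x[0,2]",
             "%x[-1,0]", "%x[1,0]", "%x[-1,1]%x[0,1]"]

MACRO_RE = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template, tokens, i):
    """Expand one template at token position i (assumed semantics)."""
    def value(match):
        row, col = int(match.group(1)), int(match.group(2))
        j = i + row
        if not 0 <= j < len(tokens):
            return "_PAD_"                       # assumed out-of-range marker
        return EXTRACTORS[col](tokens[j])
    # A template with several macros yields one combined feature value.
    return "/".join(value(m) for m in MACRO_RE.finditer(template))


def features(tokens, i):
    """Return all template expansions for the token at position i."""
    return [f"{t}={expand(t, tokens, i)}" for t in TEMPLATES]


tokens = "Leicestershire beat Somerset".split()
for feature in features(tokens, 1):
    print(feature)
```

For the middle token `beat`, the last template combines the uppercase flag of the previous token with that of the current token, which is the kind of context feature a sequence model can use to learn that a capitalized word after a lowercase one is more likely to be an entity.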