The training table, news, is a collection of categorized news articles in Simplified Chinese, loaded from the file news.data.
To create the training table, use this statement:
CREATE FACT TABLE news ( doc_id VARCHAR(10), content TEXT, category VARCHAR(8) ) DISTRIBUTE BY HASH(doc_id);
To load the training table with data from news.data, use this command:
ncluster_loader -h queen_ip_address -U username -w password news news.data;
NaiveBayesTextClassifierTrainer Chinese Example Training Table news
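If you need to prepare news.data yourself, the following Python helper is one way to produce a file that fits the news table. It is a minimal sketch, not part of Aster: the function names format_news_row and write_news_file are hypothetical, and it assumes that ncluster_loader reads one row per line with tab-separated columns and that the VARCHAR limits count characters rather than bytes. The column limits (10 for doc_id, 8 for category) come from the CREATE FACT TABLE statement above.

```python
"""Hypothetical helper for writing rows of the news table to news.data.

Assumptions (not from the Aster documentation): one row per line,
tab-separated columns, and VARCHAR(n) limits measured in characters.
"""
from pathlib import Path

# Limits taken from: CREATE FACT TABLE news (doc_id VARCHAR(10), content TEXT, category VARCHAR(8))
DOC_ID_MAX = 10
CATEGORY_MAX = 8


def format_news_row(doc_id: str, content: str, category: str) -> str:
    """Return one tab-delimited line for (doc_id, content, category)."""
    if len(doc_id) > DOC_ID_MAX:
        raise ValueError(f"doc_id longer than {DOC_ID_MAX} characters: {doc_id!r}")
    if len(category) > CATEGORY_MAX:
        raise ValueError(f"category longer than {CATEGORY_MAX} characters: {category!r}")
    # Tabs or line breaks inside the article would split the row under the
    # assumed format, so replace each of them with a single space.
    clean = content.replace("\t", " ").replace("\r", " ").replace("\n", " ")
    return f"{doc_id}\t{clean}\t{category}\n"


def write_news_file(rows, path="news.data"):
    """Write all (doc_id, content, category) rows as UTF-8 text."""
    # UTF-8 keeps the Simplified Chinese text intact.
    with Path(path).open("w", encoding="utf-8") as f:
        for doc_id, content, category in rows:
            f.write(format_news_row(doc_id, content, category))
```

Validating lengths before loading surfaces oversized identifiers or category labels in Python, with the offending value in the error message, instead of at load time.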
To create the stop words table, use this statement:
CREATE DIMENSION TABLE stop_words (word TEXT);
To load the stop words table with data from stop_words.data, use this command:
ncluster_loader -h queen_ip_address -U username -w password stop_words stop_words.data;
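The stop_words table has a single TEXT column, so its input file can be very simple. The Python sketch below is illustrative only: normalize_stop_words and write_stop_words_file are hypothetical names, and it assumes stop_words.data holds one word per line in UTF-8. Removing blanks and duplicates is a design choice of this sketch, not a requirement stated for the table.

```python
"""Hypothetical helper for writing stop_words.data.

Assumption (not from the Aster documentation): one stop word per line,
matching the single TEXT column of the stop_words table.
"""
from pathlib import Path


def normalize_stop_words(words):
    """Strip surrounding whitespace, drop blanks and duplicates, keep first-seen order."""
    seen = set()
    result = []
    for w in words:
        w = w.strip()
        if w and w not in seen:
            seen.add(w)
            result.append(w)
    return result


def write_stop_words_file(words, path="stop_words.data"):
    """Write one normalized stop word per line, UTF-8 encoded."""
    with Path(path).open("w", encoding="utf-8") as f:
        for w in normalize_stop_words(words):
            f.write(w + "\n")
```

For example, normalize_stop_words([" 的 ", "了", "", "的", "是"]) returns ["的", "了", "是"].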
NaiveBayesTextClassifierTrainer Chinese Example stop_words: