The POSTagger function assigns a part-of-speech (POS) tag to each word in the input text. POS tagging is an early step in the syntactic analysis of text and an important preprocessing step in many natural language processing (NLP) applications.
The POSTagger function was developed using the Penn Treebank Project and Chinese Penn Treebank Project data sets. Its POS tags comply with the tag sets defined by those two projects.
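To show what tagged output looks like, the following minimal sketch pairs each token of an English sentence with a Penn Treebank tag. It is only an illustration of the output shape, not the algorithm POSTagger uses. The function `toy_pos_tag` and the lexicon `TOY_LEXICON` are hypothetical. Production taggers typically use context-sensitive statistical models, and this section does not describe the model POSTagger uses.

```python
# Toy lookup-based tagger that illustrates POS-tagger output:
# one (token, tag) pair per word, using English Penn Treebank tags.
# NOT the algorithm POSTagger uses; the lexicon below is hypothetical.
from typing import Dict, List, Tuple

# Hypothetical hand-written lexicon mapping lowercase words to Penn Treebank tags.
TOY_LEXICON: Dict[str, str] = {
    "the": "DT",     # determiner
    "quick": "JJ",   # adjective
    "brown": "JJ",
    "fox": "NN",     # noun, singular
    "jumps": "VBZ",  # verb, 3rd person singular present
    "over": "IN",    # preposition
    "lazy": "JJ",
    "dog": "NN",
    ".": ".",        # sentence-final punctuation
}


def toy_pos_tag(text: str, default_tag: str = "NN") -> List[Tuple[str, str]]:
    """Split text into tokens and tag each one from TOY_LEXICON.

    Words not in the lexicon fall back to default_tag (a simple baseline).
    """
    # Separate periods from words so they become their own tokens.
    tokens = text.replace(".", " .").split()
    return [(tok, TOY_LEXICON.get(tok.lower(), default_tag)) for tok in tokens]


if __name__ == "__main__":
    for token, tag in toy_pos_tag("The quick brown fox jumps over the lazy dog."):
        print(f"{token}\t{tag}")
```

Running the script prints one tab-separated token/tag pair per line, for example `The` with `DT` and `jumps` with `VBZ`.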
For the POS tags used for each language, see the following:
| Text Language | POS Tag Set |
|---|---|
| English | https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html |
| Chinese | https://www.sketchengine.co.uk/chinese-penn-treebank-part-of-speech-tagset/ |
POSTagger uses files that are preinstalled on ML Engine. For details, see Preinstalled Files That Functions Use.