The TextParser function tokenizes an input stream of words, optionally stems them (reduces them to their root forms), and then outputs them. It can either output all words in a single row or output each word in its own row, optionally with the number of times that the word appears.
The TextParser function uses Porter2 as the stemming algorithm.
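The following is a minimal Python sketch of the tokenize, stem, and output flow described above. It is not TextParser's implementation. The function name `text_parser`, its parameters, and the tokenization rule (lowercase, split on non-letters) are hypothetical. The `crude_stem` helper is a deliberately simplified suffix stripper, **not** Porter2; it only marks where stemming happens in the pipeline.

```python
import re
from collections import Counter

# Hypothetical stand-in for a stemmer. This is NOT Porter2: it only strips a
# few common English suffixes to show where stemming fits in the pipeline.
_SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    for suffix in _SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def text_parser(text, stem=False, one_row=False, with_frequency=False):
    # Tokenize: lowercase, then split on runs of non-letters (assumed rule).
    tokens = [t for t in re.split(r"[^a-z]+", text.lower()) if t]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    if one_row:
        # Mode 1: all words in a single output row.
        return [" ".join(tokens)]
    if with_frequency:
        # Mode 2a: one row per distinct word with its occurrence count.
        # Counter is a hash table; it keeps first-seen order (Python 3.7+).
        return list(Counter(tokens).items())
    # Mode 2b: one row per word, without counts.
    return [(t,) for t in tokens]


print(text_parser("The cats walked; the cat walks.", stem=True, with_frequency=True))
```

With stemming on, "cats"/"cat" and "walked"/"walks" collapse to the same root, so the call prints `[('the', 2), ('cat', 2), ('walk', 2)]`. A real Porter2 stemmer handles many more cases than this sketch.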
The TextParser function reads a document into a memory buffer and builds a hash table of its words (the document's dictionary). The dictionary must fit in available memory, but this requirement is modest: a one-million-word dictionary with an average word length of ten bytes requires only about 10 MB.
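The memory figure above is simple arithmetic: number of words multiplied by average word length. The sketch below, with a hypothetical helper `estimate_dictionary_bytes`, reproduces it. As an assumption, it counts only the raw word bytes and ignores hash-table overhead (buckets, pointers, counts), which adds to the real total by an amount not stated here.

```python
def estimate_dictionary_bytes(num_words: int, avg_word_bytes: float) -> int:
    """Raw storage for the words themselves: count x average length.

    Hash-table overhead is deliberately excluded, so the real footprint
    will be somewhat larger than this estimate.
    """
    return int(num_words * avg_word_bytes)


size = estimate_dictionary_bytes(1_000_000, 10)
# Decimal megabytes (10**6 bytes), matching the "10 MB" figure.
print(f"{size:,} bytes = {size / 1_000_000:.0f} MB")
```

This prints `10,000,000 bytes = 10 MB`. In binary units that is about 9.5 MiB.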
TextParser uses files that are preinstalled on ML Engine. For details, see Preinstalled Files That Functions Use.