This example invokes a Python script file using the SCRIPT table operator. The script mapper.py reads a line of text input (“Old Macdonald Had A Farm”), splits the line into individual words, and emits a new row for each word.
An example of the Python script:
    #!/usr/bin/python
    import sys

    # input comes from STDIN (standard input)
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # emit one row per word
        for word in words:
            # write the results to STDOUT (standard output);
            # each line written here becomes one output row
            #
            # tab-delimited; the trivial word count is 1
            print('%s\t%s' % (word, 1))
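Before installing the script, its logic can be checked locally. The following is a minimal sketch and not part of the original example: `map_lines` is a hypothetical helper that mirrors the loop in mapper.py, and `io.StringIO` stands in for standard input.

```python
import io


def map_lines(lines):
    """Yield one tab-delimited 'word<TAB>1' row per word, mirroring mapper.py."""
    for line in lines:
        # strip surrounding whitespace, then split on runs of whitespace
        for word in line.strip().split():
            yield '%s\t%s' % (word, 1)


# Simulate standard input with the sample sentence from this example.
sample_input = io.StringIO(u"Old Macdonald Had A Farm\n")
rows = list(map_lines(sample_input))

for row in rows:
    print(row)
```

Each printed line holds a word and the count 1 separated by a tab, which is the shape the two output columns of the query are expected to receive.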
To install the script, run the following command:
    CALL SYSUIF.INSTALL_FILE('mapper', 'mapper.py', 'cz!/tmp/mapper.py');
The table barrier contains the sentence as one line of text input:
Id int | Name varchar(100) |
---|---|
1 | Old Macdonald Had A Farm |
To split the sentence into individual words, run the following query:
    SELECT *
    FROM SCRIPT (
        ON ( SELECT name FROM barrier )
        SCRIPT_COMMAND('./mydb/mapper.py')
        RETURNS ( 'word varchar(10)', 'count_input int' )
    ) AS tab;
The result:
Word | Count_input |
---|---|
Old | 1 |
Macdonald | 1 |
Had | 1 |
A | 1 |
Farm | 1 |