In this example, the Python script mapper.py reads in a line of text input ("Old Macdonald Had A Farm") from a csv file and splits the line into individual words, emitting a new row for each word.
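The contents of mapper.py are not shown here. As a rough illustration of the behavior described above, a minimal sketch might look like the following. It assumes the script reads rows from standard input and writes tab-delimited rows with a constant count of 1; the function names `split_into_rows` and `run_mapper` are hypothetical and are not taken from the shipped script.

```python
import io
import sys


def split_into_rows(line, delimiter="\t"):
    """Return one 'word<delimiter>1' row for each whitespace-separated word."""
    return [f"{word}{delimiter}1" for word in line.split()]


def run_mapper(in_stream, out_stream, delimiter="\t"):
    """Read lines from in_stream and write one output row per word."""
    for line in in_stream:
        for row in split_into_rows(line, delimiter):
            out_stream.write(row + "\n")


# Assumption: a real script would call run_mapper(sys.stdin, sys.stdout).
# A StringIO stands in for standard input here so the sketch runs on its own.
sample = io.StringIO("Old Macdonald Had A Farm\n")
run_mapper(sample, sys.stdout)
```

Keeping the splitting logic in its own function makes it easy to check the per-word behavior without any database or container involved.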
- Load example data.
>>> load_example_data("Script", ["barrier"])
- Create teradataml DataFrame objects.
>>> barrierdf = DataFrame.from_table("barrier")
- Create remote user environment.
>>> testenv = create_env('testenv', 'python_3.7.13', 'Demo environment')
User environment testenv created.
>>> import os, teradataml
>>> teradataml_dir = os.path.dirname(teradataml.__file__)
- Create an Apply object that allows the user to run the script.
>>> apply_obj = Apply(data=barrierdf,
...                   script_name='mapper.py',
...                   files_local_path=os.path.join(teradataml_dir, 'data', 'scripts'),
...                   apply_command='python3 mapper.py',
...                   data_order_column="Id",
...                   is_local_order=False,
...                   nulls_first=False,
...                   sort_ascending=False,
...                   returns={"word": VARCHAR(15), "count_input": VARCHAR(10)},
...                   env_name=testenv,
...                   delimiter='\t')
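The `returns` and `delimiter` arguments together describe the shape of the script's output: each line the script prints is expected to carry one value per declared column, separated by the delimiter. The plain-Python sketch below illustrates that relationship only. It is not how teradataml or Open Analytics Framework parses output, and `parse_output_row` is a hypothetical helper introduced for this illustration.

```python
def parse_output_row(line, column_names, delimiter="\t"):
    """Split one output line into a dict keyed by the declared column names."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(column_names):
        raise ValueError(
            f"expected {len(column_names)} values, got {len(values)}: {line!r}"
        )
    return dict(zip(column_names, values))


# Column names taken from the returns argument in the example above.
columns = ["word", "count_input"]
row = parse_output_row("Farm\t1\n", columns)
print(row)
```

A script that prints more or fewer fields than `returns` declares, or that uses a different separator, would fail a check like this one, which is the kind of mismatch worth catching during local testing.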
- Run the user script locally within a Docker container, using data from the csv file.
This helps the user fix script-level issues outside Open Analytics Framework.
- Set up the environment by providing the local path to the Docker image file.
>>> apply_obj.setup_sto_env(docker_image_location='/tmp/sto_sandbox_docker_image.tar')
Loading image from /tmp/sto_sandbox_docker_image.tar. It may take few minutes.
Image loaded successfully.
- Run the user script locally in the Docker container.
>>> apply_obj.test_script(input_data_file=os.path.join(teradataml_dir, 'data', 'barrier.csv'))
############ STDOUT Output ############

        word count_input
0  Macdonald           1
1          A           1
2       Farm           1
3        Had           1
4        Old           1
5          1           1
- Install the script file in the remote user environment.
>>> apply_obj.install_file(file_name=os.path.join(teradataml_dir, 'data', 'mapper.py'))
File 'mapper.py' installed successfully in the remote user environment 'demo_env'.
- Run the user script in Open Analytics Framework.
>>> apply_obj.execute_script()
        word count_input
0  Macdonald           1
1          A           1
2       Farm           1
3        Had           1
4        Old           1
5          1           1
- Remove the installed file from the remote user environment.
>>> apply_obj.remove_file(file_name='mapper.py')
File 'mapper.py' removed successfully from the remote user environment 'demo_env'.