Dataproc provides initialization actions that can be used to install custom software on cluster instances. To create an initialization action, you need to provide a bootstrap script. The script must be stored in Google Cloud Storage at a URI that is accessible from the Dataproc cluster. For compatibility information on QueryGrid components, see the QueryGrid Compatibility Matrix.
The required script, TDQG_DEPLOYMENT.sh, is included in the node package tdqg-node-version.tar.gz.
This procedure assumes the following prerequisites:
- You have the required privileges to provision the Dataproc cluster and access scripts stored on Google Cloud Storage.
- The cURL tool is installed on all nodes where you intend to install QueryGrid.
Note the following considerations. Initialization actions:
- Can only be provided during cluster provisioning.
- Cannot be modified after the cluster is provisioned.
- Are always persisted once created. All future Dataproc nodes run the initialization actions.
- Add a system and download the tdqg-node.json token file that was generated by the QueryGrid Manager. For information about downloading tdqg-node.json, see Adding Nodes Manually.
- Do one of the following:
Option: Install QueryGrid™ on Google Cloud Dataproc
- Download the node package.
For more information, see Downloading Required Packages.
- Unzip the package:
tar -xvzf tdqg-node-version.tar.gz
The TDQG_DEPLOYMENT.sh script is in the qgdeployment/dataproc directory of the extracted package.
- Upload the QueryGrid deployment script to Google Cloud Storage.
- In the Dataproc Create a Cluster screen, do the following:
- At Initialization Actions, provide the path to the deployment script.
- At Metadata, use tdqg_node_json as the key and the contents of the tdqg-node.json file as the value.
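The console steps above can also be approximated from the command line. The following is a minimal sketch, not a procedure taken from the QueryGrid documentation. The bucket, cluster name, and region are hypothetical placeholders, and it assumes the Google Cloud CLI (gsutil and gcloud) is installed and authenticated. Because the token file is JSON and contains commas, the sketch uses the gcloud alternate-delimiter syntax (^|^, described in gcloud topic escaping) for the metadata value. This assumes the file contains no | character.

```shell
#!/usr/bin/env bash
# Sketch only: every value below is a hypothetical placeholder.
BUCKET_URI="gs://example-bucket/querygrid"
CLUSTER_NAME="example-cluster"
REGION="us-central1"
SCRIPT_PATH="qgdeployment/dataproc/TDQG_DEPLOYMENT.sh"
TOKEN_FILE="tdqg-node.json"

# Cloud Storage URI that the cluster will use as its initialization action.
INIT_ACTION_URI="${BUCKET_URI}/$(basename "$SCRIPT_PATH")"

if [ -f "$SCRIPT_PATH" ] && [ -f "$TOKEN_FILE" ] && command -v gcloud >/dev/null 2>&1; then
  # Upload the deployment script to Cloud Storage.
  gsutil cp "$SCRIPT_PATH" "$INIT_ACTION_URI"

  # Create the cluster with the script as an initialization action and the
  # token file contents as the value of the tdqg_node_json metadata key.
  gcloud dataproc clusters create "$CLUSTER_NAME" \
    --region="$REGION" \
    --initialization-actions="$INIT_ACTION_URI" \
    --metadata="^|^tdqg_node_json=$(cat "$TOKEN_FILE")"
else
  echo "Missing script, token file, or gcloud; nothing was run." >&2
fi
```

Run the sketch from the directory where the node package was extracted, with tdqg-node.json in the same directory, or adjust SCRIPT_PATH and TOKEN_FILE accordingly.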
Option: Install QueryGrid on an existing node
Running the initialization actions script requires a user with sudo permissions.
- On each node in the cluster, run the following command:
./TDQG_DEPLOYMENT.sh --tdqg_node_json_file 'input'
Where input can be one of the following:
- (Recommended) Path to the tdqg-node.json file.
- Contents of the tdqg-node.json file.
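As an illustration of the recommended form, the sketch below passes the path to the token file after checking the prerequisites named earlier (cURL present, script and token file available). The file locations and the preflight helper are hypothetical, and invoking the script through sudo is an assumption based on the sudo requirement stated above.

```shell
#!/usr/bin/env bash
# Sketch only: paths are hypothetical placeholders.
DEPLOY_SCRIPT="./TDQG_DEPLOYMENT.sh"
TOKEN_FILE="/tmp/tdqg-node.json"

# Hypothetical helper: succeeds only if curl is installed, the deployment
# script ($1) is executable, and the token file ($2) is readable.
preflight() {
  command -v curl >/dev/null 2>&1 || { echo "curl not found" >&2; return 1; }
  [ -x "$1" ] || { echo "script not executable: $1" >&2; return 1; }
  [ -r "$2" ] || { echo "token file not readable: $2" >&2; return 1; }
}

if preflight "$DEPLOY_SCRIPT" "$TOKEN_FILE"; then
  # Assumption: run with sudo, passing the path to the token file.
  sudo "$DEPLOY_SCRIPT" --tdqg_node_json_file "$TOKEN_FILE"
fi
```

Passing the path keeps the JSON out of the shell command line, which avoids quoting problems with the file contents.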
Option: Install QueryGrid on a new node
The initialization action on a new node depends on how you ran the TDQG_DEPLOYMENT.sh script when provisioning the Dataproc cluster.
- If you ran the script as a Dataproc initialization action, the script automatically runs on the new node.
- If you did not run the script as an initialization action, run the script on the new node as described for installing QueryGrid on an existing node.