The following sections document the functionality of each EDA UI tab.
Save / Reset buttons
Use the Save button to save the current EDA UI settings for the corresponding DataFrame. Saved settings last for the current Jupyter session. For example, build a pipeline and save the settings so you can reuse the pipeline with another DataFrame. Note that the settings can pass different values to the saved pipeline.
The Reset button resets your saved settings.
Describe tab
Use the Describe tab to retrieve different statistics on the resultant teradataml DataFrame.
The following example creates a sales DataFrame and displays the Describe tab.
Descriptions for each sub-tab follow.
- Shape and Size
  - Retrieves the total number of rows and columns in the teradataml DataFrame.
- Column Statistics
  - Retrieves statistical values such as count, min, max, mean, standard deviation, and percentiles (25, 50, 75) for every applicable teradataml DataFrame column.
- Column Types
  - Retrieves the Teradata type and the corresponding Python type for every teradataml DataFrame column.
- Column Summary
  - Retrieves the null value count, non-null value count, number of positive values, and number of negative values for every applicable teradataml DataFrame column.
- Categorical Summary
  - Retrieves the categorical values and their counts in the corresponding column, for every applicable teradataml DataFrame column.
- Futile Columns
  - Retrieves the futile columns of the teradataml DataFrame.
- Source Query
  - Retrieves the underlying source query for the teradataml DataFrame.
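To make the Shape and Size and Column Statistics descriptions concrete, the following is a minimal plain-Python sketch of the same kinds of values computed over a small in-memory table. It is not teradataml code and does not show how the EDA UI computes its results. The `rows` data and the helper names `shape` and `column_statistics` are hypothetical, and the percentile method used here (inclusive quartiles from the standard library) is an assumption that may differ from what Vantage reports.

```python
import statistics

# Hypothetical in-memory stand-in for a small table; not real teradataml data.
rows = [
    {"accounts": "Blue Inc", "Feb": 90.0, "Jan": 50.0},
    {"accounts": "Orange Inc", "Feb": 210.0, "Jan": None},
    {"accounts": "Red Inc", "Feb": 200.0, "Jan": 150.0},
    {"accounts": "Yellow Inc", "Feb": 90.0, "Jan": None},
]


def shape(rows):
    """Return (row count, column count), the values Shape and Size reports."""
    n_cols = len(rows[0]) if rows else 0
    return len(rows), n_cols


def column_statistics(rows, column):
    """Compute count, min, max, mean, std, and quartiles over non-null values."""
    values = [r[column] for r in rows if r[column] is not None]
    # Assumption: inclusive quartiles; the database may use another definition.
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "25%": q25,
        "50%": q50,
        "75%": q75,
    }


table_shape = shape(rows)
feb_stats = column_statistics(rows, "Feb")
print(table_shape)
print(feb_stats)
```

Null values are skipped before the statistics are computed, which is why the count for a column can be smaller than the row count.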
Visualize tab
Use the Visualize tab to generate a plot from a teradataml DataFrame. It uses the DataFrame.plot() function internally. See Plotting in teradataml for the options available in the UI.
Analyze tab
Use the Analyze tab to run analytic functions, such as VAL, VantageCloud Enterprise, and UAF functions, on a teradataml DataFrame. Select a function from the dropdown, and the UI displays the arguments that the function accepts. Once the arguments are specified, select Execute to run the function.
In addition to these functions, you can run AutoML by selecting AutoML from the dropdown.
Pipeline (Analyze tab option)
The Analyze tab includes a pipeline option that chains any number of functions, passing the output of each function to the next. The steps to build a pipeline follow:
- Select a function from the dropdown.
  If the function accepts parameters, the UI shows additional options for providing them.
- Select Add to pipeline.
- Select other functions as needed and specify parameters if the function accepts parameters.
- Select Remove from pipeline to remove functions.
- Select Move << to move a function up in the list.
- Select Move >> to move a function down in the list.
- After all functions are added, select Execute to execute the pipeline and display the final result.
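The pipeline operations above (add, remove, move up, move down, execute) can be modeled as an ordered list of steps, where each step receives the output of the previous one. The following is a minimal plain-Python sketch of that idea. The `Pipeline` class and the toy step functions are hypothetical illustrations, not the EDA UI implementation or a teradataml API.

```python
class Pipeline:
    """Ordered list of (name, function) steps; each step consumes the previous output."""

    def __init__(self):
        self.steps = []

    def add(self, name, func):
        """Append a step, like selecting Add to pipeline."""
        self.steps.append((name, func))

    def remove(self, name):
        """Drop a step, like selecting Remove from pipeline."""
        self.steps = [s for s in self.steps if s[0] != name]

    def _index(self, name):
        return [s[0] for s in self.steps].index(name)

    def move_up(self, name):
        """Swap a step with its predecessor, like selecting Move <<."""
        i = self._index(name)
        if i > 0:
            self.steps[i - 1], self.steps[i] = self.steps[i], self.steps[i - 1]

    def move_down(self, name):
        """Swap a step with its successor, like selecting Move >>."""
        i = self._index(name)
        if i < len(self.steps) - 1:
            self.steps[i + 1], self.steps[i] = self.steps[i], self.steps[i + 1]

    def execute(self, data):
        """Run every step in order and return the final result."""
        for _, func in self.steps:
            data = func(data)
        return data


pipeline = Pipeline()
pipeline.add("double", lambda xs: [x * 2 for x in xs])
pipeline.add("drop_negative", lambda xs: [x for x in xs if x >= 0])
pipeline.add("sort", sorted)
pipeline.move_up("sort")    # order is now: double, sort, drop_negative
pipeline.remove("double")   # order is now: sort, drop_negative
step_names = [name for name, _ in pipeline.steps]
result = pipeline.execute([3, -1, 2])
print(step_names, result)
```

Because only the final result is returned, reordering steps can change the output, which is why the UI offers the move controls before execution.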
The following example builds a pipeline from two functions to get the Column Summary of all columns in the sales DataFrame and exclude the NonNullCount column from the final output.
- Select TD_ColumnSummary from the dropdown and provide arguments for the function.
- Exclude the NonNullCount column using AntiSelect:
  - Select Add to pipeline, then select AntiSelect from the dropdown.
  - Provide the NonNullCount column name in the arguments.
- Select Execute to run the pipeline.
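The data flow of this two-step pipeline can be sketched in plain Python: the first step produces one summary row per column, and the second step drops the NonNullCount column from that output. This is an illustration under stated assumptions, not the TD_ColumnSummary or AntiSelect implementation. The `sales` rows, the helper names `column_summary` and `antiselect`, and the reduced set of output columns are all hypothetical.

```python
# Hypothetical stand-in for the sales data; not the real teradataml example table.
sales = [
    {"accounts": "Blue Inc", "Jan": 50.0},
    {"accounts": "Orange Inc", "Jan": None},
    {"accounts": "Red Inc", "Jan": -150.0},
]


def column_summary(rows):
    """Return one summary row per input column (simplified set of counts)."""
    summary = []
    for col in rows[0]:
        values = [r[col] for r in rows]
        numeric = [v for v in values if isinstance(v, (int, float))]
        summary.append({
            "ColumnName": col,
            "NullCount": sum(v is None for v in values),
            "NonNullCount": sum(v is not None for v in values),
            "PositiveCount": sum(v > 0 for v in numeric),
            "NegativeCount": sum(v < 0 for v in numeric),
        })
    return summary


def antiselect(rows, exclude):
    """Return the rows with every column named in `exclude` removed."""
    return [{k: v for k, v in r.items() if k not in exclude} for r in rows]


# Chain the two steps: the output of the first is the input of the second.
steps = [column_summary, lambda rows: antiselect(rows, {"NonNullCount"})]
result = sales
for step in steps:
    result = step(result)

for row in result:
    print(row)
```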
Persist tab
Use the Persist tab to persist a teradataml DataFrame to a table in Teradata Vantage. It uses copy_to_sql() internally. See copy_to_sql() for the options available in the Persist tab.