show_query() Method | Teradata Python Package - 17.00 - show_query() Method - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Release Date
November 2021
Content Type
User Guide
Publication ID
B700-4006-070K
Language
English (United States)

Use the show_query() method to return underlying SQL for a teradataml DataFrame. It is the same SQL that is used to view the data of the teradataml DataFrame.

Arguments:

The optional argument full_query specifies if the complete query for the dataframe should be returned.
  • If set to True, query for the dataframe is returned with respect to the base dataframe's table ("from_table()" or "from_query()"), or from the output tables of analytical functions, if there are any in the workflow.
    This query may or may not be directly used to retrieve data of the dataframe upon which the function is called.
  • If set to False or not used, the string returned is the query already used or will be used to retrieve data for the teradataml DataFrame.

    This is the default value.

show_query() API is not intended to be used on the output of the following APIs:
  • set_index()
  • sort()
  • sort_index()
  • squeeze()
In most cases, queries returned by show_query() with full_query set to False, can be executed as is on Vantage. But there are certain functions which are lazy, some functions are executed at the time when actual data is requested. When such DataFrames are manipulated and show_query() is executed on the outcome, such queries may or may not be executable on Vantage.
Following are some analytic functions which creates lazy DataFrames:
  • ML Engine functions:
    • Attribution
    • CoxPH
    • CoxHazardRatio
    • DecisionForest
    • DecisionForestPredict
    • FPGrowth
    • GLML1L2Predict
    • NaiveBayes
    • PageRank
    • POSTagger
    • Sessionize
    • XGBoost
  • Advanced SQL Engine functions:
    • MovingAverage
    • NaiveBayesTextClassifierPredict

Examples Prerequisite

Load example data and create required DataFrame on it.

>>> load_example_data("dataframe", "admissions_train")
>>> load_example_data("NaiveBayes", "nb_iris_input_train")
>>> df = DataFrame.from_table("admissions_train")

Example 1: Show query on base (from_table) dataframe, with default option.

>>> df.show_query()
'select * from "admissions_train"'

Example 2: Show query on base (from_query) dataframe, with default option.

>>> df_from_query = DataFrame.from_query("select masters, gpa from admissions_train")
>>> df_from_query.show_query()
'select masters, gpa from admissions_train'

Example 3: Show query on base (from_table) dataframe, with full_query option.

This will return same query as with default option because workflow only has one dataframe.

>>> df.show_query(full_query = True)
'select * from "admissions_train"

Example 4: Show query on base (from_query) dataframe, with full_query option.

>>> df_from_query = DataFrame.from_query("select masters, gpa from admissions_train")
>>> df_from_query.show_query(full_query = True)
'select masters, gpa from admissions_train'

Example 5: Show query used in a workflow demonstrating default and full_query options.

  • Assign operation on base dataframe:
    # Workflow Step-1: Assign operation on base dataframe
    >>> df1 = df.assign(temp_column=admissions_train_df.gpa + admissions_train_df.admitted)
  • Select columns from assign's result:
    # Workflow Step-2: Selecting columns from assign's result
    >>> df2 = df1.select(["masters", "gpa", "programming", "admitted"])
  • Filter on top of select's result:
    # Workflow Step-3: Filtering on top of select's result
    >>> df3 = df2[df2.admitted > 0]
  • Sample 90% rows from filter's result:
    # Workflow Step-4: Sampling 90% rows from filter's result
    >>> df4 = df3.sample(frac=0.9)
  • Show query with full_query option on df4:
    # Show query with full_query option on df4.
    # This will give full query upto base dataframe(df)
    >>> df4.show_query(full_query = True)
    'select masters,gpa,stats,programming,admitted,sampleid as "sampleid" from (
     select * from (select masters,gpa,stats,programming,admitted from (select id AS
     id, masters AS masters, gpa AS gpa, stats AS stats, programming AS programming,
     admitted AS admitted, gpa + admitted AS temp_column from "admissions_train") as
     temp_table) as temp_table where admitted > 0) as temp_table SAMPLE 0.9'
  • Show query with default option on df4:
    # Show query with default option on df4. This will give same query as give in above case.
    >>> df4.show_query()
    'select masters,gpa,stats,programming,admitted,sampleid as "sampleid" from (select *
     from (select masters,gpa,stats,programming,admitted from (select id AS id, masters
     AS masters, gpa AS gpa, stats AS stats, programming AS programming, admitted AS admitted,
     gpa + admitted AS temp_column from "admissions_train") as temp_table) as temp_table
     where admitted > 0) as temp_table SAMPLE 0.9'
    
  • Execute intermediate dataframe df3:
    >>> df2
      masters   gpa programming  admitted
    0      no  4.00      Novice         1
    1     yes  3.57    Advanced         1
    2      no  3.44      Novice         0
    3     yes  1.98    Advanced         0
    4     yes  4.00    Advanced         1
    5     yes  3.95    Beginner         0
    6     yes  2.33      Novice         1
    7     yes  3.46    Beginner         0
    8      no  3.00      Novice         0
    9     yes  2.65    Beginner         1
  • Show query with default option on df4:
    # Show query with default option on df4. This will give query with respect
    # to view/table created by the latest executed dataframe in the workflow (df2 in this scenario).
    # This is the query teradataml internally uses to retrieve data for dataframe df4, if executed
    # at this point.
    >>> df4.show_query()
    'select masters,gpa,stats,programming,admitted,sampleid as "sampleid" from (select * from
    "ALICE"."ml__select__1585722211621282" where admitted > 0) as temp_table SAMPLE 0.9'
  • Show query with full_query option on df4:
    >>> df4.show_query(full_query = True)
    'select masters,gpa,stats,programming,admitted,sampleid as "sampleid" from (select *
     from (select masters,gpa,stats,programming,admitted from (select id AS id, masters
     AS masters, gpa AS gpa, stats AS stats, programming AS programming, admitted AS admitted,
     gpa + admitted AS temp_column from "admissions_train") as temp_table) as temp_table
     where admitted > 0) as temp_table SAMPLE 0.9'