Accessing Foreign Table Created On Parquet Data | NOS | teradataml - Accessing Foreign Table Created On Parquet Data - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.10
Published
May 2022
Language
English (United States)
Last Update
2022-08-18
dita:mapPath
rsu1641592952675.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage

The following examples show how to create a teradataml DataFrame on this table and process this DataFrame with other teradataml API's.

Assume that the following foreign table has been created on Parquet data in Amazon S3 bucket:
CREATE MULTISET FOREIGN TABLE t1l,FALLBACK,
EXTERNAL SECURITY INVOKER TRUSTED AUTH_OBJECT(
Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
a INTEGER,
b INTEGER,
c INTEGER,
d INTEGER,
e VARCHAR(10) CHARACTER SET UNICODE CASESPECIFIC,
f DATE FORMAT 'YYYY-MM-DD',
g  Decimal(8, 3),
h FLOAT,
i BYTEINT,
j SMALLINT)
USING(
        LOCATION  ('/s3/ceph2-s3.teradata.com/mk250104/csoj_files/t1l.parquet')
        PATHPATTERN  ('$Var1/$Var2/$var3/$Var4')
        STOREDAS  ('PARQUET')
        MANIFEST  ('FALSE')
        )
    NO PRIMARY INDEX
    PARTITION BY COLUMN;

Create a dataframe on the foreign table

# Create dataframe on foreign table which contains data in PARQUET data
>>> t1l = DataFrame.from_table("t1l")

Check Properties of the DataFrame

  • Display the columns in the DataFrame.
    # Let's take a peek at the columns in a DataFrame.
    >>> t1l.columns
    ['Location', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
  • Check the types of the columns.
    # Let's check their types
    >>> t1l.dtypes
    Location                str
    a                       int
    b                       int
    c                       int
    d                       int
    e                       str
    f             datetime.date
    g           decimal.Decimal
    h                     float
    i                       int
    j                       int
  • Keys
    # Keys
    >>> t1l.keys
    <bound method DataFrame.keys of                                             Location   a    b    c  d         e           f        g     h  i   j
    0  /S3/ceph2-s3.teradata.com/mk250104/csoj_files/...   7   70  107  1  CScs      1963-02-03  100.333  19.0  4  10
    1  /S3/ceph2-s3.teradata.com/mk250104/csoj_files/...   3   30  103  1  CScs      1961-06-05  300.333  19.0  4  10
    2  /S3/ceph2-s3.teradata.com/mk250104/csoj_files/...  10  100  110  1  CScs      1963-02-03  100.333  19.0  4  10
    3  /S3/ceph2-s3.teradata.com/mk250104/csoj_files/...   4   40  104  1  CScs      1961-06-05  100.333  19.0  4  10
    4  /S3/ceph2-s3.teradata.com/mk250104/csoj_files/...   8   80  108  1  CScs      1963-02-03  200.333  19.0  4  10
    5  /S3/ceph2-s3.teradata.com/mk250104/csoj_files/...   2   20  102  1  CScs      1961-06-05  200.333  19.0  4  10
    6  /S3/ceph2-s3.teradata.com/mk250104/csoj_files/...   1   10  101  1  CScs      1961-06-05  100.333  19.0  4  10
    7  /S3/ceph2-s3.teradata.com/mk250104/csoj_files/...   6   60  106  1  CScs      1963-02-03  300.333  19.0  4  10
    8  /S3/ceph2-s3.teradata.com/mk250104/csoj_files/...   9   90  109  1  CScs      1963-02-03  300.333  19.0  4  10
    9  /S3/ceph2-s3.teradata.com/mk250104/csoj_files/...   5   50  105  1  CScs      1961-06-05  200.333  19.0  4  10>
  • Shape
    # shape
    >>> t1l.shape
    (10, 11)

Explore Data

  • Select some columns from the DataFrame.
    >>> t1l.select(["a", "b", "e", "f", "i", "j"]).to_pandas()
    	a	b	e	f	i	j
    0	7	70	CScs	1963-02-03	4	10
    1	3	30	CScs	1961-06-05	4	10
    2	10	100	CScs	1963-02-03	4	10
    3	4	40	CScs	1961-06-05	4	10
    4	8	80	CScs	1963-02-03	4	10
    5	2	20	CScs	1961-06-05	4	10
    6	1	10	CScs	1961-06-05	4	10
    7	6	60	CScs	1963-02-03	4	10
    8	9	90	CScs	1963-02-03	4	10
    9	5	50	CScs	1961-06-05	4	10
  • Filter data.
    >>> t1l[t1l.b == 70].to_pandas()
    	Location	a	b	c	d	e	f	g	h	i	j
    0	/S3/ceph2-s3.teradata.com/mk250104/csoj_files/...	7	70	107	1	CScs	1963-02-03	100.333	19.0	4	10
  • Sample data.
    >>> t1l.sample(5).to_pandas()
    Location	a	b	c	d	e	f	g	h	i	j	sampleid
    0	/S3/ceph2-s3.teradata.com/mk250104/csoj_files/...	6	60	106	1	CScs	1963-02-03	300.333	19.0	4	10	1
    1	/S3/ceph2-s3.teradata.com/mk250104/csoj_files/...	2	20	102	1	CScs	1961-06-05	200.333	19.0	4	10	1
    2	/S3/ceph2-s3.teradata.com/mk250104/csoj_files/...	4	40	104	1	CScs	1961-06-05	100.333	19.0	4	10	1
    3	/S3/ceph2-s3.teradata.com/mk250104/csoj_files/...	9	90	109	1	CScs	1963-02-03	300.333	19.0	4	10	1
    4	/S3/ceph2-s3.teradata.com/mk250104/csoj_files/...	5	50	105	1	CScs	1961-06-05	200.333	19.0	4	10	1