The following examples show how to create a teradataml DataFrame on this table and process this DataFrame with other teradataml API's.
Assume that the following foreign table has been created on Parquet data in Amazon S3 bucket:
CREATE MULTISET FOREIGN TABLE t1l,FALLBACK, EXTERNAL SECURITY INVOKER TRUSTED AUTH_OBJECT( Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC, a INTEGER, b INTEGER, c INTEGER, d INTEGER, e VARCHAR(10) CHARACTER SET UNICODE CASESPECIFIC, f DATE FORMAT 'YYYY-MM-DD', g Decimal(8, 3), h FLOAT, i BYTEINT, j SMALLINT) USING( LOCATION ('/s3/ceph2-s3.teradata.com/mk250104/csoj_files/t1l.parquet') PATHPATTERN ('$Var1/$Var2/$var3/$Var4') STOREDAS ('PARQUET') MANIFEST ('FALSE') ) NO PRIMARY INDEX PARTITION BY COLUMN;
Create a dataframe on the foreign table
# Create dataframe on foreign table which contains data in PARQUET data >>> t1l = DataFrame.from_table("t1l")
Check Properties of the DataFrame
- Display the columns in the DataFrame.
# Let's take a peek at the columns in a DataFrame. >>> t1l.columns
['Location', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
- Check the types of the columns.
# Let's check their types >>> t1l.dtypes
Location str a int b int c int d int e str f datetime.date g decimal.Decimal h float i int j int
- Keys
# Keys >>> t1l.keys
<bound method DataFrame.keys of Location a b c d e f g h i j 0 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 7 70 107 1 CScs 1963-02-03 100.333 19.0 4 10 1 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 3 30 103 1 CScs 1961-06-05 300.333 19.0 4 10 2 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 10 100 110 1 CScs 1963-02-03 100.333 19.0 4 10 3 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 4 40 104 1 CScs 1961-06-05 100.333 19.0 4 10 4 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 8 80 108 1 CScs 1963-02-03 200.333 19.0 4 10 5 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 2 20 102 1 CScs 1961-06-05 200.333 19.0 4 10 6 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 1 10 101 1 CScs 1961-06-05 100.333 19.0 4 10 7 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 6 60 106 1 CScs 1963-02-03 300.333 19.0 4 10 8 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 9 90 109 1 CScs 1963-02-03 300.333 19.0 4 10 9 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 5 50 105 1 CScs 1961-06-05 200.333 19.0 4 10>
- Shape
# shape >>> t1l.shape
(10, 11)
Explore Data
- Select some columns from the DataFrame.
>>> t1l.select(["a", "b", "e", "f", "i", "j"]).to_pandas()
a b e f i j 0 7 70 CScs 1963-02-03 4 10 1 3 30 CScs 1961-06-05 4 10 2 10 100 CScs 1963-02-03 4 10 3 4 40 CScs 1961-06-05 4 10 4 8 80 CScs 1963-02-03 4 10 5 2 20 CScs 1961-06-05 4 10 6 1 10 CScs 1961-06-05 4 10 7 6 60 CScs 1963-02-03 4 10 8 9 90 CScs 1963-02-03 4 10 9 5 50 CScs 1961-06-05 4 10
- Filter data.
>>> t1l[t1l.b == 70].to_pandas()
Location a b c d e f g h i j 0 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 7 70 107 1 CScs 1963-02-03 100.333 19.0 4 10
- Sample data.
>>> t1l.sample(5).to_pandas()
Location a b c d e f g h i j sampleid 0 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 6 60 106 1 CScs 1963-02-03 300.333 19.0 4 10 1 1 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 2 20 102 1 CScs 1961-06-05 200.333 19.0 4 10 1 2 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 4 40 104 1 CScs 1961-06-05 100.333 19.0 4 10 1 3 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 9 90 109 1 CScs 1963-02-03 300.333 19.0 4 10 1 4 /S3/ceph2-s3.teradata.com/mk250104/csoj_files/... 5 50 105 1 CScs 1961-06-05 200.333 19.0 4 10 1