Generate and Load Click-Stream Sequences - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product
Aster Analytics
Release Number
6.21
Published
November 2016
Language
English (United States)
Last Update
2018-04-14
dita:id
B700-1021

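Use the following procedure to generate the click-stream sequences and load them into a database table. The Python script in step 1 picks each event type at random, with equal probability, from a fixed list of 12 event codes. Because the script seeds the random number generator with a fixed value (50), repeated runs with the same inputs produce the same file.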

  1. Generate the click-stream sequences as a .csv file (table.csv) by running this Python script:
    -- Python script begins.
    -- The input value for length (of each click-stream) is 100000.
    -- The input value for partition is 5.
    
    #!/usr/bin/python
    import sys
    import getopt
    import csv
    import random
    
    def usage():
        print '\nUsage:-'
        print 'generate_table_data.py -l length -p partition'
        print ' '
        sys.exit(2)
    
    def main(argv):
        length = 0
        partition = 0
        try:
            opts, args = getopt.getopt(argv, "hl:p:",
                                       ["length=", "partition="])
        except getopt.GetoptError:
            usage()
        for opt, arg in opts:
            if (opt == '-h'):
                usage()
            elif opt in ("-l", "--length"):
                length = int(arg)
            elif opt in ("-p", "--partition"):
                partition = int(arg)
            else:
                usage()
    
        random.seed(50)
        event = ['EMLOP', 'CLKNI', 'CLKIN', 'CLKAC', 'CLKCO', 'ACTVD',
                 'APEAP', 'APEST', 'APESU', 'FUNDD', 'REACT', 'XXX']
    
        f = open ( 'table.csv', 'wb' )
        writer = csv.writer( f )
    
        for indiv_prod_id in range(0,length):
            for mat_intractn_dt_ts in range(0,partition):
                index = random.randint(0,11)
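                # Debug aid (uncomment to echo each row):
                # print str(mat_intractn_dt_ts) + "," +
                #     str(indiv_prod_id) + "," + event[index]
                writer.writerow([mat_intractn_dt_ts, indiv_prod_id,
                                 event[index]])
    
        # Close the file so that all rows are flushed to disk.
        f.close()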
    
    if __name__ == "__main__":
        main(sys.argv[1:])
    
    -- Python script ends.
  2. Create the table atrbtn_table_old_direct_noprsnt in the database:
    DROP TABLE IF EXISTS atrbtn_table_old_direct_noprsnt;
    
    CREATE TABLE atrbtn_table_old_direct_noprsnt (
      indiv_prod_id INTEGER,
      mat_intractn_dt_ts INTEGER,
      mat_intractn_typ_cd VARCHAR
    ) DISTRIBUTE BY HASH(indiv_prod_id);
  3. Load the click-stream sequences into the database table using ncluster_loader, substituting the queen IP address, username, and password for your cluster:
    ncluster_loader -h Queen_IP_address -U username \
      -w password atrbtn_table_old_direct_noprsnt table.csv -D ','
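
The script in step 1 is written for Python 2 (print statements, binary-mode CSV output). For a system that has only Python 3, the following is a minimal sketch of the same generation logic, not the original script. The function name generate and its path and seed parameters are assumptions introduced here for illustration. It keeps the same event list, seed value, and column order, but the random sequence is not guaranteed to match the Python 2 output row for row.

```python
import csv
import random

# Event codes copied from the script in step 1.
EVENTS = ['EMLOP', 'CLKNI', 'CLKIN', 'CLKAC', 'CLKCO', 'ACTVD',
          'APEAP', 'APEST', 'APESU', 'FUNDD', 'REACT', 'XXX']


def generate(length, partition, path='table.csv', seed=50):
    """Write length * partition rows to path and return the row count.

    Hypothetical helper: the name and the path/seed parameters are
    illustrative and do not appear in the original script.
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    rows = 0
    # In Python 3 the csv module expects a text-mode file opened with
    # newline='' so that it can control line endings itself.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        for outer in range(length):
            for inner in range(partition):
                # Same column order as the original script:
                # inner counter, outer counter, event code.
                writer.writerow([inner, outer, rng.choice(EVENTS)])
                rows += 1
    return rows
```

Calling generate(100000, 5) corresponds to the input values given in step 1 and writes 500000 rows to table.csv in the current directory. The with statement closes the file automatically, so no explicit close call is needed.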