Use the following procedure to generate the click-stream sequences and load them into a database table. The Python script in the first step chooses each event type at random, with equal probability, from 12 event codes.
- Generate the click-stream sequences as a .csv file (table.csv) by running this Python script:
-- Python script begins.
#!/usr/bin/python
# Written in Python 2 syntax (print statements, long()).
# The input value for length (of each click-stream) is 100000.
# The input value for partition is 5.
# Example: generate_table_data.py -l 100000 -p 5
import sys, getopt
import csv
import random

def usage():
    print '\nUsage:-'
    print 'generate_table_data.py -l length -p partition'
    print ' '
    sys.exit(2)

def main(argv):
    length = 0
    partition = 0
    try:
        opts, args = getopt.getopt(argv,"hl:p:",["length=","partition="])
    except getopt.GetoptError:
        usage()
    for opt, arg in opts:
        if (opt == '-h'):
            usage()
        elif opt in ("-l", "--length"):
            length = long(arg)
        elif opt in ("-p", "--partition"):
            partition = long(arg)
        else:
            usage()
    # Fixed seed, so repeated runs produce the same file.
    random.seed(50)
    event = ['EMLOP','CLKNI','CLKIN', 'CLKAC', 'CLKCO', 'ACTVD', 'APEAP', 'APEST','APESU','FUNDD','REACT','XXX']
    f = open ( 'table.csv', 'wb' )
    writer = csv.writer( f )
    for indiv_prod_id in range(0,length):
        for mat_intractn_dt_ts in range(0,partition):
            # Pick one of the 12 event codes uniformly at random.
            index = random.randint(0,11)
            #print str(mat_intractn_dt_ts) + "," + str(indiv_prod_id) + "," + event[index]
            writer.writerow([mat_intractn_dt_ts, indiv_prod_id, event[index]] )
    f.close()

if __name__ == "__main__":
    main(sys.argv[1:])
-- Python script ends.
- Create the table atrbtn_table_old_direct_noprsnt in the database:
DROP TABLE IF EXISTS atrbtn_table_old_direct_noprsnt;
CREATE TABLE atrbtn_table_old_direct_noprsnt (
    indiv_prod_id INTEGER,
    mat_intractn_dt_ts INTEGER,
    mat_intractn_typ_cd VARCHAR
) DISTRIBUTE BY HASH(indiv_prod_id);
- Load the click-stream sequences into the database table using ncluster_loader (substitute the queen IP address, username, and password for your cluster):
ncluster_loader -h Queen_IP_address -U username -w password atrbtn_table_old_direct_noprsnt table.csv -D ','
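Before running the loader, it can be useful to sanity-check the generated file. The following is a minimal sketch in Python 3, not part of the original procedure. The function name summarize_clickstream is hypothetical, and the sketch assumes the three-column layout that the generator script writes (two integer columns followed by an event code).

```python
import csv
from collections import Counter

# Event codes copied from the generator script above.
EVENTS = ['EMLOP', 'CLKNI', 'CLKIN', 'CLKAC', 'CLKCO', 'ACTVD',
          'APEAP', 'APEST', 'APESU', 'FUNDD', 'REACT', 'XXX']


def summarize_clickstream(path):
    """Return (row_count, first_column_values, event_counts) for a CSV file.

    Hypothetical helper: assumes each row has exactly three fields,
    two integers and one event code.
    """
    rows = 0
    first_column_values = set()
    event_counts = Counter()
    with open(path, newline='') as f:
        for line_no, row in enumerate(csv.reader(f), start=1):
            if len(row) != 3:
                raise ValueError('line %d: expected 3 fields, got %d'
                                 % (line_no, len(row)))
            first, second, event = row
            int(second)  # raises ValueError if the field is not an integer
            if event not in EVENTS:
                raise ValueError('line %d: unknown event %r' % (line_no, event))
            first_column_values.add(int(first))
            event_counts[event] += 1
            rows += 1
    return rows, first_column_values, event_counts
```

Calling summarize_clickstream('table.csv') on a file generated with length 100000 and partition 5 should report 500000 rows, the values 0 through 4 in the first column, and roughly one twelfth of the rows for each of the 12 event codes.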