Use case: You want to simulate a process or phenomenon.
Simulation is a common analytical task used to study, understand, and model the mechanisms that govern a process. A simulation analysis scales well and runs very efficiently when the data can be partitioned into groups and the simulation iterations executed in parallel.
This use case is a simple example of simulating waiting lines. It investigates how many customers of a physical store might renege and leave while waiting in a line. Specifically, consider bank customers standing in line over a period of eight hours, and treat specific characteristics of the customers as parameters that determine their behavior while in line.
A Python script performs the analysis with this input; it also reads rows of random seeds to run a series of simulation iterations. Each iteration produces a row with the number of customers that waited in line within the eight-hour window and the corresponding number of customers that reneged and left the line in the same period.
- The input data with rows of an observation ID and a corresponding random seed are in a CSV file "simData.csv" on your client.
- The simulation algorithm is in a Python script "simulation.py", also stored on the client. The Python script takes five command-line input arguments that specify the following parameters:
- INTERVAL_CUSTOMERS : Time interval (in minutes) between customer entries.
- MIN_PATIENCE : Minimum time (in minutes) customers will wait.
- MAX_PATIENCE : Maximum time (in minutes) customers will wait.
- TIME_IN_BANK : Average time (in minutes) a customer spends being served.
- MAX_MINUTES : Process observation time (in minutes).
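The "simulation.py" script itself is not listed in this section. As a rough illustration of the kind of renege logic the five parameters above drive, the following is a minimal pure-Python sketch of a single-teller waiting line; the function name, the exponential arrival and service distributions, and the uniform patience draw are all assumptions for illustration, not the actual script:

```python
import random

def simulate(interval_customers, min_patience, max_patience,
             time_in_bank, max_minutes, seed):
    """Single-teller FIFO line: customers arrive, wait, and renege
    if service does not start within their patience window."""
    random.seed(seed)
    served_until = 0.0   # time at which the teller next becomes free
    arrival = 0.0
    waited = reneged = 0
    while True:
        # Exponential inter-arrival times with mean INTERVAL_CUSTOMERS.
        arrival += random.expovariate(1.0 / interval_customers)
        if arrival > max_minutes:
            break        # past the observation window (MAX_MINUTES)
        waited += 1
        # Patience drawn between MIN_PATIENCE and MAX_PATIENCE.
        patience = random.uniform(min_patience, max_patience)
        start = max(arrival, served_until)
        if start - arrival > patience:
            reneged += 1  # customer leaves the line before being served
        else:
            # Exponential service time with mean TIME_IN_BANK.
            served_until = start + random.expovariate(1.0 / time_in_bank)
    return waited, reneged

# One iteration over an 8-hour (480-minute) window with a given seed.
waited, reneged = simulate(10.0, 1.0, 3.0, 12.0, 480, seed=42)
```

Given a fixed random seed, each call is deterministic, which is why the input table pairs every observation ID with its own seed: each row reproduces one independent simulation iteration.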
- Connect from the client to a target VantageCloud Lake system where the simulation task will be performed, as illustrated in the Introduction.
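The connection itself is covered in the Introduction; as a reminder, a teradataml session is typically established with create_context. The host and credential values below are placeholders to be replaced with your VantageCloud Lake system's details:

```python
from teradataml import create_context

# Placeholder connection details; substitute your system's host and credentials.
con = create_context(host="<host>", username="<username>", password="<password>")
```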
- Import necessary modules for the use case.
import os
import pandas as pd
from collections import OrderedDict
from teradataml import DataFrame, read_csv
from teradatasqlalchemy.types import BIGINT, VARCHAR
- Specify the path where the script is kept on the client.
path_to_files = '/Users/JaneDoe/OpenAFexamples/scripts/'
- Create a teradataml DataFrame of the input data table.
simData = DataFrame.from_table("simData")
- Load the CSV file to create a table in the Primary Cluster Analytics Database. The call to the teradataml read_csv function returns a teradataml DataFrame with a data sample from the new table.
types = OrderedDict(ObsID=BIGINT, RandSeed=BIGINT)
simData = read_csv(os.path.join(path_to_files, 'files', 'simData.csv'), table_name='simData', types=types)