Simulation | Sample Use Cases | Open Analytics Framework on VantageCloud Lake - Simulation - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

Use case: You want to simulate a process or phenomenon.

Simulation is a common analytical task for studying, understanding, and modeling the mechanisms that govern a process. A simulation analysis scales well when the data can be partitioned into groups and the simulation iterations can be executed in parallel.

This use case is a simple example of simulating waiting lines. It investigates how many customers of a physical store might renege, that is, leave the line before being served. Specifically, it considers bank customers standing in line over a period of eight hours, with specific characteristics of the customers serving as parameters that determine customer behavior while in line.

A Python script performs the analysis. In addition to these parameters, the script reads rows of random seeds and uses them to run a series of simulation iterations. Each iteration produces a row with the number of customers who waited in line within the 8-hour window and the corresponding number of customers who reneged and left the line in the same period.
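The following is a minimal sketch of what one seeded simulation iteration could look like. It is not the simulation.py script of this use case: the function name, the single-teller assumption, the exponential arrival and service distributions, and the default parameter values are all illustrative assumptions. It uses only the Python standard library.

```python
import random

def simulate_waiting_line(seed, interval_customers=10.0, min_patience=1.0,
                          max_patience=3.0, time_in_bank=12.0,
                          max_minutes=480.0):
    """Run one iteration; return (customers_arrived, customers_reneged).

    Hypothetical model: one teller, first come first served. All default
    values are placeholders, not values taken from the use case.
    """
    rng = random.Random(seed)      # private generator, so each row is reproducible
    clock = 0.0                    # arrival time of the current customer
    teller_free_at = 0.0           # time at which the teller becomes available
    customers = 0
    reneged = 0
    while True:
        # Assumption: exponentially distributed gaps between arrivals.
        clock += rng.expovariate(1.0 / interval_customers)
        if clock > max_minutes:
            break
        customers += 1
        # Each customer tolerates a wait drawn uniformly from the patience range.
        patience = rng.uniform(min_patience, max_patience)
        wait = max(0.0, teller_free_at - clock)
        if wait > patience:
            reneged += 1           # customer leaves before being served
        else:
            # Assumption: exponentially distributed service time.
            service = rng.expovariate(1.0 / time_in_bank)
            teller_free_at = clock + wait + service
    return customers, reneged

# One output row per input row of (ObsID, RandSeed); the sample rows are made up.
rows = [(1, 42), (2, 43), (3, 44)]
results = [(obs_id, *simulate_waiting_line(seed)) for obs_id, seed in rows]
```

Because every iteration builds its own generator from its seed, the result for a given row does not depend on which other rows are processed alongside it, which is the property that lets the iterations be split into groups and run in parallel.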

For this use case, assume:
  • The input data, consisting of rows with an observation ID and a corresponding random seed, are in a CSV file "simData.csv" on your client.
  • The simulation algorithm is in a Python script "simulation.py", also stored on the client.
    The Python script takes five command-line arguments that specify the following parameters:
    • INTERVAL_CUSTOMERS : Time interval (in minutes) between customer entries.
    • MIN_PATIENCE : Minimum time (in minutes) customers will wait.
    • MAX_PATIENCE : Maximum time (in minutes) customers will wait.
    • TIME_IN_BANK : Average time (in minutes) a customer spends being served.
    • MAX_MINUTES : Process observation time (in minutes).
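As an illustration of how a script might accept these five parameters, here is a small sketch. The positional order of the arguments (the order in which they are listed above), the names SimParams and parse_params, and the validation rules are assumptions made for this example; the actual simulation.py may handle its arguments differently.

```python
from typing import NamedTuple

class SimParams(NamedTuple):
    """Hypothetical container for the five simulation parameters (minutes)."""
    interval_customers: float
    min_patience: float
    max_patience: float
    time_in_bank: float
    max_minutes: float

def parse_params(argv):
    """Convert five positional command-line strings into a SimParams tuple."""
    if len(argv) != 5:
        raise ValueError(
            "expected 5 arguments: INTERVAL_CUSTOMERS MIN_PATIENCE "
            "MAX_PATIENCE TIME_IN_BANK MAX_MINUTES"
        )
    params = SimParams(*(float(arg) for arg in argv))
    if min(params) <= 0:
        raise ValueError("all parameters must be positive numbers of minutes")
    if params.min_patience > params.max_patience:
        raise ValueError("MIN_PATIENCE must not exceed MAX_PATIENCE")
    return params

# Inside a script, the list would typically come from sys.argv[1:].
# The values below are placeholders; 480 minutes matches the 8-hour window.
params = parse_params(["10", "1", "3", "12", "480"])
```

With this layout, an invocation of the form `python3 simulation.py 10 1 3 12 480` would map to the five fields in order.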
Prerequisite steps:
  • Connect from the client to a target VantageCloud Lake system where the simulation task will be performed, as illustrated in the Introduction.
  • Import necessary modules for the use case.
    import os
    from collections import OrderedDict
    import pandas as pd
    from teradataml import DataFrame, read_csv
    from teradatasqlalchemy.types import BIGINT, VARCHAR
  • Specify the path where the script is kept on the client.
    path_to_files = '/Users/JaneDoe/OpenAFexamples/scripts/'
  • Load the CSV file to create the table simData in the Primary Cluster Analytics Database. The call to the teradataml read_csv function returns a teradataml DataFrame with a data sample from the new table.
    types = OrderedDict(ObsID=BIGINT, RandSeed=BIGINT)
    simData = read_csv(os.path.join(path_to_files, 'files', 'simData.csv'), table_name='simData', types=types)
  • If the table simData already exists in the database, create a teradataml DataFrame of the input data table directly instead.
    simData = DataFrame.from_table("simData")
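For experimentation, an input file of the shape assumed in this use case (one observation ID and one random seed per row) could be generated locally with a sketch like the following. The function name write_sim_data, the master seed, the row count, the seed range, and the presence of a header row naming the columns ObsID and RandSeed are assumptions for illustration. The demo writes to the system temporary directory so that no existing file in the working folder is overwritten.

```python
import csv
import os
import random
import tempfile

def write_sim_data(csv_path, n_rows, master_seed=2023):
    """Write n_rows of (ObsID, RandSeed) to csv_path, one row per iteration."""
    rng = random.Random(master_seed)   # makes the generated seed list reproducible
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ObsID", "RandSeed"])      # assumed header row
        for obs_id in range(1, n_rows + 1):
            # Seeds are kept within a range that fits comfortably in BIGINT.
            writer.writerow([obs_id, rng.randrange(1, 2**31)])

# Hypothetical demo location; adjust to where the client files are kept.
demo_path = os.path.join(tempfile.gettempdir(), "simData.csv")
write_sim_data(demo_path, n_rows=100)
```

Each row then drives one independent simulation iteration, so the number of rows determines how many iterations are run.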