Use case: You want to simulate a process or phenomenon.
Simulation is a common analytical task used to study, understand, and model the mechanisms that govern a process. A simulation analysis scales well and runs very efficiently when the data can be partitioned into groups and the simulation iterations executed in parallel.
This use case is a simple example of simulating waiting lines. It investigates how many customers of a physical store might renege and leave while waiting in a line. Specifically, consider bank customers standing in line over a period of eight hours, and treat specific characteristics of the customers as parameters that determine their behavior while in line.
A Python script performs the analysis with this input; it also reads rows of random seeds to run a series of simulation iterations. Each iteration produces a row with the number of customers that waited in line within the eight-hour window and the corresponding number of customers that reneged and left the line in the same period.
- The input data with rows of an observation ID and a corresponding random seed are in a CSV file "simData.csv" on your client.
- The simulation algorithm is in a Python script "simulation.py", also stored on the client. The Python script takes five command-line input arguments that specify the following parameters:
- INTERVAL_CUSTOMERS : Time interval (in minutes) between customer entries.
- MIN_PATIENCE : Minimum time (in minutes) customers will wait.
- MAX_PATIENCE : Maximum time (in minutes) customers will wait.
- TIME_IN_BANK : Average time (in minutes) a customer spends being served.
- MAX_MINUTES : Process observation time (in minutes).
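The "simulation.py" script itself is not listed in this section. As a rough illustration of the kind of renege logic the five parameters above drive, the following is a minimal pure-Python sketch of a single-teller waiting line; the function name, the exponential arrival and service distributions, and the uniform patience draw are all assumptions for illustration, not the actual script:

```python
import random

def simulate(interval_customers, min_patience, max_patience,
             time_in_bank, max_minutes, seed):
    """Single-teller FIFO line: customers arrive, wait, and renege
    if service does not start within their patience window."""
    random.seed(seed)
    served_until = 0.0   # time at which the teller next becomes free
    arrival = 0.0
    waited = reneged = 0
    while True:
        # Exponential inter-arrival times with mean INTERVAL_CUSTOMERS.
        arrival += random.expovariate(1.0 / interval_customers)
        if arrival > max_minutes:
            break        # past the observation window (MAX_MINUTES)
        waited += 1
        # Patience drawn between MIN_PATIENCE and MAX_PATIENCE.
        patience = random.uniform(min_patience, max_patience)
        start = max(arrival, served_until)
        if start - arrival > patience:
            reneged += 1  # customer leaves the line before being served
        else:
            # Exponential service time with mean TIME_IN_BANK.
            served_until = start + random.expovariate(1.0 / time_in_bank)
    return waited, reneged

# One iteration over an 8-hour (480-minute) window with a given seed.
waited, reneged = simulate(10.0, 1.0, 3.0, 12.0, 480, seed=42)
```

Given a fixed random seed, each call is deterministic, which is why the input table pairs every observation ID with its own seed: each row reproduces one independent simulation iteration.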
- Connect from the client to a target VantageCloud Lake system where the simulation task will be performed, as illustrated in the Introduction.
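The connection itself is covered in the Introduction; as a reminder, a teradataml session is typically established with create_context. The host and credential values below are placeholders to be replaced with your VantageCloud Lake system's details:

```python
from teradataml import create_context

# Placeholder connection details; substitute your system's host and credentials.
con = create_context(host="<host>", username="<username>", password="<password>")
```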
- Import necessary modules for the use case.
import os
import pandas as pd
from collections import OrderedDict
from teradataml import DataFrame, read_csv
from teradatasqlalchemy.types import BIGINT, VARCHAR
- Specify the path where the script is kept on the client.
path_to_files = '/Users/JaneDoe/OpenAFexamples/scripts/'
- Create a teradataml DataFrame of the input data table.
simData = DataFrame.from_table("simData")
- Load the CSV file to create a table in the Primary Cluster Analytics Database. The call to the teradataml read_csv function returns a teradataml DataFrame with a data sample from the new table.
types = OrderedDict(ObsID=BIGINT, RandSeed=BIGINT)
simData = read_csv(os.path.join(path_to_files, 'files', 'simData.csv'), table_name='simData', types=types)