Input - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
The raw input is mobile telephone user data. Each user, identified by UserID, has these attributes for a specific time period:
Attribute       Description
CallDuration    Total time spent on telephone calls (in minutes)
SMS             Number of Short Message Service (SMS) messages sent and received
DataCounter     Data consumed (in megabytes)

VectorDistance Examples Raw Input Data
UserID  CallDuration  SMS  DataCounter
1       25000         24   4
2       40000         27   5
3       55000         32   7
4       27000         25   5
5       53000         30   5

The CallDuration values are so much higher than other attribute values that they skew the distribution. Normalizing the raw data to the range [0, 1] solves this problem.
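To make the skew concrete, the following minimal sketch measures how much of the squared distance between two users comes from CallDuration alone when the raw values are used. It assumes a Euclidean distance and compares UserID 5 with each other user; both choices are illustrative assumptions, not something this section prescribes.

```python
import math

# Raw input rows copied from the raw input data table.
raw = {
    1: {"CallDuration": 25000, "SMS": 24, "DataCounter": 4},
    2: {"CallDuration": 40000, "SMS": 27, "DataCounter": 5},
    3: {"CallDuration": 55000, "SMS": 32, "DataCounter": 7},
    4: {"CallDuration": 27000, "SMS": 25, "DataCounter": 5},
    5: {"CallDuration": 53000, "SMS": 30, "DataCounter": 5},
}

# Assumption: UserID 5 is the user every other user is compared against.
reference = raw[5]

# Fraction of each squared Euclidean distance contributed by CallDuration.
shares = {}
for uid, row in raw.items():
    if uid == 5:
        continue
    squared = {feature: (row[feature] - reference[feature]) ** 2 for feature in row}
    total = sum(squared.values())
    shares[uid] = squared["CallDuration"] / total
    print(uid, round(math.sqrt(total), 3), round(shares[uid], 6))
```

On the raw values, CallDuration accounts for almost all of every distance, so SMS and DataCounter have practically no influence on which users appear similar.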

The following table shows the raw input data normalized to the range [0, 1] using the Min-Max normalization technique.

VectorDistance Examples Normalized Input Data
UserID  CallDuration  SMS  DataCounter
1       0.0000333     0.1  0.2
2       0.5           0.4  0.4
3       1             0.9  0.8
4       0.01          0.2  0.4
5       0.93          0.7  0.4

The Min-Max technique transforms a value a (in column A) to a value b in the range [C, D], using this formula:

b = ((a - minimum_value_in_A) / (maximum_value_in_A - minimum_value_in_A)) * (D - C) + C

The following table shows the minimum and maximum values that the formula uses for each input table column.

VectorDistance Examples Minimum and Maximum Values
Column Name   Minimum Value  Maximum Value
CallDuration  24999          55001
SMS           23             33
DataCounter   3              8
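The formula and the minimum and maximum values above can be applied directly. The following is a minimal sketch in plain Python, not the Aster implementation; the names `min_max`, `raw`, `bounds`, and `normalized` are hypothetical and chosen only for illustration.

```python
# Raw input rows copied from the raw input data table.
raw = {
    1: {"CallDuration": 25000, "SMS": 24, "DataCounter": 4},
    2: {"CallDuration": 40000, "SMS": 27, "DataCounter": 5},
    3: {"CallDuration": 55000, "SMS": 32, "DataCounter": 7},
    4: {"CallDuration": 27000, "SMS": 25, "DataCounter": 5},
    5: {"CallDuration": 53000, "SMS": 30, "DataCounter": 5},
}

# (minimum, maximum) per column, copied from the minimum and maximum values table.
bounds = {
    "CallDuration": (24999, 55001),
    "SMS": (23, 33),
    "DataCounter": (3, 8),
}


def min_max(a, minimum, maximum, c=0.0, d=1.0):
    """Map a from [minimum, maximum] to [c, d] with the Min-Max formula."""
    return (a - minimum) / (maximum - minimum) * (d - c) + c


# Normalize every attribute of every user to the range [0, 1].
normalized = {
    uid: {column: min_max(value, *bounds[column]) for column, value in row.items()}
    for uid, row in raw.items()
}

for uid, row in normalized.items():
    print(uid, {column: round(value, 7) for column, value in row.items()})
```

With these bounds, the CallDuration of UserID 4 evaluates to about 0.0667, whereas the normalized input data table lists 0.01; the other listed values match the formula after rounding.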

From the normalized input data, select one or more users as the reference vector; the remaining users are the target vectors. The choice of reference vector depends on the application. For example, if the mobile telephone service is expanding its range to include a new area with similar users, then one or more typical users (with average or median attribute values) can be the reference vector. When the company has identified similar users in the new area, it can send them promotional offers.
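One way to pick a "typical" user is sketched below: compute the per-attribute median of the normalized values, then choose the user closest to that median vector. The Euclidean distance, the use of the median rather than the average, and the names `median_vector` and `typical_user` are all assumptions made for illustration; the right criterion depends on the application.

```python
import math
from statistics import median

# Normalized rows copied from the normalized input data table.
normalized = {
    1: {"CallDuration": 0.0000333, "SMS": 0.1, "DataCounter": 0.2},
    2: {"CallDuration": 0.5, "SMS": 0.4, "DataCounter": 0.4},
    3: {"CallDuration": 1, "SMS": 0.9, "DataCounter": 0.8},
    4: {"CallDuration": 0.01, "SMS": 0.2, "DataCounter": 0.4},
    5: {"CallDuration": 0.93, "SMS": 0.7, "DataCounter": 0.4},
}
features = ["CallDuration", "SMS", "DataCounter"]

# Median of each attribute across all users.
median_vector = {f: median(row[f] for row in normalized.values()) for f in features}


def euclidean(u, v):
    """Euclidean distance between two feature dictionaries."""
    return math.sqrt(sum((u[f] - v[f]) ** 2 for f in features))


# The user whose vector lies closest to the median vector.
typical_user = min(normalized, key=lambda uid: euclidean(normalized[uid], median_vector))
print(median_vector, typical_user)
```

For this small data set the median vector coincides with the vector of UserID 2, so this criterion would select UserID 2.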

For these examples, the reference vector is UserID 5. The following two tables are the reference and target tables for the VectorDistance function.

VectorDistance Examples Reference Table ref_mobile_data
UserID  Feature       Value
5       CallDuration  0.93
5       SMS           0.7
5       DataCounter   0.4

VectorDistance Examples Target Table target_mobile_data
UserID  Feature       Value
1       CallDuration  0.0000333
1       SMS           0.1
1       DataCounter   0.2
2       CallDuration  0.5
2       SMS           0.4
2       DataCounter   0.4
3       CallDuration  1
3       SMS           0.9
3       DataCounter   0.8
4       CallDuration  0.01
4       SMS           0.2
4       DataCounter   0.4
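The reference and target tables hold one row per (UserID, Feature, Value) triple rather than one row per user. The following minimal sketch shows that reshaping and the split by reference user in plain Python; the helper `to_long` and the variable `reference_ids` are hypothetical names, and loading the rows into the database is outside the scope of the sketch.

```python
# Normalized rows copied from the normalized input data table.
normalized = {
    1: {"CallDuration": 0.0000333, "SMS": 0.1, "DataCounter": 0.2},
    2: {"CallDuration": 0.5, "SMS": 0.4, "DataCounter": 0.4},
    3: {"CallDuration": 1, "SMS": 0.9, "DataCounter": 0.8},
    4: {"CallDuration": 0.01, "SMS": 0.2, "DataCounter": 0.4},
    5: {"CallDuration": 0.93, "SMS": 0.7, "DataCounter": 0.4},
}

# Users chosen as the reference vector; all remaining users are targets.
reference_ids = {5}


def to_long(rows, ids):
    """Turn one-row-per-user data into (UserID, Feature, Value) rows."""
    return [
        (uid, feature, value)
        for uid in sorted(ids)
        for feature, value in rows[uid].items()
    ]


ref_mobile_data = to_long(normalized, reference_ids)
target_mobile_data = to_long(normalized, set(normalized) - reference_ids)

for row in ref_mobile_data + target_mobile_data:
    print(*row)
```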