VectorDistance Examples Input - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product
Teradata Vantage
Release Number
8.00
1.0
Published
May 2019
Language
English (United States)
Last Update
2019-11-22
dita:id
B700-4003
lifecycle
previous
Product Category
Teradata Vantage™

Raw Input

The raw input is mobile telephone user data. Each user, identified by UserID, has these attributes for a specific time period:
Attribute      Description
CallDuration   Total time spent on telephone calls (in minutes)
SMS            Number of Short Message Service (SMS) messages sent and received
DataCounter    Data consumed (in megabytes)

UserID   CallDuration   SMS   DataCounter
1        25000          24    4
2        40000          27    5
3        55000          32    7
4        27000          25    5
5        53000          30    5

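The CallDuration values are several orders of magnitude larger than the SMS and DataCounter values, so they would dominate any distance computed from the raw data. Normalizing each attribute to the range [0, 1] puts all attributes on a comparable scale and solves this problem.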

Normalized Input

In the following table, the raw input data has been normalized to the range [0, 1] using the Min-Max normalization technique.

UserID   CallDuration   SMS   DataCounter
1        0.0000333      0.1   0.2
2        0.5            0.4   0.4
3        1              0.9   0.8
4        0.01           0.2   0.4
5        0.93           0.7   0.4

This technique transforms the value 'a' (in column A) to the value 'b' in the range [C, D], using this formula:

b=((a-minimum_value_in_A)/(maximum_value_in_A-minimum_value_in_A))*(D-C)+C
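A minimal Python sketch of the formula above, for illustration only; the function name `min_max_scale` and its parameter names are hypothetical. As a hand check, UserID 2 has SMS = 27, and with the SMS bounds 23 and 33 from the minimum/maximum table and [C, D] = [0, 1], b = (27 - 23) / (33 - 23) = 0.4, which matches the normalized table.

```python
def min_max_scale(a: float, min_a: float, max_a: float,
                  c: float = 0.0, d: float = 1.0) -> float:
    """Map value a from [min_a, max_a] onto [c, d] using Min-Max normalization."""
    if max_a == min_a:
        # The formula divides by (max - min), so the bounds must differ.
        raise ValueError("maximum and minimum values must differ")
    return (a - min_a) / (max_a - min_a) * (d - c) + c


# UserID 2: SMS = 27, SMS bounds 23 and 33, target range [0, 1]
print(min_max_scale(27, 23, 33))  # 0.4
```

The optional `c` and `d` arguments default to the [0, 1] range used on this page; passing other values rescales to any [C, D].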

The following table shows the minimum and maximum values that the formula uses for each input table column.

Column         Minimum Value   Maximum Value
CallDuration   24999           55001
SMS            23              33
DataCounter    3               8
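The following Python sketch applies the formula with [C, D] = [0, 1] to the whole raw table, using the bounds from the table above; the names `RAW`, `BOUNDS`, and `normalize` are hypothetical. The bounds are one unit wider on each side than the observed data (for example, the raw SMS values range from 24 to 32, but the bounds are 23 and 33). Running the sketch reproduces the normalized table up to rounding for every cell except UserID 4's CallDuration: the formula gives (27000 - 24999) / (55001 - 24999) = 2001/30002 ≈ 0.0667 rather than the 0.01 listed, and the target_mobile_data table keeps the listed 0.01.

```python
# Raw input: UserID -> (CallDuration, SMS, DataCounter)
RAW = {
    1: (25000, 24, 4),
    2: (40000, 27, 5),
    3: (55000, 32, 7),
    4: (27000, 25, 5),
    5: (53000, 30, 5),
}

# (minimum, maximum) bounds per column, copied from the table above
BOUNDS = {
    "CallDuration": (24999, 55001),
    "SMS": (23, 33),
    "DataCounter": (3, 8),
}
COLUMNS = list(BOUNDS)  # same order as the tuples in RAW


def normalize(raw: dict[int, tuple[float, ...]]) -> dict[int, dict[str, float]]:
    """Scale every attribute to [0, 1] with the Min-Max formula and fixed bounds."""
    result = {}
    for user_id, values in raw.items():
        row = {}
        for column, a in zip(COLUMNS, values):
            lo, hi = BOUNDS[column]
            row[column] = (a - lo) / (hi - lo)
        result[user_id] = row
    return result


normalized = normalize(RAW)
for user_id, row in normalized.items():
    # Four significant digits, to compare against the normalized table
    print(user_id, {col: f"{v:.4g}" for col, v in row.items()})
```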

From the normalized input data, select one or more users as reference vectors; the remaining users are the target vectors. The choice of reference vectors depends on the application. For example, if the mobile telephone service is expanding its range to include a new area with similar users, then one or more typical users (with average or median attribute values) can serve as reference vectors. When the company has identified similar users in the new area, it can send them promotional offers.

Reference and Target Tables for Examples

For these examples, the reference vector is UserID 5 and the target vectors are UserIDs 1 through 4.

ref_mobile_data
UserID   Feature        Value
5        CallDuration   0.93
5        SMS            0.7
5        DataCounter    0.4

target_mobile_data
UserID   Feature        value1
1        CallDuration   0.0000333
1        SMS            0.1
1        DataCounter    0.2
2        CallDuration   0.5
2        SMS            0.4
2        DataCounter    0.4
3        CallDuration   1
3        SMS            0.9
3        DataCounter    0.8
4        CallDuration   0.01
4        SMS            0.2
4        DataCounter    0.4
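The reference and target tables store each vector in long format, one row per (UserID, Feature) pair. A minimal Python sketch, assuming only the values in those two tables, groups the rows back into vectors and computes the Euclidean distance from the reference vector to each target vector. It is a hand check for illustration, not the VectorDistance implementation; Euclidean distance is used here only as an example measure, and the helper names `to_vectors` and `euclidean` are hypothetical.

```python
import math
from collections import defaultdict

# Rows copied from ref_mobile_data and target_mobile_data: (UserID, Feature, value)
REF_ROWS = [
    (5, "CallDuration", 0.93), (5, "SMS", 0.7), (5, "DataCounter", 0.4),
]
TARGET_ROWS = [
    (1, "CallDuration", 0.0000333), (1, "SMS", 0.1), (1, "DataCounter", 0.2),
    (2, "CallDuration", 0.5), (2, "SMS", 0.4), (2, "DataCounter", 0.4),
    (3, "CallDuration", 1.0), (3, "SMS", 0.9), (3, "DataCounter", 0.8),
    (4, "CallDuration", 0.01), (4, "SMS", 0.2), (4, "DataCounter", 0.4),
]


def to_vectors(rows):
    """Group long-format rows into {UserID: {Feature: value}}."""
    vectors = defaultdict(dict)
    for user_id, feature, value in rows:
        vectors[user_id][feature] = value
    return dict(vectors)


def euclidean(u, v):
    """Euclidean distance over the features that both vectors contain."""
    shared = u.keys() & v.keys()
    return math.sqrt(sum((u[f] - v[f]) ** 2 for f in shared))


refs = to_vectors(REF_ROWS)
targets = to_vectors(TARGET_ROWS)
for ref_id, ref_vec in refs.items():
    for target_id, target_vec in sorted(targets.items()):
        print(ref_id, target_id, round(euclidean(ref_vec, target_vec), 4))
```

Matching rows by feature name, rather than by position, means the sketch does not depend on the order in which rows appear in either table.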