Attribute | Description |
---|---|
CallDuration | Total time spent on telephone calls (in minutes) |
SMS | Number of Short Message Service (SMS) messages sent and received |
DataCounter | Data consumed (in megabytes) |

The following table contains the raw input data for five users.

UserID | CallDuration | SMS | DataCounter |
---|---|---|---|
1 | 25000 | 24 | 4 |
2 | 40000 | 27 | 5 |
3 | 55000 | 32 | 7 |
4 | 27000 | 25 | 5 |
5 | 53000 | 30 | 5 |
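
The CallDuration values are about three orders of magnitude larger than the SMS and DataCounter values, so they would dominate any distance computed across all three attributes. Normalizing each attribute to the range [0, 1] puts the attributes on a comparable scale and solves this problem.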
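
The scale problem can be checked with a small sketch. It uses plain Euclidean distance purely for illustration (an assumption; this is not necessarily the measure VectorDistance applies) and measures how much of the squared distance between UserID 5 and UserID 2 comes from CallDuration alone. The helper names are hypothetical.

```python
import math

# Raw input data from the table above: UserID -> (CallDuration, SMS, DataCounter).
raw = {
    1: (25000, 24, 4),
    2: (40000, 27, 5),
    3: (55000, 32, 7),
    4: (27000, 25, 5),
    5: (53000, 30, 5),
}


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def call_share(u, v):
    """Fraction of the squared distance contributed by CallDuration (index 0)."""
    squared = [(a - b) ** 2 for a, b in zip(u, v)]
    return squared[0] / sum(squared)


print(f"distance(5, 2) = {euclidean(raw[5], raw[2]):.2f}")
print(f"CallDuration share of squared distance = {call_share(raw[5], raw[2]):.8f}")
```

On the raw data, the SMS and DataCounter differences contribute almost nothing; the distance is essentially the CallDuration difference.
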
The following table shows the raw input data from the preceding table after Min-Max normalization to the range [0, 1].
UserID | CallDuration | SMS | DataCounter |
---|---|---|---|
1 | 0.0000333 | 0.1 | 0.2 |
2 | 0.5 | 0.4 | 0.4 |
3 | 1 | 0.9 | 0.8 |
4 | 0.01 | 0.2 | 0.4 |
5 | 0.93 | 0.7 | 0.4 |
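
This technique transforms the value 'a' (in column A) to the value 'b' in the range [C, D] using this formula:

b = ((a - minimum_value_in_A) / (maximum_value_in_A - minimum_value_in_A)) * (D - C) + C

For the range [0, 1], C = 0 and D = 1, so the formula reduces to b = (a - minimum_value_in_A) / (maximum_value_in_A - minimum_value_in_A).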
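The following table shows the minimum and maximum values that the formula uses for each input table column. These bounds lie slightly outside the observed range of each column (for example, the observed CallDuration values range from 25000 to 55000), so no observed value maps to exactly 0.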
Column Name | Minimum Value | Maximum Value |
---|---|---|
CallDuration | 24999 | 55001 |
SMS | 23 | 33 |
DataCounter | 3 | 8 |
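
The formula and the bounds above can be sketched in Python as follows. This is a minimal illustration that assumes the data fits in memory as plain dictionaries; the names `min_max`, `normalize`, `RAW`, and `BOUNDS` are hypothetical and are not part of any Teradata API. Computed this way, UserID 4's CallDuration comes out at about 0.067 (2001 / 30002). The normalized table lists 0.01 for that entry. The other computed values agree with that table after rounding.

```python
# Hypothetical sketch of Min-Max normalization using the bounds from the table above.

# Bounds (minimum, maximum) that the formula uses for each column.
BOUNDS = {
    "CallDuration": (24999, 55001),
    "SMS": (23, 33),
    "DataCounter": (3, 8),
}

# Raw input data: UserID -> {column: value}.
RAW = {
    1: {"CallDuration": 25000, "SMS": 24, "DataCounter": 4},
    2: {"CallDuration": 40000, "SMS": 27, "DataCounter": 5},
    3: {"CallDuration": 55000, "SMS": 32, "DataCounter": 7},
    4: {"CallDuration": 27000, "SMS": 25, "DataCounter": 5},
    5: {"CallDuration": 53000, "SMS": 30, "DataCounter": 5},
}


def min_max(a, minimum, maximum, c=0.0, d=1.0):
    """Map a from [minimum, maximum] to [c, d]."""
    return ((a - minimum) / (maximum - minimum)) * (d - c) + c


def normalize(raw, bounds):
    """Apply min_max to every column of every row, using that column's bounds."""
    return {
        uid: {col: min_max(value, *bounds[col]) for col, value in row.items()}
        for uid, row in raw.items()
    }


if __name__ == "__main__":
    for uid, row in normalize(RAW, BOUNDS).items():
        print(uid, {col: round(value, 4) for col, value in row.items()})
```

Passing other values for `c` and `d` maps the data to a range other than [0, 1], matching the general [C, D] form of the formula.
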

From the normalized input data, select one or more users as reference vectors; the remaining users are the target vectors. The choice of reference vectors depends on the application. For example, if the mobile telephone service is expanding into a new area with similar users, then one or more typical users (with average or median attribute values) can serve as reference vectors. After the company identifies similar users in the new area, it can send them promotional offers.
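
One hypothetical way to pick a "typical" user is to take the per-attribute median of the normalized data and choose the user closest to it. The sketch below assumes Euclidean distance for "closest" and uses the normalized values exactly as listed in the normalized table; the function name `most_typical_user` is illustrative only.

```python
import math
import statistics

FEATURES = ("CallDuration", "SMS", "DataCounter")

# Normalized values as listed in the normalized table: UserID -> vector.
NORMALIZED = {
    1: (0.0000333, 0.1, 0.2),
    2: (0.5, 0.4, 0.4),
    3: (1.0, 0.9, 0.8),
    4: (0.01, 0.2, 0.4),
    5: (0.93, 0.7, 0.4),
}


def most_typical_user(data):
    """Return the UserID whose vector is closest (Euclidean) to the per-feature median."""
    median = tuple(
        statistics.median(vector[i] for vector in data.values())
        for i in range(len(FEATURES))
    )
    return min(data, key=lambda uid: math.dist(data[uid], median))


print(most_typical_user(NORMALIZED))  # 2
```

On this data the median heuristic selects UserID 2, whose vector equals the median vector. The examples that follow use UserID 5 instead, which is equally valid because the choice depends on the application.
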
For these examples, the reference vector is UserID 5. The following two tables are the reference table and the target table, respectively, for the VectorDistance function.
UserID | Feature | Value |
---|---|---|
5 | CallDuration | 0.93 |
5 | SMS | 0.7 |
5 | DataCounter | 0.4 |

UserID | Feature | Value |
---|---|---|
1 | CallDuration | 0.0000333 |
1 | SMS | 0.1 |
1 | DataCounter | 0.2 |
2 | CallDuration | 0.5 |
2 | SMS | 0.4 |
2 | DataCounter | 0.4 |
3 | CallDuration | 1 |
3 | SMS | 0.9 |
3 | DataCounter | 0.8 |
4 | CallDuration | 0.01 |
4 | SMS | 0.2 |
4 | DataCounter | 0.4 |
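
Reshaping the wide normalized table into the UserID | Feature | Value layout of the reference and target tables can be sketched as follows. The helper names `to_long` and `split_reference` are hypothetical, and loading the rows into database tables is not shown.

```python
# Hypothetical sketch: split normalized data into reference and target tables
# in (UserID, Feature, Value) form.

FEATURES = ("CallDuration", "SMS", "DataCounter")

# Normalized values as listed in the normalized table: UserID -> vector.
NORMALIZED = {
    1: (0.0000333, 0.1, 0.2),
    2: (0.5, 0.4, 0.4),
    3: (1.0, 0.9, 0.8),
    4: (0.01, 0.2, 0.4),
    5: (0.93, 0.7, 0.4),
}


def to_long(rows):
    """Unpivot {UserID: vector} into (UserID, Feature, Value) rows, ordered by UserID."""
    return [
        (uid, feature, value)
        for uid, vector in sorted(rows.items())
        for feature, value in zip(FEATURES, vector)
    ]


def split_reference(rows, reference_ids):
    """Return (reference_rows, target_rows); reference_ids selects the reference users."""
    reference = {uid: v for uid, v in rows.items() if uid in reference_ids}
    target = {uid: v for uid, v in rows.items() if uid not in reference_ids}
    return to_long(reference), to_long(target)


reference, target = split_reference(NORMALIZED, {5})
for row in reference:
    print(row)
```

Running it prints the three reference rows for UserID 5, matching the reference table; `target` holds the twelve rows of the target table. Passing several IDs in `reference_ids` produces a reference table with more than one reference vector.
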