This example uses KMeans++ sampling with the Manhattan distance metric. It treats the numeric variables cyl, gear, and carb as categorical variables, in addition to the variables vs and am, which are already categorical. The category weights are assigned to the categorical columns in the order that those columns appear in the input table: 1000 to cyl, 10 to vs, 100 to am, 100 to gear, and 100 to carb.
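The weight-to-column pairing described above can be sketched in Python. This is an illustrative sketch only, not the function's implementation. The column list assumes that InputColumns ('mpg:carb') selects the columns mpg through carb in the order shown in the output table. The helper names `assign_category_weights` and `weighted_manhattan` are hypothetical. The rule that a categorical mismatch contributes its weight to the Manhattan distance is also an assumption, made only to show why a large weight on cyl would tend to separate rows with different cylinder counts.

```python
# Columns assumed to be selected by InputColumns('mpg:carb'), in table order.
INPUT_COLUMNS = ["mpg", "cyl", "disp", "hp", "drat", "wt",
                 "qsec", "vs", "am", "gear", "carb"]

# vs and am are categorical in the input; AsCategories adds cyl, gear, carb.
CATEGORICAL = {"vs", "am"} | {"cyl", "gear", "carb"}
CATEGORY_WEIGHTS = [1000, 10, 100, 100, 100]


def assign_category_weights(columns, categorical, weights):
    """Pair each categorical column with a weight, following table order."""
    ordered = [c for c in columns if c in categorical]
    if len(ordered) != len(weights):
        raise ValueError("need exactly one weight per categorical column")
    return dict(zip(ordered, weights))


def weighted_manhattan(row_a, row_b, columns, weights):
    """Hypothetical mixed-type Manhattan distance (an assumption, not the
    documented formula): numeric columns add |a - b|; categorical columns
    add their weight when the two values differ."""
    total = 0.0
    for col in columns:
        if col in weights:
            if row_a[col] != row_b[col]:
                total += weights[col]
        else:
            total += abs(row_a[col] - row_b[col])
    return total


weights = assign_category_weights(INPUT_COLUMNS, CATEGORICAL, CATEGORY_WEIGHTS)
print(weights)
# {'cyl': 1000, 'vs': 10, 'am': 100, 'gear': 100, 'carb': 100}

# Two rows taken from the Output table of this example.
mazda = dict(zip(INPUT_COLUMNS,
                 [21, 6, 160, 110, 3.9, 2.875, 17.02, "S", "manual", 4, 4]))
hornet = dict(zip(INPUT_COLUMNS,
                  [21.4, 6, 258, 110, 3.08, 3.215, 19.44, "V", "automatic", 3, 1]))
distance = weighted_manhattan(mazda, hornet, INPUT_COLUMNS, weights)
```

Under this assumed rule, the two rows share cyl = 6, so the 1000 weight does not contribute, and the categorical part of the distance comes only from vs, am, gear, and carb.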
Input
- InputTable: fs_input, as in RandomSample Example 1: Basic Sampling (Weighted)
SQL Call
SELECT * FROM RandomSample (
  ON fs_input AS InputTable
  USING
  NumSample (10)
  SamplingMode ('kmeans++')
  InputColumns ('mpg:carb')
  CategoryWeights (1000, 10, 100, 100, 100)
  AsCategories ('cyl', 'gear', 'carb')
  Distance ('manhattan')
  Seed (1)
  SeedColumn ('model')
) AS dt ORDER BY 1, 2, 3;
Output
set_id | sn | model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | Mazda RX4 Wag | 21 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | S | manual | 4 | 4 |
0 | 4 | Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | V | automatic | 3 | 1 |
0 | 13 | Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.73 | 17.6 | S | automatic | 3 | 3 |
0 | 18 | Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.2 | 19.47 | V | manual | 4 | 1 |
0 | 21 | Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.7 | 2.465 | 20.01 | V | automatic | 3 | 1 |
0 | 24 | Camaro Z28 | 13.3 | 8 | 350 | 245 | 3.73 | 3.84 | 15.41 | S | automatic | 3 | 4 |
0 | 25 | Pontiac Firebird | 19.2 | 8 | 400 | 175 | 3.08 | 3.845 | 17.05 | S | automatic | 3 | 2 |
0 | 27 | Porsche 914-2 | 26 | 4 | 120.3 | 91 | 4.43 | 2.14 | 16.7 | S | manual | 5 | 2 |
0 | 30 | Ferrari Dino | 19.7 | 6 | 301 | 335 | 3.54 | 3.57 | 14.6 | S | manual | 5 | 6 |
0 | 31 | Maserati Bora | 15 | 8 | 301 | 335 | 3.54 | 3.57 | 14.6 | S | manual | 5 | 8 |