- ON clause
- Specifies the name of the table to use as the EncodingsTable.
- CategoricalInputColumns
- Specifies the names of the input table columns to treat as categorical during oversampling. Used only with the 'smotenc' sampling strategy.
- MedianStandardDeviation
- Specifies the median of the standard deviations of the numerical input columns in the minority class. Used only with the 'smotenc' sampling strategy. The SMOTENC algorithm uses this value to encode nominal values as numerical values. You can obtain this value with a query such as the following:
- OversamplingFactor
- Specifies the factor for oversampling the minority class.
- SamplingStrategy
- Specifies the oversampling algorithm to use for creating synthetic samples.
- FillSampleID
- Specifies whether the function writes out, for each new synthetic observation, the ID of the observation used to generate it. If FillSampleID is false, the column specified in IDColumn is empty (contains NULL values).
- ValueForNonInputColumns
- Specifies the value to write in the synthetic sample rows for columns that are not specified as input columns.
- NumberOfNeighbors
- Specifies the number of nearest neighbors from which to choose the sample used in oversampling. NumberOfNeighbors must be a positive integer less than or equal to 100.
- Seed
- Specifies the random seed used for sampling, for randomly selecting a nearest neighbor, and for sampling a point in the feature space between a data point and its selected nearest neighbor by convex combination. The seed must be a non-negative integer. Specifying a seed ensures deterministic results.
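The quantity that MedianStandardDeviation expects can be illustrated outside the database. The following Python sketch keeps only the minority-class rows, computes the standard deviation of each numerical input column, and takes the median of those values. The row layout (a list of dicts) and the function name `median_std_dev` are hypothetical. Using the sample standard deviation (`statistics.stdev`) rather than the population standard deviation is also an assumption, so check which one your own query computes.

```python
import statistics


def median_std_dev(rows, numeric_cols, response_col, minority_class):
    """Median of the per-column standard deviations over minority-class rows."""
    minority = [r for r in rows if r[response_col] == minority_class]
    # Assumption: sample standard deviation (needs at least 2 minority rows).
    stds = [statistics.stdev([r[c] for r in minority]) for c in numeric_cols]
    return statistics.median(stds)


rows = [
    {"id": 1, "x1": 1.0, "x2": 10.0, "y": 1},
    {"id": 2, "x1": 2.0, "x2": 14.0, "y": 1},
    {"id": 3, "x1": 3.0, "x2": 12.0, "y": 1},
    {"id": 4, "x1": 9.0, "x2": 0.0, "y": 0},  # majority class, ignored
]
print(median_std_dev(rows, ["x1", "x2"], "y", 1))  # prints 1.5
```

Here the minority-class standard deviations are 1.0 for `x1` and 2.0 for `x2`, so their median is 1.5.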
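The text says only that SMOTENC uses MedianStandardDeviation to encode nominal values numerically. The original SMOTE-NC description (Chawla et al., 2002) works as follows: each nominal feature on which a sample and a neighbor differ contributes the median standard deviation to their Euclidean distance, and the synthetic sample takes the nominal value held by the majority of its k nearest neighbors. The Python sketch below follows that published scheme. The text does not say whether this function implements it exactly the same way. The names `smotenc_distance` and `majority_category` are hypothetical.

```python
import math
from collections import Counter


def smotenc_distance(a, b, numeric_cols, categorical_cols, med):
    """Euclidean distance in which each mismatched nominal column counts as med."""
    # Numerical columns contribute their squared differences.
    sq = sum((a[c] - b[c]) ** 2 for c in numeric_cols)
    # Each nominal column whose values differ contributes med squared.
    sq += sum(med ** 2 for c in categorical_cols if a[c] != b[c])
    return math.sqrt(sq)


def majority_category(neighbors, col):
    """Most frequent value of a nominal column among the nearest neighbors."""
    return Counter(n[col] for n in neighbors).most_common(1)[0][0]


a = {"x1": 0.0, "x2": 0.0, "color": "red"}
b = {"x1": 3.0, "x2": 4.0, "color": "blue"}
print(smotenc_distance(a, b, ["x1", "x2"], ["color"], med=1.5))
print(majority_category([{"color": "red"}, {"color": "blue"}, {"color": "red"}], "color"))  # prints red
```

If `a` and `b` had the same color, the distance would reduce to the plain Euclidean distance of 5.0 over `x1` and `x2`.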
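A minimal SMOTE-style sketch in Python shows how NumberOfNeighbors and Seed fit together. For each synthetic point, the sketch picks a minority observation at random, finds its `k` nearest minority neighbors by Euclidean distance, and picks one of those neighbors at random. It then returns the convex combination `base + gap * (neighbor - base)`, with `gap` drawn uniformly from [0, 1). A single seeded generator drives all three random choices, so the same seed yields the same output. The following details are assumptions for illustration only:

- the function name `smote_samples`;
- the parameter `n_new`, which sets how many points to generate and is not claimed to correspond to OversamplingFactor;
- pairing each point with the index of its source observation, which is only loosely analogous to what FillSampleID writes out.

```python
import math
import random


def smote_samples(minority, k, n_new, seed):
    """Return n_new (source_index, synthetic_point) pairs built from minority.

    minority: list of numeric tuples (at least two points).
    k: number of nearest neighbors to choose from (1..100, as in the text).
    seed: non-negative integer; the same seed gives the same output.
    """
    if not (isinstance(k, int) and 1 <= k <= 100):
        raise ValueError("k must be a positive integer <= 100")
    if not (isinstance(seed, int) and seed >= 0):
        raise ValueError("seed must be a non-negative integer")
    rng = random.Random(seed)  # one seeded generator drives every random choice
    result = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # pick a minority observation at random
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # The k nearest neighbors of base by Euclidean distance.
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbors)  # pick one of the k neighbors at random
        gap = rng.random()  # convex-combination weight in [0, 1)
        point = tuple(b + gap * (n - b) for b, n in zip(base, nb))
        result.append((i, point))
    return result


minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (4.0, 4.0)]
print(smote_samples(minority, k=2, n_new=3, seed=42))
```

Because every synthetic point lies on the segment between two minority observations, it always falls inside the bounding box of the minority class.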
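The effect of FillSampleID and ValueForNonInputColumns on a single output row can be sketched in Python. The ID column receives the source observation's ID when FillSampleID is true and NULL (`None`) otherwise. Input columns receive the synthetic values, and every other column receives ValueForNonInputColumns. The function name `build_output_row` and its parameters are hypothetical. The text does not describe the response column, so setting it to the minority class is an assumption of this sketch.

```python
def build_output_row(columns, input_values, id_col, source_id,
                     fill_sample_id, value_for_non_input,
                     response_col, minority_class):
    """Assemble one synthetic output row as a dict keyed by column name."""
    row = {}
    for col in columns:
        if col == id_col:
            # FillSampleID: write the source observation's ID, else NULL (None).
            row[col] = source_id if fill_sample_id else None
        elif col == response_col:
            # Assumption: synthetic rows are labeled with the minority class.
            row[col] = minority_class
        elif col in input_values:
            # Input columns take the synthetic values.
            row[col] = input_values[col]
        else:
            # ValueForNonInputColumns fills every remaining column.
            row[col] = value_for_non_input
    return row


cols = ["id", "x1", "x2", "note", "y"]
print(build_output_row(cols, {"x1": 1.5, "x2": 2.5}, "id", 7, True, "n/a", "y", 1))
```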