ScaleFit
Description
td_scale_fit_sqle()
function outputs statistics to input to td_scale_transform_sqle()
function,
which scales specified input tbl_teradata columns.
Usage
td_scale_fit_sqle (
data = NULL,
target.columns = NULL,
scale.method = NULL,
miss.value = "KEEP",
global.scale = FALSE,
multiplier = '1',
intercept = '0',
...
)
Arguments
data |
Required Argument.
Specifies the input tbl_teradata.
Types: tbl_teradata
|
target.columns |
Required Argument.
Specifies the input tbl_teradata column(s) for which to output statistics.
The columns must contain numeric data in the range (-1e308, 1e308).
Types: character OR vector of Strings (character)
|
scale.method |
Required Argument.
Specifies the scale method to be used for scaling. If one value is
provided, it applies to all target columns. If more than one value is
specified, scale method values applies to target columns values in the
order specified by the user.
td_scale_transform_sqle() function uses the location and scale values in the
following formula to scale target column value X to scaled value X':
X' = intercept + multiplier * ((X - location)/scale)
"intercept" and "multiplier" arguments determine intercept and multiplier.
In the table, Xmin, Xmax, and XMean are the minimum, maximum, and mean
values of the target column.
--------------- | --------------------------- | ----------------- | ---------------------- |
scale_method | Description | location | scale |
--------------- | --------------------------- | ----------------- | ---------------------- |
MAXABS | Maximum absolute value. | 0 | Maximum |X| |
MEAN | Mean. | XMean | 1 |
MIDRANGE | Midrange. | (Xmax+Xmin)/2 | (Xmax-Xmin)/2 |
RANGE | Range. | Xmin | Xmax-Xmin |
RESCALE | Rescale using specified lower bound, upper bound. See syntax after this table | See table after RESCALE syntax | See table after RESCALE syntax. |
STD | Standard deviation. | XMean | √(∑((Xi - Xmean)2)/ N) where N is count of valid values. |
SUM | Sum. | 0 | ΣX |
USTD | Unbiased standard deviation. | XMean | √(∑((Xi - Xmean)2)/ (N - 1)) where N is count of valid values. |
-------------- | --------------------------- | ----------------- | ------------------------------ |
|
RESCALE ( lb=lower_bound | ub=upper_bound | lb=lower_bound, ub=upper_bound )
------------------------- | -------------------------- | ------------------------- |
| location | scale |
------------------------- | -------------------------- | ------------------------- |
Lower bound only | Xmin - lower_bound | 1 |
Upper bound only | Xmax - upper_bound | 1 |
Lower and upper bounds | Xmin - (lower_bound/(upper_bound- lower_bound)) | (Xmax - Xmin)/ (upper_bound- lower_bound) |
-------------------------- | ---------------------------- | ---------------------------- |
|
Permitted Values:
MAXABS
MEAN
MIDRANGE
RANGE
RESCALE
STD
SUM
USTD
Types: character OR vector of Strings (character)
|
miss.value |
Optional Argument.
Specifies how to process NULL values in input.
Permitted Values:
Default Value: "KEEP"
Types: character
|
global.scale |
Optional Argument.
Specifies whether all input columns are scaled to the same location
and scale. When set to FALSE, each input column is scaled separately.
Default Value: FALSE
Types: logical
|
multiplier |
Optional Argument.
Specifies one or more multiplying factors(multiplier) to apply to the input data.
If only one multiplier is specified, it applies to all target columns.
If a list of multipliers is specified, each multiplier applies to the
corresponding target column.
Default Value: "1"
Types: character OR vector of String (character)
|
intercept |
Optional Argument.
Specifies one or more addition factors(intercept) incrementing the scaled results.
If only one intercept specified, it applies to all target columns.
If a list of intercepts is specified, each intercept applies to the
corresponding target column.
The syntax of intercept is:
[-]number | min | mean | max
where min, mean, and max are the global minimum, maximum, mean values
in the corresponding columns. The function scales the values of min,
mean, and max.
For example, if intercept is "- min" and multiplier is
according to this formula, where scaledmin is the scaled value:
X = -scaledmin + 1 * (X - location)/scale.
Default Value: "0"
Types: character OR vector of String (character)
|
... |
Specifies the generic keyword arguments SQLE functions accept.
Below are the generic keyword arguments:
persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not.
When set to TRUE, results are persisted in a table; otherwise, results
are garbage collected at the end of the session.
Default Value: FALSE
Types: logical
volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not.
When set to TRUE, results are stored in a volatile table, otherwise not.
Default Value: FALSE
Types: logical
Function allows the user to partition, hash, order or local order the input
data. These generic arguments are available for each argument that accepts
tbl_teradata as input and can be accessed as:
"<input.data.arg.name>.partition.column" accepts character OR vector of Strings (character) (Strings)
"<input.data.arg.name>.hash.column" accepts character OR vector of Strings (character) (Strings)
"<input.data.arg.name>.order.column" accepts character OR vector of Strings (character) (Strings)
"local.order.<input.data.arg.name>" accepts logical
Note:
These generic arguments are supported by tdplyr if the underlying SQL Engine
function supports, else an exception is raised.
|
Value
Function returns an object of class "td_scale_fit_sqle"
which is a named list containing object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator
using the name(s):
output
output.data
Examples
# Get the current context/connection.
con <- td_get_context()$connection
# Load the example data.
loadExampleData("scalemap_example", "scale_housing")
# Create tbl_teradata.
scaling_house <- tbl(con, "scale_housing")
# Check the list of available analytic functions.
display_analytic_functions()
# Example 1: Create statistics to scale "lotsize" with respect to
# mean value of the column.
fit_obj <- td_scale_fit_sqle(data=scaling_house,
target.columns="lotsize",
scale.method="MEAN",
miss.value="KEEP",
global.scale=FALSE,
multiplier="1",
intercept="0")
# Print the result.
print(fit_obj$result)
print(fit_obj$output.data)