KNN
Description
The td_knn_sqle() function classifies data objects based on their proximity to
other data objects with known categories.
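To make the idea of proximity-based classification concrete, here is a minimal base R sketch. It is not how td_knn_sqle() is implemented in the database. The helper name knn_classify(), the use of Euclidean distance, and the toy data are all assumptions made for illustration.

```r
# knn_classify() is a hypothetical helper, not part of tdplyr.
# Assumption: Euclidean distance and a simple majority vote.
knn_classify <- function(train_x, train_y, test_x, k = 5) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  apply(test_x, 1, function(row) {
    # Euclidean distance from this test row to every training row.
    d <- sqrt(rowSums(sweep(train_x, 2, row)^2))
    # Labels of the k closest training rows.
    nearest <- train_y[order(d)[seq_len(k)]]
    # Majority vote; ties resolve to the first label in table() order.
    names(which.max(table(nearest)))
  })
}

# Toy data: two well-separated groups with known categories.
train_x <- rbind(c(1, 1), c(1, 2), c(2, 1), c(8, 8), c(8, 9), c(9, 8))
train_y <- c("a", "a", "a", "b", "b", "b")
test_x <- rbind(c(1.5, 1.5), c(8.5, 8.5))

pred <- knn_classify(train_x, train_y, test_x, k = 3)
print(pred)
```

Each test row takes the category that is most common among its k closest training rows, which is the sense in which known categories are used.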
Usage
td_knn_sqle(
  test.data = NULL,
  train.data = NULL,
  id.column = NULL,
  input.columns = NULL,
  model.type = "classification",
  k = 5,
  accumulate = NULL,
  response.column = NULL,
  voting.weight = 0,
  tolerance = 1.0,
  output.prob = FALSE,
  output.responses = NULL,
  emit.neighbors = NULL,
  emit.distances = FALSE,
  ...
)
Arguments
test.data | Required Argument.
train.data | Required Argument.
id.column | Required Argument.
input.columns | Required Argument.
model.type | Optional Argument.
k | Optional Argument.
accumulate | Optional Argument.
response.column | Optional Argument. Required when model.type is "regression" or "classification".
voting.weight | Optional Argument.
tolerance | Optional Argument.
output.prob | Optional Argument.
output.responses | Optional Argument.
emit.neighbors | Optional Argument.
emit.distances | Optional Argument.
... | Specifies the generic keyword arguments that SQLE functions accept. Below are the generic keyword arguments:
volatile:
The function allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:
Note:
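The voting.weight and tolerance arguments appear in the usage without a description on this page. The base R sketch below shows one common way distance-weighted voting can work. It assumes each neighbor votes with weight 1 / distance^voting_weight and that distances smaller than the tolerance are raised to the tolerance before weighting. Both rules, and the helper name weighted_vote(), are assumptions for illustration and are not confirmed behavior of td_knn_sqle().

```r
# weighted_vote() is a hypothetical helper, not part of tdplyr.
# Assumption: each neighbor votes with weight 1 / distance^voting_weight,
# and distances below `tolerance` are raised to `tolerance` first.
weighted_vote <- function(labels, distances, voting_weight = 0, tolerance = 1.0) {
  d <- pmax(distances, tolerance)
  w <- 1 / d^voting_weight
  # Sum the weights per label and return the label with the largest total.
  totals <- tapply(w, labels, sum)
  names(totals)[which.max(totals)]
}

labels <- c("a", "b", "b")
distances <- c(1, 4, 5)

# With voting_weight = 0 every neighbor counts equally, so "b" wins 2 to 1.
unweighted <- weighted_vote(labels, distances, voting_weight = 0)

# With voting_weight = 2 the single close "a" neighbor outweighs both "b" votes.
weighted <- weighted_vote(labels, distances, voting_weight = 2)

print(c(unweighted = unweighted, weighted = weighted))
```

Under these assumptions, a voting weight of 0 reduces to a plain majority vote, while larger values give closer neighbors more influence.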
Value
The function returns an object of class "td_knn_sqle", which is a named list
containing an object of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using the name: result
Examples
# Get the current context/connection.
con <- td_get_context()$connection
# Load the example data.
loadExampleData("knn_example", "computers_train1_clustered", "computers_test1")
# Create tbl_teradata object.
computers_test1 <- tbl(con, "computers_test1")
computers_train1_clustered <- tbl(con, "computers_train1_clustered")
# Check the list of available analytic functions.
display_analytic_functions()
# Generate fit object for column "computer_category".
fit_obj <- td_one_hot_encoding_fit_sqle(
  data = computers_train1_clustered,
  is.input.dense = TRUE,
  target.column = "computer_category",
  categorical.values = c("ultra", "special"),
  other.column = "other")
# Encode "ultra" and "special" values of column "computer_category".
computers_train1_encoded <- td_one_hot_encoding_transform_sqle(
  data = computers_train1_clustered,
  object = fit_obj$result,
  is.input.dense = TRUE)
# Example 1: Map the test computer data to "special" category.
KNN_out <- td_knn_sqle(
  train.data = computers_train1_encoded$result,
  test.data = computers_test1,
  k = 50,
  response.column = "computer_category_special",
  id.column = "id",
  output.prob = FALSE,
  input.columns = c("price", "speed", "hd", "ram", "screen"),
  voting.weight = 1.0,
  emit.distances = FALSE)
# Print the result.
print(KNN_out$result)
# Example 2: Get the distance of 10 nearest neighbours based on "price", "speed" and "hd".
KNN_out <- td_knn_sqle(
  train.data = computers_train1_encoded$result,
  test.data = computers_test1,
  k = 10,
  model.type = "neighbors",
  id.column = "id",
  input.columns = c("price", "speed", "hd"),
  emit.distances = TRUE,
  emit.neighbors = TRUE)
# Print the result.
print(KNN_out$result)
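
For readers without a Vantage connection, the following base R sketch illustrates what a neighbors-style result conceptually contains: the identifiers of the closest training rows and their distances to one test row. The helper name nearest_neighbors(), the output column names, the Euclidean metric, and the toy data are assumptions for illustration. They do not reflect the actual output schema of td_knn_sqle().

```r
# nearest_neighbors() is a hypothetical helper, not part of tdplyr.
# Assumption: Euclidean distance; output column names are illustrative only.
nearest_neighbors <- function(train_x, train_id, test_row, k = 10) {
  train_x <- as.matrix(train_x)
  # Distance from the test row to every training row.
  d <- sqrt(rowSums(sweep(train_x, 2, test_row)^2))
  # Indices of the k closest training rows (or fewer if the data is small).
  idx <- order(d)[seq_len(min(k, length(d)))]
  data.frame(neighbor_id = train_id[idx], distance = d[idx])
}

train_x <- rbind(c(0, 0), c(3, 4), c(6, 8), c(1, 0))
nn <- nearest_neighbors(train_x, train_id = c(101, 102, 103, 104),
                        test_row = c(0, 0), k = 2)
print(nn)
```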