Description
The VectorDistance function takes a tbl_teradata object of target vectors
and a tbl_teradata object of reference vectors and returns the distance between
each target-reference pair present in the two objects.
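The pairwise computation can be pictured locally. Below is a minimal base-R sketch, not a call to the Teradata function. It assumes long-format inputs with one row per (id, feature, value). The data frames, the column names `userid`, `feature` and `value`, and the helpers `as_vector` and `cosine_distance` are all hypothetical. Cosine distance is taken to be 1 minus cosine similarity, and a feature missing from one vector is treated as 0. Both of those are assumptions.

```r
# Hypothetical long-format inputs: one row per (id, feature, value)
target <- data.frame(
  userid  = c(1, 1, 2, 2),
  feature = c("x", "y", "x", "y"),
  value   = c(1, 0, 1, 1)
)
ref <- data.frame(
  userid  = c(10, 10, 20, 20),
  feature = c("x", "y", "x", "y"),
  value   = c(1, 0, 0, 1)
)

# Collect one id's rows into a numeric vector named by feature
as_vector <- function(df, id) {
  rows <- df[df$userid == id, ]
  setNames(rows$value, rows$feature)
}

# Cosine distance, assumed here to be 1 - cosine similarity;
# a feature missing from one vector is treated as 0 (assumption)
cosine_distance <- function(a, b) {
  feats <- union(names(a), names(b))
  va <- setNames(numeric(length(feats)), feats)
  vb <- va
  va[names(a)] <- a
  vb[names(b)] <- b
  1 - sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
}

# Every target-reference pair: a cross join of the two id sets
pairs <- expand.grid(target_id = unique(target$userid),
                     ref_id    = unique(ref$userid))
pairs$distance <- mapply(function(tid, rid) {
  cosine_distance(as_vector(target, tid), as_vector(ref, rid))
}, pairs$target_id, pairs$ref_id)
print(pairs)
```

The cross join is the reason "top.k" and "max.distance" exist. The number of pairs grows as (number of targets) times (number of references), so the output is usually filtered.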
Usage
td_vector_distance_mle (
  target.data = NULL,
  ref.data = NULL,
  target.id = NULL,
  target.feature = NULL,
  target.value = NULL,
  ref.id = NULL,
  ref.feature = NULL,
  ref.value = NULL,
  reftable.size = "small",
  distance.measure = "cosine",
  ignore.mismatch = TRUE,
  replace.invalid = "positiveinfinity",
  top.k = 2147483647,
  max.distance = NULL,
  target.data.sequence.column = NULL,
  ref.data.sequence.column = NULL,
  target.data.partition.column = NULL,
  target.data.order.column = NULL,
  ref.data.order.column = NULL
)
Arguments
target.data |
Required Argument.
Specifies a tbl_teradata object that contains the target vectors.
|
target.data.partition.column |
Required Argument.
Specifies the Partition By columns for "target.data".
If multiple columns are used for partitioning, provide them as a
vector.
Types: character OR vector of Strings (character)
|
target.data.order.column |
Optional Argument.
Specifies the Order By columns for "target.data".
If multiple columns are used for ordering, provide them as a
vector.
Types: character OR vector of Strings (character)
|
ref.data |
Required Argument.
Specifies a tbl_teradata object that contains the reference vectors.
|
ref.data.order.column |
Optional Argument.
Specifies the Order By columns for "ref.data".
If multiple columns are used for ordering, provide them as a
vector.
Types: character OR vector of Strings (character)
|
target.id |
Required Argument.
Specifies the names of the columns that comprise the target vector
identifier. You must partition the "target.data" tbl_teradata object by
these columns. Specify the same columns in both this argument and the
"target.data.partition.column" argument.
Types: character OR vector of Strings (character)
|
target.feature |
Required Argument.
Specifies the name of the column that contains the target vector
feature name (for example, the axis of a 3-D vector).
Note: Entries with NULL values in the "target.feature" column are dropped.
Types: character
|
target.value |
Optional Argument.
Specifies the name of the column that contains the value of the
feature named in the "target.feature" column.
Note: Entries with NULL values in the "target.value" column are dropped.
Default Value: The first column in the "target.data" tbl_teradata object.
Types: character
|
ref.id |
Optional Argument.
Specifies the names of the columns that comprise the reference vector
identifier.
Default Value: The value of the "target.id" argument.
Types: character OR vector of Strings (character)
|
ref.feature |
Optional Argument.
Specifies the name of the column that contains the reference vector
feature name.
Default Value: The value of the "target.feature" argument.
Types: character
|
ref.value |
Optional Argument.
Specifies the name of the column that contains the value for the
reference vector feature.
Note: Entries with NULL values in the "ref.value" column are dropped.
Default Value: The value of the "target.value" argument.
Types: character
|
reftable.size |
Optional Argument.
Specifies the size of the "ref.data" tbl_teradata object.
Specify "large" only if the "ref.data" tbl_teradata object does
not fit in memory. "small" allows faster processing.
Default Value: "small"
Permitted Values: small, large
Types: character
|
distance.measure |
Optional Argument.
Specifies one or more distance measures that the function must use.
Default Value: "cosine"
Permitted Values: COSINE, EUCLIDEAN, MANHATTAN, BINARY
Types: character OR vector of characters
|
ignore.mismatch |
Optional Argument.
Specifies whether to drop mismatched dimensions, meaning features that
appear in only one of the two vectors. If "distance.measure" is
"cosine", this argument is treated as FALSE. If you specify TRUE,
only common features are considered. Two vectors with no common
features then become two empty vectors, and the function cannot
measure the distance between them. It returns the "replace.invalid"
value instead.
Default Value: TRUE
Types: logical
|
replace.invalid |
Optional Argument.
Specifies the value to return when the function encounters an
infinite value or empty vectors. To use a custom value, specify any
numeric value as a character string (for example, "-1").
Default Value: "positiveinfinity"
Types: character
|
top.k |
Optional Argument.
Specifies, for each target vector and for each measure, the maximum
number of closest reference vectors to include in the output tbl_teradata.
Default Value: 2147483647
Types: integer
|
max.distance |
Optional Argument.
Specifies the maximum distance between a pair of target and reference
vectors. If the distance exceeds the "max.distance" threshold value,
the pair does not appear in the output tbl_teradata object. If the
"distance.measure" argument specifies multiple measures, then the
"max.distance" argument must specify a threshold value for each measure.
The ith threshold corresponds to the ith measure. If you
omit this argument, then the function returns all results.
Types: numeric OR vector of numerics
|
target.data.sequence.column |
Optional Argument.
Specifies the column(s) that uniquely identify each row of the input
argument "target.data". Use this argument to ensure deterministic
results for functions whose results otherwise vary from run to run.
Types: character OR vector of Strings (character)
|
ref.data.sequence.column |
Optional Argument.
Specifies the column(s) that uniquely identify each row of the input
argument "ref.data". Use this argument to ensure deterministic
results for functions whose results otherwise vary from run to run.
Types: character OR vector of Strings (character)
|
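Below is a small base-R sketch of the "distance.measure", "ignore.mismatch" and "replace.invalid" arguments described above. It works on two feature-named vectors and makes several assumptions:

- The textbook definitions apply. Cosine distance is 1 minus cosine similarity, Euclidean is the L2 norm of the difference, and Manhattan is the L1 norm.
- With ignore.mismatch = FALSE, a missing feature counts as 0.
- `Inf` stands in for "positiveinfinity".

The helper `vector_distance_sketch` is hypothetical and is not the Teradata implementation. BINARY is left out because its formula is not given here.

```r
# Hypothetical helper that loosely mirrors the documented arguments
vector_distance_sketch <- function(target, ref,
                                   measure = "cosine",
                                   ignore_mismatch = TRUE,
                                   replace_invalid = Inf) {
  measure <- tolower(measure)
  # As documented, ignore.mismatch is FALSE when the measure is cosine
  if (measure == "cosine") ignore_mismatch <- FALSE

  feats <- if (ignore_mismatch) {
    intersect(names(target), names(ref))  # keep only common features
  } else {
    union(names(target), names(ref))      # assumption: missing features count as 0
  }
  # No common features: empty vectors, so return the replacement value
  if (length(feats) == 0) return(replace_invalid)

  # Place a vector's values on the chosen feature set, zero elsewhere
  align <- function(v) {
    out <- setNames(numeric(length(feats)), feats)
    common <- intersect(names(v), feats)
    out[common] <- v[common]
    out
  }
  a <- align(target)
  b <- align(ref)

  d <- switch(measure,
    cosine    = 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2))),
    euclidean = sqrt(sum((a - b)^2)),
    manhattan = sum(abs(a - b)),
    stop("measure not covered by this sketch: ", measure)
  )
  # Infinite results are replaced, as "replace.invalid" describes
  if (is.infinite(d)) replace_invalid else d
}

t_vec <- c(x = 1, y = 2)
r_vec <- c(x = 4, z = 6)
vector_distance_sketch(t_vec, r_vec, "euclidean", ignore_mismatch = TRUE)   # 3 (only x is shared)
vector_distance_sketch(t_vec, r_vec, "euclidean", ignore_mismatch = FALSE)  # 7
vector_distance_sketch(c(x = 1), c(z = 1), "manhattan")                     # Inf (no common features)
```

With ignore.mismatch = TRUE, the unshared features y and z are ignored entirely. That is why vectors with no overlap fall back to the replacement value.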
Value
The function returns an object of class "td_vector_distance_mle", which
is a named list containing an object of class "tbl_teradata".
The list member can be referenced directly with the "$" operator
using the name "result".
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
loadExampleData("vectordistance_example", "target_mobile_data", "ref_mobile_data")
# Create object(s) of class "tbl_teradata".
target_mobile_data <- tbl(con, "target_mobile_data")
ref_mobile_data <- tbl(con, "ref_mobile_data")
# Example 1 - Using the default ("cosine") distance measure with no threshold.
td_vector_distance_out <- td_vector_distance_mle(
  target.data = target_mobile_data,
  target.data.partition.column = c("userid"),
  ref.data = ref_mobile_data,
  target.id = c("userid"),
  target.feature = "feature",
  target.value = "value1"
)
# The output tbl_teradata is available as td_vector_distance_out$result
# Example 2 - Using three distance measures, each with its own "max.distance" threshold.
td_vector_distance_out1 <- td_vector_distance_mle(
  target.data = target_mobile_data,
  target.data.partition.column = c("userid"),
  ref.data = ref_mobile_data,
  target.id = c("userid"),
  target.feature = "feature",
  target.value = "value1",
  distance.measure = c("Cosine", "Euclidean", "Manhattan"),
  # 0.03 applies to Cosine, 0.8 to Euclidean, 1.0 to Manhattan
  max.distance = c(0.03, 0.8, 1.0)
)
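Example 2 pairs each measure with its own threshold. The base-R sketch below shows how "max.distance" and "top.k" filter pairwise results for each target and each measure. It runs on a local data frame and makes these assumptions:

- The layout uses hypothetical columns `target_id`, `ref_id`, `measure` and `distance`. The real output columns may differ.
- "exceeds" means strictly greater, so a pair exactly at the threshold is kept.
- The helper `filter_distances` is hypothetical.

```r
# Hypothetical pairwise results; column names are illustrative only
dists <- data.frame(
  target_id = c(1, 1, 1, 1, 1, 1),
  ref_id    = c(10, 20, 30, 10, 20, 30),
  measure   = c("cosine", "cosine", "cosine", "euclidean", "euclidean", "euclidean"),
  distance  = c(0.01, 0.05, 0.02, 0.9, 0.4, 0.7)
)

# thresholds[i] applies to measures[i], as "max.distance" describes
filter_distances <- function(d, measures, thresholds = NULL, top_k = 2147483647) {
  if (!is.null(thresholds)) {
    stopifnot(length(thresholds) == length(measures))
    limit <- thresholds[match(d$measure, measures)]
    # Drop pairs whose distance exceeds their measure's threshold
    d <- d[which(d$distance <= limit), ]
  }
  # Keep the top_k closest reference vectors per target and per measure
  groups <- split(d, list(d$target_id, d$measure), drop = TRUE)
  kept <- lapply(groups, function(g) head(g[order(g$distance), ], top_k))
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

filter_distances(dists, c("cosine", "euclidean"), thresholds = c(0.03, 0.8), top_k = 1)
```

The cosine threshold of 0.03 removes reference 20. The Euclidean threshold of 0.8 removes reference 10. With top_k = 1, only the closest surviving reference per measure remains. Omitting both arguments returns every pair.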