VectorDistance - Teradata R Package Function Reference 17.00

Teradata® R Package Function Reference

prodname: Teradata R Package
vrm_release: 17.00
created_date: September 2020
category: Programming Reference
featnum: B700-4007-090K

Description

The VectorDistance function takes a tbl_teradata object of target vectors and a tbl_teradata object of reference vectors and returns the distance between each target-reference pair present in the two objects.

Usage

  td_vector_distance_mle (
      target.data = NULL,
      ref.data = NULL,
      target.id = NULL,
      target.feature = NULL,
      target.value = NULL,
      ref.id = NULL,
      ref.feature = NULL,
      ref.value = NULL,
      reftable.size = "small",
      distance.measure = "cosine",
      ignore.mismatch = TRUE,
      replace.invalid = "positiveinfinity",
      top.k = 2147483647,
      max.distance = NULL,
      target.data.sequence.column = NULL,
      ref.data.sequence.column = NULL,
      target.data.partition.column = NULL,
      target.data.order.column = NULL,
      ref.data.order.column = NULL
  )

Arguments

target.data

Required Argument.
Specifies a tbl_teradata object that contains the target vectors.

target.data.partition.column

Required Argument.
Specifies Partition By columns for "target.data".
Provide the values as a vector if multiple columns are used for partitioning.
Types: character OR vector of Strings (character)

target.data.order.column

Optional Argument.
Specifies Order By columns for "target.data".
Provide the values as a vector if multiple columns are used for ordering.
Types: character OR vector of Strings (character)

ref.data

Required Argument.
Specifies a tbl_teradata object that contains the reference vectors.

ref.data.order.column

Optional Argument.
Specifies Order By columns for "ref.data".
Provide the values as a vector if multiple columns are used for ordering.
Types: character OR vector of Strings (character)

target.id

Required Argument.
Specifies the names of the columns that comprise the target vector identifier. You must partition the "target.data" tbl_teradata by these columns and specify them with this argument.
Types: character OR vector of Strings (character)

target.feature

Required Argument.
Specifies the name of the column that contains the target vector feature name (for example, the axis of a 3-D vector).
Note: Entries with NULL values in the "target.feature" column are dropped.
Types: character

target.value

Optional Argument.
Specifies the name of the column that contains the value of the feature named in the "target.feature" column.
Note: Entries with NULL values in the "target.value" column are dropped.
Default Value: The first column in the "target.data" tbl_teradata object.
Types: character
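
The "target.id", "target.feature", and "target.value" arguments together describe a long layout with one row per vector component. The following base R sketch shows that layout on a hypothetical local data frame, not a tbl_teradata object. The column names userid, feature, and value1 follow the Examples section; the values are made up, and NA stands in for a database NULL.

```r
# Hypothetical local data: two target vectors in long format.
# Each row holds one (identifier, feature name, feature value) triple.
target_long <- data.frame(
  userid  = c(1, 1, 1, 2, 2, 2),
  feature = c("x", "y", "z", "x", "y", "z"),
  value1  = c(0.5, 1.0, 2.0, 3.0, 0.0, 1.5),
  stringsAsFactors = FALSE
)

# Drop rows with a missing feature name or value (NA plays the role of NULL here).
keep <- !is.na(target_long$feature) & !is.na(target_long$value1)
target_long <- target_long[keep, ]

# Reassemble each vector as a named numeric vector, keyed by its identifier.
vectors <- lapply(
  split(target_long, target_long$userid),
  function(d) setNames(d$value1, d$feature)
)
```

Here vectors[["1"]] is the 3-D vector for userid 1, with one named element per feature.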

ref.id

Optional Argument.
Specifies the names of the columns that comprise the reference vector identifier.
Default Value: The value of the "target.id" argument.
Types: character OR vector of Strings (character)

ref.feature

Optional Argument.
Specifies the name of the column that contains the reference vector feature name.
Default Value: The value of the "target.feature" argument.
Types: character

ref.value

Optional Argument.
Specifies the name of the column that contains the value for the reference vector feature.
Note: Entries with NULL values in the "ref.value" column are dropped.
Default Value: The value of the "target.value" argument.
Types: character

reftable.size

Optional Argument.
Specifies the size of the "ref.data" tbl_teradata object. Specify "large" only if the "ref.data" tbl_teradata object does not fit in memory. "small" allows faster processing.
Default Value: "small"
Permitted Values: small, large
Types: character

distance.measure

Optional Argument.
Specifies one or more distance measures that the function must use.
Default Value: "cosine"
Permitted Values: COSINE, EUCLIDEAN, MANHATTAN, BINARY
Types: character OR vector of characters
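
For orientation, the following base R sketch computes three of the permitted measures on two hypothetical local vectors. It uses the textbook definitions, which are an assumption here; this reference does not spell out the formulas the in-database function applies. The binary measure is left out because its definition is not given in this page.

```r
# Two hypothetical vectors defined over the same features.
target <- c(x = 1, y = 2, z = 2)
ref    <- c(x = 2, y = 4, z = 4)

# Textbook definitions (assumed, not taken from the package source).
cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}
euclidean_distance <- function(a, b) sqrt(sum((a - b)^2))
manhattan_distance <- function(a, b) sum(abs(a - b))

distances <- c(
  cosine    = cosine_distance(target, ref),
  euclidean = euclidean_distance(target, ref),
  manhattan = manhattan_distance(target, ref)
)
```

Because ref is a positive multiple of target, the cosine distance is 0 while the Euclidean and Manhattan distances are not, which illustrates why the choice of measure matters.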

ignore.mismatch

Optional Argument.
Specifies whether to drop mismatched dimensions, that is, features that appear in only one vector of a pair. If "distance.measure" is "cosine", then this argument is treated as FALSE. If you specify TRUE, then two vectors with no common features become two empty vectors when only their common features are considered, and the function cannot measure the distance between them.
Default Value: TRUE
Types: logical
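
The effect of dropping mismatched dimensions can be sketched in base R on two hypothetical local vectors. Treating a missing feature as 0 when mismatches are kept is an assumption made for this illustration; this reference does not state how the function fills missing features.

```r
# Hypothetical vectors that share only the feature "y".
target <- c(x = 1, y = 2)
ref    <- c(y = 2, z = 3)

# Dropping mismatches: compare only the features present in both vectors.
common <- intersect(names(target), names(ref))
dist_common <- sqrt(sum((target[common] - ref[common])^2))

# Keeping mismatches (assumption: a missing feature counts as 0).
all_features <- union(names(target), names(ref))
fill <- function(v) {
  out <- setNames(numeric(length(all_features)), all_features)
  out[names(v)] <- v
  out
}
dist_all <- sqrt(sum((fill(target) - fill(ref))^2))

# Two vectors with no common features leave nothing to compare.
disjoint_common <- intersect(c("x"), c("z"))
```

In this sketch dist_common is 0 because the vectors agree on "y", while dist_all is larger because "x" and "z" contribute. The empty disjoint_common corresponds to the case in which the distance cannot be measured.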

replace.invalid

Optional Argument.
Specifies the value to return when the function encounters an infinite value or empty vectors. To customize, you can specify any numeric value in quotes.
Default Value: "positiveinfinity"
Types: character

top.k

Optional Argument.
Specifies, for each target vector and for each measure, the maximum number of closest reference vectors to include in the output tbl_teradata.
Default Value: 2147483647
Types: integer

max.distance

Optional Argument.
Specifies the maximum distance between a pair of target and reference vectors. If the distance exceeds the "max.distance" threshold value, the pair does not appear in the output tbl_teradata object. If the "distance.measure" argument specifies multiple measures, then the "max.distance" argument must specify a threshold value for each measure. The ith threshold corresponds to the ith measure. If you omit this argument, then the function returns all results.
Types: numeric OR vector of numerics
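
The combined effect of "max.distance" and "top.k" can be sketched in base R on a hypothetical local data frame of already computed distances. The column names target_id, ref_id, measure, and distance are illustrative assumptions, not the documented output schema, and only a single measure is shown.

```r
# Hypothetical distances from one target vector to four reference vectors.
pairs <- data.frame(
  target_id = 1,
  ref_id    = c(10, 11, 12, 13),
  measure   = "euclidean",
  distance  = c(0.9, 0.2, 0.5, 0.7),
  stringsAsFactors = FALSE
)

max_distance <- 0.8  # pairs whose distance exceeds this are omitted
top_k        <- 2    # keep at most this many closest reference vectors

# Apply the threshold, sort by distance, then keep the closest top_k rows.
kept <- pairs[pairs$distance <= max_distance, ]
kept <- kept[order(kept$distance), ]
kept <- head(kept, top_k)
```

With several measures, the same filtering would be applied per measure, each with its own threshold.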

target.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "target.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

ref.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "ref.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.
Types: character OR vector of Strings (character)

Value

The function returns an object of class "td_vector_distance_mle", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("vectordistance_example", "target_mobile_data", "ref_mobile_data")

    # Create object(s) of class "tbl_teradata".
    target_mobile_data <- tbl(con, "target_mobile_data")
    ref_mobile_data <- tbl(con, "ref_mobile_data")

    # Example 1 - Using the default ("cosine") distance measure with no threshold.
    td_vector_distance_out <- td_vector_distance_mle(
        target.data = target_mobile_data,
        target.data.partition.column = c("userid"),
        ref.data = ref_mobile_data,
        target.id = c("userid"),
        target.feature = "feature",
        target.value = "value1"
    )

    # Example 2 - Using three distance measures with corresponding thresholds "max.distance".
    td_vector_distance_out1 <- td_vector_distance_mle(
        target.data = target_mobile_data,
        target.data.partition.column = c("userid"),
        ref.data = ref_mobile_data,
        target.id = c("userid"),
        target.feature = "feature",
        target.value = "value1",
        distance.measure = c("Cosine", "Euclidean", "Manhattan"),
        max.distance = c(0.03, 0.8, 1.0)
    )