Teradata R Package Function Reference - 16.20 - VectorDistance - Teradata R Package

Teradata® R Package Function Reference

prodname: Teradata R Package
vrm_release: 16.20
created_date: February 2020
category: Programming Reference
featnum: B700-4007-098K

Description

The VectorDistance (td_vector_distance_mle) function takes a tbl_teradata of target vectors and a tbl_teradata of reference vectors and returns a tbl_teradata that contains the distance between each target-reference pair.

Usage

  td_vector_distance_mle (
      target.data = NULL,
      ref.data = NULL,
      target.id = NULL,
      target.feature = NULL,
      target.value = NULL,
      ref.id = NULL,
      ref.feature = NULL,
      ref.value = NULL,
      reftable.size = "small",
      distance.measure = "cosine",
      ignore.mismatch = TRUE,
      replace.invalid = "positiveinfinity",
      top.k = 2147483647,
      max.distance = NULL,
      target.data.sequence.column = NULL,
      ref.data.sequence.column = NULL,
      target.data.partition.column = NULL
  )

Arguments

target.data

Required Argument.
Specifies a tbl_teradata that contains the target vectors.

target.data.partition.column

Required Argument.
Specifies the Partition By columns for "target.data". If multiple columns are used for partitioning, provide them as a vector.

ref.data

Required Argument.
Specifies a tbl_teradata that contains the reference vectors.

target.id

Required Argument.
Specifies the names of the columns that comprise the target vector identifier. You must partition the "target.data" table by these columns (see "target.data.partition.column") and also specify them with this argument.

target.feature

Required Argument.
Specifies the name of the column that contains the target vector feature name (for example, the axis of a 3-D vector).
Note: Entries with NULL values in the "target.feature" column are dropped.

target.value

Optional Argument.
Specifies the name of the column that contains the value for the column in the "target.feature" argument. Default value is the first column in the "target.data" table.
Note: Entries with NULL values in the "target.value" column are dropped.

ref.id

Optional Argument.
Specifies the names of the columns that comprise the reference vector identifier. The default value is the "target.id" argument value.

ref.feature

Optional Argument.
Specifies the name of the column that contains the reference vector feature name. The default value is the "target.feature" argument value.

ref.value

Optional Argument.
Specifies the name of the column that contains the value for the reference vector feature. The default value is the "target.value" argument value.
Note: Entries with NULL values in the "ref.value" column are dropped.

reftable.size

Optional Argument.
Specifies the size of the "ref.data" table. Specify "LARGE" only if the "ref.data" table does not fit in memory. The default value, "SMALL", allows faster processing.
Default Value: SMALL
Permitted Values: SMALL, LARGE

distance.measure

Optional Argument.
Specifies one or more distance measures for the function to use.
Default Value: "COSINE"
Permitted Values: COSINE, EUCLIDEAN, MANHATTAN, BINARY
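As a rough local illustration of what three of these measures compute, the base R sketch below evaluates them on two small dense vectors. It assumes the standard textbook definitions (with cosine distance taken as 1 minus cosine similarity); the helper names are hypothetical, and this is not the function's in-database implementation. BINARY is omitted because its definition is not given here.

```r
# Hypothetical helpers, assuming standard textbook definitions.
cosine_distance <- function(x, y) {
  1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}
euclidean_distance <- function(x, y) sqrt(sum((x - y)^2))
manhattan_distance <- function(x, y) sum(abs(x - y))

target <- c(3, 4, 0)
reference <- c(0, 4, 3)

cosine_distance(target, reference)     # 1 - 16/25 = 0.36
euclidean_distance(target, reference)  # sqrt(9 + 0 + 9) = sqrt(18)
manhattan_distance(target, reference)  # 3 + 0 + 3 = 6
```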

ignore.mismatch

Optional Argument.
Specifies whether to drop mismatched dimensions, so that only the features common to both vectors are considered. If "distance.measure" is "cosine", the function treats this argument as FALSE. If you specify TRUE, two vectors with no common features are reduced to two empty vectors, and the function cannot measure the distance between them.
Default Value: TRUE
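The effect of dropping mismatched dimensions can be sketched locally in base R. The example below is only an illustration under stated assumptions: the vectors are represented as named numeric vectors, the feature names are invented, and `common_features` is a hypothetical helper rather than part of the package.

```r
# Hypothetical illustration: each vector is a named numeric vector,
# where the names are feature names.
common_features <- function(x, y) intersect(names(x), names(y))

target <- c(calls = 2, sms = 5, data = 1)
ref_a  <- c(calls = 3, data = 4, roaming = 7)
ref_b  <- c(roaming = 1, voicemail = 2)

shared_a <- common_features(target, ref_a)  # "calls" "data"
shared_b <- common_features(target, ref_b)  # character(0)

# Restricting to shared features leaves usable vectors for ref_a ...
target[shared_a]
ref_a[shared_a]

# ... but two empty vectors for ref_b, so no distance can be computed.
length(target[shared_b])  # 0
length(ref_b[shared_b])   # 0
```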

replace.invalid

Optional Argument.
Specifies the value to return when the function encounters an infinite value or empty vectors. To customize, you can specify any numeric value in quotes.
Default Value: "positiveinfinity"
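As a loose analogy in base R, the sketch below substitutes a chosen value wherever a distance is not finite. `replace_invalid` is a hypothetical helper written for this illustration; it assumes that an "invalid" result corresponds to a non-finite number, which may not match the function's internal handling exactly.

```r
# Hypothetical helper: substitute a chosen value for non-finite distances.
replace_invalid <- function(distance, replacement = Inf) {
  distance[!is.finite(distance)] <- replacement
  distance
}

# A cosine distance between two empty vectors evaluates to 0/0 in R (NaN).
empty_pair <- 1 - 0 / (sqrt(0) * sqrt(0))

# Replace the NaN and the infinite value with 999.
replace_invalid(c(0.2, empty_pair, Inf), replacement = 999)
```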

top.k

Optional Argument.
Specifies, for each target vector and for each measure, the maximum number of closest reference vectors to include in the output table. The default value is the maximum integer value: 2147483647.

max.distance

Optional Argument.
Specifies the maximum distance between a pair of target and reference vectors. If the distance exceeds the "max.distance" threshold value, the pair does not appear in the output table. If the "distance.measure" argument specifies multiple measures, then the "max.distance" argument must specify a threshold value for each measure. The ith threshold corresponds to the ith measure. If you omit this argument, then the function returns all results.
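How "top.k" and "max.distance" might interact can be sketched on a local data frame. The code below is a minimal base R illustration, not the function's implementation: the distances are invented, `filter_pairs` is a hypothetical helper, and applying the threshold before the top-k cut is an assumption.

```r
# Hypothetical precomputed distances for a single measure.
distances <- data.frame(
  target_id = c(1, 1, 1, 2, 2),
  ref_id    = c(10, 11, 12, 10, 11),
  distance  = c(0.40, 0.10, 0.25, 0.05, 0.90)
)

filter_pairs <- function(d, max_distance = Inf, top_k = .Machine$integer.max) {
  # Drop pairs whose distance exceeds the threshold.
  d <- d[d$distance <= max_distance, , drop = FALSE]
  # Order by target, then by increasing distance.
  d <- d[order(d$target_id, d$distance), , drop = FALSE]
  # Rank of each pair within its target (1 = closest).
  rank_in_target <- ave(d$distance, d$target_id, FUN = seq_along)
  # Keep at most top_k closest reference vectors per target.
  d[rank_in_target <= top_k, , drop = FALSE]
}

# Keeps references 11 and 12 for target 1, and reference 10 for target 2.
filter_pairs(distances, max_distance = 0.5, top_k = 2)
```

With the defaults (`max_distance = Inf` and `top_k = .Machine$integer.max`, which is 2147483647), the helper returns every pair, mirroring the documented behavior when both arguments are left at their defaults.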

target.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "target.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.

ref.data.sequence.column

Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row of the input argument "ref.data". The argument is used to ensure deterministic results for functions which produce results that vary from run to run.

Value

The function returns an object of class "td_vector_distance_mle", which is a named list containing a tbl_teradata object.
The named list member can be referenced directly with the "$" operator using its name: result.

Examples

    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("vectordistance_example", "target_mobile_data", "ref_mobile_data")
    
    # Create remote tibble objects.
    target_mobile_data <- tbl(con, "target_mobile_data")
    ref_mobile_data <- tbl(con, "ref_mobile_data")
    
    # Example - Using the default ("cosine") distance measure with no threshold.
    td_vector_distance_out <- td_vector_distance_mle(target.data = target_mobile_data,
                                                 target.data.partition.column = c("userid"),
                                                 ref.data = ref_mobile_data,
                                                 target.id = c("userid"),
                                                 target.feature = "feature",
                                                 target.value = "value1"
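                                                 )

    # The output table is available as the "result" member of the returned list.
    td_vector_distance_out$result

    # Example - Using three distance measures with corresponding thresholds (max.distance).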
    td_vector_distance_out1 <- td_vector_distance_mle(target.data = target_mobile_data,
                                                 target.data.partition.column = c("userid"),
                                                 ref.data = ref_mobile_data,
                                                 target.id = c("userid"),
                                                 target.feature = "feature",
                                                 target.value = "value1",
                                                 distance.measure = c("Cosine","Euclidean","Manhattan"),
                                                 max.distance = c(0.03,0.8,1.0)
                                                 )