Methods defined here:
- __init__(self, target_data=None, ref_data=None, target_id=None, target_feature=None, target_value=None, ref_id=None, ref_feature=None, ref_value=None, reftable_size='small', distance_measure='cosine', ignore_mismatch=True, replace_invalid='positiveinfinity', top_k=2147483647, max_distance=None, target_data_sequence_column=None, ref_data_sequence_column=None, target_data_partition_column=None, target_data_order_column=None, ref_data_order_column=None)
- DESCRIPTION:
The VectorDistance function takes a teradataml DataFrame of target
vectors and a teradataml DataFrame of reference vectors and returns a
teradataml DataFrame that contains the distance between each
target-reference pair.
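To make the input layout concrete, here is a minimal pure-Python sketch; it is not the teradataml implementation and does not touch a database. It assumes that vectors arrive in long format, one (id, feature, value) row per component, as the target_id, target_feature, and target_value arguments suggest, and that cosine distance means 1 minus cosine similarity. The helper names rows_to_vectors and cosine_distance and the sample rows are hypothetical.

```python
import math
from collections import defaultdict


def rows_to_vectors(rows):
    """Group (id, feature, value) rows into {id: {feature: value}}.

    Rows whose feature or value is None (NULL) are dropped, mirroring
    the notes on the feature and value columns.
    """
    vectors = defaultdict(dict)
    for vec_id, feature, value in rows:
        if feature is None or value is None:
            continue
        vectors[vec_id][feature] = float(value)
    return dict(vectors)


def cosine_distance(a, b):
    """Assumed definition: 1 - cosine similarity of two sparse vectors."""
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return math.inf  # undefined for an all-zero or empty vector
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical sample rows: (id, feature, value).
target_rows = [(1, "x", 3.0), (1, "y", 4.0), (1, None, 9.0)]
ref_rows = [(10, "x", 3.0), (10, "y", 4.0), (11, "z", 1.0)]

targets = rows_to_vectors(target_rows)
refs = rows_to_vectors(ref_rows)

# One distance per target-reference pair.
distances = {
    (t_id, r_id): cosine_distance(t_vec, r_vec)
    for t_id, t_vec in targets.items()
    for r_id, r_vec in refs.items()
}
for pair, dist in sorted(distances.items()):
    print(pair, round(dist, 6))
```

The dictionary keyed by (target id, reference id) plays the role of the output rows: one entry per target-reference pair.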
PARAMETERS:
target_data:
Required Argument.
Specifies a teradataml DataFrame that contains target vectors.
target_data_partition_column:
Required Argument.
Specifies Partition By columns for target_data.
Values to this argument can be provided as a list, if multiple columns
are used for partition.
Types: str OR list of Strings (str)
target_data_order_column:
Optional Argument.
Specifies Order By columns for target_data.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
ref_data:
Required Argument.
Specifies a teradataml DataFrame that contains reference vectors.
ref_data_order_column:
Optional Argument.
Specifies Order By columns for ref_data.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
target_id:
Required Argument.
Specifies the names of the columns that comprise the target vector
identifier. You must partition the target input teradataml DataFrame
by these columns and specify them with this argument.
Types: str OR list of Strings (str)
target_feature:
Required Argument.
Specifies the name of the column that contains the target vector
feature name (for example, the axis of a 3-D vector).
Note: An entry with a NULL value in the target_feature column is dropped.
Types: str
target_value:
Optional Argument.
Specifies the name of the column that contains the value for the
target vector feature. The default value is 1.
Note: An entry with a NULL value in the target_value column is dropped.
Types: str
ref_id:
Optional Argument.
Specifies the names of the columns that comprise the reference vector
identifier. The default value is the target_id argument value.
Types: str OR list of Strings (str)
ref_feature:
Optional Argument.
Specifies the name of the column that contains the reference vector
feature name. The default value is the target_feature argument value.
Types: str
ref_value:
Optional Argument.
Specifies the name of the column that contains the value for the
reference vector feature. The default value is the target_value
argument value.
Note: An entry with a NULL value in the ref_value column is dropped.
Types: str
reftable_size:
Optional Argument.
Specifies the size of the reference table. Specify "large" only if
the reference teradataml DataFrame does not fit in memory.
Default Value: "small"
Permitted Values: small, large
Types: str
distance_measure:
Optional Argument.
Specifies the distance measures that the function uses.
Default Value: "cosine"
Permitted Values: COSINE, EUCLIDEAN, MANHATTAN, BINARY
Types: str OR list of Strings (str)
ignore_mismatch:
Optional Argument.
Specifies whether to drop mismatched dimensions, that is, features
that are not common to both vectors in a pair. If distance_measure is
"cosine", then this argument is False. If you specify True, then two
vectors with no common features become two empty vectors when only
their common features are considered, and the function cannot
measure the distance between them.
Default Value: True
Types: bool
replace_invalid:
Optional Argument.
Specifies the value to return when the function encounters an
infinite value or empty vectors. To use a custom replacement, you can
supply any float value.
Default Value: "positiveinfinity"
Types: str
top_k:
Optional Argument.
Specifies, for each target vector and for each measure, the maximum
number of closest reference vectors to include in the output table.
You can supply any integer value for top_k.
Default Value: 2147483647
Types: int
max_distance:
Optional Argument.
Specifies the maximum distance between a pair of target and reference
vectors. If the distance exceeds the threshold, the pair does not
appear in the output table. If the distance_measure argument
specifies multiple measures, then the max_distance argument must
specify a threshold for each measure. The ith threshold corresponds
to the ith measure. Each threshold can be any float value. If you
omit this argument, then the function returns all results.
Types: float OR list of Floats (float)
target_data_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of the input
argument "target_data". The argument is used to ensure deterministic
results for functions whose results would otherwise vary from run to
run.
Types: str OR list of Strings (str)
ref_data_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of the input
argument "ref_data". The argument is used to ensure deterministic
results for functions whose results would otherwise vary from run to
run.
Types: str OR list of Strings (str)
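The interplay of ignore_mismatch, replace_invalid, top_k, and max_distance described above can be illustrated with a small pure-Python sketch. This is a sketch under stated assumptions, not the function's actual implementation: it uses Euclidean distance, treats a missing feature as 0 when mismatched dimensions are kept, and applies the replacement value before the threshold and top-k filters. The helper names euclidean and closest and the sample vectors are hypothetical.

```python
import math


def euclidean(a, b, ignore_mismatch=True):
    """Euclidean distance between sparse vectors stored as {feature: value}.

    Returns None when ignore_mismatch leaves no common features, because
    the distance between two empty vectors cannot be measured.
    """
    if ignore_mismatch:
        features = a.keys() & b.keys()  # keep only common features
        if not features:
            return None
    else:
        features = a.keys() | b.keys()  # assumption: missing feature = 0
    return math.sqrt(sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in features))


def closest(target, refs, top_k=2147483647, max_distance=None,
            replace_invalid=math.inf, ignore_mismatch=True):
    """Return up to top_k (ref_id, distance) pairs, nearest first."""
    scored = []
    for ref_id, ref in refs.items():
        dist = euclidean(target, ref, ignore_mismatch)
        if dist is None:
            dist = replace_invalid  # assumption: replaced before filtering
        if max_distance is not None and dist > max_distance:
            continue  # pair exceeds the threshold, so it is left out
        scored.append((ref_id, dist))
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Hypothetical sample vectors.
target = {"x": 1.0, "y": 2.0}
refs = {
    "a": {"x": 1.0, "y": 2.0},  # identical to the target
    "b": {"x": 4.0, "z": 4.0},  # shares only "x" with the target
    "c": {"w": 5.0},            # shares no feature with the target
}

print(closest(target, refs, top_k=2))
print(closest(target, refs, max_distance=1.0))
print(closest(target, refs, ignore_mismatch=False))
```

With the default replacement of positive infinity, reference "c" sorts last and is removed by any finite max_distance, which is one reason a custom replace_invalid value may be preferable in practice.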
RETURNS:
Instance of VectorDistance.
Output teradataml DataFrames can be accessed using attribute
references, such as VectorDistanceObj.<attribute_name>.
Output teradataml DataFrame attribute name is:
result
RAISES:
TeradataMlException
EXAMPLES:
# Load example data.
load_example_data("vectordistance", ["target_mobile_data", "ref_mobile_data"])
# Create teradataml DataFrame objects.
target_mobile_data = DataFrame.from_table("target_mobile_data")
ref_mobile_data = DataFrame.from_table("ref_mobile_data")
# Example 1 - Using the default ("cosine") distance measure with no threshold.
VectorDistance_out1 = VectorDistance(target_data = target_mobile_data,
target_data_partition_column = ["userid"],
ref_data = ref_mobile_data,
target_id = ["userid"],
target_feature = "feature",
target_value = "value1"
)
# Print the output data
print(VectorDistance_out1)
# Example 2 - Using three distance measures with corresponding thresholds (max_distance).
VectorDistance_out2 = VectorDistance(target_data = target_mobile_data,
target_data_partition_column = ["userid"],
ref_data = ref_mobile_data,
target_id = ["userid"],
target_feature = "feature",
target_value = "value1",
distance_measure = ["Cosine","Euclidean","Manhattan"],
max_distance = [0.03,0.8,1.0]
)
# Print the output data
print(VectorDistance_out2)
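The positional pairing of measures and thresholds used in Example 2 (the ith threshold applies to the ith measure) can be sketched in plain Python as follows. This is illustrative only and is not how teradataml validates its arguments; the helpers pair_thresholds and passes are hypothetical, and treating measure names as case-insensitive is an assumption based on the mixed-case names in the example.

```python
def pair_thresholds(distance_measure, max_distance=None):
    """Map each measure name to its threshold (None means no threshold)."""
    if isinstance(distance_measure, str):
        measures = [distance_measure]
    else:
        measures = list(distance_measure)
    if max_distance is None:
        return {m.lower(): None for m in measures}
    if isinstance(max_distance, (int, float)):
        thresholds = [max_distance]
    else:
        thresholds = list(max_distance)
    if len(thresholds) != len(measures):
        raise ValueError("max_distance needs one threshold per distance measure")
    # The ith threshold corresponds to the ith measure.
    return {m.lower(): float(t) for m, t in zip(measures, thresholds)}


def passes(measure, distance, limits):
    """A pair is kept unless its distance exceeds the measure's threshold."""
    limit = limits[measure.lower()]
    return limit is None or distance <= limit


limits = pair_thresholds(["Cosine", "Euclidean", "Manhattan"], [0.03, 0.8, 1.0])
print(limits)
print(passes("Euclidean", 0.9, limits))
```

Supplying a different number of thresholds than measures raises an error in this sketch, which reflects the requirement that max_distance specify a threshold for each measure.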
- __repr__(self)
- Returns the string representation for a VectorDistance class instance.