Methods defined here:
- __init__(self, target_data=None, ref_data=None, target_id=None, target_feature=None, target_value=None, ref_id=None, ref_feature=None, ref_value=None, reftable_size='small', distance_measure='cosine', ignore_mismatch=True, replace_invalid='positiveinfinity', top_k=None, max_distance=None, target_data_sequence_column=None, ref_data_sequence_column=None, target_data_partition_column='ANY', target_data_order_column=None, ref_data_order_column=None, ref_columns=None, output_format='sparse', input_data_same=False, target_columns=None)
- DESCRIPTION:
The VectorDistance function takes a teradataml DataFrame of target
vectors and a teradataml DataFrame of reference vectors and returns a
teradataml DataFrame that contains the distance between each
target-reference pair.
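As a rough illustration of what "the distance between each target-reference pair" means, the plain-Python sketch below groups sparse (id, feature, value) rows into vectors and produces one distance per pair. It does not call teradataml or Vantage. The helper names (to_vectors, euclidean, pairwise) and the sample rows are hypothetical, and treating a missing feature as 0 is an assumption made only to keep the arithmetic easy to check by hand.

```python
import math
from collections import defaultdict

def to_vectors(rows):
    """Group sparse (id, feature, value) rows into {id: {feature: value}}."""
    vectors = defaultdict(dict)
    for vec_id, feature, value in rows:
        vectors[vec_id][feature] = value
    return dict(vectors)

def euclidean(a, b):
    """Euclidean distance over the union of features.

    Assumption for this sketch: a feature missing from one vector counts as 0.
    """
    features = set(a) | set(b)
    return math.sqrt(sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in features))

def pairwise(target_rows, ref_rows, measure=euclidean):
    """Return one (target_id, ref_id, distance) tuple per target-reference pair."""
    targets = to_vectors(target_rows)
    refs = to_vectors(ref_rows)
    return [(t_id, r_id, measure(t_vec, r_vec))
            for t_id, t_vec in sorted(targets.items())
            for r_id, r_vec in sorted(refs.items())]

# Hypothetical sample rows, not the "target_mobile_data" example table.
target_rows = [(1, "x", 3.0), (1, "y", 4.0), (2, "x", 1.0)]
ref_rows = [(10, "x", 0.0), (10, "y", 0.0), (11, "x", 1.0)]

for row in pairwise(target_rows, ref_rows):
    print(row)
```

Two target vectors and two reference vectors yield four output rows, one per pair.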
PARAMETERS:
target_data:
Required Argument.
Specifies a teradataml DataFrame that contains target vectors.
target_data_partition_column:
Required Argument. Optional when teradataml is connected to
Vantage 1.3 version.
Specifies Partition By columns for target_data.
Values to this argument can be provided as list, if multiple columns
are used for partition.
Note:
1. If teradataml is not connected to Vantage 1.3, you must pass
column name(s) to this argument; passing "ANY" is not supported.
2. If teradataml is connected to Vantage 1.3 and the target_data
teradataml DataFrame is in sparse format, you must pass column
name(s) to this argument.
3. If teradataml is connected to Vantage 1.3 and the target_data
teradataml DataFrame is in dense format, you must specify "ANY"
for this argument.
Default Value: ANY (If teradataml is connected to Vantage 1.3)
Types: str OR list of Strings (str)
target_data_order_column:
Optional Argument.
Specifies Order By columns for target_data.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
ref_data:
Required Argument.
Specifies a teradataml DataFrame that contains reference vectors.
ref_data_order_column:
Optional Argument.
Specifies Order By columns for ref_data.
Values to this argument can be provided as a list, if multiple
columns are used for ordering.
Types: str OR list of Strings (str)
target_id:
Required Argument.
Specifies the names of the columns that comprise the target vector
identifier. You must partition the target input teradataml DataFrame
by these columns and specify them with this argument.
Types: str OR list of Strings (str)
target_feature:
Required Argument. Optional when teradataml is connected to
Vantage 1.3 version.
Specifies the name of the column that contains the target vector
feature name (for example, the axis of a 3-D vector).
Note: An entry with a NULL value in a feature_column is dropped.
Types: str
target_value:
Optional Argument.
Specifies the name of the column that contains the value for the
target vector feature. The default value is 1.
Note: An entry with a NULL value in a value_column is dropped.
Types: str
ref_id:
Optional Argument.
Specifies the names of the columns that comprise the reference vector
identifier. The default value is the target_id argument value.
Types: str OR list of Strings (str)
ref_feature:
Optional Argument.
Specifies the name of the column that contains the reference vector
feature name. The default value is the target_feature argument value.
Types: str
ref_value:
Optional Argument.
Specifies the name of the column that contains the value for the
reference vector feature. The default value is the target_value
argument value.
Note: An entry with a NULL value in a value_column is dropped.
Types: str
reftable_size:
Optional Argument.
Specifies the size of the reference table. Specify "LARGE" only if
the reference teradataml DataFrame does not fit in memory.
Default Value: "small"
Permitted Values: small, large
Types: str
distance_measure:
Optional Argument.
Specifies the distance measures that the function uses.
Default Value: "cosine"
Permitted Values: COSINE, EUCLIDEAN, MANHATTAN, BINARY
Types: str OR list of Strings (str)
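For readers who want the measures spelled out, here is a minimal plain-Python sketch of three of them on dense vectors of equal length. It is not the Vantage implementation: the function names are hypothetical, and defining the cosine measure as 1 minus cosine similarity is an assumption. BINARY is left out because this document does not define it.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity (assumed definition of the "cosine" measure).

    Raises ZeroDivisionError for an all-zero vector; this sketch does not
    model how the real function handles that case.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical 2-D vectors.
a = [0.0, 0.0]
b = [3.0, 4.0]
print(euclidean_distance(a, b))
print(manhattan_distance(a, b))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```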
ignore_mismatch:
Optional Argument.
Specifies whether to drop mismatched dimensions. If distance_measure
is "cosine", this argument is treated as False. If you specify True,
two vectors with no common features become two empty vectors
when only their common features are considered, and the function
cannot measure the distance between them.
Default Value: True
Types: bool
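The effect of dropping mismatched dimensions can be sketched in plain Python as follows. This is an illustration under assumptions, not the Vantage implementation: the helper names are hypothetical, a missing feature counts as 0 when mismatches are kept, and None stands in for "the distance cannot be measured".

```python
import math

def restrict_to_common(a, b):
    """Keep only the features present in both sparse vectors."""
    common = set(a) & set(b)
    return {f: a[f] for f in common}, {f: b[f] for f in common}

def euclidean(a, b, ignore_mismatch=True):
    """Euclidean distance between two {feature: value} vectors."""
    if ignore_mismatch:
        a, b = restrict_to_common(a, b)
        if not a:
            # No common features: both vectors are empty, nothing to measure.
            return None
    features = set(a) | set(b)
    return math.sqrt(sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in features))

# Hypothetical vectors: only feature "x" is shared.
u = {"x": 1.0, "y": 2.0}
v = {"x": 4.0, "z": 4.0}
print(euclidean(u, v, ignore_mismatch=True))
print(euclidean(u, v, ignore_mismatch=False))
print(euclidean({"x": 1.0}, {"y": 1.0}, ignore_mismatch=True))
```

With mismatches ignored, only "x" contributes to the distance; with mismatches kept, "y" and "z" contribute as well.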
replace_invalid:
Optional Argument.
Specifies the value to return when the function encounters an
infinite value or empty vectors. For custom, you can supply any float
value.
Default Value: "positiveinfinity"
Types: str
top_k:
Optional Argument.
Specifies, for each target vector and for each measure, the maximum
number of closest reference vectors to include in the output table.
You can supply any integer value.
Types: int
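A minimal sketch of the top_k selection, assuming the distances for a single measure have already been computed. The function name top_k_per_target and the sample distances are hypothetical; the real function applies the limit per target vector and per measure.

```python
import heapq
from collections import defaultdict

def top_k_per_target(pairs, k):
    """Keep, for each target id, the k pairs with the smallest distance.

    pairs: iterable of (target_id, ref_id, distance) tuples.
    """
    by_target = defaultdict(list)
    for target_id, ref_id, distance in pairs:
        by_target[target_id].append((target_id, ref_id, distance))
    kept = []
    for target_id in sorted(by_target):
        kept.extend(heapq.nsmallest(k, by_target[target_id], key=lambda p: p[2]))
    return kept

# Hypothetical distances, not output from Vantage.
pairs = [(1, 10, 0.9), (1, 11, 0.1), (1, 12, 0.5), (2, 10, 0.3), (2, 11, 0.7)]
print(top_k_per_target(pairs, 2))
```

Target 1 has three candidate references, so its farthest one (distance 0.9) is dropped; target 2 has only two and keeps both.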
max_distance:
Optional Argument.
Specifies the maximum distance between a pair of target and reference
vectors. If the distance exceeds the threshold, the pair does not
appear in the output table. If the distance_measure argument
specifies multiple measures, then the max_distance argument must
specify a threshold for each measure. The ith threshold corresponds
to the ith measure. Each threshold can be any float value. If you
omit this argument, then the function returns all results.
Types: float OR list of Floats (float)
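The pairing of thresholds with measures can be sketched as below. This is plain Python for illustration only: filter_by_threshold and the result tuples are hypothetical, while the measure names and thresholds are the ones used in Example 2.

```python
def filter_by_threshold(results, measures, thresholds):
    """Drop results whose distance exceeds the threshold of their measure.

    results: iterable of (target_id, ref_id, measure_name, distance) tuples.
    """
    if len(measures) != len(thresholds):
        raise ValueError("one threshold is required per distance measure")
    # The ith threshold belongs to the ith measure.
    limit = {m.lower(): t for m, t in zip(measures, thresholds)}
    return [r for r in results if r[3] <= limit[r[2].lower()]]

measures = ["Cosine", "Euclidean", "Manhattan"]
thresholds = [0.03, 0.8, 1.0]

# Hypothetical distances, not output from Vantage.
results = [
    (1, 10, "cosine", 0.01),
    (1, 10, "euclidean", 0.9),
    (1, 10, "manhattan", 1.0),
    (1, 11, "cosine", 0.2),
]
kept = filter_by_threshold(results, measures, thresholds)
print(kept)
```

A distance equal to its threshold is kept here because it does not exceed the threshold.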
target_data_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of
the input argument "target_data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
ref_data_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of
the input argument "ref_data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
ref_columns:
Optional Argument.
Specifies the columns that contain the values for the reference
vector features.
For Example:
The names of the three axes of a 3-D vector.
Note:
1. "ref_columns" argument support is only available when teradataml
is connected to Vantage 1.3 version.
2. If "target_data" teradataml DataFrame is in dense-format input,
"target_columns" and "ref_columns" must specify the same columns;
otherwise results are invalid.
Types: str OR list of Strings (str)
output_format:
Optional Argument.
Specifies the format of the output teradataml DataFrame.
For large data sets, Teradata recommends input in dense format,
for which computing distances is faster.
Note:
"output_format" argument support is only available when teradataml
is connected to Vantage 1.3 version.
Default Value: "sparse"
Permitted Values: sparse, dense
Types: str
input_data_same:
Optional Argument. Allowed only when "top_k" is specified.
Specifies whether the target_data and ref_data teradataml DataFrames
are the same. Specify True to speed up distance computation
when both DataFrames are the same.
Note:
"input_data_same" argument support is only available when teradataml
is connected to Vantage 1.3 version.
Default Value: False
Types: bool
target_columns:
Optional Argument.
Specifies the columns that contain the values for the target vector
features.
For Example:
The names of the three axes of a 3-D vector.
Note:
"target_columns" argument support is only available when teradataml
is connected to Vantage 1.3 version.
Types: str OR list of Strings (str)
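To make the sparse and dense layouts concrete, the sketch below pivots sparse (id, feature, value) rows into one record per id with one column per feature. It is a plain-Python illustration, not a teradataml call: sparse_to_dense is a hypothetical helper, the values are made up, and the column names are borrowed from Example 3.

```python
def sparse_to_dense(rows, id_name, columns):
    """Pivot sparse (id, feature, value) rows into one dict per id.

    Features that never appear for an id are left as None.
    """
    dense = {}
    for vec_id, feature, value in rows:
        record = dense.setdefault(vec_id, {id_name: vec_id, **dict.fromkeys(columns)})
        if feature in columns:
            record[feature] = value
    return [dense[key] for key in sorted(dense)]

# Column names as in Example 3; the values are hypothetical.
columns = ["CallDuration", "DataCounter", "SMS"]
sparse_rows = [
    (1, "CallDuration", 7.5),
    (1, "DataCounter", 0.2),
    (1, "SMS", 3.0),
    (2, "CallDuration", 1.0),
    (2, "SMS", 4.0),
]

for record in sparse_to_dense(sparse_rows, "userid", columns):
    print(record)
```

In this picture, the sparse layout is what target_id, target_feature and target_value describe, and the dense layout is what target_columns describes.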
RETURNS:
Instance of VectorDistance.
Output teradataml DataFrames can be accessed using attribute
references, such as VectorDistanceObj.<attribute_name>.
Output teradataml DataFrame attribute name is:
result
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Load example data.
load_example_data("vectordistance", ["target_mobile_data", "ref_mobile_data",
"target_mobile_data_dense", "ref_mobile_data_dense"])
# Create teradataml DataFrame objects.
target_mobile_data = DataFrame.from_table("target_mobile_data")
ref_mobile_data = DataFrame.from_table("ref_mobile_data")
target_mobile_data_dense = DataFrame.from_table("target_mobile_data_dense")
ref_mobile_data_dense = DataFrame.from_table("ref_mobile_data_dense")
# Example 1 - Using the default ("cosine") distance measure with no threshold.
VectorDistance_out1 = VectorDistance(target_data = target_mobile_data,
target_data_partition_column = ["userid"],
ref_data = ref_mobile_data,
target_id = ["userid"],
target_feature = "feature",
target_value = "value1"
)
# Print the output data.
print(VectorDistance_out1)
# Example 2 - Using three distance measures with corresponding thresholds (max_distance).
VectorDistance_out2 = VectorDistance(target_data = target_mobile_data,
target_data_partition_column = ["userid"],
ref_data = ref_mobile_data,
target_id = ["userid"],
target_feature = "feature",
target_value = "value1",
distance_measure = ["Cosine","Euclidean","Manhattan"],
max_distance = [0.03,0.8,1.0]
)
# Print the output data.
print(VectorDistance_out2)
# Example 3 - target_data DataFrame is in 'dense' format with no threshold.
# Note:
# This Example will work only when teradataml is connected
# to Vantage 1.3 or later.
VectorDistance_out3 = VectorDistance(target_data = target_mobile_data_dense,
target_data_partition_column = "ANY",
ref_data = ref_mobile_data_dense,
target_id = ["userid"],
target_columns=["CallDuration", "DataCounter", "SMS"],
distance_measure = "Euclidean"
)
# Print the output data.
print(VectorDistance_out3)
# Example 4 - Using the same teradataml DataFrame for "target_data" and "ref_data",
# with "input_data_same" set to True.
# Note:
# This Example will work only when teradataml is connected
# to Vantage 1.3 or later.
VectorDistance_out4 = VectorDistance(target_data = target_mobile_data,
target_data_partition_column = ["userid"],
ref_data = target_mobile_data,
target_id = ["userid"],
target_feature = "feature",
target_value = "value1",
distance_measure = "Euclidean",
input_data_same = True
)
# Print the output data.
print(VectorDistance_out4)
- __repr__(self)
- Returns the string representation for a VectorDistance class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the Prediction type of the algorithm.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- get_target_column(self)
- Function to return the Target Column of the algorithm.
When model object is created using retrieve_model(), then the value returned is
as saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When model object is created using retrieve_model(), then None is returned.