Function Name | Description |
---|---|
TD_TextParser | The function performs the following operations: |
TD_OrdinalEncodingFit and TD_OrdinalEncodingTransform functions | The TD_OrdinalEncodingFit function identifies the distinct categorical values in the input table or in a user-defined list and returns each distinct value along with its ordinal value. The TD_OrdinalEncodingTransform function uses the TD_OrdinalEncodingFit output to map each categorical value to the ordinal value specified for it. |
TD_NonLinearCombineFit and TD_NonLinearCombineTransform functions | The TD_NonLinearCombineFit function returns the target columns and a specified formula that defines a nonlinear combination of existing features. The TD_NonLinearCombineTransform function generates the values of the new feature by applying that formula from the TD_NonLinearCombineFit output. |
TD_ANOVA | Analysis of variance (ANOVA) is a statistical test that analyzes the difference between the means of more than two groups. The null hypothesis (H0) of ANOVA is that there is no difference among the group means. If at least one group mean differs significantly from the overall mean, the null hypothesis is rejected. You can use one-way ANOVA when you have data on one independent variable with at least three levels and one dependent variable. For example, suppose that your independent variable is insect spray type and you have data for spray types A, B, C, D, E, and F. You can use one-way ANOVA to determine whether the dependent variable, insect count, differs depending on the spray type used. |
TD_NaiveBayesTextClassifierTrainer | The function calculates the conditional probabilities for token-category pairs, the prior probabilities, and the missing-token probabilities for all categories. The trainer function stores these probability values in the model, and the predict function uses them to classify documents into categories. |
TD_RegressionEvaluator | The function computes metrics to evaluate and compare multiple models and summarizes how close predictions are to their expected values. |
TD_ClassificationEvaluator | The function computes the confusion matrix, precision, recall, and F1-score from the observed labels (true labels) and the predicted labels. The function also works for multi-class scenarios. In either case, the primary output table contains class-level metrics, and the secondary output table contains metrics that apply across classes. |
TD_GetFutileColumns | The function returns the futile column names if any of the following conditions is met: |
TD_KMeans | The K-means algorithm groups a set of observations into k clusters, in which each observation belongs to the cluster with the nearest mean (the cluster center, or centroid). The algorithm minimizes the objective function, that is, the total squared Euclidean distance of all data points from the centers of their clusters: $J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$, where $C_j$ is the set of points in cluster $j$ and $\mu_j$ is its centroid. The algorithm doesn't necessarily find the optimal configuration, because the result depends significantly on the initial, randomly selected cluster centers. You can run the function multiple times to reduce the effect of this limitation. The function also returns the within-cluster sum of squares, which you can use to choose an optimal number of clusters with the elbow method. |
TD_KMeansPredict | The function uses the cluster centroids in the TD_KMeans function output to assign each input data point to its nearest cluster centroid. |
TD_Silhouette | The function implements the silhouette method for interpreting and validating the consistency of clusters. It determines how well the data is clustered and measures the separation distance between the resulting clusters. The silhouette value measures how similar an object is to its own cluster (cohesion) compared with other clusters (separation). The silhouette plot shows how close each point in one cluster is to the points in the neighboring clusters, and thus provides a way to assess parameters such as the number of clusters. |
TD_SentimentExtractor | The function uses a dictionary model to extract the sentiment (positive, negative, or neutral) of each input document or sentence. |
TD_ROC | The Receiver Operating Characteristic (ROC) function accepts a set of prediction-actual pairs for a binary classification model and calculates values such as the true positive rate and the false positive rate for a range of discrimination thresholds. An ROC curve shows the performance of a binary classification model as its discrimination threshold varies: for a range of thresholds, the curve plots the true positive rate against the false positive rate. |
TD_VectorDistance | The function accepts a table of target vectors and a table of reference vectors and returns a table that contains the distance between target-reference pairs. |
TD_RandomProjectionMinComponents | The function calculates the minimum number of components required to apply random projection to the given dataset for the specified epsilon (distortion) value. In other words, it estimates the minimum value of the NumComponents argument of the TD_RandomProjectionFit function for that dataset. The function uses the Johnson-Lindenstrauss lemma to calculate the value. |
TD_RandomProjectionFit | The function returns a random projection matrix based on the specified arguments. The function returns the required parameters for transforming the input data into lower-dimensional data. The TD_RandomProjectionTransform function uses the TD_RandomProjectionFit output to reduce the dimensionality of the input data. |
TD_RandomProjectionTransform | The function converts the high-dimensional input data to a lower-dimensional space using the TD_RandomProjectionFit function output. |
TD_ColumnTransformer | The function applies multiple transformations to the input table columns in a single operation. You only need to provide the FIT tables, and the function runs all the transformations that you require together. |
TD_GLM | The function is a generalized linear model (GLM) that performs regression and classification analysis on data sets in which the response follows an exponential family distribution. The function supports the following models: regression (Gaussian family) and binary classification (Binomial family). |
TD_GLMPredict | The function predicts target values (regression) and class labels (classification) for test data using a GLM model trained by the TD_GLM function. |
TD_DecisionForest | The function is an ensemble algorithm used for classification and regression predictive modeling problems. It is an extension of bootstrap aggregation (bagging) of decision trees. |
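
The fit/transform split described for TD_OrdinalEncodingFit and TD_OrdinalEncodingTransform can be sketched in plain Python. This is not the Teradata implementation. The function names are hypothetical, ordinals are assumed to follow order of first appearance, and unseen categories are assumed to map to a hypothetical default of -1.

```python
def ordinal_encoding_fit(values):
    """Assign an ordinal to each distinct category, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def ordinal_encoding_transform(values, mapping, default=-1):
    """Replace each category with its ordinal; unseen categories get `default` (assumption)."""
    return [mapping.get(value, default) for value in values]


mapping = ordinal_encoding_fit(["low", "high", "medium", "low"])
print(mapping)  # {'low': 0, 'high': 1, 'medium': 2}
print(ordinal_encoding_transform(["medium", "low", "unknown"], mapping))  # [2, 0, -1]
```

Keeping the fit output as a separate object means the same mapping can be reused on any later table, which is the point of splitting fit from transform.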
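
For TD_ANOVA, the one-way test compares the variability between group means with the variability within groups. The sketch below computes the F statistic and its degrees of freedom from scratch. The function name and the insect counts are hypothetical, and the sketch stops short of the p-value, which comes from the F distribution with (k - 1, n - k) degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical insect counts for three spray types.
counts = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
print(one_way_anova_f(list(counts.values())))  # (27.0, 2, 6)
```

A large F means the group means are spread out relative to the noise inside each group. For comparison, `scipy.stats.f_oneway` computes the same statistic together with its p-value.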
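
The per-class and cross-class split described for TD_ClassificationEvaluator can be illustrated with a small sketch. It uses precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean. The following are assumptions for illustration, not the function's documented output:

- the function name;
- the choice of accuracy and macro-averaged F1 as the cross-class metrics;
- the 0.0 convention for empty denominators.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (confusion counts, per-class metrics, cross-class metrics)."""
    labels = sorted(set(y_true) | set(y_pred))
    # Counter maps (actual, predicted) -> count and returns 0 for missing pairs.
    confusion = Counter(zip(y_true, y_pred))

    per_class = {}
    for c in labels:
        tp = confusion[(c, c)]
        fp = sum(confusion[(a, c)] for a in labels if a != c)
        fn = sum(confusion[(c, p)] for p in labels if p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    overall = {
        "accuracy": sum(confusion[(c, c)] for c in labels) / len(y_true),
        "macro_f1": sum(m["f1"] for m in per_class.values()) / len(labels),
    }
    return confusion, per_class, overall


confusion, per_class, overall = classification_metrics(
    ["a", "a", "b", "b", "c"], ["a", "b", "b", "b", "c"]
)
```

The `per_class` dictionary plays the role of a class-level table, and `overall` plays the role of a table of metrics that apply across classes.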
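
The assign-and-update loop behind TD_KMeans and TD_KMeansPredict can be sketched with Lloyd's algorithm, a common way to minimize the K-means objective. This illustrates the general technique and is not a description of the Teradata implementation. The function names, the seeded random initialization, and the convergence check are illustrative choices. The returned within-cluster sum of squares is the quantity you would plot against k for the elbow method.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centers (result depends on them)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    wcss = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, wcss


def kmeans_predict(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, wcss = kmeans(points, k=2)
labels = kmeans_predict([(1, 1), (9, 9)], centroids)
```

Running `kmeans` with several seeds and keeping the result with the lowest `wcss` is one way to apply the "run it multiple times" advice from the table.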
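
The cohesion-versus-separation idea behind TD_Silhouette is usually written per point as s(i) = (b - a) / max(a, b). Here a is the mean distance to the other points in the same cluster, and b is the mean distance to the points of the nearest other cluster. The sketch below assumes Euclidean distance, at least two clusters, and a score of 0 for single-point clusters. The function name is hypothetical.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b); needs at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == own]
        if not same:
            scores.append(0.0)  # common convention for a single-point cluster
            continue
        a = sum(same) / len(same)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters
            if other != own
        )
        scores.append((b - a) / max(a, b))
    return scores


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])
```

Values near 1 indicate well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own.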
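
For TD_ROC, each point on the curve comes from counting true positives and false positives at one threshold. The sketch makes these assumptions:

- scores are probabilities;
- actual labels are coded as 1 and 0;
- a score at or above the threshold counts as a positive prediction.

The function name and data are hypothetical. The input must contain at least one positive and one negative to avoid division by zero.

```python
def roc_points(scores, actuals, thresholds):
    """Return (threshold, TPR, FPR) triples; actuals are 1 (positive) or 0 (negative)."""
    positives = sum(actuals)
    negatives = len(actuals) - positives
    points = []
    for t in thresholds:
        # A prediction counts as positive when its score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, actuals) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, actuals) if s >= t and y == 0)
        points.append((t, tp / positives, fp / negatives))
    return points


curve = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0], [0.0, 0.5, 1.0])
```

Plotting the FPR values on the x-axis against the TPR values on the y-axis gives the ROC curve described in the table.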
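
The Johnson-Lindenstrauss estimate used by TD_RandomProjectionMinComponents, and the fit/transform pair that follows it, can be sketched as below. Assumptions:

- The bound is assumed to be n_components >= 4 ln(n_samples) / (epsilon^2 / 2 - epsilon^3 / 3), the form documented for scikit-learn's `johnson_lindenstrauss_min_dim`. Teradata may use a different variant.
- The projection matrix is assumed to be dense Gaussian with variance 1 / n_components.
- All function names are hypothetical.

```python
import math
import random


def jl_min_components(n_samples, epsilon):
    """Smallest integer dimension satisfying the Johnson-Lindenstrauss bound (assumed form)."""
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must be in (0, 1)")
    denominator = epsilon ** 2 / 2 - epsilon ** 3 / 3
    return math.ceil(4 * math.log(n_samples) / denominator)


def random_projection_fit(n_features, n_components, seed=0):
    """Dense Gaussian projection matrix (n_features x n_components), entries ~ N(0, 1/k)."""
    rng = random.Random(seed)
    scale = 1 / math.sqrt(n_components)
    return [[rng.gauss(0.0, scale) for _ in range(n_components)] for _ in range(n_features)]


def random_projection_transform(rows, matrix):
    """Multiply each input row by the projection matrix."""
    n_components = len(matrix[0])
    return [
        [sum(row[f] * matrix[f][c] for f in range(len(row))) for c in range(n_components)]
        for row in rows
    ]


matrix = random_projection_fit(n_features=50, n_components=10)
projected = random_projection_transform([[1.0] * 50, [0.0] * 50], matrix)
```

A smaller epsilon (less allowed distortion) drives the required number of components up sharply. That is why the minimum-components estimate is computed before choosing NumComponents.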
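
The table does not list which metrics TD_RegressionEvaluator reports. As an illustration of "how close predictions are to their expected values", the sketch below computes four common ones: mean absolute error, mean squared error, root mean squared error, and R-squared. The function name and the metric selection are assumptions. R-squared is undefined when all observed values are equal.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R-squared for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_residual = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_total = sum((t - mean_true) ** 2 for t in y_true)  # zero if y_true is constant
    mse = ss_residual / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_residual / ss_total,
    }


metrics = regression_metrics([1, 2, 3], [1, 2, 4])
```

Computing the same dictionary for several models on one test set is a simple way to compare them side by side.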