Cluster Scoring - INPUT - Data Selection - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3 Analytic Functions

Teradata Warehouse Miner
February 2018
  1. On the Cluster Scoring dialog box, click INPUT.
  2. Click Data Selection.
    Add New Analysis > Input > Data Selection

  3. On this screen, select:
    • Select Input Source — Users may select between different sources of input.
      With Table selected as the Input Source, the user can select from available databases, tables (or views), and columns in the usual manner. With Analysis selected as the Input Source, the user can instead select directly from the output of another qualifying analysis in the current project. Qualifying analyses include all of the Analytic Data Set (ADS) and Reorganization analyses (except Refresh). In this case, Available Analyses replaces Available Databases, and Available Tables lists all the output tables that the selected analysis will eventually produce.
      Since this analysis cannot select from a volatile input table, Available Analyses will contain only those qualifying analyses that create an output table or view.
    • Select Columns From a Single Table
      • Available Databases — All available source databases that have been added on the Connection Properties dialog box.
      • Available Tables — The tables available for scoring are listed in this window, though not all of them necessarily qualify; the input table to be scored must contain the same column names used in the original analysis.
      • Available Columns — The columns available for scoring are listed in this window.
      • Selected Columns — The Selected Columns window is a split window for specifying Index and/or Retain columns.
        • Index Columns — If a table is specified as input, the primary index of the table is used by default but can be changed. If a view is specified as input, an index must be provided. When scoring a Fast K-Means model, columns used to determine clusters in the analysis being scored must not be specified as Index columns; otherwise, a duplicate definition error can occur.
        • Retain Columns — Other columns in the table being scored can be appended to the scored table by specifying them here. Columns specified in Index Columns must not also be specified here. None of the columns involved in Fast K-Means clustering can contain leading or trailing spaces or, if publishing, a separator character ' | '.
    • Select Model Analysis — Select from the list an existing Cluster analysis on which to run the scoring. The Cluster analysis must exist in the same project as the Cluster Scoring analysis.
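
The selection rules above can be checked before running the scoring. The following Python sketch is illustrative only and is not part of Teradata Warehouse Miner: the function names, parameters, and messages are hypothetical. It assumes that column names compare case-insensitively, that the leading/trailing-space restriction applies to the values stored in the clustering columns, and that the separator character is the pipe character.

```python
from typing import Iterable, List, Sequence

# Assumption: the separator character quoted in the text is the pipe character.
SEPARATOR = "|"


def check_selection(
    model_columns: Sequence[str],
    table_columns: Sequence[str],
    index_columns: Sequence[str],
    retain_columns: Sequence[str],
    is_view: bool = False,
    fast_kmeans: bool = False,
) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    # Assumption: names are compared case-insensitively.
    model = {c.lower() for c in model_columns}
    table = {c.lower() for c in table_columns}
    index = {c.lower() for c in index_columns}
    retain = {c.lower() for c in retain_columns}

    problems = []

    # The input table must contain the column names used in the original analysis.
    missing = sorted(model - table)
    if missing:
        problems.append(f"input is missing model columns: {missing}")

    # A view has no primary index to default to, so one must be provided.
    if is_view and not index:
        problems.append("a view was selected but no index column was provided")

    # Fast K-Means: clustering columns must not be used as index columns.
    overlap = sorted(index & model)
    if fast_kmeans and overlap:
        problems.append(f"clustering columns used as index columns: {overlap}")

    # Index columns must not be repeated as retain columns.
    repeated = sorted(index & retain)
    if repeated:
        problems.append(f"index columns also listed as retain columns: {repeated}")

    return problems


def bad_values(values: Iterable[str], publishing: bool = False) -> List[str]:
    """Return values with leading/trailing spaces or, if publishing, a separator."""
    return [
        v for v in values
        if v != v.strip(" ") or (publishing and SEPARATOR in v)
    ]
```

For example, `check_selection(["age", "income"], ["id", "age", "income", "region"], ["id"], ["region"], is_view=True, fast_kmeans=True)` returns an empty list, while reusing `age` as both an index and a retain column on a Fast K-Means model is reported as two separate problems.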