Creating a New or Modifying an Existing Histogram Analysis

Teradata Warehouse Miner User Guide - Volume 1Introduction and Profiling

brand
Software
prodname
Teradata Warehouse Miner
vrm_release
5.4.4
category
User Guide
featnum
B035-2300-077K

In order to create a new or modify an existing Histogram analysis, the following analysis properties are defined:

Analysis Properties

  • Type — “Histogram” (needed only if “new” is “true”)
  • Name — the name of the new Histogram analysis or the name of an existing Histogram to modify
  • New — “true” (needed to define a new Histogram analysis)
  • Modify — “true” (needed to modify an existing Histogram analysis)

InputDataProperties needs to be defined if this is a “new” analysis. InputDataProperties takes a database, table, and a list of various column definitions, depending on the type of histogram desired. Histogram takes 5 types of column input. They define a collection of bins, quantiles, widths, boundaries, or bins with minimum and maximum values.

Column Input Data

  • Database (required for any histogram type) — the name of the database
  • Table (required for any histogram type) — the name of the table

The following defines each type of Histogram input:

BinColumns — a list of bin column names and their bin values

  • BinColumn
    • Name — the name of the column
    • Bins — the number of bins

An XML example to define bin columns follows.

<InputDataProperties>
	Database="twm_source"
	Table="twm_customer">
	<BinColumns>
		<BinColumn name="age" bins="5"/>
		<BinColumn name="income" bins="3"/>
	</BinColumns>
</InputDataProperties>

QuantileColumns — a list of bin column names and their bin values

  • QuantileColumn
    • Name — the name of the column
    • Quantiles — the number of quantiles

An XML example to define quantile columns follows.

<InputDataProperties table="twm_customer" database="twm_source">
	<QuantileColumns>
		<QuantileColumn name="age" quantiles="5"/>
		<QuantileColumn name="income" quantiles="3"/>
	</QuantileColumns>
</InputDataProperties>

WidthColumns — a list of column names and their width values

  • WidthColumn
    • Name — the name of the column
    • Width — the width values

An XML example to define width columns follows.

<InputDataProperties
	Database="twm_source"
	Table="twm_customer">
	<WidthColumns>
		<WidthColumn name="age" width="10"/>
		<WidthColumn name="income" width="10000"/>
	</WidthColumns>
</InputDataProperties>

BoundaryColumns — a list of column names and the list of boundary values for each column

  • BoundaryColumn
    • Name — the name of the column
  • Boundaries
    • Boundary
      • Value — the value of each boundary

An XML example to define boundary columns follows.

<InputDataProperties
	Database="twm_source"
	Table="twm_customer">
	<BoundaryColumns>
		<BoundaryColumn name="age">
			<Boundaries>
				<Boundary Value="20"/>
				<Boundary Value="35"/>
				<Boundary Value="75"/>
			</Boundaries>
		</BoundaryColumn>
		<BoundaryColumn name="nbr_children">
			<Boundaries>
				<Boundary Value="0"/>
				<Boundary Value="2"/>
				<Boundary Value="4"/>
			</Boundaries>
		</BoundaryColumn>
	</BoundaryColumns>
</InputDataProperties>

BinsMinMaxColumns — a list of column names that includes the bins and minimum and maximum values

  • BinMinMaxColumn
    • Name — the name of the column
    • Bins — the number of bins
    • MinimumValue — the minimum value to be binned for the column
    • MaximumValue — the maximum value to be binned for the column

An XML example to define boundary columns follows:

<InputDataProperties
	Database="twm_source"
	Table="twm_customer">
	<BinsMinMaxColumns>
		<binsMinMaxColumn name="age" bins="6" minimumValue="10" maximumValue="80"/>
		<binsMinMaxColumn name="income" bins="5"minimumValue="10000" 
			maximumValue="100000"/>
	</BinsMinMaxColumns>
</InputDataProperties>

An XML example to define BinColumns, StatisticsColumns and OverlayColumns for a new analysis follows.

<InputDataProperties>
	Database="twm_source"
	Table="twm_customer">
	<BinColumns>
		<BinColumn name="age" bins="5"/>
		<BinColumn name="income" bins="3"/>
	</BinColumns>
	<StatisticsColumns>
		<StatisticsColumn name="nbr_children"/>
		<StatisticsColumn name="years_with_bank"/>
	</StatisticsColumns>
	<OverlayColumns>
		<OverlayColumn name="income"/>
	</OverlayColumns>
</InputDataProperties>

Statistics Column Input Data (optional)

  • Statistics Columns — a list of column names of numeric or date type
    • Name — the name of the column to collect statistics on

Overlay Column Input Data (optional)

  • Overlay Columns — a list of column names that cannot include any of the histogram binning columns
    • Name — the name of the overlay column

Input Data Analysis Properties

  • CrossTabulateBins — “true” or “false”. Default is false.

An XML example to define InputDataAnalysis properties follows.

<InputDataAnalysisProperties
	crossTabulateBins=”true”/>

Output Properties

For the definition of output properties, see Modifying Output Batch Properties And Post Processing Properties.

Expert Properties

  • WhereClause — the where clause to be defined

An XML example to define Expert properties follows.

<ExpertProperties>
	whereClause=”age>50”/>

Sample XML Definition for a New Histogram Analysis

<Analysis name="MyHistogram" type="Histogram" new="true">
	<InputDataProperties>
		Database="twm_source"
		Table="twm_customer">
		<BinColumns>
			<BinColumn name="age" bins="5"/>
			<BinColumn name="income" bins="3"/>
		</BinColumns>
		<StatisticsColumns>
			<StatisticsColumn name="nbr_children"/>
			<StatisticsColumn name="years_with_bank"/>
		</StatisticsColumns>
		<OverlayColumns>
			<OveralyColumn name="income"/>
		</OverlayColumns>
	</InputDataProperties>
	<InputDataAnalysisProperties
		CrosstabulateBins=”true”/>
	<OutputProperties
		outputStyle="CreateTable"
		outputDatabase="twm_results"
		outpuName="MyHistogramOutput"
	</OutputProperties >
</Analysis>