An answer table is produced for each requested type of analysis. Each answer table includes the names of the requested tables and columns, so that results from multiple input tables can be combined in a single answer table.
- If Frequency is selected, Values must be selected.
- If Histogram is selected, Values must be selected, and Statistics must be selected with Count, Minimum, Maximum, Mean and Standard Deviation included.
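The selection rules above can be expressed as a small validation routine. The following is only an illustrative sketch: the function name `validate_selection`, the string labels, and the error messages are assumptions made for this example and are not part of the product.

```python
# Hypothetical sketch of the option dependencies described above.
# Names and messages are illustrative assumptions, not product identifiers.

REQUIRED_HISTOGRAM_STATS = {
    "Count", "Minimum", "Maximum", "Mean", "Standard Deviation",
}


def validate_selection(analyses, statistics=()):
    """Return a list of rule violations; an empty list means the selection is consistent."""
    analyses = set(analyses)
    statistics = set(statistics)
    errors = []

    # Rule 1: Frequency depends on Values.
    if "Frequency" in analyses and "Values" not in analyses:
        errors.append("Frequency requires Values")

    # Rule 2: Histogram depends on Values and on Statistics
    # with the five listed statistics included.
    if "Histogram" in analyses:
        if "Values" not in analyses:
            errors.append("Histogram requires Values")
        if "Statistics" not in analyses:
            errors.append("Histogram requires Statistics")
        else:
            missing = REQUIRED_HISTOGRAM_STATS - statistics
            if missing:
                errors.append(
                    "Histogram requires statistics: " + ", ".join(sorted(missing))
                )
    return errors
```

For example, `validate_selection(["Frequency"])` reports that Values is missing, while `validate_selection(["Values", "Frequency"])` returns an empty list.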
Data Explorer includes intelligence about which functions should be performed on which columns, with decisions based partly on column type and partly on the results obtained. It also includes performance enhancements that minimize the number of passes over the input data. You may also specify a separate SQL Where Clause to apply to each of the input tables selected for analysis.
The normal Data Explorer processing scheme is outlined below. Note that underlined values in the following topics are thresholds that the user can set. The program first builds up to four output tables; the steps below are then applied to each requested input table, one at a time. If parallel processing is requested, however, the tables are in effect processed n at a time, where n is the number of tables to process in parallel: the program establishes n threads and performs the steps below for each input table in a separate thread until all tables are processed.
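The "n at a time" behaviour can be pictured with a thread pool of n workers. This is a minimal sketch under stated assumptions, not the product's implementation: `process_table` is a hypothetical stand-in for the per-table steps, and `explore` and its `parallel` parameter are names invented for the example.

```python
# Illustrative sketch only: a pool of n worker threads, each handling
# one input table at a time until every table has been processed.
from concurrent.futures import ThreadPoolExecutor
import threading


def process_table(table_name):
    # Hypothetical placeholder for the per-table processing steps.
    # Returns the table name and the worker thread that handled it.
    return (table_name, threading.current_thread().name)


def explore(tables, parallel=1):
    # parallel=1 corresponds to processing the tables one at a time;
    # parallel=n lets up to n tables be processed concurrently.
    with ThreadPoolExecutor(max_workers=max(1, parallel)) as pool:
        # map() returns results in the order the tables were supplied.
        return list(pool.map(process_table, tables))
```

Calling `explore(["orders", "customers", "items"], parallel=2)` would use at most two worker threads for the three tables; the table names here are placeholders.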