Methods defined here:
- __init__(self, data=None, auto_bin=None, custom_bin_table=None, custom_bin_column=None, bin_size=None, start_value=None, end_value=None, value_column=None, inclusion='left', groupby_columns=None, data_sequence_column=None, custom_bin_table_sequence_column=None)
- DESCRIPTION:
Histograms are used to assess the shape of a data distribution.
The Histogram function calculates the frequency distribution of a
data set. It can calculate the bin width and number of bins
automatically, or use bins that you define. The function maps each
input row to exactly one bin and returns the frequency (row count)
and proportion (percentage of rows) of each bin.
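The core idea in the description (map each row to one bin, then report a row count and a percentage per bin) can be illustrated with a small plain-Python sketch. This is not the Histogram implementation; the function name simple_histogram is hypothetical. It also assumes that inclusion="left" means, read literally, that a value lying exactly on an inner boundary is counted in the bin on its left, and that percentages are taken over all input values.

```python
from bisect import bisect_left, bisect_right


def simple_histogram(values, boundaries, inclusion="left"):
    """Return (lower, upper, frequency, percentage) for each bin.

    Hypothetical sketch, not teradataml code.
    ASSUMPTION: inclusion="left" puts a value equal to a boundary into the bin
    on its left, giving (lower, upper] bins; inclusion="right" puts it into the
    bin on its right, giving [lower, upper) bins. Values that fall outside
    every bin (including one equal to the open outer edge) are not counted.
    """
    if inclusion not in ("left", "right"):
        raise ValueError("inclusion must be 'left' or 'right'")
    edges = sorted(boundaries)
    if len(edges) < 2:
        raise ValueError("need at least two boundary points")
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        if inclusion == "left":
            i = bisect_left(edges, v) - 1   # (lower, upper]
        else:
            i = bisect_right(edges, v) - 1  # [lower, upper)
        if 0 <= i < n_bins:                 # each value lands in at most one bin
            counts[i] += 1
    total = len(values)
    # ASSUMPTION: percentages are relative to all input values.
    return [
        (edges[i], edges[i + 1], counts[i],
         100.0 * counts[i] / total if total else 0.0)
        for i in range(n_bins)
    ]
```

For example, simple_histogram([1, 2, 2, 3, 5], [0, 2, 4, 6]) counts the two values equal to 2 in the first bin, whereas inclusion="right" moves them into the second bin.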
PARAMETERS:
data:
Required Argument.
Specifies the teradataml DataFrame containing the input data.
auto_bin:
Optional Argument.
Specifies either the algorithm to be used for selecting bin
boundaries or the approximate number of bins to be found. The
permitted values are STURGES, SCOTT, or a positive integer. If
this argument is present, the arguments custom_bin_table,
custom_bin_column, start_value, bin_size, and end_value cannot
be present.
Types: str
custom_bin_table:
Optional Argument.
Specifies a teradataml DataFrame containing the boundary
points between bins. If this argument is present, the argument
custom_bin_column must also be present and the arguments
auto_bin, start_value, bin_size, and end_value cannot be
present.
Types: teradataml DataFrame
custom_bin_column:
Optional Argument.
Specifies the column in custom_bin_table that contains the
boundary values. The column must contain values with numeric
Python data types (int, float). If this argument is present, the
argument custom_bin_table must also be present and the
arguments auto_bin, start_value, bin_size, and end_value cannot
be present.
Types: str
bin_size:
Optional Argument.
Specifies the width of each bin, as a double value, when using
equally sized bins. Omit this argument if you are not using equally
sized bins. The value must be greater than 0.0. If this
argument is present, the arguments start_value and end_value
must also be present and the arguments auto_bin,
custom_bin_table, and custom_bin_column cannot be present.
Types: float
start_value:
Optional Argument.
Specifies the smallest value to be used in binning. If this
argument is present, the arguments bin_size and end_value must
also be present and the arguments auto_bin, custom_bin_table
and custom_bin_column cannot be present.
Types: float
end_value:
Optional Argument.
Specifies the largest value to be used in binning. If this
argument is present, the arguments start_value and bin_size
must also be present and the arguments auto_bin,
custom_bin_table and custom_bin_column cannot be present.
Types: float
value_column:
Required Argument.
Specifies the column in the input teradataml DataFrame for
which statistics will be computed. The column must contain values
with numeric Python data types (int, float).
Types: str
inclusion:
Optional Argument.
Indicates whether points on bin boundaries should be included
in the bin on the left or the bin on the right.
Default Value: "left"
Permitted Values: left, right
Types: str
groupby_columns:
Optional Argument.
Specifies the columns in the input teradataml DataFrame used to
group values for binning. These columns cannot contain values
with double or float data types.
Types: str OR list of Strings (str)
data_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of
the input argument "data". The argument is used to ensure
deterministic results for functions that produce results that vary
from run to run.
Types: str OR list of Strings (str)
custom_bin_table_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of
the input argument "custom_bin_table". The argument is used to ensure
deterministic results for functions that produce results that vary
from run to run.
Types: str OR list of Strings (str)
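The parameter descriptions above define three mutually exclusive ways to choose bins: auto_bin alone; custom_bin_table together with custom_bin_column; or start_value, end_value, and bin_size together. The hypothetical helper binning_mode below is a plain-Python sketch of those rules, useful for checking an argument set before calling Histogram. It is not a teradataml function. It assumes that auto_bin names are matched case-insensitively and, because the docstring does not say what happens when no binning argument is given, it simply returns None in that case.

```python
def binning_mode(auto_bin=None, custom_bin_table=None, custom_bin_column=None,
                 bin_size=None, start_value=None, end_value=None):
    """Return "auto", "custom", "equal_width", or None (no binning arguments).

    Hypothetical helper mirroring the documented argument rules; not teradataml.
    """
    modes = []

    if auto_bin is not None:
        text = str(auto_bin).strip()
        # ASSUMPTION: STURGES/SCOTT are matched case-insensitively.
        if text.upper() not in ("STURGES", "SCOTT") and not (text.isdigit() and int(text) > 0):
            raise ValueError("auto_bin must be STURGES, SCOTT, or a positive integer")
        modes.append("auto")

    custom = (custom_bin_table, custom_bin_column)
    if any(arg is not None for arg in custom):
        if any(arg is None for arg in custom):
            raise ValueError("custom_bin_table and custom_bin_column must be given together")
        modes.append("custom")

    equal = (bin_size, start_value, end_value)
    if any(arg is not None for arg in equal):
        if any(arg is None for arg in equal):
            raise ValueError("bin_size, start_value, and end_value must be given together")
        if bin_size <= 0.0:
            raise ValueError("bin_size must be greater than 0.0")
        modes.append("equal_width")

    # The three modes cannot be combined.
    if len(modes) > 1:
        raise ValueError("conflicting binning arguments: " + ", ".join(modes))
    return modes[0] if modes else None
```

For instance, binning_mode(auto_bin="SCOTT", bin_size=1.0, start_value=0.0, end_value=1.0) raises ValueError, matching the rule that auto_bin excludes the equal-width arguments.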
RETURNS:
Instance of Histogram.
Output teradataml DataFrames can be accessed using attribute
references, such as HistogramObj.<attribute_name>.
Output teradataml DataFrame attribute names are:
1. output
2. output_table
RAISES:
TeradataMlException
EXAMPLES:
# Load the data to run the example.
load_example_data("histogram", ['bin_breaks', 'cars_hist'])
# The 'cars_hist' table has the cylinder (cyl) and horsepower (hp)
# data for different car models.
# The 'bin_breaks' table has the boundary values for the custom
# bins to be used while generating the histogram.
# Create teradataml DataFrame objects.
cars_hist = DataFrame.from_table('cars_hist')
custom_bin = DataFrame.from_table('bin_breaks')
# Example 1: Using auto_bin.
result = Histogram(data=cars_hist,
                   value_column='hp',
                   auto_bin='Sturges')
# Print the results
print(result.output_table)
# Example 2: Using start_value, end_value and bin_size.
result = Histogram(data=cars_hist,
                   value_column='hp',
                   inclusion='left',
                   start_value=100.0,
                   end_value=400.0,
                   bin_size=100.0)
# Print the results
print(result.output_table)
# Example 3: Using custom_bin_table.
result = Histogram(data=cars_hist,
                   value_column='hp',
                   inclusion='left',
                   custom_bin_table=custom_bin,
                   custom_bin_column='break')
# Print the results
print(result.output_table)
# Example 4: Using groupby_columns with auto_bin.
result = Histogram(data=cars_hist,
                   value_column='hp',
                   inclusion='left',
                   auto_bin='STURGES',
                   groupby_columns='cyl')
# Print the results
print(result.output_table)
# Example 5: Using inclusion='right'.
result = Histogram(data=cars_hist,
                   bin_size=50.0,
                   start_value=20.0,
                   end_value=400.0,
                   value_column='hp',
                   inclusion='right')
# Print the results
print(result.output_table)
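Examples 1 and 4 let auto_bin choose the bins, while Examples 2 and 5 pass start_value, end_value, and bin_size. The plain-Python sketch below shows how such boundaries are commonly derived. Sturges' and Scott's rules appear in their usual textbook form; whether Histogram applies exactly these formulas is not stated here. Likewise, the docstring does not say how a range that is not a whole multiple of bin_size is handled (as in Example 5, where (400 - 20) / 50 = 7.6), so cutting the last bin short at end_value is an assumption. All function names are hypothetical.

```python
import math
import statistics


def equal_width_edges(start_value, end_value, bin_size):
    """Boundaries for equally sized bins from start_value to end_value.

    ASSUMPTION: if the range is not a whole multiple of bin_size, the last
    bin is cut short at end_value.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be greater than 0.0")
    if end_value <= start_value:
        raise ValueError("end_value must be greater than start_value")  # sketch-only check
    # Small tolerance so float noise such as 3.0000000000000004 adds no empty bin.
    n_bins = max(1, math.ceil((end_value - start_value) / bin_size - 1e-9))
    return [start_value + i * bin_size for i in range(n_bins)] + [end_value]


def sturges_bin_count(n):
    """Sturges' rule: ceil(log2(n)) + 1 bins for n values."""
    if n < 1:
        raise ValueError("need at least one value")
    return math.ceil(math.log2(n)) + 1


def scott_bin_count(values):
    """Scott's rule: bin width 3.49 * s * n**(-1/3), s = sample standard deviation."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    width = 3.49 * statistics.stdev(values) * len(values) ** (-1 / 3)
    if width == 0:
        return 1  # all values identical: a single bin
    return max(1, math.ceil((max(values) - min(values)) / width))
```

With the Example 2 settings, equal_width_edges(100.0, 400.0, 100.0) gives the three bins bounded by 100, 200, 300, and 400. With the Example 5 settings it gives eight bins, the last spanning 370 to 400 under the stated assumption.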
- __repr__(self)
- Returns the string representation for a Histogram class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When the model object is created using retrieve_model(), the value returned is
the one saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the prediction type of the algorithm.
When the model object is created using retrieve_model(), the value returned is
the one saved in the Model Catalog.
- get_target_column(self)
- Function to return the target column of the algorithm.
When the model object is created using retrieve_model(), the value returned is
the one saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When the model object is created using retrieve_model(), None is returned.