TD_QQNorm Function | QQNorm | Teradata Vantage - TD_QQNorm - Analytics Database

Database Analytic Functions

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-04-06
Product Category
Teradata Vantage™

The TD_QQNorm function is a Q-Q (quantile-quantile) norm method that compares the distribution of a data set to a normal distribution. TD_QQNorm checks whether the values in the input table columns are normally distributed. The function returns the quantiles of the column values and the corresponding theoretical quantile values from a normal distribution. If the column values are normally distributed, the quantiles of the column values and the normal quantile values fall along a straight line with a slope of 1 when plotted on a graph.

The data is first sorted in ascending order, and then the corresponding quantiles are calculated. Next, the expected quantiles are calculated from the theoretical distribution used for the comparison. For a normal distribution, the expected quantiles are calculated based on the mean and standard deviation of the data.
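The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the TD_QQNorm implementation: the helper name `qq_norm_pairs` is hypothetical, and the plotting position `(i - 0.5) / n` is an assumed convention, because this topic does not state which formula the function uses.

```python
from statistics import NormalDist, mean, stdev

def qq_norm_pairs(values):
    """Return (observed, theoretical) quantile pairs for a normal Q-Q comparison."""
    ordered = sorted(values)                      # sort the data in ascending order
    n = len(ordered)
    # Normal distribution parameterized by the sample mean and standard deviation.
    dist = NormalDist(mean(ordered), stdev(ordered))
    pairs = []
    for i, observed in enumerate(ordered, start=1):
        p = (i - 0.5) / n                         # assumed plotting position
        pairs.append((observed, dist.inv_cdf(p))) # expected quantile at probability p
    return pairs

sample = [4.1, 5.0, 5.2, 5.9, 6.3, 7.0, 7.4]
for observed, expected in qq_norm_pairs(sample):
    print(f"{observed:6.2f}  {expected:6.2f}")
```

Each printed row pairs one sorted data value with the normal quantile expected at the same probability, which is the kind of pairing a Q-Q plot displays.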

When plotted on a graph, the function output displays the quantiles of the dataset against the expected quantiles of the theoretical distribution, usually on a scatter plot. Deviations from a straight line indicate deviations from normality.
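One rough way to quantify how closely the points follow a straight line, without drawing the plot, is the correlation between the sorted values and the normal quantiles. The sketch below assumes the `(i - 0.5) / n` plotting position and uses a hypothetical helper name, `qq_correlation`; it is a diagnostic illustration and not part of the TD_QQNorm output.

```python
import random
from statistics import NormalDist, mean

def qq_correlation(values):
    """Pearson correlation between sorted values and standard normal quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theo = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mt = mean(ordered), mean(theo)
    cov = sum((x - mx) * (t - mt) for x, t in zip(ordered, theo))
    var_x = sum((x - mx) ** 2 for x in ordered)
    var_t = sum((t - mt) ** 2 for t in theo)
    return cov / (var_x * var_t) ** 0.5

rng = random.Random(42)                                     # fixed seed for repeatability
normal_like = [rng.gauss(10.0, 2.0) for _ in range(200)]    # drawn from a normal distribution
skewed = [rng.expovariate(1.0) for _ in range(200)]         # right-skewed data

r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(skewed)
print(f"normal-like: {r_normal:.4f}  skewed: {r_skewed:.4f}")
```

A value close to 1 means the Q-Q points lie near a straight line. The skewed sample should score lower than the normal-like sample, reflecting the curvature it would show on the plot.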

The TD_QQNorm function is commonly used in statistics and data analysis to check the assumptions of statistical models that rely on normality, such as linear regression.

Quartile normalization is a data preprocessing technique commonly used in bioinformatics and statistics to normalize gene expression data. The method aims to remove technical variation that can occur between samples due to differences in data acquisition or processing. It is often used to ensure that data from different samples or platforms are comparable, by adjusting for systematic differences in the data that may arise due to different experimental conditions.

In quartile normalization, the data is sorted in ascending order, and then divided into four equal-sized groups, or quartiles. The median value of each quartile is then computed, and these medians are used to adjust the data values so that the medians across all samples are the same. This equalizes the distribution of the data across samples, making it easier to compare gene expression levels between samples.

Quartile normalization is often used as a preprocessing step for other analyses, such as differential gene expression analysis or clustering.

To use quantile normalization to formulate QQ norm data, follow these steps:
  1. For each element in your dataset, rank the expression values from smallest to largest. This creates a new dataset where each value is represented by a rank.
  2. Calculate the average rank for each value across all samples. This gives you a new dataset where each element is represented by a single value, which is the average rank across all samples.
  3. Sort the average ranks from step 2 in ascending order. This creates a new dataset that represents the order of the values from lowest to highest average rank.
  4. Calculate the quantiles of the sorted average ranks, using the number of bins or quantiles you choose. For example, to use 100 quantiles, divide the sorted average ranks into 100 equal-sized bins and calculate the quantile for each bin.
  5. For each sample in your dataset, map the original expression values to their corresponding quantiles from step 4. This ensures that the distribution of expression values within each sample is the same as the distribution of average ranks across all samples.
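For comparison, here is a minimal sketch of the widely used form of quantile normalization: sort each sample, average the values found at each rank across samples, then map each original value to the average for its rank. It is an assumption about the intended procedure and may differ in detail from the steps above, which bin the average ranks. The function name `quantile_normalize` is hypothetical, all samples are assumed to have the same length, and ties are broken by position rather than averaged.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of equal-length samples (lists of numbers)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")

    # Sort each sample, then average the values that share a rank across samples.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(col) / len(samples) for col in zip(*sorted_samples)]

    normalized = []
    for s in samples:
        # order[k] is the index of the k-th smallest value in this sample.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]   # replace the value with the mean for its rank
        normalized.append(out)
    return normalized

samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
result = quantile_normalize(samples)
for row in result:
    print([round(v, 3) for v in row])
```

After normalization every sample contains the same set of values, so the samples share one distribution, while the ordering of values within each sample is preserved.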

Once you have normalized the data using quantile normalization, you can create a QQ norm table by tabulating the observed quantiles of your dataset against the expected quantiles from a theoretical distribution, such as the normal distribution. If your data is normally distributed, the points on the QQ plot fall along a straight line. If there are deviations from normality, such as skewness or heavy tails, the points on the QQ plot deviate from the straight line.