The TD_QQNorm function implements the normal Q-Q (quantile-quantile) method, which compares the distribution of a data set to a normal distribution. TD_QQNorm checks whether the values in the input table columns are normally distributed. The function returns the quantiles of the column values and the corresponding theoretical quantile values from a normal distribution. If the column values are normally distributed, the column quantiles plotted against the normal quantiles fall along a straight line with a slope of 1.
The data is first sorted in ascending order, and the corresponding sample quantiles are calculated. Next, the expected quantiles are calculated from the theoretical distribution used for comparison. For a normal distribution, the expected quantiles are calculated based on the mean and standard deviation of the data.
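The calculation described above can be sketched in a few lines of Python. This is an illustration only, not the TD_QQNorm implementation: the helper name `qq_norm` is hypothetical, and the plotting position `(i - 0.5) / n` is an assumed convention, since the formula the function uses is not stated here.

```python
from statistics import NormalDist, mean, stdev

def qq_norm(values):
    """Return (theoretical_quantile, sample_quantile) pairs for a normal Q-Q plot."""
    ordered = sorted(values)          # sample quantiles are the sorted data
    n = len(ordered)
    # Normal distribution fitted to the mean and standard deviation of the data.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    pairs = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n             # assumed plotting position for rank i
        pairs.append((fitted.inv_cdf(p), x))
    return pairs

sample = [4.1, 5.0, 5.3, 5.9, 6.2, 6.8, 7.4]
for theoretical, observed in qq_norm(sample):
    print(f"{theoretical:8.3f} {observed:8.3f}")
```

Because the theoretical quantiles come from a normal distribution fitted to the data itself, normally distributed input should produce pairs that lie close to the line y = x.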
When plotted on a graph, the function output displays the quantiles of the dataset against the expected quantiles of the theoretical distribution, usually on a scatter plot. Deviations from a straight line indicate deviations from normality.
The TD_QQNorm function is commonly used in statistics and data analysis to check the assumptions of statistical models that rely on normality, such as linear regression.
Quartile normalization is a data preprocessing technique commonly used in bioinformatics and statistics to normalize gene expression data. The method aims to remove technical variation that can occur between samples due to differences in data acquisition or processing. It is often used to ensure that data from different samples or platforms are comparable, by adjusting for systematic differences in the data that may arise due to different experimental conditions.
In quartile normalization, the data is sorted in ascending order, and then divided into four equal-sized groups, or quartiles. The median value of each quartile is then computed, and these medians are used to adjust the data values so that the medians across all samples are the same. This equalizes the distribution of the data across samples, making it easier to compare gene expression levels between samples.
Quartile normalization is often used as a preprocessing step for other analyses, such as differential gene expression analysis or clustering.
The normalization procedure consists of the following steps:
1. For each sample in your dataset, rank the expression values from smallest to largest. This creates a new dataset where each value is represented by a rank.
2. Calculate the average rank for each value across all samples. This gives you a new dataset where each element is represented by a single value, which is the average rank across all samples.
3. Sort the average ranks from step 2 in ascending order. This creates a new dataset that represents the order of the values from lowest to highest average rank.
4. Calculate the quantiles of the sorted average ranks, using the chosen number of bins. For example, to use 100 quantiles, divide the sorted average ranks into 100 equal-sized bins and calculate the quantile for each bin.
5. For each sample in your dataset, map the original expression values to their corresponding quantiles from step 4. This ensures that the distribution of expression values within each sample is the same as the distribution of average ranks across all samples.
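As a rough illustration of the steps above, the following Python sketch uses the common rank-mean formulation of quantile normalization: sort each sample, average the values found at each rank across samples, then give every original value the average for its rank. This is one reading of the procedure, not a definitive implementation. The function name `quantile_normalize` is hypothetical, samples are assumed to have equal length, and ties are broken by original position as a simplification.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of equal-length samples (lists of numbers)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")
    # Sort each sample, then average the values that share the same rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(col) / len(samples) for col in zip(*sorted_samples)]
    normalized = []
    for s in samples:
        # order[k] is the index of the k-th smallest value in this sample.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]   # replace the value by its rank mean
        normalized.append(out)
    return normalized

expression = [
    [5, 2, 3, 4],   # sample A
    [4, 1, 6, 2],   # sample B
    [3, 4, 6, 8],   # sample C
]
for row in quantile_normalize(expression):
    print([round(v, 3) for v in row])
```

After normalization every sample contains the same set of values, so the samples share one distribution while each keeps its own internal ordering.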
Once you have the normalized data using quantile normalization, you can create a Q-Q norm table by tabulating the observed quantiles of your dataset against the expected quantiles from a theoretical distribution, such as the normal distribution. If your data is normally distributed, the points on the Q-Q plot fall along a straight line. If there are deviations from normality, such as skewness or heavy tails, the points on the Q-Q plot deviate from the straight line.
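One way to put a number on how closely the points follow a straight line is the correlation between observed and expected quantiles. The sketch below is an assumption-laden illustration rather than part of TD_QQNorm: the helper name `qq_correlation` is hypothetical, the plotting position `(i - 0.5) / n` is an assumed convention, and the two test samples are synthetic.

```python
from math import exp, sqrt
from statistics import NormalDist, mean, stdev

def qq_correlation(values):
    """Pearson correlation between sample quantiles and fitted normal quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mt, mo = mean(theoretical), mean(ordered)
    cov = sum((t - mt) * (x - mo) for t, x in zip(theoretical, ordered))
    var_t = sum((t - mt) ** 2 for t in theoretical)
    var_o = sum((x - mo) ** 2 for x in ordered)
    return cov / sqrt(var_t * var_o)

# Synthetic data: evenly spaced standard normal quantiles, and their
# exponentials, which form a right-skewed (lognormal-shaped) sample.
z = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
normal_like = [10 + 2 * v for v in z]
skewed = [exp(v) for v in z]

r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(skewed)
print(f"normal-like: {r_normal:.4f}")
print(f"skewed:      {r_skewed:.4f}")
```

A value close to 1 suggests the points lie near a straight line, while a noticeably lower value points to skewness or heavy tails. The correlation is only a summary, so it complements the plot rather than replacing it.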