Correlation filtering Sample correlation matrix Analysis of variance (ANOVA)
Before turning to the supervised analysis below, one can first try unsupervised gene filtering followed by gene clustering (order samples by time in the array list file and do not cluster samples) to obtain gene clusters covering all potentially interesting time patterns. One can find expected increasing or decreasing patterns, early or late response patterns, as well as surprising patterns of biological interest. Clustering these genes can also reveal gene clusters enriched for particular functions. When the samples are ordered by time and not clustered, select “Tools/Options/Clustering/Gene ordering by peaking time” to order gene clusters by peak time.
Use “Analysis/Analysis of Variance” to perform supervised analysis of variance (ANOVA) or correlation analysis. A sample information file can be used to specify sample properties. If a discrete sample variable (e.g. grade) is specified as “factor”, ANOVA will be performed to find genes whose expression differs across the values of this variable. If a continuous sample variable (e.g. age) is specified as “factor”, correlation analysis will be performed to find genes correlated with this variable. However, if the “Clustering view” is the current view and a gene or branch is selected, correlation analysis will be performed to find genes correlated with this gene or branch.

Correlation filtering and use time as continuous variable
See Lu et al. 2004 for examples. Use “Analysis/Analysis of Variance” to filter genes by their correlation with time, by selecting “Time(numeric)” as the factor and specifying a small p-value threshold (to take into account the large number of genes tested). Try different p-values to obtain a reasonable number of genes. Use “Analysis/Hierarchical clustering” to view the obtained gene list. The p-value tests the null hypothesis that the correlation is 0; see equation (14.5.5) in Numerical Recipes, Section 14.5, and equation (14.6.2) in Section 14.6.
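As a rough illustration of what such a correlation filter computes, the sketch below calculates the Pearson correlation between a gene's expression and time, together with the usual t statistic for the null hypothesis of zero correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. It is an assumption that this is the statistic the cited equations refer to, and the gene names and values are made up. The p-value itself would come from a Student t distribution, which is not in the Python standard library and is therefore left out.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for H0: correlation = 0 (n - 2 degrees of freedom).

    Undefined when |r| is exactly 1, because the denominator is then 0.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical time points (hours) and expression profiles.
time = [0, 2, 4, 8, 16, 24]
genes = {
    "gene_up": [1.0, 1.4, 2.1, 2.9, 4.2, 5.8],    # rises with time
    "gene_flat": [2.0, 2.3, 1.9, 2.2, 1.8, 2.1],  # no clear trend
}

stats = {}
for name, expr in genes.items():
    r = pearson_r(time, expr)
    stats[name] = (r, correlation_t(r, len(time)))
    print(f"{name}: r = {stats[name][0]:.3f}, t = {stats[name][1]:.2f}")
```

A gene passes the filter when the p-value derived from t falls below the chosen threshold; a larger |t| corresponds to a smaller p-value.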
If a gene or a branch of genes is selected in the “Clustering view”, “Analysis/Analysis of Variance” will perform correlation filtering using this gene or branch as a template, to find genes whose profiles are similar to that of the gene or gene branch. To use discrete factors for ANOVA filtering or a continuous variable for correlation filtering instead, click the “Analysis” icon on the left and then select the menu “Analysis/Analysis of Variance”.
“Analysis/Hierarchical clustering” can be used to view the filtered gene list.
Sample correlation matrix
See Lu et al. 2004 (Figure 1b) or below for examples. When samples are clustered or ordered by “Tools/Array list file”, select “Clustering/Sample correlation matrix” to output a file containing the sample correlation matrix, computed from the gene-wise standardized expression values (not the raw expression values) of the genes used in clustering. Because genes with large raw expression values would dominate the correlation, it is more meaningful to use gene-wise standardized values when computing sample correlations over a set of genes. This correlation matrix is the basis of the original hierarchical clustering (left figure below, data courtesy of Tao Lu), and may make the relations among samples easier to perceive. [V11/18/07+] The correlation matrix will be displayed in the Plots view (right figure below). Use the Left/Right arrows to change the color scale, and the Up/Down arrows to change the image size.

[Old method] To view this correlation matrix, use “Analysis/Get external data” to read this data file back, and use “Analysis/Hierarchical clustering” to view the matrix (select the same data file as “gene list file”), with “Clustering samples” and “Clustering genes” unchecked, “Options/Standardize rows” unchecked and “Options/Displaying range” set to 1 or smaller (correlations are between –1 and 1). The goal of this function is mainly to visualize the correlation among the samples, without re-clustering the samples based on the correlation matrix, since the samples are already ordered by the clustering results in the external data file. However, if you do want to cluster samples using the correlation matrix, uncheck "Options/standardize rows" and "Standardize columns", since the correlation values are already within [-1, 1].
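The reason for using gene-wise standardized values can be seen in a small sketch. The code below is an illustration only, with made-up values for three genes and four samples; it is not the program's own implementation. It standardizes each gene across samples and then correlates samples across genes, and for comparison computes the same correlations from raw values, where the one highly expressed gene dominates.

```python
import math

def standardize(row):
    """Scale one gene's values to mean 0 and standard deviation 1."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in row) / (n - 1))
    return [(v - mean) / sd for v in row]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sample_correlation(matrix):
    """Correlate every pair of columns (samples) of a genes x samples matrix."""
    samples = list(zip(*matrix))
    return [[pearson_r(a, b) for b in samples] for a in samples]

# Hypothetical data: rows are genes, columns are samples 1..4.
# Samples 1-2 and samples 3-4 form two groups; gene 2 has a much larger scale.
expr = [
    [100.0, 110.0, 300.0, 320.0],
    [5000.0, 5200.0, 4000.0, 4100.0],
    [20.0, 22.0, 60.0, 58.0],
]

raw_corr = sample_correlation(expr)
std_corr = sample_correlation([standardize(g) for g in expr])

print("raw  corr(sample1, sample3):", round(raw_corr[0][2], 3))
print("std. corr(sample1, sample3):", round(std_corr[0][2], 3))
```

With raw values every pair of samples looks almost perfectly correlated, because the highly expressed gene sets the pattern; after gene-wise standardization the two sample groups separate.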
ANOVA and use time as discrete category
Analysis of variance (ANOVA) compares more than two groups to detect unequal group means (see StatSoft introduction). Specify a sample information file at “Analysis/Open group/other information”. This file describes the samples: one column for time or any other factor (e.g. day1, day2…), one column for treatment (e.g. control, treatment), and optionally further columns of category information. Do not specify time as “(numeric)”, so that it is recognized as a discrete factor.
For both one-way and two-way ANOVA, replicate samples are needed for each value of a factor. In this example sample information file, “Time” has 2 values, each with 3 replicates; “Treat” has 3 values, each with 2 replicates. If a value has only one replicate, you may use the “Compare samples” functions instead of ANOVA.
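To make the role of replicates concrete, here is a minimal sketch of the one-way ANOVA F statistic for a single gene, using the textbook between-group and within-group sums of squares. The function name and the values are hypothetical, and the p-value (from an F distribution with the returned degrees of freedom) is not computed because the Python standard library does not provide it.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if n <= k:
        # One replicate per group leaves no within-group degrees of freedom.
        raise ValueError("at least one group needs replicate samples")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical gene measured at two time values with 3 replicates each.
day1 = [1.0, 1.2, 0.8]
day2 = [3.0, 3.1, 2.9]

f_stat, df_b, df_w = one_way_anova_f([day1, day2])
print(f"F = {f_stat:.1f} with ({df_b}, {df_w}) degrees of freedom")
```

The within-group sum of squares is what requires replicates: it estimates the noise against which the group differences are judged. If all replicates within every group are identical, that term is 0 and the F statistic cannot be computed.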
After “Open group”, use “Tools/Array list file” to arrange the samples in the desired order for display. Do not use any replicate separators; otherwise the replicates will be averaged into one sample. The sample information file supplies the replicate information for ANOVA purposes. Also, do not truncate negative expression values to 0, since this may make the mean of the replicates the same as the individual values and cause a residual of 0 when fitting ANOVA (reported by Cristina Rubio).
The raw expression values are standardized gene-wise (to mean 0 and standard deviation 1 across all samples) before ANOVA is performed. Because this is a linear transformation, the result is the same as with the raw expression values, unless there are different standardize groups. That case arises when there are different batches with possible batch effects: one can separate batches by “Standardize separators” and check “Use standardize separators” for the functions below to see if the results are better. Adjusting away the batch effect by using standardize separators can lead to more genes at the same p-value threshold.
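Both claims can be checked on a toy example. The sketch below assumes that standardize separators cause each batch to be standardized separately, which is an interpretation of the text and not a description of the program's internals; the data and names are made up. It shows that standardizing a gene across all samples leaves the one-way F statistic unchanged, while standardizing within each batch removes a batch offset and raises the F statistic for treatment.

```python
import math

def standardize(values):
    """Scale values to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def one_way_f(values, labels):
    """One-way ANOVA F statistic for values grouped by labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n = len(values)
    grand_mean = sum(values) / n
    ss_between = ss_within = 0.0
    for g in groups.values():
        m = sum(g) / len(g)
        ss_between += len(g) * (m - grand_mean) ** 2
        ss_within += sum((v - m) ** 2 for v in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))

# Hypothetical gene: batch B is shifted upward by 3 units relative to batch A.
raw = [1.0, 1.2, 2.0, 2.2, 4.0, 4.2, 5.0, 5.2]
treat = ["ctrl", "ctrl", "trt", "trt", "ctrl", "ctrl", "trt", "trt"]
batch = ["A", "A", "A", "A", "B", "B", "B", "B"]

f_raw = one_way_f(raw, treat)
f_std = one_way_f(standardize(raw), treat)  # one standardize group

# Standardize each batch separately (assumed effect of standardize separators).
by_batch = list(raw)
for b in sorted(set(batch)):
    idx = [i for i, lab in enumerate(batch) if lab == b]
    for i, z in zip(idx, standardize([raw[i] for i in idx])):
        by_batch[i] = z
f_batch = one_way_f(by_batch, treat)

print(f"raw: {f_raw:.3f}  standardized: {f_std:.3f}  per batch: {f_batch:.1f}")
```

In this example the batch offset inflates the within-group variance, so the treatment effect is only visible after per-batch standardization.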
Select “Analysis/Analysis of Variance”. At “Using factors”, click to select one or two factors. This is to filter genes by time alone (genes that change by time, but similar across treatment), treatment alone (genes that change by treatment, but similar across time) or both (genes that have different time patterns across treatments). Samples with the same conditions of the selected factors are regarded as replicates in ANOVA. When two factors are selected at “Analysis/Analysis of Variance”, replicates are needed for each factor 1 by factor 2 combination to perform two-way ANOVA, and the p-value is for testing interaction.
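The interaction test in a two-way ANOVA can be sketched as follows for a balanced design, where every factor 1 by factor 2 combination has the same number of replicates. This is the textbook formula, not a reproduction of the program's code, and the function name and data are hypothetical. Only the F statistic and its degrees of freedom are computed; the p-value would come from an F distribution.

```python
def interaction_f(cells):
    """F statistic for interaction in a balanced two-way ANOVA.

    cells maps (level_a, level_b) to a list of replicate values.
    Returns (F, df_interaction, df_error).
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    r = len(next(iter(cells.values())))
    if len(cells) != len(a_levels) * len(b_levels):
        raise ValueError("every factor combination needs data")
    if r < 2 or any(len(v) != r for v in cells.values()):
        raise ValueError("need equal replicate counts of at least 2 per cell")

    values = [v for vals in cells.values() for v in vals]
    grand = sum(values) / len(values)
    cell_mean = {k: sum(v) / r for k, v in cells.items()}
    a_mean = {a: sum(cell_mean[(a, b)] for b in b_levels) / len(b_levels)
              for a in a_levels}
    b_mean = {b: sum(cell_mean[(a, b)] for a in a_levels) / len(a_levels)
              for b in b_levels}

    ss_inter = r * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels)
    ss_error = sum((v - cell_mean[k]) ** 2
                   for k, vals in cells.items() for v in vals)
    df_inter = (len(a_levels) - 1) * (len(b_levels) - 1)
    df_error = len(a_levels) * len(b_levels) * (r - 1)
    return (ss_inter / df_inter) / (ss_error / df_error), df_inter, df_error

# Hypothetical gene that responds to treatment only at day2 (interaction).
gene_inter = {
    ("day1", "control"): [1.0, 1.2], ("day1", "treatment"): [1.0, 1.2],
    ("day2", "control"): [1.0, 1.2], ("day2", "treatment"): [3.0, 3.2],
}
# Hypothetical gene whose time and treatment effects simply add up.
gene_additive = {
    ("day1", "control"): [1.0, 1.2], ("day1", "treatment"): [2.0, 2.2],
    ("day2", "control"): [3.0, 3.2], ("day2", "treatment"): [4.0, 4.2],
}

f_inter, df_i, df_e = interaction_f(gene_inter)
f_add, _, _ = interaction_f(gene_additive)
print(f"interaction gene: F = {f_inter:.1f}; additive gene: F = {f_add:.3f}")
```

The additive gene changes with both time and treatment yet has an interaction F near 0, which is why a two-factor selection picks out genes with different time patterns across treatments and not genes that merely change.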
In addition, one can add a new factor (e.g. with the name “Group”) that has a different value for each time/treatment combination. For example, if time has “Young”, “Mid” and “Old” groups and treatment has “Control” and “Treatment” groups, the new factor will have 6 different values, from 1 (Young/Control) and 2 (Young/Treatment) to 6 (Old/Treatment). Then select only this “Group” factor at “Analysis/Analysis of Variance” and perform one-way ANOVA. The filtered genes will differ in some pattern between the 6 groups, but have similar expression levels within each group. These genes could contain the genes obtained by ANOVA filtering by time alone, treatment alone, or both, thus effectively combining one-way and two-way ANOVA involving time and treatment in one filtering. Note that this may not work well if one-way and two-way ANOVA require different p-value thresholds, since in a single filtering the genes from one can dominate the other.
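Building such a combined factor is mechanical. The sketch below numbers the time/treatment combinations in the order given in the example and assigns a “Group” value to each sample; the sample names and the tab-separated layout of the printed table are hypothetical and not the required format of a sample information file.

```python
# Factor levels in the order used by the example in the text.
time_levels = ["Young", "Mid", "Old"]
treat_levels = ["Control", "Treatment"]

# Number the combinations 1..6: Young/Control = 1, ..., Old/Treatment = 6.
combos = [(t, tr) for t in time_levels for tr in treat_levels]
group_id = {combo: i + 1 for i, combo in enumerate(combos)}

# Hypothetical samples as (name, time, treatment).
samples = [
    ("s1", "Young", "Control"), ("s2", "Young", "Control"),
    ("s3", "Young", "Treatment"), ("s4", "Young", "Treatment"),
    ("s5", "Old", "Treatment"), ("s6", "Old", "Treatment"),
]

# Print a table with the extra "Group" column added.
print("Sample\tTime\tTreat\tGroup")
rows = []
for name, t, tr in samples:
    rows.append((name, t, tr, group_id[(t, tr)]))
    print(f"{name}\t{t}\t{tr}\t{group_id[(t, tr)]}")
```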
Genes with a p-value less than the threshold will be exported to the specified file. This gene list may be used at “Analysis/Filter genes/Filter on gene list” to combine ANOVA filtering with other gene filtering criteria.
(Updated 11/18/07)