dChip: Compare samples

 

Compute group means                          Comparison criteria                              t-statistic and its p-value

Compute false discovery rate               Combine comparisons                           Compare result file

Comparison plot

 

Another high-level analysis performed by dChip is the comparison analysis. Given two samples or two groups of samples, we want to identify genes that are reliably differentially expressed between the two groups. dChip provides several filtering criterion to filter out interesting genes. Select menu “Analysis/Compare Samples”:

 

 

Select one or more arrays in the listbox of the Baseline group and the Experiment group. The current “Array list file” specifies the samples used in the comparison and “pooling replicate arrays” will be performed before the comparison if needed. The pooled replicate arrays are indicated by the ending *’s in the sample name.

If a “Sample information file” is specified in “Analysis/Open Group/Other information” dialog, we can use the “Select by category” button to specify samples with particular annotation:

           

 

Check one option among “Overwrite”, “Or”, “And” and “Use inversion” to specify how the specified samples with a particular annotation is combined with the existing selected sample in the “Baseline” and “Experiment” sample list. For example, the “Use inversion” option is for selecting samples not having a particular annotation.

 

Compute group means and their standard errors

 

“E” and “B” in the dialog stand for the group means, which are the mean expression level of samples in the groups. When a group contains multiple samples, the group mean and its standard error (SE) is computed by pooling arrays considering measurement accuracy (Li and Wong 2003, page 5, section 5.2.4), where samples in the same group are regarded as “replicates”. When a group contains only one sample, the group mean is the expression value in this sample, and its SE is the attached standard error (if this sample corresponds to one array, the SE is the model-based standard error). The SE is set to 0 if the expression values are from “Analysis/Get External Data”, are truncated or log-transformed.

 

The standard errors (SE) attached to MBEI values are array-specific measurement errors, and they are obtained using the model by comparing the probe response pattern in a particular array to the pattern seen in most arrays (Li and Wong 2001a). This standard error is usually smaller than the variation between replicate arrays or samples. For example, given two replicate arrays, the distribution of (MBEI1 – MBEI2) / sqrt(SE1^2 + SE^2) (of all probe sets) can have much heavier tail (can reach 40 or larger) than the normal distribution and is more like the t distribution with d.f. 2-3. Thus when comparing two single arrays with lower bound fold change or t-test, the standard errors of MBEI help to prevent unreliable MBEI (with large SE) from giving the erroneous result, but does not substitute the use of true replication arrays or samples to obtain more realistic lower bound fold change or t-test p-values, since the biological or experimental variation can only be estimated through such replicates.

 

Pooling arrays is mainly for down-weighting the unreliable expression values with large standard errors when computing group means. As a result, the calculated means, standard errors of means, and t-statistics (stored in the “compare result file”) may be slightly different than those computed by the traditional t-test software such as Excel. The general effect of considering measurement error is to avoid under-estimation of the standard error of group means when the sample size is small, and thus lead to less significant p-values than the standard analysis. In V1.3+, uncheck “Tools/Options/Analysis/Consider measurement error when average” to compute group means and standard errors without considering measurement error. For example, the group mean is the simple average.

 

One note is that the replicate pooling procedure is carried out by resampling. As a result, when one switch the order of arrays in the baseline or comparison group, the random number generator yields random numbers in a different order for each array and thus the groups means and standard errors will be also slightly different. Another example is that when the baseline and the comparison group are switched at paired t-test, the difference values between paired samples are reversed and eventually this lead to different t-statistics and p-values. In this case, the intersection (e.g. by “Combine comparisons”) of the two gene lists when switching the B and E groups are more reliable.


Comparison criteria

 

Next we can check to select the filtering criterion to be applied when comparing the two groups. The compare criterion (1) requires the fold change between the group means to exceed a specified threshold. We can also check the “Use lower 90% confidence bound” to specify to use the lower confidence bound of fold changes for filtering. The computation of confidence intervals of fold changes and the application of using lower bound as filtering criterion are described in Li and Wong 2001b (page 6, section “Confidence interval for fold change”). Theta_1_hat and Theta_2_hat are the E and B described above (expression values in one samples or the mean expression of samples in the baseline or experiment group). The motivation is that we are really interested in the fold change of real expression values r = Theta_1 / Theta_2, rather than observed fold change Theta_1_hat / Theta_2_hat. 

 

A negative expression value (or group mean) for a probe set is truncated to 1 before calculating fold changes. When one expression is large (say 1000) and the other is -10 (at noise level of absent genes), it is helpful to bring -10 to a small positive number so a large fold change is calculated and the gene gets selected. In the comparison output file, fold changes are always larger than 1 with signs representing the direction of changes, while confidence intervals are for the absolute value of the fold change. We have found it useful to order genes by the lower confidence bound ("Lower CB" columns), which is a conservative estimate of the real underlying fold change. In some cases, one may get (0, infinity) as the confidence interval, indicating very unreliable estimated fold change.

 

The compare criterion (2) specifies the threshold for absolute difference between the two group means. Since the down-regulated genes and the up-regulated genes may have different change magnitude, once can specify different fold change criterion for E/B and B/E and different mean difference criterion for E-B and B-E. If the expression values have been log-transformed, E-B and B-E refer to the log fold change (logE* – logB* = log (E*/B*) when E* and B* are in original scale), and correspondingly the E-B and B-E threshold should also be specified in log scale. The criterion (1) is generally not used since they are more difficult to interpret when the expression value are in log scale.

 

The compare criterion (3) tests whether the mean difference in (2) equals to zero by the unpaired t-test. See t-statistic and its p-value. The default p-value threshold 0.05 filters genes that differ in group means with a two-tailed p-value < 0.05. When the sample size is small, the t-statistics and p-values should better be used only as a ranking or filtering device rather than the actual significance value, especially in the context of applying t-test on thousands of genes. If no multiple comparison adjustment is applied, one can specify very stringent p-value threshold (e.g. 0.05 divided by the total number of genes on the array; this is Bonferroni correction assuming all genes are independent).

 

The compare criterion (4) requires that the percentage of samples called “Present” in the samples used in both groups be larger than a threshold. In V1.3+, this criterion can be specified for the baseline and the experiment group separately. This can be used to obtain genes called as P in 100% samples of the B group and called as A in 100% samples of the E group. For example, if Comparison 1 is “P call of B >= 100% and P call of E >= 0%” and Comparison 2 is “P call of B >= 0% and P call of E >= 1%”, in "Compare samples/Combine comparisons" one can use "And not" to combine the two comparisons: Comparison 1 "and not" Comparison 2.

 

The compare criterion (5) specifies that the p-value threshold for the paired t-test. The “paired t-test” assumes the first baseline sample to be compared with the first experiment sample, and so on. To ease the pairing specification the samples can first be ordered at “Tools/Array list file”. When the baseline and experimental arrays are paired, using paired t-test has larger power than the unpaired t-test (detect more real changes).

 

Combing these statistics can help us focus attention on a small set of interesting genes. Typically one may need to estimate the number of differentially expressed genes beforehand, look at these statistics for known differentially expressed genes, and experiment with different parameters to finally filter a reasonable set of interesting genes.

 

t-statistic and its p-value

 

The t-statistic is computed as (mean1 – mean2) / sqrt(SE(mean1)2 + SE(mean2) 2) (see Compute group means), and its p-value is computed based on the t-distribution, and the degree of freedom is set according to Welch modified two-sample t-test (see S-PLUS t.test function; the same is used in Excel TTEST() function for unequal variance case). It is required to have at least two samples in each group to perform unpaired and paired t-test.

 

When “Tools/Options/Analysis/Consider measurement error when average” is unchecked, group means and standard errors will be computed without considering measurement error, and thus agree with the results from standard analysis such as Excel two-sample t-test. The effect of considering measurement error is to avoid under-estimation of the standard error of group means when the sample size is small, and thus lead to less significant p-values than the standard analysis.

 

[Analysis example] We need to pay attention to the degree of freedom of the t-test. If both groups have one sample (e.g. each is the pooled result of N replicated arrays in the “Array list file”), the degree of freedom for the t-test is 1+1-1 = 1 (in old dChip version, d.f. is N1+N2-1 to accommodate 1 vs. 1 comparison); however if we do not pool replicates but specify N vs. N comparison in this dialog to compare the two groups of arrays, the degree of freedom for t-test is N+N-1 > 1. The former gives less significant p-values for the same t-statistics and results fewer filtered genes. Therefore the N vs N comparison without pooling replicates is preferred to the 1 vs. 1 comparison. In V1.3 5/21/03+, 1 vs 1 t-test is not permitted.

 

Compute false discovery rate

 

Since tens of thousands of genes are compared, many genes can be false positives. However, genes are not all independent and genes in the same pathway could have similar t-statistics or p-values. Methods of multiple-comparison adjusted p-values (Dudoit et al. 2002b), False Discovery Rate (FDR, Tusher et al. 2001; SAM software, Q-VALUE software) and resampling-based multiple test (Pollard and van der Laan 2003) have been proposed to handle the multiple comparison issues in the context of microarray data.

 

In dChip, the empirical false discovery rate can be estimated by permutation. For more rigorous FDR computation, use the SAM or Q-VALUE software. Check “Analysis/Compare samples/Combine comparison/Permute samples to assess False Discovery Rate” to assess how many genes would be obtained by the same comparison criteria when permuting group labels randomly. Usually it is good to try different comparison criteria to finally obtain a list of genes with median FDR < 5% or 10%. Such approach has been pioneered by Golub et al. 1999 and Tusher et al. 2001. FDR can be applied only when they are more than 1 sample in each group. Also it should be applied when comparing on all genes, instead of only on the selected gene list such as an ANOVA filtered gene list.

 

FDR in dChip is computed by applying the original comparison criteria (e.g. fold change, p-value) to the sample-wise permuted datasets and recording the number of obtained genes at each such permutation. After a number of permutations (say 100), we get 100 values of the number of genes in the obtained gene lists. The median of these 100 values is reported as the median FDR, and the 90-th percentile (90th largest value) of these 100 values is reported as 90% FDR, a more conservative FDR estimate. When the number of permutation is small (e.g. 100), the FDR reported in different runs could be more variable.

 

Combine comparisons

 

After specifying these criterion a user can either click “OK” to proceed with comparison or go to “Analysis/Compare Samples/Combine comparisons” tab to specify other advanced options. For example, one compares sample group A and B and get a gene list 1, and compare sample group C and D and get a gene list 2. One may be interested in what genes are common in both lists, or what genes are in list 1 but not list 2. “Combine comparisons ” is for this purpose.

 

 

The comparison parameters in “Compare Samples” tab are automatically added in the grid sheet. We can add multiple comparisons and combine them using logical relations. To do this, click the last line of the grid sheet (the line with a single right parenthesis ‘)’) to highlight this line, then click “Compare Samples” tab on the top to specify other comparison samples and filtering criterion, then click “Combine comparisons” tab on the top to come back to this dialog, select “Combine Type” to be “AND” or “OR”, then click “Insert comparison” button to insert the new comparison. Parentheses can also be inserted after the highlighted line in the grid sheet using the “Insert parenthesis” button, to specify two-level logic combination of comparisons.

 

In addition, “not” relation can be specified by the “And not” or “Or not” combine type (“Insert complement” button for Version 1.0). This is useful, for example, when one wants a list of genes that are up-regulated by growth factor A when comparing to the control but not the genes that are also up-regulated by growth factor B.

 

To change a comparison condition, highlight the corresponding line and click “Compare samples” tab to change comparing and filtering parameters, then click “Combine comparisons” tab to come back and the highlighted line will be changed; to insert a new comparison at a certain row, first click to highlight the last line (the line with single right parenthesis ‘)‘), then click “Compare samples” tab to specify comparing and filtering parameters, then click “Combine comparisons” tab and click the grid sheet to highlight a line after which the new comparison is to be inserted, and click “Insert comparison” button. The “Delete entry” button will delete a comparison, and the logical relations associated with parentheses may be used in the deleted row.

 

The comparison can be restricted to an existing gene list or its complement set, if a “Gene list file” (a tab-delimited text file with the first column of each row being the probe set name) is specified in the “Compare on gene list” button. Click the button multiple times to switch between “using all genes”, “using gene list” and “excluding gene list”.

 

Compare result file

 

Specify an output “compare result file”. This file contains genes that pass the filtering criterion (marked by ‘*’ in the “Filtered” columns) and the comparison statistics for these genes. A previously saved “compare criterion file” can be loaded by specifying the file and clicking the “Use” button (“Use the criteria in this file” in V1.3+). One may also manually edit the "compare criteria file" (save in text format) and then click the "Use" button to use the specified comparison. In V1.3+, the compare criterion are saved in the beginning of the “Compare result file” if  “Output comparison criteria” is checked, instead of a separate “Compare criterion file”.

 

Sometimes we may wonder why some known differentially expressed genes are not selected by the comparison. In such case it is helpful to check the button “Output all genes” to output the statistics of all genes, and then check the statistics for known genes to suggest refined filtering criterion.

 

To output the comparison statistics not used in the filtering, check the corresponding compare criterion in the “Compare samples” dialog but specify non-informative thresholds. For example, set the p-value to 1 for criterion (3) or (5), or E/B value to be 0 (since the lower confidence bound of a > 1 fold change can be less than 1) for criterion (1), so that these statistics are calculated and exported but do not influence the comparison. Check the option “Tools/Options/Analysis/Treat outlier expression as missing values” to treat “array-outlier” expression values as missing data in the comparison analysis and compare result file (output as blanks).

 

After clicking “OK” a “Compare criterion file” will be saved. Its file name is stored in the Windows clipboard automatically, so one can use Control+V to paste the file name into other dialogs. The “compare result file” can be used as the “Gene list file” in “Analysis/Hierarchical clustering” dialog to visualize their expression changes and identify functional groups enriched in the comparison gene lists. It is also useful to specify this result file for classifing genes by annotation terms.

 

[New] [Analysis example] For a list of genes, we want to view their expression values in clustering figure and annotate genes with fold changes, like this example figure (data courtesy of Jooeun Bae). Use “Compare samples” to compare this list of genes and output all genes, and then copy the probe set name, gene names and fold change columns into a new Excel file and sort by gene names or other criteria. Then copy the probe set names into a text file and save as a “gene list file”. Finally perform hierarchical clustering using this gene list but uncheck “Cluster genes”. Use “View/Export image” to copy and paste EMF file into the Excel file, and stretch the figure to align with the rows.

 

Comparison plot

 

[V2/26/07++] At "Analysis/Compare samples/Combine comparisons", check "Make M-A plot" to plot log ratio against log product of the two groups' mean expression level.  The genes selected by "Compare samples" will be highlighted in red circles. This plot is useful to check the effect of using different comparison criteria. Genes selected by other methods may also be highlighted, by unchecking all comparison criteria and specifying a gene list file at "Combine comparisons/Compare on gene list".