dChip: Array list file

 


 

An “Array list file” may be created to specify which arrays should be used, and in what order, in hierarchical clustering or other analyses. To create one, select “Tools/Array list file”.

Hold the “Control” or “Shift” key and click with the left mouse button to select arrays in the “All arrays” list, then click the “Add array” button to add them to the “Arrays to be used” list. To insert arrays at a specific position, first click an array name in the “Arrays to be used” list; new arrays are added immediately after it. To remove arrays, select them in the “Arrays to be used” list and click the “Delete” button. Then click the “Save & Close” button to save the “Array list file” under the file name shown after the “Open” button. The file is a text file whose content resembles the “Arrays to be used” list, so one can also add all arrays first, save, and then edit the file manually to create a complex “Array list file”.

 

One can also use the “Open” button to specify an existing “Array list file”. If no “Array list file” is needed and all arrays should be used in their natural order, clear the file name after the “Open” button and click “Save & Close” (in V1.3+, click the “Do not use” button).

 

After the file is saved, the “Array list file” is used automatically. “Analysis/Filter Genes” will then restrict gene filtering and hierarchical clustering to the samples specified in the “Array list file”. If “Cluster samples” is not checked, samples are displayed in the clustering picture in the order given in the “Array list file”.

Handling batch effect by standardize separators

Sometimes we may observe that the arrays from two array batches or obtained at different array cores are not directly comparable, even after the array brightness is adjusted by normalization. That is, the batch (or core/operator/machine) variation becomes as large as or larger than the biological (treatments, tissue type) variation, such that samples are clustered by batches instead of by biological factors. At other times, we may be mainly interested in the differences between different treatments applied to several cell lines, and the gene expression differences between cell lines are confounding and need to be eliminated.

To eliminate the confounding batch/cell line effects, we can insert “Standardize separators” in the “Array list file” to standardize the samples of each batch/cell line separately before performing “Filter genes” and clustering:

Treatment A on cell line 1
Treatment B on cell line 1
Treatment C on cell line 1
Treatment D on cell line 1
---Standardize separator---
Treatment A on cell line 2
Treatment B on cell line 2
Treatment C on cell line 2
Treatment D on cell line 2

That is, instead of standardizing a gene to have mean 0 and standard deviation 1 across all samples, we standardize its expression values in the same cell line to have mean 0 and standard deviation 1. Samples separated by “Standardize separators” will be standardized within themselves at the “Filter genes” and “Hierarchical clustering” steps. Usually filtering genes should be re-performed after inserting or deleting “Standardize separators”.
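
The within-group standardization described above can be illustrated outside dChip with a short Python sketch. It is not dChip's code: the function names, the use of the sample standard deviation, and the zero-variance handling are assumptions made for illustration, and each group needs at least two samples.

```python
from statistics import mean, stdev

def standardize(values):
    """Scale values to mean 0 and (sample) standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD; dChip's exact estimator is not stated
    if s == 0:
        return [0.0 for _ in values]  # assumption: a constant group maps to zeros
    return [(v - m) / s for v in values]

def standardize_within_groups(values, group_sizes):
    """Standardize one gene's values separately inside each consecutive group."""
    out = []
    start = 0
    for size in group_sizes:
        out.extend(standardize(values[start:start + size]))
        start += size
    return out

# One gene measured under treatments A-D on two cell lines (made-up numbers).
# Cell line 2 has a much higher baseline but the same treatment pattern.
gene = [10.0, 12.0, 14.0, 16.0, 110.0, 112.0, 114.0, 116.0]
within = standardize_within_groups(gene, [4, 4])
```

After within-group standardization the two cell lines show the same profile, so the baseline difference between cell lines no longer dominates the clustering distance.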

For batch-effect adjustment to work well, the array list file should contain more than 10 samples in each standardize group, and the groups should have a similar composition of sample types (e.g. it is not good for one group to contain only normal samples and another only tumor samples).

The “Standardize separator” has higher precedence than the “Replicate separator”. Arrays after a “Standardize separator” will be considered single replicates unless “Replicate separators” are inserted again. Also, if “Use standardize separator” is unchecked (or “Only draw lines for standardize separator” is checked in older versions) and samples are not clustered, “Standardize separators” only add vertical lines between groups of samples in the clustering picture and do not perform within-group standardization.

 

When an “Array list file” with “Standardize separators” is in use and “Use standardize separator” is checked (or “Only draw lines for standardize separator” is unchecked in older versions), the “Standard deviation / Mean” statistic in criterion (1) of “Analysis/Filter genes” is the average of the “Standard deviation / Mean” values of the sample groups separated by “Standardize separators”. Under the same setting, “Standardize separators” also affect the “Hierarchical clustering” results.
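
As a rough illustration of the averaged statistic, the following Python sketch computes “Standard deviation / Mean” per group and averages the results. The function names and the use of the sample standard deviation are assumptions, not dChip internals, and expression values are assumed to be positive so the mean is non-zero.

```python
from statistics import mean, stdev

def sd_over_mean(values):
    """Standard deviation / Mean for one set of expression values."""
    return stdev(values) / mean(values)  # assumes a non-zero mean

def average_group_ratio(values, group_sizes):
    """Average the per-group SD/Mean over consecutive standardize groups."""
    ratios = []
    start = 0
    for size in group_sizes:
        ratios.append(sd_over_mean(values[start:start + size]))
        start += size
    return mean(ratios)

# Made-up values for one gene in two batches with different baselines.
gene = [10.0, 12.0, 14.0, 16.0, 110.0, 112.0, 114.0, 116.0]
avg_ratio = average_group_ratio(gene, [4, 4])
pooled_ratio = sd_over_mean(gene)
```

In this example the pooled ratio is inflated by the batch difference, while the averaged per-group ratio reflects only the variation within each batch.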

[Analysis example] “Standardize separators” do not affect the “Compare samples” result, which uses the original expression values. However, after “Compare samples” is performed within each batch individually (optionally with “Combine comparisons”), the resulting gene list can be visualized by hierarchical clustering using an “Array list file” with proper “Standardize separators”. Alternatively, one may pool the samples of all batches and compare them using batch-wise standardized values. To do this, first cluster all genes in dChip: specify the gene info file as the gene list file, uncheck “Tools/Options/Clustering/Pre-calculate distance”, and use “Standardize separators” in the array list file to separate batches (check “Use standardize separator”, or uncheck “Only draw lines…” in older versions). Then select the whole gene branch and export the batch-wise standardized values (Clustering/Selected branch/Export data). Read this file back with “Analysis/Get external data” and proceed to compare samples. However, since the magnitude of the original expression values is lost, fold change is not meaningful, and the mean difference threshold should be adjusted to suit the standardized values, which normally fall within [-3, 3].

Handling replicate arrays

 

Note: Generally there is no need to insert “Replicate separators”; samples in different comparison groups can be specified directly in “Compare samples” or “Analysis of variance”.

 

Microarray experiments often include replicate arrays for some or all experimental conditions. If there are multiple levels of replicates (e.g. different individuals in the experiment group, two samples taken from each individual, and two IVT replicates for each sample), the arrays at the lowest replicate level can be combined first and then passed to “Filter genes” or “Compare samples”. “Replicate separators” can be inserted into the “Array list file” for this purpose:

sample1 replicate 1
sample1 replicate 2
---Replicate---
sample2 without replicate
---Replicate---
sample3 replicate 1
sample3 replicate 2
sample3 replicate 3

Expression values for the same gene in replicate arrays grouped by “Replicate separators” will be pooled, taking measurement error into account (Li and Wong 2003, page 5, section 5.2.4). Absolute calls for the pooled sample are determined by a “majority-vote” scheme: if (number of “P” calls) / (number of replicates) >= 0.5, the probe set is called “P” (otherwise “A”).
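
The majority-vote rule can be written as a small Python function. This is only a sketch of the rule as stated above; the function name is hypothetical, and any call other than “P” is simply counted as not present.

```python
def pooled_call(calls):
    """Return the pooled absolute call for one probe set across replicates.

    calls: list of per-array calls such as ["P", "A", "P"].
    """
    if not calls:
        raise ValueError("at least one replicate call is required")
    present = sum(1 for c in calls if c == "P")
    # "P" when at least half of the replicates are called "P", otherwise "A".
    return "P" if present / len(calls) >= 0.5 else "A"

duplicate_call = pooled_call(["P", "A"])         # 1/2 = 0.5, meets the threshold
triplicate_call = pooled_call(["P", "A", "A"])   # 1/3 is below 0.5
```

Note that with the >= 0.5 threshold a tie in an even number of replicates resolves to “P”.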

 

Click an array name in the “Arrays to be used” listbox to add a “Replicate separator” after it. If no “Replicate separator” occurs anywhere in an “Array list file”, every array is treated as an unreplicated sample. There is also a button that automatically adds “Replicate separators” between groups of array names starting with identical strings.
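
For readers who edit the file by hand, the grouping rule can be sketched in Python as follows. The separator text is taken from the example listing above; the function name is hypothetical, and the sketch assumes the file contains only array names and replicate separators (it ignores “Standardize separators” and any other content a real file may hold).

```python
REPLICATE_SEPARATOR = "---Replicate---"  # as written in the example listing

def replicate_groups(lines):
    """Group array names into replicate groups from array-list-file lines."""
    names = [ln.strip() for ln in lines if ln.strip()]
    # No separator anywhere: every array is its own unreplicated sample.
    if REPLICATE_SEPARATOR not in names:
        return [[n] for n in names]
    groups, current = [], []
    for n in names:
        if n == REPLICATE_SEPARATOR:
            if current:
                groups.append(current)
            current = []
        else:
            current.append(n)
    if current:
        groups.append(current)
    return groups

example = [
    "sample1 replicate 1",
    "sample1 replicate 2",
    "---Replicate---",
    "sample2 without replicate",
    "---Replicate---",
    "sample3 replicate 1",
    "sample3 replicate 2",
    "sample3 replicate 3",
]
groups = replicate_groups(example)
```

Applied to the example listing, this yields three groups of two, one, and three arrays.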

 

Replicate arrays can be used to filter genes in the “Analysis/Filter genes” tab. Criterion 3 there requires a gene’s expression level to be consistent within replicate arrays, as measured by the median “Standard deviation/mean” ratio over all replicate groups in which the gene is called “Present”. This filtering excludes genes that are inherently variable during the sample amplification and hybridization stages, as manifested in inconsistent expression values between replicate arrays. A similar but more involved filtering scheme can be found in Perou et al 2000.
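
A minimal Python sketch of this consistency statistic is given below. It rests on several assumptions that the text does not settle: a group counts as “Present” by the majority-vote rule, groups with a single array are skipped because they have no standard deviation, and the sample standard deviation is used. The function name is hypothetical.

```python
from statistics import mean, median, stdev

def replicate_consistency(groups):
    """Median SD/mean over replicate groups where the gene is called Present.

    groups: list of (values, calls) pairs, one pair per replicate group.
    Returns None when no group qualifies.
    """
    ratios = []
    for values, calls in groups:
        if len(values) < 2:
            continue  # assumption: single arrays carry no replicate variation
        present = sum(c == "P" for c in calls) / len(calls) >= 0.5
        if not present:
            continue  # assumption: Present is decided by majority vote
        ratios.append(stdev(values) / mean(values))
    return median(ratios) if ratios else None

# Made-up values for one gene in four replicate groups.
example_groups = [
    ([100.0, 100.0], ["P", "P"]),   # perfectly consistent
    ([100.0, 120.0], ["P", "P"]),   # moderately variable
    ([50.0, 500.0], ["A", "A"]),    # not Present, ignored
    ([80.0], ["P"]),                # no replicate, ignored
]
example_ratio = replicate_consistency(example_groups)
```

A gene would pass the filter when this ratio falls below the threshold chosen in the dialog; the threshold itself is set by the user and is not part of the sketch.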

 

In the “Analysis/Model-based Expression/Export” dialog, expression values of replicate arrays are pooled before export. In the “Select arrays to be exported” listbox, each sample is shown under the name of its first array, followed by “*” to indicate how many additional replicate arrays are pooled for this sample. To use no “Array list file” and treat each array as one sample, clear the file name after the “Specify file name” button and then click “Save & close”.

[Analysis example] Suppose each time point has biological duplicates (clone 1 and clone 2) and technical duplicates (two arrays per clone). We may use replicate separators to pool the technical duplicates and then specify clone 1 and clone 2 individually in “Compare samples”. This way the two levels of replication variance are both taken into account. Alternatively, the four arrays can all be regarded as replicates at one level and specified individually in “Compare samples”. If the clone variation is larger than the technical variation, this gives results similar to the first option but with more degrees of freedom in the t-test. The second option is used in Tusher et al. 2001.

(Updated 9/20/05)