dChip: Array list file
An “Array list file” may be created to specify which
arrays should be used and in what order in the hierarchical clustering or other
analysis. To do this, select “Tools/Array list file”:

Use the “Control” and “Shift” keys with the left mouse button to
select arrays from the “All arrays” list, and click the “Add array” button to
add them to the “Arrays to be used” list. Click an array name in the “Arrays
to be used” list to add arrays immediately after that array. Select arrays in
the “Arrays to be used” list and click the “Delete” button to remove them. Then
click the “Save & Close” button to save the “Array list file” under the
file name specified after the “Open” button. This is a text file whose content
resembles what we see in the “Arrays to be used” list, so one can also add
all arrays first, save, and then manually edit the file to create a complex
“Array list file”.
One can also use the “Open” button to specify an
existing “Array list file”. If no “Array list file” is necessary and we want to
use all arrays in their natural order, clear the file name after the “Open”
button and click “Save & Close” (in V1.3+ click the “Do not use” button).
After saving the file the “Array list file” is
automatically used. Then “Analysis/Filter Genes” will restrict gene filtering
procedures and hierarchical clustering to only samples specified in the “array
list file”. If “Cluster samples” is not checked, samples will be displayed by
their order in the “Array list file” in the clustering picture.
Sometimes we may
observe that the arrays from two array batches or obtained at different array
cores are not directly comparable, even after the array brightness is adjusted
by normalization. That is, the batch (or core/operator/machine) variation
becomes as large as or larger than the biological (treatments, tissue type)
variation, such that samples are clustered by batches instead of by biological
factors. At other times, we may be mainly interested in the differences between
different treatments applied to several cell lines, and the gene expression
differences between cell lines are confounding and need to be eliminated.
To eliminate the
confounding batch/cell line effects, we can insert “Standardize separators” in the
“array list file” to standardize the samples in the same batch/cell line
separately before performing “Filter genes” and clustering:
Treatment A on cell line 1
Treatment B on cell line 1
Treatment C on cell line 1
Treatment D on cell line 1
---Standardize separator---
Treatment A on cell line 2
Treatment B on cell line 2
Treatment C on cell line 2
Treatment D on cell line 2
That is, instead
of standardizing a gene to have mean 0 and standard deviation 1 across all
samples, we standardize its expression values in the same cell line to have
mean 0 and standard deviation 1. Samples separated by “Standardize separators”
will be standardized within themselves at the “Filter genes” and “Hierarchical
clustering” steps. Usually filtering genes should be re-performed after
inserting or deleting “Standardize separators”.
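The within-group standardization described above can be sketched in Python as follows. This is only an illustration, not dChip's own code: the function names `split_groups` and `standardize_within_groups` are hypothetical, the separator text is copied from the example list above, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since this page does not say which form dChip uses.

```python
from statistics import mean, stdev

# Separator text as it appears in the example array list above.
SEPARATOR = "---Standardize separator---"

def split_groups(lines, separator=SEPARATOR):
    """Split array-list lines into groups of array names at each separator."""
    groups, current = [], []
    for line in lines:
        name = line.strip()
        if not name:
            continue  # ignore blank lines
        if name == separator:
            if current:
                groups.append(current)
            current = []
        else:
            current.append(name)
    if current:
        groups.append(current)
    return groups

def standardize_within_groups(values, groups):
    """Standardize one gene's values to mean 0, SD 1 inside each group.

    values: dict mapping array name -> expression value of one gene.
    Assumption: sample SD (n - 1); a group with zero spread maps to 0.
    """
    result = {}
    for group in groups:
        xs = [values[name] for name in group]
        m = mean(xs)
        s = stdev(xs) if len(xs) > 1 else 0.0
        for name in group:
            result[name] = (values[name] - m) / s if s > 0 else 0.0
    return result

array_list = [
    "Treatment A on cell line 1",
    "Treatment B on cell line 1",
    SEPARATOR,
    "Treatment A on cell line 2",
    "Treatment B on cell line 2",
]
groups = split_groups(array_list)
# Cell line 2 is ten times brighter, but the treatment pattern is the same.
gene = dict(zip(groups[0] + groups[1], [1.0, 3.0, 10.0, 30.0]))
z = standardize_within_groups(gene, groups)
```

In this toy input the two cell lines differ tenfold in overall level, yet after within-group standardization “Treatment A” receives the same value in both cell lines, which is the effect the separators are meant to achieve.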
For the batch-effect
adjustment to work well, the array list file should have more than 10 samples in each
standardize group, and the groups should have similar compositions of sample types (e.g.
it is not good to have one group consisting entirely of normal samples and another
entirely of tumor samples).
The “Standardize separator” has higher precedence
than the “Replicate separator”. Arrays after a “Standardize separator” will be
considered single replicates unless “Replicate separators” are inserted
again. Also, if “Use standardize separator” is unchecked (or “Only draw lines
for standardize separator” is checked in older versions) and samples are not
clustered, “Standardize separators” just add vertical lines between groups of samples
in the clustering picture but do not perform within-group standardization.
After an “Array list file” with “Standardize
separators” is in use (that is, “Use standardize separator” is checked, or “Only draw
lines for standardize separator” is unchecked in older versions), the
“Standard deviation / Mean” statistic in criterion (1) of “Analysis/Filter genes”
is the average “Standard deviation / Mean” of the
sample groups separated by “Standardize separators”. Note that
“Standardize separators” affect the “Filter genes” and “Hierarchical
clustering” results when this option is in effect.
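As a rough illustration of the group-averaged statistic described above, the sketch below computes “Standard deviation / Mean” inside each standardize group and then averages the ratios. The function name `average_sd_over_mean` is hypothetical, and two details are assumptions not stated on this page: the sample standard deviation is used, and groups where the ratio is undefined (fewer than two values, or mean 0) are skipped.

```python
from statistics import mean, stdev

def average_sd_over_mean(groups):
    """Average the SD/mean ratio of one gene over standardize groups.

    groups: list of lists, each holding the gene's expression values
    for the arrays in one standardize group.
    """
    ratios = []
    for xs in groups:
        if len(xs) < 2:
            continue  # assumption: a single array gives no SD
        m = mean(xs)
        if m == 0:
            continue  # assumption: skip groups with an undefined ratio
        ratios.append(stdev(xs) / m)
    return mean(ratios) if ratios else float("nan")

# Two batches with a tenfold level difference but the same relative spread.
batch1 = [100.0, 200.0]
batch2 = [1000.0, 2000.0]
within = average_sd_over_mean([batch1, batch2])
pooled = stdev(batch1 + batch2) / mean(batch1 + batch2)
```

Here `pooled` is inflated by the batch difference, while `within` reflects only the variation inside each batch, which is why filtering results change once separators are in use.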
[Analysis example] “Standardize separators” do not affect the “Compare samples” result,
where the original expression values are used. However, after “Compare samples”
is performed within each batch individually (one may also use “Combine
comparisons”), the comparison gene list can be visualized by hierarchical
clustering using an “Array list file” with proper “Standardize separators”.
Alternatively, one may use batch-wise standardized values to compare samples
pooled from all batches. To do this, first cluster all genes in dChip
(specify the gene info file as the gene list file, uncheck “Tools/Options/Clustering/Pre-calculate
distance”, and use “Standardize separators” in the array list file to separate
batches (check “Use standardize separator” (or uncheck “Only draw lines…”))),
select the whole gene branch, and export the batch-wise standardized values
(Clustering/Selected branch/Export data). Then read this file back with
“Analysis/Get external data” and proceed to compare samples. However, since the
magnitude of the original expression values is lost, fold change is not meaningful,
and the mean difference threshold should be adjusted to be comparable to the
standardized values, which normally lie within [-3, 3].
Handling replicate arrays
Note: Generally there is no need to insert
“Replicate separators”; samples in different comparison groups can be specified
directly in “Compare samples” or “Analysis of variance”.
Microarray experiments often include replicate
arrays for some or all experimental conditions. If there are multiple levels of
replicates (e.g. different individuals in the experiment group, each individual
has two samples taken, and each sample has two IVT replicates), the arrays at
the lowest replicate level can first be combined before entering “Filter genes”
or “Compare samples”. “Replicate separators” can be inserted into the “array list
file” to achieve this:
sample1 replicate 1
sample1 replicate 2
---Replicate---
sample2 without replicate
---Replicate---
sample3 replicate 1
sample3 replicate 2
sample3 replicate 3
Expression values for the same gene in replicate
arrays separated by “Replicate separators” will be pooled considering
measurement error (Li and Wong 2003, page 5,
section 5.2.4). Absolute calls for the pooled sample will be determined by the
“majority-vote” scheme: if the number of “P” / the number of replicates >=
0.5, the probe set is called "P" (otherwise "A").
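The “majority-vote” rule for the pooled absolute call is simple enough to sketch directly. The function name `pooled_call` is hypothetical, and any call other than “P” is counted as not present, following the rule as worded above.

```python
def pooled_call(calls):
    """Return the pooled absolute call for one probe set.

    calls: the per-array calls of the replicate arrays, e.g. ["P", "A"].
    The probe set is called "P" when (number of "P") / (number of
    replicates) >= 0.5, otherwise "A".
    """
    if not calls:
        raise ValueError("at least one replicate call is required")
    present = sum(1 for c in calls if c == "P")
    return "P" if present / len(calls) >= 0.5 else "A"

duplicate = pooled_call(["P", "A"])          # exactly half are "P"
triplicate = pooled_call(["P", "A", "A"])    # one third are "P"
```

Note that with an even number of replicates a tie counts as “P”, because the threshold is “>= 0.5”.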
Click an array name in the “Arrays to be used” listbox
to add a “Replicate separator” after it. If no “Replicate separator” occurs
in an “array list file”, every array is treated as a sample without replicates. There is
also a button to automatically add “Replicate separators” between groups of array
names that start with identical strings.
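The grouping implied by “Replicate separators” can be sketched as a small parser. This is an illustration under assumptions: the function name `replicate_groups` is hypothetical, the separator text is taken from the example list above, one entry per line is assumed, and “Standardize separators” are not handled.

```python
# Separator text as it appears in the example array list above.
REPLICATE_SEPARATOR = "---Replicate---"

def replicate_groups(lines, separator=REPLICATE_SEPARATOR):
    """Group array names into samples according to replicate separators.

    If the separator never occurs, every array is its own sample.
    """
    names = [line.strip() for line in lines if line.strip()]
    if separator not in names:
        return [[name] for name in names]
    groups, current = [], []
    for name in names:
        if name == separator:
            if current:
                groups.append(current)
            current = []
        else:
            current.append(name)
    if current:
        groups.append(current)
    return groups

example = [
    "sample1 replicate 1",
    "sample1 replicate 2",
    "---Replicate---",
    "sample2 without replicate",
    "---Replicate---",
    "sample3 replicate 1",
    "sample3 replicate 2",
    "sample3 replicate 3",
]
samples = replicate_groups(example)
```

For the example list this yields three samples with 2, 1, and 3 arrays respectively.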
Replicate arrays can be used to filter genes in the
“Analysis/Filter genes” tab. Criterion 3 there requires a gene’s expression
level to be consistent within replicate arrays, measured by the median “Standard
deviation/mean” ratio over all replicate groups where this gene is called
“Present”. This filtering excludes genes that are inherently variable during the
sample amplification and hybridization stages, as manifested in inconsistent
expression values between replicate arrays. A similar but more involved
filtering scheme can be found in Perou et al. 2000.
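The replicate-consistency statistic described above can be sketched as a median of per-group ratios. The function name `replicate_consistency` and the input layout are hypothetical, and some details are assumptions: the sample standard deviation is used, the “Present” status of each replicate group is supplied by the caller, and groups with a single array or mean 0 are skipped.

```python
from statistics import mean, median, stdev

def replicate_consistency(groups):
    """Median SD/mean ratio of one gene over its replicate groups.

    groups: list of (values, present) pairs, where values holds the
    gene's expression in the arrays of one replicate group and present
    says whether the gene is called "Present" in that group.
    Returns None when no group qualifies.
    """
    ratios = []
    for values, present in groups:
        if not present or len(values) < 2:
            continue  # assumption: skip absent or unreplicated groups
        m = mean(values)
        if m == 0:
            continue  # assumption: skip groups with an undefined ratio
        ratios.append(stdev(values) / m)
    return median(ratios) if ratios else None

gene_groups = [
    ([100.0, 100.0], True),   # perfectly consistent replicates
    ([100.0, 200.0], True),
    ([50.0, 150.0], False),   # not called "Present", so ignored
    ([100.0, 300.0], True),
]
ratio = replicate_consistency(gene_groups)
```

A gene would then pass the filter when `ratio` is below whatever threshold is entered in the dialog; the threshold itself is not specified here.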
In the “Analysis/Model-based Expression/Export”
dialog, expression values for replicate arrays will be pooled before exporting.
In the “Select arrays to be exported” listbox, each sample name is the name of its
first array, followed by “*” marks indicating how many additional replicate arrays are
pooled for this sample. To not use any “array list file” and treat each array
as one sample, one can clear the file name after the “Specify file name” button
and then click “Save & close”.
[Analysis example] One has biological duplicates (clone 1 and clone 2) and technical
duplicates (a duplicate in each clone) for each time point. We may use replicate
separators to pool the technical duplicates, and then specify clone 1 and clone
2 individually in “Compare samples”. This way we account for the two levels of
replication variance. Alternatively, the four arrays can all be regarded as
replicates at one level and specified individually in “Compare samples”. If
the clone variation is larger than the technical variation, this will give a similar
result to the first option, but with more degrees of freedom in the t-test.
The second option is used in Tusher et al. 2001.
(Updated 9/20/05)