dChip: Reading external data files
If no DAT/CEL files are available for Affymetrix array data, dChip can still
read a tab-delimited text file with expression value, absolute call and
standard error (SE) data as columns:
| probe set | 130a | 130a call | 130a SE | 130b | 130b call | 130b SE |
|---|---|---|---|---|---|---|
| AFFX-BioB-5_at | 3348.89 | P | 281.398 | 3825.92 | P | 225.898 |
| AFFX-BioB-M_at | 3478.22 | P | 400.583 | 6778.75 | P | 273.612 |
| AFFX-BioB-3_at | 2322.84 | P | 180.836 | 3437.77 | P | 158.029 |
| AFFX-BioC-5_at | 7837.85 | P | 628.778 | 7590.25 | P | 402.236 |
| AFFX-BioC-3_at | 5887.03 | P | 501.962 | 6473.87 | P | 316.34 |
| AFFX-BioDn-5_at | 4416.52 | P | 711.782 | 8313.93 | P | 556.247 |
| AFFX-BioDn-3_at | 16049.3 | P | 1870.28 | 18681.5 | P | 1048.73 |
| AFFX-CreX-5_at | 24904.8 | P | 1728.4 | 29241.8 | P | 1095.15 |
The absolute call and SE columns are optional, and their presence can be
specified in the “Analysis/Get External Data” dialog.

The first row of the data file should contain array names, and the first
column should contain gene names. Data files exported by dChip may have
additional columns of gene annotations starting from the 2nd column
(such as “gene, Accession, LocusLink, Description”). To read such files with
“Analysis/Get External Data”, delete these columns in Excel and save the
file as tab-delimited text. Alternatively, specify "Skip
column 2 to x" to ignore columns 2 to x.
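For files too large to edit comfortably in Excel, the annotation columns can also be removed with a short script before loading. The following is a minimal sketch, not part of dChip: the function name `drop_columns` and the annotation values in the sample text are hypothetical, and the input is assumed to be plain tab-delimited text.

```python
import csv
import io


def drop_columns(rows, first, last):
    """Drop 1-based columns first..last (inclusive) from every row."""
    return [row[:first - 1] + row[last:] for row in rows]


# Hypothetical export with two annotation columns (columns 2 to 3).
exported = (
    "probe set\tgene\tAccession\t130a\t130b\n"
    "AFFX-BioB-5_at\tgeneA\tACC1\t3348.89\t3825.92\n"
)

rows = list(csv.reader(io.StringIO(exported), delimiter="\t"))
cleaned = drop_columns(rows, first=2, last=3)

# Write the cleaned table back out as tab-delimited text.
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(cleaned)
print(out.getvalue(), end="")
```

This has the same effect on the file as the "Skip column 2 to x" option has at read time, so only one of the two approaches is needed.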
If there are missing values in the external data file, leave them blank in
Excel and then save the file as tab-delimited text, so that “Get external
data” will regard the blank cells as missing values. If the last column may
contain missing values, add a pseudo last column with all values set to “1”
so that the data is read correctly. Afterwards, use “Array list file” to
specify only the real samples that will be used in the analysis.
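The missing-value convention described above can be reproduced outside Excel. The sketch below is an illustration under stated assumptions: `None` stands for a missing value, the column name `pseudo` is arbitrary, and both helper functions are hypothetical names rather than anything provided by dChip.

```python
def add_pseudo_column(rows, name="pseudo"):
    """Append a constant '1' column so a blank final cell is not lost."""
    header, *data = rows
    return [header + [name]] + [row + ["1"] for row in data]


def to_tab_delimited(rows):
    """Serialize rows as tab-delimited text; None becomes an empty cell."""
    lines = (
        "\t".join("" if cell is None else str(cell) for cell in row)
        for row in rows
    )
    return "\n".join(lines) + "\n"


rows = [
    ["probe set", "130a", "130b"],
    ["AFFX-BioB-5_at", 3348.89, None],  # missing value in the last column
    ["AFFX-BioB-M_at", 3478.22, 6778.75],
]

text = to_tab_delimited(add_pseudo_column(rows))
print(text, end="")
```

The pseudo column is only a reading aid; it should be left out of the array list file so that it does not enter the analysis as a sample.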
The “Get External Data/Other information” tab prompts the user to read in a
“gene information file” or a “sample information file”; these are the same
files as those used for data read in by “Analysis/Open group”. If the sample
(column) names in the external data file are already meaningful, the “array
name” and “sample name” columns of the sample information file can both be
identical to the sample names in the first line of the external data file.
Click “OK” to read in the data file. If successful, the “Modeled” indicator
will appear in the lower-right corner to indicate that the expression data is
available for high-level analysis. The “Normalized” indicator will not be
shown; if the data has already been normalized, one can proceed directly to
high-level analysis.
If the data has not been normalized beforehand, one can use
“Analysis/Normalize” to normalize the expression values using the Invariant
Set Normalization (ISN) method (Version 1.0 uses a simplified ISN method with
a fixed rank difference threshold of 50 and no iteration). The standard error
attached to an expression value is scaled by the ratio of the expression
values before and after normalization. Check the “Show scatter-plot…” option
to show normalization scatter-plots when normalizing (an installation of R is
needed).
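The standard error scaling can be illustrated with a small calculation. This is a sketch of one reading of the sentence above, not dChip's actual code: it assumes the SE is multiplied by (value after normalization) / (value before normalization), so that the relative error of each value is preserved. The function name `scale_se` and the normalized value used in the example are hypothetical.

```python
def scale_se(expr_before, expr_after, se_before):
    """Scale an SE by the after/before ratio of its expression value.

    Assumption: the ratio is taken as after / before, which keeps
    se / expression unchanged by normalization.
    """
    if expr_before == 0:
        raise ValueError("ratio is undefined when the original value is 0")
    return se_before * (expr_after / expr_before)


# Values for AFFX-BioB-5_at on array 130a from the table above; the
# normalized value is made up purely for illustration.
before, se = 3348.89, 281.398
after = before * 1.1
print(scale_se(before, after, se))
```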
Afterwards, the high-level analysis can be applied
without the “Analysis/Model-based expression” step (since no CEL values are
read in). For example, the “Tools/Array list file” function can be used to pool
replicate arrays, and “Analysis/Hierarchical Clustering” and “Analysis/Compare
Samples” can be performed as usual.
Combine external data file
Many researchers have made their data available in Excel or tab-delimited
text format but not as the original CEL or DAT files. To pool several such
datasets for analysis, one can follow steps similar to steps 3-7 of the
section “Combine different array types”. Note that the two external data
files should have matched rows of probe sets and consistent columns (e.g.
both with or both without the absolute call column).
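Before pooling, the two requirements in the note above can be checked with a script. The sketch below is a pre-check written under assumptions, not a replacement for the dChip steps: call columns are assumed to be recognizable by a header ending in " call" (as in the sample table), and the names `read_table`, `has_call_columns` and `combine` are hypothetical.

```python
import csv
import io


def read_table(text):
    """Split tab-delimited text into (header, data rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[0], rows[1:]


def has_call_columns(header):
    # Assumption: call columns are named "<array> call".
    return any(name.endswith(" call") for name in header[1:])


def combine(text_a, text_b):
    """Join two tables side by side after checking rows and columns."""
    header_a, data_a = read_table(text_a)
    header_b, data_b = read_table(text_b)
    if [r[0] for r in data_a] != [r[0] for r in data_b]:
        raise ValueError("probe set rows do not match")
    if has_call_columns(header_a) != has_call_columns(header_b):
        raise ValueError("only one file has absolute call columns")
    merged = [header_a + header_b[1:]]
    merged += [a + b[1:] for a, b in zip(data_a, data_b)]
    return merged


file_a = "probe set\t130a\nAFFX-BioB-5_at\t3348.89\nAFFX-BioB-M_at\t3478.22\n"
file_b = "probe set\t130b\nAFFX-BioB-5_at\t3825.92\nAFFX-BioB-M_at\t6778.75\n"
for row in combine(file_a, file_b):
    print("\t".join(row))
```

The check is deliberately strict: rows must appear in the same order in both files, since a silent reordering would pair values from different probe sets.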
Excel can only be used to save an external data file with fewer than 65535
rows and 256 columns. Two software packages that may be tried to overcome
this limit are GS-Calc and Quattro Pro (free trial).
Extract subset of data file
[Analysis example, Version 9/28/06+] Here we show how to extract a subset of
data from a larger data file, with possible reordering of rows and columns.
This can be useful when subsetting 500K HapMap genotype calls to make a
reference call file for LOH inference. First use "Get external data" to read
in "Sty_HapMap270_brlmm_call.txt" (uncheck "Has both signal and call" and
check "SNP data"), then specify an array list file containing 60 CEPH parent
samples. Then go to "Tools/Export expression data", select all samples and
specify a probe set list file with the same ordering as the probe sets in the
CDF or genome information file. Affymetrix library files contain a PSI file
(example, unzip it), which can be used as the probe set list file (but check
"Use probe set name in the 2nd column"). Uncheck "Has both signal and call"
if making a reference genotype file. The exported file can then be used as
the "Analysis/Chromosome/Options/Reference genotype file".
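The effect of the export step, selecting and reordering both rows and columns, can be sketched in a few lines. This illustrates the operation only and is not how dChip implements it; the function name `extract_subset`, the SNP identifiers, the sample names and the genotype calls are all hypothetical.

```python
def extract_subset(header, data, samples, probe_sets):
    """Return a table with only the given samples and probe sets,
    in the order in which they are listed."""
    column_of = {name: i for i, name in enumerate(header)}
    row_of = {row[0]: row for row in data}
    subset = [[header[0]] + list(samples)]
    for probe_set in probe_sets:
        row = row_of[probe_set]  # KeyError if the probe set is absent
        subset.append([probe_set] + [row[column_of[s]] for s in samples])
    return subset


header = ["SNP", "s1", "s2", "s3"]
data = [
    ["SNP_A", "AA", "AB", "BB"],
    ["SNP_B", "BB", "BB", "AB"],
    ["SNP_C", "AB", "AA", "AA"],
]

# Keep two samples and two SNPs, both in a new order.
subset = extract_subset(header, data,
                        samples=["s3", "s1"],
                        probe_sets=["SNP_C", "SNP_A"])
for row in subset:
    print("\t".join(row))
```

Here `samples` plays the role of the array list file and `probe_sets` the role of the probe set list file.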
Read cDNA array data
To read in cDNA array or other two-channel microarray data, one may put the
background-adjusted green and red channel signals for one array in two
columns of the external data file, and normalize as above when needed.
Alternatively, the log ratios of one cDNA array can be put in one column of
an external data file, and one can then go directly to high-level analysis
such as clustering.
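For the second option, the log ratio column has to be computed before the file is loaded. The sketch below assumes base-2 logarithms and a red/green orientation, neither of which is specified above, and returns `None` for non-positive signals so that they can be written as blank (missing) cells. The function name `log_ratio` and the signal values are hypothetical.

```python
import math


def log_ratio(red, green):
    """Return log2(red / green), or None if either signal is not positive."""
    if red <= 0 or green <= 0:
        return None  # write as a blank cell (missing value)
    return math.log2(red / green)


# Hypothetical background-adjusted (red, green) signals for three spots.
signals = [(400.0, 100.0), (250.0, 250.0), (0.0, 120.0)]
ratios = [log_ratio(red, green) for red, green in signals]
print(ratios)
```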