dChip: Function Updates (2000-06)
Recent updates
12/14/06: For tumor-only LOH inference, when no
"Options/Reference genotype file" is specified, the normal samples
(those with a "Ploidy(numeric)" value of 2 in the sample info file)
will be used to estimate SNP heterozygosity and genotype dependence
probabilities.
11/16/06: At the clustering or
chromosome view, use “View/Find sample” to find a
particular sample. (suggested by Yohan)
9/17/06: At the chromosome view, control+click a data point to append its information to a
"data points.txt" file in the working directory. The information
includes the sample name, chromosome position, and gene names within the surrounding 100Kb
region. This helps to manually identify copy-number-changed regions
from the raw copy number view when the data are noisy or visual inspection works better. 2/8/07:
Show or export the nearest gene to a SNP.
8/29/06: Faster display update when
zooming at the "Analysis/Chromosome" view.
7/31/06: At "Open group",
there is an option for specifying TXT file suffix
(e.g. ".brlmm.txt"). At "View/Export image", "Export all chromosomes individually" can be
checked. (suggested by Charlotte Schjerling)
7/27/06: Check "Analysis/Open
group/Perform 'Analysis/Normalize & MBEI' afterwards" to execute these three steps consecutively. Normalization
and MBEI will use options set at "Open group/Options/Model", and the
baseline array will be the default one with median overall intensity. (suggested by Charles Mullighan)
6/1/06: Specify consanguineous relationship in a family
to reduce pedigree size and speed up linkage analysis.
5/29/06: Read SNP CEL files without matching TXT
genotype files but with combined genotype file.
4/10/06: Combine two sub-arrays at "Open
group" without using external data files.
12/12/05: To make a copy number summary plot similar to Figure 1B
of Zhao et al.
2005, select "Chromosome/Summary Plot" at the inferred copy
number view. (suggested by Edward Attiyeh)
11/17/05: In the chromosome view,
check “Tools/Options/Clustering/Sample names always
visible” to always display the sample names and information on the top
of the data area. (suggested by Charles Mullighan)
11/13/05: Specify “Analysis/Open
group/Other information/SNP
information file” to provide allele frequency and other SNP information. (suggested by Ann Mullally)
11/11/05: Check
“Tools/Options/Chromosome/Show only tumor of paired sample” to display only tumor samples in copy number view when
paired normal and tumor samples both exist in array list file. (suggested by Peter Ouillette and Changzhong Chen)
11/8/05: Process
human exon array data and view data along chromosome.
11/3/05: Dahia
et al. 2005 identified a novel locus for familial pheochromocytoma syndrome by integration of two-locus
linkage analysis, transcription profiling, and genome-wide SNP-based copy
number mapping.
11/1/05: Use an array list file to define batches
and the “Tools/Adjust batch effect” function to scale (multiply by a constant) the
expression values from batch 2 to the last batch, so that for each gene, the
mean of each batch is the same as the mean of the 1st batch. Then one can redo
gene filtering and clustering to see if the batch effect is gone. If so, one may use
“Analysis/Compare samples” to pool samples from different batches and do the
comparison.
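The batch scaling described in the 11/1/05 entry can be sketched for a single gene as follows. This is only an illustration, not dChip's code: the function name, the input layout and the handling of a zero batch mean are assumptions.

```python
from statistics import mean

def adjust_batch_effect(values, batches):
    """Scale one gene's values so every batch has the mean of the first batch.

    values  -- expression values of a single gene, one per sample
    batches -- batch label of each sample, in the same order as values
    (hypothetical interface; the first label seen is treated as batch 1)
    """
    first = batches[0]
    # Mean of this gene within each batch.
    batch_means = {
        b: mean(v for v, lab in zip(values, batches) if lab == b)
        for b in set(batches)
    }
    target = batch_means[first]
    adjusted = []
    for v, lab in zip(values, batches):
        m = batch_means[lab]
        # Multiply by one constant per batch; leave the values alone if the mean is 0.
        factor = target / m if m != 0 else 1.0
        adjusted.append(v * factor)
    return adjusted

gene = [100.0, 120.0, 200.0, 240.0]
labels = [1, 1, 2, 2]
print(adjust_batch_effect(gene, labels))  # [100.0, 120.0, 100.0, 120.0]
```

Because the correction is multiplicative, fold changes within a batch are preserved while the batch means are aligned.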
10/15/05: (1) Select
“Analysis/Normalize/Options/Normalization method: Quantile normalization” for
quantile normalization (Bolstad et al. 2003, Workman
et al. 2002). 6000 matching quantiles from two
arrays are used to fit a running median normalization curve. M-A plots are also
added in the Normalization Plot.
(suggested by Igor Klacansky)
(2) Select “Analysis/Model-based
Expression/Options/Method used: Average Difference” to use Average Difference method to
compute expression values or signal.
(3) Set
"Tools/Options/Chromosome/HMM length" to N to perform linkage
analysis for segments of N markers at a time (e.g. 2000 for 100K array
where chromosome 1 has more than 9000 markers). (suggested
by Annemieke Verkerk)
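The idea behind item (1) of the 10/15/05 entry, matching quantiles of two arrays and mapping one array onto the other, can be sketched as below. This is a simplified stand-in: it interpolates linearly between matched quantiles instead of fitting the running median curve that dChip uses, and all names and the default number of quantiles are assumptions.

```python
from bisect import bisect_right

def matching_quantiles(values, k):
    """Return k evenly spaced order statistics of values (minimum to maximum)."""
    s = sorted(values)
    n = len(s)
    return [s[round(i * (n - 1) / (k - 1))] for i in range(k)]

def normalize_to_baseline(array, baseline, k=5):
    """Map array onto the baseline's distribution via k matched quantiles (k >= 2)."""
    xs = matching_quantiles(array, k)      # quantiles of the array to normalize
    ys = matching_quantiles(baseline, k)   # matching quantiles of the baseline array
    out = []
    for v in array:
        # Find the quantile interval containing v, clamped to the two ends.
        j = min(max(bisect_right(xs, v) - 1, 0), k - 2)
        x0, x1, y0, y1 = xs[j], xs[j + 1], ys[j], ys[j + 1]
        # Linear interpolation between neighbouring quantile pairs.
        t = (v - x0) / (x1 - x0) if x1 != x0 else 0.0
        out.append(y0 + t * (y1 - y0))
    return out

print(normalize_to_baseline([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

With real arrays one would pass a much larger `k` (the entry mentions 6000 matching quantiles).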
10/13/05: Use “Analysis/Filter SNPs” to select
better SNPs by fragment length or No Call rate for
use in the downstream analysis. (suggested by Zhigang
Wang)
10/12/05: “Image/Normalization Plot” is now
implemented within dChip without calling R. The plots can also be viewed during
normalization by checking “Analysis/Normalize/View normalization
plot”.
10/7/05: Add "Pathway drawing and
analysis".
10/6/05: Add the manual page “Allele sharing analysis” for SNP array, including Application in graft-versus-host disease, Non-parametric
linkage analysis and Permute to find significant loci.
10/5/05: Add functions to analyze 500K SNP array.
10/3/05: (1) Set
“Options/Chromosome/Inferred copy method” to be “Median smoothing” and set a SNP marker window
size (e.g. 10) to median smooth raw copy numbers as the inferred copy number.
(2) To infer the LOH status of non-informative LOH calls from paired normal/tumor LOH
analysis, the method “Options/Inferred LOH method/Same boundary” can be used in addition to the
HMM method.
(3) In the clustering or chromosome
view, set “Options/Clustering/Number of letters shown for sample information”
to be greater than 1 to display 1 or more letters above
samples. (suggested by Charles
Mullighan)
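The median smoothing in item (1) of the 10/3/05 entry can be illustrated with a short sketch. The centered window and its truncation at the chromosome ends are assumptions; the entry only specifies a SNP marker window size.

```python
from statistics import median

def median_smooth(raw_copy, window=10):
    """Replace each raw copy number by the median over a window of neighbouring SNPs."""
    half = window // 2
    n = len(raw_copy)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)          # window is truncated at the ends
        hi = min(n, i + half + 1)
        smoothed.append(median(raw_copy[lo:hi]))
    return smoothed

# SNPs ordered along a chromosome; the 5.0 is an isolated noisy value.
raw = [2.0, 2.1, 5.0, 1.9, 2.0, 1.0, 1.1, 0.9, 1.0]
print(median_smooth(raw, window=3))
```

The isolated spike is removed, while the step from about 2 copies to about 1 copy is kept.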
10/2/05: (1) In the SNP copy number
analysis, when normal samples are not available or too few, set
“Options/Chromosome/% of samples trimmed” to be
> 0 to obtain reference signal distribution without using the information of
which samples are normal.
(2) Set “Options/Chromosome/HMM
length” to N to perform HMM inference of LOH and copy number for a
stretch of at most N SNP markers at a time. This can increase the speed
for SNP arrays with density > 100K: chromosome 1 has > 9K markers, but
one can set “HMM length” to 1000.
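Item (1) of the 10/2/05 entry obtains a reference signal without knowing which samples are normal. A minimal sketch of that idea for one SNP follows. Symmetric trimming of both ends and the convention that the reference signal corresponds to 2 copies are assumptions, not statements about dChip's implementation.

```python
def trimmed_reference(signals, trim_percent=10.0):
    """Mean signal of one SNP after trimming extreme samples."""
    s = sorted(signals)
    # Number of samples dropped from each end (assumption: symmetric trimming).
    k = int(len(s) * trim_percent / 100.0)
    kept = s[k:len(s) - k] if len(s) > 2 * k else s
    return sum(kept) / len(kept)

def raw_copy_numbers(signals, trim_percent=10.0):
    """Raw copy number of each sample, taking the reference signal as 2 copies."""
    ref = trimmed_reference(signals, trim_percent)
    return [2.0 * v / ref for v in signals]

# Ten samples at one SNP: eight typical, one with a loss, one amplified.
signals = [100.0] * 8 + [50.0, 300.0]
print(raw_copy_numbers(signals))
```

Trimming keeps the two altered samples from distorting the reference, so they come out at 1 and 6 copies relative to the rest.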
10/1/05: (1) In
the SNP copy number analysis, check “Options/Chromosome/Use paired normal as
reference” to use
the signal of the paired normal to obtain the raw copy numbers of
tumor samples, as opposed to using the average signal of all normal samples. (suggested by Peter Ouillette)
(2) Use the
"Tools/Percentile
filtering" function to select genes by their fold change between
a high and a low percentile across samples. (suggested
by Wing Wong)
(3) At the gene
clustering view, “Clustering/Export Same Gene” can export the probe sets
belonging to the same gene as the selected probe sets (click the area between
the clustering tree and blue-red image to select a single probe set). The
exported gene list can be viewed by “Analysis/Clustering”. (suggested by Wing Wong)
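The percentile filtering of item (2) in the 10/1/05 entry can be sketched as follows. The percentile definition (linear interpolation), the default percentiles and the fold threshold are all assumptions chosen for illustration.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in 0..100) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def percentile_filter(expr, high=90, low=10, min_fold=2.0):
    """Keep genes whose high/low percentile ratio across samples reaches min_fold."""
    kept = []
    for gene, values in expr.items():
        s = sorted(values)
        lo_val, hi_val = percentile(s, low), percentile(s, high)
        if lo_val > 0 and hi_val / lo_val >= min_fold:
            kept.append(gene)
    return kept

expr = {
    "flat_gene": [100, 100, 100, 100, 100],
    "variable_gene": [50, 60, 100, 300, 400],
}
print(percentile_filter(expr))  # ['variable_gene']
```

Using percentiles rather than the maximum and minimum makes the filter less sensitive to a single outlying sample.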
9/29/05: Cheng and
Peter Ouillette successfully troubleshot a dChip
usage problem by remotely accessing
Peter’s PC. This first try appears to be an efficient way to solve elusive
problems.
9/28/05: Start to
put the dChip manual into a WikiBook, so anyone can edit it or post discussions.
(3/13/06: this effort has stopped)
9/23/05: At
“Analysis/Genome”, report in the analysis view the gene names in the
significant stretches and the information of multiple comparison,
such as “24 significant stretches found at 0.05 level from 2494 p-value
assessments”. (suggested by Pawel Michalak and Wei
Zhao)
9/21/05: Added common probe set file for HG-U95Av2 vs. HG-U133_plus_2
array.
9/15/05:
“Tools/Make information file” may report the error of “NumTerm == MaxTerm
at category 'Gene Ontology'”, which is due to more GO terms than before
in the latest GO structure files. Update to the latest dChip to correct it. (reported by Xueqing Zhang and Patrick Loerch)
9/13/05:
“Tools/Options/Model” adds an option to truncate negative PM/MM differences to
0 before modeling in the “PM/MM difference model”. By default it is checked to
compute all-positive expression values; uncheck it to use the method as before.
9/12/05: “Google site search”
function is added to the dChip main page and manual page.
5/26/05: (1) Use Affy Files parsers SDK
to read in binary CEL files,
so it is no longer necessary to convert binary CEL
files to text CEL files for dChip to read.
(2) Uncheck "Analysis/Open
group/Options/Load probe data in memory" to not
load probe data so that a large dataset containing many arrays or large
array types (e.g. 100K SNP array) can be loaded faster. Then do normalization
and model-based expression computation the same way as before. However, the CEL
image and PM/MM data views are not available since they use probe-level data.
5/5/05: Check
“Tools/Export expression value/Append to
this file” to append the output data to an
existing data file. This is useful for combining the data of sub-arrays.
(suggested by Changzhong Chen)
3/29/05: Update to
handle 3/23/05 Affymetrix annotation CSV
files. In the “HG-U133_Plus_2_annot.csv” file, “LocusLink”
is changed to “Entrez Gene” in the header line and
"Chr:" is changed to "chr" in the “Chromosomal Location” column. Also handle
the tab in the "Gene Title" column (e.g. 1425167_a_at in
Mouse430_2_annot.csv) to generate correctly formatted gene info file. (suggested by Andrea Richardson)
2/23/05:
“Analysis/Open group/Other information/Probe set mask
file” can accept individual probes to
mask them out from the CDF file. 4/15/05: Corrected the bug that eliminated
both probe 4 and probe 14 for the string "14,". (suggested by Igor Leykin and Bin Yao)
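A bug of the kind corrected on 4/15/05 typically comes from matching probe numbers as substrings. The sketch below shows one plausible way such a bug arises and how whole-token parsing avoids it. The mask field format and both function names are hypothetical, and dChip's actual parsing code is not shown in this log.

```python
def masked_probes_substring(mask_field, n_probes):
    """Faulty approach: substring search, so "4," also matches inside "14,"."""
    return [p for p in range(1, n_probes + 1) if f"{p}," in mask_field]

def masked_probes_tokens(mask_field, n_probes):
    """Safer approach: split into whole tokens first, then compare numbers."""
    wanted = {int(tok) for tok in mask_field.split(",") if tok.strip()}
    return [p for p in range(1, n_probes + 1) if p in wanted]

print(masked_probes_substring("14,", 16))  # [4, 14]
print(masked_probes_tokens("14,", 16))     # [14]
```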
9/14/04: At “Analysis/Hierarchical
clustering/Options/Standardize rows”, one can select a
sample (default is mean) to be subtracted during standardization. This
is useful when a known baseline is desired to be displayed as white for all
genes, and other samples display relative up-regulation (red) or
down-regulation (blue). (suggested by Changzhong Chen)
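The effect of choosing a baseline sample in the 9/14/04 entry can be sketched for one gene. Dividing by the row standard deviation is an assumption about the standardization step; the entry only states which value is subtracted.

```python
from statistics import mean, stdev

def standardize_row(values, baseline_index=None):
    """Standardize one gene across samples.

    By default the row mean is subtracted; if baseline_index is given, the
    value of that sample is subtracted instead, so the baseline becomes 0.
    """
    center = mean(values) if baseline_index is None else values[baseline_index]
    sd = stdev(values)
    return [(v - center) / sd if sd > 0 else 0.0 for v in values]

row = [1.0, 2.0, 3.0]
print(standardize_row(row))                    # [-1.0, 0.0, 1.0]
print(standardize_row(row, baseline_index=0))  # [0.0, 1.0, 2.0]
```

With a baseline sample selected, that sample maps to 0 (white) for every gene, and the other samples show up-regulation or down-regulation relative to it.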
8/28/04: Use Affy C++ source code (Files parsers SDK) to read binary
CDF files. (suggested by Lucy He)
4/13/04: “Tools/Print settings”
will print out the current settings and parameters.
Then the whole analysis log can be saved using “Analysis/Save log”. (suggested
by Bill Sellers) [11/11/05: moved this function to the “Tools/Options/Print
Settings” button.]
4/4/04: Update “Tools/Make
Information file” to handle 4/2/04 NetAffx annotation
and ortholog CSV
files, which have a slightly different format from previous CSV
files.
3/19/04: Update “Tools/Make
Information file” to convert current NetAffx ortholog CSV files to
dChip common probe set files.
This is useful for combining
expression data across species. (requested by
Enrique Millan)
12/26/03: (1) One may add a numerical column in “sample information file”. The
column header needs to contain “(numeric)”, for example, “Age(numeric)”. Such a
continuous variable will be standardized and displayed in the clustering
picture. (2) “R view/ANOVA filtering” and “Clustering/Similar profile” are merged into “Analysis/Analysis of Variance”.
10/22/03: Update to handle Oct.
2003 or later NetAffx CSV
files when making gene info files.
10/16/03: You may use this Affy CEL file converting tool to convert the new
binary-format CEL file to the old text-format CEL file so that dChip can read it.
8/9/03: [New] In the PM/MM
data view, select menu “Data/Show All Array” to view the probe patterns of the
current probe set in all arrays specified by the array list file. Such
probe-level data and patterns are very useful for confirming the properness of
computed gene expression values and changes. (suggested by Yu Guo)
7/30/03: Two
software packages, GoSurfer and Tight
Clustering, developed by the Wong lab can be called from dChip at the
“Tools” menu. (suggested by Wing Wong. GoSurfer
developed by Sheng Zhong. Tight Clustering developed by George C. Tseng)
7/8/03: “Tools/Make
information file”: the maximal number of protein domain terms allowed is
increased to 4000, so that the NetAffx June 2003
annotation file for HG_U133A can be processed (reported by Mike Wang).
7/8/03: [6/27/03 bug fixed]
The RG_U34A array has probe set names containing “#”
(e.g. L00981mRNA#2_at). Anything in a line after “#” is interpreted as
a comment by the R “read.table()” function
used in the R view “Get expression”
button, and this caused failure of the “Get expression” function. Now the “comment.char = ""” parameter is added to the read.table() function. (reported by Charlotte Schjerling)
7/3/03: [Bug fixed] When
“Analysis/Open group/Options/Mask redundant probe sets” or “Omit Affymetrix
control probe sets” is checked, the annotation information in the gene
information file for redundant or Affy probe sets was
not used at all. However we want to read the information for these probe sets,
but not count them in the gene numbers associated with annotation terms to
avoid biasing the “significant gene clusters” or “Tools/Classify genes” functions.
This is corrected now. (reported by Susan G. Hilsenbeck)
7/3/03: Use dates in the version number (e.g. Version 1.3
Test (7/3/03)) to better keep track of dChip updates. (suggested by Charlotte
Schjerling)
6/27/03: The expression data
object obtained by “Get Expression” in the R
view now has probe set names as row names. (suggested by Leo Schalkwyk)
6/25/03: [Bug fixed] A bug
introduced on 6/13/03 switched the PM and MM row in the Grid (2, 1) of PM/MM
Data view. (reported by Jorg D. Becker)
6/12/03: [Bug corrected] 5/21/03+
versions regarded probe set names starting with “AF” (“AFFX”
had been used before) as Affy control probe sets.
However, when “Tools/Options/Analysis/Omit Affy probe
sets” was checked, some probe sets in the RG_U34A array were wrongly regarded as
control probe sets (e.g. AF007107_s_at). Now “AFFX”
is used again. (reported by Charlotte Schjerling)
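The difference between the two prefixes in the 6/12/03 entry is easy to see in a small sketch. The function name is hypothetical, and `AFFX-BioB-5_at` is used only as a typical control probe set name.

```python
def is_control_probe_set(name, prefix="AFFX"):
    """Treat a probe set as an Affymetrix control if its name starts with prefix."""
    return name.startswith(prefix)

for name in ["AFFX-BioB-5_at", "AF007107_s_at"]:
    print(name, is_control_probe_set(name, "AF"), is_control_probe_set(name, "AFFX"))
# AFFX-BioB-5_at True True
# AF007107_s_at True False
```

The shorter prefix "AF" also catches ordinary probe sets whose names happen to start with those letters, which is the misclassification reported for RG_U34A.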
5/28/03: [New feature] At
“Analysis/Filter genes”, one can choose “Standard deviation” in criterion 1 for
variation filtering. This is useful when the data is log transformed (by
checking “Analysis/Open group/Options/Log transform” or reading in log-scaled
data at “Analysis/Get external data”), and standard deviation is preferred over
CV (coefficient of variation, or standard deviation / mean) due to variance
stabilization property of the log transformation. (suggested by Wing Wong)
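The reasoning in the 5/28/03 entry can be checked numerically: two genes with the same 2-fold variation but different expression levels have different standard deviations on the raw scale, the same CV on the raw scale, and the same standard deviation after log transformation. The sketch below uses made-up values and log base 2 as an assumption.

```python
from math import log2
from statistics import mean, stdev

def variation_stats(values):
    """Return the raw SD, the raw CV (SD / mean) and the SD of log2 values."""
    return {
        "sd": stdev(values),
        "cv": stdev(values) / mean(values),
        "log_sd": stdev([log2(v) for v in values]),
    }

low_gene = [100.0, 200.0, 100.0, 200.0]       # 2-fold changes at a low level
high_gene = [1000.0, 2000.0, 1000.0, 2000.0]  # the same 2-fold changes, 10x higher

print(variation_stats(low_gene))
print(variation_stats(high_gene))
```

On log-scaled data the standard deviation already measures relative variation, so dividing by the mean is no longer needed.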
5/22/03: [New feature] Check
“Analysis/Open group/Options/Ignore probe sets” to not
use probe set information in the CDF file. This is useful when there are
too many probes in a probe set, and our main interest is only normalizing
arrays. (suggested by Todd Mockler)
5/21/03: [New feature] Assess empirical false discovery rate at
“Compare samples”.
5/21/03: [Change] For the unpaired
t-test of “Analysis/Compare samples”, the degrees of
freedom were group.size1 + group.size2 – 1 (instead of –2 to accommodate
1 vs. 1 comparison, so it’s ok when n1=n2=1 and standard error of a single
expression value is model-based). Now the
Welch correction for d.f. is used. The comparison result files have
header line “[COMPARE_CRITERIA_V2]” to indicate these changes.
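For reference, the textbook Welch (Welch-Satterthwaite) degrees of freedom for two groups with sample variances s1², s2² and sizes n1, n2 can be computed as below. This is the standard formula only; how dChip combines it with model-based standard errors, or handles groups of size 1, is not described in this log.

```python
def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite degrees of freedom (requires n1 >= 2 and n2 >= 2)."""
    v1 = var1 / n1  # squared standard error of the mean of group 1
    v2 = var2 / n2  # squared standard error of the mean of group 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Equal variances and equal sizes give n1 + n2 - 2.
print(welch_df(4.0, 5, 4.0, 5))
# Unequal variances give fewer degrees of freedom.
print(welch_df(1.0, 5, 25.0, 5))
```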
5/19/03: [Bug corrected] Extra tabs were added to the end of the array list file name and filter input file name in the INI file. The problem
may arise when a comparison criterion file has been saved in Excel, since
Excel adds extra tabs on rows with only 1 column. Later on dChip reads in
the comparison criterion file and uses the array list file and filter input
file name in it. (reported by Keith Crist; 6/11/03:
similar problem reported and corrected for “lda
result file”)
5/17/03: [New
feature] In “Image/Normalization
plot”, any array can be selected as the baseline
array. Thus the scatterplot of probe values between any two arrays can
be visualized.
5/8/03: [New and
change; updates from now on may not be reflected in the manual at the same
time] 1) Euclidean distance can be used for clustering
(set at “Tools/Options/Clustering”), which may be more reasonable to use for
sample clustering using gene-wise standardized values. 2) By default
“Tools/Options/Clustering/Add new color for Control+Click”
is not checked and new “Control+Click” clusters will
be in light blue to distinguish from the current “Click” cluster in blue. 3) If
a single gene is selected in the clustering picture, the value range for a gene
is displayed but the displaying range is from 0, so that relative fold change
can be visualized. 4) At "Cluster/Selected branch/Export" data, if
"Output all colored gene branches" is checked, all the colored
branches (selected by Control+Click and Click) will
be exported. Note that the samples exported are all the colored ones, and the
gene clusters exported are in the order of clicking, not the visualized order.
(suggested by Wing Wong)
5/8/03:
[Withdrawn] “Analysis/Normalization” starts the “Invariant Set” selection from
a list of “Stable probe sets”.
4/20/03: [New
feature] If “Clustering/Selected branch/Export data/Cut the tree
at the height of current branch and export all branches” is checked, one may
export gene expression data grouped in clusters. These clusters are obtained by
cutting the gene clustering tree at the height of the selected blue branch. In
previous versions, multiple colored (by using Control+Click)
gene clusters were exported. (suggested by Bin Zhang)
4/17/03: [New feature] In “Image/Normalization plot”, one can also
plot the normalized values on the X-axis by checking the option “Use normalized
values to view the result after normalization”. (suggested by Wenhong Fan)
4/12/03: [New feature] In “Tools/Gene list file/By
Annotation”, one can select multiple terms to get the union or intersection of
the genes belonging to these categories, and apply the “Filter genes” function
immediately using this gene list as the input gene list. (suggested by Wing
Wong)
4/12/03: [Change] “View/Go to GenBank;
Go to UniGene; Go to LocusLink;
Go to NetAffx”
are combined into “View/Online Database”. Check the “GenBank”
option to link to the “UniGene” database.
Withdraw the “probe number” option from “View/Find Gene”.
4/9/03: [Bug corrected] When
“Tools/Options/Analysis/Do not read array list file” is checked, the individual
arrays are not correctly treated as individual samples, and this causes no
sample names in “Compare samples” dialog and incorrect “Filter genes” results.
7/7/03: a 4/9/03 bug is corrected: checking “Tools/Options/Analysis/Open
group/Do not read array list file” made array list files not used both at “Open
group” and after “Open group” (reported by Sabina Chiaretti
and Mark).
4/7/03: [Bug corrected] When the lines of a
“Comparison gene list file” are long, “Tools/Classify genes” may cause a
crash. (reported by Shahab Asgharzadeh)
4/5/03: [New feature] One
may use “Tools/Make information
file” to generate dChip information files based on the quarterly updated NetAffx
annotation files. Also see the ChipInfo
software for broader applications of this effort.
3/17/03: [New feature] In hierarchical clustering,
the “Average linkage” method can be specified at “Tools/Options/Clustering”.
Previously the only linkage method was centroid linkage. (suggested by Casper Frederiksen and Jean-philippe
Brunet)
3/16/03: [New feature] Check the “Copy to clipboard”
option in “View/Export image” to
copy the image to the clipboard in BMP or EMF format. (suggested by Yu Guo)
3/16/03: [Change] In “Analysis/Compare samples”, the presence call % criterion
can be specified for the baseline and the experiment group separately.
(suggested by Tao Lu)
3/16/03: [Change] The compare criteria are saved in
the beginning of the “Compare result
file” if “Analysis/Compare
samples/Combine comparisons/Output comparison criteria” is checked, instead of
a separate “Compare criterion file”.
3/6/03: [Bug
corrected] A file name like “D:\chip.data\all.dchip\file.CEL” led dChip to treat the file as a DAT format file since “.dat” occurs in the path (reported by Xueqing Zhang).
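The 3/6/03 problem is a general pitfall of detecting file types by substring search. A minimal sketch of the pitfall and of an extension-based check follows. Both function names are hypothetical and this is not dChip's code; `ntpath` is used so that Windows-style paths are parsed the same way on any platform.

```python
import ntpath

def looks_like_dat_by_substring(path):
    """Faulty check: ".dat" anywhere in the path, e.g. inside "chip.data"."""
    return ".dat" in path.lower()

def file_kind(path):
    """Decide the file type from the final extension only."""
    ext = ntpath.splitext(path)[1].lower()
    return {".cel": "CEL", ".dat": "DAT"}.get(ext, "unknown")

path = r"D:\chip.data\all.dchip\file.CEL"
print(looks_like_dat_by_substring(path))  # True
print(file_kind(path))                    # CEL
```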
3/6/03: [New
feature] The outputs of “Analysis/Normalize” contain
the median probe intensity before and after normalization (suggested by Eric
Libby).
3/3/03: [New
feature] Read in CEL file of PM-only array
(suggested and testing data by Jeremy Erickson).
3/1/03: [Bug
corrected] The “LDA classification view” sometimes did
not update the screen image correctly between separate “Analysis/LDA
Classification” calls.
2/26/03: [Bug
corrected] An ending "\" or internal “\\” specified in the
"Tools/Options/Analysis/Working directory" string (e.g.
“C:\array\\other\lung\lung_loh\”) prevented the file names in some dialogs
such as “View/Export image” from being clicked and changed. To change such
existing path names, either edit the configuration (*.ini)
files in the same directory as dchip.exe, or apply “Tools/Options/Reset
default” (reported by Yu Guo).
2/5/03: [New
feature] Check “Tools/Export data/Expression value/Include
header information” to include information such as the modeling method and
baseline array in the exported expression data files (suggested by Victoria Perreau).
1/31/03: [Bug
corrected] dChip reverses signs of theta’s and phi’s during model fitting to
ensure most theta’s and phi’s are positive. In PM-only
model, counting zero theta values as negative caused the possibility of
having negative expression values (reported by Tanya Logvinenko).
1/28/03: [Bug
corrected] 1. When the number of arrays is small (e.g. 2), the gene names were
not displayed in the Clustering View. 2. The “View/Export image” dialog
sometimes didn’t show up; also see note 2/26/03 (reported by Yu Guo).
12/30/02:
[Withdrawn feature] The “Analysis/Print” function does not work properly and is
withdrawn. One can use “Analysis/Save” to first save the contents of the
Analysis View into a Word file and then print. (reported by Susan G. Hilsenbeck)
12/19/02: [Bug
corrected] “Image/Unscrambled” (renamed to “Image/Probe Together”) caused
various problems when computing expression values in the “unscrambled” mode
(e.g. the background values used for PM-only model don’t correctly consider the
unscrambling effect). Now this function is re-implemented in a new mechanism.
(reported by Reinhard Hoffmann, Thomas Seidl, Laurent
Gautier and James MacDonald)
12/11/02: [Bug
corrected] DAT files of arrays with large dimension (e.g. ATH1 with 712^2
probes) were not read correctly at right and bottom margins. (reported by Susan
J. Miller)
12/10/02: [New feature]
“Tools/Options/Model/Exclude x 5’ probes” to always call the x most 5’ probes
in a probe set as probe outlier, thus not use them in the model-based
expression values. This is useful when there is known mRNA degradation in the
sample and 5’ signals are not reliable, or when small samples are amplified
using 2-round IVT protocols and 5’ probes tend to have amplification biases.
(suggested by Edward Fox and Christine Konradi)
12/9/02: [Change]
The “Analysis/Model-based expression/Log x transform expression values” option
is moved to “Tools/Options/Analysis”. This way the DCP files always store
original expression values, and a user can choose to log transform the expression
values at the “Analysis/Open group” step. (A bug – always log-transforming
after “Open group” – was introduced here and is corrected on 12/30/02; reported
by Philippe Guardiola and Susan G. Hilsenbeck) (A bug – storing log-transformed expression
values into DCP files after “Model-based expression” – was introduced here and
is corrected on 1/16/03; reported by James MacDonald)
12/6/02: [New
feature] Check “Tools/Options/Model/Do not call all replicate arrays as array
outlier" and then specify an array list
file with replicate separators to discard array
outlier calls made in all replicates of a tissue type, since such calls reflect a
real biological effect.
(suggested by Joerg D. Becker)
11/28/02: [New feature]
“Image/Scale CEL value” can be used to scale (multiply by a constant value) the unnormalized
CEL values in an array so that the median intensity is a
particular value. This is useful for normalizing different tissue
types. (suggested by Joerg
D. Becker)
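The scaling in the 11/28/02 entry amounts to multiplying every value by target median / current median. A minimal sketch, with a hypothetical function name and made-up intensities:

```python
from statistics import median

def scale_cel_values(intensities, target_median):
    """Multiply all intensities by one constant so that their median equals the target."""
    factor = target_median / median(intensities)
    return [v * factor for v in intensities]

print(scale_cel_values([100.0, 200.0, 400.0], 500.0))  # [250.0, 500.0, 1000.0]
```

Unlike curve-based normalization, this changes only the overall brightness of the array and leaves all ratios within the array unchanged.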
11/19/02: [Bug corrected] Sample
information file with more than 20 columns caused dChip to crash. Now the limit
is increased to 40 columns and the boundary check for this value is added
(reported by Tao Shi).
11/16/02: [V1.3+] “Analysis/Chromosome” function can display the
expression data of a list of genes along chromosomes. (suggested by Stanley F.
Nelson and Robert Gentleman)
11/15/02: dChip 1.2 released. See below for function updates.
10/31/02: [New
feature] Display the relative probe
position information in the “Data View”.
10/31/02: [Change] “Analysis/Map
chromosome” renamed to “Analysis/Genome”.
10/31/02: [Change]
“Tools/Options/Model/Perform outlier detection” is split into “Check array outlier”,
“Check single outlier” and “Check probe outlier”. One may
change these options to perform or not
perform a particular outlier detection.
10/30/02: [Change] When a TXT
file (containing A/P calls) has any probe set not in the CDF file,
“Analysis/Open group” will ignore the TXT
file and compute its own A/P calls. One can disable this feature by checking
“Tools/Options/Analysis/Allow TXT files
to contain probe sets not in the CDF file”, so that dChip only ignores
the unknown probe sets (e.g. those masked by “Probe set mask file”). (suggested by Igor Klacansky)
10/27/02: [New feature] The
“Image/Normalization Plot/Use
smoothing spline to normalize and save result to
DCP file” option allows for using a smoothing spline to fit the normalization curve to the points in the
“Invariant Set”. (suggested by Xinmin Zhang)
10/12/02:
[New feature; 05/03: withdrawn] “Analysis/Normalization” can start the
“Invariant Set” selection from a list of “Stable probe
sets”. For example, one can perform “Analysis/Model-based expression” without
normalization and then use “Analysis/Filter
genes” with only criterion (2) at a 100% threshold to obtain genes called
“Present” in all samples. Then use this gene list as
“Analysis/Normalization/Stable probe sets” to normalize arrays and re-compute
expression values. Unchecking “Apply ‘Invariant Set’ selection…” will use all
probes of the “Stable probe set” for normalization. It is also good to use “Image/Normalization plot”
(see below) to check the validity of the normalization.
9/19/02: [Bug corrected] Source
code bug in partial_sort() function corrected. This
bug may affect the median or percentile computation such as those used in the
outlier detection procedures (reported by Ming Lin).
[New feature] Gene Filtering by
ANOVA through the “R
View” (see Interface with R software for
necessary setup procedures to use the function) (suggested by Frank Buxton,
Susan Hilsenbeck and Dona Wu).
[New feature] Look for EXP files with the same name as CEL
file for “Description”, and if available use it as ChipName
(also can be supplied by “sample information file”). Check
“Tools/Options/Analysis/Use ‘Description’ in EXP file as array name” to enable
this option.
[New feature] Report the file format number of
DCP and CDF.BIN files during “Open group”. dChip 1.1 and 1.2 use format 3;
dChip 1.0 uses format 2; dChip beta test version used format 0 and 1. (suggested
by Susan Hilsenbeck)
[New feature] Read in the MAS5 “Signal” from MAS5
analysis result file by checking “Open group/Read in MAS5 Signal” (suggested by
Greer M. Murphy and Song Her).
[New feature] Check
“Tools/Options/Analysis/Search and save DCP file in the Working directory” to store DCP files into different places than CEL
files. This way we may perform different analysis (e.g. normalization using
different baseline array, MBEI with log transformation) and store the results
into DCP files under different directories while maintaining the single copy of
CEL files. (suggested
by Anne Bowcock, Victoria Perreau
and Susan Hilsenbeck)
[New feature] Take logarithms to base 10 or other bases at
“Analysis/Model-based expression” (suggested by Casper M. Frederiksen).
[New feature] “Logged” indicator at the lower-right corner to
distinguish log-transformed expression indexes. (suggested by Susan G. Hilsenbeck)
[New feature] The “Select sample by
category” button in the “Compare samples” dialog has “Use
inversion” button for selecting samples not having a particular
property.
[New feature] In the “Chromosome
View”, use “View/Find gene” to find a specific gene in the highlighted set
(suggested by Isabella Tai).
[New feature] Add “Windows Enhanced Metafile (*.emf)”
image format at “View/Export image”. The file is in vector format and can be
enlarged without losing resolution. It can be inserted in Word or Powerpoint files by “Insert/Picture/From file” or converted
to EPS format by Adobe Illustrator.
[New feature] “Image/Export CEL”
dialog has an option “Export probe set name, probe
pair order and PM/MM indication”, which will add additional data columns
correlating a probe cell to its corresponding probe sets (suggested by Yuval Kluger)
[Change] Simplify the menu items
“View/PM/MM Data”, “View/CEL Image”, etc. to
a single “View/Next view”. Also one can use “Enter” or “Shift+Enter”
key to switch to other views.
[Change] In “Compare sample result”
files the number of decimal digits is reduced to 2
for easier reading. (suggested by Feng Wu)
[Change] “Image/Export CEL”
can export all arrays into CEL
files at once.
[Change] “Image/Normalization plot”
will use the chosen baseline if the array has
been normalized; otherwise use the default baseline with median overall
intensity (suggested by Tiago Duarte)
[Change] Require that the MAS
text file has the “Signal” column as well as the “Detection” column.
[Change] The
“Analysis/Hierarchical clustering/Only draw lines for standard separator”
checkbox is moved to “Tools/Array list file”. (suggested by Robert Gentleman)
[Withdrawn feature] V1.2 cannot convert CDF.BIN file and DCP file in the old
format to the current format (file format 3). Use dChip v1.1 to do this.
[Bug corrected]
“Tools/Options/Analysis/Mask redundant probe sets…” and
“Tools/Options/Analysis/Omit Affy control probe
sets…” took effect when reading gene lists or filtering genes, but not at “Open
group” where the gene information file is read in. This led to artificially significant functional groups. For
example, with “Mask redundant probe sets” checked, “Tools/Classify genes” on
all the probe sets in the HG_U95AV2 chip would result in "Found 544 GeneOntology ‘cell fraction’ genes in a 5933-group (all:
703/8100, PValue: 0.004912) ***”; this is because the
relative size of some functional groups has been increased by removing
duplicate probe sets from the list. This bug is now corrected; note that after
changing these two options one needs to do “Analysis/Open group” to re-compute
the number of used probe sets for each GO function (reported by Kieran
Holland).
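To see how the group sizes in such a report enter a p-value, the sketch below computes a hypergeometric upper tail from the four counts quoted above. The log does not state which test “Tools/Classify genes” uses, so the hypergeometric model is an assumption and the result is not expected to reproduce the reported value.

```python
from fractions import Fraction
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N genes, K of which carry the annotation."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    # Exact integer arithmetic, converted to float only at the end.
    return float(Fraction(hits, total))

# Counts from the example: 544 annotated genes in a 5933-gene list,
# 703 annotated genes among 8100 genes overall.
print(hypergeom_upper_tail(544, 5933, 703, 8100))
```

If the list is de-duplicated but the overall counts (K and N) are not, the list looks enriched for every large category, which is the artifact described in this entry.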
[Bug corrected] When using
“Analysis/Save” to save the analysis results into a Word file, dChip ignored
the user-specified file name (reported by Anna Tsimelzon).
[Bug corrected] “Tools/Export
data/Expression value” exports GCT file (format 1.2) that can correctly work
with GeneCluster 2.0.
5/8/02: The dChip console
(command-line) version continuously executes the normalization and model-based
expression steps to generate a tab-delimited text file containing the
expression values. Source code is available for using dChip on other platforms. (suggested by
Casper Frederiksen and Allen Day)
4/28/02: dChip version 1.1 (Suggested and helped by: Jianhua
Hu, Edward J. Oakeley, Simon M Lin, Allen Fienberg,
Sanjay Jain, Chunfa Jie,
Greer Murphy, Ken Aldape, Tiago
Duarte, Tiago R Magalhaes,
Igor Zwir, Dale Muzzey, Mayetri Gupta):
New:
· PM-only
model results in always-positive expression indexes. Specify different
methods through the “Analysis/Model-based expression/Options” dialog or use the
menu “Data/Next model” to switch between models.
· Handles Human U133 chip
via file format change (V1.1 will upgrade the DCP or CDF.BIN files generated by
V1.0)
· New PSI
file format; specify PSI file through the
“Analysis/Model-based expression/Options” dialog; use “Tools/Export data/Probe
sensitivity index” to export φ values and their standard errors in text format.
· The input gene list in
the “Analysis/Filter genes; Compare samples” dialog can be used to exclude
these genes from filtering or comparison. Click the “Filter on” or “Compare on”
file buttons to switch the mode.
· The context-specific
links to online manual in various dialogs.
· Clustering:
· “Tools/Options/Clustering/Standardize rows” option. One may choose not to
standardize a gene’s expression value across samples when the scale of the data
is already adjusted.
· “Clustering/Selected
branch/Export data” has the option to export gene-wise standardized values.
·
“Tools/Options/Clustering/Distance” has the option to use 1 - |r| as distance
measure, where r is the Pearson’s correlation.
· “Clustering/Similar
profile” can search genes with high positive or negative correlations with the
current gene or the selected gene branch. When “Standardize separators” are
present, check “Analysis/Hierarchical clustering/Only draw lines…” to make this
function work properly.
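The 1 - |r| distance mentioned in the clustering options above can be illustrated with a minimal sketch. This is not dChip's source code; the function names are hypothetical, and the example only shows why anti-correlated profiles end up close together under this measure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Assumes neither profile is constant (otherwise the denominator is zero).
    return sxy / math.sqrt(sxx * syy)

def abs_correlation_distance(x, y):
    """Distance 1 - |r|: close to 0 for strongly correlated OR anti-correlated profiles."""
    return 1.0 - abs(pearson_r(x, y))

gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [8.0, 6.0, 4.0, 2.0]  # perfectly anti-correlated with gene_a
print(abs_correlation_distance(gene_a, gene_b))
```

With a plain 1 - r distance the two profiles above would be as far apart as possible; with 1 - |r| they are treated as near neighbors.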
Changes:
· New outlier detection algorithm handles the
image contaminations more reasonably.
· The menu item
“Tools/Reset default settings” is replaced by the “Tools/Options/Reset default”
button.
· The “Analysis/Filter
genes; Compare samples” function by default ignores Affy’s
control genes (probe set names starting with “AFFX-”),
since their changes are generally not interesting. Use “Tools/Options/Analysis/Omit
Affymetrix control…” to change this setting.
· “Analysis/Model-based
expression/Export” function moved to “Tools/Export data/Expression value”.
· “Data/Export probe set”
function moved to “Tools/Export data/Probe set”.
· In the “Analysis/Compare
samples/Combine comparisons” dialog, the “Insert complement” button is changed
to the “And not” and “Or not” options. Thus a single comparison can be negated.
· The “Tools/Classify
samples” function copies all columns of the “gene list file” into the output
“classified file”, so the output file can have expression values or fold
changes.
· After “Analysis/Get
external data”, “Analysis/Normalize” uses the Invariant Set Normalization (ISN) method (V1.0
used a simplified ISN method with a fixed rank difference threshold of 50
and no iteration). Check the “Show scatter-plot…” option to show the normalization
scatter-plot (installation of R needed) when
normalizing. Also, when fitting the running median curve at the two tails, 5% of
the “invariant” points are used to fit a ray with one end fixed (V1.0 used 1/300
of the “invariant” points); this makes the high-end normalization relationship
smoother and more robust.
· The “Analysis/Map
chromosome” function only checks gene stretches of length < 20 for
significant p-values. Previously all gene stretches were checked.
Withdrawn:
· The option
“Analysis/Model-based expression/Use average difference instead of MBEI” is
gone. Affy’s MAS
5.0 software adopts “Signal” as expression index.
Bug corrected:
· After “Analysis/Get
external data”, the “Analysis/Map chromosome” and “Analysis/LDA Classification”
did not show the result images.
· In the “Analysis/View”
some letters cannot be input, such as “A” or “M”. This is due to the shortcut
keys for menu “Data/Animate” or “Data/Next model”. Now these shortcut keys are
changed to “Control+A” or “Control+M”.
4/4/02: [Version 1.1 Test only] PM-only model results in always-positive expression
indexes. New outlier detection algorithm
handles the image contaminations more reasonably.
3/13/02: E. coli
gene information and genome information files are available
(suggested by Igor Zwir). In the “Map chromosome”
function, ignore the “MAX_STRETCH limit is
reached” message and uncheck “Tools/Options/Chromosome/Outline significant…” to
turn off the p-value highlighting, since there is only one chromosome, which may
result in too many significant p-values.
3/12/02: HG-U133
gene information files available (A, B, unzip all
files to the same directory; helped by Siming Shou and Miguel Rea).
2/1/02: Use “View/Go
to NetAffx”
to go to the NetAffx website for the current probe set. (suggested by Victoria Perreau)
1/31/02: Linking
to online resources such as “View/Go to LocusLink”
may not work on some computers. Check "Tools/Options/Analysis/Show
online link dialog" to show a dialog containing the web address and also
automatically copy the address to the clipboard; one can then manually paste it
into the address bar of an Internet browser. (reported
by Susan Hilsenbeck, Casper Frederiksen,
Victoria Perreau)
1/29/02: (1) Bug corrected: When going from “Clustering View” to
"Data View", the PM/MM data image was not refreshed correctly; as a
result the same probe set persists there. (reported by Greer M. Murphy)
(2) “Image/Export CEL” will export model-based single outliers and array outliers in the
[OUTLIERS] section of the CEL file. (suggested by Edward J. Oakeley and Yizheng Li)
1/26/02: Check
the button “Perform Principal Component Analysis instead” in the “Analysis/LDA
Classification” dialog to perform Principal Component
Analysis. (suggested by Anne Bowcock
and Stephen Haggarty)
1/14/02: Use the
“Analysis/Compare samples/Combine comparisons/Compare on” button to restrict the comparison to a gene list. (suggested by Yingxi Lin)
1/12/02: Updated Yeast S98 gene information file
with GeneOntology terms and added its genome information file.
Downloading of the new version of dChip is needed. (suggested by Simon Lin, courtesy of SGD database)
1/10/02: (1) Combine the data for different species (suggested by Florian Storch and Stephen Haggarty,
courtesy of TIGR RESOURCERER
database)
(2) Bug corrected: when the expression values
are truncated at “Analysis/Model-based expression”, the standard errors are set
to 0. Failure to consider this led to incorrect average of the identically
truncated values between replicates. (reported by
Michael Boutros)
12/30/01: Updated
HG_U95AV2 and MG_U74AV2 gene information file
(944 GeneOntology terms, 971 ProteinDomain
terms and 377 Cytoband terms). In the clustering
picture, use Shift+Left/Right key to change the width
of the annotational columns, and Right-click to go to
the website of GeneOntology or Pfam
entries. (protein domain suggested by Wing Wong and Florian Storch)
12/14/01: (1)
After clustering, the p-values of the gene and sample
clusters are calculated using exact hypergeometric
distribution. Previously, a binomial approximation of the hypergeometric distribution, followed by a normal approximation
of the binomial, was used, but for these very small p-values high accuracy is
desirable. At “Tools/Options/Clustering” the default p-value threshold for
gene clusters is changed to 0.005. (suggested by
Steve Horvath)
(2) The sample cluster p-values are now calculated
with regard to the samples defined in the “Array list file”, not all the
arrays in the group. (suggested by Steve Horvath and
Robert Gentleman)
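The exact hypergeometric calculation described in the 12/14/01 entry can be sketched as follows. This is an illustrative reconstruction, not dChip's code: the function and parameter names are hypothetical, and the choice of an upper-tail probability (at least k annotated genes in the cluster) is an assumption about how enrichment is scored.

```python
from math import comb  # requires Python 3.8+

def hypergeom_upper_tail(k, cluster_size, annotated_total, population):
    """P(X >= k), where X is the number of annotated genes among
    `cluster_size` genes drawn without replacement from `population`
    genes, `annotated_total` of which carry the annotation."""
    denom = comb(population, cluster_size)
    upper = min(cluster_size, annotated_total)
    numer = sum(
        comb(annotated_total, i) * comb(population - annotated_total, cluster_size - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic up to the final division, so very small
    # p-values are not distorted by a binomial or normal approximation.
    return numer / denom

# Hypothetical numbers: 10 genes in total, 4 annotated, cluster of 3 with 2 annotated.
p = hypergeom_upper_tail(2, 3, 4, 10)
print(p)
```

Because the sum is computed with integers, the accuracy of very small p-values is limited only by the final floating-point division.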
12/12/01: Map a list of genes to chromosome by
“Analysis/Map chromosome” (suggested by Wing Wong,
Robert Gentleman and Andrea Richardson)
12/11/01: (1) In the “Analysis View”, the error
messages are colored in red. (suggested by Robert
Gentleman)
(2) Small Excel files and exported images are inserted into the
“Analysis View” for convenience (uncheck “Tools/Options/Analysis/Insert Excel
and Image outputs into the Analysis View” to disable the function). The analysis output can be saved into a Word file by
“Analysis/Save”.
(3) “Tools/More gene information” to read in a customized gene information file and use it
with priority over the main gene info file specified in “Analysis/Open group”. (suggested by Michael Boutros)
12/2/01: (1)
Check “Analysis/Model-based expression/Apply log2 transform” to log2 transform the expression values. (also check
“Ignore existing calculated expressions” if necessary). The model-based
standard errors are set to 0 for the modified or transformed expression values.
When working with the log-transformed values at “Analysis/Compare samples”, use E-B, B-E instead of E/B, B/E for fold changes. (suggested by Bradley Messmer)
(2) Check “Tools/Options/Analysis/Mask redundant probe sets when reading
gene list file” to exclude the redundant probe sets
from a gene list. Multiple probe sets for the same gene tend to bias the
result of array clustering and also lead to erroneous functional group
identification in the gene clustering. (suggested by
Bradley Messmer)
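Item (1) of the 12/2/01 entry recommends E-B rather than E/B for log-transformed values. A short sketch of the arithmetic behind that advice (the variable names are illustrative only): the difference of two log2 values is the log2 of the ratio, so the raw fold change can be recovered by exponentiating.

```python
import math

def log2_fold_change(log2_e, log2_b):
    """Fold change on the log2 scale: a difference, not a ratio."""
    return log2_e - log2_b

e, b = 800.0, 100.0                      # raw expression values
diff = log2_fold_change(math.log2(e), math.log2(b))
raw_fold = 2.0 ** diff                   # recovers E/B on the raw scale
print(diff, raw_fold)
```

Dividing two log-transformed values instead would not correspond to any fold change on the raw scale.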
12/1/01: (1) Used the Nearest Neighbor algorithm (see the reference) to increase the speed of clustering (e.g. the time of
clustering on 1400*6 values reduces from 80s to 4s). In addition, one can
uncheck “Tools/Options/Clustering/Pre-calculate distances” to calculate the distances between genes or samples on-the-fly;
this is useful when clustering on a large number of genes (e.g.
12K), which requires too much memory to store all the distances and causes
virtual-memory swapping that slows the process down. (suggested
by Edward Oakeley)
(2) Uncheck “Analysis/Hierarchical clustering/Cluster genes” to cluster samples without clustering genes. (suggested by Ruty Shai and Bradley Messmer)
(3) Use “Control+Click” to change the color of the GeneOntology
blocks in the clustering picture; the selected colors cannot be saved
right now. (suggested by Michael Boutros)
11/23/01: Merged dchip.exe and “dchip
large.exe”. Drosophila chip users can use the normal version of dChip as
well. File conversions of cdf.bin and dcp files will be automatically performed.
11/19/01:
“Analysis/Compare samples” can have different fold
change criteria for E/B and B/E and different mean difference criteria
for E-B and B-E. (suggested by Tiago
R Magalhaes)
11/14/01:
“Clustering/Selected branch/Export image” to export
the clustering image of the selected main gene cluster outlined by blue
lines. The sample clustering tree is not attached to the image. (suggested by Sanjay Jain and Huan
Dong)
11/13/01: In the
“Clustering” view, use Control+Click
to select and color multiple gene or sample clusters. The multiple
clusters can be exported or deleted (gene only) by “Clustering/Selected branch”
functions. Clicking still works to select the main cluster (outlined by blue
lines), used for cluster resampling. (suggested by Huan Dong)
11/11/01: Add
“Tools/Classify Genes” for classifying
genes by functional groups (suggested by Miguel Ramalho Santos and Nikhil Munshi)
11/9/01: (1) Negative expression values are set
to 1 when calculating fold changes in “Analysis/Compare samples”.
Previously, fold changes involving negative expression values were set to a
non-informative 0; however, when one expression is large (say 1000) and the
other is -10 (at noise level of absent genes), it is helpful to bring the -10
to a small positive number so a large fold change is calculated and the gene
gets selected.
(2) Began to use R as the engine for some
computing and graphic tasks. (suggested
by Robert Gentleman)
(3) Use “Image/Normalization
plot” to view the normalization scatter plot between one array and the
baseline array. (suggested by Casper M. Frederiksen; data courtesy of Andrea Richardson)
(4) On start of dChip there is an automatic display of the dChip updates
since the last use.
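The fold change rule in item (1) of the 11/9/01 entry can be sketched as below. The function name is hypothetical. The entry only mentions negative values; treating zero the same way is an assumption made here to avoid division by zero.

```python
def fold_change(e, b, floor=1.0):
    """E/B after replacing non-positive expression values with `floor`.

    The changelog only says negative values are set to 1; handling zero
    the same way is an assumption of this sketch.
    """
    e_adj = e if e > 0 else floor
    b_adj = b if b > 0 else floor
    return e_adj / b_adj

# The example from the entry: one value is large, the other is at noise level.
print(fold_change(1000.0, -10.0))
```

With the earlier behavior this comparison would have produced a non-informative 0, so the gene would not have been selected.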
10/29/01: (1) Add
“Select by category” button in the
“Analysis/Compare Sample” and “Analysis/Model-based Expression/Export” dialog. (suggested by Robert Gentleman; data courtesy of Andrea
Richardson)
(2) Deleted “Array list file” selection button in many dialogs. Specify
“Array list file” only through “Tools/Array list file”.
10/23/01: Combine comparison criteria using “not” operator, by
“Analysis/Compare Samples/Combine Comparison/Insert complement” button. (suggested by John K. Park and
Wing Wong)
10/4/01: (1) Use
"Clustering/Similar Profile" function to export
a list of genes with similar profile with the current highlighted gene.
The resultant list can be used as the "gene list file" in
"Analysis/Hierarchical Clustering" dialog to view these genes. (suggested by Andrea Richardson)
(2) Bug corrected: MG_U74 gene information files updated
using Sep.7.01 version of Unigene file. In the
old “mg_u74av2 gene info.xls”, probe set
160309_at was annotated as amelogenin. By checking
with LocusLink (ID: 11704), UniGene
(Mm.172556) and BLAST, it seems that is a mistake. (reported by Feng Wu)
9/27/01: Add paired t-test p-value as a
filtering criterion in “Analysis/Compare
samples”. (suggested by Stephen Henderson, Susan Hilsenbeck and Jenny Z. Xiang)
9/6/01: Gene filtering and clustering decoupled:
first use “Analysis/Filter genes” to generate a filtered gene list (the
filtering can be restricted to an input gene list), then use
“Analysis/Hierarchical clustering” to cluster on the filtered gene list. (suggested by Wing Wong and Laura Forsberg)
9/5/01: (1) “Tools/Gene list file/By keywords”
and “View/Find gene” accept wildcard strings as
“keywords”. (suggested
by Wing Wong, codes
courtesy of Florian Schintke)
(2) Export functional category information in “compare result file” by
checking "Tools/Options/Analysis/Output GeneOntology
terms". (suggested by Miguel Ramalho
Santos)
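Wildcard keyword search as in item (1) of the 9/5/01 entry can be illustrated with Python's standard `fnmatch` module. This is only a sketch: the exact wildcard syntax dChip accepts is not stated here, so shell-style patterns (`*`, `?`) and case-insensitive matching are assumptions, and the probe set names and descriptions are made up.

```python
from fnmatch import fnmatchcase

def find_genes(descriptions, pattern):
    """Return names whose description matches a shell-style wildcard
    pattern, ignoring case (an assumption of this sketch)."""
    pat = pattern.lower()
    return [name for name, desc in descriptions.items()
            if fnmatchcase(desc.lower(), pat)]

# Hypothetical probe set names and descriptions.
genes = {
    "probe_1": "troponin T2, cardiac",
    "probe_2": "actin, beta",
    "probe_3": "Troponin I, skeletal",
}
matches = find_genes(genes, "troponin*")
print(matches)
```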
8/29/01, 6/17/01:
Output gene list by GeneOntology or keywords
(suggested by Robert Gentleman and Casper M. Frederiksen)
8/23/01: Probe sensitivity
index file. [V1.0 manual] If a PSI file is specified in the “Analysis/Model-based expression/Calculate”
dialog and the checkbox “Use existing probe sensitivity index in this PSI file” is unchecked, the probe sensitivity indexes will be saved after
the model fitting is performed on all probe sets. At a later time, the PSI file can be used to fit the expression values for other arrays by
checking the checkbox “Use existing probe sensitivity index in this PSI file”. (suggested
by Richard Lempicki and Robert Gentleman)
8/20/01: (1) In
“Compare Samples”, dChip will export both the fold change and its confidence bound
if either of them is used in the filtering criterion. (suggested by Andrew Bent and Soemini
Kasanmoentalib)
(2) “Tools/Reset Default Settings” restores
dChip parameters to the default values. (suggested
by Robert Gentleman)
8/16/01: Improvement of array outlier calling
method: For a probe set, the model fitting still uses all arrays, but
the identification of array-outliers is done for absent arrays and present
arrays separately, to avoid the situation that small standard errors of
expression indexes of absent arrays make present arrays called as array-outlier;
previously I tried to avoid this by only fitting the model using present
arrays, as a result absent arrays are not fitted and not called as outliers
--this led to much fewer array-outliers. (03/03: this is obsolete; in the current method, P/A calls help to correct
signs but do not affect array outlier calling. However, the number of
array/probe outliers is restricted to be at most 50% of all arrays/probes.)
6/29/01: The
model fitting is changed to use only the arrays where a probe set is called
“Present” by Affy’s algorithm (or minimum of 3 arrays
regardless of the Absolute calls). This avoids the situation
where a gene is “Present” in a minority of arrays but these arrays are called
“Array-outlier” for the gene. Now these arrays
are correctly identified as having good patterns. Other changes in the “Data
View” are: in grid (2, 2) an array is represented by a cyan circle if it is
called “Absent” for the gene (blue circles still representing
“array-outliers”); in grid (1, 3) the red fitted curve is always shown whether
the array is “array-outlier” for the gene or not. (suggested
by Brian Yandell, data courtesy of Daniel Auclair and Elizabeth K. Robinson)
7/27/01: If the
checkbox “Always show sample names and clusters on the
top” in the “Tools/Options/Cluster” dialog is checked, one can still see the
sample names and cluster trees when scrolling down to see other genes in the
cluster. (suggested by Stefano Colella)
7/18/01: Image contamination correction. (suggested by Robert Gentleman, data courtesy of Eric Schadt)
7/13/01: The
“Data/Export probe set” menu can export the PM/MM data for multiple probe sets.
(suggested by Laura Forsberg)
7/1/01: After
“Analysis/Get External Data”, one can use
“Analysis/Normalize” to normalize expression values using a simplified version
of the Invariant Method (see manual).
This function used to be a linear scaling to make the arrays to have the same
median. (suggested by Arindam
Bhattacharjee)
6/30/01: Using
“View/Find Gene” and “View/Find Next”, one can search
genes by keywords such as “troponin”. (suggested by Arindam Bhattacharjee)
6/28/01: A user
can specify a “Working directory” in the
“Analysis/Open group” dialog, under which dChip exports configuration (.ini) file and other output files. (suggested
by Victoria Perreau)
6/27/01: (1)
Navigate probe sets in the array CEL image using 'Home' and 'End' keys. (suggested by Yizheng Li)
(2) Bug corrected: In some
dChip output files line breaks occur after gene descriptions and cause
“frame-shift” in output files. I tried to correct this by eliminating “\n” at
the end of gene descriptions, but let me know if this is still a problem. (reported by Brian Yandell,
Thomas Cappola and David Gerhold)
6/26/01: (1)
Replace “sample name file” by “sample information file” in
“Analysis/Open Group/Other information” dialog. Significant sample clusters can be
calculated. (data courtesy of Andrea Richardson and
Catherine Gradek)
(2) Bug corrected: During
“Analysis/Open Group”, dChip reports “Search and extract PM/MM data from CEL files of chip type
under” but finds no array data files. This may be due to “.” in the
directory name. (reported by
Karen Vranizan and Adam Olshen,
Michel Bellis)
6/22/01: (1) The
displaying range of the clustering picture used to be [-3, 3] for the
standardized expression values for each gene. Now this range can be customized
at “Tools/Options/Clustering” dialog. (suggested by
Thomas Seidl)
6/17/01: (1)
Moved “Array list file” dialog under “Tools” menu, instead of having it many
times under various “Analysis/*” dialogs.
(2) Took away
“Start clustering using filtered genes” from “Analysis/Compare samples/Combine
comparisons” dialog, so that “Compare samples” and “Hierarchical clustering”
are decoupled. Use “compare result file” as “gene list file” in
“Analysis/Hierarchical clustering/Filtering genes” dialog for clustering
analysis using filtered genes.
6/6/01: Color genes with a particular function in blue in the
clustering picture. Clicking the function bars on
the right side of the clustering data will select the corresponding function as
the “current function” and color the genes of this function in blue. The
“current function” is also reset when selecting the “functional cluster” icons
on the left pane (suggested by Robert Gentleman)
5/7/01: Use probe set mask file to exclude probe
sets from the analysis. Using the dChip “Image/Unscrambled” function we can
move all the excluded probe sets to the bottom of the array
image (still randomly placed; U74A array); we note that for these probe
sets there are still hybridization signals.
(suggested by Jason M. Laramie and Scott Oakes)
4/30/01: Combine the data for
Human arrays of different chip types. (suggested
by Stan Nelson, Daniel Auclair and Isabella Tai)
4/17/01: (1) In
“Analysis/Model-based Expression”, we can check “Use Average Difference instead
of Model-based expression” to calculate traditional Average Difference as expression
levels. [Obsolete: The standard errors are
still for model-based expression levels, and the array-outliers are still
computed using the model-based approach. Note that the Affymetrix Average
Difference method uses a super-scoring method to exclude probes whose PM/MM
difference is outside 3 standard deviations of all probe differences in either
of the two arrays being compared in their comparison analysis. Here, since we are analyzing
multiple arrays at the same time, when calculating Average Differences a probe
is excluded if its difference is an outlier in any of the arrays, until a minimum
of 5 probes is reached, at which point all 5 probes are used.] (suggested by Matthew Tudor)
(2) “Analysis/Alternative Transcripts” function: Different tissue types
may give different probe response patterns, which may be due to alternative
splicing. Probe sets called
"array-outlier" in all selected arrays will be exported. (suggested by Patrick Jay)
4/10/01: (1) Bug report: in “Analysis/Compare Samples”, the
size of the Experiment group was not used correctly, which may have led to incorrect
standard error and fold change confidence interval calculations. (Stefan Horvath)
(2) “Analysis/Get
External Data” can read in Whitehead RES format data file. (Andy Bhattacharjee)
3/17/01:
“Analysis/Get External Data” to read in an external
tab-delimited data file with first row
array names and first column gene names. Absolute call and standard error
columns can also be contained.
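The basic layout described in the 3/17/01 entry (first row holds array names, first column holds gene names, tab-delimited) can be read with a few lines of Python. This sketch handles expression columns only; the entry does not say how the optional absolute call and standard error columns are arranged, so they are not parsed. The function name and sample data are hypothetical.

```python
import csv
import io

def read_expression_table(handle):
    """Parse a tab-delimited table: header row of array names,
    then one row per gene (gene name followed by expression values)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    arrays = header[1:]          # first header cell labels the gene column
    data = {}
    for row in reader:
        if not row:              # skip blank lines
            continue
        data[row[0]] = [float(x) for x in row[1:]]
    return arrays, data

# Hypothetical two-gene, two-array file held in memory.
sample = io.StringIO("gene\tarray1\tarray2\nG1\t1.5\t2.5\nG2\t3\t4\n")
arrays, data = read_expression_table(sample)
print(arrays, data)
```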
3/15/01: Add CEL intensity pictures also in PM/MM Data view. (Andy Bhattacharjee)
3/7/01: In
“Analysis/Open Group/CDF file” dialog we can specify
a gene information file, which contains
gene descriptions from Affymetrix EASI database, as well as LocusLink
Gene Ontology terms classifying a
gene by its biological process, molecular functions and cellular components.
“Analysis/Hierarchical Clustering” will use such functional category
information to assess whether a local cluster is enriched by genes having a
particular function, and highlight these
“functionally significant” clusters. (Data courtesy
of Dan Tang)
3/2/01: Merge
“Analysis/Pair-wise Comparison, Two-group Comparison,
Filter Interesting Genes” into “Analysis/Compare
Samples”. The three dialogs here can be used to
specify comparisons, combine comparisons and specify arrays used and replicates
to be pooled. The genes satisfying the comparison criterion can be exported to
a file or used for clustering. Sample names followed by “*” refer to how many
additional replicate arrays are pooled for this sample. The function of
exporting expression values is moved to “Analysis/Model-based
Expression/Export”.
2/20/01: In
“Analysis/Hierarchical Clustering/Sample handling” tab, we can read in a gene
function file. The functional categories of genes will be shown as color bars
on the right (data from the reference). In this way we may visually check if
genes belonging to a functional category are enriched in a cluster. (Reference: Cho et al. 2001. Transcriptional regulation and
function during the human cell cycle, Nature Genetics, Vol 27, 48-54)
2/18/01: Take
away “Pool duplicate arrays” and “Group every n
samples” checkboxes in “Analysis/Hierarchical Clustering/Sample handling” tab.
Instead, we can insert “Replicate separator” and “Standardize separator” in array list file. Replicate arrays separated by
“Replicate separators” will be pooled using weighted averaging method (weights
being the measurement accuracy, so expression values with large standard errors
receive smaller weights), and samples separated by “Standardize separators”
will be standardized (rescale to have mean 0 and standard deviation 1 across
samples for each gene) within themselves. If “Only draw lines for standardize
separator” checkbox in “Sample handling” tab is checked, “Standardize
separators” just add vertical lines between
groups of samples.
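The pooling of replicate arrays described in the 2/18/01 entry can be sketched as a weighted average. The entry says only that the weights reflect measurement accuracy; using the inverse squared standard error as the weight is an assumption of this example, and the function name is hypothetical.

```python
def pool_replicates(values, std_errors):
    """Weighted average of replicate expression values.

    Weights are 1 / SE^2 (an assumed form of "measurement accuracy"),
    so values with large standard errors contribute less.
    All standard errors must be positive in this sketch.
    """
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Two replicates: the second is measured less precisely.
pooled = pool_replicates([100.0, 200.0], [10.0, 20.0])
print(pooled)
```

The pooled value lies closer to the replicate with the smaller standard error than a plain average would.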
2/11/01: We can
specify a data file list in “Analysis/Open
Group” dialog (leaving “Data directory” blank) and
dChip will use the arrays specified in the file. There can be directory names
in the data file list. (suggested by Michael Angelo)
2/10/01:
“Analysis/Hierarchical Clustering” can read in a tab-delimited data file,
without opening a group of arrays first.
2/7/01:
“Clustering/Show Profile” function displays a profile
plot for the currently selected cluster. The Y-axis has the same range as
the color scale on the bottom of the picture. The value of the profile curve
for each sample is the average of the standardized expression values of all
selected genes in this sample (standardization is a linear scaling for each
gene so its expression values across all samples have mean 0 and standard
deviation 1). The error bar extends 1 standard deviation (of the selected
genes’ standardized expression values in a sample) on both sides. Shorter error
bars indicate tighter clustering of genes at this sample point. (suggested by Deming Wang)
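The profile computation described in the 2/7/01 entry can be written out as a short sketch: standardize each gene across samples, then take the mean and standard deviation of the standardized values per sample. Function names are hypothetical, and the use of the sample (n-1) standard deviation is an assumption.

```python
from statistics import mean, stdev

def standardize(row):
    """Linearly rescale one gene's values to mean 0 and standard deviation 1."""
    m = mean(row)
    s = stdev(row)  # sample SD (n - 1); an assumption of this sketch
    return [(v - m) / s for v in row]

def cluster_profile(rows):
    """For each sample, return (mean, SD) of the standardized values of
    all selected genes; the SD gives the half-length of the error bar.
    Needs at least two genes and at least two samples."""
    z = [standardize(r) for r in rows]
    return [(mean(col), stdev(col)) for col in zip(*z)]

# Two selected genes measured in three samples, with opposite trends.
profile = cluster_profile([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
print(profile)
```

Genes with opposite trends give a flat profile with long error bars at the end samples, which is the "loose clustering" case the entry describes.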
2/6/01: (1) Add
“Clustering/Save Tree” function for saving the clustering result, and the file
can be read in as “Analysis/Hierarchical Clustering/Filter genes/Gene list or
tree file” (and the filtering criteria are thus ignored). (suggested by Stan Nelson)
(2) In
“Analysis/Model-based Expression” dialog, we can specify to output an array
quality summary file, containing percent of probe sets called “array outlier”,
percent of probe pairs called “single outlier”, and percent of “P” calls.
Arrays with more than 5% array outliers change their icons to dark blue. We
need to redo “Open Group” (check “ignore existing DCP file”), “Normalize” and
“Model-based Expression” to calculate these statistics. (suggested by Stan Nelson, Andrew Kirby)
2/1/01: (1)
Merged “Analysis/Export Expression” into “Analysis/Two-group Comparison”. If no
arrays are chosen in group 2, the expression values in group 1 will be
exported.
(2) We can use “Analysis/ Hierarchical Clustering/Array list file” tab
to create an “array list file”. When specified in “Analysis/ Hierarchical
Clustering/Sample handling” tab, this file dictates which arrays are used for
clustering in what order.
1/31/01: (1) Add
“Clustering/Export Selected” menu to export the expression data of selected
branches. The exported file can be used as “gene list file” in
“Analysis/Hierarchical Clustering” dialog to perform clustering using only this
subset of genes. (suggested by Deming Wang)
(2) In
“Analysis/Model-based Expression” dialog, we can choose to truncate low or
negative expression values to a small value, or to a given percentile of the
expression values that are called “A”. (suggested by
Stan Nelson)
1/30/01: The
“Analysis/Filtering interesting genes” dialog now accepts simple logical
combinations of criteria using AND and OR. (suggested
by Michael Zhang)
1/25/01: An icon
will be added for each exported tab-delimited analysis result file, under the
Analysis icon. Clicking it will invoke Excel to open the file. (suggested by Mei Xu)
1/24/01: Add
“Image/Unscramble” function, which re-organizes the probes of the same probe
set together for arrays using "distributed probe set format" (e.g.
Human U95 arrays), so that we can view such arrays in the old way. This makes
“Image/Array outlier” function still applicable for such probe-scrambled
arrays. (suggested by Andy Bhattacharjee)
1/23/01: (1) Add
checkbox “GCT format for GeneCluster” in
“Analysis/Export Expression” dialog, for exporting expression data files to use
with GeneCluster. (suggested by
Michael Angelo)
(2) Clustering
View is linked to CEL and PM/MM Data views. That is, in the clustering
picture, we can click a data point (the expression value of a gene in a
sample), and go to look at the CEL level
data. This is useful for those who are curious about unusual data points in the
clustering picture (such as large negative expression values), and want to
trace back to the raw data.
1/22/01: Options
added in “Analysis/Export Expression” and “Analysis/Hierarchical Clustering”
dialogs so we can treat expression values identified by the model to be outlier
(i.e. array-outliers in CEL images) as missing values. They are exported as
blank entries in tab-delimited file or shown as black (Blue/Red coloring) or
white (Green/Red coloring) boxes in clustering picture. This is another way of
using measurement error of model-based expression values in down-stream
analysis, besides resampling clustering trees. (suggested by Priya
Sudarsanam)
1/21/01: (1) Add
“View/Export image” menu item, for exporting CEL, PM/MM data or clustering images into BMP file.
(2) In “Analysis/Hierarchical Clustering” dialog, we
may read in a “gene list file” (each line has a probe set name) to cluster a
pre-selected subset of genes (filtering criteria are thus ignored; this file
may be the output file of “Analysis/Filter interesting genes” function). We can
also read in an “array list file” with each line specifying an array to be used
as columns of clustering data matrix. We may also pool duplicate arrays using
measurement-error weighting scheme before clustering.
1/18/01: (1) Add
“Image/Export CEL” menu item. We can use it to export normalized data
into CEL-like file. If you want to export the raw data,
check the “Use unnormalized data” checkbox in
“Analysis/Open group” dialog when opening a group. (suggested by Margaret C. Cam)
(2) When viewing
array images, we can use the four arrow keys to zoom in and out. (suggested by Andy Bhattacharjee)
1/15/01: (1)
Changed “Analysis/Open group” dialog, so we can read in gene name file
(tab-delimited file, the 1st column is Affymetrix probe id, the 2nd
column is gene name/description) and sample name file (the 1st
column is array file name (without .cel or .dat suffix), the 2nd column is sample name).
Such information will be used when exporting results or displaying clustering
trees.
(2) Add
“Analysis/Filter interesting genes” dialog, for filtering genes by fold changes
(or the lower confidence bound of them) of multiple pair-wise comparisons. (suggested by Dan Tang)
1/10/01: In
“Analysis/Hierarchical Clustering” dialog, we may group the samples for the standardization
purpose. That is, instead of standardizing a gene to have mean 0 and standard
deviation 1 across all samples, we standardize its expression values in
samples of the same experiment (using the same cell lines) to have mean 0 and
standard deviation 1. This is because we are interested in the differences
caused by the various treatments, instead of the differences existing among
cell lines. (suggested by Deming Wang)
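Group-wise standardization as described in the 1/10/01 entry can be sketched as follows. The function name and group labels are hypothetical, and the sample (n-1) standard deviation is an assumption; each group needs at least two samples with non-identical values.

```python
from statistics import mean, stdev

def standardize_within_groups(values, groups):
    """Standardize one gene's expression values to mean 0 and SD 1
    separately within each group of samples (e.g. each cell line)."""
    by_group = {}
    for idx, g in enumerate(groups):
        by_group.setdefault(g, []).append(idx)
    out = [0.0] * len(values)
    for idxs in by_group.values():
        sub = [values[i] for i in idxs]
        m, s = mean(sub), stdev(sub)
        for i in idxs:
            out[i] = (values[i] - m) / s
    return out

# One gene in two cell lines with very different baseline levels.
values = [10.0, 20.0, 30.0, 110.0, 120.0, 130.0]
groups = ["A", "A", "A", "B", "B", "B"]
z = standardize_within_groups(values, groups)
print(z)
```

After standardization the baseline difference between the two cell lines disappears and only the within-group treatment pattern remains.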
1/9/01: We can
right-click a non-gene node in the clustering tree to exchange the positions of
its two branches, in order to interactively adjust the ordering of genes in
clustering trees. (suggested by Andy Bhattacharjee)
1/5/01: In
“Analysis/Open group” dialog, we can read dChip (DCP) files, which use the
format used internally by dChip. In this way, we only need to carry dChip files
around along with dChip to demonstrate the downstream analysis.
1/1/01: (1) In
“Analysis/Open group” dialog, we can extract CEL and DAT files at the same time. (suggested by Yan Cui)
(2) “Hierarchical
Clustering” is now two-way.
12/18/00: Add
“Hierarchical Clustering” analysis item and menu.
12/4/00: Take
away the "Look for presence calls in TXT files?" checkbox in the "Open Group" dialog. dChip will always look for TXT files for presence calls; if none are found, it will calculate them in a
way similar to that described in the Affymetrix Analysis Manual (93% agreement with
GeneChip’s calls in one comparison).
(Updated 12/6/07)