dChip: Function Updates (2000-06)


Recent updates

 

12/14/06: For tumor-only LOH inference, when no "Options/Reference genotype file" is specified, the normal samples specified in sample info file as "Ploidy(numeric)" of 2 will be used to estimate SNP heterozygosity and genotype dependence probabilities.

 

11/16/06: At the clustering or chromosome view, use “View/Find sample” to find a particular sample. (suggested by Yohan)

 

9/17/06: At the chromosome view, control+click a data point to append its information to a "data points.txt" file in the working directory. The information includes sample name, chromosome position, and gene names within the surrounding 100 Kb region. This can help to manually identify copy-changed regions from the raw copy number view when the data are noisy or visual inspection works better. 2/8/07: Show or export the nearest gene to a SNP.

 

8/29/06: Faster display update when zooming at the "Analysis/Chromosome" view.

 

7/31/06: At "Open group", there is an option for specifying TXT file suffix (e.g. ".brlmm.txt"). At "View/Export image", "Export all chromosomes individually" can be checked. (suggested by Charlotte Schjerling)

 

7/27/06: Check "Analysis/Open group/Perform 'Analysis/Normalize & MBEI' afterwards" to execute these three steps in sequence. Normalization and MBEI will use the options set at "Open group/Options/Model", and the baseline array will be the default one with median overall intensity. (suggested by Charles Mullighan)

 

6/1/06: Specify consanguineous relationship in a family to reduce pedigree size and speed up linkage analysis.

 

5/29/06: Read SNP CEL files that have no matching TXT genotype files by using a combined genotype file.

 

4/10/06: Combine two sub-arrays at "Open group" without using external data files.

 

12/12/05: To make a copy number summary plot similar to Figure 1B of Zhao et al. 2005, select "Chromosome/Summary Plot" at the inferred copy number view. (suggested by Edward Attiyeh)

 

11/17/05: In the chromosome view, check “Tools/Options/Clustering/Sample names always visible” to always display the sample names and information on the top of the data area. (suggested by Charles Mullighan)

 

11/13/05: Specify “Analysis/Open group/Other information/SNP information file” to provide allele frequency and other SNP information. (suggested by Ann Mullally)

 

11/11/05: Check “Tools/Options/Chromosome/Show only tumor of paired sample” to display only tumor samples in copy number view when paired normal and tumor samples both exist in array list file. (suggested by Peter Ouillette and Changzhong Chen)

 

11/8/05: Process human exon array data and view data along chromosome.

 

11/3/05: Dahia et al. 05 identified a novel locus for familial pheochromocytoma syndrome by integration of two-locus linkage analysis, transcription profiling, and genome-wide SNP-based copy number mapping.

 

11/1/05: Use an array list file to define batches and the “Tools/Adjust batch effect” function to scale (multiply a value) the expression values from batch 2 to the last batch, so that for each gene, the mean of each batch is the same as the mean of the 1st batch. Then one can redo gene filtering and clustering to see if batch effect is gone. If so one may use “Analysis/Compare samples” to pool samples from different batches and do comparison.
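The scaling described above can be sketched for a single gene as follows. This is only an illustration of the stated rule (multiply each later batch so that its mean matches the first batch's mean); the function name and data are hypothetical, and it assumes every batch mean is non-zero.

```python
from statistics import mean

def adjust_batch_effect(values, batches):
    """Scale one gene's values so every batch mean equals the first batch's mean.

    values  -- expression values of a single gene, one per array
    batches -- batch label of each array, in the same order
    (hypothetical helper; assumes no batch has a mean of zero)
    """
    order = list(dict.fromkeys(batches))  # batch labels in order of first appearance
    batch_means = {
        b: mean(v for v, lab in zip(values, batches) if lab == b) for b in order
    }
    reference = batch_means[order[0]]
    # One multiplicative factor per batch; the first batch gets factor 1.0
    factors = {b: reference / batch_means[b] for b in order}
    return [v * factors[lab] for v, lab in zip(values, batches)]

gene = [100.0, 120.0, 200.0, 240.0]
labels = [1, 1, 2, 2]
adjusted = adjust_batch_effect(gene, labels)
print(adjusted)  # [100.0, 120.0, 100.0, 120.0]
```

After such scaling the within-batch pattern is preserved while the between-batch shift in the mean is removed, which is why gene filtering and clustering are worth repeating afterwards.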

 

10/15/05: (1) Select “Analysis/Normalize/Options/Normalization method: Quantile normalization” for quantile normalization (Bolstad et al. 2003, Workman et al. 2002). 6000 matching quantiles from two arrays are used to fit a running median normalization curve. M-A plots are also added in the Normalization Plot. (suggested by Igor Klacansky)

(2) Select “Analysis/Model-based Expression/Options/Method used: Average Difference” to use Average Difference method to compute expression values or signal.

(3) Set "Tools/Options/Chromosome/HMM length" to N to perform linkage analysis for segments of N markers at a time (e.g. 2000 for 100K array where chromosome 1 has more than 9000 markers). (suggested by Annemieke Verkerk)

 

10/13/05: Use “Analysis/Filter SNPs” to select better SNPs using fragment length or No Call rate to use in the downstream analysis. (suggested by Zhigang Wang)

 

10/12/05: “Image/Normalization Plot” is now implemented within dChip without calling R. The plots can also be viewed during normalization by checking “Analysis/Normalize/View normalization plot”.

 

10/7/05: Add "Pathway drawing and analysis".

 

10/6/05: Add the manual page “Allele sharing analysis” for SNP array, including Application in graft-versus-host disease, Non-parametric linkage analysis and Permute to find significant loci.

 

10/5/05: Add functions to analyze 500K SNP array.

 

10/3/05: (1) Set “Options/Chromosome/Inferred copy method” to be “Median smoothing” and set a SNP marker window size (e.g. 10) to median smooth raw copy numbers as the inferred copy number.

(2) To infer the LOH status of non-informative LOH calls from paired normal/tumor LOH analysis, the method “Options/Inferred LOH method/Same boundary” can be used in addition to the HMM method.

(3) In the clustering or chromosome view, set “Options/Clustering/Number of letters shown for sample information” to be greater than 1 to display 1 or more letters above samples. (suggested by Charles Mullighan)
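As an illustration of the “Median smoothing” option in item (1), the sketch below replaces each raw copy number by the median of the values within a few markers on either side. How dChip treats chromosome ends and even window sizes is not stated here, so the truncated edge windows and the `window // 2` half-width are assumptions.

```python
from statistics import median

def median_smooth(raw_copy, window=10):
    """Running-median smoothing of raw copy numbers along a chromosome.

    Each SNP gets the median of the raw values lying within window // 2
    markers on either side; windows are truncated at the chromosome ends
    (an assumption, not necessarily what dChip does).
    """
    half = window // 2
    smoothed = []
    for i in range(len(raw_copy)):
        lo = max(0, i - half)
        hi = min(len(raw_copy), i + half + 1)
        smoothed.append(median(raw_copy[lo:hi]))
    return smoothed

raw = [2.0, 2.1, 5.0, 1.9, 2.0, 3.9, 4.1, 4.0, 0.5, 4.2]
smoothed = median_smooth(raw, window=3)
print(smoothed)
```

In this toy series the isolated spikes (5.0 and 0.5) are suppressed, while the sustained shift from about 2 to about 4 copies is kept.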

 

10/2/05: (1) In the SNP copy number analysis, when normal samples are not available or are too few, set “Options/Chromosome/% of samples trimmed” to be > 0 to obtain the reference signal distribution without using the information of which samples are normal.

(2) Set “Options/Chromosome/HMM length” to N to perform HMM inference of LOH and copy number for a stretch of at most N SNP markers at a time. This can increase the speed for SNP arrays with density > 100K, where chromosome 1 has > 9K markers; for example, one can set “HMM length” to 1000.
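The effect of “HMM length” can be pictured as cutting the markers of a chromosome into consecutive stretches. The sketch below assumes simple non-overlapping stretches; whether dChip overlaps or otherwise joins neighbouring stretches is not stated here, and the marker count is made up.

```python
def split_into_segments(n_markers, hmm_length):
    """Yield (start, end) index pairs covering the markers in stretches of at most hmm_length."""
    for start in range(0, n_markers, hmm_length):
        yield start, min(start + hmm_length, n_markers)

# Hypothetical chromosome with 9200 markers, processed 1000 markers at a time
segments = list(split_into_segments(9200, 1000))
print(len(segments), segments[-1])  # 10 (9000, 9200)
```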

 

10/1/05: (1) In the SNP copy number analysis, check “Options/Chromosome/Use paired normal as reference” to use the signal of the paired normal to obtain the raw copy numbers of tumor samples, as opposed to using the average signal of all normal samples. (suggested by Peter Ouillette)

(2) Use the "Tools/Percentile filtering" function to select genes by their fold change between a high and a low percentile across samples. (suggested by Wing Wong)

(3) At the gene clustering view, “Clustering/Export Same Gene” can export the probe sets belonging to the same gene as the selected probe sets (click the area between the clustering tree and blue-red image to select a single probe set). The exported gene list can be viewed by “Analysis/Clustering”. (suggested by Wing Wong)
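The percentile filtering of item (2) can be sketched as below: a gene passes when the ratio of a high percentile to a low percentile of its values across samples reaches a chosen fold change. The percentile definition (linear interpolation), the default percentiles, and the threshold are assumptions for illustration, not dChip's documented settings.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (0 <= p <= 100) of an ascending list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1 - frac) + sorted_vals[upper] * frac

def passes_percentile_filter(values, low_p=10, high_p=90, min_fold=2.0):
    """True if the high/low percentile ratio across samples is at least min_fold."""
    vals = sorted(values)
    low, high = percentile(vals, low_p), percentile(vals, high_p)
    return low > 0 and high / low >= min_fold

variable_gene = [50.0, 60.0, 70.0, 200.0, 400.0]
flat_gene = [100.0, 100.0, 100.0, 100.0, 100.0]
print(passes_percentile_filter(variable_gene))  # True
print(passes_percentile_filter(flat_gene))      # False
```

Using percentiles rather than the minimum and maximum makes the fold change less sensitive to a single extreme sample.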

 

9/29/05: Cheng and Peter Ouillette successfully troubleshot a dChip usage problem by remotely accessing Peter’s PC. This first try suggests that remote access is an efficient way to solve elusive problems.

 

9/28/05: Start to put the dChip manual into a WikiBook, so anyone can edit it or post discussions. (3/13/06: this effort has stopped)

 

9/23/05: At “Analysis/Genome”, report in the analysis view the gene names in the significant stretches and the information of multiple comparison, such as “24 significant stretches found at 0.05 level from 2494 p-value assessments”. (suggested by Pawel Michalak and Wei Zhao)

 

9/21/05: Added common probe set file for HG-U95Av2 vs. HG-U133_plus_2 array.

 

9/15/05: “Tools/Make information file” may report the error of “NumTerm == MaxTerm at category 'Gene Ontology'”, which is due to more GO terms than before in the latest GO structure files. Update to the latest dChip to correct it. (reported by Xueqing Zhang and Patrick Loerch)

 

9/13/05: “Tools/Options/Model” adds an option to truncate negative PM/MM differences to 0 before modeling in the “PM/MM difference model”. By default it is checked to compute all-positive expression values; uncheck it to use the method as before.

 

9/12/05: “Google site search” function is added to the dChip main page and manual page.

 

5/26/05: (1) Use the Affy Files parsers SDK to read in binary CEL files, so it is no longer necessary to convert binary CEL files to text CEL files for dChip to read.

(2) Uncheck "Analysis/Open group/Options/Load probe data in memory" to not load probe data so that a large dataset containing many arrays or large array types (e.g. 100K SNP array) can be loaded faster. Then do normalization and model-based expression computation the same way as before. However, the CEL image and PM/MM data views are not available, since they use probe-level data.

 

5/5/05: Check “Tools/Export expression value/Append to this file” to append the output data to an existing data file. This is useful for combining the data of sub-arrays. (suggested by Changzhong Chen)

 

3/29/05: Update to handle 3/23/05 Affymetrix annotation CSV files. In the “HG-U133_Plus_2_annot.csv” file, “LocusLink” is changed to “Entrez Gene” in the header line and "Chr:" is changed to "chr" in the “Chromosomal Location” column. Also handle the tab in the "Gene Title" column (e.g. 1425167_a_at in Mouse430_2_annot.csv) to generate correctly formatted gene info file. (suggested by Andrea Richardson)

 

2/23/05: “Analysis/Open group/Other information/Probe set mask file” can accept individual probes to mask them out from CDF file. 4/15/05: Corrected the bug that eliminates both probe 4 and 14 for a string "14,".  (suggested by Igor Leykin and Bin Yao)
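One plausible way a bug like the 4/15/05 one arises is matching probe numbers by substring instead of by whole token. The sketch below reproduces that behaviour for the string "14,"; the field layout and both function names are hypothetical, since the mask file format is not described here.

```python
def masked_probes_buggy(mask_field, n_probes):
    # Substring test: "4," occurs inside "14,", so probe 4 is masked by mistake
    return [p for p in range(1, n_probes + 1) if (str(p) + ",") in mask_field]

def masked_probes_fixed(mask_field, n_probes):
    # Split on commas and compare whole numbers instead of substrings
    wanted = {int(tok) for tok in mask_field.split(",") if tok.strip()}
    return [p for p in range(1, n_probes + 1) if p in wanted]

print(masked_probes_buggy("14,", 16))  # [4, 14]
print(masked_probes_fixed("14,", 16))  # [14]
```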

 

9/14/04: At “Analysis/Hierarchical clustering/Options/Standardize rows”, one can select a sample (default is mean) to be subtracted during standardization. This is useful when a known baseline is desired to be displayed as white for all genes, and other samples display relative up-regulation (red) or down-regulation (blue). (suggested by Changzhong Chen)
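A minimal sketch of this option, assuming that standardization means subtracting a center and dividing by the row's standard deviation, and that choosing a sample only changes the center. Both points are assumptions; the function name is hypothetical.

```python
from statistics import mean, stdev

def standardize_row(values, baseline_index=None):
    """Standardize one gene's values across samples.

    With baseline_index=None the row mean is subtracted (the default);
    otherwise the value of the chosen baseline sample is subtracted, so
    that sample maps to 0 (displayed as white) for every gene.
    """
    center = mean(values) if baseline_index is None else values[baseline_index]
    sd = stdev(values)
    return [(v - center) / sd for v in values]

row = [2.0, 4.0, 6.0]
print(standardize_row(row))                    # [-1.0, 0.0, 1.0]
print(standardize_row(row, baseline_index=0))  # [0.0, 1.0, 2.0]
```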

 

8/28/04: Use Affy C++ source code (Files parsers SDK) to read binary CDF files. (suggested by Lucy He)

 

4/13/04: “Tools/Print settings” will print out the current settings and parameters. Then the whole analysis log can be saved using “Analysis/Save log”. (suggested by Bill Sellers) [11/11/05: moved this function to the “Tools/Options/Print Settings” button.]

 

4/4/04: Update “Tools/Make Information file” to handle 4/2/04 NetAffx annotation and ortholog CSV files, which have a slightly different format than previous CSV files.

 

3/19/04: Update “Tools/Make Information file” to convert current NetAffx ortholog CSV files to dChip common probe set files. This is useful for combining expression data across species. (requested by Enrique Millan)

 

12/26/03: (1) One may add a numerical column in “sample information file”. The column header needs to contain “(numeric)”, for example, “Age(numeric)”. Such a continuous variable will be standardized and displayed in the clustering picture. (2) “R view/ANOVA filtering” and “Clustering/Similar profile” are merged into “Analysis/Analysis of Variance”.

 

10/22/03: Update to handle Oct. 2003 or later NetAffx CSV files when making gene info files.


10/16/03: You may use this Affy CEL file converting tool to convert a new binary-format CEL file to the old text-format CEL file so that dChip can read it.

8/9/03: [New] In the PM/MM data view, select menu “Data/Show All Array” to view the probe patterns of the current probe set in all arrays specified by the array list file. Such probe-level data and patterns are very useful for confirming the properness of computed gene expression values and changes. (suggested by Yu Guo)

7/30/03: Two software packages developed by the Wong lab, GoSurfer and Tight Clustering, can be called from dChip at the “Tools” menu. (suggested by Wing Wong. GoSurfer developed by Sheng Zhong. Tight Clustering developed by George C. Tseng)

7/8/03: “Tools/Make information file”: the maximal number of protein domain terms allowed is increased to 4000, so that the NetAffx June 2003 annotation file for HG_U133A can be processed (reported by Mike Wang).

7/8/03: [6/27/03 bug fixed] RG_U34A array has probe set names containing “#” (e.g. L00981mRNA#2_at). Anything in a line after “#” is interpreted as a comment by the R “read.table()” function used by the “Get expression” button in the R view, and this caused the “Get expression” function to fail. Now the “comment.char = ""” parameter is added to the read.table() call. (reported by Charlotte Schjerling)

7/3/03: [Bug fixed] When “Analysis/Open group/Options/Mask redundant probe sets” or “Omit Affymetrix control probe sets” is checked, the annotation information in the gene information file for redundant or Affy probe sets was not used at all. However we want to read the information for these probe sets, but not count them in the gene numbers associated with annotation terms to avoid biasing the “significant gene clusters” or “Tools/Classify genes” functions. This is corrected now. (reported by Susan G. Hilsenbeck)

7/3/03: Use dates in the version number (e.g. Version 1.3 Test (7/3/03)) to better keep track of dChip updates. (suggested by Charlotte Schjerling)

6/27/03: The expression data object obtained by “Get Expression” in the R view now has probe set names as row names. (suggested by Leo Schalkwyk)

6/25/03: [Bug fixed] A bug introduced on 6/13/03 switched the PM and MM row in the Grid (2, 1) of PM/MM Data view. (reported by Jorg D. Becker)

 

6/12/03: [Bug corrected] 5/21/03+ versions regard probe set names starting “AF” (“AFFX” had been used before) as Affy control probe sets. However, when “Tools/Options/Analysis/Omit Affy probe sets” is checked, some probe sets in RG_U34A array are wrongly regarded as control probe sets (e.g. AF007107_s_at). Now “AFFX” is used again. (reported by Charlotte Schjerling)
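The difference between the two prefixes is easy to see in a small sketch. The helper name is hypothetical, and "AFFX-BioB-5_at" is used only as an example of a control probe set name.

```python
def is_control_probe_set(name, prefix="AFFX"):
    """Treat a probe set as an Affy control if its name starts with the prefix."""
    return name.startswith(prefix)

names = ["AFFX-BioB-5_at", "AF007107_s_at", "L00981mRNA#2_at"]

# With the too-short prefix "AF", the ordinary probe set AF007107_s_at is caught as well
print([n for n in names if is_control_probe_set(n, prefix="AF")])
# ['AFFX-BioB-5_at', 'AF007107_s_at']

# With "AFFX", only the control probe set matches
print([n for n in names if is_control_probe_set(n)])
# ['AFFX-BioB-5_at']
```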

 

5/28/03: [New feature] At “Analysis/Filter genes”, one can choose “Standard deviation” in criterion 1 for variation filtering. This is useful when the data is log transformed (by checking “Analysis/Open group/Options/Log transform” or reading in log-scaled data at “Analysis/Get external data”), and standard deviation is preferred over CV (coefficient of variation, or standard deviation / mean) due to variance stabilization property of the log transformation. (suggested by Wing Wong)
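A small numeric sketch of why standard deviation is the more natural criterion on log-scaled data: two hypothetical genes with the same two-fold changes, one at low and one at high intensity, have the same standard deviation after log transformation, while their CVs differ only because their means differ. The data are made up for illustration.

```python
import math
from statistics import mean, stdev

low = [100.0, 200.0, 100.0, 200.0]       # two-fold changes at low intensity
high = [1000.0, 2000.0, 1000.0, 2000.0]  # the same fold changes at high intensity

log_low = [math.log2(v) for v in low]
log_high = [math.log2(v) for v in high]

sd_low, sd_high = stdev(log_low), stdev(log_high)
cv_low, cv_high = sd_low / mean(log_low), sd_high / mean(log_high)

print(round(sd_low, 3), round(sd_high, 3))  # equal: same fold change, same SD on the log scale
print(round(cv_low, 3), round(cv_high, 3))  # unequal: CV shrinks as the log mean grows
```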

 

5/22/03: [New feature] Check “Analysis/Open group/Options/Ignore probe sets” to not use probe set information in the CDF file. This is useful when there are too many probes in a probe set, and our main interest is only normalizing arrays. (suggested by Todd Mockler)

 

5/21/03: [New feature] Assess empirical false discovery rate at “Compare samples”.

 

5/21/03: [Change] For the unpaired t-test of “Analysis/Compare samples”, the degree of freedom was group.size1 + group.size2 – 1 (instead of –2 to accommodate 1 vs. 1 comparison, so it’s ok when n1=n2=1 and standard error of a single expression value is model-based). Now the Welch correction for d.f. is used. The comparison result files have header line “[COMPARE_CRITERIA_V2]” to indicate these changes.
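For reference, the standard Welch-Satterthwaite degrees of freedom can be computed as in the sketch below. It uses sample variances and therefore needs at least two values per group; how dChip combines this with model-based standard errors (for example in a 1 vs. 1 comparison) is not covered here.

```python
import math
from statistics import variance

def welch_df(group1, group2):
    """Welch-Satterthwaite approximate degrees of freedom for an unpaired t-test."""
    n1, n2 = len(group1), len(group2)
    v1 = variance(group1) / n1  # squared standard error of the mean, group 1
    v2 = variance(group2) / n2  # squared standard error of the mean, group 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

equal_var = welch_df([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
unequal_var = welch_df([1.0, 2.0, 3.0], [0.0, 10.0, 20.0])
print(equal_var, unequal_var)
```

With equal group sizes and equal variances the result coincides with n1 + n2 - 2; when one group is much more variable, the degrees of freedom drop toward the size of that group minus one.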

 

5/19/03: [Bug corrected] Extra tabs were added to the end of the array list file name and filter input file name in the INI file. The problem may arise when a comparison criterion file has been saved in Excel, which adds extra tabs to rows with only 1 column; dChip later reads in the comparison criterion file and uses the array list file and filter input file names in it. (reported by Keith Crist; 6/11/03: similar problem reported and corrected for “lda result file”)

5/17/03: [New feature] In “Image/Normalization plot”, any array can be selected as the baseline array. Thus the scatterplot of probe values between any two arrays can be visualized.

5/8/03: [New and change; updates from now on may not be reflected in the manual at the same time] 1) Euclidean distance can be used for clustering (set at “Tools/Options/Clustering”), which may be more reasonable for sample clustering using gene-wise standardized values. 2) By default “Tools/Options/Clustering/Add new color for Control+Click” is not checked, and new “Control+Click” clusters will be in light blue to distinguish them from the current “Click” cluster in blue. 3) If a single gene is selected in the clustering picture, the value range for the gene is displayed, but the display range starts from 0 so that relative fold change can be visualized. 4) At "Cluster/Selected branch/Export data", if "Output all colored gene branches" is checked, all the colored branches (selected by Control+Click and Click) will be exported. Note that the samples exported are all the colored ones, and the gene clusters exported are in the order of clicking, not the visualized order. (suggested by Wing Wong)

5/8/03: [Withdrawn] “Analysis/Normalization” starts the “Invariant Set” selection from a list of “Stable probe sets”.

4/20/03: [New feature] If “Clustering/Selected branch/Export data/Cut the tree at the height of current branch and export all branches” is checked, one may export gene expression data grouped in clusters. These clusters are obtained by cutting the gene clustering tree at the height of the selected blue branch. In previous versions, multiple colored (by using Control+Click) gene clusters were exported. (suggested by Bin Zhang)

4/17/03: [New feature] In “Image/Normalization plot”, one can also plot the normalized values on the X-axis by checking the option “Use normalized values to view the result after normalization”. (suggested by Wenhong Fan)

 

4/12/03: [New feature] In “Tools/Gene list file/By Annotation”, one can select multiple terms to get the union or intersection of the genes belonging to these categories, and apply the “Filter genes” function immediately using this gene list as the input gene list. (suggested by Wing Wong)

 

4/12/03: [Change] “View/Go to GenBank; Go to UniGene; Go to LocusLink; Go to NetAffx” are combined into “View/Online Database”. Check the “GenBank” option to link to the “UniGene” database. Withdraw the “probe number” option from “View/Find Gene”.

 

4/9/03: [Bug corrected] When “Tools/Options/Analysis/Do not read array list file” is checked, the individual arrays are not correctly treated as individual samples, and this causes no sample names in “Compare samples” dialog and incorrect “Filter genes” results. 7/7/03: a 4/9/03 bug is corrected: checking “Tools/Options/Analysis/Open group/Do not read array list file” made array list files not used both at “Open group” and after “Open group” (reported by Sabina Chiaretti and Mark).

 

4/7/03: [Bug corrected] When the lines of a “Comparison gene list file” are long, “Tools/Classify genes” may cause a crash. (reported by Shahab Asgharzadeh)

 

4/5/03: [New feature] One may use “Tools/Make information file” to generate dChip information files based on the quarterly updated NetAffx annotation files. Also see the ChipInfo software for broader applications of this effort.

 

3/17/03: [New feature] In hierarchical clustering, the “Average linkage” method can be specified at “Tools/Options/Clustering”. Previously the only linkage method was centroid linkage. (suggested by Casper Frederiksen and Jean-philippe Brunet)

 

3/16/03: [New feature] Check the “Copy to clipboard” option in “View/Export image” to copy the image to the clipboard in BMP or EMF format. (suggested by Yu Guo)

 

3/16/03: [Change] In “Analysis/Compare samples”, the presence call % criterion can be specified for the baseline and the experiment group separately. (suggested by Tao Lu)

 

3/16/03: [Change] The compare criteria are saved in the beginning of the “Compare result file” if “Analysis/Compare samples/Combine comparisons/Output comparison criteria” is checked, instead of in a separate “Compare criterion file”.

3/6/03: [Bug corrected] A file name like “D:\chip.data\all.dchip\file.CEL” led dChip to extract the file as a DAT format file since “.dat” is found in the file name (reported by Xueqing Zhang).
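The failure mode can be reproduced with a substring test on the whole path, and avoided by looking only at the final extension. Both function names are hypothetical; `ntpath` is used so that the Windows-style path is parsed the same way on any platform.

```python
import ntpath

def looks_like_dat_buggy(path):
    # Substring test: matches ".dat" anywhere, including inside the folder name "chip.data"
    return ".dat" in path.lower()

def looks_like_dat_fixed(path):
    # Compare only the extension of the last path component
    return ntpath.splitext(path)[1].lower() == ".dat"

path = r"D:\chip.data\all.dchip\file.CEL"
print(looks_like_dat_buggy(path))  # True
print(looks_like_dat_fixed(path))  # False
```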

3/6/03: [New feature] The outputs of “Analysis/Normalize” contain the median probe intensity before and after normalization (suggested by Eric Libby).

3/3/03: [New feature] Read in CEL file of PM-only array (suggested and testing data by Jeremy Erickson).

3/1/03: [Bug corrected] The “LDA classification view” sometimes did not update the screen image correctly between separate “Analysis/LDA Classification” calls.

2/26/03: [Bug corrected] An ending "\" or internal “\\” in the "Tools/Options/Analysis/Working directory" string (e.g. “C:\array\\other\lung\lung_loh\”) prevented the file names in some dialogs such as “View/Export image” from being clicked and changed. To change such existing path names, either edit the configuration (*.ini) files in the same directory as dchip.exe, or apply “Tools/Options/Reset default” (reported by Yu Guo).

2/5/03: [New feature] Check “Tools/Export data/Expression value/Include header information” to include information such as the modeling method and baseline array in the exported expression data files (suggested by Victoria Perreau).

1/31/03: [Bug corrected] dChip reverses signs of theta’s and phi’s during model fitting to ensure most theta’s and phi’s are positive. In PM-only model, counting zero theta values as negative caused the possibility of having negative expression values (reported by Tanya Logvinenko).

1/28/03: [Bug corrected] 1. When the number of arrays is small (e.g. 2), the gene names were not displayed in the Clustering View. 2. The “View/Export image” dialog sometimes didn’t show up; also see note 2/26/03 (reported by Yu Guo).

12/30/02: [Withdrawn feature] The “Analysis/Print” function does not work properly and is withdrawn. One can use “Analysis/Save” to first save the contents of the Analysis View into a Word file and then print. (reported by Susan G. Hilsenbeck)

12/19/02: [Bug corrected] “Image/Unscrambled” (renamed to “Image/Probe Together”) caused various problems when computing expression values in the “unscrambled” mode (e.g. the background values used for PM-only model didn’t correctly consider the unscrambling effect). Now this function is re-implemented with a new mechanism. (reported by Reinhard Hoffmann, Thomas Seidl, Laurent Gautier and James MacDonald)

12/11/02: [Bug corrected] DAT files of arrays with large dimension (e.g. ATH1 with 712^2 probes) were not read correctly at right and bottom margins. (reported by Susan J. Miller)

12/10/02: [New feature] “Tools/Options/Model/Exclude x 5’ probes” to always call the x most 5’ probes in a probe set as probe outlier, thus not use them in the model-based expression values. This is useful when there is known mRNA degradation in the sample and 5’ signals are not reliable, or when small samples are amplified using 2-round IVT protocols and 5’ probes tend to have amplification biases. (suggested by Edward Fox and Christine Konradi)

12/9/02: [Change] The “Analysis/Model-based expression/Log x transform expression values” option is moved to “Tools/Options/Analysis”. This way the DCP files always store original expression values, and a user can choose to log transform the expression values at the “Analysis/Open group” step. (A bug – always log-transforming after “Open group” – was introduced here and is corrected on 12/30/02; reported by Philippe Guardiola and Susan G. Hilsenbeck) (A bug – storing log-transformed expression values into DCP files after “Model-based expression” – was introduced here and is corrected on 1/16/03; reported by James MacDonald)

12/6/02: [New feature] Check “Tools/Options/Model/Do not call all replicate arrays as array outlier" and then specify an array list file with replicate separators to discard array outlier calls that occur in all replicates of a tissue type, since such a consistent pattern reflects a real biological effect. (suggested by Joerg D. Becker)

11/28/02: [New feature] “Image/Scale CEL value” can be used to scale (multiply a constant value) the unnormalized CEL values in an array so that the median intensity is a particular value. This is useful for normalizing different tissue types. (suggested by Joerg D. Becker)
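The scaling itself is a single multiplication, sketched below with hypothetical names and values: every unnormalized CEL value in an array is multiplied by the constant that moves the array's median to the requested value.

```python
from statistics import median

def scale_to_median(cel_values, target_median):
    """Multiply all values by one constant so that their median equals target_median."""
    factor = target_median / median(cel_values)
    return [v * factor for v in cel_values]

scaled = scale_to_median([100.0, 200.0, 400.0], target_median=1000.0)
print(scaled)  # [500.0, 1000.0, 2000.0]
```

Because only one constant is applied per array, ratios between probes within the array are unchanged, unlike a curve-based normalization.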

 

11/19/02: [Bug corrected] Sample information file with more than 20 columns caused dChip to crash. Now the limit is increased to 40 columns and the boundary check for this value is added (reported by Tao Shi).

 

11/16/02: [V1.3+] “Analysis/Chromosome” function can display the expression data of a list of genes along chromosomes. (suggested by Stanley F. Nelson and Robert Gentleman)

 

11/15/02: dChip 1.2 released. See below for function updates.

 

10/31/02: [New feature] Display the relative probe position information in the “Data View”.

 

10/31/02: [Change] “Analysis/Map chromosome” renamed to “Analysis/Genome”.

 

10/31/02: [Change] “Tools/Options/Model/Perform outlier detection” is split into “Check array outlier”, “Check single outlier” and “Check probe outlier”. One may change these options to perform or not perform a particular outlier detection. 

 

10/30/02: [Change] When a TXT file (containing A/P calls) has any probe set not in the CDF file, “Analysis/Open group” will ignore the TXT file and compute its own A/P calls. One can disable this feature by checking “Tools/Options/Analysis/Allow TXT files to contain probe sets not in the CDF file”, so that dChip only ignores the unknown probe sets (e.g. those masked by “Probe set mask file”). (suggested by Igor Klacansky)

 

10/27/02: [New feature] The “Image/Normalization Plot/Use smoothing spline to normalize and save result to DCP file” option allows for using a smoothing spline to fit the normalization curve to the points in the “Invariant Set”. (suggested by Xinmin Zhang)

 

10/12/02: [New feature; 05/03: withdrawn] “Analysis/Normalization” can start the “Invariant Set” selection from a list of “Stable probe sets”. For example, one can perform “Analysis/Model-based expression” without normalization and then use “Analysis/Filter genes” with only criterion (2) and a 100% threshold to obtain genes called “Present” in all samples. Then use this gene list as “Analysis/Normalization/Stable probe sets” to normalize arrays and re-compute expression values. Unchecking “Apply ‘Invariant Set’ selection…” will use all probes of the “Stable probe set” for normalization. It is also good to use “Image/Normalization plot” (see below) to check the validity of the normalization.

 

9/19/02: [Bug corrected] Source code bug in partial_sort() function corrected. This bug may affect the median or percentile computation such as those used in the outlier detection procedures (reported by Ming Lin).

 

[New feature] Gene Filtering by ANOVA through the “R View” (see Interface with R software for necessary setup procedures to use the function) (suggested by Frank Buxton, Susan Hilsenbeck and Dona Wu).

 

[New feature] Look for EXP files with the same name as CEL file for “Description”, and if available use it as ChipName (also can be supplied by “sample information file”). Check “Tools/Options/Analysis/Use ‘Description’ in EXP file as array name” to enable this option.


[New feature] Report the file format number of DCP and CDF.BIN files during “Open group”. dChip 1.1 and 1.2 use format 3; dChip 1.0 uses format 2; dChip beta test versions used formats 0 and 1. (suggested by Susan Hilsenbeck)

 

[New feature] Read in the MAS5 “Signal” from MAS5 analysis result file by checking “Open group/Read in MAS5 Signal” (suggested by Greer M. Murphy and Song Her).

 

[New feature] Check “Tools/Options/Analysis/Search and save DCP file in the Working directory” to store DCP files into different places than CEL files. This way we may perform different analysis (e.g. normalization using different baseline array, MBEI with log transformation) and store the results into DCP files under different directories while maintaining the single copy of CEL files. (suggested by Anne Bowcock, Victoria Perreau and Susan Hilsenbeck)

 

[New feature] Take logs to base 10 or other bases at “Analysis/Model-based expression” (suggested by Casper M. Frederiksen).

 

[New feature] “Logged” indicator at the lower-right corner to distinguish log-transformed expression indexes. (suggested by Susan G. Hilsenbeck)

 

[New feature] The “Select sample by category” button in the “Compare samples” dialog has “Use inversion” button for selecting samples not having a particular property.

 

[New feature] In the “Chromosome View”, use “View/Find gene” to find a specific gene in the highlighted set (suggested by Isabella Tai).

 

[New feature] Add “Windows Enhanced Metafile (*.emf)” image format at “View/Export image”. The file is in vector format and can be enlarged without losing resolution. It can be inserted in Word or PowerPoint files by “Insert/Picture/From file” or converted to EPS format by Adobe Illustrator.

 

[New feature] “Image/Export CEL” dialog has an option “Export probe set name, probe pair order and PM/MM indication”, which will add additional data columns correlating a probe cell to its corresponding probe sets (suggested by Yuval Kluger)

 

[Change] Simplify the menu items “View/PM/MM Data”, “View/CEL Image”, etc. to a single “View/Next view”. One can also use the “Enter” or “Shift+Enter” key to switch to other views.

 

[Change] In “Compare sample result” files the number of decimal digits is reduced to 2 for easier reading. (suggested by Feng Wu)

 

[Change] “Image/Export CEL” can export all arrays into CEL files at once.

 

[Change] “Image/Normalization plot” will use the chosen baseline if the array has been normalized; otherwise use the default baseline with median overall intensity (suggested by Tiago Duarte)

 

[Change] Require that the MAS text file has the “Signal” column as well as the “Detection” column.

 

[Change] The “Analysis/Hierarchical clustering/Only draw lines for standard separator” checkbox is moved to “Tools/Array list file”. (suggested by Robert Gentleman)

 

[Withdrawn feature] V1.2 cannot convert CDF.BIN file and DCP file in the old format to the current format (file format 3). Use dChip v1.1 to do this.

 

[Bug corrected] “Tools/Options/Analysis/Mask redundant probe sets…” and “Tools/Options/Analysis/Omit Affy control probe sets…” took effect at reading gene lists or filtering genes, but not at “Open group” where the gene information file is read in. This led to artificially significant functional groups. For example, with “Mask redundant probe sets” checked, “Tools/Classify genes” on all the probe sets in the HG_U95AV2 chip resulted in “Found 544 GeneOntology ‘cell fraction’ genes in a 5933-group (all: 703/8100, PValue: 0.004912) ***”; this is because the relative size of some functional groups had been increased by removing duplicate probe sets from the list. This bug is now corrected; note that after changing these two options one needs to do “Analysis/Open group” to re-compute the number of used probe sets for each GO function (reported by Kieran Holland).

 

[Bug corrected] When using “Analysis/Save” to save the analysis results into a Word file, dChip ignored the user-specified file name (reported by Anna Tsimelzon).

 

[Bug corrected] “Tools/Export data/Expression value” exports GCT file (format 1.2) that can correctly work with GeneCluster 2.0.

 

5/8/02: The dChip console (command-line) version continuously executes the normalization and model-based expression steps to generate a tab-delimited text file containing the expression values. Source code is available for using dChip on other platforms. (suggested by Casper Frederiksen and Allen Day)

 

4/28/02: dChip version 1.1 (Suggested and helped by: Jianhua Hu, Edward J. Oakeley, Simon M Lin, Allen Fienberg, Sanjay Jain, Chunfa Jie, Greer Murphy, Ken Aldape, Tiago Duarte, Tiago R Magalhaes, Igor Zwir, Dale Muzzey, Mayetri Gupta):

 

New:

· PM-only model results in always-positive expression indexes. Specify different methods through the “Analysis/Model-based expression/Options” dialog or use the menu “Data/Next model” to switch between models.

· Handles Human U133 chip via file format change (V1.1 will upgrade the DCP or CDF.BIN files generated by V1.0)

· New PSI file format; specify PSI file through the “Analysis/Model-based expression/Options” dialog; use “Tools/Export data/Probe sensitivity index” to export phi values and their standard errors in text format.

· The input gene list in the “Analysis/Filter genes; Compare samples” dialog can be used to exclude these genes from filtering or comparison. Click the “Filter on” or “Compare on” file buttons to switch the mode.

· The context-specific links to online manual in various dialogs.

 

Clustering:

· “Tools/Options/Clustering/Standardize rows” option. One may choose not to standardize a gene’s expression value across samples when the scale of the data is already adjusted.

· “Clustering/Selected branch/Export data” has the option to export gene-wise standardized values.

· “Tools/Options/Clustering/Distance” has the option to use 1 - |r| as distance measure, where r is the Pearson’s correlation.

· “Clustering/Similar profile” can search genes with high positive or negative correlations with the current gene or the selected gene branch. When “Standardize separators” are present, check “Analysis/Hierarchical clustering/Only draw lines…” to make this function work properly.
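
The 1 - |r| distance option above treats strongly anti-correlated genes as close, unlike the usual 1 - r. A minimal Python sketch of the measure follows; the function names are illustrative and this is not dChip's own code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    return cov / math.sqrt(var_x * var_y)  # undefined for a constant profile


def abs_corr_distance(x, y):
    """Distance 1 - |r|: anti-correlated profiles also count as close."""
    return 1.0 - abs(pearson_r(x, y))


if __name__ == "__main__":
    rising, falling = [1, 2, 3], [6, 4, 2]
    print(abs_corr_distance(rising, falling))   # perfectly anti-correlated
```

With 1 - r the two profiles in the usage example would be at the maximum distance of 2; with 1 - |r| they are at distance 0, so genes with mirrored patterns cluster together.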

 

Changes:

· New outlier detection algorithm handles the image contaminations more reasonably.

· The menu item “Tools/Reset default settings” is replaced by the “Tools/Options/Reset default” button.

· The “Analysis/Filter genes; Compare samples” function by default ignores Affy’s control genes (probe set names starting with “AFFX-”), since their changes are generally not interesting. Use “Tools/Options/Analysis/Omit Affymetrix control…” to change this setting.

· “Analysis/Model-based expression/Export” function moved to “Tools/Export data/Expression value”.

· “Data/Export probe set” function moved to “Tools/Export data/Probe set”.

· In the “Analysis/Compare samples/Combine comparisons” dialog, the “Insert complement” button is changed to the “And not” and “Or not” options. Thus a single comparison can be negated.

· The “Tools/Classify samples” function copies all columns of the “gene list file” into the output “classified file”, so the output file can have expression values or fold changes.

· After “Analysis/Get external data”, “Analysis/Normalize” uses the Invariant Set Normalization method (V1.0 used a simplified ISN method with a fixed rank difference threshold of 50 and no iteration). Check the “Show scatter-plot…” option to show the normalization scatter-plot (installation of R needed) when normalizing. Also, when fitting the running median curve at the two tails, 5% of the “invariant” points are used to fit a ray with one end fixed (V1.0 used 1/300 of the “invariant” points); this makes the high-end normalization relationship smoother and more robust.

· The “Analysis/Map chromosome” function only checks gene stretches of length < 20 for significant p-values. Previously all gene stretches were checked.

 

Withdrawn:

· The option “Analysis/Model-based expression/Use average difference instead of MBEI” is gone. Affy’s MAS 5.0 software adopts “Signal” as expression index.

 

Bug corrected:

· After “Analysis/Get external data”, “Analysis/Map chromosome” and “Analysis/LDA Classification” did not show the result images.

· In the “Analysis/View” some letters cannot be input, such as “A” or “M”. This is due to the shortcut keys for menu “Data/Animate” or “Data/Next model”. Now these shortcut keys are changed to “Control+A” or “Control+M”.

 

4/4/02: [Version 1.1 Test only] PM-only model results in always-positive expression indexes. New outlier detection algorithm handles the image contaminations more reasonably.

 

3/13/02: E. coli gene information and genome information files available (suggested by Igor Zwir). In the “Map chromosome” function, ignore the “MAX_STRETCH limit is reached” message and uncheck “Tools/Options/Chromosome/Outline significant…” to turn off the p-value highlighting, since there is only one chromosome, which may result in too many significant p-values.

3/12/02: HG-U133 gene information files available (A, B, unzip all files to the same directory; helped by Siming Shou and Miguel Rea).

2/1/02: Use “View/Go to NetAffx” to go to the NetAffx website for the current probe set. (suggested by Victoria Perreau)

1/31/02: Linking to online resources such as “View/Go to LocusLink” may not work on some computers. Check "Tools/Options/Analysis/Show online link dialog" to show a dialog containing the web address and automatically copy the address to the clipboard; one can then manually paste it into the address bar of an Internet browser. (reported by Susan Hilsenbeck, Casper Frederiksen, Victoria Perreau)

1/29/02: (1) Bug corrected: When going from “Clustering View” to "Data View", the PM/MM data image was not refreshed correctly; as a result the same probe set persists there. (reported by Greer M. Murphy)
(2) “Image/Export CEL” will export model-based single outliers and array outliers in the [OUTLIERS] section of the CEL file. (suggested by Edward J. Oakeley and Yizheng Li)

1/26/02: Check the button “Perform Principal Component Analysis instead” in the “Analysis/LDA Classification” dialog to perform Principal Component Analysis. (suggested by Anne Bowcock and Stephen Haggarty)

1/14/02: Use the “Analysis/Compare samples/Combine comparisons/Compare on” button to restrict the comparison to a gene list. (suggested by Yingxi Lin)

1/12/02: Updated Yeast S98 gene information file with GeneOntology terms and added its genome information file. Downloading of the new version of dChip is needed. (suggested by Simon Lin, courtesy of SGD database)

1/10/02: (1) Combine the data for different species (suggested by Florian Storch and Stephen Haggarty, courtesy of TIGR RESOURCERER database)
(2) Bug corrected: when the expression values are truncated at “Analysis/Model-based expression”, the standard errors are set to 0. Failure to consider this led to incorrect average of the identically truncated values between replicates. (reported by Michael Boutros)

12/30/01: Updated HG_U95AV2 and MG_U74AV2 gene information file (944 GeneOntology terms, 971 ProteinDomain terms and 377 Cytoband terms). In the clustering picture, use Shift+Left/Right key to change the width of the annotational columns, and Right-click to go to the website of GeneOntology or Pfam entries. (protein domain suggested by Wing Wong and Florian Storch)

12/19/01: (1) Sample classification by Linear Discriminant Analysis (data courtesy of Andrea Richardson)
(2) The dChip manual is changed to the HTML format to keep its content more current.

12/14/01: (1) After clustering, the p-values of the gene and sample clusters are calculated using the exact hypergeometric distribution. Previously the binomial approximation of the hypergeometric distribution, and then the normal approximation of the binomial, was used; but for these very small p-values high accuracy is desirable. At “Tools/Options/Clustering” the default p-value threshold for gene clusters is changed to 0.005. (suggested by Steve Horvath)
(2) The sample cluster p-values are now calculated with regard to the samples defined in the “Array list file”, not all the arrays in the group. (suggested by Steve Horvath and Robert Gentleman)
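
As an illustration of the exact calculation mentioned in item (1), the sketch below computes an upper-tail hypergeometric p-value with exact integer arithmetic. It is a hedged Python example rather than dChip's implementation; the function name and the one-sided (enrichment) tail are assumptions.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, drawn, observed):
    """P(X >= observed) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` carry the term."""
    total = comb(population, drawn)
    top = min(annotated, drawn)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(observed, top + 1)
    )
    return favourable / total  # exact integer sum, one final division


if __name__ == "__main__":
    # Counts quoted in the "cell fraction" example elsewhere in this list:
    # 544 annotated genes in a 5933-gene group, 703 annotated among 8100.
    print(hypergeom_upper_tail(8100, 703, 5933, 544))
```

Because the sum is accumulated as Python integers and divided only once, there is no loss of accuracy in the far tail, which is where the normal approximation is weakest.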

12/12/01: Map a list of genes to chromosome by “Analysis/Map chromosome” (suggested by Wing Wong, Robert Gentleman and Andrea Richardson)

12/11/01: (1) In the “Analysis View”, the error messages are colored in red. (suggested by Robert Gentleman)
(2) Small Excel files and exported images are inserted into the “Analysis View” for convenience (uncheck “Tools/Options/Analysis/Insert Excel and Image outputs into the Analysis View” to disable the function). The analysis output can be saved into a Word file by “Analysis/Save”. 
(3) “Tools/More gene information” to read in a customized gene information file and use it with priority over the main gene info file specified in “Analysis/Open group”. (suggested by Michael Boutros)

12/2/01: (1) Check “Analysis/Model-based expression/Apply log2 transform” to log2 transform the expression values. (also check “Ignore existing calculated expressions” if necessary). The model-based standard errors are set to 0 for the modified or transformed expression values. When working with the log-transformed values at “Analysis/Compare samples”, use E-B, B-E instead of E/B, B/E for fold changes. (suggested by Bradley Messmer)
(2) Check “Tools/Options/Analysis/Mask redundant probe sets when reading gene list file” to exclude the redundant probe sets from a gene list. Multiple probe sets for the same gene tend to bias the result of array clustering and also lead to erroneous functional group identification in the gene clustering. (suggested by Bradley Messmer)
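
The advice in item (1) to use E-B rather than E/B on log-transformed values follows from log2(E) - log2(B) = log2(E/B). A small Python check, with hypothetical expression values and illustrative function names:

```python
import math


def log2_difference(experiment, baseline):
    """E-B computed on log2-transformed expression values."""
    return math.log2(experiment) - math.log2(baseline)


def raw_fold_change(experiment, baseline):
    """E/B computed on the untransformed expression values."""
    return experiment / baseline


if __name__ == "__main__":
    e, b = 800.0, 100.0                      # hypothetical raw values
    # A difference on the log2 scale is the log2 of the raw fold change.
    print(raw_fold_change(e, b), log2_difference(e, b))
```

Taking a ratio of two log-transformed values would not correspond to any fold change on the original scale, which is why the difference is the appropriate criterion.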

12/1/01: (1) Used the Nearest Neighbor algorithm (see the reference) to increase the speed of clustering (e.g. the time of clustering on 1400*6 values reduces from 80s to 4s). In addition, one can uncheck “Tools/Options/Clustering/Pre-calculate distances” to calculate the distances between genes or samples on-the-fly; this is useful when clustering on a large number of genes (e.g 12K), which requires too much memory to store all the distances and causes virtual-memory swapping that slows the process down. (suggested by Edward Oakeley)
(2) Uncheck “Analysis/Hierarchical clustering/Cluster genes” to cluster samples without clustering genes. (suggested by Ruty Shai and Bradley Messmer)
(3) Use “Control+Click” to change the color of the GeneOntology blocks in the clustering picture; the selected colors cannot be saved right now. (suggested by Michael Boutros)

11/23/01: Merged dchip.exe and “dchip large.exe”. Drosophila chip users can use the normal version of dChip as well. File conversions of cdf.bin and dcp files will be automatically performed.

11/19/01: “Analysis/Compare samples” can have different fold change criteria for E/B and B/E and different mean difference criteria for E-B and B-E. (suggested by Tiago R Magalhaes)

11/16/01: Combine HG_U95A and V2 arrays at the CEL file level. (suggested by Elinor Dehan and Yoseph Barash)

11/14/01: “Clustering/Selected branch/Export image” to export the clustering image of the selected main gene cluster outlined by blue lines. The sample clustering tree is not attached to the image. (suggested by Sanjay Jain and Huan Dong)

11/13/01: In the “Clustering” view, use Control+Click to select and color multiple gene or sample clusters. The multiple clusters can be exported or deleted (gene only) by “Clustering/Selected branch” functions. Clicking still works to select the main cluster (outlined by blue lines), used for cluster resampling. (suggested by Huan Dong)

11/11/01: Add “Tools/Classify Genes” for classifying genes by functional groups (suggested by Miguel Ramalho Santos and Nikhil Munshi)

11/9/01: (1) Negative expression values are set to 1 when calculating fold changes in “Analysis/Compare samples”. Previously, fold changes involving negative expression values were set to a non-informative 0; however, when one expression value is large (say 1000) and the other is -10 (at the noise level of absent genes), it is helpful to bring the -10 to a small positive number so that a large fold change is calculated and the gene gets selected.
(2) Began to use R as the engine for some computing and graphic tasks. (suggested by Robert Gentleman)
(3) Use “Image/Normalization plot” to view the normalization scatter plot between one array and the baseline array. (suggested by Casper M. Frederiksen; data courtesy of Andrea Richardson)
(4) On start of dChip there is an automatic display of the dChip updates since the last use.
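
The rule in item (1) can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and treating a value of exactly 0 the same way as a negative value is an assumption not stated above.

```python
def fold_change(experiment, baseline, floor=1.0):
    """E/B fold change after raising non-positive values to `floor`.

    Assumption: zero is handled like a negative value.
    """
    e = experiment if experiment > 0 else floor
    b = baseline if baseline > 0 else floor
    return e / b


if __name__ == "__main__":
    # The example from the text: 1000 against a noise-level -10.
    print(fold_change(1000, -10))
```

With the floor of 1 the gene in the example gets a fold change of 1000 and can pass a fold change filter, instead of being assigned the non-informative 0.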

10/29/01: (1) Add “Select by category” button in the “Analysis/Compare Sample” and “Analysis/Model-based Expression/Export” dialog. (suggested by Robert Gentleman; data courtesy of Andrea Richardson)
(2) Deleted “Array list file” selection button in many dialogs. Specify “Array list file” only through “Tools/Array list file”.

10/23/01: Combine comparison criteria using the “not” operator, via the “Analysis/Compare Samples/Combine Comparison/Insert complement” button. (suggested by John K. Park and Wing Wong)

10/4/01: (1) Use "Clustering/Similar Profile" function to export a list of genes with similar profile with the current highlighted gene. The resultant list can be used as the "gene list file" in "Analysis/Hierarchical Clustering" dialog to view these genes. (suggested by Andrea Richardson)
(2) Bug corrected: MG_U74 gene information files updated using Sep.7.01 version of Unigene file. In the old  “mg_u74av2 gene info.xls”, probe set 160309_at was annotated as amelogenin. By checking with LocusLink (ID: 11704), UniGene (Mm.172556) and BLAST, it seems that is a mistake. (reported by Feng Wu)

9/27/01: Add paired t-test p-value as a filtering criterion in “Analysis/Compare samples”. (suggested by Stephen Henderson, Susan Hilsenbeck and Jenny Z. Xiang)

9/6/01: Gene filtering and clustering decoupled: first use “Analysis/Filter genes” to generate a filtered gene list (the filtering can be restricted to an input gene list), then use “Analysis/Hierarchical clustering” to cluster on the filtered gene list. (suggested by Wing Wong and Laura Forsberg)

9/5/01: (1) “Tools/Gene list file/By keywords” and “View/Find gene” accept wildcard strings as “keywords”. (suggested by Wing Wong, code courtesy of Florian Schintke)
(2) Export functional category information in “compare result file” by checking "Tools/Options/Analysis/Output GeneOntology terms". (suggested by Miguel Ramalho Santos)
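
To give a feel for wildcard keywords as in item (1), the sketch below filters gene descriptions with shell-style patterns using Python's standard fnmatch module. dChip's exact wildcard syntax and case handling are not described here, so the "*" and "?" semantics and the case-insensitive match are assumptions, and the gene descriptions are made up.

```python
from fnmatch import fnmatchcase


def find_genes(descriptions, pattern):
    """Return descriptions matching a shell-style wildcard, ignoring case."""
    pat = pattern.lower()
    return [d for d in descriptions if fnmatchcase(d.lower(), pat)]


if __name__ == "__main__":
    genes = ["Troponin T", "tropomyosin 1", "actin, beta"]  # hypothetical
    print(find_genes(genes, "tropo*"))
```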

8/29/01, 6/17/01: Output gene list by GeneOntology or keywords (suggested by Robert Gentleman and Casper M. Frederiksen)

8/23/01: Probe sensitivity index file. [V1.0 manual] If a PSI file is specified in the “Analysis/Model-based expression/Calculate” dialog and the checkbox “Use existing probe sensitivity index in this PSI file” is unchecked, the probe sensitivity indexes will be saved after the model fitting is performed on all probe sets. At a later time, the PSI file can be used to fit the expression values for other arrays by checking the checkbox “Use existing probe sensitivity index in this PSI file”. (suggested by Richard Lempicki and Robert Gentleman)

8/20/01: (1) In “Compare Samples”, dChip will export both the fold change and its confidence bound if either of them is used in the filtering criterion. (suggested by Andrew Bent and Soemini Kasanmoentalib)
(2) “Tools/Reset Default Settings” restores dChip parameters to the default values. (suggested by Robert Gentleman)

8/16/01: Improvement of array outlier calling method: For a probe set, the model fitting still uses all arrays, but the identification of array-outliers is done for absent arrays and present arrays separately, to avoid the situation that small standard errors of expression indexes of absent arrays make present arrays called as array-outlier; previously I tried to avoid this by only fitting the model using present arrays, as a result absent arrays are not fitted and not called as outliers --this led to much fewer array-outliers. (03/03: this is obsolete; in the current method, P/A calls help to correct signs but do not affect array outlier calling. However, the number of array/probe outliers is restricted to be at most 50% of all arrays/probes.)

6/29/01: The model fitting is changed to use only the arrays where a probe set is called “Present” by Affy’s algorithm (or minimum of 3 arrays regardless of the Absolute calls). This avoids the situation where a gene is “Present” in a minority of arrays but these arrays are called “Array-outlier” for the gene. Now these arrays are correctly identified as having good patterns. Other changes in the “Data View” are: in grid (2, 2) an array is represented by a cyan circle if it is called “Absent” for the gene (blue circles still representing “array-outliers”); in grid (1, 3) the red fitted curve is always shown whether the array is “array-outlier” for the gene or not. (suggested by Brain Yandell, data courtesy of Daniel Auclair and Elizabeth K. Robinson)

7/27/01: If the checkbox “Always show sample names and clusters on the top” in the “Tools/Options/Cluster” dialog is on, one can still see the sample names and cluster trees when scrolling down to see other genes in the cluster. (suggested by Stefano Colella)

7/18/01: Image contamination correction. (suggested by Robert Gentleman, data courtesy of Eric Schadt)

7/13/01: The “Data/Export probe set” menu can export the PM/MM data for multiple probe sets. (suggested by Laura Forsberg)

7/1/01: After “Analysis/Get External Data”, one can use “Analysis/Normalize” to normalize expression values using a simplified version of the Invariant Set Normalization method (see manual). This function used to be a linear scaling that makes the arrays have the same median. (suggested by Arindam Bhattacharjee)
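
For comparison, the earlier linear scaling mentioned in this entry can be sketched as follows. This is a Python illustration of median scaling in general, not dChip's code; the choice of the common target (the median of the per-array medians) is an assumption.

```python
from statistics import median


def scale_to_common_median(arrays, target=None):
    """Linearly rescale each array so that all arrays share one median."""
    medians = [median(a) for a in arrays]
    if target is None:
        target = median(medians)  # assumption: use the median of the medians
    return [[v * target / m for v in a] for a, m in zip(arrays, medians)]


if __name__ == "__main__":
    print(scale_to_common_median([[1, 2, 3], [2, 4, 6]]))
```

A single scale factor per array cannot correct intensity-dependent differences between arrays, which is the motivation for replacing it with an invariant-set based curve.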

6/30/01: Using “View/Find Gene” and “View/Find Next”, one can search genes by keywords such as “troponin”. (suggested by Arindam Bhattacharjee)

6/28/01: A user can specify a “Working directory” in the “Analysis/Open group” dialog, under which dChip exports configuration (.ini) file and other output files. (suggested by Victoria Perreau)

6/27/01: (1) Navigate probe sets in the array CEL image using 'Home' and 'End' keys. (suggested by Yizheng Li)

(2) Bug corrected: In some dChip output files line breaks occur after gene descriptions and cause “frame-shift” in output files. I tried to correct this by eliminating “\n” at the end of gene descriptions, but let me know if this is still a problem. (reported by Brain Yandell, Thomas Cappola and David Gerhold)

6/26/01: (1) Replace “sample name file” by “sample information file” in “Analysis/Open Group/Other information” dialog. Significant sample clusters can be calculated. (data courtesy of Andrea Richardson and Catherine Gradek)

(2) Bug corrected: During “Analysis/Open Group”, dChip reports “Search and extract PM/MM data from CEL files of chip type  under” but finds no array data files. This may be due to “.” in the directory name. (reported by Karen Vranizan and Adam Olshen, Michel Bellis)

6/22/01: (1) The displaying range of the clustering picture used to be [-3, 3] for the standardized expression values for each gene. Now this range can be customized at “Tools/Options/Clustering” dialog. (suggested by Thomas Seidl)

6/17/01: (1) Moved “Array list file” dialog under “Tools” menu, instead of having it many times under various “Analysis/*” dialogs.

(2) Took away “Start clustering using filtered genes” from “Analysis/Compare samples/Combine comparisons” dialog, so that “Compare samples” and “Hierarchical clustering” are decoupled. Use “compare result file” as “gene list file” in “Analysis/Hierarchical clustering/Filtering genes” dialog for clustering analysis using filtered genes.

6/6/01: Color genes with a particular function in blue in the clustering picture. Clicking the function bars on the right side of the clustering data will select the corresponding function as the “current function” and color the genes of this function in blue. The “current function” is also reset when selecting the “functional cluster” icons on the left pane (suggested by Robert Gentleman)

5/7/01: Use a probe set mask file to exclude probe sets from the analysis. Using the dChip “Image/Unscrambled” function we can move all the excluded probe sets to the bottom of the array image (still randomly placed; U74A array); we note that for these probe sets there are still hybridization signals. (suggested by Jason M. Laramie and Scott Oakes)

4/30/01: Combine the data for Human arrays of different chip types. (suggested by Stan Nelson, Daniel Auclair and Isabella Tai)

 

4/17/01: (1) In “Analysis/Model-based Expression”, we can check “Use Average Difference instead of Model-based expression” to calculate the traditional Average Difference as expression levels. [Obsolete: The standard errors are still for model-based expression levels, and the array-outliers are still computed using the model-based approach. Note that the Affymetrix Average Difference method uses a super-scoring method to exclude probes whose PM/MM difference is outside 3 standard deviations of all probe differences in either of the two arrays being compared in their comparison analysis. Here, since we are analyzing multiple arrays at the same time, when calculating Average Differences a probe is excluded if its difference is an outlier in any of the arrays, until a minimum of 5 probes is reached, at which point all 5 probes will be used.] (suggested by Matthew Tudor)

(2) “Analysis/Alternative Transcripts” function: Different tissue types may give different probe response patterns; this may be due to alternative splicing. Probe sets called "array-outlier" in all selected arrays will be exported. (suggested by Patrick Jay)

4/10/01: (1) Bug report: in “Analysis/Compare Samples”, the size of Experiment group was not used correctly, this may lead to incorrect standard error and fold change confidence interval calculation. (Stefan Horvath)

(2) “Analysis/Get External Data” can read in Whitehead RES format data file. (Andy Bhattacharjee)

3/17/01: “Analysis/Get External Data” to read in an external tab-delimited data file with first row array names and first column gene names. Absolute call and standard error columns can also be contained.
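
A file in the layout described above (first row array names, first column gene names) can be prepared or checked with a few lines of Python. The sketch below handles only the plain numeric layout; the optional absolute call and standard error columns are not handled, and the function name and demo data are hypothetical.

```python
import csv
import io


def read_expression_table(handle):
    """Parse a tab-delimited table: first row = array names,
    first column = gene names, remaining cells = numeric values."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    arrays = header[1:]                      # skip the corner cell
    table = {}
    for row in reader:
        if not row:
            continue                         # tolerate blank lines
        table[row[0]] = [float(cell) for cell in row[1:]]
    return arrays, table


if __name__ == "__main__":
    text = "gene\tarrayA\tarrayB\n1000_at\t120.5\t98\n1001_at\t15\t-3.2\n"
    print(read_expression_table(io.StringIO(text)))
```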

3/15/01: Add CEL intensity pictures also in PM/MM Data view. (Andy Bhattacharjee)

3/7/01: In “Analysis/Open Group/CDF file” dialog we can specify a gene information file, which contains gene descriptions from Affymetrix EASI database, as well as LocusLink Gene Ontology terms classifying a gene by its biological process, molecular functions and cellular components. “Analysis/Hierarchical Clustering” will use such functional category information to assess whether a local cluster is enriched by genes having a particular function, and highlight these “functionally significant” clusters. (Data courtesy of Dan Tang)

3/2/01: Merge “Analysis/Pair-wise Comparison, Two-group Comparison, Filter Interesting Genes” into “Analysis/Compare Samples”. The three dialogs here can be used to specify comparisons, combine comparisons and specify arrays used and replicates to be pooled. The genes satisfying the comparison criterion can be exported to a file or used for clustering. Sample names followed by “*” refer to how many additional replicate arrays are pooled for this sample. The function of exporting expression values is moved to “Analysis/Model-based Expression/Export”.

2/20/01: In “Analysis/Hierarchical Clustering/Sample handling” tab, we can read in a gene function file. The functional categories of genes will be shown as color bars on the right (data from the reference). In this way we may visually check if genes belonging to a functional category are enriched in a cluster. (Reference: Cho et al. 2001. Transcriptional regulation and function during the human cell cycle, Nature Genetics, Vol 27, 48-54)

2/18/01: Take away “Pool duplicate arrays” and “Group every n samples” checkboxes in “Analysis/Hierarchical Clustering/Sample handling” tab. Instead, we can insert “Replicate separator” and “Standardize separator” in array list file. Replicate arrays separated by “Replicate separators” will be pooled using weighted averaging method (weights being the measurement accuracy, so expression values with large standard errors receive smaller weights), and samples separated by “Standardize separators” will be standardized (rescale to have mean 0 and standard deviation 1 across samples for each gene) within themselves. If “Only draw lines for standardize separator” checkbox in “Sample handling” tab is checked, “Standardize separators” just add vertical lines between group of samples.
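
The pooling of replicates by weighted averaging can be illustrated as below. The entry only says that weights reflect measurement accuracy; using inverse-variance weights (1/SE²) is an assumption made for this Python sketch, and the function name is hypothetical.

```python
import math


def pool_replicates(values, std_errors):
    """Inverse-variance weighted mean of replicate expression values,
    plus the standard error of that pooled value."""
    weights = [1.0 / (se * se) for se in std_errors]   # requires se > 0
    total = sum(weights)
    pooled = sum(w * v for w, v in zip(weights, values)) / total
    return pooled, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # The less precise replicate (SE 2) pulls the average less than the
    # more precise one (SE 1).
    print(pool_replicates([10.0, 20.0], [1.0, 2.0]))
```

The sketch requires strictly positive standard errors; values whose standard error has been set to 0 (for example after truncation or log transformation) would need separate handling.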

2/11/01: We can specify a data file list in “Analysis/Open Group” dialog (leaving “Data directory” blank) and dChip will use the arrays specified in the file. There can be directory names in the data file list. (suggested by Michael Angelo)

2/10/01: “Analysis/Hierarchical Clustering” can read in a tab-delimited data file, without opening a group of arrays first.

2/7/01: “Clustering/Show Profile” function displays a profile plot for the currently selected cluster. The Y-axis has the same range as the color scale on the bottom of the picture. The value of the profile curve for each sample is the average of the standardized expression values of all selected genes in this sample (standardization is a linear scaling for each gene so its expression values across all samples have mean 0 and standard deviation 1). The error bar extends 1 standard deviation (of the selected genes’ standardized expression values in a sample) on both sides. Shorter error bars indicate tighter clustering of genes at this sample point. (suggested by Deming Wang)
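
The profile curve and error bars described above amount to a per-sample mean and standard deviation of gene-wise standardized values. A minimal Python sketch follows; whether dChip divides by n or n-1 when computing standard deviations is not stated, so the use of the population form (pstdev) here is an assumption, and the function names are illustrative.

```python
from statistics import mean, pstdev


def standardize(profile):
    """Rescale one gene's values to mean 0 and standard deviation 1."""
    m, s = mean(profile), pstdev(profile)    # population SD; an assumption
    return [(v - m) / s for v in profile]


def cluster_profile(genes):
    """Per-sample mean and SD of the standardized values of selected genes."""
    z = [standardize(g) for g in genes]
    columns = list(zip(*z))                  # one tuple per sample
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


if __name__ == "__main__":
    centers, spreads = cluster_profile([[1, 2, 3], [2, 4, 6], [3, 2, 1]])
    print(centers, spreads)
```

Genes that differ only in scale (the first two in the usage example) standardize to the same profile, so they add no spread; the third gene runs in the opposite direction and widens the error bars at the first and last samples.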

2/6/01: (1) Add “Clustering/Save Tree” function for saving the clustering result, and the file can be read in as “Analysis/Hierarchical Clustering/Filter genes/Gene list or tree file” (and the filtering criterion are thus ignored). (suggested by Stan Nelson)

(2) In “Analysis/Model-based Expression” dialog, we can specify to output an array quality summary file, containing percent of probe sets called “array outlier”, percent of probe pairs called “single outlier”, and percent of “P” calls. Arrays with more than 5% array outliers change their icons to dark blue. We need to redo “Open Group” (check “ignore existing DCP file”), “Normalize” and “Model-based Expression” to calculate these statistics. (suggested by Stan Nelson, Andrew Kirby)

2/1/01: (1) Merged “Analysis/Export Expression” into “Analysis/Two-group Comparison”. If no arrays are chosen in group 2, the expression values in group 1 will be exported.

(2) We can use “Analysis/ Hierarchical Clustering/Array list file” tab to create an “array list file”. When specified in “Analysis/ Hierarchical Clustering/Sample handling” tab, this file dictates which arrays are used for clustering in what order.

1/31/01: (1) Add “Clustering/Export Selected” menu to export the expression data of selected branches. The exported file can be used as “gene list file” in “Analysis/Hierarchical Clustering” dialog to perform clustering using only this subset of genes. (suggested by Deming Wang)

(2) In “Analysis/Model-based Expression” dialog, we can choose to truncate low or negative expression values to a small value, or to a given percentile of the expression values that are called “A”. (suggested by Stan Nelson)
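
The percentile option in item (2) can be pictured as a floor derived from the values called “A”. The Python sketch below is one possible reading, not dChip's code: the nearest-rank percentile definition, the default of 10 and the function name are all assumptions.

```python
import math


def truncate_low_values(values, calls, percentile=10):
    """Raise values below a floor, where the floor is the given
    percentile of the values whose detection call is 'A'."""
    absent = sorted(v for v, c in zip(values, calls) if c == "A")
    if not absent:
        return list(values)                  # nothing to base the floor on
    # Nearest-rank percentile; other percentile definitions are possible.
    rank = max(1, math.ceil(percentile / 100 * len(absent)))
    floor = absent[rank - 1]
    return [max(v, floor) for v in values]


if __name__ == "__main__":
    values = [-20, 5, 10, 300, 2]            # hypothetical expression values
    calls = ["A", "A", "A", "P", "A"]
    print(truncate_low_values(values, calls, percentile=50))
```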

1/30/01: The “Analysis/Filtering interesting genes” dialog now accepts simple logical combinations of criteria using AND and OR. (suggested by Michael Zhang)

1/25/01: An icon will be added for each exported tab-delimited analysis result file, under the Analysis icon. Clicking it will invoke Excel to open the file. (suggested by Mei Xu)

1/24/01: Add “Image/Unscramble” function, which re-organizes the probes of the same probe set together for arrays using "distributed probe set format" (e.g. Human U95 arrays), so that we can view such arrays in the old way. This makes “Image/Array outlier” function still applicable for such probe-scrambled arrays. (suggested by Andy Bhattacharjee)

1/23/01: (1) Add checkbox “GCT format for GeneCluster” in “Analysis/Export Expression” dialog, for exporting expression data files to use with GeneCluster. (suggested by Michael Angelo)

(2) Clustering View is linked to CEL and PM/MM Data views. That is, in the clustering picture, we can click a data point (the expression value of a gene in a sample), and go to look at the CEL level data. This is useful for those who are curious about unusual data points in the clustering picture (such as large negative expression values), and want to trace back to the raw data.

1/22/01: Options added in “Analysis/Export Expression” and “Analysis/Hierarchical Clustering” dialogs so we can treat expression values identified by the model to be outlier (i.e. array-outliers in CEL images) as missing values. They are exported as blank entries in tab-delimited file or shown as black (Blue/Red coloring) or white (Green/Red coloring) boxes in clustering picture. This is another way of using measurement error of model-based expression values in down-stream analysis, besides resampling clustering trees. (suggested by Priya Sudarsanam)

1/21/01: (1) Add “View/Export image” menu item, for exporting CEL, PM/MM data or clustering images into BMP file.

(2) In “Analysis/Hierarchical Clustering” dialog, we may read in a “gene list file” (each line has a probe set name) to cluster a pre-selected subset of genes (filtering criterion are thus ignored; this file may be the output file of “Analysis/Filter interesting genes” function). We can also read in an “array list file” with each line specifying an array to be used as columns of clustering data matrix. We may also pool duplicate arrays using measurement-error weighting scheme before clustering.

1/18/01: (1) Add “Image/Export CEL” menu item. We can use it to export normalized data into CEL-like file. If you want to export the raw data, check the “Use unnormalized data” checkbox in “Analysis/Open group” dialog when opening a group. (suggested by Margaret C. Cam)

(2) When viewing array images, we can use the four arrow keys to zoom in and out. (suggested by Andy Bhattacharjee)

1/15/01: (1) Changed “Analysis/Open group” dialog, so we can read in gene name file (tab-delimited file, the 1st column is Affymetrix probe id, the 2nd column is gene name/description) and sample name file (the 1st column is array file name (without .cel or .dat suffix), the 2nd column is sample name). Such information will be used when exporting results or displaying clustering trees.

(2) Add “Analysis/Filter interesting genes” dialog, for filtering genes by fold changes (or the lower confidence bound of them) of multiple pair-wise comparisons. (suggested by Dan Tang)

1/10/01: In “Analysis/Hierarchical Clustering” dialog, we may group the samples for the standardization purpose. That is, instead of standardizing a gene to have mean 0 and standard deviation 1 across all samples, we standardize its expression values in samples of the same experiment (using the same cell lines) to have mean 0 and standard deviation 1. This is because we are interested in the differences caused by the various treatments, instead of the differences existing among cell lines. (suggested by Deming Wang)

1/9/01: We can right-click a non-gene node in the clustering tree to exchange the positions of its two branches, in order to interactively adjust the ordering of genes in clustering trees. (suggested by Andy Bhattacharjee)

1/5/01: In “Analysis/Open group” dialog, we can read dChip (DCP) files, the format used internally by dChip. In this way, we only need to carry dChip files around along with dChip to demonstrate the downstream analysis.

1/1/01: (1) In “Analysis/Open group” dialog, we can extract CEL and DAT files at the same time. (suggested by Yan Cui)

(2) “Hierarchical Clustering” is now two-way.

12/18/00: Add “Hierarchical Clustering” analysis item and menu.

12/4/00: Take away "Look for presence calls in TXT files?" checkbox in "Open Group" dialog. dChip will always look for TXT files for presence calls; if they are not found, it will calculate them in a way similar to that described in the Affymetrix Analysis Manual (93% agreement with GeneChip’s calls in one comparison).

(Updated 12/6/07)