dChip: Tutorials

 

Microarray facilities for dChip consulting                  Short tutorial               Example dataset

 

Related tutorials

·       dChip animated tutorial, by Yu Guo, HSPH

·       Introduction to Microarrays and dChip (ppt), by Xiaochun Li, DFCI

·       Two-page guide, The Basics of dChip (ppt), Gene Annotations (ppt), by Keith Baggerly and Kevin Coombes, M. D. Anderson

·       Become familiar with dChip, by Steve Horvath and David Elashoff, UCLA

 

Microarray facilities providing dChip consulting

Dana-Farber Cancer Institute                          Baylor College of Medicine                            BCM Breast Cancer Center

London Regional Genomics Centre                 Roswell Park Cancer Institute                          University of Tulsa

Johns Hopkins Medical Institutions                 Cornell University                                           The Scripps Research Institute

MUSC Hollings Cancer Center                       University of Arizona

 

Short tutorial

 

The following simplified steps can be used to get a quick impression of dChip or for demonstration purposes; they take about 20 minutes. You may use either your own dataset or this example dataset. To explore more dChip functions, please see the dChip manual. See the Affymetrix technology review for background on microarrays.

 

·       Prepare the files. Copy dchip2005.exe and the gene information file of your chip type into a local directory. Also make sure the CDF file (or dChip-generated cdf.bin file) of your chip type and the CEL files (or dChip-generated DCP files, if for demonstration) are available on the computer. Double-click “dchip.exe” to start.

·       Open group. Select menu “Analysis/Open group”, click the “Data directory” button to specify the data directory. Specify the “data file type”. Click “Other information” on the top, specify the CDF file and the gene information file, and click “OK”. The CDF file, the group of CEL/DCP files, and the gene information file will be read in.

·       View CEL image. Select menu “View/CEL image”; use arrow keys to zoom in and out.

·       View PM/MM data. Select “View/PM/MM data” to view the PM and MM data (upper-left), PM-MM data (middle-left), and fitted values (lower-left, red curve) for the current probe set; “Data/Animate” to cycle through all the arrays for the current probe set; “Data/Pause” to stop; “Page Up” or “Page Down” key to do this manually; “Home” or “End” key to go to another probe set.

·       Normalization, Model-based expression. “Analysis/Normalize”, “Analysis/Model-based expression”, use the default values in the dialog to perform these two steps.

·       Export expression value. “Analysis/Model-based expression/Export”, highlight arrays in the “select arrays to be exported” listbox, click “OK”; an Excel icon will be added to the left side navigation panel, click the icon or start Excel to view the file.

·       Filter genes. “Analysis/Filter genes”, use the default settings, click “OK”. The filtered genes will be saved in a file. Reopen the dialog to adjust the filtering parameters until the number of filtered genes is below 1000 (for faster processing below).

·       Hierarchical clustering. “Analysis/Hierarchical clustering”, select “gene list file or tree file” to be the output file from “Filter genes”, use the default settings, click “OK”; a clustering picture is shown, use Arrow or Control+Arrow keys to adjust its size; click a data point (red or blue), use “View/Go to LocusLink” to see the online information for this gene; mouse over or click a gene annotation (color bars on the right side) to observe the changes; click an icon below “Clustering” in the left navigation panel to highlight a gene cluster enriched for an annotation. Use the Enter key to cycle through other views.

·       Find a gene. “View/Clustering” to come back to the clustering picture; “View/Find gene”, input a term, click “OK”; then select “View/Find next” or press F3 multiple times to find this gene.

·       Compare samples. “Analysis/Compare samples”, highlight the first several samples in the “Baseline” listbox and the last several samples in the “Experiment” listbox, use the default settings. Click “Combine comparisons” on the top, review the settings, click “OK”. Reopen the dialog to adjust the comparison parameters until the number of genes is below 1000. Select menu “Analysis/Hierarchical clustering”, change “gene list or tree file” to be the output file from “Compare samples”, uncheck “Cluster samples”, click “OK”; review the data picture to confirm that these genes have differential expression values between the two specified groups of samples.

·       Online help. Select various items under the “Help” menu to learn more about dChip.
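
For readers curious about what the “Model-based expression” step in the list above estimates: Li and Wong (2001) describe dChip's expression index as a multiplicative model in which the PM-MM difference for array i and probe j is approximately theta_i * phi_j, with the squared probe sensitivities phi_j summing to the number of probes. The following is a minimal sketch of fitting that model for one probe set by alternating least squares. It is not dChip's code: it leaves out the outlier detection and standard errors that dChip computes, and the function name and toy numbers are made up for illustration.

```python
from math import sqrt
from typing import List, Tuple


def fit_multiplicative_model(
    diffs: List[List[float]], n_iter: int = 50
) -> Tuple[List[float], List[float]]:
    """Fit diffs[i][j] ~ theta[i] * phi[j] by alternating least squares.

    diffs holds PM-MM differences: one row per array, one column per probe.
    Returns (theta, phi) with phi scaled so that sum(phi_j ** 2) == n_probes.
    """
    n_arrays, n_probes = len(diffs), len(diffs[0])

    def theta_given_phi(phi: List[float]) -> List[float]:
        # Least-squares expression index for each array, probe effects fixed.
        phi_sq = sum(p * p for p in phi)
        return [sum(row[j] * phi[j] for j in range(n_probes)) / phi_sq
                for row in diffs]

    phi = [1.0] * n_probes  # start from equal probe sensitivities
    for _ in range(n_iter):
        theta = theta_given_phi(phi)
        theta_sq = sum(t * t for t in theta)
        if theta_sq == 0:
            raise ValueError("all fitted expression values are zero")
        # Least-squares probe sensitivities, expression indexes fixed.
        phi = [sum(diffs[i][j] * theta[i] for i in range(n_arrays)) / theta_sq
               for j in range(n_probes)]
        # Rescale so that the squared sensitivities sum to the probe count.
        scale = sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    return theta_given_phi(phi), phi


# Toy PM-MM differences (made-up numbers): 3 arrays x 3 probes.
toy = [[50.0, 100.0, 150.0],
       [100.0, 200.0, 300.0],
       [200.0, 400.0, 600.0]]
theta, phi = fit_multiplicative_model(toy)
print("expression indexes:", theta)
print("probe sensitivities:", phi)
```

In this sketch theta plays the role of the per-array expression value that “Export expression value” writes out, and theta_i * phi_j corresponds to the fitted curve shown in the PM/MM view.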

 

Example Dataset

 

·       Install dChip and example data files

 

--Obtain dChip


-- Download and unzip the example data CEL files.

The paper describing this dataset

 

Download and unzip these files:

scaling_factors_and_fig_key.txt

ALL1, ALL2, MLL1, MLL2 zipped files; you may need to rename the extension to “.gz” before using WinZip to unzip them.

 

-- Download and unzip CDF file: HG_U95A.zip

 

-- Download and unzip gene information file: HG-U95Av2 gene info2.zip

 

-- Download the sample information file made from “scaling_factors_and_fig_key.txt”: ALL sample info.xls
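
As an alternative to renaming the ALL/MLL archives to “.gz” and using WinZip, the files can be decompressed with a short script. This is a minimal sketch that assumes each download is a plain gzip stream; if the result turns out to be a tar archive, a further extraction step is needed. The function names and the file names in the usage comment are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def is_gzip(path: Path) -> bool:
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with path.open("rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def gunzip_to(src: Path, dest: Path) -> Path:
    """Decompress a gzip file to dest, whatever extension src carries."""
    if not is_gzip(src):
        raise ValueError(f"{src} does not look like a gzip file")
    with gzip.open(src, "rb") as fin, dest.open("wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large files are fine
    return dest


# Hypothetical usage; adjust the names to the files you downloaded:
# gunzip_to(Path("ALL1.download"), Path("ALL1.unpacked"))
```

Checking the magic bytes rather than the extension is what makes the rename unnecessary here.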

 

·       Data extraction, normalization and expression computation

 

Follow the steps in the dChip short tutorial to do analysis.

 

In particular:

-- “Analysis/Open group”: specify data directory, working directory (in “Options”), sample information file, gene information file

 

After this step is finished, click and look at the “array summary file”; are there arrays with outlying P call % and median intensity?

 

-- “Analysis/Normalization”: are the arrays being normalized to have similar median intensity?

 

-- Click “PM/MM data” on the left; use the Home and End keys (go to another probe set) and the Page Up and Page Down keys (go to another array) to look at the probe-level data and the model fitted for the current probe set.

 

-- “Analysis/Model-based expression”:

 

After this is finished, look at the “array summary file” for any outlying arrays.

 

Also check the array images for single outliers, marked in pink; press the “O” key to toggle the display of array outliers; flip back and forth between two array images to see whether these outliers are identified reasonably.


-- Use “Image/Normalization plot” to view the scatterplot of outlier arrays and baseline arrays.

 

-- “Analysis/Filter genes”: it is usually good to obtain fewer than 1000 genes to look at in clustering

 

-- “Analysis/Hierarchical clustering”: Check both sample and gene clustering

 

Are samples of similar types clustering together? Is there anything special about mis-clustered samples? Enlarge the image; which genes are highly expressed in particular groups of samples? Are there many replicate probe sets for the same gene that are selected and clustered closely?

 

What are the functionally significant gene clusters?

 

Redo gene filtering using different criteria to get gene lists of different size, and then do clustering. Is the sample clustering similar?

 

-- Redo “Analysis/Open group” with “Options/Log 2 transform expression values” checked; redo filtering and clustering. Is the result similar to the original scale?
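
The question asked above about the “array summary file” (are there arrays with outlying P call % or median intensity?) can also be checked numerically once the values are copied out of the file. The following is a rough sketch using a median and MAD rule; the 3-MAD cutoff, the function name, and the example values are assumptions for illustration and are not dChip's own criterion or real data.

```python
from statistics import median
from typing import Dict, List


def flag_outliers(values: Dict[str, float], n_mads: float = 3.0) -> List[str]:
    """Return names whose value lies more than n_mads scaled MADs from the median."""
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        # Degenerate case: over half the values are identical.
        return [name for name, v in values.items() if v != center]
    # 1.4826 scales the MAD to match a standard deviation for normal data.
    cutoff = n_mads * 1.4826 * mad
    return [name for name, v in values.items() if abs(v - center) > cutoff]


# Made-up median intensities for five arrays.
median_intensity = {
    "array_A": 950.0,
    "array_B": 1010.0,
    "array_C": 980.0,
    "array_D": 2400.0,
    "array_E": 1000.0,
}
print(flag_outliers(median_intensity))
```

The same function can be applied to the P call % column; an array flagged on either measure is a candidate for a closer look at its image.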

 

·       Compare samples, then visualize and assess the comparison result

 

“Analysis/Compare samples”

“Analysis/Hierarchical clustering”: look for replicate probe sets, find known genes, array list file, go to online database

“Tools/Classify genes”
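
To build intuition for what a comparison between “Baseline” and “Experiment” samples measures, the sketch below computes a fold change and a Welch t statistic for a single gene from exported expression values. It only illustrates the general idea; the criteria dChip actually applies are the ones set in the “Compare samples” dialog and are not reproduced here. The function name and numbers are made up.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Tuple


def compare_groups(
    baseline: List[float], experiment: List[float]
) -> Tuple[float, float]:
    """Return (fold change, Welch t statistic) for one gene.

    Assumes at least two values per group, a non-zero baseline mean,
    and some variation within the groups.
    """
    mean_b, mean_e = mean(baseline), mean(experiment)
    fold = mean_e / mean_b
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(variance(baseline) / len(baseline)
              + variance(experiment) / len(experiment))
    return fold, (mean_e - mean_b) / se


# Made-up expression values for one gene in two groups of three arrays.
fold, t_stat = compare_groups([100.0, 110.0, 90.0], [300.0, 330.0, 270.0])
print(f"fold change = {fold:.2f}, t = {t_stat:.2f}")
```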

 

·       Use genome information

 

Download and save a cytoband file in text format. Download the genome information file: hg_u95av2 genome info2.xls (hg11)

“Analysis/Chromosome”

 

·       When you have more time

-- Generate gene information file based on the latest NetAffx annotation files.

 

-- Explore dChip website: www.dchip.org.