dChip: View data along chromosome

 

RefGene and Cytoband file                Chromosome view                 Adjusting the chromosome view

View array CGH data

 

Genome information file

 

A “Genome information file” contains the chromosome number, transcription start and end sites (direction is from p-arm to q-arm for the human genome), and strand indication (a gene’s sense strand relative to the genome sequence). Note that a gene or SNP may be mapped to multiple chromosome positions, but the current dChip uses only the information in the last line where this gene or SNP occurs.
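
The “last line wins” rule for multiply mapped genes or SNPs can be pictured with a short Python sketch. It is only an illustration, not dChip's own code: the five-column layout (probe ID, chromosome, start, end, strand), the absence of a header row, and the function name `load_genome_info` are all assumptions made for the example.

```python
import csv
import io


def load_genome_info(handle):
    """Map each probe ID to (chromosome, start, end, strand); later lines win."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or short lines
        probe_id, chrom, start, end, strand = row[:5]
        # Assigning to an existing key overwrites the earlier mapping,
        # so the last line for a probe is the one that is kept.
        mapping[probe_id] = (chrom, int(start), int(end), strand)
    return mapping


# Made-up example: probe_A is mapped twice, first to chromosome 1, then to 7.
example = io.StringIO(
    "probe_A\t1\t1000\t5000\t+\n"
    "probe_B\t2\t200\t900\t-\n"
    "probe_A\t7\t300\t800\t-\n"
)
info = load_genome_info(example)
print(info["probe_A"])
```

In this made-up input only the chromosome 7 mapping of `probe_A` survives, which mirrors the behavior described above.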

 

You can use the “Tools/Make information file” function to make genome information files for expression arrays. Here are the SNP array genome information files.

 

Downloadable genome information file (NetAffx update date; unzip the file): HG_U133 Plus 2.0 (08/05). Old genome information files.

 

RefGene and Cytoband file

 

A RefGene file provides reference gene information for the genome and has a format similar to that of the genome information file. It is made in Microsoft Access by combining information (linked by NM identifier) from the refGene.txt (format) and refLink.txt (format) files downloaded from UCSC Genome Bioinformatics (select organism/“Annotation database”). The reference gene names can be displayed in the “Chromosome View”.
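
For readers without Microsoft Access, the same kind of join on the NM identifier can be sketched in Python. This is a hedged illustration rather than the procedure used to build the downloadable files: the records are assumed to be already parsed into dictionaries, the field names (`name`, `chrom`, `strand`, `txStart`, `txEnd` for refGene; `name`, `mrnaAcc` for refLink) should be checked against the UCSC format descriptions, and the accessions and gene symbols below are made up.

```python
def join_refgene(ref_gene_rows, ref_link_rows):
    """Attach a gene symbol to each refGene record via its NM accession."""
    # refLink: mRNA accession -> gene symbol
    symbol_by_acc = {r["mrnaAcc"]: r["name"] for r in ref_link_rows}
    joined = []
    for g in ref_gene_rows:
        symbol = symbol_by_acc.get(g["name"])
        if symbol is None:
            continue  # no matching refLink entry; skipped in this sketch
        joined.append(
            (symbol, g["name"], g["chrom"], g["txStart"], g["txEnd"], g["strand"])
        )
    return joined


# Made-up records for illustration only.
ref_gene = [
    {"name": "NM_EXAMPLE1", "chrom": "chr1", "strand": "+",
     "txStart": 1000, "txEnd": 9000},
    {"name": "NM_EXAMPLE2", "chrom": "chr2", "strand": "-",
     "txStart": 500, "txEnd": 700},
]
ref_link = [{"name": "GENE1", "mrnaAcc": "NM_EXAMPLE1"}]

for record in join_refgene(ref_gene, ref_link):
    print(record)
```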

Download: Human: 7/27/02 (hg11), 9/14/02 (hg12), hg12.zip, hg15.zip, hg16.zip (for 100K SNP array), hg17.zip, hg18.zip. Mouse: 7/27/02 (mm2).

 

[V4/1/06+] The UCSC refFlat files may be downloaded directly and used as the reference gene file. They contain exon position information (example hg17 file, format). Download the file and unzip it to a text file if needed.

 

Cytoband files (format) can be obtained with the RefGene files in the Zip files above, or directly downloaded from UCSC Genome Bioinformatics. Select an organism and use links “Annotation database”/cytoband.txt. Download and unzip to get the cytoband text file. The original hg17 and hg18 cytoband files are not ordered by chromosome, so unzip and use these instead.
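
If a cytoband file downloaded directly from UCSC is not ordered by chromosome, it can be reordered with a few lines of Python. The sketch below assumes the five-column, tab-delimited UCSC layout (chrom, chromStart, chromEnd, name, gieStain) and assumes that numbered chromosomes should come first, followed by X and Y; whether this is exactly the order dChip expects is not stated here, so compare the result with the pre-ordered files offered above. The band names in the example are made up.

```python
def chrom_key(chrom):
    """Sort key: numbered chromosomes first (numerically), then named ones (X, Y, ...)."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    return (1, 0, name)


def sort_cytobands(lines):
    """Return cytoband lines ordered by chromosome, then by band start position."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    rows.sort(key=lambda r: (chrom_key(r[0]), int(r[1])))
    return ["\t".join(r) for r in rows]


# Made-up rows in the layout chrom, chromStart, chromEnd, name, gieStain.
unsorted = [
    "chr10\t0\t3000000\tbandC\tgneg",
    "chrX\t0\t4000000\tbandD\tgneg",
    "chr1\t2000000\t5000000\tbandB\tgpos25",
    "chr1\t0\t2000000\tbandA\tgneg",
]
for line in sort_cytobands(unsorted):
    print(line)
```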

 

Chromosome view

 

This function visualizes SNP LOH, copy number, or gene expression data according to relative chromosome positions, so that the relationship between gene expression changes and their physical locations can be investigated in greater detail than with the “Analysis/Genome” function. At “Analysis/Open group”, specify a sample info file with a “Ploidy(numeric)” column to identify the reference samples. Samples whose “Ploidy(numeric)” value is 2 are regarded as reference samples and are used to compute the mean signal of the normal expression level or of 2 copies. Select the menu “Analysis/Chromosome” and provide a “Genome information file” and, optionally, a “RefGene file”, cytoband file and gene list file in the dialog, select the analysis method as “Expression”, and then click “OK”. If a probe set has multiple chromosomal mappings in the genome info file, dChip will use the last one (row). A “Chromosome” icon will be added in the left panel and the “Chromosome View” displays the expression or SNP data across samples along the chromosome (data from Janne et al. 2004):

 

The columns show the samples in the order of the currently used “Tools/Array list file”, and the rows show genes according to their relative chromosomal positions. In the log2 data view, the red and blue colors represent the log2 ratios between a sample and the mean signal of the reference samples for a gene. In the “Copy number” view, the raw copy number is displayed, computed as the ratio between a sample and the reference mean, multiplied by 2. The color range can be adjusted at “Tools/Options/Clustering/Display range of standardized values”. The presence or absence calls can also be displayed by using “Chromosome/Next Data Type” or key “D, S” to toggle between the expression values and the presence calls. In this display, light red represents “Present” calls and light blue represents “Absent” or “Marginal” calls. Cytoband information is displayed on the left side, and gene names are displayed on the right side.
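
The log2 ratio and raw copy number described above amount to a small calculation, sketched here in Python. This is an illustration under stated assumptions, not dChip's implementation: signals are taken to be on the linear scale, the function names are invented for the example, and the ploidy values used for non-reference samples are hypothetical (only the value 2 has a defined meaning here).

```python
import math


def reference_mean(signals, ploidy):
    """Mean signal over the samples whose Ploidy(numeric) value is 2."""
    ref = [s for s, p in zip(signals, ploidy) if p == 2]
    return sum(ref) / len(ref)


def log2_ratio(signal, ref_mean):
    """Log2 ratio between a sample signal and the reference mean."""
    return math.log2(signal / ref_mean)


def raw_copy_number(signal, ref_mean):
    """Ratio between a sample signal and the reference mean, multiplied by 2."""
    return 2.0 * signal / ref_mean


# One gene or SNP measured in four samples; the first two are reference samples.
signals = [100.0, 300.0, 200.0, 50.0]
ploidy = [2, 2, 3, 1]  # hypothetical values for the non-reference samples

ref = reference_mean(signals, ploidy)
for s in signals:
    print(s, log2_ratio(s, ref), raw_copy_number(s, ref))
```

A sample whose signal equals the reference mean therefore has a log2 ratio of 0 and a raw copy number of 2.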

 

Use the “Home” and “End” keys to view other chromosomes, and “Chromosome/Show All” or key “A” to display genes on all chromosomes in a single picture. For SNP arrays, chromosome Y may not be available since there are no probe sets on the array for chromosome Y markers. Mouse over the colored data point area to display information such as the expression values (and standardized values) or presence calls. Mousing over a refGene name on the left displays its transcription start and end sites as a vertical bar, and clicking the gene name opens the NCBI website for this gene.

 

Use “Chromosome/Proportional Distance” or key “P” to toggle between the “Natural order” and the “Proportional distance” displays. In the “Natural order” display, genes are ordered according to their order on the chromosome, but their relative distances are not reflected. In the “Proportional distance” display, the genes are displayed with their relative screen distances proportional to the real physical chromosomal distances. The cytoband and reference gene names will also be displayed when available. In this display, since the data points and gene names occupy a certain screen height, some genes do not have enough space for their data and names, and they are represented by small blue dots on the left margin of the gene name display region.
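
The difference between the two displays can be illustrated with a rough Python sketch. It is not dChip's drawing code: the linear scaling, the pixel height, the minimum gap, and all function names are assumptions chosen only to show why closely spaced genes run out of room in the proportional layout.

```python
def natural_order_rows(positions):
    """Row index of each gene when genes are simply ranked by position."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    rows = [0] * len(positions)
    for rank, i in enumerate(order):
        rows[i] = rank
    return rows


def proportional_rows(positions, height):
    """Pixel row of each gene, scaled linearly over `height` pixels."""
    lo, hi = min(positions), max(positions)
    span = (hi - lo) or 1  # avoid dividing by zero when all positions coincide
    return [round((p - lo) / span * (height - 1)) for p in positions]


def crowded(pixel_rows, min_gap):
    """Indices of genes closer than `min_gap` pixels to the last gene drawn."""
    result = []
    last_drawn = None
    for i in sorted(range(len(pixel_rows)), key=lambda i: pixel_rows[i]):
        if last_drawn is not None and pixel_rows[i] - last_drawn < min_gap:
            result.append(i)  # no room: would be shown as a small dot instead
        else:
            last_drawn = pixel_rows[i]
    return result


# Made-up base-pair positions: the first two genes are only 20 kb apart.
positions = [1_000_000, 1_020_000, 5_000_000, 9_000_000]
pixels = proportional_rows(positions, height=101)
print(natural_order_rows(positions))
print(pixels)
print(crowded(pixels, min_gap=10))
```

In the natural order layout every gene gets its own row, whereas in the proportional layout the second gene lands on the same pixel row as the first and is flagged as crowded.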

 

If a “RefGene file” or cytoband file is specified at “Analysis/Chromosome”, the “Proportional distance” display will show the cytoband information and reference gene names on the left side. If cytoband information is not available, a ruler is displayed to the left of the refGene names, with the space between horizontal lines representing a distance of 1 megabase. Use “Chromosome/Find refGene” or “Chromosome/Find Next” to find a gene in the refGene list by keyword, or click a refGene name on the left side to open the LocusLink webpage for this gene.

 

A gene list containing a set of interesting genes (may be obtained by “Analysis/Filter genes”, “Analysis/Filter SNPs” or “Analysis/Compare samples” functions) can be specified in the “Analysis/Chromosome” dialog to display only these interesting genes.

 

Adjusting the chromosome view

 

Use the Arrow keys to zoom the data figure, and Control+Arrow keys to adjust the width of the cytoband information area and the height of the sample name area. Click a colored data point in the figure to select a “Current” data point and press “ESC” to dismiss it. When a “Current” data point is selected, it will always be displayed in the viewable region of the picture when zooming the figure with the arrow keys. For example, click a data point in a peak score region and then use the Down arrow to zoom in on this region.

 

When “Chromosome/Proportional distance” is unchecked, use the Up and Down arrows to adjust the height of the data points and gene/SNP names, the Left or Right arrow to adjust the width of the figure, and Shift+Up or Shift+Down to adjust the height of the sample information rows on the top and the color legend on the bottom. When “Chromosome/Proportional distance” is checked, use the Shift+Up and Shift+Down keys to adjust the height of the data points and gene/SNP names; the height of the sample information rows cannot be adjusted.

 

The display range of the score curve (e.g. LOH prevalence score) on the right side can be adjusted at “Tools/Options/Chromosome/Curve along chromosome”. The red threshold line can also be adjusted directly with the Shift+Left and Shift+Right keys; SNP, gene and cytoband names are shown in blue for the regions where the score exceeds the threshold.

 

View array CGH data

 

Prepare an “External data file” in tab-delimited text format, with the first column as probe ID and the remaining columns as normalized log2 ratios, and a Genome information file describing the chromosome and position of the probes. Use “Analysis/Get external data” to read the file (uncheck “SNP data”). [Before V11/12/07: If a sample info file is used, make sure it does not contain a “Ploidy(numeric)” column so that the log2 ratio data will be recognized.] Then use an “Array list file” to separate the arrays with “Standardize separators”, or do not use an array list file but uncheck “Tools/Options/Chromosome/Show only first sample name”. Finally use “Analysis/Chromosome/Analysis method: Expression” to view the data (see figure below, data courtesy of Xiaojun Zhao). Missing values (blanks in the data table) are displayed in gray.
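
An external data file of this shape can be produced with a few lines of Python. The sketch below is only an example of the tab-delimited layout described above: the header row, the `ProbeID` column label, the sample names, the three-decimal formatting, and the function name are assumptions, so check the result against a file that dChip is known to read. Missing values are written as blanks.

```python
import csv
import io


def write_external_data(handle, sample_names, rows):
    """Write probe IDs and log2 ratios as tab-delimited text; None becomes blank."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["ProbeID"] + list(sample_names))  # header row is an assumption
    for probe_id, values in rows:
        cells = ["" if v is None else f"{v:.3f}" for v in values]
        writer.writerow([probe_id] + cells)


# Made-up probes and log2 ratios; the None entry is a missing value.
buffer = io.StringIO()
write_external_data(
    buffer,
    ["sample1", "sample2"],
    [("probe_1", [0.5, -1.25]), ("probe_2", [None, 0.0])],
)
print(buffer.getvalue())
```

To create a real file, pass a handle opened with `open(path, "w", newline="")` instead of the in-memory buffer.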

 

V11/12/07+: At “Get external data”, check “Log ratio data” to indicate log ratio data. If the data values are not in log base 2, set the log base x at “Options/Log x transform” but do not check the box on the left. The copy number data view and the inferred log2/copy view can then be used to display median-smoothed values.
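
For orientation, a log ratio in base x relates to the log2 ratio by log2(r) = log_x(r) × log2(x), and median smoothing replaces each value by the median of its neighbors along the chromosome. The Python sketch below illustrates both ideas; it is not dChip's algorithm, and the window size of 3, the shortened windows at the ends, and the function names are assumptions.

```python
import math
from statistics import median


def to_log2(value, base):
    """Convert a log ratio given in `base` to a log2 ratio."""
    return value * math.log2(base)


def running_median(values, window=3):
    """Median of each value and its neighbors; the window shrinks at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


# A ratio of 8 expressed in log base 10, converted to log base 2.
print(to_log2(math.log10(8), 10))

# Made-up log2 ratios along a chromosome with one outlying probe.
log2_ratios = [0.0, 0.1, 2.0, 0.0, -0.1]
print(running_median(log2_ratios))
```

The single outlying value of 2.0 is removed by the smoothing, while a run of several elevated probes would be preserved.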

 

 

(Updated 12/18/07)