Specify and view chromosome regions
A chromosome region file may be specified at "Analysis/Chromosome" to view only the markers in these regions. First make or download a text-format chromosome region file (in a format similar to that of a cytoband file). For example, cancer gene census hg17.xls uses the known cancer genes as regions (suggested by Rameen Beroukhim, based on the Cancer Gene Census database, 9/21/05 version). One can also specify a cytoband file to view and cluster cytobands as regions. As another example, this region file will display the SNP or tiling probe sets that fall in the corresponding regions, so we can focus on regions of various sizes containing the RB gene.
[V4/16/07+] After using "Analysis/Chromosome" to obtain inferred LOH and copy number data, select "Chromosome/Region & Clustering":

Reference gene or refFlat files can be used to treat each gene as a region, by checking "Use refGene file as region file". Each gene's transcription start and end sites will be extended by 5 Kb to define the gene's chromosome region.
The data of the SNP markers in a region are averaged to obtain the region's inferred copy number (log2 ratio) and probability of LOH, and these averaged data can be used to filter regions. Only the regions containing more than one marker or passing the filtering criteria will be displayed or used in chromosome region clustering.
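The gene extension and per-region averaging described above can be sketched as follows. This is only an illustration under stated assumptions, not dChip's own code: the function names, the tuple layouts, and the gene "GENE_A" with its coordinates and marker values are all hypothetical.

```python
from statistics import mean

FLANK = 5000  # 5 Kb extension on each side, as described in the text


def gene_to_region(name, chrom, tx_start, tx_end, flank=FLANK):
    """Extend a gene's transcription start/end by `flank` bases (hypothetical helper)."""
    return (name, chrom, max(0, tx_start - flank), tx_end + flank)


def average_regions(markers, regions):
    """Average marker values per region.

    markers: list of (chrom, position, value) tuples
    regions: list of (name, chrom, start, end) tuples
    Returns {name: (marker_count, mean_value)} for regions holding at least
    one marker, so the count can be used for later filtering.
    """
    result = {}
    for name, chrom, start, end in regions:
        values = [v for c, pos, v in markers if c == chrom and start <= pos <= end]
        if values:
            result[name] = (len(values), mean(values))
    return result


# Made-up gene and marker values, for illustration only.
regions = [gene_to_region("GENE_A", "13", 100_000, 200_000)]
markers = [
    ("13", 95_000, 0.9),   # inside the 5 Kb upstream extension
    ("13", 150_000, 1.1),  # inside the gene
    ("13", 300_000, 2.0),  # outside the region
    ("5", 150_000, 3.0),   # different chromosome
]
averaged = average_regions(markers, regions)
```

Here `averaged["GENE_A"]` holds the marker count and the averaged value for the extended gene region; the same routine could be applied to log2 ratios or to LOH probabilities.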
[Version before 4/15/07] The chromosome region file should be specified at "Analysis/Chromosome". Specify a refFlat file at "Analysis/Chromosome/Reference gene file", and check "In refFlat file format". If "Use as region file" is checked, genes will be the display unit instead of chromosomes. The Home and End keys move to another gene; on the left, the black bands represent exon regions and the white bands intron regions. Only the SNPs or genes in the exon-level region are displayed in the view. This view will be more useful for the latest SNP, exon or tiling arrays with more markers. (Suggested by Bill Sellers, Rameen Beroukhim and Tom Look, data courtesy of Rani George)

[V8/12/07+] Inferred copy number and probability (LOH) can be averaged
for SNPs in a chromosome region. Such averaged data for regions can be used for
region filtering or be exported. Filtering criterion A or B in the above dialog
is used when the current chromosome view is copy number or LOH, respectively.
In this example exported
region data file, the reference gene file specified at
"Analysis/Chromosome" is used as the chromosome region file, and the
column named "% Sample satisfying A or B" can be sorted to identify
genes with a high percentage of copy number alterations across samples.
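As a rough sketch of how a column such as "% Sample satisfying A or B" could be computed and sorted, the following assumes that each region already has one averaged value per sample. The function name, the gene names, the values, and the 0.5 cutoff are hypothetical; the actual criteria A and B are the ones set in the dChip dialog.

```python
def percent_samples_satisfying(region_values, criterion):
    """Percentage of samples whose averaged region value satisfies `criterion`.

    region_values: {region_name: [averaged value for each sample]}
    criterion: function mapping one averaged value to True/False
    """
    return {
        region: 100.0 * sum(1 for v in values if criterion(v)) / len(values)
        for region, values in region_values.items()
        if values
    }


def is_gain(value):
    # Hypothetical stand-in for a filtering criterion; the cutoff is made up.
    return value > 0.5


# Made-up averaged log2 ratios for two hypothetical genes across four samples.
region_values = {
    "GENE_A": [0.8, 0.1, 0.9, 0.7],
    "GENE_B": [0.0, 0.6, -0.2, 0.1],
}
percents = percent_samples_satisfying(region_values, is_gain)

# Sort genes so that those altered in the most samples come first.
ranked = sorted(percents, key=percents.get, reverse=True)
```

Sorting by this percentage mirrors sorting the exported column to bring frequently altered genes to the top.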
Cluster samples using SNP data
LOH or copy number data from SNP arrays can be used to cluster tumor
samples (Garraway et al. 2005; Janne et al. 2004; Koed et al. 2005; Lieberfarb
et al. 2003; Lin et al. 2004) or SNP markers (Girard et al. 2000).
[V11/6/05+] See Lin et al. 2004 and Janne et al. 2004 for references. To perform sample clustering, run "Analysis/Chromosome" and then, at the “Chromosome” view, select the menu items “Chromosome/Show all” and “Chromosome/Clustering”. Initially set “Options/chromosome/min, max, threshold” to 0, 0.5, 0.25, and use the Shift+left/right keys to adjust the red threshold line, so that samples are clustered using the chromosome regions whose LOH score exceeds the threshold.
In the clustering figure below, the LOH score is plotted on the right side of
the LOH data picture in blue. A high LOH score indicates that many samples have
LOH events in the nearby region. When the score threshold line (in red) is
adjusted, the markers or genes in the chromosome regions with LOH score
exceeding the threshold are colored blue. Only the SNP markers in the regions
with LOH score above the threshold are used for sample clustering. The distance
between two samples is defined as the average absolute difference of the
Probability (LOH) in the two samples for these markers. Average linkage is used
during hierarchical clustering. Intuitively, if two samples often have LOH or
retention together for the selected chromosome regions, they will cluster closely.
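The marker selection, distance definition, and average-linkage clustering described above can be sketched in plain Python. This is an illustrative sketch only, not dChip's implementation: the function names, sample names, LOH scores, and probabilities are hypothetical, and the LOH scores are taken as given because their computation is not specified here.

```python
def select_markers(loh_scores, threshold):
    """Indices of markers whose LOH score exceeds the threshold."""
    return [i for i, score in enumerate(loh_scores) if score > threshold]


def loh_distance(a, b, marker_idx):
    """Average absolute difference of P(LOH) over the selected markers."""
    if not marker_idx:
        raise ValueError("no markers pass the LOH score threshold")
    return sum(abs(a[i] - b[i]) for i in marker_idx) / len(marker_idx)


def average_linkage(samples, marker_idx):
    """Agglomerative clustering with average linkage.

    samples: {sample_name: [P(LOH) for each marker]}
    Returns the merges in order as (members_a, members_b, distance).
    """
    names = list(samples)
    # Pairwise distances between the original samples.
    pair = {
        frozenset((p, q)): loh_distance(samples[p], samples[q], marker_idx)
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }

    def cluster_distance(x, y):
        # Average linkage: mean distance over all cross-cluster sample pairs.
        return sum(pair[frozenset((p, q))] for p in x for q in y) / (len(x) * len(y))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        dist, i, j = min(
            (cluster_distance(x, y), i, j)
            for i, x in enumerate(clusters)
            for j, y in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Made-up LOH scores for four markers and P(LOH) values for three samples.
loh_scores = [0.1, 0.4, 0.5, 0.05]
marker_idx = select_markers(loh_scores, threshold=0.25)
samples = {
    "T1": [0.0, 1.0, 1.0, 0.0],
    "T2": [1.0, 1.0, 1.0, 1.0],
    "T3": [0.0, 0.0, 0.0, 0.0],
}
merges = average_linkage(samples, marker_idx)
```

In this toy input, T1 and T2 differ only at markers whose LOH score is below the threshold, so those differences are ignored and the two samples merge first.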

By changing the LOH score threshold from 0.01 to 1.00, we can quickly step through 100 sample clustering trees (analogous to gene filtering in the clustering analysis of expression arrays). The sample type information on the top can be hidden at first (use the Shift+Up Arrow key at the non-proportional view), until at a particular threshold a good separation can be seen in the sample clustering tree on the top. Then the sample information can be brought up to correlate with the clustering results. Alternatively, sample information can be used to determine the LOH score threshold used for clustering. If, at a particular threshold, the sample clustering agrees well with known cancer types or other clinical variables, the chromosome regions selected by this threshold (and thus used in the clustering) may contain LOH differences that distinguish the sample subgroups.
At the SNP genotype view, LOH view and copy number view, selecting “Chromosome/Cluster samples” will perform sample clustering using the data in that particular view. The distance metric between two samples is:
Clustering samples by genotypes can
suggest that pairs of samples are from the
same ancestors:
