dChip: Exon and tiling array analysis

 

View data along chromosome | Convert exon array data to U133 array data | Process tiling sub-array | Gene ST array

 

Processing exon array data

 

[V3/5/07+] A PC with 2 GB or more of memory is recommended. Download and unzip the library file ("Human Exon 1.0 ST Array Analysis") to get the Probe Group File (PGF file), and specify this PGF file at “Open group/other information/CDF file”. Because the annotation CSV file of the exon array has a different format, "Tools/Make information file" cannot be used to make the gene information file; unzip and use this file instead. The gene/accession names in this file were manually copied from the file “HuEx-1_0-st-probeset-annot.csv”. Click OK to read in the PGF and CEL files. For faster performance, use “Options/Model/Quantile normalization” and “Model method: Average difference” in the normalization and expression value computing steps.

For the mouse exon array, check "Open group/options/Array has only PM probe" and set "Open group/other information/Array type" to "Exon array".

 

Affymetrix exon array: Support materials, Sample data

Mouse exon array: Gene information file, genome information file (both made by Igor Leykin)

 

View exon array data along chromosome

 

Expression data can be visualized in the chromosomal order of genes. Exon array data from 7 pairs of normal and tumor colon samples are used here for testing. [For the old colon dataset CEL files (7 pairs), rename the file “HuEx-1_0-st-v2.pgf” to “HuEx-1_0-st-v1.pgf” so that the file name matches the array type specified in the CEL file header. This is not needed for the latest colon data files of 10 pairs.] Expression values are computed for probe sets (corresponding to exons) without considering the gene-level correlation between adjacent exons. The “Analysis/Chromosome” function is then used to view expression data along the chromosome. Unzip and use this exon array genome information file (hg17); the hg17 cytoband and refgene files can be found here.

 

On chromosome 1, between 161 Kb and 209 Kb (1p36.33, between the genes C1orf119 and MRPL20), some normal and tumor colon samples have amplified expression values (raw copy numbers are median-smoothed over 10 exon probe sets).
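The median smoothing mentioned above can be illustrated with a generic running median. This is only a sketch: the function name `median_smooth`, the centered window, and the shrinking of the window at the ends of the chromosome are assumptions for illustration, not a description of how dChip implements its smoothing.

```python
from statistics import median


def median_smooth(values, window=10):
    """Replace each value by the median of a window of neighboring values.

    The window covers `window` consecutive probe sets around position i
    and is clipped at both ends of the list (assumed edge handling).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    before = window // 2          # probe sets taken before position i
    after = window - before       # probe sets taken from position i onward
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - before)
        stop = min(len(values), i + after)
        smoothed.append(median(values[start:stop]))
    return smoothed


# A single outlying probe set is suppressed by its neighbors.
print(median_smooth([1, 1, 9, 1, 1], window=3))
```

A larger window gives a smoother profile but blurs the boundaries of short amplified regions.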

           

 

Convert exon array data to U133 array data

 

[V4/1/06+] After obtaining signal values for the exon array, use “Tools/Export expression value”, then unzip map.hg-u133-2.0-plus-cons.zip and specify it as the gene list file. This file is from Affymetrix and maps exon array probe sets to U133 consensus or exemplar sequences. Any sample may be specified; all samples will be exported. The output file contains the averaged exon expression value for each U133 probe set/gene. This file can then be read by “Get external data” and treated as U133 data for analysis with U133 gene information files.
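The averaging step can be pictured as follows for one sample. This is a minimal sketch, assuming the mapping has already been loaded into a dictionary; the function name `average_by_u133`, the dictionary layout, and the probe set IDs in the example are hypothetical and do not reflect the actual format of the Affymetrix mapping file or of dChip's output.

```python
from collections import defaultdict


def average_by_u133(exon_values, exon_to_u133):
    """Average exon probe set values over each U133 probe set they map to.

    exon_values:  {exon probe set ID: expression value} for one sample
    exon_to_u133: {exon probe set ID: [U133 probe set IDs]} (assumed layout)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for exon_id, value in exon_values.items():
        # Exon probe sets without a mapping are simply skipped.
        for u133_id in exon_to_u133.get(exon_id, []):
            sums[u133_id] += value
            counts[u133_id] += 1
    return {u133_id: sums[u133_id] / counts[u133_id] for u133_id in sums}


# Illustrative IDs only.
values = {"exon_a": 100.0, "exon_b": 200.0, "exon_c": 50.0, "exon_d": 75.0}
mapping = {"exon_a": ["u133_x"], "exon_b": ["u133_x"], "exon_c": ["u133_y"]}
print(average_by_u133(values, mapping))
```

In this sketch an exon probe set that maps to several U133 probe sets contributes to each of them.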

 

[V8/30/06+] To export exon-level data, do not specify the above special gene list file (map.hg-u133-2.0-plus-consa.txt) at "Tools/Export expression value".

[V3/24/08+] Unzip and specify the new mapping file u133_to_exon_mapping.rar for faster export.

 

Process tiling sub-array

 

[V4/1/06+, experimental functions] We will use the "Human Tiling 2.0 Array Set, 7th array" as the example below.

Affymetrix human tiling array: 1.0R, 2.0R, Promoter 1.0R, TAS software

 

Obtain and unzip the BPMAP files from the Affymetrix website (e.g. Human tiling 2.0R library file). Also download and unzip this ProbeExporter software if it is not included in the library above. In the unzipped library files, go to the BPMAP directory, run ProbeExporter to convert Hs35b_P07R_v01-3_NCBIv34.bpmap to text format (418 MB after conversion), and rename "*.bpmap.txt" to "*.bpmap".
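When several BPMAP files have been converted, the renaming can be scripted. The sketch below assumes the converted files end in ".bpmap.txt" as described above; the function names and the choice to move the renamed text files into a separate directory (so that the original binary BPMAP files are not overwritten) are assumptions for illustration.

```python
import shutil
from pathlib import Path


def converted_name(text_name):
    """Map 'X.bpmap.txt' to 'X.bpmap'; reject any other file name."""
    if not text_name.endswith(".bpmap.txt"):
        raise ValueError(f"not a converted BPMAP text file: {text_name}")
    return text_name[: -len(".txt")]


def collect_converted_bpmaps(bpmap_dir, dest_dir):
    """Move every '*.bpmap.txt' in bpmap_dir into dest_dir as '*.bpmap'."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(Path(bpmap_dir).glob("*.bpmap.txt")):
        target = dest / converted_name(src.name)
        if target.exists():
            continue  # never overwrite an existing file
        shutil.move(str(src), str(target))
        moved.append(target)
    return moved


print(converted_name("Hs35b_P07R_v01-3_NCBIv34.bpmap.txt"))
```

The directory passed as `dest_dir` would then hold the text-format file to specify at "Other information/CDF file".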

At dChip "Open group", specify the data directory as the one containing the tiling array CEL files, and specify "Other information/CDF file" as Hs35b_P07R_v01-3_NCBIv34.bpmap (converted text format). The CDF and CEL files will be read in. A genome info file "Hs35b_P07R_v01-3_NCBIv34genome info.txt" will be generated, which contains the chromosome and start base-pair position of each probe set (6 consecutive tiling probes). For faster processing, run "Analysis/Normalize" using "Options/Quantile normalization" and "Analysis/Model" using "Options/Average difference".
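For readers unfamiliar with quantile normalization, the general idea is to force every array to share the same distribution of intensities. The following is a textbook-style sketch, not dChip's implementation; the function name `quantile_normalize` and the simple handling of ties (tied values receive adjacent ranks in input order) are assumptions.

```python
def quantile_normalize(arrays):
    """Give all arrays the same intensity distribution.

    arrays: list of equal-length lists, one list of probe values per array.
    Each value is replaced by the mean, across arrays, of the values that
    share its rank.
    """
    if not arrays:
        return []
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Mean of the r-th smallest value over all arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]

    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization, every array contains the same set of values, only arranged according to each array's own probe ranking.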

To perform an analysis similar to SNP copy number analysis, specify a sample info file containing the "Ploidy(numeric)" column, with the value 2 indicating normal (baseline) samples. Then run "Analysis/Chromosome" as for SNP arrays, using the genome information file above.
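The role of the baseline samples can be sketched as follows. The example assumes that each probe set signal is scaled so that the mean of the samples marked with ploidy 2 corresponds to 2 copies; this scaling, the function name `raw_copy_numbers`, and the data layout are assumptions for illustration rather than a statement of dChip's exact computation.

```python
def raw_copy_numbers(signals, ploidy):
    """Scale signals so that baseline (ploidy 2) samples average 2 copies.

    signals: {sample name: [signal per probe set]}
    ploidy:  {sample name: ploidy value, or None when unknown}
    """
    baseline = [name for name, p in ploidy.items() if p == 2 and name in signals]
    if not baseline:
        raise ValueError("at least one sample with ploidy 2 is required")
    n = len(signals[baseline[0]])

    # Mean baseline signal for every probe set.
    baseline_mean = [
        sum(signals[name][i] for name in baseline) / len(baseline) for i in range(n)
    ]
    if any(m == 0 for m in baseline_mean):
        raise ValueError("baseline mean signal is zero for some probe set")

    return {
        name: [2.0 * v / m for v, m in zip(values, baseline_mean)]
        for name, values in signals.items()
    }


signals = {"N1": [100.0, 200.0], "N2": [300.0, 200.0], "T1": [400.0, 100.0]}
ploidy = {"N1": 2, "N2": 2, "T1": None}
print(raw_copy_numbers(signals, ploidy)["T1"])
```

Under this assumption, values near 2 indicate no change relative to the normal samples, and values well above 2 suggest amplification.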

 

Gene ST array

 

[V11/19/07+, data courtesy of Ed Fox] Use HuGene-1_0-st-v1.r3.cdf (unzip the library file) at “Open group”. Since multiple probe sets for a gene's alternative transcripts may use the same probes, a binary CDF file (cdf.bin) will not be created, and each "Open group" will extract the CDF file. Several probe sets have more than 300 probes; currently only the first 300 probes are kept. In the PM/MM data view, use 'M' to toggle to the PM-only data model if needed, since this array has only PM probes. [V1/24/08+] Use MoGene-1_0-st-v1.r3.cdf (unzip the library file) for mouse Gene ST arrays.

 

(Updated 11/19/07)