dChip: View PM/MM data

 

Model fitting and outlier indication

Display relative probe position and all arrays

Export probe data

 

It is good practice to view the probe-level PM and MM data to confirm the expression changes reported by high-level analyses such as clustering or sample comparison, since outlier data points may lead to erroneous expression values. Click the “PM/MM Data” icon in the left pane (or select the menu “View/PM/MM Data”, or simply press Enter in the “Image View”) to view the probe-level data for the current probe set in the current array, as well as the model-based expression indexes (q), probe sensitivity indexes (f) and fitted values (red curve):

 

 

 

There are 6 grids in this “PM/MM Data View”. Let us use (x, y) to denote the grids, with (1, 1) the upper-left grid and (2, 3) the lower-right grid. Grid (1, 1) displays the PM and MM data as blue and green curves, with the x-axis ordering the probes from 1 to 20 and the y-axis showing probe intensities over the range (0, 2686). Grid (1, 2) is the PM/MM difference curve; the horizontal blue line is y=0. Grid (1, 3) overlays the red fitted curve (Li and Wong 2001a) on the blue PM/MM difference curve, and also shows the residual curve in light gray. From this pane we can also read that the explained energy is 97.21% after fitting the model to the PM/MM difference data of this probe set (a data table of 4 arrays by 20 probes), and that model fitting and outlier identification took 3 rounds of iteration. Here we found 0 array outliers, 0 probe outliers and 0 single outliers. Grid (2, 1) is the intensity image of the current probe set in the current array; the intensity values are displayed such that the images for the same probe set are comparable across all arrays. Recall that in the “CEL image view” all probe cells in the same array are displayed on a comparable scale, so the same probe set may look different in the “CEL image view” and the “Data view”.

 

Click anywhere in the right pane to activate the “PM/MM Data View”. Use the “PageDown” and “PageUp” keys to go to the data view of this probe set in other arrays. Select “Data/Animate” to sequentially display the data curves for the current probe set in different arrays, in the order of the fitted q values. Select “Data/Pause” to stop the animation, and “Data/Faster/Slower” to change its speed. In this way we can visualize how the PM and MM responses increase as the mRNA level in the sample increases. Viewing such animations was instrumental in the development of our model in Li and Wong 2001a.

 

Press the “Home” and “End” keys to go to the previous or the next probe set. Generally it is only interesting to look at probe sets called “present” in more than half of the arrays. Toggle the menu item “Data/Jump to Present” to choose whether the “Home” and “End” keys step through every probe set or jump between “Present” probe sets. To find a particular probe set, select “View/Find Gene” to search by probe set name or by keywords using match patterns.


One can choose “Data/Next model” to switch to the PM-only model (indicated in grid (2, 1)); the background values are then plotted as a largely flat light curve.

 

Model fitting and outlier indication

 

Grid (2, 2) displays the scatterplot of the standard error of q versus q (Model-Based Expression Values, MBEI). From Li and Wong 2001a (page 2, the formula just below equation [3]), we see that the model-based expression values are weighted averages of the PM-MM (or PM) values, with larger weights (f’s) given to sensitive (responding) probes; non-responsive probes with small f’s are down-weighted or ignored in the MBEI. The probe sensitivity indexes (f’s) are estimated from all arrays in a group and should be reliable when a gene is present in at least several arrays.
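The weighted-average form can be illustrated with a short Python sketch. It assumes the PM/MM difference model of Li and Wong 2001a, in which each PM-MM difference is approximately q times f for its array and probe, so that for fixed f’s the least-squares estimate of q is sum(y × f) / sum(f × f). The function name and the toy numbers are hypothetical; this is not dChip’s own code.

```python
def expression_index(diffs, f):
    """Weighted average of PM-MM differences for one array.

    diffs: PM-MM differences for the probes of one probe set in one array.
    f:     probe sensitivity indexes for the same probes (assumed known here).
    """
    if len(diffs) != len(f):
        raise ValueError("diffs and f must have the same length")
    weight_total = sum(fj * fj for fj in f)
    if weight_total == 0:
        raise ValueError("all probe sensitivities are zero")
    # Least-squares estimate of q given f: sum(y_j * f_j) / sum(f_j^2).
    return sum(y * fj for y, fj in zip(diffs, f)) / weight_total


# Toy data (hypothetical): the third probe does not respond, so f = 0 there.
f = [1.0, 1.0, 0.0, 2.0 ** 0.5]
diffs = [700.0, 720.0, 5000.0, 1000.0]  # large spike on the non-responsive probe
q = expression_index(diffs, f)
```

Because the third probe has f = 0, the spike of 5000 on it has no influence on q, which is the down-weighting described above.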

 

The model is fitted on the fly for the current probe set in the “PM/MM Data View”, using unnormalized or normalized probe data. However, before running “Analysis/Model-based expression” for all probe sets, normalization should be performed first. In Grid (2, 2), the black dot represents the current array, and its value (709, 10) is displayed. Grid (2, 3) displays the scatterplot of the standard error of f (which has range (0, 0.19) in this case) versus f (the probe sensitivity parameter). The standard error of f is also shown as a vertical blue line in Grids (1, 1) to (1, 3): probes with larger standard errors behave inconsistently with the remaining probes across arrays.

 

The outlier detection method described in Li and Wong 2001a is used by the model fitting procedure to identify array, probe and single outliers. An array outlier is an array with an unusual probe response pattern (the connected PM, MM or PM/MM difference curve) that differs from the probe response pattern seen in most arrays. It can be the result of image contamination or saturated PM or MM signals. Probe outliers are most likely due to cross-hybridization to non-target genes. Single outliers are usually caused by image spikes affecting only one probe in one array.

 

Several visual clues in grid (1, 3) indicate the different outliers handled in the modeling (the picture is for another probe set). Single outliers do not have small circles (see the two black arrows), meaning their values are replaced by imputed values (red curve), which leads to large residuals. These “imputed” residuals are treated as 0 when calculating the standard errors of expression levels or probe sensitivity indexes (θ and φ in Li and Wong 2001a, written q and f here). Probe outliers do not have fitted red curves at that probe location in any array (see the arrow). The data coming from such a probe are not used to estimate the q (expression index) parameters, although the f value and standard error for this probe are still calculated using the trained q’s. Similarly, an array outlier does not have fitted red curves for any probe in that array. This means that the probe response pattern of this probe set in the current array is inconsistent with the patterns seen in the other arrays for the same probe set (possibly due to image contamination). Although an expression value q is calculated for such an “array outlier”, it carries a large standard error, indicating that it is not reliable.

 

 

Although not shown in this example, vertical green lines, when they appear in grids (2, 2) and (2, 3), indicate the zero position for q or f, and blue circles represent array outliers or probe outliers, which are not used to train the model. After model fitting, the values and standard errors of excluded probes or arrays are still calculated using the estimated q and f parameters, but their unreliability is indicated by large standard errors.

 

Display relative probe position and all arrays

 

Early CDF files also have information on probe positions in the reference sequence; if such a CDF file is used, checking “Data/Probe Position” displays the relative probe positions along the x-axis in grids (1, 1) to (1, 3):

 

 

[V1.2+] In addition, Affymetrix provides the probe sequence and probe position information for each array type on its download center (a free account is needed for access). One can download and unzip the “Probe tabular” file into the same directory as the corresponding CDF file. The probe tabular file name is the CDF file name with “.cdf” replaced by “_probe_tab” (e.g. “HG-U133A_probe_tab”; rename the probe tabular file if needed), and dChip will read the relative probe positions from this file after extracting the CDF file. The probe positions are relative to the reference sequence of the probe set. If “Data/Probe position” is checked, Grid (1, 2) will display the range of the x-axis in base pairs. In the above picture, the distance between the leftmost probe and the rightmost probe is 154 base pairs.
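The naming rule above can be checked with a small Python helper before starting dChip. This is only a sketch: the function name is hypothetical, and the CDF file name “HG-U133A.cdf” is assumed from the example in the text.

```python
import os


def probe_tab_name(cdf_path):
    """Return the probe tabular file path that matches a CDF file path.

    Follows the rule stated in the text: ".cdf" is replaced by "_probe_tab",
    and the file stays in the same directory as the CDF file.
    """
    root, ext = os.path.splitext(cdf_path)
    if ext.lower() != ".cdf":
        raise ValueError("expected a .cdf file name: %r" % cdf_path)
    return root + "_probe_tab"


# Assumed CDF file name, taken from the "HG-U133A_probe_tab" example.
expected = probe_tab_name("HG-U133A.cdf")
```

If the downloaded file has a different name, it can be renamed to the value of `expected`.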

 

Toggle "Data/Show all array" to view the current probe set in all arrays in the current "array list file":

 

Export probe data

A user can use “Tools/Export probe set” (“Data/Export Probe Set” for Version 1.0) to export the probe-level data in all arrays for the current probe set or a list of probe sets, for analysis and visualization in other statistical software such as R/S-PLUS. The order of arrays in the “Array” column of the output file is the same as in the "array summary file" written at the "Open group" step, except that the numbering starts from 0 instead of 1. Check “Output CEL position of PM probe” to export the x and y positions of the PM probe in the CEL file; the paired MM probe has the same x position and pm.y + 1 as its y position, since the MM probe is one row below the paired PM probe.
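When post-processing an exported file, the two conventions above (0-based array numbering, and the MM probe one row below its PM probe) can be captured in two small Python helpers. Both function names are hypothetical, and no assumption is made here about the layout of the exported file itself.

```python
def mm_position(pm_x, pm_y):
    """CEL position of the MM probe paired with a PM probe at (pm_x, pm_y).

    The MM probe shares the x position and sits one row below, at pm_y + 1.
    """
    return pm_x, pm_y + 1


def one_based_array_number(array_index):
    """Convert the 0-based value of the "Array" column to 1-based numbering."""
    if array_index < 0:
        raise ValueError("array index must be non-negative")
    return array_index + 1


mm_x, mm_y = mm_position(10, 20)         # PM probe at x=10, y=20 (made-up values)
first_array = one_based_array_number(0)  # first array in the array summary file
```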

If "Tools/Options/Model/Method used" is "PM-only model", this function exports the background-subtracted probe values. If it is “PM/MM difference model”, the PM and MM probe values without background adjustment are exported. The exported probe values are normalized if the “Normalized” indicator is on in the lower-right corner of the dChip window. To export the unnormalized raw probe data as in the CEL files, one can check “Open group/Read unnormalized probe data” when opening a group, and select "Tools/Options/Model/Method used/PM/MM difference model" before exporting data.

[Version 3/25/06+] For SNP arrays, “Tools/Export probe set” will export an “Allele_A” column, with 1 indicating probes for the A allele and 0 probes for the B allele. If one first clicks the “PM/MM data” icon on the left and then runs “Tools/Export probe set” for one probe set, the theta (MBEI), phi (PSI) and their standard errors for the current probe set will be exported.

 

The data image can be exported by the “View/Export Image” menu. After using “Analysis/Model-based expression value” to fit the model for all probe sets, the identified outliers can also be exported.