dChip: Allele Sharing Analysis

 

Non-parametric linkage analysis        

 

Hidden Markov Models (HMMs) have been used in linkage analysis of family pedigree or sibling-pair data to extract full inheritance information from all the genetic markers on a chromosome (Kruglyak et al. 1995). Here we adopt a similar approach to estimate the allele sharing between two samples (e.g. siblings) at SNP markers, and use an allele sharing score across all the sample groups to search for chromosomal loci of interest.
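The HMM idea can be illustrated with a minimal Python sketch. This is not dChip's implementation. It assumes a sibling pair, three hidden states (0, 1 or 2 alleles shared), the standard sib-pair transition model in which the paternal and maternal sharing indicators each switch independently between adjacent markers, and per-marker emission probabilities that the caller has already computed from genotypes and allele frequencies. The function name expected_sharing and the parameter switch_prob are hypothetical.

```python
def expected_sharing(emissions, switch_prob=0.05, prior=(0.25, 0.5, 0.25)):
    """Posterior expected number of shared alleles (0..2) at each marker.

    emissions[t][k] is P(observed genotypes at marker t | k alleles shared).
    switch_prob is an ASSUMED per-interval probability that one parental
    sharing indicator flips; a real analysis would derive it from map distance.
    """
    p, q = switch_prob, 1.0 - switch_prob
    # Sib-pair transition matrix between sharing states 0, 1, 2 (assumption).
    trans = [
        [q * q, 2 * p * q, p * p],
        [p * q, q * q + p * p, p * q],
        [p * p, 2 * p * q, q * q],
    ]
    states = range(3)
    n = len(emissions)

    # Forward pass, rescaled at every marker to avoid underflow.
    fwd = []
    for t, e in enumerate(emissions):
        if t == 0:
            row = [prior[k] * e[k] for k in states]
        else:
            prev = fwd[-1]
            row = [e[k] * sum(prev[j] * trans[j][k] for j in states) for k in states]
        total = sum(row)
        fwd.append([x / total for x in row])

    # Backward pass, also rescaled.
    bwd = [[1.0, 1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt, e = bwd[t + 1], emissions[t + 1]
        row = [sum(trans[k][j] * e[j] * nxt[j] for j in states) for k in states]
        total = sum(row)
        bwd[t] = [x / total for x in row]

    # Combine into posteriors and take the expected sharing count.
    result = []
    for f, b in zip(fwd, bwd):
        post = [f[k] * b[k] for k in states]
        total = sum(post)
        result.append((post[1] + 2.0 * post[2]) / total)
    return result
```

With completely uninformative emissions, the posterior falls back to the prior and the expected sharing is 1 at every marker, which is the null expectation for siblings under this model.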

 

After reading in SNP data, specify an “Array list file” that uses Standardize separators to divide sibling pairs (example file). Then select “Analysis/Chromosome/Analysis method: Allele sharing”. Allele frequencies are specified through a SNP information file at "Open group". [Versions before 11/13/05: use a genome information file with an informative SNP allele frequency column (example file; 0-100 in the “freq_A” column stands for an A allele frequency of 0% to 100%).] Click OK to run the analysis, and then press “D” to toggle to the blue-white allele sharing data view.

 

For each sample group delimited by "Standardize separators", the pairwise sharing between all the samples in the group is computed, and each sample's average sharing value with the rest of the samples in its group is displayed on a blue (2) to white (0) scale. If a sample group contains only two siblings, their allele sharing values are therefore the same. The curve on the right shows the average allele sharing score across all the samples in all sample groups. Its genome-wide significance can be assessed with “Chromosome/Compute Score”.
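The averaging described above can be sketched in Python for a single SNP. This is an illustration under assumptions, not dChip's code: it takes the pairwise sharing values (0 to 2) as already estimated, reads the overall score as the plain mean of the per-sample values, and uses hypothetical names (average_sharing and the sample labels).

```python
def average_sharing(pairwise, groups):
    """Per-sample and overall average allele sharing at one SNP.

    pairwise maps frozenset({a, b}) to the sharing value (0..2) of samples a and b.
    groups is a list of sample-name lists, one list per sample group;
    every group must contain at least two samples.
    """
    per_sample = {}
    for group in groups:
        for sample in group:
            others = [o for o in group if o != sample]
            # Average sharing of this sample with the rest of its group.
            total = sum(pairwise[frozenset((sample, o))] for o in others)
            per_sample[sample] = total / len(others)
    # ASSUMPTION: the overall score is the mean over all samples in all groups.
    overall = sum(per_sample.values()) / len(per_sample)
    return per_sample, overall


# Hypothetical data: one sibling pair and one group of three samples.
groups = [["sibA", "sibB"], ["p1", "p2", "p3"]]
pairwise = {
    frozenset(("sibA", "sibB")): 2.0,
    frozenset(("p1", "p2")): 1.0,
    frozenset(("p1", "p3")): 2.0,
    frozenset(("p2", "p3")): 0.0,
}
per_sample, overall = average_sharing(pairwise, groups)
# per_sample: sibA 2.0, sibB 2.0, p1 1.5, p2 0.5, p3 1.0; overall is about 1.4
```

In the two-sample group both siblings receive the same value, because each sample's only partner is the other sibling.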

 

Non-parametric linkage analysis

 

Allele sharing may be used for non-parametric linkage analysis. For example, pair each affected sample with every other affected sample and put each pair in its own Standardize group, so that each group contains one pair. Alternatively, put all affected samples in one Standardize group and all unaffected samples in another, and look for regions that have high sharing in the affected group but not in the unaffected group.
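The first grouping scheme amounts to enumerating every unordered pair of affected samples. A short Python sketch follows; the function name and sample labels are hypothetical, and writing the resulting groups into an array list file is left out because that file's layout is not described here.

```python
from itertools import combinations


def affected_pair_groups(affected):
    """Return one two-sample group for every unordered pair of affected samples."""
    return [list(pair) for pair in combinations(affected, 2)]


# Hypothetical sample names; 4 affected samples give 6 pair groups.
pair_groups = affected_pair_groups(["aff1", "aff2", "aff3", "aff4"])
```

The number of groups grows as n(n-1)/2 for n affected samples, so the single-group alternative may be more practical for large sample sets.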

 

[New: Analysis example, using version 11/18/05+] Re-analysis of the 10K SNP data for the SIDDT disease from Puffenberger et al. 2004 (Data site). The vertical red threshold line represents the average sharing score of 1.62 among all 4 patients, which has a genome-wide p-value of 0.001. The peak region exceeding the threshold spans 3.8 Mb and contains 17 SNPs and 21 genes (with help from Dietrich Stephan and Erik Puffenberger).

 

                 

 

(Updated 11/4/07)