=================================================
HapMap Haplotypes for data release #22 (13/08/07)
=================================================

The haplotypes for all the autosomes and chrX will be released as they are processed in the directories below

all/
consensus/
phased/

which contain the 3 versions of the data we have created.

phased/ contains the haplotypes for each panel but only at SNPs that are segregating in each panel, i.e. if a SNP does not segregate in the CEU panel then it is not included in the CEU haplotypes. Each dataset has 3 files per chromosome/population panel. For example, for chr22 in the CEU panel there are the files

/phased/CEU/chr22/genotypes_chr22_CEU_r22_nr.b36_fwd_legend.txt
/phased/CEU/chr22/genotypes_chr22_CEU_r22_nr.b36_fwd_sample.txt
/phased/CEU/chr22/genotypes_chr22_CEU_r22_nr.b36_fwd_phased

The _legend.txt file contains a legend detailing the rs id, base pair position, the allele coded 0 and the allele coded 1 for each of the segregating SNPs, e.g.

rs position 0 1
rs11089130 14431347 C G
rs738829 14432618 A G
rs915674 14433624 A G

The _phased file contains the haplotypes arranged one haplotype per row. For the CEU and YRI the haplotypes are arranged as follows:

row 1 - trio 1 parent 1 transmitted haplotype
row 2 - trio 1 parent 1 untransmitted haplotype
row 3 - trio 1 parent 2 transmitted haplotype
row 4 - trio 1 parent 2 untransmitted haplotype
row 5 - trio 2 parent 1 transmitted haplotype
row 6 - trio 2 parent 1 untransmitted haplotype
row 7 - trio 2 parent 2 transmitted haplotype
row 8 - trio 2 parent 2 untransmitted haplotype
.
.

For the JPT+CHB the haplotypes are arranged as

row 1 - individual 1 haplotype 1
row 2 - individual 1 haplotype 2
row 3 - individual 2 haplotype 1
row 4 - individual 2 haplotype 2
row 5 - individual 3 haplotype 1
row 6 - individual 3 haplotype 2
.
.

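
The legend and row layout described above can be read with a few lines of Python. This is only a minimal sketch and not part of the release: the function names are hypothetical, and it assumes that the legend columns and the 0/1 codes in each _phased row are separated by whitespace, which this README does not state explicitly.

```python
def parse_legend(lines):
    """Parse legend lines into a list of (rs_id, position, allele0, allele1)."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the "rs position 0 1" header line.
        if not fields or fields[0] == "rs":
            continue
        rs_id, position, allele0, allele1 = fields[:4]
        rows.append((rs_id, int(position), allele0, allele1))
    return rows


def decode_haplotype(hap_line, legend):
    """Translate one row of 0/1 codes into allele letters using the legend.

    Assumption: codes are whitespace-separated and are only 0 or 1.
    """
    codes = hap_line.split()
    if len(codes) != len(legend):
        raise ValueError("haplotype length does not match legend")
    # Tuple index 2 holds the allele coded 0, index 3 the allele coded 1.
    return [legend[i][2 + int(code)] for i, code in enumerate(codes)]


def describe_trio_row(row_number):
    """Map a 1-based row number in a CEU/YRI _phased file to its meaning."""
    index = row_number - 1
    trio = index // 4 + 1            # four rows per trio
    parent = (index % 4) // 2 + 1    # two rows per parent
    kind = "transmitted" if index % 2 == 0 else "untransmitted"
    return trio, parent, kind
```

For example, with the three legend entries shown above, decode_haplotype("0 1 1", legend) gives the alleles C, G, G, and describe_trio_row(7) gives trio 2, parent 2, transmitted.
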
For chrX there are *non_par.phased, *par1.phased and *par2.phased files for the non-pseudoautosomal region, the first pseudoautosomal region and the second pseudoautosomal region respectively. Files with haplotypes from the non-pseudoautosomal region have 2 lines per individual, but males have only one haplotype, with the other haplotype represented as a row of dashes (-). This makes the *sample.txt files match up with the haplotypes in a straightforward way.

The _sample.txt file contains an ordered list of individual ids that corresponds to the _phased file. For the CEU and YRI panels the individual ids are arranged as follows, but the _phased files do not contain the haplotypes of the children as these can be inferred from the parents' haplotypes

trio 1 parent 1
trio 1 parent 2
trio 2 parent 1
trio 2 parent 2
.
.
.
trio 30 parent 1
trio 30 parent 2
trio 1 child
trio 2 child
.
.
.
trio 30 child

all/ contains the haplotypes for each panel with the monomorphic sites put back into the haplotypes. Each panel has two files per chromosome/region: a _phased_all file that contains the haplotypes in the same format as described above, and a _legend_all file that contains the rs id, position and allele coding for each SNP in the haplotypes. SNPs that exist in multiple panels have the same coding of alleles.

consensus/ contains a set of haplotypes for each population at only those SNPs that are in all 3 panels. There is one _phased_consensus file that contains the haplotypes for each chromosome/region and one _consensus_legend+freq file for each chromosome/region that details rs id, position, allele coded 0, allele coded 1, CEU allele 1 count, YRI allele 1 count and JPT+CHB allele 1 count.
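
The correspondence between the _sample.txt ordering and the rows of a _phased file, as described above, can be sketched in Python as follows. This is an illustration only: the function name and the ids in the example are hypothetical, and for JPT+CHB it assumes the sample file simply lists the individuals in haplotype order.

```python
def rows_for_samples(sample_ids, trio_panel=True):
    """Return {individual_id: (first_row, second_row)} with 1-based row numbers.

    trio_panel=True follows the CEU/YRI layout: all parents first (two per
    trio), then one child per trio. Children have no rows in the _phased
    file, so they are left out of the result.
    """
    ids = list(sample_ids)
    if trio_panel:
        if len(ids) % 3 != 0:
            raise ValueError("expected two parents and one child per trio")
        # The last third of the list are the children; keep only the parents.
        ids = ids[: 2 * len(ids) // 3]
    # Each remaining individual owns two consecutive haplotype rows.
    return {sid: (2 * i + 1, 2 * i + 2) for i, sid in enumerate(ids)}


# Hypothetical ids for a panel of 3 trios (not real HapMap sample ids).
example_ids = ["t1p1", "t1p2", "t2p1", "t2p2", "t3p1", "t3p2",
               "t1c", "t2c", "t3c"]
example_rows = rows_for_samples(example_ids)
```

Here example_rows["t2p1"] is (5, 6), matching rows 5 and 6 of the CEU/YRI layout, and the three child ids are absent from the result. For the non-pseudoautosomal chrX files the same mapping applies, with the second row of each male being a row of dashes.
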