The 1000 Genomes Project is the first large-scale project that aims to sequence the genomes of 2500 people of different ethnicities from around the world, in order to provide a comprehensive resource on human genetic variation. Its goal is to find most genetic variants that have frequencies of at least 1% in the populations studied. This information will be useful for a range of genetic investigations, including genome-wide association studies and mutation screens for various monogenic disorders.

To sequence a person's genome, many copies of the DNA are broken into short pieces and each piece is sequenced. Because many copies are used, the pieces are distributed more or less randomly across the genome. The pieces are then aligned to the reference sequence and joined together. The Project currently plans to sequence each sample to about 4X coverage, meaning that each position in the genome is read about four times on average. At this depth, sequencing cannot provide the complete genotype of each sample, but it should allow the detection of most variants with frequencies as low as 1%. Combining the data from 2500 samples should allow highly accurate estimation (imputation) of the variants and genotypes that the low-coverage sequencing did not observe directly in each sample. As with other major human genome reference projects, data from the 1000 Genomes Project will be made available quickly to the worldwide scientific community through freely accessible public databases.
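
The coverage reasoning above can be illustrated with some rough arithmetic. The sketch below is not the Project's own analysis method; it assumes a simple Poisson model in which reads fall uniformly at random across the genome, each read at a heterozygous site comes from either chromosome copy with probability 1/2, and all samples are diploid. The function names are hypothetical and chosen only for this illustration.

```python
import math


def p_site_uncovered(depth: float) -> float:
    """Probability that no read covers a given site.

    Assumption: read depth at a site is Poisson with mean `depth`.
    """
    return math.exp(-depth)


def p_het_allele_unseen(depth: float) -> float:
    """Probability that no read carries the variant allele at a heterozygous site.

    Assumption: each read samples either chromosome copy with probability 1/2,
    so variant-carrying reads are Poisson with mean depth / 2.
    """
    return math.exp(-depth / 2.0)


def expected_variant_reads(n_samples: int, allele_freq: float, depth: float) -> float:
    """Expected number of variant-carrying reads summed over the whole cohort.

    There are 2 * n_samples chromosome copies, a fraction `allele_freq` of them
    carry the variant, and each copy is covered at depth / 2 on average.
    """
    carrier_copies = 2 * n_samples * allele_freq
    return carrier_copies * depth / 2.0


if __name__ == "__main__":
    depth = 4.0  # planned per-sample coverage stated in the text
    print(f"P(site has no reads)         = {p_site_uncovered(depth):.3f}")
    print(f"P(het variant allele unseen) = {p_het_allele_unseen(depth):.3f}")
    print(f"Expected variant reads, 2500 samples at 1% frequency = "
          f"{expected_variant_reads(2500, 0.01, depth):.0f}")
```

Under these assumptions, about 2% of sites in a single sample receive no reads at 4X, and about 14% of heterozygous sites show no read from the variant allele, which is why a single sample's genotype is incomplete. Across 2500 samples, however, a variant at 1% frequency is expected to appear in roughly 100 reads in total, so the variant itself is very likely to be detected somewhere in the cohort.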

The Center for Non-Communicable Diseases is the local coordinating center for the collection of DNA samples from 150 Pakistani individuals (50 families). These samples have been collected in Lahore, Punjab. These Pakistani Punjabi samples are the first South Asian samples to have been collected within South Asia itself, and they will provide a useful resource for informative imputation in genome-wide association studies. The samples will be characterized by whole-genome and exome sequencing and transformed into lymphoblastoid cell lines for various functional experiments.