Workshops

Population structure and principal component analysis


Hua Tang

2011-12-20
10:55:00 - 11:50:00

Room 101, Mathematics Research Center Building (originally the New Math. Bldg.)



For most of the world, human genome structure at the population level is shaped by the interplay between ancient geographic isolation and more recent demographic shifts, factors captured by the concepts of biogeographic ancestry and admixture, respectively. Principal component analysis (PCA) has been widely used to describe population genetic structure. The ancestry of non-admixed individuals can often be traced to a specific population in a precise region, but current approaches for studying admixed individuals generally yield only coarse information, identifying genome-wide ancestry proportions by continent of origin. Here we introduce a new analytic strategy for this problem that allows fine-grained characterization of admixed individuals with respect to both geographic and genomic coordinates. Ancestry segments from different continents, identified with a probabilistic model, are used to construct and study “virtual genomes” of admixed individuals. An alternative implementation of PCA is then used to trace these virtual genomes to precise regions within each continent. I will discuss how this approach can be used to study the adaptive and demographic history of indigenous people.
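The abstract does not spell out the method, so what follows is only a generic sketch under stated assumptions, not the speaker's implementation. It assumes standard PCA on a 0/1/2 genotype matrix of non-admixed reference individuals, with each SNP centered and scaled by its allele frequency. It also treats a "virtual genome" as an admixed genotype in which every SNP whose local-ancestry label differs from the target continent is masked as missing. The local-ancestry labels are taken as given input; the probabilistic model that would infer them is not implemented. Projecting by least squares over the observed SNPs is one simple way to handle the masked positions, and the "alternative implementation" of PCA mentioned above may work differently. All function names (`standardize`, `top_components`, `virtual_genome`, `project`, `demo`) and the synthetic populations "A" and "B" are hypothetical.

```python
import math
import random


def standardize(genotypes):
    """Center/scale each SNP of a 0/1/2 matrix; also return reference allele frequencies."""
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    X = [[0.0] * m for _ in range(n)]
    for j, p in enumerate(freqs):
        sd = math.sqrt(2.0 * p * (1.0 - p))
        if sd == 0.0:
            continue  # monomorphic SNP: no information, column stays 0
        for i in range(n):
            X[i][j] = (genotypes[i][j] - 2.0 * p) / sd
    return X, freqs


def top_components(X, k, iters=500, seed=0):
    """Top-k PCs via power iteration with deflation on the Gram matrix K = X X^T."""
    n, m = len(X), len(X[0])
    K = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    loadings, scores = [], [[] for _ in range(n)]
    for _ in range(k):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(K[i][j] * u[j] for j in range(n)) for i in range(n)]
            lam = math.sqrt(sum(x * x for x in w))  # converges to the top eigenvalue
            if lam < 1e-12:
                return loadings, scores  # no variance left to explain
            u = [x / lam for x in w]
        s = math.sqrt(lam)  # singular value of X
        # Per-SNP loadings v = X^T u / s (a column of V in X = U S V^T)
        loadings.append([sum(X[i][j] * u[i] for i in range(n)) / s for j in range(m)])
        for i in range(n):
            scores[i].append(s * u[i])  # reference individual's PC score (U S)
            for j in range(n):  # deflate: remove this component from K
                K[i][j] -= lam * u[i] * u[j]
    return loadings, scores


def solve(A, b):
    """Solve a small dense system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def virtual_genome(genotype, local_ancestry, target):
    """Mask (None) every SNP whose assumed local-ancestry label is not `target`."""
    return [g if a == target else None for g, a in zip(genotype, local_ancestry)]


def project(genotype, freqs, loadings):
    """Least-squares PC scores for one genome, using only its observed (non-None) SNPs."""
    obs = []
    for j, g in enumerate(genotype):
        sd = math.sqrt(2.0 * freqs[j] * (1.0 - freqs[j]))
        if g is not None and sd > 0.0:
            obs.append((j, (g - 2.0 * freqs[j]) / sd))
    k = len(loadings)
    # Normal equations (V_o^T V_o) a = V_o^T x_o restricted to observed SNPs
    A = [[sum(loadings[r][j] * loadings[c][j] for j, _ in obs) for c in range(k)] for r in range(k)]
    b = [sum(loadings[r][j] * x for j, x in obs) for r in range(k)]
    return solve(A, b)


def demo(seed=1, m=200, n_per_pop=10):
    """Synthetic check: two unadmixed reference panels plus one admixed genome."""
    rng = random.Random(seed)

    def draw(p):  # diploid genotype: count of alt alleles at frequency p
        return sum(rng.random() < p for _ in range(2))

    pop_a = [[draw(0.1) for _ in range(m)] for _ in range(n_per_pop)]
    pop_b = [[draw(0.9) for _ in range(m)] for _ in range(n_per_pop)]
    X, freqs = standardize(pop_a + pop_b)
    loadings, scores = top_components(X, k=1)
    half = m // 2  # hypothetical admixed genome: B-derived first half, A-derived second
    admixed = [draw(0.9) for _ in range(half)] + [draw(0.1) for _ in range(m - half)]
    labels = ["B"] * half + ["A"] * (m - half)
    return {
        "a": [s[0] for s in scores[:n_per_pop]],
        "b": [s[0] for s in scores[n_per_pop:]],
        "virtual_b": project(virtual_genome(admixed, labels, "B"), freqs, loadings)[0],
        "full_a0": project(pop_a[0], freqs, loadings)[0],
    }
```

On these synthetic panels, `demo()` should place the masked, B-ancestry virtual genome on the same side of PC1 as the population-B reference individuals, even though half of its SNPs are missing. When nothing is masked and the loadings are orthonormal, the least-squares projection reduces to the usual dot-product projection. The pure-Python power iteration only keeps the example dependency-free; for real genotype data a library SVD would be the practical choice.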