Population substructure, which includes population
stratification and cryptic relatedness as two extreme
manifestations, is a major confounder in genetic association
studies. Recent studies have found that rare variants tend
to be more geographically localized than common variants, or
private to specific populations. This suggests that the
impact of population substructure is more complex and
stronger on rare variants than on common variants. When
substructure is caused by discrete clusters (e.g.,
continental-level stratification), stratified
permutation or incorporating principal component covariates
can be effective at controlling the confounding effect.
However, for regional and complex substructure, existing
correction methods for single marker analysis may fail to
control for confounding effects. Using the framework of
similarity regression, we propose to model genome-wide
allele sharing at rare-variant loci, a measure that has been
suggested to reflect local substructure. By modeling the
average level of allele sharing across the genome, the
method corrects for the non-zero baseline sharing induced by
population substructure
and explicitly controls the substructure effect on the
phenotype. The proposed method is robust to a wide range of
substructure induced by genealogy, from continental to
regional, and from population stratification to cryptic
relatedness. We evaluate the performance of the proposed
framework using grid-based genotype simulations, and
demonstrate the robustness, validity, and utility of the
proposed approach.
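
As an illustration of the quantity being modeled, the following minimal sketch computes an average allele sharing score for each pair of individuals, restricted to rare-variant loci. It is not the authors' implementation: the identity-by-state score (2 minus the absolute genotype difference), the minor allele frequency cutoff, and the names `rare_loci`, `average_sharing`, and `toy` are all assumptions made for this example, and genotypes are assumed to be coded as minor allele counts 0, 1, or 2.

```python
from itertools import combinations


def rare_loci(genotypes, maf_threshold=0.05):
    """Return indices of polymorphic loci whose minor allele frequency
    is below maf_threshold (the threshold is an assumed, illustrative cutoff)."""
    n = len(genotypes)
    keep = []
    for j in range(len(genotypes[0])):
        freq = sum(g[j] for g in genotypes) / (2 * n)
        maf = min(freq, 1 - freq)
        if 0 < maf < maf_threshold:
            keep.append(j)
    return keep


def average_sharing(genotypes, loci):
    """Mean identity-by-state score (2 - |g1 - g2|) over the given loci,
    for every pair of individuals. The IBS score is an assumed stand-in
    for the sharing measure, which the text does not specify."""
    if not loci:
        raise ValueError("no loci to average over")
    sharing = {}
    for i, k in combinations(range(len(genotypes)), 2):
        total = sum(2 - abs(genotypes[i][j] - genotypes[k][j]) for j in loci)
        sharing[(i, k)] = total / len(loci)
    return sharing


# Hypothetical toy data: 4 individuals (rows) by 3 loci (columns).
toy = [
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 2],
    [0, 0, 0],
]

# With only 4 individuals the smallest non-zero frequency is 1/8,
# so a loose cutoff of 0.2 is used here purely for demonstration.
loci = rare_loci(toy, maf_threshold=0.2)
baseline = average_sharing(toy, loci)
```

In a similarity-regression setting, a pairwise score of this kind would enter the model as a covariate representing baseline sharing; how the score is weighted and fitted is left to the method itself.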