Robust similarity regression for population substructure in rare variant aggregation analyses


Jung-Ying Tzeng

13:30:00 - 14:20:00

308, Mathematics Research Center Building (ori. New Math. Bldg.)

Population substructure, of which population stratification and cryptic relatedness are two extreme manifestations, is a major confounder in genetic association studies. Recent studies have found that rare variants tend to be more geographically localized or private to specific populations. This suggests that the impact of population substructure is stronger and more complex on rare variants than on common variants. When substructure is caused by discrete clusters (e.g., continent-level stratification), stratified permutation or the inclusion of principal component covariates can effectively control the confounding effect. However, for regional and complex substructure, existing correction methods designed for single-marker analysis may fail to control for confounding effects. Using the framework of similarity regression, we propose to model whole-genome sharing at rare-variant loci, a measure that has been suggested to reflect local substructure. By modeling the average allele sharing level across the genome, the method corrects for the non-zero baseline sharing induced by population substructure and explicitly controls for the effect of substructure on the phenotype. The proposed method is robust to a wide range of substructure induced by genealogy, from continental to regional, and from population stratification to cryptic relatedness. We evaluate the performance of the proposed framework using grid genotype simulations and demonstrate the robustness, validity, and utility of the proposed approach.