Seminars

Statistical Inference after Selection with Applications to Microarray Data Analysis: What Can We Do to Be Statistically Valid and Efficient?

Gene Hwang

2012-11-02
12:30:00 - 14:30:00

103, Mathematics Research Center Building (originally New Math. Bldg.)

Modern statistical applications often involve many parameters, and the scientific interest often lies in estimating or making inferences about the parameters that were selected by the data. For example, in microarray data analysis, statistical inference for the parameters corresponding to the most significant genes is of great interest. This type of inference is called post-selection statistical inference. We shall assume that no further data are collected, as is often the case, so the same data used for selection are also used for inference. This problem has become increasingly important in recent years. Naive statistical inference that ignores the selection suffers from severe bias, especially in the large p, small n scenario (i.e., many parameters and a small sample size). Bonferroni-type procedures are valid for post-selection inference but are very conservative. We shall demonstrate how empirical Bayes procedures, including estimators and confidence intervals, are superior to the naive and Bonferroni procedures. The empirical Bayes (Lindley-James-Stein) estimator has virtually no selection bias. The empirical Bayes interval centered at the empirical Bayes estimator is short and is valid for post-selection inference in the sense that its coverage probabilities with respect to a class of priors are shown numerically to be above the nominal level. We shall report applications of the proposed intervals to microarray data and, if time allows, results relating to the false coverage rate (FCR), which parallels the false discovery rate (FDR) for hypothesis testing.
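
The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the speaker's procedure; it assumes a normal prior on the true effects, a known noise variance, and one observation per parameter, and the function name `simulate_selection_bias` and its arguments are hypothetical. It selects the k largest observations and compares the average error of the naive estimate with that of a positive-part Lindley-James-Stein estimate that shrinks toward the grand mean.

```python
import numpy as np


def simulate_selection_bias(p=2000, k=20, tau=1.0, sigma=1.0, seed=0):
    """Average error of naive and shrinkage estimates on the top-k selected parameters.

    Assumptions (illustrative only): true effects theta_i ~ N(0, tau^2),
    observations x_i ~ N(theta_i, sigma^2), and sigma is known.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, tau, size=p)        # true effects
    x = theta + rng.normal(0.0, sigma, size=p)  # one noisy observation per effect

    # Select the k parameters with the largest observed values.
    top = np.argsort(x)[-k:]

    # Positive-part Lindley-James-Stein estimate: shrink toward the grand mean.
    xbar = x.mean()
    total_ss = np.sum((x - xbar) ** 2)
    shrink = max(0.0, 1.0 - (p - 3) * sigma**2 / total_ss)
    eb = xbar + shrink * (x - xbar)

    # Average estimation error on the selected parameters only.
    naive_bias = float(np.mean(x[top] - theta[top]))
    eb_bias = float(np.mean(eb[top] - theta[top]))
    return naive_bias, eb_bias


if __name__ == "__main__":
    naive_bias, eb_bias = simulate_selection_bias()
    print(f"naive estimate, mean error on selected:     {naive_bias:.3f}")
    print(f"shrinkage estimate, mean error on selected: {eb_bias:.3f}")
```

Under these assumptions the naive estimate overshoots the selected effects by a wide margin, while the shrinkage estimate's average error is much closer to zero. With a prior far from normal the shrinkage estimate need not behave this well.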