Combining Inferences from Multiple Sources Using Bootstrap, Data Depth and Confidence Distribution


Regina Liu

11:05:00 - 11:50:00

103, Mathematics Research Center Building (ori. New Math. Bldg.)

Advanced data acquisition technologies have greatly facilitated automatic data collection in many domains. The collected data are searched for useful patterns or inferences for scientific discoveries or marketing purposes. Motivated by competition and/or collaboration, multiple data sources are often investigated for the same target hypotheses or parameters. Combining the findings from multiple individual studies can often lead to a more effective overall conclusion for the common hypotheses or parameters. In medicine and clinical trials, the term meta-analysis is often used for this kind of synthesis of findings, or combination of inferences, from different studies.

We apply the concepts of confidence distribution and data depth together with bootstrap to develop a new nonparametric approach for combining inferences from multiple studies for a common hypothesis. A confidence distribution (CD) is a sample-dependent distribution function that can be used as an estimate for an unknown parameter. It can be viewed as a “distribution estimator” of the parameter. CDs have been shown to be effective tools in statistical inference. We discuss how this approach combines the test results from independent studies. Specifically, in each study we apply data depth and bootstrap to obtain a p-value function for the common hypothesis. The p-value functions are then combined under the framework of combining confidence distributions. This approach has several advantages. First, it allows resampling directly from the empirical distribution, rather than from an estimated population distribution satisfying the null constraints. Second, it yields test results directly, without the standard two-step procedure of constructing an explicit test statistic and then establishing or approximating its sampling distribution. The proposed method provides a valid inference approach for a broad class of testing problems involving multiple studies, where the parameters of interest can be either finite or infinite dimensional. The method will be illustrated using simulation data and aircraft landing data collected from airlines.
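As a rough illustration of the combining step only, the sketch below builds a bootstrap-based distribution function for a scalar mean in each study and merges the studies with a normal-transform rule, one commonly used recipe for combining CDs. It is not the method of the talk: no data depth is involved (depth matters mainly for multivariate or functional parameters), the equal-weight combining rule is an assumption, and the names `bootstrap_cd`, `combine_cds` and `two_sided_p_value` as well as the simulated studies are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, used for the combining transform


def bootstrap_cd(sample, n_boot=2000, seed=0):
    """Return H(theta): the bootstrap distribution function of the sample mean.

    Assumption: this percentile-type bootstrap distribution is used as an
    approximate confidence distribution for a scalar mean.
    """
    rng = random.Random(seed)
    n = len(sample)
    # Resample directly from the empirical distribution (no null constraint).
    boot_means = sorted(fmean(rng.choices(sample, k=n)) for _ in range(n_boot))

    def cd(theta):
        count = bisect.bisect_right(boot_means, theta)
        # The adjustment keeps the value strictly inside (0, 1),
        # which the inverse normal CDF below requires.
        return (count + 0.5) / (n_boot + 1)

    return cd


def combine_cds(cds):
    """Equal-weight normal-transform combination of k distribution functions."""
    k = len(cds)

    def combined(theta):
        z = sum(STD_NORMAL.inv_cdf(cd(theta)) for cd in cds)
        return STD_NORMAL.cdf(z / math.sqrt(k))

    return combined


def two_sided_p_value(cd, theta0):
    """Two-sided p-value for H0: theta = theta0 read off a distribution function."""
    h = cd(theta0)
    return 2.0 * min(h, 1.0 - h)


# Hypothetical data: three independent studies sharing the true mean 1.0.
data_rng = random.Random(42)
studies = [[data_rng.gauss(1.0, 1.0) for _ in range(n)] for n in (30, 50, 80)]
combined = combine_cds([bootstrap_cd(s, seed=i) for i, s in enumerate(studies)])

print(two_sided_p_value(combined, 1.0))  # null value equal to the true mean
print(two_sided_p_value(combined, 2.0))  # null value far from the true mean
```

The equal weights ignore the different study sizes; a weighted sum of the transformed values, normalized by the square root of the sum of squared weights, would be a natural variation under the same assumptions.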

This is joint work with Dungang Liu (Yale University) and Minge Xie (Rutgers University).