Prof. Wei Zheng
Indiana University–Purdue University Indianapolis
Informative sampling of large databases
Abstract: For many data-analysis tasks, a large database of explanatory variables is readily available, but the responses are missing and expensive to obtain. A natural remedy is to judiciously select a sample from the database for which the responses will be measured. In this paper, we adopt classical criteria from the design of experiments to quantify the information contained in a given sample. We then provide a theoretical justification for approximating the optimal sampling problem by a continuous problem, for which fast algorithms with guaranteed global convergence can be developed. Our approach has the following features: (i) the statistical efficiency of any candidate sample can be evaluated without knowing the exact optimal sample; (ii) it applies to a very wide class of statistical models; (iii) it can be combined with a broad class of information criteria; and (iv) it scales to big data.
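The abstract does not name its criteria or algorithms, so the following is only an illustrative sketch under stated assumptions, not the speaker's method. It assumes a linear model, uses D-optimality (log det of the information matrix) as the information criterion, and swaps in the classical multiplicative algorithm as one example of a continuous relaxation solver: each database row gets a weight, and weights are updated by w_i <- w_i d_i / p, where d_i = x_i^T M(w)^{-1} x_i. Feature (i) is illustrated with the standard concavity bound D-eff >= exp(1 - max_i d_i / p), which needs no knowledge of the optimum. All function names, the rounding rule (keep the n largest weights), and the simulated data are hypothetical.

```python
import math
import random

# Illustrative sketch only: assumes a linear model and D-optimality;
# function names and the rounding heuristic are hypothetical.


def inverse_and_logdet(a):
    """Invert a small symmetric positive-definite matrix by Gauss-Jordan
    elimination with partial pivoting; also return log det(a)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    logdet = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("information matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        logdet += math.log(abs(pivot))  # |det| is the product of the pivots
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m], logdet


def info_matrix(xs, w):
    """M(w) = sum_i w_i x_i x_i^T for covariate rows xs and weights w."""
    p = len(xs[0])
    return [[sum(wi * x[j] * x[k] for wi, x in zip(w, xs)) for k in range(p)]
            for j in range(p)]


def variances(xs, w):
    """d_i = x_i^T M(w)^{-1} x_i for every row, plus log det M(w)."""
    minv, logdet = inverse_and_logdet(info_matrix(xs, w))
    p = len(xs[0])
    d = [sum(x[j] * minv[j][k] * x[k] for j in range(p) for k in range(p)) for x in xs]
    return d, logdet


def d_optimal_weights(xs, iters=500):
    """Multiplicative algorithm for the continuous D-optimal weights.
    The update keeps sum(w) == 1 because sum_i w_i d_i = trace(M^{-1} M) = p."""
    p = len(xs[0])
    w = [1.0 / len(xs)] * len(xs)
    for _ in range(iters):
        d, _ = variances(xs, w)
        w = [wi * di / p for wi, di in zip(w, d)]
    return w


def d_efficiency_lower_bound(xs, w):
    """exp(1 - max_i d_i / p): a lower bound on D-efficiency relative to the
    continuous optimum over xs, computed without knowing that optimum."""
    p = len(xs[0])
    d, _ = variances(xs, w)
    return math.exp(1.0 - max(d) / p)


# Usage on simulated covariates: intercept plus two features.
random.seed(0)
xs = [[1.0, random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
w = d_optimal_weights(xs)

# Hypothetical rounding: keep the n rows with the largest weights, equally weighted.
n = 20
chosen = set(sorted(range(len(xs)), key=lambda i: w[i], reverse=True)[:n])
w_sample = [1.0 / n if i in chosen else 0.0 for i in range(len(xs))]
print(d_efficiency_lower_bound(xs, w), d_efficiency_lower_bound(xs, w_sample))
```

The bound holds for the rounded sample as well, because any exact sample is also a weighting of the rows, so its determinant cannot exceed that of the continuous optimum.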
Friday, January 20, 2017, at 3:00 PM in SEO 636