

Steklov Mathematical Institute Seminar
June 18, 2015 16:00, Moscow, Steklov Mathematical Institute of RAS, Conference Hall (8 Gubkina)






Probabilistic and statistical methods of significant factors identification
A. V. Bulinski

Abstract:
In many models, a studied variable (the response) $Y$ depends on some collection of factors $X=(X_1,\dots,X_n)$. For instance, in medical and biological research, $Y$ can characterize the health state of a patient, whereas the components of $X$ describe genetic and nongenetic factors. One of the challenging problems of response analysis is the identification of a “significant collection” of factors $(X_{i_1},\dots,X_{i_r})$, $1\le i_1<\dots< i_r\le n$, on which $Y$ depends in an essential way (in a certain sense). A number of complementary methods for solving this problem are considered, including probabilistic and statistical techniques, machine learning, and computer simulation. We mention only a few modern methods, such as LASSO, SCAD, BOOST, GARROTE, and their modifications. The main attention is paid to the MDR (multifactor dimensionality reduction) method introduced by M. Ritchie et al. in 2001, which has since been applied and developed further in more than 200 publications. The talk is based on a series of 7 recent papers by the author, published in Doklady Mathematics (2014), Journal of Multivariate Analysis (2015), Lecture Notes in Mathematics (2015), and elsewhere. We emphasize that a new approach to identifying significant variables is proposed. It involves a statistical estimate of the error functional of a response forecast, employing a penalty function and a cross-validation procedure, and it can handle a nonbinary response. Regularized estimates of this functional are defined, and a central limit theorem is established for them. New results on the asymptotic normality of arrays of exchangeable random variables are of independent interest. We also discuss the results of computer simulations showing the effectiveness of the developed approach.
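To make the MDR idea and the cross-validated error estimate more concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the author's estimator. It assumes a binary response coded $Y\in\{-1,1\}$, factors taking finitely many values, and a penalty $\psi(y)=1/(2\hat P(Y=y))$ (inverse class frequency), so that the estimated error functional $\widehat{\mathrm{Err}}=\frac1m\sum\psi(Y_j)\,\mathbf 1\{f(X_j)\ne Y_j\}$ is the balanced misclassification error. The penalty actually used in the talk may differ, and the nonbinary case is not covered. The function names (`fit_cells`, `cv_error`, `select_factors`) and the synthetic data are hypothetical.

```python
import itertools
import random
from collections import defaultdict


def class_weights(y):
    """Assumed penalty psi(y) = 1 / (2 * empirical P(Y = y)) for y in {-1, 1}."""
    n = len(y)
    n_pos = sum(1 for v in y if v == 1)
    n_neg = n - n_pos
    # Guard against an empty class in a small training fold.
    return {1: n / (2 * max(n_pos, 1)), -1: n / (2 * max(n_neg, 1))}


def fit_cells(X, y, subset, psi):
    """MDR-style step: label each cell (value combination of `subset`) +1 or -1.

    A cell predicts +1 when the penalized count of Y = 1 in it exceeds the
    penalized count of Y = -1; otherwise it predicts -1.
    """
    score = defaultdict(float)
    for xi, yi in zip(X, y):
        cell = tuple(xi[j] for j in subset)
        score[cell] += psi[yi] * yi
    return {cell: 1 if s > 0 else -1 for cell, s in score.items()}


def penalized_error(rule, X, y, subset, psi):
    """Empirical error functional: mean of psi(Y) * 1{prediction != Y}."""
    total = 0.0
    for xi, yi in zip(X, y):
        pred = rule.get(tuple(xi[j] for j in subset), -1)  # unseen cell -> -1
        if pred != yi:
            total += psi[yi]
    return total / len(y)


def cv_error(X, y, subset, k=5, seed=0):
    """K-fold cross-validation estimate of the error functional for `subset`."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    errors = []
    for fold in range(k):
        test = idx[fold::k]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        psi = class_weights(y_tr)  # penalty estimated on the training part only
        rule = fit_cells(X_tr, y_tr, subset, psi)
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]
        errors.append(penalized_error(rule, X_te, y_te, subset, psi))
    return sum(errors) / k


def select_factors(X, y, r, k=5):
    """Return the r-subset of factor indices with the smallest CV error."""
    n_factors = len(X[0])
    scores = {s: cv_error(X, y, s, k)
              for s in itertools.combinations(range(n_factors), r)}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical synthetic data: 6 factors with values {0, 1, 2}; Y depends only
# on the interaction of factors 1 and 3, and 10% of labels are flipped as noise.
rng = random.Random(1)
X = [[rng.randrange(3) for _ in range(6)] for _ in range(600)]
y = []
for x in X:
    label = 1 if (x[1] + x[3]) % 3 == 0 else -1
    y.append(-label if rng.random() < 0.1 else label)

best, err = select_factors(X, y, r=2)
print(best, round(err, 3))
```

In this toy setup neither factor 1 nor factor 3 is informative on its own, so only the pair (1, 3) is expected to reach a cross-validated error well below 1/2; every other pair should stay near 1/2. Exhaustive search over `itertools.combinations` is feasible only for small $n$ and $r$, which is one reason the literature pairs MDR with heuristics and regularization.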

