A non-asymptotic theory for model selection in high-dimensional mixture of experts via joint rank and variable selection


Mixture of experts (MoE) models are among the most popular and effective combination techniques, with great potential for improving the performance of machine learning and statistical learning systems. We are the first to consider a polynomial softmax-gated block-diagonal mixture of experts (PSGaBloME) model for identifying potentially nonlinear regression relationships in complex, high-dimensional heterogeneous data, where the numbers of explanatory and response variables can be much larger than the sample size and hidden graph-structured interactions may exist. PSGaBloME models are characterized by several hyperparameters, including the number of mixture components, the complexity of the softmax gating networks and Gaussian mean experts, and the hidden block-diagonal structures of the covariance matrices. We contribute a non-asymptotic theory for selecting such complex hyperparameters with the help of the slope heuristic approach in a penalized maximum likelihood estimation (PMLE) framework. In particular, we establish a non-asymptotic risk bound on the PMLE, which takes the form of an oracle inequality, under lower-bound assumptions on the penalty function. Furthermore, we propose two Lasso-MLE-rank procedures, based on a new generalized expectation-maximization algorithm, to tackle the estimation problem over the collection of PSGaBloME models.
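To make the model family concrete, the sketch below evaluates the conditional density of a softmax-gated Gaussian mixture of experts with polynomial gating in one dimension. This is a hypothetical illustration only, not the paper's PSGaBloME implementation: the function name, parameter shapes, and the restriction to scalar responses with linear Gaussian mean experts are all assumptions made for readability.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_density(y, x, gate_coefs, expert_means, expert_vars):
    """Conditional density p(y | x) of a toy softmax-gated mixture of experts.

    Hypothetical 1-D illustration with K experts:
      gate_coefs:   (K, d+1) polynomial coefficients of the gating scores
      expert_means: (K, 2)   intercept and slope of each Gaussian mean expert
      expert_vars:  (K,)     expert variances
    """
    degree = gate_coefs.shape[1] - 1
    feats = np.array([x**p for p in range(degree + 1)])   # polynomial features of x
    gates = softmax(gate_coefs @ feats)                   # mixing weights pi_k(x)
    means = expert_means[:, 0] + expert_means[:, 1] * x   # linear Gaussian means
    dens = np.exp(-(y - means) ** 2 / (2 * expert_vars)) / np.sqrt(2 * np.pi * expert_vars)
    return float(gates @ dens)                            # sum_k pi_k(x) * N(y; m_k(x), s_k^2)
```

In this toy setting the gating network selects among experts as a function of x, while model selection in the paper additionally chooses the number of experts K, the gating polynomial degree, the expert ranks, and the covariance block structure.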

HAL ID: hal-03984011. Accepted at AJCAI 2023 as an oral presentation.
TrungTin Nguyen