On the Asymptotic Distribution of the Minimum Empirical Risk

Abstract

Empirical risk minimization (ERM) is a foundational framework for the estimation of statistical and machine learning models. Characterizing the distributional properties of the minimum empirical risk (MER) provides valuable tools for conducting inference and assessing the goodness of model fit. We provide a comprehensive account of the asymptotic distribution for the order-$\sqrt{n}$ blowup of the MER under generic and abstract assumptions, and present practical conditions under which our theorems hold. Our results improve upon and relax the assumptions made in previous works. Specifically, we provide asymptotic distributions for MERs when data are not independent and identically distributed, and when the loss functions may be discontinuous or indexed by a non-Euclidean space. We further present results that enable the application of these asymptotics for statistical inference, specifically the construction of consistent confidence sets using the bootstrap and consistent hypothesis tests using penalized model selection. A pair of practical applications of our framework to neural network problems is provided to illustrate the utility of our approach.
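
For readers unfamiliar with the terminology, the quantities named in the abstract can be written out as a minimal sketch. The notation here ($\ell$, $\theta$, $\Theta$, $X_i$, $R_n$, $R$) is assumed for illustration and is not taken from the paper, which may use different symbols and a more general, non-i.i.d. setup. With a loss $\ell(\theta; x)$ indexed by $\theta \in \Theta$ and data $X_1, \dots, X_n$, the empirical risk and the MER are

$$
R_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell(\theta; X_i), \qquad \mathrm{MER}_n = \inf_{\theta \in \Theta} R_n(\theta),
$$

and, writing $R(\theta)$ for the corresponding population (limiting) risk, the order-$\sqrt{n}$ blowup whose limiting distribution is studied is

$$
\sqrt{n} \left( \inf_{\theta \in \Theta} R_n(\theta) - \inf_{\theta \in \Theta} R(\theta) \right).
$$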

Publication
In Proceedings of The 41st International Conference on Machine Learning, ICML 2024, Acceptance rate 27.5% over 9473 submissions
TrungTin Nguyen
Postdoctoral Research Fellow

A central theme of my research is data science at the intersection of statistical learning, machine learning and optimization.
