Roughness Index for Loss Landscapes of Neural Network Models of Partial Differential Equations
- Resource Type
- Conference
- Authors
- Wu, Keke; Jian, Xiangru; Du, Rui; Chen, Jingrun; Zhou, Xiang
- Source
- 2023 IEEE International Conference on Big Data (BigData), pp. 966-975, Dec. 2023
- Subject
- Bioengineering; Computing and Processing; Geoscience; Robotics and Control Systems; Signal Processing and Analysis; Technological innovation; Partial differential equations; Optimization methods; Artificial neural networks; Big Data; Mathematical models; Data models; roughness index; landscapes; total variation
- Language
- English
The loss landscape is a useful tool for characterizing and comparing neural network models. The main challenge in analyzing the loss landscapes of deep neural networks is that they are generally highly nonconvex in very high-dimensional spaces. In this paper, we develop the concept of “roughness” for understanding such landscapes in high dimensions and apply this technique to study two neural network models arising from solving differential equations. Our main innovation is a well-defined and easy-to-compute roughness index (RI), based on the mean and variance of the (normalized) total variation of one-dimensional projections of the loss along randomly sampled directions. A large RI at a local minimizer signals an oscillatory landscape profile and a severe challenge for first-order optimization methods. In particular, we observe an increasing-then-decreasing pattern in RI along the gradient descent path in most models. We apply our method to two types of loss functions used to solve partial differential equations (PDEs) when the PDE solution is parametrized by a neural network. Our empirical results on these PDE problems reveal an important and consistent observation: around local minimizers, the landscapes of the deep Galerkin method are less rough than those of the deep Ritz method.
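The abstract describes the index concretely enough to sketch: sample random unit directions at a point in parameter space, restrict the loss to a one-dimensional segment along each direction, and summarize the (normalized) total variation of those profiles by its mean and variance. Below is a minimal, hypothetical Python sketch of that recipe; the function names, the sampling radius, and the choice to normalize each profile's total variation by its range are assumptions, since the paper's exact definitions are not reproduced here.

```python
import numpy as np

def total_variation(values):
    """Total variation of a 1D sequence of sampled function values."""
    return float(np.sum(np.abs(np.diff(values))))

def roughness_index(loss_fn, theta, n_directions=100, n_points=50,
                    radius=1.0, seed=0):
    """Monte Carlo sketch of a roughness index at parameter vector theta.

    For each random unit direction v, the loss is sampled along the segment
    theta + t*v, t in [-radius, radius]. The total variation of that 1D
    profile, normalized by its range (an assumed normalization: a monotone
    profile scores 1, a simple valley scores 2), is recorded, and the index
    is summarized by the mean and variance over directions.
    """
    rng = np.random.default_rng(seed)
    ts = np.linspace(-radius, radius, n_points)
    ratios = []
    for _ in range(n_directions):
        v = rng.standard_normal(theta.shape)
        v /= np.linalg.norm(v)          # uniform direction on the sphere
        profile = np.array([loss_fn(theta + t * v) for t in ts])
        spread = profile.max() - profile.min()
        tv = total_variation(profile)
        ratios.append(tv / spread if spread > 0 else 0.0)
    return float(np.mean(ratios)), float(np.var(ratios))

# Toy check: a smooth quadratic valley gives a mean ratio near 2, while an
# added high-frequency term drives the ratio well above that.
theta0 = np.zeros(10)
smooth = lambda w: float(w @ w)
rough = lambda w: float(w @ w + 0.1 * np.sin(50.0 * w).sum())
print(roughness_index(smooth, theta0))   # ~ (2.0, 0.0)
print(roughness_index(rough, theta0))    # noticeably larger mean
```

Under this range normalization, a monotone profile scores 1 and a simple valley scores 2, so mean values well above 2 at a minimizer suggest an oscillatory landscape in the sense the abstract describes.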