A New Formula for Faster Computation of the K-Fold Cross-Validation and Good Regularisation Parameter Values in Ridge Regression
- Resource Type
- Periodical
- Authors
- Liland, K.H.; Skogholt, J.; Indahl, U.G.
- Source
- IEEE Access, 12:17349-17368, 2024
- Subject
- Aerospace; Bioengineering; Communication, Networking and Broadcast Technologies; Components, Circuits, Devices and Systems; Computing and Processing; Engineered Materials, Dielectrics and Plasmas; Engineering Profession; Fields, Waves and Electromagnetics; General Topics for Engineers; Geoscience; Nuclear Engineering; Photonics and Electrooptics; Power, Energy and Industry Applications; Robotics and Control Systems; Signal Processing and Analysis; Transportation
- Mathematical models; Presses; Linear regression; Predictive models; Context modeling; Velocity measurement; Numerical models; Parameter estimation
- Cross-validation; GCV; PRESS statistic; ridge regression; SVD; Tikhonov regularisation
- Language
- ISSN
- 2169-3536
In this paper, we prove a new theorem that yields an update formula for linear regression model residuals, producing the exact k-fold cross-validation residuals for any choice of cross-validation strategy without refitting the model. The required matrix inversions are limited in size by the cross-validation segment sizes and can be executed with high efficiency in parallel. The well-known formula for leave-one-out cross-validation follows as a special case of the theorem. For situations where the cross-validation segments consist of small groups of repeated measurements, we suggest a heuristic strategy for fast serial approximation of the cross-validated residuals and the associated Predicted Residual Sum of Squares (PRESS) statistic. We also suggest strategies for efficiently estimating both the minimum PRESS value and the full PRESS function over a selected interval of regularisation values. The computational effectiveness of the resulting parameter selection for ridge and Tikhonov regression modelling is demonstrated in several applications with real, highly multivariate datasets.
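The kind of refit-free k-fold cross-validation the abstract describes can be illustrated with the standard leave-segment-out identity for linear smoothers: for a held-out segment S, the cross-validated residuals equal (I - H[S,S])^{-1} e[S], where H is the full-model ridge hat matrix and e the ordinary residuals, so each segment costs only a |S| x |S| solve. The sketch below (not the paper's own implementation; all variable names and the synthetic data are illustrative) combines this with an SVD of X so that the PRESS function can be scanned cheaply over a grid of regularisation values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 8
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)
segments = np.array_split(np.arange(n), 5)  # 5-fold CV segments

# A single SVD lets us rebuild the ridge hat matrix for any lambda:
# H(lam) = U diag(s^2 / (s^2 + lam)) U'
U, s, _ = np.linalg.svd(X, full_matrices=False)

def press(lam):
    """Exact k-fold PRESS without refitting: for each held-out segment S,
    e_cv[S] = (I - H[S,S])^{-1} e[S], a |S| x |S| linear solve."""
    H = (U * (s**2 / (s**2 + lam))) @ U.T
    e = y - H @ y                          # full-model ridge residuals
    e_cv = np.empty(n)
    for S in segments:
        HSS = H[np.ix_(S, S)]
        e_cv[S] = np.linalg.solve(np.eye(len(S)) - HSS, e[S])
    return np.sum(e_cv**2)

# Sanity check against explicit refitting at one lambda value
lam = 0.5
e_ref = np.empty(n)
for S in segments:
    keep = np.setdiff1d(np.arange(n), S)
    b = np.linalg.solve(X[keep].T @ X[keep] + lam * np.eye(p),
                        X[keep].T @ y[keep])
    e_ref[S] = y[S] - X[S] @ b
assert np.isclose(press(lam), np.sum(e_ref**2))

# Scan a grid of regularisation values and pick the PRESS minimiser
grid = np.logspace(-3, 3, 50)
best_lam = grid[np.argmin([press(l) for l in grid])]
```

With one observation per segment the identity reduces to the familiar leave-one-out formula e_i / (1 - h_ii), matching the special case noted in the abstract; the per-segment solves are also independent, so they parallelise naturally.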