Published on 01 January 2025

Fractional cross-validation for optimizing hyperparameters of supervised learning algorithms

Yerramilli, Suraj; Apley, Daniel W.

Description

K-fold cross-validation (CV) is a robust method for estimating the generalization performance of supervised learning models. Although CV is more reliable than a single hold-out test set, it is also more computationally expensive, since the model must be fit K times. This cost can be prohibitive for hyperparameter optimization, which requires conducting K-fold CV repeatedly at many hyperparameter configurations. In this work, we propose a highly efficient Bayesian optimization algorithm for optimizing the hyperparameters of supervised learning algorithms, with the K-fold CV error as the evaluation criterion. Our approach exploits the fact that the single-fold out-of-sample errors are pairwise correlated across different hyperparameter configurations. We introduce a hierarchical Gaussian process model that is well suited to this inherent correlation structure across folds and across the hyperparameter space. The resulting algorithm requires evaluating only a single fold for many hyperparameter configurations, enabling us to efficiently find the optimal hyperparameters. We refer to this as “fractional CV”, since it requires evaluating only a small fraction of the folds relative to full K-fold CV. We demonstrate the efficacy of our method on a number of models and real datasets.
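To make the idea of fitting a surrogate to single-fold errors more concrete, here is a minimal NumPy sketch. It is not the authors' model or code: it assumes a simple additive hierarchy in which each single-fold error is e_j(x) = g(x) + h_j(x), with g a Gaussian process shared by all folds and h_j an independent per-fold deviation. Under that assumption the K-fold CV error (1/K) * sum_j e_j(x) can be predicted from observations that each cover only one fold. The function names (`rbf`, `fold_kernel`, `predict_cv_error`), the fixed kernel settings (`shared_var`, `fold_var`, `noise_var`, lengthscale 0.3), and the synthetic error data are all hypothetical choices for illustration.

```python
import numpy as np


def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel between 1-D arrays of hyperparameter values."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def fold_kernel(x1, f1, x2, f2, shared_var=1.0, fold_var=0.1):
    """Covariance of single-fold errors under the ASSUMED model e_j(x) = g(x) + h_j(x).

    g is shared by all folds; each fold j adds an independent deviation h_j,
    so errors measured on the same fold are more strongly correlated.
    """
    same_fold = (f1[:, None] == f2[None, :]).astype(float)
    return (shared_var + fold_var * same_fold) * rbf(x1, x2)


def predict_cv_error(x_obs, f_obs, y_obs, x_new, n_folds,
                     shared_var=1.0, fold_var=0.1, noise_var=1e-4):
    """GP posterior mean/variance of the K-fold CV error (1/K) * sum_j e_j(x)."""
    K = fold_kernel(x_obs, f_obs, x_obs, f_obs, shared_var, fold_var)
    K = K + noise_var * np.eye(len(x_obs))
    # Cov(cv(x*), e_j(x)) = (shared_var + fold_var / K) * rbf(x*, x) for any fold j.
    k_star = (shared_var + fold_var / n_folds) * rbf(x_new, x_obs)
    mean = k_star @ np.linalg.solve(K, y_obs)
    prior_var = shared_var + fold_var / n_folds  # rbf(x, x) = 1
    var = prior_var - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return mean, var


# Hypothetical demo: 15 configurations of one hyperparameter, K = 5 folds,
# each configuration evaluated on a single fold only (round robin).
n_folds = 5
x_obs = np.linspace(0.0, 1.0, 15)
f_obs = np.arange(len(x_obs)) % n_folds
offsets = np.array([0.01, -0.005, 0.0, 0.008, -0.012])   # synthetic fold biases
y_obs = (x_obs - 0.6) ** 2 + offsets[f_obs]               # synthetic single-fold errors

x_grid = np.linspace(0.0, 1.0, 101)
mean, var = predict_cv_error(x_obs, f_obs, y_obs, x_grid, n_folds)
best = x_grid[np.argmin(mean)]

fits_fractional = len(x_obs)        # one model fit per configuration
fits_full = n_folds * len(x_obs)    # K fits per configuration for full K-fold CV
print(f"estimated best hyperparameter: {best:.2f}")
print(f"model fits: fractional={fits_fractional}, full K-fold={fits_full}")
```

In this toy setting the surrogate is built from 15 single-fold fits instead of the 75 that full 5-fold CV would need. A complete Bayesian optimization loop would additionally fit the kernel settings (e.g. by marginal likelihood) and use an acquisition function to choose the next configuration and fold; both steps are omitted here.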

Metrics

Dataset Index

0.7

FAIR Score

13%

Citations

1

Mentions

0

Publication Details

DOI

Publisher

Taylor & Francis

Assigned Domain

Subfield

Artificial Intelligence

Field

Computer Science

Domain

Physical Sciences

Confidence Score

52%

Source

Scholar Data Model

Keywords

Biophysics, Physical Sciences not elsewhere classified, Medicine, Cell Biology, Neuroscience, Physiology, FOS: Biological sciences, Biotechnology, Biological Sciences not elsewhere classified, Information Systems not elsewhere classified, Mathematical Sciences not elsewhere classified

Normalization Factors

FT

13.46

CTw

1.00

MTw

1.00