|
| 1 | +""" |
| 2 | +========================================= |
| 3 | +Nested versus non-nested cross-validation |
| 4 | +========================================= |
| 5 | +
|
| 6 | +This example compares non-nested and nested cross-validation strategies on a |
| 7 | +classifier for the iris dataset. Nested cross-validation (CV) is often used to |
| 8 | +train a model in which hyperparameters also need to be optimized. Nested CV |
| 9 | +estimates the generalization error of the underlying model and its |
| 10 | +(hyper)parameter search. Choosing the parameters that maximize the non-nested |
| 11 | +CV score biases the model toward the dataset, yielding an overly optimistic estimate. |
| 12 | +
|
| 13 | +Model selection without nested CV uses the same data to tune model parameters |
| 14 | +and evaluate model performance. Information may thus "leak" into the model, |
| 15 | +which then overfits the data. The magnitude of this effect depends primarily on |
| 16 | +the size of the dataset and the stability of the model. See Cawley and Talbot |
| 17 | +[1]_ for an analysis of these issues. |
| 18 | +
|
| 19 | +To avoid this problem, nested CV effectively uses a series of |
| 20 | +train/validation/test set splits. In the inner loop, the score is approximately |
| 21 | +maximized by fitting a model to each training set, and then directly maximized |
| 22 | +in selecting (hyper)parameters over the validation set. In the outer loop, |
| 23 | +generalization error is estimated by averaging test set scores over several |
| 24 | +dataset splits. |
| 25 | +
|
| 26 | +The example below uses a support vector classifier with a non-linear kernel to |
| 27 | +build a model with optimized hyperparameters by grid search. We compare the |
| 28 | +performance of non-nested and nested CV strategies by taking the difference |
| 29 | +between their scores. |
| 30 | +
|
| 31 | +.. topic:: See Also: |
| 32 | +
|
| 33 | +    - :ref:`cross_validation` |
| 34 | +    - :ref:`grid_search` |
| 35 | +
|
| 36 | +.. topic:: References: |
| 37 | +
|
| 38 | +    .. [1] `Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and |
| 39 | +     subsequent selection bias in performance evaluation. |
| 40 | +     J. Mach. Learn. Res. 2010, 11, 2079-2107. |
| 41 | +     <http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf>`_ |
| 42 | +
|
| 43 | +""" |
| 44 | +from sklearn.datasets import load_iris |
| 45 | +from matplotlib import pyplot as plt |
| 46 | +from sklearn.svm import SVC |
| 47 | +from sklearn.model_selection import GridSearchCV, cross_val_score, KFold |
| 48 | +import numpy as np |
| 49 | + |
| 50 | +print(__doc__) |
| 51 | + |
| 52 | +# Number of random trials |
| 53 | +NUM_TRIALS = 30 |
| 54 | + |
| 55 | +# Load the dataset |
| 56 | +iris = load_iris() |
| 57 | +X_iris = iris.data |
| 58 | +y_iris = iris.target |
| 59 | + |
| 60 | +# Set up possible values of parameters to optimize over |
| 61 | +p_grid = {"C": [1, 10, 100], |
| 62 | + "gamma": [.01, .1]} |
| 63 | + |
| 64 | +# We will use a Support Vector Classifier with "rbf" kernel |
| 65 | +svr = SVC(kernel="rbf") |
| 66 | + |
| 67 | +# Arrays to store scores |
| 68 | +non_nested_scores = np.zeros(NUM_TRIALS) |
| 69 | +nested_scores = np.zeros(NUM_TRIALS) |
| 70 | + |
| 71 | +# Loop for each trial |
| 72 | +for i in range(NUM_TRIALS): |
| 73 | + |
| 74 | +    # Choose cross-validation techniques for the inner and outer loops, |
| 75 | +    # independently of the dataset. |
| 76 | +    # E.g. "GroupKFold", "LeaveOneOut", "LeaveOneGroupOut", etc. |
| 77 | +    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i) |
| 78 | +    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i) |
| 79 | + |
| 80 | +    # Non-nested parameter search and scoring |
| 81 | +    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv) |
| 82 | +    clf.fit(X_iris, y_iris) |
| 83 | +    non_nested_scores[i] = clf.best_score_ |
| 84 | + |
| 85 | +    # Nested CV with parameter optimization |
| 86 | +    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv) |
| 87 | +    nested_scores[i] = nested_score.mean() |
| 88 | + |
| 89 | +score_difference = non_nested_scores - nested_scores |
| 90 | + |
| 91 | +print("Average difference of {0:.6f} with std. dev. of {1:.6f}." |
| 92 | +      .format(score_difference.mean(), score_difference.std())) |
| 93 | + |
| 94 | +# Plot scores on each trial for nested and non-nested CV |
| 95 | +plt.figure() |
| 96 | +plt.subplot(211) |
| 97 | +non_nested_scores_line, = plt.plot(non_nested_scores, color='r') |
| 98 | +nested_line, = plt.plot(nested_scores, color='b') |
| 99 | +plt.ylabel("score", fontsize="14") |
| 100 | +plt.legend([non_nested_scores_line, nested_line], |
| 101 | + ["Non-Nested CV", "Nested CV"], |
| 102 | + bbox_to_anchor=(0, .4, .5, 0)) |
| 103 | +plt.title("Non-Nested and Nested Cross Validation on Iris Dataset", |
| 104 | + x=.5, y=1.1, fontsize="15") |
| 105 | + |
| 106 | +# Plot bar chart of the difference. |
| 107 | +plt.subplot(212) |
| 108 | +difference_plot = plt.bar(range(NUM_TRIALS), score_difference) |
| 109 | +plt.xlabel("Individual Trial #") |
| 110 | +plt.legend([difference_plot], |
| 111 | + ["Non-Nested CV - Nested CV Score"], |
| 112 | + bbox_to_anchor=(0, 1, .8, 0)) |
| 113 | +plt.ylabel("score difference", fontsize="14") |
| 114 | + |
| 115 | +plt.show() |
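
For readers of the docstring, the train/validation/test structure it describes can be made explicit by writing the outer loop by hand. The following is a minimal sketch, not part of the diff: it assumes the same parameter grid and 4-fold splitters as the example, fixes `random_state=0` for a single trial, and uses illustrative names (`X`, `y`, `outer_scores`, `search`) that do not appear in the file. Under these assumptions it should give the same mean score as the `cross_val_score(clf, ...)` call in the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

# One trial only; the example repeats this for several random seeds.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

outer_scores = []
for train_idx, test_idx in outer_cv.split(X):
    # Inner loop: the grid search only sees the outer training fold,
    # which it splits again into training and validation folds.
    search = GridSearchCV(SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)
    search.fit(X[train_idx], y[train_idx])

    # Outer loop: the best model, refit on the whole outer training fold,
    # is scored on the test fold that the search never saw.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

nested_score = np.mean(outer_scores)
print("Nested CV accuracy: {:.3f}".format(nested_score))
```

Because hyperparameters are selected separately inside each outer fold, the selected `C` and `gamma` may differ from fold to fold; the nested score estimates the performance of the whole tuning procedure rather than of one fixed parameter setting.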