.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/learning_curve.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_learning_curve.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_learning_curve.py:

==============
Learning Curve
==============

This example demonstrates the learning curve, which helps answer two questions:

1) Is my model over-fitted?
2) Will my model benefit from more data?

Code has been adapted from the `sklearn example <https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html>`_.
This `machinelearningmastery article <https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/>`_
is a great resource for the interpretation of learning curves.

.. GENERATED FROM PYTHON SOURCE LINES 17-33

.. code-block:: default

    import logging

    import plotly
    from sklearn.datasets import load_digits, load_diabetes
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import ShuffleSplit
    from sklearn.naive_bayes import GaussianNB
    from sklearn.pipeline import make_pipeline, Pipeline
    from sklearn.preprocessing import StandardScaler

    from elphick.sklearn_viz.model_selection import LearningCurve, plot_learning_curve, metrics

    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)s %(levelname)s %(module)s - %(funcName)s: %(message)s',
                        datefmt='%Y-%m-%dT%H:%M:%S%z')

.. GENERATED FROM PYTHON SOURCE LINES 34-36

Load Data
---------

.. GENERATED FROM PYTHON SOURCE LINES 36-39

.. code-block:: default

    X, y = load_digits(return_X_y=True)

.. GENERATED FROM PYTHON SOURCE LINES 40-44

Create a Classifier Pipeline
----------------------------

The pipeline will likely include some pre-processing.

.. GENERATED FROM PYTHON SOURCE LINES 44-48

.. code-block:: default

    pipe: Pipeline = make_pipeline(StandardScaler(), GaussianNB()).set_output(transform='pandas')
    pipe
.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Pipeline(steps=[('standardscaler', StandardScaler()),
                    ('gaussiannb', GaussianNB())])
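The sections below visualise the standard learning-curve quantities: a training score and a
cross-validated score at a series of increasing training-set sizes.  For reference, the raw
computation can be sketched directly with scikit-learn (a minimal sketch only; the package
helpers used below may differ in their internals):

.. code-block:: python

    # Sketch only: compute raw learning-curve data with scikit-learn directly.
    import numpy as np
    from sklearn.model_selection import learning_curve

    train_sizes, train_scores, val_scores = learning_curve(
        pipe, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5), n_jobs=-1)

    print(train_sizes)                # absolute training-set sizes evaluated
    print(train_scores.mean(axis=1))  # mean training score at each size
    print(val_scores.mean(axis=1))    # mean cross-validated score at each size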


.. GENERATED FROM PYTHON SOURCE LINES 49-51

Plot using the function
-----------------------

.. GENERATED FROM PYTHON SOURCE LINES 51-58

.. code-block:: default

    cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
    fig = plot_learning_curve(pipe, x=X, y=y, cv=cv)
    fig.update_layout(height=600)
    # noinspection PyTypeChecker
    plotly.io.show(fig)

.. raw:: html
    :file: images/sphx_glr_learning_curve_001.html

.. GENERATED FROM PYTHON SOURCE LINES 59-66

Plot using the object
---------------------

Plotting using the object allows access to the underlying data.

.. tip::
    You can use `n_jobs` to parallelize the computation.

.. GENERATED FROM PYTHON SOURCE LINES 66-71

.. code-block:: default

    lc: LearningCurve = LearningCurve(pipe, x=X, y=y, cv=5, n_jobs=5)
    fig = lc.plot(title='Learning Curve').update_layout(height=600)
    fig

.. The interactive Plotly learning-curve figure is rendered here in the built documentation.
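Here `cv` is given as an integer (k-fold style).  Presumably an explicit scikit-learn splitter
is also accepted, as with the function form above; a hypothetical sketch re-using the same
`ShuffleSplit` (an assumption about the API, not confirmed here):

.. code-block:: python

    # Hypothetical: pass an explicit splitter instead of an integer cv.
    # Assumes LearningCurve accepts any scikit-learn cross-validator.
    lc_ss = LearningCurve(pipe, x=X, y=y, cv=cv, n_jobs=5)   # cv is the ShuffleSplit above
    lc_ss.plot(title='Learning Curve (ShuffleSplit)').update_layout(height=600)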


.. GENERATED FROM PYTHON SOURCE LINES 72-73

View the data

.. GENERATED FROM PYTHON SOURCE LINES 73-76

.. code-block:: default

    lc.results

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    LearningCurveResult(training_scores=array([[0.97902098, 0.98601399, 0.98601399, 0.98601399, 0.98601399],
           [0.85010707, 0.87366167, 0.91648822, 0.91648822, 0.91648822],
           [0.85063291, 0.8556962 , 0.86582278, 0.81392405, 0.81392405],
           [0.87780773, 0.88679245, 0.87241689, 0.79964061, 0.79245283],
           [0.82602644, 0.84620738, 0.83716075, 0.78705637, 0.84133612]]),
           validation_scores=array([[0.70277778, 0.59722222, 0.56267409, 0.64345404, 0.55153203],
           [0.69722222, 0.71388889, 0.7632312 , 0.84122563, 0.76044568],
           [0.69722222, 0.73055556, 0.78272981, 0.80779944, 0.76044568],
           [0.76388889, 0.75277778, 0.77715877, 0.80779944, 0.72423398],
           [0.75277778, 0.74444444, 0.77715877, 0.7994429 , 0.76044568]]),
           training_sizes=array([ 143,  467,  790, 1113, 1437]), metrics=None)

.. GENERATED FROM PYTHON SOURCE LINES 77-78

Results as a dataframe

.. GENERATED FROM PYTHON SOURCE LINES 78-82

.. code-block:: default

    df = lc.results.get_results()
    df.head(10)
.. rst-class:: sphx-glr-script-out

.. code-block:: none

       train_count_143  train_count_467  train_count_790  train_count_1113  train_count_1437     dataset
    0         0.979021         0.850107         0.850633          0.877808          0.826026    training
    1         0.986014         0.873662         0.855696          0.886792          0.846207    training
    2         0.986014         0.916488         0.865823          0.872417          0.837161    training
    3         0.986014         0.916488         0.813924          0.799641          0.787056    training
    4         0.986014         0.916488         0.813924          0.792453          0.841336    training
    5         0.702778         0.697222         0.697222          0.763889          0.752778  validation
    6         0.597222         0.713889         0.730556          0.752778          0.744444  validation
    7         0.562674         0.763231         0.782730          0.777159          0.777159  validation
    8         0.643454         0.841226         0.807799          0.807799          0.799443  validation
    9         0.551532         0.760446         0.760446          0.724234          0.760446  validation
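A quick way to turn the curve into numbers for the two questions posed in the introduction is
to average each dataset's scores and compare training and validation performance at the largest
training size.  A small sketch using the dataframe above (column names as printed):

.. code-block:: python

    # Mean score per training size, for the training and validation curves.
    summary = df.groupby('dataset').mean()
    print(summary)

    # A persistent gap at the largest training size suggests over-fitting;
    # a validation curve that is still rising suggests more data would help.
    gap = summary.loc['training', 'train_count_1437'] - summary.loc['validation', 'train_count_1437']
    print(f"train/validation gap at n=1437: {gap:.3f}")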


.. GENERATED FROM PYTHON SOURCE LINES 83-87

Regressor Learning Curve
------------------------

This example uses a regression model.

.. GENERATED FROM PYTHON SOURCE LINES 87-95

.. code-block:: default

    diabetes = load_diabetes(as_frame=True)
    X, y = diabetes.data, diabetes.target
    y.name = "progression"

    pipe: Pipeline = make_pipeline(StandardScaler(), RidgeCV()).set_output(transform='pandas')
    pipe
.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Pipeline(steps=[('standardscaler', StandardScaler()), ('ridgecv', RidgeCV())])
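A regressor pipeline's default `score` is R², which is presumably the metric the curve below is
drawn on.  A quick cross-validated baseline on the full training set gives context for where the
curve should level out (a minimal sketch):

.. code-block:: python

    # Sketch only: a cross-validated R^2 baseline for context.
    from sklearn.model_selection import cross_val_score

    scores = cross_val_score(pipe, X, y, cv=5)   # default scoring uses the estimator's R^2
    print(scores.mean(), scores.std())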


.. GENERATED FROM PYTHON SOURCE LINES 96-100

.. code-block:: default

    lc: LearningCurve = LearningCurve(pipe, x=X, y=y, cv=5)
    fig = lc.plot(title='Learning Curve').update_layout(height=600)
    fig

.. The interactive Plotly learning-curve figure for the regressor is rendered here in the built documentation.
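The same result object is available for the regressor, so question 2 from the introduction
("will my model benefit from more data?") can be checked numerically: if the mean validation
score is still climbing at the largest training size, more samples are likely to help.  A sketch,
assuming the result arrays are shaped (training sizes x CV folds) as in the classifier output above:

.. code-block:: python

    # Mean cross-validated score at each training size.
    res = lc.results
    val_mean = res.validation_scores.mean(axis=1)
    print(dict(zip(res.training_sizes, val_mean.round(3))))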


.. GENERATED FROM PYTHON SOURCE LINES 101-106

Learning Curve with Metrics
---------------------------

While a model is fitted and scored against the defined scorer, we may be interested in other
metrics as well.  The `metrics` parameter allows us to define additional metrics to calculate.

.. GENERATED FROM PYTHON SOURCE LINES 106-113

.. code-block:: default

    lc: LearningCurve = LearningCurve(pipe, x=X, y=y,
                                      metrics={'mse': metrics.mean_squared_error, 'moe': metrics.moe_95},
                                      cv=5, n_jobs=5)
    fig = lc.plot(title='Learning Curve with Metrics', metrics=['mse', 'moe'],
                  col_wrap=2).update_layout(height=800)
    fig

.. The interactive multi-panel figure (scorer, mse and moe) is rendered here in the built documentation.
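The metrics shown here come from the package's own `metrics` module.  If, as the dict form
suggests, any callable with a `(y_true, y_pred)` signature is accepted (an assumption, not
confirmed here), a plain scikit-learn metric could be added the same way:

.. code-block:: python

    # Hypothetical: add a plain scikit-learn metric alongside the package metrics.
    # Assumes the metrics dict accepts any (y_true, y_pred) callable.
    from sklearn.metrics import mean_absolute_error

    lc_mae = LearningCurve(pipe, x=X, y=y,
                           metrics={'mse': metrics.mean_squared_error,
                                    'mae': mean_absolute_error},
                           cv=5, n_jobs=5)
    lc_mae.plot(title='Learning Curve with MAE', metrics=['mse', 'mae'], col_wrap=2)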


.. GENERATED FROM PYTHON SOURCE LINES 114-115

Learning Curve for a metric without the scorer

.. GENERATED FROM PYTHON SOURCE LINES 115-117

.. code-block:: default

    fig = lc.plot(title='Learning Curve - Metric, no scorer', metrics=['moe'],
                  plot_scorer=False).update_layout(height=700)
    fig

.. The interactive figure showing only the moe metric is rendered here in the built documentation.
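The plot methods return standard Plotly figures (they are displayed above with `plotly.io.show`
and styled with `update_layout`), so the usual Plotly export options apply:

.. code-block:: python

    # Save the interactive figure as a standalone HTML file,
    # or as a static image (static export requires the kaleido package).
    fig.write_html('learning_curve_moe.html')
    # fig.write_image('learning_curve_moe.png')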


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  4.786 seconds)

.. _sphx_glr_download_auto_examples_learning_curve.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: learning_curve.py <learning_curve.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: learning_curve.ipynb <learning_curve.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_