.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/principal_components.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_principal_components.py:

============================
Principal Component Analysis
============================

Principal Component Analysis (PCA) is a feature reduction (decomposition) technique
that aims to retain as much variance as possible in fewer features.
It is a tool to help manage the "curse of dimensionality".

.. GENERATED FROM PYTHON SOURCE LINES 10-23

.. code-block:: default

    import logging

    import pandas as pd
    import plotly.io as pio
    import plotly.express as px
    from sklearn.datasets import load_diabetes

    from elphick.sklearn_viz.features import plot_principal_components, plot_scatter_matrix, \
        plot_explained_variance, PrincipalComponents

    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)s %(levelname)s %(module)s - %(funcName)s: %(message)s',
                        datefmt='%Y-%m-%dT%H:%M:%S%z')

.. GENERATED FROM PYTHON SOURCE LINES 24-26

Load Classification Data
------------------------

.. GENERATED FROM PYTHON SOURCE LINES 26-32

.. code-block:: default

    df = px.data.iris().drop(columns=['species_id'])
    df['species'] = df['species'].astype('category')
    x = df[[col for col in df.columns if col != 'species']]
    y = df['species']

.. GENERATED FROM PYTHON SOURCE LINES 33-35

Plot Classification Data
------------------------

.. GENERATED FROM PYTHON SOURCE LINES 37-39

SPLOM - Original Feature Space
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 39-44

.. code-block:: default

    fig = plot_scatter_matrix(x=x, y=y, original_features=True)
    fig.update_layout(height=800)
    fig


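As a rough illustration of what the principal-component axes represent, the sketch
below computes component scores with plain NumPy via a singular value decomposition.
The synthetic ``features`` array and all variable names are assumptions made for this
illustration; it is not how ``plot_scatter_matrix`` is implemented.

.. code-block:: python

    import numpy as np

    # Synthetic stand-in for a feature table: 150 samples, 4 correlated features.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(150, 2))
    mixing = np.array([[1.0, 0.5, 2.0, 0.1],
                       [0.2, 1.5, 0.3, 0.8]])
    features = latent @ mixing + 0.1 * rng.normal(size=(150, 4))

    # Centre each feature, then take the SVD of the centred matrix.
    centred = features - features.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)

    # Rows of vt are the principal axes; projecting onto them gives the scores.
    scores = centred @ vt.T

    # The score columns are mutually uncorrelated, unlike the original features.
    score_cov = np.cov(scores, rowvar=False)
    off_diagonal = score_cov - np.diag(np.diag(score_cov))
    print(np.abs(off_diagonal).max())

The off-diagonal covariances of the scores sit at floating-point noise level, which is
why a scatter matrix of principal components shows no overall linear trend between
any pair of components.
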
.. GENERATED FROM PYTHON SOURCE LINES 45-47

SPLOM - Principal Components
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 47-52

.. code-block:: default

    fig = plot_scatter_matrix(x=x, y=y)
    fig.update_layout(height=800)
    fig


.. GENERATED FROM PYTHON SOURCE LINES 53-55

Scatter - 2D PCA
^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 55-60

.. code-block:: default

    fig = plot_principal_components(x=x, color=y, plot_3d=False, loading_vectors=False)
    fig.update_layout(height=800)
    fig


.. GENERATED FROM PYTHON SOURCE LINES 61-62

Plotting loading vectors is the default.

.. GENERATED FROM PYTHON SOURCE LINES 62-67

.. code-block:: default

    fig = plot_principal_components(x=x, color=y, plot_3d=False)
    fig.update_layout(height=800)
    # noinspection PyTypeChecker
    pio.show(fig)

.. raw:: html
    :file: images/sphx_glr_principal_components_001.html

.. GENERATED FROM PYTHON SOURCE LINES 68-70

Explained Variance
^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 70-74

.. code-block:: default

    fig = plot_explained_variance(x=x, y=y)
    fig


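The percentages in an explained-variance chart can be reproduced from the eigenvalues
of the covariance matrix. The following is a minimal NumPy sketch on hypothetical data
(the ``data`` array and the 90 % threshold are assumptions chosen for illustration),
not the library's own implementation.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(1)
    # Hypothetical data: three features with very different spreads.
    data = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

    # Eigenvalues of the covariance matrix are the variances along each component.
    eigenvalues = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]  # descending

    explained_pct = 100.0 * eigenvalues / eigenvalues.sum()
    cumulative_pct = np.cumsum(explained_pct)

    # Smallest number of components whose cumulative share reaches 90 %.
    n_components_90 = int(np.searchsorted(cumulative_pct, 90.0) + 1)
    print(explained_pct.round(2), n_components_90)

Reading the cumulative curve against a chosen threshold is one simple way to decide
how many components to keep.
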
.. GENERATED FROM PYTHON SOURCE LINES 75-77

Scatter - 3D PCA
^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 77-82

.. code-block:: default

    fig = plot_principal_components(x=x, color=y, loading_vectors=False)
    fig.update_layout(height=800)
    fig


.. GENERATED FROM PYTHON SOURCE LINES 83-84

Plotting loading vectors is the default.

.. GENERATED FROM PYTHON SOURCE LINES 84-88

.. code-block:: default

    fig = plot_principal_components(x=x, color=y)
    fig.update_layout(height=800)
    fig


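One common definition of a loading vector scales each eigenvector by the square root
of its eigenvalue; for standardised features the resulting loadings equal the
correlation between each feature and each component. Whether
``plot_principal_components`` uses exactly this definition is an assumption. The
sketch below, on hypothetical data, checks the identity numerically.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(2)
    base = rng.normal(size=(300, 1))
    # Hypothetical features: two driven by a shared factor, one independent.
    raw = np.hstack([base + 0.3 * rng.normal(size=(300, 1)),
                     -base + 0.5 * rng.normal(size=(300, 1)),
                     rng.normal(size=(300, 1))])

    # Standardise so every feature has zero mean and unit variance.
    z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(z, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    scores = z @ eigenvectors
    # Loadings: eigenvectors scaled by the standard deviation of each component.
    loadings = eigenvectors * np.sqrt(eigenvalues)

    # For standardised data each loading equals corr(feature, component).
    corr = np.corrcoef(z, scores, rowvar=False)[:3, 3:]
    print(np.allclose(loadings, corr))

Under this definition the squared loadings of a standardised feature sum to one across
all components, so the length of a loading vector in a 2D or 3D plot indicates how much
of that feature's variance the plotted components capture.
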
.. GENERATED FROM PYTHON SOURCE LINES 89-94

Regression Datasets
-------------------

The preceding examples demonstrated a categorical target variable.
Regression problems with a numeric target variable are also supported.

.. GENERATED FROM PYTHON SOURCE LINES 94-100

.. code-block:: default

    diabetes = load_diabetes(as_frame=True, scaled=False)
    x, y = diabetes.data, diabetes.target.rename('target')
    df = pd.concat([x, y], axis=1)
    df.shape

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (442, 11)

.. GENERATED FROM PYTHON SOURCE LINES 101-105

.. code-block:: default

    fig = plot_principal_components(x=x, color=y, plot_3d=False)
    fig.update_layout(height=800)
    fig


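Because the diabetes data is loaded with ``scaled=False``, its features keep their
original units, and PCA on unscaled features tends to be dominated by whichever
feature has the largest variance. The sketch below illustrates the effect on
hypothetical data (the ``unscaled`` array and the helper ``explained_variance_pct``
are assumptions for illustration).

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(3)
    n = 500
    # Hypothetical unscaled features: independent, but measured on different scales.
    unscaled = np.column_stack([
        rng.normal(0.0, 50.0, n),   # large-scale measurement
        rng.normal(0.0, 5.0, n),
        rng.normal(0.0, 0.5, n),    # small-scale measurement
    ])


    def explained_variance_pct(matrix):
        """Percentage of total variance carried by each principal component."""
        eigenvalues = np.linalg.eigvalsh(np.cov(matrix, rowvar=False))[::-1]
        return 100.0 * eigenvalues / eigenvalues.sum()


    standardised = (unscaled - unscaled.mean(axis=0)) / unscaled.std(axis=0, ddof=1)

    raw_pct = explained_variance_pct(unscaled)
    std_pct = explained_variance_pct(standardised)
    print(raw_pct.round(1), std_pct.round(1))

On the unscaled array the first component absorbs nearly all of the variance simply
because one feature is numerically large, whereas after standardisation the variance is
spread far more evenly across components.
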
.. GENERATED FROM PYTHON SOURCE LINES 106-108

Compared with the iris dataset, this dataset needs more principal components to retain
a reasonable proportion of the total variance, as indicated in the section below.

.. GENERATED FROM PYTHON SOURCE LINES 110-114

Accessing the Data
------------------

Plotting with the ``PrincipalComponents`` object, rather than the function, gives you access to the data.

.. GENERATED FROM PYTHON SOURCE LINES 114-119

.. code-block:: default

    pca = PrincipalComponents(x=x, color=y)
    fig = pca.plot_explained_variance()
    fig


.. GENERATED FROM PYTHON SOURCE LINES 120-120

.. code-block:: default

    pca.data

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    {'raw': PCResults(data=            PC1         PC2         PC3  ...        PC8        PC9       PC10
    0    -37.015229   18.660758   -3.516635  ...   0.181757  -0.432592   0.121130
    1    -15.751789  -22.835567   13.417056  ...   0.303021   0.616363  -0.159894
    2    -37.369635   17.075089   -0.217778  ...   0.348658  -0.251111   0.096245
    3     14.080519  -15.486712  -22.652022  ...  -0.405063   0.247089   0.164029
    4      8.052537   -2.108751   -0.220784  ...  -0.353808   0.429188  -0.034163
    ..          ...         ...         ...  ...        ...        ...        ...
    437   -1.450109   21.028323    3.482378  ...  -0.101225  -0.601114   0.105718
    438   58.918190  -11.872762  -22.349991  ...   0.038697  -0.639989  -0.072849
    439  -24.233049   17.265420   -4.102360  ...   0.105526  -0.314848  -0.163713
    440   13.903409   -5.839565  -10.239533  ...  -0.419348   0.164559   0.139823
    441   52.914252  -53.019841   32.987254  ...   0.928400   0.663407   0.243303

    [442 rows x 10 columns], explained_variance=0    73.249152
    1     9.621208
    2     7.471087
    3     4.316437
    4     3.216342
    5     1.642907
    6     0.468170
    7     0.007436
    8     0.006300
    9     0.000960
    Name: explained_variance, dtype: float64, loadings=           PC1        PC2        PC3  ...        PC8        PC9       PC10
    age   3.714173   7.118496   4.898654  ...  -0.001158   0.001110  -0.000139
    sex   0.049703   0.185346  -0.101937  ...   0.357964  -0.261106   0.000440
    bmi   1.254303   2.003842  -0.374368  ...   0.008416  -0.002737  -0.001047
    bp    3.629662  10.214990   4.524721  ...  -0.001625   0.002870  -0.000129
    s1   33.906939  -2.747011   5.006616  ...  -0.005617  -0.008839  -0.005336
    s2   29.319152  -0.279329  -6.728494  ...  -0.000552   0.002171   0.004946
    s3   -0.979456  -7.609406   9.679073  ...   0.026486   0.022667   0.005935
    s4    0.803214   0.514210  -0.626845  ...   0.282119   0.328417   0.006791
    s5    0.241819   0.210375   0.048459  ...  -0.013738  -0.014107   0.163766
    s6    4.073738   6.720256   0.820792  ...  -0.002170  -0.000065  -0.000257

    [10 rows x 10 columns]), 'std': PCResults(data=           PC1        PC2        PC3  ...        PC8        PC9       PC10
    0     0.587199  -1.946832   0.589205  ...  -0.757431  -0.181075  -0.048953
    1    -2.831625   1.372082   0.027930  ...   0.188436   0.505128   0.043599
    2     0.272129  -1.634901   0.739244  ...  -0.843203  -0.025353  -0.054175
    3     0.049281   0.382278  -2.013032  ...   0.367871  -0.137857  -0.074558
    4    -0.756421   0.811960  -0.057238  ...   1.059751   0.044284  -0.010914
    ..         ...        ...        ...  ...        ...        ...        ...
    437   1.239525  -1.035968   0.928679  ...  -0.126490  -0.377893  -0.025229
    438   1.264719   0.761319  -1.750191  ...  -0.180439  -0.371759   0.033447
    439  -0.205206  -1.205487   0.496186  ...   0.491849  -0.113220   0.058875
    440   0.692871   0.210127  -0.868724  ...  -0.078684  -0.127211  -0.045540
    441  -1.903941   3.975777  -0.048338  ...  -1.185359   0.730475  -0.154558

    [442 rows x 10 columns], explained_variance=0    40.242108
    1    14.923197
    2    12.059663
    3     9.554764
    4     6.621814
    5     6.027171
    6     5.365657
    7     4.336820
    8     0.783200
    9     0.085607
    Name: explained_variance, dtype: float64, loadings=           PC1        PC2        PC3  ...        PC8        PC9       PC10
    age   0.434662   0.054261   0.543842  ...  -0.009848   0.002269   0.000302
    sex   0.375489  -0.472743  -0.117488  ...  -0.292022  -0.000590   0.000339
    bmi   0.608846  -0.191130   0.184181  ...  -0.259050   0.011873   0.000764
    bp    0.545735  -0.169098   0.564625  ...   0.314720   0.007619  -0.000298
    s1    0.689365   0.700806  -0.075396  ...  -0.085320  -0.011778   0.065746
    s2    0.706648   0.557612  -0.296499  ...   0.126139  -0.100671  -0.052168
    s3   -0.567223   0.619125   0.424407  ...  -0.214029   0.134833  -0.029405
    s4    0.861234  -0.083384  -0.418523  ...   0.119059   0.216804  -0.008392
    s5    0.760385  -0.032026   0.069956  ...  -0.296473  -0.053082  -0.024497
    s6    0.647045  -0.103892   0.304363  ...   0.109843  -0.004279   0.000242

    [10 rows x 10 columns])}

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.719 seconds)