
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/01_basic_usage.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_01_basic_usage.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_01_basic_usage.py:


Basic usage
===========
A simple example demonstrating how to use `pandera_utils`.

.. GENERATED FROM PYTHON SOURCE LINES 8-17

.. code-block:: Python

    import inspect
    import pandas as pd
    from pathlib import Path
    import yaml
    from elphick.pandera_utils.pandera_utils import load_schema_from_yaml, DataFrameMetaProcessor

    __file__ = Path(inspect.getfile(inspect.currentframe())).resolve()

.. GENERATED FROM PYTHON SOURCE LINES 18-21

Load Schema
-----------
Load the schema from the YAML file

.. GENERATED FROM PYTHON SOURCE LINES 21-29

.. code-block:: Python

    yaml_path = __file__.parents[1] / "assets/example_schema.yaml"
    schema = load_schema_from_yaml(yaml_path)

    # Print the YAML file in a nicely formatted way
    with open(yaml_path, "r", encoding="utf-8") as f:
        schema_yaml = yaml.safe_load(f)
    print(yaml.dump(schema_yaml, sort_keys=False, indent=2))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    columns:
      column1:
        dtype: int
        checks:
          greater_than: 0
        required: false
        coerce: true
        metadata:
          pandera_utils:
            unit_of_measure: m
            aliases:
            - col1
      column2:
        dtype: float
        checks:
          greater_than_or_equal_to: 0.0
        required: false
        metadata:
          pandera_utils:
            decimals: 1
      column3:
        dtype: float
        checks:
          less_than: 100.0
        required: false
        metadata:
          pandera_utils:
            calculation: column1 + column2
            inputs:
            - column1
            - column2
            decimals: 2

.. GENERATED FROM PYTHON SOURCE LINES 30-32

Create a sample DataFrame
-------------------------

.. GENERATED FROM PYTHON SOURCE LINES 32-41

.. code-block:: Python

    dataframe = pd.DataFrame({
        "col1": [1, 2, 3],
        "column2": [0.546, 1.568, 2.578],
    })
    # preserve a copy for comparison later
    dataframe_copy = dataframe.copy(deep=True)

    dataframe

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       col1  column2
    0     1    0.546
    1     2    1.568
    2     3    2.578
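
The ``pandera_utils`` metadata in the schema YAML above can be read with
ordinary dictionary access once the file is parsed. The following is a minimal
sketch, not part of the library: it parses a trimmed copy of the schema from an
inline string, and the names ``schema_text`` and ``utils_meta`` are hypothetical.

.. code-block:: Python

    import yaml

    # Trimmed, inline copy of the example schema (illustration only).
    schema_text = """
    columns:
      column1:
        dtype: int
        metadata:
          pandera_utils:
            unit_of_measure: m
            aliases:
            - col1
      column2:
        dtype: float
        metadata:
          pandera_utils:
            decimals: 1
    """

    parsed = yaml.safe_load(schema_text)

    # Collect the pandera_utils entry of every column, defaulting to an empty dict.
    utils_meta = {
        name: spec.get("metadata", {}).get("pandera_utils", {})
        for name, spec in parsed["columns"].items()
    }
    print(utils_meta)

Each column then maps to its own small dictionary of aliases, units, decimals,
or calculation settings, which is the information the later steps rely on.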


.. GENERATED FROM PYTHON SOURCE LINES 42-45

Initialize
----------
Initialize the DataFrameMetaProcessor with the schema

.. GENERATED FROM PYTHON SOURCE LINES 45-47

.. code-block:: Python

    processor = DataFrameMetaProcessor(schema)

.. GENERATED FROM PYTHON SOURCE LINES 48-50

Rename Aliases
--------------

.. GENERATED FROM PYTHON SOURCE LINES 50-53

.. code-block:: Python

    df_with_alias = processor.apply_rename_from_alias(dataframe)
    df_with_alias

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       column1  column2
    0        1    0.546
    1        2    1.568
    2        3    2.578
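
Conceptually, the renaming step inverts the ``aliases`` lists from the schema
metadata into a rename mapping. The following is a minimal sketch in plain
pandas, not the implementation of ``apply_rename_from_alias``; the
``columns_meta`` dictionary and the ``build_alias_map`` helper are hypothetical
names used only for illustration.

.. code-block:: Python

    import pandas as pd

    # Hypothetical stand-in for the pandera_utils metadata of each column.
    columns_meta = {
        "column1": {"aliases": ["col1"]},
        "column2": {},
        "column3": {},
    }


    def build_alias_map(meta: dict) -> dict:
        """Map every alias to its canonical column name."""
        return {
            alias: name
            for name, entry in meta.items()
            for alias in entry.get("aliases", [])
        }


    df = pd.DataFrame({"col1": [1, 2, 3], "column2": [0.546, 1.568, 2.578]})
    alias_map = build_alias_map(columns_meta)

    # Columns that are not aliases (column2) are left untouched by rename.
    renamed = df.rename(columns=alias_map)
    print(list(renamed.columns))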


.. GENERATED FROM PYTHON SOURCE LINES 54-56

Apply calculations
------------------

.. GENERATED FROM PYTHON SOURCE LINES 56-59

.. code-block:: Python

    df_with_calculations = processor.apply_calculations(df_with_alias)
    df_with_calculations

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       column1  column2  column3
    0        1    0.546    1.546
    1        2    1.568    3.568
    2        3    2.578    5.578
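
One way to picture the calculation step is to evaluate each ``calculation``
expression against the DataFrame once its ``inputs`` are present. The sketch
below uses ``DataFrame.eval`` for that; it is an assumption about the general
approach, not the code behind ``apply_calculations``, and ``calc_meta`` is a
hypothetical stand-in for the schema metadata.

.. code-block:: Python

    import pandas as pd

    df = pd.DataFrame({"column1": [1, 2, 3], "column2": [0.546, 1.568, 2.578]})

    # Hypothetical stand-in for the calculation metadata of column3.
    calc_meta = {
        "column3": {
            "calculation": "column1 + column2",
            "inputs": ["column1", "column2"],
        }
    }

    result = df.copy()
    for target, entry in calc_meta.items():
        missing = [name for name in entry["inputs"] if name not in result.columns]
        if missing:
            # Skip a calculation whose inputs are not available.
            continue
        # Evaluate the expression using the DataFrame columns as variables.
        result[target] = result.eval(entry["calculation"])

    print(result)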


.. GENERATED FROM PYTHON SOURCE LINES 60-62

Apply rounding
--------------

.. GENERATED FROM PYTHON SOURCE LINES 62-65

.. code-block:: Python

    df_with_decimals = processor.apply_rounding(df_with_calculations)
    df_with_decimals

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       column1  column2  column3
    0        1      0.5     1.55
    1        2      1.6     3.57
    2        3      2.6     5.58
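
The rounding shown above matches what pandas does when ``DataFrame.round`` is
given a per-column dictionary of decimal places. The following sketch assumes
the ``decimals`` metadata has already been collected into such a dictionary
(``decimals`` here is a hypothetical name); it does not claim to be how
``apply_rounding`` is written.

.. code-block:: Python

    import pandas as pd

    df = pd.DataFrame({
        "column1": [1, 2, 3],
        "column2": [0.546, 1.568, 2.578],
        "column3": [1.546, 3.568, 5.578],
    })

    # Decimal places per column, mirroring the decimals metadata in the schema.
    decimals = {"column2": 1, "column3": 2}

    # Columns absent from the dictionary (column1) are left as they are.
    rounded = df.round(decimals)
    print(rounded)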


.. GENERATED FROM PYTHON SOURCE LINES 66-73

One Step Preprocessing
----------------------
Preprocess the DataFrame in a single call that performs alias renaming,
calculations, and rounding.
If ``metadata: decimals`` is set for a column (not null), that column is rounded
to that number of decimal places after the other preprocessing steps.
When ``round_before_calc`` is ``True``, the DataFrame is also rounded before the
calculations are applied, as well as after.

.. GENERATED FROM PYTHON SOURCE LINES 73-76

.. code-block:: Python

    processed_df = processor.preprocess(dataframe_copy, round_before_calc=False)
    processed_df

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       column1  column2  column3
    0        1      0.5     1.55
    1        2      1.6     3.57
    2        3      2.6     5.58
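
The effect of ``round_before_calc`` is easiest to see by running both orderings
on the same data. The sketch below reproduces the described behaviour with
plain pandas; ``preprocess_sketch`` and ``DECIMALS`` are hypothetical names, and
the function is an illustration under that description rather than the
library's ``preprocess`` method.

.. code-block:: Python

    import pandas as pd

    # Decimal places per column, mirroring the decimals metadata in the schema.
    DECIMALS = {"column2": 1, "column3": 2}


    def preprocess_sketch(df: pd.DataFrame, round_before_calc: bool = False) -> pd.DataFrame:
        out = df.copy()
        if round_before_calc:
            # Keys of DECIMALS that are not yet columns (column3) are ignored.
            out = out.round(DECIMALS)
        out["column3"] = out["column1"] + out["column2"]
        # Rounding is always applied after the calculation.
        return out.round(DECIMALS)


    raw = pd.DataFrame({"column1": [1, 2, 3], "column2": [0.546, 1.568, 2.578]})
    after_only = preprocess_sketch(raw, round_before_calc=False)
    before_and_after = preprocess_sketch(raw, round_before_calc=True)

    print(after_only["column3"].tolist())
    print(before_and_after["column3"].tolist())

With rounding only after the calculation, ``column3`` comes out as 1.55, 3.57,
5.58. Rounding ``column2`` first feeds 0.5, 1.6, 2.6 into the sum, so
``column3`` becomes 1.5, 3.6, 5.6 instead.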


.. GENERATED FROM PYTHON SOURCE LINES 77-78

We can check that the individual steps are equivalent to the one-step preprocessing

.. GENERATED FROM PYTHON SOURCE LINES 78-80

.. code-block:: Python

    assert processed_df.equals(df_with_decimals)

.. GENERATED FROM PYTHON SOURCE LINES 81-84

Validate
--------
Validate the DataFrame using Pandera

.. GENERATED FROM PYTHON SOURCE LINES 84-86

.. code-block:: Python

    validated_df = processor.validate(processed_df)
    validated_df

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       column1  column2  column3
    0        1      0.5     1.55
    1        2      1.6     3.57
    2        3      2.6     5.58
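
For comparison, the checks in the example YAML can be written by hand as a
Pandera schema. This is a sketch of a roughly equivalent schema, assuming the
YAML keys map directly onto ``Column`` arguments and built-in ``Check``
methods; the exact object produced by ``load_schema_from_yaml`` may differ. The
name ``manual_schema`` is hypothetical.

.. code-block:: Python

    import pandas as pd
    from pandera.errors import SchemaError

    try:
        import pandera.pandas as pa  # newer pandera releases
    except ImportError:
        import pandera as pa  # older pandera releases

    # Hand-written counterpart of the dtype and checks entries in the YAML.
    manual_schema = pa.DataFrameSchema({
        "column1": pa.Column(int, pa.Check.greater_than(0), required=False, coerce=True),
        "column2": pa.Column(float, pa.Check.greater_than_or_equal_to(0.0), required=False),
        "column3": pa.Column(float, pa.Check.less_than(100.0), required=False),
    })

    good = pd.DataFrame({
        "column1": [1, 2, 3],
        "column2": [0.5, 1.6, 2.6],
        "column3": [1.55, 3.57, 5.58],
    })
    validated = manual_schema.validate(good)

    # A value of 150.0 violates the less_than: 100.0 check on column3.
    bad = good.assign(column3=[1.55, 3.57, 150.0])
    try:
        manual_schema.validate(bad)
        rejected = False
    except SchemaError:
        rejected = True

    print(rejected)

A DataFrame that passes is returned by ``validate``, which is why the validated
output above looks the same as the preprocessed input.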


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.069 seconds)


.. _sphx_glr_download_auto_examples_01_basic_usage.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 01_basic_usage.ipynb <01_basic_usage.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 01_basic_usage.py <01_basic_usage.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_