.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/06_profiling.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_06_profiling.py:


Parquet Profiling
=================

`ydata-profiling` provides a convenient way to profile Parquet files using the
`ProfileReport` class. Its documentation describes options for profiling large datasets.

This example describes an alternative approach. When a Parquet file is very wide and you
want to profile it column by column, you can use the `ParquetProfileReport` class from the
`parq_tools.utils.profile_utils` module. It builds the profile report by loading columns
in batches, which reduces memory consumption.

.. GENERATED FROM PYTHON SOURCE LINES 12-19

.. code-block:: Python


    import tempfile

    import pandas as pd
    from pathlib import Path

    from parq_tools import ParquetProfileReport


.. GENERATED FROM PYTHON SOURCE LINES 20-22

Create a Parquet file for profiling
-----------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 22-35

.. code-block:: Python


    temp_dir = Path(tempfile.gettempdir()) / "profile_parquet_example"
    temp_dir.mkdir(parents=True, exist_ok=True)

    # Create a sample DataFrame and save as Parquet
    df = pd.DataFrame({
        "col1": range(100),
        "col2": ["a"] * 100,
        "col3": [True, False] * 50,
    })
    parquet_path = temp_dir / "example.parquet"
    df.to_parquet(parquet_path)


.. GENERATED FROM PYTHON SOURCE LINES 36-41

Profile by column
-----------------
The `ParquetProfileReport` class allows you to profile a Parquet file by loading columns
in batches.

Although only three columns are profiled, the progress bar shows four steps; the fourth
step covers the merging process.

.. GENERATED FROM PYTHON SOURCE LINES 41-52

.. code-block:: Python


    report = ParquetProfileReport(
        parquet_path=parquet_path,
        columns=None,  # None means all columns
        batch_size=1,  # Process 1 column at a time
        show_progress=True,
    )
    report.profile()
    report.show()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Profiling columns: 0%| | 0/4 [00:00
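
The step count reported for the three-column example can be illustrated with a small
standalone sketch. It assumes one progress step per column batch plus one final step for
merging, which is consistent with the 3-column, 4-step run shown here but is not confirmed
for other batch sizes. `iter_column_batches` and `progress_steps` are hypothetical helpers
written for this illustration; they are not part of `parq_tools`.

.. code-block:: Python

    from math import ceil
    from typing import Iterator, List, Sequence


    def iter_column_batches(columns: Sequence[str], batch_size: int) -> Iterator[List[str]]:
        """Yield consecutive groups of at most ``batch_size`` column names."""
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        for start in range(0, len(columns), batch_size):
            yield list(columns[start:start + batch_size])


    def progress_steps(n_columns: int, batch_size: int) -> int:
        """Assumed step count: one step per batch plus one final merge step."""
        return ceil(n_columns / batch_size) + 1


    columns = ["col1", "col2", "col3"]
    batches = list(iter_column_batches(columns, batch_size=1))
    print(batches)                          # [['col1'], ['col2'], ['col3']]
    print(progress_steps(len(columns), 1))  # 4

Under this assumption, a larger `batch_size` means fewer steps, but each batch holds more
columns in memory at once.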