.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/09_lazy_parquet_df.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_09_lazy_parquet_df.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_09_lazy_parquet_df.py:

LazyParquetDF
=============

A focused example demonstrating the LazyParquetDF "lazyframe" API for indexed
Parquet loading, lazy column access, filtering, and chunked iteration and
saving. It complements the filtering and memory-usage examples by showing how
to work interactively with a Parquet-backed, DataFrame-like object without
loading all columns into memory at once.

.. GENERATED FROM PYTHON SOURCE LINES 13-21

.. code-block:: Python

    from pathlib import Path
    import tempfile

    import pandas as pd

    from parq_tools.lazy_parquet import LazyParquetDF

.. GENERATED FROM PYTHON SOURCE LINES 22-29

Create a Parquet file
---------------------

We first build a small DataFrame. In practice this could be a much larger
dataset. Here we keep ``"i"`` as a regular data column rather than an index so
that it can be referenced directly in lazy operations such as ``filter`` and
``iter_row_chunks``.

.. GENERATED FROM PYTHON SOURCE LINES 29-45

.. code-block:: Python

    def create_parquet(path: Path) -> None:
        df = pd.DataFrame(
            {
                "i": [0, 1, 2, 3],
                "j": [10, 11, 12, 13],
                "value": [1.0, 2.0, 3.0, 4.0],
            }
        )
        df.to_parquet(path)


    parquet_path = Path(tempfile.gettempdir()) / "lazyparquetdf_example.parquet"
    create_parquet(parquet_path)

.. GENERATED FROM PYTHON SOURCE LINES 46-53

Construct a LazyParquetDF
-------------------------

When no ``index_col`` is given, LazyParquetDF reconstructs the logical index
from the pandas metadata stored in the Parquet file. For this file that is
just a simple RangeIndex, and ``"i"`` remains a regular data column.

.. GENERATED FROM PYTHON SOURCE LINES 53-59
.. code-block:: Python

    lazy = LazyParquetDF(parquet_path)

    print("Shape:", lazy.shape)
    print("Columns:", lazy.columns)
    print("Index name:", lazy.index.name)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Shape: (4, 3)
    Columns: ['i', 'j', 'value']
    Index name: None

.. GENERATED FROM PYTHON SOURCE LINES 60-65

Lazy column access
------------------

Columns are loaded on demand. The ``dtypes`` property reports only the
columns that have been materialised so far.

.. GENERATED FROM PYTHON SOURCE LINES 65-77

.. code-block:: Python

    print("Dtypes before loading any column:")
    print(lazy.dtypes)

    # Access a single column; only this column is loaded into memory.
    value_series = lazy["value"]
    print("Loaded 'value' column:")
    print(value_series)

    print("Dtypes after loading 'value':")
    print(lazy.dtypes)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Dtypes before loading any column:
    Series([], dtype: object)
    Loaded 'value' column:
    0    1.0
    1    2.0
    2    3.0
    3    4.0
    Name: value, dtype: float64
    Dtypes after loading 'value':
    value    float64
    dtype: object

.. GENERATED FROM PYTHON SOURCE LINES 78-84

Filtering and query
-------------------

``filter`` takes a PyArrow-style predicate and reads only the columns needed
to evaluate it, while ``query`` operates on a materialised pandas DataFrame.

.. GENERATED FROM PYTHON SOURCE LINES 84-93

.. code-block:: Python

    filtered = lazy.filter(("i", ">", 1))
    print("\nFiltered rows where i > 1 (filter):")
    print(filtered)

    queried = lazy.query("value > 2.0")
    print("\nFiltered rows where value > 2.0 (query):")
    print(queried)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Filtered rows where i > 1 (filter):
       i
    0  2
    1  3

    Filtered rows where value > 2.0 (query):
       i   j  value
    2  2  12    3.0
    3  3  13    4.0

.. GENERATED FROM PYTHON SOURCE LINES 94-100

Chunked iteration and saving
----------------------------

We can add derived columns, then iterate over the dataset in row-wise chunks
and save back to Parquet without holding the full DataFrame in memory.

.. GENERATED FROM PYTHON SOURCE LINES 100-116
.. code-block:: Python

    lazy["double"] = lazy["value"] * 2

    print("\nIterating in chunks of size 2 (columns i and double):")
    for chunk in lazy.iter_row_chunks(chunk_size=2, columns=["i", "double"]):
        print(chunk)

    out_path = parquet_path.with_name("lazyparquetdf_example_out.parquet")

    # Save using chunked write-back.
    lazy.to_parquet(out_path, allow_overwrite=True, chunk_size=2)

    print("\nRound-trip check:")
    roundtrip = pd.read_parquet(out_path)
    print(roundtrip)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Iterating in chunks of size 2 (columns i and double):
       i  double
    0  0     2.0
    1  1     4.0
       i  double
    2  2     6.0
    3  3     8.0

    Round-trip check:
       i   j  value  double
    0  0  10    1.0     2.0
    1  1  11    2.0     4.0
    2  2  12    3.0     6.0
    3  3  13    4.0     8.0

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.048 seconds)

.. _sphx_glr_download_auto_examples_09_lazy_parquet_df.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: 09_lazy_parquet_df.ipynb <09_lazy_parquet_df.ipynb>`

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: 09_lazy_parquet_df.py <09_lazy_parquet_df.py>`

.. only:: html

    .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_