parq_tools.utils.index_utils.reindex_parquet

parq_tools.utils.index_utils.reindex_parquet(sparse_parquet_path, output_path, new_index, chunk_size=100000, sort_after_reindex=True)

Reindex a sparse Parquet file to align with a new index, processing in chunks.

Parameters:
  • sparse_parquet_path (Path) – Path to the sparse Parquet file.

  • output_path (Path) – Path to save the re-indexed Parquet file.

  • new_index (pa.Table) – New index as a PyArrow table.

  • chunk_size (int) – Number of rows to process per chunk. Defaults to 100000.

  • sort_after_reindex (bool) – Whether to sort the output after reindexing. Defaults to True.

Return type:

None