parq_tools.utils.index_utils.sort_parquet_file

parq_tools.utils.index_utils.sort_parquet_file(input_path, output_path, columns, chunk_size=100000)

Globally sort a Parquet file by the specified columns, so that rows are ordered across the whole file rather than only within each chunk.

Parameters:
  • input_path (Path) – Path to the input Parquet file.

  • output_path (Path) – Path to save the sorted Parquet file.

  • columns (List[str]) – List of column names to sort by.

  • chunk_size (int, optional) – Number of rows to process per chunk. Defaults to 100_000.

Return type:

None
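
The interplay between a global sort and chunk_size can be illustrated with a small standard-library sketch. It is not the parq_tools implementation, which this page does not describe. It assumes one common approach: sort each chunk on its own, then merge the sorted chunks. The helper name sort_rows_in_chunks and the in-memory list of dicts standing in for Parquet rows are hypothetical.

```python
import heapq
from itertools import islice
from operator import itemgetter


def sort_rows_in_chunks(rows, columns, chunk_size=100_000):
    """Hypothetical helper: return all rows ordered by `columns`,
    consuming the input `chunk_size` rows at a time."""
    # One column yields a scalar key, several columns yield a tuple key.
    key = itemgetter(*columns)
    iterator = iter(rows)
    sorted_runs = []
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        # Each chunk is sorted independently. Runs are kept in memory here;
        # a file-based version would more likely spill them to disk.
        sorted_runs.append(sorted(chunk, key=key))
    # A k-way merge of the sorted runs gives one globally ordered sequence.
    return list(heapq.merge(*sorted_runs, key=key))


# Stand-in data: dicts play the role of Parquet rows (an assumption).
rows = [
    {"x": 2, "y": 1},
    {"x": 1, "y": 2},
    {"x": 2, "y": 0},
    {"x": 1, "y": 1},
    {"x": 0, "y": 5},
]
result = sort_rows_in_chunks(rows, ["x", "y"], chunk_size=2)
```

Going by the signature above, a call to the real function would take the form sort_parquet_file(input_path, output_path, columns=["x", "y"], chunk_size=100_000), with input_path and output_path given as Path objects and nothing returned.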