parq_tools.utils.index_utils.sort_parquet_file
- parq_tools.utils.index_utils.sort_parquet_file(input_path, output_path, columns, chunk_size=100000)
Globally sort a Parquet file by the specified columns.
- Parameters:
input_path (Path) – Path to the input Parquet file.
output_path (Path) – Path to save the sorted Parquet file.
columns (List[str]) – List of column names to sort by.
chunk_size (int, optional) – Number of rows to process per chunk. Defaults to 100_000.
- Return type:
None
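
Example (illustrative sketch only): the entry above does not say how the global sort is carried out, so the code below is not the parq_tools implementation. It shows one common way to get a globally sorted result while handling only chunk_size rows at a time: sort each chunk, then k-way merge the sorted chunks. The helper name sort_rows_globally and the in-memory list of dicts standing in for Parquet rows are assumptions made for this example.

```python
import heapq
from itertools import islice
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List


def sort_rows_globally(
    rows: Iterable[Dict], columns: List[str], chunk_size: int = 100_000
) -> Iterator[Dict]:
    """Hypothetical stand-in for a chunked global sort (not the library's code)."""
    if not columns:
        raise ValueError("columns must contain at least one name")
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")

    # One value for a single column, a tuple for several columns.
    key = itemgetter(*columns)

    it = iter(rows)
    sorted_chunks = []
    while True:
        # Take at most chunk_size rows and sort them on their own.
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        # A real external sort would spill each sorted chunk to disk;
        # this sketch keeps them in memory for simplicity.
        sorted_chunks.append(sorted(chunk, key=key))

    # Merging the sorted chunks yields rows in global order, which
    # sorting each chunk in isolation would not guarantee.
    return heapq.merge(*sorted_chunks, key=key)


rows = [
    {"x": 2, "y": 1},
    {"x": 1, "y": 2},
    {"x": 2, "y": 0},
    {"x": 1, "y": 1},
    {"x": 0, "y": 5},
]
result = list(sort_rows_globally(rows, ["x", "y"], chunk_size=2))
```

Going by the signature documented above, a call to the real function would take the form sort_parquet_file(Path("in.parquet"), Path("out.parquet"), columns=["x", "y"]), where both file names are placeholders. It returns None, so the sorted data is read back from output_path.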