parq_tools.parq_concat.concat_parquet_files

parq_tools.parq_concat.concat_parquet_files(files, output_path, axis=0, index_columns=None, filter_query=None, columns=None, batch_size=100000, show_progress=False)

Concatenate multiple Parquet files into a single file, supporting both row-wise and column-wise concatenation.

Parameters:
  • files (List[Path]) – List of input Parquet file paths.

  • output_path (Path) – Path to save the concatenated Parquet file.

  • axis (int) – Axis along which to concatenate (0 for row-wise, 1 for column-wise).

  • index_columns (Optional[List[str]]) – List of index columns for row-wise sorting after concatenation.

  • filter_query (Optional[str]) – Filter expression to apply to the concatenated data.

  • columns (Optional[List[str]]) – List of columns to include in the output.

  • batch_size (int) – Number of rows to process per batch. Defaults to 100_000.

  • show_progress (bool) – If True, displays a progress bar using tqdm (if installed).
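
To make the two values of axis concrete, here is a minimal pure-Python sketch of what row-wise and column-wise concatenation usually mean. It is an illustration only, not the library's implementation: the Table alias, the helper names concat_rows and concat_columns, and the sample data are all hypothetical, and tables are modelled as plain dictionaries of equal-length lists instead of Parquet files.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a table: column name -> list of values.
Table = Dict[str, List[object]]


def concat_rows(tables: List[Table]) -> Table:
    """Row-wise (axis=0): stack tables that share the same columns."""
    columns = list(tables[0].keys())
    out: Table = {name: [] for name in columns}
    for table in tables:
        for name in columns:
            out[name].extend(table[name])
    return out


def concat_columns(tables: List[Table]) -> Table:
    """Column-wise (axis=1): place columns side by side, matching rows by position."""
    out: Table = {}
    for table in tables:
        out.update(table)
    return out


a = {"id": [1, 2], "x": [10, 20]}
b = {"id": [3, 4], "x": [30, 40]}
c = {"y": [0.1, 0.2]}

print(concat_rows([a, b]))     # {'id': [1, 2, 3, 4], 'x': [10, 20, 30, 40]}
print(concat_columns([a, c]))  # {'id': [1, 2], 'x': [10, 20], 'y': [0.1, 0.2]}
```

How concat_parquet_files itself treats mismatched schemas, differing row counts, or index_columns during column-wise concatenation is not stated in this reference, so the sketch makes no claim about those cases.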

Raises:

ValueError – If the input files list is empty or if any file is not accessible.

Return type:

None
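
Example:

The following is a hedged usage sketch based only on the signature above. The wrapper concat_directory and the directory layout are hypothetical. The wrapper returns early on an empty directory because the function is documented to raise ValueError for an empty file list, and it imports parq_tools lazily so that the module can be loaded even where the package is not installed.

```python
from pathlib import Path
from typing import List


def concat_directory(input_dir: Path, output_path: Path) -> List[Path]:
    """Row-wise concatenate every *.parquet file in input_dir (hypothetical helper)."""
    files = sorted(input_dir.glob("*.parquet"))
    if not files:
        # Nothing to do; avoids the documented ValueError for an empty list.
        return files

    # Imported here so the helper can be defined without parq_tools installed.
    from parq_tools.parq_concat import concat_parquet_files

    concat_parquet_files(
        files=files,
        output_path=output_path,
        axis=0,              # 0 = row-wise, 1 = column-wise
        batch_size=100_000,  # the documented default, written out for clarity
        show_progress=True,  # progress bar via tqdm, if installed
    )
    return files
```

A call such as concat_directory(Path("data/parts"), Path("data/combined.parquet")) would then write a single output file; both paths are placeholders. The filter_query argument is left out because its expression syntax is not described in this reference.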