parq_tools.parq_concat.concat_parquet_files
- parq_tools.parq_concat.concat_parquet_files(files, output_path, axis=0, index_columns=None, filter_query=None, columns=None, batch_size=100000, show_progress=False)
Concatenate multiple Parquet files into a single file, supporting both row-wise and column-wise concatenation.
- Parameters:
files (List[Path]) – List of input Parquet file paths.
output_path (Path) – Path to save the concatenated Parquet file.
axis (int) – Axis along which to concatenate (0 for row-wise, 1 for column-wise).
index_columns (Optional[List[str]]) – List of index columns for row-wise sorting after concatenation.
filter_query (Optional[str]) – Filter expression to apply to the concatenated data.
columns (Optional[List[str]]) – List of columns to include in the output.
batch_size (int) – Number of rows per batch to process. Defaults to 100_000.
show_progress (bool) – If True, displays a progress bar using tqdm (if installed).
- Raises:
ValueError – If the input files list is empty or if any file is not accessible.
- Return type:
None
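Example (a minimal sketch, not taken from the library's own documentation): the call below uses only the parameters listed in the signature above. The helpers `collect_parquet_files` and `concat_directory` are hypothetical names introduced for illustration, and the sketch assumes the input files live in a single directory and share a compatible schema.

```python
from pathlib import Path
from typing import List


def collect_parquet_files(directory: Path) -> List[Path]:
    """Return the *.parquet files in `directory`, sorted by name (hypothetical helper)."""
    return sorted(p for p in directory.glob("*.parquet") if p.is_file())


def concat_directory(directory: Path, output_path: Path) -> None:
    """Row-wise concatenate every Parquet file in `directory` (hypothetical helper)."""
    files = collect_parquet_files(directory)
    if not files:
        # Fail early with a specific message; the library is also documented
        # to raise ValueError when the input list is empty.
        raise ValueError(f"No Parquet files found in {directory}")

    # Imported here so the file-collection helper works without parq_tools installed.
    from parq_tools.parq_concat import concat_parquet_files

    concat_parquet_files(
        files,
        output_path,
        axis=0,              # 0 = row-wise, 1 = column-wise
        columns=None,        # None keeps every column
        batch_size=100_000,  # default from the signature
        show_progress=False,
    )
```

Calling `concat_directory(Path("data/parts"), Path("data/combined.parquet"))` would then write a single combined file; the paths here are placeholders. The function returns None, so success is observed through the output file rather than a return value.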