parq_tools.parq_concat.ParquetConcat

class parq_tools.parq_concat.ParquetConcat(files, axis=0, index_columns=None, show_progress=False)

A utility for concatenating Parquet files while supporting axis-based merging, filtering, and progress tracking.

__init__(files, axis=0, index_columns=None, show_progress=False)

Initializes ParquetConcat with specified parameters.

Parameters:
  • files (List[Path]) – List of Parquet files to concatenate.

  • axis (int, optional) – Concatenation axis (0 = row-wise, 1 = column-wise). Defaults to 0.

  • index_columns (Optional[List[str]], optional) – Index columns for sorting. Defaults to None.

  • show_progress (bool, optional) – If True, enables tqdm progress bar (if installed). Defaults to False.
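The two axis values can be pictured with a small toy model. The sketch below is not the library's implementation: it uses plain Python dictionaries in place of Parquet files, and the helper names `concat_rows` and `concat_columns` are hypothetical. It assumes row-wise inputs share the same columns, column-wise inputs have the same number of rows, and a column that appears in several inputs is kept only once.

```python
from typing import Dict, List

# Toy stand-in for a Parquet table: column name -> list of values.
Table = Dict[str, List]


def concat_rows(tables: List[Table]) -> Table:
    """axis=0 analogue: stack the rows of tables that share the same columns."""
    columns = list(tables[0])
    return {name: [v for t in tables for v in t[name]] for name in columns}


def concat_columns(tables: List[Table]) -> Table:
    """axis=1 analogue: place the columns of equal-length tables side by side."""
    result: Table = {}
    for t in tables:
        for name, values in t.items():
            # Assumption: a repeated column (e.g. a shared index) is kept once.
            result.setdefault(name, values)
    return result


a = {"id": [1, 2], "x": [10, 20]}
b = {"id": [3, 4], "x": [30, 40]}
c = {"id": [1, 2], "y": [0.1, 0.2]}

rows = concat_rows([a, b])     # more rows, same columns
cols = concat_columns([a, c])  # same rows, more columns
```

Here `rows` holds four rows under the columns `id` and `x`, while `cols` holds two rows under `id`, `x`, and `y`.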

Methods

__init__(files[, axis, index_columns, ...]) – Initializes ParquetConcat with specified parameters.

concat_to_file(output_path[, filter_query, ...]) – Concatenates input Parquet files and writes the result to a file.

concat_to_file(output_path, filter_query=None, columns=None, batch_size=100000, show_progress=False)

Concatenates input Parquet files and writes the result to a file.

Parameters:
  • output_path (Path) – Destination path for the output Parquet file.

  • filter_query (Optional[str], optional) – Filter expression to apply. Defaults to None.

  • columns (Optional[List[str]], optional) – List of columns to include in the output. Defaults to None.

  • batch_size (int, optional) – Number of rows per batch to process. Defaults to 100000.

  • show_progress (bool, optional) – If True, displays a progress bar using tqdm (if installed). Defaults to False.

Return type:

None
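A minimal usage sketch, using only the constructor and `concat_to_file` parameters documented above. The wrapper `concat_parquet`, the file names, and the column names are hypothetical. `filter_query` is left at its default because the expression syntax is not described on this page.

```python
from pathlib import Path
from typing import List, Optional


def concat_parquet(
    files: List[Path],
    output_path: Path,
    columns: Optional[List[str]] = None,
    batch_size: int = 100_000,
) -> None:
    """Row-wise concatenation of `files` into `output_path` (hypothetical helper)."""
    # Imported lazily so this module can be loaded without parq_tools installed.
    from parq_tools.parq_concat import ParquetConcat

    concat = ParquetConcat(files=files, axis=0, show_progress=False)
    concat.concat_to_file(
        output_path=output_path,
        columns=columns,        # None is the documented default
        batch_size=batch_size,  # rows processed per batch
        show_progress=False,
    )


# Example call (hypothetical paths and column names):
# concat_parquet(
#     [Path("part_1.parquet"), Path("part_2.parquet")],
#     Path("combined.parquet"),
#     columns=["x", "y"],
# )
```

A smaller `batch_size` is the natural thing to try when the inputs are too large to process comfortably at the default setting.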