parq_tools.parq_concat.ParquetConcat
- class parq_tools.parq_concat.ParquetConcat(files, axis=0, index_columns=None, show_progress=False)
A utility for concatenating Parquet files row-wise or column-wise, with optional filtering and progress tracking.
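As a rough mental model of the two axes, the sketch below uses plain Python dicts of lists as stand-in tables. `concat_tables` is a hypothetical helper that only illustrates the semantics: axis=0 stacks the rows of tables that share the same columns, and axis=1 places columns side by side. The validation rules (identical columns for axis=0, equal row counts and no duplicate names for axis=1) are assumptions of this sketch, not documented behaviour of ParquetConcat, which operates on Parquet files rather than in-memory dicts.

```python
from typing import Dict, List

# Hypothetical stand-in for a table: column name -> list of values.
Table = Dict[str, List[object]]


def concat_tables(tables: List[Table], axis: int = 0) -> Table:
    """Illustrate axis=0 / axis=1 concatenation semantics (not parq_tools' code)."""
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (row-wise) or 1 (column-wise)")
    if not tables:
        return {}
    if axis == 0:
        # Row-wise (assumed rule): all tables share the same column names; values are stacked.
        names = list(tables[0])
        for t in tables[1:]:
            if set(t) != set(names):
                raise ValueError("row-wise concatenation requires identical columns")
        return {name: [v for t in tables for v in t[name]] for name in names}
    # Column-wise (assumed rule): every column has the same length; columns sit side by side.
    lengths = {len(col) for t in tables for col in t.values()}
    if len(lengths) > 1:
        raise ValueError("column-wise concatenation requires equal row counts")
    out: Table = {}
    for t in tables:
        for name, col in t.items():
            if name in out:
                raise ValueError(f"duplicate column name {name!r}")
            out[name] = list(col)
    return out
```

For example, concat_tables([{"id": [1, 2]}, {"id": [3]}]) returns {"id": [1, 2, 3]}, while concat_tables([{"id": [1, 2]}, {"y": [5, 6]}], axis=1) returns {"id": [1, 2], "y": [5, 6]}.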
- __init__(files, axis=0, index_columns=None, show_progress=False)
Initializes ParquetConcat with specified parameters.
- Parameters:
files (List[Path]) – List of Parquet files to concatenate.
axis (int, optional) – Concatenation axis (0 = row-wise, 1 = column-wise). Defaults to 0.
index_columns (Optional[List[str]], optional) – Index columns for sorting. Defaults to None.
show_progress (bool, optional) – If True, enables a tqdm progress bar (if tqdm is installed). Defaults to False.
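The "(if installed)" wording suggests tqdm is an optional dependency. A common way to implement such a switch, shown here as an assumption rather than parq_tools' actual code, is to import tqdm lazily and fall back to the plain iterable when the import fails. `maybe_progress` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def maybe_progress(
    items: Iterable[T], enabled: bool, total: Optional[int] = None, desc: str = ""
) -> Iterator[T]:
    """Wrap items in a tqdm bar when enabled and tqdm is importable; otherwise pass through."""
    if enabled:
        try:
            from tqdm import tqdm  # optional dependency, imported only when requested
        except ImportError:
            pass  # tqdm not installed: silently continue without a progress bar
        else:
            return iter(tqdm(items, total=total, desc=desc))
    return iter(items)
```

Either way the caller iterates over the same items, so the progress flag never changes the result, only what is printed to stderr.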
Methods
__init__(files[, axis, index_columns, ...]) – Initializes ParquetConcat with specified parameters.
concat_to_file(output_path[, filter_query, ...]) – Concatenates input Parquet files and writes the result to a file.
- concat_to_file(output_path, filter_query=None, columns=None, batch_size=100000, show_progress=False)
Concatenates input Parquet files and writes the result to a file.
- Parameters:
output_path (Path) – Destination path for the output Parquet file.
filter_query (Optional[str], optional) – Filter expression to apply. Defaults to None.
columns (Optional[List[str]], optional) – List of columns to include in the output. Defaults to None.
batch_size (int, optional) – Number of rows per batch to process. Defaults to 100000.
show_progress (bool, optional) – If True, displays a progress bar using tqdm (if installed). Defaults to False.
- Return type:
None
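To illustrate what batch_size, filter_query and columns control, here is a minimal stdlib sketch of batched streaming. It assumes rows arrive as dicts and substitutes a Python predicate for the string filter_query, whose syntax this page does not specify. `stream_batches` is a hypothetical helper, not part of parq_tools. Only one batch is held in memory at a time, which is the usual motivation for a batch size.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]


def stream_batches(
    rows: Iterable[Row],
    batch_size: int = 100_000,
    predicate: Optional[Callable[[Row], bool]] = None,  # stand-in for filter_query
    columns: Optional[List[str]] = None,
) -> Iterator[List[Row]]:
    """Yield filtered, column-projected rows in lists of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Row] = []
    for row in rows:
        if predicate is not None and not predicate(row):
            continue  # row excluded by the filter
        if columns is not None:
            row = {name: row[name] for name in columns}  # keep only requested columns
        batch.append(row)
        if len(batch) == batch_size:
            yield batch  # hand off a full batch (e.g. to a Parquet writer)
            batch = []
    if batch:
        yield batch  # flush the final, possibly shorter, batch
```

Filtering before projecting lets the predicate reference columns that are then dropped from the output; whether ParquetConcat orders these steps the same way is not stated on this page.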