Welcome to parq-tools’ documentation!
parq-tools
Overview
parq-tools is a collection of utilities for efficiently working with large-scale Parquet datasets.
A typical use case is asset-based workflows with large scientific datasets.
If your datasets are not large, you might find the pandas library more convenient.
Features
Filtering → Filters large Parquet files efficiently.
Concatenation → Combines multiple Parquet files efficiently along rows (axis=0) or columns (axis=1).
Tokenized Filtering → Converts pandas-style expressions into efficient PyArrow queries.
Profiling Enhancements → Improves ydata-profiling by profiling specific columns incrementally, merging results for large files.
DataFrame Enhancements → Provides a LazyParquetDataFrame class that extends pandas.DataFrame with lazy loading from Parquet files.