parq_tools.parq_profile.ParquetProfileReport
- class parq_tools.parq_profile.ParquetProfileReport(parquet_path, columns=None, batch_size=1, show_progress=True, title='Parquet Profile Report', dataset_metadata=None, column_descriptions=None)
Builds ydata-profiling reports for large Parquet files without loading them entirely into memory.
The class supports both native profiling (without chunking) and columnar profiling (with chunking).
- __init__(parquet_path, columns=None, batch_size=1, show_progress=True, title='Parquet Profile Report', dataset_metadata=None, column_descriptions=None)
Initialize the ParquetProfileReport.
- Parameters:
parquet_path (Union[str, Path]) – Path to the Parquet file to profile.
columns (Optional[List[str]]) – List of column names to include in the profile. If None, all columns are used.
batch_size (Optional[int]) – Number of columns to process in each batch. If None, all columns are processed at once.
show_progress (bool) – If True, displays a progress bar during profiling.
title (str) – Title of the report.
dataset_metadata (Union[dict, ProfileMetadata, None]) – Metadata for the dataset. Overrides any metadata in the Parquet file.
column_descriptions (Optional[dict[str, str]]) – Column descriptions for the dataset. Overrides any descriptions in the Parquet file.
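The batch_size semantics described above (N columns per batch, or all columns at once when None) can be illustrated with a small sketch. This is not the library's implementation: column_batches is a hypothetical helper written only to show how a list of column names might be split under those rules.

```python
from typing import List, Optional, Sequence


def column_batches(columns: Sequence[str], batch_size: Optional[int] = 1) -> List[List[str]]:
    """Split column names into consecutive batches (hypothetical helper)."""
    cols = list(columns)
    if batch_size is None:
        # None means "all columns at once": a single batch (or none if empty).
        return [cols] if cols else []
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    # Slice the column list into chunks of at most batch_size names.
    return [cols[i:i + batch_size] for i in range(0, len(cols), batch_size)]


print(column_batches(["a", "b", "c"], batch_size=2))     # [['a', 'b'], ['c']]
print(column_batches(["a", "b", "c"], batch_size=None))  # [['a', 'b', 'c']]
```

With the documented default of batch_size=1, each column would form its own batch, which presumably keeps peak memory lowest at the cost of more passes over the file.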
Methods
__init__(parquet_path[, columns, ...]) – Initialize the ParquetProfileReport.
profile() – Profiles the Parquet file.
save_html(output_html) – Save the profile report to an HTML file.
show([notebook]) – Display the profile report in a notebook or open it in a browser.
to_html() – The HTML representation of the profile report.
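A typical workflow might look like the sketch below, which uses only the constructor arguments and methods listed on this page. It rests on several assumptions: that profile() must be called before save_html(), that save_html() accepts a Path as well as a string, and that parq_tools and its profiling dependencies are installed. build_profile_html and the file paths are hypothetical names used for illustration.

```python
from pathlib import Path


def build_profile_html(parquet_path: Path, output_html: Path) -> None:
    """Profile a Parquet file and write the report to HTML (illustrative)."""
    # Imported lazily so this snippet loads even when parq_tools is not installed.
    from parq_tools.parq_profile import ParquetProfileReport

    report = ParquetProfileReport(
        parquet_path,
        columns=None,        # profile every column
        batch_size=1,        # one column per batch (the documented default)
        show_progress=True,  # show a progress bar while profiling
        title="Parquet Profile Report",
    )
    report.profile()               # assumption: run profiling before saving
    report.save_html(output_html)  # assumption: a Path is accepted here


# Example call (placeholder paths):
# build_profile_html(Path("data.parquet"), Path("profile.html"))
```

In a notebook, show() could presumably replace the save_html() call to display the report inline.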