parq_tools.parq_profile.ParquetProfileReport

class parq_tools.parq_profile.ParquetProfileReport(parquet_path, columns=None, batch_size=1, show_progress=True, title='Parquet Profile Report', dataset_metadata=None, column_descriptions=None)[source]

Generates ydata-profiling reports for large Parquet files.

Useful for profiling large Parquet files without loading them entirely into memory. This class supports both native profiling (all columns at once, without chunking) and columnar profiling (columns processed in batches, with chunking).

__init__(parquet_path, columns=None, batch_size=1, show_progress=True, title='Parquet Profile Report', dataset_metadata=None, column_descriptions=None)[source]

Initialize the ParquetProfileReport.

Parameters:
  • parquet_path (Union[str, Path]) – Path to the Parquet file to profile.

  • columns (Optional[List[str]]) – List of column names to include in the profile. If None, all columns are used.

  • batch_size (Optional[int]) – Number of columns to process in each batch. If None, all columns are processed at once.

  • show_progress (bool) – If True, display a progress bar during profiling.

  • title (str) – Title of the report.

  • dataset_metadata (Union[dict, ProfileMetadata, None]) – Metadata for the dataset. Overrides any metadata in the Parquet file.

  • column_descriptions (Optional[dict[str, str]]) – Column descriptions for the dataset. Overrides any descriptions in the Parquet file.
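
To illustrate what "number of columns to process in each batch" means, here is a minimal sketch of splitting a column list into batches. The helper column_batches is hypothetical and is not part of parq_tools; the library's internal batching may differ.

```python
from typing import List, Optional, Sequence


def column_batches(columns: Sequence[str], batch_size: Optional[int]) -> List[List[str]]:
    """Split column names into consecutive batches.

    Hypothetical helper for illustration only; not a parq_tools function.
    """
    cols = list(columns)
    if batch_size is None:
        # None means "no chunking": every column goes into a single batch.
        return [cols] if cols else []
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    # Slice the list in steps of batch_size; the last batch may be shorter.
    return [cols[i:i + batch_size] for i in range(0, len(cols), batch_size)]


print(column_batches(["a", "b", "c", "d", "e"], 2))  # [['a', 'b'], ['c', 'd'], ['e']]
print(column_batches(["a", "b", "c"], None))         # [['a', 'b', 'c']]
```

Under this reading, the default batch_size=1 would handle one column per batch, presumably trading speed for the lowest peak memory use, while batch_size=None corresponds to the native, unchunked mode.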

Methods

__init__(parquet_path[, columns, ...])

Initialize the ParquetProfileReport.

profile()

Profile the Parquet file.

save_html(output_html)

Save the profile report to an HTML file.

show([notebook])

Display the profile report in a notebook or open in a browser.

to_html()

Return the HTML representation of the profile report.

profile()[source]

Profile the Parquet file.

Return type:

ParquetProfileReport

save_html(output_html)[source]

Save the profile report to an HTML file.

Return type:

None

show(notebook=False)[source]

Display the profile report in a notebook or open in a browser.

Parameters:

notebook (bool) – If True, display the report in a Jupyter notebook. If False, open it in a browser.

to_html()[source]

Return the HTML representation of the profile report.

Return type:

str
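
The constructor and methods documented above can be combined into a short workflow. The sketch below is a usage outline, not a tested recipe: the wrapper write_profile and the file names in the note that follows are hypothetical, and only the ParquetProfileReport calls and keyword arguments come from the signatures on this page.

```python
from pathlib import Path


def write_profile(parquet_path, output_html, columns=None):
    """Profile a Parquet file and save the report as HTML.

    Hypothetical convenience wrapper; not part of parq_tools.
    """
    # Imported lazily so this function can be defined without parq_tools installed.
    from parq_tools.parq_profile import ParquetProfileReport

    report = ParquetProfileReport(
        parquet_path,
        columns=columns,      # None profiles all columns
        batch_size=1,         # one column per batch (the documented default)
        show_progress=False,  # suppress the progress bar, e.g. in scripts
        title="Parquet Profile Report",
    )
    report.profile()               # run the profiling step
    report.save_html(output_html)  # write the report to an HTML file
    return Path(output_html)
```

Calling write_profile("data.parquet", "profile.html") would then be expected to produce profile.html next to the script. For interactive use, report.show(notebook=True) displays the report in a Jupyter notebook instead of saving it, and report.to_html() returns the HTML as a string.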