parq_tools.utils.profile_utils.ColumnarProfileReport

class parq_tools.utils.profile_utils.ColumnarProfileReport(column_generator, column_count=None, batch_size=1, show_progress=True, title='Profile Report', dataset_metadata=None, column_descriptions=None)

Memory-efficient, column-wise profiler for large datasets using ydata-profiling.

This class can be leveraged by any file reader that can yield pandas Series.

__init__(column_generator, column_count=None, batch_size=1, show_progress=True, title='Profile Report', dataset_metadata=None, column_descriptions=None)

Initialize the ColumnarProfileReport. This profiler processes columns in batches, allowing for profiling large datasets without loading them entirely into memory.
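The batching idea can be illustrated with a minimal, standard-library sketch. This is not the library's implementation: the helper name batched and the stand-in column names are hypothetical, and the sketch only shows how a stream of columns could be consumed batch_size items at a time so that only one batch is held in memory.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items from an iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull only the next batch from the stream; earlier batches can be freed.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in column names (hypothetical); a real generator would yield pandas Series.
column_names = ["a", "b", "c", "d", "e"]
batches = list(batched(column_names, batch_size=2))
```

With five columns and a batch size of 2, the last batch holds the single remaining column.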

Parameters:
  • column_generator (Iterator[Series]) – A generator or iterable that yields pandas Series.

  • column_count (Optional[int]) – The total number of columns, used by the progress bar.

  • batch_size (int) – The number of columns to process in each batch.

  • show_progress (bool) – If True, displays a progress bar during profiling.

  • title (Optional[str]) – The title of the report.

  • dataset_metadata (Optional[ProfileMetadata]) – Optional dataset metadata to include in the report.

  • column_descriptions (Optional[dict[str, str]]) – Optional descriptions for each column, used in the report.
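As a rough sketch of how these parameters fit together, the example below builds a column generator from a small in-memory DataFrame and passes it to the constructor. The helper names iter_columns and build_report, the sample data, and the description strings are hypothetical; only the constructor arguments come from the signature documented above. A real memory-efficient reader would load one column at a time from disk rather than start from a full DataFrame.

```python
from typing import Iterator

import pandas as pd


def iter_columns(df: pd.DataFrame) -> Iterator[pd.Series]:
    """Yield each column of a DataFrame as a pandas Series, one at a time."""
    for name in df.columns:
        yield df[name]


def build_report(df: pd.DataFrame):
    """Construct a ColumnarProfileReport from a DataFrame (illustrative only)."""
    # Imported lazily so iter_columns can be reused without parq_tools installed.
    from parq_tools.utils.profile_utils import ColumnarProfileReport

    return ColumnarProfileReport(
        column_generator=iter_columns(df),
        column_count=len(df.columns),  # total passed for the progress bar
        batch_size=2,                  # number of columns per batch
        show_progress=True,
        title="Example Profile Report",
        # Hypothetical descriptions keyed by column name.
        column_descriptions={"x": "Example coordinate", "grade": "Example grade"},
    )


# Hypothetical sample data used to exercise the generator.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "grade": [0.5, 0.7, 0.2]})
columns = list(iter_columns(df))
```

The object returned by build_report(df) exposes the profile(), save_html(output_html), show() and to_html() methods listed under Methods; the required call order is not specified in this page, so check the source before relying on one.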

Methods

  • __init__(column_generator[, column_count, ...]) – Initialize the ColumnarProfileReport.

  • profile()

  • save_html(output_html)

  • show([notebook]) – Display the profile report in a notebook or open in a browser.

  • to_html()

show(notebook=False)

Display the profile report in a notebook or open in a browser.

Parameters:

notebook (bool) – If True, display the report in a Jupyter notebook. If False, open it in a browser.