parq_tools.utils.profile_utils.ColumnarProfileReport

class parq_tools.utils.profile_utils.ColumnarProfileReport(column_generator, column_count=None, batch_size=1, show_progress=True, title='Profile Report', dataset_metadata=None, column_descriptions=None)

Memory-efficient, column-wise profiler for large datasets using ydata-profiling.

This class can be used with any file reader that yields pandas Series, one per column.
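As a minimal sketch of such a reader, assuming the data lives in a CSV file, the hypothetical helper `iter_csv_columns` below parses only the header to get the column names and then loads one column at a time with `pandas.read_csv(..., usecols=[name])`. It is not part of parq_tools; it only illustrates the shape of input the `column_generator` parameter expects.

```python
from pathlib import Path
from typing import Iterator

import pandas as pd


def iter_csv_columns(path: Path) -> Iterator[pd.Series]:
    """Yield each column of a CSV file as a pandas Series (hypothetical helper)."""
    # nrows=0 parses only the header row, which gives the column names cheaply.
    names = pd.read_csv(path, nrows=0).columns
    for name in names:
        # usecols limits parsing to one column; selecting it returns a Series.
        yield pd.read_csv(path, usecols=[name])[name]
```

This trades repeated passes over the file for a peak memory footprint of roughly one column. A columnar format such as Parquet can read individual columns without rescanning the whole file.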

__init__(column_generator, column_count=None, batch_size=1, show_progress=True, title='Profile Report', dataset_metadata=None, column_descriptions=None)

Initialize the ColumnarProfileReport. The profiler processes columns in batches, so large datasets can be profiled without loading them entirely into memory.

Parameters:

column_generator (Iterator[Series]) – A generator or iterable that yields pandas Series.
column_count (Optional[int]) – The total number of columns, used by the progress bar to display overall progress.
batch_size (int) – The number of columns to process in each batch.
show_progress (bool) – If True, displays a progress bar during profiling.
title (Optional[str]) – The title of the report.
dataset_metadata (Optional[ProfileMetadata]) – Optional dataset metadata to include in the report.
column_descriptions (Optional[dict[str, str]]) – Optional descriptions for each column, used in the report.
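The `batch_size` parameter sets how many columns are grouped into each processing step. The library's internal batching code is not shown on this page; as an illustrative sketch only, a stream of Series can be chunked with `itertools.islice` and each chunk assembled into a small DataFrame with `pd.concat(axis=1)`. The helper name `iter_column_batches` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator

import pandas as pd


def iter_column_batches(
    columns: Iterable[pd.Series], batch_size: int = 1
) -> Iterator[pd.DataFrame]:
    """Group a stream of Series into DataFrames of at most batch_size columns."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(columns)
    while True:
        # islice pulls at most batch_size Series without consuming the rest.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        # axis=1 places the Series side by side as columns, aligned on the index.
        yield pd.concat(batch, axis=1)
```

Larger batches mean fewer steps but a higher peak memory footprint, since every column in a batch is held in memory at once.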

Methods

`__init__`(column_generator[, column_count, ...])	Initialize the ColumnarProfileReport.
`profile`()
`save_html`(output_html)
`show`([notebook])	Display the profile report in a notebook or open in a browser.
`to_html`()
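A possible end-to-end call sequence, sketched under assumptions: the constructor keywords come from the signature above, but the order `profile()` then `save_html()` is assumed from the method names and not confirmed on this page. The helpers `iter_dataframe_columns` and `build_report` are hypothetical, and the import is deferred so the module loads even when parq_tools is not installed.

```python
from typing import Iterator, Optional

import pandas as pd


def iter_dataframe_columns(df: pd.DataFrame) -> Iterator[pd.Series]:
    """Yield each column of a DataFrame as a Series (hypothetical helper)."""
    for name in df.columns:
        yield df[name]


def build_report(
    df: pd.DataFrame,
    output_html: str = "profile.html",
    descriptions: Optional[dict] = None,
) -> None:
    # Deferred import: only needed when a report is actually built.
    from parq_tools.utils.profile_utils import ColumnarProfileReport

    report = ColumnarProfileReport(
        column_generator=iter_dataframe_columns(df),
        column_count=len(df.columns),  # gives the progress bar a known total
        batch_size=2,
        show_progress=True,
        title="Example Profile",
        column_descriptions=descriptions,
    )
    report.profile()  # assumption: runs the column-wise profiling pass
    report.save_html(output_html)  # assumption: writes the report to this path
```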

show(notebook=False)

Display the profile report in a notebook or open in a browser.

Parameters:

notebook (bool) – If True, display in a Jupyter notebook. If False, open in a browser.
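To pick the `notebook` argument automatically, one common heuristic (an assumption, not something `show` does itself) is to check whether the running IPython shell is a notebook kernel. The helper `running_in_notebook` is hypothetical; its result could then be passed as `report.show(notebook=running_in_notebook())`.

```python
def running_in_notebook() -> bool:
    """Best-effort heuristic: True when running inside a Jupyter (IPython kernel) session."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed, so this cannot be a notebook
    shell = get_ipython()
    # get_ipython() returns None outside IPython; notebook kernels register IPKernelApp in the config.
    return shell is not None and "IPKernelApp" in shell.config
```

A terminal IPython session returns False here, so `show` would fall back to opening the report in a browser.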