parq_tools.utils.profile_utils.ColumnarProfileReport
- class parq_tools.utils.profile_utils.ColumnarProfileReport(column_generator, column_count=None, batch_size=1, show_progress=True, title='Profile Report', dataset_metadata=None, column_descriptions=None)
Memory-efficient, column-wise profiler for large datasets using ydata-profiling.
This class can be leveraged by any file reader that can yield pandas Series.
- __init__(column_generator, column_count=None, batch_size=1, show_progress=True, title='Profile Report', dataset_metadata=None, column_descriptions=None)
Initialize the ColumnarProfileReport. This profiler processes columns in batches, allowing for profiling large datasets without loading them entirely into memory.
- Parameters:
column_generator (Iterator[Series]) – A generator or iterable that yields pandas Series.
column_count (Optional[int]) – The total number of columns, used by the progress bar.
batch_size (int) – The number of columns to process in each batch.
show_progress (bool) – If True, displays a progress bar during profiling.
title (Optional[str]) – The title of the report.
dataset_metadata (Optional[ProfileMetadata]) – Optional dataset metadata to include in the report.
column_descriptions (Optional[dict[str, str]]) – Optional descriptions for each column, used in the report.
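The parameters above can be illustrated with a minimal sketch. It assumes pandas is installed; iter_columns and build_report are hypothetical helper names, and the sample data is made up. An in-memory DataFrame is used only to keep the example short. In practice the generator would more likely read one column at a time from a file, so the whole dataset never sits in memory. The keyword arguments are taken from the signature documented above, and the parq_tools import is deferred so the generator can be tried on its own.

```python
from typing import Iterator

import pandas as pd


def iter_columns(df: pd.DataFrame) -> Iterator[pd.Series]:
    """Yield each column of df as a named pandas Series, one at a time."""
    for name in df.columns:
        yield df[name]


def build_report(df: pd.DataFrame):
    """Construct a ColumnarProfileReport over the columns of df.

    Hypothetical helper: the import is deferred so that iter_columns
    stays usable even where parq_tools is not installed.
    """
    from parq_tools.utils.profile_utils import ColumnarProfileReport

    return ColumnarProfileReport(
        column_generator=iter_columns(df),
        column_count=len(df.columns),  # lets the progress bar know the total
        batch_size=2,                  # profile two columns per batch
        show_progress=False,
        title="Example Profile",
        column_descriptions={"x": "Sample x values", "y": "Sample labels"},
    )


# Made-up sample data, for illustration only.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": ["a", "b", "c"]})
columns = list(iter_columns(df))
```

A generator is single-use: once it has been consumed (as list(iter_columns(df)) does here), a fresh one must be created before passing it to the profiler.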
Methods
__init__(column_generator[, column_count, ...]) – Initialize the ColumnarProfileReport.
profile()
save_html(output_html)
show([notebook]) – Display the profile report in a notebook or open in a browser.
to_html()
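The methods listed above suggest a two-step workflow, which can be sketched as follows. This is an assumption, not documented behavior: the page does not state that profile() must be called before save_html(), nor what either method returns. write_report is a hypothetical helper name, and it only relies on the method names and the output_html argument shown in the listing.

```python
from pathlib import Path


def write_report(report, output_html: str) -> Path:
    """Run the profiler, then write the HTML report to output_html.

    `report` is expected to be a ColumnarProfileReport, or any object
    exposing the same profile() and save_html(output_html) methods.
    """
    # Assumption: profiling has to run before the report can be saved.
    report.profile()
    report.save_html(output_html)
    return Path(output_html)
```

With a constructed report, usage would look like write_report(report, "profile.html"). show([notebook]) could be called instead of save_html to view the result interactively.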