OMF Profile Block Model

Profiling a dataset is a common task in data analysis. This example demonstrates how to profile an OMF block model and persist the profile report inside the OMF file.

import shutil
import tempfile
from pathlib import Path

import pandas as pd

from omfpandas import OMFPandasReader, OMFPandasWriter

Instantiate

Copy the OMF file to a temporary location, then create an OMFPandasReader with the path to the copy and read the block model.

test_omf_path: Path = Path('../assets/test_file.omf')

# create a temporary copy to preserve the original file
temp_omf_path: Path = Path(tempfile.gettempdir()) / 'test_file_copy.omf'
shutil.copy(test_omf_path, temp_omf_path)

# Display the head of the original block model
blocks: pd.DataFrame = OMFPandasReader(filepath=temp_omf_path).read_blockmodel(blockmodel_name='regular')
blocks.head()

x     y     z     random attr
10.5  10.5  -9.5  0.727986
            -8.5  0.277389
            -7.5  0.351741
            -6.5  0.999272
            -5.5  0.495092
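
The copy above is written to the system temp directory under a fixed name, so it remains on disk after the script finishes. A minimal sketch of an alternative, assuming automatic cleanup is preferred: a context manager built on `tempfile.TemporaryDirectory`. The helper name `working_copy` and the placeholder file are hypothetical and only stand in for the real OMF asset.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def working_copy(source: Path) -> Iterator[Path]:
    """Yield a throwaway copy of `source`; the copy is deleted on exit."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        copy_path = Path(tmp_dir) / source.name
        shutil.copy(source, copy_path)
        yield copy_path


# Placeholder file so the sketch runs without the test asset (not a real OMF file).
with tempfile.TemporaryDirectory() as src_dir:
    original = Path(src_dir) / "test_file.omf"
    original.write_bytes(b"placeholder bytes, not a real OMF file")

    with working_copy(original) as temp_omf_path:
        copy_existed = temp_omf_path.exists()
        same_content = temp_omf_path.read_bytes() == original.read_bytes()
        temp_omf_path.write_bytes(b"modified")  # only the copy changes
        original_untouched = original.read_bytes().startswith(b"placeholder")

    # The temporary directory, and the copy inside it, are gone at this point.
    copy_removed = not temp_omf_path.exists()
```

Inside the `with` block, the yielded path could be passed to the reader or writer in place of `temp_omf_path` from the example.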

Profile

Create the writer, then write the pandera schema and the profile report into the file. The pandera schema is optional, but it lets you describe the attributes in the dataset.

omfpw: OMFPandasWriter = OMFPandasWriter(filepath=temp_omf_path)
omfpw.write_block_model_schema(blockmodel_name='regular', pd_schema_filepath=test_omf_path.with_suffix('.schema.yaml'))
omfpw.profile_blockmodel(blockmodel_name='regular')

Summarize dataset: 100%|██████████| 11/11 [00:00<00:00, 36.59it/s, Completed]

Generate report structure: 100%|██████████| 1/1 [00:00<00:00,  3.26it/s]

Render HTML: 100%|██████████| 1/1 [00:00<00:00,  6.12it/s]

View the profile report, which benefits from the attribute descriptions from the schema.

omfpw.view_block_model_profile(blockmodel_name='regular')
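
For a rough intuition of what a profile summarizes per attribute, plain pandas can produce a much smaller set of statistics. This is a sketch using `describe`, not the report written by `profile_blockmodel`; the DataFrame is a hypothetical stand-in built from the five rows shown in the head of the block model.

```python
import pandas as pd

# Hypothetical stand-in for the 'regular' block model: block centroids as a
# MultiIndex and a single float attribute (values taken from the head above).
index = pd.MultiIndex.from_product(
    [[10.5], [10.5], [-9.5, -8.5, -7.5, -6.5, -5.5]], names=["x", "y", "z"]
)
blocks = pd.DataFrame(
    {"random attr": [0.727986, 0.277389, 0.351741, 0.999272, 0.495092]},
    index=index,
)

# Count, mean, standard deviation, min, quartiles and max for the attribute.
summary = blocks["random attr"].describe()

# Number of missing values, which a profile would typically also report.
missing = int(blocks["random attr"].isna().sum())

print(summary)
print(f"missing values: {missing}")
```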

Profile a subset with a query filter string

omfpw.profile_blockmodel(blockmodel_name='regular', query='`random attr`>0.5')

Summarize dataset: 100%|██████████| 11/11 [00:00<00:00, 50.93it/s, Completed]

Generate report structure: 100%|██████████| 1/1 [00:00<00:00,  3.44it/s]

Render HTML: 100%|██████████| 1/1 [00:00<00:00, 16.66it/s]

View the profile report of the subset. The dataset tab in the profile report describes the filter applied to the dataset.

omfpw.view_block_model_profile(blockmodel_name='regular', query='`random attr`>0.5')
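
The filter string uses backticks around `random attr`, which is how pandas `DataFrame.query` refers to column names containing spaces. The sketch below assumes the `query` argument follows that pandas syntax (the source does not state how it is evaluated) and uses a hypothetical stand-in DataFrame built from the head of the block model.

```python
import pandas as pd

# Hypothetical stand-in for the 'regular' block model (values from the head above).
index = pd.MultiIndex.from_product(
    [[10.5], [10.5], [-9.5, -8.5, -7.5, -6.5, -5.5]], names=["x", "y", "z"]
)
blocks = pd.DataFrame(
    {"random attr": [0.727986, 0.277389, 0.351741, 0.999272, 0.495092]},
    index=index,
)

# Backticks quote a column name that is not a valid Python identifier.
subset = blocks.query("`random attr` > 0.5")

# Equivalent boolean-mask form, shown for comparison.
mask_subset = blocks[blocks["random attr"] > 0.5]

print(subset)
```

Only the blocks whose attribute exceeds 0.5 remain, and the x, y, z index is preserved for the rows that pass the filter.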

Total running time of the script: (0 minutes 7.516 seconds)
