OMF Profile Block Model

Profiling a dataset is a common task in data analysis. This example demonstrates how to profile a block model stored in an OMF file; the resulting profile report is persisted inside the OMF file itself.

import logging
import shutil
import tempfile
from pathlib import Path

import pandas as pd

from omfpandas import OMFPandasReader, OMFPandasWriter

Instantiate

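Configure logging, copy the test OMF file to the system temporary directory so that the original is not modified, and read the 'vol' block model into a DataFrame with OMFPandasReader.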

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(module)s - %(funcName)s: %(message)s',
                    datefmt='%Y-%m-%dT%H:%M:%S%z')
test_omf_path: Path = Path('./../assets/v2/test_file.omf')

# create a temporary copy to preserve the original file
temp_omf_path: Path = Path(tempfile.gettempdir()) / 'test_file_copy.omf'
shutil.copy(test_omf_path, temp_omf_path)

# Display the head of the original block model
blocks: pd.DataFrame = OMFPandasReader(filepath=temp_omf_path).read_blockmodel(blockmodel_name='vol')
blocks.head()
                            random attr
x    y    z    dx  dy  dz
10.5 10.5 -9.5 1.0 1.0 1.0     0.727986
11.5 10.5 -9.5 1.0 1.0 1.0     0.277389
12.5 10.5 -9.5 1.0 1.0 1.0     0.351741
13.5 10.5 -9.5 1.0 1.0 1.0     0.999272
14.5 10.5 -9.5 1.0 1.0 1.0     0.495092
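As the head() output above shows, the block model comes back as a DataFrame indexed by block centroid (x, y, z) and block size (dx, dy, dz), with one column per attribute. To experiment with the profiling and query steps without the test file, the minimal sketch below builds a DataFrame with the same layout. The grid extent, the name synthetic_blocks and the random values are assumptions for illustration; they are not the contents of test_file.omf.

```python
import numpy as np
import pandas as pd

# Hypothetical 3 x 2 x 2 grid of 1.0-sized blocks, starting at the first centroid seen in head()
rng = np.random.default_rng(seed=42)
xs = 10.5 + np.arange(3)  # x centroids
ys = 10.5 + np.arange(2)  # y centroids
zs = -9.5 + np.arange(2)  # z centroids

# Every (z, y, x) combination; with indexing='ij' the last axis (x) varies fastest, as in head()
zz, yy, xx = np.meshgrid(zs, ys, xs, indexing='ij')
n_blocks = xx.size

index = pd.MultiIndex.from_arrays(
    [xx.ravel(), yy.ravel(), zz.ravel(),
     np.full(n_blocks, 1.0), np.full(n_blocks, 1.0), np.full(n_blocks, 1.0)],
    names=['x', 'y', 'z', 'dx', 'dy', 'dz'],
)

# One attribute column of uniform random values in [0, 1)
synthetic_blocks = pd.DataFrame({'random attr': rng.random(n_blocks)}, index=index)
print(synthetic_blocks.head())
```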


Profile

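Create an OMFPandasWriter on the temporary copy, write the schema defined in test_file.schema.yaml (stored next to the original test file) to the 'vol' block model, and then profile that block model.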

omfpw: OMFPandasWriter = OMFPandasWriter(filepath=temp_omf_path)

omfpw.write_block_model_schema(blockmodel_name='vol', pd_schema_filepath=test_omf_path.with_suffix('.schema.yaml'))
omfpw.profile_blockmodel(blockmodel_name='vol')
Summarize dataset: 100%|██████████| 11/11 [00:00<00:00, 25.84it/s, Completed]
Generate report structure: 100%|██████████| 1/1 [00:00<00:00,  2.55it/s]
Render HTML: 100%|██████████| 1/1 [00:00<00:00,  5.48it/s]
Display the profile report stored in the OMF file.

omfpw.view_block_model_profile(blockmodel_name='vol')
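The progress stages printed by profile_blockmodel (Summarize dataset, Generate report structure, Render HTML) match those of the ydata-profiling library, which suggests omfpandas delegates report generation to it; that is an inference from the log, not something this example states. For a quick look at a few comparable statistics without rendering an HTML report, the hypothetical helper quick_profile below uses plain pandas only. It is not part of the omfpandas API, and the sample values are made up.

```python
import numpy as np
import pandas as pd


def quick_profile(df: pd.DataFrame) -> dict:
    """Hypothetical helper: a few summary statistics similar to those in a profile report."""
    return {
        'n_rows': len(df),
        # count (non-missing), mean, std, min, quartiles and max for each numeric column
        'describe': df.describe(),
        # number of missing values per column
        'missing': df.isna().sum(),
        # rows whose column values repeat an earlier row; the index is not considered
        'n_duplicate_rows': int(df.duplicated().sum()),
    }


# Small made-up stand-in for a block model attribute table
df = pd.DataFrame({'random attr': [0.73, 0.28, np.nan, 0.28]})
summary = quick_profile(df)
print(summary['describe'])
print(summary['missing'])
print(summary['n_duplicate_rows'])
```

Because duplicated() ignores the index, applied to a block model it flags repeated attribute values rather than repeated block locations.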

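Profile a subset with a query filter string

Pass a query string to profile only the blocks that satisfy it. The backtick quoting around random attr indicates pandas DataFrame.query syntax, in which backticks are required for a column name that contains a space.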
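Assuming the query string is applied with pandas DataFrame.query semantics (the backticks suggest so, but this example does not state it), a quick pandas check can preview which blocks a filter keeps before profiling. The sketch uses the five values from the head() output above; the index is simplified to x only, and the name sample_blocks is illustrative.

```python
import pandas as pd

# The five rows shown by head(), with the index reduced to the x centroid for brevity
sample_blocks = pd.DataFrame(
    {'random attr': [0.727986, 0.277389, 0.351741, 0.999272, 0.495092]},
    index=pd.Index([10.5, 11.5, 12.5, 13.5, 14.5], name='x'),
)

# Backticks let DataFrame.query refer to a column whose name contains a space
subset = sample_blocks.query('`random attr`>0.5')
print(subset)
```

Only the blocks at x = 10.5 and x = 13.5 pass this filter.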

omfpw.profile_blockmodel(blockmodel_name='vol', query='`random attr`>0.5')
Summarize dataset: 100%|██████████| 11/11 [00:00<00:00, 51.96it/s, Completed]
Generate report structure: 100%|██████████| 1/1 [00:00<00:00,  3.62it/s]
Render HTML: 100%|██████████| 1/1 [00:00<00:00, 16.33it/s]
Display the profile report generated for the filtered subset.

omfpw.view_block_model_profile(blockmodel_name='vol', query='`random attr`>0.5')

Total running time of the script: (0 minutes 3.382 seconds)

Gallery generated by Sphinx-Gallery