OMF Profile Block Model
Profiling a dataset is a common task in data analysis. This example demonstrates how to profile an OMF block model. The profile report is persisted inside the OMF file.
import logging
import shutil
import tempfile
from pathlib import Path
import pandas as pd
from omfpandas import OMFPandasReader, OMFPandasWriter
Instantiate
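Configure logging, make a temporary copy of the test OMF file so that the original is not modified, and read the 'vol' block model from the copy with OMFPandasReader.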
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(module)s - %(funcName)s: %(message)s',
                    datefmt='%Y-%m-%dT%H:%M:%S%z')
test_omf_path: Path = Path('./../assets/v2/test_file.omf')
# create a temporary copy to preserve the original file
temp_omf_path: Path = Path(tempfile.gettempdir()) / 'test_file_copy.omf'
shutil.copy(test_omf_path, temp_omf_path)
# Display the head of the original block model
blocks: pd.DataFrame = OMFPandasReader(filepath=temp_omf_path).read_blockmodel(blockmodel_name='vol')
blocks.head()
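The example copies the test file to a fixed name in the system temporary directory, and that copy is left behind when the script ends. A minimal sketch of the same copy-then-modify pattern, assuming you would rather have the copy removed automatically, uses the standard library's tempfile.TemporaryDirectory. The helper work_on_copy and the placeholder file contents are hypothetical stand-ins for the real OMF file.

```python
import shutil
import tempfile
from pathlib import Path


def work_on_copy(original: Path, workdir: Path) -> Path:
    """Hypothetical helper: copy `original` into `workdir` and return the copy's path."""
    copy_path = workdir / f"{original.stem}_copy{original.suffix}"
    shutil.copy(original, copy_path)  # copies file data and permission bits
    return copy_path


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # Placeholder for the real OMF file; any file demonstrates the pattern.
    original = tmp_dir / "test_file.omf"
    original.write_bytes(b"original contents")

    copy_path = work_on_copy(original, tmp_dir)
    # Only the copy is modified, just as the writer in this example targets the copy.
    copy_path.write_bytes(b"modified contents")

    original_bytes = original.read_bytes()
    copy_bytes = copy_path.read_bytes()
    print(original_bytes)
    print(copy_bytes)

# Leaving the `with` block deletes the temporary directory and everything in it.
```

The trade-off is that the copy only lives inside the `with` block, so any later reading or viewing of the profiled file would have to happen inside it as well.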
Profile
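Write the schema defined in the test file's .schema.yaml file to the 'vol' block model in the copy, then profile the block model. The resulting report is stored inside the OMF file and can be viewed afterwards.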
omfpw: OMFPandasWriter = OMFPandasWriter(filepath=temp_omf_path)
omfpw.write_block_model_schema(blockmodel_name='vol', pd_schema_filepath=test_omf_path.with_suffix('.schema.yaml'))
omfpw.profile_blockmodel(blockmodel_name='vol')
Summarize dataset: 100%|██████████| 11/11 [00:00<00:00, 25.84it/s, Completed]
Generate report structure: 100%|██████████| 1/1 [00:00<00:00, 2.55it/s]
Render HTML: 100%|██████████| 1/1 [00:00<00:00, 5.48it/s]
omfpw.view_block_model_profile(blockmodel_name='vol')
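For a quick look at a DataFrame without building a full report, a much lighter per-column summary can be computed with pandas alone. This is a swapped-in technique, not what profile_blockmodel does internally; the DataFrame below and the quick_profile helper are hypothetical, and the real block model's columns and index may differ.

```python
import pandas as pd

# Hypothetical stand-in for a block model with one attribute containing a missing value.
blocks = pd.DataFrame(
    {
        "x": [0.5, 1.5, 2.5, 3.5],
        "y": [0.5, 0.5, 0.5, 0.5],
        "z": [0.5, 0.5, 0.5, 0.5],
        "random attr": [0.1, 0.7, None, 0.9],
    }
)


def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: describe() statistics per column plus missing counts and dtypes."""
    summary = df.describe().T  # rows are columns; count, mean, std, min, quartiles, max
    summary["missing"] = df.isna().sum()  # aligned on column name
    summary["dtype"] = df.dtypes.astype(str)
    return summary


profile = quick_profile(blocks)
print(profile[["count", "mean", "missing", "dtype"]])
```

Note that describe() counts only non-missing values, which is why a separate missing column is useful.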
Profile a subset with a query filter string
omfpw.profile_blockmodel(blockmodel_name='vol', query='`random attr`>0.5')
Summarize dataset: 100%|██████████| 11/11 [00:00<00:00, 51.96it/s, Completed]
Generate report structure: 100%|██████████| 1/1 [00:00<00:00, 3.62it/s]
Render HTML: 100%|██████████| 1/1 [00:00<00:00, 16.33it/s]
omfpw.view_block_model_profile(blockmodel_name='vol', query='`random attr`>0.5')
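The query string appears to follow pandas DataFrame.query syntax, in which backticks quote column names that are not valid Python identifiers, such as random attr. Assuming omfpandas applies the filter that way, a minimal sketch of the same filter on a small hypothetical DataFrame, next to the equivalent boolean mask, looks like this:

```python
import pandas as pd

# Hypothetical data; the column name contains a space, like 'random attr'.
blocks = pd.DataFrame(
    {"random attr": [0.2, 0.55, 0.5, 0.9], "grade": [1.0, 2.0, 3.0, 4.0]}
)

# Backticks let the query expression refer to a column name containing a space.
subset = blocks.query("`random attr` > 0.5")

# Equivalent boolean-mask form, without the query mini-language.
subset_mask = blocks[blocks["random attr"] > 0.5]

print(subset)
```

The comparison is strict, so a block whose value is exactly 0.5 is excluded from the subset.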
Total running time of the script: (0 minutes 3.382 seconds)