Calculated Attributes

Schema-backed calculated attributes keep derived values close to the block model schema. This example derives tonnes from density * volume and then derives contained_metal from tonnes * grade.

The volume is available as a column from the block model geometry.
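As a quick check of the arithmetic, the chain of derived values can be sketched in plain Python before involving the block model. This is only an illustration: the derive helper is hypothetical and not part of parq_blockmodel, the volume is assumed to be the product of the block dimensions used later in this example, and the density and grade values are taken from the first block shown in the outputs.

```python
import math

# Block dimensions (dx, dy, dz) used later in this example.
block_size = (10.0, 10.0, 5.0)

# Assumption: block volume is the product of the block dimensions.
volume = math.prod(block_size)


def derive(density: float, grade: float, volume: float) -> tuple[float, float]:
    """Hypothetical helper: apply the two expressions in order."""
    tonnes = density * volume          # first derived attribute
    contained_metal = tonnes * grade   # depends on the first one
    return tonnes, contained_metal


# Density and grade of the first block in the demo model.
tonnes, contained_metal = derive(density=2.775, grade=0.175, volume=volume)
print(round(tonnes, 4), round(contained_metal, 4))
```

The second expression depends on the result of the first, so the order in which the expressions are applied matters.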

import tempfile
from pathlib import Path

import pandas as pd

from parq_blockmodel import ParquetBlockModel
from parq_blockmodel.utils.demo_block_model import create_demo_blockmodel

Create a small block model with base attributes.

temp_dir = Path(tempfile.gettempdir()) / "calculated_attributes_example"
temp_dir.mkdir(parents=True, exist_ok=True)

df = create_demo_blockmodel(
    shape=(2, 2, 2),
    block_size=(10.0, 10.0, 5.0),
    corner=(0.0, 0.0, 0.0),
    index_type="world_centroids",
)
df["density"] = 2.4 + 0.05 * df["depth"]
df["grade"] = 0.1 + 0.01 * df["depth"]

df[["density", "grade"]].head()
                 density  grade
x    y    z
5.0  5.0  2.5      2.775  0.175
          7.5      2.525  0.125
     15.0 2.5      2.775  0.175
          7.5      2.525  0.125
15.0 5.0  2.5      2.775  0.175


try:
    import df_eval  # noqa: F401
    from pandera import Column, DataFrameSchema
except ImportError:
    print(
        "Install parq-blockmodel[schema] to run the schema-backed part of this example."
    )
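The try/except above only prints a message, so later code that uses Column or DataFrameSchema would still fail with a NameError when the extras are missing. One way to guard that code, sketched here with the standard library only, is to compute a flag first. The has_modules helper and the HAS_SCHEMA name are hypothetical and not part of parq_blockmodel.

```python
from importlib.util import find_spec


def has_modules(*names: str) -> bool:
    """Return True only if every named top-level module can be located."""
    return all(find_spec(name) is not None for name in names)


# Hypothetical flag: True when both optional dependencies are importable.
HAS_SCHEMA = has_modules("df_eval", "pandera")

if not HAS_SCHEMA:
    print(
        "Install parq-blockmodel[schema] to run the schema-backed part of this example."
    )
```

The schema-backed steps can then be wrapped in an `if HAS_SCHEMA:` block so the rest of the script still runs without the extras.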

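Build the schema-backed model; this step requires the optional schema dependencies. The schema stores each calculated column's expression in the Pandera column metadata under the df-eval key. Setting required=False lets validation pass when those columns are missing from the input frame, as they are here.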

schema = DataFrameSchema(
    columns={
        "density": Column(float, coerce=True, nullable=True),
        "grade": Column(float, coerce=True, nullable=True),
        "tonnes": Column(
            float,
            coerce=True,
            nullable=True,
            required=False,
            metadata={"df-eval": {"expr": "density * volume"}},
        ),
        "contained_metal": Column(
            float,
            coerce=True,
            nullable=True,
            required=False,
            metadata={"df-eval": {"expr": "tonnes * grade"}},
        ),
    },
    strict=False,
)

pbm = ParquetBlockModel.from_dataframe(
    df[["density", "grade"]],
    filename=temp_dir / "calculated_attributes.parquet",
    schema=schema,
    overwrite=True,
)
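Because contained_metal refers to tonnes, which is itself calculated, the expressions have to be evaluated in dependency order. The sketch below shows one way such an ordering could be worked out with the standard library. It is an illustration under stated assumptions, not how df-eval or parq_blockmodel resolves expressions internally, and the helper names are hypothetical.

```python
import ast
from graphlib import TopologicalSorter

# Expressions copied from the schema metadata, deliberately listed out of order.
expressions = {
    "contained_metal": "tonnes * grade",
    "tonnes": "density * volume",
}


def names_in(expr: str) -> set[str]:
    """Return the variable names referenced by a Python expression."""
    tree = ast.parse(expr, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def evaluation_order(expressions: dict[str, str]) -> list[str]:
    """Order calculated columns so each comes after the ones it depends on."""
    graph = {
        target: {name for name in names_in(expr) if name in expressions}
        for target, expr in expressions.items()
    }
    return list(TopologicalSorter(graph).static_order())


def evaluate_row(row: dict[str, float], expressions: dict[str, str]) -> dict[str, float]:
    """Evaluate every expression for one block, given its base values."""
    values = dict(row)
    for target in evaluation_order(expressions):
        # Builtins are emptied so only the row's values are visible.
        values[target] = eval(expressions[target], {"__builtins__": {}}, values)
    return values


order = evaluation_order(expressions)
result = evaluate_row({"density": 2.775, "grade": 0.175, "volume": 500.0}, expressions)
print(order)
```

Plain eval is used here only to keep the sketch short; it should not be applied to untrusted expressions.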

We can read only the calculated columns:

calculated: pd.DataFrame = pbm.read(columns=["tonnes", "contained_metal"], index="ijk", dense=True)
calculated.head()
       tonnes  contained_metal
i j k
0 0 0  1387.5         242.8125
    1  1262.5         157.8125
  1 0  1387.5         242.8125
    1  1262.5         157.8125
1 0 0  1387.5         242.8125


Or we can read all columns, including the calculated attributes:

calculated: pd.DataFrame = pbm.read(index="ijk", dense=True, include_calculated=True)
calculated.head()
       density  grade  tonnes  contained_metal  block_id  world_id  i  j  k     x     y    z  volume
i j k
0 0 0    2.775  0.175  1387.5         242.8125         0         0  0  0  0   5.0   5.0  2.5   500.0
    1    2.525  0.125  1262.5         157.8125         1    147488  0  0  1   5.0   5.0  7.5   500.0
  1 0    2.775  0.175  1387.5         242.8125         2    589952  0  1  0   5.0  15.0  2.5   500.0
    1    2.525  0.125  1262.5         157.8125         3    737440  0  1  1   5.0  15.0  7.5   500.0
1 0 0    2.775  0.175  1387.5         242.8125         4    294976  1  0  0  15.0   5.0  2.5   500.0
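The i, j, k and x, y, z columns in this output are consistent with each centroid lying half a block in from the model corner along each axis. The sketch below reproduces the coordinates shown, assuming the corner is the minimum corner of the model and the indices start at zero. The centroid helper is hypothetical and is not taken from parq_blockmodel.

```python
def centroid(
    ijk: tuple[int, int, int],
    block_size: tuple[float, float, float],
    corner: tuple[float, float, float],
) -> tuple[float, ...]:
    """Per axis: corner + (index + 0.5) * block size."""
    return tuple(c + (n + 0.5) * d for n, d, c in zip(ijk, block_size, corner))


# Geometry passed to create_demo_blockmodel in this example.
block_size = (10.0, 10.0, 5.0)
corner = (0.0, 0.0, 0.0)

for ijk in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]:
    print(ijk, centroid(ijk, block_size, corner))
```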

