Calculated attributes

Calculated attributes are schema-defined values derived from other block-model attributes. In parq-blockmodel they are expressed as calculated columns in Pandera metadata and evaluated with df-eval when you ask for a materialized read.

For regular models, volume is also available as a built-in calculated column derived from geometry (pbm.geometry.block_volume), even when it is not persisted in parquet.

Install the optional schema dependencies first:

uv add "parq-blockmodel[schema]"

The example below derives tonnes from density * volume and then derives contained_metal from tonnes * grade.

from pathlib import Path

from pandera import Column, DataFrameSchema

from parq_blockmodel import ParquetBlockModel
from parq_blockmodel.utils.demo_block_model import create_demo_blockmodel

df = create_demo_blockmodel(
    shape=(2, 2, 2),
    block_size=(10.0, 10.0, 5.0),
    corner=(0.0, 0.0, 0.0),
    index_type="world_centroids",
)
df["density"] = 2.4 + 0.05 * df["depth"]
df["grade"] = 0.1 + 0.01 * df["depth"]

schema = DataFrameSchema(
    columns={
        "density": Column(float, coerce=True, nullable=True),
        "grade": Column(float, coerce=True, nullable=True),
        "tonnes": Column(
            float,
            coerce=True,
            nullable=True,
            required=False,
            metadata={"df-eval": {"expr": "density * volume"}},
        ),
        "contained_metal": Column(
            float,
            coerce=True,
            nullable=True,
            required=False,
            metadata={"df-eval": {"expr": "tonnes * grade"}},
        ),
    },
    strict=False,
)

pbm = ParquetBlockModel.from_dataframe(
    df[["density", "grade"]],
    filename=Path("orebody.parquet"),
    schema=schema,
)

result = pbm.read(index="ijk", dense=True, include_calculated=True)

ParquetBlockModel.read() stays raw by default. Pass include_calculated=True when you want the schema-derived columns included in the returned DataFrame.
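
The expressions above are chained: contained_metal reads tonnes, which in turn reads volume, so the calculated columns have to be evaluated in dependency order. The following is a rough, standard-library-only sketch of that idea. It is not df-eval's implementation; the helper names (referenced_names, evaluation_order, materialize) are hypothetical, and the single block with density 2.4 and grade 0.1 is an illustrative input rather than output of the demo model.

```python
import ast
from graphlib import TopologicalSorter

# Expressions copied from the schema metadata in the example above.
EXPRESSIONS = {
    "tonnes": "density * volume",
    "contained_metal": "tonnes * grade",
}


def referenced_names(expr: str) -> set[str]:
    """Return the variable names an expression reads."""
    tree = ast.parse(expr, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def evaluation_order(expressions: dict[str, str]) -> list[str]:
    """Order calculated columns so each follows the columns it depends on."""
    graph = {
        name: expressions.keys() & referenced_names(expr)
        for name, expr in expressions.items()
    }
    return list(TopologicalSorter(graph).static_order())


def materialize(row: dict[str, float], expressions: dict[str, str]) -> dict[str, float]:
    """Evaluate every expression for one block, in dependency order."""
    values = dict(row)
    for name in evaluation_order(expressions):
        # eval is used only because the expressions here are trusted and illustrative.
        values[name] = eval(expressions[name], {"__builtins__": {}}, values)
    return values


# One 10 x 10 x 5 block, so the geometric volume is 500 cubic units.
block = {"density": 2.4, "grade": 0.1, "volume": 10.0 * 10.0 * 5.0}
result = materialize(block, EXPRESSIONS)
print(evaluation_order(EXPRESSIONS))
print(result["tonnes"], result["contained_metal"])
```

Ordering by dependency means the schema can declare contained_metal before or after tonnes without changing the result.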

Schema-calculated columns can also be consumed by reblocking. See Reblocking (Up/Down Sampling) for weighted downsampling examples where basis and/or aggregated targets are materialized from schema metadata before aggregation.

Custom lookups and functions

When your schema uses df-eval functions that require registered resolvers (e.g., lookup functions with DictResolver), you need to register them with the Engine before evaluating. Two APIs are provided:

Option 1: Constructor-time registration (eager)

Register resolvers when creating the PBM.

Option 2: Post-load registration (lazy)

Register resolvers after loading the PBM, via configure_engine().

Example: DictResolver-based lookup in calculated column

Suppose you have a block model with rock codes (1, 2, 3) and want to look up the rock type names using a DictResolver:

import pandas as pd
from pathlib import Path
from pandera import Column, DataFrameSchema
from df_eval import DictResolver

from parq_blockmodel import ParquetBlockModel

# Create a DictResolver for rock type lookup
rock_type_resolver = DictResolver({
    1: "Granite",
    2: "Diorite",
    3: "Gabbro",
}, default="Unknown")

# Define schema with lookup operation using the resolver
schema = DataFrameSchema(
    columns={
        "rock_code": Column(int, nullable=False),
        "rock_name": Column(
            str,
            required=False,
            metadata={"df-eval": {
                "lookup": {
                    "resolver": "rock_type",
                    "key": "rock_code",
                    "on_missing": "default"
                }
            }},
        ),
    },
    strict=False,
)

Approach 1: Register at construction

def setup_engine(engine):
    engine.register_resolver("rock_type", rock_type_resolver)
    return engine

pbm = ParquetBlockModel.from_parquet(
    Path("model.pbm"),
    schema=schema,
    engine_initializer=setup_engine,  # ← Register here
)

result = pbm.read(columns=["rock_code", "rock_name"], dense=True)

Approach 2: Register after loading

pbm = ParquetBlockModel.from_parquet(
    Path("model.pbm"),
    schema=schema,
)

# Configure engine before reading calculated columns
pbm.configure_engine(setup_engine)

result = pbm.read(columns=["rock_code", "rock_name"], dense=True)
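
To make the intent of "on_missing": "default" concrete, here is a plain-Python sketch of a dictionary lookup with a fallback value. It only illustrates the expected mapping from rock_code to rock_name; it does not use df-eval, and lookup_with_default is a hypothetical helper, not part of either library.

```python
# Same mapping as the rock_type resolver above.
ROCK_TYPES = {1: "Granite", 2: "Diorite", 3: "Gabbro"}


def lookup_with_default(keys, table, default="Unknown"):
    """Map each key through the table, falling back to default when absent."""
    return [table.get(key, default) for key in keys]


# Code 7 is not in the table, so it falls back to the default.
rock_names = lookup_with_default([1, 3, 7, 2], ROCK_TYPES)
print(rock_names)
```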

Combining lookups with expressions

You can combine lookup functions with other calculated columns. For example:

density_class_resolver = DictResolver({
    2.0: "Low",
    2.5: "Medium",
    3.0: "High",
}, default="Unknown")

schema = DataFrameSchema(
    columns={
        "density": Column(float, nullable=False),
        "rock_code": Column(int, nullable=False),
        # Calculated column 1: simple expression
        "tonnes": Column(
            float,
            required=False,
            metadata={"df-eval": {"expr": "density * volume"}},
        ),
        # Calculated column 2: lookup using rock_code
        "rock_name": Column(
            str,
            required=False,
            metadata={"df-eval": {
                "lookup": {
                    "resolver": "rock_type",
                    "key": "rock_code",
                    "on_missing": "default"
                }
            }},
        ),
        # Calculated column 3: lookup using density
        "density_class": Column(
            str,
            required=False,
            metadata={"df-eval": {
                "lookup": {
                    "resolver": "density_class",
                    "key": "density",
                    "on_missing": "default"
                }
            }},
        ),
    },
    strict=False,
)

def setup_engine(engine):
    engine.register_resolver("rock_type", rock_type_resolver)
    engine.register_resolver("density_class", density_class_resolver)
    return engine

# df is assumed to already contain density and rock_code columns
pbm = ParquetBlockModel.from_dataframe(
    df[["density", "rock_code"]],
    filename=Path("model.parquet"),
    schema=schema,
    engine_initializer=setup_engine,
)

# Read multiple calculated columns at once
result = pbm.read(
    columns=["tonnes", "rock_name", "density_class"],
    index="ijk",
    dense=True,
)
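
One caveat about the density_class lookup: if the resolver matches keys exactly, as a plain Python dict does (an assumption about DictResolver that is not confirmed here), only densities equal to 2.0, 2.5 or 3.0 resolve to a class and every other value falls back to the default. The sketch below shows that behaviour with a plain dict and contrasts it with range-based binning. The bin edges 2.25 and 2.75 and both helper functions are hypothetical and not part of parq-blockmodel or df-eval.

```python
import bisect

# Same keys as the density_class resolver above.
DENSITY_CLASSES = {2.0: "Low", 2.5: "Medium", 3.0: "High"}

# Hypothetical bin edges for a range-based alternative.
BIN_EDGES = [2.25, 2.75]
BIN_LABELS = ["Low", "Medium", "High"]


def classify_exact(density, default="Unknown"):
    """Exact-key lookup: only listed densities match."""
    return DENSITY_CLASSES.get(density, default)


def classify_binned(density):
    """Range lookup: pick the label of the bin the density falls into."""
    return BIN_LABELS[bisect.bisect_right(BIN_EDGES, density)]


exact = [classify_exact(d) for d in (2.5, 2.45)]
binned = [classify_binned(d) for d in (2.5, 2.45)]
print(exact, binned)
```

If continuous densities need classifying, binning them into a discrete code column first and looking that code up may be more robust than keying a lookup on raw floats.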

See the custom lookup example for a complete working demonstration.

For further details on custom resolvers and functions, refer to the df-eval documentation.