Developer notes: geometry & metadata
Overview
This document describes the design for moving parq-blockmodel to a
canonical (i, j, k) indexing scheme, with geometry stored as metadata
in Parquet files and attached to in-memory DataFrames via df.attrs.
The goals are:
- Make (i, j, k) the canonical, internal indexing for all block models.
- Treat world coordinates (x, y, z) as derived from geometry.
- Store block model geometry as a compact, versioned metadata payload (“geometry metadata”) under a single reserved key, shared between Parquet key-value metadata and DataFrame.attrs.
- Keep a single dense geometry concept (RegularGeometry) and handle sparse vs dense data purely at the block model / DataFrame level.
- Maintain backward compatibility with existing xyz-centric Parquet files that do not yet carry geometry metadata.
Design Steps
1. Canonical indexing and ordering
Canonical index: (i, j, k) block indices.
Canonical ordering: NumPy C-order (row-major) over shape = (ni, nj, nk), where:
- k varies fastest,
- j next,
- i slowest.
The flattened row index r is given by:
r = k + nk * (j + nj * i)
RegularGeometry is responsible for converting between (i, j, k) and (x, y, z) using geometry metadata:
- corner: local corner of block (i=0, j=0, k=0) in (u, v, w)
- origin: world position of local (0, 0, 0)
- block_size: block dimensions (dx, dy, dz) along the local i/j/k (u/v/w) axes
- shape: number of blocks (ni, nj, nk)
- axis_u, axis_v, axis_w: orthonormal basis vectors
- srs: optional spatial reference system identifier
(x, y, z) centroids are derived from geometry; they are not the primary key for block identity.
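Under NumPy's C-order convention the last index (k) varies fastest, so the flat row index is r = k + nk * (j + nj * i). The sketch below checks that hand-written formula against NumPy's own `np.ravel_multi_index`. The two functions only mirror the method names `row_index_from_ijk` / `ijk_from_row_index` mentioned for LocalGeometry; they are standalone illustrations, not the library's code.

```python
import numpy as np

def row_index_from_ijk(i, j, k, shape):
    """Flat C-order row index: r = k + nk * (j + nj * i)."""
    ni, nj, nk = shape
    i, j, k = (np.asarray(a, dtype=np.int64) for a in (i, j, k))
    return k + nk * (j + nj * i)

def ijk_from_row_index(r, shape):
    """Inverse mapping: k varies fastest, i slowest."""
    ni, nj, nk = shape
    r = np.asarray(r, dtype=np.int64)
    k = r % nk
    j = (r // nk) % nj
    i = r // (nk * nj)
    return i, j, k

shape = (4, 3, 2)
# Every (i, j, k) in the grid, enumerated in C-order.
i, j, k = np.indices(shape).reshape(3, -1)
r = row_index_from_ijk(i, j, k, shape)

# The formula agrees with NumPy's C-order helper and enumerates rows 0..N-1.
assert np.array_equal(r, np.ravel_multi_index((i, j, k), shape, order="C"))
assert np.array_equal(r, np.arange(np.prod(shape)))

# Round trip back to (i, j, k).
ii, jj, kk = ijk_from_row_index(r, shape)
assert np.array_equal(ii, i) and np.array_equal(jj, j) and np.array_equal(kk, k)
```

In practice `np.ravel_multi_index` / `np.unravel_index` can be used directly; by default they also raise on out-of-range indices.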
2. Internal architecture: LocalGeometry and WorldFrame
The public RegularGeometry is composed of two
internal components that separate concerns:
LocalGeometry (local lattice, no rotation/CRS)
- Represents the logical (i, j, k) indexing grid
- Stores: corner, block_size, shape
- Provides: C-order conversion methods (ijk_from_row_index, row_index_from_ijk)
- Computes: local centroid coordinates (u, v, w)
- No rotation, no world frame, no CRS
WorldFrame (world embedding)
- Represents the spatial reference system and orientation
- Stores: origin, axis_u, axis_v, axis_w, srs
- Provides: orthonormal rotation matrix and transform methods
- Maps: local (u, v, w) ↔ world (x, y, z) coordinates
- Ensures: axes are orthonormal (enforced in __post_init__)
- Handles: inverse transformations via transpose (since axes are orthonormal)
RegularGeometry (public composite)
- Combines LocalGeometry and WorldFrame
- Provides the complete grid definition and coordinate transformation API
- Delegates to components internally
- Exposes high-level methods like xyz_from_ijk, ijk_from_xyz, etc.
- Users typically interact only with RegularGeometry; LocalGeometry and WorldFrame are available for advanced use cases but not required
This separation enables:
- Clean separation of concerns (local indexing vs. world embedding)
- Simpler testing and validation of each component
- Clear data flow through coordinate transformations
- Transparent internal refactoring without affecting the public API
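The composition can be sketched with plain dataclasses. These are illustrative stand-ins that reuse the names from this page, not the library's classes: the centroid rule `corner + (index + 0.5) * block_size` is an assumption consistent with `corner` being the corner of block (0, 0, 0), and the rotation matrix is assumed to have the local axes as its columns, so the world-to-local inverse is its transpose.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass(frozen=True)
class LocalGeometry:
    """Illustrative local lattice: indexing only, no rotation or SRS."""
    corner: tuple      # (u0, v0, w0) of block (0, 0, 0)
    block_size: tuple  # (dx, dy, dz)
    shape: tuple       # (ni, nj, nk)

    def uvw_from_ijk(self, ijk):
        # Assumed centroid rule: corner + (index + 0.5) * block size, per axis.
        ijk = np.asarray(ijk, dtype=float)
        return np.asarray(self.corner, dtype=float) + (ijk + 0.5) * np.asarray(self.block_size, dtype=float)

    def ijk_from_uvw(self, uvw):
        rel = (np.asarray(uvw, dtype=float) - np.asarray(self.corner, dtype=float)) / np.asarray(self.block_size, dtype=float)
        return np.floor(rel).astype(np.int64)

@dataclass(frozen=True)
class WorldFrame:
    """Illustrative world embedding: origin, orthonormal axes, optional SRS."""
    origin: tuple = (0.0, 0.0, 0.0)
    axis_u: tuple = (1.0, 0.0, 0.0)
    axis_v: tuple = (0.0, 1.0, 0.0)
    axis_w: tuple = (0.0, 0.0, 1.0)
    srs: Optional[str] = None

    def __post_init__(self):
        r = self.rotation
        if not np.allclose(r.T @ r, np.eye(3), atol=1e-9):
            raise ValueError("axis_u, axis_v, axis_w must be orthonormal")

    @property
    def rotation(self):
        # Columns are the local u, v, w axes expressed in world coordinates.
        return np.column_stack([self.axis_u, self.axis_v, self.axis_w]).astype(float)

    def xyz_from_uvw(self, uvw):
        # Row-vector form of xyz = origin + R @ uvw.
        return np.asarray(self.origin, dtype=float) + np.asarray(uvw, dtype=float) @ self.rotation.T

    def uvw_from_xyz(self, xyz):
        # Orthonormal axes: the inverse rotation is the transpose.
        return (np.asarray(xyz, dtype=float) - np.asarray(self.origin, dtype=float)) @ self.rotation

@dataclass(frozen=True)
class RegularGeometry:
    """Illustrative composite that only delegates to its two components."""
    local: LocalGeometry
    world: WorldFrame

    def xyz_from_ijk(self, ijk):
        return self.world.xyz_from_uvw(self.local.uvw_from_ijk(ijk))

    def ijk_from_xyz(self, xyz):
        return self.local.ijk_from_uvw(self.world.uvw_from_xyz(xyz))

# Example: grid rotated 30 degrees about the vertical axis.
c, s = np.cos(np.radians(30)), np.sin(np.radians(30))
geom = RegularGeometry(
    LocalGeometry(corner=(0.0, 0.0, 0.0), block_size=(10.0, 10.0, 5.0), shape=(4, 3, 2)),
    WorldFrame(origin=(500000.0, 7000000.0, 0.0), axis_u=(c, s, 0.0), axis_v=(-s, c, 0.0)),
)
ijk = np.array([[0, 0, 0], [3, 2, 1]])
assert np.array_equal(geom.ijk_from_xyz(geom.xyz_from_ijk(ijk)), ijk)
```

Because each component only knows its own half of the pipeline, the orthonormality check and the inverse-by-transpose live entirely in WorldFrame, and the indexing logic can be tested without any rotation.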
3. Geometry metadata layout
RegularGeometry defines a compact, JSON-serialisable metadata payload
via to_metadata_dict and from_metadata.
The canonical geometry payload is a plain dict:
{
    "schema_version": "1.0",
    "corner": [u0, v0, w0],
    "origin": [x0, y0, z0],
    "block_size": [dx, dy, dz],
    "shape": [ni, nj, nk],
    "axis_u": [ux, uy, uz],
    "axis_v": [vx, vy, vz],
    "axis_w": [wx, wy, wz],
    "srs": "EPSG:XXXX",  # or None when no SRS is set
}
The payload includes a schema version and may include optional
world_id encoding metadata:
{
    "schema_version": "1.0",
    ...,
    "world_id_encoding": {
        "enabled": true,
        "column": "world_id",
        "frame": "world_xyz",
        "axis_order": ["x", "y", "z"],
        "quantization": {"scale": 10.0},
        "offset": {"x": 500000.0, "y": 7000000.0, "z": 0.0}
    }
}
world_id_encoding is optional and used only when world_id columns are present/required.
This payload is designed to be stored:
- directly in df.attrs["parq-blockmodel"] (as a Python dict), and
- JSON-encoded under a reserved key in Parquet file metadata.
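Because the same payload lives both as a Python dict and as a JSON string, it only needs to survive a `json.dumps` / `json.loads` round trip unchanged. A minimal check with made-up example values (not taken from a real model): lists and `None` round-trip exactly, whereas tuples would come back as lists, which is one reason to keep the payload list-based.

```python
import json

# Example values only; the field names follow the payload layout above.
payload = {
    "schema_version": "1.0",
    "corner": [0.0, 0.0, 0.0],
    "origin": [500000.0, 7000000.0, 0.0],
    "block_size": [10.0, 10.0, 5.0],
    "shape": [4, 3, 2],
    "axis_u": [1.0, 0.0, 0.0],
    "axis_v": [0.0, 1.0, 0.0],
    "axis_w": [0.0, 0.0, 1.0],
    "srs": None,
}

# Parquet form: a JSON string; Python None is written as JSON null.
encoded = json.dumps(payload)
assert '"srs": null' in encoded

# Decoding restores an equal dict, so attrs and Parquet carry the same content.
assert json.loads(encoded) == payload
```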
4. Reserved key: "parq-blockmodel"
A single, namespaced key is used to avoid collisions with other tools and to keep related metadata grouped logically:
Reserved key: "parq-blockmodel"
Usage:
- In a pandas.DataFrame:
  df.attrs["parq-blockmodel"] = geometry.to_metadata_dict()
- In a Parquet file’s key-value metadata:
  - Key: "parq-blockmodel"
  - Value: json.dumps(geometry.to_metadata_dict()) (UTF-8 string)
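Parquet files written from pandas via pyarrow usually already carry other schema metadata (for example a `b"pandas"` entry), and pyarrow's `Table.replace_schema_metadata` replaces the whole mapping. A sketch of merging the reserved key into the existing key-value metadata rather than overwriting it; `with_geometry_metadata` is a hypothetical helper, and the bytes-keyed mapping mirrors how pyarrow exposes schema metadata.

```python
import json

KEY = b"parq-blockmodel"

def with_geometry_metadata(existing, payload):
    """Return a new key-value mapping with the geometry payload added.

    `existing` is whatever schema metadata is already present (may be None);
    its entries are kept so other tools (e.g. pandas) still find theirs.
    """
    merged = dict(existing or {})
    merged[KEY] = json.dumps(payload).encode("utf-8")  # UTF-8 JSON string
    return merged

existing = {b"pandas": b"{...}"}  # placeholder for the pandas entry
payload = {"schema_version": "1.0", "shape": [4, 3, 2], "srs": None}
kv = with_geometry_metadata(existing, payload)

assert kv[b"pandas"] == b"{...}"
assert json.loads(kv[KEY].decode("utf-8")) == payload
```

With pyarrow, the merged mapping would then be passed to `table.replace_schema_metadata(kv)` before `pyarrow.parquet.write_table`; the in-memory equivalent is the plain `df.attrs["parq-blockmodel"] = payload` assignment.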
Future versions may choose to wrap this payload with additional fields, for example:
{
"version": 1,
"geometry": { ... },
"extras": { ... },
}
In that case, the new class methods (see below) will be responsible for
unpacking the appropriate sub-dict before delegating to
RegularGeometry.from_metadata.
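If such a wrapper is introduced, the unpacking step can stay small as long as a flat payload never uses a top-level `"geometry"` key. A sketch under that assumption; `unwrap_geometry_payload` is hypothetical and the wrapper shape is the example above, not a finalised format.

```python
def unwrap_geometry_payload(value):
    """Return the geometry dict from a flat payload or a wrapped one.

    Flat:    {"schema_version": ..., "corner": ..., ...}  (current layout)
    Wrapped: {"version": 1, "geometry": {...}, "extras": {...}}  (possible future layout)
    """
    if not isinstance(value, dict):
        raise TypeError(f"expected dict, got {type(value).__name__}")
    if "geometry" in value:  # wrapped form
        geometry = value["geometry"]
        if not isinstance(geometry, dict):
            raise ValueError("'geometry' entry must be a dict")
        return geometry
    return value  # flat form: already the geometry payload

flat = {"schema_version": "1.0", "shape": [4, 3, 2]}
wrapped = {"version": 1, "geometry": flat, "extras": {}}
assert unwrap_geometry_payload(flat) is flat
assert unwrap_geometry_payload(wrapped) is flat
```

Keeping this step separate means RegularGeometry.from_metadata itself never needs to know about the wrapper.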
5. New constructors on RegularGeometry
Two new class methods are introduced to reconstruct a
RegularGeometry instance from
metadata stored either on a DataFrame or in a Parquet file:
1. from_attrs
Signature:
@classmethod
def from_attrs(
    cls,
    attrs: dict,
    key: str = "parq-blockmodel",
) -> "RegularGeometry":
    """Reconstruct geometry from a DataFrame.attrs mapping.

    Expects attrs[key] to contain the dict produced by
    RegularGeometry.to_metadata_dict().
    """
Behaviour:
- Looks up attrs[key].
- Expects this value to be a dict compatible with to_metadata_dict.
- Calls RegularGeometry.from_metadata on that dict.
- Raises a clear KeyError if the key is missing.
- Raises TypeError/ValueError for incompatible payloads.
This defines a simple round-trip for in-memory objects:
geometry -> df.attrs -> geometry
without needing to read/write Parquet.
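The behaviour list maps onto a few lines of code. In this sketch `GeometryStub` is a hypothetical stand-in for RegularGeometry with only three fields, and the set of required fields is an assumption; a plain dict plays the role of `df.attrs`, which pandas also stores as a dict.

```python
from dataclasses import dataclass

REQUIRED = ("corner", "block_size", "shape")  # assumption: minimal required fields

@dataclass(frozen=True)
class GeometryStub:
    """Hypothetical stand-in for RegularGeometry (three fields only)."""
    corner: tuple
    block_size: tuple
    shape: tuple

    @classmethod
    def from_metadata(cls, meta):
        missing = [k for k in REQUIRED if k not in meta]
        if missing:
            raise ValueError(f"geometry metadata missing fields: {missing}")
        return cls(*(tuple(meta[k]) for k in REQUIRED))

    def to_metadata_dict(self):
        return {k: list(getattr(self, k)) for k in REQUIRED}

    @classmethod
    def from_attrs(cls, attrs, key="parq-blockmodel"):
        if key not in attrs:
            raise KeyError(f"no geometry metadata under attrs[{key!r}]")
        meta = attrs[key]
        if not isinstance(meta, dict):
            raise TypeError(f"attrs[{key!r}] must be a dict, got {type(meta).__name__}")
        return cls.from_metadata(meta)

geom = GeometryStub((0.0, 0.0, 0.0), (10.0, 10.0, 5.0), (4, 3, 2))
attrs = {"parq-blockmodel": geom.to_metadata_dict()}  # what df.attrs would hold
assert GeometryStub.from_attrs(attrs) == geom          # geometry -> attrs -> geometry
```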
2. from_parquet_metadata
Signature:
@classmethod
def from_parquet_metadata(
    cls,
    metadata,
    key: str = "parq-blockmodel",
) -> "RegularGeometry":
    """Reconstruct geometry from Parquet file metadata.

    Expects the key-value metadata to contain a JSON-encoded dict
    produced by RegularGeometry.to_metadata_dict() under the
    given key.
    """
Behaviour:
- Accepts either:
  - a pyarrow.parquet.FileMetaData instance, or
  - a mapping dict[str, str] of key-value metadata.
- Normalises to a simple dict[str, str] (pyarrow exposes key-value metadata as bytes, which are decoded as UTF-8).
- Extracts the value for key ("parq-blockmodel" by default).
- If missing: raises KeyError.
- If present:
  - If the value is a JSON string, json.loads is used to decode it into a dict.
  - If the value is already a dict, it is used directly.
  - The resulting dict is passed to RegularGeometry.from_metadata.
- Invalid JSON or incompatible dicts raise a clear ValueError.
This provides a single, well-defined way for all consumers to recreate geometry from a Parquet file that carries compliant metadata.
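The steps before the final `from_metadata` call can be sketched as below. `payload_from_parquet_metadata` and `normalise_kv` are hypothetical helper names; the only thing assumed about `pyarrow.parquet.FileMetaData` is its `.metadata` attribute (a bytes-to-bytes dict, or None when the file has no key-value metadata), so a `SimpleNamespace` stands in for it here.

```python
import json
from collections.abc import Mapping
from types import SimpleNamespace

def _to_str(value):
    # pyarrow returns key-value metadata as bytes; decode those as UTF-8.
    return value.decode("utf-8") if isinstance(value, bytes) else value

def normalise_kv(metadata):
    """Return key-value metadata as dict[str, str] from a mapping or FileMetaData-like object."""
    if isinstance(metadata, Mapping):
        raw = metadata
    else:
        raw = getattr(metadata, "metadata", None)  # FileMetaData.metadata
    return {_to_str(k): _to_str(v) for k, v in (raw or {}).items()}

def payload_from_parquet_metadata(metadata, key="parq-blockmodel"):
    """Hypothetical: everything from_parquet_metadata does before from_metadata."""
    kv = normalise_kv(metadata)
    if key not in kv:
        raise KeyError(f"no {key!r} entry in Parquet key-value metadata")
    value = kv[key]
    if isinstance(value, dict):
        return value  # already decoded
    try:
        payload = json.loads(value)
    except (TypeError, json.JSONDecodeError) as exc:
        raise ValueError(f"invalid JSON under {key!r}: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError(f"{key!r} must decode to a JSON object")
    return payload

# Stand-in for FileMetaData: only `.metadata` is used.
fake_file_meta = SimpleNamespace(metadata={b"parq-blockmodel": b'{"shape": [4, 3, 2]}'})
assert payload_from_parquet_metadata(fake_file_meta) == {"shape": [4, 3, 2]}
assert payload_from_parquet_metadata({"parq-blockmodel": '{"shape": [4, 3, 2]}'}) == {"shape": [4, 3, 2]}
```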
6. Behaviour of ParquetBlockModel
The ParquetBlockModel class is
responsible for reading/writing block model data in Parquet format while
using RegularGeometry as the authoritative description of the grid.
Dense geometry, sparse vs dense data
RegularGeometry always represents the full dense grid of blocks with shape (ni, nj, nk) in C-order.
A ParquetBlockModel’s data may be:
- dense on disk: every possible block is present as a row, or
- sparse on disk: only some subset of blocks is present.
Sparsity is a property of the stored data, not of the geometry:
- geometry always knows the full dense grid.
- ParquetBlockModel.read(..., dense=False) returns exactly the rows stored in the file (sparse view).
- ParquetBlockModel.read(..., dense=True) reindexes onto the full dense grid implied by geometry, filling missing blocks.
The to_dense_parquet method remains the explicit, opt-in way to write a physically dense Parquet file from a (possibly) sparse pbm.
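Since the geometry fixes the dense grid, densifying amounts to scattering each stored row into its C-order slot. A NumPy sketch with made-up data; using NaN for missing blocks is an assumption here, as these notes do not specify the fill value used by `read(..., dense=True)`.

```python
import numpy as np

shape = (4, 3, 2)
n_blocks = int(np.prod(shape))

# Sparse storage: only three blocks present, identified by (i, j, k).
ijk = np.array([[0, 0, 0], [1, 2, 1], [3, 2, 1]])
grade = np.array([1.2, 0.7, 2.5])

# Flat C-order row index of each stored block within the dense grid.
rows = np.ravel_multi_index(ijk.T, shape, order="C")

# Dense view: one slot per block; NaN (assumed) marks blocks absent from the file.
dense_grade = np.full(n_blocks, np.nan)
dense_grade[rows] = grade

assert np.count_nonzero(~np.isnan(dense_grade)) == 3
# C-order also means the flat array reshapes directly to [i, j, k] indexing.
assert dense_grade.reshape(shape)[3, 2, 1] == 2.5
```

The same idea in pandas is a `reindex` of the stored rows onto `range(n_blocks)`.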
Reading geometry
When constructing a ParquetBlockModel from a .pbm.parquet
file, geometry is resolved as follows (target design):
- Try to read geometry from Parquet metadata using RegularGeometry.from_parquet_metadata and the "parq-blockmodel" key.
- If the key is absent (for example in a source Parquet file without geometry metadata), fall back to centroid-based inference using the existing RegularGeometry.from_parquet logic (which inspects x, y, z values and optional rotation angles).
- If the key is present but invalid (malformed JSON or incompatible dict), raise a clear ValueError rather than silently inferring from centroids.
This makes metadata-based reconstruction the preferred path, while keeping backward compatibility for older xyz-centric files.
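The resolution order is essentially one branch plus one deliberate refusal to fall back. A sketch over already-normalised `dict[str, str]` metadata; `resolve_geometry_payload` is hypothetical, and `infer_from_centroids` is a placeholder callable standing in for the RegularGeometry.from_parquet inference. Checking that the dict is actually compatible would be left to `from_metadata`.

```python
import json

KEY = "parq-blockmodel"

def resolve_geometry_payload(kv, infer_from_centroids):
    """Return (payload, source) using the metadata-first resolution order."""
    if KEY not in kv:
        # Older xyz-centric file: fall back to centroid-based inference.
        return infer_from_centroids(), "centroids"
    try:
        payload = json.loads(kv[KEY])
    except json.JSONDecodeError as exc:
        # Present but broken: fail loudly instead of silently inferring.
        raise ValueError(f"malformed {KEY!r} metadata: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError(f"{KEY!r} metadata must be a JSON object")
    return payload, "metadata"

def fake_inference():
    # Placeholder for centroid-based inference.
    return {"shape": [4, 3, 2]}

assert resolve_geometry_payload({KEY: '{"shape": [4, 3, 2]}'}, fake_inference)[1] == "metadata"
assert resolve_geometry_payload({}, fake_inference)[1] == "centroids"
```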
Writing geometry
When creating a pbm file from an input Parquet or DataFrame, the following invariants apply:
- The resulting pbm file must have a valid RegularGeometry.
- That geometry is written as metadata under "parq-blockmodel".
- The file may remain sparse or be densified depending on the specific API used (e.g. to_dense_parquet).
Examples:
- ParquetBlockModel.from_parquet:
  - Ingests an arbitrary Parquet file (often sparse, xyz-centric).
  - Infers geometry from centroids if metadata is missing.
  - Writes a new .pbm.parquet file with geometry metadata attached under "parq-blockmodel".
- ParquetBlockModel.from_dataframe:
  - Accepts a DataFrame with an xyz MultiIndex or one that already carries geometry metadata in df.attrs["parq-blockmodel"].
  - If geometry is not provided explicitly, it is inferred from the index or attrs.
  - Writes a .pbm.parquet file and attaches geometry metadata.
- ParquetBlockModel.from_geometry:
  - Creates an empty pbm file (no attributes) from a RegularGeometry.
  - The resulting Parquet file includes geometry metadata and may store centroids for convenience / interop.
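When from_geometry stores centroids for interop, they can be generated for every block directly from the geometry fields, one row per block in C-order. A sketch with a hypothetical `centroids_c_order` helper; as in the other sketches, the `corner + (index + 0.5) * block_size` centroid rule is an assumption, and `axes` holds axis_u, axis_v, axis_w as rows.

```python
import numpy as np

def centroids_c_order(corner, block_size, shape, origin=(0.0, 0.0, 0.0), axes=None):
    """World (x, y, z) centroids for every block, one row per block in C-order."""
    axes = np.eye(3) if axes is None else np.asarray(axes, dtype=float)
    # All (i, j, k) as an (n, 3) array; k varies fastest (C-order).
    ijk = np.indices(shape).reshape(3, -1).T
    # Local centroids (u, v, w) under the assumed corner + (index + 0.5) rule.
    uvw = np.asarray(corner, dtype=float) + (ijk + 0.5) * np.asarray(block_size, dtype=float)
    # u * axis_u + v * axis_v + w * axis_w, shifted to the world origin.
    return np.asarray(origin, dtype=float) + uvw @ axes

xyz = centroids_c_order(corner=(0.0, 0.0, 0.0), block_size=(10.0, 10.0, 5.0), shape=(4, 3, 2))
assert xyz.shape == (24, 3)
assert np.allclose(xyz[0], [5.0, 5.0, 2.5])     # block (0, 0, 0)
assert np.allclose(xyz[1], [5.0, 5.0, 7.5])     # block (0, 0, 1): k varies fastest
assert np.allclose(xyz[-1], [35.0, 25.0, 7.5])  # block (3, 2, 1)
```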
7. Backward compatibility
Existing pbm files and raw Parquet files that:
- lack "parq-blockmodel" metadata, but
- contain xyz centroid columns (x, y, z),
will continue to be supported via the existing RegularGeometry.from_parquet centroid-based inference. New code paths will prefer metadata-based reconstruction when available, but will fall back gracefully for old files.
Two intentional breaking changes accompany this design:
- RegularGeometry constructor semantics changed to separate local and world frames.
- WorldFrame now uses origin instead of world_origin (rename).
Existing files without origin in their geometry metadata are still readable; from_metadata falls back to origin = (0, 0, 0).
New behaviour is introduced via:
- the additional class methods on RegularGeometry, and
- internal changes to how ParquetBlockModel resolves geometry.
8. Geometry schema versioning
To allow future evolution of geometry encoding without tying it
rigidly to the top-level package version, geometry metadata carries
its own schema version (e.g. "schema_version": "1.0").
- to_metadata_dict writes this field.
- from_metadata reads it and dispatches to version-specific decoding logic if necessary.
- Old files without a version field can be treated as version 0 or 1 by default.
This keeps geometry I/O stable and makes it possible to introduce additional fields (e.g. alternative rotation encodings, anisotropic cells) in a controlled way.
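Version dispatch can be a lookup table from schema version to decoder. In this sketch the decoder names are hypothetical, treating unversioned payloads as "1.0" is an assumed default (these notes allow 0 or 1), and accepting an integer 1 alongside "1.0" is a tolerance choice. The missing-origin fallback mirrors the backward-compatibility rule that from_metadata uses origin = (0, 0, 0).

```python
def _decode_v1(meta):
    # Files written without `origin` fall back to (0, 0, 0).
    return {
        "corner": tuple(meta["corner"]),
        "block_size": tuple(meta["block_size"]),
        "shape": tuple(meta["shape"]),
        "origin": tuple(meta.get("origin", (0.0, 0.0, 0.0))),
        "srs": meta.get("srs"),
    }

# Tolerate an integer 1 as well as the string "1.0".
_DECODERS = {"1": _decode_v1, "1.0": _decode_v1}
DEFAULT_VERSION = "1.0"  # assumption: unversioned payloads are read as 1.0

def decode_geometry(meta):
    """Dispatch to a version-specific decoder; unknown versions fail clearly."""
    version = str(meta.get("schema_version", DEFAULT_VERSION))
    try:
        decoder = _DECODERS[version]
    except KeyError:
        raise ValueError(f"unsupported geometry schema_version {version!r}") from None
    return decoder(meta)

legacy = {"corner": [0, 0, 0], "block_size": [10, 10, 5], "shape": [4, 3, 2]}
assert decode_geometry(legacy)["origin"] == (0.0, 0.0, 0.0)
```

Adding a future version then means registering one more decoder, without touching the code paths for existing files.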
9. Terminology: block_id, world_id, and ijk
To avoid confusion, terminology is used precisely:
- ijk (or i, j, k)
  - Logical block indices: 0 ≤ i < nᵢ, 0 ≤ j < nⱼ, 0 ≤ k < nₖ
  - Canonical indexing scheme for parq-blockmodel
  - No separate “block_id” column: ijk is the primary key
  - Can be reconstructed from the row index using the C-order formula
- block_id (if used externally)
  - May refer to an ijk-based encoding in external systems
  - In parq-blockmodel, we use ijk directly, not a composite “block_id”
  - Row index (flat, C-order) is used internally for DataFrame operations
- world_id
  - Stable 64-bit positional identifier derived from world (x, y, z) centroids
  - Unique within a CRS and encoding contract
  - Useful for cross-model joins when models share the same SRS
  - Generated by encode_world_coordinates using quantization + offset
  - Not a spatial index: should not be used for range queries
The distinction is important:
- Use ijk for logical block queries and geometric operations
- Use world_id for stable cross-model references and external joins
- Use row index (flat, C-order) for internal DataFrame/Parquet operations
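To make "quantization + offset into a 64-bit id" concrete, here is a hypothetical packing that reuses the example offset and scale from the world_id_encoding payload. It is not the bit layout of encode_world_coordinates, which these notes do not specify; it assumes 21 bits per axis (63 bits total, so the id fits a signed int64). Because ids are ordered by x first, a contiguous id range is not a spatial box, which illustrates why world_id is not a spatial index.

```python
import numpy as np

BITS = 21                 # hypothetical: 3 x 21 bits = 63 bits, fits int64
MAX_Q = (1 << BITS) - 1   # largest quantised value per axis

def encode_world_id(x, y, z, offset, scale):
    """Hypothetical packing of quantised (x, y, z) into one int64 per centroid."""
    q = [np.rint((np.asarray(c, dtype=float) - o) * scale).astype(np.int64)
         for c, o in zip((x, y, z), offset)]
    for a in q:
        if np.any((a < 0) | (a > MAX_Q)):
            raise ValueError("coordinate outside the encodable range for this offset/scale")
    qx, qy, qz = q
    return (qx << (2 * BITS)) | (qy << BITS) | qz

def decode_world_id(ids, offset, scale):
    """Inverse of encode_world_id: unpack each axis and undo quantisation."""
    ids = np.asarray(ids, dtype=np.int64)
    qx = (ids >> (2 * BITS)) & MAX_Q
    qy = (ids >> BITS) & MAX_Q
    qz = ids & MAX_Q
    return tuple(q / scale + o for q, o in zip((qx, qy, qz), offset))

offset = (500000.0, 7000000.0, 0.0)  # example offset from the payload above
scale = 10.0                          # 0.1 m resolution
x = np.array([500005.0, 500015.0])
y = np.array([7000005.0, 7000005.0])
z = np.array([2.5, 2.5])

ids = encode_world_id(x, y, z, offset, scale)
assert ids[0] != ids[1]
xd, yd, zd = decode_world_id(ids, offset, scale)
assert np.allclose(xd, x) and np.allclose(yd, y) and np.allclose(zd, z)
```

With 21 bits and scale = 10, each axis covers about 209.7 km from its offset; a real encoding contract has to fix these limits so that ids stay comparable across models.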
Summary
Key concepts to remember:
- RegularGeometry is the public interface; LocalGeometry + WorldFrame are internal components that handle the coordinate transformation pipeline.
- LocalGeometry = indexing + local grid (no rotation/CRS)
- WorldFrame = world embedding + axis orientation + optional SRS
- ijk is canonical indexing (not “block_id”); row index is the flat C-order version
- world_id is a stable 64-bit position key, useful for cross-model joins
- Geometry metadata is compact, versioned, and stored in Parquet key-value metadata (and in DataFrame.attrs in memory)
Decode world_id from DataFrame.attrs
Consumers that only have a DataFrame (no ParquetBlockModel instance)
can still decode world_id values using the attached metadata payload:
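from parq_blockmodel.utils import get_id_encoding_params, decode_frame_coordinates

# Geometry payload attached to the DataFrame under the reserved key
meta = df.attrs["parq-blockmodel"]
# Optional entry: present only when world_id columns were written
encoding = meta["world_id_encoding"]
offset, scale = get_id_encoding_params(encoding)
# Decode the int64 world_id column back to (x, y, z) centroid coordinates
x, y, z = decode_frame_coordinates(
    df["world_id"].to_numpy(dtype="int64"),
    offset=offset,
    scale=scale,
)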
Cross-model uniqueness depends on shared encoding policy and non-overlapping centroid positions within the same SRS.
- RegularGeometry is the single, dense description of block model geometry using C-order and ijk as canonical indexing.
- Geometry metadata is serialised by to_metadata_dict and deserialised by from_metadata.
- Two new helpers, from_attrs and from_parquet_metadata, provide a consistent way to rebuild geometry from DataFrame attrs and Parquet file metadata respectively, using a reserved key "parq-blockmodel".
- ParquetBlockModel always uses a dense geometry, while allowing the underlying data to be sparse or dense on disk; density is controlled by read/write options such as dense=True and to_dense_parquet.
- Existing xyz-centric files remain supported through centroid-based geometry inference, with metadata-based reconstruction becoming the preferred path for new, canonical pbm files.