Reblocking (Up/Down Sampling)

Reblocking changes the block size of a regular model:

  • Upsampling creates a finer grid (smaller blocks).

  • Downsampling creates a coarser grid (larger blocks).

Configuration is explicit per attribute. This avoids accidental defaults when continuous and class-like attributes are mixed in one model.

Note

Not all upsampling methods are interpolative. mode, nearest, and parent are class-preserving assignment methods.
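To illustrate why class-like attributes need an assignment method rather than interpolation, the sketch below refines a 1D row of integer codes in two ways using plain NumPy. It is only an illustration with made-up values and variable names; it does not call parq_blockmodel or reproduce its internals.

```python
import numpy as np

# Hypothetical integer class codes on a coarse 1D row of blocks.
codes = np.array([1, 1, 3, 3])
coarse_centres = np.array([0.5, 1.5, 2.5, 3.5])

# Halve the block size: each parent block splits into two children.
fine_centres = np.arange(0.25, 4.0, 0.5)

# Linear interpolation blends neighbouring codes and can invent
# values (1.5, 2.5) that are not valid classes.
interpolated = np.interp(fine_centres, coarse_centres, codes)

# Parent-style replication copies each parent's code to its children,
# so only the original classes appear.
replicated = np.repeat(codes, 2)

print(interpolated)
print(replicated)
```

The interpolated row contains fractional codes at the class boundary, while the replicated row contains only the codes 1 and 3.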

Upsampling

Use parq_blockmodel.blockmodel.ParquetBlockModel.upsample() with an upsample_config mapping every attribute to a method.

Supported methods are:

  • linear: continuous interpolation (for continuous numeric attributes)

  • nearest: nearest-source assignment (class-safe)

  • mode: neighborhood class mode with deterministic tie-break (lowest code/value)

  • parent: inherit value directly from the parent block (exact replication)
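The deterministic tie-break described for mode can be sketched as follows. This is a minimal illustration assuming NumPy; the helper name mode_lowest is hypothetical and is not part of the package.

```python
import numpy as np

def mode_lowest(values):
    """Return the most frequent value; ties resolve to the lowest value."""
    uniques, counts = np.unique(np.asarray(values), return_counts=True)
    # np.unique returns sorted unique values and np.argmax returns the
    # first maximum, so equal counts resolve to the lowest code.
    return uniques[np.argmax(counts)]

# Codes 1 and 3 both occur twice; the lower code wins.
print(mode_lowest([3, 1, 3, 1, 2]))
```

Because the result depends only on the set of neighbouring values and not on their order, repeated runs give the same class assignment.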

Typical mixed configuration:

upsampled = pbm.upsample(
    new_block_size=(0.5, 0.5, 0.5),
    upsample_config={
        "grade": "linear",          # continuous
        "density": "linear",        # continuous
        "rock_type": "mode",        # categorical / class-like
        "domain_code": "parent",    # integer-coded classes
    },
)

If any attribute is omitted from upsample_config, upsampling fails immediately with a clear error.
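The kind of up-front check this implies can be sketched as below. The function name, the exception type, and the message wording are assumptions made for illustration; the source does not state what the package actually raises.

```python
def check_upsample_config(attributes, upsample_config):
    """Raise ValueError if an attribute has no method or a method is unknown."""
    supported = {"linear", "nearest", "mode", "parent"}

    # Every attribute in the model must be named explicitly.
    missing = sorted(set(attributes) - set(upsample_config))
    if missing:
        raise ValueError(f"upsample_config is missing attributes: {missing}")

    # Every configured method must be one of the supported names.
    unknown = {k: v for k, v in upsample_config.items() if v not in supported}
    if unknown:
        raise ValueError(f"unsupported upsample methods: {unknown}")

attributes = ["grade", "density", "rock_type"]
try:
    check_upsample_config(attributes, {"grade": "linear", "density": "linear"})
except ValueError as err:
    print(err)
```

Failing before any data is read means a forgotten class attribute cannot silently fall back to an interpolating default.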

Downsampling

Use parq_blockmodel.blockmodel.ParquetBlockModel.downsample() with an aggregation_config that maps each attribute to a dictionary holding a method key and, where the method needs one, a basis key.

Common methods include:

  • mean

  • sum

  • weighted_mean (requires basis)

  • mode (class-like attributes)

Typical mixed configuration:

downsampled = pbm.downsample(
    new_block_size=(2.0, 2.0, 2.0),
    aggregation_config={
        "grade": {"method": "weighted_mean", "basis": "dry_mass"},
        "dry_mass": {"method": "sum"},
        "volume": {"method": "sum"},
        "rock_type": {"method": "mode"},
        "domain_code": {"method": "mode"},
    },
)
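What these aggregations compute per coarse block can be sketched with pandas. The parent_id column, the sample values, and the tie-break used for mode are assumptions for illustration only; this is not the package's implementation.

```python
import pandas as pd

# Hypothetical fine blocks: two children per parent, keyed by a made-up parent_id.
fine = pd.DataFrame({
    "parent_id": [0, 0, 1, 1],
    "grade": [1.0, 3.0, 2.0, 4.0],
    "dry_mass": [100.0, 300.0, 50.0, 50.0],
    "rock_type": [1, 1, 2, 3],
})

# weighted_mean = sum(value * basis) / sum(basis), evaluated per parent.
metal = (fine["grade"] * fine["dry_mass"]).groupby(fine["parent_id"]).sum()
mass = fine.groupby("parent_id")["dry_mass"].sum()

coarse = pd.DataFrame({
    "grade": metal / mass,
    "dry_mass": mass,
    # Series.mode() returns its modes sorted, so iloc[0] picks the lowest
    # code on a tie (a choice made for this sketch).
    "rock_type": fine.groupby("parent_id")["rock_type"].agg(lambda s: s.mode().iloc[0]),
})
print(coarse)
```

Weighting grade by dry_mass keeps the coarse grade consistent with the summed mass, which a plain mean would not do when child masses differ.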

Using schema-calculated aggregation inputs

Aggregation inputs can come from schema df-eval metadata, not only persisted parquet columns. This is useful when a weighted basis (or an aggregated target) is derived from other attributes.

Example: tonnes and contained_metal are calculated in the schema, then used in downsampling:

downsampled = pbm.downsample(
    new_block_size=(2.0, 2.0, 2.0),
    aggregation_config={
        "grade": {"method": "mean"},
        "contained_metal": {"method": "weighted_mean", "basis": "tonnes"},
    },
)

In this call:

  • tonnes is used as the weighted basis even if it is not stored on disk.

  • contained_metal can be aggregated even if it is only schema-defined.

  • Aggregation math stays unchanged; only input materialization differs.
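The idea of materializing calculated inputs before aggregating can be sketched with pandas. The expression dictionary below is a hypothetical stand-in for schema metadata, and the use of DataFrame.eval is an assumption; neither reflects the package's actual schema format or evaluation mechanism.

```python
import pandas as pd

# Columns assumed to be stored on disk.
stored = pd.DataFrame({
    "parent_id": [0, 0],
    "grade": [0.01, 0.03],
    "density": [2.5, 3.0],
    "volume": [1.0, 1.0],
})

# Hypothetical stand-in for schema-defined expressions.
calculated = {
    "tonnes": "density * volume",
    "contained_metal": "grade * tonnes",
}

working = stored.copy()
for name, expression in calculated.items():
    # Evaluate in definition order so later expressions can use earlier results.
    working[name] = working.eval(expression)

# Aggregation then treats calculated columns exactly like stored ones.
totals = working.groupby("parent_id")[["tonnes", "contained_metal"]].sum()
print(totals)
```

The stored frame is left untouched; only the working copy gains the derived columns.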

For regular models, basis: "volume" can also be used without persisting a volume column. volume is available as a built-in calculated column from geometry (pbm.geometry.block_volume).

Handling partially-filled blocks with fill_ratio

When downsampling sparse or partially-filled block models, use the fill_ratio key to normalize aggregation by the fraction of blocks actually occupied.

This is useful if child blocks in a region are only partly filled (e.g., from surface-only meshes or sparse models), and you want to avoid underestimating aggregated values.

Example: downsampling a grade with fill-aware weighting:

downsampled = pbm.downsample(
    new_block_size=(2.0, 2.0, 2.0),
    aggregation_config={
        "grade": {"method": "weighted_mean", "basis": "mass", "fill_ratio": "fill_factor"},
        "mass": {"method": "sum", "fill_ratio": "fill_factor"},
        "fill_factor": {"method": "mean"},  # required: aggregate the fill ratio itself
    },
)

How it works:

  • fill_factor must be a numeric attribute with values in [0, 1]; its mean over the child blocks gives the occupied fraction of each coarse block.

  • For sum with "fill_ratio": "fill_factor", result = sum(values) / mean(fill_factor).

  • For weighted_mean with "fill_ratio": "fill_factor", result = sum(values * basis) / (mean(fill_factor) * sum(basis)).
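As a worked example, the two formulas above can be evaluated for the children of a single coarse block. The arrays are made up, and the code transcribes the formulas literally with NumPy; it is not taken from the package.

```python
import numpy as np

# Hypothetical child blocks of one coarse block.
mass = np.array([10.0, 20.0, 0.0, 10.0])
grade = np.array([1.0, 2.0, 0.0, 3.0])
fill_factor = np.array([1.0, 1.0, 0.0, 0.5])

mean_fill = fill_factor.mean()  # (1 + 1 + 0 + 0.5) / 4

# sum with fill_ratio: sum(values) / mean(fill_factor)
mass_filled = mass.sum() / mean_fill

# weighted_mean with fill_ratio:
# sum(values * basis) / (mean(fill_factor) * sum(basis))
grade_filled = (grade * mass).sum() / (mean_fill * mass.sum())

print(mean_fill, mass_filled, grade_filled)
```

A coarse block whose children are all empty has a mean fill of zero, so a real implementation would need to guard against division by zero; this sketch does not.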

Only one fill_ratio attribute may be used across all aggregations in a single downsampling call. If you omit fill_ratio, aggregations work on all available child blocks without normalization.

Choosing methods

Use this rule of thumb:

  • Continuous attributes (grade, density, porosity): linear for upsampling; mean or weighted_mean for downsampling.

  • Class-like attributes (categorical labels, integer-coded domains): parent, nearest, or mode for upsampling; mode for downsampling.
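The rule of thumb can be turned into a small helper that builds both configuration dictionaries from two explicit attribute lists. The helper name build_configs is hypothetical, and aggregating the basis with sum simply follows the pattern of the downsampling example earlier on this page.

```python
def build_configs(continuous, class_like, basis=None):
    """Build (upsample_config, aggregation_config) from two attribute lists."""
    # Upsampling: interpolate continuous attributes, replicate classes.
    upsample = {name: "linear" for name in continuous}
    upsample.update({name: "parent" for name in class_like})

    # Downsampling: average continuous attributes, take the mode of classes.
    if basis is None:
        down = {name: {"method": "mean"} for name in continuous}
    else:
        down = {
            name: {"method": "weighted_mean", "basis": basis}
            for name in continuous
        }
        # Assumption: the basis itself is aggregated with sum.
        down[basis] = {"method": "sum"}
    down.update({name: {"method": "mode"} for name in class_like})
    return upsample, down

up_cfg, agg_cfg = build_configs(["grade", "density"], ["rock_type"], basis="dry_mass")
print(up_cfg)
print(agg_cfg)
```

Listing class-like attributes explicitly matters because integer-coded domains cannot be told apart from continuous integers by data type alone.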

IDW interpolation

Inverse Distance Weighting (IDW) is not currently implemented.

In this package, adding IDW for upsampling is likely non-trivial because it would need:

  • method wiring across API/config validation,

  • deterministic handling for boundary and missing data,

  • behavior definition for integer and class-like attributes,

  • new regression and integration tests.

Given that, IDW is best treated as a separate scoped feature after the class-safe upsampling behavior is finalized.