Resampling Interval Data

Interval (or fractional) data is common in metallurgy and mineral processing. Samples are sized using sieves in a laboratory, and each resulting fraction is often assayed to determine its chemical composition. The interval edges are typically named size_retained and size_passing: any particle within an interval (fraction) was retained by the lower sieve size but passed the sieve size above it.
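To make the retained/passing convention concrete, here is a minimal sketch that assigns a single particle size to its fraction. The helper name fraction_for is hypothetical, and the edges are the sieve sizes from the demo data used later in this example. It assumes intervals are closed on the retained (lower) edge, so a particle exactly the size of a sieve is retained by it.

```python
from bisect import bisect_right

# Sieve sizes in ascending order, taken from the demo data in this example.
SIEVE_EDGES = [0.0, 0.045, 0.075, 0.15, 0.5, 0.85, 2.0]


def fraction_for(particle_size, edges=SIEVE_EDGES):
    """Return (size_retained, size_passing) for one particle size.

    Hypothetical helper for illustration only. Intervals are assumed to be
    closed on the left: size_retained <= particle_size < size_passing.
    """
    if not edges[0] <= particle_size < edges[-1]:
        raise ValueError(f"{particle_size} is outside the sieve range")
    # Index of the first edge strictly greater than the particle size.
    i = bisect_right(edges, particle_size)
    return edges[i - 1], edges[i]


print(fraction_for(0.3))  # falls between the 0.15 and 0.5 sieves
print(fraction_for(0.5))  # exactly on a sieve size: retained by that sieve
```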

import logging

import numpy as np
import pandas as pd
import plotly

from elphick.mass_composition import MassComposition
from elphick.mass_composition.datasets.sample_data import size_by_assay

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(module)s - %(funcName)s: %(message)s',
                    datefmt='%Y-%m-%dT%H:%M:%S%z',
                    )

Create a MassComposition object

We get some demo data in the form of a pandas DataFrame. The object is created as 1D, based on the pandas index.

df_data: pd.DataFrame = size_by_assay()
df_data

size_retained	size_passing	mass_dry	fe	sio2	al2o3
0.850	2.000	3.3	64.15	2.04	2.68
0.500	0.850	9.9	64.33	2.05	2.23
0.150	0.500	26.5	64.52	1.84	2.19
0.075	0.150	2.5	62.65	2.88	3.32
0.045	0.075	8.8	62.81	2.12	2.25
0.000	0.045	49.0	55.95	6.39	6.34

The size index is of the Interval type, maintaining the fractional information.

mc_size: MassComposition = MassComposition(df_data, name='Sample')
mc_size.data.to_dataframe()

size	mass_wet	mass_dry	H2O	Fe	SiO2	Al2O3
[0.85, 2.0)	3.3	3.3	0.0	64.15	2.04	2.68
[0.5, 0.85)	9.9	9.9	0.0	64.33	2.05	2.23
[0.15, 0.5)	26.5	26.5	0.0	64.52	1.84	2.19
[0.075, 0.15)	2.5	2.5	0.0	62.65	2.88	3.32
[0.045, 0.075)	8.8	8.8	0.0	62.81	2.12	2.25
[0.0, 0.045)	49.0	49.0	0.0	55.95	6.39	6.34

mc_size.aggregate()

name	mass_wet	mass_dry	H2O	Fe	SiO2	Al2O3
Sample	100.0	100.0	0.0	60.09245	4.14753	4.27716
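The aggregated row above is consistent with a mass-weighted average of the fractions. The sketch below recomputes it by hand in plain Python from the demo table. It assumes that aggregate() weights each assay by dry mass, and the helper name head_grade is hypothetical.

```python
# (mass_dry, fe, sio2, al2o3) per fraction, copied from the demo table above.
FRACTIONS = [
    (3.3, 64.15, 2.04, 2.68),
    (9.9, 64.33, 2.05, 2.23),
    (26.5, 64.52, 1.84, 2.19),
    (2.5, 62.65, 2.88, 3.32),
    (8.8, 62.81, 2.12, 2.25),
    (49.0, 55.95, 6.39, 6.34),
]
ANALYTES = ("fe", "sio2", "al2o3")


def head_grade(fractions):
    """Return (total mass, {analyte: mass-weighted mean grade})."""
    total_mass = sum(row[0] for row in fractions)
    grades = {}
    for col, name in enumerate(ANALYTES, start=1):
        # Component mass summed over all fractions, divided by total mass.
        grades[name] = sum(row[0] * row[col] for row in fractions) / total_mass
    return total_mass, grades


total_mass, grades = head_grade(FRACTIONS)
print(round(total_mass, 5), {k: round(v, 5) for k, v in grades.items()})
```

The printed values can be compared directly against the Sample row.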

First we’ll plot the intervals

fig = mc_size.plot_intervals(variables=['mass_dry', 'Fe', 'SiO2', 'Al2O3'],
                             cumulative=False)
fig

Size distributions are often plotted in cumulative form. Cumulative passing is obtained by setting direction='ascending'.

fig = mc_size.plot_intervals(variables=['mass_dry', 'Fe', 'SiO2', 'Al2O3'],
                             cumulative=True, direction='ascending')
fig

/home/runner/work/mass-composition/mass-composition/elphick/mass_composition/mc_xarray.py:157: FutureWarning:

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
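As a rough illustration of what the cumulative passing curve represents for mass, the sketch below accumulates the fraction masses from the finest fraction upwards and reports the percentage passing each upper sieve size. This is an interpretation of "ascending" for illustration, not the library's implementation, and cumulative_passing is a hypothetical helper.

```python
from itertools import accumulate

# (size_retained, size_passing, mass_dry), copied from the demo table.
FRACTIONS = [
    (0.850, 2.000, 3.3),
    (0.500, 0.850, 9.9),
    (0.150, 0.500, 26.5),
    (0.075, 0.150, 2.5),
    (0.045, 0.075, 8.8),
    (0.000, 0.045, 49.0),
]


def cumulative_passing(fractions):
    """Return [(size_passing, percent of total mass passing that size)]."""
    ordered = sorted(fractions, key=lambda f: f[1])  # finest fraction first
    total = sum(f[2] for f in ordered)
    running = accumulate(f[2] for f in ordered)
    return [(f[1], 100.0 * mass / total) for f, mass in zip(ordered, running)]


for size, pct in cumulative_passing(FRACTIONS):
    print(f"{size:>6.3f}  {pct:6.2f} % passing")
```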

Now we will resample on a defined grid (interval edges) and view the resampled fractions

new_edges = np.unique(np.geomspace(1.0e-03, mc_size.data.to_dataframe().index.right.max() * 3, 50))
new_coords = np.insert(new_edges, 0, 0)

mc_upsampled: MassComposition = mc_size.resample_1d(interval_edges=new_edges, precision=3, include_original_edges=True)

fig = mc_upsampled.plot_intervals(variables=['mass_dry', 'Fe', 'SiO2', 'Al2O3'], cumulative=False)
# noinspection PyTypeChecker
plotly.io.show(fig)

Close inspection of the plot above reveals sharp dips for some mass intervals. These occur where an interval is narrower than its neighbours and therefore holds less absolute mass. This is a visual artefact only; the result is numerically correct, as the cumulative plot shows.

fig = mc_upsampled.plot_intervals(variables=['mass_dry', 'Fe', 'SiO2', 'Al2O3'], cumulative=True, direction='ascending')
fig
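The dips can be reproduced with a small sketch. It splits one original fraction at unevenly spaced edges, assuming mass is spread uniformly in log-size within the fraction. That distribution is an assumption made purely for illustration and is not necessarily how resample_1d interpolates; the helper split_log_uniform and the inner edges are hypothetical. The narrow sub-interval receives little mass, while mass per unit log-width stays constant.

```python
import math


def split_log_uniform(left, right, mass, inner_edges):
    """Split one fraction at inner_edges, assuming log-uniform mass.

    Illustrative assumption only. Returns [(left, right, mass)] per part.
    """
    edges = [left, *sorted(inner_edges), right]
    span = math.log(right / left)
    return [
        (lo, hi, mass * math.log(hi / lo) / span)
        for lo, hi in zip(edges, edges[1:])
    ]


# The [0.15, 0.5) fraction from the demo data holds 26.5 units of dry mass.
parts = split_log_uniform(0.15, 0.5, 26.5, inner_edges=[0.2, 0.21, 0.3])
for lo, hi, m in parts:
    density = m / math.log(hi / lo)  # mass per unit log-width
    print(f"[{lo:.2f}, {hi:.2f})  mass={m:.3f}  mass/log-width={density:.3f}")

# Total mass is conserved even though the narrow interval shows a dip.
print("total:", round(sum(m for _, _, m in parts), 6))
```

Plotting a width-normalised quantity such as this density, rather than absolute mass, is one way to avoid the visual dips.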


We can also upsample each of the original fractions by a factor. Since adjacent fractions are similar, the fractional plot is reasonably smooth. Note, however, that fraction widths still differ because of the original sieve selection.

mc_upsampled_2: MassComposition = mc_size.resample_1d(interval_edges=10, precision=3)
fig = mc_upsampled_2.plot_intervals(variables=['mass_dry', 'Fe', 'SiO2', 'Al2O3'], cumulative=False)
fig
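To see why widths still differ after upsampling by a factor, the sketch below subdivides each original fraction into the same number of sub-intervals. How resample_1d actually spaces the new edges is not stated here; equal linear widths within each fraction are assumed for illustration, and subdivide is a hypothetical helper.

```python
# Original sieve edges from the demo data, ascending.
EDGES = [0.0, 0.045, 0.075, 0.15, 0.5, 0.85, 2.0]


def subdivide(edges, factor):
    """Split every interval into `factor` equal-width parts (assumed spacing)."""
    new_edges = []
    for lo, hi in zip(edges, edges[1:]):
        step = (hi - lo) / factor
        new_edges.extend(lo + k * step for k in range(factor))
    new_edges.append(edges[-1])
    return new_edges


fine = subdivide(EDGES, 10)
print(len(fine) - 1, "intervals")
# Sub-interval widths inherit the width of their parent fraction.
print("finest fraction width: ", round(fine[1] - fine[0], 6))
print("coarsest fraction width:", round(fine[-1] - fine[-2], 6))
```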

Validate the head grade against the original sample

pd.testing.assert_frame_equal(mc_size.aggregate().reset_index(drop=True),
                              mc_upsampled.aggregate().reset_index(drop=True))

pd.testing.assert_frame_equal(mc_size.aggregate().reset_index(drop=True),
                              mc_upsampled_2.aggregate().reset_index(drop=True))

Complete a round trip by converting the up-sampled objects back to the original intervals and validate.

orig_index = mc_size.data.to_dataframe().index
original_edges: np.ndarray = np.sort(np.unique(list(orig_index.left) + list(orig_index.right)))

mc_downsampled: MassComposition = mc_upsampled.resample_1d(interval_edges=original_edges, precision=3)
mc_downsampled_2: MassComposition = mc_upsampled_2.resample_1d(interval_edges=original_edges, precision=3)

pd.testing.assert_frame_equal(mc_size.data.to_dataframe(), mc_downsampled.data.to_dataframe())
pd.testing.assert_frame_equal(mc_size.data.to_dataframe(), mc_downsampled_2.data.to_dataframe())
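The round trip works because downsampling onto edges that already exist in the fine grid reduces to summing masses and mass-weighting grades. The sketch below shows this in plain Python for Fe only. The fine data is built by halving each original fraction with an unchanged grade, which is an assumption for illustration rather than what resample_1d produces, and both helper names are hypothetical.

```python
import math

# (size_retained, size_passing, mass_dry, fe), copied from the demo table.
ORIGINAL = [
    (0.000, 0.045, 49.0, 55.95),
    (0.045, 0.075, 8.8, 62.81),
    (0.075, 0.150, 2.5, 62.65),
    (0.150, 0.500, 26.5, 64.52),
    (0.500, 0.850, 9.9, 64.33),
    (0.850, 2.000, 3.3, 64.15),
]


def halve(fractions):
    """Upsample: split each fraction in two, keeping the grade (assumption)."""
    fine = []
    for lo, hi, mass, fe in fractions:
        mid = (lo + hi) / 2
        fine += [(lo, mid, mass / 2, fe), (mid, hi, mass / 2, fe)]
    return fine


def downsample(fine, coarse_edges):
    """Sum masses and mass-weight grades of the parts inside each coarse interval."""
    out = []
    for lo, hi in zip(coarse_edges, coarse_edges[1:]):
        members = [f for f in fine if lo <= f[0] and f[1] <= hi]
        mass = sum(f[2] for f in members)
        fe = sum(f[2] * f[3] for f in members) / mass if mass else float("nan")
        out.append((lo, hi, mass, fe))
    return out


edges = sorted({e for f in ORIGINAL for e in f[:2]})
round_trip = downsample(halve(ORIGINAL), edges)
for before, after in zip(ORIGINAL, round_trip):
    assert all(math.isclose(a, b) for a, b in zip(before, after))
print("round trip matches the original fractions")
```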
