Interval Data

This example adds a second dimension. The second dimension is an interval, defined by a pair of columns of the form interval_from, interval_to. Such data is also known as binned data, where each ‘bin’ is bounded by a lower and an upper limit.

Intervals are common in geology, for example when analysing drill hole data, where each sample spans a depth range along the hole.

Intervals are also encountered in metallurgy, but in that discipline they are often called fractions, e.g. size fractions. In that case the typical nomenclature is size_retained, size_passing, since the data originates from a sieve stack.
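As a minimal sketch of what an interval dimension looks like in plain pandas, the snippet below builds a pandas IntervalIndex from a pair of from/to columns. The depth values are copied from the first five rows of the demo data shown later in this example. The closed='left' convention and the length, mid and gap_before column names are assumptions made for illustration only; they are not taken from elphick.geomet.

```python
import pandas as pd

# Depth limits copied from the first five rows of the demo data.
intervals = pd.DataFrame({
    'interval_from': [26.60, 26.85, 27.70, 28.00, 28.60],
    'interval_to': [26.85, 27.10, 28.00, 28.30, 28.95],
})

# Assumption: each bin includes its lower limit and excludes its upper limit.
bins = pd.IntervalIndex.from_arrays(intervals['interval_from'],
                                    intervals['interval_to'],
                                    closed='left')

# Derived properties of each bin (hypothetical column names).
intervals['length'] = bins.length
intervals['mid'] = bins.mid

# Distance between the end of the previous interval and the start of this one.
# A positive value means part of the hole is not covered by a sample.
intervals['gap_before'] = intervals['interval_from'] - intervals['interval_to'].shift()

print(intervals)
```

Intervals do not have to be contiguous: in these five rows there are gaps before the third and fifth intervals.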

import logging

import pandas as pd
import plotly.graph_objects as go
import plotly.io
from matplotlib import pyplot as plt

from elphick.geomet import Sample, IntervalSample
from elphick.geomet.data.downloader import Downloader
from elphick.geomet.utils.pandas import weight_average

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(module)s - %(funcName)s: %(message)s',
                    datefmt='%Y-%m-%dT%H:%M:%S%z',
                    )

Create a Sample object

We get some demo data in the form of a pandas DataFrame. We create this object as 1D, based on the pandas index.

iron_ore_sample_data: pd.DataFrame = Downloader().load_data(datafile='iron_ore_sample_A072391.zip', show_report=False)
df_data: pd.DataFrame = iron_ore_sample_data
df_data.head()

	index	mass_dry	H2O	MgO	Al2O3	P	Fe	SiO2	TiO2	CaO	Na2O	K2O	DHID	interval_from	interval_to
0	6	2.12	0.35	0.07	1.48	0.019	64.30	3.23	0.080	0.04	0.01	0.03	CBS02	26.60	26.85
1	7	2.06	0.23	0.06	1.28	0.017	64.91	2.90	0.082	0.04	0.01	0.03	CBS02	26.85	27.10
2	9	1.91	0.23	0.06	1.01	0.016	65.09	2.39	0.059	0.03	0.01	0.02	CBS02	27.70	28.00
3	10	1.96	0.36	0.06	0.99	0.022	65.03	2.22	0.057	0.04	0.01	0.02	CBS02	28.00	28.30
4	12	2.06	0.40	0.05	0.75	0.016	65.87	1.69	0.040	0.03	0.01	0.01	CBS02	28.60	28.95

obj_mc: Sample = Sample(df_data, name='Drill program')
obj_mc

<elphick.geomet.sample.Sample object at 0x7f36e8970980>

obj_mc.aggregate

	mass_wet	mass_dry	H2O	MgO	MnO	Al2O3	P	Fe	SiO2	TiO2	CaO	Na2O	K2O
0	2029.617808	1981.688	2.361519	0.080513	0.149219	1.773585	0.044628	60.443938	2.82721	0.062978	0.125071	0.015877	0.013164

Use the normal pandas groupby-apply as needed. Here we leverage the weight_average function from elphick.geomet.utils.pandas.

hole_average: pd.DataFrame = obj_mc.data.groupby('DHID').apply(weight_average)
hole_average

/home/runner/work/geometallurgy/geometallurgy/examples/02_interval_sample/01_interval_sample.py:58: DeprecationWarning:

DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.

weight_average	mass_dry	MgO	MnO	Al2O3	P	Fe	SiO2	TiO2	CaO	Na2O	K2O
DHID
CBS02	46.310	0.055636	0.000000	1.029685	0.022764	64.656675	2.589849	0.053834	0.029160	0.010436	0.017296
CBS03	226.250	0.132968	0.038134	1.399500	0.046587	60.297271	3.653106	0.052016	0.699112	0.030318	0.019871
CBS04	344.680	0.088788	0.126484	1.513147	0.038461	60.289083	3.319658	0.050986	0.093096	0.001773	0.011788
CBS10	304.690	0.044836	0.091520	1.963728	0.059062	61.001839	2.949455	0.061461	0.060045	0.014310	0.012509
CBS12	493.968	0.090723	0.247394	1.854824	0.032990	60.572344	2.491186	0.067683	0.045545	0.016941	0.014323
CBS13	565.790	0.066832	0.165065	1.969400	0.051777	59.839565	2.443914	0.072125	0.027301	0.019054	0.010321
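To make the per-hole calculation above more concrete, here is a minimal pandas-only sketch of a mass-weighted average. It assumes that weight_average weights each grade by mass_dry, which is what the output suggests but is not confirmed here. The helper mass_weighted_mean is hypothetical and is not part of elphick.geomet. The data are the five CBS02 rows printed by df_data.head(), so the result will differ from the full CBS02 row in the table above. Selecting the columns explicitly after groupby also avoids the DeprecationWarning shown above.

```python
import pandas as pd


def mass_weighted_mean(group: pd.DataFrame, mass_col: str = 'mass_dry') -> pd.Series:
    """Return the total mass and the mass-weighted mean of every other column.

    Hypothetical helper for illustration; not the library implementation.
    """
    mass = group[mass_col]
    grades = group.drop(columns=[mass_col])
    # Weight each grade by its mass, sum, then divide by the total mass.
    weighted = grades.mul(mass, axis=0).sum() / mass.sum()
    return pd.concat([pd.Series({mass_col: mass.sum()}), weighted])


# First five rows of the demo data (a subset of hole CBS02).
df = pd.DataFrame({
    'DHID': ['CBS02'] * 5,
    'mass_dry': [2.12, 2.06, 1.91, 1.96, 2.06],
    'Fe': [64.30, 64.91, 65.09, 65.03, 65.87],
    'SiO2': [3.23, 2.90, 2.39, 2.22, 1.69],
})

# Selecting the value columns after groupby keeps the grouping column
# out of the function, so pandas does not emit the deprecation warning.
hole_means = df.groupby('DHID')[['mass_dry', 'Fe', 'SiO2']].apply(mass_weighted_mean)
print(hole_means)
```

Mass is summed rather than averaged, while each grade is averaged with mass as the weight, so heavier intervals contribute more to the hole grade.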

We will now make a 2D dataset by indexing on DHID and the interval columns.

df_data['DHID'] = df_data['DHID'].astype('category')
df_data = df_data.reset_index(drop=True).set_index(['DHID', 'interval_from', 'interval_to'])

obj_mc_2d: IntervalSample = IntervalSample(df_data, name='Drill program')
print(obj_mc_2d)

IntervalSample: Drill program
{'mass_wet': {0: 2029.6178076448032}, 'mass_dry': {0: 1981.688}, 'H2O': {0: 2.3615188763258583}, 'MgO': {0: 0.08051321903347046}, 'MnO': {0: 0.14921928174364477}, 'Al2O3': {0: 1.773585095131019}, 'P': {0: 0.044627670955266416}, 'Fe': {0: 60.443937895370006}, 'SiO2': {0: 2.827210176374888}, 'TiO2': {0: 0.06297808534945964}, 'CaO': {0: 0.12507133312610258}, 'Na2O': {0: 0.015876646576050316}, 'K2O': {0: 0.013163565606694896}}

obj_mc_2d.aggregate

	mass_wet	mass_dry	H2O	MgO	MnO	Al2O3	P	Fe	SiO2	TiO2	CaO	Na2O	K2O
0	2029.617808	1981.688	2.361519	0.080513	0.149219	1.773585	0.044628	60.443938	2.82721	0.062978	0.125071	0.015877	0.013164
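The indexing step used to build the 2D dataset can be sketched in pandas alone. The snippet below sets a three-level index and then selects a single hole. The CBS02 rows are copied from the demo data, while the DEMO01 rows are made-up placeholder values, included only so that there is a second hole to select against. DataFrame.xs is used here as a plain pandas way to pull out one hole; it is not what the IntervalSample methods do internally.

```python
import pandas as pd

records = pd.DataFrame({
    # CBS02 rows come from the demo data; DEMO01 rows are hypothetical.
    'DHID': ['CBS02', 'CBS02', 'CBS02', 'DEMO01', 'DEMO01'],
    'interval_from': [26.60, 26.85, 27.70, 10.00, 10.50],
    'interval_to': [26.85, 27.10, 28.00, 10.50, 11.00],
    'mass_dry': [2.12, 2.06, 1.91, 2.00, 2.10],
    'Fe': [64.30, 64.91, 65.09, 60.00, 61.00],
})

# Hole id plus the interval limits together identify each sample.
indexed = records.set_index(['DHID', 'interval_from', 'interval_to'])

# Select one hole; the DHID level is dropped, leaving only the interval levels.
one_hole = indexed.xs('CBS02', level='DHID')
print(one_hole)
```

With the hole id and interval limits in the index, the remaining columns hold only mass and composition.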

obj_mc_2d.data.groupby('DHID').apply(weight_average, **{'mass_wet': 'mass_wet', 'moisture_column_name': 'H2O'})

weight_average	mass_wet	mass_dry	h2o	MgO	MnO	Al2O3	P	Fe	SiO2	TiO2	CaO	Na2O	K2O
DHID
CBS02	46.614043	46.310	0.652257	0.055636	0.000000	1.029685	0.022764	64.656675	2.589849	0.053834	0.029160	0.010436	0.017296
CBS03	229.414089	226.250	1.379204	0.132968	0.038134	1.399500	0.046587	60.297271	3.653106	0.052016	0.699112	0.030318	0.019871
CBS04	347.440438	344.680	0.794507	0.088788	0.126484	1.513147	0.038461	60.289083	3.319658	0.050986	0.093096	0.001773	0.011788
CBS10	306.500146	304.690	0.590586	0.044836	0.091520	1.963728	0.059062	61.001839	2.949455	0.061461	0.060045	0.014310	0.012509
CBS12	506.098042	493.968	2.396777	0.090723	0.247394	1.854824	0.032990	60.572344	2.491186	0.067683	0.045545	0.016941	0.014323
CBS13	593.551050	565.790	4.677112	0.066832	0.165065	1.969400	0.051777	59.839565	2.443914	0.072125	0.027301	0.019054	0.010321
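The moisture values in the table above are consistent with H2O being expressed on a wet basis, that is, the water mass as a percentage of the wet mass. The sketch below checks that relationship using numbers copied from the tables in this example. The function names are hypothetical, and the wet-basis formula is an assumption inferred from the printed values rather than taken from the library source.

```python
def moisture_pct(mass_wet: float, mass_dry: float) -> float:
    """Wet-basis moisture in percent (assumed definition)."""
    return (mass_wet - mass_dry) / mass_wet * 100.0


def wet_mass(mass_dry: float, h2o_pct: float) -> float:
    """Wet mass recovered from dry mass and wet-basis moisture in percent."""
    return mass_dry / (1.0 - h2o_pct / 100.0)


# Whole drill program, from the aggregate table.
h2o_total = moisture_pct(mass_wet=2029.617808, mass_dry=1981.688)

# Hole CBS02, from the per-hole table.
h2o_cbs02 = moisture_pct(mass_wet=46.614043, mass_dry=46.310)

print(round(h2o_total, 4))  # 2.3615, matching the aggregate H2O
print(round(h2o_cbs02, 4))  # 0.6523, matching the CBS02 h2o value

# The inverse relationship recovers the wet mass from dry mass and moisture.
print(round(wet_mass(1981.688, 2.361519), 3))
```

This is why moisture is handled through the wet and dry masses: the water and dry masses are summed first and the moisture is then derived from the totals.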

View some plots

fig: go.Figure = obj_mc_2d.plot_parallel(color='DHID')
plotly.io.show(fig)

obj_mc_2d.query('DHID=="CBS02"').reset_index('DHID').plot_intervals(variables=['mass_dry', 'Fe', 'SiO2', 'Al2O3'],
                                                                    cumulative=False)
