Note
Click here to download the full example code
Interval Dataο
This example adds a second dimension. The second dimension is an interval, of the form interval_from, interval_to. It is also known as binned data, where each βbinβ is bounded between and upper and lower limit.
An interval is relevant in geology, when analysing drill hole data.
Intervals are also encountered in metallurgy, but in that discipline they are often called fractions, e.g. size fractions. In that case the typical nomenclature is size_retained, size passing, since the data originates from a sieve stack.
import logging
import pandas as pd
from matplotlib import pyplot as plt
from elphick.mass_composition import MassComposition
from elphick.mass_composition.datasets.sample_data import iron_ore_sample_data
logging.basicConfig(level=logging.INFO,
format='%(asctime)s %(levelname)s %(module)s - %(funcName)s: %(message)s',
datefmt='%Y-%m-%dT%H:%M:%S%z',
)
Create a MassComposition objectο
We get some demo data in the form of a pandas DataFrame We create this object as 1D based on the pandas index
df_data: pd.DataFrame = iron_ore_sample_data()
df_data.head()
obj_mc: MassComposition = MassComposition(df_data,
name='Drill program',
mass_units='kg')
obj_mc
<elphick.mass_composition.mass_composition.MassComposition object at 0x7f3958f41420>
obj_mc.aggregate()
obj_mc.aggregate('DHID')
We will now make a 2D dataset using DHID and the interval. We will first create a mean interval variable. Then we will set the dataframe index to both variables before constructing the object.
print(df_data.columns)
df_data['DHID'] = df_data['DHID'].astype('category')
# make an int based drillhole identifier
code, dh_id = pd.factorize(df_data['DHID'])
df_data['DH'] = code
df_data = df_data.reset_index().set_index(['DH', 'interval_from', 'interval_to'])
obj_mc_2d: MassComposition = MassComposition(df_data,
name='Drill program',
mass_units='kg')
# obj_mc_2d._data.assign(hole_id=dh_id)
print(obj_mc_2d)
print(obj_mc_2d.aggregate())
print(obj_mc_2d.aggregate('DHID'))
Index(['mass_dry', 'H2O', 'MgO', 'MnO', 'Al2O3', 'P', 'Fe', 'SiO2', 'TiO2',
'CaO', 'Na2O', 'K2O', 'DHID', 'interval_from', 'interval_to'],
dtype='object')
Drill program
<xarray.Dataset> Size: 90kB
Dimensions: (DH: 6, interval: 123)
Coordinates:
* DH (DH) int64 48B 0 1 2 3 4 5
* interval (interval) object 984B [7.5, 7.8) [7.8, 8.2) ... [39.25, 39.6)
Data variables: (12/15)
mass_wet (DH, interval) float64 6kB nan nan nan nan ... 18.26 18.57 17.01
mass_dry (DH, interval) float64 6kB nan nan nan nan ... 17.4 17.76 16.54
H2O (DH, interval) float64 6kB nan nan nan nan ... 4.56 4.72 4.34 2.76
MgO (DH, interval) float64 6kB nan nan nan nan ... 0.06 0.06 0.06 0.05
MnO (DH, interval) float64 6kB nan nan nan nan ... 0.09 0.1 0.08 0.06
Al2O3 (DH, interval) float64 6kB nan nan nan nan ... 1.5 1.55 1.58 1.41
... ...
TiO2 (DH, interval) float64 6kB nan nan nan nan ... 0.057 0.058 0.052
CaO (DH, interval) float64 6kB nan nan nan nan ... 0.03 0.03 0.03 0.03
Na2O (DH, interval) float64 6kB nan nan nan nan ... 0.02 0.02 0.03 0.02
K2O (DH, interval) float64 6kB nan nan nan nan ... 0.01 0.01 0.01 0.01
index (DH, interval) float64 6kB nan nan nan nan ... 460.0 461.0 462.0
DHID (DH, interval) object 6kB nan nan nan ... 'CBS13' 'CBS13' 'CBS13'
Attributes:
mc_name: Drill program
mc_vars_mass: ['mass_wet', 'mass_dry']
mc_vars_chem: ['MgO', 'MnO', 'Al2O3', 'P', 'Fe', 'SiO2', 'TiO2', 'C...
mc_vars_attrs: ['index', 'DHID']
mc_interval_edges: {'interval': {'left': 'from', 'right': 'to'}}
mass_wet mass_dry H2O ... CaO Na2O K2O
name ...
Drill program 2029.617808 1981.688 2.361519 ... 0.125071 0.015877 0.013164
[1 rows x 13 columns]
mass_wet mass_dry H2O ... CaO Na2O K2O
DHID ...
CBS02 46.614043 46.310 0.652257 ... 0.029160 0.010436 0.017296
CBS03 229.414089 226.250 1.379204 ... 0.699112 0.030318 0.019871
CBS04 347.440438 344.680 0.794507 ... 0.093096 0.001773 0.011788
CBS10 306.500146 304.690 0.590586 ... 0.060045 0.014310 0.012509
CBS12 506.098042 493.968 2.396777 ... 0.045545 0.016941 0.014323
CBS13 593.551050 565.790 4.677112 ... 0.027301 0.019054 0.010321
[6 rows x 13 columns]
View some plots
First confirm the parallel plot still works
# TODO: work on the display order
# TODO - fails for DH (integer)
# fig: Figure = obj_mc_2d.plot_parallel(color='Fe')
# fig.show()
# now plot using the xarray data - take advantage of the multi-dim nature of the package
obj_mc_2d.data['Fe'].plot()
plt.show()
Total running time of the script: ( 0 minutes 0.961 seconds)