Datasets and Sample Data

TLDR

Datasets are sourced from the public domain and are largely unaltered.

Sample Data is intended for use in mass-composition examples and is often derived from a dataset with some transformation applied.

Why two modules?

We want open, real data so that our examples are realistic. We are obliged to reference our sources appropriately, so each original dataset is kept largely as it was sourced, with any minor adjustments noted.

The sample_data module contains methods that often load from the dataset module and apply transformations to prepare data ready for injection into the package. This keeps examples simple.

This approach retains the integrity of the original datasets, but creates sample_data that simplifies examples.

The Dataset Register can be found here.

import pandas as pd

from elphick.mass_composition.datasets import datasets
from elphick.mass_composition.datasets import sample_data

Datasets

We load some datasets. A hash check is performed before the file is downloaded, which avoids repeated downloads unless the source file has been updated.

df_ds1: pd.DataFrame = datasets.load_size_by_assay()
df_ds1

	size_retained	size_passing	mass_pct	fe	sio2	al2o3
0	0.850	2.000	3.3	64.15	2.04	2.68
1	0.500	0.850	9.9	64.33	2.05	2.23
2	0.150	0.500	26.5	64.52	1.84	2.19
3	0.075	0.150	2.5	62.65	2.88	3.32
4	0.045	0.075	8.8	62.81	2.12	2.25
5	0.000	0.045	49.0	55.95	6.39	6.34

When calling this method, you can view the ‘profile report’ for the dataset by setting the show_report argument to True.

df_ds1: pd.DataFrame = datasets.load_size_by_assay(show_report=True)
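
The hash check mentioned at the start of this section can be illustrated with a minimal sketch. This is not the package's own implementation, which is not shown on this page; the helper names file_sha256 and needs_download are hypothetical, and SHA-256 is assumed purely for illustration. The idea is that a file is fetched only when the local copy is missing or its hash differs from the registered one.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path: Path, expected_sha256: str) -> bool:
    """Hypothetical helper: True when the local copy is missing or stale."""
    return not path.exists() or file_sha256(path) != expected_sha256
```

A caller would download the file only when needs_download returns True, and otherwise reuse the cached copy.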

Sample Data

We load some sample data. The method called here uses the file downloaded in the example above, with some minor changes applied to simplify instantiation of a MassComposition object.

df_sd1: pd.DataFrame = sample_data.size_by_assay()
df_sd1

		mass_dry	fe	sio2	al2o3
size_retained	size_passing
0.850	2.000	3.3	64.15	2.04	2.68
0.500	0.850	9.9	64.33	2.05	2.23
0.150	0.500	26.5	64.52	1.84	2.19
0.075	0.150	2.5	62.65	2.88	3.32
0.045	0.075	8.8	62.81	2.12	2.25
0.000	0.045	49.0	55.95	6.39	6.34
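
Comparing the two tables on this page, the sample data differs from the dataset in two visible ways: mass_pct appears as mass_dry, and the two size columns form the index. The sketch below reproduces that difference with plain pandas on the first two rows, typed in by hand. It is an assumption about the kind of transformation involved, not the actual body of sample_data.size_by_assay().

```python
import pandas as pd

# First two rows of the dataset shown above, entered by hand for illustration.
df_raw = pd.DataFrame(
    {
        "size_retained": [0.850, 0.500],
        "size_passing": [2.000, 0.850],
        "mass_pct": [3.3, 9.9],
        "fe": [64.15, 64.33],
        "sio2": [2.04, 2.05],
        "al2o3": [2.68, 2.23],
    }
)

# Assumed transformation: rename the mass column and index by the size interval.
df_sample = (
    df_raw
    .rename(columns={"mass_pct": "mass_dry"})
    .set_index(["size_retained", "size_passing"])
)
```

The result has the same column layout as df_sd1 above, with the size fraction bounds held in a two-level index.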

