Datasets and Sample Data

TLDR

Datasets are sourced from the public domain and are largely unaltered.

Sample Data is for use in mass-composition examples, often sourced from a dataset with some transformation applied.

Why two modules?

We want open, real data so that our examples are realistic. We are obliged to reference sources appropriately, so each original dataset is kept largely as it was sourced, with any minor adjustments noted.

The sample_data module contains methods that typically load from the datasets module and apply transformations to prepare the data for use in the package. This keeps examples simple.

This approach retains the integrity of the original datasets, but creates sample_data that simplifies examples.

The Dataset Register can be found here.

import pandas as pd

from elphick.mass_composition.datasets import datasets
from elphick.mass_composition.datasets import sample_data

Datasets

We load some datasets. A hash check is performed before the file is downloaded, which avoids repeated downloads unless the source file has been updated.

df_ds1: pd.DataFrame = datasets.load_size_by_assay()
df_ds1
   size_retained  size_passing  mass_pct     fe  sio2  al2o3
0          0.850         2.000       3.3  64.15  2.04   2.68
1          0.500         0.850       9.9  64.33  2.05   2.23
2          0.150         0.500      26.5  64.52  1.84   2.19
3          0.075         0.150       2.5  62.65  2.88   3.32
4          0.045         0.075       8.8  62.81  2.12   2.25
5          0.000         0.045      49.0  55.95  6.39   6.34
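
The hash check mentioned above can be pictured with a small standard-library sketch. This is not the package's implementation: the helper names sha256_of and needs_download are hypothetical, and SHA-256 is an assumed choice of hash. The idea is only that a local file is re-fetched when it is missing or when its hash no longer matches the expected value.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path: Path, expected_sha256: str) -> bool:
    """Hypothetical helper: True if the file is absent or its hash differs."""
    if not local_path.exists():
        return True
    return sha256_of(local_path) != expected_sha256
```

A loader built this way would call needs_download first and only fetch the file when it returns True, so an unchanged source file is never downloaded twice.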


When executing this method, you can view the ‘profile report’ for the dataset by setting the show_report argument to True.

df_ds1: pd.DataFrame = datasets.load_size_by_assay(show_report=True)

Sample Data

We load some sample data. The method called here uses the file downloaded in the example above. Some minor changes are applied to the data to simplify instantiation of a MassComposition object.

df_sd1: pd.DataFrame = sample_data.size_by_assay()
df_sd1
                            mass_dry     fe  sio2  al2o3
size_retained size_passing
0.850         2.000              3.3  64.15  2.04   2.68
0.500         0.850              9.9  64.33  2.05   2.23
0.150         0.500             26.5  64.52  1.84   2.19
0.075         0.150              2.5  62.65  2.88   3.32
0.045         0.075              8.8  62.81  2.12   2.25
0.000         0.045             49.0  55.95  6.39   6.34
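
Comparing the two outputs suggests what the transformation might look like: the mass_pct column appears as mass_dry, and the two size columns become the index. The sketch below reproduces that result with plain pandas, using the values printed above. It is an assumption based on the displayed frames, not the actual body of sample_data.size_by_assay, which may do more or do it differently.

```python
import pandas as pd

# Values copied from the load_size_by_assay() output shown earlier.
df_ds = pd.DataFrame(
    {
        "size_retained": [0.850, 0.500, 0.150, 0.075, 0.045, 0.000],
        "size_passing": [2.000, 0.850, 0.500, 0.150, 0.075, 0.045],
        "mass_pct": [3.3, 9.9, 26.5, 2.5, 8.8, 49.0],
        "fe": [64.15, 64.33, 64.52, 62.65, 62.81, 55.95],
        "sio2": [2.04, 2.05, 1.84, 2.88, 2.12, 6.39],
        "al2o3": [2.68, 2.23, 2.19, 3.32, 2.25, 6.34],
    }
)

# Assumed transformation: rename the mass column and index by the size interval.
df_sd = df_ds.rename(columns={"mass_pct": "mass_dry"}).set_index(
    ["size_retained", "size_passing"]
)
print(df_sd)
```

Keeping this step in sample_data rather than in datasets leaves the sourced data untouched while the examples receive a frame that is ready to use.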

