.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/504_dag_with_estimator.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_504_dag_with_estimator.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_504_dag_with_estimator.py:


DAG with Estimator
===================

A Flowsheet can be used to apply an estimator within a process flowsheet.

This example demonstrates how to use a DAG to define a flowsheet that applies a lump estimator to a feed stream.
The focus is on the simulation rather than on model development.
The model is a simple RandomForest regressor that predicts the lump mass and composition from the feed stream.

.. note::
    This example uses the `estimator` extras. Ensure they are installed, for example with
    ``poetry install -E estimator``.

.. GENERATED FROM PYTHON SOURCE LINES 15-32

.. code-block:: default

    import logging

    # This import is at the top to guard against the estimator extras not being installed
    from elphick.mass_composition.utils.sklearn import PandasPipeline

    import pandas as pd
    import plotly
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    from elphick.mass_composition import MassComposition, Stream
    from elphick.mass_composition.dag import DAG
    from elphick.mass_composition.datasets.sample_data import iron_ore_met_sample_data
    from elphick.mass_composition.flowsheet import Flowsheet

.. GENERATED FROM PYTHON SOURCE LINES 33-36

.. code-block:: default

    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    logger = logging.getLogger(__name__)

.. GENERATED FROM PYTHON SOURCE LINES 37-43

Load Data
---------

We load some metallurgical data from a drill program, REF: A072391.

Since we are not concerned with model performance in this example, we convert the categorical
feature bulk_hole_no to an integer.

.. GENERATED FROM PYTHON SOURCE LINES 43-57

.. code-block:: default

    df: pd.DataFrame = iron_ore_met_sample_data()

    base_components = ['fe', 'p', 'sio2', 'al2o3', 'loi']
    cols_x = ['dry_weight_lump_kg'] + [f'head_{comp}' for comp in base_components] + ['bulk_hole_no']
    cols_y = ['lump_pct'] + [f'lump_{comp}' for comp in base_components]

    df = df.loc[:, cols_x + cols_y].query('lump_pct>0').dropna(how='any')
    df = df.rename(columns={'dry_weight_lump_kg': 'head_mass_dry'})
    df['bulk_hole_no'] = df['bulk_hole_no'].astype('category').cat.codes

    logger.info(df.shape)
    df.head()

============= ============= ======= ====== ========= ========== ======== ============ ======== ======= ====== ========= ========== ========
sample_number head_mass_dry head_fe head_p head_sio2 head_al2o3 head_loi bulk_hole_no lump_pct lump_fe lump_p lump_sio2 lump_al2o3 lump_loi
============= ============= ======= ====== ========= ========== ======== ============ ======== ======= ====== ========= ========== ========
30129         0.31          62.94   0.02   3.71      1.51       4.12     0            21.3     64.95   0.014  2.29      0.87       3.83
30131         0.52          64.79   0.02   2.88      1.29       2.80     0            22.4     66.48   0.009  1.67      0.65       2.46
30132         0.41          65.22   0.02   2.64      1.15       2.61     0            16.6     66.78   0.009  1.34      0.47       2.67
30133         0.32          64.67   0.02   2.85      1.17       3.00     0            19.6     66.23   0.011  1.56      0.62       2.74
30134         0.31          65.29   0.02   2.25      0.94       2.93     0            13.9     66.53   0.011  1.41      0.54       2.91
============= ============= ======= ====== ========= ========== ======== ============ ======== ======= ====== ========= ========== ========
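
The integer encoding of ``bulk_hole_no`` can be illustrated on a small made-up frame.
This is a minimal sketch: the hole labels below are hypothetical, and only the
``astype('category').cat.codes`` call is taken from the example. The codes follow the sorted
order of the unique labels and carry no physical meaning.

.. code-block:: python

    import pandas as pd

    # Hypothetical hole identifiers; the values in the real dataset are not shown here.
    demo = pd.DataFrame({'bulk_hole_no': ['BH02', 'BH01', 'BH02', 'BH03']})

    # Categories are sorted, so each code is the position of the label in sorted order.
    demo['bulk_hole_code'] = demo['bulk_hole_no'].astype('category').cat.codes

    codes = demo['bulk_hole_code'].tolist()
    print(codes)

Such codes are arbitrary labels, which is acceptable here only because model performance is not
the point of the example.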


.. GENERATED FROM PYTHON SOURCE LINES 58-60

Build a model
-------------

.. GENERATED FROM PYTHON SOURCE LINES 60-66

.. code-block:: default

    X: pd.DataFrame = df[[col for col in df.columns if col not in cols_y]]
    y: pd.DataFrame = df[cols_y]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

.. GENERATED FROM PYTHON SOURCE LINES 67-68

The model needs to be wrapped in a PandasPipeline object to ensure that the column names are preserved.

.. GENERATED FROM PYTHON SOURCE LINES 68-74

.. code-block:: default

    pipe: PandasPipeline = PandasPipeline.from_pipeline(
        make_pipeline(StandardScaler(), RandomForestRegressor(n_estimators=100, random_state=42)))

    pipe

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    PandasPipeline(steps=[('standardscaler', StandardScaler()),
                          ('randomforestregressor',
                           RandomForestRegressor(random_state=42))])

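
How ``PandasPipeline`` preserves the column names is not shown in this example. As a rough
illustration of the underlying issue only, the sketch below uses plain scikit-learn on made-up
data: ``predict`` returns a bare NumPy array, and a hypothetical helper ``predict_as_frame``
(not part of any library) restores the index and target names. A linear model stands in for the
random forest to keep the sketch small.

.. code-block:: python

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler


    def predict_as_frame(model, X: pd.DataFrame, target_names) -> pd.DataFrame:
        """Hypothetical helper: wrap the array returned by predict() in a DataFrame."""
        return pd.DataFrame(model.predict(X), index=X.index, columns=list(target_names))


    # Tiny made-up data set with two features and two targets.
    X_demo = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0], 'b': [0.5, 0.1, 0.9, 0.3]},
                          index=pd.Index([10, 11, 12, 13], name='sample_number'))
    y_demo = pd.DataFrame({'t1': 2 * X_demo['a'], 't2': X_demo['a'] + X_demo['b']})

    plain = make_pipeline(StandardScaler(), LinearRegression()).fit(X_demo, y_demo)

    raw = plain.predict(X_demo)  # NumPy array: index and column names are lost
    framed = predict_as_frame(plain, X_demo, y_demo.columns)  # names restored

    print(type(raw).__name__, raw.shape)
    print(list(framed.columns), framed.index.name)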


.. GENERATED FROM PYTHON SOURCE LINES 75-79

Test the model
--------------

The model can be called directly to predict the lump percentage and composition from the feed stream.
We will pass in a dataframe with the same columns as the training data.

.. GENERATED FROM PYTHON SOURCE LINES 79-84

.. code-block:: default

    y_pred = pipe.fit(X_train.drop(columns=['head_mass_dry']), y_train).predict(X_test)
    logger.info(f'Test score: {pipe.score(X_test, y_test)}')
    y_pred.head()

============= ======== ======= ======= ========= ========== ========
sample_number lump_pct lump_fe lump_p  lump_sio2 lump_al2o3 lump_loi
============= ======== ======= ======= ========= ========== ========
30838         21.150   62.1259 0.03646 1.5286    0.9012     8.3704
30836         17.511   61.8068 0.04462 1.6825    0.9777     8.5672
30143         20.425   64.9562 0.01953 2.0668    0.6913     4.1378
30158         18.170   58.1892 0.03722 8.7937    0.9085     6.5543
30915         22.386   62.5843 0.03022 1.4852    0.9723     7.6845
============= ======== ======= ======= ========= ========== ========


.. GENERATED FROM PYTHON SOURCE LINES 85-88

Create a Head MassComposition object
------------------------------------

Now we will create a MassComposition object and use it to apply the model to the feed stream.

.. GENERATED FROM PYTHON SOURCE LINES 88-96

.. code-block:: default

    head: MassComposition = MassComposition(data=X_test.drop(columns=['bulk_hole_no']), name='head',
                                            mass_dry_var='head_mass_dry')
    lump, fines = head.split_by_estimator(estimator=pipe, name_2='fines',
                                          mass_recovery_column='lump_pct', mass_recovery_max=100,
                                          extra_features=X_test['bulk_hole_no'])
    lump.data.to_dataframe().head()

============= ======== ======== === ======= ======= ====== ====== ======
sample_number mass_wet mass_dry H2O Fe      P       SiO2   Al2O3  LOI
============= ======== ======== === ======= ======= ====== ====== ======
30838         0.300330 0.300330 0.0 62.1259 0.03646 1.5286 0.9012 8.3704
30836         0.227643 0.227643 0.0 61.8068 0.04462 1.6825 0.9777 8.5672
30143         0.085785 0.085785 0.0 64.9562 0.01953 2.0668 0.6913 4.1378
30158         0.101752 0.101752 0.0 58.1892 0.03722 8.7937 0.9085 6.5543
30915         0.385039 0.385039 0.0 62.5843 0.03022 1.4852 0.9723 7.6845
============= ======== ======== === ======= ======= ====== ====== ======


.. GENERATED FROM PYTHON SOURCE LINES 97-99

.. code-block:: default

    fines.data.to_dataframe().head()

============= ======== ======== === ========= ======== ======== ======== ========
sample_number mass_wet mass_dry H2O Fe        P        SiO2     Al2O3    LOI
============= ======== ======== === ========= ======== ======== ======== ========
30838         1.119670 1.119670 0.0 60.672634 0.053632 1.974256 1.166007 9.092784
30836         1.072357 1.072357 0.0 60.416554 0.051142 2.309856 1.501770 8.752437
30143         0.334215 0.334215 0.0 63.892801 0.032687 2.975628 1.644740 3.424259
30158         0.458248 0.458248 0.0 58.788980 0.040617 8.092612 1.643560 5.669172
30915         1.334961 1.334961 0.0 60.800730 0.042821 2.135598 1.948541 8.155420
============= ======== ======== === ========= ======== ======== ======== ========
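
For sample 30838 the lump and fines tables above are consistent with a simple split by mass
recovery: the lump and fines dry masses sum to 1.42, and 1.42 x 21.150 / 100 gives the lump
mass of 0.30033. The sketch below uses a hypothetical helper, ``split_by_recovery``, to
reproduce those masses and to recover the fines Fe grade by a component mass balance. It is an
illustration under stated assumptions, not the implementation of ``split_by_estimator``; the
head Fe of 60.98 is back-calculated from the two tables rather than read from the source data.

.. code-block:: python

    def split_by_recovery(head_mass, head_grade, recovery_pct, lump_grade):
        """Hypothetical helper: split one sample by mass recovery given in percent."""
        lump_mass = head_mass * recovery_pct / 100.0
        fines_mass = head_mass - lump_mass
        # Component balance: component mass in fines = component mass in head minus lump.
        fines_grade = (head_mass * head_grade - lump_mass * lump_grade) / fines_mass
        return lump_mass, fines_mass, fines_grade


    # Sample 30838: head mass is lump + fines from the tables above; head Fe is an
    # assumption, back-calculated from the same tables.
    lump_mass, fines_mass, fines_fe = split_by_recovery(
        head_mass=1.42, head_grade=60.98, recovery_pct=21.150, lump_grade=62.1259)

    print(round(lump_mass, 6), round(fines_mass, 6), round(fines_fe, 4))

The computed masses and fines Fe grade agree with the 30838 rows of the lump and fines tables to
about four decimal places.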


.. GENERATED FROM PYTHON SOURCE LINES 100-107

Define the DAG
--------------

First we define a simple DAG, where the feed stream is split into two streams, lump and fines.
The lump estimator requires the usual mass-composition variables plus an additional feature/variable
called `bulk_hole_no`. Since `bulk_hole_no` is available in the feed stream, it is immediately
accessible to the estimator.

.. GENERATED FROM PYTHON SOURCE LINES 107-123

.. code-block:: default

    head: MassComposition = MassComposition(data=X_test, name='head', mass_dry_var='head_mass_dry')

    dag = DAG(name='A072391', n_jobs=1)
    dag.add_input(name='head')
    dag.add_step(name='screen', operation=Stream.split_by_estimator, streams=['head'],
                 kwargs={'estimator': pipe, 'name_1': 'lump', 'name_2': 'fines',
                         'mass_recovery_column': 'lump_pct', 'mass_recovery_max': 100})
    dag.add_output(name='lump', stream='lump')
    dag.add_output(name='fines', stream='fines')
    dag.run(input_streams={'head': head}, progress_bar=True)

    fig = Flowsheet.from_dag(dag).plot_network()
    fig

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Executing nodes: 0%| | 0/4 [00:00
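
The idea of running nodes in dependency order can be sketched with the standard library alone.
This is not how the ``DAG`` class is implemented, which is not shown here; the node names simply
mirror the simple flowsheet above, and ``graphlib`` requires Python 3.9 or later.

.. code-block:: python

    from graphlib import TopologicalSorter

    # Each node maps to the set of nodes it depends on. Illustrative only: these names
    # mirror the simple flowsheet, not the internals of the DAG class.
    dependencies = {
        'head': set(),        # input node
        'screen': {'head'},   # the split step consumes the head stream
        'lump': {'screen'},   # both outputs are produced by the split
        'fines': {'screen'},
    }

    # static_order() yields every node after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    print(order)

The sketch has four nodes (one input, one step, two outputs), which matches the node count of 4
reported by the progress bar above.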


.. GENERATED FROM PYTHON SOURCE LINES 124-134

More Complex DAG
----------------

This DAG tests a more complex flowsheet, where the estimator may not have all of its features
immediately available in the parent stream.

.. note::
   This example works only because, in the current design, all attribute (extra) variables are
   passed all the way around the network. This will change in the future to allow for more
   efficient processing. Once attributes are no longer passed, the DAG will need to change to
   marshal features from other streams in the network (most often the input stream).

.. GENERATED FROM PYTHON SOURCE LINES 134-154

.. code-block:: default

    dag = DAG(name='A072391', n_jobs=1)
    dag.add_input(name='head')
    dag.add_step(name='screen', operation=Stream.split_by_estimator, streams=['head'],
                 kwargs={'estimator': pipe, 'name_1': 'lump', 'name_2': 'fines',
                         'mass_recovery_column': 'lump_pct', 'mass_recovery_max': 100})
    dag.add_step(name='screen_2', operation=Stream.split_by_estimator, streams=['fines'],
                 kwargs={'estimator': pipe, 'name_1': 'lump_2', 'name_2': 'fines_2',
                         'mass_recovery_column': 'lump_pct', 'mass_recovery_max': 100,
                         'allow_prefix_mismatch': True})
    dag.add_output(name='lump', stream='lump_2')
    dag.add_output(name='fines', stream='fines_2')
    dag.add_output(name='stockpile', stream='lump')
    dag.run(input_streams={'head': head}, progress_bar=True)

    fs: Flowsheet = Flowsheet.from_dag(dag)

    fig = fs.plot_network()
    fig

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Executing nodes: 0%| | 0/6 [00:00


.. GENERATED FROM PYTHON SOURCE LINES 155-159

.. code-block:: default

    fig = fs.table_plot(plot_type='sankey', sankey_color_var='Fe', sankey_edge_colormap='copper_r',
                        sankey_vmin=52, sankey_vmax=70)
    plotly.io.show(fig)

.. raw:: html
    :file: images/sphx_glr_504_dag_with_estimator_001.html

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 2.498 seconds)

.. _sphx_glr_download_auto_examples_504_dag_with_estimator.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 504_dag_with_estimator.py <504_dag_with_estimator.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 504_dag_with_estimator.ipynb <504_dag_with_estimator.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_