elphick.sklearn_viz.model_selection.model_selection.ModelSelection

class elphick.sklearn_viz.model_selection.model_selection.ModelSelection(estimators, datasets, target, pre_processor=None, k_folds=10, scorer=None, metrics=None, group=None, random_state=None, n_jobs=1, verbosity=1)
__init__(estimators, datasets, target, pre_processor=None, k_folds=10, scorer=None, metrics=None, group=None, random_state=None, n_jobs=1, verbosity=1)
Parameters:
  • estimators (Union[BaseEstimator, Dict[str, BaseEstimator]]) – sklearn estimator or a Dict of algorithms to cross-validate, keyed by string name/code.

  • datasets (Union[DataFrame, Dict[str, DataFrame]]) – pandas DataFrame or a dict of DataFrames, keyed by string name/code.

  • target (str) – target column

  • pre_processor (Optional[Pipeline]) – Optional pipeline used to pre-process the datasets.

  • k_folds (int) – The number of cross validation folds.

  • scorer (Union[str, Callable, None]) – Optional scorer, given as a string name or a callable, with which the model will be fitted.

  • metrics (Optional[Dict[str, Callable]]) – Optional Dict of callable metrics to calculate post-fitting

  • group (Optional[Series]) – Optional group variable by which to partition/group metrics. The same group applies across all datasets, so it is most useful when comparing different algorithms.

  • random_state (Optional[int]) – Optional random seed

  • n_jobs (Union[int, str]) – Number of parallel jobs to run. If -1, the number of jobs is set to the number of CPU cores. Setting -2 is recommended for large jobs, as it leaves one core free for system interaction.

  • verbosity (int) – Verbosity level. 0 = silent, 1 = overall (start/finish), 2 = each cross-validation.
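
As a rough illustration of the n_jobs convention described above, the sketch below resolves an integer value into a worker count. It assumes the joblib-style rule in which negative values count back from the number of cores (-1 is all cores, -2 is all but one). resolve_n_jobs is a hypothetical helper written for this example, not part of ModelSelection, and string values of n_jobs are not covered.

```python
import os
from typing import Optional


def resolve_n_jobs(n_jobs: int, n_cores: Optional[int] = None) -> int:
    """Translate a joblib-style n_jobs value into a number of workers.

    Hypothetical helper for illustration only; the library's own
    handling of n_jobs may differ.
    """
    if n_cores is None:
        # os.cpu_count() can return None on some platforms.
        n_cores = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs must be a non-zero integer")
    if n_jobs < 0:
        # -1 -> all cores, -2 -> all but one, ... but never fewer than 1.
        return max(n_cores + 1 + n_jobs, 1)
    # Positive values are taken literally.
    return n_jobs


# On an 8-core machine, -2 leaves one core free for the system.
workers = resolve_n_jobs(-2, n_cores=8)
```

Passing n_cores explicitly keeps the example deterministic; in practice it would be left as None so that the machine's core count is detected.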

Methods

__init__(estimators, datasets, target[, ...])

calculate_metrics(x, y, estimators, indices, ...)

Returns Tuple[Dict, Dict]

cross_validate_task(data_key, data, ...)

get_cv_metrics(metrics[, by_group])

Returns DataFrame

get_cv_scores()

Returns DataFrame

get_model_by_group_data(estimator, dataset)

Returns tuple[DataFrame, DataFrame]

plot([metrics, show_group, title, col_wrap])

Create the plot

plot_category_analysis([algorithm, dataset, ...])

Plot the category feature analysis

Attributes

n_cores

results

plot(metrics=None, show_group=False, title=None, col_wrap=None)

Create the plot

The plot shows the cross-validation scores for each algorithm and dataset. The first panel shows the scorer, that is, the metric used to fit the model. If additional metrics are supplied, each metric is shown in a separate panel. If show_group is True, the metrics are grouped by the group variable. col_wrap controls the width of the plot by wrapping the columns onto new rows.

KUDOS: https://towardsdatascience.com/applying-a-custom-colormap-with-plotly-boxplots-5d3acf59e193

Parameters:
  • metrics (Union[str, List[str], None]) – The metric or metrics to plot in addition to the scorer. Each metric will be plotted in a separate panel.

  • show_group (bool) – If True (and a group variable has been set), plot by group.

  • title (Optional[str]) – Title of the plot

  • col_wrap (Optional[int]) – If plotting multiple metrics, col_wrap wraps the columns onto new rows, resulting in col_wrap columns and multiple rows.

Return type:

Figure

Returns:

A plotly graph_objects.Figure
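
To make the effect of col_wrap concrete, the sketch below computes the grid position of each panel. It assumes that panels fill rows left to right, that positions are 1-based, and that col_wrap=None keeps all panels on a single row. panel_positions is a hypothetical helper for illustration and is not taken from the library.

```python
from typing import List, Optional, Tuple


def panel_positions(n_panels: int, col_wrap: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return a 1-based (row, col) position for each panel.

    Hypothetical helper: panels fill each row left to right and wrap
    onto a new row after col_wrap columns.
    """
    if col_wrap is not None and col_wrap < 1:
        raise ValueError("col_wrap must be a positive integer")
    # Without col_wrap, every panel sits in its own column on one row.
    n_cols = n_panels if col_wrap is None else min(col_wrap, n_panels)
    return [(i // n_cols + 1, i % n_cols + 1) for i in range(n_panels)]


# Five panels (the scorer plus four metrics) wrapped at two columns
# occupy three rows, with the last row only half filled.
positions = panel_positions(5, col_wrap=2)
```

Under these assumptions, the number of rows is the number of panels divided by col_wrap, rounded up.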

plot_category_analysis(algorithm=None, dataset=None, metrics=None, title=None, col_wrap=None)

Plot the category feature analysis

Parameters:
  • algorithm (Optional[str]) – If supplied, this will be the name of the algorithm tested. If None the first algorithm is used.

  • dataset (Optional[str]) – If supplied, this will be the name of the dataset tested. If None the first dataset is used.

  • metrics (Union[str, List[str], None]) – The metric or metrics to show. Each metric will be plotted in a separate panel.

  • title (Optional[str]) – Title of the plot

  • col_wrap (Optional[int]) – If plotting multiple metrics, col_wrap wraps the columns onto new rows, resulting in col_wrap columns and multiple rows.

Return type:

Figure

Returns:

A plotly graph_objects.Figure