Parallel Coordinates

Parallel coordinate plots are useful for Exploratory Data Analysis (EDA) because they show many variables of a dataset side by side in a single view.
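In a parallel coordinates plot, each variable gets its own vertical axis and each record becomes a polyline crossing all of them. The following is a minimal pure-Python sketch of that geometry, assuming each axis is min-max scaled independently (a common default). The function name `to_polylines` and the toy records are illustrative assumptions, not part of `elphick.sklearn_viz` or plotly.

```python
def to_polylines(records, columns):
    """Min-max scale each column to [0, 1] and return one polyline per record.

    Each polyline is a list of (axis_index, scaled_value) points.
    """
    lows = {c: min(r[c] for r in records) for c in columns}
    highs = {c: max(r[c] for r in records) for c in columns}
    polylines = []
    for r in records:
        line = []
        for i, c in enumerate(columns):
            span = highs[c] - lows[c]
            # A constant column has no spread; place it mid-axis.
            scaled = 0.5 if span == 0 else (r[c] - lows[c]) / span
            line.append((i, scaled))
        polylines.append(line)
    return polylines


# Toy records, invented purely for illustration.
records = [
    {"a": 1.0, "b": 10.0, "target": 0},
    {"a": 2.0, "b": 30.0, "target": 1},
    {"a": 3.0, "b": 20.0, "target": 1},
]
polylines = to_polylines(records, ["a", "b", "target"])
```

Because every axis is scaled on its own, variables with very different units (such as `proline` and `hue` in the wine data) can share one plot.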

Typically the lines are colored by the target variable, since it is the variable of most interest, though this is optional.

The interactive nature of plotly is a real asset for this particular plot. Records/samples can be highlighted by clicking and dragging the mouse vertically along the axis of a variable (feature or target). Multiple selections are possible. Single-clicking a selection removes it.

import pandas as pd
import plotly.io as pio
from sklearn.datasets import load_diabetes, load_wine

from elphick.sklearn_viz.features import plot_parallel_coordinates

Load Classification Data

wine = load_wine(as_frame=True)
X, y = wine.data, wine.target.rename('target')
df = pd.concat([X, y], axis=1)
df

	alcohol	malic_acid	ash	alcalinity_of_ash	magnesium	total_phenols	flavanoids	nonflavanoid_phenols	proanthocyanins	color_intensity	hue	od280/od315_of_diluted_wines	proline	target
0	14.23	1.71	2.43	15.6	127.0	2.80	3.06	0.28	2.29	5.64	1.04	3.92	1065.0	0
1	13.20	1.78	2.14	11.2	100.0	2.65	2.76	0.26	1.28	4.38	1.05	3.40	1050.0	0
2	13.16	2.36	2.67	18.6	101.0	2.80	3.24	0.30	2.81	5.68	1.03	3.17	1185.0	0
3	14.37	1.95	2.50	16.8	113.0	3.85	3.49	0.24	2.18	7.80	0.86	3.45	1480.0	0
4	13.24	2.59	2.87	21.0	118.0	2.80	2.69	0.39	1.82	4.32	1.04	2.93	735.0	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
173	13.71	5.65	2.45	20.5	95.0	1.68	0.61	0.52	1.06	7.70	0.64	1.74	740.0	2
174	13.40	3.91	2.48	23.0	102.0	1.80	0.75	0.43	1.41	7.30	0.70	1.56	750.0	2
175	13.27	4.28	2.26	20.0	120.0	1.59	0.69	0.43	1.35	10.20	0.59	1.56	835.0	2
176	13.17	2.59	2.37	20.0	120.0	1.65	0.68	0.53	1.46	9.30	0.60	1.62	840.0	2
177	14.13	4.10	2.74	24.5	96.0	2.05	0.76	0.56	1.35	9.20	0.61	1.60	560.0	2

178 rows × 14 columns

Plot Classification Data

fig = plot_parallel_coordinates(df, color=y.name)
# noinspection PyTypeChecker
pio.show(fig)

Coloring by the target is optional. If the plot is too dense, consider plotting a random sample of the records, as demonstrated here.

fig = plot_parallel_coordinates(df.sample(frac=0.5))
fig
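Note that `df.sample(frac=0.5)` draws different rows on every run. The sketch below shows two pandas options that may help: `random_state` for a repeatable draw, and sampling within each class so the class proportions are preserved. The small frame is a toy stand-in with invented values, not the wine data.

```python
import pandas as pd

# Toy frame standing in for the wine data; values are invented.
df = pd.DataFrame({
    "feature": range(12),
    "target": [0] * 6 + [1] * 4 + [2] * 2,
})

# random_state makes the draw repeatable between runs.
half = df.sample(frac=0.5, random_state=42)

# Sampling within each class keeps the class proportions of the full data.
stratified = df.groupby("target").sample(frac=0.5, random_state=42)
```

Either result can be passed to `plot_parallel_coordinates` in place of the full frame.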

Load Regression Data

diabetes = load_diabetes(as_frame=True, scaled=False)
X, y = diabetes.data, diabetes.target.rename('target')
df = pd.concat([X, y], axis=1)
df

	age	sex	bmi	bp	s1	s2	s3	s4	s5	s6	target
0	59.0	2.0	32.1	101.00	157.0	93.2	38.0	4.00	4.8598	87.0	151.0
1	48.0	1.0	21.6	87.00	183.0	103.2	70.0	3.00	3.8918	69.0	75.0
2	72.0	2.0	30.5	93.00	156.0	93.6	41.0	4.00	4.6728	85.0	141.0
3	24.0	1.0	25.3	84.00	198.0	131.4	40.0	5.00	4.8903	89.0	206.0
4	50.0	1.0	23.0	101.00	192.0	125.4	52.0	4.00	4.2905	80.0	135.0
...	...	...	...	...	...	...	...	...	...	...	...
437	60.0	2.0	28.2	112.00	185.0	113.8	42.0	4.00	4.9836	93.0	178.0
438	47.0	2.0	24.9	75.00	225.0	166.0	42.0	5.00	4.4427	102.0	104.0
439	60.0	2.0	24.9	99.67	162.0	106.6	43.0	3.77	4.1271	95.0	132.0
440	36.0	1.0	30.0	95.00	201.0	125.2	42.0	4.79	5.1299	85.0	220.0
441	36.0	1.0	19.6	71.00	250.0	133.2	97.0	3.00	4.5951	92.0	57.0

442 rows × 11 columns

Plot Regression Data

fig = plot_parallel_coordinates(df, color=y.name)
fig

Categorical data is supported

df['sex'] = df['sex'].map({1: 'Male', 2: 'Female'}).astype('category')
fig = plot_parallel_coordinates(df.sample(frac=0.5), color=y.name)
fig
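Parallel axes are numeric, so a categorical column has to be placed on one somehow. How `plot_parallel_coordinates` does this internally is not shown here; one common approach, sketched below with pandas on a toy series, is to plot the integer category codes and use the category names as tick labels.

```python
import pandas as pd

# Toy series with the same 1/2 coding as the 'sex' column; values are invented.
sex = pd.Series([1, 2, 2, 1]).map({1: "Male", 2: "Female"}).astype("category")

codes = sex.cat.codes.tolist()     # integer code of each value's category
labels = list(sex.cat.categories)  # category names, in code order

# A numeric axis could then use `codes` as its values and `labels` as tick text,
# for example tickvals=list(range(len(labels))) and ticktext=labels in plotly.
```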
