Parallel Coordinates
Parallel coordinate plots are useful for Exploratory Data Analysis (EDA) because they show every record across all variables at once.
The lines are typically colored by the target variable, since it is the variable of most interest, though this is optional.
The interactive nature of plotly is a real asset for this particular plot. Records/samples can be highlighted by
clicking and dragging the mouse vertically along the axis of a variable (feature or target). Multiple selections
are possible. Single-clicking a selection removes it.
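The selection behaviour can be thought of as a filter on the records. The sketch below is not taken from plotly or from this package. It assumes that ranges brushed on different axes combine with AND, and that several ranges on the same axis combine with OR. The function name `is_highlighted` and the `brushes` mapping are hypothetical, and the records are a few rows from the wine dataset used in this example.

```python
# Hypothetical sketch of brush selection; not the plotly implementation.
# Assumption: ranges on different axes combine with AND,
# several ranges on the same axis combine with OR.

def is_highlighted(record, brushes):
    """Return True if the record lies inside a brushed range on every brushed axis."""
    return all(
        any(lo <= record[axis] <= hi for lo, hi in ranges)
        for axis, ranges in brushes.items()
    )

# A few records from the wine dataset (row label, alcohol, target).
records = [
    {"row": 0, "alcohol": 14.23, "target": 0},
    {"row": 1, "alcohol": 13.20, "target": 0},
    {"row": 2, "alcohol": 13.16, "target": 0},
    {"row": 173, "alcohol": 13.71, "target": 2},
    {"row": 174, "alcohol": 13.40, "target": 2},
]

# One brush on the alcohol axis and one on the target axis.
brushes = {"alcohol": [(13.0, 13.5)], "target": [(0, 0)]}
selected = [r["row"] for r in records if is_highlighted(r, brushes)]
print(selected)

# Removing a brush (single-clicking the selection) widens the highlighted set again.
del brushes["target"]
selected_alcohol_only = [r["row"] for r in records if is_highlighted(r, brushes)]
print(selected_alcohol_only)
```

With no brushes at all, every record counts as highlighted, which matches the unfiltered plot.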
import pandas as pd
import plotly.io as pio
from sklearn.datasets import load_diabetes, load_wine
from elphick.sklearn_viz.features import plot_parallel_coordinates
Load Classification Data
wine = load_wine(as_frame=True)
X, y = wine.data, wine.target.rename('target')
df = pd.concat([X, y], axis=1)
df
| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | target |
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 | 0 |
| 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 | 0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 | 0 |
| 3 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 | 0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 173 | 13.71 | 5.65 | 2.45 | 20.5 | 95.0 | 1.68 | 0.61 | 0.52 | 1.06 | 7.70 | 0.64 | 1.74 | 740.0 | 2 |
| 174 | 13.40 | 3.91 | 2.48 | 23.0 | 102.0 | 1.80 | 0.75 | 0.43 | 1.41 | 7.30 | 0.70 | 1.56 | 750.0 | 2 |
| 175 | 13.27 | 4.28 | 2.26 | 20.0 | 120.0 | 1.59 | 0.69 | 0.43 | 1.35 | 10.20 | 0.59 | 1.56 | 835.0 | 2 |
| 176 | 13.17 | 2.59 | 2.37 | 20.0 | 120.0 | 1.65 | 0.68 | 0.53 | 1.46 | 9.30 | 0.60 | 1.62 | 840.0 | 2 |
| 177 | 14.13 | 4.10 | 2.74 | 24.5 | 96.0 | 2.05 | 0.76 | 0.56 | 1.35 | 9.20 | 0.61 | 1.60 | 560.0 | 2 |
178 rows × 14 columns
Plot Classification Data
fig = plot_parallel_coordinates(df, color=y.name)
# noinspection PyTypeChecker
pio.show(fig)
Coloring by the target is optional. If the plot is too dense, consider plotting a random sample of the records, as demonstrated here.
fig = plot_parallel_coordinates(df.sample(frac=0.5))
fig
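A plain `df.sample(frac=0.5)` draws a different subset on every run and may under-represent a small class. One possible refinement, sketched here with a small made-up frame rather than the wine data, is to sample within each class and fix `random_state`. `groupby(...).sample` is available in pandas 1.1 and later; the `toy` frame and its column names are assumptions for illustration only.

```python
import pandas as pd

# Made-up data for illustration only: 8 records of class 0, 4 of class 1.
toy = pd.DataFrame({
    "feature": [float(i) for i in range(12)],
    "target": [0] * 8 + [1] * 4,
})

# Sample half of the records within each class; random_state makes it repeatable.
sampled = toy.groupby("target").sample(frac=0.5, random_state=0)

# Each class keeps half of its records.
print(sampled["target"].value_counts().sort_index())
```

The resulting frame could then be passed to `plot_parallel_coordinates` in the same way as `df.sample(frac=0.5)`.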
Load Regression Data
diabetes = load_diabetes(as_frame=True, scaled=False)
X, y = diabetes.data, diabetes.target.rename('target')
df = pd.concat([X, y], axis=1)
df
| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
| 0 | 59.0 | 2.0 | 32.1 | 101.00 | 157.0 | 93.2 | 38.0 | 4.00 | 4.8598 | 87.0 | 151.0 |
| 1 | 48.0 | 1.0 | 21.6 | 87.00 | 183.0 | 103.2 | 70.0 | 3.00 | 3.8918 | 69.0 | 75.0 |
| 2 | 72.0 | 2.0 | 30.5 | 93.00 | 156.0 | 93.6 | 41.0 | 4.00 | 4.6728 | 85.0 | 141.0 |
| 3 | 24.0 | 1.0 | 25.3 | 84.00 | 198.0 | 131.4 | 40.0 | 5.00 | 4.8903 | 89.0 | 206.0 |
| 4 | 50.0 | 1.0 | 23.0 | 101.00 | 192.0 | 125.4 | 52.0 | 4.00 | 4.2905 | 80.0 | 135.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | 60.0 | 2.0 | 28.2 | 112.00 | 185.0 | 113.8 | 42.0 | 4.00 | 4.9836 | 93.0 | 178.0 |
| 438 | 47.0 | 2.0 | 24.9 | 75.00 | 225.0 | 166.0 | 42.0 | 5.00 | 4.4427 | 102.0 | 104.0 |
| 439 | 60.0 | 2.0 | 24.9 | 99.67 | 162.0 | 106.6 | 43.0 | 3.77 | 4.1271 | 95.0 | 132.0 |
| 440 | 36.0 | 1.0 | 30.0 | 95.00 | 201.0 | 125.2 | 42.0 | 4.79 | 5.1299 | 85.0 | 220.0 |
| 441 | 36.0 | 1.0 | 19.6 | 71.00 | 250.0 | 133.2 | 97.0 | 3.00 | 4.5951 | 92.0 | 57.0 |
442 rows × 11 columns
Plot Regression Data
fig = plot_parallel_coordinates(df, color=y.name)
fig
Categorical data is supported. Here the sex column is mapped to labels and converted to a categorical dtype before plotting.
df['sex'] = df['sex'].map({1: 'Male', 2: 'Female'}).astype('category')
fig = plot_parallel_coordinates(df.sample(frac=0.5), color=y.name)
fig
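Plotly's parallel coordinates trace works with numeric values on each axis, so a categorical column has to be placed on its axis through some encoding. How `plot_parallel_coordinates` does this internally is not shown here. The sketch below only illustrates one common approach: use the integer codes that pandas stores for a categorical column and keep the category labels for the axis ticks. The small `sex` series is made up for illustration, and the `tickvals` and `ticktext` names are chosen only to mirror typical axis tick settings.

```python
import pandas as pd

# Made-up values for illustration, using the same mapping as above.
sex = pd.Series([2, 1, 2, 1, 1]).map({1: "Male", 2: "Female"}).astype("category")

# pandas stores an integer code per record and the labels separately.
codes = sex.cat.codes.tolist()        # numeric positions for the axis
labels = sex.cat.categories.tolist()  # labels for the axis ticks

# Tick positions and text that a plotting layer could use for this axis.
tickvals = list(range(len(labels)))
ticktext = labels
print(codes, tickvals, ticktext)
```

Categories created this way are ordered alphabetically, so the code assigned to each label depends on the label text rather than on the original numeric value.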