This week I was helping a friend explore her dataset with some simple statistics and plots, so I decided to try seaborn.
It is a really nice library that, together with pandas, makes a powerful tool for taking the first steps in exploring your data.
Here is a simple example of what we did.
import seaborn
import numpy as np
import matplotlib.pyplot as plt
from io import BytesIO
from pandas import read_csv
kw = dict(na_values='NaN', sep=',', encoding='utf-8',
          skipinitialspace=True, index_col=False)
df = read_csv("./data/fish.csv", **kw)
df.head()
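To see what those read_csv options do without the original file, here is a minimal sketch on a tiny made-up CSV. The column names and values are invented for illustration and are not the fish data.

```python
from io import BytesIO

import pandas as pd

# Hypothetical CSV bytes: note the blank after each comma and the "NaN" marker.
csv_bytes = b"Days, Length (cm), Weight (g)\n0, 10.5, 12.0\n7, NaN, 13.1\n"

kw = dict(na_values='NaN', sep=',', encoding='utf-8',
          skipinitialspace=True, index_col=False)
toy = pd.read_csv(BytesIO(csv_bytes), **kw)

# skipinitialspace=True drops the blank after each comma, so the header
# reads "Length (cm)" rather than " Length (cm)".
print(list(toy.columns))

# The "NaN" marker is read as a missing value.
print(toy.isna().sum())
```

Checking the column names and the count of missing values right after loading is a cheap way to catch delimiter or whitespace problems early.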
Seaborn makes it easy to control the figure aesthetics with set_style and get_style.
kw = {'axes.edgecolor': '0', 'text.color': '0', 'ytick.color': '0', 'xtick.color': '0',
      'ytick.major.size': 5, 'xtick.major.size': 5, 'axes.labelcolor': '0'}
seaborn.set_style("whitegrid", kw)
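If you would rather not change the style globally, seaborn's axes_style function returns the style parameters as a dictionary and can also be used as a context manager. The sketch below assumes a reasonably recent seaborn and uses a throwaway line plot that is not part of the fish analysis.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn

rc = {'axes.edgecolor': '0', 'xtick.color': '0', 'ytick.color': '0'}

# axes_style returns the named style with the overrides in rc applied.
params = seaborn.axes_style("whitegrid", rc)
print(params['axes.grid'], params['axes.edgecolor'])

# As a context manager, the style only applies to figures created inside it.
with seaborn.axes_style("whitegrid", rc):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
plt.close(fig)
```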
The first plot will be a simple and naive correlation matrix. It is just one line with seaborn.
ax = seaborn.corrplot(df, annot=False, diag_names=False)
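corrplot comes from early seaborn releases and was later deprecated, so it may be missing from your installation. A rough equivalent, assuming a newer seaborn, is to compute the correlation matrix with pandas and pass it to seaborn.heatmap. The data frame below is invented for illustration, and the color map and limits are my own choices rather than what corrplot used.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import pandas as pd
import seaborn

# Hypothetical measurements, not the fish data.
toy = pd.DataFrame({
    "Length (cm)": [10.0, 11.0, 12.5, 14.0],
    "Weight (g)": [12.0, 14.5, 18.0, 23.0],
    "Days": [0, 7, 14, 21],
})

# Pairwise Pearson correlation between the numeric columns.
corr = toy.corr()

# Fix the color scale to [-1, 1] so the sign of each correlation is easy to read.
ax = seaborn.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm", square=True)
```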
Easy conclusion: the bigger the fish, the heavier it is ;). But seriously now, BDE 47 is positively correlated with Days and BDE 99, and that is worth exploring. BDE 99 was part of the experiment. However, BDE 47 was not in the fish at the beginning; it is a by-product of BDE 99 that appears as the fish metabolize it.
We can explore this a little further. Note that we use pandas groupby to aggregate the data around the variable "Days".
g = df.groupby('Days')
mean_df = g.mean()
g.describe().head()
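For readers new to groupby, here is a minimal sketch of the same aggregation on a made-up table. The values are hypothetical and only the column names mimic the fish data.

```python
import pandas as pd

# Hypothetical replicate measurements on two sampling days.
toy = pd.DataFrame({
    "Days": [0, 0, 7, 7],
    "BDE 47 (ng/g)": [0.0, 0.2, 1.0, 1.4],
})

g = toy.groupby("Days")

# One row per sampling day, holding the mean of each remaining column.
mean_toy = g.mean()
print(mean_toy)

# A few summary statistics per day for a single column.
print(g["BDE 47 (ng/g)"].agg(["mean", "std", "count"]))
```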
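# Keyword arguments work across seaborn versions; newer releases reject positional x/y.
ax = seaborn.jointplot(x="Days", y="BDE 99 (ng/g)", data=df, kind="reg")
ax = seaborn.jointplot(x="Days", y="BDE 47 (ng/g)", data=df, kind="reg")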
The increase in BDE 47 is clear. BDE 99 does not decrease at the same rate that BDE 47 increases because it was part of the fish diet.
Inspecting the residuals is also a one-liner.
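# Keyword arguments work across seaborn versions; newer releases reject positional x/y.
ax = seaborn.residplot(x="Days", y="BDE 99 (ng/g)", data=df)
ax = seaborn.residplot(x="Days", y="BDE 47 (ng/g)", data=df)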
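With its default settings, residplot fits a straight line of y on x and plots what is left over. The sketch below computes the same kind of residuals by hand with NumPy, which can help when you want the numbers and not just the picture. The arrays are made-up values, not the fish measurements.

```python
import numpy as np

# Hypothetical concentrations measured on four sampling days.
days = np.array([0.0, 7.0, 14.0, 21.0])
conc = np.array([0.1, 1.2, 1.9, 3.2])

# Ordinary least-squares straight line: conc ~ slope * days + intercept.
slope, intercept = np.polyfit(days, conc, deg=1)

# Residuals are the observed values minus the fitted line.
residuals = conc - (slope * days + intercept)
print(residuals)
```

If the residuals show a clear pattern against Days, a straight line is probably not the right model for that variable.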
Hopefully that is useful for others. Do not forget to check the seaborn docs.