This week I was helping a friend explore her dataset with some simple statistics and plots, so I decided to try seaborn out.
It is a really nice library that, together with pandas, becomes a powerful tool for taking the first steps in exploring your data.
Here is a simple example of what we did.
import seaborn
import numpy as np
import matplotlib.pyplot as plt
from pandas import read_csv
# Treat the literal string 'NaN' as missing and strip the blanks after each comma.
kw = dict(na_values='NaN', sep=',', encoding='utf-8',
          skipinitialspace=True, index_col=False)
df = read_csv("./data/fish.csv", **kw)
df.head()
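The fish data file is not included here, so this is a minimal, self-contained sketch of what those `read_csv` keywords do, using a small in-memory CSV. The values are made up and the column names only mimic the ones used later. `skipinitialspace=True` strips the blank after each comma (header included) and `na_values='NaN'` turns the literal string into a missing value.

```python
from io import BytesIO

import pandas as pd

# Made-up CSV bytes: note the space after each comma and the literal "NaN".
csv_text = """Days, BDE 99 (ng/g), BDE 47 (ng/g)
0, 10.0, 1.0
0, 12.0, NaN
7, 11.0, 3.0
7, 13.0, 5.0
"""

kw = dict(na_values='NaN', sep=',', encoding='utf-8',
          skipinitialspace=True, index_col=False)
# Encode to bytes so the encoding keyword is actually used when decoding.
toy = pd.read_csv(BytesIO(csv_text.encode('utf-8')), **kw)

print(toy.columns.tolist())    # header names without the leading blanks
print(toy.isna().sum().sum())  # total number of missing values
```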
Seaborn makes it easy to control the figure aesthetics with set_style and
axes_style.
kw = {'axes.edgecolor': '0', 'text.color': '0', 'ytick.color': '0', 'xtick.color': '0',
'ytick.major.size': 5, 'xtick.major.size': 5, 'axes.labelcolor': '0'}
seaborn.set_style("whitegrid", kw)
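A short sketch of how those calls relate: `set_style` writes the style into matplotlib's `rcParams`, and `axes_style()` with no arguments reports the style currently in effect. One assumption worth checking against your seaborn version: in recent releases, tick lengths such as `xtick.major.size` are "context" parameters, so they are passed through `set_context` rather than `set_style`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import seaborn

# Colour overrides are style parameters, so set_style accepts them.
style_rc = {'axes.edgecolor': '0', 'axes.labelcolor': '0', 'text.color': '0',
            'xtick.color': '0', 'ytick.color': '0'}
seaborn.set_style("whitegrid", style_rc)

# axes_style() without arguments reads the current style back from rcParams.
current = seaborn.axes_style()
print(current['axes.edgecolor'])

# Assumption: tick lengths belong to the plotting context in recent seaborn.
seaborn.set_context("notebook", rc={'xtick.major.size': 5, 'ytick.major.size': 5})
print(plt.rcParams['xtick.major.size'])
```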
The first plot will be a simple and naive correlation matrix. It is just one
line with seaborn.
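# corrplot is no longer available in current seaborn; a heatmap of df.corr() replaces it.
ax = seaborn.heatmap(df.corr(numeric_only=True), vmin=-1, vmax=1, annot=False)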
Easy conclusion: the bigger the fish, the heavier it is ;). But seriously now,
BDE 47 is positively correlated with Days and with BDE 99, which is worth
exploring. BDE 99 was part of the experiment. However, BDE 47 was not in
the fish at the beginning; it is a by-product of BDE 99 that appears as the
fish metabolizes it.
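The heatmap is a picture of the pairwise Pearson correlation matrix, which pandas computes with `DataFrame.corr`. A minimal sketch on made-up numbers (the values and the hypothetical non-numeric `Tank` column are invented for illustration), assuming a pandas version recent enough to accept `numeric_only`:

```python
import numpy as np
import pandas as pd

# Made-up values, only to show what the heatmap is built from.
toy = pd.DataFrame({
    "Days": [0, 7, 14, 21],
    "BDE 99 (ng/g)": [10.0, 12.0, 13.0, 15.0],
    "BDE 47 (ng/g)": [1.0, 3.0, 6.0, 8.0],
    "Tank": ["A", "A", "B", "B"],  # hypothetical text column, excluded below
})

# Pearson correlation between every pair of numeric columns.
corr = toy.corr(numeric_only=True)
print(corr.round(2))
```

Each cell lies between -1 and 1 and the diagonal is always 1, which is why fixing `vmin=-1, vmax=1` keeps the colour scale comparable between datasets.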
We can explore this a little further. Note that we used pandas groupby to
aggregate the data by the variable "Days".
g = df.groupby('Days')
mean_df = g.mean(numeric_only=True)  # per-day means of the numeric columns
g.describe().head()
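What `groupby('Days')` does is easiest to see on a tiny made-up table with two hypothetical replicate fish per sampling day: `mean()` collapses each group to one row, and `describe()` adds count, spread and quartiles per group.

```python
import pandas as pd

# Made-up replicate measurements: two fish sampled on each day.
toy = pd.DataFrame({
    "Days": [0, 0, 7, 7],
    "BDE 47 (ng/g)": [1.0, 3.0, 4.0, 6.0],
})

g = toy.groupby("Days")
mean_df = g.mean()   # one row per unique value of "Days"
print(mean_df)
print(g.describe())  # count, mean, std, min, quartiles and max per group
```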
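# jointplot returns a JointGrid; recent seaborn requires x, y and data as keywords.
grid = seaborn.jointplot(x="Days", y="BDE 99 (ng/g)", data=df, kind="reg")
grid = seaborn.jointplot(x="Days", y="BDE 47 (ng/g)", data=df, kind="reg")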
The increase in BDE 47 is clear. BDE 99 does not decrease at a rate matching
the increase in BDE 47, because it was part of the fish diet.
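To put a number on that increase, one option (not something the plots above report) is to fit the same kind of straight line that `kind="reg"` draws, an ordinary least-squares fit of degree 1, with NumPy. A sketch on made-up values:

```python
import numpy as np

# Made-up numbers, not the fish data.
days = np.array([0, 7, 14, 21], dtype=float)
bde47 = np.array([1.0, 3.0, 6.0, 8.0])

# Degree-1 least-squares fit; polyfit returns the highest power first.
slope, intercept = np.polyfit(days, bde47, deg=1)
print(f"slope = {slope:.3f} ng/g per day, intercept = {intercept:.3f} ng/g")
```

On the real data you would pass `df["Days"]` and `df["BDE 47 (ng/g)"]`, dropping missing values first, since `np.polyfit` does not skip NaN.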
Inspecting the residuals is also a one-liner.
ax = seaborn.residplot(x="Days", y="BDE 99 (ng/g)", data=df)
ax = seaborn.residplot(x="Days", y="BDE 47 (ng/g)", data=df)
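`residplot` fits a linear regression of y on x and plots what is left over. The same residuals can be computed by hand; here is a sketch on made-up numbers using a degree-1 `np.polyfit` fit:

```python
import numpy as np

# Made-up numbers, not the fish data.
days = np.array([0, 7, 14, 21], dtype=float)
bde47 = np.array([1.0, 3.0, 6.0, 8.0])

slope, intercept = np.polyfit(days, bde47, deg=1)
fitted = slope * days + intercept
residuals = bde47 - fitted  # the values a residual plot puts on its y-axis
print(residuals.round(3))
```

With an intercept in the model the residuals always sum to zero, so what matters in the plot is whether they show a trend or a funnel shape across Days; points scattered evenly around zero suggest a straight line is adequate.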
Hopefully this is useful to others. Do not forget to check the seaborn docs.
