This week I was helping a friend explore her data set with some simple statistics and plots, so I decided to try seaborn out.

It is a really nice library that, together with pandas, becomes a powerful tool to take the first steps while exploring your data.

Here is a simple example of what we did.

In [2]:

import seaborn
import numpy as np
import matplotlib.pyplot as plt

from io import BytesIO
from pandas import read_csv

In [3]:

kw = dict(na_values='NaN', sep=',', encoding='utf-8',
          skipinitialspace=True, index_col=False)

df = read_csv("./data/fish.csv", **kw)

In [4]:

df.head()

Out[4]:

	Days	ID	Recovery	Extract weight	Lipid %	Weight (g)	Size (cm)	Liver weight (g)	LSI	CF
0	0	A	73.21	0.10	3.600000	20.09	12.8	0.14	0.696864	0.957966
1	0	B	98.24	0.22	2.272727	36.52	15.5	0.33	0.903614	0.980699
2	0	C	89.71	0.18	3.500000	28.74	14.7	0.25	0.869868	0.904763
3	1	A	78.40	0.13	1.330769	23.70	14.0	0.15	0.632911	0.863703
4	1	B	66.24	0.13	2.838462	32.80	15.0	0.20	0.609756	0.971852
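
To try the read_csv keywords without the original file, here is a minimal sketch that parses a few of the rows shown above from an in-memory buffer. The inline text, with its spaces after the commas and only four of the columns, is an assumption for illustration and not the real fish.csv.

```python
from io import BytesIO

from pandas import read_csv

# Hypothetical stand-in for ./data/fish.csv: three rows and four columns
# copied from the df.head() output, written with a space after each comma.
text = """Days, ID, Weight (g), Size (cm)
0, A, 20.09, 12.8
0, B, 36.52, 15.5
0, C, 28.74, 14.7
"""

kw = dict(na_values='NaN', sep=',', encoding='utf-8',
          skipinitialspace=True, index_col=False)

# BytesIO mimics a file on disk, so the encoding keyword is actually used.
sample = read_csv(BytesIO(text.encode('utf-8')), **kw)

# skipinitialspace=True drops the blank after each comma, so column names
# and ID values come out without leading spaces.
print(list(sample.columns))
print(sample.dtypes)
```

With skipinitialspace left at its default, the column names and the ID values would keep their leading space, which makes later column lookups fail in confusing ways.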

Seaborn makes it easy to control the figure aesthetics with set_style and axes_style.

In [5]:

kw = {'axes.edgecolor': '0', 'text.color': '0', 'ytick.color': '0', 'xtick.color': '0',
      'ytick.major.size': 5, 'xtick.major.size': 5, 'axes.labelcolor': '0'}

seaborn.set_style("whitegrid", kw)
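
As a small sketch of the other half of that API, axes_style returns the style parameters as a dictionary without applying them, and it can also be used as a context manager to restyle a single figure. The reduced rc dictionary and the plotted numbers below are placeholders, not part of the original analysis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import seaborn

# A shortened version of the overrides used above (illustrative only).
rc = {'axes.edgecolor': '0', 'text.color': '0'}

# axes_style returns the resulting parameters without changing global state.
style = seaborn.axes_style("whitegrid", rc)
print(style['axes.edgecolor'], style['axes.grid'])

# As a context manager, the style applies only to figures created inside.
with seaborn.axes_style("whitegrid", rc):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])  # placeholder data

plt.close(fig)
```

Inspecting the returned dictionary is a quick way to check which of the overrides were actually picked up by the style.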

The first plot will be a simple and naive correlation matrix. It is just one line with seaborn.

In [6]:

ax = seaborn.corrplot(df, annot=False, diag_names=False)

Easy conclusion: the bigger the fish, the heavier it is ;). But seriously now, BDE 47 is positively correlated with Days and BDE 99, which is worth exploring. BDE 99 was part of the experiment. However, BDE 47 was not in the fish at the beginning; it is a by-product of BDE 99 that appears as the fish metabolize it.
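
As far as I know, corrplot is no longer available in recent seaborn releases. A rough equivalent, sketched here under that assumption, is to compute the correlation matrix with pandas and draw it with seaborn.heatmap. The small frame below reuses five rows and three numeric columns from the df.head() output, so it does not include the BDE columns discussed above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd
import seaborn

# Five rows copied from the df.head() output (subset of the columns).
sample = pd.DataFrame({
    "ID": ["A", "B", "C", "A", "B"],
    "Weight (g)": [20.09, 36.52, 28.74, 23.70, 32.80],
    "Size (cm)": [12.8, 15.5, 14.7, 14.0, 15.0],
    "Liver weight (g)": [0.14, 0.33, 0.25, 0.15, 0.20],
})

# Keep only the numeric columns, then compute the Pearson correlation matrix.
corr = sample.select_dtypes(include="number").corr()

# Draw the matrix; vmin/vmax pin the color scale to the [-1, 1] range.
ax = seaborn.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm", square=True)
print(corr.round(2))
```

Passing annot=True to heatmap writes the coefficients inside the cells, which helps when the matrix is small.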

We can explore this a little further. Note that we used pandas groupby to aggregate the data around the variable "Days".

In [7]:

g = df.groupby('Days')
mean_df = g.mean()
g.describe().head()

Out[7]:

		BDE 47 (ng/g)	BDE 99 (ng/g)	CF	Extract weight	LSI	Lipid %	Liver weight (g)	Recovery	Size (cm)	Weight (g)
Days
0	count	3	3	3.000000	3.000000	3.000000	3.000000	3.000000	3.000000	3.000000	3.000000
	mean	0	0	0.947809	0.166667	0.823449	3.124242	0.240000	87.053333	14.333333	28.450000
	std	0	0	0.038974	0.061101	0.110916	0.739127	0.095394	12.724725	1.386843	8.218838
	min	0	0	0.904763	0.100000	0.696864	2.272727	0.140000	73.210000	12.800000	20.090000
	25%	0	0	0.931364	0.140000	0.783366	2.886364	0.195000	81.460000	13.750000	24.415000
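
To make the groupby step concrete, here is a minimal sketch on the five rows shown in df.head(). Selecting the numeric columns explicitly is my own choice here: newer pandas versions may refuse to average a text column such as ID. Day 1 only has the two rows visible in the head, so its statistics are not those of the full data set.

```python
import pandas as pd

# Five rows copied from the df.head() output (subset of the columns).
sample = pd.DataFrame({
    "Days": [0, 0, 0, 1, 1],
    "ID": ["A", "B", "C", "A", "B"],
    "Weight (g)": [20.09, 36.52, 28.74, 23.70, 32.80],
    "Size (cm)": [12.8, 15.5, 14.7, 14.0, 15.0],
})

cols = ["Weight (g)", "Size (cm)"]
g = sample.groupby("Days")

# One row per day, one column per variable.
mean_df = g[cols].mean()

# Several statistics at once; the result has two-level column labels.
stats = g[cols].agg(["mean", "std"])

print(mean_df)
print(stats)
```

The Day 0 means (28.45 g and about 14.33 cm) agree with the mean row of Out[7], since all three Day 0 fish appear in the head.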

In [8]:

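# Keyword arguments: recent seaborn releases no longer accept x, y and data positionally.
grid = seaborn.jointplot(x="Days", y="BDE 99 (ng/g)", data=df, kind="reg")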

In [9]:

# Same plot for the by-product; jointplot returns a JointGrid rather than an Axes.
grid = seaborn.jointplot(x="Days", y="BDE 47 (ng/g)", data=df, kind="reg")

The increase in BDE 47 is clear. BDE 99 does not decrease at the rate BDE 47 increases because it was part of the fish diet.

Inspecting the residuals is also a one-liner.

In [10]:

ax = seaborn.residplot(x="Days", y="BDE 99 (ng/g)", data=df)

In [11]:

ax = seaborn.residplot(x="Days", y="BDE 47 (ng/g)", data=df)
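
For readers curious about what a residual plot shows, the sketch below does the same thing by hand with numpy: fit a straight line and subtract the fitted values from the observations. Since the BDE values are not listed in this post, it uses the size and weight columns from df.head() instead, which is purely an illustrative substitution.

```python
import numpy as np

# Size (cm) and Weight (g) for the five fish shown in df.head().
size = np.array([12.8, 15.5, 14.7, 14.0, 15.0])
weight = np.array([20.09, 36.52, 28.74, 23.70, 32.80])

# Least-squares straight line: weight ~ slope * size + intercept.
slope, intercept = np.polyfit(size, weight, deg=1)

# Residuals are the observed values minus the fitted ones.
residuals = weight - (slope * size + intercept)

print(slope, intercept)
print(residuals)
```

For a line fitted with an intercept the residuals sum to zero, so what matters in the plot is whether they show a pattern, such as a curve or a spread that grows with x.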

Hopefully this is useful for others. Do not forget to check the seaborn docs.

This post was written as an IPython notebook. It is available for download or as a static html.

python4oceanographers by Filipe Fernandes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://ocefpaf.github.io/.

Exploratory analysis using seaborn