python4oceanographers

Turning ripples into waves

The ggplot interface for python

There has been a lot of buzz around the python ggplot module recently. I must confess that the original (ggplot2 for R) is not a tool in my utility belt. However, every now and then I find myself teaching it to biologists/ecologists who are stuck with R. I do appreciate the Grammar of Graphics concept though; it is just not my everyday plotting tool.

This post is just to give the python version of ggplot a try and see what all the fuss is about.

Yhat's python version of ggplot is extremely un-pythonic (it says so in the README file!), so be warned: if you have never used ggplot before and/or you are coming from matplotlib, you might get a little confused.

We'll try the module out by comparing CTD temperature profile plots with:

  • pure matplotlib
  • my own wrapper for plotting ctd profiles
  • ggplot.

The first issue I faced with python ggplot was that I could not reverse the axis of a plot. Hopefully this PR will change this situation. If you want to reproduce the plot at the end of the post you'll need to install ggplot from my branch.

The PR was merged:

pip install https://github.com/yhat/ggplot/tarball/master

Then you'll need to download the matplotlibrc for ggplot's layout, unzip it, and copy it into your local matplotlib configuration folder.

wget https://github.com/yhat/ggplot/raw/master/matplotlibrcs/matplotlibrc-windows.zip

unzip -p matplotlibrc-windows.zip home/stefan/.matplotlib/matplotlibrc > matplotlibrc

cp matplotlibrc $HOME/.config/matplotlib/matplotlibrc
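
The configuration folder is not the same everywhere (older matplotlib releases used `~/.matplotlib`, while newer ones on Linux use `~/.config/matplotlib`), so it may be safer to ask matplotlib itself where it looks. A small sketch, assuming only that matplotlib is importable:

```python
import matplotlib

# Directory where matplotlib looks for a user-level matplotlibrc.
config_dir = matplotlib.get_configdir()

# The matplotlibrc file that is actually in effect for this session.
active_rc = matplotlib.matplotlib_fname()

print("copy matplotlibrc into:", config_dir)
print("currently active rc file:", active_rc)
```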

Before plotting we need to load the data (and perform some simple pre-processing).

It is worth mentioning that I explicitly named the DataFrame index so that the ctd module can automagically label the plots.

In [2]:
import gsw
from ctd import DataFrame, Series

cast = DataFrame.from_cnv('./data/CTD_001.cnv.gz', compression='gzip')

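# Keep only temperature and conductivity; drop every other column.
keep = {'t090C', 'c0S/m'}
for column in set(cast.columns) - keep:
    cast.pop(column)

# Split the cast, keep the first part (the downcast), and bin each column
# with a bin size of 1.
cast, _ = cast.split()
cast = cast.apply(Series.bindata, delta=1.)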

# gsw.SP_from_C expects conductivity in mS/cm; the file stores S/m,
# hence the factor of 10.
cast['SP'] = gsw.SP_from_C(cast['c0S/m'].values * 10.,
                           cast['t090C'].values,
                           cast.index.values.astype(float))
cast.index.name = 'Pressure [dbar]'
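
The `Series.bindata` call hides the binning step. Conceptually, bin averaging groups the samples by pressure bin and takes the mean of each group. The sketch below shows the idea with plain pandas on made-up numbers; `bin_average` is a hypothetical helper written for this illustration and is not how python-ctd implements `bindata` (its bin edges and edge handling may differ).

```python
import numpy as np
import pandas as pd


def bin_average(series, delta=1.0):
    """Average a profile into pressure bins of width `delta`.

    Hypothetical helper for illustration; not python-ctd's `bindata`.
    """
    # Assign every sample to the nearest multiple of `delta`.
    centers = np.round(series.index.values / delta) * delta
    binned = series.groupby(centers).mean()
    # Keep the index name so plots can still be labelled from it.
    binned.index.name = series.index.name
    return binned


# Toy profile: temperature sampled at irregular pressures.
pressure = pd.Index([0.6, 1.0, 1.4, 1.6, 2.4], name="Pressure [dbar]")
temperature = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0], index=pressure)
binned = bin_average(temperature)
print(binned)
```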

First let's plot the profile "the matplotlib way" (11 LOC).

In [3]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(3, 4))
ax.plot(cast['t090C'], cast.index)
ax.set_ylabel(cast.index.name)
ax.invert_yaxis()
offset = 0.01
x1, x2 = ax.get_xlim()[0] - offset, ax.get_xlim()[1] + offset
ax.set_xlim(x1, x2)
ax.set_title("Matplotlib")
ax.set_xlabel("Temperature")
ax.set_ylabel("Pressure [dbar]")
Out[3]:
<matplotlib.text.Text at 0x7f4df8b10250>

Now let's make the same plot with the ctd module (3 LOC).

(Note: The next version will accept title and figsize as kw options, making this a one-liner.)

In [4]:
fig, ax = cast['t090C'].plot()
ax.set_title('python-ctd "wrapper"')
fig.set_size_inches(3, 4)

Finally, we will plot the profile with ggplot. Note that, before plotting, I copied the index into a new column, because I could not figure out how to pass the index as the y-axis.

In [5]:
from ggplot import *
cast['pressure'] = cast.index.values
p = ggplot(cast, aes(x='t090C', y='pressure')) + geom_line() + scale_y_reverse()
p
/home/filipe/miniconda3/envs/Blog/lib/python2.7/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))

Out[5]:
<ggplot: (8748239955721)>

It is kind of a one-liner if you exclude the import and the line that displays the plot.
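
Copying the index into a column by hand works, but pandas can also do it with `reset_index`, which turns a named index into a regular column. A minimal sketch on a toy DataFrame standing in for `cast` (the values are made up); whether ggplot's `aes` is happy with a column name containing spaces and brackets is not something I checked, so renaming the column first may be needed.

```python
import pandas as pd

# Toy stand-in for `cast`: a named pressure index and one data column.
toy = pd.DataFrame(
    {"t090C": [25.1, 24.8, 22.3]},
    index=pd.Index([1.0, 2.0, 3.0], name="Pressure [dbar]"),
)

# reset_index() moves the named index into a regular column.
flat = toy.reset_index()
print(flat.columns.tolist())
```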

I understand that ggplot has lots of fans from the R world, but it feels a little alien to me. For example, I'm still figuring out how to adjust the figure size/aspect ratio to give it a "depth-profile look."

My final take is that, if you have a very specific type of plot that requires some tweaking, and/or you'll have to re-plot it several times, you might be better off writing your own wrapper around matplotlib.
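
To give an idea of how little code such a wrapper needs, here is a minimal sketch. `plot_profile` and its `title`/`figsize` keywords are hypothetical names chosen for this example; this is not the python-ctd implementation, and the data are made up.

```python
import matplotlib

matplotlib.use("Agg")  # Headless backend so the sketch runs anywhere.
import matplotlib.pyplot as plt
import pandas as pd


def plot_profile(series, ax=None, title=None, figsize=(3, 4), **kwargs):
    """Plot a vertical profile: values on x, the index (pressure) on y.

    Hypothetical helper for illustration; not python-ctd's plot method.
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=figsize)
    else:
        fig = ax.figure
    ax.plot(series.values, series.index.values, **kwargs)
    # Pressure increases downward in a depth profile.
    if not ax.yaxis_inverted():
        ax.invert_yaxis()
    # Label the axes from the pandas metadata when it is available.
    if series.index.name:
        ax.set_ylabel(series.index.name)
    if series.name:
        ax.set_xlabel(series.name)
    if title:
        ax.set_title(title)
    return fig, ax


pressure = pd.Index([1.0, 2.0, 3.0, 4.0], name="Pressure [dbar]")
temperature = pd.Series([25.1, 24.8, 22.3, 18.9], index=pressure, name="t090C")
fig, ax = plot_profile(temperature, title="Own wrapper")
```

Because the y-axis label comes from the index name, naming the index once is enough for every later plot.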

Still, I can see the potential of ggplot when teaching students how to make simple, yet powerful, plots.

I'll leave it with a TS-diagram. (Note that we don't need to pass raw strings to use LaTeX here.)

In [6]:
p = ggplot(cast, aes(x='SP', y='t090C')) + \
                 geom_point(color='black') + \
                 xlab("Salinity [g kg$^{-1}$]") + \
                 ylab("Temperature [$^\circ$C]")

p
Out[6]:
<ggplot: (8748240952545)>
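
The labels in this plot get away without raw strings only because `\c` is not a Python escape sequence, so the backslash is kept as-is (recent Python versions warn about this). LaTeX commands that start with a recognised escape such as `\t` or `\n` are not so lucky. A small illustration, using a made-up `\theta` label rather than anything from the plot:

```python
# "\t" is a recognised escape (tab), so a non-raw "\theta" is silently mangled.
plain = "$\theta$"
raw = r"$\theta$"

print(repr(plain))
print(repr(raw))
```

Using raw strings for every LaTeX label avoids having to remember which letters are escapes.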

This post was written as an IPython notebook. It is available for download or as a static html.

Creative Commons License
python4oceanographers by Filipe Fernandes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://ocefpaf.github.io/.
