python4oceanographers

Turning ripples into waves

Interactive plots with mpld3

I usually use IPython notebook in my classes, they are a great tool for teaching code and science or even both at the same time.

However, when I am using the notebook to teach, I miss interaction with matplotlib figures. One can drop the %matplotlib inline magic to force the figures to pop-up in the screen and use the interaction, but that's confusing, force students to deal with multiple windows, and breaks the linear flow of the notebook.

I've looked at some alternative to create interactive figures in HTML documents like:

Some of those, like Bokeh, even have some sort of integration with the IPython notebook. The problem with them is that they all require you to learn a new tool!

So here comes Jake to the rescue! (Like he already did before, and before, and before...) He is developing a D3 Renderings of Matplotlib Graphics. That means we can keep using the familiar matplotlib interface to make nice d3 interactive plots.

As an example I will re-write my tidal harmonic analysis using his mpld3 module so my student will be able to explore the plot, zooming-in and -out at any point.

For that I will use sea level data from the GLOSS Station 194. Here is the data format:

In [2]:
%%bash
sed -n 71,83p ./data/gloss_hourly.fmt
          The data records are coded as:

          field          bytes   length             comment
     -----------------  -------  ------  ------------------------------------
     station number        1-3     3     exactly 3 digits
     station version         4     1     letter from A to Z
     station name          6-9     4     Abbreviated if necessary
     year                12-15     4
     month               16-17     2     numerical value
     day                 18-19     2
     day record count       20     1     either 1 or 2
     sea level data      21-80    60     12 sea level values,
                                         hours 00-11 (1); hours 12-23 (2)

To read that data let's define some handy functions and load the data directly from the zipfile into a pandas Series object.

In [3]:
import os
import numpy as np
import numpy.ma as ma

from zipfile import ZipFile
from datetime import datetime
from pandas import Series, date_range

def basename(fname):
    return os.path.splitext(os.path.basename(fname))[0]


def make_date(data):
    date = str(int(data))[:-1]
    return datetime.strptime(date, '%Y%m%d')


def load_gloss(fname):
    data = np.loadtxt(fname, skiprows=1, usecols=range(2, 15))
    elev = ma.masked_equal(data[:, 1:].ravel(), 9999)
    start = make_date(data[0, 0])
    dates = date_range(start, periods=len(elev), freq='H')
    return Series(elev, index=dates)

series = Series()
with ZipFile('./data/h281a.zip', 'r') as ziped:
    for fname in sorted(ziped.namelist()):
        if not fname.endswith('/'):  # Skip directories.
            if False:
                with ziped.open(fname) as f:
                    print(f.readline())
            name = basename(fname)
            series = series.append(load_gloss(ziped.open(fname)))

Now we can convert the data from millimeters to meters, print a data description, and produce a quick plot to inspect the data.

In [4]:
series /= 1e3
print(series.describe())
ax = series.plot(figsize=(10, 3))
count    457699.000000
mean          1.689822
std           0.404013
min           0.140000
25%           1.400000
50%           1.690000
75%           1.980000
max           4.880000
dtype: float64

The data need some more pre-processing to fill-in gaps, and to remove outliers that passed the initial QA/QC. That is beyond of my scope here so I'll just slice a clean and continuous 2-month period from 2004, remove the mean, and move ahead.

In [5]:
obs = series.ix['2004'] - series.ix['2004'].mean()
obs = obs.ix['2004-07':'2004-08']
ax = obs.plot(figsize=(8, 3))
print(obs.describe())
count    1488.000000
mean       -0.031990
std         0.425528
min        -1.213540
25%        -0.339540
50%        -0.009540
75%         0.283710
max         0.960460
dtype: float64

Looks good to me. The next 3 cells are just a copy-and-paste from my previous post on calling t_tide via oct2py.

In [6]:
from matplotlib.dates import date2num

dates = date_range(datetime.now(), periods=72, freq='H')
tim = date2num(dates.to_pydatetime()) + 366  # Matlab Argh!!!
elev = obs.values
In [7]:
%load_ext oct2py.ipython
In [8]:
%%octave -i elev -i tim -o tidestruc -o pout -o yout

pkg load all

addpath(genpath('./t_tide_v1.3beta'))

[tidestruc, pout] = t_tide(elev, 'interval', 1, 'latitude', -25, 'starttime', [1998, 1, 1, 0]);

yout = t_predic(tim, tidestruc);
   number of standard constituents used: 35
   Points used: 1487 of 1488
   percent of var residual after lsqfit/var original: 27.26 %
   Greenwich phase computed with nodal corrections applied to amplitude
 and phase relative to center time
   Using nonlinear bootstrapped error estimates
   Generating prediction with nodal corrections, SNR is 2.000000
warning: isstr is obsolete and will be removed from a future version of Octave, please use ischar instead
   percent of var residual after synthesis/var original: 33.93 %
-----------------------------------
date: 02-May-2014
nobs = 1488,  ngood = 1487,  record length (days) = 62.00
start time: 01-Jan-1998
rayleigh criterion = 1.0
Greenwich phase computed with nodal corrections applied to amplitude \n and phase relative to center time

x0= -0.0365, x trend= 0

var(x)= 0.1812   var(xp)= 0.11915   var(xres)= 0.061482
percent var predicted/var original= 65.8 %

     tidal amplitude and phase with 95% CI estimates

tide   freq       amp     amp_err    pha    pha_err     snr
 MM   0.0015122    0.1571    0.183    64.59    77.12     0.74
 MSF  0.0028219    0.0256    0.164   231.18   210.70    0.024
 ALP1 0.0343966    0.0019    0.006    16.85   171.73    0.091
*2Q1  0.0357064    0.0109    0.007   235.60    44.29      2.2
*Q1   0.0372185    0.0473    0.008   212.37     9.84       36
*O1   0.0387307    0.1614    0.007   203.26     2.58  4.8e+02
 NO1  0.0402686    0.0029    0.005   240.58    99.45     0.35
*K1   0.0417807    0.1032    0.008    29.44     3.80  1.9e+02
 J1   0.0432929    0.0021    0.006   295.19   161.03     0.14
 OO1  0.0448308    0.0051    0.008   227.13   101.19     0.44
 UPS1 0.0463430    0.0086    0.010   230.15    78.72     0.73
 EPS2 0.0761773    0.0050    0.016   227.96   161.74      0.1
*MU2  0.0776895    0.0356    0.019    65.92    34.86      3.5
*N2   0.0789992    0.0650    0.019   190.48    18.93       11
*M2   0.0805114    0.3383    0.021    83.78     3.94  2.7e+02
*L2   0.0820236    0.0488    0.026    44.37    29.43      3.5
*S2   0.0833333    0.2356    0.023   210.99     5.78    1e+02
 ETA2 0.0850736    0.0336    0.031   143.32    60.92      1.2
*MO3  0.1192421    0.0596    0.011   152.47    10.36       30
*M3   0.1207671    0.0703    0.008    42.64     6.71       70
*MK3  0.1222921    0.0397    0.009    16.25    14.31       21
*SK3  0.1251141    0.0489    0.010   298.89    11.84       25
*MN4  0.1595106    0.0241    0.006   166.80    13.51       18
*M4   0.1610228    0.0526    0.005   182.36     5.57       97
 SN4  0.1623326    0.0054    0.005   125.25    61.09      1.1
*MS4  0.1638447    0.0270    0.006    49.39    12.76       20
 S4   0.1666667    0.0029    0.005   325.77   113.83     0.28
 2MK5 0.2028035    0.0005    0.003    17.45   225.26     0.03
 2SK5 0.2084474    0.0031    0.005   312.31    93.55     0.44
 2MN6 0.2400221    0.0004    0.002   319.96   174.12    0.061
 M6   0.2415342    0.0012    0.002   246.25   100.47     0.44
*2MS6 0.2443561    0.0054    0.002   333.73    24.77      7.7
 2SM6 0.2471781    0.0029    0.002   100.65    51.16      1.6
 3MK7 0.2833149    0.0025    0.003   181.48    67.60     0.81
 M8   0.3220456    0.0023    0.002   216.09    47.06      1.9

Now I'll use mpld3 to plot the observed data, tide prediction from the analysis, and the residue. The main difference from my previous post is that, while the original plot was just a static png figure, this one allow us to interact with the graph.

In [9]:
import mpld3
mpld3.enable_notebook()

import matplotlib.pyplot as plt

fig, (ax0, ax1, ax2) = plt.subplots(nrows=3, sharey=True, sharex=True, figsize=(8, 4))

ax0.plot(obs.index, obs)
ax1.plot(obs.index, pout.squeeze(), alpha=0.5)
ax2.plot(obs.index, obs-pout.squeeze(), alpha=0.5)
Out[9]:
[<matplotlib.lines.Line2D at 0x7f2e8336b110>]

Now I can tell my students to zoom-in at the observed high water level event that occurred around July 14$^{\text{th}}$ and start quizzing them about what is happening there.

In [10]:
HTML(html)
Out[10]:

This post was written as an IPython notebook. It is available for download or as a static html.

Creative Commons License
python4oceanographers by Filipe Fernandes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://ocefpaf.github.io/.

Comments