python4oceanographers

Turning ripples into waves

Filtering out tides with pandas, iris and numpy

This is a short post on how to filter out tides from a time-series using pandas, iris, and numpy.convolve.

Our data consists of hourly measurements of currents, temperature and salinity as follows:

In [2]:
from datetime import datetime
from pandas import read_table

url = 'https://raw.githubusercontent.com/ocefpaf/python4oceanographers/master/content/downloads/notebooks/data/'
fname = url + '15t30717.3f1'
cols = ['j', 'u', 'v', 'temp', 'sal', 'y', 'mn', 'd', 'h', 'mi']


df = read_table(fname, sep=r'\s+', names=cols)
dates = [datetime(*x) for x in
         zip(df['y'], df['mn'], df['d'], df['h'], df['mi'])]
df.index = dates
df.drop(['y', 'mn', 'd', 'h', 'mi', 'j'], axis=1, inplace=True)
df.head()
Out[2]:
                        u    v  temp   sal
1993-07-18 03:00:00  11.7 -1.3  28.3  29.2
1993-07-18 04:00:00  12.3 -4.5  28.1  29.4
1993-07-18 05:00:00   9.6 -5.3  27.6  31.0
1993-07-18 06:00:00   7.6 -2.3  27.2  32.4
1993-07-18 07:00:00  10.6 -2.0  27.2  32.5
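
The list comprehension above builds the index one datetime at a time. As a possible alternative, pandas.to_datetime can assemble timestamps from a DataFrame of component columns, as long as they carry the names year, month, day, hour and minute. The sketch below uses a small made-up frame that mirrors the first rows of the output above (it is not read from the file), and the renaming map is an assumption about what the short column names mean.

```python
import pandas as pd

# Made-up sample mirroring the first rows shown above (not read from the file).
sample = pd.DataFrame({
    'y': [1993, 1993, 1993],
    'mn': [7, 7, 7],
    'd': [18, 18, 18],
    'h': [3, 4, 5],
    'mi': [0, 0, 0],
    'v': [-1.3, -4.5, -5.3],
})

# to_datetime needs the component columns under these specific names.
# The mapping is an assumption about the meaning of the short names.
parts = sample[['y', 'mn', 'd', 'h', 'mi']].rename(
    columns={'y': 'year', 'mn': 'month', 'd': 'day',
             'h': 'hour', 'mi': 'minute'})

sample.index = pd.DatetimeIndex(pd.to_datetime(parts))
sample = sample.drop(columns=['y', 'mn', 'd', 'h', 'mi'])
```

This avoids the Python-level loop and leaves a proper DatetimeIndex, which is what the time-based pandas methods expect.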

First, let's use numpy.convolve to apply a Lanczos filter, with weights from the oceans module:

In [3]:
import numpy as np
from oceans import lanc

freq = 1./40  # Cutoff frequency in cycles per hour (40-hour period).
window_size = 96+1+96
pad = np.full(window_size, np.nan)  # NaN padding, used for the iris result below.

wt = lanc(window_size, freq)
res = np.convolve(wt, df['v'], mode='same')

df['low'] = res
df['high'] = df['v'] - df['low']
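
To make the idea concrete without the data file or the oceans module, here is a minimal sketch of a Lanczos low-pass filter applied to a synthetic hourly series. The helper lanczos_lowpass_weights is hypothetical: it follows the textbook sinc-times-sigma-factor form and is not the oceans.lanc implementation, whose weight count and normalization may differ. The 400-hour "slow" signal and the 12.42-hour "tide" are invented for illustration.

```python
import numpy as np


def lanczos_lowpass_weights(half_width, cutoff):
    """Hypothetical helper: return 2 * half_width + 1 low-pass weights.

    cutoff is in cycles per sample (cycles per hour for hourly data).
    """
    k = np.arange(-half_width, half_width + 1)
    # Ideal low-pass weights; np.sinc(x) is sin(pi x) / (pi x).
    ideal = 2 * cutoff * np.sinc(2 * cutoff * k)
    # Lanczos sigma factor: tapers the weights to zero at both ends.
    sigma = np.sinc(k / half_width)
    weights = ideal * sigma
    return weights / weights.sum()  # unit gain at zero frequency


# Synthetic hourly series: a slow 400-hour signal plus a 12.42-hour "tide".
t = np.arange(2000)
slow = np.sin(2 * np.pi * t / 400)
tide = np.sin(2 * np.pi * t / 12.42)
series = slow + tide

wt_demo = lanczos_lowpass_weights(96, 1. / 40)
low_demo = np.convolve(series, wt_demo, mode='same')
high_demo = series - low_demo

# Largest mismatch away from the edges, where mode='same' implicitly
# zero-pads the series and contaminates the result.
interior = slice(200, -200)
max_err = np.abs(low_demo[interior] - slow[interior]).max()
```

Away from the edges, low_demo should track the slow signal closely and high_demo should hold most of the tide, which is the same split stored in df['low'] and df['high'] above.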

Now a 40-hour pandas rolling mean:

In [4]:
df['pandas_l'] = df['v'].rolling(window=40, center=True).mean()
df['pandas_h'] = df['v'] - df['pandas_l']
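
A centered rolling mean is itself a convolution with uniform (boxcar) weights, which is why it can be compared directly with the Lanczos result. The sketch below checks this on random data. The series and the 41-point window are made up for illustration; an odd window is used so that the centered window is symmetric.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
values = pd.Series(rng.normal(size=500))

window = 41  # odd, so the centered window is symmetric around each point
half = window // 2

rolled = values.rolling(window=window, center=True).mean()

boxcar = np.ones(window) / window
convolved = np.convolve(values.to_numpy(), boxcar, mode='same')

# Compare only where the full window fits: pandas returns NaN at the edges,
# while mode='same' implicitly zero-pads there.
same_interior = np.allclose(rolled.to_numpy()[half:-half],
                            convolved[half:-half])
```

The two results agree in the interior and differ only in how they treat the edges.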

And finally, iris's built-in rolling_window method:

In [5]:
import iris
from iris.pandas import as_cube

cube = as_cube(df['v'])
low = cube.rolling_window('index',
                        iris.analysis.SUM,
                        len(wt),
                        weights=wt)

df['iris_l'] = np.r_[pad, low.data, pad]
df['iris_h'] = df['v'] - df['iris_l']

Plotting everything together.

In [6]:
%matplotlib inline

import matplotlib.pyplot as plt

fig, (ax0, ax1, ax2) = plt.subplots(nrows=3, figsize=(15, 7),
                                    sharex=True, sharey=True)
x = df.index.to_pydatetime()

ax0.plot(x, df['v'], label='original')
ax0.legend(loc='upper center', bbox_to_anchor=(0.5, 1.05),
           ncol=3, fancybox=True, shadow=True, numpoints=1)

ax1.plot(x, df['high'], label='lanc high', linewidth=3, alpha=0.5)
ax1.plot(x, df['pandas_h'], label='pandas high')
ax1.plot(x, df['iris_h'], label='iris high')
ax1.legend(loc='upper center', bbox_to_anchor=(0.5, 1.05),
           ncol=3, fancybox=True, shadow=True, numpoints=1)

ax2.plot(x, df['low'], label='lanc low', linewidth=3, alpha=0.5)
ax2.plot(x, df['pandas_l'], label='pandas low')
ax2.plot(x, df['iris_l'], label='iris low')
leg = ax2.legend(loc='upper center', bbox_to_anchor=(0.5, 1.05),
                 ncol=3, fancybox=True, shadow=True, numpoints=1)

Conclusions

The pandas rolling mean, as the name suggests, is just a moving average. The iris filtering method also has the rolling_ prefix in its name, but I find that misleading: given a set of weights, iris does perform a convolution under the hood. In fact, the only difference from the numpy convolution we applied above is the mode option. We used mode='same', while iris probably used mode='valid' in order to exclude the border effects from the data. That is why the iris result is shorter than the original series and had to be padded with NaNs before being stored in the DataFrame.
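
The difference between the two modes is easy to check with numpy alone. The sketch below uses a made-up ramp and a 7-point boxcar kernel (any odd-length kernel would do) and says nothing about how iris is implemented internally. It only shows how 'same' and 'valid' relate, and how NaN padding puts the shorter result back on the original index.

```python
import numpy as np

signal = np.arange(50, dtype=float)
weights = np.ones(7) / 7  # made-up odd-length kernel
half = (len(weights) - 1) // 2

# 'same' keeps len(signal) points by implicitly zero-padding the edges.
same = np.convolve(signal, weights, mode='same')
# 'valid' keeps only points where the kernel fully overlaps the signal,
# i.e. len(signal) - len(weights) + 1 points.
valid = np.convolve(signal, weights, mode='valid')

# Re-align the shorter 'valid' output with the original index.
pad_demo = np.full(half, np.nan)
valid_padded = np.r_[pad_demo, valid, pad_demo]

# Away from the edges the two modes give identical values.
interior_match = np.allclose(same[half:-half], valid)
```

With mode='same' the edge values are computed against implicit zeros and look like real data, whereas the padded 'valid' result marks them as missing.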

This post was written as an IPython notebook. It is available for download or as static HTML.

Creative Commons License
python4oceanographers by Filipe Fernandes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://ocefpaf.github.io/.
