This is a short post on how to filter out tides from a time series using pandas, iris, and numpy.convolve.
Our data consist of hourly measurements of currents, temperature, and salinity, loaded as follows:
from datetime import datetime
from pandas import read_table
url = 'https://raw.githubusercontent.com/ocefpaf/python4oceanographers/master/content/downloads/notebooks/data/'
fname = url + '15t30717.3f1'
cols = ['j', 'u', 'v', 'temp', 'sal', 'y', 'mn', 'd', 'h', 'mi']
# delim_whitespace is deprecated in recent pandas; sep=r'\s+' is the equivalent.
df = read_table(fname, sep=r'\s+', names=cols)
dates = [datetime(*x) for x in
         zip(df['y'], df['mn'], df['d'], df['h'], df['mi'])]
df.index = dates
df.drop(['y', 'mn', 'd', 'h', 'mi', 'j'], axis=1, inplace=True)
df.head()
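The list comprehension above builds one datetime per row. A vectorized alternative is pandas.to_datetime, which can assemble timestamps from a DataFrame whose columns are named year, month, day, hour, and minute. The sketch below runs it on a small made-up frame that reuses the file's column names, so it does not need the download.

```python
import pandas as pd

# Made-up rows with the same date-component columns as the data file.
df = pd.DataFrame({
    'y': [1999, 1999], 'mn': [6, 6], 'd': [1, 1],
    'h': [0, 1], 'mi': [0, 0], 'v': [0.1, 0.2],
})

# to_datetime expects these column names when it is given a DataFrame.
parts = df[['y', 'mn', 'd', 'h', 'mi']].rename(
    columns={'y': 'year', 'mn': 'month', 'd': 'day', 'h': 'hour', 'mi': 'minute'})
df.index = pd.to_datetime(parts)
df = df.drop(columns=['y', 'mn', 'd', 'h', 'mi'])
print(df)
```

On the real file, the same rename-and-convert step would replace the zip/datetime loop.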
First, let's try numpy.convolve to apply a Lanczos filter from the oceans module:
import numpy as np
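from oceans.filters import lanc  # older oceans releases exposed lanc at the top level
freq = 1./40  # cutoff frequency in cycles per hour (a 40-hour period)
window_size = 96+1+96
pad = np.full(window_size, np.nan)  # np.NaN was removed in NumPy 2.0
wt = lanc(window_size, freq)
# Convolve the weights with the v component; mode='same' keeps len(df) samples.
res = np.convolve(wt, df['v'], mode='same')
df['low'] = res
df['high'] = df['v'] - df['low']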
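If the oceans package is not at hand, Lanczos weights are straightforward to compute directly. The sketch below uses the textbook form: an ideal sinc low-pass multiplied by a Lanczos sigma taper and normalized to unit sum. The helper name lanczos_lowpass_weights is hypothetical, and its taper and weight count may not match oceans.lanc exactly, so treat it as an illustration rather than a drop-in replacement. It is tested on a synthetic series with M2-like (12.42 h) and S2-like (12 h) tides instead of the mooring data.

```python
import numpy as np

def lanczos_lowpass_weights(n, fc):
    """Return 2*n + 1 Lanczos low-pass weights (hypothetical helper, a sketch).

    n  : number of weights on each side of the central one.
    fc : cutoff frequency in cycles per sample (cycles per hour for hourly data).
    """
    k = np.arange(-n, n + 1)
    # Ideal low-pass impulse response sin(2*pi*fc*k) / (pi*k), equal to 2*fc at k = 0.
    ideal = 2 * fc * np.sinc(2 * fc * k)
    # Lanczos sigma factor: a sinc taper that reaches zero at |k| = n.
    sigma = np.sinc(k / n)
    w = ideal * sigma
    return w / w.sum()  # unit gain at zero frequency

# Synthetic hourly series: a slow 240-hour signal plus M2- and S2-like tides.
t = np.arange(1000.0)
slow = 0.5 + np.sin(2 * np.pi * t / 240)
tides = 2.5 * np.sin(2 * np.pi * t / 12.42) + 1.5 * np.sin(2 * np.pi * t / 12.0)

n = 96
wt_sketch = lanczos_lowpass_weights(n, 1 / 40)
low = np.convolve(wt_sketch, slow + tides, mode='same')

# Away from the edges the low-passed series should track the slow signal.
err = np.max(np.abs(low[n:-n] - slow[n:-n]))
print(f'max interior error: {err:.3f}')
```

Normalizing the weights to sum to one keeps the mean of the record unchanged, which matters when the low-passed series is subtracted to get the high-pass part.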
Now a 40-hour pandas rolling mean:
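# pandas.rolling_mean was deprecated in pandas 0.18 and later removed; use .rolling().
# Assumes the index is already regularly spaced at 1 hour (the old freq='1H' resampled first).
df['pandas_l'] = df['v'].rolling(window=40, center=True).mean()
df['pandas_h'] = df['v'] - df['pandas_l']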
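A rolling mean is itself a convolution, just with equal weights 1/40. The sketch below checks that on random data using a trailing window (center=True only shifts where each result is labelled). It also evaluates the textbook amplitude response of an L-point running mean, |sin(pi f L) / (L sin(pi f))|, at the M2 frequency to show why a boxcar leaves some tide behind. The helper boxcar_gain is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
v = pd.Series(rng.standard_normal(500))
L = 40

# A trailing L-sample rolling mean at position i averages v[i - L + 1 : i + 1],
# which is the same as a 'valid' convolution with L equal weights of 1/L.
rolling = v.rolling(window=L).mean().to_numpy()
boxcar = np.convolve(v.to_numpy(), np.ones(L) / L, mode='valid')
matches = np.allclose(rolling[L - 1:], boxcar)

def boxcar_gain(f, L):
    """Amplitude response of an L-point running mean at f cycles per sample."""
    return abs(np.sin(np.pi * f * L) / (L * np.sin(np.pi * f)))

m2_gain = boxcar_gain(1 / 12.42, L)
print(matches, round(m2_gain, 3))
```

For hourly data the 40-point mean passes roughly 6% of the M2 amplitude, whereas a tapered filter such as Lanczos suppresses it far more strongly.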
And finally, the iris built-in rolling_window method:
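import iris.analysis
from iris.pandas import as_cube
cube = as_cube(df['v'])
# Weighted rolling sum with the Lanczos weights; only complete windows are kept,
# so the result is shorter than the input.
low = cube.rolling_window('index',
                          iris.analysis.SUM,
                          len(wt),
                          weights=wt)
# Pad both ends with NaN so the result lines up with df again.
df['iris_l'] = np.r_[pad, low.data, pad]
df['iris_h'] = df['v'] - df['iris_l']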
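What a weighted rolling window computes can be reproduced without iris. The sketch below uses numpy's sliding_window_view (NumPy 1.20 or later) and made-up Hann weights in place of wt. It takes the dot product of each complete window with the weights, checks the result against a 'valid' convolution, and pads the shorter output with NaN in the spirit of the np.r_[pad, low.data, pad] line above. It mirrors the iris step; it is not a claim about how iris implements it internally.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)
x = rng.standard_normal(300)
w = np.hanning(25)
w /= w.sum()  # made-up symmetric, odd-length, normalized weights

# Weighted rolling sum: one dot product per complete window.
windows = sliding_window_view(x, len(w))  # shape (len(x) - len(w) + 1, len(w))
rolled = windows @ w

# For symmetric weights this is exactly a 'valid' convolution.
conv_valid = np.convolve(x, w, mode='valid')

# Pad with NaN on both sides so the result has the input's length again.
half = (len(w) - 1) // 2
padded = np.r_[np.full(half, np.nan), rolled, np.full(half, np.nan)]
```

For non-symmetric weights the dot product is a correlation, so the weights would need reversing to match np.convolve.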
Plotting everything together.
%matplotlib inline
import matplotlib.pyplot as plt
fig, (ax0, ax1, ax2) = plt.subplots(nrows=3, figsize=(15, 7),
                                    sharex=True, sharey=True)
x = df.index.to_pydatetime()
ax0.plot(x, df['v'], label='original')
ax0.legend(loc='upper center', bbox_to_anchor=(0.5, 1.05),
           ncol=3, fancybox=True, shadow=True, numpoints=1)
ax1.plot(x, df['high'], label='lanc high', linewidth=3, alpha=0.5)
ax1.plot(x, df['pandas_h'], label='pandas high')
ax1.plot(x, df['iris_h'], label='iris high')
ax1.legend(loc='upper center', bbox_to_anchor=(0.5, 1.05),
           ncol=3, fancybox=True, shadow=True, numpoints=1)
ax2.plot(x, df['low'], label='lanc low', linewidth=3, alpha=0.5)
ax2.plot(x, df['pandas_l'], label='pandas low')
ax2.plot(x, df['iris_l'], label='iris low')
leg = ax2.legend(loc='upper center', bbox_to_anchor=(0.5, 1.05),
                 ncol=3, fancybox=True, shadow=True, numpoints=1)
Conclusions
The pandas rolling mean, as the name suggests, is just a moving average: every sample in the 40-hour window gets the same weight. The iris filtering method also has the rolling_ prefix in its name, but I find that misleading. Iris does perform a weighted convolution under the hood. In fact, the only difference from the numpy convolution we applied above is the mode option. We used mode='same', and iris probably used mode='valid' in order to exclude the border effect from the data.
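The difference between the two modes can be checked on synthetic data. With a normalized odd-length kernel (a made-up 49-point boxcar here; any normalized odd-length kernel behaves the same way), the interior of mode='same' coincides with mode='valid'. The samples near each end, however, are biased toward zero because 'same' implicitly zero-pads the record. That bias is the border effect, and a 'valid' result avoids it by not producing those samples at all.

```python
import numpy as np

t = np.arange(400.0)
x = 1.0 + 0.5 * np.sin(2 * np.pi * t / 12.42)  # nonzero mean plus an M2-like tide
w = np.ones(49) / 49  # made-up normalized kernel
half = (len(w) - 1) // 2

same = np.convolve(x, w, mode='same')
valid = np.convolve(x, w, mode='valid')

# The interior of 'same' is exactly the 'valid' result...
interior_matches = np.allclose(same[half:-half], valid)
# ...while the end samples are pulled toward zero by the implicit zero padding.
print(interior_matches, same[0], same[-1])
```

Here the first and last outputs average only about half a window of real data, so they drop to roughly half the record's mean.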