Simple least squares fit to study residuals

I got a question from a friend: how do I fit a function, using least-squares, to analyze the residuals. To answer that I cook this notebook up and made a post out of it ;-)

For the example I create a fake tidal signal, that we want fit and remove, added some noise and an offset (mean current).

In [3]:
import numpy as np

def fake_tide(t, M2amp, M2phase):
"""
Generate a minimally realistic-looking fake semidiurnal tide.

t is time in hours
"""
out = M2amp * np.sin(2 * np.pi * t / 12.42 - M2phase)
return out

In [4]:
N = 500
res = 5
noise = np.random.randn(N)

t = np.arange(N)

amp, phase = 2, 0

u = fake_tide(t, amp, phase) + res + noise


Here is the signal we created.

In [5]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(11, 2.75))
ax.plot(t, u, 'darkgreen', label='Signal')
leg = ax.legend()


Now lets try to specify a first guess for the non-linear fitting. Note that the phase is a little off on purpose.

In [6]:
mean = np.mean(u)

std = amp * np.std(u) / (amp**0.5)

# First guess.
first_guess = fake_tide(t, std, 0.25) + mean

fig, ax = plt.subplots(figsize=(11, 2.75))
ax.plot(t, first_guess, 'darkorange', label='First guess')
ax.plot(t, u, 'darkgreen', label='Signal')
leg = ax.legend()


Nice first guess, we did not had to do such a good jib ;-), but now it is time to feed that into SciPy's optimize.leastsq to get a least squares fit of the parameters.

In [7]:
from scipy.optimize import leastsq

optimize_func = lambda x: x[0] * np.sin(2 * np.pi * t / 12.42 + x[1]) + x[2] - u

est_std, est_phase, est_mean = leastsq(optimize_func, [std, phase, mean])[0]


Let's check if the parameters we estimated are close to reality.

In [8]:
print('Orignal: std {:.2f}, phase {:.2f}, and res {:.2f}'.format(std, phase, res))
print('Estimated: std {:.2f}, phase {:.2f}, and res {:.2f}'.format(est_std, est_phase, est_mean))

Orignal: std 2.39, phase 0.00, and res 5.00
Estimated: std 1.98, phase -0.00, and res 5.05



Awesome! Close enough =)

Now lets re-create the signal using the estimated parameters and plot everything.

In [9]:
fit = fake_tide(t, est_std, est_phase) + est_mean

In [10]:
fig, ax = plt.subplots(figsize=(11, 2.75))
ax.plot(t, fit, 'darkorange', label='Fitted')
ax.plot(t, first_guess, 'crimson', label='First guess')
ax.plot(t, u, 'darkgreen', label='Signal')
ax.plot(t, u-fit, 'crimson', label='Residue')
leg = ax.legend()
_ = ax.set_xlim(100, 150)

In [11]:
HTML(html)

Out[11]:

This post was written as an IPython notebook. It is available for download or as a static html.