# Gridding data part 1

In the last post we filled gaps in a gridded data. However, a more common problem is quite the opposite: to grid scattered (observed) data.

There are several optimum interpolation schemes to grid observed data, two examples are Objective Analysis and LOWESS.

In this post I implemented a simple function that is something between a LOWESS and an Objective Analysis. That is because LOWESS does not need to provide an underlying function to fit the model, and I do use a fit function (Gaussian) in my example. However, it is not an Objective Analysis either because I use the data mean as a background and the parameters (error and correlation length of the data) are guessed.

(This is the same data used in the last post.)

In [2]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import cartopy.feature as cfeature
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER

LAND = cfeature.NaturalEarthFeature('physical', 'land', '50m',
edgecolor='face',
facecolor=cfeature.COLORS['land'])

def make_map(bbox, projection=ccrs.PlateCarree()):
fig, ax = plt.subplots(figsize=(8, 6),
subplot_kw=dict(projection=projection))
ax.set_extent(bbox)
ax.add_feature(LAND, facecolor='0.75')
ax.coastlines(resolution='50m')
gl = ax.gridlines(draw_labels=True)
gl.xlabels_top = gl.ylabels_right = False
gl.xformatter = LONGITUDE_FORMATTER
gl.yformatter = LATITUDE_FORMATTER
return fig, ax

In [3]:
import numpy as np
import numpy.ma as ma
from scipy.io import loadmat

merc = loadmat('./data/mercator_temperature.mat', squeeze_me=True)

x, y = merc['x'], merc['y']
bbox = [x.min(), x.max(), y.min(), y.max()]

level = 10  # 10 meters temperature.
data = ma.masked_invalid(merc['temp'][level, ...])

fig, ax = make_map(bbox=bbox)
cs = ax.pcolormesh(x, y, data)
cbar = fig.colorbar(cs, extend='both', shrink=0.85)


Here I sub-sampled the data, like in the previous post, but now accounting only for data actual and not the land mask.

In [4]:
import numpy as np

def random_mask(data, frac=0.05):
N = data.size
frac = np.int(frac * N)
idx = np.random.random_integers(0, N-1, frac)
return idx

mask = ~data.mask
temp = data[mask].data

idx = random_mask(temp, frac=0.25)

X, Y = np.meshgrid(x, y)

tsub = temp[idx]
xx, yy = X[mask][idx], Y[mask][idx]

In [5]:
fig, ax = make_map(bbox)
ax.set_title('Subsampled data to {:0.2f} %.'.format(100 * tsub.size / temp.size))
cs = ax.scatter(xx, yy, 20, tsub)
cbar = fig.colorbar(cs, extend='both', shrink=0.85)


The figure above shows 20% of the original data. We can pretend that these are CTD casts.

The cell below implements our gridding function.

In [6]:
def scaloa(xc, yc, x, y, data, corrlen=1.5, err=0.09**2):
shape = xc.shape
xc, yc, x, y, data = map(np.ravel, (xc, yc, x, y, data))
n = len(x)
x, y = np.reshape(x, (1, n)), np.reshape(y, (1, n))

# Squared distance matrix between the observations.
d2 = ((np.tile(x, (n, 1)).T - np.tile(x, (n, 1))) ** 2 +
(np.tile(y, (n, 1)).T - np.tile(y, (n, 1))) ** 2)

nv = len(xc)
# Squared distance between the observations and the grid points.
dc2 = ((np.tile(xc, (n, 1)).T - np.tile(x, (nv, 1))) ** 2 +
(np.tile(yc, (n, 1)).T - np.tile(y, (nv, 1))) ** 2)

# Correlation matrix between stations (A) and cross correlation (stations
# and grid points (C))
A = (1 - err) * np.exp(-d2 / corrlen ** 2)
C = (1 - err) * np.exp(-dc2 / corrlen ** 2)

# Add the diagonal matrix associated with the sampling error.  We use the
# diagonal because the error is assumed to be random.  This means it just
# correlates with itself at the same place.
A = A + err * np.eye(len(A))

# Weights that minimize the variance (OI).
data = np.reshape(data, (n, 1))
tp = np.dot(C, np.linalg.solve(A, data))

# Normalized mean error.  Taking the squared root you can get the
# interpolation error in percentage.
ep = 1 - np.sum(C.T * np.linalg.solve(A, C.T), axis=0) / (1 - err)

tp = tp.reshape((shape))
ep = ep.reshape((shape))
return tp, ep


The next cell has some guessed values for the error and the correlation length.

In [7]:
tp0, ep0 = scaloa(X, Y, xx, yy, tsub, corrlen=1, err=0.1**2)
tp1, ep1 = scaloa(X, Y, xx, yy, tsub, corrlen=1, err=0.01**2)
tp2, ep2 = scaloa(X, Y, xx, yy, tsub, corrlen=2.0, err=0.1**2)
tp3, ep3 = scaloa(X, Y, xx, yy, tsub, corrlen=2.0, err=0.01**2)

In [8]:
def make_ax(ax, tp):
ax.set_extent(bbox)
ax.add_feature(LAND, facecolor='0.75')
ax.coastlines(resolution='50m')
gl = ax.gridlines(draw_labels=True)
gl.xlabels_top = gl.ylabels_right = False
gl.xformatter = LONGITUDE_FORMATTER
gl.yformatter = LATITUDE_FORMATTER
cs = ax.pcolormesh(X, Y, tp)
return ax, cs

fig = plt.figure(figsize=(11, 9))
projection = ccrs.PlateCarree()
ax0 = fig.add_subplot(221, projection=projection)
ax0, _ = make_ax(ax0, tp0)
ax0.set_title('corrlen=1, err=0.1')

ax1 = fig.add_subplot(222, projection=projection)
ax1, _ = make_ax(ax1, tp1)
ax1.set_title('corrlen=1, err=0.01')

ax2 = fig.add_subplot(223, projection=projection)
ax2, _ = make_ax(ax2, tp2)
ax2.set_title('corrlen=2, err=0.1')

ax3 = fig.add_subplot(224, projection=projection)
ax3, _ = make_ax(ax3, tp3)
ax3.set_title('corrlen=2, err=0.01')

Out[8]:
<matplotlib.text.Text at 0x7ff3c82f9750>


We can note that, as one increases the correlation length the gridded data get smoother. While, as one decrease the error, we can see more "details" in the interpolated data.

Using this function one must fine tune these two parameters until the interpolation yields a "nice" result. However, in the next post I will show how to Objectively Estimated them instead guessing.

In [9]:
HTML(html)

Out[9]:

This post was written as an IPython notebook. It is available for download or as a static html.

python4oceanographers by Filipe Fernandes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://ocefpaf.github.io/.