Turning ripples into waves

netCDF4 python and the get_variables_by_attributes

netCDF-Java users like to brag about the Get by Attribute method in the Java version of the netCDF library. This method allow the user to query for using the information in the variable attribute. That is much better than browsing the all variables names to find what you need. I had a very hackish version of that to search the attributes using string as input, but Kyle Wilcox took this to a whole new level by allowing callable searches. With such a nice implementation it was time to get that upstream!

PR 454 implements the get_by_attribute method in the netCDF4-python library. this will be available in the next release. Then we will be able to do:

In [3]:
from netCDF4 import Dataset

url = (''

nc = Dataset(url)
In [4]:
u = nc.get_variables_by_attributes(standard_name='eastward_sea_water_velocity')
In [5]:
type(u), len(u)
(list, 1)
In [6]:
<type 'netCDF4._netCDF4.Variable'>
float32 u(time, s_rho, eta_u, xi_u)
    units: meter second-1
    long_name: time-averaged u-momentum component
    time: ocean_time
    coordinates: time_run time s_rho lat_u lon_u 
    field: u-velocity, scalar, series
    _FillValue: 1e+37
    _ChunkSizes: [  1  36  82 129]
    standard_name: eastward_sea_water_velocity
unlimited dimensions: 
current shape = (858, 36, 82, 129)
filling off

As you can see the method always returns a list of variables. In this case we looked for the u velocity using its CF standard name.

Let's try a more complex search using a callable.

In [7]:
axis = lambda v: v in ['X', 'Y', 'Z', 'T']

variables = nc.get_variables_by_attributes(axis=axis)

Bad dataset! Don't you know that defining the axis attribute you make the life of users and automated tools much easier? This is what you get when CF only recommends an attribute rather than making it obligatory.

OK. Let's try something that is mandatory in CF like the formula_terms.

In [8]:
formula_terms = lambda v: v is not None

var = nc.get_variables_by_attributes(formula_terms=formula_terms)[0]

formula_terms = var.formula_terms
u's: s_rho C: Cs_r eta: zeta depth: h depth_c: hc'

Good dataset! Have a biscuit while I transform that horrendous CF string to a Python dictionary.

In [9]:
terms = [x.strip(':') for x in formula_terms.split()]
formula_terms = {k: v for k, v in zip(terms[::2], terms[1::2])}
{u'C': u'Cs_r',
 u'depth': u'h',
 u'depth_c': u'hc',
 u'eta': u'zeta',
 u's': u's_rho',
 'standard_name': u'ocean_s_coordinate_g1'}

And now that we have the dictionary we can use the CF definition for these variables, never minding the actual names in the used in the dataset. This is extremely valuable when loading ocean models from various sources. The variables names might change, but as long as they are CF-compliant we are good to go.

In [10]:
C = nc[formula_terms['C']]
s = nc[formula_terms['s']]
eta = nc[formula_terms['eta']]
depth = nc[formula_terms['depth']]
depth_c = nc[formula_terms['depth_c']]

A quick example on how to use this would be to construct the dimensioned vertical coordinate. I am writing all the CF Ocean Dimensionless Vertical Coordinates (ODVC) in their canonical, and almost pseudo-code like, form. That way users can import from there, or just copy-and-paste the formulas they need in their code. In order to keep the formulas simple I do not check, nor adjust, for the input shapes. I leave that for the end user. That way one can easily extend these formulas to CDMs interprets like xray and iris. (Actually iris already have these formulas built-in ;-)

So... Let's adjust the shapes!

In [11]:
s.shape, C.shape, eta.shape, depth.shape
((36,), (36,), (858, 82, 130), (82, 130))
In [12]:
s, C, eta, depth = (s[:][:, None, None], C[:][:, None, None],
                    eta[0, ...][None, ...], depth[:][None, ...])
In [13]:
s.shape, C.shape, eta.shape, depth.shape
((36, 1, 1), (36, 1, 1), (1, 82, 130), (1, 82, 130))

All 3D arrays ready for NumPy broadcasting!

In [14]:
def ocean_s_coordinate_g1(s, c, eta, depth, depth_c):
    Creates an Ocean s-coordinate, generic form 1 factory with the formula:
    z(n,k,j,i) = S(k,j,i) + eta(n,j,i) * (1 + S(k,j,i) / depth(j,i))
      S(k,j,i) = depth_c * s(k) + (depth(j,i) - depth_c) * C(k)
    S = depth_c * s + (depth - depth_c) * c
    z = S + eta * (1 + S / depth)
    return z

And finally let's compute z and make some plots.

In [15]:
z = ocean_s_coordinate_g1(s, C, eta, depth, depth_c)

salt = nc.get_variables_by_attributes(standard_name='sea_water_salinity')[0]
temp = nc.get_variables_by_attributes(standard_name='sea_water_potential_temperature')[0]
In [16]:
import seaborn
import matplotlib.pyplot as plt


fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(9, 6), sharey=True)

kw = dict(linewidth=2.5)
ax0.plot(salt[0, :, 0, 76], z[:, 0, 76], label='Salinity', **kw)
ax1.plot(temp[0, :, 0, 76], z[:, 0, 76], label='Temperature', **kw)

kw = dict(right=False, top=False, left=True, bottom=True, offset=None, trim=True)

seaborn.despine(ax=ax0, **kw)
seaborn.despine(ax=ax1, **kw)
In [17]:

This post was written as an IPython notebook. It is available for download or as a static html.

Creative Commons License
python4oceanographers by Filipe Fernandes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at