xray is an interesting package that has been under my radar for a while.
According to the docs xray is:
... an open source project and Python package that aims to bring the
labeled data power of pandas to the physical sciences, by providing
N-dimensional variants of the core pandas data structures, Series and
DataFrame: the xray DataArray and Dataset.
Our goal is to provide a pandas-like and pandas-compatible toolkit for
analytics on multi-dimensional arrays, rather than the tabular data for
which pandas excels. Our approach adopts the Common Data Model for
self- describing scientific data in widespread use in the Earth sciences
(e.g., netCDF and OPeNDAP): xray.Dataset is an in-memory representation of
a netCDF file.
This post is my first attempt using it for ocean models output. Because xray is under heavy development I will be using the latest git checkout.
import xray
print(xray.__version__)
xray uses python-netCDF4 under the hood, so it "speaks" OPeNDAP.
url = ('http://tds.marine.rutgers.edu/thredds/dodsC/roms/'
'espresso/2013_da/his_Best/'
'ESPRESSO_Real-Time_v2_History_Best_Available_best.ncd')
nc = xray.open_dataset(url)
It has a nice and meaniful __repr__
for the DataArray object.
salt = nc['salt']
salt
My first impression is that, when compared to raw netCDF4 interface, the
xray.DataArray
object is richer (more metadata). However, if compared to his older brother iris.cube.Cube
, it falls short. For example: xray
does not attach the 2D coordinates to the object. (As of the latest git
checkout it does attach the coordinates.)
print(salt.coords['xi_rho'].coords)
print(salt.coords['eta_rho'].coords)
print(salt.coords['lon_rho'].coords)
print(salt.coords['lat_rho'].coords)
The Dataset
is another important xray object. Here is a small snippet on how to construct one:
ds = xray.Dataset({'temperature': nc['temp']},
coords={'lon': (['eta_rho', 'xi_rho'], nc['lon_rho']),
'lat': (['eta_rho', 'xi_rho'], nc['lat_rho']),
's': nc['s_rho'],
'time': nc['time']})
ds
The best feature, IMHO, is the use of Pandas "fuzzyness" when slicing with
strings dates. Iris uses the cumbersome lambda
approach, but using strings is much easier.
s = salt.loc['2014-12-02']
s.coords['time']
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import cartopy.feature as cfeature
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
LAND = cfeature.NaturalEarthFeature('physical', 'land', '50m',
edgecolor='face',
facecolor=cfeature.COLORS['land'])
def make_map(bbox, projection=ccrs.PlateCarree()):
fig, ax = plt.subplots(figsize=(8, 6),
subplot_kw=dict(projection=projection))
ax.set_extent(bbox)
ax.add_feature(LAND, facecolor='0.75')
ax.coastlines(resolution='50m')
gl = ax.gridlines(draw_labels=True)
gl.xlabels_top = gl.ylabels_right = False
gl.xformatter = LONGITUDE_FORMATTER
gl.yformatter = LATITUDE_FORMATTER
return fig, ax
import numpy.ma as ma
import matplotlib.pyplot as plt
from oceans.colormaps import cm
sal = s[0, -1, ...].values
lon, lat = nc['lon_rho'], nc['lat_rho']
fig, ax = make_map([lon.min(), lon.max(), lat.min(), lat.max()])
cs = ax.pcolormesh(lon.values, lat.values,
ma.masked_invalid(sal), cmap=cm.odv)
cbar = fig.colorbar(cs, extend='both')
Note that when plotting I had to use the .values
method. The reason is because the
DataArray object does not behave entirely like a numpy array. The .ravel()
method, need for pcolormesh, is missing. Also note that xray does not support masked array (yet).
One might ask: why use labeled arrays? The main argument is to write idiomatic expressions like,
s.mean(dim='time')
there is no doubt on which dimension I am averaging.
Because xray is designed to behave like pandas, when dealing with time series
one can take advantage of all pandas features. The same may be accomplished with
iris.pandas.as_series
, but with xray there is no awkward conversions.
Now let's try an example straight from xray
's documentation.
First I will reduce the data from hourly to daily observations,
then compute weekly anomalies.
def daily_aggregator(data_array):
time = data_array.coords['time']
values = time.to_index().to_period('D').to_timestamp()
return xray.DataArray(values, [time], name='daily')
temp = nc['temp'].loc['2013-06-01':'2013-07-01'] # I get a weird error if extending this.
ts = temp.isel(eta_rho=16, xi_rho=40, s_rho=-1) # Cannot use lon, lat because xray does not support 2D dimensions.
daily_avg = ts.groupby(daily_aggregator(ts)).mean()
week_avg = daily_avg.groupby('daily.week').mean('daily')
anomalies = daily_avg.groupby('daily.week') - week_avg
ax = anomalies.reset_coords(drop=True).to_series().plot()
Those anomalies could very well be the GS front meandering around that point.
surface = temp.isel(s_rho=-1)
fig, ax = make_map([lon.min(), lon.max(), lat.min(), lat.max()])
cs = ax.pcolormesh(lon.values, lat.values,
ma.masked_invalid(surface[0].values),
cmap=cm.avhrr, alpha=0.5, zorder=1)
p, = ax.plot(lon.isel(eta_rho=16, xi_rho=40),
lat.isel(eta_rho=16, xi_rho=40),
'ko', zorder=2)
HTML(html)