Turning ripples into waves

Loading non-standard dates with cf_units

I saw an issue on GitHub over the weekend that got me thinking: how does one read non-standard dates from datasets such as simulations of Earth's past climates?

I will use an example of a common dataset out there in the wild for this post. Let's try to read it with the netCDF4 library first.

In [3]:
from netCDF4 import Dataset, num2date

nc = Dataset('./data/')

time = nc.variables['time']

print('{}\n{}'.format(time.units, time.calendar))
days since 0000-01-01 00:00:00
noleap

Days since year 0 and a noleap calendar. Anyone working with climate data will run into that at some point in their lives. We have to deal with it, but...

In [4]:
num2date(time[:], time.units, time.calendar)
ValueError                                Traceback (most recent call last)
<ipython-input-4-b8be098448dc> in <module>()
----> 1 num2date(time[:], time.units, time.calendar)

/home/filipe/.virtualenvs/blog/lib/python2.7/site-packages/netCDF4/ in netCDF4._netCDF4.num2date (netCDF4/_netCDF4.c:50572)()

/home/filipe/.virtualenvs/blog/lib/python2.7/site-packages/netCDF4/ in netCDF4._netCDF4._dateparse (netCDF4/_netCDF4.c:48418)()

ValueError: year is out of range

That is a sensible error message. After all, there is no year zero in any of the calendars supported by netCDF4. UDUNITS behaves in a different way though:

> "UDUNITS implements the mixed Gregorian/Julian calendar system, as followed in England, in which dates prior to 1582-10-15 are assumed to use the Julian calendar.  Other software cannot be relied upon to handle the change of calendar in the same way, so for robustness it is recommended that the reference date be later than 1582.  If earlier dates must be used, it should be noted that UDUNITS treats 0 AD as identical to 1 AD."

The year-zero edge case shows up in the CF conventions for climatological data. It is allowed, but not recommended! It only exists for compatibility with the COARDS conventions.
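For comparison, Python's own datetime module has no year zero either. The short check below is only an illustration of that limit in the standard library. It is not a claim about where the netCDF4 error above comes from, and year_zero_ok is a name made up for this sketch.

```python
import datetime

# Smallest year the standard library datetime accepts.
print(datetime.MINYEAR)

# Asking for year 0 raises a ValueError instead of returning a date.
try:
    datetime.datetime(0, 1, 1)
    year_zero_ok = True
except ValueError as err:
    year_zero_ok = False
    print('ValueError: {}'.format(err))
```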

How can we derive higher-level datetime-like objects to work with? We can use UDUNITS of course, but a Python wrapper is better!

In [5]:
import cf_units

times = cf_units.num2date(time[:], time.units, time.calendar)
times
array([ 100-02-01 00:00:00], dtype=object)

Same time variable! Same netCDF4.num2date syntax! It cannot get any easier than that ;-)

But note!!! This is not a real datetime object!

In [6]:

print(type(times[0]))

[m for m in dir(times[0]) if not m.startswith('__')]
<type 'netcdftime._datetime.datetime'>


A lot of the same methods are there and, if you are brave enough, you can get a real datetime object with _to_real_datetime.

In [7]:
date = times[0]._to_real_datetime()
date
datetime.datetime(100, 2, 1, 0, 0)

Is it doing the right thing? Let's check! Remember that the 'noleap' calendar means every year has 365 days.

In [8]:
import numpy as np

years = np.fix(time[:]/365)

days = np.remainder(time[:], 365)

years, days
(array([ 100.]), array([ 31.]))
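The same check can be pushed one step further, to a full (year, month, day) triple. Here is a minimal pure-Python sketch. MONTH_LENGTHS and noleap_to_ymd are names made up for this illustration (they are not part of netCDF4 or cf_units), and the function assumes whole, non-negative day offsets from 0000-01-01 in a 365-day calendar.

```python
# Month lengths in a 365-day ("noleap") year: February always has 28 days.
MONTH_LENGTHS = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def noleap_to_ymd(days_since_year_zero):
    """Convert whole days since 0000-01-01 (noleap) to (year, month, day)."""
    # Every year has exactly 365 days, so one divmod splits off the year.
    year, day_of_year = divmod(int(days_since_year_zero), 365)
    # Walk through the months until the remaining days fit inside one.
    for month, length in enumerate(MONTH_LENGTHS, start=1):
        if day_of_year < length:
            return year, month, day_of_year + 1
        day_of_year -= length
    raise AssertionError('day_of_year should always fall inside a month')


# 36531 = 100 * 365 + 31, the offset used for this dataset.
print(noleap_to_ymd(36531))
```

For the offset above this prints (100, 2, 1), which agrees with the date cf_units returned.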

31 days after the first day of the year means February 1, right? Note that the fake datetime object can do date formatting.

In [9]:
times[0].strftime('%Y-%m-%d')
' 100-02-01'

But the real datetime cannot!

In [10]:
date.strftime('%Y-%m-%d')
ValueError                                Traceback (most recent call last)
<ipython-input-10-936e5acfe983> in <module>()
----> 1 date.strftime('%Y-%m-%d')

ValueError: year=100 is before 1900; the datetime strftime() methods require year >= 1900
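If all you need is a formatted string, one possible workaround is to skip strftime altogether. The sketch below builds a stand-in for the date object from the cell above, so it runs on its own; iso_date and manual are illustrative names. Both approaches rely only on the standard library.

```python
import datetime

# Stand-in for the object returned by _to_real_datetime() above.
date = datetime.datetime(100, 2, 1, 0, 0)

# isoformat() zero-pads the year and does not go through strftime.
iso_date = date.date().isoformat()

# Manual formatting from the integer attributes gives the same string.
manual = '{:04d}-{:02d}-{:02d}'.format(date.year, date.month, date.day)

print(iso_date)
print(manual)
```

Both lines print 0100-02-01, with the year zero-padded rather than space-padded.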

Bonus question: will this work as a pandas time index?

In [11]:
from pandas import DatetimeIndex

times = cf_units.num2date(np.arange(36531, 36532, 1), time.units, time.calendar)

times = [t._to_real_datetime() for t in times]
DatetimeIndex(times)
OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-11-9ec1b8ef24b9> in <module>()
      5 times = [t._to_real_datetime() for t in times]
----> 6 DatetimeIndex(times)

/home/filipe/.virtualenvs/blog/lib/python2.7/site-packages/pandas/util/ in wrapper(*args, **kwargs)
     86                 else:
     87                     kwargs[new_arg_name] = new_arg_value
---> 88             return func(*args, **kwargs)
     89         return wrapper
     90     return _deprecate_kwarg

/home/filipe/.virtualenvs/blog/lib/python2.7/site-packages/pandas/tseries/ in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, **kwargs)
    243                                         yearfirst=yearfirst)
    244             else:
--> 245                 data = tools.to_datetime(data, errors='raise')
    246                 data.offset = freq
    247                 if isinstance(data, DatetimeIndex):

/home/filipe/.virtualenvs/blog/lib/python2.7/site-packages/pandas/tseries/ in to_datetime(arg, errors, dayfirst, utc, box, format, exact, coerce, unit, infer_datetime_format)
    341         return Series(values, index=arg.index,
    342     elif com.is_list_like(arg):
--> 343         return _convert_listlike(arg, box, format)
    345     return _convert_listlike(np.array([ arg ]), box, format)[0]

/home/filipe/.virtualenvs/blog/lib/python2.7/site-packages/pandas/tseries/ in _convert_listlike(arg, box, format)
    331                 return DatetimeIndex._simple_new(values, None, tz=tz)
    332             except (ValueError, TypeError):
--> 333                 raise e
    335     if arg is None:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 100-02-01 00:00:00

Nope :-(
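The "nanosecond timestamp" in that message is the hint: the error suggests pandas stores each timestamp as a count of nanoseconds in a 64-bit integer, which leaves room for only about 292 years on either side of 1970. The back-of-the-envelope sketch below works out that window with the standard library alone. It assumes the default nanosecond resolution and the 1970-01-01 epoch, and EPOCH, INT64_MAX_NS, lower, upper and fits are names made up for the illustration.

```python
from datetime import datetime, timedelta

# Assumed storage: signed 64-bit nanoseconds counted from the Unix epoch.
EPOCH = datetime(1970, 1, 1)
INT64_MAX_NS = 2**63 - 1

# datetime only resolves microseconds, so drop the last three digits.
span = timedelta(microseconds=INT64_MAX_NS // 1000)

lower, upper = EPOCH - span, EPOCH + span
print(lower)
print(upper)

# The date from this post falls far outside that window.
date = datetime(100, 2, 1)
fits = lower <= date <= upper
print(fits)
```

The window runs from 1677 to 2262, so year 100 has no chance of fitting.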

We are very limited when dealing with non-standard calendars, but at least we can load and plot our data with a "proper" timestamp.

This post was written as an IPython notebook. It is available for download or as a static html.

Creative Commons License
python4oceanographers by Filipe Fernandes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at