I saw an issue on GitHub over the weekend that got me thinking: how does one read non-standards dates from dataset like simulations of Earth past climates?
I will use an example of a common dataset out there in the wild for this post.
Let's try to read it with the netCDF4
library first.
from netCDF4 import Dataset, num2date
nc = Dataset('./data/cf_units.nc')
time = nc.variables['time']
print('{}\n{}'.format(time.units, time.calendar))
Days since year 0
and a noleap
calendar. Anyone working with climate
will encounter that at a certain point in theirs lives. We have to deal
with that, but...
num2date(time[:], time.units, time.calendar)
That is a sensible error message. After all there is no year zero in any of the calendars supported by netCDF4. UDUNITS behaves in a different way though:
> "UDUNITS implements the mixed Gregorian/Julian calendar system, as followed in England, in which dates prior to 1582-10-15 are assumed to use the Julian calendar. Other software cannot be relied upon to handle the change of calendar in the same way, so for robustness it is recommended that the reference date be later than 1582. If earlier dates must be used, it should be noted that UDUNITS treats 0 AD as identical to 1 AD."
The year-zero edge case will show up in the CF-conventions for climatological data. It is allowed, but not recommended! It only exists for compatibility with the COARDS conventions.
How can we derive higher level datetime-like
objects to work with?
We can use UDUNITS of course, but a python wrapper is better!
import cf_units
times = cf_units.num2date(time[:], time.units, time.calendar)
times
Same time variable! Same netCDF4.num2date
syntax! It cannot get any easier
than that ;-)
But note!!! This is not a real datetime
object!
print('{}\n'.format(type(times[0])))
[m for m in dir(times[0]) if not m.startswith('__')]
A lot of the same methods are there and, if you are brave enough, you can get
real datetime
object with _to_real_datetime
.
date = times[0]._to_real_datetime()
date
Is it doing the right thing? Let's check! Remember that the 'noleap'
calendar means 365 days.
import numpy as np
years = np.fix(time[:]/365)
days = np.remainder(time[:], 365)
years, days
31 days ahead the first day means February 1, right!? Note that the fake
datetime
object can do date formatting.
times[0].strftime('%Y-%m-%d')
But the real datetime
cannot!
date.strftime('%Y-%m-%d')
Bonus questions: will this work as a pandas time index?
from pandas import DatetimeIndex
times = cf_units.num2date(np.arange(36531, 36532, 1), time.units, time.calendar)
times = [t._to_real_datetime() for t in times]
DatetimeIndex(times)
Nope :-(
We are very limited when dealing with non-standard calendars, but at least we can load and plot our data with a "proper" timestamp.
HTML(html)