python4oceanographers

Turning ripples into waves

Pandas for CTD data

Pandas is an amazing python module to deal with data. It has tons of features worthwhile learning, but the best of all is how quick one can explore a data with just a few code lines once you learn the basics.

I wrote a small module to read CTD (also XBT's EDF and FSI's CTD format) data directly as a pandas DataFrame. Here is an example how to use it:

In [2]:
import gsw
import numpy as np
from oceans.sw_extras import gamma_GP_from_SP_pt
from ctd import DataFrame, Series, movingaverage, rosette_summary, lp_filter

def o2sat(s, pt):
    from seawater.library import T68conv
    t = T68conv(pt) + 273.16
    a = (-177.7888, 255.5907, 146.4813, -22.2040)
    b = (-0.037362, 0.016504, -0.0020564)
    lnC = (a[0] + a[1] * (100. / t) + a[2] * np.log(t / 100.) + a[3] *
           (t / 100.) +
           s * (b[0] + b[1] * (t / 100.) + b[2] * (t / 100.) ** 2))
    return np.exp(lnC) * 1000. / 22.392
In [3]:
cast = DataFrame.from_cnv('./data/CTD_001.cnv.gz', compression='gzip')

That's it! One line to load the compressed SeaBird cnv file into memory. If you have the Rosette file you can load it as well and print the bottle summary with just two lines:

In [4]:
rossum = rosette_summary('./data/CTD_001.ros')
rossum.groupby(rossum.index)['t090C', 't190C', 'c0S/m', 'c1S/m'].mean()
Out[4]:
t090C t190C c0S/m c1S/m
nbf
1 0.928043 0.924906 3.119878 3.119614
2 0.925949 0.922761 3.119751 3.119480
3 0.926014 0.923645 3.119753 3.119537
4 0.925939 0.923045 3.119767 3.119535
5 0.927427 0.924004 3.119817 3.119534
6 0.927300 0.923857 3.119881 3.119604
7 2.442392 2.439239 3.242547 3.242271
8 2.443031 2.439859 3.242581 3.242297
9 2.442831 2.440165 3.242537 3.242277
10 3.230039 3.227310 3.277096 3.276797
11 3.230059 3.227284 3.277093 3.276799
12 3.229876 3.227224 3.277075 3.276790
13 3.838278 3.835665 3.283290 3.283006
14 3.834353 3.833296 3.282793 3.282687
15 3.836551 3.833398 3.283059 3.282718
16 4.507531 4.505259 3.281205 3.280923
17 4.507973 4.505092 3.281235 3.280895
18 4.508949 4.505784 3.281321 3.280968
19 14.905559 14.907837 4.343548 4.343580
20 14.873559 14.869651 4.339898 4.339350
21 14.946447 14.941363 4.348168 4.347379
22 23.996061 23.987796 5.500221 5.498812
23 23.987096 23.963898 5.499071 5.495626
24 23.975698 23.965688 5.497590 5.495803

Metadata and data flags are also easily accessed.

In [5]:
# Read metadata.
hours = cast['timeS'].max() / 60. / 60.
lon, lat = cast.longitude.mean(), cast.latitude.mean()

# Apply 'bad pump' and 'flag'.
cast = cast[cast['pumps']]  # True for good values.
cast = cast[~cast['flag']]  # True for bad values blame SBE!

print("Cast duration was: %2.2f hours." % hours)
cast[['t090C', 't190C', 'c0S/m', 'c1S/m']].describe()  # Statistics for sensors pairs 0, 1.
Cast duration was: 3.91 hours.

Out[5]:
t090C t190C c0S/m c1S/m
count 337916.000000 337916.000000 337916.000000 3.379160e+05
mean 4.665746 4.661965 3.417277 3.416835e+00
std 5.100514 5.099301 0.516788 5.165796e-01
min 0.925100 0.921800 3.119202 -9.990000e-29
25% 2.569000 2.565900 3.228872 3.228444e+00
50% 3.234400 3.231500 3.262658 3.262342e+00
75% 3.875200 3.871600 3.300690 3.300353e+00
max 24.240500 24.240200 5.529971 5.529486e+00

Cleaning the DataFrame and manipulating the variables:

In [6]:
keep = set(['t090C', 'c0S/m', 'sbeox0Mm/Kg', 'dz/dtM'])
cast = cast.drop(keep.symmetric_difference(cast.columns), axis=1)

The module also contains some rudimentary CTD processing steps akin to those found in the SBE Software:

  • Smooth velocity and Oxygen with a 2 seconds window:
In [7]:
cast['dz/dtM'] = movingaverage(cast['dz/dtM'], window_size=48)
cast['sbeox0Mm/Kg'] = movingaverage(cast['sbeox0Mm/Kg'], window_size=48)
  • Filter pressure:
In [8]:
kw = dict(sample_rate=24.0, time_constant=0.15)
cast.index = lp_filter(cast.index, **kw)
  • Split:
In [9]:
downcast, upcast = cast.split()
  • Loop Edit:
In [10]:
downcast = downcast.press_check()  # Remove pressure reversals.
downcast = downcast[downcast['dz/dtM'] >= 0.25]  # Threshold velocity.
  • Wild Edit:
In [11]:
kw = dict(n1=2, n2=20, block=150)
downcast = downcast.apply(Series.despike, **kw)
  • Bin-average:
In [12]:
downcast = downcast.apply(Series.bindata, **dict(delta=1.))
downcast = downcast.apply(Series.interpolate)
  • Smooth:
In [13]:
pmax = max(cast.index)
if pmax >= 500:
    window_len = 21
elif pmax >= 100:
    window_len = 11
else:
    window_len = 5
kw = dict(window_len=window_len, window='hanning')
downcast = downcast.apply(Series.smooth, **kw)
  • The module also allows for a more "customized" Derive step:
In [14]:
p = downcast.index.values.astype(float)
downcast['depth'] = -gsw.z_from_p(p, lat)
# ctm [mS/cm] = ctm [S/m] * 10
downcast['SP'] = gsw.SP_from_C(downcast['c0S/m'].values * 10, downcast['t090C'].values, p)
downcast['SA'] = gsw.SA_from_SP(downcast['SP'].values, p, lon, lat)
downcast['CT'] = gsw.CT_from_t(downcast['SA'].values, downcast['t090C'].values, p)
downcast['sigma0_CT'] = gsw.sigma0_CT_exact(downcast['SA'].values, downcast['CT'].values)
downcast['sound'] = gsw.sound_speed_CT_exact(downcast['SA'].values, downcast['CT'].values, p)
downcast['gamma'] = gamma_GP_from_SP_pt(downcast['SA'].values, downcast['CT'].values, p, lon, lat)
# Does't fit the DataFrame because the index in at the middle (p_mid) point of the original data.
N2, p_mid = gsw.Nsquared(downcast['SA'].values, downcast['CT'].values, p, lat=lat)
downcast['aou'] = o2sat(downcast['SA'].values, downcast['CT'].values) - downcast['sbeox0Mm/Kg']

Last but not least, a handy way for plotting the data:

In [15]:
figsize = (4, 6)
kw_t = dict(linestyle='-', color='#6699cc', linewidth=3, label=r"Temperature [$^\circ$C]")
kw_ct = dict(linestyle='-', color='#ffcc33', alpha=0.75, linewidth=3, label=r"Conservative temperature [$^\circ$C]")

fig, ax = downcast['t090C'].plot(**kw_t)
ax.plot(downcast['CT'], downcast.index, **kw_ct)
ax.grid(True)
ax.set_xlabel("Temperature")
ax.set_ylabel("Pressure [dbar]")
l = ax.legend(loc="lower right")
fig.set_size_inches(figsize)
In [16]:
kw_sp = dict(linestyle='-', color='#339933', linewidth=3, label=r"Practical Salinity")
kw_sa = dict(linestyle='-', color='#999999', linewidth=3, label=r"Absolute Salinity [g kg$^{-1}$]")
fig, ax = downcast['SP'].plot(**kw_sp)
ax.plot(downcast['SA'], downcast.index, **kw_sa)
ax.grid(True)
ax.set_xlabel("Salinity")
ax.set_ylabel("Pressure [dbar]")
l = ax.legend(loc="lower right")
fig.set_size_inches(figsize)
In [17]:
kw_gm = dict(linestyle='-', color='#0000ff', linewidth=3, label=r"Gamma")
kw_sg = dict(linestyle='-', color='#ff3333', linewidth=3, label=r"Sigma")
fig, ax = downcast['gamma'].plot(**kw_sg)
ax.plot(downcast['sigma0_CT'], downcast.index, **kw_gm)
ax.grid(True)
ax.set_xlabel("Density [kg m $^{-3}$]")
ax.set_ylabel("Pressure [dbar]")
l = ax.legend(loc="lower left")
fig.set_size_inches(figsize)
_ = ax.set_xlim(25.3, 28.5)
In [18]:
HTML(html)
Out[18]:

This post was written as an IPython notebook. It is available for download or as a static html.

Creative Commons License
python4oceanographers by Filipe Fernandes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://ocefpaf.github.io/.

Comments