Pandas is an amazing python module to deal with data. It has tons of features worthwhile learning, but the best of all is how quick one can explore a data with just a few code lines once you learn the basics.

I wrote a small module to read CTD (also XBT's EDF and FSI's CTD format) data directly as a pandas DataFrame. Here is an example how to use it:

In [2]:

import gsw
import numpy as np
from oceans.sw_extras import gamma_GP_from_SP_pt
from ctd import DataFrame, Series, movingaverage, rosette_summary, lp_filter

def o2sat(s, pt):
    from seawater.library import T68conv
    t = T68conv(pt) + 273.16
    a = (-177.7888, 255.5907, 146.4813, -22.2040)
    b = (-0.037362, 0.016504, -0.0020564)
    lnC = (a[0] + a[1] * (100. / t) + a[2] * np.log(t / 100.) + a[3] *
           (t / 100.) +
           s * (b[0] + b[1] * (t / 100.) + b[2] * (t / 100.) ** 2))
    return np.exp(lnC) * 1000. / 22.392

In [3]:

cast = DataFrame.from_cnv('./data/CTD_001.cnv.gz', compression='gzip')

That's it! One line to load the compressed SeaBird cnv file into memory. If you have the Rosette file you can load it as well and print the bottle summary with just two lines:

In [4]:

rossum = rosette_summary('./data/CTD_001.ros')
rossum.groupby(rossum.index)['t090C', 't190C', 'c0S/m', 'c1S/m'].mean()

Out[4]:

	t090C	t190C	c0S/m	c1S/m
nbf
1	0.928043	0.924906	3.119878	3.119614
2	0.925949	0.922761	3.119751	3.119480
3	0.926014	0.923645	3.119753	3.119537
4	0.925939	0.923045	3.119767	3.119535
5	0.927427	0.924004	3.119817	3.119534
6	0.927300	0.923857	3.119881	3.119604
7	2.442392	2.439239	3.242547	3.242271
8	2.443031	2.439859	3.242581	3.242297
9	2.442831	2.440165	3.242537	3.242277
10	3.230039	3.227310	3.277096	3.276797
11	3.230059	3.227284	3.277093	3.276799
12	3.229876	3.227224	3.277075	3.276790
13	3.838278	3.835665	3.283290	3.283006
14	3.834353	3.833296	3.282793	3.282687
15	3.836551	3.833398	3.283059	3.282718
16	4.507531	4.505259	3.281205	3.280923
17	4.507973	4.505092	3.281235	3.280895
18	4.508949	4.505784	3.281321	3.280968
19	14.905559	14.907837	4.343548	4.343580
20	14.873559	14.869651	4.339898	4.339350
21	14.946447	14.941363	4.348168	4.347379
22	23.996061	23.987796	5.500221	5.498812
23	23.987096	23.963898	5.499071	5.495626
24	23.975698	23.965688	5.497590	5.495803

Metadata and data flags are also easily accessed.

In [5]:

# Read metadata.
hours = cast['timeS'].max() / 60. / 60.
lon, lat = cast.longitude.mean(), cast.latitude.mean()

# Apply 'bad pump' and 'flag'.
cast = cast[cast['pumps']]  # True for good values.
cast = cast[~cast['flag']]  # True for bad values blame SBE!

print("Cast duration was: %2.2f hours." % hours)
cast[['t090C', 't190C', 'c0S/m', 'c1S/m']].describe()  # Statistics for sensors pairs 0, 1.

Cast duration was: 3.91 hours.

Out[5]:

	t090C	t190C	c0S/m	c1S/m
count	337916.000000	337916.000000	337916.000000	3.379160e+05
mean	4.665746	4.661965	3.417277	3.416835e+00
std	5.100514	5.099301	0.516788	5.165796e-01
min	0.925100	0.921800	3.119202	-9.990000e-29
25%	2.569000	2.565900	3.228872	3.228444e+00
50%	3.234400	3.231500	3.262658	3.262342e+00
75%	3.875200	3.871600	3.300690	3.300353e+00
max	24.240500	24.240200	5.529971	5.529486e+00

Cleaning the DataFrame and manipulating the variables:

In [6]:

keep = set(['t090C', 'c0S/m', 'sbeox0Mm/Kg', 'dz/dtM'])
cast = cast.drop(keep.symmetric_difference(cast.columns), axis=1)

The module also contains some rudimentary CTD processing steps akin to those found in the SBE Software:

Smooth velocity and Oxygen with a 2 seconds window:

In [7]:

cast['dz/dtM'] = movingaverage(cast['dz/dtM'], window_size=48)
cast['sbeox0Mm/Kg'] = movingaverage(cast['sbeox0Mm/Kg'], window_size=48)

Filter pressure:

In [8]:

kw = dict(sample_rate=24.0, time_constant=0.15)
cast.index = lp_filter(cast.index, **kw)

Split:

In [9]:

downcast, upcast = cast.split()

Loop Edit:

In [10]:

downcast = downcast.press_check()  # Remove pressure reversals.
downcast = downcast[downcast['dz/dtM'] >= 0.25]  # Threshold velocity.

Wild Edit:

In [11]:

kw = dict(n1=2, n2=20, block=150)
downcast = downcast.apply(Series.despike, **kw)

Bin-average:

In [12]:

downcast = downcast.apply(Series.bindata, **dict(delta=1.))
downcast = downcast.apply(Series.interpolate)

Smooth:

In [13]:

pmax = max(cast.index)
if pmax >= 500:
    window_len = 21
elif pmax >= 100:
    window_len = 11
else:
    window_len = 5
kw = dict(window_len=window_len, window='hanning')
downcast = downcast.apply(Series.smooth, **kw)

The module also allows for a more "customized" Derive step:

In [14]:

p = downcast.index.values.astype(float)
downcast['depth'] = -gsw.z_from_p(p, lat)
# ctm [mS/cm] = ctm [S/m] * 10
downcast['SP'] = gsw.SP_from_C(downcast['c0S/m'].values * 10, downcast['t090C'].values, p)
downcast['SA'] = gsw.SA_from_SP(downcast['SP'].values, p, lon, lat)
downcast['CT'] = gsw.CT_from_t(downcast['SA'].values, downcast['t090C'].values, p)
downcast['sigma0_CT'] = gsw.sigma0_CT_exact(downcast['SA'].values, downcast['CT'].values)
downcast['sound'] = gsw.sound_speed_CT_exact(downcast['SA'].values, downcast['CT'].values, p)
downcast['gamma'] = gamma_GP_from_SP_pt(downcast['SA'].values, downcast['CT'].values, p, lon, lat)
# Does't fit the DataFrame because the index in at the middle (p_mid) point of the original data.
N2, p_mid = gsw.Nsquared(downcast['SA'].values, downcast['CT'].values, p, lat=lat)
downcast['aou'] = o2sat(downcast['SA'].values, downcast['CT'].values) - downcast['sbeox0Mm/Kg']

Last but not least, a handy way for plotting the data:

In [15]:

figsize = (4, 6)
kw_t = dict(linestyle='-', color='#6699cc', linewidth=3, label=r"Temperature [$^\circ$C]")
kw_ct = dict(linestyle='-', color='#ffcc33', alpha=0.75, linewidth=3, label=r"Conservative temperature [$^\circ$C]")

fig, ax = downcast['t090C'].plot(**kw_t)
ax.plot(downcast['CT'], downcast.index, **kw_ct)
ax.grid(True)
ax.set_xlabel("Temperature")
ax.set_ylabel("Pressure [dbar]")
l = ax.legend(loc="lower right")
fig.set_size_inches(figsize)

In [16]:

kw_sp = dict(linestyle='-', color='#339933', linewidth=3, label=r"Practical Salinity")
kw_sa = dict(linestyle='-', color='#999999', linewidth=3, label=r"Absolute Salinity [g kg$^{-1}$]")
fig, ax = downcast['SP'].plot(**kw_sp)
ax.plot(downcast['SA'], downcast.index, **kw_sa)
ax.grid(True)
ax.set_xlabel("Salinity")
ax.set_ylabel("Pressure [dbar]")
l = ax.legend(loc="lower right")
fig.set_size_inches(figsize)

In [17]:

kw_gm = dict(linestyle='-', color='#0000ff', linewidth=3, label=r"Gamma")
kw_sg = dict(linestyle='-', color='#ff3333', linewidth=3, label=r"Sigma")
fig, ax = downcast['gamma'].plot(**kw_sg)
ax.plot(downcast['sigma0_CT'], downcast.index, **kw_gm)
ax.grid(True)
ax.set_xlabel("Density [kg m $^{-3}$]")
ax.set_ylabel("Pressure [dbar]")
l = ax.legend(loc="lower left")
fig.set_size_inches(figsize)
_ = ax.set_xlim(25.3, 28.5)

In [18]:

HTML(html)

Out[18]:

This post was written as an IPython notebook. It is available for download or as a static html.

python4oceanographers by Filipe Fernandes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://ocefpaf.github.io/.

python4oceanographers

Turning ripples into waves

Pandas for CTD data

Comments