There are two different types of spikes that may happen in CTD data:
- Bad data from electrical failures. These usually manifest in all the measured variables (temperature, conductivity, oxygen, etc.);
- Salinity spikes due to a wrong temperature value being used when computing salinity from conductivity. This can happen due to a bad alignment of the temperature and conductivity sensors and/or a poor cell thermal mass correction (Lueck 1990).
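The second mechanism is easy to reproduce with a toy example. Here salinity is stood in for by a made-up linear proxy of conductivity and temperature (the constants are illustrative only, not the real `gsw.SP_from_C` relation): a one-sample lag between the temperature and conductivity records across a sharp thermocline produces a spurious salinity spike right at the interface.

```python
import numpy as np

n = 100
T = np.where(np.arange(n) < 50, 20.0, 10.0)  # sharp thermocline at sample 50
C = 35.0 + 0.5 * T                           # conductivity tracks temperature

# Toy salinity proxy: with perfectly aligned sensors it is 35 everywhere.
S_true = C - 0.5 * T

# Misalign the sensors: temperature lags conductivity by one sample.
T_lagged = np.empty_like(T)
T_lagged[0] = T[0]
T_lagged[1:] = T[:-1]
S_spiked = C - 0.5 * T_lagged  # one spurious spike appears at the thermocline
```

Everywhere away from the interface the lag cancels out; only the single sample where conductivity has already dropped but the lagged temperature is still warm is corrupted.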
The second case is a bit more complex and will be discussed in another post. When bad data do appear, however, one must choose a set of criteria to eliminate them, ranging from manual exclusion to several "objective" criteria.
SBE's software calls this Wild Edit, and it performs a technique very similar to the two-pass criterion from a previous post.
Here is a typical spike in the salinity data:
import gsw
from ctd import DataFrame, despike
cast = DataFrame.from_cnv('./data/CTD-after.cnv.gz', compression='gzip').split()[0]
# Compute salinity.
p = cast.index.values.astype(float)
SP = gsw.SP_from_C(cast['c0S/m'].values * 10, cast['t090C'].values, p)
cast['SP'] = SP.copy()
The parameters used were:
# Wild Edit.
cast['SP'] = cast['SP'].despike(n1=2, n2=20, block=500)
kw_original = dict(linestyle='-', color='#ff3333', alpha=0.65, linewidth=1.5, label=r"Practical Salinity Before Wild Edit.")
kw_despiked = dict(linestyle='-', color='#339933', linewidth=3, label=r"Practical Salinity After Wild Edit.")
fig, ax = cast['SP'].plot(**kw_despiked)
ax.plot(SP, cast.index, **kw_original)
ax.set_xlabel("Practical Salinity")
ax.set_ylabel("Pressure [dbar]")
l = ax.legend(loc="lower right")
fig.set_size_inches((5, 7))
The first pass flags the data that are more than 2 standard deviations from the mean and hides them from the second pass. The second pass then flags any remaining data points that are more than 20 standard deviations from the recomputed mean.
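The two passes can be sketched in plain NumPy. This is only an illustration of the idea, assuming simple block-wise processing, not the ctd library's actual implementation:

```python
import numpy as np

def wild_edit(x, n1=2.0, n2=20.0, block=500):
    """Two-pass despike sketch: flag outliers, then re-test the block
    with statistics computed without the flagged points."""
    x = np.asarray(x, dtype=float)
    clean = x.copy()
    for start in range(0, x.size, block):
        chunk = x[start:start + block]
        # Pass 1: flag points more than n1 std devs from the block mean.
        pass1 = np.abs(chunk - chunk.mean()) > n1 * chunk.std()
        kept = chunk[~pass1]
        if kept.size == 0:
            continue
        # Pass 2: recompute mean/std without the flagged points and mark
        # anything more than n2 std devs away as a wild point.
        wild = np.abs(chunk - kept.mean()) > n2 * kept.std()
        clean[start:start + block][wild] = np.nan
    return clean
```

With `n1=2, n2=20, block=500` this mirrors the parameters passed to `despike` above.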
The values 2, 20, and 500 are not magic numbers! They are the defaults suggested by the SBE software, probably tested to exhaustion for CTD data.