madspikes

Spike detection using a moving median absolute difference filter

This module was originally written by Tino Rau and Matthias Cuntz, and maintained by Arndt Piayda, while at the Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany. It was continued by Matthias Cuntz while at the Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nancy, France.

copyright:

Copyright 2008-2022 Matthias Cuntz, see AUTHORS.rst for details.

license:

MIT License, see LICENSE for details.

The following functions are provided

madspikes(dfin[, flag, isday, colhead, ...])

Spike detection using a moving median absolute difference filter

History
  • Written 2008 by Tino Rau and Matthias Cuntz - mc (at) macu (dot) de

  • Maintained by Arndt Piayda since Aug 2014.

  • Input can be pandas Dataframe or numpy array(s), Apr 2020, Matthias Cuntz

  • Removed iteration, Apr 2020, Matthias Cuntz

  • Using numpy docstring format, May 2020, Matthias Cuntz

  • Improved flake8 and numpy docstring, Oct 2021, Matthias Cuntz

  • Removed np.float and np.bool, Jun 2024, Matthias Cuntz

madspikes(dfin, flag=None, isday=None, colhead=None, undef=-9999, nscan=720, nfill=48, z=7, deriv=2, swthr=10.0, plot=False)[source]

Spike detection using a moving median absolute difference filter

Used with eddy covariance data in Papale et al. (Biogeosciences, 2006).

Parameters:
  • dfin (pandas.DataFrame or numpy.array) – time series of data to which spike detection with MAD should be applied. dfin can be a pandas.DataFrame or a numpy array. In the latter case colhead must be given. MAD will be applied along axis=0, i.e. on each column of axis=1.

  • flag (pandas.DataFrame or numpy.array, optional) – DataFrame or array with the same shape as dfin. Non-zero values in flag will be treated as missing values in dfin. If flag is a numpy array, df.columns.values will be used as column heads.

  • isday (array_like of bool, optional) – True when it is day, False when night. Must have the same length as dfin.shape[0]. If isday is not given, dfin must have a column with head ‘SW_IN’ or starting with ‘SW_IN’. isday will then be dfin[‘SW_IN’] > swthr.

  • colhead (array_like of str, optional) – column names if dfin is a numpy array.

  • undef (float, optional) – values having undef value are treated as missing values in dfin (default: -9999). np.nan as undef is not allowed (not working).

  • nscan (int, optional) – size of moving window to calculate mad in time steps (default: 15*48)

  • nfill (int, optional) – step size of moving window to calculate mad in time steps (default: 1*48). MAD will be calculated in a window of nscan time steps. The resulting mask will be applied only to the nfill time steps in the middle of the nscan window. The nscan window will then be moved by nfill time steps.

  • z (float, optional) – Input is allowed to deviate by at most z standard deviations from the median (default: 7)

  • deriv (int, optional) –

    0: Act on raw input.

    1: Use first derivatives.

    2: Use 2nd derivatives (default).

  • swthr (float, optional) – Threshold to determine daytime from incoming shortwave radiation if isday not given (default: 10).

  • plot (bool, optional) – True: data and spikes are plotted into madspikes.pdf (default: False).
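The interplay of nscan, nfill, z and deriv=2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the module's own code: the outlier criterion follows the formulation of Papale et al. (2006), where the MAD is divided by 0.6745 to estimate a standard deviation, and the helper names second_difference, mad_outliers and sketch_madspikes are hypothetical. The sketch ignores flag, isday and undef.

```python
import math
from statistics import median


def second_difference(x):
    """d[j] = (x[j+1] - x[j]) - (x[j+2] - x[j+1]); d[j] belongs to x[j + 1]."""
    return [2.0 * x[i] - x[i - 1] - x[i + 1] for i in range(1, len(x) - 1)]


def mad_outliers(values, z=7.0):
    """True where a value lies more than z robust standard deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # MAD / 0.6745 estimates the standard deviation of normally distributed data
    thresh = z * mad / 0.6745
    return [abs(v - med) > thresh for v in values]


def sketch_madspikes(x, nscan=720, nfill=48, z=7.0):
    """Hypothetical sketch: MAD in an nscan window, mask applied to the central nfill steps."""
    d = second_difference(x)
    flags = [0] * len(x)
    half = (nscan - nfill) // 2
    for start in range(0, len(d), nfill):
        stop = min(start + nfill, len(d))
        # nscan window centred on the nfill block, truncated at the series ends
        w0 = max(start - half, 0)
        w1 = min(stop + half, len(d))
        is_out = mad_outliers(d[w0:w1], z)
        for j in range(start, stop):
            if is_out[j - w0]:
                flags[j + 1] = 2  # spikes are marked with 2, as in the return value
    return flags


# Smooth synthetic series with one artificial spike at index 100
series = [math.sin(0.1 * i) for i in range(200)]
series[100] += 5.0
flags = sketch_madspikes(series, nscan=60, nfill=20, z=7.0)
spikes = [i for i, f in enumerate(flags) if f == 2]
print(spikes)  # the spike and, because of the second difference, its two neighbours
```

Because an isolated spike also changes the second difference of the two adjacent time steps, this sketch marks those neighbours as well. How the module itself treats them is not stated here.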

Returns:

flags, 0 everywhere except at detected spikes, which are set to 2

Return type:

pandas.DataFrame or numpy array
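One possible way to use the returned flags is to merge them into existing quality flags and then set the flagged values to undef. The following is a minimal sketch in plain Python with made-up values. The variable names data, flag, sflag and clean are hypothetical, and sflag only stands in for the output of madspikes.

```python
undef = -9999.0

data = [1.2, 1.3, 25.0, 1.1]   # hypothetical measurements
flag = [0, 0, 0, 1]            # hypothetical existing flags, non-zero means missing
sflag = [0, 0, 2, 0]           # stand-in for the flags returned by madspikes

# Keep the larger code so that existing non-zero flags survive the merge
merged = [max(f, s) for f, s in zip(flag, sflag)]

# Replace every flagged value by undef
clean = [undef if f != 0 else v for v, f in zip(data, merged)]
print(merged, clean)
```

The same logic carries over to numpy arrays or DataFrame columns with boolean masks.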