MAD

Median absolute deviation test

This module was written by Matthias Cuntz while at Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany, and continued while at Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nancy, France.

copyright:

Copyright 2011-2022 Matthias Cuntz, see AUTHORS.rst for details.

license:

MIT License, see LICENSE for details.

The following functions are provided

mad(datin[, z, deriv, nozero, prepend, append])

Median absolute deviation test

History
  • Written Nov 2011 by Matthias Cuntz - mc (at) macu (dot) de

  • ND-array, act on axis=0, May 2012, Matthias Cuntz

  • Removed bug in broadcasting, axis=0 did not always work: spread md and MAD to input dimensions, Jun 2012, Matthias Cuntz

  • Better usage of numpy possibilities, e.g. using np.diff, remove spreads, Jun 2012, Matthias Cuntz

  • Ported to Python 3, Feb 2013, Matthias Cuntz

  • Use bottleneck for medians, otherwise loop over axis=1, Jul 2013, Matthias Cuntz and Juliane Mai

  • Re-allow masked arrays and arrays with NaNs, Jul 2013, Matthias Cuntz

  • Removed bug in NaN treatment, Oct 2013, Matthias Cuntz

  • Keyword nozero, Oct 2013, Matthias Cuntz

  • Using numpy docstring format, May 2020, Matthias Cuntz

  • Code refactoring, Sep 2021, Matthias Cuntz

  • Ported to pyjams, Jan 2022, Matthias Cuntz

  • Only one test of bottleneck availability, Jan 2022, Matthias Cuntz

  • Return all False instead of all True if all masked before MAD started, Jan 2022, Matthias Cuntz

  • prepend, append as in numpy.diff, May 2023, Matthias Cuntz

  • Support pandas Series and DataFrame, Jul 2023, Matthias Cuntz

mad(datin, z=7, deriv=0, nozero=False, prepend=None, append=None)

Median absolute deviation test

The test acts either on raw values, or on 1st or 2nd derivatives.

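Return a mask that is True where a value is out of range, i.e. \(< \mathrm{median} - z \cdot \mathrm{MAD}/0.6745\) or \(> \mathrm{median} + z \cdot \mathrm{MAD}/0.6745\), where MAD is the median absolute deviation from the median.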
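The range check above can be sketched in a few lines of NumPy for raw values (deriv=0). This is a minimal illustration under stated assumptions, not the pyjams implementation: the helper name `mad_mask_sketch` is hypothetical, and it handles only 1-D input without masked values, NaNs, or derivatives.

```python
import numpy as np


def mad_mask_sketch(x, z=7.0):
    """Flag values farther than z robust standard deviations from the median.

    Hypothetical helper for illustration only: 1-D input, no NaNs.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # Median absolute deviation from the median
    mad_value = np.median(np.abs(x - med))
    # MAD / 0.6745 estimates the standard deviation for normal data
    thresh = z * mad_value / 0.6745
    return (x < med - thresh) | (x > med + thresh)


y = np.array([-0.25, 0.68, 0.94, 1.15, 2.26, 2.35, 2.37, 2.40, 2.47,
              2.54, 2.62, 2.64, 2.90, 2.92, 2.92, 2.93, 3.21, 3.26,
              3.30, 3.59, 3.68, 4.30, 4.64, 5.34, 5.42, 8.01])

flags = mad_mask_sketch(y, z=4)
print(np.flatnonzero(flags).tolist())   # [25]
```

For this series the median is 2.91 and the MAD is 0.55, so with z=4 the allowed half-width is about 3.26 and only the last value (8.01) falls outside it.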

Parameters:
  • datin (array or masked array) – Input data; mad acts along axis=0

  • z (float, optional) – Input may deviate from the median by at most z (estimates of) standard deviations (default: 7)

  • deriv (int, optional) – Act on raw input (0, default), on first derivatives (1), or on 2nd derivatives (2)

  • nozero (bool, optional) – Exclude zeros (0.) from input datin if True.

  • prepend (array_like, optional) –

    Values to prepend or append to datin prior to performing the difference with numpy.diff if deriv > 0. prepend uses numpy.insert and append uses numpy.append. Scalar values are hence expanded to arrays with length 1 in the first axis and the shape of the input array along all other axes. Otherwise the dimension and shape must match datin except along the first axis.

    Added in version 1.31.

  • append (array_like, optional) –

    Values to prepend or append to datin prior to performing the difference with numpy.diff if deriv > 0. prepend uses numpy.insert and append uses numpy.append. Scalar values are hence expanded to arrays with length 1 in the first axis and the shape of the input array along all other axes. Otherwise the dimension and shape must match datin except along the first axis.

    Added in version 1.31.

Returns:

False everywhere except where the input deviates by more than z (estimated) standard deviations from the median. The output has the same shape as datin except along the first dimension, which is shorter by deriv if prepend and append are not set.

Return type:

array of bool
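The shape rule and the scalar expansion described for prepend and append can be checked with numpy.diff itself. The sketch below calls numpy.diff directly with its own prepend and append keywords; that mad produces the same shapes is an assumption based on the description above, and the arrays are made-up illustration data.

```python
import numpy as np

y = np.arange(10.0) ** 2          # 1-D series, length 10
yy = np.stack([y, y], axis=1)     # 2-D array, shape (10, 2)

# Without padding, each differencing step shortens axis 0 by one
print(np.diff(y, n=2, axis=0).shape)     # (8,)
print(np.diff(yy, n=1, axis=0).shape)    # (9, 2)

# Padding both ends once compensates for two differencing steps
d2 = np.diff(y, n=2, axis=0, prepend=y[0], append=y[-1])
print(d2.shape)                          # (10,)

# A scalar is broadcast to length 1 along axis 0 ...
d1 = np.diff(yy, n=1, axis=0, prepend=0.0)
# ... while a row of shape (1, 2) matches the other axes explicitly
d1_row = np.diff(yy, n=1, axis=0, prepend=yy[0:1, :])
print(d1.shape, d1_row.shape)            # (10, 2) (10, 2)
```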

Notes

If the input is an N-dimensional array, outliers are checked along the first axis.

The 1st derivative is calculated simply as d = numpy.diff(datin, n=1, axis=0), because the mean of the left and right differences would give 0 for spikes.

The 2nd derivative is calculated as d = numpy.diff(datin, n=2, axis=0).
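The remark about spikes can be illustrated with a small made-up series. In this sketch, numpy.gradient stands in for "mean of left and right difference" (it computes exactly that at interior points for unit spacing); it is used here for comparison only and is not claimed to be part of mad.

```python
import numpy as np

y = np.array([1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0])   # single spike at index 3

one_sided = np.diff(y, n=1, axis=0)   # first derivative as described above
centred = np.gradient(y)              # mean of left and right differences
second = np.diff(y, n=2, axis=0)      # second derivative as described above

print(one_sided.tolist())   # [0.0, 0.0, 4.0, -4.0, 0.0, 0.0]
print(centred[3])           # 0.0
print(second.tolist())      # [0.0, 4.0, -8.0, 4.0, 0.0]
```

At the spike the left difference (+4) and the right difference (-4) cancel in the centred estimate, so the spike would go unnoticed there, whereas the one-sided differences keep both jumps.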

If numpy.all(d.mask) then d.mask is returned, which is all True.

NaN values do not return True, because this would also remove points adjacent to NaN if deriv > 0.
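A short sketch of why points next to a NaN are at risk once derivatives are taken: a single NaN makes every difference that involves it undefined. The series is made up for illustration, and the snippet shows only numpy.diff behaviour, not the internals of mad.

```python
import numpy as np

y = np.array([1.0, 2.0, np.nan, 4.0, 5.0])   # one missing value at index 2

d = np.diff(y, n=1, axis=0)
# Both differences that touch the NaN become NaN themselves
print(np.isnan(d).tolist())   # [False, True, True, False]
```

If undefined differences were reported as True, the valid neighbours of the missing value would be flagged as outliers along with it.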

Examples

>>> import numpy as np
>>> from pyjams import mad
>>> y = np.array([-0.25, 0.68, 0.94, 1.15, 2.26, 2.35, 2.37, 2.40, 2.47,
...               2.54, 2.62, 2.64, 2.90, 2.92, 2.92, 2.93, 3.21, 3.26,
...               3.30, 3.59, 3.68, 4.30, 4.64, 5.34, 5.42, 8.01])

MAD on raw data

>>> print(mad(y))
[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False]
>>> print(mad(y, z=4))
[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False  True]
>>> print(mad(y, z=3))
[ True False False False False False False False False False False False
 False False False False False False False False False False False False
  True  True]

MAD on 2nd derivatives

>>> print(mad(y, z=4, deriv=2))
[False False False False False False False False False False False False
 False False False False False False False False False False False  True]

MAD on 2nd derivatives with prepend and append set

>>> print(mad(y, z=4, deriv=2, prepend=y[0], append=y[-1]))
[False False False False False False False False False False False False
 False False False False False False False False False False False False
  True  True]

Use for masking arrays, for example

>>> my = np.ma.array(y, mask=mad(y, z=4))
>>> print(my)
[-0.25 0.68 0.94 1.15 2.26 2.35 2.37 2.4 2.47 2.54 2.62 2.64 2.9 2.92 2.92
 2.93 3.21 3.26 3.3 3.59 3.68 4.3 4.64 5.34 5.42 --]

MAD on several dimensions

>>> yy = np.transpose(np.array([y, y]))
>>> print(np.transpose(mad(yy, z=4)))
[[False False False False False False False False False False False False
  False False False False False False False False False False False False
  False  True]
 [False False False False False False False False False False False False
  False False False False False False False False False False False False
  False  True]]
>>> yyy = np.transpose(np.array([y, y, y]))
>>> print(np.transpose(mad(yyy, z=3)))
[[ True False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]
 [ True False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]
 [ True False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]]
>>> print(np.transpose(mad(yy, z=4, deriv=2)))
[[False False False False False False False False False False False False
  False False False False False False False False False False False  True]
 [False False False False False False False False False False False False
  False False False False False False False False False False False  True]]

Set prepend and append either as scalar or array

>>> print(np.transpose(mad(yy, z=4, deriv=2, prepend=y[0], append=y[-1])))
[[False False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]
 [False False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]]
>>> print(np.transpose(mad(yy, z=4, deriv=2,
...                        prepend=yy[0, :], append=yy[-1, :])))
[[False False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]
 [False False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]]
>>> print(np.transpose(mad(yy, z=4, deriv=2,
...                        prepend=yy[0:1, :], append=yy[-1:, :])))
[[False False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]
 [False False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]]

Masked arrays

>>> my = np.ma.array(y, mask=np.zeros(y.shape))
>>> my.mask[-1] = True
>>> print(mad(my, z=4))
[True False False False False False False False False False False False
 False False False False False False False False False False False False
 False --]
>>> print(mad(my, z=3))
[True False False False False False False False False False False False
 False False False False False False False False False False False True
 True --]

Arrays with NaNs

>>> ny = y.copy()
>>> ny[-1] = np.nan
>>> print(mad(ny, z=4))
[ True False False False False False False False False False False False
 False False False False False False False False False False False False
 False False]
>>> print(mad(ny, z=3))
[ True False False False False False False False False False False False
 False False False False False False False False False False False  True
  True False]

Exclude zeros

>>> zy = y.copy()
>>> zy[1] = 0.
>>> print(mad(zy, z=3))
[ True  True False False False False False False False False False False
 False False False False False False False False False False False False
  True  True]
>>> print(mad(zy, z=3, nozero=True))
[ True False False False False False False False False False False False
 False False False False False False False False False False False False
  True  True]