kernel_regression

Multi-dimensional non-parametric kernel regression

This module was written by Matthias Cuntz while at the Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany, and continued while at the Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nancy, France.

copyright:

Copyright 2012-2022 Matthias Cuntz, see AUTHORS.rst for details.

license:

MIT License, see LICENSE for details.

The following functions are provided:

kernel_regression_h(x, y[, silverman])

Optimal bandwidth for multi-dimensional non-parametric kernel regression

kernel_regression(x, y[, h, silverman, xout])

Multi-dimensional non-parametric kernel regression

History
  • Written, Jun 2012 by Matthias Cuntz - mc (at) macu (dot) de, inspired by Matlab routines of Yingying Dong, Boston College and Yi Cao, Cranfield University

  • Assert correct input, Apr 2014, Matthias Cuntz

  • Corrected bug in _boot_h: x.size->x.shape[0], Jan 2018, Matthias Cuntz

  • Code refactoring, Sep 2021, Matthias Cuntz

  • Use format strings, Apr 2022, Matthias Cuntz

  • Use minimize with method TNC instead of fmin_tnc, Apr 2022, Matthias Cuntz

  • Use helper function array2input to assure correct output type, Apr 2022, Matthias Cuntz

  • Return scalar h if 1-dimensional, Apr 2022, Matthias Cuntz

  • Output type is same as y instead of x or xout, Apr 2022, Matthias Cuntz

kernel_regression(x, y, h=None, silverman=False, xout=None)

Multi-dimensional non-parametric kernel regression

Optimal bandwidth can be estimated by cross-validation or by using Silverman’s rule-of-thumb.
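The fitted values follow the Nadaraya-Watson form, a kernel-weighted local average of y. A sketch of the estimator, assuming a Gaussian product kernel with one bandwidth per dimension (diagonal bandwidth matrix); the kernel choice is an assumption, not stated in this page:

```latex
% Nadaraya-Watson estimator with an assumed Gaussian product kernel;
% h_j is the bandwidth of dimension j (diagonal bandwidth matrix)
\hat{y}(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}
                  {\sum_{i=1}^{n} K_h(x - x_i)},
\qquad
K_h(u) = \prod_{j=1}^{k} \exp\!\left( -\frac{u_j^2}{2 h_j^2} \right)
```

The kernel normalization constants cancel in the ratio, so only the relative weights matter.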

Parameters:
  • x (array_like (n, k)) – Independent values

  • y (array_like (n)) – Dependent values

  • h (float or array_like(k), optional) – Use h as bandwidth for calculating regression values if given, otherwise determine optimal h using cross-validation or by using Silverman’s rule-of-thumb if silverman==True.

  • silverman (bool, optional) – Use Silverman’s rule-of-thumb to calculate bandwidth h if True, otherwise determine h via cross-validation. Only used if h is not given.

  • xout (ndarray(n, k), optional) – Return fitted values at xout if given, otherwise return fitted values at x.

Returns:

Fitted values at x, or at xout if given

Return type:

array_like with same type as y

References

Hardle W and Muller M (2000) Multivariate and semiparametric kernel regression. In MG Schimek (Ed.), Smoothing and regression: Approaches, computation, and application (pp. 357-392). Hoboken, NJ, USA: John Wiley & Sons, Inc. doi: 10.1002/9781118150658.ch12

Examples

>>> import numpy as np
>>> n = 10
>>> x = np.zeros((n, 2))
>>> x[:, 0] = np.arange(n, dtype=float) / float(n-1)
>>> x[:, 1] = 1. / (np.arange(n, dtype=float) / float(n-1) + 0.1)
>>> y = 1. + x[:, 0]**2 - np.sin(x[:, 1])**2

Separate determination of h and kernel regression

>>> h = kernel_regression_h(x, y)
>>> yk = kernel_regression(x, y, h)
>>> print(np.allclose(yk[0:6],
...       [0.52241, 0.52570, 0.54180, 0.51781, 0.47644, 0.49230],
...       atol=0.0001))
True

Single call to kernel regression

>>> yk = kernel_regression(x, y)
>>> print(np.allclose(yk[0:6],
...       [0.52241, 0.52570, 0.54180, 0.51781, 0.47644, 0.49230],
...       atol=0.0001))
True

Single call to kernel regression using Silverman’s rule-of-thumb for h

>>> yk = kernel_regression(x, y, silverman=True)
>>> print(np.allclose(yk[0:6],
...       [0.691153, 0.422809, 0.545844, 0.534315, 0.521494, 0.555426],
...       atol=0.0001))
True
>>> n = 5
>>> xx = np.empty((n, 2))
>>> xx[:, 0] = (np.amin(x[:, 0]) + (np.amax(x[:, 0]) - np.amin(x[:, 0])) *
...                                 np.arange(n, dtype=float) / float(n))
>>> xx[:, 1] = (np.amin(x[:, 1]) + (np.amax(x[:, 1]) - np.amin(x[:, 1])) *
...                                 np.arange(n, dtype=float) / float(n))
>>> yk = kernel_regression(x, y, silverman=True, xout=xx)
>>> print(np.allclose(yk,
...       [0.605485, 0.555235, 0.509529, 0.491191, 0.553325],
...       atol=0.0001))
True
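The fitted values reduce to a kernel-weighted average. A minimal sketch with a Gaussian product kernel and a rule-of-thumb bandwidth (both the kernel and the Silverman constant used here are assumptions, not taken from this page) reproduces the Silverman-based fitted values shown above:

```python
import numpy as np

def silverman_h(x):
    # Rule-of-thumb bandwidth per dimension (diagonal bandwidth matrix);
    # the constant (4/(k+2)/n)^(1/(k+4)) is an assumption
    n, k = x.shape
    return (4. / (k + 2.) / n)**(1. / (k + 4.)) * np.std(x, axis=0, ddof=1)

def nadaraya_watson(x, y, h, xout):
    # Kernel-weighted average of y with a Gaussian product kernel
    yk = np.empty(xout.shape[0])
    for i in range(xout.shape[0]):
        z = (x - xout[i, :]) / h
        w = np.exp(-0.5 * np.sum(z * z, axis=1))
        yk[i] = np.sum(w * y) / np.sum(w)
    return yk

# same test data as in the Examples above
n = 10
x = np.zeros((n, 2))
x[:, 0] = np.arange(n, dtype=float) / float(n - 1)
x[:, 1] = 1. / (x[:, 0] + 0.1)
y = 1. + x[:, 0]**2 - np.sin(x[:, 1])**2

yk = nadaraya_watson(x, y, silverman_h(x), x)
print(np.around(yk[0:6], 6))
```

The first six values agree with the silverman=True example above to about four decimals; the cross-validated bandwidth gives the other set of fitted values.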
kernel_regression_h(x, y, silverman=False)

Optimal bandwidth for multi-dimensional non-parametric kernel regression

Optimal bandwidth is determined using cross-validation or Silverman’s rule-of-thumb.

Parameters:
  • x (array_like (n, k)) – Independent values

  • y (array_like (n)) – Dependent values

  • silverman (bool, optional) – Use Silverman’s rule-of-thumb to calculate bandwidth h if True, otherwise determine h via cross-validation

Returns:

Optimal bandwidth h. For multi-dimensional regression, h is a 1d-array with one bandwidth per dimension, assuming a diagonal bandwidth matrix.

Return type:

float or array

References

Hardle W and Muller M (2000) Multivariate and semiparametric kernel regression. In MG Schimek (Ed.), Smoothing and regression: Approaches, computation, and application (pp. 357-392). Hoboken, NJ, USA: John Wiley & Sons, Inc. doi: 10.1002/9781118150658.ch12

Examples

>>> import numpy as np
>>> n = 10
>>> x = np.zeros((n, 2))
>>> x[:, 0] = np.arange(n, dtype=float) / float(n-1)
>>> x[:, 1] = 1. / (np.arange(n, dtype=float) / float(n-1) + 0.1)
>>> y = 1. + x[:, 0]**2 - np.sin(x[:, 1])**2
>>> h = kernel_regression_h(x, y)
>>> print(np.allclose(h, [0.172680, 9.516907], atol=0.0001))
True
>>> h = kernel_regression_h(x, y, silverman=True)
>>> print(np.allclose(h, [0.229190, 1.903381], atol=0.0001))
True
>>> n = 10
>>> x = np.arange(n, dtype=float) / float(n-1)
>>> y = 1. + x**2 - np.sin(x)**2
>>> h = kernel_regression_h(x, y)
>>> print(np.around(h, 4))
0.045
>>> h = kernel_regression_h(x, y, silverman=True)
>>> print(np.around(h, 4))
0.2248
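Silverman's rule-of-thumb scales the per-dimension sample standard deviation by a factor that depends only on n and the number of dimensions; cross-validation instead minimizes the leave-one-out prediction error. A minimal sketch of both routes (the rule-of-thumb constant, the leave-one-out objective, and the optimizer bounds are assumptions, not the module's exact method):

```python
import numpy as np
from scipy.optimize import minimize

def silverman_h(x):
    # h_j = (4 / ((k+2) n))^(1/(k+4)) * std(x_j); constant is an assumption
    n, k = x.shape
    return (4. / (k + 2.) / n)**(1. / (k + 4.)) * np.std(x, axis=0, ddof=1)

def loo_cv_error(h, x, y):
    # Sum of squared leave-one-out errors of a Gaussian-kernel
    # Nadaraya-Watson estimate, as a function of the bandwidths h
    err = 0.0
    for i in range(x.shape[0]):
        z = (np.delete(x, i, axis=0) - x[i, :]) / h
        w = np.exp(-0.5 * np.sum(z * z, axis=1))
        err += (y[i] - np.sum(w * np.delete(y, i)) / np.sum(w))**2
    return err

# same test data as in the Examples above
n = 10
x = np.zeros((n, 2))
x[:, 0] = np.arange(n, dtype=float) / float(n - 1)
x[:, 1] = 1. / (x[:, 0] + 0.1)
y = 1. + x[:, 0]**2 - np.sin(x[:, 1])**2

h0 = silverman_h(x)
print(np.around(h0, 6))  # rule-of-thumb bandwidths

# Refine by cross-validation, starting from the rule-of-thumb values;
# the bracketing bounds are an assumption to keep the weights well-defined
res = minimize(loo_cv_error, h0, args=(x, y), method='TNC',
               bounds=[(0.2 * hh, 5. * hh) for hh in h0])
print(np.around(res.x, 6))  # cross-validated bandwidths
```

The rule-of-thumb values match the silverman=True example above; the cross-validated values depend on the exact objective and optimizer settings, so they will not in general reproduce the module's cross-validation result digit for digit.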