kernel_regression
Multi-dimensional non-parametric kernel regression
This module was written by Matthias Cuntz while at Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany, and continued while at Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nancy, France.
- copyright:
Copyright 2012-2022 Matthias Cuntz, see AUTHORS.rst for details.
- license:
MIT License, see LICENSE for details.
The following functions are provided:

- kernel_regression_h
  Optimal bandwidth for multi-dimensional non-parametric kernel regression
- kernel_regression
  Multi-dimensional non-parametric kernel regression
- History
Written, Jun 2012 by Matthias Cuntz - mc (at) macu (dot) de, inspired by Matlab routines of Yingying Dong, Boston College and Yi Cao, Cranfield University
Assert correct input, Apr 2014, Matthias Cuntz
Corrected bug in _boot_h: x.size->x.shape[0], Jan 2018, Matthias Cuntz
Code refactoring, Sep 2021, Matthias Cuntz
Use format strings, Apr 2022, Matthias Cuntz
Use minimize with method TNC instead of fmin_tnc, Apr 2022, Matthias Cuntz
Use helper function array2input to assure correct output type, Apr 2022, Matthias Cuntz
Return scalar h if 1-dimensional, Apr 2022, Matthias Cuntz
Output type is same as y instead of x or xout, Apr 2022, Matthias Cuntz
- kernel_regression(x, y, h=None, silverman=False, xout=None)
Multi-dimensional non-parametric kernel regression
The optimal bandwidth can be estimated by cross-validation or by Silverman’s rule-of-thumb.
- Parameters:
x (array_like (n, k)) – Independent values
y (array_like (n)) – Dependent values
h (float or array_like(k), optional) – Use h as bandwidth for calculating regression values if given, otherwise determine optimal h using cross-validation or by using Silverman’s rule-of-thumb if silverman==True.
silverman (bool, optional) – Use Silverman’s rule-of-thumb to calculate bandwidth h if True, otherwise determine h via cross-validation. Only used if h is not given.
xout (ndarray(n, k), optional) – Return fitted values at xout if given, otherwise return fitted values at x.
- Returns:
Fitted values at x, or at xout if given
- Return type:
array_like with same type as y
References
- Härdle W and Müller M (2000) Multivariate and semiparametric kernel regression. In MG Schimek (Ed.), Smoothing and Regression: Approaches, Computation, and Application (pp. 357-392). Hoboken, NJ, USA: John Wiley & Sons, Inc. doi: 10.1002/9781118150658.ch12
Examples
>>> import numpy as np
>>> n = 10
>>> x = np.zeros((n, 2))
>>> x[:, 0] = np.arange(n, dtype=float) / float(n-1)
>>> x[:, 1] = 1. / (np.arange(n, dtype=float) / float(n-1) + 0.1)
>>> y = 1. + x[:, 0]**2 - np.sin(x[:, 1])**2
Separate determination of h and kernel regression
>>> h = kernel_regression_h(x, y)
>>> yk = kernel_regression(x, y, h)
>>> print(np.allclose(yk[0:6],
...        [0.52241, 0.52570, 0.54180, 0.51781, 0.47644, 0.49230],
...        atol=0.0001))
True
Single call to kernel regression
>>> yk = kernel_regression(x, y)
>>> print(np.allclose(yk[0:6],
...        [0.52241, 0.52570, 0.54180, 0.51781, 0.47644, 0.49230],
...        atol=0.0001))
True
Single call to kernel regression using Silverman’s rule-of-thumb for h
>>> yk = kernel_regression(x, y, silverman=True)
>>> print(np.allclose(yk[0:6],
...        [0.691153, 0.422809, 0.545844, 0.534315, 0.521494, 0.555426],
...        atol=0.0001))
True
>>> n = 5
>>> xx = np.empty((n, 2))
>>> xx[:, 0] = (np.amin(x[:, 0]) + (np.amax(x[:, 0]) - np.amin(x[:, 0])) *
...             np.arange(n, dtype=float) / float(n))
>>> xx[:, 1] = (np.amin(x[:, 1]) + (np.amax(x[:, 1]) - np.amin(x[:, 1])) *
...             np.arange(n, dtype=float) / float(n))
>>> yk = kernel_regression(x, y, silverman=True, xout=xx)
>>> print(np.allclose(yk,
...        [0.605485, 0.555235, 0.509529, 0.491191, 0.553325],
...        atol=0.0001))
True
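The estimator behind the calls above can be sketched as the Nadaraya-Watson estimator with a product Gaussian kernel, following Härdle and Müller (2000). `nadaraya_watson` below is a hypothetical helper for illustration only; the module's actual implementation may differ in normalisation and numerical details.

```python
import numpy as np

def nadaraya_watson(x, y, h, xout=None):
    """Nadaraya-Watson regression with a product Gaussian kernel (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    xout = x if xout is None else np.asarray(xout, dtype=float)
    if xout.ndim == 1:
        xout = xout[:, None]
    h = np.atleast_1d(np.asarray(h, dtype=float))
    # scaled differences between output and input points: shape (m, n, k)
    u = (xout[:, None, :] - x[None, :, :]) / h
    # product Gaussian kernel weights; normalising constants cancel
    # in the Nadaraya-Watson ratio, so they are omitted
    w = np.exp(-0.5 * np.sum(u**2, axis=2))       # shape (m, n)
    # kernel-weighted average of y at each output point
    return w @ y / np.sum(w, axis=1)
```

For a very small bandwidth the estimator interpolates the data points, and for a very large bandwidth it flattens to the mean of y, which brackets the behaviour the optimal h trades off.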
- kernel_regression_h(x, y, silverman=False)
Optimal bandwidth for multi-dimensional non-parametric kernel regression
Optimal bandwidth is determined using cross-validation or Silverman’s rule-of-thumb.
- Parameters:
x (array_like (n, k)) – Independent values
y (array_like (n)) – Dependent values
silverman (bool, optional) – Use Silverman’s rule-of-thumb to calculate bandwidth h if True, otherwise determine h via cross-validation
- Returns:
Optimal bandwidth h. For multi-dimensional regression, h is a 1d-array, assuming a diagonal bandwidth matrix.
- Return type:
float or array
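The cross-validation criterion can be sketched as a leave-one-out squared-error objective over the bandwidths. `loo_cv_error` below is a hypothetical helper for illustration; the module's actual objective and its optimiser (TNC, per the history above) may differ in detail.

```python
import numpy as np

def loo_cv_error(h, x, y):
    """Leave-one-out squared error of Nadaraya-Watson regression (sketch)."""
    # pairwise scaled differences between all data points: shape (n, n, k)
    u = (x[:, None, :] - x[None, :, :]) / h
    # product Gaussian kernel weights
    w = np.exp(-0.5 * np.sum(u**2, axis=2))
    np.fill_diagonal(w, 0.)        # exclude each point from its own fit
    yhat = w @ y / np.sum(w, axis=1)
    return np.sum((y - yhat)**2)

# The objective could then be minimised over h, e.g. (hypothetical call)
# starting from a rule-of-thumb bandwidth h0:
#   from scipy.optimize import minimize
#   res = minimize(loo_cv_error, h0, args=(x, y), method='TNC',
#                  bounds=[(1e-3 * hh, 1e3 * hh) for hh in h0])
```

A sensible local bandwidth should give a smaller leave-one-out error on smooth data than an oversmoothing one, which is the property the minimisation exploits.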
References
- Härdle W and Müller M (2000) Multivariate and semiparametric kernel regression. In MG Schimek (Ed.), Smoothing and Regression: Approaches, Computation, and Application (pp. 357-392). Hoboken, NJ, USA: John Wiley & Sons, Inc. doi: 10.1002/9781118150658.ch12
Examples
>>> import numpy as np
>>> n = 10
>>> x = np.zeros((n, 2))
>>> x[:, 0] = np.arange(n, dtype=float) / float(n-1)
>>> x[:, 1] = 1. / (np.arange(n, dtype=float) / float(n-1) + 0.1)
>>> y = 1. + x[:, 0]**2 - np.sin(x[:, 1])**2
>>> h = kernel_regression_h(x, y)
>>> print(np.allclose(h, [0.172680, 9.516907], atol=0.0001))
True
>>> h = kernel_regression_h(x, y, silverman=True)
>>> print(np.allclose(h, [0.229190, 1.903381], atol=0.0001))
True
>>> n = 10
>>> x = np.arange(n, dtype=float) / float(n-1)
>>> y = 1. + x**2 - np.sin(x)**2
>>> h = kernel_regression_h(x, y)
>>> print(np.around(h, 4))
0.045
>>> h = kernel_regression_h(x, y, silverman=True)
>>> print(np.around(h, 4))
0.2248
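Silverman's rule-of-thumb used above can be sketched as the Gaussian reference rule with a diagonal bandwidth matrix and per-dimension sample standard deviations. `silverman_h` below is a hypothetical helper, not necessarily the module's exact code, although it reproduces the bandwidths in the examples above.

```python
import numpy as np

def silverman_h(x):
    """Silverman rule-of-thumb bandwidth for a Gaussian kernel (sketch)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, k = x.shape
    sigma = np.std(x, axis=0, ddof=1)   # per-dimension sample std. dev.
    # Gaussian reference rule: h_j = (4/(k+2))^(1/(k+4)) n^(-1/(k+4)) sigma_j
    h = (4. / (k + 2.))**(1. / (k + 4.)) * n**(-1. / (k + 4.)) * sigma
    # return a scalar for 1-dimensional x, mirroring kernel_regression_h
    return h[0] if k == 1 else h
```

With the example data above this yields approximately [0.2292, 1.9034] in the 2-dimensional case and 0.2248 in the 1-dimensional case, matching the doctest values.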