Noise model diagnostics

Pints includes functionality to generate diagnostic plots of the residuals. These tools may be useful to evaluate the validity of a noise model.

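Plotting functions:

  • plot_residuals_autocorrelation
  • plot_residuals_binned_autocorrelation
  • plot_residuals_binned_std
  • plot_residuals_distance
  • plot_residuals_vs_output

Diagnostics:

  • acorr
  • calculate_residuals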

Plotting functions

pints.residuals_diagnostics.plot_residuals_autocorrelation(parameters, problem, max_lag=10, thinning=None, significance_level=0.05, posterior_interval=0.95)

Generate an autocorrelation plot of the residuals.

This function can be used to analyse the results of either optimisation or MCMC Bayesian inference. When multiple samples of the residuals are present (corresponding to multiple MCMC samples), the plot illustrates the distribution of autocorrelations across the MCMC samples. At each lag, a point is drawn at the median autocorrelation, and a line is drawn giving the percentile range of the posterior interval specified as an argument (by default, the 2.5th to the 97.5th percentile).
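
The per-lag summary across MCMC samples can be sketched in NumPy as follows. This is an illustration, not the pints implementation: the helper names `lag_autocorrelations` and `summarise_autocorrelations` are hypothetical, the autocorrelation is assumed to be normalised as in matplotlib.pyplot.acorr (no mean subtraction), only non-negative lags are kept, and the residuals are assumed to be a single-output array of shape (n_samples, n_times).

```python
import numpy as np

def lag_autocorrelations(e, max_lag):
    """Normalised autocorrelation of a 1-D series at lags 0..max_lag."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    full = np.correlate(e, e, mode="full") / np.dot(e, e)
    # Index n - 1 of the full correlation corresponds to lag 0
    return full[n - 1:n + max_lag]

def summarise_autocorrelations(residuals, max_lag=10, posterior_interval=0.95):
    """Median and percentile range of the autocorrelations across samples.

    residuals: array of shape (n_samples, n_times) for a single output.
    """
    acs = np.array([lag_autocorrelations(r, max_lag) for r in residuals])
    tail = 100 * (1 - posterior_interval) / 2  # 2.5 for a 95% interval
    median = np.median(acs, axis=0)            # the point drawn at each lag
    lower = np.percentile(acs, tail, axis=0)   # lower end of the line
    upper = np.percentile(acs, 100 - tail, axis=0)
    return median, lower, upper

rng = np.random.default_rng(0)
res = rng.normal(size=(50, 200))  # synthetic IID residuals for 50 samples
median, lower, upper = summarise_autocorrelations(res, max_lag=10)
```

With the default posterior_interval of 0.95, the lower and upper arrays are the 2.5th and 97.5th percentiles at each lag.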

When multiple outputs are present, one autocorrelation plot is generated for each output.

When a significance level is provided, confidence bounds for the sample autocorrelations under the assumption of IID residuals are drawn on the plot. If many of the observed residual autocorrelations fall outside these bounds, this may be evidence against the residuals being IID.

Under the assumption that the residuals of length \(n\) are IID with mean 0 and variance \(\sigma^2\), for large \(n\) the sample autocorrelations of the residuals at nonzero lags are approximately IID normal with mean 0 and variance \(1/n\). This result is proved in [1] (see Theorem 7.2.2 and Example 7.2.1). Therefore, the confidence bounds can be calculated as \(\pm z^* n^{-1/2}\), where, for a significance level \(\alpha\), the critical value \(z^*\) is the \(1 - \alpha/2\) quantile of the standard normal distribution.
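
A minimal sketch of the bound computation, assuming a two-sided band. The function name is hypothetical, and Python's standard-library statistics.NormalDist stands in for whatever pints uses internally.

```python
import math
from statistics import NormalDist

def iid_autocorrelation_bound(n_times, significance_level=0.05):
    """Half-width of the two-sided band for sample autocorrelations of IID residuals."""
    # Critical value z*: the (1 - alpha/2) quantile of the standard normal
    z_star = NormalDist().inv_cdf(1 - significance_level / 2)
    return z_star / math.sqrt(n_times)

bound = iid_autocorrelation_bound(100)  # roughly 1.96 / 10
```

For 100 residuals and the default significance level of 0.05, the bounds are approximately ±0.196; under the IID assumption, about 5% of lags would be expected to fall outside them by chance alone.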

This function returns a matplotlib figure.

Parameters:
  • parameters – The parameter values with shape (n_samples, n_parameters). When passing a single best fit parameter vector, n_samples will be 1.
  • problem – The problem given by a pints.SingleOutputProblem or pints.MultiOutputProblem. Its n_parameters must be less than or equal to the n_parameters of parameters; extra parameters not found in the problem (for example, noise model parameters) are ignored.
  • max_lag – Optional int value (default 10). The highest lag to plot.
  • thinning – Optional int value (greater than zero). If thinning is set to n, only every nth sample in parameters will be used. If set to None (default), some thinning will be applied so that about 200 samples will be used.
  • significance_level – None or float value (default 0.05). When a significance level is provided, dashed lines for the confidence interval corresponding to that significance level are drawn on the plot. When None, no lines are drawn.
  • posterior_interval – Float value (default 0.95). When multiple samples of the parameter values are provided, this gives the size of the credible region of the posterior to plot.

References

[1] Brockwell, P. J., & Davis, R. A. (1991). Time Series: Theory and Methods (2nd ed.). New York: Springer.

pints.residuals_diagnostics.plot_residuals_binned_autocorrelation(parameters, problem, thinning=None, n_bins=25)

Plot the autocorrelation of the residuals within bins (i.e. discrete time windows across the series).

Given a time series with observed residuals

\[e_i = y_i - f(t_i; \theta)\]

this method divides the vector of residuals into some number of equally sized bins. The lag 1 autocorrelation is calculated for the residuals within each bin. The plot shows the lag 1 autocorrelation in each bin over time.
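
The binning step can be sketched as below. This is an illustration under stated assumptions rather than the pints code: np.array_split forms bins of (nearly) equal size, and the lag 1 autocorrelation in each bin is normalised as in matplotlib.pyplot.acorr, without subtracting the bin mean. The function name is hypothetical.

```python
import numpy as np

def binned_lag1_autocorrelation(residuals, n_bins=25):
    """Lag 1 autocorrelation of a 1-D residual series within each time bin."""
    bins = np.array_split(np.asarray(residuals, dtype=float), n_bins)
    # Sum of products of neighbouring residuals, normalised by the lag 0 term
    return np.array([np.dot(b[:-1], b[1:]) / np.dot(b, b) for b in bins])

# Alternating residuals are strongly anti-correlated at lag 1 in every bin
alternating = np.tile([1.0, -1.0], 1250)  # 2500 points, so 100 per bin
acs = binned_lag1_autocorrelation(alternating)  # each bin gives -99/100
```

For noise whose autocorrelation drifts over time, the values in acs would change from bin to bin instead of staying constant.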

This diagnostic is useful for detecting noise whose autocorrelation varies over time.

When passing an array of parameters (from an MCMC sampler), this method plots the autocorrelations of the posterior median residual values.
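
Because \(e_i = y_i - f(t_i; \theta)\), the median of the residuals across samples equals the data minus the posterior median of the model output. The check below uses synthetic arrays in place of real MCMC output; the shapes follow the (n_samples, n_outputs, n_times) convention of calculate_residuals, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_outputs, n_times = 200, 2, 50
data = rng.normal(size=(n_outputs, n_times))                 # observed y
outputs = rng.normal(size=(n_samples, n_outputs, n_times))   # f(t; theta) for each sample
residuals = data - outputs                                    # (n_samples, n_outputs, n_times)

# Posterior median residual at each time point, per output
median_residuals = np.median(residuals, axis=0)               # (n_outputs, n_times)
```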

Typically, this diagnostic is called after obtaining the residuals of an IID fit, in order to determine whether the IID fit is satisfactory or a more complex noise model is needed.

This function returns a matplotlib figure.

Parameters:
  • parameters – The parameter values with shape (n_samples, n_parameters). When passing a single best fit parameter vector, n_samples will be 1.
  • problem – The problem given by a pints.SingleOutputProblem or pints.MultiOutputProblem. Its n_parameters must be less than or equal to the n_parameters of parameters; extra parameters not found in the problem (for example, noise model parameters) are ignored.
  • thinning – Optional int value (greater than zero). If thinning is set to n, only every nth sample in parameters will be used. If set to None (default), some thinning will be applied so that about 200 samples will be used.
  • n_bins – Optional int value (greater than zero) giving the number of bins into which to divide the time series (default 25).

pints.residuals_diagnostics.plot_residuals_binned_std(parameters, problem, thinning=None, n_bins=25)

Plot the standard deviation of the residuals within bins (i.e. discrete time windows across the series).

Given a time series with observed residuals

\[e_i = y_i - f(t_i; \theta)\]

this method divides the vector of residuals into some number of equally sized bins. The standard deviation is calculated for the residuals within each bin. The plot shows the standard deviation in each bin over time.
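
A minimal NumPy sketch of the per-bin standard deviation. The bin construction with np.array_split and the use of the population standard deviation (ddof=0) are assumptions of this sketch, and the function name is hypothetical.

```python
import numpy as np

def binned_std(residuals, n_bins=25):
    """Standard deviation of a 1-D residual series within each time bin."""
    bins = np.array_split(np.asarray(residuals, dtype=float), n_bins)
    # ddof=0 (population standard deviation) is an assumption of this sketch
    return np.array([np.std(b) for b in bins])

# Zero-mean residuals whose spread triples halfway through the series
signs = np.tile([1.0, -1.0], 1000)               # 2000 points
residuals = signs * np.repeat([1.0, 3.0], 1000)  # amplitude 1, then 3
stds = binned_std(residuals, n_bins=20)          # 100 points per bin
```

The jump from 1 to 3 in stds is the kind of step change in variance that this plot is meant to reveal.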

This diagnostic is particularly useful for diagnosing time series whose noise exhibits a change in variance over time.

When passing an array of parameters (from an MCMC sampler), this method will plot the standard deviation of the posterior median residual values.

Typically, this diagnostic can be called after obtaining the residuals of an IID fit, in order to determine whether the IID fit is satisfactory or a more complex noise model is needed.

This function returns a matplotlib figure.

Parameters:
  • parameters – The parameter values with shape (n_samples, n_parameters). When passing a single best fit parameter vector, n_samples will be 1.
  • problem – The problem given by a pints.SingleOutputProblem or pints.MultiOutputProblem. Its n_parameters must be less than or equal to the n_parameters of parameters; extra parameters not found in the problem (for example, noise model parameters) are ignored.
  • thinning – Optional int value (greater than zero). If thinning is set to n, only every nth sample in parameters will be used. If set to None (default), some thinning will be applied so that about 200 samples will be used.
  • n_bins – Optional int value (greater than zero) giving the number of bins into which to divide the time series (default 25).

pints.residuals_diagnostics.plot_residuals_distance(parameters, problem, thinning=None)

Plot a distance matrix of the residuals.

Given a time series with observed residuals

\[e_i = y_i - f(t_i; \theta)\]

this function generates and plots the distance matrix \(D\) whose entries are defined by

\[D_{i, j} = |e_i - e_j|\]

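The plot of this matrix may help to identify a time series with correlated noise. When the noise terms are correlated, residuals that are close together in time tend to have similar values, so the entries of \(D\) near the diagonal are small and the matrix is likely to have a banded appearance.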
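
The distance matrix can be built with NumPy broadcasting. To illustrate the banded appearance numerically, the sketch compares the mean distance just off the diagonal with the mean distance far from it, for simulated AR(1) noise (autoregression coefficient 0.9, an arbitrary choice) and for IID noise. The helper names and the comparison are illustrative and not part of pints.

```python
import numpy as np

def distance_matrix(residuals):
    """D[i, j] = |e_i - e_j| for a 1-D residual series."""
    e = np.asarray(residuals, dtype=float)
    return np.abs(e[:, np.newaxis] - e[np.newaxis, :])

def mean_offdiagonal_distance(D, k):
    """Average distance between residuals that are k time steps apart."""
    return np.mean(np.diagonal(D, offset=k))

rng = np.random.default_rng(3)
n = 1500
iid = rng.normal(size=n)
ar1 = np.zeros(n)
for i in range(1, n):
    # Each AR(1) term carries over most of the previous one, so neighbours are similar
    ar1[i] = 0.9 * ar1[i - 1] + rng.normal()

D_ar1, D_iid = distance_matrix(ar1), distance_matrix(iid)
near_ar1, far_ar1 = mean_offdiagonal_distance(D_ar1, 1), mean_offdiagonal_distance(D_ar1, 50)
near_iid, far_iid = mean_offdiagonal_distance(D_iid, 1), mean_offdiagonal_distance(D_iid, 50)
```

For the correlated series, distances near the diagonal are much smaller than those far from it, which shows up as a dark band along the diagonal of the plot; for IID noise the two averages are similar.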

For problems with multiple outputs, one distance matrix is generated for each output.

When passing an array of parameters (from an MCMC sampler), this method will plot the distance matrix of the posterior median residual values.

Typically, this diagnostic is called after obtaining the residuals of an IID fit, in order to determine whether the IID fit is satisfactory or a more complex noise model is needed.

This function returns a matplotlib figure.

Parameters:
  • parameters – The parameter values with shape (n_samples, n_parameters). When passing a single best fit parameter vector, n_samples will be 1.
  • problem – The problem given by a pints.SingleOutputProblem or pints.MultiOutputProblem. Its n_parameters must be less than or equal to the n_parameters of parameters; extra parameters not found in the problem (for example, noise model parameters) are ignored.
  • thinning – Optional int value (greater than zero). If thinning is set to n, only every nth sample in parameters will be used. If set to None (default), some thinning will be applied so that about 200 samples will be used.

pints.residuals_diagnostics.plot_residuals_vs_output(parameters, problem, thinning=None)

Draw a plot of the magnitude of residuals versus the solution output.

This plot is useful for detecting any dependence of the noise on the magnitude of the solution. For example, it may help to detect multiplicative Gaussian noise, in which the standard deviation of the error scales with the output.
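
As a rough numerical counterpart to this plot, the sketch below computes the Pearson correlation between the residual magnitudes \(|e_i|\) and the model output for synthetic data with additive and with multiplicative noise. The output curve, noise levels and helper name are illustrative assumptions; pints itself draws the plot rather than computing this statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
output = np.linspace(1.0, 100.0, 1000)  # hypothetical model output f(t_i; theta)
additive = output + rng.normal(0.0, 5.0, size=output.size)
multiplicative = output * (1.0 + 0.1 * rng.normal(size=output.size))

def magnitude_output_correlation(data, output):
    """Pearson correlation between |residual| and the model output."""
    return np.corrcoef(np.abs(data - output), output)[0, 1]

r_add = magnitude_output_correlation(additive, output)
r_mult = magnitude_output_correlation(multiplicative, output)
```

With multiplicative noise, the residual magnitudes grow with the output (r_mult is clearly positive), which would appear as a fan shape in the plot; with additive noise there is no such trend.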

When multiple samples of the parameters are provided (from an MCMC chain), the residuals are calculated and plotted relative to the posterior median of the solution outputs.

This function returns a matplotlib figure.

Parameters:
  • parameters – The parameter values with shape (n_samples, n_parameters). When passing a single best fit parameter vector, n_samples will be 1.
  • problem – The problem given by a pints.SingleOutputProblem or pints.MultiOutputProblem. Its n_parameters must be less than or equal to the n_parameters of parameters; extra parameters not found in the problem (for example, noise model parameters) are ignored.
  • thinning – Optional int value (greater than zero). If thinning is set to n, only every nth sample in parameters will be used. If set to None (default), some thinning will be applied so that about 200 samples will be used.

Diagnostics

pints.residuals_diagnostics.acorr(x, max_lag)

Calculate the normalised autocorrelation for a given data series.

This function uses the same procedure as matplotlib.pyplot.acorr, but it only calculates the autocorrelation, without plotting anything.

Returns the autocorrelation as a NumPy array.

Parameters:
  • x – A 1d NumPy array containing the time series for which to calculate autocorrelation.
  • max_lag – An int specifying the highest lag to consider.
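
A minimal sketch of the procedure, assuming it follows matplotlib.pyplot.acorr with its default settings: no detrending, normalisation by the lag 0 term, and values for lags -max_lag to max_lag. The function name acorr_sketch is hypothetical; check the pints source before relying on the exact shape of the array that pints returns.

```python
import numpy as np

def acorr_sketch(x, max_lag):
    """Normalised autocorrelation of a 1-D series for lags -max_lag..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Full cross-correlation of x with itself; index n - 1 is lag 0
    c = np.correlate(x, x, mode="full") / np.dot(x, x)
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, c[n - 1 - max_lag:n + max_lag]

lags, values = acorr_sketch([1.0, 2.0, 3.0, 4.0], max_lag=2)
```

For this input the lag 0, 1 and 2 sums of products are 30, 20 and 11, so the normalised values are 11/30, 20/30, 1, 20/30 and 11/30.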

pints.residuals_diagnostics.calculate_residuals(parameters, problem, thinning=None)

Calculate the residuals (difference between actual data and the fit).

Either a single set of parameters or a chain of MCMC samples can be provided.

The residuals are returned as a 3-dimensional NumPy array with shape (n_samples, n_outputs, n_times).

Parameters:
  • parameters – The parameter values with shape (n_samples, n_parameters). When passing a single best fit parameter vector, n_samples will be 1.
  • problem – The problem given by a pints.SingleOutputProblem or pints.MultiOutputProblem. Its n_parameters must be less than or equal to the n_parameters of parameters; extra parameters not found in the problem (for example, noise model parameters) are ignored.
  • thinning – Optional int value (greater than zero). If thinning is set to n, only every nth sample in parameters will be used. If set to None (default), some thinning will be applied so that about 200 samples will be used.
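
The bookkeeping described above (dropping extra parameters, thinning, and arranging the result as (n_samples, n_outputs, n_times)) can be sketched as follows. The default thinning rule (integer division of the number of samples by 200) and the callable simulate are assumptions of this sketch; in pints the model output comes from the problem object, and calculate_residuals_sketch is a hypothetical name.

```python
import numpy as np

def calculate_residuals_sketch(parameters, simulate, data, n_model_parameters, thinning=None):
    """Residuals with shape (n_kept_samples, n_outputs, n_times).

    parameters: array (n_samples, n_parameters) or a single 1-D parameter vector
    simulate: hypothetical callable mapping a parameter vector to an array (n_times, n_outputs)
    data: observed values, array (n_times, n_outputs)
    """
    parameters = np.atleast_2d(np.asarray(parameters, dtype=float))
    if thinning is None:
        # Assumed rule: keep roughly 200 samples
        thinning = max(1, len(parameters) // 200)
    elif thinning < 1:
        raise ValueError("thinning must be greater than zero")
    # Keep every thinning-th sample and drop columns the model does not use
    thinned = parameters[::thinning, :n_model_parameters]
    outputs = np.array([simulate(theta) for theta in thinned])  # (n_kept, n_times, n_outputs)
    residuals = np.asarray(data)[np.newaxis, :, :] - outputs
    return np.transpose(residuals, (0, 2, 1))


times = np.linspace(0.0, 1.0, 30)

def simulate(theta):
    # Hypothetical two-output model: straight lines with slopes theta[0] and theta[1]
    return np.column_stack([theta[0] * times, theta[1] * times])

data = simulate([1.0, 2.0])
# A 1000-sample chain whose third column plays the role of a noise parameter
chain = np.column_stack([np.ones(1000), 2.0 * np.ones(1000), np.full(1000, 0.1)])
res = calculate_residuals_sketch(chain, simulate, data, n_model_parameters=2)
```

Here the chain is thinned to 200 samples and, because every sample matches the data-generating parameters, all residuals are zero.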