Welcome to the pints documentation¶
Pints is hosted on GitHub, where you can find downloads and installation instructions.
Detailed examples can also be found there.
This page provides the API (developer) documentation for pints.
Contents¶
Boundaries¶
Simple boundaries for an optimisation can be created using RectangularBoundaries. More complex types can be made using LogPDFBoundaries or a custom implementation of the Boundaries interface.
Overview:
class pints.Boundaries
Abstract class representing boundaries on a parameter space.

check(parameters)
Returns True if and only if the given point in parameter space is within the boundaries.
Parameters: parameters – A point in parameter space.

n_parameters()
Returns the dimension of the parameter space these boundaries are defined on.

sample(n=1)
Returns n random samples from within the boundaries, for example to use as starting points for an optimisation.
The returned value is a NumPy array with shape (n, d), where n is the requested number of samples and d is the dimension of the parameter space these boundaries are defined on.
Note that implementing sample() is optional, so some boundary types may not support it.
Parameters: n (int) – The number of points to sample.
class pints.LogPDFBoundaries(log_pdf, threshold=-inf)
Uses a pints.LogPDF (e.g. a LogPrior) as boundaries, accepting log-pdf values above a given threshold as within bounds.
For a pints.LogPrior based on pints.Boundaries, see pints.UniformLogPrior.
Extends pints.Boundaries.
Parameters:
- log_pdf – A pints.LogPDF to use.
- threshold – A threshold to determine whether a given log-prior value counts as within bounds. Anything above the threshold counts as within bounds.

sample(n=1)
See pints.Boundaries.sample().
Note: This method is implemented only when the underlying pints.LogPDF is a pints.LogPrior that supports sampling.
class pints.RectangularBoundaries(lower, upper)
Represents a set of lower and upper boundaries for model parameters.
A point x is considered within the boundaries if (and only if) lower <= x < upper.
Extends pints.Boundaries.
Parameters:
- lower – A 1d array of lower boundaries.
- upper – The corresponding upper boundaries.
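Example (a minimal sketch; the bounds and points are illustrative only):

boundaries = pints.RectangularBoundaries([0, 0], [10, 20])
print(boundaries.n_parameters())   # 2
print(boundaries.check([5, 5]))    # True
x0 = boundaries.sample(4)          # 4 random starting points, shape (4, 2)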
Core classes and methods¶
Pints provides the SingleOutputProblem and MultiOutputProblem classes to formulate inverse problems based on time-series data and a ForwardModel.
Overview:
pints.version(formatted=False)
Returns the version number, as a tuple of three integers (major, minor, revision). If formatted=True, it instead returns a formatted version string (for example "Pints 1.0.0").
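Example (the values shown are illustrative):

import pints
print(pints.version())                # e.g. (1, 0, 0)
print(pints.version(formatted=True))  # e.g. 'Pints 1.0.0'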
class pints.TunableMethod
Defines an interface for a numerical method with a given number of hyper-parameters.
Each optimiser or sampler method implemented in pints has a number of parameters that alter its behaviour, which can be called "hyper-parameters". The optimiser/sampler method will provide member functions to set each of these hyper-parameters individually. In contrast, this interface provides a generic way to set the hyper-parameters, which allows the user to, for example, use an optimiser to tune the hyper-parameters of the method.
Note that set_hyper_parameters() takes a single array of values, so all hyper-parameters may arrive as the same type (e.g. when a NumPy array is passed in). Derived classes should therefore not raise errors if individual hyper-parameters are set using the wrong type (e.g. float rather than int), but should instead implicitly convert the argument to the correct type.

n_hyper_parameters()
Returns the number of hyper-parameters for this method (see TunableMethod).

set_hyper_parameters(x)
Sets the hyper-parameters for the method with the given vector of values (see TunableMethod).
Parameters: x – An array of length n_hyper_parameters used to set the hyper-parameters.
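A minimal sketch of an implementation, assuming a single hypothetical hyper-parameter (the class and attribute names are made up for this example):

import pints

class HypotheticalMethod(pints.TunableMethod):
    """Sketch of a method with a single hyper-parameter."""
    def __init__(self):
        self._population_size = 10   # default value

    def n_hyper_parameters(self):
        return 1

    def set_hyper_parameters(self, x):
        # x has length n_hyper_parameters(); convert explicitly to the right type
        self._population_size = int(x[0])

method = HypotheticalMethod()
method.set_hyper_parameters([20.0])  # float is implicitly converted to int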
Forward model¶
class pints.ForwardModel
Defines an interface for user-supplied forward models.
Classes extending ForwardModel can implement the required methods directly in Python or interface with other languages (for example via Python wrappers around C code).

simulate(parameters, times)
Runs a forward simulation with the given parameters and returns a time-series with data points corresponding to the given times.
Returns a sequence of length n_times (for single-output problems) or a NumPy array of shape (n_times, n_outputs) (for multi-output problems), representing the values of the model at the given times.
Parameters:
- parameters – An ordered sequence of parameter values.
- times – The times at which to evaluate. Must be an ordered sequence, without duplicates, and without negative values. All simulations are started at time 0, regardless of whether this value appears in times.
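As an illustration, a minimal sketch of a user-supplied model (a hypothetical straight-line model, y = a + b * t; the class name is made up for this example):

import numpy as np
import pints

class LineModel(pints.ForwardModel):
    """Hypothetical example model: y(t) = a + b * t."""
    def n_parameters(self):
        return 2

    def simulate(self, parameters, times):
        a, b = parameters
        return a + b * np.asarray(times)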
Forward model with sensitivities¶
class pints.ForwardModelS1
Defines an interface for user-supplied forward models which can calculate the first-order derivative of the simulated values with respect to the parameters.
Extends pints.ForwardModel.

n_outputs()
Returns the number of outputs this model has. The default is 1.

n_parameters()
Returns the dimension of the parameter space.

simulate(parameters, times)
Runs a forward simulation with the given parameters and returns a time-series with data points corresponding to the given times.
Returns a sequence of length n_times (for single-output problems) or a NumPy array of shape (n_times, n_outputs) (for multi-output problems), representing the values of the model at the given times.
Parameters:
- parameters – An ordered sequence of parameter values.
- times – The times at which to evaluate. Must be an ordered sequence, without duplicates, and without negative values. All simulations are started at time 0, regardless of whether this value appears in times.

simulateS1(parameters, times)
Runs a forward simulation with the given parameters and returns a time-series with data points corresponding to the given times, along with the sensitivities of the forward simulation with respect to the parameters.
Parameters:
- parameters – An ordered list of parameter values.
- times – The times at which to evaluate. Must be an ordered sequence, without duplicates, and without negative values. All simulations are started at time 0, regardless of whether this value appears in times.
Returns:
- y – The simulated values, as a sequence of n_times values, or a NumPy array of shape (n_times, n_outputs).
- y' – The corresponding derivatives, as a NumPy array of shape (n_times, n_parameters) or an array of shape (n_times, n_outputs, n_parameters).
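Continuing the hypothetical straight-line model above, a sketch of a model with sensitivities (the analytic derivatives dy/da = 1 and dy/db = t are specific to this toy example):

import numpy as np
import pints

class LineModelS1(pints.ForwardModelS1):
    """Hypothetical example model with sensitivities: y(t) = a + b * t."""
    def n_parameters(self):
        return 2

    def simulate(self, parameters, times):
        a, b = parameters
        return a + b * np.asarray(times)

    def simulateS1(self, parameters, times):
        a, b = parameters
        times = np.asarray(times)
        y = a + b * times
        # Sensitivities have shape (n_times, n_parameters)
        dy = np.stack([np.ones_like(times), times], axis=1)
        return y, dy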
Problems¶
class pints.SingleOutputProblem(model, times, values)
Represents an inference problem where a model is fit to a single time series, such as measured from a system with a single output.
Parameters:
- model – A model or model wrapper extending ForwardModel.
- times – A sequence of points in time. Must be non-negative and increasing.
- values – A sequence of scalar output values, measured at the times in times.

evaluate(parameters)
Runs a simulation using the given parameters, returning the simulated values as a NumPy array of shape (n_times,).

evaluateS1(parameters)
Runs a simulation with first-order sensitivity calculation, returning the simulated values and derivatives.
The returned data is a tuple of NumPy arrays (y, y'), where y has shape (n_times,) and y' has shape (n_times, n_parameters).
This method only works for problems with a model that implements the ForwardModelS1 interface.

n_times()
Returns the number of sampling points, i.e. the length of the vectors returned by times() and values().
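A minimal usage sketch, assuming the hypothetical LineModel defined above and some synthetic data:

import numpy as np
import pints

times = np.linspace(0, 10, 100)
values = 2 + 3 * times + np.random.normal(0, 1, times.shape)  # synthetic data

problem = pints.SingleOutputProblem(LineModel(), times, values)
y = problem.evaluate([2, 3])   # simulated values, shape (100,)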
class pints.MultiOutputProblem(model, times, values)
Represents an inference problem where a model is fit to a multi-valued time series, such as measured from a system with multiple outputs.
Parameters:
- model – A model or model wrapper extending ForwardModel.
- times – A sequence of points in time. Must be non-negative and non-decreasing.
- values – A sequence of multi-valued measurements. Must have shape (n_times, n_outputs), where n_times is the number of points in times and n_outputs is the number of outputs in the model.

evaluate(parameters)
Runs a simulation using the given parameters, returning the simulated values.
The returned data is a NumPy array with shape (n_times, n_outputs).

evaluateS1(parameters)
Runs a simulation with first-order sensitivity calculation, returning the simulated values and derivatives.
The returned data is a tuple of NumPy arrays (y, y'), where y has shape (n_times, n_outputs) and y' has shape (n_times, n_outputs, n_parameters).
This method only works for problems whose model implements the ForwardModelS1 interface.

n_times()
Returns the number of sampling points, i.e. the length of the vectors returned by times() and values().
Diagnosing MCMC results¶
Pints provides a number of functions to diagnose MCMC progress and convergence.
Overview:
pints.rhat(chains, warm_up=0.0)
Returns the convergence measure \(\hat{R}\) for the approximate posterior according to [1].
\(\hat{R}\) diagnoses convergence by checking mixing and stationarity of \(m\) chains (at least two, \(m\geq 2\)). To diminish the influence of starting values, the first portion of each chain can be excluded from the computation. Subsequently, the truncated chains are split in half, resulting in a total number of \(m'=2m\) chains of length \(n'=(1-\text{warm_up})n/2\). The mean of the variances within and between the resulting chains are computed, \(W\) and \(B\) respectively. Based on those variances an estimator of the marginal posterior variance is constructed
\[\widehat{\text{var}}^+ = \frac{n'-1}{n'}W + \frac{1}{n'}B,\]The estimator overestimates the variance of the marginal posterior if the chains are not well mixed and stationary, but is unbiased if the original chains equal the target distribution. At the same time, the mean within variance \(W\) underestimates the marginal posterior variance for finite \(n\), but converges to the true variance for \(n\rightarrow \infty\). By comparing \(\widehat{\text{var}}^+\) and \(W\) the mixing and stationarity of the chains can be quantified
\[\hat{R} = \sqrt{\frac{\widehat{\text{var}}^+}{W}}.\]For well mixed and stationary chains \(\hat{R}\) will be close to one.
The mean within \(W\) and mean between \(B\) variance of the \(m'=2m\) chains of length \(n'=(1-\text{warm_up})n/2\) are defined as
\[W = \frac{1}{m'}\sum _{j=1}^{m'}s_j^2\quad \text{where}\quad s_j^2=\frac{1}{n'-1}\sum _{i=1}^{n'}(\psi _{ij} - \bar{\psi} _j)^2,\]\[B = \frac{n'}{m'-1}\sum _{j=1}^{m'}(\bar{\psi} _j - \bar{\psi})^2.\]Here, \(\psi _{ij}\) is the jth sample of the ith chain, \(\bar{\psi _j}=\sum _{i=1}^{n'}\psi _{ij}/n'\) is the within chain mean of the parameter \(\psi\) and \(\bar{\psi } = \sum _{j=1}^{m'}\bar{\psi} _{j}/m'\) is the between chain mean of the within chain means.
References
[1] "Bayesian data analysis", ch. 11.4 'Inference and assessing convergence', 3rd edition, Gelman et al., 2014.
Parameters:
- chains (np.ndarray of shape (m, n) or (m, n, p)) – A numpy array with \(n\) samples for each of \(m\) chains. Optionally the \(\hat{R}\) for \(p\) parameters can be computed by passing a numpy array with \(m\) chains of length \(n\) for \(p\) parameters.
- warm_up (float) – First portion of each chain that will not be used for the computation of \(\hat{R}\).
Returns: rhat – \(\hat{R}\) of the posteriors for each parameter.
Return type: float or np.ndarray of shape (p,)
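Example, assuming chains is a NumPy array of MCMC samples with shape (m, n, p):

import pints

# Discard the first 25% of each chain as warm-up before computing R-hat
rhats = pints.rhat(chains, warm_up=0.25)
print(rhats)  # one value per parameter; values close to 1 indicate convergence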
Diagnostic plots¶
For users who have Matplotlib installed, Pints offers a number of diagnostic plots that can be used to quickly check obtained results.
Functions¶
pints.plot.function(f, x, lower=None, upper=None, evaluations=20)
Creates 1d plots of a LogPDF or an ErrorMeasure around a point x (i.e. a 1-dimensional plot in each direction).
Returns a matplotlib figure object and axes handle.
Parameters:
- f – A pints.LogPDF or pints.ErrorMeasure to plot.
- x – A point in the function's input space.
- lower – Optional lower bounds for each parameter, used to specify the lower bounds of the plot.
- upper – Optional upper bounds for each parameter, used to specify the upper bounds of the plot.
- evaluations – The number of evaluations to use in each plot.
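For example, a quick visual check around a point might look like this (a sketch, assuming a problem has already been set up as in the Error measures section below; the point is illustrative):

import matplotlib.pyplot as plt
import pints
import pints.plot

error = pints.SumOfSquaresError(problem)
fig, axes = pints.plot.function(error, [2, 3])  # 1d slices around the point [2, 3]
plt.show()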
pints.plot.function_between_points(f, point_1, point_2, padding=0.25, evaluations=20)
Creates and returns a plot of a function between two points in parameter space.
Returns a matplotlib figure object and axes handle.
Parameters:
- f – A pints.LogPDF or pints.ErrorMeasure to plot.
- point_1 – The first point in parameter space. The method will find a line from point_1 to point_2 and plot f at several points along it.
- point_2 – The second point.
- padding – Specifies the amount of padding around the line segment [point_1, point_2] that will be shown in the plot.
- evaluations – The number of evaluations along the line in parameter space.
pints.plot.surface(points, values, boundaries=None, markers='+', figsize=None)
Takes irregularly spaced points and function evaluations in a two-dimensional parameter space and creates a coloured surface plot using a Voronoi diagram.
Returns a matplotlib figure object and axes handle.
Parameters:
- points – A list of (two-dimensional) points in parameter space.
- values – The values (e.g. error measure evaluations) corresponding to these points.
- boundaries – An optional pints.RectangularBoundaries object to restrict the area shown. If set to None, boundaries will be determined from the given points.
- markers – An optional string indicating the matplotlib markers to use to plot the points. Set to None to hide.
- figsize – An optional tuple (width, height) that will be passed to matplotlib when creating the figure. If set to None, matplotlib will use its default figure size.
MCMC Diagnostics¶
pints.plot.autocorrelation(samples, max_lags=100, parameter_names=None)
Creates and returns an autocorrelation plot for a given Markov chain or list of samples.
Returns a matplotlib figure object and axes handle.
Parameters:
- samples – A list of samples, with shape (n_samples, n_parameters), where n_samples is the number of samples in the list and n_parameters is the number of parameters.
- max_lags – The maximum autocorrelation lag to plot.
- parameter_names – A list of parameter names, which will be displayed in the legend of the autocorrelation subplots. If no names are provided, the parameters are enumerated.
pints.plot.histogram(samples, kde=False, n_percentiles=None, parameter_names=None, ref_parameters=None)
Takes one or more Markov chains or lists of samples as input and creates and returns a plot showing histograms for each chain or list of samples.
Returns a matplotlib figure object and axes handle.
Parameters:
- samples – A list of lists of samples, with shape (n_lists, n_samples, n_parameters), where n_lists is the number of lists of samples, n_samples is the number of samples in one list and n_parameters is the number of parameters.
- kde – Set to True to include kernel-density estimation for the histograms.
- n_percentiles – Shows only the middle n-th percentiles of the distribution. Default shows all samples in samples.
- parameter_names – A list of parameter names, which will be displayed on the x-axis of the histogram subplots. If no names are provided, the parameters are enumerated.
- ref_parameters – A set of parameters for reference in the plot. For example, if true values of parameters are known, they can be passed in for plotting.
pints.plot.pairwise(samples, kde=False, heatmap=False, opacity=None, n_percentiles=None, parameter_names=None, ref_parameters=None)
Takes a Markov chain or list of samples and creates a set of pairwise scatterplots for all parameters (p1 versus p2, p1 versus p3, p2 versus p3, etc.).
The returned plot is in a 'matrix' form, with histograms of each individual parameter on the diagonal, and scatter plots of parameters i and j on each entry (i, j) below the diagonal.
Returns a matplotlib figure object and axes handle.
Parameters:
- samples – A list of samples, with shape (n_samples, n_parameters), where n_samples is the number of samples in the list and n_parameters is the number of parameters.
- kde – Set to True to use kernel-density estimation for the histograms and scatter plots. Cannot be used together with heatmap.
- heatmap – Set to True to plot a heatmap for the pairwise plots. Cannot be used together with kde.
- opacity – This value can be used to manually set the opacity of the points in the scatter plots (when kde=False and heatmap=False only).
- n_percentiles – Shows only the middle n-th percentiles of the distribution. Default shows all samples in samples.
- parameter_names – A list of parameter names, which will be displayed on the axes of the pairwise subplots. If no names are provided, the parameters are enumerated.
- ref_parameters – A set of parameters for reference in the plot. For example, if true values of parameters are known, they can be passed in for plotting.
pints.plot.series(samples, problem, ref_parameters=None, thinning=None)
Creates and returns a plot of predicted time series for a given list of samples and a single-output or multi-output problem.
Because this method runs simulations, it can take a considerable time to run.
Returns a matplotlib figure object and axes handle.
Parameters:
- samples – A list of samples, with shape (n_samples, n_parameters), where n_samples is the number of samples in the list and n_parameters is the number of parameters.
- problem – A pints.SingleOutputProblem or pints.MultiOutputProblem with n_parameters equal to or greater than the n_parameters of the samples. Any extra parameters present in the chain but not accepted by the SingleOutputProblem or MultiOutputProblem (for example parameters added by a noise model) will be ignored.
- ref_parameters – A set of parameters for reference in the plot. For example, if true values of parameters are known, they can be passed in for plotting.
- thinning – An integer exceeding zero. If specified, only every n-th sample (with n = thinning) in the samples will be used. If left at the default value None, a value will be chosen so that 200 to 400 predictions are shown.
pints.plot.trace(samples, n_percentiles=None, parameter_names=None, ref_parameters=None)
Takes one or more Markov chains or lists of samples as input and creates and returns a plot showing histograms and traces for each chain or list of samples.
Returns a matplotlib figure object and axes handle.
Parameters:
- samples – A list of lists of samples, with shape (n_lists, n_samples, n_parameters), where n_lists is the number of lists of samples, n_samples is the number of samples in one list and n_parameters is the number of parameters.
- n_percentiles – Shows only the middle n-th percentiles of the distribution. Default shows all samples in samples.
- parameter_names – A list of parameter names, which will be displayed on the x-axis of the trace subplots. If no names are provided, the parameters are enumerated.
- ref_parameters – A set of parameters for reference in the plot. For example, if true values of parameters are known, they can be passed in for plotting.
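Example, assuming chains is a list (or array) of MCMC chains with shape (n_chains, n_samples, n_parameters):

import matplotlib.pyplot as plt
import pints.plot

fig1, axes1 = pints.plot.trace(chains)        # histograms and traces per chain
fig2, axes2 = pints.plot.pairwise(chains[0])  # pairwise scatter for one chain
plt.show()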
Error measures¶
Error measures are callable objects that return some scalar representing the error between a model and an experiment.
Example:
error = pints.SumOfSquaresError(problem)
x = [1,2,3]
fx = error(x)
Overview:
ErrorMeasure
MeanSquaredError
NormalisedRootMeanSquaredError
ProbabilityBasedError
ProblemErrorMeasure
RootMeanSquaredError
SumOfErrors
SumOfSquaresError
class pints.ErrorMeasure
Abstract base class for objects that calculate some scalar measure of goodness-of-fit (for a model and a data set), such that a smaller value means a better fit.
ErrorMeasures are callable objects: if e is an instance of an ErrorMeasure class you can calculate the error by calling e(p), where p is a point in parameter space.

evaluateS1(x)
Evaluates this error measure, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data has the shape (e, e'), where e is a scalar value and e' is a sequence of length n_parameters.
This is an optional method that is not always implemented.
class pints.MeanSquaredError(problem, weights=None)
Calculates the mean squared error:
\[f = \sum _i^n \frac{(y_i - x_i)^2}{n},\]
where \(y\) is the data, \(x\) the model output and \(n\) is the total number of data points.
Extends ProblemErrorMeasure.
Parameters:
- problem – A pints.SingleOutputProblem or pints.MultiOutputProblem.
- weights – An optional sequence of (float) weights, exactly one per problem output. If given, the error in each individual output will be multiplied by the corresponding weight. If no weights are specified all outputs will be weighted equally.

n_parameters()
class pints.NormalisedRootMeanSquaredError(problem)
Calculates a normalised root mean squared error:
\[f = \frac{1}{C}\sqrt{\frac{\sum _i^n (y_i - x_i) ^ 2}{n}},\]
where \(C\) is the normalising constant, \(y\) is the data, \(x\) the model output and \(n\) is the total number of data points. The normalising constant is given by
\[C = \sqrt{\frac{\sum _i^n y_i^2}{n}}.\]
This error measure is similar to the (unnormalised) RootMeanSquaredError.
Extends ProblemErrorMeasure.
Parameters: problem – A pints.SingleOutputProblem.

evaluateS1(x)
Evaluates this error measure, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data has the shape (e, e'), where e is a scalar value and e' is a sequence of length n_parameters.
This is an optional method that is not always implemented.

n_parameters()
class pints.ProbabilityBasedError(log_pdf)
Changes the sign of a LogPDF to use it as an error. Minimising this error will maximise the probability.
Extends ErrorMeasure.
Parameters: log_pdf (pints.LogPDF) – The LogPDF to base this error on.

evaluateS1(x)
See ErrorMeasure.evaluateS1().
This method only works if the underlying LogPDF implements the optional method LogPDF.evaluateS1()!
class pints.ProblemErrorMeasure(problem=None)
Abstract base class for ErrorMeasures defined for single or multi-output problems.

evaluateS1(x)
Evaluates this error measure, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data has the shape (e, e'), where e is a scalar value and e' is a sequence of length n_parameters.
This is an optional method that is not always implemented.
class pints.RootMeanSquaredError(problem)
Calculates a root mean squared error:
\[f = \sqrt{\frac{\sum _i^n (y_i - x_i) ^ 2}{n}},\]
where \(y\) is the data, \(x\) the model output and \(n\) is the total number of data points.
Extends ProblemErrorMeasure.
Parameters: problem – A pints.SingleOutputProblem.

evaluateS1(x)
Evaluates this error measure, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data has the shape (e, e'), where e is a scalar value and e' is a sequence of length n_parameters.
This is an optional method that is not always implemented.

n_parameters()
class pints.SumOfErrors(error_measures, weights=None)
Calculates a sum of ErrorMeasure objects, all defined on the same parameter space:
\[f = \sum _i f_i,\]
where \(f_i\) are the individual error measures.
Extends ErrorMeasure.
Parameters:
- error_measures – A sequence of error measures.
- weights – An optional sequence of (float) weights, exactly one per error measure. If given, each individual error will be multiplied by the corresponding weight. If no weights are given all sums will be weighted equally.
Examples

errors = [
    pints.MeanSquaredError(problem1),
    pints.MeanSquaredError(problem2),
]

# Equally weighted
e1 = pints.SumOfErrors(errors)

# Different weights:
weights = [
    1.0,
    2.7,
]
e2 = pints.SumOfErrors(errors, weights)

evaluateS1(x)
See ErrorMeasure.evaluateS1().
This method only works if all the underlying ErrorMeasure objects implement the optional method ErrorMeasure.evaluateS1()!
class pints.SumOfSquaresError(problem, weights=None)
Calculates a sum-of-squares error:
\[f = \sum _i^n (y_i - x_i) ^ 2,\]
where \(y\) is the data, \(x\) the model output and \(n\) is the total number of data points.
Extends ErrorMeasure.
Parameters: problem – A pints.SingleOutputProblem or pints.MultiOutputProblem.

n_parameters()
Function evaluation¶
The Evaluator classes provide an abstraction layer that makes it easier to implement sequential and/or parallel evaluation of functions.
Example:
f = pints.SumOfSquaresError(problem)
e = pints.ParallelEvaluator(f)
x = [[1, 2],
[3, 4],
[5, 6],
[7, 8],
]
fx = e.evaluate(x)
Overview:
pints.evaluate(f, x, parallel=False, args=None)
Evaluates the function f on every value present in x and returns a sequence of evaluations f(x[i]).
It is possible for the evaluation of f to involve the generation of random numbers (using numpy). In this case, the results from calling evaluate can be made reproducible by first seeding numpy's generator with a fixed number. However, a call with parallel=True will use a different (but consistent) sequence of random numbers than a call with parallel=False.
Parameters:
- f (callable) – The function to evaluate, called as f(x[i], *args).
- x – A list of values to evaluate f with.
- parallel (boolean) – Run in parallel or not. If set to True, the evaluations will happen in parallel using a number of worker processes equal to the detected CPU core count. The number of workers can be set explicitly by setting parallel to an integer greater than 0. Parallelisation can be disabled by setting parallel to 0 or False.
- args (sequence) – Optional extra arguments to pass into f.
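A usage sketch, re-using the error measure from the example above (the points are illustrative):

import pints

f = pints.SumOfSquaresError(problem)        # assumes a problem has been set up
xs = [[1, 2], [3, 4], [5, 6]]
fxs = pints.evaluate(f, xs, parallel=True)  # one evaluation per point in xs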
class pints.Evaluator(function, args=None)
Abstract base class for classes that take a function (or callable object) f(x) and evaluate it for a list of input values x.
This interface is shared by a parallel and a sequential implementation, allowing easy switching between parallel or sequential implementations of the same algorithm.
It is possible for the evaluation of f to involve the generation of random numbers (using numpy). In this case, the results from calling evaluate can be made reproducible by first seeding numpy's generator with a fixed number. However, different Evaluator implementations may use a different random sequence. In other words, each Evaluator can be made to return consistent results, but the results returned by different Evaluators may vary.
Parameters:
- function (callable) – A function or other callable object f that takes a value x and returns an evaluation f(x).
- args (sequence) – An optional sequence of extra arguments to f. If args is specified, f will be called as f(x, *args).
class pints.ParallelEvaluator(function, n_workers=None, max_tasks_per_worker=500, n_numpy_threads=1, args=None)
Evaluates a single-valued function object for any set of input values given, using all available cores.
Shares an interface with the SequentialEvaluator, allowing parallelism to be switched on and off with minimal hassle. Parallelism takes a little time to be set up, so as a general rule of thumb it's only useful if the total run-time is at least ten seconds (anno 2015).
By default, the number of processes ("workers") used to evaluate the function is set equal to the number of CPU cores reported by Python's multiprocessing module. To override the number of workers used, set n_workers to some integer greater than 0.
There are two important caveats for using multiprocessing to evaluate functions:
- Processes don’t share memory. This means the function to be evaluated will be duplicated (via pickling) for each process (see Avoid shared state for details).
- On Windows systems your code should be within an if __name__ == '__main__': block (see Windows for details); a minimal sketch is shown below.
The evaluator will keep its subprocesses alive and running until it is tidied up by garbage collection.
Note that while this class uses multiprocessing, it is not thread/process safe itself: It should not be used by more than a single thread/process at a time.
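A minimal sketch of parallel evaluation with the Windows-safe guard (the guard is also harmless on other platforms; the error measure and points are illustrative):

import pints

def main():
    f = pints.SumOfSquaresError(problem)   # assumes a problem has been set up
    evaluator = pints.ParallelEvaluator(f, n_workers=4)
    print(evaluator.evaluate([[1, 2], [3, 4], [5, 6]]))

if __name__ == '__main__':
    main()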
Extends Evaluator.
Parameters:
- function – The function to evaluate.
- n_workers – The number of worker processes to use. If left at the default value n_workers=None, the number of workers will equal the number of CPU cores in the machine this is run on. In many cases this will provide good performance.
- max_tasks_per_worker – Python garbage collection does not seem to be optimized for multi-process function evaluation. In many cases, some time can be saved by refreshing the worker processes after every max_tasks_per_worker evaluations. This number can be tweaked for best performance on a given task / system.
- n_numpy_threads – Numpy and other scientific libraries may make use of threading in C or C++ based BLAS libraries, which can interfere with PINTS multiprocessing and cause slower execution. To prevent this, the number of threads to use will be limited to 1 by default, using the threadpoolctl module. To use the current numpy default instead, set n_numpy_threads to None; to use the BLAS/OpenMP etc. defaults, set n_numpy_threads to 0; or to use a specific number of threads pass in any integer greater than 1.
- args – An optional sequence of extra arguments to f. If args is specified, f will be called as f(x, *args).
static cpu_count()
Uses the multiprocessing module to guess the number of available cores.
For machines with simultaneous multithreading ("hyperthreading") this will return the number of virtual cores.

evaluate(positions)
Evaluate the function for every value in the sequence positions.
Returns a list with the returned evaluations.
class pints.SequentialEvaluator(function, args=None)
Evaluates a function (or callable object) for a list of input values, and returns a list containing the calculated function evaluations.
Runs sequentially, but shares an interface with the ParallelEvaluator, allowing parallelism to be switched on/off.
Extends Evaluator.
Parameters:
- function (callable) – The function to evaluate.
- args (sequence) – An optional tuple containing extra arguments to f. If args is specified, f will be called as f(x, *args).

evaluate(positions)
Evaluate the function for every value in the sequence positions.
Returns a list with the returned evaluations.
I/O Helper classes¶
pints.io.load_samples(filename, n=None)
Loads samples from the given filename and returns a 2d NumPy array containing them.
If the optional argument n is given, the method assumes there are n files, with names based on filename such that e.g. test.csv would become test_0.csv, test_1.csv, ..., test_n.csv. In this case a list of 2d NumPy arrays is returned.
Assumes the first line in each file is a header.
See also save_samples().
pints.io.save_samples(filename, *sample_lists)
Stores one or multiple lists of samples at the path given by filename.
If one list of samples is given, the filename is used as is. If multiple lists are given, the filenames are updated to include _0, _1, _2, etc.
For example, save_samples('test.csv', samples) will store information from samples in test.csv. Using save_samples('test.csv', samples_0, samples_1) will store the samples from samples_0 to test_0.csv and samples_1 to test_1.csv.
See also load_samples().
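A typical round trip, assuming chains is a sequence of MCMC chains (one 2d array per chain); the filename is illustrative:

import pints.io

# Store one file per chain: chains_0.csv, chains_1.csv, ...
pints.io.save_samples('chains.csv', *chains)

# Load them back as a list of 2d NumPy arrays
chains = pints.io.load_samples('chains.csv', n=len(chains))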
Log-likelihoods¶
The classes below all implement the ProblemLogLikelihood interface, and can calculate a log-likelihood based on some time-series Problem and an assumed noise model.
Example:
logpdf = pints.GaussianLogLikelihood(problem)
x = [1, 2, 3]
fx = logpdf(x)
Overview:
AR1LogLikelihood
ARMA11LogLikelihood
CauchyLogLikelihood
ConstantAndMultiplicativeGaussianLogLikelihood
GaussianIntegratedUniformLogLikelihood
GaussianKnownSigmaLogLikelihood
GaussianLogLikelihood
KnownNoiseLogLikelihood
MultiplicativeGaussianLogLikelihood
ScaledLogLikelihood
StudentTLogLikelihood
UnknownNoiseLogLikelihood
class pints.AR1LogLikelihood(problem)
Calculates a log-likelihood assuming AR(1) (autoregressive order 1) errors.
In this error model, the ith error term \(\epsilon_i = x_i - f_i(\theta)\) is assumed to obey the following relationship.
\[\epsilon_i = \rho \epsilon_{i-1} + \nu_i\]where \(\nu_i\) is IID Gaussian white noise with variance \(\sigma^2 (1-\rho^2)\). Therefore, this likelihood is appropriate when error terms are autocorrelated, and the parameter \(\rho\) determines the level of autocorrelation.
This model is parameterised as such because it leads to a simple marginal distribution \(\epsilon_i \sim N(0, \sigma)\).
This class treats the error at the first time point (i=1) as fixed, which simplifies the calculations. For sufficiently long time-series, this conditioning on the first observation has at most a small effect on the likelihood. Further details as well as the alternative unconditional likelihood are available in [1] , chapter 5.2.
Noting that
\[\nu_i = \epsilon_i - \rho \epsilon_{i-1} \sim N(0, \sigma^2 (1-\rho^2))\]we thus calculate the likelihood as the product of normal likelihoods from \(i=2,...,N\), for a time series with N time points.
\[L(\theta, \sigma, \rho|\boldsymbol{x}) = -\frac{N-1}{2} \log(2\pi) - (N-1) \log(\sigma') - \frac{1}{2\sigma'^2} \sum_{i=2}^N (\epsilon_i - \rho \epsilon_{i-1})^2\]for \(\sigma' = \sigma \sqrt{1-\rho^2}\).
Extends ProblemLogLikelihood.
Parameters: problem – A SingleOutputProblem or MultiOutputProblem. For a single-output problem two parameters are added (rho, sigma), for a multi-output problem 2 * n_outputs parameters are added.
References
[1] Hamilton, James D. Time series analysis. Vol. 2. New Jersey: Princeton, 1994.

evaluateS1(x)
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple (L, L'), where L is a scalar value and L' is a sequence of length n_parameters.
Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.
This is an optional method that is not always implemented.

n_parameters()
class pints.ARMA11LogLikelihood(problem)
Calculates a log-likelihood assuming ARMA(1,1) errors.
The ARMA(1,1) model has 1 autoregressive term and 1 moving average term. It assumes that the errors \(\epsilon_i = x_i - f_i(\theta)\) obey
\[\epsilon_i = \rho \epsilon_{i-1} + \nu_i + \phi \nu_{i-1}\]where \(\nu_i\) is IID Gaussian white noise with standard deviation \(\sigma'\).
\[\sigma' = \sigma \sqrt{\frac{1 - \rho^2}{1 + 2 \phi \rho + \phi^2}}\]This model is parameterised as such because it leads to a simple marginal distribution \(\epsilon_i \sim N(0, \sigma)\).
Due to the complexity of the exact ARMA(1,1) likelihood, this class calculates a likelihood conditioned on initial values. This topic is discussed further in [2] , chapter 5.6. Thus, for a time series defined at points \(i=1,...,N\), summation begins at \(i=3\), and the conditional log-likelihood is
\[L(\theta, \sigma, \rho, \phi|\boldsymbol{x}) = -\frac{N-2}{2} \log(2\pi) - (N-2) \log(\sigma') - \frac{1}{2\sigma'^2} \sum_{i=3}^N (\nu_i)^2\]where the values of \(\nu_i\) are calculated from the observations according to
\[\nu_i = \epsilon_i - \rho \epsilon_{i-1} - \phi (\epsilon_{i-1} - \rho \epsilon_{i-2})\]
Extends ProblemLogLikelihood.
Parameters: problem – A SingleOutputProblem or MultiOutputProblem. For a single-output problem three parameters are added (rho, phi, sigma), for a multi-output problem 3 * n_outputs parameters are added.
References
[2] Hamilton, James D. Time series analysis. Vol. 2. New Jersey: Princeton, 1994.

evaluateS1(x)
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple (L, L'), where L is a scalar value and L' is a sequence of length n_parameters.
Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.
This is an optional method that is not always implemented.

n_parameters()
class pints.CauchyLogLikelihood(problem)
Calculates a log-likelihood assuming independent Cauchy-distributed noise at each time point, and adds one parameter: the scale (sigma).
For a noise characterised by sigma, the log-likelihood is of the form:
\[\log{L(\theta, \sigma)} = -N\log \pi - N\log \sigma -\sum_{i=1}^N\log(1 + \frac{x_i - f(\theta)}{\sigma}^2)\]
Extends ProblemLogLikelihood.
Parameters: problem – A SingleOutputProblem or MultiOutputProblem. For a single-output problem one parameter is added (sigma, the scale); for a multi-output problem n_outputs parameters are added.

evaluateS1(x)
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple (L, L'), where L is a scalar value and L' is a sequence of length n_parameters.
Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.
This is an optional method that is not always implemented.

n_parameters()
class pints.ConstantAndMultiplicativeGaussianLogLikelihood(problem)
Calculates the log-likelihood assuming a mixed error model of a Gaussian base-level noise and a Gaussian heteroscedastic noise.
For a time series model \(f(t| \theta)\) with parameters \(\theta\) , the ConstantAndMultiplicativeGaussianLogLikelihood assumes that the model predictions \(X\) are Gaussian distributed according to
\[X(t| \theta , \sigma _{\text{base}}, \sigma _{\text{rel}}) = f(t| \theta) + (\sigma _{\text{base}} + \sigma _{\text{rel}} f(t| \theta)^\eta ) \, \epsilon ,\]where \(\epsilon\) is a i.i.d. standard Gaussian random variable
\[\epsilon \sim \mathcal{N}(0, 1).\]For each output in the problem, this likelihood introduces three new scalar parameters: a base-level scale \(\sigma _{\text{base}}\); an exponential power \(\eta\); and a scale relative to the model output \(\sigma _{\text{rel}}\).
The resulting log-likelihood of a constant and multiplicative Gaussian error model is
\[\log L(\theta, \sigma _{\text{base}}, \eta , \sigma _{\text{rel}} | X^{\text{obs}}) = -\frac{n_t}{2} \log 2 \pi -\sum_{i=1}^{n_t}\log \sigma _{\text{tot}, i} - \sum_{i=1}^{n_t} \frac{(X^{\text{obs}}_i - f(t_i| \theta))^2} {2\sigma ^2_{\text{tot}, i}},\]where \(n_t\) is the number of measured time points in the time series, \(X^{\text{obs}}_i\) is the observation at time point \(t_i\), and \(\sigma _{\text{tot}, i}=\sigma _{\text{base}} +\sigma _{\text{rel}} f(t_i| \theta)^\eta\) is the total standard deviation of the error at time \(t_i\).
For a system with \(n_o\) outputs, this becomes
\[\log L(\theta, \sigma _{\text{base}}, \eta , \sigma _{\text{rel}} | X^{\text{obs}}) = -\frac{n_tn_o}{2} \log 2 \pi -\sum_{j=1}^{n_0}\sum_{i=1}^{n_t}\log \sigma _{\text{tot}, ij} - \sum_{j=1}^{n_0}\sum_{i=1}^{n_t} \frac{(X^{\text{obs}}_{ij} - f_j(t_i| \theta))^2} {2\sigma ^2_{\text{tot}, ij}},\]where \(n_o\) is the number of outputs of the model, \(X^{\text{obs}}_{ij}\) is the observation at time point \(t_i\) of output \(j\), and \(\sigma _{\text{tot}, ij}=\sigma _{\text{base}, j} + \sigma _{\text{rel}, j}f_j(t_i| \theta)^{\eta _j}\) is the total standard deviation of the error at time \(t_i\) of output \(j\).
Extends ProblemLogLikelihood.
Parameters: problem – A SingleOutputProblem or MultiOutputProblem. For a single-output problem three parameters are added (\(\sigma _{\text{base}}\), \(\eta\), \(\sigma _{\text{rel}}\)), for a multi-output problem \(3n_o\) parameters are added (\(\sigma _{\text{base},1},\ldots , \sigma _{\text{base},n_o}, \eta _1,\ldots , \eta _{n_o}, \sigma _{\text{rel},1}, \ldots , \sigma _{\text{rel},n_o}\)).

evaluateS1(parameters)
See LogPDF.evaluateS1().
The partial derivatives of the log-likelihood w.r.t. the model parameters are
\[\begin{split}\frac{\partial \log L}{\partial \theta _k} =& -\sum_{i,j}\sigma _{\text{rel},j}\eta _j\frac{ f_j(t_i| \theta)^{\eta _j-1}} {\sigma _{\text{tot}, ij}} \frac{\partial f_j(t_i| \theta)}{\partial \theta _k} + \sum_{i,j} \frac{X^{\text{obs}}_{ij} - f_j(t_i| \theta)} {\sigma ^2_{\text{tot}, ij}} \frac{\partial f_j(t_i| \theta)}{\partial \theta _k} \\ &+\sum_{i,j}\sigma _{\text{rel},j}\eta _j \frac{(X^{\text{obs}}_{ij} - f_j(t_i| \theta))^2} {\sigma ^3_{\text{tot}, ij}}f_j(t_i| \theta)^{\eta _j-1} \frac{\partial f_j(t_i| \theta)}{\partial \theta _k} \\ \frac{\partial \log L}{\partial \sigma _{\text{base}, j}} =& -\sum ^{n_t}_{i=1}\frac{1}{\sigma _{\text{tot}, ij}} +\sum ^{n_t}_{i=1} \frac{(X^{\text{obs}}_{ij} - f_j(t_i| \theta))^2} {\sigma ^3_{\text{tot}, ij}} \\ \frac{\partial \log L}{\partial \eta _j} =& -\sigma _{\text{rel},j}\eta _j\sum ^{n_t}_{i=1} \frac{f_j(t_i| \theta)^{\eta _j}\log f_j(t_i| \theta)} {\sigma _{\text{tot}, ij}} + \sigma _{\text{rel},j}\eta _j \sum ^{n_t}_{i=1} \frac{(X^{\text{obs}}_{ij} - f_j(t_i| \theta))^2} {\sigma ^3_{\text{tot}, ij}}f_j(t_i| \theta)^{\eta _j} \log f_j(t_i| \theta) \\ \frac{\partial \log L}{\partial \sigma _{\text{rel},j}} =& -\sum ^{n_t}_{i=1} \frac{f_j(t_i| \theta)^{\eta _j}}{\sigma _{\text{tot}, ij}} + \sum ^{n_t}_{i=1} \frac{(X^{\text{obs}}_{ij} - f_j(t_i| \theta))^2} {\sigma ^3_{\text{tot}, ij}}f_j(t_i| \theta)^{\eta _j},\end{split}\]where \(i\) sums over the measurement time points and \(j\) over the outputs of the model.
n_parameters()
class pints.GaussianIntegratedUniformLogLikelihood(problem, lower, upper)
Calculates a log-likelihood assuming independent Gaussian-distributed noise at each time point where \(\sigma\sim U(a,b)\) has been integrated out of the joint posterior of \(p(\theta,\sigma|X)\),
\[\begin{split}\begin{align} p(\theta|X) &= \int_{0}^{\infty} p(\theta, \sigma|X) \mathrm{d}\sigma\\ &\propto \int_{0}^{\infty} p(X|\theta, \sigma) p(\theta, \sigma) \mathrm{d}\sigma,\end{align}\end{split}\]
Note that this is exactly the same statistical model as pints.GaussianLogLikelihood with a uniform prior on \(\sigma\).
A possible advantage of this log-likelihood compared with using a pints.GaussianLogLikelihood is that it has one fewer parameter (\(\sigma\)), which may speed up convergence to the posterior distribution, especially for multi-output problems, which will have n_outputs fewer parameter dimensions.
The log-likelihood is given in terms of the sum of squared errors:
\[SSE = \sum_{i=1}^n (f_i(\theta) - y_i)^2\]and is given up to a normalisation constant by:
\[\begin{split}\begin{align} \text{log} L = & - n / 2 \text{log}(\pi) \\ & - \text{log}(2 (b - a) \sqrt(2)) \\ & + (1 / 2 - n / 2) \text{log}(SSE) \\ & + \text{log}\left[\Gamma((n - 1) / 2, \frac{SSE}{2 b^2}) - \Gamma((n - 1) / 2, \frac{SSE}{2 a^2}) \right] \end{align}\end{split}\]where \(\Gamma(u,v)\) is the upper incomplete gamma function as defined here: https://en.wikipedia.org/wiki/Incomplete_gamma_function
This log-likelihood is inherently a Bayesian method since it assumes a uniform prior on \(\sigma\sim U(a,b)\). However, using this likelihood in optimisation routines should yield the same estimates as the full pints.GaussianLogLikelihood.
Extends ProblemLogLikelihood.
Parameters:
- problem – A SingleOutputProblem or MultiOutputProblem.
- lower – The lower limit on the uniform prior on sigma. Must be non-negative.
- upper – The upper limit on the uniform prior on sigma.

evaluateS1(x)
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple (L, L'), where L is a scalar value and L' is a sequence of length n_parameters.
Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.
This is an optional method that is not always implemented.

n_parameters()
class pints.GaussianKnownSigmaLogLikelihood(problem, sigma)
Calculates a log-likelihood assuming independent Gaussian noise at each time point, using a known value for the standard deviation (sigma) of that noise:
\[\log{L(\theta | \sigma,\boldsymbol{x})} = -\frac{N}{2}\log{2\pi} -N\log{\sigma} -\frac{1}{2\sigma^2}\sum_{i=1}^N{(x_i - f_i(\theta))^2}\]
Extends ProblemLogLikelihood.
Parameters:
- problem – A SingleOutputProblem or MultiOutputProblem.
- sigma – The standard deviation(s) of the noise. Can be a single value or a sequence of sigmas, one for each output. Must be greater than zero.

evaluateS1(x)
See LogPDF.evaluateS1().

n_parameters()
class pints.GaussianLogLikelihood(problem)
Calculates a log-likelihood assuming independent Gaussian noise at each time point, and adds a parameter representing the standard deviation (sigma) of the noise on each output.
For a noise level of sigma, the likelihood becomes:
\[L(\theta, \sigma|\boldsymbol{x}) = p(\boldsymbol{x} | \theta, \sigma) = \prod_{j=1}^{n_t} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left( -\frac{(x_j - f_j(\theta))^2}{2\sigma^2}\right)\]
leading to a log likelihood of:
\[\log{L(\theta, \sigma|\boldsymbol{x})} = -\frac{n_t}{2} \log{2\pi} -n_t \log{\sigma} -\frac{1}{2\sigma^2}\sum_{j=1}^{n_t}{(x_j - f_j(\theta))^2}\]
where n_t is the number of time points in the series, x_j is the sampled data at time j and f_j is the simulated data at time j.
For a system with n_o outputs, this becomes
\[\log{L(\theta, \sigma|\boldsymbol{x})} = -\frac{n_t n_o}{2}\log{2\pi} -\sum_{i=1}^{n_o}{ {n_t}\log{\sigma_i} } -\sum_{i=1}^{n_o}{\left[ \frac{1}{2\sigma_i^2}\sum_{j=1}^{n_t}{(x_j - f_j(\theta))^2} \right]}\]
Extends ProblemLogLikelihood.
Parameters: problem – A SingleOutputProblem or MultiOutputProblem. For a single-output problem a single parameter is added, for a multi-output problem n_outputs parameters are added.

evaluateS1(x)
See LogPDF.evaluateS1().

n_parameters()
class pints.KnownNoiseLogLikelihood(problem, sigma)
Deprecated alias of GaussianKnownSigmaLogLikelihood.

evaluateS1(x)
See LogPDF.evaluateS1().

n_parameters()
class pints.MultiplicativeGaussianLogLikelihood(problem)
Calculates the log-likelihood for a time-series model assuming a heteroscedastic Gaussian error of the model predictions \(f(t, \theta )\).
This likelihood introduces two new scalar parameters for each dimension of the model output: an exponential power \(\eta\) and a scale \(\sigma\).
A heteroscedastic Gaussian noise model assumes that the observable \(X\) is Gaussian distributed around the model predictions \(f(t, \theta )\) with a standard deviation that scales with \(f(t, \theta )\)
\[X(t) = f(t, \theta) + \sigma f(t, \theta)^\eta v(t)\]where \(v(t)\) is a standard i.i.d. Gaussian random variable
\[v(t) \sim \mathcal{N}(0, 1).\]This model leads to a log likelihood of the model parameters of
\[\log{L(\theta, \eta , \sigma | X^{\text{obs}})} = -\frac{n_t}{2} \log{2 \pi} -\sum_{i=1}^{n_t}{\log{f(t_i, \theta)^\eta \sigma}} -\frac{1}{2}\sum_{i=1}^{n_t}\left( \frac{X^{\text{obs}}_{i} - f(t_i, \theta)} {f(t_i, \theta)^\eta \sigma}\right) ^2,\]where \(n_t\) is the number of time points in the series, and \(X^{\text{obs}}_{i}\) the measurement at time \(t_i\).
For a system with \(n_o\) outputs, this becomes
\[\log{L(\theta, \eta , \sigma | X^{\text{obs}})} = -\frac{n_t n_o}{2} \log{2 \pi} -\sum ^{n_o}_{j=1}\sum_{i=1}^{n_t}{\log{f_j(t_i, \theta)^\eta \sigma _j}} -\frac{1}{2}\sum ^{n_o}_{j=1}\sum_{i=1}^{n_t}\left( \frac{X^{\text{obs}}_{ij} - f_j(t_i, \theta)} {f_j(t_i, \theta)^\eta \sigma _j}\right) ^2,\]where \(n_o\) is the number of outputs of the model, and \(X^{\text{obs}}_{ij}\) the measurement of output \(j\) at time point \(t_i\).
Extends ProblemLogLikelihood.
Parameters: problem – A SingleOutputProblem or MultiOutputProblem. For a single-output problem two parameters are added (\(\eta\), \(\sigma\)), for a multi-output problem 2 times \(n_o\) parameters are added.

evaluateS1(x)
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple (L, L'), where L is a scalar value and L' is a sequence of length n_parameters.
Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.
This is an optional method that is not always implemented.

n_parameters()
class pints.ScaledLogLikelihood(log_likelihood)
Calculates a log-likelihood based on a (conditional) ProblemLogLikelihood divided by the number of time samples.
The returned value will be (1 / n) * log_likelihood(x|problem), where n is the number of time samples multiplied by the number of outputs.
This log-likelihood operates on both single and multi-output problems.
Extends ProblemLogLikelihood.
Parameters: log_likelihood – A ProblemLogLikelihood to scale.

evaluateS1(x)
See LogPDF.evaluateS1().
This method only works if the underlying LogPDF object implements the optional method LogPDF.evaluateS1()!

n_parameters()
class pints.StudentTLogLikelihood(problem)
Calculates a log-likelihood assuming independent Student-t-distributed noise at each time point, and adds two parameters: one representing the degrees of freedom (nu), the other representing the scale (sigma).
For a noise characterised by nu and sigma, the log likelihood is of the form:
\[\log{L(\theta, \nu, \sigma|\boldsymbol{x})} = N\frac{\nu}{2}\log(\nu) - N\log(\sigma) - N\log B(\nu/2, 1/2) -\frac{1+\nu}{2}\sum_{i=1}^N\log(\nu + \frac{x_i - f(\theta)}{\sigma}^2)\]
where B(.,.) is a beta function.
Extends ProblemLogLikelihood.
Parameters: problem – A SingleOutputProblem or MultiOutputProblem. For a single-output problem two parameters are added (nu, sigma), where nu is the degrees of freedom and sigma is the scale; for a multi-output problem 2 * n_outputs parameters are added.

evaluateS1(x)
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple (L, L'), where L is a scalar value and L' is a sequence of length n_parameters.
Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.
This is an optional method that is not always implemented.

n_parameters()
class pints.UnknownNoiseLogLikelihood(problem)
Deprecated alias of GaussianLogLikelihood.

evaluateS1(x)
See LogPDF.evaluateS1().

n_parameters()
Log-PDFs¶
LogPDFs are callable objects that represent distributions, including likelihoods and Bayesian priors and posteriors. They are unnormalised, i.e. their area does not necessarily sum up to 1, and for efficiency reasons we always work with the logarithm, e.g. a log-likelihood instead of a likelihood.
Example:
p = pints.GaussianLogPrior(mean=0, variance=1)
x = p(0.1)
Overview:
class pints.LogPDF
Represents the natural logarithm of a (not necessarily normalised) probability density function (PDF).
All LogPDF types are callable: when called with a vector argument p they return some value log(f(p)), where f(p) is an unnormalised PDF. The size of the argument p is given by n_parameters().

evaluateS1(x)
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple (L, L'), where L is a scalar value and L' is a sequence of length n_parameters.
Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.
This is an optional method that is not always implemented.
class pints.LogPrior
Represents the natural logarithm log(f(theta)) of a known probability density function f(theta).
Priors are usually normalised (i.e. the integral of f(theta) over all points theta in parameter space sums to 1), but this is not a strict requirement.
Extends LogPDF.

cdf(x)
Returns the cumulative density function at point(s) x.
x should be an n x d array, where n is the number of input samples and d is the dimension of the parameter space.

convert_from_unit_cube(u)
Converts samples u uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming using LogPrior.icdf().
u should be an n x d array, where n is the number of input samples and d is the dimension of the parameter space.

convert_to_unit_cube(x)
Converts samples from the prior x to be drawn uniformly from the unit cube, typically by transforming using LogPrior.cdf().
x should be an n x d array, where n is the number of input samples and d is the dimension of the parameter space.

evaluateS1(x)
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple (L, L'), where L is a scalar value and L' is a sequence of length n_parameters.
Note that the derivative returned is of the log-pdf, so L' = d/dp log(f(p)), evaluated at p=x.
This is an optional method that is not always implemented.

icdf(p)
Returns the inverse cumulative density function at cumulative probability/probabilities p.
p should be an n x d array, where n is the number of input samples and d is the dimension of the parameter space.
class pints.LogPosterior(log_likelihood, log_prior)
Represents the sum of a LogPDF and a LogPrior defined on the same parameter space.
As an optimisation, if the LogPrior evaluates as -inf for a particular point in parameter space, the corresponding LogPDF will not be evaluated.
Extends LogPDF.
Parameters:
- log_likelihood – A LogPDF.
- log_prior – A LogPrior, defined on the same parameter space.

evaluateS1(x)
Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data has the shape (L, L'), where L is a scalar value and L' is a sequence of length n_parameters.
This method only works if the underlying LogPDF and LogPrior implement the optional method LogPDF.evaluateS1()!
class pints.PooledLogPDF(log_pdfs, pooled)
Combines \(m\) LogPDFs, each with \(n\) parameters, into a single LogPDF where \(k\) parameters are "pooled" (i.e. have the same value for each LogPDF), so that the resulting combined LogPDF has \(m (n - k) + k\) independent parameters.
LogPDF
), and some parameters are expected to be the same across individuals (for example, the noise parameter across different individuals within the same experiment).For two
LogPDFs
\(L _1\) and \(L _2\) with four parameters \((\psi ^{(1)}_1, \psi ^{(1)}_2, \psi ^{(1)}_3, \psi ^{(1)}_4)\) and \((\psi ^{(2)}_1, \psi ^{(2)}_2, \psi ^{(2)}_3, \psi ^{(2)}_4)\) respectively, a pooling of the second and third parameter \(\psi _2 := \psi ^{(1)}_2 = \psi ^{(2)}_2\), \(\psi _3 := \psi ^{(1)}_3 = \psi ^{(2)}_3\) results in a pooled log-pdf of the form\[L(\psi ^{(1)}_1, \psi ^{(1)}_4, \psi ^{(2)}_1, \psi ^{(2)}_4, \psi _2, \psi _3 | D_1, D_2) = L _1(\psi ^{(1)}_1, \psi _2, \psi _3, \psi ^{(1)}_4 | D_1) + L _2(\psi ^{(2)}_1, \psi _2, \psi _3, \psi ^{(2)}_4 | D_2),\]\(D_i\) is the measured time-series of individual \(i\). As \(k=2\) parameters where pooled across the log-likelihoods, the pooled log-likelihood has six parameters in the following order: \((\psi ^{(1)}_1, \psi ^{(1)}_4, \psi ^{(2)}_1, \psi ^{(2)}_4, \psi _2, \psi _3)\).
Note that the input parameters of a
PooledLogPDF
are not just a simple concatenation of the parameters of the individualLogPDFs
. The pooled parameters are only listed once and are moved to the end of the parameter list. This avoids inputting the value of the pooled parameters at mutliple positions. Otherwise the order of the parameters is determined firstly by the order of the likelihoods and then by the order of the parameters of those likelihoods.Extends
LogPDF
.Parameters: - log_pdfs – A sequence of
LogPDF
objects. - pooled – A sequence of booleans indicating which parameters across
the likelihoods are pooled (
True
) or remain unpooled (False
).
Example
pooled_log_likelihood = pints.PooledLogPDF( log_pdfs=[ pints.GaussianLogLikelihood(problem1), pints.GaussianLogLikelihood(problem2)], pooled=[False, True])
-
evaluateS1
(parameters)[source]¶ See
LogPDF.evaluateS1()
.The partial derivative of the pooled log-likelihood with respect to an unpooled parameter equals the partial derivative of the corresponding individual log-likelihood.
\[\frac{\partial L}{\partial \psi} = \frac{\partial L_i}{\partial \psi},\]where \(L\) is the pooled log-likelihood, \(\psi\) an unpooled parameter and \(L _i\) the individual log-likelihood that depends on \(\psi\).
For a pooled parameter \(\theta\), the partial derivative of the pooled log-likelihood equals the sum of the partial derivatives of all individual log-likelihoods
\[\frac{\partial L}{\partial \theta} = \sum _{i=1}^n\frac{\partial L_i}{\partial \theta}.\]Here \(n\) is the number of individual log-likelihoods.
This method only works if all the underlying LogPDF objects implement the optional method LogPDF.evaluateS1()!
- log_pdfs – A sequence of
-
class
pints.
ProblemLogLikelihood
(problem)[source]¶ Represents a log-likelihood on a problem’s parameter space, used to indicate the likelihood of an observed (fixed) time-series given a particular parameter set (variable).
Extends
LogPDF
.Parameters: problem – The time-series problem this log-likelihood is defined for. -
evaluateS1
(x)¶ Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple
(L, L')
whereL
is a scalar value andL'
is a sequence of lengthn_parameters
.Note that the derivative returned is of the log-pdf, so
L' = d/dp log(f(p))
, evaluated atp=x
.This is an optional method that is not always implemented.
-
-
class
pints.
SumOfIndependentLogPDFs
(log_likelihoods)[source]¶ Calculates a sum of
LogPDF
objects, all defined on the same parameter space.This is useful for e.g. Bayesian inference using a single model evaluated on two independent data sets
D
andE
. In this case,\[\begin{split}f(\theta|D,E) &= \frac{f(D, E|\theta)f(\theta)}{f(D, E)} \\ &= \frac{f(D|\theta)f(E|\theta)f(\theta)}{f(D, E)}\end{split}\]Extends
LogPDF
.Parameters: log_likelihoods – A sequence of LogPDF
objects.Example
log_likelihood = pints.SumOfIndependentLogPDFs([
    pints.GaussianLogLikelihood(problem1),
    pints.GaussianLogLikelihood(problem2),
])
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.This method only works if all the underlying LogPDF objects implement the optional method LogPDF.evaluateS1()!
-
Log-priors¶
A number of LogPriors
are provided for use in e.g.
Bayesian inference.
Example:
p = pints.GaussianLogPrior(mean=0, sd=1)
x = p(0.1)
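A slightly fuller, hedged sketch of the common LogPrior operations (using the same 1-d Gaussian prior; output shapes follow the conventions described in the class documentation below):
import pints

p = pints.GaussianLogPrior(mean=0, sd=1)
fx = p(0.1)                    # log-density at a point
xs = p.sample(100)             # 100 samples, as an array of shape (100, 1)
fx, dfx = p.evaluateS1(0.1)    # log-density and its derivative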
Overview:
BetaLogPrior
CauchyLogPrior
ComposedLogPrior
ExponentialLogPrior
GammaLogPrior
GaussianLogPrior
HalfCauchyLogPrior
InverseGammaLogPrior
LogNormalLogPrior
MultivariateGaussianLogPrior
NormalLogPrior
StudentTLogPrior
TruncatedGaussianLogPrior
UniformLogPrior
-
class
pints.
BetaLogPrior
(a, b)[source]¶ Defines a beta (log) prior with given shape parameters
a
andb
, with pdf\[f(x|a,b) = \frac{x^{a-1} (1-x)^{b-1}}{\mathrm{B}(a,b)}\]where \(\mathrm{B}\) is the Beta function. A random variable \(X\) distributed according to this pdf has expectation
\[\mathrm{E}(X)=\frac{a}{a+b}.\]For example, to create a prior with shape parameters
a=5
andb=1
, use:p = pints.BetaLogPrior(5, 1)
Extends
LogPrior
.-
cdf
(x)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
-
class
pints.
CauchyLogPrior
(location, scale)[source]¶ Defines a 1-d Cauchy (log) prior with a given
location
, andscale
, with pdf\[f(x|\text{location}, \text{scale}) = \frac{1}{\pi\;\text{scale} \left[1 + \left(\frac{x-\text{location}}{\text{scale}}\right)^2 \right]}.\]A random variable distributed according to this pdf has undefined expectation.
For example, to create a prior centered around 0 with a scale of 5, use:
p = pints.CauchyLogPrior(0, 5)
Extends
LogPrior
.Parameters: - location – The center of the distribution.
- scale – The scale of the distribution.
-
cdf
(x)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
class
pints.
ComposedLogPrior
(*priors)[source]¶ N-dimensional
LogPrior
composed of one or more other \(N_i\)-dimensional LogPriors, such that \(\sum _i N_i = N\). The evaluation of the composed log-prior assumes the input log-priors are all independent of each other.For example, a composed log prior
p = pints.ComposedLogPrior(log_prior1, log_prior2, log_prior3)
,where
log_prior1
,log_prior2
, andlog_prior3
have dimensions 1, 2 and 1 respectively, will have dimension 4.The dimensionality of the individual priors does not have to be the same, i.e. \(N_i\neq N_j\) is allowed.
The input parameters of the
ComposedLogPrior
have to be ordered in the same way as the individual priors. In the above example the prior may be evaluated byp(x)
, where:x = [parameter1_log_prior1, parameter1_log_prior2, parameter2_log_prior2, parameter1_log_prior3]
.Extends
LogPrior
.-
cdf
(x)[source]¶ See
LogPrior.cdf()
.This method only works if the underlying LogPrior classes all implement the optional method LogPrior.cdf().
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.This method only works if the underlying LogPrior classes all implement the optional method LogPDF.evaluateS1().
-
icdf
(x)[source]¶ See
LogPrior.icdf()
.This method only works if the underlying LogPrior classes all implement the optional method LogPrior.icdf().
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
-
class
pints.
ExponentialLogPrior
(rate)[source]¶ Defines an exponential (log) prior with given rate parameter
rate
with pdf\[f(x|\text{rate}) = \text{rate} \; e^{-\text{rate}\;x}.\]A random variable \(X\) distributed according to this pdf has expectation
\[\mathrm{E}(X)=\frac{1}{\text{rate}}.\]For example, to create a prior with
rate=0.5
use:p = pints.ExponentialLogPrior(0.5)
Extends
LogPrior
.-
cdf
(x)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
-
class
pints.
GammaLogPrior
(a, b)[source]¶ Defines a gamma (log) prior with given shape parameter
a
and rate parameterb
, with pdf\[f(x|a,b)=\frac{b^a x^{a-1} e^{-bx}}{\mathrm{\Gamma}(a)}.\]where \(\Gamma\) is the Gamma function. A random variable \(X\) distributed according to this pdf has expectation
\[\mathrm{E}(X)=\frac{a}{b}.\]For example, to create a prior with shape parameters
a=5
andb=1
, use:p = pints.GammaLogPrior(5, 1)
Extends
LogPrior
.-
cdf
(x)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
-
class
pints.
GaussianLogPrior
(mean, sd)[source]¶ Defines a 1-d Gaussian (log) prior with a given
mean
and standard deviationsd
, with pdf\[f(x|\text{mean},\text{sd}) = \frac{1}{\text{sd}\sqrt{2\pi}} \exp\left(-\frac{(x-\text{mean})^2}{2\;\text{sd}^2}\right).\]A random variable \(X\) distributed according to this pdf has expectation
\[\mathrm{E}(X)=\text{mean}.\]For example, to create a prior with mean of
0
and a standard deviation of1
, use:p = pints.GaussianLogPrior(0, 1)
Extends
LogPrior
.-
cdf
(x)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
-
class
pints.
HalfCauchyLogPrior
(location, scale)[source]¶ Defines a 1-d half-Cauchy (log) prior with a given
location
andscale
. This is a Cauchy distribution that has been truncated to lie in between \((0,\infty)\), with pdf\[\begin{split}f(x|\text{location},\text{scale})=\begin{cases}\frac{1}{\pi\; \text{scale}\left(\frac{1}{\pi}\arctan\left(\frac{\text{location}} {\text{scale} }\right)+\frac{1}{2}\right)\left(\frac{(x-\text{location} )^2}{\text{scale}^2}+1\right)},&x>0\\0,&\text{otherwise.}\end{cases}\end{split}\]A random variable distributed according to this pdf has undefined expectation.
For example, to create a prior centered around 0 with a scale of 5, use:
p = pints.HalfCauchyLogPrior(0, 5)
Extends
LogPrior
.Parameters: - location – The center of the distribution.
- scale – The scale of the distribution.
-
cdf
(x)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
class
pints.
InverseGammaLogPrior
(a, b)[source]¶ Defines an inverse gamma (log) prior with given shape parameter
a
and scale parameterb
, with pdf\[\begin{split}f(x|a,b)=\begin{cases}\frac{b^a}{\Gamma(a)}x^{-a-1}\exp \left(-\frac{b}{x}\right),&x>0\\0,&\text{otherwise.}\end{cases}\end{split}\]where \(\Gamma\) is the Gamma function. A random variable \(X\) distributed according to this pdf has expectation
\[\begin{split}\mathrm{E}(X)=\begin{cases}\frac{b}{a-1},&a>1\\ \text{undefined},&\text{otherwise.}\end{cases}\end{split}\]For example, to create a prior with shape parameter
a=5
and scale parameterb=1
, use:p = pints.InverseGammaLogPrior(5, 1)
Extends
LogPrior
.-
cdf
(x)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
-
class
pints.
LogNormalLogPrior
(log_mean, scale)[source]¶ Defines a log-normal (log) prior with a given
log_mean
and scalescale
. Thelog_mean
parameter of a log-normal distribution is the mean of a normal distribution whose random samples, when exponentiated, yield samples from a log-normal distribution. This log-normal distribution has pdf\[f(x|\text{log_mean},\text{scale}) = \frac{1}{x\;\text{scale} \sqrt{2\pi}}\exp\left(-\frac{(\log x-\text{log_mean})^2}{2\; \text{scale}^2}\right).\]A random variable \(X\) distributed according to this pdf has expectation
\[\mathrm{E}(X)=\exp\left(\text{log_mean}+\frac{\text{scale}^2}{2} \right).\]For example, to create a prior with log_mean of
0
and a scale of1
, use:p = pints.LogNormalLogPrior(0, 1)
Extends
LogPrior
.-
cdf
(x)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
-
class
pints.
MultivariateGaussianLogPrior
(mean, cov)[source]¶ Defines a multivariate Gaussian (log) prior with a given
mean
and covariance matrixcov
, with pdf\[f(x|\text{mean},\text{cov}) = \frac{1}{(2\pi)^{d/2}| \text{cov}|^{1/2}} \exp\left(-\frac{1}{2}(x-\text{mean})' \text{cov}^{-1}(x-\text{mean})\right).\]A random variable \(X\) distributed according to this pdf has expectation
\[\mathrm{E}(X)=\text{mean}.\]For example, to create a prior with zero mean and identity covariance, use:
p = pints.MultivariateGaussianLogPrior(
    np.array([0, 0]), np.array([[1, 0], [0, 1]]))
Extends
LogPrior
.-
cdf
(x)¶ Returns the cumulative density function at point(s)
x
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_from_unit_cube
(u)[source]¶ Converts a sample
u
uniformly drawn from the unit cube into one drawn from the prior space, usingMultivariateGaussianLogPrior.pseudo_icdf()
.
-
convert_to_unit_cube
(x)[source]¶ Converts a sample from the prior
x
to be drawn uniformly from the unit cube usingMultivariateGaussianLogPrior.pseudo_cdf()
.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)¶ Returns the inverse cumulative density function at cumulative probability/probabilities
p
.p
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
pseudo_cdf
(xs)[source]¶ Calculates a pseudo-cdf for a multivariate Gaussian as described in Feroz et al. (2009) (“MultiNest…”). In this approach, a multivariate Gaussian is factorised:
\[\pi(\theta_1,\theta_2,...,\theta_d) = \pi_1(\theta_1) \pi_2(\theta_2|\theta_1)... \pi_d(\theta_d|\theta_1, \theta_2,...,\theta_{d-1})\]The cdfs we report are then the values for each individual conditional. For example, for the second component, we calculate:
\[u_2 = \int_{-\infty}^{\theta_2} \pi_2(\theta_2|\theta_1)d\theta_2\]so that we return a vector of cdfs (u_1, u_2, …, u_d). Note that this function is mainly to facilitate MultiNest sampling, since the distribution of (u_1, u_2, …, u_d) is uniform within the unit cube.
-
pseudo_icdf
(ps)[source]¶ Calculates a pseudo-icdf for a multivariate Gaussian as described in Feroz et al. (2009) (“MultiNest…”). In this approach, a multivariate Gaussian is factorised:
\[\pi(\theta_1,\theta_2,...,\theta_d) = \pi_1(\theta_1) \pi_2(\theta_2|\theta_1)... \pi_d(\theta_d|\theta_1, \theta_2,...,\theta_{d-1})\]The icdfs we report are then the values for each individual conditional. For example, for the second component, we calculate the theta_2 value that satisfies:
\[u_2 = \int_{-\infty}^{\theta_2} \pi_2(\theta_2|\theta_1)d\theta_2\]so that we return a vector of icdfs (theta_1, theta_2, …, theta_d). Note that this function is mainly to facilitate MultiNest sampling, since the distribution of (u_1, u_2, …, u_d) is uniform within the unit cube.
-
-
class
pints.
NormalLogPrior
(mean, standard_deviation)[source]¶ Deprecated alias of
GaussianLogPrior
.-
cdf
(x)¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)¶ See
LogPrior.icdf()
.
-
mean
()¶ See
LogPrior.mean()
.
-
n_parameters
()¶
-
sample
(n=1)¶ See
LogPrior.sample()
.
-
-
class
pints.
StudentTLogPrior
(location, df, scale)[source]¶ Defines a 1-d Student-t (log) prior with a given
location
, degrees of freedomdf
, andscale
with pdf\[f(x|\text{location},\text{scale},\text{df})=\frac{\left(\frac{ \text{df}}{\text{df}+\frac{(x-\text{location})^2}{\text{scale}^2}} \right)^{\frac{\text{df}+1}{2}}}{\sqrt{\text{df}}\;\text{scale} \;\mathrm{B}\left(\frac{\text{df} }{2},\frac{1}{2}\right)}.\]where \(\mathrm{B}\) is the Beta function. A random variable \(X\) distributed according to this pdf has expectation
\[\begin{split}\mathrm{E}(X)=\begin{cases}\text{location},&\text{df}>1\\\ \text{undefined},&\text{otherwise.}\end{cases}\end{split}\]For example, to create a prior centered around 0 with 3 degrees of freedom and a scale of 1, use:
p = pints.StudentTLogPrior(0, 3, 1)
Extends
LogPrior
.Parameters: - location – The center of the distribution.
- df (int) – The number of degrees of freedom of the distribution.
- scale – The scale of the distribution.
-
cdf
(x)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
class
pints.
TruncatedGaussianLogPrior
(mean, sd, a, b)[source]¶ Defines a truncated Gaussian log prior.
This distribution is also known as the truncated Normal distribution.
The truncated Gaussian distribution is similar to the Gaussian distribution, but constrained to lie between two values.
The parameters are the mean
mean
and standard deviationsd
, as in the Gaussian distribution, as well as a lower bounda
and an upper boundb
.The pdf of the truncated Gaussian distribution is given by
\[f(x|\mu, \sigma, a, b) = \frac{1}{\sigma\sqrt{2\pi}} \exp \left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \frac{1} {\Phi((b-\mu) / \sigma) - \Phi((a-\mu) / \sigma)}\]for \(x \in [a, b]\), where \(\mu\) indicates the mean and \(\sigma\) indicates the standard deviation, and \(\Phi\) is the standard normal CDF.
For example, to create a prior with mean of 0 and a standard deviation of 1, bounded above at 3 and below at -2, use:
p = pints.TruncatedGaussianLogPrior(0, 1, -2, 3)
For a Gaussian distribution truncated on only one side,
numpy.inf
or-numpy.inf
can be used for the unbounded side.Extends
LogPrior
.-
cdf
(x)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(x)[source]¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
-
class
pints.
UniformLogPrior
(lower_or_boundaries, upper=None)[source]¶ Defines a uniform prior over a given range.
The range includes the lower, but not the upper boundaries, so that any point
x
with a non-zero prior must havelower <= x < upper
.In 1D this has pdf
\[\begin{split}f(x|\text{lower},\text{upper})=\begin{cases}0,&\text{if }x\not\in [\text{lower},\text{upper})\\\frac{1}{\text{upper}-\text{lower}} ,&\text{if }x\in[\text{lower},\text{upper})\end{cases}.\end{split}\]A random variable \(X\) distributed according to this pdf has expectation
\[\mathrm{E}(X)=\frac{1}{2}(\text{lower}+\text{upper}).\]For example, to create a prior with \(x\in[0,4]\), \(y\in[1,5]\), and \(z\in[2,6]\) use either:
p = pints.UniformLogPrior([0, 1, 2], [4, 5, 6])
or:
p = pints.UniformLogPrior(pints.RectangularBoundaries([0, 1, 2], [4, 5, 6]))
Extends
LogPrior
.-
cdf
(xs)[source]¶ See
LogPrior.cdf()
.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
icdf
(ps)[source]¶ See
LogPrior.icdf()
.
-
mean
()[source]¶ See
LogPrior.mean()
.
-
sample
(n=1)[source]¶ See
LogPrior.sample()
.
-
MCMC Samplers¶
Pints provides a number of MCMC methods, all implementing the MCMCSampler
interface, that can be used to sample from an unknown
PDF
(usually a Bayesian
Posterior
).
Running an MCMC routine¶
-
pints.
mcmc_sample
(log_pdf, chains, x0, sigma0=None, transformation=None, method=None)[source]¶ Sample from a
pints.LogPDF
using a Markov Chain Monte Carlo (MCMC) method.Parameters: - log_pdf (pints.LogPDF) – A
LogPDF
function that evaluates points in the parameter space. - chains (int) – The number of MCMC chains to generate.
- x0 – A sequence of starting points. Can be a list of lists, a 2-dimensional
array, or any other structure such that
x0[i]
is the starting point for chaini
. - sigma0 – An optional initial covariance matrix, i.e., a guess of the covariance
in
logpdf
around the points inx0
(the samesigma0
is used for each point inx0
). Can be specified as a(d, d)
matrix (whered
is the dimension of the parameter space) or as a(d, )
vector, in which casediag(sigma0)
will be used. - transformation (pints.Transformation) – An optional
pints.Transformation
to allow the sampler to work in a transformed parameter space. If used, points shown or returned to the user will first be detransformed back to the original space. - method (class) – The class of
MCMCSampler
to use. If no method is specified,HaarioBardenetACMC
is used.
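As a minimal, self-contained sketch of this convenience function, a 1-d GaussianLogPrior is used below as a stand-in target so that the example runs without a time-series problem; in practice a LogPosterior would usually be passed instead:
import pints

log_pdf = pints.GaussianLogPrior(0, 1)     # stand-in 1-d target
x0 = [[0.1], [0.2], [0.3]]                 # one starting point per chain
chains = pints.mcmc_sample(log_pdf, 3, x0)
print(chains.shape)                        # (3, n_iterations, 1)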
-
class
pints.
MCMCController
(log_pdf, chains, x0, sigma0=None, transformation=None, method=None)[source]¶ Samples from a
pints.LogPDF
using a Markov Chain Monte Carlo (MCMC) method.The method to use (either a
SingleChainMCMC
class or aMultiChainMCMC
class) is specified at runtime. For example:
mcmc = pints.MCMCController(
    log_pdf, 3, x0, method=pints.HaarioBardenetACMC)
Properties related to the number of iterations, parallelisation, and logging can be set directly on the
MCMCController
object, e.g.:mcmc.set_max_iterations(1000)
Sampler specific properties must be set on the internal samplers themselves, e.g.:
for sampler in mcmc.samplers():
    sampler.set_target_acceptance_rate(0.2)
Finally, to run an MCMC routine, call:
chains = mcmc.run()
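Putting these pieces together, a minimal end-to-end sketch might look as follows. It assumes the toy LogisticModel from pints.toy and synthetic data; the parameter values, prior ranges and iteration count are only examples:
import numpy as np
import pints
import pints.toy

# Create a time-series problem from a toy model and noisy synthetic data
model = pints.toy.LogisticModel()
real_parameters = [0.015, 500]
times = np.linspace(0, 1000, 100)
values = model.simulate(real_parameters, times)
values += np.random.normal(0, 10, values.shape)

problem = pints.SingleOutputProblem(model, times, values)
log_likelihood = pints.GaussianLogLikelihood(problem)
log_prior = pints.UniformLogPrior([0.01, 400, 1], [0.02, 600, 100])
log_posterior = pints.LogPosterior(log_likelihood, log_prior)

# Three chains, each started from the same (example) point
x0 = [[0.015, 500, 10]] * 3
mcmc = pints.MCMCController(
    log_posterior, 3, x0, method=pints.HaarioBardenetACMC)
mcmc.set_max_iterations(2000)
chains = mcmc.run()    # array of shape (3, 2000, 3)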
By default, an MCMCController run will write regular progress updates to screen. This can be disabled using
set_log_to_screen()
. To write a similar progress log to a file, useset_log_to_file()
. To store the chains and/or evaluations generated byrun()
to a file, useset_chain_filename()
andset_log_pdf_filename()
.Parameters: - log_pdf (pints.LogPDF) – A
LogPDF
function that evaluates points in the parameter space. - chains (int) – The number of MCMC chains to generate.
- x0 – A sequence of starting points. Can be a list of lists, a 2-dimensional
array, or any other structure such that
x0[i]
is the starting point for chaini
. - sigma0 – An optional initial covariance matrix, i.e., a guess of the covariance
in
logpdf
around the points inx0
(the samesigma0
is used for each point inx0
). Can be specified as a(d, d)
matrix (whered
is the dimension of the parameter space) or as a(d, )
vector, in which casediag(sigma0)
will be used. - transformation (pints.Transformation) – An optional
pints.Transformation
to allow the sampler to work in a transformed parameter space. If used, points shown or returned to the user will first be detransformed back to the original space. - method (class) – The class of
MCMCSampler
to use. If no method is specified,HaarioBardenetACMC
is used.
-
chains
()[source]¶ Returns the chains generated by
run()
.The returned array has shape
(n_chains, n_iterations, n_parameters)
.If the controller has not run yet, or if chain storage to memory is disabled, this method will return
None
.
-
initial_phase_iterations
()[source]¶ For methods that require an initial phase (e.g. an adaptation-free phase for the adaptive covariance MCMC method), this returns the number of iterations that the initial phase will take.
For methods that do not require an initial phase, a
NotImplementedError
is raised.
-
log_pdfs
()[source]¶ Returns the
LogPDF
evaluations generated byrun()
.If a
LogPosterior
was used, the returned array will have shape(n_chains, n_iterations, 3)
, and for each sample the LogPDF, LogLikelihood, and LogPrior will be stored. For all other cases, only the full LogPDF evaluations are returned, in an array of shape(n_chains, n_iterations)
.If the controller has not run yet, or if storage of evaluations to memory is disabled (default), this method will return
None
.
-
max_iterations
()[source]¶ Returns the maximum iterations if this stopping criterion is set, or
None
if it is not. Seeset_max_iterations()
.
-
method_needs_initial_phase
()[source]¶ Returns true if this sampler has been created with a method that has an initial phase (see
MCMCSampler.needs_initial_phase()
.)
-
n_evaluations
()[source]¶ Returns the number of evaluations performed during the last run, or
None
if the controller hasn’t run yet.
-
parallel
()[source]¶ Returns the number of parallel worker processes this routine will be run on, or
False
if parallelisation is disabled.
-
run
()[source]¶ Runs the MCMC sampler(s) and returns the result.
By default, this method returns an array of shape
(n_chains, n_iterations, n_parameters)
. If storing chains to memory has been disabled withset_chain_storage()
, thenNone
is returned instead.
-
sampler
()[source]¶ Returns the underlying
MultiChainMCMC
object, or raises an error ifSingleChainMCMC
objects are being used.See also:
samplers()
.
-
samplers
()[source]¶ Returns a list containing the underlying sampler objects.
If a
SingleChainMCMC
method was selected, this will be a list containing as manySingleChainMCMC
objects as the number of chains. If aMultiChainMCMC
method was selected, this will be a list containing a singleMultiChainMCMC
instance.
-
set_chain_filename
(chain_file)[source]¶ Write chains to disk as they are generated.
If a
chain_file
is specified, a CSV file will be created for each chain, to which samples will be written as they are accepted. To disable logging of chains, setchain_file=None
.Filenames for each chain file will be derived from
chain_file
, e.g. ifchain_file='chain.csv'
and there are 2 chains, then the fileschain_0.csv
andchain_1.csv
will be created. Each CSV file will start with a header (e.g."p0","p1","p2",...
) and contain a sample on each subsequent line.
-
set_chain_storage
(store_in_memory=True)[source]¶ Store chains in memory as they are generated.
By default, all generated chains are stored in memory as they are generated, and returned by
run()
. This method allows this behaviour to be disabled, which can be useful for very large chains which are already stored to disk (seeset_chain_filename()
).
-
set_initial_phase_iterations
(iterations=200)[source]¶ For methods that require an initial phase (e.g. an adaptation-free phase for the adaptive covariance MCMC method), this sets the number of iterations that the initial phase will take.
For methods that do not require an initial phase, a
NotImplementedError
is raised.
-
set_log_interval
(iters=20, warm_up=3)[source]¶ Changes the frequency with which messages are logged.
Parameters: - iters (int) – A log message will be shown every
iters
iterations. - warm_up (int) – A log message will be shown every iteration, for the first
warm_up
iterations.
- iters (int) – A log message will be shown every
-
set_log_pdf_filename
(log_pdf_file)[source]¶ Write
LogPDF
evaluations to disk as they are generated.If an
evaluation_file
is specified, a CSV file will be created for each chain, to whichLogPDF
evaluations will be written for every accepted sample. To disable this feature, setevaluation_file=None
. If theLogPDF
being evaluated is aLogPosterior
, the individual likelihood and prior will also be stored.Filenames for each evaluation file will be derived from
evaluation_file
, e.g. ifevaluation_file='evals.csv'
and there are 2 chains, then the filesevals_0.csv
andevals_1.csv
will be created. Each CSV file will start with a header (e.g."logposterior","loglikelihood","logprior"
) and contain the evaluations for the i-th accepted sample on the i-th subsequent line.
-
set_log_pdf_storage
(store_in_memory=False)[source]¶ Store
LogPDF
evaluations in memory as they are generated.By default, evaluations of the
LogPDF
are not stored. This method can be used to enable storage of the evaluations for the accepted samples. After running, evaluations can be obtained usingevaluations()
.
-
set_log_to_file
(filename=None, csv=False)[source]¶ Enables progress logging to file when a filename is passed in, disables it if
filename
isFalse
orNone
.The argument
csv
can be set toTrue
to write the file in comma separated value (CSV) format. By default, the file contents will be similar to the output on screen.
-
set_max_iterations
(iterations=10000)[source]¶ Adds a stopping criterion, allowing the routine to halt after the given number of iterations.
This criterion is enabled by default. To disable it, use set_max_iterations(None).
-
set_parallel
(parallel=False)[source]¶ Enables/disables parallel evaluation.
If
parallel=True
, the method will run using a number of worker processes equal to the detected cpu core count. The number of workers can be set explicitly by settingparallel
to an integer greater than 0. Parallelisation can be disabled by settingparallel
to0
orFalse
.
-
class
pints.
MCMCSampling
(log_pdf, chains, x0, sigma0=None, transformation=None, method=None)[source]¶ Deprecated alias for
MCMCController
.-
chains
()¶ Returns the chains generated by
run()
.The returned array has shape
(n_chains, n_iterations, n_parameters)
.If the controller has not run yet, or if chain storage to memory is disabled, this method will return
None
.
-
initial_phase_iterations
()¶ For methods that require an initial phase (e.g. an adaptation-free phase for the adaptive covariance MCMC method), this returns the number of iterations that the initial phase will take.
For methods that do not require an initial phase, a
NotImplementedError
is raised.
-
log_pdfs
()¶ Returns the
LogPDF
evaluations generated byrun()
.If a
LogPosterior
was used, the returned array will have shape(n_chains, n_iterations, 3)
, and for each sample the LogPDF, LogLikelihood, and LogPrior will be stored. For all other cases, only the full LogPDF evaluations are returned, in an array of shape(n_chains, n_iterations)
.If the controller has not run yet, or if storage of evaluations to memory is disabled (default), this method will return
None
.
-
max_iterations
()¶ Returns the maximum iterations if this stopping criterion is set, or
None
if it is not. Seeset_max_iterations()
.
-
method_needs_initial_phase
()¶ Returns true if this sampler has been created with a method that has an initial phase (see
MCMCSampler.needs_initial_phase()
.)
-
n_evaluations
()¶ Returns the number of evaluations performed during the last run, or
None
if the controller hasn’t run yet.
-
parallel
()¶ Returns the number of parallel worker processes this routine will be run on, or
False
if parallelisation is disabled.
-
run
()¶ Runs the MCMC sampler(s) and returns the result.
By default, this method returns an array of shape
(n_chains, n_iterations, n_parameters)
. If storing chains to memory has been disabled withset_chain_storage()
, thenNone
is returned instead.
-
sampler
()¶ Returns the underlying
MultiChainMCMC
object, or raises an error ifSingleChainMCMC
objects are being used.See also:
samplers()
.
-
samplers
()¶ Returns a list containing the underlying sampler objects.
If a
SingleChainMCMC
method was selected, this will be a list containing as manySingleChainMCMC
objects as the number of chains. If aMultiChainMCMC
method was selected, this will be a list containing a singleMultiChainMCMC
instance.
-
set_chain_filename
(chain_file)¶ Write chains to disk as they are generated.
If a
chain_file
is specified, a CSV file will be created for each chain, to which samples will be written as they are accepted. To disable logging of chains, setchain_file=None
.Filenames for each chain file will be derived from
chain_file
, e.g. ifchain_file='chain.csv'
and there are 2 chains, then the fileschain_0.csv
andchain_1.csv
will be created. Each CSV file will start with a header (e.g."p0","p1","p2",...
) and contain a sample on each subsequent line.
-
set_chain_storage
(store_in_memory=True)¶ Store chains in memory as they are generated.
By default, all generated chains are stored in memory as they are generated, and returned by
run()
. This method allows this behaviour to be disabled, which can be useful for very large chains which are already stored to disk (seeset_chain_filename()
).
-
set_initial_phase_iterations
(iterations=200)¶ For methods that require an initial phase (e.g. an adaptation-free phase for the adaptive covariance MCMC method), this sets the number of iterations that the initial phase will take.
For methods that do not require an initial phase, a
NotImplementedError
is raised.
-
set_log_interval
(iters=20, warm_up=3)¶ Changes the frequency with which messages are logged.
Parameters: - iters (int) – A log message will be shown every
iters
iterations. - warm_up (int) – A log message will be shown every iteration, for the first
warm_up
iterations.
- iters (int) – A log message will be shown every
-
set_log_pdf_filename
(log_pdf_file)¶ Write
LogPDF
evaluations to disk as they are generated.If an
evaluation_file
is specified, a CSV file will be created for each chain, to whichLogPDF
evaluations will be written for every accepted sample. To disable this feature, setevaluation_file=None
. If theLogPDF
being evaluated is aLogPosterior
, the individual likelihood and prior will also be stored.Filenames for each evaluation file will be derived from
evaluation_file
, e.g. ifevaluation_file='evals.csv'
and there are 2 chains, then the filesevals_0.csv
andevals_1.csv
will be created. Each CSV file will start with a header (e.g."logposterior","loglikelihood","logprior"
) and contain the evaluations for the i-th accepted sample on the i-th subsequent line.
-
set_log_pdf_storage
(store_in_memory=False)¶ Store
LogPDF
evaluations in memory as they are generated.By default, evaluations of the
LogPDF
are not stored. This method can be used to enable storage of the evaluations for the accepted samples. After running, evaluations can be obtained usingevaluations()
.
-
set_log_to_file
(filename=None, csv=False)¶ Enables progress logging to file when a filename is passed in, disables it if
filename
isFalse
orNone
.The argument
csv
can be set toTrue
to write the file in comma separated value (CSV) format. By default, the file contents will be similar to the output on screen.
-
set_log_to_screen
(enabled)¶ Enables or disables progress logging to screen.
-
set_max_iterations
(iterations=10000)¶ Adds a stopping criterion, allowing the routine to halt after the given number of iterations.
This criterion is enabled by default. To disable it, use set_max_iterations(None).
-
set_parallel
(parallel=False)¶ Enables/disables parallel evaluation.
If
parallel=True
, the method will run using a number of worker processes equal to the detected cpu core count. The number of workers can be set explicitly by settingparallel
to an integer greater than 0. Parallelisation can be disabled by settingparallel
to0
orFalse
.
-
time
()¶ Returns the time needed for the last run, in seconds, or
None
if the controller hasn’t run yet.
-
MCMC Sampler base classes¶
-
class
pints.
MCMCSampler
[source]¶ Abstract base class for (single or multi-chain) MCMC methods.
All MCMC samplers implement the
pints.Loggable
andpints.TunableMethod
interfaces.-
in_initial_phase
()[source]¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is raised.
-
n_hyper_parameters
()¶ Returns the number of hyper-parameters for this method (see
TunableMethod
).
-
needs_initial_phase
()[source]¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
needs_sensitivities
()[source]¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
set_hyper_parameters
(x)¶ Sets the hyper-parameters for the method with the given vector of values (see
TunableMethod
).Parameters: x – An array of length n_hyper_parameters
used to set the hyper-parameters.
-
set_initial_phase
(in_initial_phase)[source]¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is raised.
-
-
class
pints.
SingleChainMCMC
(x0, sigma0=None)[source]¶ Abstract base class for MCMC methods that generate a single markov chain, via an ask-and-tell interface.
Extends
MCMCSampler
.Parameters: - x0 – An starting point in the parameter space.
- sigma0 – An optional (initial) covariance matrix, i.e., a guess of the
covariance of the distribution to estimate, around
x0
.
-
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is raised.
-
n_hyper_parameters
()¶ Returns the number of hyper-parameters for this method (see
TunableMethod
).
-
name
()¶ Returns this method’s full name.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to
along with the evaluated logpdf.
-
replace
(current, current_log_pdf, proposed=None)[source]¶ Replaces the internal current position, current LogPDF, and proposed point (if any) by the user-specified values.
This method can only be used once the initial position and LogPDF have been set (so after at least 1 round of ask-and-tell).
This is an optional method, and some samplers may not support it.
-
set_hyper_parameters
(x)¶ Sets the hyper-parameters for the method with the given vector of values (see
TunableMethod
).Parameters: x – An array of length n_hyper_parameters
used to set the hyper-parameters.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is raised.
-
tell
(fx)[source]¶ Performs an iteration of the MCMC algorithm, using the
pints.LogPDF
evaluationfx
of the pointx
specified byask
.For methods that require sensitivities (see
MCMCSampler.needs_sensitivities()
),fx
should be a tuple(log_pdf, sensitivities)
, containing the values returned by pints.LogPDF.evaluateS1()
.After a successful call,
tell()
returns a tuple(x, fx, accepted)
, wherex
contains the current position of the chain,fx
contains the corresponding evaluation, andaccepted
is a boolean indicating whether the last evaluated sample was added to the chain.Some methods may require multiple ask-tell calls per iteration. These methods can return
None
to indicate an iteration is still in progress.
-
class
pints.
MultiChainMCMC
(chains, x0, sigma0=None)[source]¶ Abstract base class for MCMC methods that generate multiple markov chains, via an ask-and-tell interface.
Extends
MCMCSampler
.Parameters: - chains (int) – The number of MCMC chains to generate.
- x0 – A sequence of starting points. Can be a list of lists, a 2-dimensional
array, or any other structure such that
x0[i]
is the starting point for chaini
. - sigma0 – An optional initial covariance matrix, i.e., a guess of the covariance
in
logpdf
around the points inx0
(the samesigma0
is used for each point inx0
). Can be specified as a(d, d)
matrix (whered
is the dimension of the parameter space) or as a(d, )
vector, in which casediag(sigma0)
will be used.
-
current_log_pdfs
()[source]¶ Returns the log pdf values of the current points (i.e. of the most recent points returned by
tell()
).
-
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is raised.
-
n_hyper_parameters
()¶ Returns the number of hyper-parameters for this method (see
TunableMethod
).
-
name
()¶ Returns this method’s full name.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to
along with the evaluated logpdf.
-
set_hyper_parameters
(x)¶ Sets the hyper-parameters for the method with the given vector of values (see
TunableMethod
).Parameters: x – An array of length n_hyper_parameters
used to set the hyper-parameters.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is raised.
-
tell
(fxs)[source]¶ Performs an iteration of the MCMC algorithm, using the
pints.LogPDF
evaluationsfxs
of the pointsxs
specified byask
.For methods that require sensitivities (see
MCMCSampler.needs_sensitivities()
), each entry infxs
should be a tuple(log_pdf, sensitivities)
, containing the values returned by pints.LogPDF.evaluateS1()
.After a successful call,
tell()
returns a tuple(xs, fxs, accepted)
, where xs contains the current positions of the chains, fxs contains the corresponding evaluations, and accepted
is an array of booleans indicating whether the last evaluated sample was added to the chain.Some methods may require multiple ask-tell calls per iteration. These methods can return
None
to indicate an iteration is still in progress.
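For illustration, a hedged sketch of this ask-and-tell loop with a multi-chain method (the DifferentialEvolutionMCMC sampler documented below, applied to a simple 1-d stand-in target; the number of iterations is arbitrary):
import pints

log_pdf = pints.GaussianLogPrior(0, 1)       # 1-d stand-in target
x0 = [[0.1], [0.2], [0.3]]                   # three chains
sampler = pints.DifferentialEvolutionMCMC(3, x0)

samples = []
for i in range(1000):
    xs = sampler.ask()                       # points to evaluate
    fxs = [log_pdf(x) for x in xs]           # user evaluates the LogPDF
    reply = sampler.tell(fxs)                # may be None mid-iteration
    if reply is not None:
        points, evals, accepted = reply
        samples.append(points)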
Adaptive Covariance MC¶
-
class
pints.
AdaptiveCovarianceMC
(x0, sigma0=None)[source]¶ Base class for single chain MCMC methods that globally adapt a proposal covariance matrix when running, in order to control the acceptance rate.
Each subclass should provide a method
_generate_proposal()
that will be called byask()
.Adaptation is implemented with three methods, which are called in sequence, at the end of every
tell()
:_adapt_mu()
,_adapt_sigma()
, and_adapt_internal()
. A basic implementation is provided for each, which extending methods can choose to override.Extends
SingleChainMCMC
.-
eta
()[source]¶ Returns
eta
which controls the rate of adaptation decayadaptations**(-eta)
, whereeta > 0
to ensure asymptotic ergodicity.
-
name
()¶ Returns this method’s full name.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to
along with the evaluated logpdf.
-
set_eta
(eta)[source]¶ Updates
eta
which controls the rate of adaptation decayadaptations**(-eta)
, whereeta > 0
to ensure asymptotic ergodicity.
-
Differential Evolution MCMC¶
-
class
pints.
DifferentialEvolutionMCMC
(chains, x0, sigma0=None)[source]¶ Uses differential evolution MCMC as described in [1] to perform sampling from the posterior.
In each step of the algorithm
n
chains are evolved using the evolution equation:x_proposed = x[i,r] + gamma * (X[i,r1] - x[i,r2]) + epsilon
where
r1
andr2
are random chain indices chosen (without replacement) from then
available chains, which must not equali
or each other, wherei
indicates the current time step, and epsilon ~ N(0,b) in d dimensions, where d is the dimensionality of the parameter vector. If
x_proposed / x[i,r] > u ~ U(0,1)
, thenx[i+1,r] = x_proposed
; otherwise,x[i+1,r] = x[i]
.Extends
MultiChainMCMC
.Note
This sampler requires a number of chains \(n \ge 3\), and recommends \(n \ge 1.5 d\).
References
[1] “A Markov Chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces”. Cajo J. F. Ter Braak (2006) Statistical Computing https://doi.org/10.1007/s11222-006-8769-1 -
current_log_pdfs
()¶ Returns the log pdf values of the current points (i.e. of the most recent points returned by
tell()
).
-
gamma_switch_rate
()[source]¶ Returns the number of steps between iterations where gamma is set to 1 (then reset immediately afterwards).
-
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is raised.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to
along with the evaluated logpdf.
-
relative_scaling
()[source]¶ Returns whether an error process whose standard deviation scales relatively is used (False indicates absolute scale).
-
scale_coefficient
()[source]¶ Sets the scale coefficient
b
of the error process used in updating the position of each chain.
-
set_gamma_switch_rate
(gamma_switch_rate)[source]¶ Sets the number of steps between iterations where gamma is set to 1 (then reset immediately afterwards).
-
set_gaussian_error
(gaussian_error)[source]¶ If
True
sets the error process to be a gaussian error,N(0, b*)
; ifFalse
, it uses a uniform errorU(-b*, b*)
; whereb* = b
if absolute scaling used andb* = mu * b
if relative scaling is used instead.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[gamma, gaussian_scale_coefficient, gamma_switch_rate, gaussian_error, relative_scaling]
.
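As a hedged sketch of setting these via the generic TunableMethod interface (the numeric values are arbitrary examples; each entry could equally be set with the corresponding individual setter):
import pints

x0 = [[1, 1], [2, 2], [3, 3]]
mcmc = pints.DifferentialEvolutionMCMC(3, x0)

# [gamma, gaussian_scale_coefficient, gamma_switch_rate,
#  gaussian_error, relative_scaling]
mcmc.set_hyper_parameters([0.5, 0.01, 10, True, True])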
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is raised.
-
set_relative_scaling
(relative_scaling)[source]¶ Sets whether to use an error process whose standard deviation scales relatively (
scale = self._mu * self._b
) or absolutely (scale = self._b
in all dimensions).
-
set_scale_coefficient
(b)[source]¶ Sets the scale coefficient
b
of the error process used in updating the position of each chain.
-
Dram ACMC¶
-
class
pints.
DramACMC
(x0, sigma0=None)[source]¶ DRAM (Delayed Rejection Adaptive Covariance) MCMC, as described in [1].
In this method, a rejection does not necessarily end an iteration. Instead, if a rejection occurs, another point is proposed, typically from a narrower (i.e. more conservative) proposal kernel than was used for the first proposal.
In this approach, in each iteration, the following steps return the next state of the Markov chain (assuming the current state is
theta_0
and that there are 2 proposal kernels):
theta_1 ~ N(theta_0, lambda * scale_1 * sigma)
alpha_1(theta_0, theta_1) = min(1, p(theta_1|X) / p(theta_0|X))
u_1 ~ uniform(0, 1)
if alpha_1(theta_0, theta_1) > u_1:
    return theta_1
theta_2 ~ N(theta_0, lambda * scale_2 * sigma0)
alpha_2(theta_0, theta_1, theta_2) = min(1,
    p(theta_2|X) (1 - alpha_1(theta_2, theta_1)) /
    (p(theta_0|X) (1 - alpha_1(theta_0, theta_1))))
u_2 ~ uniform(0, 1)
if alpha_2(theta_0, theta_1, theta_2) > u_2:
    return theta_2
else:
    return theta_0
Our implementation also allows more than 2 proposal kernels to be used. This means that
k
accept-reject steps are taken. In each step (i
), the probability that a proposaltheta_i
is accepted is:alpha_i(theta_0, theta_1, ..., theta_i) = min(1, p(theta_i|X) / p(theta_0|X) * n_i / d_i)
where:
n_i = (1 - alpha_1(theta_i, theta_i-1)) * (1 - alpha_2(theta_i, theta_i-1, theta_i-2)) * ... (1 - alpha_i-1(theta_i, theta_i-1, ..., theta_0))
d_i = (1 - alpha_1(theta_0, theta_1)) * (1 - alpha_2(theta_0, theta_1, theta_2)) * ... (1 - alpha_i-1(theta_0, theta_1, ..., theta_i-1))
If
k
proposals have been rejected, the initial pointtheta_0
is returned.At the end of each iteration, a ‘base’ proposal kernel is adapted:
mu = (1 - gamma) mu + gamma theta
sigma = (1 - gamma) sigma + gamma (theta - mu)(theta - mu)^t
log_lambda = log_lambda + gamma (accepted - target_acceptance_rate)
where
gamma = adaptations^-eta
,theta
is the current state of the Markov chain andaccepted
is a binary indicator for whether any of the series of proposals were accepted. The kernels for all proposals are then adapted as
, where the scale factors are set usingset_sigma_scale
.Extends:
GlobalAdaptiveCovarianceMC
References
[1] “DRAM: Efficient adaptive MCMC”. H Haario, M Laine, A Mira, E Saksman (2006) Statistical Computing https://doi.org/10.1007/s11222-006-9438-0 -
acceptance_rate
()¶ Returns the current (measured) acceptance rate.
-
ask
()¶
-
eta
()¶ Returns
eta
which controls the rate of adaptation decayadaptations**(-eta)
, whereeta > 0
to ensure asymptotic ergodicity.
-
in_initial_phase
()¶
-
needs_initial_phase
()¶
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to
along with the evaluated logpdf.
-
replace
(current, current_log_pdf, proposed=None)¶
-
set_eta
(eta)¶ Updates
eta
which controls the rate of adaptation decayadaptations**(-eta)
, whereeta > 0
to ensure asymptotic ergodicity.
-
set_initial_phase
(initial_phase)¶
-
set_sigma_scale
()[source]¶ Set the scale of initial covariance matrix multipliers for each of the kernels:
[0,...,upper]
where the gradations are uniform on the log10 scale meaning the proposal covariance matrices are:[10^upper,..., 1] * sigma
.
-
set_target_acceptance_rate
(rate=0.234)¶ Sets the target acceptance rate.
-
set_upper_scale
(upper_scale)[source]¶ Set the upper scale of initial covariance matrix multipliers for each of the kernels:
[0,...,upper]
where the gradations are uniform on the log10 scale meaning the proposal covariance matrices are:[10^upper,..., 1] * sigma
.
-
sigma_scale
()[source]¶ Returns scale factors used to multiply a base covariance matrix, resulting in proposal matrices for each accept-reject step.
-
target_acceptance_rate
()¶ Returns the target acceptance rate.
-
tell
(fx)[source]¶ If first proposal, then accept with ordinary Metropolis probability; if a later proposal, use probability determined by [1].
-
upper_scale
()[source]¶ Returns upper scale limit (see
pints.DramACMC.set_upper_scale()
).
-
DreamMCMC¶
-
class
pints.
DreamMCMC
(chains, x0, sigma0=None)[source]¶ Uses differential evolution adaptive Metropolis (DREAM) MCMC as described in [1] to perform sampling from the posterior.
In each step of the algorithm N chains are evolved using the following steps:
1. Select proposal:
x_proposed = x[i,r] + (1 + e) * gamma(delta, d, p_g) * sum_j=1^delta (X[i,r1[j]] - x[i,r2[j]]) + epsilon
where [r1[j], r2[j]] are random chain indices chosen (without replacement) from the
N
available chains, which must not equal each other ori
, wherei
indicates the current time step;delta ~ uniform_discrete(1,D)
determines the number of terms to include in the summation:e ~ U(-b*, b*) in d dimensions; gamma(delta, d, p_g) = if p_g < u1 ~ U(0,1): 2.38 / sqrt(2 * delta * d) else: 1
epsilon ~ N(0,b)
ind
dimensions (whered
is the dimensionality of the parameter vector).2. Modify random subsets of the proposal according to a crossover probability CR:
for j in 1:N:
    if 1 - CR > u2 ~ U(0,1):
        x_proposed[j] = x[j]
    else:
        x_proposed[j] = x_proposed[j] from 1
If
x_proposed / x[i,r] > u ~ U(0,1)
, thenx[i+1,r] = x_proposed
; otherwise,x[i+1,r] = x[i]
.Here b > 0, b* > 0, 1 >= p_g >= 0, 1 >= CR >= 0.
Extends
MultiChainMCMC
.References
[1] “Accelerating Markov Chain Monte Carlo Simulation by Differential Evolution with Self-Adaptive Randomized Subspace Sampling”, 2009, Vrugt et al., International Journal of Nonlinear Sciences and Numerical Simulation. https://doi.org/10.1515/IJNSNS.2009.10.3.273 -
CR
()[source]¶ Returns the probability of crossover occurring if constant crossover mode is enabled (see
set_CR()
).
-
b_star
()[source]¶ Returns b*, which determines the weight given to other chains’ positions in determining new positions (see
set_b_star()
).
-
current_log_pdfs
()¶ Returns the log pdf values of the current points (i.e. of the most recent points returned by
tell()
).
-
delta_max
()[source]¶ Returns the maximum number of other chains’ positions to use to determine the next sampler position (see
set_delta_max()
).
-
nCR
()[source]¶ Returns the size of the discrete crossover probability distribution (only used if constant crossover mode is disabled), see
set_nCR()
.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
set_CR
(CR)[source]¶ Sets the probability of crossover occurring if constant crossover mode is enabled. CR is a probability and so must be in the range
[0, 1]
.
-
set_b
(b)[source]¶ Sets the Gaussian scale coefficient used in updating the position of each chain (must be non-negative).
-
set_b_star
(b_star)[source]¶ Sets b*, which determines the weight given to other chains’ positions in determining new positions (must be non-negative).
-
set_delta_max
(delta_max)[source]¶ Sets the maximum number of other chains’ positions to use to determine the next sampler position.
delta_max
must be in the range[1, nchains - 2]
.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[b, b_star, p_g, delta_max, initial_phase, constant_crossover, CR, nCR]
.
-
set_nCR
(nCR)[source]¶ Sets the size of the discrete crossover probability distribution (only used if constant crossover mode is disabled).
nCR
must be greater than or equal to 2.
-
set_p_g
(p_g)[source]¶ Sets p_g which is the probability of choosing a higher gamma versus regular (a higher gamma means that other chains are given more weight). p_g must be in the range [0, 1].
Dual Averaging¶
Dual averaging is not a sampling method, but a method of adaptively tuning the Hamiltonian Monte Carlo (HMC) step size and mass matrix for the particular log-posterior being sampled. Pints’ NUTS sampler uses dual averaging, but we have defined the dual averaging method separately so that in the future it can be used in HMC and other HMC-derived samplers.
-
class
pints.
DualAveragingAdaption
(num_warmup_steps, target_accept_prob, init_epsilon, init_inv_mass_matrix)[source]¶ Dual Averaging method to adaptively tune the step size and mass matrix of a Hamiltonian Monte Carlo (HMC) routine (as used e.g. in NUTS).
Implements a Dual Averaging scheme to adapt the step size epsilon, as per [1] (section 3.2.1 and algorithm 6), and estimates the inverse mass matrix using the sample covariance of the accepted parameter samples, as suggested in [2]. The mass matrix can either be given as a fully dense matrix represented as a 2D ndarray, or a diagonal matrix represented as a 1D ndarray.
During iteration m of adaption, the parameter epsilon is updated using the following scheme:
\[\bar{H} = (1 - 1/(m + t_0)) \bar{H} + 1/(m + t_0)(\delta_t - \delta)\]
\[\text{log} \epsilon = \mu - \sqrt{m}/\gamma \bar{H}\]
where \(\delta_t\) is the target acceptance probability set by the user and \(\delta\) is the acceptance probability reported by the algorithm (i.e. that is provided as an argument to the step() method).
The adaption is done using the same windowing method employed by Stan, which is done over three or more windows:
- initial window: epsilon is adapted using dual averaging (no adaption of the mass matrix).
- base window: epsilon continues to be adapted using dual averaging; this adaption completes at the end of this window. The inverse mass matrix is adapted at the end of the window by taking the sample covariance of all parameter points within this window.
- terminal window: epsilon is adapted using dual averaging, holding the mass matrix constant, and completes at the end of the window.
If the number of warmup steps requested by the user is greater than the sum of these three windows, then additional base windows are added, each with a size double that of the previous window.
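As a rough sketch (not the pints implementation), the dual-averaging update for the step size can be written as follows; the constants mu, t0 and gamma are assumed tuning parameters in the spirit of [1]:

import numpy as np

def dual_averaging_step(h_bar, m, accept_prob, target_accept=0.8,
                        mu=np.log(1.0), t0=10.0, gamma=0.05):
    # Illustrative sketch of the epsilon update above; h_bar is the running
    # average of (target - observed) acceptance probability.
    h_bar = (1 - 1 / (m + t0)) * h_bar + (target_accept - accept_prob) / (m + t0)
    log_epsilon = mu - np.sqrt(m) / gamma * h_bar
    return h_bar, np.exp(log_epsilon)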
References
[1] Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593-1623. [2] Betancourt, M. (2018). A Conceptual Introduction to Hamiltonian Monte Carlo. https://arxiv.org/abs/1701.02434. Parameters: - num_warmup_steps –
Number of warmup (adaption) iterations to run before sampling begins.
- target_accept_prob –
Target acceptance probability used by the dual averaging scheme.
- init_epsilon – An initial guess for the step size epsilon
- init_inv_mass_matrix – An initial guess for the inverse adapted mass matrix
-
add_parameter_sample
(sample)[source]¶ Store the parameter samples to calculate a sample covariance matrix later on.
EmceeHammerMCMC¶
-
class
pints.
EmceeHammerMCMC
(chains, x0, sigma0=None)[source]¶ Uses the differential evolution algorithm “emcee: the MCMC hammer”, described in Algorithm 2 in [1].
For k in 1:N:
- Draw a walker X_j at random from the “complementary ensemble” (the group of chains not including k) without replacement.
- Sample z ~ g(z) (see below).
- Set Y = X_j(t) + z[X_k(t) - X_j(t)].
- Set q = z^{d - 1} p(Y) / p(X_k(t)).
- Sample r ~ U(0, 1).
- If r <= q, set X_k(t + 1) equal to Y; if not, use X_k(t).
Here, N is the number of chains (or walkers), d is the dimensionality of the space, and g(z) is proportional to 1 / sqrt(z) if z is in [1 / a, a] or to 0 otherwise (where a is a parameter with default value 2).
References
[1] “emcee: The MCMC Hammer”, Daniel Foreman-Mackey, David W. Hogg, Dustin Lang, Jonathan Goodman, 2013, arXiv, https://arxiv.org/pdf/1202.3665.pdf -
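For illustration, a minimal NumPy sketch of one stretch-move update for walker k, following the steps above (logpdf is an assumed callable returning log p(x); emcee_step is not part of the pints API):

import numpy as np

def emcee_step(X, k, logpdf, a=2.0):
    # Illustrative sketch, not the pints implementation.
    # X has shape (n_walkers, d); updates walker k in place.
    n, d = X.shape
    j = np.random.choice([i for i in range(n) if i != k])
    # Draw z from g(z) propto 1/sqrt(z) on [1/a, a] by inverse transform.
    z = ((a - 1.0) * np.random.rand() + 1.0) ** 2 / a
    y = X[j] + z * (X[k] - X[j])
    log_q = (d - 1) * np.log(z) + logpdf(y) - logpdf(X[k])
    if np.log(np.random.rand()) <= log_q:
        X[k] = y
    return X[k]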
current_log_pdfs
()¶ Returns the log pdf values of the current points (i.e. of the most recent points returned by
tell()
).
-
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
Haario ACMC¶
-
class
pints.
HaarioACMC
(x0, sigma0=None)[source]¶ Adaptive Metropolis MCMC, which is algorithm 4 in [1] and is described in the text in [2].
This algorithm differs from HaarioBardenetACMC only through its use of alpha in the updating of log_lambda (rather than a binary accept/reject).
Initialise:

mu
Sigma
adaptation_count = 0
log lambda = 0

In each adaptive iteration (t):

adaptation_count = adaptation_count + 1
gamma = (adaptation_count)^-eta
theta* ~ N(theta_t, lambda * Sigma)
alpha = min(1, p(theta*|data) / p(theta_t|data))
u ~ uniform(0, 1)
if alpha > u:
    theta_(t+1) = theta*
    accepted = 1
else:
    theta_(t+1) = theta_t
    accepted = 0
mu = (1 - gamma) mu + gamma theta_(t+1)
Sigma = (1 - gamma) Sigma + gamma (theta_(t+1) - mu)(theta_(t+1) - mu)
log lambda = log lambda + gamma (alpha - self._target_acceptance)
gamma = adaptation_count^-eta
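A minimal NumPy sketch of one such adaptive iteration (log_pdf is an assumed callable returning the log target density; this is an illustration of the scheme above, not the pints internals):

import numpy as np

def haario_step(theta, mu, sigma, log_lam, t, log_pdf, eta=0.6, target=0.234):
    # Illustrative sketch of one adaptive Metropolis iteration (see above).
    gamma = (t + 1) ** -eta            # t is the 0-based adaptation count
    proposal = np.random.multivariate_normal(theta, np.exp(log_lam) * sigma)
    alpha = min(1.0, np.exp(log_pdf(proposal) - log_pdf(theta)))
    if alpha > np.random.rand():
        theta = proposal
    mu = (1 - gamma) * mu + gamma * theta
    diff = theta - mu
    sigma = (1 - gamma) * sigma + gamma * np.outer(diff, diff)
    log_lam = log_lam + gamma * (alpha - target)
    return theta, mu, sigma, log_lam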
Extends
AdaptiveCovarianceMC
.References
[1] A tutorial on adaptive MCMC Christophe Andrieu and Johannes Thoms, Statistical Computing, 2008, 18: 343-373. https://doi.org/10.1007/s11222-008-9110-y [2] An adaptive Metropolis algorithm Heikki Haario, Eero Saksman, and Johanna Tamminen (2001) Bernoulli. -
acceptance_rate
()¶ Returns the current (measured) acceptance rate.
-
ask
()¶
-
eta
()¶ Returns
eta
which controls the rate of adaptation decay, adaptations**(-eta), where eta > 0 to ensure asymptotic ergodicity.
-
in_initial_phase
()¶
-
n_hyper_parameters
()¶
-
needs_initial_phase
()¶
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
replace
(current, current_log_pdf, proposed=None)¶
-
set_eta
(eta)¶ Updates
eta
which controls the rate of adaptation decay, adaptations**(-eta), where eta > 0 to ensure asymptotic ergodicity.
-
set_hyper_parameters
(x)¶ The hyper-parameter vector is
[eta]
.
-
set_initial_phase
(initial_phase)¶
-
set_target_acceptance_rate
(rate=0.234)¶ Sets the target acceptance rate.
-
target_acceptance_rate
()¶ Returns the target acceptance rate.
-
tell
(fx)¶
-
Haario Bardenet ACMC¶
-
class
pints.
HaarioBardenetACMC
(x0, sigma0=None)[source]¶ Adaptive Metropolis MCMC, which is the algorithm given in the supplementary materials of [1], which in turn is based on [2].
Initialise:

mu
Sigma
adaptation_count = 0
log lambda = 0

In each adaptive iteration (t):

adaptation_count = adaptation_count + 1
gamma = (adaptation_count)^-eta
theta* ~ N(theta_t, lambda * Sigma)
alpha = min(1, p(theta*|data) / p(theta_t|data))
u ~ uniform(0, 1)
if alpha > u:
    theta_(t+1) = theta*
    accepted = 1
else:
    theta_(t+1) = theta_t
    accepted = 0
alpha = accepted
mu = (1 - gamma) mu + gamma theta_(t+1)
Sigma = (1 - gamma) Sigma + gamma (theta_(t+1) - mu)(theta_(t+1) - mu)
log lambda = log lambda + gamma (alpha - self._target_acceptance)
gamma = adaptation_count^-eta
Extends
AdaptiveCovarianceMC
.References
[1] Johnstone, Chang, Bardenet, de Boer, Gavaghan, Pathmanathan, Clayton, Mirams (2015) “Uncertainty and variability in models of the cardiac action potential: Can we build trustworthy models?” Journal of Molecular and Cellular Cardiology. https://doi.org/10.1016/j.yjmcc.2015.11.018 [2] Haario, Saksman, Tamminen (2001) “An adaptive Metropolis algorithm” Bernoulli. https://doi.org/10.2307/3318737 -
acceptance_rate
()¶ Returns the current (measured) acceptance rate.
-
ask
()¶
-
eta
()¶ Returns
eta
which controls the rate of adaptation decay, adaptations**(-eta), where eta > 0 to ensure asymptotic ergodicity.
-
in_initial_phase
()¶
-
n_hyper_parameters
()¶
-
needs_initial_phase
()¶
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
replace
(current, current_log_pdf, proposed=None)¶
-
set_eta
(eta)¶ Updates
eta
which controls the rate of adaptation decay, adaptations**(-eta), where eta > 0 to ensure asymptotic ergodicity.
-
set_hyper_parameters
(x)¶ The hyper-parameter vector is
[eta]
.
-
set_initial_phase
(initial_phase)¶
-
set_target_acceptance_rate
(rate=0.234)¶ Sets the target acceptance rate.
-
target_acceptance_rate
()¶ Returns the target acceptance rate.
-
tell
(fx)¶
-
-
class
pints.
AdaptiveCovarianceMCMC
(x0, sigma0=None)[source]¶ Deprecated alias of
pints.HaarioBardenetACMC
.-
acceptance_rate
()¶ Returns the current (measured) acceptance rate.
-
ask
()¶
-
eta
()¶ Returns
eta
which controls the rate of adaptation decay, adaptations**(-eta), where eta > 0 to ensure asymptotic ergodicity.
-
in_initial_phase
()¶
-
n_hyper_parameters
()¶
-
name
()¶
-
needs_initial_phase
()¶
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
replace
(current, current_log_pdf, proposed=None)¶
-
set_eta
(eta)¶ Updates
eta
which controls the rate of adaptation decay, adaptations**(-eta), where eta > 0 to ensure asymptotic ergodicity.
-
set_hyper_parameters
(x)¶ The hyper-parameter vector is
[eta]
.
-
set_initial_phase
(initial_phase)¶
-
set_target_acceptance_rate
(rate=0.234)¶ Sets the target acceptance rate.
-
target_acceptance_rate
()¶ Returns the target acceptance rate.
-
tell
(fx)¶
-
Hamiltonian MCMC¶
-
class
pints.
HamiltonianMCMC
(x0, sigma0=None)[source]¶ Implements Hamiltonian Monte Carlo as described in [1].
Uses a physical analogy of a particle moving across a landscape under Hamiltonian dynamics to aid efficient exploration of parameter space. Introduces an auxiliary variable – the momentum (p_i) of a particle moving in dimension i of negative log posterior space – which supplements the position (q_i) of the particle in parameter space. The particle’s motion is dictated by solutions to Hamilton’s equations,
\[\begin{split}dq_i/dt &= \partial H/\partial p_i\\ dp_i/dt &= - \partial H/\partial q_i.\end{split}\]
The Hamiltonian is given by,
\[\begin{split}H(q,p) &= U(q) + KE(p)\\ &= -log(p(q|X)p(q)) + \Sigma_{i=1}^{d} p_i^2/2m_i,\end{split}\]
where d is the dimensionality of the model and m_i is the ‘mass’ given to each particle (often chosen to be 1 as default).
To numerically integrate Hamilton’s equations, it is essential to use a symplectic discretisation routine, of which the most typical approach is the leapfrog method,
\[\begin{split}p_i(t + \epsilon/2) &= p_i(t) - (\epsilon/2) d U(q_i(t))/dq_i\\ q_i(t + \epsilon) &= q_i(t) + \epsilon p_i(t + \epsilon/2) / m_i\\ p_i(t + \epsilon) &= p_i(t + \epsilon/2) - (\epsilon/2) d U(q_i(t + \epsilon))/dq_i\end{split}\]
In particular, the algorithm we implement follows eqs. (4.14)-(4.16) in [1], since we allow different epsilon according to dimension.
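A minimal sketch of a single leapfrog step as written above, with a common mass m (grad_u is an assumed callable returning dU/dq; illustrative only, not the pints implementation):

def leapfrog(q, p, grad_u, epsilon, m=1.0):
    # One leapfrog step for H(q, p) = U(q) + sum(p**2 / (2 * m)).
    p = p - 0.5 * epsilon * grad_u(q)
    q = q + epsilon * p / m
    p = p - 0.5 * epsilon * grad_u(q)
    return q, p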
Extends
SingleChainMCMC
.References
[1] (1, 2) “MCMC using Hamiltonian dynamics”. Radford M. Neal, Chapter 5 of the Handbook of Markov Chain Monte Carlo by Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. -
hamiltonian_threshold
()[source]¶ Returns threshold difference in Hamiltonian value from one iteration to next which determines whether an iteration is divergent.
-
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
replace
(current, current_log_pdf, proposed=None)¶ Replaces the internal current position, current LogPDF, and proposed point (if any) by the user-specified values.
This method can only be used once the initial position and LogPDF have been set (so after at least 1 round of ask-and-tell).
This is an optional method, and some samplers may not support it.
-
set_hamiltonian_threshold
(hamiltonian_threshold)[source]¶ Sets threshold difference in Hamiltonian value from one iteration to next which determines whether an iteration is divergent.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[leapfrog_steps, leapfrog_step_size]
.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
-
set_leapfrog_steps
(steps)[source]¶ Sets the number of leapfrog steps to carry out for each iteration.
-
Metropolis-Adjusted Langevin Algorithm (MALA) MCMC¶
-
class
pints.
MALAMCMC
(x0, sigma0=None)[source]¶ Metropolis-Adjusted Langevin Algorithm (MALA), an MCMC sampler as described in [1].
This method involves simulating Langevin diffusion such that the solution to the time evolution equation (the Fokker-Planck PDE) is a stationary distribution that equals the target density (in Bayesian problems, the posterior distribution). The stochastic differential equation (SDE) given below ensures that if \(u(\theta, 0) = \pi(\theta)\), then \(\partial u / \partial t = 0\),
\[\mathrm{d}\Theta_t = 1/2 \nabla \; \text{log} \pi(\Theta_t) \mathrm{d}t + \mathrm{d}W_t\]where \(\pi(\theta)\) is the target density and \(W\) is a standard multivariate Wiener process.
In general, the above SDE cannot be solved exactly and the below first-order Euler discretisation is used instead,
\[\theta^* = \theta_t + \epsilon^2 1/2 \nabla \; \text{log} \pi(\theta_t) + \epsilon z\]where \(z \sim \mathcal{N}(0, I)\) resulting in a mean \(\mu(\theta^*) = \theta_t + \epsilon^2 1/2 \nabla \; \text{log} \pi(\theta_t)\).
To correct for first-order integration error that is introduced from discretisation, a Metropolis-Hastings acceptance probability is calculated after a step,
\[\alpha = \frac{\pi(\theta^*)q(\theta_t|\theta^*)}{\pi(\theta_t) q(\theta^*|\theta_t)}\]where \(q(\theta_2|\theta_1) = \mathcal{N}(\theta_2|\mu(\theta_1), \epsilon I)\) and \(\theta^*\) is accepted with probability \(\text{min}(1, \alpha)\).
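A minimal NumPy sketch of the unpreconditioned update above (log_pi and grad_log_pi are assumed callables; the preconditioned variant described next simply replaces epsilon by epsilon'):

import numpy as np

def mala_step(theta, log_pi, grad_log_pi, epsilon):
    # Illustrative sketch: Euler proposal plus Metropolis-Hastings correction.
    def mean(x):
        return x + 0.5 * epsilon ** 2 * grad_log_pi(x)

    def log_q(x_to, x_from):
        # log N(x_to | mean(x_from), epsilon^2 I), constants dropped.
        return -0.5 * np.sum((x_to - mean(x_from)) ** 2) / epsilon ** 2

    proposal = mean(theta) + epsilon * np.random.standard_normal(theta.shape)
    log_alpha = (log_pi(proposal) + log_q(theta, proposal)
                 - log_pi(theta) - log_q(proposal, theta))
    return proposal if np.log(np.random.rand()) < log_alpha else theta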
Here we consider a slight variant of the above method discussed in [1], which is to use a preconditioning matrix \(M\) to allow differing degrees of freedom in each dimension.
\[\theta^* = \theta_t + \epsilon'^2 1/2 \nabla \; \text{log} \pi(\theta_t) + \epsilon' z\]
leading to \(q(\theta_2|\theta_1) = \mathcal{N}(\theta_2|\mu(\theta_1), \epsilon')\), where \(\epsilon' = \epsilon \sqrt{M}\) is given by the initial value of sigma0
.Extends
SingleChainMCMC
.References
[1] (1, 2) Girolami, M. and Calderhead, B., 2011. Riemann manifold langevin and hamiltonian monte carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2), pp.123-214. https://doi.org/10.1111/j.1467-9868.2010.00765.x -
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
replace
(current, current_log_pdf, proposed=None)¶ Replaces the internal current position, current LogPDF, and proposed point (if any) by the user-specified values.
This method can only be used once the initial position and LogPDF have been set (so after at least 1 round of ask-and-tell).
This is an optional method, and some samplers may not support it.
-
set_epsilon
(epsilon=None)[source]¶ Sets epsilon, which is the effective step size used in proposals. If epsilon is not specified, then
epsilon = 0.2 * diag(sigma0)
will be used.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[epsilon]
.The effective step size (
epsilon
) isstep_size * scale_vector
.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
-
Metropolis Random Walk MCMC¶
-
class
pints.
MetropolisRandomWalkMCMC
(x0, sigma0=None)[source]¶ Metropolis Random Walk MCMC, as described in [1].
Metropolis using multivariate Gaussian distribution as proposal step, also known as Metropolis Random Walk MCMC. In each iteration (t) of the algorithm, the following occurs:
propose x' ~ N(x_t, Sigma)
generate u ~ U(0, 1)
calculate r = pi(x') / pi(x_t)
if r > u, x_t+1 = x'; otherwise, x_t+1 = x_t

Here, Sigma is the covariance matrix of the proposal.
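A minimal NumPy sketch of one such iteration (log_pi is an assumed callable returning the log target density; illustrative, not the pints implementation):

import numpy as np

def metropolis_rw_step(x, log_pi, sigma):
    # One Metropolis random walk iteration, as in the scheme above.
    proposal = np.random.multivariate_normal(x, sigma)
    if log_pi(proposal) - log_pi(x) > np.log(np.random.rand()):
        return proposal
    return x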
Extends
SingleChainMCMC
.References
[1] “Equation of state calculations by fast computing machines”. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E. (1953) The journal of chemical physics, 21(6), pp.1087-1092 https://doi.org/10.1063/1.1699114 -
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
n_hyper_parameters
()¶ Returns the number of hyper-parameters for this method (see
TunableMethod
).
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
set_hyper_parameters
(x)¶ Sets the hyper-parameters for the method with the given vector of values (see
TunableMethod
).Parameters: x – An array of length n_hyper_parameters
used to set the hyper-parameters.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
-
Monomial-Gamma Hamiltonian MCMC¶
-
class
pints.
MonomialGammaHamiltonianMCMC
(x0, sigma0=None)[source]¶ Implements Monomial Gamma HMC as described in [1] - a generalisation of HMC as described in [2] - involving a non-physical kinetic energy term.
Uses a physical analogy of a particle moving across a landscape under Hamiltonian dynamics to aid efficient exploration of parameter space. Introduces an auxiliary variable – the momentum (p_i) of a particle moving in dimension i of negative log posterior space – which supplements the position (q_i) of the particle in parameter space. The particle’s motion is dictated by solutions to Hamilton’s equations,
\[dq_i/dt = \partial H/\partial p_i, dp_i/dt = - \partial H/\partial q_i.\]
The Hamiltonian is given by,
\[H(q,p) = U(q) + K(p) = -log(p(q|X)p(q)) + \Sigma_{i=1}^{d} ( -g(p_i) + (2/c) \text{log}(1 + \text{exp}(cg(p_i))))\]
where d is the dimensionality of the model, U is the potential energy and K is the kinetic energy term. Note the kinetic energy is the ‘soft’ version described in [1], where,
\[g(p_i) = (1 / m_i) \text{sign}(p_i)|p_i|^{1 / a}\]
To numerically integrate Hamilton’s equations, it is essential to use a symplectic discretisation routine, of which the most typical approach is the leapfrog method,
\[\begin{split}p_i(t + \epsilon/2) &= p_i(t) - (\epsilon/2) dU(q_i)/ dq_i\\ q_i(t + \epsilon) &= q_i(t) + \epsilon d K(p_i(t + \epsilon/2))/dp_i\\ p_i(t + \epsilon) &= p_i(t + \epsilon/2) - (\epsilon/2) dU(q_i + \epsilon)/ dq_i\end{split}\]The derivative of the soft kinetic energy term is given by,
\[d K(p_i)/dp_i = |p_i|^{-1 + 1 / a}\text{sign}(p_i) \times \text{tanh}(c|p_i|^{1/a}\text{sign}(p_i) / {2 m_i}) / {a m_i}\]In particular, the algorithm we implement follows eqs. (4.14)-(4.16) in [2], since we allow different epsilon according to dimension.
Extends
SingleChainMCMC
.References
[1] Towards Unifying Hamiltonian Monte Carlo and Slice Sampling Yizhe Zhang, Xiangyu Wang, Changyou Chen, Ricardo Henao, Kai Fan, Lawrence Cari. Advances in Neural Information Processing Systems (NIPS) [2] (1, 2) MCMC using Hamiltonian dynamics Radford M. Neal, Chapter 5 of the Handbook of Markov Chain Monte Carlo by Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. -
hamiltonian_threshold
()[source]¶ Returns threshold difference in Hamiltonian value from one iteration to next which determines whether an iteration is divergent.
-
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
replace
(current, current_log_pdf, proposed=None)¶ Replaces the internal current position, current LogPDF, and proposed point (if any) by the user-specified values.
This method can only be used once the initial position and LogPDF have been set (so after at least 1 round of ask-and-tell).
This is an optional method, and some samplers may not support it.
-
set_hamiltonian_threshold
(hamiltonian_threshold)[source]¶ Sets threshold difference in Hamiltonian value from one iteration to next which determines whether an iteration is divergent.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[leapfrog_steps, leapfrog_step_size, a, c, mass]
.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
-
set_leapfrog_steps
(steps)[source]¶ Sets the number of leapfrog steps to carry out for each iteration.
-
No-U-Turn MCMC Sampler¶
-
class
pints.
NoUTurnMCMC
(x0, sigma0=None)[source]¶ Implements the No U-Turn Sampler (NUTS) with dual averaging, as described in Algorithm 6 in [1].
Implements the multinomial sampling suggested in [2]. Implements a mass matrix for the dynamics, which is detailed in [2]. Both the step size and the mass matrix are adapted using a combination of the dual averaging detailed in [1], and the windowed adaption for the mass matrix and step size implemented in the Stan library (https://github.com/stan-dev/stan).
Like Hamiltonian Monte Carlo, NUTS imagines a particle moving over negative log-posterior (NLP) space to generate proposals. Naturally, the particle tends to move to locations of low NLP – meaning high posterior density. Unlike HMC, NUTS allows the number of steps taken through parameter space to depend on position, allowing local adaptation.
Note: This sampler is only supported on Python versions 3.3 and newer.
Extends
SingleChainMCMC
.References
[1] (1, 2) Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593-1623. [2] (1, 2) Betancourt, M. (2018). A Conceptual Introduction to Hamiltonian Monte Carlo, https://arxiv.org/abs/1701.02434. -
hamiltonian_threshold
()[source]¶ Returns threshold difference in Hamiltonian value from one iteration to next which determines whether an iteration is divergent.
-
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
max_tree_depth
()[source]¶ Returns the maximum tree depth
D
for the algorithm. For each iteration, the number of leapfrog steps will not be greater than2^D
.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
replace
(current, current_log_pdf, proposed=None)¶ Replaces the internal current position, current LogPDF, and proposed point (if any) by the user-specified values.
This method can only be used once the initial position and LogPDF have been set (so after at least 1 round of ask-and-tell).
This is an optional method, and some samplers may not support it.
-
set_delta
(delta)[source]¶ Sets delta for the NUTS algorithm. This is the goal acceptance probability for the algorithm, used to set the scalar magnitude of the leapfrog step size.
-
set_hamiltonian_threshold
(hamiltonian_threshold)[source]¶ Sets threshold difference in Hamiltonian value from one iteration to next which determines whether an iteration is divergent.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
-
set_max_tree_depth
(max_tree_depth)[source]¶ Sets the maximum tree depth
D
for the algorithm. For each iteration, the number of leapfrog steps will not be greater than2^D
-
set_number_adaption_steps
(n)[source]¶ Sets the number of adaption steps in the NUTS algorithm. This is the number of MCMC steps that are used to determine the best value for epsilon, the scalar magnitude of the leapfrog step size.
-
set_use_dense_mass_matrix
(use_dense_mass_matrix)[source]¶ If
use_dense_mass_matrix
is False then the algorithm uses a diagonal matrix for the mass matrix. If True then a fully dense mass matrix is used.
-
Population MCMC¶
-
class
pints.
PopulationMCMC
(x0, sigma0=None)[source]¶ Creates a chain of samples from a target distribution, using the population MCMC (simulated tempering) routine described in algorithm 1 in [1].
This method uses several chains internally, but only a single one is updated per iteration, and only a single one is returned at the end, hence this method is classified here as a single chain MCMC method.
The algorithm goes through the following steps (after initialising
N
internal chains):1. Mutation: randomly select chain
i
and update the chain using a Markov kernel that admitsp_i
as its invariant distribution.2. Exchange: Select another chain
j
at random from the remaining and swap the parameter vector ofi
andj
with probabilitymin(1, A)
,A = p_i(x_j) * p_j(x_i) / (p_i(x_i) * p_j(x_j))
where
x_i
andx_j
are the current values of chainsi
andj
, respectively, wherep_i = p(theta|data) ^ (1 - T_i)
, wherep(theta|data)
is the target distribution andT_i
is bounded between[0, 1]
and represents a tempering parameter.We use a range of
T = (0,delta_T,...,1)
, wheredelta_T = 1 / num_temperatures
, and the chain withT_i = 0
is the one whose target distribution we want to sample.Extends
SingleChainMCMC
.References
[1] “On population-based simulation for static inference”, Ajay Jasra, David A. Stephens and Christopher C. Holmes, Statistical Computing, 2007. https://doi.org/10.1007/s11222-007-9028-9 -
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
replace
(current, current_log_pdf, proposed=None)¶ Replaces the internal current position, current LogPDF, and proposed point (if any) by the user-specified values.
This method can only be used once the initial position and LogPDF have been set (so after at least 1 round of ask-and-tell).
This is an optional method, and some samplers may not support it.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[n_temperatures]
, wheren_temperatures
is an integer that will be passed toset_temperature_schedule()
.Note that, since the hyper-parameter vector should be 1d (without nesting), setting an explicit temperature schedule is not supported via the hyper-parameter interface.
-
set_temperature_schedule
(schedule=10)[source]¶ Sets a temperature schedule.
If
schedule
is anint
it is interpreted as the number of temperatures and a schedule is generated accordingly.If
schedule
is a list (or array) it is interpreted as a custom temperature schedule.
-
Rao-Blackwell ACMC¶
-
class
pints.
RaoBlackwellACMC
(x0, sigma0=None)[source]¶ Rao-Blackwell adaptive MCMC, as described by Algorithm 3 in [1]. After initialising mu0 and sigma0, in each iteration after initial phase (t), the following steps occur:
theta* ~ N(theta_t, lambda * sigma0)
alpha(theta_t, theta*) = min(1, p(theta*|data) / p(theta_t|data))
u ~ uniform(0, 1)
if alpha(theta_t, theta*) > u:
    theta_t+1 = theta*
else:
    theta_t+1 = theta_t
mu_t+1 = mu_t + gamma_t+1 * (theta_t+1 - mu_t)
sigma_t+1 = sigma_t + gamma_t+1 * (bar((theta_t+1 - mu_t)(theta_t+1 - mu_t)') - sigma_t)
where:
bar(theta_t+1) = alpha(theta_t, theta*) theta* + (1 - alpha(theta_t, theta*)) theta_t
Note that we deviate from the paper in two places:
gamma_t = t^-eta
Y_t+1 ~ N(theta_t, lambda * sigma0) rather than Y_t+1 ~ N(theta_t, sigma0)
Extends
AdaptiveCovarianceMC
.References
[1] A tutorial on adaptive MCMC Christophe Andrieu and Johannes Thoms, Statistical Computing, 2008, 18: 343-373. https://doi.org/10.1007/s11222-008-9110-y -
acceptance_rate
()¶ Returns the current (measured) acceptance rate.
-
ask
()¶
-
eta
()¶ Returns
eta
which controls the rate of adaptation decay, adaptations**(-eta), where eta > 0 to ensure asymptotic ergodicity.
-
in_initial_phase
()¶
-
n_hyper_parameters
()¶
-
needs_initial_phase
()¶
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
replace
(current, current_log_pdf, proposed=None)¶
-
set_eta
(eta)¶ Updates
eta
which controls the rate of adaptation decay, adaptations**(-eta), where eta > 0 to ensure asymptotic ergodicity.
-
set_hyper_parameters
(x)¶ The hyper-parameter vector is
[eta]
.
-
set_initial_phase
(initial_phase)¶
-
set_target_acceptance_rate
(rate=0.234)¶ Sets the target acceptance rate.
-
target_acceptance_rate
()¶ Returns the target acceptance rate.
-
Relativistic MCMC¶
-
class
pints.
RelativisticMCMC
(x0, sigma0=None)[source]¶ Implements Relativistic Monte Carlo as described in [1].
Uses a physical analogy of a particle moving across a landscape under Hamiltonian dynamics to aid efficient exploration of parameter space. Introduces an auxiliary variable – the momentum (p_i) of a particle moving in dimension i of negative log posterior space – which supplements the position (q_i) of the particle in parameter space. The particle’s motion is dictated by solutions to Hamilton’s equations,
\[\begin{split}dq_i/dt &= \partial H/\partial p_i\\ dp_i/dt &= - \partial H/\partial q_i.\end{split}\]
The Hamiltonian is given by,
\[\begin{split}H(q,p) &= U(q) + KE(p)\\ &= -\text{log}(p(q|X)p(q)) + mc^2 (\Sigma_{i=1}^{d} p_i^2 / (m^2 c^2) + 1)^{0.5}\end{split}\]
where d is the dimensionality of the model, m is the scalar ‘mass’ given to each particle (chosen to be 1 as default) and c is the speed of light (chosen to be 10 by default).
To numerically integrate Hamilton’s equations, it is essential to use a symplectic discretisation routine, of which the most typical approach is the leapfrog method,
\[\begin{split}p_i(t + \epsilon/2) &= p_i(t) - (\epsilon/2) d U(q_i(t))/dq_i\\ q_i(t + \epsilon) &= q_i(t) + \epsilon M^{-1}(p_i(t + \epsilon/2)) p_i(t + \epsilon/2)\\ p_i(t + \epsilon) &= p_i(t + \epsilon/2) - (\epsilon/2) d U(q_i(t + \epsilon))/dq_i\end{split}\]where relativistic mass (a scalar) is,
\[M(p) = m (\Sigma_{i=1}^{d} p_i^2 / (m^2 c^2) + 1)^{0.5}\]In particular, the algorithm we implement follows eqs. in section 2.1 of [1].
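For illustration, the scalar relativistic mass above can be computed as follows (a sketch only, not the pints implementation):

import numpy as np

def relativistic_mass(p, m=1.0, c=10.0):
    # M(p) = m * (sum_i p_i^2 / (m^2 c^2) + 1)^0.5
    return m * np.sqrt(np.sum(np.asarray(p) ** 2) / (m ** 2 * c ** 2) + 1.0)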
Extends
SingleChainMCMC
.References
[1] (1, 2) “Relativistic Monte Carlo”. Xiaoyu Lu, Valerio Perrone, Leonard Hasenclever, Yee Whye Teh, Sebastian J. Vollmer, 2017, Proceedings of Machine Learning Research. -
hamiltonian_threshold
()[source]¶ Returns threshold difference in Hamiltonian value from one iteration to next which determines whether an iteration is divergent.
-
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
replace
(current, current_log_pdf, proposed=None)¶ Replaces the internal current position, current LogPDF, and proposed point (if any) by the user-specified values.
This method can only be used once the initial position and LogPDF have been set (so after at least 1 round of ask-and-tell).
This is an optional method, and some samplers may not support it.
-
set_hamiltonian_threshold
(hamiltonian_threshold)[source]¶ Sets threshold difference in Hamiltonian value from one iteration to next which determines whether an iteration is divergent.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[leapfrog_steps, leapfrog_step_size, mass, c]
.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
-
set_leapfrog_steps
(steps)[source]¶ Sets the number of leapfrog steps to carry out for each iteration.
-
Slice Sampling - Doubling MCMC¶
-
class
pints.
SliceDoublingMCMC
(x0, sigma0=None)[source]¶ Implements Slice Sampling with Doubling, as described in [1].
This is a univariate method, which is applied in a Slice-Sampling-within-Gibbs framework to allow MCMC sampling from multivariate models.
Generates samples by sampling uniformly from the volume underneath the posterior (\(f\)). It does so by introducing an auxiliary variable (\(y\)) and by defining a Markov chain.
If the distribution is univariate, sampling follows:
- Calculate the pdf (\(f(x0)\)) of the current sample (\(x0\)).
- Draw a real value (\(y\)) uniformly from (0, f(x0)), defining a horizontal “slice”: \(S = {x: y < f (x)}\). Note that \(x0\) is always within S.
- Find an interval (\(I = (L, R)\)) around \(x0\) that contains all, or much, of the slice.
- Draw a new point (\(x1\)) from the part of the slice within this interval.
If the distribution is multivariate, we apply the univariate algorithm to each variable in turn, where the other variables are set at their current values.
This implementation uses the “Doubling” method to estimate the interval \(I = (L, R)\), as described in [1] Fig. 4. pp.715 and consists of the following steps:
- \(U \sim uniform(0, 1)\)
- \(L = x_0 - wU\)
- \(R = L + w\)
- \(K = p\)
- while \(K > 0\) and \({y < f(L) or y < f(R)}\):
- \(V \sim uniform(0, 1)\)
- if \(V < 0.5\), then \(L = L - (R - L)\) else, \(R = R + (R - L)\)
- \(K = K - 1\)
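A minimal sketch of the doubling expansion above (f is an assumed callable returning the unnormalised pdf; illustrative only, not the pints implementation):

import numpy as np

def doubling_interval(x0, y, f, w=1.0, p=10):
    # Expand an interval around x0 by repeated doubling (see steps above).
    u = np.random.rand()
    left = x0 - w * u
    right = left + w
    k = p
    while k > 0 and (y < f(left) or y < f(right)):
        if np.random.rand() < 0.5:
            left = left - (right - left)
        else:
            right = right + (right - left)
        k -= 1
    return left, right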
Intuitively, the interval I is estimated by expanding the initial interval by producing a sequence of intervals, each twice the size of the previous one, until an interval is found with both ends outside the slice, or until a pre-determined limit is reached. The parameters p (an integer, which determines the limit of slice size) and w (the estimate of typical slice width) are hyperparameters.
To sample from the interval \(I = (L, R)\), such that the sample \(x\) satisfies \(y < f(x)\), we use the “Shrinkage” procedure, which reduces the size of the interval after rejecting a trial point, as defined in [1] Fig. 5. pp.716. This algorithm consists of the following steps:
- \(\bar{L} = L\) and \(\bar{R} = R\)
- Repeat:
- \(U \sim uniform(0, 1)\)
- \(x_1 = \bar{L} + U (\bar{R} - \bar{L})\)
- if \(y < f(x_1)\) and \(Accept(x_1)\), exit loop else: if \(x_1 < x_0\), then \(\bar{L} = x_1\) else \(\bar{R} = x_1\)
Intuitively, we uniformly sample a trial point from the interval I, and subsequently shrink the interval each time a trial point is rejected.
The Accept(x_1) check is required to guarantee detailed balance. We shall refer to this check as the Acceptance Check. Intuitively, it tests whether starting the doubling expansion at x_1 leads to an earlier termination compared to starting it from the current state x_0. The procedure works backward through the intervals that the doubling expansion would pass through to arrive at I when starting from x_1, checking that none of them has both ends outside the slice. The algorithm is described in [1] Fig. 6. pp.717 and it consists of the following steps:
- \(\hat{L} = L\) and \(\hat{R} = R\) and \(D = False\)
- while \(\hat{R} - \hat{L} > 1.1w\):
- M = \((\hat{L} + \hat{R})/2\)
- if {\(x_0 < M\) and \(x_1 >= M\)} or {\(x_0 >= M\) and \(x_1 < M\)}, then \(D = True\)
- if \(x_1 < M\), then \(\hat{R} = M\) else, \(\hat{L} = M\)
- if \(D\) and \(y >= f(\hat{L})\) and \(y >= f(\hat{R})\), then reject proposal
- If the proposal is not rejected in the previous loop, accept it
The multiplication by 1.1 in the while condition in Step 2 guards against possible round-off errors. The variable D tracks whether the intervals that would be generated from x_1 differ from those leading to x_0: when they don’t, time is saved by omitting the subsequent check.
To avoid floating-point underflow, we implement the suggestion advanced in [1] pp.712. We use the log pdf of the un-normalised posterior (\(g(x) = log(f(x))\)) instead of \(f(x)\). In doing so, we use an auxiliary variable \(z = log(y) = g(x0) - \epsilon\), where \(\epsilon \sim \text{exp}(1)\) and define the slice as \(S = {x : z < g(x)}\).
Extends
SingleChainMCMC
.References
[1] Neal, R.M., 2003. Slice sampling. The annals of statistics, 31(3), pp.705-767. https://doi.org/10.1214/aos/1056562461 -
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
replace
(current, current_log_pdf, proposed=None)¶ Replaces the internal current position, current LogPDF, and proposed point (if any) by the user-specified values.
This method can only be used once the initial position and LogPDF have been set (so after at least 1 round of ask-and-tell).
This is an optional method, and some samplers may not support it.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
-
set_width
(w)[source]¶ Sets the width for generating the interval.
This can either be a single number or an array with the same number of elements as the number of variables to update.
Slice Sampling - Rank Shrinking MCMC¶
-
class
pints.
SliceRankShrinkingMCMC
(x0, sigma0=None)[source]¶ Implements Covariance-Adaptive slice sampling by “rank shrinking”, as introduced in [1] with pseudocode given in Fig. 5.
This is an adaptive multivariate method which uses additional points, called “crumbs”, and rejected proposals to guide the selection of samples.
It generates samples by sampling uniformly from the volume underneath the posterior (\(f\)). It does so by introducing an auxiliary variable (\(y\)) that guides the path of a Markov chain.
Sampling follows:
1. Calculate the pdf (\(f(x_0)\)) of the current sample \((x_0)\).
2. Draw a real value (\(y\)) uniformly from \((0, f(x0))\), defining a horizontal “slice”: \(S = {x: y < f(x)}\). Note that \(x_0\) is always within \(S\).
3. Draw the first crumb (\(c_1\)) from a Gaussian distribution with mean \(x_0\) and precision matrix \(W_1\).
4. Draw a new point (\(x_1\)) from a Gaussian distribution with mean \(c_1\) and precision matrix \(W_2\).
New crumbs are drawn until a new proposal is accepted. In particular, after sampling \(k\) crumbs from Gaussian distributions with mean \(x0\) and precision matrices \((W_1, ..., W_k)\), the distribution for the kth proposal sample is:
\[x_k \sim Normal(\bar{c}_k, \Lambda^{-1}_k)\]where:
\(\Lambda_k = W_1 + ... + W_k\)
\(\bar{c}_k = \Lambda^{-1}_k * (W_1 * c_1 + ... + W_k * c_k)\)
This method aims to conveniently modify the (k+1)th proposal distribution to increase the likelihood of sampling an acceptable point. It does so by calculating the gradient (\(g(f(x))\)) of the unnormalised posterior (\(f(x)\)) at the last rejected point (\(x_k\)). It then sets the conditional variance of the (k + 1)th proposal distribution in the direction of the gradient \(g(f(x_k))\) to 0. This is reasonable in that the gradient at a proposal probably points in a direction where the variance is small, so it is more efficient to move in a different direction.
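For illustration, the proposal mean and covariance after k crumbs can be combined as follows (a NumPy sketch of the formulas above, not the pints code):

import numpy as np

def crumb_proposal(crumbs, precisions):
    # crumbs: list of k vectors c_i; precisions: list of k matrices W_i.
    lam = np.sum(precisions, axis=0)
    weighted = np.sum([W @ c for W, c in zip(precisions, crumbs)], axis=0)
    cov = np.linalg.inv(lam)      # Lambda_k^{-1}
    return cov @ weighted, cov    # (mean c_bar_k, covariance)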
To avoid floating-point underflow, we implement the suggestion advanced in [2] pp.712. We use the log pdf of the un-normalised posterior (\(\text{log} f(x)\)) instead of \(f(x)\). In doing so, we use an auxiliary variable \(z = log(y) = \text{log} f(x_0) - \epsilon\), where \(\epsilon \sim \text{exp}(1)\) and define the slice as \(S = {x : z < log f(x)}\).
Extends
SingleChainMCMC
.References
[1] “Covariance-Adaptive Slice Sampling”, 2010, M Thompson and RM Neal, Technical Report No. 1002, Department of Statistics, University of Toronto [2] “Slice sampling”, 2003, Neal, R.M., The annals of statistics, 31(3), pp.705-767. https://doi.org/10.1214/aos/1056562461 -
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
replace
(current, current_log_pdf, proposed=None)¶ Replaces the internal current position, current LogPDF, and proposed point (if any) by the user-specified values.
This method can only be used once the initial position and LogPDF have been set (so after at least 1 round of ask-and-tell).
This is an optional method, and some samplers may not support it.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[sigma_c]
. SeeTunableMethod.set_hyper_parameters()
.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
-
Slice Sampling - Stepout MCMC¶
-
class
pints.
SliceStepoutMCMC
(x0, sigma0=None)[source]¶ Implements Slice Sampling with Stepout, as described in [1].
This is a univariate method, which is applied in a Slice-Sampling-within-Gibbs framework to allow MCMC sampling from multivariate models.
Generates samples by sampling uniformly from the volume underneath the posterior (
f
). It does so by introducing an auxiliary variable (y
) and by defining a Markov chain.
If the distribution is univariate, sampling follows:
- Calculate the PDF (\(f(x0)\)) of the current sample (\(x0\)).
- Draw a real value (\(y\)) uniformly from \((0, f(x0))\), defining a horizontal ‘slice’ \(S = {x: y < f(x)}\). Note that \(x0\) is always within \(S\).
- Find an interval (\(I = (L, R)\)) around \(x0\) that contains all, or much, of the slice.
- Draw a new point (\(x1\)) from the part of the slice within this interval.
If the distribution is multivariate, we apply the univariate algorithm to each variable in turn, where the other variables are set at their current values.
This implementation uses the “Stepout” method to estimate the interval \(I = (L, R)\), as described in [1] Fig. 3. pp.715 and consists of the following steps:
- \(U \sim uniform(0, 1)\)
- \(L = x_0 - wU\)
- \(R = L + w\)
- \(V \sim uniform(0, 1)\)
- \(J = floor(mV)\)
- \(K = (m - 1) - J\)
- while \(J > 0\) and \(y < f(L), L = L - w, J = J - 1\)
- while \(K > 0\) and \(y < f(R), R = R + w, K = K - 1\)
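A minimal sketch of the stepout expansion above (f is an assumed callable returning the unnormalised pdf; illustrative only, not the pints implementation):

import numpy as np

def stepout_interval(x0, y, f, w=1.0, m=50):
    # Expand an interval around x0 in steps of width w (see steps above).
    left = x0 - w * np.random.rand()
    right = left + w
    j = int(np.floor(m * np.random.rand()))
    k = (m - 1) - j
    while j > 0 and y < f(left):
        left -= w
        j -= 1
    while k > 0 and y < f(right):
        right += w
        k -= 1
    return left, right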
Intuitively, the interval
I
is estimated by expanding the initial interval by a widthw
in each direction until both edges fall outside the slice, or until a pre-determined limit is reached. The parametersm
(an integer, which determines the limit of slice size) andw
(the estimate of typical slice width) are hyperparameters.To sample from the interval \(I = (L, R)\), such that the sample
x
satisfies \(y < f(x)\), we use the “Shrinkage” procedure, which reduces the size of the interval after rejecting a trial point, as defined in [1] Fig. 5. pp.716. This algorithm consists of the following steps:- \(\bar{L} = L\) and \(\bar{R} = R\)
- Repeat:
- \(U \sim uniform(0, 1)\)
- \(x_1 = \bar{L} + U (\bar{R} - \bar{L})\)
- if \(y < f(x_1)\) accept \(x_1\) and exit loop, else: if \(x_1 < x_0\), \(\bar{L} = x_1\) else \(\bar{R} = x_1\)
Intuitively, we uniformly sample a trial point from the interval
I
, and subsequently shrink the interval each time a trial point is rejected.The following implementation includes the possibility of carrying out “overrelaxed” slice sampling steps, as described in [1] pp. 726. Overrelaxed steps increase sampling efficiency in highly correlated unimodal distributions by suppressing the random walk behaviour of single-variable slice sampling: each variable is still updated in turn, but rather than drawing a new value for a variable from its conditional distribution independently of the current value, the new value is instead chosen to be on the opposite side of the mode from the current value. The interval
I
is still calculated via Stepout, and the edgesl,r
are used to estimate the slice endpoints via bisection. To obtain a full sampling scheme, overrelaxed updates are alternated with normal Stepout updates. To obtain the full benefits of overrelaxation, [1] suggests to set almost every update to being overrelaxed and to set the limitm
for findingI
to infinity. The algorithm consists of the following steps:- \(\bar{L} = L, \bar{R} = R, \bar{w} = w, \bar{a} = a\)
- while \(R - L < 1.1 * w\):
- \(M = (\bar{L} + \bar{R})/ 2\)
- if \(\bar{a} = 0 \), exit loop
- if \(x_0 > M\), \(\bar{L} = M\) else, \(\bar{R} = M\)
- \(\bar{a} = \bar{a} - 1\)
- \(\bar{w} = \bar{w} / 2\)
- \(\hat{L} = \bar{L}, \hat{R} = \bar{R}\)
- while \(\bar{a} > 0\):
- \(\bar{a} = \bar{a} - 1\)
- \(\bar{w} = \bar{w} / 2\)
- if \(y >= f(\hat{L} + \bar{w})\), then \(\hat{L} = \hat{L} + \bar{w}\)
- if \(y >= f(\hat{R} - \bar{w})\), then \(\hat{R} = \hat{R} - \bar{w}\)
- \(x_1 = \hat{L} + \hat{R} - x_0\)
- if \(x_1 < \bar{L}\) or \(x_1 >= \bar{R}\) or \(y >= f(x_1)\), then \(x_1 = x_0\)
The probability of pursuing an overrelaxed step and the number of bisection iterations are hyperparameters.
To avoid floating-point underflow, we implement the suggestion advanced in [1] pp.712. We use the log pdf of the un-normalised posterior (\(g(x) = log(f(x))\)) instead of \(f(x)\). In doing so, we use an auxiliary variable \(z = log(y) = g(x0) - \epsilon\), where \(\epsilon \sim \text{exp}(1)\) and define the slice as \(S = {x : z < g(x)}\).
Extends
SingleChainMCMC
.References
[1] (1, 2) Neal, R.M., 2003. “Slice sampling”. The annals of statistics, 31(3), pp.705-767. https://doi.org/10.1214/aos/1056562461 -
bisection_steps
()[source]¶ Returns the integer that limits overrelaxation endpoint accuracy to
2^(-bisection steps) * width
.
-
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example an adaptation-free period for adaptive covariance methods, or a warm-up phase for DREAM.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated logpdf.
-
replace
(current, current_log_pdf, proposed=None)¶ Replaces the internal current position, current LogPDF, and proposed point (if any) by the user-specified values.
This method can only be used once the initial position and LogPDF have been set (so after at least 1 round of ask-and-tell).
This is an optional method, and some samplers may not support it.
-
set_bisection_steps
(a)[source]¶ Set integer for limiting the bisection process in overrelaxed steps.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[width, expansion steps, prob_overrelaxed, bisection steps]
. SeeTunableMethod.set_hyper_parameters()
.
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
-
set_width
(w)[source]¶ Sets the width for generating the interval.
This can either be a single number or an array with the same number of elements as the number of variables to update.
MCMC Summary¶
-
class
pints.
MCMCSummary
(chains, time=None, parameter_names=None)[source]¶ Calculates and prints key summaries of posterior samples and diagnostic quantities from MCMC chains.
These include the posterior mean, standard deviation, quantiles, rhat, effective sample size and (if running time is supplied) effective samples per second.
Parameters: - chains – An array or list of chains returned by an MCMC sampler.
- time (float) – The time taken for the run, in seconds (optional).
- parameter_names (sequence) – A list of parameter names (optional).
References
[1] “Inference from iterative simulation using multiple sequences”, A Gelman and D Rubin, 1992, Statistical Science. [2] (1, 2) “Bayesian data analysis”, 3rd edition, CRC Press., A Gelman et al., 2014. -
ess_per_second
()[source]¶ Return the effective sample size (as defined in [2]) per second of run time for each parameter.
This is only defined if a run time was passed in at construction time; if no run time is known,
None
is returned.
-
rhat
()[source]¶ Return Gelman and Rubin’s rhat value as defined in [1]. If a single chain is used, the chain is split into two halves and rhat is calculated using these two parts.
Nested samplers¶
Nested sampler base class¶
-
class
pints.
NestedSampler
(log_prior)[source]¶ Abstract base class for nested samplers.
Parameters: log_prior (pints.LogPrior) – A logprior to draw proposal samples from. -
in_initial_phase
()[source]¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is returned.
-
needs_initial_phase
()[source]¶ Returns
True
if this method needs an initial phase, for example ellipsoidal nested sampling has a period of running rejection sampling before it starts to fit ellipsoids to points.
-
set_initial_phase
(in_initial_phase)[source]¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is returned.
-
tell
(fx)[source]¶ If a single evaluation is provided as an argument, a single point is accepted and returned if its likelihood exceeds the current threshold; otherwise None is returned.
If multiple evaluations are provided as arguments (for example, if running the algorithm in parallel), None is returned if no points have likelihood exceeding threshold; if a single point passes the threshold, it is returned; if multiple points pass, one is selected uniformly at random and returned and the others are stored for later use.
In all cases, two objects are returned: the proposed point (which may be None) and an array of other points that also pass the threshold (which is empty for single evaluation mode but may be non-empty for multiple evaluation mode).
-
-
class
pints.
NestedController
(log_likelihood, log_prior, method=None)[source]¶ Uses nested sampling to sample from a posterior distribution.
Parameters: - log_likelihood (pints.LogPDF) – A
LogPDF
function that evaluates points in the parameter space. - log_prior (pints.LogPrior) – A
LogPrior
function on the same parameter space.
References
[1] “Nested Sampling for General Bayesian Computation”, John Skilling, Bayesian Analysis 1:4 (2006). https://doi.org/10.1214/06-BA127 [2] “Multimodal nested sampling: an efficient and robust alternative to Markov chain Monte Carlo methods for astronomical data analyses” F. Feroz and M. P. Hobson, 2008, Mon. Not. R. Astron. Soc. -
effective_sample_size
()[source]¶ Calculates the effective sample size of posterior samples from a nested sampling run using the formula:
\[ESS = \exp\left(-\sum_{i=1}^{m} p_i \log p_i\right),\]
in other words, the information. Given by eqn. (39) in [1].
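For illustration, given normalised posterior weights p_i this is simply (a sketch, not the pints implementation):

import numpy as np

def nested_ess(weights):
    # ESS = exp(-sum_i p_i log p_i) for normalised weights p_i.
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return np.exp(-np.sum(p * np.log(p)))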
-
iterations
()[source]¶ Returns the total number of iterations that will be performed in the next run.
-
log_likelihood_vector
()[source]¶ Returns vector of log likelihoods for each of the stacked
[m_active, m_inactive]
points.
-
marginal_log_likelihood_standard_deviation
()[source]¶ Calculates standard deviation in marginal log likelihood as in [2].
-
marginal_log_likelihood_threshold
()[source]¶ Returns threshold for determining convergence in estimate of marginal log likelihood which leads to early termination of the algorithm.
-
n_posterior_samples
()[source]¶ Returns the number of posterior samples that will be returned (see
set_n_posterior_samples()
).
-
parallel
()[source]¶ Returns the number of parallel worker processes this routine will be run on, or
False
if parallelisation is disabled.
-
posterior_samples
()[source]¶ Returns posterior samples generated during run of nested sampling object.
-
prior_space
()[source]¶ Returns a vector of X samples which approximates the proportion of prior space compressed.
-
run
()[source]¶ Runs the nested sampling routine and returns a tuple of the posterior samples and an estimate of the marginal likelihood.
-
sample_from_posterior
(posterior_samples)[source]¶ Draws posterior samples based on nested sampling run using importance sampling. This function is automatically called in
NestedController.run()
but can also be called afterwards to obtain new posterior samples.
-
set_iterations
(iterations)[source]¶ Sets the total number of iterations to be performed in the next run.
-
set_log_to_file
(filename=None, csv=False)[source]¶ Enables logging to file when a filename is passed in, disables it if
filename
isFalse
orNone
.The argument
csv
can be set toTrue
to write the file in comma separated value (CSV) format. By default, the file contents will be similar to the output on screen.
-
set_marginal_log_likelihood_threshold
(threshold)[source]¶ Sets threshold for determining convergence in estimate of marginal log likelihood which leads to early termination of the algorithm.
-
set_n_posterior_samples
(posterior_samples)[source]¶ Sets the number of posterior samples to generate from points proposed by the nested sampling algorithm.
-
set_parallel
(parallel=False)[source]¶ Enables/disables parallel evaluation.
If
parallel=True
, the method will run using a number of worker processes equal to the detected cpu core count. The number of workers can be set explicitly by settingparallel
to an integer greater than 0. Parallelisation can be disabled by settingparallel
to0
orFalse
.
- log_likelihood (pints.LogPDF) – A
Nested ellipsoid sampler¶
-
class
pints.
NestedEllipsoidSampler
(log_prior)[source]¶ Creates a nested sampler that estimates the marginal likelihood and generates samples from the posterior.
This is the form of nested sampler described in [1], where an ellipsoid is drawn around surviving particles (typically with an enlargement factor to avoid missing prior mass), and then random samples are drawn from within the bounds of the ellipsoid. By sampling in the space of surviving particles, the efficiency of this algorithm aims to improve upon simple rejection sampling. This algorithm has the following steps:
Initialise:
Z_0 = 0
X_0 = 1
Draw samples from prior:
for i in 1:n_active_points:
    theta_i ~ p(theta), i.e. sample from the prior
    L_i = p(theta_i|X)
endfor
L_min = min(L)
indexmin = min_index(L)
Run rejection sampling for
n_rejection_samples
to generate an initial sample, along with updated values ofL_min
andindexmin
.Fit active points using a minimum volume bounding ellipse. In our approach, we do this with the following procedure (which we term
minimum_volume_ellipsoid
in what follows) that returns the positive definite matrix A with centre c that defines the ellipsoid by \((x - c)^t A (x - c) = 1\):
cov = covariance(transpose(active_points))
cov_inv = inv(cov)
c = mean(points)
for i in n_active_points:
    dist[i] = (points[i] - c) * cov_inv * (points[i] - c)
endfor
enlargement_factor = max(dist)
A = (1.0 / enlargement_factor) * cov_inv
return A, c
From then on, in each iteration (t), the following occurs:
if mod(t, ellipsoid_update_gap) == 0:
    A, c = minimum_volume_ellipsoid(active_points)
else:
    if dynamic_enlargement_factor:
        enlargement_factor *= (
            exp(-(t + 1) / n_active_points)**alpha
        )
    endif
endif
theta* = ellipsoid_sample(enlargement_factor, A, c)
while p(theta*|X) < L_min:
    theta* = ellipsoid_sample(enlargement_factor, A, c)
endwhile
theta_indexmin = theta*
L_indexmin = p(theta*|X)
If the parameter
dynamic_enlargement_factor
is true, the enlargement factor is shrunk as the sampler runs, to avoid inefficiencies in later iterations. By default, the enlargement factor begins at 1.1.In
ellipsoid_sample
, a point is drawn uniformly from within the minimum volume ellipsoid, whose volume is increased by a factorenlargement_factor
.At the end of iterations, there is a final
Z
increment:
Z = Z + (1 / n_active_points) * (L_1 + L_2 + ... + L_n_active_points)
The posterior samples are generated as described in [2] on page 849 by weighting each dropped sample in proportion to the volume of the posterior region it was sampled from. That is, the probability for drawing a given sample j is given by:
p_j = L_j * w_j / Z
where j = 1, …, n_iterations.
Extends
NestedSampler
.References
[1] “A nested sampling algorithm for cosmological model selection”, Pia Mukherjee, David Parkinson, Andrew R. Liddle, 2006. arXiv:astro-ph/0508461v2. https://doi.org/10.1086/501068
-
active_points
()¶ Returns the active points from nested sampling run.
-
alpha
()[source]¶ Returns alpha which controls rate of decline of enlargement factor with iteration (when dynamic_enlargement_factor is true).
-
ask
(n_points)[source]¶ If in initial phase, then uses rejection sampling. Afterwards, points are drawn from within an ellipse (needs to be in uniform sampling regime).
-
ellipsoid_update_gap
()[source]¶ Returns the ellipsoid update gap used in the algorithm (see
set_ellipsoid_update_gap()
).
-
enlargement_factor
()[source]¶ Returns the enlargement factor used in the algorithm (see
set_enlargement_factor()
).
-
min_index
()¶ Returns index of sample with lowest log-likelihood.
-
n_active_points
()¶ Returns the number of active points that will be used in next run.
-
n_rejection_samples
()[source]¶ Returns the number of rejection samples used in the algorithm (see
set_n_rejection_samples()
).
-
needs_sensitivities
()¶ Determines whether sampler uses sensitivities of the solution.
-
running_log_likelihood
()¶ Returns current value of the threshold log-likelihood value.
-
set_alpha
(alpha)[source]¶ Sets alpha which controls rate of decline of enlargement factor with iteration (when dynamic_enlargement_factor is true).
-
set_ellipsoid_update_gap
(ellipsoid_update_gap=100)[source]¶ Sets the frequency with which the minimum volume ellipsoid is re-estimated as part of the nested rejection sampling algorithm.
Updating the ellipsoid more frequently means each sample is produced more efficiently, but re-computing the ellipsoid has a cost, so it is usually better not to update it at every iteration; instead, updates are made every
ellipsoid_update_gap
iterations. By default, the ellipsoid is updated every 100 iterations.
-
set_enlargement_factor
(enlargement_factor=1.1)[source]¶ Sets the factor (>1) by which to increase the minimum volume ellipsoid in rejection sampling.
A higher value means it is less likely that areas of high probability mass will be missed. A low value means that rejection sampling is more efficient.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[# active points, # rejection samples, enlargement factor, ellipsoid update gap, dynamic enlargement factor, alpha]
.
-
set_n_active_points
(active_points)¶ Sets the number of active points for the next run.
-
set_n_rejection_samples
(rejection_samples=200)[source]¶ Sets the number of rejection samples to take, which will be assigned weights and ultimately produce a set of posterior samples.
-
tell
(fx)¶ If a single evaluation is provided as arguments, a single point is accepted and returned if its likelihood exceeds the current threshold; otherwise None is returned.
If multiple evaluations are provided as arguments (for example, if running the algorithm in parallel), None is returned if no points have likelihood exceeding threshold; if a single point passes the threshold, it is returned; if multiple points pass, one is selected uniformly at random and returned and the others are stored for later use.
In all cases, two objects are returned: the proposed point (which may be None) and an array of other points that also pass the threshold (which is empty for single evaluation mode but may be non-empty for multiple evaluation mode).
-
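With log_likelihood and log_prior set up as in the NestedController sketch above, the ellipsoidal sampler can be selected via the controller's method argument, for example:
controller = pints.NestedController(
    log_likelihood, log_prior, method=pints.NestedEllipsoidSampler)
controller.set_iterations(4000)
samples = controller.run()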
Nested rejection sampler¶
-
class
pints.
NestedRejectionSampler
(log_prior)[source]¶ Creates a nested sampler that estimates the marginal likelihood and generates samples from the posterior.
This is the simplest form of nested sampler: it uses rejection sampling from the prior, as described in the algorithm on page 839 in [1], to estimate the marginal likelihood and to generate the weights and preliminary samples (with their respective likelihoods) required to generate posterior samples.
The posterior samples are generated as described in [1] on page 849 by randomly sampling the preliminary point, accounting for their weights and likelihoods.
Initialise:
Z = 0
X_0 = 1
Draw samples from prior:
for i in 1:n_active_points:
    theta_i ~ p(theta), i.e. sample from the prior
    L_i = p(theta_i|X)
endfor
In each iteration of the algorithm (t):
L_min = min(L)
indexmin = min_index(L)
X_t = exp(-t / n_active_points)
w_t = X_t - X_t-1
Z = Z + L_min * w_t
theta* ~ p(theta)
while p(theta*|X) < L_min:
    theta* ~ p(theta)
endwhile
theta_indexmin = theta*
L_indexmin = p(theta*|X)
At the end of iterations, there is a final
Z
increment:
Z = Z + (1 / n_active_points) * (L_1 + L_2 + ... + L_n_active_points)
The posterior samples are generated as described in [1] on page 849 by weighting each dropped sample in proportion to the volume of the posterior region it was sampled from. That is, the probability for drawing a given sample j is given by:
p_j = L_j * w_j / Z
where j = 1, …, n_iterations.
Extends
NestedSampler
.References
[1] “Nested Sampling for General Bayesian Computation”, John Skilling, Bayesian Analysis 1:4 (2006). https://doi.org/10.1214/06-BA127
-
active_points
()¶ Returns the active points from nested sampling run.
-
in_initial_phase
()¶ For methods that need an initial phase (see
needs_initial_phase()
), this method returnsTrue
if the method is currently configured to be in its initial phase. For other methods aNotImplementedError
is raised.
-
min_index
()¶ Returns index of sample with lowest log-likelihood.
-
n_active_points
()¶ Returns the number of active points that will be used in next run.
-
needs_initial_phase
()¶ Returns
True
if this method needs an initial phase, for example ellipsoidal nested sampling has a period of running rejection sampling before it starts to fit ellipsoids to points.
-
needs_sensitivities
()¶ Determines whether sampler uses sensitivities of the solution.
-
running_log_likelihood
()¶ Returns current value of the threshold log-likelihood value.
-
set_hyper_parameters
(x)[source]¶ Hyper-parameter vector is:
[active_points_rate]
Parameters: x – An array of length n_hyper_parameters
used to set the hyper-parameters
-
set_initial_phase
(in_initial_phase)¶ For methods that need an initial phase (see
needs_initial_phase()
), this method toggles the initial phase algorithm. For other methods aNotImplementedError
is raised.
-
set_n_active_points
(active_points)¶ Sets the number of active points for the next run.
-
tell
(fx)¶ If a single evaluation is provided as arguments, a single point is accepted and returned if its likelihood exceeds the current threshold; otherwise None is returned.
If multiple evaluations are provided as arguments (for example, if running the algorithm in parallel), None is returned if no points have likelihood exceeding threshold; if a single point passes the threshold, it is returned; if multiple points pass, one is selected uniformly at random and returned and the others are stored for later use.
In all cases, two objects are returned: the proposed point (which may be None) and an array of other points that also pass the threshold (which is empty for single evaluation mode but may be non-empty for multiple evaluation mode).
-
Noise generators¶
Pints contains a module
pints.noise
with methods that generate different kinds of noise. This can then be added to simulation output to create “realistic” experimental data.
Overview:
-
pints.noise.
ar1
(rho, sigma, n)[source]¶ Generates first-order autoregressive (AR1) noise that can be added to a vector of simulated data.
The generated noise follows the distribution
\[e(t) = \rho e(t - 1) + v(t),\]where \(v(t) \stackrel{\text{iid}}{\sim }\mathcal{N}(0, \sigma \sqrt{1 - \rho ^2})\).
Returns an array of length
n
containing the generated noise.Parameters: - rho – Determines the magnitude of the noise \(\rho\) (see above). Must be less than 1.
- sigma – The marginal standard deviation \(\sigma\) of
e(t)
(see above). Must be greater than zero. - n – The length of the signal. (Only single time-series are supported.)
Example
values = model.simulate(parameters, times)
noisy_values = values + noise.ar1(0.9, 5, len(values))
-
pints.noise.
ar1_unity
(rho, sigma, n)[source]¶ Generates noise following an autoregressive order 1 process with mean 1, which a vector of simulated data can be multiplied by.
Returns an array of length
n
containing the generated noise.Parameters: - rho – Determines the magnitude of the noise (see
ar1()
). Must be less than or equal to 1. - sigma – The marginal standard deviation of
e(t)
(see ar1()
). Must be greater than 0. - n (int) – The length of the signal. (Only single time-series are supported.)
Example
values = model.simulate(parameters, times)
noisy_values = values * noise.ar1_unity(0.5, 0.8, len(values))
- rho – Determines the magnitude of the noise (see
-
pints.noise.
arma11
(rho, theta, sigma, n)[source]¶ Generates an ARMA(1,1) error process of the form:
\[e(t) = (1 - \rho) + \rho * e(t - 1) + v(t) + \theta * v(t-1),\]where \(v(t) \stackrel{\text{iid}}{\sim }\mathcal{N}(0, \sigma ')\), and
\[\sigma ' = \sigma \sqrt{\frac{1 - \rho ^ 2}{1 + 2 \theta \rho + \theta ^ 2}}.\]
-
pints.noise.
arma11_unity
(rho, theta, sigma, n)[source]¶ Generates an ARMA(1,1) error process of the form:
e(t) = (1 - rho) + rho * e(t - 1) + v(t) + theta * v[t-1]
,where
v(t) ~ iid N(0, sigma')
,and
sigma' = sigma * sqrt((1 - rho^2) / (1 + 2 * theta * rho + theta^2))
.Returns an array of length
n
containing the generated noise.Parameters: - rho – Determines the long-run persistence of the noise (see
ar1()
). Must be less than 1. - theta – Contributes to first order autocorrelation of noise. Must be less than 1.
- sigma – The marginal standard deviation of
e(t)
(see ar1()
). Must be greater than 0. - n (int) – The length of the signal. (Only single time-series are supported.)
Example
values = model.simulate(parameters, times)
noisy_values = values * noise.arma11_unity(0.5, 0.5, 0.8, len(values))
- rho – Determines the long-run persistence of the noise (see
-
pints.noise.
independent
(sigma, shape)[source]¶ Generates independent Gaussian noise iid \(\mathcal{N}(0,\sigma)\).
Returns an array of shape
shape
containing the generated noise.Parameters: - sigma – The standard deviation of the noise. Must be zero or greater.
- shape – A tuple (or sequence) defining the shape of the generated noise array.
Example
values = model.simulate(parameters, times)
noisy_values = values + noise.independent(5, values.shape)
-
pints.noise.
multiplicative_gaussian
(eta, sigma, f)[source]¶ Generates multiplicative Gaussian noise for a single output.
With multiplicative noise, the measurement error scales with the magnitude of the output. Given a model taking the form,
\[X(t) = f(t; \theta) + \epsilon(t)\]multiplicative Gaussian noise models the noise term as:
\[\epsilon(t) = f(t; \theta)^\eta v(t)\]where v(t) is iid Gaussian:
\[v(t) \stackrel{\text{ iid }}{\sim} \mathcal{N}(0, \sigma)\]The output magnitudes
f
are required as an input to this function. The noise terms are returned in an array of the same shape asf
.Parameters: - eta – The exponential power controlling the rate at which the noise scales with the output. The argument must be either a float (for single-output or multi-output noise) or an array_like of floats (for multi-output noise only, with one value for each output).
- sigma – The baseline standard deviation of the noise (must be greater than zero). The argument must be either a float (for single-output or multi-output noise) or an array_like of floats (for multi-output noise only, with one value for each output).
- f – A NumPy array giving the time-series for the output over time. For
multiple outputs, the array should have shape
(n_outputs, n_times)
.
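Example (a sketch in the same style as the examples above; model, parameters, times and the noise module are assumed to be defined, and the eta and sigma values are illustrative):
values = model.simulate(parameters, times)
noisy_values = values + noise.multiplicative_gaussian(1.0, 0.1, values)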
Optimisers¶
Pints provides a number of optimisers, all implementing the Optimiser
interface, that can be used to find the parameters that minimise an
ErrorMeasure
or maximise a LogPDF
.
The easiest way to run an optimisation is by using the optimise()
method
or the OptimisationController
class.
Running an optimisation¶
-
pints.
optimise
(function, x0, sigma0=None, boundaries=None, transformation=None, method=None)[source]¶ Finds the parameter values that minimise an
ErrorMeasure
or maximise aLogPDF
.Parameters: - function – An
pints.ErrorMeasure
or apints.LogPDF
that evaluates points in the parameter space. - x0 – The starting point for searches in the parameter space. This value may
be used directly (for example as the initial position of a particle in
PSO
) or indirectly (for example as the center of a distribution inXNES
). - sigma0 – An optional initial standard deviation around
x0
. Can be specified either as a scalar value (one standard deviation for all coordinates) or as an array with one entry per dimension. Not all methods will use this information. - boundaries – An optional set of boundaries on the parameter space.
- transformation – An optional
pints.Transformation
to allow the optimiser to search in a transformed parameter space. If used, points shown or returned to the user will first be detransformed back to the original space. - method – The class of
pints.Optimiser
to use for the optimisation. If no method is specified,CMAES
is used.
Returns: - xbest (numpy array) – The best parameter set obtained
- fbest (float) – The corresponding score.
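Example (a sketch assuming pints.toy is available; the error measure, starting point and choice of method are illustrative):
import numpy as np
import pints
import pints.toy

# Fit a toy logistic model to synthetic data by minimising a sum-of-squares error
model = pints.toy.LogisticModel()
times = np.linspace(0, 1000, 100)
values = model.simulate([0.015, 500], times)
problem = pints.SingleOutputProblem(model, times, values)
error = pints.SumOfSquaresError(problem)

xbest, fbest = pints.optimise(error, [0.01, 450], method=pints.XNES)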
-
class
pints.
OptimisationController
(function, x0, sigma0=None, boundaries=None, transformation=None, method=None)[source]¶ Finds the parameter values that minimise an
ErrorMeasure
or maximise aLogPDF
.Parameters: - function – An
pints.ErrorMeasure
or apints.LogPDF
that evaluates points in the parameter space. - x0 – The starting point for searches in the parameter space. This value may
be used directly (for example as the initial position of a particle in
PSO
) or indirectly (for example as the center of a distribution inXNES
). - sigma0 – An optional initial standard deviation around
x0
. Can be specified either as a scalar value (one standard deviation for all coordinates) or as an array with one entry per dimension. Not all methods will use this information. - boundaries – An optional set of boundaries on the parameter space.
- transformation – An optional
pints.Transformation
to allow the optimiser to search in a transformed parameter space. If used, points shown or returned to the user will first be detransformed back to the original space. - method – The class of
pints.Optimiser
to use for the optimisation. If no method is specified,CMAES
is used.
-
evaluations
()[source]¶ Returns the number of evaluations performed during the last run, or
None
if the controller hasn’t run yet.
-
iterations
()[source]¶ Returns the number of iterations performed during the last run, or
None
if the controller hasn’t run yet.
-
max_iterations
()[source]¶ Returns the maximum iterations if this stopping criterion is set, or
None
if it is not. Seeset_max_iterations()
.
-
max_unchanged_iterations
()[source]¶ Returns a tuple
(iterations, threshold)
specifying a maximum unchanged iterations stopping criterion, or(None, None)
if no such criterion is set. Seeset_max_unchanged_iterations()
.
-
parallel
()[source]¶ Returns the number of parallel worker processes this routine will be run on, or
False
if parallelisation is disabled.
-
run
()[source]¶ Runs the optimisation, returns a tuple
(xbest, fbest)
.An optional
callback
function can be passed in that will be called at the end of every iteration. The callback should take the arguments(iteration, optimiser)
, whereiteration
is the iteration count (an integer) andoptimiser
is the optimiser object.
-
set_callback
(cb=None)[source]¶ Allows a “callback” function to be passed in that will be called at the end of every iteration.
This can be used for e.g. visualising optimiser progress.
Example:
def cb(opt):
    plot(opt.xbest())

opt.set_callback(cb)
-
set_log_interval
(iters=20, warm_up=3)[source]¶ Changes the frequency with which messages are logged.
Parameters: - iters – A log message will be shown every
iters
iterations. - warm_up – A log message will be shown every iteration, for the first
warm_up
iterations.
- interval – A log message will be shown every
-
set_log_to_file
(filename=None, csv=False)[source]¶ Enables logging to file when a filename is passed in, disables it if
filename
isFalse
orNone
.The argument
csv
can be set toTrue
to write the file in comma separated value (CSV) format. By default, the file contents will be similar to the output on screen.
-
set_max_iterations
(iterations=10000)[source]¶ Adds a stopping criterion, allowing the routine to halt after the given number of
iterations
.This criterion is enabled by default. To disable it, use
set_max_iterations(None)
.
-
set_max_unchanged_iterations
(iterations=200, threshold=1e-11)[source]¶ Adds a stopping criterion, allowing the routine to halt if the objective function doesn’t change by more than
threshold
for the given number ofiterations
.This criterion is enabled by default. To disable it, use
set_max_unchanged_iterations(None)
.
-
set_parallel
(parallel=False)[source]¶ Enables/disables parallel evaluation.
If
parallel=True
, the method will run using a number of worker processes equal to the detected cpu core count. The number of workers can be set explicitly by settingparallel
to an integer greater than 0. Parallelisation can be disabled by settingparallel
to0
orFalse
.
-
set_threshold
(threshold)[source]¶ Adds a stopping criterion, allowing the routine to halt once the objective function goes below a set
threshold
.This criterion is disabled by default, but can be enabled by calling this method with a valid
threshold
. To disable it, use set_threshold(None)
.
-
threshold
()[source]¶ Returns the threshold stopping criterion, or
None
if no threshold stopping criterion is set. Seeset_threshold()
.
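Example (a sketch of the controller interface with stopping criteria, parallelisation and file logging configured; error and the starting point are assumed to be set up as in the optimise() sketch above):
opt = pints.OptimisationController(error, [0.01, 450], method=pints.CMAES)
opt.set_max_iterations(2000)
opt.set_max_unchanged_iterations(100, threshold=1e-9)
opt.set_parallel(True)
opt.set_log_to_file('optimisation.csv', csv=True)
xbest, fbest = opt.run()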
-
class
pints.
Optimisation
(function, x0, sigma0=None, boundaries=None, transformation=None, method=None)[source]¶ Deprecated alias for
OptimisationController
.-
evaluations
()¶ Returns the number of evaluations performed during the last run, or
None
if the controller hasn’t run yet.
-
iterations
()¶ Returns the number of iterations performed during the last run, or
None
if the controller hasn’t run yet.
-
max_iterations
()¶ Returns the maximum iterations if this stopping criterion is set, or
None
if it is not. Seeset_max_iterations()
.
-
max_unchanged_iterations
()¶ Returns a tuple
(iterations, threshold)
specifying a maximum unchanged iterations stopping criterion, or(None, None)
if no such criterion is set. Seeset_max_unchanged_iterations()
.
-
optimiser
()¶ Returns the underlying optimiser object, allowing detailed configuration.
-
parallel
()¶ Returns the number of parallel worker processes this routine will be run on, or
False
if parallelisation is disabled.
-
run
()¶ Runs the optimisation, returns a tuple
(xbest, fbest)
.An optional
callback
function can be passed in that will be called at the end of every iteration. The callback should take the arguments(iteration, optimiser)
, whereiteration
is the iteration count (an integer) andoptimiser
is the optimiser object.
-
set_callback
(cb=None)¶ Allows a “callback” function to be passed in that will be called at the end of every iteration.
This can be used for e.g. visualising optimiser progress.
Example:
def cb(opt):
    plot(opt.xbest())

opt.set_callback(cb)
-
set_log_interval
(iters=20, warm_up=3)¶ Changes the frequency with which messages are logged.
Parameters: - iters – A log message will be shown every
iters
iterations. - warm_up – A log message will be shown every iteration, for the first
warm_up
iterations.
- interval – A log message will be shown every
-
set_log_to_file
(filename=None, csv=False)¶ Enables logging to file when a filename is passed in, disables it if
filename
isFalse
orNone
.The argument
csv
can be set toTrue
to write the file in comma separated value (CSV) format. By default, the file contents will be similar to the output on screen.
-
set_log_to_screen
(enabled)¶ Enables or disables logging to screen.
-
set_max_iterations
(iterations=10000)¶ Adds a stopping criterion, allowing the routine to halt after the given number of
iterations
.This criterion is enabled by default. To disable it, use
set_max_iterations(None)
.
-
set_max_unchanged_iterations
(iterations=200, threshold=1e-11)¶ Adds a stopping criterion, allowing the routine to halt if the objective function doesn’t change by more than
threshold
for the given number ofiterations
.This criterion is enabled by default. To disable it, use
set_max_unchanged_iterations(None)
.
-
set_parallel
(parallel=False)¶ Enables/disables parallel evaluation.
If
parallel=True
, the method will run using a number of worker processes equal to the detected cpu core count. The number of workers can be set explicitly by settingparallel
to an integer greater than 0. Parallelisation can be disabled by settingparallel
to0
orFalse
.
-
set_threshold
(threshold)¶ Adds a stopping criterion, allowing the routine to halt once the objective function goes below a set
threshold
.This criterion is disabled by default, but can be enabled by calling this method with a valid
threshold
. To disable it, use set_threshold(None)
.
-
threshold
()¶ Returns the threshold stopping criterion, or
None
if no threshold stopping criterion is set. Seeset_threshold()
.
-
time
()¶ Returns the time needed for the last run, in seconds, or
None
if the controller hasn’t run yet.
-
Optimiser base classes¶
-
class
pints.
Optimiser
(x0, sigma0=None, boundaries=None)[source]¶ Base class for optimisers implementing an ask-and-tell interface.
This interface provides fine-grained control. Users seeking to simply run an optimisation may wish to use the
OptimisationController
instead.Optimisation using “ask-and-tell” proceed by the user repeatedly “asking” the optimiser for points, and then “telling” it the function evaluations at those points. This allows a user to have fine-grained control over an optimisation, and implement custom parallelisation, logging, stopping criteria etc. Users who don’t need this functionality can use optimisers via the
OptimisationController
class instead.All PINTS optimisers are _minimisers_. To maximise a function simply pass in the negative of its evaluations to
tell()
(this is handled automatically by theOptimisationController
).All optimisers implement the
pints.Loggable
andpints.TunableMethod
interfaces.Parameters: - x0 – A starting point for searches in the parameter space. This value may be
used directly (for example as the initial position of a particle in
PSO
) or indirectly (for example as the center of a distribution inXNES
). - sigma0 – An optional initial standard deviation around
x0
. Can be specified either as a scalar value (one standard deviation for all coordinates) or as an array with one entry per dimension. Not all methods will use this information. - boundaries – An optional set of boundaries on the parameter space.
Example
An optimisation with ask-and-tell, proceeds roughly as follows:
optimiser = MyOptimiser()
running = True
while running:
    # Ask for points to evaluate
    xs = optimiser.ask()

    # Evaluate the score function or pdf at these points
    # At this point, code to parallelise evaluation can be added in
    fs = [f(x) for x in xs]

    # Tell the optimiser the evaluations; allowing it to update its
    # internal state.
    optimiser.tell(fs)

    # Check stopping criteria
    # At this point, custom stopping criteria can be added in
    if optimiser.fbest() < threshold:
        running = False

    # Check for optimiser issues
    if optimiser.stop():
        running = False

    # At this point, code to visualise or benchmark optimiser behaviour
    # could be added in, for example by plotting `xs` in the parameter
    # space.
-
n_hyper_parameters
()¶ Returns the number of hyper-parameters for this method (see
TunableMethod
).
-
needs_sensitivities
()[source]¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated error.
-
set_hyper_parameters
(x)¶ Sets the hyper-parameters for the method with the given vector of values (see
TunableMethod
).Parameters: x – An array of length n_hyper_parameters
used to set the hyper-parameters.
-
stop
()[source]¶ Checks if this method has run into trouble and should terminate. Returns
False
if everything’s fine, or a short message (e.g. “Ill-conditioned matrix.”) if the method should terminate.
-
tell
(fx)[source]¶ Performs an iteration of the optimiser algorithm, using the evaluations
fx
of the pointsx
previously specified byask
.For methods that require sensitivities (see
needs_sensitivities()
),fx
should be a tuple(objective, sensitivities)
, containing the values returned bypints.ErrorMeasure.evaluateS1()
.
- x0 – A starting point for searches in the parameter space. This value may be
used directly (for example as the initial position of a particle in
-
class
pints.
PopulationBasedOptimiser
(x0, sigma0=None, boundaries=None)[source]¶ Base class for optimisers that work by moving multiple points through the search space.
Extends
Optimiser
.-
ask
()¶ Returns a list of positions in the search space to evaluate.
-
fbest
()¶ Returns the objective function evaluated at the current best position.
-
name
()¶ Returns this method’s full name.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated error.
-
population_size
()[source]¶ Returns this optimiser’s population size.
If no explicit population size has been set,
None
may be returned. Once running, the correct value will always be returned.
-
running
()¶ Returns
True
if an optimisation is in progress.
-
set_population_size
(population_size=None)[source]¶ Sets a population size to use in this optimisation.
If population_size is set to
None
, the population size will be set using the heuristicsuggested_population_size()
.
-
stop
()¶ Checks if this method has run into trouble and should terminate. Returns
False
if everything’s fine, or a short message (e.g. “Ill-conditioned matrix.”) if the method should terminate.
-
suggested_population_size
(round_up_to_multiple_of=None)[source]¶ Returns a suggested population size for this method, based on the dimension of the search space (e.g. the parameter space).
If the optional argument
round_up_to_multiple_of
is set to an integer greater than 1, the method will round up the estimate to a multiple of that number. This can be useful to obtain a population size based on e.g. the number of worker processes used to perform objective function evaluations.
-
tell
(fx)¶ Performs an iteration of the optimiser algorithm, using the evaluations
fx
of the pointsx
previously specified byask
.For methods that require sensitivities (see
needs_sensitivities()
),fx
should be a tuple(objective, sensitivities)
, containing the values returned bypints.ErrorMeasure.evaluateS1()
.
-
xbest
()¶ Returns the current best position.
-
Convenience methods¶
-
pints.
fmin
(f, x0, args=None, boundaries=None, threshold=None, max_iter=None, max_unchanged=200, verbose=False, parallel=False, method=None)[source]¶ Minimises a callable function
f
, starting from positionx0
, using apints.Optimiser
.Returns a tuple
(xbest, fbest)
with the best position found, and the corresponding valuefbest = f(xbest)
.Parameters: - f – A function or callable class to be minimised.
- x0 – The initial point to search at. Must be a 1-dimensional sequence (e.g. a list or a numpy array).
- args – An optional tuple of extra arguments for
f
. - boundaries – An optional
pints.Boundaries
object or a tuple(lower, upper)
specifying lower and upper boundaries for the search. If no boundaries are provided an unbounded search is run. - threshold – An optional absolute threshold stopping criterium.
- max_iter – An optional maximum number of iterations stopping criterium.
- max_unchanged – A stopping criterion based on the maximum number of successive
iterations without a signficant change in
f
(seepints.OptimisationController()
). - verbose – Set to
True
to print progress messages to the screen. - parallel – Allows parallelisation to be enabled.
If set to
True
, the evaluations will happen in parallel using a number of worker processes equal to the detected cpu core count. The number of workers can be set explicitly by settingparallel
to an integer greater than 0. - method – The
pints.Optimiser
to use. If no method is specified,pints.CMAES
is used.
Example
import pints

def f(x):
    return (x[0] - 3) ** 2 + (x[1] + 5) ** 2

xopt, fopt = pints.fmin(f, [1, 1])
-
pints.
curve_fit
(f, x, y, p0, boundaries=None, threshold=None, max_iter=None, max_unchanged=200, verbose=False, parallel=False, method=None)[source]¶ Fits a function
f(x, *p)
to a dataset(x, y)
by finding the value ofp
for whichsum((y - f(x, *p))**2) / n
is minimised (wheren
is the number of entries iny
).Returns a tuple
(xbest, fbest)
with the best position found, and the corresponding valuefbest = f(xbest)
.Parameters: - f (callable) – A function or callable class to be minimised.
- x – The values of an independent variable, at which
y
was recorded. - y – Measured values
y = f(x, p) + noise
. - p0 – An initial guess for the optimal parameters
p
. - boundaries – An optional
pints.Boundaries
object or a tuple(lower, upper)
specifying lower and upper boundaries for the search. If no boundaries are provided an unbounded search is run. - threshold – An optional absolute threshold stopping criterium.
- max_iter – An optional maximum number of iterations stopping criterium.
- max_unchanged – A stopping criterion based on the maximum number of successive
iterations without a signficant change in
f
(seepints.OptimisationController()
). - verbose – Set to
True
to print progress messages to the screen. - parallel – Allows parallelisation to be enabled.
If set to
True
, the evaluations will happen in parallel using a number of worker processes equal to the detected cpu core count. The number of workers can be set explicitly by settingparallel
to an integer greater than 0. - method – The
pints.Optimiser
to use. If no method is specified,pints.CMAES
is used.
Returns: - xbest (numpy array) – The best parameter set obtained.
- fbest (float) – The corresponding score.
Example
import numpy as np
import pints

def f(x, a, b, c):
    return a + b * x + c * x ** 2

x = np.linspace(-5, 5, 100)
y = f(x, 1, 2, 3) + np.random.normal(0, 1, len(x))
p0 = [0, 0, 0]
popt = pints.curve_fit(f, x, y, p0)
Boundary transformations¶
-
class
pints.
TriangleWaveTransform
(boundaries)[source]¶ Transforms from unbounded to (rectangular) bounded parameter space using a periodic triangle-wave transform.
Note: The transform is applied _inside_ optimisation methods; there is no need to wrap this around your own problem or score function.
This can be applied as a transformation on
x
to implement _rectangular_ boundaries in methods with no natural boundary mechanism. It effectively mirrors the search space at every boundary, leading to a continuous (but non-smooth) periodic landscape. While this effectively creates an infinite number of minima/maxima, each one maps to the same point in parameter space.It should work well for methods that maintain a single search position or a single search distribution (e.g.
CMAES
,xNES
,SNES
), which will end up in one of the many mirror images. However, for methods that use independent search particles (e.g.PSO
) it could lead to a scattered population, with different particles exploring different mirror images. Other strategies should be used for such problems.
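Example (a sketch only, assuming the transform can be called directly on a parameter vector, as suggested above):
import numpy as np
import pints

boundaries = pints.RectangularBoundaries([0, 0], [10, 20])
transform = pints.TriangleWaveTransform(boundaries)

# Points outside the rectangle are mirrored back inside it
x = np.array([12.0, -3.0])
print(transform(x))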
Bare-bones CMA-ES¶
-
class
pints.
BareCMAES
(x0, sigma0=0.1, boundaries=None)[source]¶ Finds the best parameters using the CMA-ES method described in [1, 2], using a bare bones re-implementation.
For general use, we recommend the
pints.CMAES
optimiser, which wraps around thecma
module provided by the authors of CMA-ES. Thecma
module provides a battle-tested version of the optimiser. The role of this class is to provide a simpler implementation of only the core algorithm of CMA-ES, which is easier to read and analyse, and which can be used to compare with bare implementations of other methods.
Extends
PopulationBasedOptimiser
.References
[1] “The CMA Evolution Strategy: A Tutorial”, Nikolaus Hansen. arXiv, https://arxiv.org/abs/1604.00772
[2] Hansen, Mueller, Koumoutsakos (2003) “Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES)”. Evolutionary Computation. https://doi.org/10.1162/106365603321828970
-
ask
()[source]¶ See
Optimiser.ask()
.
-
cov
(decomposed=False)[source]¶ Returns the current covariance matrix
C
of the proposal distribution.If the optional argument
decomposed
is set toTrue
, a tuple(R, S)
will be returned such thatR
contains the eigenvectors ofC
whileS
is a diagonal matrix containing the squares of the eigenvalues ofC
, such thatC = R S S R.T
.
-
fbest
()[source]¶ See
Optimiser.fbest()
.
-
n_hyper_parameters
()¶
-
name
()[source]¶ See
Optimiser.name()
.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated error.
-
population_size
()¶ Returns this optimiser’s population size.
If no explicit population size has been set,
None
may be returned. Once running, the correct value will always be returned.
-
running
()[source]¶ See
Optimiser.running()
.
-
set_hyper_parameters
(x)¶ The hyper-parameter vector is
[population_size]
.
-
set_population_size
(population_size=None)¶ Sets a population size to use in this optimisation.
If population_size is set to
None
, the population size will be set using the heuristicsuggested_population_size()
.
-
stop
()[source]¶ See
Optimiser.stop()
.
-
suggested_population_size
(round_up_to_multiple_of=None)¶ Returns a suggested population size for this method, based on the dimension of the search space (e.g. the parameter space).
If the optional argument
round_up_to_multiple_of
is set to an integer greater than 1, the method will round up the estimate to a multiple of that number. This can be useful to obtain a population size based on e.g. the number of worker processes used to perform objective function evaluations.
-
tell
(fx)[source]¶ See
Optimiser.tell()
.
-
xbest
()[source]¶ See
Optimiser.xbest()
.
-
CMA-ES¶
-
class
pints.
CMAES
(x0, sigma0=None, boundaries=None)[source]¶ Finds the best parameters using the CMA-ES method described in [1], [2] and implemented in the
cma
module [3].CMA-ES stands for Covariance Matrix Adaptation Evolution Strategy, and is designed for non-linear derivative-free optimization problems.
Extends
PopulationBasedOptimiser
.References
[1] “The CMA Evolution Strategy: A Tutorial”, Nikolaus Hansen. arXiv, https://arxiv.org/abs/1604.00772
[2] Hansen, Mueller, Koumoutsakos (2003) “Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES)”. Evolutionary Computation. https://doi.org/10.1162/106365603321828970
[3] PyPI page for cma
https://pypi.org/project/cma/
-
ask
()[source]¶ See
Optimiser.ask()
.
-
fbest
()[source]¶ See
Optimiser.fbest()
.
-
n_hyper_parameters
()¶
-
name
()[source]¶ See
Optimiser.name()
.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated error.
-
population_size
()¶ Returns this optimiser’s population size.
If no explicit population size has been set,
None
may be returned. Once running, the correct value will always be returned.
-
running
()[source]¶ See
Optimiser.running()
.
-
set_hyper_parameters
(x)¶ The hyper-parameter vector is
[population_size]
.
-
set_population_size
(population_size=None)¶ Sets a population size to use in this optimisation.
If population_size is set to
None
, the population size will be set using the heuristicsuggested_population_size()
.
-
stop
()[source]¶ See
Optimiser.stop()
.
-
suggested_population_size
(round_up_to_multiple_of=None)¶ Returns a suggested population size for this method, based on the dimension of the search space (e.g. the parameter space).
If the optional argument
round_up_to_multiple_of
is set to an integer greater than 1, the method will round up the estimate to a multiple of that number. This can be useful to obtain a population size based on e.g. the number of worker processes used to perform objective function evaluations.
-
tell
(fx)[source]¶ See
Optimiser.tell()
.
-
xbest
()[source]¶ See
Optimiser.xbest()
.
-
Gradient descent (fixed learning rate)¶
-
class
pints.
GradientDescent
(x0, sigma0=0.1, boundaries=None)[source]¶ Gradient-descent method with a fixed learning rate.
-
ask
()[source]¶ See
Optimiser.ask()
.
-
fbest
()[source]¶ See
Optimiser.fbest()
.
-
name
()[source]¶ See
Optimiser.name()
.
-
running
()[source]¶ See
Optimiser.running()
.
-
set_hyper_parameters
(x)[source]¶ See
pints.TunableMethod.set_hyper_parameters()
.The hyper-parameter vector is
[learning_rate]
.
-
set_learning_rate
(eta)[source]¶ Sets the learning rate for this optimiser.
Parameters: eta (float) – The learning rate, as a float greater than zero.
-
stop
()¶ Checks if this method has run into trouble and should terminate. Returns
False
if everything’s fine, or a short message (e.g. “Ill-conditioned matrix.”) if the method should terminate.
-
tell
(reply)[source]¶ See
Optimiser.tell()
.
-
xbest
()[source]¶ See
Optimiser.xbest()
.
-
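Because this optimiser needs sensitivities, evaluations must be passed to tell() as (objective, sensitivities) tuples. A rough ask-and-tell sketch (assuming error is a pints.ErrorMeasure that supports evaluateS1(), for example a SumOfSquaresError, and x0 is a starting point; the learning rate and iteration count are illustrative):
opt = pints.GradientDescent(x0, sigma0=0.1)
opt.set_learning_rate(0.01)
for i in range(250):
    xs = opt.ask()
    fs = [error.evaluateS1(x) for x in xs]   # (objective, sensitivities) pairs
    opt.tell(fs)
print(opt.xbest(), opt.fbest())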
Nelder-Mead¶
-
class
pints.
NelderMead
(x0, sigma0=None, boundaries=None)[source]¶ Nelder-Mead downhill simplex method.
Implementation of the classical algorithm by [1], following the presentation in Algorithm 8.1 of [2].
This is a deterministic local optimiser. In most update steps it performs either 1 evaluation, or 2 sequential evaluations, so that it will not typically benefit from parallelisation.
Generates a “simplex” of
n + 1
samples around a given starting point, and evaluates their scores. Next, each iteration consists of a sequence of operations, typically the worst sampley_worst
is replaced with a new point:
y_new = mu + delta * (mu - y_worst)
mu = (1 / n) * sum(y), y != y_worst
where
delta
has one of four values, depending on the type of operation:- Reflection (
delta = 1
) - Expansion (
delta = 2
) - Inside contraction (
delta = -0.5
) - Outside contraction (
delta = 0.5
)
Note that the
delta
values here are common choices, but not the only valid choices.A fifth type of iteration called a “shrink” is occasionally performed, in which all samples except the best sample
y_best
are replaced:y_i_new = y_best + ys * (y_i - y_best)
where ys is a parameter (typically ys = 0.5).
The initialisation of the initial simplex was copied from [3].
References
[1] “A simplex method for function minimization”, Nelder, Mead, 1965, Computer Journal. https://doi.org/10.1093/comjnl/7.4.308
[2] “Introduction to derivative-free optimization”, Andrew R. Conn, Katya Scheinberg, Luis N. Vicente, 2009, First edition. ISBN 978-0-898716-68-9. https://doi.org/10.1137/1.9780898718768
[3] SciPy on GitHub, https://github.com/scipy/scipy/
-
ask
()[source]¶ See:
pints.Optimiser.ask()
.
-
fbest
()[source]¶ See:
pints.Optimiser.fbest()
.
-
n_hyper_parameters
()¶ Returns the number of hyper-parameters for this method (see
TunableMethod
).
-
name
()[source]¶ See:
pints.Optimiser.name()
.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated error.
-
set_hyper_parameters
(x)¶ Sets the hyper-parameters for the method with the given vector of values (see
TunableMethod
).Parameters: x – An array of length n_hyper_parameters
used to set the hyper-parameters.
-
stop
()[source]¶ See:
pints.Optimiser.stop()
.
-
tell
(fx)[source]¶ See:
pints.Optimiser.tell()
.
-
xbest
()[source]¶ See:
pints.Optimiser.xbest()
.
- Reflection (
PSO¶
-
class
pints.
PSO
(x0, sigma0=None, boundaries=None)[source]¶ Finds the best parameters using the PSO method described in [1].
Particle Swarm Optimisation (PSO) is a global search method (so refinement with a local optimiser is advised!) that works well for problems in high dimensions and with many local minima. Because it treats each parameter independently, it does not require preconditioning of the search space.
In a particle swarm optimization, the parameter space is explored by
n
independent particles. The particles perform a pseudo-random walk through the parameter space, guided by their own personal best score and the global optimum found so far.The method starts by creating a swarm of
n
particles and assigning each an initial position and initial velocity (see the explanation of the argumentshints
andv
for details). Each particle’s score is calculated and set as the particle’s current best local scorepl
. The best score of all the particles is set as the best global scorepg
.Next, an iterative procedure is run that updates each particle’s velocity
v
and positionx
using:
v[k] = v[k-1] + al * (pl - x[k-1]) + ag * (pg - x[k-1])
x[k] = x[k-1] + v[k]
Here,
x[t]
is the particle’s current position andv[t]
its current velocity. The valuesal
andag
are scalars randomly sampled from a uniform distribution, with values bound byr * 4.1
and(1 - r) * 4.1
. Thus a swarm withr = 1
will only use local information, while a swarm withr = 0
will only use global information. The de facto standard isr = 0.5
. The random sampling is done each timeal
andag
are used: at each time step every particle performsm
samplings, wherem
is the dimensionality of the search space.Pseudo-code algorithm:
almax = r * 4.1
agmax = 4.1 - almax
while stopping criterion not met:
    for i in [1, 2, .., n]:
        if f(x[i]) < f(p[i]):
            p[i] = x[i]
        pg = min(p[1], p[2], .., p[n])
        for j in [1, 2, .., m]:
            al = uniform(0, almax)
            ag = uniform(0, agmax)
            v[i,j] += al * (p[i,j] - x[i,j]) + ag * (pg[i,j] - x[i,j])
            x[i,j] += v[i,j]
Extends
PopulationBasedOptimiser
.References
[1] Kennedy, Eberhart (1995) Particle Swarm Optimization. IEEE International Conference on Neural Networks https://doi.org/10.1109/ICNN.1995.488968 -
ask
()[source]¶ See
Optimiser.ask()
.
-
fbest
()[source]¶ See
Optimiser.fbest()
.
-
name
()[source]¶ See
Optimiser.name()
.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated error.
-
population_size
()¶ Returns this optimiser’s population size.
If no explicit population size has been set,
None
may be returned. Once running, the correct value will always be returned.
-
running
()[source]¶ See
Optimiser.running()
.
-
set_hyper_parameters
(x)[source]¶ The hyper-parameter vector is
[population_size, local_global_balance]
.
-
set_local_global_balance
(r=0.5)[source]¶ Set the balance between local and global exploration for each particle, using a parameter r such that r = 1 is a fully local search and r = 0 is a fully global search.
-
set_population_size
(population_size=None)¶ Sets a population size to use in this optimisation.
If population_size is set to
None
, the population size will be set using the heuristicsuggested_population_size()
.
-
stop
()¶ Checks if this method has run into trouble and should terminate. Returns
False
if everything’s fine, or a short message (e.g. “Ill-conditioned matrix.”) if the method should terminate.
-
suggested_population_size
(round_up_to_multiple_of=None)¶ Returns a suggested population size for this method, based on the dimension of the search space (e.g. the parameter space).
If the optional argument
round_up_to_multiple_of
is set to an integer greater than 1, the method will round up the estimate to a multiple of that number. This can be useful to obtain a population size based on e.g. the number of worker processes used to perform objective function evaluations.
-
tell
(fx)[source]¶ See
Optimiser.tell()
.
-
xbest
()[source]¶ See
Optimiser.xbest()
.
-
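Example (a sketch of configuring PSO through the controller; error and the starting point are assumed as in the earlier optimisation sketches, and optimiser() returns the underlying method for detailed configuration):
opt = pints.OptimisationController(error, [0.01, 450], method=pints.PSO)
opt.optimiser().set_population_size(40)
opt.optimiser().set_local_global_balance(0.5)   # balanced local/global search
xbest, fbest = opt.run()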
SNES¶
-
class
pints.
SNES
(x0, sigma0=None, boundaries=None)[source]¶ Finds the best parameters using the SNES method described in [1], [2].
SNES stands for Separable Natural Evolution Strategy, and is designed for non-linear derivative-free optimization problems in high dimensions and with many local minima [1].
It treats each dimension separately, making it suitable for higher dimensions.
Extends
PopulationBasedOptimiser
.References
[1] (1, 2) Schaul, Glasmachers, Schmidhuber (2011) “High dimensions and heavy tails for natural evolution strategies”. Proceedings of the 13th annual conference on Genetic and evolutionary computation. https://doi.org/10.1145/2001576.2001692 [2] PyBrain: The Python machine learning library http://pybrain.org -
ask
()[source]¶ See
Optimiser.ask()
.
-
fbest
()[source]¶ See
Optimiser.fbest()
.
-
n_hyper_parameters
()¶
-
name
()[source]¶ See
Optimiser.name()
.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated error.
-
population_size
()¶ Returns this optimiser’s population size.
If no explicit population size has been set,
None
may be returned. Once running, the correct value will always be returned.
-
running
()[source]¶ See
Optimiser.running()
.
-
set_hyper_parameters
(x)¶ The hyper-parameter vector is
[population_size]
.
-
set_population_size
(population_size=None)¶ Sets a population size to use in this optimisation.
If population_size is set to
None
, the population size will be set using the heuristicsuggested_population_size()
.
-
stop
()¶ Checks if this method has run into trouble and should terminate. Returns
False
if everything’s fine, or a short message (e.g. “Ill-conditioned matrix.”) if the method should terminate.
-
suggested_population_size
(round_up_to_multiple_of=None)¶ Returns a suggested population size for this method, based on the dimension of the search space (e.g. the parameter space).
If the optional argument
round_up_to_multiple_of
is set to an integer greater than 1, the method will round up the estimate to a multiple of that number. This can be useful to obtain a population size based on e.g. the number of worker processes used to perform objective function evaluations.
-
tell
(fx)[source]¶ See
Optimiser.tell()
.
-
xbest
()[source]¶ See
Optimiser.xbest()
.
-
xNES¶
-
class
pints.
XNES
(x0, sigma0=None, boundaries=None)[source]¶ Finds the best parameters using the xNES method described in [1], [2].
xNES stands for Exponential Natural Evolution Strategy, and is designed for non-linear derivative-free optimization problems [1].
Extends
PopulationBasedOptimiser
.References
[1] (1, 2) Glasmachers, Schaul, Schmidhuber et al. (2010) “Exponential natural evolution strategies”. Proceedings of the 12th annual conference on Genetic and evolutionary computation. https://doi.org/10.1145/1830483.1830557 [2] PyBrain: The Python machine learning library http://pybrain.org -
ask
()[source]¶ See
Optimiser.ask()
.
-
fbest
()[source]¶ See
Optimiser.fbest()
.
-
n_hyper_parameters
()¶
-
name
()[source]¶ See
Optimiser.name()
.
-
needs_sensitivities
()¶ Returns
True
if this method needs sensitivities to be passed in to tell
along with the evaluated error.
-
population_size
()¶ Returns this optimiser’s population size.
If no explicit population size has been set,
None
may be returned. Once running, the correct value will always be returned.
-
running
()[source]¶ See
Optimiser.running()
.
-
set_hyper_parameters
(x)¶ The hyper-parameter vector is
[population_size]
.
-
set_population_size
(population_size=None)¶ Sets a population size to use in this optimisation.
If population_size is set to
None
, the population size will be set using the heuristicsuggested_population_size()
.
-
stop
()¶ Checks if this method has run into trouble and should terminate. Returns
False
if everything’s fine, or a short message (e.g. “Ill-conditioned matrix.”) if the method should terminate.
-
suggested_population_size
(round_up_to_multiple_of=None)¶ Returns a suggested population size for this method, based on the dimension of the search space (e.g. the parameter space).
If the optional argument
round_up_to_multiple_of
is set to an integer greater than 1, the method will round up the estimate to a multiple of that number. This can be useful to obtain a population size based on e.g. the number of worker processes used to perform objective function evaluations.
-
tell
(fx)[source]¶ See
Optimiser.tell()
.
-
xbest
()[source]¶ See
Optimiser.xbest()
.
-
Noise model diagnostics¶
Pints includes functionality to generate diagnostic plots of the residuals. These tools may be useful to evaluate the validity of a noise model.
Plotting functions:
plot_residuals_autocorrelation()
plot_residuals_binned_autocorrelation()
plot_residuals_binned_std()
plot_residuals_distance()
plot_residuals_vs_output()
Diagnostics:
Plotting functions¶
-
pints.residuals_diagnostics.
plot_residuals_autocorrelation
(parameters, problem, max_lag=10, thinning=None, significance_level=0.05, posterior_interval=0.95)[source]¶ Generate an autocorrelation plot of the residuals.
This function can be used to analyse the results of either optimisation or MCMC Bayesian inference. When multiple samples of the residuals are present (corresponding to multiple MCMC samples), the plot illustrates the distribution of autocorrelations across the MCMC samples. At each lag, a point is drawn at the median autocorrelation, and a line is drawn giving the percentile range of the posterior interval specified as an argument (by default, the 2.5th to the 97.5th percentile).
When multiple outputs are present, one residuals plot will be generated for each output.
When a significance level is provided, confidence bounds for the sample autocorrelations under the assumption of IID residuals are drawn on the plot. If many of the observed residual autocorrelations fall outside these bounds, this may be evidence against the residuals being IID.
Under the assumption that the residuals of length \(n\) are IID with mean 0 and variance \(\sigma^2\), for large \(n\) the residuals sample autocorrelations are approximately IID Normal(mean=0, variance=1/n). This result is proved in [1] (see Theorem 7.2.2 and Example 7.2.1). Therefore, the confidence bounds can be calculated by \(\pm z^* n^{-1/2}\) for the appropriate critical value \(z^*\).
This function returns a
matplotlib
figure.Parameters: - parameters – The parameter values with shape
(n_samples, n_parameters)
. When passing a single best fit parameter vector,n_samples
will be 1. - problem – The problem given by a
pints.SingleOutputProblem
orpints.MultiOutputProblem
, withn_parameters
greater than or equal to then_parameters
of theparameters
. Extra parameters not found in the problem are ignored. - max_lag – Optional int value (default 10). The highest lag to plot.
- thinning – Optional int value (greater than zero). If thinning is set to
n
, only every nth sample in parameters will be used. If set toNone
(default), some thinning will be applied so that about 200 samples will be used. - significance_level –
None
or float value (default 0.05). When a significance level is provided, dashed lines for the confidence interval corresponding to that significance level are drawn on the plot. WhenNone
, no lines are drawn. - posterior_interval – Float value (default 0.95). When multiple samples of the parameter values are provided, this gives the size of the credible region of the posterior to plot.
References
[1] Brockwell, P. J., & Davis, R. A. (1991). Time series: Theory and methods (2nd ed.). New York: Springer. - parameters – The parameter values with shape
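Example (a sketch of calling this function after a fit; here xbest is an assumed fitted parameter vector and problem the fitted problem, and the single vector is wrapped to give the required (n_samples, n_parameters) shape):
import numpy as np
import matplotlib.pyplot as plt
from pints.residuals_diagnostics import plot_residuals_autocorrelation

fig = plot_residuals_autocorrelation(np.array([xbest]), problem, max_lag=15)
plt.show()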
-
pints.residuals_diagnostics.
plot_residuals_binned_autocorrelation
(parameters, problem, thinning=None, n_bins=25)[source]¶ Plot the autocorrelation of the residuals within bins (i.e. discrete time windows across the series).
Given a time series with observed residuals
\[e_i = y_i - f(t_i; \theta),\] this method divides the vector of residuals into some number of equally sized bins. The lag 1 autocorrelation is calculated for the residuals within each bin. The plot shows the lag 1 autocorrelation in each bin over time.
This diagnostic is useful for diagnosing time series with noise whose autocorrelation varies over time.
When passing an array of parameters (from an MCMC sampler), this method plots the autocorrelations of the posterior median residual values.
Typically, this diagnostic is called after obtaining the residuals of an IID fit, in order to determine whether the IID fit is satisfactory or a more complex noise model is needed.
This function returns a
matplotlib
figure.Parameters: - parameters – The parameter values with shape
(n_samples, n_parameters)
. When passing a single best fit parameter vector,n_samples
will be 1. - problem – The problem given by a
pints.SingleOutputProblem
orpints.MultiOutputProblem
, withn_parameters
greater than or equal to then_parameters
of theparameters
. Extra parameters not found in the problem are ignored. - thinning – Optional int value (greater than zero). If thinning is set to
n
, only every nth sample in parameters will be used. If set toNone
(default), some thinning will be applied so that about 200 samples will be used. - n_bins – Optional int value (greater than zero) giving the number of bins into which to divide the time series. By default, it is fixed to 25.
- parameters – The parameter values with shape
-
pints.residuals_diagnostics.
plot_residuals_binned_std
(parameters, problem, thinning=None, n_bins=25)[source]¶ Plot the standard deviation of the residuals within bins (i.e. discrete time windows across the series).
Given a time series with observed residuals
\[e_i = y_i - f(t_i; \theta),\] this method divides the vector of residuals into some number of equally sized bins. The standard deviation is calculated for the residuals within each bin. The plot shows the standard deviation in each bin over time.
This diagnostic is particularly useful for diagnosing time series whose noise exhibits a change in variance over time.
When passing an array of parameters (from an MCMC sampler), this method will plot the standard deviation of the posterior median residual values.
Typically, this diagnostic can be called after obtaining the residuals of an IID fit, in order to determine whether the IID fit is satisfactory or a more complex noise model is needed.
This function returns a
matplotlib
figure.Parameters: - parameters – The parameter values with shape
(n_samples, n_parameters)
. When passing a single best fit parameter vector,n_samples
will be 1. - problem – The problem given by a
pints.SingleOutputProblem
orpints.MultiOutputProblem
, withn_parameters
greater than or equal to then_parameters
of theparameters
. Extra parameters not found in the problem are ignored. - thinning – Optional int value (greater than zero). If thinning is set to
n
, only every nth sample in parameters will be used. If set toNone
(default), some thinning will be applied so that about 200 samples will be used. - n_bins – Optional int value (greater than zero) giving the number of bins into which to divide the time series. By default, it is fixed to 25.
-
pints.residuals_diagnostics.
plot_residuals_distance
(parameters, problem, thinning=None)[source]¶ Plot a distance matrix of the residuals.
Given a time series with observed residuals
\[e_i = y_i - f(t_i; \theta)\]this function generates and plots the distance matrix \(D\) whose entries are defined by
\[D_{i, j} = |e_i - e_j|\]The plot of this matrix may be helpful for identifying a time series with correlated noise. When the noise terms are correlated, the distance matrix \(D\) is likely to have a banded appearance.
For problems with multiple outputs, one distance matrix is generated for each output.
When passing an array of parameters (from an MCMC sampler), this method will plot the distance matrix of the posterior median residual values.
Typically, this diagnostic is called after obtaining the residuals of an IID fit, in order to determine whether the IID fit is satisfactory or a more complex noise model is needed.
This function returns a
matplotlib
figure.Parameters: - parameters – The parameter values with shape
(n_samples, n_parameters)
. When passing a single best fit parameter vector,n_samples
will be 1. - problem – The problem given by a
pints.SingleOutputProblem
orpints.MultiOutputProblem
, withn_parameters
greater than or equal to then_parameters
of theparameters
. Extra parameters not found in the problem are ignored. - thinning – Optional int value (greater than zero). If thinning is set to
n
, only every nth sample in parameters will be used. If set toNone
(default), some thinning will be applied so that about 200 samples will be used.
-
pints.residuals_diagnostics.
plot_residuals_vs_output
(parameters, problem, thinning=None)[source]¶ Draw a plot of the magnitude of residuals versus the solution output.
This plot is useful to detect any dependence between the error model and the magnitude of the solution. For example, it may help to detect multiplicative Gaussian noise, in which the standard deviation of the error scales with the output.
When multiple samples of the parameters are provided (from an MCMC chain), the residuals are calculated and plotted relative to the posterior median of the solution outputs.
This function returns a
matplotlib
figure.Parameters: - parameters – The parameter values with shape
(n_samples, n_parameters)
. When passing a single best fit parameter vector,n_samples
will be 1. - problem – The problem given by a
pints.SingleOutputProblem
orpints.MultiOutputProblem
, withn_parameters
greater than or equal to then_parameters
of theparameters
. Extra parameters not found in the problem are ignored. - thinning – Optional, integer value (greater than zero). If thinning is set to
n
, only every nth sample in parameters will be used. If set toNone
(default), some thinning will be applied so that about 200 samples will be used.
Diagnostics¶
-
pints.residuals_diagnostics.
acorr
(x, max_lag)[source]¶ Calculate the normalised autocorrelation for a given data series.
This function uses the same procedure as
matplotlib.pyplot.acorr
, but it just calculates the autocorrelation without plotting anything. Returns the autocorrelation as a NumPy array.
Parameters: - x – A 1d NumPy array containing the time series for which to calculate autocorrelation.
- max_lag – An int specifying the highest lag to consider.
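Example usage (a brief sketch; the residual series used here is synthetic and purely illustrative):
import numpy as np
import pints.residuals_diagnostics

e = np.random.normal(size=500)                         # stand-in residual series
rho = pints.residuals_diagnostics.acorr(e, max_lag=10)
print(rho)                                             # normalised autocorrelation values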
-
pints.residuals_diagnostics.
calculate_residuals
(parameters, problem, thinning=None)[source]¶ Calculate the residuals (difference between actual data and the fit).
Either a single set of parameters or a chain of MCMC samples can be provided.
The residuals are returned as a 3-dimensional NumPy array with shape
(n_samples, n_outputs, n_times)
.Parameters: - parameters – The parameter values with shape
(n_samples, n_parameters)
. When passing a single best fit parameter vector,n_samples
will be 1. - problem – The problem given by a
pints.SingleOutputProblem
orpints.MultiOutputProblem
, withn_parameters
greater than or equal to then_parameters
of theparameters
. Extra parameters not found in the problem are ignored. - thinning – Optional, integer value (greater than zero). If thinning is set to
n
, only every nth sample in parameters will be used. If set toNone
(default), some thinning will be applied so that about 200 samples will be used.
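A short sketch of typical use (problem and the MCMC sample array chain are placeholders for objects created elsewhere):
import pints.residuals_diagnostics

# chain has shape (n_samples, n_parameters)
residuals = pints.residuals_diagnostics.calculate_residuals(chain, problem)
print(residuals.shape)    # (n_samples, n_outputs, n_times)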
Toy problems¶
The toy module provides toy models
,
distributions
and
error measures
that can be used for tests and in
examples.
Some toy classes provide extra functionality defined in the
pints.toy.ToyModel
and pints.toy.ToyLogPDF
classes.
Toy base classes¶
-
class
pints.toy.
ToyLogPDF
[source]¶ Abstract base class for toy distributions.
Extends
pints.LogPDF
.-
distance
(samples)[source]¶ Calculates a measure of distance from
samples
to some characteristic of the underlying distribution.
-
evaluateS1
(x)¶ Evaluates this LogPDF, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple
(L, L')
whereL
is a scalar value andL'
is a sequence of lengthn_parameters
.Note that the derivative returned is of the log-pdf, so
L' = d/dp log(f(p))
, evaluated atp=x
.This is an optional method that is not always implemented.
-
n_parameters
()¶ Returns the dimension of the space this
LogPDF
is defined over.
-
-
class
pints.toy.
ToyModel
[source]¶ Defines an interface for toy problems.
Note that toy models should extend both
ToyModel
and one of the forward model classes, e.g.pints.ForwardModel
.
-
class
pints.toy.
ToyODEModel
[source]¶ Defines an interface for toy problems where the underlying model is an ordinary differential equation (ODE) that describes some time-series generating model.
Note that toy ODE models should extend both
pints.ToyODEModel
and one of the forward model classes, e.g.pints.ForwardModel
orpints.ForwardModelS1
.To use this class as the basis for a
pints.ForwardModel
, the method_rhs()
should be reimplemented.Models implementing
_rhs()
,jacobian()
and_dfdp()
can be used to create apints.ForwardModelS1
.-
_dfdp
(y, t, p)[source]¶ Returns the derivative of the ODE RHS at time
t
, with respect to model parametersp
.Parameters: - y – The state vector at time
t
(with lengthn_outputs
). - t – The time to evaluate at (as a scalar).
- p – A vector of model parameters (of length
n_parameters
).
Returns: Return type: A matrix of dimensions
n_outputs
byn_parameters
.- y – The state vector at time
-
_rhs
(y, t, p)[source]¶ Returns the evaluated RHS (
dy/dt
) for a given state vectory
, timet
, and parameter vectorp
.Parameters: - y – The state vector at time
t
(with lengthn_outputs
). - t – The time to evaluate at (as a scalar).
- p – A vector of model parameters (of length
n_parameters
).
Returns: Return type: A vector of length
n_outputs
.- y – The state vector at time
-
jacobian
(y, t, p)[source]¶ Returns the Jacobian (the derivative of the RHS ODE with respect to the outputs) at time
t
.Parameters: - y – The state vector at time
t
(with lengthn_outputs
). - t – The time to evaluate at (as a scalar).
- p – A vector of model parameters (of length
n_parameters
).
Returns: Return type: A matrix of dimensions
n_outputs
byn_outputs
.- y – The state vector at time
-
n_states
()[source]¶ Returns the number of states in the underlying ODE. Note: this will not be the same as
n_outputs()
for models where only a subset of states are observed.
-
suggested_parameters
()¶ Returns a NumPy array of the parameter values that are representative of the model.
For example, these parameters might reproduce a particular result that the model is famous for.
-
suggested_times
()¶ Returns a NumPy array of time points that is representative of the model.
-
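Any of the concrete ODE models described below can be simulated (with first-order sensitivities) through this interface. A brief sketch, using pints.toy.FitzhughNagumoModel:
import pints.toy

model = pints.toy.FitzhughNagumoModel()
x = model.suggested_parameters()
times = model.suggested_times()
values, sensitivities = model.simulateS1(x, times)   # outputs and their derivatives w.r.t. the parameters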
Annulus Distribution¶
-
class
pints.toy.
AnnulusLogPDF
(dimensions=2, r0=10, sigma=1)[source]¶ Toy distribution based on a d-dimensional distribution of the form
\[f(x|r_0, \sigma) \propto e^{-(|x|-r_0)^2 / {2\sigma^2}}\]where \(x\) is a d-dimensional real, and \(|x|\) is the Euclidean norm.
This distribution is roughly a one-dimensional Gaussian distribution centred on \(r_0\), that is smeared over the surface of a hypersphere of the same radius. In two dimensions, the density looks like a circular annulus.
Extends
pints.LogPDF
.Parameters: - dimensions (int) – The dimensionality of the space.
- r0 (float) – The radius of the hypersphere and is approximately the mean normed distance from the origin.
- sigma (float) – The width of the annulus; approximately the standard deviation of normed distance.
-
distance
(samples)[source]¶ Calculates a measure of normed distance of samples from exact mean and covariance matrix assuming uniform prior with bounds given by
suggested_bounds()
.See
ToyLogPDF.distance()
.
-
sample
(n_samples)[source]¶ See
ToyLogPDF.sample()
.
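Example usage (a brief sketch):
import pints.toy

log_pdf = pints.toy.AnnulusLogPDF(dimensions=2, r0=10, sigma=1)
samples = log_pdf.sample(100)       # 100 points drawn from the annulus
score = log_pdf.distance(samples)   # smaller values indicate a better match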
Beeler-Reuter Action Potential Model¶
-
class
pints.toy.
ActionPotentialModel
(y0=None)[source]¶ The 1977 Beeler-Reuter model of the mammalian ventricular action potential (AP).
This model is written as an ODE with 8 states and several intermediary variables: for the full model equations, please see the original paper [1].
The model contains 5 ionic currents, each described by a sub-model with several kinetic parameters, and a maximum conductance parameter that determines its magnitude. Only the 5 conductance parameters are varied in this
ToyModel
, all other parameters are fixed and assumed to be known. To aid in inference, a parameter transformation is used: instead of specifying the maximum conductances directly, their natural logarithm should be used. In other words, the parameter vector passed tosimulate()
should contain the logarithm of the five conductances.As outputs, we use the AP and the calcium transient, as these are the only two states (out of the total of eight) with a physically observable counterpart. This makes this a fairly hard problem.
Extends
pints.ForwardModel
,pints.toy.ToyModel
.Parameters: y0 – The initial state of the observables V
andCa_i
, whereCa_i
must be 0 or greater. If not given, the defaults are -84.622 and 2e-7.References
[1] Reconstruction of the action potential of ventricular myocardial fibres. Beeler, Reuter (1977) Journal of Physiology https://doi.org/10.1113/jphysiol.1977.sp011853 -
set_solver_tolerances
(rtol=0.0001, atol=1e-06)[source]¶ Updates the solver tolerances. See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.odeint.html
-
simulate_all_states
(parameters, times)[source]¶ Runs a simulation and returns all state variables, including the ones that do not have a physically observable counterpart.
-
suggested_parameters
()[source]¶ Returns suggested parameters for this model. The returned vector is already log-transformed, and can be passed directly to
simulate()
.
-
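Example usage (a brief sketch; note that the parameter vector passed to simulate() contains log-conductances):
import pints.toy

model = pints.toy.ActionPotentialModel()
x = model.suggested_parameters()    # log-transformed maximum conductances
times = model.suggested_times()
values = model.simulate(x, times)   # two outputs: action potential and calcium transient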
Cone Distribution¶
-
class
pints.toy.
ConeLogPDF
(dimensions=2, beta=1)[source]¶ Toy distribution based on a d-dimensional distribution of the form,
\[f(x) \propto e^{-|x|^\beta}\]where
x
is a d-dimensional real, and|x|
is the Euclidean norm. The mean and variance that are returned relate to expectations on|x|
not the multidimensionalx
.Extends
pints.LogPDF
.Parameters: - dimensions (int) – The dimensionality of the cone.
- beta (float) – The power to which
|x|
is raised in the exponential term, which must be positive.
-
distance
(samples)[source]¶ Calculates a measure of normed distance of samples from exact mean and covariance matrix assuming uniform prior with bounds given by
suggested_bounds()
.
-
sample
(n_samples)[source]¶ See
ToyLogPDF.sample()
.
Constant Model¶
-
class
pints.toy.
ConstantModel
(n, force_multi_output=False)[source]¶ Toy model that’s constant over time, linear over the parameters, mostly useful for unit testing.
For an n-dimensional model, evaluated with parameters
p = [p_1, p_2, ..., p_n]
, the simulated values are time-invariant, so that for any timet
\[f(t) = (p_1, 2 p_2, 3 p_3, ..., n p_n)\]The derivatives with respect to the parameters are time-invariant, and simply equal
\[\begin{split}\frac{\partial{f_i(t)}}{dp_j} = \begin{cases} i, i = j\\0, i \neq j \end{cases}\end{split}\]Extends
pints.ForwardModelS1
.Parameters: - n (int) – The number of parameters (and outputs) the model should have.
- force_multi_output (boolean) – Set to
True
to always return output of the shape(n_times, n_outputs)
, even ifn_outputs == 1
.
Example
import numpy as np
import pints.toy

times = np.linspace(0, 1, 100)
m = pints.toy.ConstantModel(2)
m.simulate([1, 2], times)
In this example, the returned output is
[1, 4]
at every point in time.
Eight Schools distribution¶
-
class
pints.toy.
EightSchoolsLogPDF
(centered=True)[source]¶ The classic Eight Schools example that is discussed in [1].
The aim of this model (implemented as a
pints.ToyLogPDF
) is to determine the effects of coaching on SAT scores in 8 schools (each school being denoted by subscript j in the following equations). It is used by statisticians to illustrate how hierarchical models can quite easily become unidentified, making inference hard. This model is hierarchical and takes the form,
\[\begin{split}\begin{align} \mu &\sim \mathcal{N}(0, 5) \\ \tau &\sim \text{Cauchy}(0, 5) \\ \theta_j &\sim \mathcal{N}(\mu, \tau) \\ y_j &\sim \mathcal{N}(\theta_j, \sigma_j), \\ \end{align}\end{split}\]where \(\sigma_j\) is known. The user may choose between the “centered” parameterisation of the model (which exactly mirrors the statistical model), and the “non-centered” parameterisation, which introduces auxiliary variables to improve chain mixing. The non-centered model takes the form,
\[\begin{split}\begin{align} \mu &\sim \mathcal{N}(0, 5) \\ \tau &\sim \text{Cauchy}(0, 5) \\ \tilde{\theta}_j &\sim \mathcal{N}(0, 1) \\ \theta_j &= \mu + \tilde{\theta}_j \tau \\ y_j &\sim \mathcal{N}(\theta_j, \sigma_j). \\ \end{align}\end{split}\]Note that, in the non-centered case, the parameter samples correspond to \(\tilde{\theta}\) rather than \(\theta\).
The model uses a 10-dimensional parameter vector, composed of:
- mu, the population-level score
- tau, the population-level standard deviation
- theta_j, school j’s mean score (for each of the 8 schools).
Extends
pints.toy.ToyLogPDF
.Parameters: centered (bool) – Whether or not to use the centered formulation. References
[1] (1, 2) “Bayesian data analysis”, 3rd edition, 2014, Gelman, A et al.. -
distance
(samples)¶ Calculates a measure of distance from
samples
to some characteristic of the underlying distribution.
-
sample
(n_samples)¶ Generates independent samples from the underlying distribution.
Fitzhugh-Nagumo Model¶
-
class
pints.toy.
FitzhughNagumoModel
(y0=None)[source]¶ Fitzhugh-Nagumo model of the action potential [1].
Has two states, and three phenomenological parameters:
a
,b
,c
. All states are visible\[\frac{d \mathbf{y}}{dt} = \mathbf{f}(\mathbf{y},\mathbf{p},t)\]where
\[\begin{split}\mathbf{y} &= (V,R)\\ \mathbf{p} &= (a,b,c)\end{split}\]The RHS, jacobian and change in RHS with the parameters are given by
\[\begin{split}\begin{align} \mathbf{f}(\mathbf{y},\mathbf{p},t) &= \left[\begin{matrix} c \left(R - V^{3}/3+V\right) \\ - \frac{1}{c} \left(R b + V - a\right) \end{matrix}\right] \\ \frac{\partial \mathbf{f}}{\partial \mathbf{y}} &= \left[\begin{matrix} c \left(1- V^{2}\right) & c \\ - \frac{1}{c} & - \frac{b}{c} \end{matrix}\right] \\ \frac{\partial \mathbf{f}}{\partial \mathbf{p}} &= \left[\begin{matrix} 0 & 0 & R - V^{3}/3 + V\\ \frac{1}{c} & - \frac{R}{c} & \frac{1}{c^{2}} \left(R b + V - a\right) \end{matrix}\right] \end{align}\end{split}\]Extends
pints.ForwardModelS1
, pints.toy.ToyODEModel.Parameters: y0 – The system’s initial state. If not given, the default [-1, 1]
is used.References
[1] A kinetic model of the conductance changes in nerve membrane Fitzhugh (1965) Journal of Cellular and Comparative Physiology. https://doi.org/10.1002/jcp.1030660518 -
initial_conditions
()¶ Returns the initial conditions of the model.
-
n_states
()¶ Returns the number of states in the underlying ODE. Note: this will not be the same as
n_outputs()
for models where only a subset of states are observed.
-
set_initial_conditions
(y0)¶ Sets the initial conditions of the model.
-
simulate
(parameters, times)¶
-
simulateS1
(parameters, times)¶
-
Gaussian distribution¶
-
class
pints.toy.
GaussianLogPDF
(mean=[0, 0], sigma=[1, 1])[source]¶ Toy distribution based on a multivariate (unimodal) Normal/Gaussian distribution.
Extends
pints.toy.ToyLogPDF
.Parameters: - mean – The distribution mean (specified as a vector).
- sigma – The distribution’s covariance matrix. Can be given as either a matrix
or a vector (in which case
diag(sigma)
will be used). Should be symmetric and positive-semidefinite.
-
distance
(samples)[source]¶ Returns the
Kullback-Leibler divergence
.
-
kl_divergence
(samples)[source]¶ Calculates the Kullback-Leibler divergence between a given list of samples and the distribution underlying this LogPDF.
The returned value is (near) zero for perfect sampling, and then increases as the error gets larger.
See: https://en.wikipedia.org/wiki/Kullback-Leibler_divergence
-
suggested_bounds
()¶ Returns suggested boundaries for prior.
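Example usage (a brief sketch; the samples drawn with NumPy here are only for illustration):
import numpy as np
import pints.toy

log_pdf = pints.toy.GaussianLogPDF(mean=[0, 0], sigma=[1, 1])
print(log_pdf([0.5, -0.2]))               # evaluate the log-density at a point

samples = np.random.multivariate_normal([0, 0], np.eye(2), size=1000)
print(log_pdf.kl_divergence(samples))     # near zero for well-matched samples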
German Credit Hierarchical Logistic Distribution¶
-
class
pints.toy.
GermanCreditHierarchicalLogPDF
(x=None, y=None, download=False)[source]¶ Toy distribution based on a hierarchical logistic regression model, which takes the form,
\[f(z, y|\beta) \propto \text{exp}(-\sum_{i=1}^{N} \text{log}(1 + \text{exp}(-y_i z_i.\beta)) - \beta.\beta/2\sigma^2 - N/2 \text{log }\sigma^2 - \lambda \sigma^2)\]The data \((z, y)\) are a matrix of individual predictors (with 1s in the first column) and responses (1 if the individual should receive credit and -1 if not) respectively; \(\beta\) is a 325x1 vector of coefficients and \(N=1000\); \(z\) is the design matrix formed by creating all interactions between individual variables and themselves as defined in [2].
Extends
pints.LogPDF
.Parameters: theta (float) – vector of coefficients of length 326 (first dimension is sigma; other entries make up beta) References
[1] “UCI machine learning repository”, 2010. A. Frank and A. Asuncion. [2] “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo”, 2014, M.D. Hoffman and A. Gelman. -
distance
(samples)¶ Calculates a measure of distance from
samples
to some characteristic of the underlying distribution.
-
sample
(n_samples)¶ Generates independent samples from the underlying distribution.
-
German Credit Logistic Distribution¶
-
class
pints.toy.
GermanCreditLogPDF
(x=None, y=None, download=False)[source]¶ Toy distribution based on a logistic regression model, which takes the form,
\[f(x, y|\beta) \propto \text{exp}(-\sum_{i=1}^{N} \text{log}(1 + \text{exp}(-y_i x_i.\beta)) - \beta.\beta/2\sigma^2)\]The data \((x, y)\) are a matrix of individual predictors (with 1s in the first column) and responses (1 if the individual should receive credit and -1 if not) respectively; \(\beta\) is a 25x1 vector of coefficients and \(\sigma^2=100\). The dataset here is from [1] but the test problem is defined in [2].
Extends
pints.LogPDF
.Parameters: beta (float) – vector of coefficients of length 25. References
[1] “UCI machine learning repository”, 2010. A. Frank and A. Asuncion. [2] “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo”, 2014, M.D. Hoffman and A. Gelman. -
distance
(samples)¶ Calculates a measure of distance from
samples
to some characteristic of the underlying distribution.
-
sample
(n_samples)¶ Generates independent samples from the underlying distribution.
-
Goodwin oscillator model¶
-
class
pints.toy.
GoodwinOscillatorModel
[source]¶ Three-state Goodwin oscillator toy model introduced in [1], [2], but best described in [3]. The model considers level of mRNA, \(x\), which is translated into protein \(y\), which, in turn, stimulated production of protein \(z\) that inhibits production of mRNA. The ODE system is described by the following equations,
\[ \begin{align}\begin{aligned}\dot{x} = 1 / (1 + z^{10}) - m_1 x\\\dot{y} = k_2 x - m_2 y\\\dot{z} = k_3 y - m_3 z\end{aligned}\end{align} \]Parameters are \([k_2, k_3, m_1, m_2, m_3]\). The initial conditions are hard-coded at
[0.0054, 0.053, 1.93]
.Extends
pints.ForwardModelS1
,pints.toy.ToyODEModel
.References
[1] Oscillatory behavior in enzymatic control processes. Goodwin (1965) Advances in enzyme regulation. https://doi.org/10.1016/0065-2571(65)90067-1 [2] Mathematics of cellular control processes I. Negative feedback to one gene. Griffith (1968) Journal of theoretical biology. https://doi.org/10.1016/0022-5193(68)90189-6 [3] Estimating Bayes factors via thermodynamic integration and population MCMC. Ben Calderhead and Mark Girolami, 2009, Computational Statistics and Data Analysis. -
initial_conditions
()¶ Returns the initial conditions of the model.
-
n_states
()¶ Returns the number of states in the underlying ODE. Note: this will not be the same as
n_outputs()
for models where only a subset of states are observed.
-
set_initial_conditions
(y0)¶ Sets the initial conditions of the model.
-
simulate
(parameters, times)¶
-
simulateS1
(parameters, times)¶
-
HES1 Michaelis-Menten Model¶
-
class
pints.toy.
Hes1Model
(m0=None, fixed_parameters=None)[source]¶ HES1 Michaelis-Menten model of regulatory dynamics [1].
This model describes the expression level of the transcription factor Hes1.
\[\begin{split}\frac{dm}{dt} &= -k_{deg}m + \frac{1}{1 + (p_2/P_0)^h} \\ \frac{dp_1}{dt} &= -k_{deg} p_1 + \nu m - k_1 p_1 \\ \frac{dp_2}{dt} &= -k_{deg} p_2 + k_1 p_1\end{split}\]The system is determined by 3 state variables \(m\), \(p_1\), and \(p_2\). It is assumed that only \(m\) can be observed, that is only \(m\) is an observable. The initial condition of the other two state variables and \(k_{deg}\) are treated as implicit parameters of the system. The input order of parameters of interest is \(\{ P_0, \nu, k_1, h \}\).
Extends
pints.ForwardModel
,pints.toy.ToyModel
.Parameters: - m0 (float) – The initial condition of the observable
m
. Requiresm0 >= 0
. - fixed_parameters – The fixed parameters of the model which are not inferred, given as a
vector
[p1_0, p2_0, k_deg]
withp1_0, p2_0, k_deg >= 0
.
References
[1] Silk, D., el al. 2011. Designing attractive models via automated identification of chaotic and oscillatory dynamical regimes. Nature communications, 2, p.489. https://doi.org/10.1038/ncomms1496 -
fixed_parameters
()[source]¶ Returns the fixed parameters of the model which are not inferred, given as a vector
[p1_0, p2_0, k_deg]
.
-
initial_conditions
()¶ Returns the initial conditions of the model.
-
set_initial_conditions
(y0)¶ Sets the initial conditions of the model.
-
simulate
(parameters, times)¶
-
simulateS1
(parameters, times)¶
-
simulate_all_states
(parameters, times)[source]¶ Returns all state variables that
simulate()
does not return.
-
suggested_values
()[source]¶ Returns a suggested set of values that matches
suggested_times()
.
- m0 (float) – The initial condition of the observable
High dimensional Gaussian distribution¶
-
class
pints.toy.
HighDimensionalGaussianLogPDF
(dimension=20, rho=0.5)[source]¶ High-dimensional zero-mean multivariate Gaussian log pdf, with off-diagonal correlations.
Specifically, the covariance matrix \(\Sigma\) is constructed so that the diagonal elements are integers, \(\Sigma_{i,i} = i\), and the off-diagonal elements are \(\Sigma_{i,j} = \rho \sqrt{i} \sqrt{j}\).
Extends
pints.toy.ToyLogPDF
.Parameters: - dimension (int) – Dimensions of multivariate Gaussian distribution (which must exceed 1).
- rho (float) – The correlation between pairs of parameter dimensions. Note that this
must be between
-1 / (dimension - 1) and 1
so that the covariance matrix is positive semi-definite.
-
distance
(samples)[source]¶ Returns approximate Kullback-Leibler divergence between samples and underlying distribution.
-
kl_divergence
(samples)[source]¶ Returns approximate Kullback-Leibler divergence between samples and underlying distribution.
The returned value is (near) zero for perfect sampling, and then increases as the error gets larger.
See: https://en.wikipedia.org/wiki/Kullback-Leibler_divergence
Hodgkin-Huxley IK Experiment Model¶
-
class
pints.toy.
HodgkinHuxleyIKModel
(initial_condition=0.3)[source]¶ Toy model based on the potassium current experiments used for Hodgkin and Huxley’s 1952 model of the action potential of a squid’s giant axon [1].
A voltage-step protocol is created and applied to an axon, and the elicited potassium current (\(I_\text{K}\)) is given as model output.
The model equations are
\[\begin{split}\alpha &= p_1 \frac{-V - 75 + p_2}{\exp[(-V - 75 + p_2) / p_3] - 1} \\ \beta &= p_4 \exp[(-V - 75) / p_5] \\ \frac{dn}{dt} &= \alpha \cdot (1 - n) - \beta \cdot n \\ E_\text{K} &= -88 \\ g_\text{max} &= 36 \\ I_\text{K} &= g_\text{max} \cdot n^4 \cdot (V - E_\text{K})\end{split}\]Where \(p_1, p_2, ..., p_5\) are the parameters varied in this toy model.
During simulation, the membrane potential \(V\) is varied by holding it at -75mV for 90ms, then at a “step potential” for 10ms. The step potentials are based on the values used in the original paper, and are -69, -64, -56, -49, -43, -37, -24, -12, 1, 13, 25, and 34mV. The protocol is applied in the interval \(t = [0, 1200]\), so sampling outside this interval will not provide new information.
With the parameter values from
suggested_parameters()
, simulation results will match those in [1].Extends
pints.ForwardModel
,pints.toy.ToyModel
.Parameters: initial_condition (float) – The initial value of the state variable \(n\). References
[1] (1, 2) A quantitative description of membrane currents and its application to conduction and excitation in nerve. Hodgkin, Huxley (1952d) Journal of Physiology. https://doi.org/10.1113/jphysiol.1964.sp007378 Example usage:
model = HodgkinHuxleyIKModel()
p0 = model.suggested_parameters()
times = model.suggested_times()
values = model.simulate(p0, times)

import matplotlib.pyplot as plt
plt.figure()
plt.plot(times, values)
Alternatively, the data can be displayed using the
fold()
method:
plt.figure()
for t, v in model.fold(times, values):
    plt.plot(t, v)
plt.show()
-
fold
(times, values)[source]¶ Takes a set of times and values as return by this model, and “folds” the individual currents over each other, to create a very common plot in electrophysiology.
Returns a list of tuples
(times, values)
for each different voltage step.
-
n_outputs
()¶ Returns the number of outputs this model has. The default is 1.
-
suggested_duration
()[source]¶ Returns the duration of the experimental protocol modeled in this toy model.
-
suggested_parameters
()[source]¶ See
pints.toy.ToyModel.suggested_parameters()
.Returns an array with the original model parameters used by Hodgkin and Huxley.
-
Logistic model¶
-
class
pints.toy.
LogisticModel
(initial_population_size=2)[source]¶ Logistic model of population growth [1].
\[\begin{split}f(t) &= \frac{k}{1+(k/p_0 - 1) \exp(-r t)} \\ \frac{\partial f(t)}{\partial r} &= \frac{k t (k / p_0 - 1) \exp(-r t)} {((k/p_0-1) \exp(-r t) + 1)^2} \\ \frac{\partial f(t)}{ \partial k} &= -\frac{k \exp(-r t)} {p_0 ((k/p_0-1)\exp(-r t) + 1)^2} + \frac{1}{(k/p_0 - 1)\exp(-r t) + 1}\end{split}\]Has two model parameters: A growth rate \(r\) and a carrying capacity \(k\). The initial population size \(p_0 = f(0)\) is a fixed (known) parameter in the model.
Extends
pints.ForwardModel
,pints.toy.ToyModel
.Parameters: initial_population_size (float) – Sets the initial population size \(p_0\). References
[1] https://en.wikipedia.org/wiki/Population_growth -
n_outputs
()¶ Returns the number of outputs this model has. The default is 1.
-
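Example usage (a brief sketch; the parameter values [0.1, 50] for the growth rate \(r\) and carrying capacity \(k\) are illustrative only):
import numpy as np
import pints.toy

model = pints.toy.LogisticModel(initial_population_size=2)
times = np.linspace(0, 100, 101)
values = model.simulate([0.1, 50], times)   # population size at each time point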
Lotka-Volterra model¶
-
class
pints.toy.
LotkaVolterraModel
(y0=None)[source]¶ Lotka-Volterra model of predator-prey relationships [1].
This model describes cyclical fluctuations in the populations of two interacting species.
\[\begin{split}\frac{dx}{dt} = ax - bxy \\ \frac{dy}{dt} = -cy + dxy\end{split}\]where
x
is the number of prey, andy
is the number of predators.Real data is included via
suggested_values()
, which was taken from [2], and includes hare and lynx pelt count data collected by the Hudson’s Bay Company, in Canada in the early twentieth century.Extends
pints.ForwardModelS1
,pints.toy.ToyODEModel
.Parameters: y0 – The initial population, given as a vector [a, b]
such thata >= 0
andb >= 0
.References
[1] https://en.wikipedia.org/wiki/Lotka-Volterra_equations [2] (1, 2) Howard, P. (2009). Modeling basics. Lecture Notes for Math 442, Texas A&M University -
n_states
()¶ Returns the number of states in the underlying ODE. Note: this will not be the same as
n_outputs()
for models where only a subset of states are observed.
-
simulate
(parameters, times)¶
-
simulateS1
(parameters, times)¶
-
Multimodal Gaussian distribution¶
-
class
pints.toy.
MultimodalGaussianLogPDF
(modes=None, covariances=None)[source]¶ Multimodal (un-normalised) multivariate Gaussian distribution.
By default, the distribution is on a 2-dimensional space, with modes at
(0, 0)
and(10, 10)
with independent unit covariance matrices.Examples:
# Default 2d, bimodal
f = pints.toy.MultimodalGaussianLogPDF()

# 3d bimodal
f = pints.toy.MultimodalGaussianLogPDF([[0, 1, 2], [10, 10, 10]])

# 2d with 3 modes
f = pints.toy.MultimodalGaussianLogPDF([[0, 0], [5, 5], [5, 0]])
Extends
pints.toy.ToyLogPDF
.Parameters: - modes – A list of points that will form the modes of the distribution. Must all have the same dimension. If not set, the method will revert to the bimodal distribution described above.
- covariances – A list of covariance matrices, one for each mode. If not set, a unit matrix will be used for each.
-
distance
(samples)[source]¶ Calculates
per mode approximate KL divergence
then sums these.
-
kl_divergence
(samples)[source]¶ Calculates the approximate Kullback-Leibler divergence between a given list of samples and the distribution underlying this LogPDF. It does this by first assigning each point to its most likely mode then calculating KL for each mode separately. If one mode is found with no near samples then all the samples are used to calculate KL for this mode.
The returned value is (near) zero for perfect sampling, and then increases as the error gets larger.
See: https://en.wikipedia.org/wiki/Kullback-Leibler_divergence
Neal’s Funnel Distribution¶
-
class
pints.toy.
NealsFunnelLogPDF
(dimensions=10)[source]¶ Toy distribution based on a d-dimensional distribution of the form,
\[f(x_1, x_2,...,x_d,\nu) = \left[\prod_{i=1}^d\mathcal{N}(x_i|0,e^{\nu/2})\right] \times \mathcal{N}(\nu|0,3)\]where
x
is a d-dimensional real. This distribution was introduced in [1].Extends
pints.toy.ToyLogPDF
.Parameters: dimensions (int) – The dimensionality of funnel (by default equal to 10) which must exceed 1. References
[1] “Slice sampling”. R. Neal, Annals of statistics, 705 (2003) https://doi.org/10.1214/aos/1056562461 -
kl_divergence
(samples)[source]¶ Calculates the KL divergence of samples of the \(\nu\) parameter of Neal’s funnel from the analytic \(\mathcal{N}(0, 3)\) result.
-
Parabolic error¶
Repressilator model¶
-
class
pints.toy.
RepressilatorModel
(y0=None)[source]¶ The “Repressilator” model describes oscillations in a network of proteins that suppress their own creation [1], [2].
The formulation used here is taken from [3] and analysed in [4]. It has three protein states (\(p_i\)), each encoded by mRNA (\(m_i\)). Once expressed, they suppress each other:
\[ \begin{align}\begin{aligned}\dot{m_0} = -m_0 + \frac{\alpha}{1 + p_2^n} + \alpha_0\\\dot{m_1} = -m_1 + \frac{\alpha}{1 + p_0^n} + \alpha_0\\\dot{m_2} = -m_2 + \frac{\alpha}{1 + p_1^n} + \alpha_0\\\dot{p_0} = -\beta (p_0 - m_0)\\\dot{p_1} = -\beta (p_1 - m_1)\\\dot{p_2} = -\beta (p_2 - m_2)\end{aligned}\end{align} \]With parameters
alpha_0
,alpha
,beta
, andn
.Only the mRNA states are visible as output.
Extends
pints.ForwardModel
,pints.toy.ToyModel
.Parameters: y0 – The system’s initial state, must have 6 entries all >=0. References
[1] A Synthetic Oscillatory Network of Transcriptional Regulators. Elowitz, Leibler (2000) Nature. https://doi.org/10.1038/35002125 [2] https://en.wikipedia.org/wiki/Repressilator [3] Dynamic models in biology. Ellner, Guckenheimer (2006) Princeton University Press [4] Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Toni, Welch, Strelkowa, Ipsen, Stumpf (2009) J. R. Soc. Interface. https://doi.org/10.1098/rsif.2008.0172
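Example usage (a brief sketch):
import pints.toy

model = pints.toy.RepressilatorModel()
x = model.suggested_parameters()
times = model.suggested_times()
values = model.simulate(x, times)   # the three mRNA concentrations over time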
Rosenbrock function¶
-
class
pints.toy.
RosenbrockError
[source]¶ Error measure based on the Rosenbrock function [1].
\[f(x,y) = (1 - x)^2 + 100(y - x^2)^2\]Extends
pints.ErrorMeasure
.References
[1] https://en.wikipedia.org/wiki/Rosenbrock_function -
evaluateS1
(x)¶ Evaluates this error measure, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data has the shape
(e, e')
wheree
is a scalar value ande'
is a sequence of lengthn_parameters
.This is an optional method that is not always implemented.
-
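Since an ErrorMeasure is callable, the error can be evaluated directly; by the formula above, the global minimum lies at (1, 1), where the error is zero. A brief sketch:
import pints.toy

error = pints.toy.RosenbrockError()
print(error([1, 1]))   # 0.0, the global minimum
print(error([0, 0]))   # 1.0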
-
class
pints.toy.
RosenbrockLogPDF
[source]¶ Unnormalised LogPDF based on the Rosenbrock function [2] with an addition of 1 on the denominator to avoid a discontinuity:
\[f(x,y) = -log[1 + (1 - x)^2 + 100(y - x^2)^2 ]\]Extends
pints.toy.ToyLogPDF
.References
[2] https://en.wikipedia.org/wiki/Rosenbrock_function -
distance
(samples)[source]¶ Calculates a measure of normed distance of samples from exact mean and covariance matrix assuming uniform prior with bounds given by
suggested_bounds()
.
-
sample
(n_samples)¶ Generates independent samples from the underlying distribution.
-
Simple Egg Box Distribution¶
-
class
pints.toy.
SimpleEggBoxLogPDF
(sigma=2, r=4)[source]¶ Two-dimensional multimodal Gaussian distribution, with four more-or-less independent modes, each centered in a different quadrant.
Extends
pints.toy.ToyLogPDF
.Parameters: - sigma (float) – The variance of each mode.
- r (float) – Determines the positions of the modes, which will be located at
(d, d)
,(-d, d)
,(-d, -d)
, and(d, -d)
, whered = r * sigma
.
-
distance
(samples)[source]¶ Calculates
approximate mode-wise KL divergence
.
-
kl_divergence
(samples)[source]¶ Calculates a heuristic score for how well a given set of samples matches this LogPDF’s underlying distribution, based on Kullback-Leibler divergence of the individual modes. This only works well if the modes are nicely separated, i.e. for larger values of
r
.
-
sample
(n)[source]¶ See
ToyLogPDF.sample()
.
Simple Harmonic Oscillator model¶
-
class
pints.toy.
SimpleHarmonicOscillatorModel
[source]¶ Simple harmonic oscillator model for a particle that experiences a force in proportion to its displacement from an equilibrium position, and, in addition, a friction force. The system’s behaviour is determined by a second order ordinary differential equation (from Newton’s second law):
\[\frac{d^2y}{dt^2} = -y(t) - \theta \frac{dy(t)}{dt}\]Here it has been assumed that the particle has unit mass and that the restoring force has constant of proportionality equal to 1.
The model has three parameters: the initial position of the particle,
y(0)
, its initial momentum,dy/dt(0)
and the magnitude of the friction force,theta
.Extends
pints.ForwardModel
,pints.toy.ToyModel
.References
[1] https://en.wikipedia.org/wiki/Simple_harmonic_motion -
n_outputs
()¶ Returns the number of outputs this model has. The default is 1.
-
SIR Epidemiology model¶
-
class
pints.toy.
SIRModel
(y0=None)[source]¶ The SIR model of infectious disease models the number of susceptible (S), infected (I), and recovered (R) people in a population [1], [2].
The particular model given here is analysed in [3], and is described by the following three-state ODE:
\[ \begin{align}\begin{aligned}\dot{S} = -\gamma S I\\\dot{I} = \gamma S I - v I\\\dot{R} = v I\end{aligned}\end{align} \]Where the parameters are
gamma
(infection rate), andv
, recovery rate. In addition, we assume the initial value of S,S0
, is unknwon, leading to a three parameter model(gamma, v, S0)
.The number of infected people and recovered people are observable, making this a 2-output system. S can be thought of as an unknown number of susceptible people within a larger population.
The model does not account for births and deaths, which are assumed to happen much slower than the spread of the (non-lethal) disease.
Real data is included via
suggested_values()
, which was taken from [3], [4], [5].Extends
pints.ForwardModel
, pints.toy.ToyModel.Parameters: y0 – The system’s initial state, must have 3 entries all >=0. References
[1] A Contribution to the Mathematical Theory of Epidemics. Kermack, McKendrick (1927) Proceedings of the Royal Society A. https://doi.org/10.1098/rspa.1927.0118 [2] https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology [3] (1, 2) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Toni, Welch, Strelkowa, Ipsen, Stumpf (2009) J. R. Soc. Interface. https://doi.org/10.1098/rsif.2008.0172 [4] (1, 2) A mathematical model of common-cold epidemics on Tristan da Cunha. Hammond, Tyrrell (1971) Epidemiology & Infection. https://doi.org/10.1017/S0022172400021677 [5] (1, 2) Common colds on Tristan da Cunha. Shybli, Gooch, Lewis, Tyrell (1971) Epidemiology & Infection. https://doi.org/10.1017/S0022172400021483
Stochastic degradation model¶
-
class
pints.toy.
StochasticDegradationModel
(initial_molecule_count=20)[source]¶ Stochastic degradation model of a single chemical reaction starting from an initial molecule count \(A(0)\) and degrading to 0 with a fixed rate \(k\):
\[A \xrightarrow{k} 0\]Simulations are performed using Gillespie’s algorithm [1], [2]:
- Sample a random value \(r\) from a uniform distribution
\[r \sim U(0,1)\]- Calculate the time \(\tau\) until the next single reaction as
\[\tau = \frac{-\ln(r)}{A(t) k}\]- Update the molecule count \(A\) at time \(t + \tau\) as:
\[A(t + \tau) = A(t) - 1\]- Return to step (1) until the molecule count reaches 0
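The steps above can be sketched in a few lines of NumPy (a minimal illustration of the algorithm, not the class’s internal implementation):
import numpy as np

def degradation_gillespie(a0, k, seed=None):
    # Simulate A -> 0 with rate k, starting from a0 molecules
    rng = np.random.default_rng(seed)
    t, a = 0.0, a0
    times, counts = [t], [a]
    while a > 0:
        r = rng.uniform()             # step 1: r ~ U(0, 1)
        tau = -np.log(r) / (a * k)    # step 2: time until the next reaction
        t, a = t + tau, a - 1         # step 3: one molecule degrades at t + tau
        times.append(t)
        counts.append(a)              # step 4: repeat until the count reaches 0
    return np.array(times), np.array(counts)

times, counts = degradation_gillespie(20, 0.1)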
The model has one parameter, the rate constant \(k\).
Extends
pints.ForwardModel
,pints.toy.ToyModel
.Parameters: initial_molecule_count – The initial molecule count \(A(0)\). References
[1] A Practical Guide to Stochastic Simulations of Reaction Diffusion Processes. Erban, Chapman, Maini (2007). arXiv:0704.1908v2 [q-bio.SC] https://arxiv.org/abs/0704.1908 [2] A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Gillespie (1976). Journal of Computational Physics https://doi.org/10.1016/0021-9991(76)90041-3 -
interpolate_mol_counts
(time, mol_count, output_times)[source]¶ Takes raw reaction times and molecule counts and returns the molecule counts interpolated at output_times
-
mean
(parameters, times)[source]¶ Returns the deterministic mean of infinitely many stochastic simulations, which follows \(A(0) \exp(-kt)\).
-
n_outputs
()¶ Returns the number of outputs this model has. The default is 1.
Stochastic Logistic Model¶
-
class
pints.toy.
StochasticLogisticModel
(initial_molecule_count=50)[source]¶ This model describes the growth of a population of individuals. The per capita birth rate, initially \(b_0\), decreases to \(0\) as the population size \(\mathcal{C}(t)\), starting from an initial population size \(n_0\), approaches a carrying capacity \(k\). This process follows a rate according to [1]
\[A \xrightarrow{b_0(1-\frac{\mathcal{C}(t)}{k})} 2A.\]The model is simulated using the Gillespie stochastic simulation algorithm [2], [3].
Extends:
pints.ForwardModel
,pints.toy.ToyModel
.Parameters: initial_molecule_count (float) – Sets the initial population size \(n_0\). References
[1] Simpson, M. et al. 2019. Process noise distinguishes between indistinguishable population dynamics. bioRxiv. https://doi.org/10.1101/533182 [2] Gillespie, D. 1976. A General Method for Numerically Simulating the Stochastic Time Evolution of Coupled Chemical Reactions. Journal of Computational Physics. 22 (4): 403-434. https://doi.org/10.1016/0021-9991(76)90041-3 [3] Erban R. et al. 2007. A practical guide to stochastic simulations of reaction-diffusion processes. arXiv. https://arxiv.org/abs/0704.1908v2 -
mean
(parameters, times)[source]¶ Computes the deterministic mean of infinitely many stochastic simulations with times \(t\) and parameters (\(b\), \(k\)), which follows: \(\frac{kC(0)}{C(0) + (k - C(0)) \exp(-bt)}\).
Returns an array with the same length as times.
-
n_outputs
()¶ Returns the number of outputs this model has. The default is 1.
-
Twisted Gaussian distribution¶
-
class
pints.toy.
TwistedGaussianLogPDF
(dimension=10, b=0.1, V=100)[source]¶ Twisted multivariate Gaussian ‘banana’ with un-normalised density [1]:
\[p(x_1, x_2, x_3, ..., x_n) \propto \pi(\phi(x_1, x_2, x_3, ..., x_n))\]where \(\pi\) is the multivariate Gaussian density with covariance matrix \(\Sigma=\text{diag}(100, 1, 1, ..., 1)\) and
\[\phi(x_1,x_2,x_3,...,x_n) = (x_1, x_2 + b x_1^2 - V b, x_3, ..., x_n),\]Extends
pints.toy.ToyLogPDF
.Parameters: - dimension (int) – Problem dimension (
n
), must be 2 or greater. - b (float) – “Bananicity”:
b = 0.01
induces mild non-linearity in target density, while non-linearity forb = 0.1
is high. Must be greater than or equal to zero. - V (float) – Offset (see equation).
References
[1] Adaptive proposal distribution for random walk Metropolis algorithm Haario, Saksman, Tamminen (1999) Computational Statistics. https://doi.org/10.1007/s001800050022 -
distance
(samples)[source]¶ Returns
approximate Kullback-Leibler divergence
of samples from the underlying distribution.
-
kl_divergence
(samples)[source]¶ Calculates the approximate Kullback-Leibler divergence between a given list of samples and the distribution underlying this LogPDF.
The returned value is (near) zero for perfect sampling, and then increases as the error gets larger.
See: https://en.wikipedia.org/wiki/Kullback-Leibler_divergence
- dimension (int) – Problem dimension (
Transformations¶
Transformation
objects provide methods to transform between different
representations of a parameter space; for example from a “model space”
(\(p\)) where parameters have units and some physical counterpart to
a “search space” (e.g. \(q = \log(p)\)) where parameters are
non-dimensionalised and less-recognisable to the modeller.
The transformed space may in many cases prove simpler to work with for
inference, leading to more effective and efficient optimisation and sampling.
To perform optimisation or sampling in a transformed space, users can choose to
write their pints.ForwardModel
in “search space” directly, but the
issue with this is that we will no longer be correctly inferring the “model
parameters”. An alternative is to write the ForwardModel
in model
parameters, and pass a Transformation
object to e.g. an
OptimisationController
or MCMCController
. Using the
Transformation
object ensures users get the correct statistics about
the model parameters (not the search space parameters).
Parameter transformation can be useful in many situations, for example
transforming from a constrained parameter space to an unconstrained search
space using RectangularBoundariesTransformation
leads to crucial
performance improvements for many methods.
Example:
transform = pints.LogTransformation(n_parameters)
mcmc = pints.MCMCController(log_posterior, n_chains, x0, transform=transform)
Overview:
ComposedTransformation
IdentityTransformation
LogitTransformation
LogTransformation
RectangularBoundariesTransformation
ScalingTransformation
Transformation
TransformedBoundaries
TransformedErrorMeasure
TransformedLogPDF
TransformedLogPrior
-
class
pints.
ComposedTransformation
(*transformations)[source]¶ N-dimensional
Transformation
composed of one or more other \(N_i\)-dimensional sub-transformations, so that \(\sum _i N_i = N\).The dimensionality of the individual transformations does not have to be the same, i.e. \(N_i\neq N_j\) is allowed.
For example, a composed transformation:
t = pints.ComposedTransformation( transformation_1, transformation_2, transformation_3)
where
transformation_1
,transformation_2
, andtransformation_3
have dimension 1, 2 and 1 respectively, will have dimension N=4.The evaluation and transformation of the composed transformations assume that the input transformations are all independent from each other.
The input parameters of the
ComposedTransformation
are ordered in the same way as the individual transformations for the parameter vector. In the above example the transformation may be performed by t.to_search(p)
, where:p = [parameter_1_for_transformation_1, parameter_1_for_transformation_2, parameter_2_for_transformation_2, parameter_1_for_transformation_3]
Extends
Transformation
.-
convert_boundaries
(boundaries)¶ Returns a transformed boundaries class.
-
convert_covariance_matrix
(C, q)¶ Converts a covariance matrix
C
from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
-
convert_error_measure
(error_measure)¶ Returns a transformed error measure class.
-
convert_log_pdf
(log_pdf)¶ Returns a transformed log-PDF class.
-
convert_log_prior
(log_prior)¶ Returns a transformed log-prior class.
-
convert_standard_deviation
(s, q)¶ Converts standard deviation
s
, either a scalar or a vector, from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
To transform the provided standard deviation \(\boldsymbol{s}\), we assume the covariance matrix \(\mathbf{C}(\boldsymbol{p})\) above is a diagonal matrix with \(\boldsymbol{s}^2\) on the diagonal, such that
\[s_i(\boldsymbol{q}) = \left( \mathbf{J}^{-1} (\mathbf{J}^{-1})^T \right)^{1/2}_{i, i} s_i(\boldsymbol{p}).\]
-
-
class
pints.
IdentityTransformation
(n_parameters)[source]¶ Transformation that returns the input (untransformed) parameters, i.e. the search space under this transformation is the same as the model space, and its Jacobian matrix is the identity matrix.
Extends
Transformation
.Parameters: n_parameters – Number of model parameters this transformation is defined over. -
convert_boundaries
(boundaries)¶ Returns a transformed boundaries class.
-
convert_covariance_matrix
(C, q)¶ Converts a covariance matrix
C
from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
-
convert_error_measure
(error_measure)¶ Returns a transformed error measure class.
-
convert_log_pdf
(log_pdf)¶ Returns a transformed log-PDF class.
-
convert_log_prior
(log_prior)¶ Returns a transformed log-prior class.
-
convert_standard_deviation
(s, q)¶ Converts standard deviation
s
, either a scalar or a vector, from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
To transform the provided standard deviation \(\boldsymbol{s}\), we assume the covariance matrix \(\mathbf{C}(\boldsymbol{p})\) above is a diagonal matrix with \(\boldsymbol{s}^2\) on the diagonal, such that
\[s_i(\boldsymbol{q}) = \left( \mathbf{J}^{-1} (\mathbf{J}^{-1})^T \right)^{1/2}_{i, i} s_i(\boldsymbol{p}).\]
-
-
class
pints.
LogitTransformation
(n_parameters)[source]¶ Logit (or log-odds) transformation of the model parameters.
The transformation is given by
\[q = \text{logit}(p) = \log(\frac{p}{1 - p}),\]where \(p\) is the model parameter vector and \(q\) is the search space vector.
The Jacobian adjustment of the logit transformation is given by
\[|\frac{d}{dq} \text{logit}^{-1}(q)| = \text{logit}^{-1}(q) \times (1 - \text{logit}^{-1}(q)).\]And its derivative is given by
\[\frac{d^2}{dq^2} \text{logit}^{-1}(q) = \frac{d f^{-1}(q)}{dq} \times \left( \frac{\exp(-q) - 1}{\exp(-q) + 1} \right).\]The first order derivative of the log determinant of the Jacobian is
\[\frac{d}{dq} \log(|J(q)|) = 2 \times \exp(-q) \times \text{logit}^{-1}(q) - 1.\]Extends
Transformation
.Parameters: n_parameters – Number of model parameters this transformation is defined over. -
convert_boundaries
(boundaries)¶ Returns a transformed boundaries class.
-
convert_covariance_matrix
(C, q)¶ Converts a covariance matrix
C
from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
-
convert_error_measure
(error_measure)¶ Returns a transformed error measure class.
-
convert_log_pdf
(log_pdf)¶ Returns a transformed log-PDF class.
-
convert_log_prior
(log_prior)¶ Returns a transformed log-prior class.
-
convert_standard_deviation
(s, q)¶ Converts standard deviation
s
, either a scalar or a vector, from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
To transform the provided standard deviation \(\boldsymbol{s}\), we assume the covariance matrix \(\mathbf{C}(\boldsymbol{p})\) above is a diagonal matrix with \(\boldsymbol{s}^2\) on the diagonal, such that
\[s_i(\boldsymbol{q}) = \left( \mathbf{J}^{-1} (\mathbf{J}^{-1})^T \right)^{1/2}_{i, i} s_i(\boldsymbol{p}).\]
-
-
class
pints.
LogTransformation
(n_parameters)[source]¶ Logarithm transformation of the model parameters:
The transformation is given by
\[q = \log(p),\]where \(p\) is the model parameter vector and \(q\) is the search space vector.
The Jacobian adjustment of the log transformation is given by
\[|\frac{d}{dq} \exp(q)| = \exp(q).\]And its derivative is given by
\[\frac{d^2}{dq^2} \exp(q) = \exp(q).\]The first order derivative of the log determinant of the Jacobian is
\[\frac{d}{dq} \log(|J(q)|) = 1.\]Extends
Transformation
.Parameters: n_parameters – Number of model parameters this transformation is defined over. -
convert_boundaries
(boundaries)¶ Returns a transformed boundaries class.
-
convert_covariance_matrix
(C, q)¶ Converts a covariance matrix
C
from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
-
convert_error_measure
(error_measure)¶ Returns a transformed error measure class.
-
convert_log_pdf
(log_pdf)¶ Returns a transformed log-PDF class.
-
convert_log_prior
(log_prior)¶ Returns a transformed log-prior class.
-
convert_standard_deviation
(s, q)¶ Converts standard deviation
s
, either a scalar or a vector, from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
To transform the provided standard deviation \(\boldsymbol{s}\), we assume the covariance matrix \(\mathbf{C}(\boldsymbol{p})\) above is a diagonal matrix with \(\boldsymbol{s}^2\) on the diagonal, such that
\[s_i(\boldsymbol{q}) = \left( \mathbf{J}^{-1} (\mathbf{J}^{-1})^T \right)^{1/2}_{i, i} s_i(\boldsymbol{p}).\]
-
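For example, a minimal usage sketch (the parameter values are illustrative) mapping a positive parameter vector to and from the unconstrained log search space:
import numpy as np
import pints

# Log-transform a 2-dimensional (positive) parameter space
transformation = pints.LogTransformation(n_parameters=2)

p = np.array([0.015, 500.0])          # model-space parameters
q = transformation.to_search(p)       # equals np.log(p)
p_back = transformation.to_model(q)   # recovers the original parameters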
-
class
pints.
RectangularBoundariesTransformation
(lower_or_boundaries, upper=None)[source]¶ A generalised version of the logit transformation for the model parameters, which transforms an interval or rectangular boundaries \([a, b)\) to all real numbers.
The transformation is given by
\[q = f(p) = \text{logit}\left(\frac{p - a}{b - a}\right) = \log(p - a) - \log(b - p),\]where \(p\) is the model parameter vector and \(q\) is the search space vector. Note that
LogitTransformation
is a special case where \(a = 0\) and \(b = 1\).The Jacobian adjustment of the transformation is given by
\[|\frac{d}{dq} f^{-1}(q)| = \frac{b - a}{\exp(q) (1 + \exp(-q)) ^ 2}.\]And its derivative is given by
\[\frac{d^2}{dq^2} f^{-1}(q) = \frac{d f^{-1}(q)}{dq} \times \left( \frac{\exp(-q) - 1}{\exp(-q) + 1} \right).\]The log-determinant of the Jacobian matrix is given by
\[\log|\frac{d}{dq} f^{-1}(q)| = \sum_i \left( \log(b_i - a_i) - 2 \times \log(1 + \exp(-q_i)) - q_i \right)\]The first order derivative of the log determinant of the Jacobian is
\[\frac{d}{dq} \log(|J(q)|) = 2 \times \exp(-q) \times \text{logit}^{-1}(q) - 1.\]For example, to create a transformation with \(p_1 \in [0, 4)\), \(p_2 \in [1, 5)\), and \(p_3 \in [2, 6)\) use either:
transformation = pints.RectangularBoundariesTransformation([0, 1, 2], [4, 5, 6])
or:
boundaries = pints.RectangularBoundaries([0, 1, 2], [4, 5, 6])
transformation = pints.RectangularBoundariesTransformation(boundaries)
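Either form can then map parameters between the two spaces (values illustrative; a NumPy array works equally well):
p = [1.0, 2.0, 3.0]                   # within [0, 4), [1, 5), [2, 6)
q = transformation.to_search(p)       # unconstrained search-space vector
p_back = transformation.to_model(q)   # back inside the boundaries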
Extends
Transformation
.-
convert_boundaries
(boundaries)¶ Returns a transformed boundaries class.
-
convert_covariance_matrix
(C, q)¶ Converts a covariance matrix
C
from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
-
convert_error_measure
(error_measure)¶ Returns a transformed error measure class.
-
convert_log_pdf
(log_pdf)¶ Returns a transformed log-PDF class.
-
convert_log_prior
(log_prior)¶ Returns a transformed log-prior class.
-
convert_standard_deviation
(s, q)¶ Converts standard deviation
s
, either a scalar or a vector, from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
To transform the provided standard deviation \(\boldsymbol{s}\), we assume the covariance matrix \(\mathbf{C}(\boldsymbol{p})\) above is a diagonal matrix with \(\boldsymbol{s}^2\) on the diagonal, such that
\[s_i(\boldsymbol{q}) = \left( \mathbf{J}^{-1} (\mathbf{J}^{-1})^T \right)^{1/2}_{i, i} s_i(\boldsymbol{p}).\]
-
-
class
pints.
ScalingTransformation
(scalings)[source]¶ Scaling transformation scales the input parameters by multiplying with an array
scalings
element-wise. Its Jacobian matrix is a diagonal matrix with the values of 1 / scalings
on the diagonal.Extends
Transformation
.-
convert_boundaries
(boundaries)¶ Returns a transformed boundaries class.
-
convert_covariance_matrix
(C, q)¶ Converts a covariance matrix
C
from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
-
convert_error_measure
(error_measure)¶ Returns a transformed error measure class.
-
convert_log_pdf
(log_pdf)¶ Returns a transformed log-PDF class.
-
convert_log_prior
(log_prior)¶ Returns a transformed log-prior class.
-
convert_standard_deviation
(s, q)¶ Converts standard deviation
s
, either a scalar or a vector, from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
To transform the provided standard deviation \(\boldsymbol{s}\), we assume the covariance matrix \(\mathbf{C}(\boldsymbol{p})\) above is a diagonal matrix with \(\boldsymbol{s}^2\) on the diagonal, such that
\[s_i(\boldsymbol{q}) = \left( \mathbf{J}^{-1} (\mathbf{J}^{-1})^T \right)^{1/2}_{i, i} s_i(\boldsymbol{p}).\]
-
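A minimal sketch (the scaling values are illustrative) that rescales parameters of very different magnitudes to a comparable order in the search space:
import numpy as np
import pints

scalings = np.array([1e3, 1e-2])      # one scaling factor per parameter
transformation = pints.ScalingTransformation(scalings)

p = np.array([0.002, 350.0])          # model-space parameters
q = transformation.to_search(p)       # element-wise scalings * p (see description above)
p_back = transformation.to_model(q)   # element-wise q / scalings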
-
class
pints.
Transformation
[source]¶ Abstract base class for objects that provide transformations between two parameter spaces: the model parameter space and a search space.
If
trans
is an instance of aTransformation
class, you can apply the transformation of a parameter vector from the model spacep
to the search spaceq
by usingq = trans.to_search(p)
and the inverse by usingp = trans.to_model(q)
.References
[1] How to Obtain Those Nasty Standard Errors From Transformed Data. Erik Jorgensen and Asger Roer Pedersen. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.9023
[2] The Matrix Cookbook. Kaare Brandt Petersen and Michael Syskind Pedersen. 2012.
-
convert_covariance_matrix
(C, q)[source]¶ Converts a covariance matrix
C
from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
-
convert_standard_deviation
(s, q)[source]¶ Converts standard deviation
s
, either a scalar or a vector, from the model space to the search space around a parameter vectorq
provided in the search space.The transformation is performed using a first order linear approximation [1] with the Jacobian \(\mathbf{J}\):
\[\begin{split}\mathbf{C}(\boldsymbol{q}) &= \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \mathbf{C}(\boldsymbol{p}) \left( \frac{d\boldsymbol{g}(\boldsymbol{p})}{d\boldsymbol{p}} \right)^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2) \\ &= \mathbf{J}^{-1}(\boldsymbol{q}) \mathbf{C}(\boldsymbol{p}) (\mathbf{J}^{-1}(\boldsymbol{q}))^T + \mathcal{O}(\mathbf{C}(\boldsymbol{p})^2).\end{split}\]Using the property that \(\mathbf{J}^{-1} = \frac{d\boldsymbol{g}}{d\boldsymbol{p}}\), from the inverse function theorem, i.e. the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
To transform the provided standard deviation \(\boldsymbol{s}\), we assume the covariance matrix \(\mathbf{C}(\boldsymbol{p})\) above is a diagonal matrix with \(\boldsymbol{s}^2\) on the diagonal, such that
\[s_i(\boldsymbol{q}) = \left( \mathbf{J}^{-1} (\mathbf{J}^{-1})^T \right)^{1/2}_{i, i} s_i(\boldsymbol{p}).\]
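For instance, a short sketch (values illustrative) using LogTransformation, for which the expression above reduces to \(s_i(\boldsymbol{q}) = s_i(\boldsymbol{p}) / p_i\):
import numpy as np
import pints

transformation = pints.LogTransformation(n_parameters=2)
p = np.array([0.1, 20.0])             # model-space parameters
q = transformation.to_search(p)       # log(p)
s_model = np.array([0.01, 2.0])       # model-space standard deviations
s_search = transformation.convert_standard_deviation(s_model, q)
# For the log transformation this should equal s_model / p, i.e. [0.1, 0.1]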
-
elementwise
()[source]¶ Returns True if the transformation is element-wise.
An element-wise transformation is a transformation \(\boldsymbol{f}\) that can be carried out element by element: for a parameter vector \(\boldsymbol{p}\) in the model space and a parameter vector \(\boldsymbol{q}\) in the search space, we have
\[q_i = f(p_i),\]where \(x_i\) denotes the \(i^{\text{th}}\) element of the vector \(\boldsymbol{x}\), as opposed to a transformation in which multiple elements are combined to create the transformed elements.
-
jacobian
(q)[source]¶ Returns the Jacobian matrix of the transformation calculated at the parameter vector
q
in the search space. For a transformation \(\boldsymbol{q} = \boldsymbol{f}(\boldsymbol{p})\), the Jacobian matrix is defined as\[\mathbf{J} = \left[\frac{\partial \boldsymbol{f}^{-1}}{\partial q_1} \quad \frac{\partial \boldsymbol{f}^{-1}}{\partial q_2} \quad \cdots \right].\]This is an optional method. It is needed when transformation of standard deviation
Transformation.convert_standard_deviation()
or covariance matrixTransformation.convert_covariance_matrix()
is needed, or whenevaluateS1()
is needed.
-
jacobian_S1
(q)[source]¶ Computes the Jacobian matrix of the transformation calculated at the parameter vector
q
in the search space, and returns the result along with the partial derivatives of the result with respect to the parameters.The returned data is a tuple
(S, S')
whereS
is an_parameters
byn_parameters
matrix andS'
is a sequence ofn_parameters
matrices.This is an optional method. It is needed when the transformation is used along with a non-element-wise transformation in
ComposedTransformation
.
-
log_jacobian_det
(q)[source]¶ Returns the logarithm of the absolute value of the determinant of the Jacobian matrix of the transformation
Transformation.jacobian()
calculated at the parameter vectorq
in the search space.The default implementation numerically calculates the determinant of the full matrix which only works if the optional method
Transformation.jacobian()
is implemented. If there is an analytic expression for the specific transformation, a reimplementation of this method may be preferred. This is an optional method. It is needed when the transformation is applied to a LogPDF and/or when evaluateS1() is required; it is not needed, for example, when the transformation is used for an ErrorMeasure without ErrorMeasure.evaluateS1()
.
-
log_jacobian_det_S1
(q)[source]¶ Computes the logarithm of the absolute value of the determinant of the Jacobian, and returns the result plus the partial derivatives of the result with respect to the parameters.
The returned data is a tuple
(S, S')
whereS
is a scalar value andS'
is a sequence of lengthn_parameters
.Note that the derivative returned is of the log of the determinant of the Jacobian, so
S' = d/dq log(|det(J(q))|)
, evaluated at input. The log of the absolute value of the determinant of the Jacobian is provided by
Transformation.log_jacobian_det()
. The default implementation calculates the derivatives of the log-determinant using [2]\[\frac{d}{dq} \log(|det(\mathbf{J})|) = trace(\mathbf{J}^{-1} \frac{d}{dq} \mathbf{J}),\]where the derivative of the Jacobian matrix is provided by
Transformation.jacobian_S1()
and the matrix inversion is numerically calculated. If there is an analytic expression for the specific transformation, a reimplementation of this method may be preferred. This is an optional method. It is needed when the transformation is applied to a LogPDF and evaluateS1() is required
.
-
-
class
pints.
TransformedBoundaries
(boundaries, transformation)[source]¶ A
pints.Boundaries
that accepts parameters in a transformed search space.Extends
pints.Boundaries
.Parameters: - boundaries – A
pints.Boundaries
. - transformation – A
pints.Transformation
.
-
check
(q)[source]¶ See
Boundaries.check()
.
-
sample
(n=1)¶ Returns
n
random samples from within the boundaries, for example to use as starting points for an optimisation.The returned value is a NumPy array with shape
(n, d)
wheren
is the requested number of samples, andd
is the dimension of the parameter space these boundaries are defined on. Note that implementing sample() is optional, so some boundary types may not support it.
Parameters: n (int) – The number of points to sample
-
class
pints.
TransformedErrorMeasure
(error, transformation)[source]¶ A
pints.ErrorMeasure
that accepts parameters in a transformed search space.For the first order sensitivity of a
pints.ErrorMeasure
\(E\) and apints.Transformation
\(\boldsymbol{q} = \boldsymbol{f}(\boldsymbol{p})\), the transformation is done using\[\begin{split}\frac{\partial E(\boldsymbol{q})}{\partial q_i} &= \frac{\partial E(\boldsymbol{f}^{-1}(\boldsymbol{q}))}{\partial q_i}\\ &= \sum_l \frac{\partial E(\boldsymbol{p})}{\partial p_l} \frac{\partial p_l}{\partial q_i}.\end{split}\]Extends
pints.ErrorMeasure
.Parameters: - error – A
pints.ErrorMeasure
. - transformation – A
pints.Transformation
.
-
class
pints.
TransformedLogPDF
(log_pdf, transformation)[source]¶ A
pints.LogPDF
that accepts parameters in a transformed search space.When a
TransformedLogPDF
object (initialised with apints.LogPDF
of \(\pi(\boldsymbol{p})\) and aTransformation
of \(\boldsymbol{q} = \boldsymbol{f}(\boldsymbol{p})\)) is called with a vector argument \(\boldsymbol{q}\) in the search space, it returns \(\log(\pi(\boldsymbol{q}))\) where \(\pi(\boldsymbol{q})\) is the transformed unnormalised PDF of the input PDF, using\[\pi(\boldsymbol{q}) = \pi(\boldsymbol{f}^{-1}(\boldsymbol{q})) \,\, |det(\mathbf{J}(\boldsymbol{f}^{-1}(\boldsymbol{q})))|.\]\(\mathbf{J}\) is the Jacobian matrix:
\[\mathbf{J} = \left[\frac{\partial \boldsymbol{f}^{-1}}{\partial q_1} \quad \frac{\partial \boldsymbol{f}^{-1}}{\partial q_2} \quad \cdots \right].\]Hence
\[\log(\pi(\boldsymbol{q})) = \log(\pi(\boldsymbol{f}^{-1}(\boldsymbol{q}))) + \log(|det(\mathbf{J}(\boldsymbol{f}^{-1}(\boldsymbol{q})))|).\]For the first order sensitivity, the transformation is done using
\[\frac{\partial \log(\pi(\boldsymbol{q}))}{\partial q_i} = \frac{\partial \log(\pi(\boldsymbol{f}^{-1}(\boldsymbol{q})))}{\partial q_i} + \frac{\partial \log(|det(\mathbf{J})|)}{\partial q_i}.\]The first term can be calculated using the chain rule
\[\frac{\partial \log(\pi(\boldsymbol{f}^{-1}(\boldsymbol{q})))}{\partial q_i} = \sum_l \frac{\partial \log(\pi(\boldsymbol{p}))}{\partial p_l} \frac{\partial p_l}{\partial q_i}.\]Extends
pints.LogPDF
.Parameters: - log_pdf – A
pints.LogPDF
. - transformation – A
pints.Transformation
.
-
evaluateS1
(q)[source]¶ See
LogPDF.evaluateS1()
.
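A minimal sketch (the uniform prior and point values are illustrative) of evaluating a LogPDF in a log-transformed search space:
import numpy as np
import pints

# Any pints.LogPDF can be wrapped; here a uniform prior over a positive box
log_pdf = pints.UniformLogPrior([1e-3, 1e-3], [10, 10])
transformation = pints.LogTransformation(n_parameters=2)

transformed = pints.TransformedLogPDF(log_pdf, transformation)
# equivalently: transformed = transformation.convert_log_pdf(log_pdf)

p = np.array([0.5, 2.0])              # a point in the model space
q = transformation.to_search(p)       # the same point in the search space
print(transformed(q))                 # log pi(p) plus the log-Jacobian term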
-
class
pints.
TransformedLogPrior
(log_prior, transformation)[source]¶ A
pints.LogPrior
that accepts parameters in a transformed search space.Extends
pints.LogPrior
,pints.TransformedLogPDF
.Parameters: - log_prior – A
pints.LogPrior
. - transformation – A
pints.Transformation
.
-
cdf
(x)¶ Returns the cumulative density function at point(s)
x
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_from_unit_cube
(u)¶ Converts samples
u
uniformly drawn from the unit cube into those drawn from the prior space, typically by transforming usingLogPrior.icdf()
.u
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
convert_to_unit_cube
(x)¶ Converts samples from the prior
x
to be drawn uniformly from the unit cube, typically by transforming usingLogPrior.cdf()
.x
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
evaluateS1
(q)¶ See
LogPDF.evaluateS1()
.
-
icdf
(p)¶ Returns the inverse cumulative density function at cumulative probability/probabilities
p
.p
should be ann x d
array, wheren
is the number of input samples andd
is the dimension of the parameter space.
-
mean
()¶ Returns the analytical value of the expectation of a random variable distributed according to this
LogPDF
.
-
n_parameters
()¶
Utilities¶
Overview:
-
class
pints.
Loggable
[source]¶ Interface for classes that can log to a
Logger
.-
_log_init
(logger)[source]¶ Adds this
Loggable's
fields to aLogger
.
-
_log_write
(logger)[source]¶ Logs data for each of the fields specified in
_log_init()
.
-
-
class
pints.
Logger
[source]¶ Logs numbers to screen and/or a file.
Example
log = pints.Logger()
log.add_counter('id', width=2)
log.add_float('Length')
log.log(1, 1.23456)
log.log(2, 7.8901)
-
add_counter
(name, width=5, max_value=None, file_only=False)[source]¶ Adds a field for positive integers.
Returns this
Logger
object.Parameters: - name (str) – This field’s name. Will be displayed in the header.
- width (int) – A hint for the width of this column. If numbers exceed this width layout will break, but no information will be lost.
- max_value (int|None) – A hint for the maximum number this field will need to display.
- file_only (boolean) – If set to
True
, this field will not be shown on screen.
-
add_float
(name, width=9, file_only=False)[source]¶ Adds a field for a floating point number.
Returns this
Logger
object.Parameters: - name (str) – This field’s name. Will be displayed in the header.
- width (int) – A hint for the field’s width. The minimum width is 7.
- file_only (boolean) – If set to
True
, this field will not be shown on screen.
-
add_int
(name, width=5, file_only=False)[source]¶ Adds a field for a (positive or negative) integer.
Returns this
Logger
object.Parameters: - name (str) – This field’s name. Will be displayed in the header.
- width (int) – A hint for the width of this column. If numbers exceed this width layout will break, but no information will be lost.
- file_only (boolean) – If set to
True
, this field will not be shown on screen.
-
add_long_float
(name, file_only=False)[source]¶ Adds a field for a maximum precision floating point number.
Returns this
Logger
object.Parameters: - name (str) – This field’s name. Will be displayed in the header.
- file_only (boolean) – If set to
True
, this field will not be shown on screen.
-
add_string
(name, width, file_only=False)[source]¶ Adds a field showing (at most
width
characters of) string values.Returns this
Logger
object.Parameters: - name (str) – This field’s name. Will be displayed in the header.
- width (int) – The maximum width for strings to display.
- file_only (boolean) – If set to
True
, this field will not be shown on screen.
-
add_time
(name, file_only=False)[source]¶ Adds a field showing a formatted time (given in seconds).
Returns this
Logger
object.Parameters: - name (str) – This field’s name. Will be displayed in the header.
- file_only (boolean) – If set to
True
, this field will not be shown on screen.
-
-
class
pints.
Timer
(output=None)[source]¶ Provides accurate timing.
Example
timer = pints.Timer()
print(timer.format(timer.time()))
-
pints.
matrix2d
(x)[source]¶ Copies
x
and returns a 2d read-only NumPy array of floats with shape(m, n)
.Raises a
ValueError
ifx
has an incompatible shape.
-
pints.
vector
(x)[source]¶ Copies
x
and returns a 1d read-only NumPy array of floats with shape(n,)
.Raises a
ValueError
ifx
has an incompatible shape.
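For example (illustrative values):
import pints

v = pints.vector([1, 2, 3])             # shape (3,), read-only, dtype float
m = pints.matrix2d([[1, 2], [3, 4]])    # shape (2, 2), read-only, dtype float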
-
pints.
sample_initial_points
(function, n_points, random_sampler=None, boundaries=None, max_tries=50, parallel=False, n_workers=None)[source]¶ Samples
n_points
parameter values to use as starting points in a sampling or optimisation routine on the givenfunction
.How the initial points are determined depends on the arguments supplied. In order of precedence:
- If a method
random_sampler
is provided then this will be used to draw the random samples. - If no sampler method is given but
function
is aLogPosterior
then the methodfunction.log_prior().sample()
will be used. - If no sampler method is supplied and
function
is not aLogPosterior
and ifboundaries
are provided then the methodboundaries.sample()
will be used to draw samples.
A
ValueError
is raised if none of the above options are available.Each sample
x
is tested to ensure thatfunction(x)
returns a finite result withinboundaries
if these are supplied. If not, a new sample will be drawn. This is repeated at mostmax_tries
times, after which an error is raised.Parameters: - function – A
pints.ErrorMeasure
or apints.LogPDF
that evaluates points in the parameter space. If the latter, it is optional thatfunction
be of typeLogPosterior
. - n_points (int) – The number of initial values to generate.
- random_sampler – A function that when called returns draws from a probability
distribution of the same dimensionality as
function
. The only argument to this function should be an integer specifying the number of draws. - boundaries – An optional set of boundaries on the parameter space of class
pints.Boundaries
. - max_tries (int) – Number of attempts to find a finite initial value across all
n_points
. By default this is 50 per point. - parallel (bool) – Whether to evaluate
function
in parallel (defaults to False). - n_workers (int) – Number of workers on which to run parallel evaluation.
- If a method
Hierarchy of methods¶
Pints contains different types of methods that can be roughly arranged into a hierarchy, as follows.
Sampling¶
MCMC without gradients
MetropolisRandomWalkMCMC
, works on anyLogPDF
.- Metropolis-Hastings
- Adaptive methods
AdaptiveCovarianceMC
, works on anyLogPDF
.
PopulationMCMC
, works on anyLogPDF
.- Differential evolution methods
DifferentialEvolutionMCMC
, works on anyLogPDF
.DreamMCMC
, works on anyLogPDF
.EmceeHammerMCMC
, works on anyLogPDF
.
Nested sampling
NestedEllipsoidSampler
, requires aLogPDF
and aLogPrior
that can be sampled from.NestedRejectionSampler
, requires aLogPDF
and aLogPrior
that can be sampled from.
- Particle based samplers
- SMC
- Likelihood free sampling (Need distance between data and states, e.g. least squares?)
- ABC-MCMC
- ABC-SMC
- 1st order sensitivity MCMC samplers (Need derivatives of
LogPDF
)Metropolis-Adjusted Langevin Algorithm (MALA)
, works on anyLogPDF
that provides 1st order sensitivities.Hamiltonian Monte Carlo
, works on anyLogPDF
that provides 1st order sensitivities.- NUTS
- Differential geometric methods (Need Hessian of
LogPDF
)- smMALA
- RMHMC
Problems in Pints¶
Pints defines single
and
multi-output
problem classes that wrap around
models and data, and over which error measures
or
log-likelihoods
can be defined.
To find the appropriate type of Problem to use, see the overview below (a minimal construction sketch follows the list):
- Systems with a single observable output
- Single data set: Use a
SingleOutputProblem
and any of the appropriate error measures or log-likelihoods - Multiple, independent data sets: Define multiple
SingleOutputProblems
and an error measure / log-likelihood on each, and then combine using e.g.SumOfErrors
orSumOfIndependentLogPDFs
.
- Single data set: Use a
- Systems with multiple observable outputs
- Single data set: Use a
MultiOutputProblem
and any of the appropriate error measures or log-likelihoods
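For the single-output case, a minimal construction sketch (the toy logistic model and its parameter values are illustrative assumptions from the pints.toy module):
import numpy as np
import pints
import pints.toy

# Generate some synthetic single-output data from a toy model
model = pints.toy.LogisticModel()
true_parameters = [0.015, 500]
times = np.linspace(0, 1000, 100)
values = model.simulate(true_parameters, times)

# Wrap model and data in a problem, then define a score or likelihood on it
problem = pints.SingleOutputProblem(model, times, values)
error = pints.SumOfSquaresError(problem)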