Statistics

Samples

class desilike.statistics.samples.Samples(latex={}, **kwargs)

Class for storing samples of parameters.

Initialize a sample of parameters.

Parameters:
  • latex (dict or None, optional) – LaTeX expressions for parameters. Default is an empty dictionary.

  • **kwargs – Samples of parameters. Each sample must have the same length.

Raises:

ValueError – If not all samples have the same length.

append(samples)

Append a sample, i.e., add additional rows.

Parameters:

samples (desilike.statistics.Samples) – Samples to add. Must have the same keys as the current samples.

Raises:

ValueError – If keys do not match.

classmethod concatenate(samples)

Concatenate samples.

Parameters:

samples (list of desilike.Samples) – Samples to concatenate.

Returns:

combined – Concatenated samples.

Return type:

desilike.Samples
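The length check in the constructor, the key check in append(), and the behavior of concatenate() can be pictured with a plain dictionary of Python lists. ColumnSamples is a hypothetical stand-in written for illustration. It is not desilike's Samples class, and it ignores weights, LaTeX labels, and flags. In this sketch, concatenate() copies the first element so that none of the inputs is modified.

```python
class ColumnSamples:
    """Hypothetical minimal column store (illustration only, not desilike.Samples)."""

    def __init__(self, **columns):
        # Every column must have the same number of rows, as in Samples(**kwargs).
        if len({len(values) for values in columns.values()}) > 1:
            raise ValueError("Not all samples have the same length.")
        self.data = {key: list(values) for key, values in columns.items()}

    @property
    def keys(self):
        return list(self.data)

    def __len__(self):
        # Number of rows; an empty store has zero rows.
        return len(next(iter(self.data.values()), []))

    def append(self, other):
        # Appending adds rows, so both stores must have exactly the same keys.
        if set(self.keys) != set(other.keys):
            raise ValueError("Keys do not match.")
        for key in self.keys:
            self.data[key].extend(other.data[key])

    @classmethod
    def concatenate(cls, samples):
        # The constructor copies the lists, so the inputs are left untouched.
        combined = cls(**samples[0].data)
        for other in samples[1:]:
            combined.append(other)
        return combined


first = ColumnSamples(x=[1.0, 2.0], y=[3.0, 4.0])
second = ColumnSamples(x=[5.0], y=[6.0])
combined = ColumnSamples.concatenate([first, second])
```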

copy()

Return a copy of the samples object.

covariance(params=None)

Compute the covariance of the sample.

Parameters:

params (list or None, optional) – Keys to compute the covariance for. If None, all keys are used. Default is None.

Returns:

cov – Covariance of the samples. The ordering follows params, or self.keys if params is None.

Return type:

numpy.ndarray
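The sketch below gives a rough picture of what covariance() returns. It computes a weighted covariance matrix from a dictionary of columns, ordered as params (or as all keys when params is None). It assumes that the per-sample weights exposed by the weight property enter the estimate, which this page does not state. It also uses the plain weighted normalization without bias correction. desilike may normalize differently, for instance as numpy.cov does with its default bias correction. weighted_covariance is a hypothetical helper, not part of desilike.

```python
def weighted_covariance(columns, weights, params=None):
    """Weighted covariance of selected columns (hypothetical pure-Python sketch).

    Assumption: each sample carries a weight, as suggested by Samples.weight.
    """
    params = list(columns) if params is None else list(params)
    # Normalize the weights so they sum to one.
    total = sum(weights)
    w = [value / total for value in weights]
    means = [sum(wk * xk for wk, xk in zip(w, columns[p])) for p in params]
    cov = [[0.0] * len(params) for _ in params]
    for i, pi in enumerate(params):
        for j, pj in enumerate(params):
            # Weighted average of the product of deviations from the means.
            cov[i][j] = sum(
                wk * (xi - means[i]) * (xj - means[j])
                for wk, xi, xj in zip(w, columns[pi], columns[pj])
            )
    return cov


columns = {"x": [0.0, 2.0], "y": [0.0, 4.0]}
cov = weighted_covariance(columns, weights=[1.0, 1.0], params=["y", "x"])
# cov == [[4.0, 2.0], [2.0, 1.0]]  (ordered as params: y first, then x)
```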

get_flag(flag, param)

Get the value of the status flag for all samples.

Parameters:
  • flag (str) – Status flag.

  • param (str or None, optional) – The parameter to which the flag applies.

Returns:

value – Boolean array containing the status flag for each sample.

Return type:

numpy.ndarray

Raises:

ValueError – If the status is not known, the parameter does not exist for this sample, or the flag has not been set for this specific combination of status and parameter.

getdist(params=None)

Convert the sample into a getdist.MCSamples instance.

Parameters:

params (array-like or None, optional) – List of parameters to convert. If None, all parameters are included. Default is None.

Returns:

Samples converted to getdist format.

Return type:

getdist.MCSamples

Raises:

ImportError – If getdist is not installed.

interval(param, threshold, posterior=True)

Get the interval(s) where the (log) likelihood or posterior is above a threshold.

Parameters:
  • param (str) – Parameter for which to get the interval.

  • threshold (float) – Threshold such that the (log) likelihood/posterior is at least its maximum plus the threshold. Must be negative.

  • posterior (bool, optional) – If True, compute the intervals for the (log) posterior. If False, the intervals for the (log) likelihood are returned. Default is True.

Returns:

bounds – List of pairs of lower and upper bounds. For a unimodal likelihood, this is typically a single pair. If a lower and/or upper bound cannot be determined inside the sampled range, the value is np.nan.

Return type:

list

Raises:
  • ValueError – If threshold is not negative.

  • RuntimeError – If the likelihood/posterior is identical to the maximum plus the threshold over some range instead of specific points.
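The bracketing idea behind interval() can be illustrated on a one-dimensional profile tabulated on a sorted grid. The sketch finds every contiguous run of grid points where the log-profile is at least its maximum plus threshold, then interpolates linearly to the crossings. A bound is reported as NaN (np.nan in desilike) wherever a run reaches the edge of the grid. This is a sketch under that grid assumption. desilike may locate the crossings differently, for example on its cubic profile interpolator. It also raises RuntimeError for flat stretches, which the sketch does not detect. profile_intervals is a hypothetical helper.

```python
import math


def _crossing(xs, logp, i, j, level):
    # x at which the straight line between grid points i and j reaches level.
    t = (level - logp[i]) / (logp[j] - logp[i])
    return xs[i] + t * (xs[j] - xs[i])


def profile_intervals(xs, logp, threshold):
    """Hypothetical sketch: runs of a sorted grid where logp >= max(logp) + threshold."""
    if threshold >= 0:
        raise ValueError("threshold must be negative.")
    level = max(logp) + threshold
    above = [value >= level for value in logp]
    last = len(xs) - 1
    bounds, lower = [], math.nan
    for i, inside in enumerate(above):
        if not inside:
            continue
        if i == 0 or not above[i - 1]:
            # A run starts here; NaN if it touches the lower edge of the grid.
            lower = math.nan if i == 0 else _crossing(xs, logp, i - 1, i, level)
        if i == last or not above[i + 1]:
            # The run ends here; NaN if it touches the upper edge of the grid.
            upper = math.nan if i == last else _crossing(xs, logp, i, i + 1, level)
            bounds.append((lower, upper))
    return bounds


xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
logp = [-0.5 * x * x for x in xs]
print(profile_intervals(xs, logp, threshold=-0.5))  # [(-1.0, 1.0)]
```

A bimodal profile yields one pair per mode, matching the list-of-pairs return value described above.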

property keys

Return the keys of the sample as a list of strings.

classmethod load(filepath)

Read samples from a file.

This function supports npz and hdf5 file endings.

Parameters:

filepath (str or Path) – Where to read samples from.

Raises:

ValueError – If file ending is not supported or file ending is hdf5 but h5py is not installed.

mean(params=None, return_as_dict=False)

Compute the mean of the sample.

Parameters:
  • params (list or None, optional) – Keys to compute the mean for. If None, all keys are used. Default is None.

  • return_as_dict (bool, optional) – If True, return a dictionary. Otherwise, return a numpy array. Default is False.

Returns:

means – Means of the samples.

Return type:

numpy.ndarray or dict

property params

Return the parameters of the sample as a list of strings.

profile_interpolator(param, posterior=True)

Get a cubic profile interpolator.

Parameters:
  • param (str or list) – Parameter(s) for which to compute the interpolator.

  • posterior (bool, optional) – If True, get a profile for the (log) posterior. If False, a profile for the (log) likelihood is returned. Default is True.

Returns:

interp – Profile interpolator.

Return type:

scipy.interpolate.CubicSpline or scipy.interpolate.RegularGridInterpolator

Raises:

ValueError – If there are not enough points to compute an interpolation.

save(filepath, keys=None)

Save samples to a file.

This function supports csv, npz, and hdf5 file endings. csv is typically used for sharing results outside of desilike.

Parameters:
  • filepath (str or Path) – Where to save samples.

  • keys (list or None, optional) – Keys to write. If None, all keys are used. Default is None.

Raises:

ValueError – If file ending is not supported, file ending is hdf5 but h5py is not installed, or parameters to be saved are multidimensional and the output is csv.
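The csv restriction follows from the format. A CSV file is a flat table with one column per key, so a parameter whose per-sample value is itself an array has no single cell to go into. A minimal sketch with the standard csv module is shown below. samples_to_csv is a hypothetical helper working on a dictionary of lists. It is not part of desilike, and this page does not specify the exact column layout desilike writes.

```python
import csv
import io


def samples_to_csv(columns, keys=None):
    """Render a dict of equal-length lists as CSV text (hypothetical helper)."""
    keys = list(columns) if keys is None else list(keys)
    for key in keys:
        # A per-sample list or tuple would need several cells, so reject it.
        if any(isinstance(value, (list, tuple)) for value in columns[key]):
            raise ValueError(f"Parameter {key!r} is multidimensional; cannot write csv.")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(keys)  # header row with the parameter names
    writer.writerows(zip(*(columns[key] for key in keys)))
    return buffer.getvalue()


text = samples_to_csv({"x": [1.0, 2.0], "y": [3.0, 4.0]}, keys=["y", "x"])
# text == "y,x\n3.0,1.0\n4.0,2.0\n"
```

Writing the returned text with pathlib.Path(filepath).write_text(text) would then produce the file.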

set_flag(flag, param, value)

Set the value of the status flag for all samples.

Parameters:
  • flag (str) – Status flag.

  • param (str or None, optional) – The parameter to which the flag applies.

  • value (numpy.ndarray) – Boolean array containing the status flag for each sample.

Raises:

ValueError – If the status is not known or the parameter does not exist for this sample.

tabulate(keys=None, use_latex=False, **kwargs)

Use the tabulate package to format the samples as a table.

Parameters:
  • keys (array-like or None, optional) – List of keys to print. If None, all columns are printed. Default is None.

  • use_latex (bool, optional) – Whether to use the LaTeX names in the column headers. Default is False.

  • **kwargs – Additional keyword arguments passed to tabulate.tabulate().

Returns:

table – Table as plain text.

Return type:

str

Raises:

ImportError – If tabulate is not installed.

property weight

Return the (normalized) weight of each sample.

Diagnostics

Module implementing diagnostics for Markov chains.

desilike.statistics.diagnostics.gelman_rubin(chains, n_splits=None, keys=None)

Estimate the Gelman-Rubin statistic.

Parameters:
  • chains (desilike.statistics.samples.Samples, list of desilike.statistics.samples.Samples, or numpy.ndarray) –

    Chains for which to compute the Gelman-Rubin statistic. If a numpy array, the expected shapes are as follows:

    • (n_steps,) if one-dimensional

    • (n_steps, n_dim) if two-dimensional

    • (n_chains, n_steps, n_dim) if three-dimensional

  • n_splits (int or None, optional) – Number of splits for each chain. If None, a single chain will be split into 2 parts. Splitting allows computation of Gelman-Rubin statistics even with one chain. Default is None.

  • keys (list of str, optional) – Keys for which to compute the Gelman-Rubin statistic. Only used if chains is a desilike.Samples or list thereof. If None, use all keys in the chain. Default is None.

Returns:

gr – The estimated Gelman-Rubin statistics.

  • dict if chains is a desilike.Samples or list thereof

  • float if chains is a one-dimensional array

  • numpy.ndarray otherwise

Return type:

dict, float, or numpy.ndarray

Raises:

ValueError – If n_chains * n_splits is 1 or n_splits is larger than the number of samples.
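For a single parameter, the split-chain form of the statistic can be sketched with the standard library. Each chain is cut into n_splits contiguous pieces. The variance between piece means is then combined with the mean within-piece variance, and the result is the pooled variance estimate divided by the within-piece variance. This follows the textbook Gelman-Rubin recipe and is not necessarily desilike's exact convention. In particular, the sketch returns the variance ratio without a square root, whereas some references report its square root. It also drops leftover steps when the chain length is not divisible by n_splits. split_gelman_rubin is a hypothetical helper.

```python
import random
from statistics import mean, variance


def split_gelman_rubin(chains, n_splits=2):
    """Split-chain Gelman-Rubin ratio for one parameter (hypothetical sketch).

    chains is a list of equal-length lists of draws; each chain is cut into
    n_splits contiguous pieces and leftover steps at the end are dropped.
    """
    pieces = []
    for chain in chains:
        size = len(chain) // n_splits
        if size < 2:
            raise ValueError("Each split needs at least two steps.")
        pieces.extend(chain[k * size:(k + 1) * size] for k in range(n_splits))
    if len(pieces) < 2:
        raise ValueError("n_chains * n_splits must be at least 2.")
    n = len(pieces[0])
    within = mean(variance(piece) for piece in pieces)  # W: mean within-piece variance
    between = variance([mean(piece) for piece in pieces])  # B / n: variance of piece means
    pooled = (n - 1) / n * within + between  # pooled estimate of the target variance
    return pooled / within


rng = random.Random(1)
mixed = split_gelman_rubin([[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(2)])
stuck = split_gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
# mixed is close to 1; stuck is far above 1 because the two chains never overlap.
```

An offset of -1, as accepted by the plotting routine below, turns this ratio into a quantity that approaches 0 at convergence.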

desilike.statistics.diagnostics.integrated_autocorrelation_time(chains, keys=None)

Estimate the integrated autocorrelation time for Markov chains.

Autocorrelation times are computed in the same way as in emcee. See https://emcee.readthedocs.io/en/stable/tutorials/autocorr/ for details. The results have been verified to agree with those from emcee, although the implementation is independent.

Parameters:
  • chains (desilike.statistics.samples.Samples, list of desilike.statistics.samples.Samples, or numpy.ndarray) –

    Chains for which to compute the autocorrelation time. If a numpy array, the expected shapes are as follows:

    • (n_steps,) if one-dimensional

    • (n_steps, n_dim) if two-dimensional

    • (n_chains, n_steps, n_dim) if three-dimensional

  • keys (list of str, optional) – Keys for which to compute the autocorrelation time. Only used if chains is a desilike.Samples or list thereof. If None, use all keys in the chain. Default is None.

Returns:

tau – The estimated autocorrelation times.

  • dict if chains is a desilike.Samples or list thereof

  • float if chains is a one-dimensional array

  • numpy.ndarray otherwise

In all cases, the autocorrelation function (not time) for each parameter is averaged across chains, if multiple chains are provided.

Return type:

dict, float, or numpy.ndarray
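The emcee-style estimator referenced above can be sketched in pure Python for one parameter. The normalized autocorrelation function is first averaged over chains. The sketch then accumulates \(\tau(M) = 1 + 2\sum_{t=1}^{M} \rho(t)\) and stops at the first window \(M\) with \(M \geq c\,\tau(M)\), using \(c = 5\) as in the emcee tutorial. It evaluates the autocorrelation with a direct sum, lag by lag, instead of the FFT used by emcee. That choice is simpler but slower for long chains. This is an illustration, not desilike's implementation, and integrated_autocorr_time is a hypothetical name.

```python
import random
from statistics import fmean


def integrated_autocorr_time(chains, c=5.0):
    """Emcee-style integrated autocorrelation time for one parameter (sketch).

    chains is a list of equal-length lists of draws. The normalized
    autocorrelation function is averaged across chains before windowing.
    """
    centered = []
    for chain in chains:
        mu = fmean(chain)
        centered.append([value - mu for value in chain])
    n = len(centered[0])
    lag0 = [sum(d * d for d in dev) for dev in centered]

    def rho(lag):
        # Autocorrelation at this lag, normalized per chain, then averaged.
        return fmean(
            sum(dev[i] * dev[i + lag] for i in range(n - lag)) / norm
            for dev, norm in zip(centered, lag0)
        )

    cumulative, tau = 0.0, 1.0
    for m in range(n):
        cumulative += rho(m)
        tau = 2.0 * cumulative - 1.0  # tau(M) = 1 + 2 * sum_{t=1}^{M} rho(t)
        if m >= c * tau:
            break  # Sokal's automated window: first M with M >= c * tau(M)
    return tau


rng = random.Random(0)
ar1 = [0.0]
for _ in range(19999):
    # AR(1) process with coefficient 0.5; its exact tau is (1 + 0.5) / (1 - 0.5) = 3.
    ar1.append(0.5 * ar1[-1] + rng.gauss(0.0, 1.0))
tau_ar1 = integrated_autocorr_time([ar1])
tau_white = integrated_autocorr_time([[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(2)])
```

The second call passes two chains. As in the text above, their autocorrelation functions, not their times, are averaged.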

Plotting

Module implementing plotting routines.

desilike.statistics.plotting.gelman_rubin(chains, keys=None, colors=None, n_splits=None, threshold=None, slices=100, offset=None, fontsize=None, plot_options=None, legend_options=None, fig=None)

Plot Gelman-Rubin statistics as a function of steps.

Parameters:
  • chains (desilike.Samples or list of desilike.Samples) – List of (or single) desilike.Samples instance(s).

  • keys (list or None, optional) – Parameters to plot the Gelman-Rubin statistic for. If None, plot all parameters. Default is None.

  • colors (str, list, or None, optional) – Dictionary of (or single) color(s) for parameters. Default is None.

  • n_splits (int or None, optional) – Number of splits for each chain. If None, a single chain will be split into 2 parts. Splitting allows computation of Gelman-Rubin statistics even with one chain. Default is None.

  • threshold (float or None, optional) – If not None, plot a horizontal line at this value. Default is None.

  • slices (int, optional) – Number of linearly spaced steps for which to compute the Gelman-Rubin statistic. Default is 100.

  • offset (float or None, optional) – Offset to apply to the Gelman-Rubin statistics, typically 0 or -1. Default is None.

  • fontsize (int or None, optional) – Label sizes. Default is None.

  • plot_options (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.plot. Default is None.

  • legend_options (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.legend. Default is None.

  • fig (matplotlib.figure.Figure or None, optional) – Figure to plot on. If None, create a new one. Default is None.

Raises:

ValueError – If not all chains have the same length.

Returns:

fig – Figure with plot on it.

Return type:

matplotlib.figure.Figure

desilike.statistics.plotting.integrated_autocorrelation_time(chains, keys=None, colors=None, slices=10, fontsize=None, plot_options=None, legend_options=None, fig=None)

Plot integrated autocorrelation time as a function of steps.

Parameters:
  • chains (desilike.Samples or list of desilike.Samples) – List of (or single) desilike.Samples instance(s).

  • keys (list or None, optional) – Parameters to plot the integrated autocorrelation time for. If None, plot all parameters. Default is None.

  • colors (str, list, or None, optional) – Dictionary of (or single) color(s) for parameters. Default is None.

  • slices (int, optional) – Number of linearly spaced steps for which to compute the integrated autocorrelation time. Default is 10.

  • fontsize (int or None, optional) – Label sizes. Default is None.

  • plot_options (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.plot. Default is None.

  • legend_options (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.legend. Default is None.

  • fig (matplotlib.figure.Figure or None, optional) – Figure to plot on. If None, create a new one. Default is None.

Raises:

ValueError – If not all chains have the same length.

Returns:

fig – Figure with plot on it.

Return type:

matplotlib.figure.Figure

desilike.statistics.plotting.one_dimensional_profile(samples, param, ax=None, plot=True, plot_kwargs=None, scatter=False, scatter_kwargs=None)

Add a 1D profile to the axes.

Parameters:
  • samples (desilike.Samples) – desilike.Samples instance returned from a profiler.

  • param (str) – Parameter to plot profile for.

  • ax (matplotlib.axes.Axes or None, optional) – Axes to which the profile is added. If None, use plt.gca(). Default is None.

  • plot (bool, optional) – Whether to interpolate and plot the profile. Default is True.

  • plot_kwargs (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.plot(). Default is None.

  • scatter (bool, optional) – Whether to plot individual points. Default is False.

  • scatter_kwargs (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.scatter(). Default is None.

Raises:

ValueError – If both or neither of the posterior and likelihood are given.

desilike.statistics.plotting.plotter(f)

Decorator that adds the plotting keyword arguments below to a plotting function f and checks whether matplotlib is installed.

Parameters:
  • filepath (str, pathlib.Path or None, optional) – If not None, save the figure to that location. Default is None.

  • show (bool, optional) – If True, show the figure. Default is False.

  • save_options (dict or None, optional) – Additional options passed to the savefig function of matplotlib. Default is None.

Raises:

ImportError – If matplotlib is not installed.

desilike.statistics.plotting.trace(chains, keys=None, colors=None, fontsize=None, plot_options=None, fig=None)

Make trace plot as a function of steps, with a panel for each parameter.

Parameters:
  • chains (desilike.Samples or list of desilike.Samples) – List of (or single) desilike.Samples instance(s).

  • keys (list or None, optional) – Parameters to plot trace for. If None, plot all parameters. Default is None.

  • colors (str, list, or None, optional) – List of (or single) color(s) for chains. Default is None.

  • fontsize (int or None, optional) – Label sizes. Default is None.

  • plot_options (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.plot. Default is None.

  • fig (matplotlib.figure.Figure or None, optional) – Figure to plot on. If None, create a new one. Default is None.

Raises:

ValueError – If the provided figure has fewer axes than the chains have keys.

Returns:

fig – Figure with plot on it.

Return type:

matplotlib.figure.Figure

desilike.statistics.plotting.triangle_posterior(samples, params=None, **kwargs)

Create a triangle posterior plot using getdist.


Parameters:
  • samples (desilike.Samples or list of desilike.Samples) – List of (or single) desilike.Samples instance(s).

  • params (list or None, optional) – Parameters to plot posterior for. If None, plot all parameters. Default is None.

  • **kwargs – Optional parameters for getdist.plots.GetDistPlotter.triangle_plot().

Raises:

ImportError – If getdist is not installed.

desilike.statistics.plotting.triangle_profile(samples, params=None, plot=True, plot_kwargs=None, levels=[1.14, 3.0, 4.61], contour_kwargs=None, scatter=False, scatter_kwargs=None, threshold=4.5, fig=None)

Create a triangle profile plot.

Parameters:
  • samples (desilike.Samples) – Samples for which to plot the profile.

  • params (list or None, optional) – Parameters to plot profile for. If None, plot all parameters. Default is None.

  • plot (bool, optional) – Whether to interpolate and plot the one-dimensional profiles. Default is True.

  • plot_kwargs (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.plot(). Default is None.

  • levels (list, optional) – Confidence levels to plot for the two-dimensional profiles, i.e., the values \(\Delta \log \mathcal{P}\) where \(\log \mathcal{P} = \max \log \mathcal{P} - \Delta \log \mathcal{P}\). Default is [1.14, 3.00, 4.61], which corresponds to the 68%, 95%, and 99% credible regions of a two-dimensional Gaussian.

  • contour_kwargs (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.contour(). Default is None.

  • scatter (bool, optional) – Whether to plot individual points. Default is False.

  • scatter_kwargs (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.scatter(). Default is None.

  • threshold (float, optional) – Limit the ranges for each parameter to the corresponding intervals for this threshold. Default is 4.5.

  • fig (matplotlib.figure.Figure or None, optional) – Figure to plot on. If None, create a new one. Default is None.
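The default levels can be reproduced from the two-dimensional Gaussian case. There, \(2\,\Delta \log \mathcal{P}\) follows a \(\chi^2\) distribution with two degrees of freedom. The probability enclosed by the contour \(\log \mathcal{P} = \max \log \mathcal{P} - z\) is therefore \(1 - e^{-z}\), giving \(z = -\ln(1 - p)\). The snippet below recovers the defaults of triangle_profile. It also recovers the sign-flipped, reversed defaults that two_dimensional_profile uses. gaussian_2d_levels is a hypothetical helper, not part of desilike.

```python
import math


def gaussian_2d_levels(probabilities):
    """Delta log P enclosing each probability for a 2D Gaussian: z = -ln(1 - p)."""
    return [-math.log1p(-p) for p in probabilities]


levels = [round(z, 2) for z in gaussian_2d_levels([0.68, 0.95, 0.99])]
print(levels)  # [1.14, 3.0, 4.61]  (triangle_profile default)
# two_dimensional_profile expects z with log P = max log P + z, i.e. negated values.
print([-z for z in reversed(levels)])  # [-4.61, -3.0, -1.14]
```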

desilike.statistics.plotting.two_dimensional_profile(samples, params, ax=None, levels=[-4.61, -3.0, -1.14], contour_kwargs=None, scatter=False, scatter_kwargs=None)

Add a 2D profile to the axes.

Parameters:
  • samples (desilike.Samples) – desilike.Samples instance returned from a profiler.

  • params (tuple of str) – Parameters to plot profile for.

  • ax (matplotlib.axes.Axes or None, optional) – Axes to which the profile is added. If None, use plt.gca(). Default is None.

  • levels (list, optional) – Confidence levels to plot, i.e., the values \(z\) where \(\log \mathcal{P} = \max \log \mathcal{P} + z\). Default is [-4.61, -3.00, -1.14], which corresponds to the 99%, 95%, and 68% credible regions of a two-dimensional Gaussian.

  • contour_kwargs (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.contour(). Default is None.

  • scatter (bool, optional) – Whether to plot individual points. Default is False.

  • scatter_kwargs (dict or None, optional) – Optional arguments for matplotlib.axes.Axes.scatter(). Default is None.

Raises:

ValueError – If both or neither of the posterior and likelihood are given or an incorrect number of parameters is given.