mogpe API

Base Classes

Mixture of Experts

Experts

class mogpe.experts.ExpertBase(*args, **kwargs)

Abstract base class for an individual expert.

Each subclass that inherits ExpertBase should implement the predict_dist() method, which returns the individual expert's prediction at an input.

Parameters
  • args (Any) –

  • kwargs (Any) –

abstract predict_dist(Xnew, **kwargs)

Returns the individual expert's prediction at Xnew.

TODO: this does not return a tfd.Distribution

Parameters

Xnew (Tensor) – inputs with shape [num_test, input_dim]

Returns

an instance of a TensorFlow Distribution

class mogpe.experts.ExpertsBase(experts_list=None, name='Experts')

Abstract base class for a set of experts.

Provides an interface between ExpertBase and MixtureOfExperts. Each subclass that inherits ExpertsBase should implement the predict_dists() method, which returns the set of experts' predictions at an input (as a batched TensorFlow distribution).

Parameters

experts_list (Optional[List[ExpertBase]]) –

abstract predict_dists(Xnew, **kwargs)

Returns the set of experts' predicted distributions at Xnew.

Parameters

Xnew (Tensor) – inputs with shape [num_test, input_dim]

Return type

Distribution

Returns

a batched tfd.Distribution with batch_shape […, num_test, output_dim, num_experts]

Gating Networks

class mogpe.gating_networks.GatingNetworkBase(*args, **kwargs)

Abstract base class for the gating network.

Parameters
  • args (Any) –

  • kwargs (Any) –

abstract predict_fs(Xnew, **kwargs)

Calculates the set of gating function posteriors at Xnew.

Parameters

Xnew (Tensor) – inputs with shape [num_test, input_dim]

TODO: correct dimensions

Return type

Tuple[Tensor, Tensor]

Returns

mean and var batched Tensors with shape […, num_test, 1, num_experts]

abstract predict_mixing_probs(Xnew, **kwargs)

Calculates the set of experts' mixing probabilities at Xnew, \(\{\Pr(\alpha=k | x)\}^K_{k=1}\)

Parameters

Xnew (Tensor) – inputs with shape [num_test, input_dim]

Return type

Tensor

Returns

a batched Tensor with shape […, num_test, 1, num_experts]
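For multiple experts a softmax link is one common choice for turning gating function values into mixing probabilities. The following plain-NumPy sketch (hypothetical, not mogpe's implementation; the actual link depends on the gating network subclass) shows the shape and normalisation contract:

```python
import numpy as np

def softmax(h, axis=-1):
    # Numerically stable softmax over the expert axis.
    h = h - h.max(axis=axis, keepdims=True)
    e = np.exp(h)
    return e / e.sum(axis=axis, keepdims=True)

num_test, num_experts = 4, 3
# Hypothetical gating function values with shape [num_test, 1, num_experts].
h = np.random.default_rng(0).normal(size=(num_test, 1, num_experts))
probs = softmax(h)

assert probs.shape == (num_test, 1, num_experts)
# Mixing probabilities over the experts sum to one at every input.
assert np.allclose(probs.sum(axis=-1), 1.0)
```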

SVGP Classes

Mixture of SVGP Experts

class mogpe.mixture_of_experts.MixtureOfSVGPExperts(gating_network, experts, num_data, num_samples=1, bound='further_gating')

Mixture of SVGP experts using stochastic variational inference.

Implementation of a mixture of Gaussian process (GP) experts method where the gating network is also implemented using GPs. The model is trained with stochastic variational inference by exploiting the factorization achieved by sparse GPs.
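In standard mixture-of-experts notation (a sketch, with \(K\) experts indexed by the indicator variable \(\alpha\), matching the mixing probabilities \(\Pr(\alpha=k | x)\) computed by the gating network), the model's likelihood takes the form:

```latex
p(\mathbf{y} \mid \mathbf{x}) = \sum_{k=1}^{K} \Pr(\alpha = k \mid \mathbf{x}) \, p_k(\mathbf{y} \mid \mathbf{x})
```

Each expert density \(p_k\) comes from an SVGP expert and the mixing probabilities come from the SVGP gating network; the bounds below differ in how this sum is lower-bounded during training.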

Parameters
  • gating_network (SVGPGatingNetwork) – an instance of the GatingNetworkBase class with the predict_mixing_probs(Xnew) method implemented.

  • experts (SVGPExperts) – an instance of the SVGPExperts class with the predict_dists(Xnew) method implemented.

  • num_inducing_samples – the number of samples to draw from the inducing point distributions during training.

  • num_data (int) – the number of data points.

  • num_samples (int) –

  • bound (str) –

elbo(data)

Returns the evidence lower bound (ELBO) of the log marginal likelihood.

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

lower_bound_analytic(data)

Lower bound to the log-marginal likelihood (ELBO).

This bound assumes each output dimension is independent and takes the product over them within the logarithm (and before the expert indicator variable is marginalised).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_dagp(data)

Lower bound used in Data Association with GPs (DAGP).

This bound doesn’t marginalise the expert indicator variable.

TODO: check I’ve implemented this correctly. It’s definitely slower than it should be.

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_further(data)

Lower bound to the log-marginal likelihood (ELBO).

Looser bound than lower_bound_tight as it marginalises both the experts' and the gating network's inducing variables \(q(\hat{f}, \hat{h})\) in closed form. Replaces M-dimensional approximate integrals with 1-dimensional approximate integrals.

This bound is equivalent to a different likelihood approximation that only mixes the noise models (as opposed to the full GPs).

This bound assumes each output dimension is independent.

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_further_2(data)

Lower bound to the log-marginal likelihood (ELBO).

Looser bound than lower_bound_tight but marginalises the inducing variables \(q(\hat{f}, \hat{h})\) in closed form. Replaces M-dimensional approximate integrals with 1-dimensional approximate integrals.

This bound assumes each output dimension is independent.

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_further_experts(data)

Lower bound to the log-marginal likelihood (ELBO).

Similar to lower_bound_tight but with a further bound on the experts. The bound replaces the M-dimensional integral over each expert’s inducing variables \(q(\hat{\mathbf{U}})\) with 1-dimensional integrals over the gating network variational posterior \(q(\mathbf{h}_n)\).

This bound is equivalent to a different likelihood approximation that only mixes the noise models (as opposed to the full GPs).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_further_gating(data)

Lower bound to the log-marginal likelihood (ELBO).

Similar to lower_bound_tight but with a further bound on the gating network. The bound replaces the M-dimensional integral over the gating network inducing variables \(q(\hat{\mathbf{U}})\) with 1-dimensional integrals over the gating network variational posterior \(q(\mathbf{h}_n)\).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_tight(data)

Lower bound to the log-marginal likelihood (ELBO).

Tighter bound than lower_bound_further but requires an M-dimensional expectation over the inducing variables \(q(\hat{f}, \hat{h})\) to be approximated (with Gibbs sampling).
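Read together with lower_bound_further's docstring, the two bounds are ordered as follows (a sketch: the tighter bound pays for its tightness by needing the M-dimensional expectation approximated by Gibbs sampling rather than solved in closed form):

```latex
\log p(\mathbf{y}) \;\geq\; \mathcal{L}_{\text{tight}} \;\geq\; \mathcal{L}_{\text{further}}
```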

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_tight_2(data)

Lower bound to the log-marginal likelihood (ELBO).

Tighter bound than lower_bound_further but requires an M-dimensional expectation over the inducing variables \(q(\hat{f}, \hat{h})\) to be approximated (with Gibbs sampling).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

marginal_likelihood(data)

Marginal likelihood (ML).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

marginal likelihood - a Tensor with shape ()

marginal_likelihood_new(data)

Marginal likelihood (ML).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

marginal likelihood - a Tensor with shape ()

maximum_log_likelihood_objective(data)

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Parameters

data (Tuple[Tensor, Tensor]) –

Return type

Tensor

predict_experts_fs(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)

Compute the mean and (co)variance of the experts’ latent functions at Xnew.

If num_inducing_samples is not None then the inducing points are sampled instead of being analytically integrated. This is required in the mixture of experts lower bound.

Parameters
  • Xnew (Tensor) – inputs with shape [num_test, input_dim]

  • num_inducing_samples (Optional[int]) – the number of samples to draw from the inducing point distributions during training.

Return type

Tuple[Tensor, Tensor]

Returns

a tuple of (mean, (co)var) each with shape […, num_test, output_dim, num_experts]
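To show how the shape above combines with the gating network's mixing probabilities, here is a hypothetical NumPy sketch (an illustration of moment-matching the mixture mean, not mogpe's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
num_test, output_dim, num_experts = 5, 2, 3

# Hypothetical expert latent means with the documented shape
# [num_test, output_dim, num_experts].
means = rng.normal(size=(num_test, output_dim, num_experts))

# Hypothetical mixing probabilities, broadcast to [num_test, 1, num_experts].
probs = rng.dirichlet(np.ones(num_experts), size=num_test)[:, None, :]

# Moment-matched mixture mean: weight each expert's mean by its mixing
# probability and sum over the trailing expert axis.
mixture_mean = (probs * means).sum(axis=-1)  # [num_test, output_dim]
assert mixture_mean.shape == (num_test, output_dim)
```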

SVGP Experts

class mogpe.experts.SVGPExpert(kernel, likelihood, inducing_variable, mean_function=None, num_latent_gps=1, q_diag=False, q_mu=None, q_sqrt=None, whiten=True, num_data=None)

Sparse Variational Gaussian Process Expert.

This class inherits the prior_kl() method from the SVGPModel class and implements the predict_dist() method using SVGPModel’s predict_y method.

Parameters
  • kernel (Kernel) –

  • likelihood (Likelihood) –

  • mean_function (Optional[MeanFunction]) –

  • num_latent_gps (int) –

  • q_diag (bool) –

  • whiten (bool) –

predict_dist(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)

Returns the mean and (co)variance of the expert’s prediction at Xnew.

Parameters
  • Xnew (Tensor) – inputs with shape [num_test, input_dim]

  • num_inducing_samples (Optional[int]) – the number of samples to draw from the inducing points joint distribution.

  • full_cov (bool) – If True, draw correlated samples over the inputs. Computes the Cholesky over the dense covariance matrix of size [num_data, num_data]. If False, draw samples that are uncorrelated over the inputs.

  • full_output_cov (bool) – If True, draw correlated samples over the outputs. If False, draw samples that are uncorrelated over the outputs.

Return type

Tuple[Tensor, Tensor]

Returns

a tuple of Tensors (mean, variance). The mean has shape [num_inducing_samples, num_test, output_dim]. If full_cov=False the variance has shape [num_inducing_samples, num_test, output_dim]; if full_cov=True it has shape [num_inducing_samples, output_dim, num_test, num_test].

class mogpe.experts.SVGPExperts(experts_list=None, name='Experts')

Extension of ExpertsBase for a set of SVGPExpert experts.

Provides an interface between a set of SVGPExpert instances and the MixtureOfSVGPExperts class.

Parameters

experts_list (Optional[List[SVGPExpert]]) –

predict_dists(Xnew, **kwargs)

Returns the set of experts’ predicted distributions at Xnew.

Parameters

Xnew (Tensor) – inputs with shape [num_test, input_dim]

Return type

Distribution

Returns

a batched tfd.Distribution with batch_shape […, num_test, output_dim, num_experts]

predict_fs(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)

Returns the set of experts’ latent function means and (co)variances at Xnew.

Parameters
  • Xnew (Tensor) – inputs with shape [num_test, input_dim]

  • num_inducing_samples (Optional[int]) –

Return type

Tuple[Tensor, Tensor]

Returns

a tuple of (mean, (co)var) each with shape […, num_test, output_dim, num_experts]

predict_ys(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)

Returns the means and (co)variances of the set of experts’ predictions at Xnew.

Parameters
  • Xnew (Tensor) – inputs with shape [num_test, input_dim]

  • num_inducing_samples (Optional[int]) –

Return type

Tuple[Tensor, Tensor]

Returns

a tuple of (mean, (co)var) each with shape […, num_test, output_dim, num_experts]

prior_kls()

Returns the set of experts’ KL divergences as a batched tensor.

Return type

Tensor

Returns

a Tensor with shape [num_experts,]

SVGP Gating Networks