mogpe API

Base Classes

Mixture of Experts

Experts

class mogpe.experts.ExpertBase(*args, **kwargs)

Abstract base class for an individual expert.

Each subclass that inherits ExpertBase should implement the predict_dist() method, which returns the individual expert's prediction at an input.

Parameters
  • args (Any) –

  • kwargs (Any) –

abstract predict_dist(Xnew, **kwargs)

Returns the individual expert's prediction at Xnew.

TODO: this does not return a tfd.Distribution

Parameters

Xnew (Tensor) – inputs with shape [num_test, input_dim]

Returns

an instance of a TensorFlow Distribution

class mogpe.experts.ExpertsBase(experts_list=None, name='Experts')

Abstract base class for a set of experts.

Provides an interface between ExpertBase and MixtureOfExperts. Each subclass that inherits ExpertsBase should implement the predict_dists() method, which returns the set of experts' predictions at an input (as a batched TensorFlow distribution).

Parameters

experts_list (Optional[List[ExpertBase]]) –

abstract predict_dists(Xnew, **kwargs)

Returns the set of experts' predicted distributions at Xnew.

Parameters

Xnew (Tensor) – inputs with shape [num_test, input_dim]

Return type

Distribution

Returns

a batched tfd.Distribution with batch_shape […, num_test, output_dim, num_experts]

Gating Networks

class mogpe.gating_networks.GatingNetworkBase(*args, **kwargs)

Abstract base class for the gating network.

Parameters
  • args (Any) –

  • kwargs (Any) –

abstract predict_fs(Xnew, **kwargs)

Calculates the set of gating function posteriors at Xnew.

Parameters

Xnew (Tensor) – inputs with shape [num_test, input_dim]

TODO: correct dimensions

Return type

Tuple[Tensor, Tensor]

Returns

mean and var batched Tensors with shape […, num_test, 1, num_experts]

abstract predict_mixing_probs(Xnew, **kwargs)

Calculates the set of experts' mixing probabilities at Xnew, \(\{\Pr(\alpha=k | x)\}^K_{k=1}\)

Parameters

Xnew (Tensor) – inputs with shape [num_test, input_dim]

Return type

Tensor

Returns

a batched Tensor with shape […, num_test, 1, num_experts]
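For multiple experts a softmax link is one common choice for turning gating function values into mixing probabilities. The following plain-NumPy sketch (hypothetical, not mogpe's implementation; the actual link depends on the gating network subclass) shows the shape and normalisation contract:

```python
import numpy as np

def softmax(h, axis=-1):
    # Numerically stable softmax over the expert axis.
    h = h - h.max(axis=axis, keepdims=True)
    e = np.exp(h)
    return e / e.sum(axis=axis, keepdims=True)

num_test, num_experts = 4, 3
# Hypothetical gating function values with shape [num_test, 1, num_experts].
h = np.random.default_rng(0).normal(size=(num_test, 1, num_experts))
probs = softmax(h)

assert probs.shape == (num_test, 1, num_experts)
# Mixing probabilities over the experts sum to one at every input.
assert np.allclose(probs.sum(axis=-1), 1.0)
```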

SVGP Classes

Mixture of SVGP Experts

class mogpe.mixture_of_experts.MixtureOfSVGPExperts(gating_network, experts, num_data, num_samples=1, bound='further_gating')

Mixture of SVGP experts using stochastic variational inference.

Implementation of a mixture of Gaussian process (GP) experts method where the gating network is also implemented using GPs. The model is trained with stochastic variational inference by exploiting the factorization achieved by sparse GPs.
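In standard mixture-of-experts notation (a sketch, with \(K\) experts indexed by the indicator variable \(\alpha\), matching the mixing probabilities \(\Pr(\alpha=k | x)\) computed by the gating network), the model's likelihood takes the form:

```latex
p(\mathbf{y} \mid \mathbf{x}) = \sum_{k=1}^{K} \Pr(\alpha = k \mid \mathbf{x}) \, p_k(\mathbf{y} \mid \mathbf{x})
```

Each expert density \(p_k\) comes from an SVGP expert and the mixing probabilities come from the SVGP gating network; the bounds below differ in how this sum is lower-bounded during training.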

Parameters
  • gating_network (SVGPGatingNetwork) – an instance of the GatingNetworkBase class with the predict_mixing_probs(Xnew) method implemented.

  • experts (SVGPExperts) – an instance of the SVGPExperts class with the predict_dists(Xnew) method implemented.

  • num_inducing_samples – the number of samples to draw from the inducing point distributions during training.

  • num_data (int) – the number of data points.

  • num_samples (int) –

  • bound (str) –

elbo(data)

Returns the evidence lower bound (ELBO) of the log marginal likelihood.

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

lower_bound_analytic(data)

Lower bound to the log-marginal likelihood (ELBO).

This bound assumes each output dimension is independent and takes the product over them within the logarithm (and before the expert indicator variable is marginalised).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_dagp(data)

Lower bound used in Data Association with GPs (DAGP).

This bound doesn’t marginalise the expert indicator variable.

TODO: check I’ve implemented this correctly. It’s definitely slower than it should be.

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_further(data)

Lower bound to the log-marginal likelihood (ELBO).

Looser bound than lower_bound_tight as it marginalises both the experts' and the gating network's inducing variables \(q(\hat{f}, \hat{h})\) in closed form. Replaces M-dimensional approximate integrals with 1-dimensional approximate integrals.

This bound is equivalent to a different likelihood approximation that only mixes the noise models (as opposed to the full GPs).

This bound assumes each output dimension is independent.

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_further_2(data)

Lower bound to the log-marginal likelihood (ELBO).

Looser bound than lower_bound_tight but marginalises the inducing variables \(q(\hat{f}, \hat{h})\) in closed form. Replaces M-dimensional approximate integrals with 1-dimensional approximate integrals.

This bound assumes each output dimension is independent.

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_further_experts(data)

Lower bound to the log-marginal likelihood (ELBO).

Similar to lower_bound_tight but with a further bound on the experts. The bound replaces the M-dimensional integral over each expert’s inducing variables \(q(\hat{\mathbf{U}})\) with 1-dimensional integrals over the gating network variational posterior \(q(\mathbf{h}_n)\).

This bound is equivalent to a different likelihood approximation that only mixes the noise models (as opposed to the full GPs).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_further_gating(data)

Lower bound to the log-marginal likelihood (ELBO).

Similar to lower_bound_tight but with a further bound on the gating network. The bound replaces the M-dimensional integral over the gating network inducing variables \(q(\hat{\mathbf{U}})\) with 1-dimensional integrals over the gating network variational posterior \(q(\mathbf{h}_n)\).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_tight(data)

Lower bound to the log-marginal likelihood (ELBO).

Tighter bound than lower_bound_further but requires an M-dimensional expectation over the inducing variables \(q(\hat{f}, \hat{h})\) to be approximated (with Gibbs sampling).
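Read together with lower_bound_further's docstring, the two bounds are ordered as follows (a sketch: the tighter bound pays for its tightness by needing the M-dimensional expectation approximated by Gibbs sampling rather than solved in closed form):

```latex
\log p(\mathbf{y}) \;\geq\; \mathcal{L}_{\text{tight}} \;\geq\; \mathcal{L}_{\text{further}}
```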

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

lower_bound_tight_2(data)

Lower bound to the log-marginal likelihood (ELBO).

Tighter bound than lower_bound_further but requires an M-dimensional expectation over the inducing variables \(q(\hat{f}, \hat{h})\) to be approximated (with Gibbs sampling).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

loss - a Tensor with shape ()

marginal_likelihood(data)

Marginal likelihood (ML).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

marginal likelihood - a Tensor with shape ()

marginal_likelihood_new(data)

Marginal likelihood (ML).

Parameters

data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]

Return type

Tensor

Returns

marginal likelihood - a Tensor with shape ()

maximum_log_likelihood_objective(data)

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Parameters

data (Tuple[Tensor, Tensor]) –

Return type

Tensor

predict_experts_fs(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)

Compute the mean and (co)variance of the experts’ latent functions at Xnew.

If num_inducing_samples is not None then the inducing points are sampled instead of being analytically integrated. This is required in the mixture of experts lower bound.

Parameters
  • Xnew (Tensor) – inputs with shape [num_test, input_dim]

  • num_inducing_samples (Optional[int]) – the number of samples to draw from the inducing point distributions during training.

Return type

Tuple[Tensor, Tensor]

Returns

a tuple of (mean, (co)var) each with shape […, num_test, output_dim, num_experts]
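To show how the shape above combines with the gating network's mixing probabilities, here is a hypothetical NumPy sketch (an illustration of moment-matching the mixture mean, not mogpe's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
num_test, output_dim, num_experts = 5, 2, 3

# Hypothetical expert latent means with the documented shape
# [num_test, output_dim, num_experts].
means = rng.normal(size=(num_test, output_dim, num_experts))

# Hypothetical mixing probabilities, broadcast to [num_test, 1, num_experts].
probs = rng.dirichlet(np.ones(num_experts), size=num_test)[:, None, :]

# Moment-matched mixture mean: weight each expert's mean by its mixing
# probability and sum over the trailing expert axis.
mixture_mean = (probs * means).sum(axis=-1)  # [num_test, output_dim]
assert mixture_mean.shape == (num_test, output_dim)
```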

SVGP Experts

class mogpe.experts.SVGPExpert(kernel, likelihood, inducing_variable, mean_function=None, num_latent_gps=1, q_diag=False, q_mu=None, q_sqrt=None, whiten=True, num_data=None)

Sparse Variational Gaussian Process Expert.

This class inherits the prior_kl() method from the SVGPModel class and implements the predict_dist() method using SVGPModel’s predict_y method.

Parameters
  • kernel (Kernel) –

  • likelihood (Likelihood) –

  • mean_function (Optional[MeanFunction]) –

  • num_latent_gps (int) –

  • q_diag (bool) –

  • whiten (bool) –

predict_dist(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)

Returns the mean and (co)variance of the expert’s prediction at Xnew.

Parameters
  • Xnew (Tensor) – inputs with shape [num_test, input_dim]

  • num_inducing_samples (Optional[int]) – the number of samples to draw from the inducing points joint distribution.

  • full_cov (bool) – If True, draw correlated samples over the inputs. Computes the Cholesky over the dense covariance matrix of size [num_data, num_data]. If False, draw samples that are uncorrelated over the inputs.

  • full_output_cov (bool) – If True, draw correlated samples over the outputs. If False, draw samples that are uncorrelated over the outputs.

Return type

Tuple[Tensor, Tensor]

Returns

a tuple of Tensors (mean, variance). The mean has shape [num_inducing_samples, num_test, output_dim]. If full_cov=False the variance has shape [num_inducing_samples, num_test, output_dim]; if full_cov=True it has shape [num_inducing_samples, output_dim, num_test, num_test].

class mogpe.experts.SVGPExperts(experts_list=None, name='Experts')

Extension of ExpertsBase for a set of SVGPExpert experts.

Provides an interface between a set of SVGPExpert instances and the MixtureOfSVGPExperts class.

Parameters

experts_list (Optional[List[SVGPExpert]]) –

predict_dists(Xnew, **kwargs)

Returns the set of experts’ predicted distributions at Xnew.

Parameters

Xnew (Tensor) – inputs with shape [num_test, input_dim]

Return type

Distribution

Returns

a batched tfd.Distribution with batch_shape […, num_test, output_dim, num_experts]

predict_fs(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)

Returns the set of experts’ latent function means and (co)variances at Xnew.

Parameters
  • Xnew (Tensor) – inputs with shape [num_test, input_dim]

  • num_inducing_samples (Optional[int]) –

Return type

Tuple[Tensor, Tensor]

Returns

a tuple of (mean, (co)var) each with shape […, num_test, output_dim, num_experts]

predict_ys(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)

Returns the means and (co)variances of the set of experts’ predictions at Xnew.

Parameters
  • Xnew (Tensor) – inputs with shape [num_test, input_dim]

  • num_inducing_samples (Optional[int]) –

Return type

Tuple[Tensor, Tensor]

Returns

a tuple of (mean, (co)var) each with shape […, num_test, output_dim, num_experts]

prior_kls()

Returns the set of experts’ KL divergences as a batched tensor.

Return type

Tensor

Returns

a Tensor with shape [num_experts,]

SVGP Gating Networks