What’s going on with this code?!
This section describes the Mixtures of Gaussian Process Experts code (mogpe).
The implementation is designed to make it easy to implement different Mixtures of Gaussian Process
Experts models and inference algorithms.
It uses both inheritance and composition (building blocks of OOP),
which makes the code easier to evolve as new features are added or requirements change.
Class Inheritance and Composition
Let’s detail the basic building blocks and how they are related. There are three main components:
- The mixture of experts model (mogpe.mixture_of_experts),
- The set of experts (mogpe.experts), and the individual experts,
- The gating network (mogpe.gating_networks), and the individual gating functions.
Mixture of Experts Base
At the heart of this package is the MixtureOfExperts base class, which extends GPflow’s
BayesianModel class (so any concrete subclass must implement the
maximum_log_likelihood_objective() method).
It defines the basic methods of a mixture of experts model, namely:
- MixtureOfExperts.predict_mixing_probs(), which predicts the mixing probabilities at a set of input locations,
- MixtureOfExperts.predict_experts_dists(), which predicts the set of expert predictions at a set of input locations,
- MixtureOfExperts.predict_y(), which predicts the mixture distribution at a set of input locations.
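These three quantities are tied together by the standard mixture of experts marginal likelihood. Writing \(\alpha\) for the expert indicator variable and \(K\) for the number of experts (\(K\) is notation introduced here for illustration), the mixture distribution at an input \(x\) can be written as

\[
p(y \mid x) = \sum_{k=1}^{K} \Pr(\alpha = k \mid x) \, p(y \mid \alpha = k, x)
\]

Under this reading, the first factor corresponds to the output of predict_mixing_probs(), the second to the output of predict_experts_dists(), and the left-hand side to the output of predict_y().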
The constructor requires an instance of a subclass of ExpertsBase to represent the set of experts
and an instance of a subclass of GatingNetworkBase to represent the gating network.
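The composition described above can be illustrated with a minimal, pure-Python sketch. This is not mogpe's code: every class below (ToyMixtureOfExperts, ConstantExperts, LogisticGate, ToyModel) is a hypothetical stand-in, predictions are plain floats rather than distributions, and predict_y() returns only the mixture mean for brevity.

```python
from abc import ABC, abstractmethod
import math


def gaussian_pdf(y, mean, variance):
    """Density of a univariate Gaussian, used by the toy objective below."""
    return math.exp(-0.5 * (y - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


class ToyMixtureOfExperts(ABC):
    """Hypothetical stand-in for a mixture of experts base class (not mogpe's code)."""

    def __init__(self, gating_network, experts):
        # Composition: the model holds a gating network and a set of experts.
        self.gating_network = gating_network
        self.experts = experts

    def predict_mixing_probs(self, x):
        # Delegates to the gating network: one probability per expert.
        return self.gating_network.predict_mixing_probs(x)

    def predict_experts_dists(self, x):
        # Delegates to the experts: one (mean, variance) pair per expert.
        return self.experts.predict_dists(x)

    def predict_y(self, x):
        # Simplification: returns only the mixture mean, i.e. the expert
        # means weighted by the mixing probabilities.
        probs = self.predict_mixing_probs(x)
        dists = self.predict_experts_dists(x)
        return sum(p * mean for p, (mean, _variance) in zip(probs, dists))

    @abstractmethod
    def maximum_log_likelihood_objective(self, x, y):
        """Concrete models must define their training objective."""


class ConstantExperts:
    """Toy set of experts, each predicting a fixed mean with a shared variance."""

    def __init__(self, means, variance=1.0):
        self.means = means
        self.variance = variance

    def predict_dists(self, x):
        return [(mean, self.variance) for mean in self.means]


class LogisticGate:
    """Toy two-expert gating network driven directly by the scalar input."""

    def predict_mixing_probs(self, x):
        p1 = 1.0 / (1.0 + math.exp(-x))
        return [p1, 1.0 - p1]


class ToyModel(ToyMixtureOfExperts):
    def maximum_log_likelihood_objective(self, x, y):
        # Exact log marginal likelihood of the toy mixture at one data point.
        probs = self.predict_mixing_probs(x)
        dists = self.predict_experts_dists(x)
        density = sum(p * gaussian_pdf(y, m, v) for p, (m, v) in zip(probs, dists))
        return math.log(density)


model = ToyModel(LogisticGate(), ConstantExperts([-1.0, 1.0]))
```

Because the model only talks to its two components through their prediction methods, either component can be swapped for another implementation without touching the model class.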
MixtureOfSVGPExperts
The main model class in this package is MixtureOfSVGPExperts. Its
maximum_log_likelihood_objective() implements a lower bound for the case where both
the experts and gating functions are modelled as sparse variational Gaussian processes (SVGP).
The implementation extends the ExpertsBase class to create SVGPExperts, which implements the
required abstract methods as well as extra methods that are used in the lower bound.
It also extends the GatingNetworkBase class to create the SVGPGatingNetwork class.
This class implements a gating network based on SVGPs for both the special two-expert case and
the general k-expert case.
Let’s now detail the base classes for the experts and gating network.
Expert(s) Base
Before detailing the ExpertsBase class we first need to introduce
the base class for an individual expert.
Any class representing an individual expert must inherit the ExpertBase class and implement the
predict_dist() method, which returns the expert’s prediction at Xnew.
For example, the SVGPExpert class inherits the ExpertBase class to implement
an expert as a sparse variational Gaussian process.
Any class representing the set of all experts must inherit the ExpertsBase class and should
implement the predict_dists() method, returning a batched TensorFlow Probability Distribution.
The constructor requires a list of expert instances, each an instance of a subclass of ExpertBase.
For example, the SVGPExperts class represents a set of SVGPExpert experts and adds a method for
returning the set of inducing point KL divergences required in the MixtureOfSVGPExperts
lower bound.
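The relationship between the single-expert base class and the set-of-experts base class can be sketched as follows. This is an illustration under stated assumptions, not mogpe's implementation: the Toy-prefixed and Linear-prefixed classes are hypothetical, and each prediction is a plain (mean, variance) tuple rather than a batched TensorFlow Probability Distribution.

```python
from abc import ABC, abstractmethod


class ToyExpertBase(ABC):
    """Hypothetical base class for an individual expert."""

    @abstractmethod
    def predict_dist(self, x_new):
        """Return the expert's prediction at x_new as (mean, variance)."""


class ToyExpertsBase(ABC):
    """Hypothetical base class for the set of all experts."""

    def __init__(self, experts_list):
        # Every element must derive from the single-expert base class.
        for expert in experts_list:
            if not isinstance(expert, ToyExpertBase):
                raise TypeError("each expert must inherit from ToyExpertBase")
        self.experts_list = list(experts_list)
        self.num_experts = len(self.experts_list)

    @abstractmethod
    def predict_dists(self, x_new):
        """Return the predictions of all experts at x_new."""


class LinearExpert(ToyExpertBase):
    """Toy expert: a line through the origin with fixed noise variance."""

    def __init__(self, slope, noise_variance):
        self.slope = slope
        self.noise_variance = noise_variance

    def predict_dist(self, x_new):
        return (self.slope * x_new, self.noise_variance)


class LinearExperts(ToyExpertsBase):
    """Toy set of experts: collects each expert's prediction in a list."""

    def predict_dists(self, x_new):
        return [expert.predict_dist(x_new) for expert in self.experts_list]


experts = LinearExperts([LinearExpert(1.0, 0.1), LinearExpert(-2.0, 0.5)])
predictions = experts.predict_dists(3.0)
```

A subclass of the set-level class is also the natural place for model-specific extras, in the same way that SVGPExperts adds the KL divergence method on top of the required predict_dists().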
Gating Network Base
All gating networks should inherit the GatingNetworkBase class and implement the
predict_mixing_probs() and predict_fs() methods.
This package is mainly interested in gating networks based on Gaussian processes, in particular
sparse variational Gaussian processes.
The SVGPGatingNetwork class implements a gating network as a sparse variational Gaussian process.
Similarly to GPflow’s SVGP, its constructor requires a likelihood.
This likelihood governs the behaviour of the gating network.
If a Bernoulli likelihood is passed then the gating network will use a single gating function,
because with two experts \(\Pr(\alpha=2 \mid x) = 1 - \Pr(\alpha=1 \mid x)\).
As such, the kernel and inducing variables should correspond to a single-output SVGP.
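To see why one gating function suffices in the two-expert case, consider a minimal sketch that maps a single gating function value to both mixing probabilities. The function names are hypothetical, the probit link is an assumption made for illustration, and the sketch treats the gating function value as known rather than integrating over its posterior uncertainty.

```python
import math


def inv_probit(f):
    """Standard normal CDF, squashing a gating function value into (0, 1)."""
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))


def two_expert_mixing_probs(f):
    """Mixing probabilities for two experts from one gating function value f."""
    p1 = inv_probit(f)       # Pr(alpha = 1 | x)
    return [p1, 1.0 - p1]    # Pr(alpha = 2 | x) follows by complement


probs = two_expert_mixing_probs(0.0)
```

The second probability is never modelled separately, so only one latent function, one kernel, and one set of inducing variables are needed.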
In the general case, i.e. with more than two experts, the gating network adopts a Softmax likelihood
which depends on a gating function for each expert.
In this setting, the kernel and inducing variables should be of multiple-output types, i.e.
SeparateIndependent and SharedIndependentInducingVariables respectively.
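The general case can be sketched in the same spirit: with one gating function value per expert, a softmax turns the values into mixing probabilities. The function name below is hypothetical, and the sketch again assumes fixed gating function values, whereas a full treatment would account for the uncertainty in each gating function.

```python
import math


def softmax_mixing_probs(fs):
    """Mixing probabilities from one gating function value per expert."""
    shift = max(fs)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(f - shift) for f in fs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_mixing_probs([0.2, -1.0, 1.5])
```

Unlike the two-expert case, every expert now needs its own gating function, which is why multiple-output kernels and inducing variables are required.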