mogpe documentation!¶
This package implements a Mixtures of Gaussian Process Experts (MoGPE) model with a GP-based gating network.
Inference exploits factorisation through sparse GPs and trains a variational lower bound stochastically.
It also provides the building blocks for implementing other Mixtures of Gaussian Process Experts models.
mogpe uses GPflow 2.2/TensorFlow 2.4+ for running computations, which allows fast execution on GPUs, and uses Python ≥ 3.8.
It was originally created by Aidan Scannell.
Getting Started¶
To get started please see the Install instructions.
Notes on using mogpe can be found in Usage, and the examples directory and notebooks show how the model can be configured and trained.
Details on the implementation can be found in What’s going on with this code?! and the mogpe API.
Install¶
This is a Python package that should be installed into a virtual environment. Start by cloning the repo from GitHub:
git clone https://github.com/aidanscannell/mogpe.git
The package can then be installed into a virtual environment by adding it as a local dependency.
Install with Poetry¶
mogpe’s dependencies and packaging are managed with Poetry, instead of other tools such as Pipenv.
To install mogpe into an existing Poetry environment, add it as a dependency under [tool.poetry.dependencies] (in the pyproject.toml configuration file) with the following line:
mogpe = {path = "/path/to/mogpe"}
If you want to develop the mogpe codebase then set develop=true:
mogpe = {path = "/path/to/mogpe", develop=true}
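For reference, a minimal [tool.poetry.dependencies] section combining the lines above might look like the following sketch (the Python constraint and the path are illustrative placeholders, not values taken from this repository):
[tool.poetry.dependencies]
python = "^3.8"
mogpe = {path = "/path/to/mogpe", develop=true}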
The dependencies in a pyproject.toml file are resolved and installed with:
poetry install
If you do not require the development packages then you can opt to install without them:
poetry install --no-dev
Running Python scripts inside Poetry Environments¶
There are multiple ways to run code with Poetry and I advise checking out the documentation. My favourite option is to spawn a shell within the virtual environment:
poetry shell
and then Python scripts can simply be run with:
python codey_mc_code_face.py
Alternatively, you can run scripts without spawning an instance of the virtual environment with the following command:
poetry run python codey_mc_code_face.py
I much prefer using Poetry; however, it does feel quite slow for some operations and annoyingly doesn’t integrate that well with Read the Docs.
A setup.py file is still needed for building the docs on Read the Docs, so I use Dephell to generate the requirements.txt and setup.py files from pyproject.toml.
Install with Pip¶
Create a new virtual environment and activate it, for example:
mkvirtualenv --python=python3 mogpe-env
workon mogpe-env
cd into the root of this package and install it and its dependencies with:
pip install .
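If you plan to modify the mogpe source, an editable install may be more convenient (this uses pip’s standard -e flag and is not specific to mogpe):
pip install -e .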
Usage¶
The model (and training with optional logging and checkpointing) can be configured using a TOML file.
Please see the examples directory showing how to configure and train MixtureOfSVGPExperts on multiple data sets.
See the notebooks (two experts and three experts) for how to define and train an instance of MixtureOfSVGPExperts without configuration files.
Training¶
The training directory contains methods for three different training loops, for saving and loading the model, and for initialising the model (and training) from TOML config files.
Training Loops¶
mogpe.training.training_loops contains three different training loops:
- A simple TensorFlow training loop,
- A monitoring tf training loop - a TensorFlow training loop with monitoring within tf.function(). This method only monitors the model parameters and loss (ELBO) and does not generate images.
- A monitoring training loop - this loop generates images during training. The matplotlib functions cannot be inside the tf.function, so this training loop should be slower but provides more insights.
To use TensorBoard, cd to the logs directory and start TensorBoard:
cd /path-to-log-dir
tensorboard --logdir . --reload_multifile=true
TensorBoard can then be viewed by visiting http://localhost:6006/ in your browser.
mogpe.helpers¶
The helpers directory contains classes to aid plotting models with 1D and 2D inputs. These are exploited by the monitored training loops.
Training MixtureOfSVGPExperts on the Motorcycle Data Set (with two experts)¶
This notebook is a basic example of configuring and training a Mixture of Gaussian Process Experts (using MixtureOfSVGPExperts) on the motorcycle dataset with two experts. Instantiating the model with two experts is a special case because only a single gating function is needed (not two!) and the gating network can be calculated in closed form, which is not the case when using more than two experts.
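In other words, with two experts the mixing probabilities satisfy \(\Pr(\alpha=2 | x) = 1 - \Pr(\alpha=1 | x)\), so a single gating function with a Bernoulli likelihood is sufficient (this is spelled out again in the What’s going on with this code?! section).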
[1]:
import numpy as np
import gpflow as gpf
import tensorflow as tf
import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import clear_output
from gpflow import default_float
from gpflow.utilities import print_summary
from gpflow.likelihoods import Bernoulli
from mogpe.experts import SVGPExperts, SVGPExpert
from mogpe.gating_networks import SVGPGatingNetwork
from mogpe.mixture_of_experts import MixtureOfSVGPExperts
from mogpe.training import training_tf_loop
from mogpe.helpers.plotter import Plotter1D
Let’s start by loading the motorcycle dataset and plotting it to see what we’re dealing with.
[2]:
def load_mcycle_dataset(filename='../data/mcycle.csv'):
    df = pd.read_csv(filename, sep=',')
    X = pd.to_numeric(df['times']).to_numpy().reshape(-1, 1)
    Y = pd.to_numeric(df['accel']).to_numpy().reshape(-1, 1)
    X = tf.convert_to_tensor(X, dtype=default_float())
    Y = tf.convert_to_tensor(Y, dtype=default_float())
    print("Input data shape: ", X.shape)
    print("Output data shape: ", Y.shape)
    # standardise both the inputs and outputs to zero mean and unit variance
    mean_x, var_x = tf.nn.moments(X, axes=[0])
    mean_y, var_y = tf.nn.moments(Y, axes=[0])
    X = (X - mean_x) / tf.sqrt(var_x)
    Y = (Y - mean_y) / tf.sqrt(var_y)
    data = (X, Y)
    return data
[3]:
data_file = '../data/mcycle.csv'
dataset = load_mcycle_dataset(filename=data_file)
X, Y = dataset
num_data, input_dim = X.shape
output_dim = Y.shape[1]
plt.scatter(X, Y)
Input data shape: (133, 1)
Output data shape: (133, 1)
[3]:
<matplotlib.collections.PathCollection at 0x19f6aad60>

Given this data set, let’s specify some of the model and training parameters. It is clear that there is a low-noise, long-lengthscale function at \(x<-1\), and at \(x>-1\) the noise increases and the lengthscale shortens. With this knowledge, let’s initialise expert one with a short lengthscale and expert two with a longer lengthscale. We specify each expert to have 6 inducing points and the gating network to have 7 inducing points.
[4]:
num_experts = 2
experts_lengthscales = [1.0, 10.0] # lengthscales for experts 1 and 2
num_inducing_expert = 6 # number of inducing points for each expert
num_inducing_gating = 7 # number of inducing points for gating network
num_samples = 1 # number of samples to draw from variational posterior in ELBO
batch_size = 16
learning_rate = 0.01
In order to initialise the MixtureOfSVGPExperts class for two experts we must pass it an instance of SVGPExperts and an instance of SVGPGatingNetwork with a Bernoulli likelihood. Let’s start by creating an instance of SVGPExperts. To do this we must first create two SVGPExpert instances and pass them as a list to SVGPExperts. Let’s create our first expert.
[5]:
def init_expert(lengthscales=1.0, kernel_variance=1.0, noise_variance=1.0):
    idx = np.random.choice(range(num_data), size=num_inducing_expert, replace=False)
    inducing_variable = X.numpy()[idx, ...].reshape(-1, input_dim)
    inducing_variable = gpf.inducing_variables.InducingPoints(inducing_variable)
    mean_function = gpf.mean_functions.Constant()
    likelihood = gpf.likelihoods.Gaussian(noise_variance)
    kernel = gpf.kernels.RBF(lengthscales=lengthscales, variance=kernel_variance)
    return SVGPExpert(kernel,
                      likelihood,
                      mean_function=mean_function,
                      inducing_variable=inducing_variable)
[6]:
experts_list = [init_expert(lengthscales=experts_lengthscales[k]) for k in range(num_experts)]
We can now create an instance of SVGPExperts by passing our two experts as a list.
[7]:
experts = SVGPExperts(experts_list)
print_summary(experts, fmt="notebook")
name | class | transform | prior | trainable | shape | dtype | value |
---|---|---|---|---|---|---|---|
SVGPExperts.experts_list[0].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.] | |
SVGPExperts.experts_list[0].kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[0].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[0].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[0].inducing_variable.Z | Parameter | Identity | True | (6, 1) | float64 | [[-1.74116353... | |
SVGPExperts.experts_list[0].q_mu | Parameter | Identity | True | (6, 1) | float64 | [[0.... | |
SVGPExperts.experts_list[0].q_sqrt | Parameter | FillTriangular | True | (1, 6, 6) | float64 | [[[1., 0., 0.... | |
SVGPExperts.experts_list[1].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.] | |
SVGPExperts.experts_list[1].kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[1].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 10.0 | |
SVGPExperts.experts_list[1].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[1].inducing_variable.Z | Parameter | Identity | True | (6, 1) | float64 | [[-0.71690236... | |
SVGPExperts.experts_list[1].q_mu | Parameter | Identity | True | (6, 1) | float64 | [[0.... | |
SVGPExperts.experts_list[1].q_sqrt | Parameter | FillTriangular | True | (1, 6, 6) | float64 | [[[1., 0., 0.... |
Lovely stuff. We now need to create an instance of SVGPGatingNetwork with a Bernoulli likelihood. Remember that we only need a single gating function for the two expert case. Let’s go ahead and create our gating function and use it to construct our gating network.
[8]:
def init_gating_network():
    idx = np.random.choice(range(num_data), size=num_inducing_gating, replace=False)
    inducing_variable = X.numpy()[idx, ...].reshape(-1, input_dim)
    inducing_variable = gpf.inducing_variables.InducingPoints(inducing_variable)
    mean_function = gpf.mean_functions.Zero()
    kernel = gpf.kernels.RBF()
    return SVGPGatingNetwork(kernel,
                             likelihood=Bernoulli(),
                             inducing_variable=inducing_variable,
                             mean_function=mean_function)
[9]:
gating_network = init_gating_network()
print_summary(gating_network, fmt="notebook")
name | class | transform | prior | trainable | shape | dtype | value |
---|---|---|---|---|---|---|---|
SVGPGatingNetwork.kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPGatingNetwork.kernel.lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPGatingNetwork.inducing_variable.Z | Parameter | Identity | True | (7, 1) | float64 | [[-0.56402756... | |
SVGPGatingNetwork.q_mu | Parameter | Identity | True | (7, 1) | float64 | [[0.... | |
SVGPGatingNetwork.q_sqrt | Parameter | FillTriangular | True | (1, 7, 7) | float64 | [[[1., 0., 0.... |
We now have all the components to construct our MixtureOfSVGPExperts model so let’s go ahead and do it.
[10]:
model = MixtureOfSVGPExperts(gating_network=gating_network,
                             experts=experts,
                             num_samples=num_samples,
                             num_data=num_data)
print_summary(model, fmt="notebook")
name | class | transform | prior | trainable | shape | dtype | value |
---|---|---|---|---|---|---|---|
MixtureOfSVGPExperts.gating_network.kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.gating_network.kernel.lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.gating_network.inducing_variable.Z | Parameter | Identity | True | (7, 1) | float64 | [[-0.56402756... | |
MixtureOfSVGPExperts.gating_network.q_mu | Parameter | Identity | True | (7, 1) | float64 | [[0.... | |
MixtureOfSVGPExperts.gating_network.q_sqrt | Parameter | FillTriangular | True | (1, 7, 7) | float64 | [[[1., 0., 0.... | |
MixtureOfSVGPExperts.experts.experts_list[0].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.] | |
MixtureOfSVGPExperts.experts.experts_list[0].kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[0].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[0].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[0].inducing_variable.Z | Parameter | Identity | True | (6, 1) | float64 | [[-1.74116353... | |
MixtureOfSVGPExperts.experts.experts_list[0].q_mu | Parameter | Identity | True | (6, 1) | float64 | [[0.... | |
MixtureOfSVGPExperts.experts.experts_list[0].q_sqrt | Parameter | FillTriangular | True | (1, 6, 6) | float64 | [[[1., 0., 0.... | |
MixtureOfSVGPExperts.experts.experts_list[1].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.] | |
MixtureOfSVGPExperts.experts.experts_list[1].kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[1].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 10.0 | |
MixtureOfSVGPExperts.experts.experts_list[1].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[1].inducing_variable.Z | Parameter | Identity | True | (6, 1) | float64 | [[-0.71690236... | |
MixtureOfSVGPExperts.experts.experts_list[1].q_mu | Parameter | Identity | True | (6, 1) | float64 | [[0.... | |
MixtureOfSVGPExperts.experts.experts_list[1].q_sqrt | Parameter | FillTriangular | True | (1, 6, 6) | float64 | [[[1., 0., 0.... |
Let’s use the Plotter1D class from mogpe.helpers.plotter to plot our model before training.
[11]:
plotter = Plotter1D(model, X, Y)
plotter.plot_model()



We must now convert our NumPy data set into a TensorFlow data set and set it up for stochastic optimisation by setting the batch size. We set drop_remainder=True so that every batch contains exactly batch_size points; with 133 data points and a batch size of 16 this gives 8 full batches per pass, with the remaining 5 points dropped.
[12]:
prefetch_size = tf.data.experimental.AUTOTUNE
shuffle_buffer_size = num_data // 2
num_batches_per_epoch = num_data // batch_size
train_dataset = tf.data.Dataset.from_tensor_slices(dataset)
train_dataset = (train_dataset.repeat().prefetch(prefetch_size).shuffle(
    buffer_size=shuffle_buffer_size).batch(batch_size, drop_remainder=True))
We then use GPflow’s training_loss_closure method to get our training loss.
[13]:
training_loss = model.training_loss_closure(iter(train_dataset))
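The closure returned by training_loss_closure pulls a fresh minibatch from the iterator on each call and returns the negative training objective. Roughly speaking it behaves like the following sketch (GPflow also adds the log prior density of any parameters with priors, which is omitted here):
batch_iter = iter(train_dataset)
training_loss_sketch = lambda: -model.maximum_log_likelihood_objective(next(batch_iter))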
Several training loops are defined in mogpe.training.training_loops. Here we write a simple TensorFlow training loop by hand (along the lines of training_tf_loop), which runs the Adam optimizer on the model with training_loss as the objective function. The loop does not use any TensorBoard monitoring. We first configure the training/logging parameters.
[14]:
logging_epoch_freq = 5
plotting_epoch_freq = 500
num_epochs = 2500
[15]:
def plot_elbo(elbo):
    plt.subplot(111)
    plt.scatter(np.arange(len(elbo))*logging_epoch_freq, elbo)
    plt.xlabel("Epoch")
    plt.ylabel("ELBO")
[16]:
optimizer = tf.optimizers.Adam(learning_rate=learning_rate)

@tf.function
def tf_optimization_step():
    optimizer.minimize(training_loss, model.trainable_variables)

elbo_log = []
for epoch in range(num_epochs):
    for _ in range(num_batches_per_epoch):
        tf_optimization_step()
    epoch_id = epoch + 1
    if epoch_id % logging_epoch_freq == 0:
        # training_loss() returns the negative ELBO, so negate it before logging
        elbo_log.append(training_loss()*-1.0)
    if epoch_id % plotting_epoch_freq == 0:
        clear_output(True)
        tf.print(f"Epoch {epoch_id}: ELBO (train) {training_loss()}")
        plot_elbo(elbo_log)
        plt.show()
Epoch 2500: ELBO (train) -12.587674046751173

Now that we have trained the model we can use our plotter again to visualise what we have learned.
[17]:
plotter.plot_model()



[18]:
print_summary(model, fmt="notebook")
name | class | transform | prior | trainable | shape | dtype | value |
---|---|---|---|---|---|---|---|
MixtureOfSVGPExperts.gating_network.kernel.variance | Parameter | Softplus | True | () | float64 | 10.044302011299953 | |
MixtureOfSVGPExperts.gating_network.kernel.lengthscales | Parameter | Softplus | True | () | float64 | 0.9191759962241284 | |
MixtureOfSVGPExperts.gating_network.inducing_variable.Z | Parameter | Identity | True | (7, 1) | float64 | [[-0.33679763... | |
MixtureOfSVGPExperts.gating_network.q_mu | Parameter | Identity | True | (7, 1) | float64 | [[1.23060906... | |
MixtureOfSVGPExperts.gating_network.q_sqrt | Parameter | FillTriangular | True | (1, 7, 7) | float64 | [[[0.24373665, 0., 0.... | |
MixtureOfSVGPExperts.experts.experts_list[0].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.2537413] | |
MixtureOfSVGPExperts.experts.experts_list[0].kernel.variance | Parameter | Softplus | True | () | float64 | 0.7245331900215413 | |
MixtureOfSVGPExperts.experts.experts_list[0].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 0.266099322492376 | |
MixtureOfSVGPExperts.experts.experts_list[0].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 0.10439134246880631 | |
MixtureOfSVGPExperts.experts.experts_list[0].inducing_variable.Z | Parameter | Identity | True | (6, 1) | float64 | [[-0.82094873... | |
MixtureOfSVGPExperts.experts.experts_list[0].q_mu | Parameter | Identity | True | (6, 1) | float64 | [[-0.22453415... | |
MixtureOfSVGPExperts.experts.experts_list[0].q_sqrt | Parameter | FillTriangular | True | (1, 6, 6) | float64 | [[[0.14292316, 0., 0.... | |
MixtureOfSVGPExperts.experts.experts_list[1].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.47905155] | |
MixtureOfSVGPExperts.experts.experts_list[1].kernel.variance | Parameter | Softplus | True | () | float64 | 1.875849923049286e-07 | |
MixtureOfSVGPExperts.experts.experts_list[1].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 14.49029624682839 | |
MixtureOfSVGPExperts.experts.experts_list[1].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 0.001153125667128228 | |
MixtureOfSVGPExperts.experts.experts_list[1].inducing_variable.Z | Parameter | Identity | True | (6, 1) | float64 | [[-1.55605991... | |
MixtureOfSVGPExperts.experts.experts_list[1].q_mu | Parameter | Identity | True | (6, 1) | float64 | [[-0.00318104... | |
MixtureOfSVGPExperts.experts.experts_list[1].q_sqrt | Parameter | FillTriangular | True | (1, 6, 6) | float64 | [[[-1.01888223e+00, 0.00000000e+00, 0.00000000e+00... |
Training MixtureOfSVGPExperts on the Motorcycle Data Set (with three experts)¶
This notebook is a basic example of configuring and training a Mixture of Gaussian Process Experts (using MixtureOfSVGPExperts) in the general case, i.e. with more than two experts. This notebook instantiates the model with three experts and trains it on the motorcycle dataset. It’s worth noting that this approach is applicable for any number of experts.
[ ]:
import numpy as np
import gpflow as gpf
import tensorflow as tf
import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import clear_output
from gpflow import default_float
from gpflow.inducing_variables import InducingPoints, SharedIndependentInducingVariables
from gpflow.likelihoods import Softmax
from gpflow.utilities import print_summary
from mogpe.experts import SVGPExperts, SVGPExpert
from mogpe.gating_networks import SVGPGatingNetwork
from mogpe.mixture_of_experts import MixtureOfSVGPExperts
from mogpe.training import training_tf_loop
from mogpe.helpers.plotter import Plotter1D
Let’s start by loading the motorcycle dataset and plotting it to see what we’re dealing with.
[ ]:
def load_mcycle_dataset(filename='../data/mcycle.csv'):
    df = pd.read_csv(filename, sep=',')
    X = pd.to_numeric(df['times']).to_numpy().reshape(-1, 1)
    Y = pd.to_numeric(df['accel']).to_numpy().reshape(-1, 1)
    X = tf.convert_to_tensor(X, dtype=default_float())
    Y = tf.convert_to_tensor(Y, dtype=default_float())
    print("Input data shape: ", X.shape)
    print("Output data shape: ", Y.shape)
    # standardise both the inputs and outputs to zero mean and unit variance
    mean_x, var_x = tf.nn.moments(X, axes=[0])
    mean_y, var_y = tf.nn.moments(Y, axes=[0])
    X = (X - mean_x) / tf.sqrt(var_x)
    Y = (Y - mean_y) / tf.sqrt(var_y)
    data = (X, Y)
    return data
[3]:
data_file = '../data/mcycle.csv'
dataset = load_mcycle_dataset(filename=data_file)
X, Y = dataset
num_data, input_dim = X.shape
output_dim = Y.shape[1]
plt.scatter(X, Y)
Input data shape: (133, 1)
Output data shape: (133, 1)
[3]:
<matplotlib.collections.PathCollection at 0x1942c85b0>

Given this data set, let’s specify some of the model and training parameters. It is clear that there is a low-noise, long-lengthscale function at \(x<-1\), and at \(x>-1\) the noise increases and the lengthscale shortens. When fitting MixtureOfSVGPExperts with two experts the gating network starts tending to a uniform distribution at \(x>1\), so it is interesting to consider whether the model will fit a third expert in this region. With this knowledge, let’s initialise experts one and three with long lengthscales and expert two with a shorter lengthscale. We specify each expert to have 4 inducing points and the gating network to have 7 inducing points.
[4]:
num_experts = 3
experts_lengthscales = [10.0, 1.0, 10.0] # lengthscales for experts 1, 2 and 3
# experts_lengthscales = [1.0, 1.0, 1.0] # alternative: identical lengthscales for all experts
num_inducing_expert = 4 # number of inducing points for each expert
num_inducing_gating = 7 # number of inducing points for the gating network
num_samples = 1 # number of samples to draw from variational posterior in ELBO
batch_size = 16
learning_rate = 0.01
In order to initialise the MixtureOfSVGPExperts class for three experts we must pass it an instance of SVGPExperts and an instance of SVGPGatingNetwork with a Softmax likelihood. Let’s start by creating an instance of SVGPExperts. To do this we must first create three SVGPExpert instances and pass them as a list to SVGPExperts.
[5]:
def init_expert(lengthscales=1.0, kernel_variance=1.0, noise_variance=1.0):
    idx = np.random.choice(range(num_data), size=num_inducing_expert, replace=False)
    inducing_variable = X.numpy()[idx, ...].reshape(-1, input_dim)
    inducing_variable = gpf.inducing_variables.InducingPoints(inducing_variable)
    mean_function = gpf.mean_functions.Constant()
    likelihood = gpf.likelihoods.Gaussian(noise_variance)
    kernel = gpf.kernels.RBF(lengthscales=lengthscales, variance=kernel_variance)
    return SVGPExpert(kernel,
                      likelihood,
                      mean_function=mean_function,
                      inducing_variable=inducing_variable)
[6]:
experts_list = [init_expert(lengthscales=experts_lengthscales[k]) for k in range(num_experts)]
We can now create an instance of SVGPExperts by passing our three experts to its constructor as a list.
[7]:
experts = SVGPExperts(experts_list)
print_summary(experts, fmt="notebook")
name | class | transform | prior | trainable | shape | dtype | value |
---|---|---|---|---|---|---|---|
SVGPExperts.experts_list[0].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.] | |
SVGPExperts.experts_list[0].kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[0].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 10.0 | |
SVGPExperts.experts_list[0].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[0].inducing_variable.Z | Parameter | Identity | True | (4, 1) | float64 | [[1.34690746... | |
SVGPExperts.experts_list[0].q_mu | Parameter | Identity | True | (4, 1) | float64 | [[0.... | |
SVGPExperts.experts_list[0].q_sqrt | Parameter | FillTriangular | True | (1, 4, 4) | float64 | [[[1., 0., 0.... | |
SVGPExperts.experts_list[1].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.] | |
SVGPExperts.experts_list[1].kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[1].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[1].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[1].inducing_variable.Z | Parameter | Identity | True | (4, 1) | float64 | [[0.70483329... | |
SVGPExperts.experts_list[1].q_mu | Parameter | Identity | True | (4, 1) | float64 | [[0.... | |
SVGPExperts.experts_list[1].q_sqrt | Parameter | FillTriangular | True | (1, 4, 4) | float64 | [[[1., 0., 0.... | |
SVGPExperts.experts_list[2].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.] | |
SVGPExperts.experts_list[2].kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[2].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 10.0 | |
SVGPExperts.experts_list[2].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 1.0 | |
SVGPExperts.experts_list[2].inducing_variable.Z | Parameter | Identity | True | (4, 1) | float64 | [[0.15448401... | |
SVGPExperts.experts_list[2].q_mu | Parameter | Identity | True | (4, 1) | float64 | [[0.... | |
SVGPExperts.experts_list[2].q_sqrt | Parameter | FillTriangular | True | (1, 4, 4) | float64 | [[[1., 0., 0.... |
We now need to create an instance of SVGPGatingNetwork with a Softmax likelihood. In contrast to the two-expert case (where a single gating function can be used), the general case requires a gating function for each expert. The SVGPGatingNetwork inherits GPflow’s multioutput SVGP and uses SharedIndependentInducingVariables for the inducing inputs and SeparateIndependent kernels. The gating functions are independent but should share the same inducing inputs, unlike the experts, where the separate inducing points loosely partition the data set.
[8]:
def init_gating_network(num_experts):
    idx = np.random.choice(range(num_data), size=num_inducing_gating, replace=False)
    inducing_variable = X.numpy()[idx, ...].reshape(-1, input_dim)
    inducing_variable = SharedIndependentInducingVariables(InducingPoints(inducing_variable))
    mean_function = gpf.mean_functions.Zero()
    kernel_list = [gpf.kernels.RBF() for _ in range(num_experts)]
    kernel = gpf.kernels.SeparateIndependent(kernel_list)
    return SVGPGatingNetwork(kernel,
                             likelihood=Softmax(num_experts),
                             inducing_variable=inducing_variable,
                             num_gating_functions=num_experts,
                             mean_function=mean_function)
[9]:
gating_network = init_gating_network(num_experts=num_experts)
print_summary(gating_network, fmt="notebook")
name | class | transform | prior | trainable | shape | dtype | value |
---|---|---|---|---|---|---|---|
SVGPGatingNetwork.kernel.kernels[0].variance | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPGatingNetwork.kernel.kernels[0].lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPGatingNetwork.kernel.kernels[1].variance | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPGatingNetwork.kernel.kernels[1].lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPGatingNetwork.kernel.kernels[2].variance | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPGatingNetwork.kernel.kernels[2].lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
SVGPGatingNetwork.inducing_variable.inducing_variable.Z | Parameter | Identity | True | (7, 1) | float64 | [[1.72909446... | |
SVGPGatingNetwork.q_mu | Parameter | Identity | True | (7, 3) | float64 | [[0., 0., 0.... | |
SVGPGatingNetwork.q_sqrt | Parameter | FillTriangular | True | (3, 7, 7) | float64 | [[[1., 0., 0.... |
We now have all the components to construct our MixtureOfSVGPExperts model so let’s go ahead and do it.
[10]:
model = MixtureOfSVGPExperts(gating_network=gating_network,
                             experts=experts,
                             num_samples=num_samples,
                             num_data=num_data)
print_summary(model, fmt="notebook")
name | class | transform | prior | trainable | shape | dtype | value |
---|---|---|---|---|---|---|---|
MixtureOfSVGPExperts.gating_network.kernel.kernels[0].variance | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.gating_network.kernel.kernels[0].lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.gating_network.kernel.kernels[1].variance | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.gating_network.kernel.kernels[1].lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.gating_network.kernel.kernels[2].variance | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.gating_network.kernel.kernels[2].lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.gating_network.inducing_variable.inducing_variable.Z | Parameter | Identity | True | (7, 1) | float64 | [[1.72909446... | |
MixtureOfSVGPExperts.gating_network.q_mu | Parameter | Identity | True | (7, 3) | float64 | [[0., 0., 0.... | |
MixtureOfSVGPExperts.gating_network.q_sqrt | Parameter | FillTriangular | True | (3, 7, 7) | float64 | [[[1., 0., 0.... | |
MixtureOfSVGPExperts.experts.experts_list[0].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.] | |
MixtureOfSVGPExperts.experts.experts_list[0].kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[0].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 10.0 | |
MixtureOfSVGPExperts.experts.experts_list[0].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[0].inducing_variable.Z | Parameter | Identity | True | (4, 1) | float64 | [[1.34690746... | |
MixtureOfSVGPExperts.experts.experts_list[0].q_mu | Parameter | Identity | True | (4, 1) | float64 | [[0.... | |
MixtureOfSVGPExperts.experts.experts_list[0].q_sqrt | Parameter | FillTriangular | True | (1, 4, 4) | float64 | [[[1., 0., 0.... | |
MixtureOfSVGPExperts.experts.experts_list[1].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.] | |
MixtureOfSVGPExperts.experts.experts_list[1].kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[1].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[1].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[1].inducing_variable.Z | Parameter | Identity | True | (4, 1) | float64 | [[0.70483329... | |
MixtureOfSVGPExperts.experts.experts_list[1].q_mu | Parameter | Identity | True | (4, 1) | float64 | [[0.... | |
MixtureOfSVGPExperts.experts.experts_list[1].q_sqrt | Parameter | FillTriangular | True | (1, 4, 4) | float64 | [[[1., 0., 0.... | |
MixtureOfSVGPExperts.experts.experts_list[2].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.] | |
MixtureOfSVGPExperts.experts.experts_list[2].kernel.variance | Parameter | Softplus | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[2].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 10.0 | |
MixtureOfSVGPExperts.experts.experts_list[2].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 1.0 | |
MixtureOfSVGPExperts.experts.experts_list[2].inducing_variable.Z | Parameter | Identity | True | (4, 1) | float64 | [[0.15448401... | |
MixtureOfSVGPExperts.experts.experts_list[2].q_mu | Parameter | Identity | True | (4, 1) | float64 | [[0.... | |
MixtureOfSVGPExperts.experts.experts_list[2].q_sqrt | Parameter | FillTriangular | True | (1, 4, 4) | float64 | [[[1., 0., 0.... |
Let’s use the Plotter1D class from mogpe.helpers.plotter to plot our model before training.
[11]:
plotter = Plotter1D(model, X, Y)
plotter.plot_model()



We must now convert our NumPy data set into a TensorFlow data set and set it up for stochastic optimisation by setting the batch size. We set drop_remainder=True so that every batch contains exactly batch_size points.
[12]:
prefetch_size = tf.data.experimental.AUTOTUNE
shuffle_buffer_size = num_data // 2
num_batches_per_epoch = num_data // batch_size
train_dataset = tf.data.Dataset.from_tensor_slices(dataset)
train_dataset = (train_dataset.repeat().prefetch(prefetch_size).shuffle(
    buffer_size=shuffle_buffer_size).batch(batch_size, drop_remainder=True))
We then use GPflow’s training_loss_closure method to get our training loss.
[13]:
training_loss = model.training_loss_closure(iter(train_dataset))
Several training loops are defined in mogpe.training.training_loops. Here we write a simple TensorFlow training loop by hand (along the lines of training_tf_loop), which runs the Adam optimizer on the model with training_loss as the objective function. The loop does not use any TensorBoard monitoring. We first configure the training/logging parameters.
[14]:
logging_epoch_freq = 5
plotting_epoch_freq = 500
num_epochs = 2000
[15]:
def plot_elbo(elbo):
    plt.subplot(111)
    plt.scatter(np.arange(len(elbo))*logging_epoch_freq, elbo)
    plt.xlabel("Epoch")
    plt.ylabel("ELBO")
[16]:
optimizer = tf.optimizers.Adam(learning_rate=learning_rate)

@tf.function
def tf_optimization_step():
    optimizer.minimize(training_loss, model.trainable_variables)

elbo_log = []
for epoch in range(num_epochs):
    for _ in range(num_batches_per_epoch):
        tf_optimization_step()
    epoch_id = epoch + 1
    if epoch_id % logging_epoch_freq == 0:
        # training_loss() returns the negative ELBO, so negate it before logging
        elbo_log.append(training_loss()*-1.0)
    if epoch_id % plotting_epoch_freq == 0:
        clear_output(True)
        tf.print(f"Epoch {epoch_id}: ELBO (train) {training_loss()}")
        plot_elbo(elbo_log)
        plt.show()
Epoch 2000: ELBO (train) 49.81311591329514

Now that we have trained the model we can use our plotter again to visualise what we have learned.
[17]:
plotter.plot_model()



[18]:
print_summary(model, fmt="notebook")
name | class | transform | prior | trainable | shape | dtype | value |
---|---|---|---|---|---|---|---|
MixtureOfSVGPExperts.gating_network.kernel.kernels[0].variance | Parameter | Softplus | True | () | float64 | 7.135789778871159 | |
MixtureOfSVGPExperts.gating_network.kernel.kernels[0].lengthscales | Parameter | Softplus | True | () | float64 | 1.3016590612834866 | |
MixtureOfSVGPExperts.gating_network.kernel.kernels[1].variance | Parameter | Softplus | True | () | float64 | 23.512699977881155 | |
MixtureOfSVGPExperts.gating_network.kernel.kernels[1].lengthscales | Parameter | Softplus | True | () | float64 | 0.8741896699556088 | |
MixtureOfSVGPExperts.gating_network.kernel.kernels[2].variance | Parameter | Softplus | True | () | float64 | 0.1437596400755563 | |
MixtureOfSVGPExperts.gating_network.kernel.kernels[2].lengthscales | Parameter | Softplus | True | () | float64 | 0.8437475112514834 | |
MixtureOfSVGPExperts.gating_network.inducing_variable.inducing_variable.Z | Parameter | Identity | True | (7, 1) | float64 | [[1.26555844... | |
MixtureOfSVGPExperts.gating_network.q_mu | Parameter | Identity | True | (7, 3) | float64 | [[-0.89398975, -0.55701677, 0.18885994... | |
MixtureOfSVGPExperts.gating_network.q_sqrt | Parameter | FillTriangular | True | (3, 7, 7) | float64 | [[[2.84199781e-01, 0.00000000e+00, 0.00000000e+00... | |
MixtureOfSVGPExperts.experts.experts_list[0].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.47284263] | |
MixtureOfSVGPExperts.experts.experts_list[0].kernel.variance | Parameter | Softplus | True | () | float64 | 6.118328089014628e-07 | |
MixtureOfSVGPExperts.experts.experts_list[0].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 13.425503933562686 | |
MixtureOfSVGPExperts.experts.experts_list[0].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 0.001097764117017966 | |
MixtureOfSVGPExperts.experts.experts_list[0].inducing_variable.Z | Parameter | Identity | True | (4, 1) | float64 | [[1.20314838... | |
MixtureOfSVGPExperts.experts.experts_list[0].q_mu | Parameter | Identity | True | (4, 1) | float64 | [[-0.00874161... | |
MixtureOfSVGPExperts.experts.experts_list[0].q_sqrt | Parameter | FillTriangular | True | (1, 4, 4) | float64 | [[[1.00055666e+00, 0.00000000e+00, 0.00000000e+00... | |
MixtureOfSVGPExperts.experts.experts_list[1].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.25559829] | |
MixtureOfSVGPExperts.experts.experts_list[1].kernel.variance | Parameter | Softplus | True | () | float64 | 0.8519896116531239 | |
MixtureOfSVGPExperts.experts.experts_list[1].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 0.27061685655515777 | |
MixtureOfSVGPExperts.experts.experts_list[1].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 0.061669447665278404 | |
MixtureOfSVGPExperts.experts.experts_list[1].inducing_variable.Z | Parameter | Identity | True | (4, 1) | float64 | [[0.51534479... | |
MixtureOfSVGPExperts.experts.experts_list[1].q_mu | Parameter | Identity | True | (4, 1) | float64 | [[1.32988718... | |
MixtureOfSVGPExperts.experts.experts_list[1].q_sqrt | Parameter | FillTriangular | True | (1, 4, 4) | float64 | [[[0.13110926, 0., 0.... | |
MixtureOfSVGPExperts.experts.experts_list[2].mean_function.c | Parameter | Identity | True | (1,) | float64 | [0.55334962] | |
MixtureOfSVGPExperts.experts.experts_list[2].kernel.variance | Parameter | Softplus | True | () | float64 | 1.084895730768368e-05 | |
MixtureOfSVGPExperts.experts.experts_list[2].kernel.lengthscales | Parameter | Softplus | True | () | float64 | 10.279219461615842 | |
MixtureOfSVGPExperts.experts.experts_list[2].likelihood.variance | Parameter | Softplus + Shift | True | () | float64 | 0.0958555963107556 | |
MixtureOfSVGPExperts.experts.experts_list[2].inducing_variable.Z | Parameter | Identity | True | (4, 1) | float64 | [[0.35725987... | |
MixtureOfSVGPExperts.experts.experts_list[2].q_mu | Parameter | Identity | True | (4, 1) | float64 | [[0.0038293... | |
MixtureOfSVGPExperts.experts.experts_list[2].q_sqrt | Parameter | FillTriangular | True | (1, 4, 4) | float64 | [[[1.00016938, 0., 0.... |
What’s going on with this code?!¶
In this section we provide details on the Mixtures of Gaussian Process Experts code (mogpe).
The implementation is motivated by making it easy to implement different Mixtures of Gaussian Process Experts models and inference algorithms.
It exploits both inheritance and composition (the building blocks of OOP), making it easier to evolve as new features are added or requirements change.
Class Inheritance and Composition¶
Let’s detail the basic building blocks and how they are related. There are three main components:
- The mixture of experts model (mogpe.mixture_of_experts),
- The set of experts (mogpe.experts) and the individual experts,
- The gating network (mogpe.gating_networks) and the individual gating functions.
Mixture of Experts Base¶
At the heart of this package is the MixtureOfExperts base class, which extends GPflow’s BayesianModel class (any instantiation requires the maximum_log_likelihood_objective() method to be implemented).
It defines the basic methods of a mixture of experts model, namely:
- A method to predict the mixing probabilities at a set of input locations, MixtureOfExperts.predict_mixing_probs(),
- A method to predict the set of expert predictions at a set of input locations, MixtureOfExperts.predict_experts_dists(),
- A method to predict the mixture distribution at a set of input locations, MixtureOfExperts.predict_y().
The constructor requires an instance of a subclass of ExpertsBase to represent the set of experts and an instance of a subclass of GatingNetworkBase to represent the gating network.
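These methods correspond to the usual mixture-of-experts factorisation of the predictive density (stated here for completeness, using \(\alpha\) for the expert indicator as in the rest of these docs):
\(p(y | x) = \sum_{k=1}^{K} \Pr(\alpha=k | x)\, p_k(y | x)\)
where the mixing probabilities \(\Pr(\alpha=k | x)\) come from predict_mixing_probs(), the expert densities \(p_k(y | x)\) from predict_experts_dists(), and their combination from predict_y().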
MixtureOfSVGPExperts¶
The main model class in this package is MixtureOfSVGPExperts, which implements a lower bound maximum_log_likelihood_objective() given that both the experts and gating functions are modelled as sparse variational Gaussian processes (SVGPs).
The implementation extends the ExpertsBase class, creating SVGPExperts, which implements the required abstract methods as well as extra methods used in the lower bound.
It also extends the GatingNetworkBase class, creating the SVGPGatingNetwork class.
This class implements a gating network based on SVGPs for both the special two-expert case and the general K-expert case.
Let’s now detail the base classes for the experts and gating network.
Expert(s) Base¶
Before detailing the ExpertsBase class we first need to introduce the base class for an individual expert.
Any class representing an individual expert must inherit the ExpertBase class and implement the predict_dist() method, returning the expert’s prediction at Xnew.
For example, the SVGPExpert class inherits the ExpertBase class to implement an expert as a sparse variational Gaussian process.
Any class representing the set of all experts must inherit the ExpertsBase class and should implement the predict_dists() method, returning a batched TensorFlow Probability distribution.
The constructor requires a list of expert instances inherited from a subclass of ExpertBase.
For example, the SVGPExperts class represents a set of SVGPExpert experts and adds a method for returning the set of inducing point KL divergences required in the MixtureOfSVGPExperts lower bound.
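As a rough illustration of what this set-of-experts layer adds, the KL-collection behaviour described above can be sketched as follows (a schematic only, not the actual mogpe implementation; it assumes each expert exposes the prior_kl() method inherited from GPflow’s SVGP model, as noted in the API below):
import tensorflow as tf

def prior_kls_sketch(experts_list):
    # Stack each expert's inducing-point KL divergence into a single
    # tensor of shape [num_experts,], ready to be subtracted in the ELBO.
    return tf.stack([expert.prior_kl() for expert in experts_list])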
Gating Network Base¶
All gating networks should inherit the GatingNetworkBase class and implement the predict_mixing_probs() and predict_fs() methods.
This package is mainly interested in gating networks based on Gaussian processes, in particular sparse variational Gaussian processes.
The SVGPGatingNetwork class implements a gating network as a sparse variational Gaussian process.
Similarly to GPflow’s SVGP, its constructor requires a likelihood, and this likelihood governs the behaviour of the gating network.
If a Bernoulli likelihood is passed then the gating network will use a single gating function, as we know \(\Pr(\alpha=2 | x) = 1 - \Pr(\alpha=1 | x)\).
As such, the kernel and inducing variables should correspond to a single-output SVGP.
In the general case, i.e. with more than two experts, the gating network adopts a Softmax likelihood which depends on a gating function for each expert.
In this setting, the kernel and inducing variables should be of multiple-output types, i.e. SeparateIndependent and SharedIndependentInducingVariables respectively.
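To make the two cases concrete, the constructions below mirror the two notebooks above (they assume the same imports as those notebooks, and that Z, an array of inducing inputs, and num_experts are already defined):
# Two-expert case: a single gating function with a Bernoulli likelihood
gating_network_two_experts = SVGPGatingNetwork(
    gpf.kernels.RBF(),
    likelihood=Bernoulli(),
    inducing_variable=gpf.inducing_variables.InducingPoints(Z),
    mean_function=gpf.mean_functions.Zero())

# General case (more than two experts): one gating function per expert with a Softmax likelihood
gating_network_general = SVGPGatingNetwork(
    gpf.kernels.SeparateIndependent([gpf.kernels.RBF() for _ in range(num_experts)]),
    likelihood=Softmax(num_experts),
    inducing_variable=SharedIndependentInducingVariables(InducingPoints(Z)),
    num_gating_functions=num_experts,
    mean_function=gpf.mean_functions.Zero())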
mogpe API¶
Base Classes¶
Mixture of Experts¶
Experts¶
class mogpe.experts.ExpertBase(*args, **kwargs)¶
Abstract base class for an individual expert.
Each subclass that inherits ExpertBase should implement the predict_dist() method that returns the individual expert’s prediction at an input.
Parameters:
args (Any) –
kwargs (Any) –

abstract predict_dist(Xnew, **kwargs)¶
Returns the individual expert’s prediction at Xnew.
TODO: this does not return a tfd.Distribution
Parameters:
Xnew (Tensor) – inputs with shape [num_test, input_dim]
Returns:
an instance of a TensorFlow Distribution

class mogpe.experts.ExpertsBase(experts_list=None, name='Experts')¶
Abstract base class for a set of experts.
Provides an interface between ExpertBase and MixtureOfExperts. Each subclass that inherits ExpertsBase should implement the predict_dists() method that returns the set of experts’ predictions at an input (as a batched TensorFlow distribution).
Parameters:
experts_list (Optional[List[ExpertBase]]) –

abstract predict_dists(Xnew, **kwargs)¶
Returns the set of experts’ predicted dists at Xnew.
Parameters:
Xnew (Tensor) – inputs with shape [num_test, input_dim]
Return type:
Distribution
Returns:
a batched tfd.Distribution with batch_shape […, num_test, output_dim, num_experts]
Gating Networks¶
class mogpe.gating_networks.GatingNetworkBase(*args, **kwargs)¶
Abstract base class for the gating network.
Parameters:
args (Any) –
kwargs (Any) –

abstract predict_fs(Xnew, **kwargs)¶
Calculates the set of gating function posteriors at Xnew.
TODO correct dimensions
Parameters:
Xnew (Tensor) – inputs with shape [num_test, input_dim]
Return type:
Tuple[Tensor, Tensor]
Returns:
mean and var batched Tensors with shape […, num_test, 1, num_experts]

abstract predict_mixing_probs(Xnew, **kwargs)¶
Calculates the set of experts’ mixing probabilities at Xnew, \(\{\Pr(\alpha=k | x)\}^K_{k=1}\).
Parameters:
Xnew (Tensor) – inputs with shape [num_test, input_dim]
Return type:
Tensor
Returns:
a batched Tensor with shape […, num_test, 1, num_experts]
SVGP Classes¶
Mixture of SVGP Experts¶
class mogpe.mixture_of_experts.MixtureOfSVGPExperts(gating_network, experts, num_data, num_samples=1, bound='further_gating')¶
Mixture of SVGP experts using stochastic variational inference.
Implementation of a mixture of Gaussian process (GP) experts method where the gating network is also implemented using GPs. The model is trained with stochastic variational inference by exploiting the factorisation achieved by sparse GPs.
Parameters:
gating_network (SVGPGatingNetwork) – an instance of the GatingNetworkBase class with the predict_mixing_probs(Xnew) method implemented.
experts (SVGPExperts) – an instance of the SVGPExperts class with the predict_dists(Xnew) method implemented.
num_inducing_samples – the number of samples to draw from the inducing point distributions during training.
num_data (int) – the number of data points.
num_samples (int) –
bound (str) –
elbo(data)¶
Returns the evidence lower bound (ELBO) of the log marginal likelihood.
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor

lower_bound_analytic(data)¶
Lower bound to the log-marginal likelihood (ELBO).
This bound assumes each output dimension is independent and takes the product over them within the logarithm (and before the expert indicator variable is marginalised).
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor
Returns:
loss - a Tensor with shape ()

lower_bound_dagp(data)¶
Lower bound used in Data Association with GPs (DAGP).
This bound doesn’t marginalise the expert indicator variable.
TODO check I’ve implemented this correctly. It’s definitely slower than it should be.
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor
Returns:
loss - a Tensor with shape ()
lower_bound_further(data)¶
Lower bound to the log-marginal likelihood (ELBO).
Looser bound than lower_bound_tight as it marginalises both the experts’ and the gating network’s inducing variables \(q(\hat{f}, \hat{h})\) in closed form. Replaces M-dimensional approximate integrals with 1-dimensional approximate integrals.
This bound is equivalent to a different likelihood approximation that only mixes the noise models (as opposed to the full GPs).
This bound assumes each output dimension is independent.
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor
Returns:
loss - a Tensor with shape ()

lower_bound_further_2(data)¶
Lower bound to the log-marginal likelihood (ELBO).
Looser bound than lower_bound_tight but marginalises the inducing variables \(q(\hat{f}, \hat{h})\) in closed form. Replaces M-dimensional approximate integrals with 1-dimensional approximate integrals.
This bound assumes each output dimension is independent.
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor
Returns:
loss - a Tensor with shape ()

lower_bound_further_experts(data)¶
Lower bound to the log-marginal likelihood (ELBO).
Similar to lower_bound_tight but with a further bound on the experts. The bound replaces the M-dimensional integral over each expert’s inducing variables \(q(\hat{\mathbf{U}})\) with 1-dimensional integrals over the gating network variational posterior \(q(\mathbf{h}_n)\).
This bound is equivalent to a different likelihood approximation that only mixes the noise models (as opposed to the full GPs).
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor
Returns:
loss - a Tensor with shape ()
lower_bound_further_gating(data)¶
Lower bound to the log-marginal likelihood (ELBO).
Similar to lower_bound_tight but with a further bound on the gating network. The bound replaces the M-dimensional integral over the gating network inducing variables \(q(\hat{\mathbf{U}})\) with 1-dimensional integrals over the gating network variational posterior \(q(\mathbf{h}_n)\).
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor
Returns:
loss - a Tensor with shape ()

lower_bound_tight(data)¶
Lower bound to the log-marginal likelihood (ELBO).
Tighter bound than lower_bound_further but requires an M-dimensional expectation over the inducing variables \(q(\hat{f}, \hat{h})\) to be approximated (with Gibbs sampling).
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor
Returns:
loss - a Tensor with shape ()

lower_bound_tight_2(data)¶
Lower bound to the log-marginal likelihood (ELBO).
Tighter bound than lower_bound_further but requires an M-dimensional expectation over the inducing variables \(q(\hat{f}, \hat{h})\) to be approximated (with Gibbs sampling).
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor
Returns:
loss - a Tensor with shape ()
marginal_likelihood(data)¶
Marginal likelihood (ML).
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor
Returns:
marginal likelihood - a Tensor with shape ()

marginal_likelihood_new(data)¶
Marginal likelihood (ML).
Parameters:
data (Tuple[Tensor, Tensor]) – data tuple (X, Y) with inputs [num_data, input_dim] and outputs [num_data, output_dim]
Return type:
Tensor
Returns:
marginal likelihood - a Tensor with shape ()

maximum_log_likelihood_objective(data)¶
Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.
Parameters:
data (Tuple[Tensor, Tensor]) –
Return type:
Tensor
predict_experts_fs(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)¶
Compute mean and (co)variance of the experts’ latent functions at Xnew.
If num_inducing_samples is not None then sample inducing points instead of analytically integrating them. This is required in the mixture of experts lower bound.
Parameters:
Xnew (Tensor) – inputs with shape [num_test, input_dim]
num_inducing_samples (Optional[int]) – the number of samples to draw from the inducing point distributions during training.
Return type:
Tuple[Tensor, Tensor]
Returns:
a tuple of (mean, (co)var) each with shape […, num_test, output_dim, num_experts]
SVGP Experts¶
class mogpe.experts.SVGPExpert(kernel, likelihood, inducing_variable, mean_function=None, num_latent_gps=1, q_diag=False, q_mu=None, q_sqrt=None, whiten=True, num_data=None)¶
Sparse Variational Gaussian Process Expert.
This class inherits the prior_kl() method from the SVGPModel class and implements the predict_dist() method using SVGPModel’s predict_y method.
Parameters:
kernel (Kernel) –
likelihood (Likelihood) –
mean_function (Optional[MeanFunction]) –
num_latent_gps (int) –
q_diag (bool) –
whiten (bool) –

predict_dist(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)¶
Returns the mean and (co)variance of the expert’s prediction at Xnew.
Parameters:
Xnew (Tensor) – inputs with shape [num_test, input_dim]
num_inducing_samples (Optional[int]) – the number of samples to draw from the inducing points joint distribution.
full_cov (bool) – If True, draw correlated samples over the inputs. Computes the Cholesky over the dense covariance matrix of size [num_data, num_data]. If False, draw samples that are uncorrelated over the inputs.
full_output_cov (bool) – If True, draw correlated samples over the outputs. If False, draw samples that are uncorrelated over the outputs.
Return type:
Tuple[Tensor, Tensor]
Returns:
tuple of Tensors (mean, variance); the mean’s shape is [num_inducing_samples, num_test, output_dim], if full_cov=False the variance tensor has shape [num_inducing_samples, num_test, output_dim] and if full_cov=True, [num_inducing_samples, output_dim, num_test, num_test]
class mogpe.experts.SVGPExperts(experts_list=None, name='Experts')¶
Extension of ExpertsBase for a set of SVGPExpert experts.
Provides an interface between a set of SVGPExpert instances and the MixtureOfSVGPExperts class.
Parameters:
experts_list (Optional[List[SVGPExpert]]) –

predict_dists(Xnew, **kwargs)¶
Returns the set of experts’ predicted dists at Xnew.
Parameters:
Xnew (Tensor) – inputs with shape [num_test, input_dim]
Return type:
Distribution
Returns:
a batched tfd.Distribution with batch_shape […, num_test, output_dim, num_experts]

predict_fs(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)¶
Returns the set of experts’ latent function means and (co)vars at Xnew.
Parameters:
Xnew (Tensor) – inputs with shape [num_test, input_dim]
num_inducing_samples (Optional[int]) –
Return type:
Tuple[Tensor, Tensor]
Returns:
a tuple of (mean, (co)var) each with shape […, num_test, output_dim, num_experts]
predict_ys(Xnew, num_inducing_samples=None, full_cov=False, full_output_cov=False)¶
Returns the set of experts’ prediction means and (co)vars at Xnew.
Parameters:
Xnew (Tensor) – inputs with shape [num_test, input_dim]
num_inducing_samples (Optional[int]) –
Return type:
Tuple[Tensor, Tensor]
Returns:
a tuple of (mean, (co)var) each with shape […, num_test, output_dim, num_experts]

prior_kls()¶
Returns the set of experts’ KL divergences as a batched tensor.
Return type:
Tensor
Returns:
a Tensor with shape [num_experts,]