Department of Biomedical Engineering and Computational Science

GP with Student's t residual model in a regression problem, 'demo_tgp'

This example demonstrates the use of a Student's t-distribution model for the residual in a regression problem. The synthetic data used here is the same as that used by Radford M. Neal in his regression-with-outliers example in Software for Flexible Bayesian Modeling (http://www.cs.toronto.edu/~radford/fbm.software.html). The problem consists of one-dimensional input and target variables. The input data, x, is sampled from a standard Gaussian distribution, and the corresponding target values come from a distribution with mean given by

y = 0.3 + 0.4x + 0.5sin(2.7x) + 1.1/(1+x^2).

In most cases the distribution about this mean is Gaussian with standard deviation 0.1, but with probability 0.05 a case is an outlier, for which the standard deviation is 1.0. There are 200 cases in total, of which the first 100 are used for training and the last 100 for testing. The training data is shown in the figure below together with the underlying mean.

Figure 1. Training data with Gaussian noise and a few outliers.
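The data just described could be generated as in the following sketch. Note that the demo itself loads Neal's original dataset into the matrix x; this snippet is only illustrative, and the variable names here are our own.

```
% Illustrative sketch: generate data of the same form as Neal's
% outlier example (the demo below loads Neal's original data instead).
n = 200;
xin = randn(n,1);                                      % inputs from N(0,1)
f = 0.3 + 0.4*xin + 0.5*sin(2.7*xin) + 1.1./(1+xin.^2); % underlying mean
s = 0.1 + 0.9*(rand(n,1) < 0.05);                      % sd 0.1, or 1.0 w.p. 0.05
t = f + s.*randn(n,1);                                 % noisy targets
data = [xin t];                                        % same layout as the demo's x
```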

The Gaussian process hyperparameters are given hierarchical priors, and for the residual we construct a model with a Student's t-distribution. The number of degrees of freedom is given an Inverse-Gamma prior. The model is discussed in detail in (Bayesian Approach for Neural Networks -- Review and Case Studies, Lampinen and Vehtari, 2001, Neural Networks). The code of the demonstration program is given below. The test cases are shown in Figure 2 and the predictions of the GP for the test cases in Figure 3.
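In the code below the t residual is implemented with per-case noise variances, which corresponds to the standard scale-mixture-of-Gaussians representation of the Student's t-distribution. A sketch of the hierarchy (notation ours, not the toolbox's):

```latex
% Scale-mixture view of the Student's t residual: each case i gets its
% own noise variance, drawn from a scaled inverse-chi-squared
% distribution with nu degrees of freedom and scale tau^2.
\begin{align*}
  y_i \mid f_i, \sigma_i^2 &\sim \mathrm{N}(f_i,\, \sigma_i^2) \\
  \sigma_i^2 \mid \nu, \tau^2 &\sim \mathrm{Inv\text{-}}\chi^2(\nu,\, \tau^2) \\
  \nu &\sim \mathrm{Inv\text{-}Gamma}(\alpha,\, \beta)
\end{align*}
% Integrating sigma_i^2 out gives y_i - f_i ~ t_nu(0, tau^2),
% which is why sampling the per-case variances (sample_variances=1)
% yields a t-distributed residual.
```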

Figure 2. Test data with Gaussian noise and a few outliers.

Figure 3. Predictions of the Gaussian process for the test inputs.
```
% load the data: the first 100 cases are for training
% and the last 100 for testing
xt = x(101:end,1);
yt = x(101:end,2);
y = x(1:100,2);
x = x(1:100,1);

% plot the training data with dots and the underlying
% mean of it as a line
xx = [-2.7:0.1:2.7];
yy = 0.3+0.4*xx+0.5*sin(2.7*xx)+1.1./(1+xx.^2);
figure
plot(x,y,'.')
hold on
plot(xx,yy)
title('training data')

% create the Gaussian process
[n, nin] = size(x);
gp=gp2('gp2r',nin,'exp');
gp.jitterSigmas=0.1;
gp.expScale=0.2;
gp.expSigmas=repmat(0.1,1,gp.nin);
gp.noiseSigmas=0.2;
gp.f='norm';

% Create hierarchical inverse-Gamma priors for the GP parameters
gp.p.expSigmas=invgam_p({0.1 0.5 0.05 1});
gp.p.expScale=invgam_p({0.05 0.5});
gp.p.noiseSigmas=invgam_p({0.05 0.5});

% set the sampling options for first two rounds of sampling
opt=gp2_mcopt;
opt.repeat=20;
opt.nsamples=2;
opt.hmc_opt.steps=20;
opt.hmc_opt.nsamples=1;
opt.hmc_opt.persistence=1;
opt.sample_variances=0;
hmc2('state', sum(100*clock));

[r,gp,rstate1]=gp2r_mc(opt, gp, x, y);

% Create Student's t prior for residual and sample with fixed number of
% degrees of freedom.
gp.p.noiseSigmas=invgam_p({gp.noiseSigmas 4 0.05 1});
gp.noiseSigmas=repmat(gp.noiseSigmas,1,n);
gp.noiseVariances=gp.noiseSigmas.^2;
opt.sample_variances=1;

[r,gp,rstate]=gp2r_mc(opt, gp, x, y);

% Add a hyperprior for degrees of freedom in t-distribution
gp.p.noiseSigmas.p.nu=invgam_p({6 1});
opt.nsamples= 300;

% Do the main sampling
[r,gp,rstate]=gp2r_mc(opt, gp, x, y, [], [], r, rstate);

% thin the record
rr = thin(r,10,2);

% make predictions for test set
tga = mean(squeeze(gp2fwds(rr,x,y,xt)),2);

% Plot the GP predictions as '.', and the underlying mean with '--'
figure
plot(xt,tga,'.')
hold on
plot(xx,yy,'--')
legend('prediction', 'mean')
title('prediction from GP')

% plot the test data set as '.', and underlying mean with '--'
figure
plot(xt,yt,'.')
hold on
plot(xx,yy,'--')
legend('data', 'mean')
title('test data')

% evaluate the mean squared error of the predictions
% with respect to the underlying mean
yyt = 0.3+0.4*xt+0.5*sin(2.7*xt)+1.1./(1+xt.^2);
er = (yyt-tga)'*(yyt-tga)/length(yyt);
```