GP with Student's t residual model in a regression problem, 'demo_tgp'
This example demonstrates the use of the Student's
t distribution as the residual model in a regression
problem. The synthetic data used here is the same as
that used by Radford M. Neal in his regression-with-outliers
example in Software for Flexible Bayesian
Modeling (http://www.cs.toronto.edu/~radford/fbm.software.html).
The problem consists of one-dimensional input and
target variables. The input data, x, is sampled from a
standard Gaussian distribution, and the corresponding
target values come from a distribution with mean given
by
y = 0.3 + 0.4x + 0.5sin(2.7x) + 1.1/(1+x^2).
In most cases the distribution about this mean
is Gaussian with standard deviation 0.1, but with
probability 0.05 a case is an outlier, for which the
standard deviation is 1.0. There are 200 cases in total,
of which the first 100 are used for training and the
last 100 for testing. The training data is shown in
the picture below together with the underlying
mean.
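As a language-neutral illustration of the data-generating process described above (this is a sketch, not part of the toolbox; the function name neal_data and the seed are illustrative), the synthetic data can be re-created as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def neal_data(n=200, rng=rng):
    # inputs from a standard Gaussian
    x = rng.standard_normal(n)
    # the stated underlying mean function
    mean = 0.3 + 0.4 * x + 0.5 * np.sin(2.7 * x) + 1.1 / (1 + x**2)
    # noise sd is 0.1, except outlier cases (probability 0.05) with sd 1.0
    sd = np.where(rng.random(n) < 0.05, 1.0, 0.1)
    y = mean + sd * rng.standard_normal(n)
    return x, y

x, y = neal_data()
x_train, y_train = x[:100], y[:100]   # first 100 cases for training
x_test,  y_test  = x[100:], y[100:]   # last 100 cases for testing
```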
The Gaussian process parameters are given a hierarchical prior, and for the residual we construct a model with the Student's t distribution. The number of degrees of freedom is given an Inverse-Gamma prior. The model is discussed in detail in (Bayesian Approach for Neural Networks: Review and Case Studies, Lampinen and Vehtari, 2001, Neural Networks). The code of the demonstration program is given below. The predictions of the model for the test cases are shown in picture 3 and the test cases themselves in picture 2.
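For reference, the heavy-tailed residual density in question is the Student's t with nu degrees of freedom and scale sigma; smaller nu gives heavier tails and hence more tolerance of outliers. A minimal Python sketch of its log-density (not part of the toolbox):

```python
import math

def t_logpdf(r, nu, sigma):
    """log-density of a Student's t residual r with nu dof and scale sigma."""
    z = r / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

# sanity check: with nu = 1 the density is Cauchy, so p(0) = 1/pi
print(math.exp(t_logpdf(0.0, nu=1, sigma=1.0)))  # ≈ 0.3183
```

Sampling per-case noise variances, as the code below does, exploits the standard representation of the Student's t as a scale mixture of Gaussians with Inverse-Gamma mixing.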


% load the data. First 100 rows are for training
% and last 100 for test (L is assumed to hold the data file name)
x = load(L);
xt = x(101:end,1);
yt = x(101:end,2);
y = x(1:100,2);
x = x(1:100,1);

% plot the training data with dots and the underlying
% mean of it as a line
xx = [-2.7:0.1:2.7];
yy = 0.3+0.4*xx+0.5*sin(2.7*xx)+1.1./(1+xx.^2);
figure
plot(x,y,'.')
hold on
plot(xx,yy)
title('training data')

% create the Gaussian process
[n, nin] = size(x);
gp=gp2('gp2r',nin,'exp');
gp.jitterSigmas=0.1;
gp.expScale=0.2;
gp.expSigmas=repmat(0.1,1,gp.nin);
gp.noiseSigmas=0.2;
gp.f='norm';

% create a hierarchical prior for the GP parameters...
gp.p.expSigmas=invgam_p({0.1 0.5 0.05 1});
gp.p.expScale=invgam_p({0.05 0.5});
gp.p.noiseSigmas=invgam_p({0.05 0.5});

% set the sampling options for the first two rounds of sampling
opt=gp2_mcopt;
opt.repeat=20;
opt.nsamples=2;
opt.hmc_opt.steps=20;
opt.hmc_opt.stepadj=0.4;
opt.hmc_opt.nsamples=1;
opt.hmc_opt.persistence=1;
opt.sample_variances=0;

hmc2('state', sum(100*clock));
[r,gp,rstate1]=gp2r_mc(opt, gp, x, y);

% create Student's t prior for the residual and sample with a fixed
% number of degrees of freedom
gp.p.noiseSigmas=invgam_p({gp.noiseSigmas 4 0.05 1});
gp.noiseSigmas=repmat(gp.noiseSigmas,1,n);
gp.noiseVariances=gp.noiseSigmas.^2;
opt.sample_variances=1;
[r,gp,rstate]=gp2r_mc(opt, gp, x, y);

% add a hyperprior for the degrees of freedom in the t distribution
gp.p.noiseSigmas.p.nu=invgam_p({6 1});
opt.hmc_opt.stepadj=0.4;
opt.nsamples=300;

% do the main sampling
[r,gp,rstate]=gp2r_mc(opt, gp, x, y, [], [], r, rstate);

% thin the record
rr = thin(r,10,2);

% make predictions for the test set
tga = mean(squeeze(gp2fwds(rr,x,y,xt)),2);

% plot the predictions as '.', and the underlying mean with '--'
figure
plot(xt,tga,'.')
hold on
plot(xx,yy,'--')
legend('prediction', 'mean')
title('prediction from GP')

% plot the test data set as '.', and the underlying mean with '--'
figure
plot(xt,yt,'.')
hold on
plot(xx,yy,'--')
legend('data', 'mean')
title('test data')

% evaluate the mean squared error with respect to the underlying mean
yyt = 0.3+0.4*xt+0.5*sin(2.7*xt)+1.1./(1+xt.^2);
er = (yyt-tga)'*(yyt-tga)/length(yyt);
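The last two steps above, averaging the per-draw predictions over the thinned MCMC record and scoring the averaged prediction against the known true mean, can be sketched as follows (a Python illustration with simulated stand-ins for the toolbox outputs; the array names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-in test inputs and the known true mean at those inputs
xt = rng.standard_normal(100)
true_mean = 0.3 + 0.4 * xt + 0.5 * np.sin(2.7 * xt) + 1.1 / (1 + xt**2)

# pretend predictions: one column per thinned MCMC draw (30 draws here)
preds = true_mean[:, None] + 0.05 * rng.standard_normal((100, 30))

# Monte Carlo posterior-mean prediction, as in mean(squeeze(gp2fwds(...)),2)
tga = preds.mean(axis=1)

# mean squared error against the true mean, as in (yyt-tga)'*(yyt-tga)/length(yyt)
mse = np.mean((true_mean - tga) ** 2)
```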