Department of Biomedical Engineering and Computational Science

MLP with Student's t residual model in a regression problem, 'demo_tmlp'

This example demonstrates the use of a Student's t-distribution model for the residual in a regression problem. The synthetic data used here are the same as those used by Radford M. Neal in his regression-with-outliers example in Software for Flexible Bayesian Modeling (http://www.cs.toronto.edu/~radford/fbm.software.html). The problem consists of one-dimensional input and target variables. The input data, x, are sampled from a standard Gaussian distribution, and the corresponding target values come from a distribution with mean given by

y = 0.3 + 0.4x + 0.5sin(2.7x) + 1.1/(1+x^2).

For most cases the distribution about this mean is Gaussian with a standard deviation of 0.1, but with probability 0.05 a case is an outlier, for which the standard deviation is 1.0. There are 200 cases in total, of which the first 100 are used for training and the last 100 for testing. The training data are shown in the figure below together with the underlying mean.
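The demo loads these cases from a pre-generated data file, but the generative process itself is simple to reproduce. The following Python sketch (Python and the variable names are used here only for illustration; they are not part of the demo) shows the sampling scheme described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_mean(x):
    # the underlying mean of Neal's outlier example
    return 0.3 + 0.4 * x + 0.5 * np.sin(2.7 * x) + 1.1 / (1 + x**2)

n = 200
x = rng.standard_normal(n)            # inputs from a standard Gaussian
outlier = rng.random(n) < 0.05        # each case is an outlier w.p. 0.05
sd = np.where(outlier, 1.0, 0.1)      # noise std: 1.0 for outliers, else 0.1
y = true_mean(x) + sd * rng.standard_normal(n)

x_train, y_train = x[:100], y[:100]   # first 100 cases for training
x_test, y_test = x[100:], y[100:]     # last 100 cases for testing
```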


Figure 1. Training data with Gaussian noise and a few outliers.

Network weights are given a Gaussian hierarchical prior, and for the residual we construct a model with a Student's t-distribution. The number of degrees of freedom is sampled from a discrete set of values. The model is discussed in detail in (Lampinen and Vehtari, Bayesian Approach for Neural Networks -- Review and Case Studies, Neural Networks, 2001). The code of the demonstration program is given below. The test cases are shown in Figure 2 and the network's predictions for them in Figure 3.
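Why a t-distributed residual is robust to outliers can be seen by comparing log-densities: the t log-density falls off only logarithmically in the residual, so a single outlier cannot dominate the likelihood the way it does under a Gaussian. A small Python illustration (the standard location-scale parameterization of the t-density is assumed here, not taken from the toolbox code):

```python
from math import lgamma, log, pi

def gauss_logpdf(e, sigma):
    # log N(e | 0, sigma^2)
    return -0.5 * (e / sigma)**2 - log(sigma) - 0.5 * log(2 * pi)

def t_logpdf(e, nu, sigma):
    # log Student-t(e | nu, 0, sigma), location-scale parameterization
    z = e / sigma
    return (lgamma((nu + 1) / 2) - lgamma(nu / 2)
            - 0.5 * log(nu * pi) - log(sigma)
            - (nu + 1) / 2 * log(1 + z * z / nu))

sigma, nu = 0.1, 4          # noise scale and degrees of freedom
typical, outlier = 0.05, 2.0

# For the typical residual the two models agree closely, but for the
# outlier the Gaussian log-density collapses while the t stays moderate.
for e in (typical, outlier):
    print(e, gauss_logpdf(e, sigma), t_logpdf(e, nu, sigma))
```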



Figure 2. Test data with Gaussian noise and a few outliers.


Figure 3. Predictions from the MLP network.

% load the data. First 100 cases are for training
% and the last 100 for testing
x = load('demos/odata');
xt = x(101:end,1);
yt = x(101:end,2);
y = x(1:100,2);
x = x(1:100,1);

% plot the training data as dots and its underlying
% mean as a line
xx = [-2.7:0.1:2.7];
yy = 0.3+0.4*xx+0.5*sin(2.7*xx)+1.1./(1+xx.^2);
figure
plot(x,y,'.')
hold on
plot(xx,yy)

% create the network
nin=size(x,2);
nhid=8;
nout=1;
net = mlp2('mlp2r', nin, nhid, nout);

% Create a Gaussian multivariate hierarchical prior for network...
net=mlp2normp(net, {{0.1 0.05 0.5 -0.05 1} ...
          {0.1 0.05 0.5} ...
          {0.1 -0.05 0.5} ...
          {1}});

% Create a Student's t-distribution prior for the residual.
% First the prior is created with a fixed number of degrees of
% freedom, after which sampling is done for one round.
net.p.r = t_p({0.1 4 0.05 0.5});

% Set the sampling options
opt=mlp2r_mcopt;
opt.repeat=50;
opt.plot=0;
opt.hmc_opt.steps=40;
opt.hmc_opt.stepadj=0.1;
hmc2('state', sum(100*clock));

% Sample for one round with fixed nu
net = mlp2unpak(net, mlp2pak(net)*0);
[r1,net1]=mlp2r_mc(opt, net, x, y);

% Create the prior structure for the final sampling.
% Here nu (the number of degrees of freedom) is sampled
% from a discrete set of values.
net1.p.r = t_p({net.p.r.a.s net.p.r.a.nu 0.05 0.5 ...
         [2 2.3 2.6 3 3.5 4 4.5 5 6 7 8 9 10 12 14 16 18 20 25 30 35 40 45 50]});

opt.hmc_opt.stepadj=0.2;
opt.hmc_opt.steps=60;
opt.repeat=70;
opt.gibbs=1;
[r2,net2]=mlp2r_mc(opt, net1, x, y, [], [], r1);

% The sampling parameters are adjusted and sampling with
% windowing and persistence is started. After this the starting
% values for the main sampling are reached and the main sampling begins.
opt.hmc_opt.stepadj=0.3;
opt.hmc_opt.steps=100;
opt.hmc_opt.window=5;
opt.hmc_opt.persistence=1;
[r3,net3]=mlp2r_mc(opt, net2, x, y, [], [], r2);

% Run the main posterior sampling (opt.nsamples rounds).
opt.repeat=60;
opt.hmc_opt.steps=100;
opt.hmc_opt.stepadj=0.5;
opt.nsamples=2000;

[r,net]=mlp2r_mc(opt, net3, x, y, [], [], r3);

% thin the sample record: discard burn-in, then keep every 10th sample
rr = thin(r,50,10);
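If `thin` follows the usual MCMC convention of dropping a burn-in prefix and then keeping every n-th draw (an assumption about the toolbox's `thin(r, nburn, nthin)` signature, read off from the call above), the operation amounts to this Python sketch:

```python
def thin_chain(samples, nburn, nthin):
    # discard the first nburn draws, then keep every nthin-th one
    # to reduce autocorrelation between the stored samples
    return samples[nburn::nthin]

chain = list(range(100))        # stand-in for 100 stored MCMC draws
kept = thin_chain(chain, 50, 10)
print(kept)                     # -> [50, 60, 70, 80, 90]
```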

% make predictions for the test set using the thinned samples
tga = mean(mlp2fwds(rr,xt),3);

% Plot the network outputs as '.', and underlying mean with '--'
figure
plot(xt,tga,'.')
hold on
plot(xx,yy,'--')
legend('prediction', 'mean')

% plot the test data set as '.', and underlying mean with '--'
figure
plot(xt,yt,'.')
hold on
plot(xx,yy,'--')
legend('data', 'mean')

% evaluate the average squared error relative to the underlying mean
yyt = 0.3+0.4*xt+0.5*sin(2.7*xt)+1.1./(1+xt.^2);
er = (yyt-tga)'*(yyt-tga)/length(yyt);
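The inner-product form above is just the mean squared error written as a dot product. A quick Python check of that identity (the arrays are made-up illustration values, not the demo's data):

```python
import numpy as np

yyt = np.array([1.0, 2.0, 3.0])   # hypothetical underlying-mean values
tga = np.array([1.1, 1.9, 3.2])   # hypothetical network predictions

r = yyt - tga
er_dot = r @ r / len(yyt)         # the (yyt-tga)'*(yyt-tga)/length(yyt) form
er_mse = np.mean(r**2)            # elementwise mean squared error

assert abs(er_dot - er_mse) < 1e-12
```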