MLP with a Student's t residual model in a regression problem, 'demo_tmlp'
This example demonstrates the use of a Student's t-distribution model
for the residuals in a regression problem. The synthetic data used here are
the same as those used by Radford M. Neal in his regression problem with outliers
example in Software for Flexible Bayesian Modeling (http://www.cs.toronto.edu/~radford/fbm.software.html).
The problem consists of one-dimensional input and target variables. The
input data, x, is sampled from a standard Gaussian distribution, and the
corresponding target values come from a distribution with mean given
by
y = 0.3 + 0.4x + 0.5sin(2.7x) + 1.1/(1+x^2).
For most of the cases the distribution around this mean is Gaussian with
a standard deviation of 0.1, but with probability 0.05 a case is an outlier,
for which the standard deviation is 1.0. There are 200 cases in total,
of which the first 100 are used for training and the last 100 for
testing. The training data are shown in the picture below together with
the underlying mean.
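The data-generating process described above can be sketched as follows. This is a Python/NumPy illustration rather than the toolbox's MATLAB code, and the random seed and variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed, not from the original demo

def true_mean(x):
    # Underlying mean: y = 0.3 + 0.4x + 0.5 sin(2.7x) + 1.1/(1 + x^2)
    return 0.3 + 0.4 * x + 0.5 * np.sin(2.7 * x) + 1.1 / (1.0 + x ** 2)

n = 200
x = rng.standard_normal(n)          # inputs from a standard Gaussian
outlier = rng.random(n) < 0.05      # each case is an outlier with probability 0.05
sd = np.where(outlier, 1.0, 0.1)    # outliers have sd 1.0, regular cases sd 0.1
y = true_mean(x) + sd * rng.standard_normal(n)

# First 100 cases for training, last 100 for testing
x_train, y_train = x[:100], y[:100]
x_test, y_test = x[100:], y[100:]
```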
Network weights are given a Gaussian hierarchical prior, and for the residuals we construct a model with a Student's t-distribution. The number of degrees of freedom is sampled from a set of discretized values. The model is discussed in detail in (Bayesian Approach for Neural Networks - Review and Case Studies, Lampinen and Vehtari, 2001, Neural Networks). The code of the demonstration program is given below. The predictions of the network for the test cases are shown in picture 3 and the test cases in picture 2.
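To see why a t-distributed residual model is robust to outliers, compare the per-case log-likelihoods of a large residual under the two models. This is a plain-Python sketch, not toolbox code; the scale and degrees-of-freedom values match the demo's noise model:

```python
import math

def t_logpdf(r, nu, sigma):
    # Log-density of a Student's t residual r with nu degrees of
    # freedom and scale sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p((r / sigma) ** 2 / nu))

def normal_logpdf(r, sigma):
    # Log-density of a Gaussian residual with standard deviation sigma
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * (r / sigma) ** 2)

# A residual of 1.0 is 10 standard deviations under the regular-case
# noise level sigma = 0.1. The Gaussian model penalizes it heavily,
# while the t model (nu = 4) gives it a much milder penalty, so a
# single outlier cannot dominate the posterior.
sigma = 0.1
print(normal_logpdf(1.0, sigma))  # roughly -48.6
print(t_logpdf(1.0, 4, sigma))    # roughly -6.8
```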


% Load the data. The first 100 cases are for training
% and the last 100 for testing.
x = load('demos/odata');
xt = x(101:end,1);
yt = x(101:end,2);
y = x(1:100,2);
x = x(1:100,1);

% Plot the training data as dots and the underlying
% mean as a line.
xx = [-2.7:0.1:2.7];
yy = 0.3+0.4*xx+0.5*sin(2.7*xx)+1.1./(1+xx.^2);
figure
plot(x,y,'.')
hold on
plot(xx,yy)

% Create the network.
nin=size(x,2);
nhid=8;
nout=1;
net = mlp2('mlp2r', nin, nhid, nout);

% Create a Gaussian multivariate hierarchical prior for the network...
net=mlp2normp(net, {{0.1 0.05 0.5 0.05 1} ...
                    {0.1 0.05 0.5} ...
                    {0.1 0.05 0.5} ...
                    {1}})

% Create a Student's t-distribution prior for the residual.
% First the prior is created with a fixed number of degrees of freedom,
% after which the sampling is done for one round.
net.p.r = t_p({0.1 4 0.05 0.5});

% Set the sampling options.
opt=mlp2r_mcopt;
opt.repeat=50;
opt.plot=0;
opt.hmc_opt.steps=40;
opt.hmc_opt.stepadj=0.1;
hmc2('state', sum(100*clock));

% Sample for one round with fixed nu.
net = mlp2unpak(net, mlp2pak(net)*0);
[r1,net1]=mlp2r_mc(opt, net, x, y);

% Create the prior structure for the final sampling.
% Here nu (the number of degrees of freedom) is sampled from a
% discrete set of values.
net1.p.r = t_p({net.p.r.a.s net.p.r.a.nu 0.05 0.5 ...
    [2 2.3 2.6 3 3.5 4 4.5 5 6 7 8 9 10 12 14 16 18 20 25 30 35 40 45 50]});
opt.hmc_opt.stepadj=0.2;
opt.hmc_opt.steps=60;
opt.repeat=70;
opt.gibbs=1;
[r2,net2]=mlp2r_mc(opt, net1, x, y, [], [], r1);

% The sampling parameters are adjusted a bit and sampling with
% windowing and persistence is started. After this the starting
% values for the main sampling are reached and the main sampling
% is started.
opt.hmc_opt.stepadj=0.3;
opt.hmc_opt.steps=100;
opt.hmc_opt.window=5;
opt.hmc_opt.persistence=1;
[r3,net3]=mlp2r_mc(opt, net2, x, y, [], [], r2);

% Sample from the posterior.
opt.repeat=60;
opt.hmc_opt.steps=100;
opt.hmc_opt.stepadj=0.5;
opt.nsamples=2000;
[r,net]=mlp2r_mc(opt, net3, x, y, [], [], r3);

% Thin the record.
rr = thin(r,50,10)

% Make predictions for the test set.
tga = mean(mlp2fwds(r,xt),3);

% Plot the network outputs as '.' and the underlying mean as '--'.
figure
plot(xt,tga,'.')
hold on
plot(xx,yy,'--')
legend('prediction', 'mean')

% Plot the test data as '.' and the underlying mean as '--'.
figure
plot(xt,yt,'.')
hold on
plot(xx,yy,'--')
legend('data', 'mean')

% Evaluate the average squared error to the mean.
yyt = 0.3+0.4*xt+0.5*sin(2.7*xt)+1.1./(1+xt.^2);
er = (yyt-tga)'*(yyt-tga)/length(yyt);
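The Gibbs step for the degrees of freedom (opt.gibbs=1 above) draws nu from the discretized grid with probability proportional to the likelihood of the current residuals. A minimal Python sketch of that idea, with illustrative function names and a uniform prior over the grid values assumed:

```python
import math
import random

def t_loglik(residuals, nu, sigma):
    # Sum of Student's t log-densities over all residuals
    c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
         - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return sum(c - (nu + 1) / 2 * math.log1p((r / sigma) ** 2 / nu)
               for r in residuals)

def gibbs_sample_nu(residuals, sigma, nu_grid, rng=random):
    # Draw nu from the grid in proportion to the residual likelihood
    # (uniform prior over the grid values assumed)
    logp = [t_loglik(residuals, nu, sigma) for nu in nu_grid]
    m = max(logp)
    w = [math.exp(lp - m) for lp in logp]  # stabilized weights
    return rng.choices(nu_grid, weights=w, k=1)[0]

# The same discrete grid as in the demo above
nu_grid = [2, 2.3, 2.6, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9, 10,
           12, 14, 16, 18, 20, 25, 30, 35, 40, 45, 50]
residuals = [0.05, -0.1, 0.02, 1.2, -0.08]  # one outlier-like residual
nu = gibbs_sample_nu(residuals, 0.1, nu_grid)
```

With an outlier present, small nu values get most of the weight, which is how the model adapts its tail heaviness to the data.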