Department of Biomedical Engineering and Computational Science

# MLP Network in Regression Problem with 2 Inputs, 'demo_2input'

The demonstration program solves a simple two-dimensional regression problem using an MLP network with Bayesian learning. The demonstration is discussed in full in the documentation. The data in the demonstration has been used in the work by Christopher J. Paciorek and Mark J. Schervish (2004). The training data contains Gaussian noise of unknown variance and is shown in Figure 1.
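To illustrate what such training data looks like, the sketch below builds a small noisy 2-D regression set in Python/NumPy. This is only an illustration: the latent surface used here is an arbitrary stand-in, not the actual Paciorek–Schervish data, and the noise level 0.1 is an assumed value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# 2-D inputs on the same range the demo's grid uses, [-1.8, 1.8]
x = rng.uniform(-1.8, 1.8, size=(n, 2))

# Stand-in latent function (NOT the Paciorek-Schervish surface)
f = np.sin(np.pi * x[:, 0]) * np.cos(np.pi * x[:, 1])

# Additive Gaussian noise with assumed scale 0.1
y = f + rng.normal(scale=0.1, size=n)

# Same column layout as the demo's 'data' matrix: x1, x2, target
data = np.column_stack([x, y])
print(data.shape)  # (200, 3)
```

The demo's MATLAB code then reads the inputs from columns 1–2 and the target from column 3 of such a matrix.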

An MLP network with 10 hidden units and one output is created. The weight and bias structure is hierarchical, with three levels of hyperparameters and automatic relevance determination (ARD). The hyperparameters are sampled with Gibbs sampling and the weights with hybrid Monte Carlo sampling. We drew 2500 samples, of which the first 150 were discarded as burn-in. The remaining chain was then thinned by keeping only every 25th sample, to reduce the autocorrelation between samples. The convergence and independence of the samples were examined with Markov chain Monte Carlo diagnostics tools. Figure 2 shows the prediction from the network.
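The burn-in and thinning step can be sketched as follows (in Python/NumPy rather than the demo's MATLAB; `thin_chain` is a hypothetical helper mirroring what the toolbox's `thin` function does on the sample chain):

```python
import numpy as np

def thin_chain(samples, burn_in, step):
    """Discard the first `burn_in` samples, then keep every `step`-th one."""
    return samples[burn_in::step]

# Stand-in for a chain of 2500 MCMC samples (here just their indices)
chain = np.arange(2500)

# 150 burn-in samples, keep every 25th, as in the demo's thin(r,150,25)
kept = thin_chain(chain, 150, 25)
print(len(kept))  # 94
```

Thinning does not make the retained samples fully independent, but it reduces the autocorrelation between successive stored samples at little cost in accuracy.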

The code needed to reproduce the results is shown below and is treated more carefully in the documentation. The noisy data and the prediction of the network are shown in the figures at the bottom.

function demo_2input

% The data matrix has the inputs in columns 1-2 and the target in
% column 3 (loading the data is described in the documentation).
x = [data(:,1) data(:,2)];
y = data(:,3);

% Draw the data. The data is not sorted. Note that the title is
% set after mesh, which would otherwise reset the axes.
figure
[xi,yi,zi]=griddata(data(:,1),data(:,2),data(:,3),-1.8:0.01:1.8,[-1.8:0.01:1.8]');
mesh(xi,yi,zi)
title('The noisy training data');

% Create an MLP network for the regression model. The function
% mlp2 initializes the weights to zero.
nin=size(x,2);
nhid=10;
nout=1;

net = mlp2('mlp2r', nin, nhid, nout);

% Create a Gaussian multivariate hierarchical prior with ARD
% for network...
net=mlp2normp(net, {{repmat(0.1,1,net.nin) 0.05 0.5 -0.05 1} ...
{0.1 0.05 0.5} ...
{0.1 -0.05 0.5} ...
{1}});

% ...and the same for residual.
net.p.r = norm_p({0.05 0.05 0.5});

% Set the options for Monte Carlo sampling
opt=mlp2r_mcopt;
opt.repeat=50;
opt.plot=0;
opt.hmc_opt.steps=40;
hmc2('state', sum(100*clock));

% Initialize weights to zero and sample for the first time.
% Since opt.gibbs = 0 for this round of MC sampling, the
% sampling is done only for the weights. This way we can reach
% the equilibrium distribution more quickly.
net = mlp2unpak(net, mlp2pak(net)*0);
[r1,net1]=mlp2r_mc(opt, net, x, y);

% Now sample also for hyper-priors with Gibbs sampling.
opt.hmc_opt.steps=60;
opt.repeat=70;
opt.gibbs=1;
[r2,net2]=mlp2r_mc(opt, net1, x, y, [], [], r1);

% Now we should be close to equilibrium already and we can start
% the sampling with windowing and persistence.
opt.hmc_opt.steps=100;
opt.hmc_opt.window=5;
opt.hmc_opt.persistence=1;
[r3,net3]=mlp2r_mc(opt, net2, x, y, [], [], r2);

% Sample 2500 samples from the posterior, from which every 25th
% is kept after 150 burn-in samples (the function thin below).
opt.repeat=60;
opt.hmc_opt.steps=100;
opt.nsamples=2500;
[r,net]=mlp2r_mc(opt, net3, x, y, [], [], r3);
mlp=thin(r,150,25);

% Create a grid of new inputs over the same range as the training data
figure
[p1,p2]=meshgrid(-1.8:0.05:1.8,-1.8:0.05:1.8);
p=[p1(:) p2(:)];
tms=squeeze(mlp2fwds(mlp,p))';
g=mean(tms);

% Plot the prediction of the network
gp=zeros(size(p1));
gp(:)=g;
mesh(p1,p2,gp);
axis on;

Figure 1. The training data with Gaussian noise.

Figure 2. The prediction of the MLP network.