Department of Biomedical Engineering and Computational Science

MLP network in a 2-class classification problem, 'demo_2class'

In the demonstration program an MLP network with Bayesian learning is used for a two-class classification problem.

The demonstration program is based on the synthetic two-class data used by B. D. Ripley (http://www.stats.ox.ac.uk/pub/PRNN/). The data consist of 2-dimensional vectors that are divided into two classes, labeled 0 or 1. Each class has a bimodal distribution, generated from an equal mixture of two Gaussian distributions with identical covariance matrices.
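
As an illustration of this structure, data of the same form could be generated with the following sketch (the centres and covariance below are hypothetical placeholders, not Ripley's actual values):

n  = 125;                        % points per mixture component
S  = 0.03*eye(2);                % shared covariance matrix
R  = chol(S);                    % so that randn(n,2)*R has covariance S
mu0 = [-0.7 0.4; 0.3 0.5];       % hypothetical centres for class 0
mu1 = [-0.3 0.8; 0.4 0.9];       % hypothetical centres for class 1
x = [randn(n,2)*R + repmat(mu0(1,:),n,1);
     randn(n,2)*R + repmat(mu0(2,:),n,1);
     randn(n,2)*R + repmat(mu1(1,:),n,1);
     randn(n,2)*R + repmat(mu1(2,:),n,1)];
y = [zeros(2*n,1); ones(2*n,1)]; % labels 0/1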

There are two data sets, one of which is used for network training and the other for testing the predictions. The trained network misclassifies only ~10% of the test cases. The decision line and the training data are shown in Figure 1 below.

The code of the demonstration program is shown below.

function demo_2class

% Load the data
x=load('demos/synth.tr');
y=x(:,end);
x(:,end)=[];

nin=size(x,2);
nhid=10;
nout=size(y,2);
% create MLP with logistic output function ('mlp2b')
net = mlp2('mlp2b', nin, nhid, nout);

% Create a Gaussian multivariate hierarchical prior with ARD
% for the network weights...
net=mlp2normp(net, {{repmat(10,1,net.nin) 0.05 0.5 -0.05 1}... % input-hidden weight
  {10 0.05 0.5} ...                          % bias-hidden
  {10 -0.05 0.5} ...                         % hidden-output weight
  {1}});                                     % bias-output
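% Note: repmat(10,1,net.nin) above gives every input its own variance
% hyperparameter, which is what makes this an ARD (automatic relevance
% determination) prior; the remaining scalars in each cell parameterize
% the hierarchical hyperprior (see mlp2normp for the exact convention).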


% Initialize weights to small random values and set the optimization parameters...
w=randn(size(mlp2pak(net)))*0.01;

fe=str2fun('mlp2b_e');
fg=str2fun('mlp2b_g');
n=length(y);
itr=1:floor(0.5*n);     % training part of the data for early stopping
its=floor(0.5*n)+1:n;   % validation part of the data for early stopping
optes=scges_opt;
optes.display=1;
optes.tolfun=1e-1;
optes.tolx=1e-1;

% ... Start scaled conjugate gradient optimization with early stopping.
[w,fs,vs]=scges(fe, w, optes, fg, net, x(itr,:),y(itr,:), net,x(its,:),y(its,:));
net=mlp2unpak(net,w);

% Set the starting values for the weight hyperparameters to the
% variances of the early-stopped weights.
shypw1 = std(net.w1,0,2)';
shypb1 = std(net.b1);
shypw2 = std(net.w2(:));

net=mlp2normp(net, {{shypw1.^2  0.5 0.05 -0.05 1}...           % input-hidden weight
      {shypb1.^2  0.5 0.05} ...                      % bias-hidden
      {shypw2.^2 -0.5 0.05} ...                      % hidden-output weight
      {1}});                                         % bias-output

% First we initialize the random seed for the Monte
% Carlo sampling and set the sampling options to their defaults.
hmc2('state', sum(100*clock));
opt=mlp2b_mcopt;
opt.sample_inputs=0; % do not use RJMCMC for input variables

%  Here we do the sampling without persistence.
opt.repeat=70;
opt.hmc_opt.steps=10;
opt.hmc_opt.stepadj=0.2;
opt.gibbs=1;
opt.nsamples=1;
[r,net,rstate]=mlp2b_mc(opt, net, x, y);
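% Note: this short run (a single sample, with Gibbs updates of the
% hyperparameters enabled) is only meant to move the weights and
% hyperparameters into a reasonable region before the main sampling below.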

% Now that the starting values are found we start the main sampling.
opt.hmc_opt.stepadj=0.5;
opt.hmc_opt.persistence=1;
opt.hmc_opt.steps=40;
opt.hmc_opt.decay=0.95;
opt.repeat=50;
opt.nsamples=600;
opt.hmc_opt.window=5;

[r,net,rstate]=mlp2b_mc(opt, net, x, y, [], [], r, rstate);

% Thin the sample chain.
r=thin(r,200,6);
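% With the conventions assumed here, thin(r,200,6) discards the first
% 200 samples as burn-in and keeps every 6th of the remaining samples
% to reduce autocorrelation between the stored networks.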

% Draw the decision line and training points in the same plot
[p1,p2]=meshgrid(-1.3:0.05:1.1,-1.3:0.05:1.1);
p=[p1(:) p2(:)];
tms=mean((logsig(mlp2fwds(r,p))),3);
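% mlp2fwds evaluates every sampled network on the grid points; the
% samples lie along the third dimension, so the mean over dimension 3
% is the Monte Carlo estimate of the predictive probability of class 1.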

% Plot the decision line
gp=zeros(size(p1));
gp(:)=tms;
contour(p1,p2,gp,[0.5 0.5],'k');
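% The contour at probability 0.5 is the decision line between the two
% classes.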

hold on;
% Plot the training data: o = class 0, x = class 1
plot(x(y==0,1),x(y==0,2),'o');
plot(x(y==1,1),x(y==1,2),'x');
hold off;

% Let's test how well the network performs on the test data.
tx=load('demos/synth.ts');
ty=tx(:,end);
tx(:,end)=[];

tga=mean(logsig(mlp2fwds(r,tx)),3);

% Calculate the percentage of misclassified points
missed = sum(abs(round(tga)-ty))/size(ty,1)*100;
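
The resulting error rate could be displayed, for example, with a line such as:

fprintf('Misclassified %.1f%% of the test points\n', missed);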
Figure 1. The training points and decision line.