Department of Biomedical Engineering and Computational Science

Input variable selection with RJMCMC for GP, 'demo_rjmcmcgp'

This program demonstrates the benefits of RJMCMC sampling in input variable selection. The program is an extension of the demonstration program 'demo_2classgp'. Here, three irrelevant inputs have been added to the model, and sampling moves between models with different sets of inputs using RJMCMC.

The RJMCMC method is an extension of the Metropolis-Hastings algorithm that allows jumps between models whose parameter spaces have different dimensions. In the case of input selection, the dimension of the model's parameter space changes with the number of inputs included in the model. RJMCMC visits the models according to their posterior probabilities. After sampling a chain of sufficient length, the marginal posterior probability of an input can be approximated from the proportion of visits to models that contain it.
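As a concrete illustration of the last point, suppose the visited models are recorded as a matrix of 0/1 input-inclusion indicators, one row per stored sample; the marginal posterior probability of each input is then approximated by the column-wise mean of that matrix. The small example below uses made-up indicator values and variable names, purely for illustration:

% Illustrative only: ind is an nsamples-by-nin matrix of 0/1 indicators,
% where ind(i,j) = 1 if input j was included in the model visited at sample i.
ind = [1 1 0 0 1; 1 1 0 0 0; 1 1 1 0 0; 1 1 0 1 0];
p_inputs = mean(ind, 1)   % approximate marginal posterior probability of each input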

The demonstration program is based on the synthetic two-class data used by B. D. Ripley (http://www.stats.ox.ac.uk/pub/PRNN/). The data consist of 2-dimensional vectors divided into two classes, labelled 0 or 1. Each class has a bimodal distribution generated from an equal mixture of two Gaussian distributions with identical covariance matrices. After the vectors have been labelled, three irrelevant components are added to them.
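For illustration, two-class data of this kind could be generated roughly as follows; the component means, covariance and sample size below are illustrative values only, not the ones used to create Ripley's data set:

% Each class is an equal mixture of two Gaussians with a common covariance;
% three irrelevant inputs are appended after labelling.
n  = 125;                          % points per class (illustrative)
S  = 0.03*eye(2);                  % shared covariance matrix (illustrative)
m0 = [-0.3 0.7; 0.4 0.7];          % component means for class 0 (illustrative)
m1 = [-0.7 0.3; 0.3 0.3];          % component means for class 1 (illustrative)
c0 = m0(randi(2,n,1),:) + randn(n,2)*chol(S);
c1 = m1(randi(2,n,1),:) + randn(n,2)*chol(S);
x  = [c0; c1];
y  = [zeros(n,1); ones(n,1)];      % class labels 0 and 1
x  = [x randn(2*n,3)];             % append three irrelevant inputs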

There are two data sets, of which one is used for training and the other for testing the predictions. The trained Gaussian process misclassifies only ~9.6% of the test cases. The estimated marginal posterior probability of each input is: 0.9950 (input 1), 0.9980 (input 2), 0.0379 (input 3), 0.0529 (input 4) and 0.1048 (input 5).

The code of the demonstration program is shown below.

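% Load the training inputs from synth2.mat and the class labels
% from the last column of synth.tr.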
x=load('demos/synth2.mat');
x = x.x;
y=load('demos/synth.tr');
y=y(:,end);

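% Create the Gaussian process structure and set initial values for its parameters.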
[n, nin] = size(x);
gp=gp2('gp2b',nin,'exp', 'jitter');
gp.jitterSigmas=10;
gp.expScale=0.2;
gp.expSigmas=repmat(0.1,1,gp.nin);
gp.noiseSigmas = 0.2;
gp.f='norm';

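% Set inverse-gamma priors for the covariance function parameters.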
gp.p.expSigmas=invgam_p({0.1 0.5 0.05 1});
gp.p.expScale=invgam_p({0.05 0.5});

% Initialize the weights to small random values and set the optimization parameters...
w=randn(size(gp2pak(gp)))*0.01;

fe=str2fun('gp2r_e');
fg=str2fun('gp2r_g');
n=length(y);
itr=1:floor(0.5*n);     % training set of data for early stop
its=floor(0.5*n)+1:n;   % test set of data for early stop
optes=scges_opt;
optes.display=1;
optes.tolfun=1e-1;
optes.tolx=1e-1;

% ... Start scaled conjugate gradient optimization with early stopping.
[w,fs,vs]=scges(fe, w, optes, fg, gp, x(itr,:),y(itr,:), gp,x(its,:),y(its,:));
gp=gp2unpak(gp,w);

% set the options for sampling
opt=gp2_mcopt;
opt.repeat=20;
opt.nsamples=1;
opt.hmc_opt.steps=20;
opt.hmc_opt.stepadj=0.1;
opt.hmc_opt.nsamples=1;

opt.sample_latent = 20;
opt.sample_latent_scale = 0.5;
gp.latentValues = randn(size(y));
hmc2('state', sum(100*clock));

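% Run a short initial sampling round to obtain a record structure and sampler state.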
[r,gp,rstate]=gp2b_mc(opt, gp, x, y);

% Set the main sampling options
opt.repeat=10;
opt.nsamples = 2000;
opt.hmc_opt.persistence=1;
opt.sample_variances=0;
opt.hmc_opt.window=5;
opt.hmc_opt.stepadj=0.75;
opt.hmc_opt.steps=10;

% Set the sampling options for RJMCMC
opt.sample_inputs=2;
opt.rj_opt.pswitch = 1/3;
opt.rj_opt.pdeath = 0.5;
opt.rj_opt.lpk = log(ones(1,gp.nin)/gp.nin); % log of a uniform prior for the number of inputs k
gp.inputii = true(1,gp.nin);

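% Run the main RJMCMC sampling, continuing from the previous record and state.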
[r,g,rstate]=gp2b_mc(opt, gp, x, y, [], [], r, rstate);
% Thin the sample chain.
r=thin(r,50,6);

% Draw the decision line and training points in the same plot
[p1,p2]=meshgrid(-1.3:0.05:1.1,-1.3:0.05:1.1);
p=[p1(:) p2(:)];

tms2=mean((logsig(gp2fwds(r, x, r.latentValues', p))),3);

% Plot the decision line
G=zeros(size(p1));
G(:)=tms2;
contour(p1,p2,G,[0.5 0.5],'k');

hold on;
% Plot the training data: o = class 0, x = class 1
plot(x(y==0,1),x(y==0,2),'o');
plot(x(y==1,1),x(y==1,2),'x');
hold off;

% Let's test how well the model works for the test data.
tx=load('demos/synth.ts');
ty=tx(:,end);
tx(:,end)=[];

tga=mean(logsig(gp2fwds(r, x, r.latentValues', tx)),3);

% Let's calculate the percentage of misclassified points
missed = sum(abs(round(tga)-ty))/size(ty,1)*100;
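The posterior input probabilities quoted above are simply the proportions of stored samples in which each input was present in the model. Assuming the thinned record r keeps the sampled input-inclusion indicators in a field named inputii (the field name is an assumption here; check the record returned by gp2b_mc), they could be computed along these lines:

% Proportion of (thinned) samples in which each input was included in the model;
% r.inputii is assumed to be an nsamples-by-nin matrix of 0/1 indicators.
p_inputs = mean(r.inputii, 1)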