Demo of MatBugs using 'schools' dataset

Written by Kevin Murphy, 11 October 2007.

Consider the following example from Gelman et al, "Bayesian data analysis", p138. We want to estimate the performance of 8 schools on a standardized test. Let y_j be the average score of school j and sigma_j be the corresponding standard deviation, for j=1:J where J=8. Let theta_j be the "true" score of school j. Since we expect schools to have similar scores, we will assume that each theta_j is drawn from a common prior, theta_j ~ N(mu.theta, tau.theta), where tau.theta is the precision of the prior. This lets us share information between schools (c.f. "soft weight sharing" in a neural network). This is illustrated below.

schools model

Let us use an uninformative prior for mu.theta and sigma.theta, and a Gaussian observation model. In BUGS,

Y <- f(X)
means Y is a deterministic function of X. This is denoted by a double arrow in the picture above. For example,
Y <- pow(X,2)
means Y=X^2.

model {
  for (j in 1:J){                          # J=8, the number of schools
    y[j] ~ dnorm (theta[j], tau.y[j])      # data model:  the likelihood
    tau.y[j] <- pow(sigma.y[j], -2)        # tau = 1/sigma^2
  }
  for (j in 1:J){
    theta[j] ~ dnorm (mu.theta, tau.theta) # hierarchical model for theta
  }
  tau.theta <- pow(sigma.theta, -2)        # tau = 1/sigma^2
  mu.theta ~ dnorm (0.0, 1.0E-6)           # noninformative prior on mu
  sigma.theta ~ dunif (0, 1000)            # noninformative prior on sigma
}
Now we need to specify the data and initial conditions. In Matlab, we type

dataStruct = struct('J', 8, ...
                    'y', [28, 8, -3, 7, -1, 1, 18, 12], ...
                   'sigma_y', [15, 10, 16, 11, 9, 11, 10, 18]);

Nchains = 3;

% we initialize the params to the observed data values, but with decreasing confidence,
% as suggested on p593 of Gelman
for i=1:Nchains
  S.theta = dataStruct.y;
  S.mu_theta = 0;
  S.sigma_theta = 10^i; % each chain becomes more over-dispersed
  initStructs(i) = S;
end
Now we call the function.

[samples, stats] = matbugs(dataStruct, ...
		fullfile(pwd, 'schools_model.txt'), ...
		'init', initStructs, ...
		'view', 1, 'nburnin', 1000, 'nsamples', 500, ...
		'thin', 10, ...
		'monitorParams', {'theta', 'mu_theta', 'sigma_theta'}, ...
		'Bugdir', 'C:/Program Files/WinBUGS14');
BUGS produces the following text output:
display(log)
check(C:/kmurphy/svnCheckout/root/code/learning/sampling/BUGS/matbugs/schools_model.txt)
model is syntactically correct
data(C:/kmurphy/svnCheckout/root/code/learning/sampling/BUGS/matbugs/tmp/data.txt)
data loaded
compile(3)
model compiled
inits(1,C:/kmurphy/svnCheckout/root/code/learning/sampling/BUGS/matbugs/tmp/init_1.txt)
chain initialized but other chain(s) contain uninitialized variables
inits(2,C:/kmurphy/svnCheckout/root/code/learning/sampling/BUGS/matbugs/tmp/init_2.txt)
chain initialized but other chain(s) contain uninitialized variables
inits(3,C:/kmurphy/svnCheckout/root/code/learning/sampling/BUGS/matbugs/tmp/init_3.txt)
model is initialized
refresh(100)
gen.inits()
command #Bugs:gen.inits cannot be executed (is greyed out)
update(1000)
set(theta)
set(mu.theta)
set(sigma.theta)
thin.updater(10)
update(500)
coda(*,C:/kmurphy/svnCheckout/root/code/learning/sampling/BUGS/matbugs/tmp/coda)
stats(*)

Node statistics
	 node	 mean	 sd	 MC error	2.5%	median	97.5%	start	sample
	mu.theta	7.639	5.081	0.1715	-2.524	7.624	18.03	1001	1500
	sigma.theta	6.815	5.91	0.2234	0.2156	5.526	22.16	1001	1500
	theta[1]	11.6	8.673	0.2949	-1.731	10.16	32.61	1001	1500
	theta[2]	7.932	6.199	0.1803	-4.347	7.867	20.51	1001	1500
	theta[3]	5.911	8.001	0.2314	-12.09	6.576	20.2	1001	1500
	theta[4]	7.467	6.469	0.1899	-5.926	7.29	20.97	1001	1500
	theta[5]	5.102	6.35	0.2235	-8.715	5.421	16.25	1001	1500
	theta[6]	5.957	6.802	0.2118	-8.824	6.243	18.46	1001	1500
	theta[7]	10.59	6.615	0.2096	-0.8319	10.0	25.48	1001	1500
	theta[8]	8.143	7.974	0.2524	-7.525	7.934	25.78	1001	1500
history(*,C:/kmurphy/svnCheckout/root/code/learning/sampling/BUGS/matbugs/tmp/history.txt)
BUGS also prints graphical traceplots of each of the parameters. It then waits for you to close/ quit it before returning to Matlab. (Set 'view', 0 if you want it to automatically close the window and return to Matlab.)

'samples' are the actual Gibbs samples that were generated.

samples = 
       mu_theta: [3x500 double]
    sigma_theta: [3x500 double]
          theta: [3x500x8 double]
The indexing convention is as follows:
scalarVar(chain, sample)
vectorVar(chain, sample, dim1)
matrixVar(chain, sample, dim1, dim2)
'stats' gives means, standard deviation and the EPSR statistics.

>> stats

    Rhat: [1x1 struct]
    mean: [1x1 struct]
     std: [1x1 struct]

>> stats.mean

       mu_theta: 7.6393
    sigma_theta: 6.8145
          theta: [11.5975 7.9320 5.9106 7.4669 5.1025 5.9571 10.5945 8.1431]

> stats.Rhat

       mu_theta: 1.0006
    sigma_theta: 0.9999
          theta: [8x1 double]
You can use the final samples statistics as you please. One trivial thing to do would be to just make trace plots of the raw samples. We use a different color for each chain.
N = 8+2; % monitor 10 variables
colors = 'rgb';
for j=1:8
  subplot(3,3,j); hold on
  for c=1:Nchains
    plot(samples.theta(c,:,j), colors(c));
  end
  title(sprintf('theta %d', j));
end
subplot(3,3,9); hold on
for c=1:Nchains
  plot(samples.mu_theta(c,:), colors(c));
end
title(sprintf('mu.theta'))

traceplot

These plots indicate that the MCMC chains have mixed.
A more interesting thing to do is to make empirical summaries of the posterior marginals. Using the 'ksdensity' (kernel smoothing) function in the statistics toolbox we have

figure(1); clf
for j=1:8
  subplot(3,3,j); hold on
  %dat = samples.theta(:,:,j);
  for c=1:Nchains
    [p, x] = ksdensity(samples.theta(c,:,j));
    plot(x, p, colors(c));
  end
  title(sprintf('theta %d', j));
end
subplot(3,3,9); hold on
for c=1:Nchains
  [p, x] = ksdensity(samples.mu_theta(c,:));
  plot(x, p, colors(c));
end
title(sprintf('mu.theta'))

We can also plot the 80% posterior credible intervals using the 'quantile' function from the statistics toolbox.

% Posterior summaries - intervals
figure(1); clf
hold on
for j=1:8
  for c=1:Nchains
    q = quantile(samples.theta(c,:,j), [0.1 0.9]);
    h = line([q(1) q(2)], [j j]+c*0.1); set(h, 'color', colors(c));
    q = quantile(samples.theta(c,:,j), [0.5]);
    h=plot(q,j+c*0.1,'*'); set(h, 'color', colors(c));
  end
  legendstr{j} = sprintf('theta %d', j);
end
j = 9;
for c=1:Nchains
  q = quantile(samples.mu_theta(c,:), [0.1 0.9]);
  h = line([q(1) q(2)], [j j]+c*0.1); set(h, 'color', colors(c));
  q = quantile(samples.mu_theta(c,:), [0.5]);
  h=plot(q,j+c*0.1,'*'); set(h, 'color', colors(c));
end
legendstr{j} = sprintf('mu.theta');
set(gca,'yticklabel',[])
xlim = get(gca, 'xlim');
for i=1:j
  text(xlim(1), i, legendstr{i});
end
title('80pc posterior intervals (*=median)')