Cross-validation vs. DIC using stack loss data
Aki Vehtari
2003-01-15 Original
2004-08-16 Added reference to selection induced bias.
Introduction
This page presents the code and results for comparing cross-validation (CV) with the deviance information criterion (DIC) on the stack loss data. The stack loss data is used as an example in Classic BUGS and WinBUGS (Spiegelhalter et al., 1996, pages 27-29) and was used specifically to demonstrate the DIC by Spiegelhalter et al. (2002). The data is available in the Classic BUGS and WinBUGS distributions. The residual models, except for one, are also available in the BUGS distributions, but with slightly different priors. To make the comparison as fair as possible, Brad Carlin kindly provided the models and priors used by Spiegelhalter et al. (2002).
Both DIC and cross-validation estimate the expected predictive performance, that is, the expected utilities of the model (Vehtari, 2002; Vehtari and Lampinen, 2002; Vehtari and Lampinen, 2003). We presented some results comparing CV and DIC at the 2002 International Conference of the Royal Statistical Society (slides and abstract available in PDF). Cross-validation was carried out using Matlab to divide the data into cross-validation folds and to call Classic BUGS or WinBUGS for the MCMC sampling. We used the DIC values reported by Spiegelhalter et al. (2002).
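The cross-validation loop described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Matlab and BUGS code used for the results: the function names are hypothetical, and `sample_posterior` is a stand-in for the MCMC call, assuming a toy model with a normal likelihood, known standard deviation, and a flat prior on the mean. The point it shows is that each held-out observation is scored with the full predictive density, averaged over posterior draws obtained without that observation.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((y - mu) / sigma) ** 2

def sample_posterior(train, sigma, n_draws, rng):
    """Stand-in for the MCMC call (assumption): exact posterior draws of the
    mean for a normal model with known sigma and a flat prior."""
    n = len(train)
    mean = sum(train) / n
    return [rng.gauss(mean, sigma / math.sqrt(n)) for _ in range(n_draws)]

def cv_deviance(y, k=5, sigma=1.0, n_draws=2000, seed=0):
    """Return the k-fold CV predictive deviance and pointwise log densities."""
    rng = random.Random(seed)
    pointwise = [0.0] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [v for i, v in enumerate(y) if i not in held_out]
        draws = sample_posterior(train, sigma, n_draws, rng)
        for i in fold:
            # Full predictive density: average the likelihood over the draws.
            pointwise[i] = log_mean_exp(
                [normal_logpdf(y[i], mu, sigma) for mu in draws]
            )
    return -2.0 * sum(pointwise), pointwise

if __name__ == "__main__":
    toy_y = [0.1, -0.4, 1.2, 0.7, -1.1, 0.3, 0.9, -0.2, 0.5, -0.8]  # made-up values
    deviance, _ = cv_deviance(toy_y)
    print("CV predictive deviance:", deviance)
```

The pointwise log densities are returned alongside the total because they are what a paired comparison between two models needs.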
Robust regression using stack loss data
The problem is to build a regression model for predicting the amount of stack loss (ammonia escaping in an industrial application). There are three predictor variables, and a linear regression model is used. The model selection problem is to choose the residual model. Five residual models were compared: 1) Normal, 2) Double-exponential (Laplace), 3) Logistic, 4) Student's t-distribution with 4 degrees of freedom (t_4), and 5) t_4 as a scale mixture model.
Code
Code and explanation of files
Results
Figure 1 shows the expected predictive deviance estimated with CV and DIC. The two produce similar results, but DIC gives consistently lower values. This is probably because DIC uses plug-in predictive distributions instead of full predictive distributions and thus ignores the uncertainty in the parameter values. The largest difference is for the scale mixture model, which supports this argument. Figure 2 shows the effective number of parameters. There is no need to compute this in the CV approach, but it can be computed if it is thought to provide additional insight into the models.
Figure 1: expected predictive deviance (CV and DIC). Figure 2: effective number of parameters.
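For reference, the DIC quantities can be computed from posterior draws as in the sketch below, which follows the definitions of Spiegelhalter et al. (2002): the effective number of parameters is p_D = Dbar - D(theta_bar), and DIC = Dbar + p_D. The toy model (a normal likelihood with known standard deviation and unknown mean) and the function names are assumptions made for illustration, not the stack loss models. Note that D(theta_bar) evaluates the deviance at a single plug-in point, the posterior mean.

```python
import math

def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((y - mu) / sigma) ** 2

def deviance(y, mu, sigma):
    """Deviance D(mu) = -2 * log likelihood of all observations."""
    return -2.0 * sum(normal_logpdf(v, mu, sigma) for v in y)

def dic(y, mu_draws, sigma):
    """Compute Dbar, the plug-in deviance, p_D and DIC from posterior draws."""
    d_bar = sum(deviance(y, mu, sigma) for mu in mu_draws) / len(mu_draws)
    mu_hat = sum(mu_draws) / len(mu_draws)   # plug-in point: posterior mean
    d_hat = deviance(y, mu_hat, sigma)       # deviance at the plug-in point
    p_d = d_bar - d_hat                      # effective number of parameters
    return {"D_bar": d_bar, "D_hat": d_hat, "p_D": p_d, "DIC": d_bar + p_d}

if __name__ == "__main__":
    toy_y = [0.1, -0.4, 1.2, 0.7, -1.1]            # made-up values
    toy_draws = [0.0, 0.2, -0.1, 0.3, 0.1, 0.05]   # made-up posterior draws
    print(dic(toy_y, toy_draws, sigma=1.0))
```

Equivalently, DIC = D(theta_bar) + 2 p_D, so the plug-in deviance is the starting point and the complexity penalty is added to it.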
For DIC, estimating the uncertainty of the estimate is still under investigation, and usually only point estimates are used, together with some heuristic for judging which differences are significant. For cross-validation, it is easy to estimate the associated uncertainty. Figure 3 shows the pairwise comparison of the t_4 scale mixture model with every other model. Each comparison is presented by plotting the distribution of the estimate of the difference between the expected utilities of the two models, which makes it easy to see the differences and the associated uncertainties. Note that the amount of uncertainty in the comparison depends heavily on which models are compared. From these results it is also possible to compute the probability that one model is better than another. For example, the probabilities that the t_4 scale mixture model is better than models 1, 2, 3 and 4 are 0.85, 0.48, 0.96 and 0.98, respectively. Models 2 and 5 have better predictive performance than models 1, 3 and 4. Models 2 and 5 are indistinguishable on grounds of predictive performance.
Figure 3: pairwise comparison of the t_4 scale mixture model with each of the other models.
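A pairwise comparison of this kind can be sketched as follows. The page does not state how the distribution of the difference was computed, so the Bayesian bootstrap used here is an assumption, chosen as one common way to obtain it from pointwise cross-validation utilities; the function names are hypothetical. The differences are taken observation by observation (a paired comparison), which is why the uncertainty depends on which two models are compared.

```python
import random

def dirichlet_weights(n, rng):
    """Draw weights from Dirichlet(1, ..., 1) by normalizing exponentials."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [v / total for v in g]

def compare_models(util_a, util_b, n_boot=4000, seed=0):
    """Bayesian bootstrap of the mean paired difference in pointwise utilities.

    util_a and util_b hold the pointwise utilities (for example CV log
    predictive densities) of two models for the same observations.
    Returns the estimated probability that model A has the higher expected
    utility, and the bootstrap samples of the expected difference.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(util_a, util_b)]
    samples = []
    for _ in range(n_boot):
        w = dirichlet_weights(len(diffs), rng)
        samples.append(sum(wi * di for wi, di in zip(w, diffs)))
    prob_a_better = sum(s > 0 for s in samples) / n_boot
    return prob_a_better, samples

if __name__ == "__main__":
    # Made-up pointwise utilities for two hypothetical models.
    util_a = [-1.2, -0.9, -1.5, -1.1, -2.0, -1.0]
    util_b = [-1.3, -1.0, -1.2, -1.4, -2.6, -0.9]
    prob, _ = compare_models(util_a, util_b)
    print("P(model A better than model B):", prob)
```

A histogram of the returned samples gives a picture of the same kind as the comparison plots described above.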
Discussion
Note that if a large number of models is compared, selection-induced bias causes an overfitted model to be selected (Vehtari and Lampinen, 2004).
References
- Spiegelhalter, D. J., Thomas, A., Best, N. G., and Gilks, W. R. (1996). BUGS Examples Volume 1, Version 0.5, (version ii). Cambridge: Medical Research Council Biostatistics Unit.
- Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B (Statistical Methodology), 64(3):583-639.
- Vehtari, A. (2002). Discussion to 'Bayesian measures of model complexity and fit' by Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 64(4):620.
- Vehtari, A. and Lampinen, J. (2002). Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation, 14(10):2439-2468.
- Vehtari, A. and Lampinen, J. (2003). Expected utility estimation via cross-validation. In J. M. Bernardo, et al., editors, Bayesian Statistics 7, in press. Oxford University Press.
- Vehtari, A. and Lampinen, J. (2004). Model selection via predictive explanatory power. Report B38, Laboratory of Computational Engineering, Helsinki University of Technology.
