Desiderata for an open source graphical models library
------------------------------------------------------

[Posted to OpenBayes by Kevin Murphy on 12 July 2001.]

- the basic architecture should be a loose collection of "tools", closely modeled on BNT, which has proved to be very popular
  http://HTTP.CS.Berkeley.EDU/~murphyk/Bayes/BNT.html
- the GUI should be based on GraphViz, an excellent open-source graph layout and manipulation package from AT&T
  http://www.research.att.com/sw/tools/graphviz/
- the file format for models should be based on extensions to XML-BIF
  http://www.cs.cmu.edu/~fgcozman/Research/InterchangeFormat/
- models can be created using the GUI, or by editing the XML specification in an editor, or by directly calling the API
- the library should be callable from some high-level scripting language, to enable easy interactive use
- the data format should also be based on XML. It should distinguish between data missing at random and data missing systematically (as Heckerman recommended), and should accommodate annotation of the raw data.
- the library should support directed, undirected and chain graphs, and also "non-standard" representations like dependency networks
- the library should support 1D time series, 2D image processing, and 3D spatio-temporal modelling, as well as "traditional" models which lack such repetitive structure
- the library should support probabilistic networks and influence diagrams
- the library may support probabilistic relational models, if only as a form of syntactic sugar
- the library should support many kinds of Conditional Probability Distributions (CPDs): tabular, noisy-OR, conditional linear Gaussian, decision/regression trees, generalized linear models, kernel regression, etc.
  (Essentially, one should be able to use almost any classification/regression method used by statisticians to define a CPD.)
- the library should support several kinds of clique potentials for undirected graphs, e.g., tabular, Gibbs distribution over binary features, mixture of Gaussians
- the library should support Bayesian modeling, i.e., parameters can be explicitly represented as random variables/nodes, although they do not have to be. An extensible range of priors should be provided.
- the library should support plates, as a way of specifying parameter tying
- the library should support many kinds of inference algorithms, which make different tradeoffs between speed, flexibility, accuracy, etc. Examples should include junction tree, Gibbs sampling, mean field, loopy belief propagation, as well as inference algorithms for specialized architectures (e.g., quickscore for BN2O's). Different implementations of the same algorithm should be supported, e.g., an easy-to-read version in a high-level language, plus an optimized version in C. The C version may or may not be parallelizable.
- the library should support many kinds of parameter and structure learning. Some may return point estimates (e.g., ML/MAP estimation using EM), some may return posterior distributions (e.g., using MCMC or bootstrap). Different methods for handling missing data should be supported (e.g., EM, bound-and-collapse). Different scoring metrics for structure learning should be supported (e.g., Bayesian, BIC/MDL, conditional independence tests). Methods for discovering hidden variables should be incorporated.
- the library should support all the examples in the textbook currently being written by Jordan and Bishop.
- the library should be open-source and distributed under the GNU GPL (v2) license, or maybe the BSD license.
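
To make the "loose collection of tools" and "many kinds of CPDs" items above a little more concrete, here is a minimal sketch in Python. It is not BNT's API or the API of any existing library. The names TabularCPD, NoisyOrCPD, joint_prob and query, the dict-based network layout, and the flu/cold/fever numbers are all assumptions made up for illustration. All variables are binary, and inference is plain enumeration over the joint, standing in for the real algorithms (junction tree, Gibbs sampling, etc.) listed above.

```python
from itertools import product


class TabularCPD:
    """Hypothetical tabular CPD for a binary child."""

    def __init__(self, table):
        # table maps a tuple of parent values to P(child = 1 | parents)
        self.table = table

    def prob(self, value, parent_values):
        p_on = self.table[tuple(parent_values)]
        return p_on if value == 1 else 1.0 - p_on


class NoisyOrCPD:
    """Hypothetical noisy-OR CPD for a binary child with binary parents.

    P(child = 0 | parents) = (1 - leak) * prod_i (1 - q_i) ** x_i,
    where q_i is the chance that parent i alone switches the child on.
    """

    def __init__(self, activation_probs, leak=0.0):
        self.activation_probs = activation_probs
        self.leak = leak

    def prob(self, value, parent_values):
        p_off = 1.0 - self.leak
        for q, x in zip(self.activation_probs, parent_values):
            if x == 1:
                p_off *= 1.0 - q
        return 1.0 - p_off if value == 1 else p_off


def joint_prob(network, assignment):
    """Multiply the local CPD terms for one full assignment."""
    p = 1.0
    for name, (parents, cpd) in network.items():
        p *= cpd.prob(assignment[name], [assignment[q] for q in parents])
    return p


def query(network, target, evidence):
    """P(target = 1 | evidence) by brute-force enumeration of hidden nodes."""
    hidden = [n for n in network if n != target and n not in evidence]
    totals = {0: 0.0, 1: 0.0}
    for t in (0, 1):
        for values in product((0, 1), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment[target] = t
            assignment.update(zip(hidden, values))
            totals[t] += joint_prob(network, assignment)
    return totals[1] / (totals[0] + totals[1])


# Toy model (made-up numbers): two causes, one noisy-OR effect.
# Each entry is name -> (list of parent names, CPD object).
network = {
    "flu": ([], TabularCPD({(): 0.1})),
    "cold": ([], TabularCPD({(): 0.2})),
    "fever": (["flu", "cold"], NoisyOrCPD([0.8, 0.3], leak=0.05)),
}

posterior = query(network, "flu", {"fever": 1})
print(round(posterior, 3))
```

The point of the sketch is only that query never looks inside a CPD: any object with a prob(value, parent_values) method can be plugged in, which is one way the "almost any classification/regression method" item could be met. A real library would need non-binary and continuous nodes and far better inference than enumeration, which is exponential in the number of hidden nodes.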