PostScript

In many multivariate domains, we are interested in analyzing the dependency
structure of the underlying distribution, e.g., whether two variables
are in direct interaction. We can represent dependency structures
using *Bayesian network* models.
To analyze a given data set,
Bayesian model selection attempts to find the most likely (MAP) model,
and uses its structure to answer these questions. However, when the
amount of available data is modest, there might be many models that
have non-negligible posterior. Thus, we want compute the Bayesian
posterior of a feature, i.e., the total posterior probability of all
models that contain it.
In this paper, we propose a new approach for this task.
We first show how to efficiently compute a sum over the exponential number
of networks that are consistent with a fixed order over network
variables. This allows us to compute, for a given order, both the
marginal probability of the data and the posterior of a feature.
We then use this result as the basis for an algorithm that approximates
the Bayesian posterior of a feature. Our approach uses a *
Markov Chain Monte Carlo* (MCMC) method, but over orders rather
than over network structures. The space of orders is smaller and
more regular than the space of structures, and has much a smoother posterior
``landscape''. We present empirical results on synthetic and real-life
datasets that compare our approach to full model averaging (when
possible), to MCMC over network structures, and to a non-Bayesian
bootstrap approach.

nir@cs.huji.ac.il