Methods for learning Bayesian networks can discover dependency
structure between observed variables. Although these methods are useful in many
applications, they run into computational and statistical problems in domains
that involve a large number of variables. In this paper, we consider a solution
that is applicable when many variables have similar behavior. We introduce a new
class of models, *module networks*,
that explicitly partition the variables into modules that share the same parents
in the network and the same conditional probability distribution. We define the
semantics of module networks, and describe an algorithm that learns the
modules' composition and their dependency structure from data. Evaluation on
real data in the domains of gene expression and the stock market shows that
module networks generalize better than Bayesian networks, and that the learned
module network structure reveals regularities that are obscured in learned
Bayesian networks.
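The core idea above — grouping variables into modules that share the same parents and the same conditional probability distribution — can be sketched in a few lines. This is a hypothetical toy illustration, not the paper's implementation; the names `Module` and `ModuleNetwork` and the binary-variable CPD encoding are assumptions made here for clarity.

```python
from dataclasses import dataclass

@dataclass
class Module:
    # Variables assigned to this module; all of them share the same
    # parent set and the same CPD (the defining property of a module).
    variables: list
    parents: list
    # Toy discrete CPD for binary variables: maps a tuple of parent
    # values to P(variable = 1 | parents).
    cpd: dict

@dataclass
class ModuleNetwork:
    modules: list  # the modules form a partition of the variables

    def joint_prob(self, assignment):
        """Joint probability of a full assignment, factored as in a
        Bayesian network, but with one CPD per module rather than
        one CPD per variable."""
        p = 1.0
        for m in self.modules:
            key = tuple(assignment[par] for par in m.parents)
            p1 = m.cpd[key]
            for v in m.variables:
                p *= p1 if assignment[v] == 1 else (1.0 - p1)
        return p
```

For example, two variables regulated by a common parent need only a single CPD between them:

```python
root = Module(variables=["r"], parents=[], cpd={(): 0.5})
mod = Module(variables=["g1", "g2"], parents=["r"], cpd={(0,): 0.2, (1,): 0.9})
net = ModuleNetwork(modules=[root, mod])
net.joint_prob({"r": 1, "g1": 1, "g2": 0})  # 0.5 * 0.9 * (1 - 0.9)
```

Sharing one CPD across a module is what reduces the number of parameters to estimate, which is the source of the statistical gains the abstract describes.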
