We introduce a method for learning Bayesian networks that handles the discretization of continuous variables as an integral part of the learning process. We formally derive a criterion based on the Minimum Description Length (MDL) principle for choosing the threshold values for the discretization. This new metric embodies a tradeoff between the complexity of the learned discretization, the complexity of the Bayesian network, and the fitness of the network as a model of the training data. The metric has the attractive property of decomposability: the discretization of each variable depends only on the interactions between that variable and its local neighborhood in the network. We examine other properties of this metric that are relevant to computing a discretization policy and propose an iterative algorithm for learning one. Finally, we illustrate the behavior of the discretization in applications to both supervised and unsupervised learning.
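As a rough illustration of the kind of tradeoff such an MDL criterion encodes, the sketch below scores a candidate set of thresholds for one continuous variable. This is not the paper's derived metric; it is a hypothetical simplification assuming the discretization cost grows with the number of cut points, the parameter cost with the number of resulting intervals, and the fitness term is measured as empirical mutual information with a single neighboring (discrete) variable in the network. Lower scores are better.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mdl_score(values, thresholds, neighbor):
    """Hypothetical MDL-style score for one variable's discretization.

    Trades off: cost of encoding the thresholds, cost of encoding the
    multinomial parameters of the discretized node, and fit to a
    neighboring variable (the 'local neighborhood' dependence).
    """
    n = len(values)
    # Bin index = number of thresholds the value exceeds.
    bins = [sum(v > t for t in thresholds) for v in values]
    k = len(thresholds) + 1                       # number of intervals
    disc_cost = len(thresholds) * math.log2(n)    # encode the cut points
    param_cost = 0.5 * (k - 1) * math.log2(n)     # encode the parameters
    fit = n * mutual_information(bins, neighbor)  # information retained
    return disc_cost + param_cost - fit
```

With clearly bimodal data whose modes align with a binary neighbor, a single threshold between the modes yields a lower (better) score than no threshold at all, since the information gained outweighs the added description length. Note the score depends only on the variable and its neighbor, mirroring the decomposability property described above.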