Data perturbation for escaping local maxima in learning

G. Elidan, M. Ninio, N. Friedman, and D. Schuurmans

Eighteenth National Conf. on Artificial Intelligence (AAAI), 2002.

Web Supplement


Almost all machine learning algorithms---be they for regression, classification or density estimation---seek hypotheses that optimize a score on training data. In most interesting cases, however, full global optimization is not feasible and local search techniques are used to discover reasonable solutions. Unfortunately, the quality of the local maxima reached depends on initialization and is often weaker than the global maximum. In this paper, we present a simple approach for combining global search with local optimization to discover improved hypotheses in general machine learning problems. The main idea is to escape local maxima by perturbing the training data to create plausible new ascent directions, rather than perturbing hypotheses directly. Specifically, we consider example-reweighting strategies that are reminiscent of boosting and other ensemble learning methods, but applied in a different way with a different goal: to produce a single hypothesis that achieves a good score on training and test data. To evaluate the performance of our algorithms we consider a number of problems in learning Bayesian networks from data, including discrete training problems (structure search), continuous training problems (parametric EM, non-linear logistic regression), and mixed training problems ; (Structural EM)---on both synthetic and real-world data. In each case, we obtain state of the art performance on both training and test data.