Ross building, The Edmond J. Safra Campus. Picture: Dror Bar-Natan, Fall 2001
 Home → Channel  Channel The Learning Club   Time: Thursday 10:00 - 11:00 Place: Rothberg B-221 Join or leave our distribution mailing list. Coordinator: Nir Rosenfeld Past Years Past talks Fall 2013/2014 Semester   Efficient Training of Structured SVMs via Soft ConstraintsThu, 25 Dec 2014, 10:15 Speaker: Ofer Meshi , TTI ChicagoStructured output prediction is a powerful framework for jointly predicting interdependent output labels. Learning the parameters of structured predictors is a central task in machine learning applications. However, training the model from data often becomes computationally expensive. Several methods have been proposed to exploit the model structure, or decomposition, in order to obtain efficient training algorithms. In particular, methods based on linear programming relaxation, or dual decomposition, decompose the prediction task into multiple simpler prediction tasks and enforce agreement between overlapping predictions. In this work we observe that relaxing these agreement constraints and replacing them with soft constraints yields a much easier optimization problem. Based on this insight we propose an alternative training objective, analyze its theoretical properties, and derive an algorithm for its optimization. Our method, based on the Frank-Wolfe algorithm, achieves significant speedups over existing state-of-the-art methods without hurting prediction accuracy.   Acquiring Semantic Knowledge using PatternsThu, 04 Dec 2014, 10:15 Speaker: Roy Schwartz , HUJI"Building word representations is one of the core tasks in Natural Language Processing. In recent years there have been several successful attempts to tackle this problem, which are based on deep neural networks (aka word embeddings). However, the resulting embeddings are limited in their ability to express general semantic relations, as well as the inability to interpret their elements. In this talk, I will present a model for constructing interpretable word embeddings. Unlike the great majority of previous approaches, our model is based on patterns learned from plain text (e.g., X and Y''). I will show that the representation learned using our model (a) outperforms state-of-the-art embeddings (word2vec) on a word similarity task, (b) is able to capture semantic relations that word2vec embeddings are unable to capture and (c) can be combined with word2vec embeddings to create a representation that is better than both representation separately. Based on a joint work with Roi Reichart and Ari Rappoport."   Probabilistic inference by randomly perturbing max-solversThu, 27 Nov 2014, 10:15 Speaker: Tamir Hazan , Haifa UniversityPredictions in modern inference problems can be increasingly understood in terms of discrete structures such as arrangements of objects in computer vision, parses in natural language processing or molecular structures in computational biology. In a fully probabilistic treatment, all possible alternative assignments are considered thus sampling from traditional structured probabilistic models may be computationally expensive for many machine learning applications. These computational difficulties are circumvented with a variety of optimization techniques that provide max-solvers to predict the most likely structure. In this talk I will present a new approach to relax the exponential complexity of probabilistic reasoning in structured models while relying on efficient predictions under random perturbations. This approach leads to a new statistical inference framework that is based on probability models that measure the stability of the prediction to random changes of the structures scores.   Machine Learning, art and designThu, 20 Nov 2014, 10:15 Speaker: Michael Fink ,   Sequential Monte Carlo methods and their use in graphical modelsThu, 13 Nov 2014, 10:15 Speaker: Thomas Schon , Uppsala UniversityIn this talk I will introduce our new methods (to be published at NIPS in December) for inference in general probabilistic graphical models (PGMs). The key is a sequential decomposition of the PGM which provides a sequence of auxiliary distributions defined on a monotonically increasing sequence of probability spaces. By targeting these auxiliary distributions using Sequential Monte Carlo (SMC) methods we are able to approximate the full joint distribution defined by the PGM. I will also introduce the underlying Sequential Monte Carlo methods (e.g. particle filters/smoothers), which are computational methods primarily used to deal with the state inference problem in nonlinear state space models. We are (since a few years back) seeing these methods finding new applications in more and more general model classes, for example probabilistic graphical models. Furthermore, the systematic combination of SMC and MCMC, referred to as particle MCMC provides another powerful family of algorithms. The first algorithms of this type were published in 2010 and since then we have (for very good reasons) witnessed a rapidly growing interest in these algorithms.   * Interpolation Between Online Learning SettingsThu, 23 Oct 2014, 10:15 Speaker: Yevgeni Seldin , University of Copenhagen, DenmarkOnline learning problems can be characterized by several key features, including the amount of feedback (full information, bandit feedback, partial monitoring, etc.); adversariality of the environment ("stochastic" regime, adversarial regime, adaptive adversaries, etc); and structural richness (stateless problems, problems with side information, MDPs, etc.). There is a significant amount of research on isolated settings in this huge space, for example, "prediction with expert advice", "stochastic multiarmed bandits", "adversarial multiarmed bandits", and so on. In most situations algorithms developed for one problem, say, stochastic multiarmed bandits, are either unsuitable or suboptimal for any other problem. We present a number of algorithms that achieve the optimal performance (up to logarithmic factors) in a number of different regimes as well as in intermediate regimes. This includes one algorithm that achieves the optimal performance in both stochastic and adversarial multiarmed bandits. The same algorithm achieves the optimal performance in two intermediate regimes, moderately contaminated stochastic regime and adversarial regime with a gap. We also present two other algorithms that achieve the optimal performance on a continuous range of problems extending from prediction with expert advice to adversarial multiarmed bandits and a bit further into partial monitoring games. The results naturally lead to a number of challenging open questions: (1) how far can we push the "general algorithms" line - is there a limit on the number of different settings that can be solved with one algorithm? (2) is there a price to be paid for "generality"? (the optimal performance is achieved up to logarithmic factors, but are these factors inevitable and if yes, how to derive lower bounds establishing that no single algorithm can achieve the optimal performance in several different regimes without paying a price?) Based on joint work with Aleksandrs Slivkins, Peter Bartlett, Koby Crammer, and Yasin Abbasi-Yadkori.   Exact Matrix Completion.Thu, 19 Jun 2014, 10:00 Speaker: Naomi Vinokurov , HUJI   Deep Neural Networks for Acoustic Modeling in Speech RecognitionThu, 19 Jun 2014, 10:40 Speaker: Yoel Sher , HUJIMost current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a deep feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. I will present the shared views of four research groups (From University of Torronto, Google, Microsoft and IBM) that had recent success in using these neural networks for acoustic modeling in speech recognition.   Generalized Principal Component AnalysisThu, 12 Jun 2014, 10:00 Speaker: Moran Mizrahi , HUJII will present the paper "Generalized Principal Component Analysis" by Rene Vidal, Yi Ma and Shankar Sastry from 2003. The paper develops an extension of PCA to the case where data is drawn from a mixture of linear subspaces. I will describe an algebraic geometric algorithm proposed by the authors for solving this problem, for the case where all the subspaces are of the same dimension.   Tighter Linear Program Relaxations for High Order Graphical ModelsThu, 29 May 2014, 10:00 Speaker: Elad Mezuman , HUJIGraphical models with High Order Potentials (HOPs) have received considerable interest in recent years. While there are a variety of approaches to inference in these models, nearly all of them amount to solving a linear program (LP) relaxation with unary consistency constraints between the HOP and the individual variables. In many cases, the resulting relaxations are loose, and in these cases the results of inference can be poor. It is thus desirable to look for more accurate ways of performing inference. In this work, we study the LP relaxations that result from enforcing additional consistency constraints between the HOP and the rest of the model. We address theoretical questions about the strength of the resulting relaxations compared to the relaxations that arise in standard approaches, and we develop practical and efficient message passing algorithms for optimizing the LPs. Empirically, we show that the LPs with additional consistency constraints lead to more accurate inference on some challenging problems that include a combination of low order and high order terms. joint work with Daniel Tarlow, Amir Globerson and Yair Weiss   Learning a topic hierarchy using the Chinese restaurant processThu, 22 May 2014, 10:00 Speaker: Oren Bernstein , HUJIWe address the problem of learning topic hierarchies from data. The problem is very rich: we need to learn the character of each topic, their number and their structure relative to one another. Starting with a fixed number of clusters, we build up a model which solves this problem in a general way, using the non-parametric distribution known as the Chinese restaurant process.   Least Squares Revisited: Scalable Approaches for Multi-class PredictionThu, 22 May 2014, 10:45 Speaker: Alon Cohen , HUJII will present a paper by Agarwal et al. The paper discusses generalized linear models for binary prediction and their extension to multiclass prediction. We will show simple algorithms for learning such models, for when the specific model is known and when it is unknown. These algorithms are essentially iterative least-squares updates and are provided with formal convergence guarantees. On the empirical side, I will show a scalable stagewise variant of their approach, which achieves dramatic computational speedups on standard datasets (MNIST and CIFAR-10), while attaining state-of-the-art accuracies.   James Stein estimatorThu, 15 May 2014, 10:00 Speaker: Rotem Hayut , HUJIIn 1961, James and Stein (JS) introduced a new parameter estimation scheme. Their JS estimator has a surprising and counterintuitive property: when three or more unrelated parameters are measured, the estimation error can be reduced by using a combined estimator (even though they are unrelated!). Imagine three measurements, the first is a speed of light test, the second is the euro-league final-four results and the third will be kebab consumption before Independence Day. By combining the measurements of the three above we can reach a better MSE than estimating them separately. In the lecture, I will briefly talk about basic definitions in estimation theory, and compare maximum likelihood and James-Stein estimators in terms of MSE risk. Two intuitions for the estimator will be presented, as well as an example of predicting success in baseball batting average.   Tensor Decomposition Algorithms for Latent Variable Model EstimationThu, 08 May 2014, 10:00 Speaker: Uri Shalit , HUJI   Winning The Large Scale Visual Recognition Challenge with Dropout And Deep CNNThu, 01 May 2014, 10:00 Speaker: Lior Bar , HUJIIn 2010 a team from the University of Toronto won the LSVRC-2010 (Large Scale Visual Recognition Challenge) with more then 8% better labelling accuracy then the other competitors. They classified 1000 image classes using Deep Convolutional Neural Networks (CNN) and also applied the newly developed 'Dropout' training approach, which controls overfitting by hiding part of the features between the network's layers during the training. In this talk we will describe the CNN architecture used in the challenge (and implementation details) , as well the dropout approach. We will also discuss some recent analysis of dropout and its relation to regularization.   Discrete Chebyshev ClassifiersThu, 03 Apr 2014, 10:00 Speaker: Elad Eban , HUJIIn large scale learning problems it is often easy to collect simple statistics of the data, but hard or impractical to store all the original data. A key question in this setting is how to construct classifiers based on such partial information. In the talk I will present a framework for discriminative learning given a set of statistics. Specifically, we address the case where all variables are discrete and we have access to various marginals. We show that for certain sets of statistics the problem is tractable, and in the general case can be approximated using MAP LP relaxations. Empirical results show that the method is competitive with other approaches that use the same input.   On Huber’s robust regression and its connection to compressed sensingThu, 27 Mar 2014, 10:00 Speaker: Jakob Vovnoboy , HUJIinear regression is a common practice in the fields of signal processing and machine learning. In some cases the samples obtained for the regression are corrupted by gross outliers. Hence, classic methods may perform poorly or even break down. To address this issue, robust regression methods were developed throughout the years. Huber's robust regression is probably the most known of them and was introduced as early as 1964. In this talk we discuss two approaches to the problem of regression in the presents of outliers. The first one is looking at the outliers as random variables taken from an unknown distribution. The second is looking at them sparse deterministic unknowns. Thus, compressed sensing methods can be used to identify them . We show that both approaches are tightly connected to the Huber's robust regression method and present some conclusions from this connection. based on chapters from "Robust statistics" by Huber and Ronchetti and on "USPACOR: Universal sparsity-controlling outlier rejection" Giannakis et al.   Blind Source Separation by Sparse DecompositionThu, 20 Mar 2014, 10:00 Speaker: Michael Zibulevsky , TechnionThe blind source separation problem is to extract the underlying source signals from a set of their linear mixtures, where the mixing matrix is unknown. This situation is common, e.g. in acoustics, radio, and medical signal processing. We exploit the property of the sources to have a sparse representation in a corresponding (possibly overcomplete) signal dictionary. Such a dictionary may consist of wavelets, wavelet packets, etc., or be obtained by learning from a given family of signals. Starting from the maximum a posteriori framework, which is applicable to the case of more sources than mixtures, we derive a few other categories of objective functions, which provide faster and more robust computations, when there are an equal number of sources and mixtures. Our experiments with artificial signals and with musical sounds demonstrate significantly better separation than other known techniques.   On the intersection of mathematics and machine learningThu, 13 Mar 2014, 10:00 Speaker: Shai Dekel , GE, TAUIn the first part of the talk we will provide an overview of the industrial internet (also known as the “Internet Of Things (IOT)”) and present some of the challenges in this domain such as big data analytics, predictive analytics, large scale asset management and machine vision problems in advanced manufacturing. In the second part of the talk we will show how mathematics (Approximation Theory and Harmonic Analysis) can be combined with Machine Learning to solve some of these challenges. Tools such as the tree-based Random Forest and the Gradient Boosting Machine are popular and powerful machine learning algorithms that are also employed as part of 'Deep Learning' systems. Constructing the right form of wavelet decomposition of these tools allows establishing ordering of their forest decision nodes: from significant' features to less significant' to insignificant' noise. Consequently, simple wavelet techniques can be used to overcome the presence of noise and misclassifications in the training sets and compress large scale neural networks.   Information Trade-offs in Machine LearningThu, 06 Mar 2014, 10:00 Speaker: Ohad Shamir , Weizmann Institute of ScienceMany machine learning approaches are characterized by information constraints on how they interact with the training data. These include memory and sequential access constraints (e.g. fast first-order methods to solve stochastic optimization problems); communication constraints (e.g. distributed learning); partial access to the underlying data (e.g. missing features and multi-armed bandits) and more. Currently, we have little understanding how such information constraints fundamentally affect our performance, independent of the learning problem semantics. For example, are there learning problems where *any* algorithm which has small memory footprint (or can use any bounded number of bits from each example, or has certain communication constraints) will perform worse than what is possible without such constraints? In this talk, I'll describe how a single set of results implies positive answers to the above, for a variety of settings.   Minimize Your Regret and Have ItThu, 27 Feb 2014, 10:00 Speaker: Eyal Gofer , HUJIOnline linear optimization (OLO) models a wide range of sequential decision making problems in a possibly adversarial environment. For this setting, Regularized Follow the Leader algorithms, and in particular Hedge and Online Gradient Descent (OGD), are known to uniformly upper bound regret. This talk will discuss a surprising flip side of these algorithms: They uniformly \emph{guarantee} regret. As will be shown, this property makes them useful, of all things, in proving \emph{lower} bounds on regret. In fact, the message of this talk is that when attempting to prove high regret in the OLO setting, regret minimization algorithms could feature in the solution. I will begin with results by G. and Mansour (2012). That work lower bounded the individual sequence anytime regret of a large family of algorithms in terms of the quadratic variation of the loss sequence, Q_T, and the learning rate. It further showed that any learning rate that guarantees a regret upper bound of O(\sqrt{Q_T}) necessarily implies an \Omega(\sqrt{Q_T}) anytime regret on any sequence with quadratic variation Q_T. These bounds apply to Hedge and OGD as special cases. One application of these bounds is an asymptotically optimal lower bound on the arbitrage-free price of call options. Time permitting, I will discuss new results for OLO with switching costs. For this model, a novel construction based on regret minimization algorithms implies that for normed switching costs, under general conditions, regret cannot be upper bounded by any learner given only an upper bound on $Q_T$, necessitating additional constraints.   Volumetric Spanners: an Efficient Exploration Basis for Learning Thu, 20 Feb 2014, 10:00 Speaker: Zohar Karnin , YahooNumerous machine learning problems require an {\it exploration basis} - a mechanism to explore the action space. We define a novel geometric notion of exploration basis with low variance called volumetric spanners, and give efficient algorithms to construct such bases. We show how efficient volumetric spanners give rise to the first efficient and optimal-regret algorithm for bandit linear optimization over general convex sets. Previously such results were known only for specific convex sets, or under special conditions such as the existence of an efficient self-concordant barrier for the underlying set.   The sample complexity of agnostic learning when the labeling is deterministicWed, 19 Feb 2014, 15:00 Speaker: Shai Ben-David , WaterlooThe case of learning a deterministic classifier with a class to which it does not belong has so far received relatively little research attention. Its relevance grows with the emergence of tools that allow handling data with a huge number of features, so big that it is reasonable to assume that, over the full set of features, the labeling is (almost) fully determined. We investigate this setting and show that it displays a very different behavior than that of the fundamental results of learning for the common PAC setups. First, the sample complexity of learning a binary hypothesis class is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimensiond that are learnable from O(d/ε)-many samples and classes that require samples of size Ω(d/ε^2). Furthermore, in this setup is that there are classes for which any proper learner is has suboptimal sample complexity. While the class can be learned with sample complexity O(d/ε), any proper (and therefore, any ERM) algorithm requires Ω(d/ε^2) samples. We also shaw that in such cases, access to unlabeled data suffices to close that gap. A semi-supervised version of ERM has optimal label complexity once it has access to sufficiently many unlabeled examples. We provide combinatorial characterizations of both phenomena. This is joint work with Ruth Urner.   Pay attention to the leading constantsThu, 13 Feb 2014, 10:00 Speaker: Yoram Singer , Google Abstract: In the talk I review one of the three largest machine learning platforms at Google called Sibyl. Sibyl can handle over 100B examples in 100B dimensions so long as each example is very sparse. The recent version of Sibyl fuses the views of Nesterov's accelerated gradient method and parallel coordinate descent. The result is an algorithm that retains the momentum and convergence properties of the accelerated gradient method while taking into account the curvature of the objective function. The algorithm is fast to convergence and is easy to parallelize in large scale. I conclude with a few examples of problems at Google that the system handles.   What is learnable and how to learnThu, 16 Jan 2014, 10:00 Speaker: Amit Daniely , HUJIThe fundamental theorem of statistical learning states that if a problem is at all learnable, then the empirical risk minimization learning rule is an optimal learner, and the amount of data required for learning is characterized by the VC dimension of the class. This theorem fully answers the questions of "what is learnable" and "how to learn". We start by showing that the theorem breaks in multiclass classification problems, and rather surprisingly, an optimal multiclass learner must be improper, namely, it must have the ability to output hypotheses which does not come from the hypothesis class. As a result, the fundamental questions of "what is learnable" and "how to learn" are again widely open. We describe a complete answer to the question of "how to learn" and propose a new notion of dimension, that we conjecture to characterize the answer to the "what is learnable" question. The talk is based on: "Multiclass learnability and the ERM principle" (D., Sabato, Shalev-Shwartz, Ben-David, COLT'11) "Multiclass Learnng Approaches: A theoretical comparison with implication" (D., Sabato, Shalev-Shwartz, NIPS'12) "Multiclass classification: An optimal algorithm, linear classifiers, and the statistical role(!) of improper learning" (D., Shalev-Shwartz, Soon on arXiv)   Unsupervised Ranking and Ensemble Learning Thu, 09 Jan 2014, 10:00 Speaker: Boaz Nadler , Weizmann Institute of ScienceIn various decision making problems, one is given the advice or predictions of several classifiers of unknown reliability, over multiple questions or queries. This scenario is different from the standard supervised setting where classifier accuracy can be assessed using available labeled training or validation data, and raises several questions: given only the predictions of several classifiers of unknown accuracies, over a large set of unlabeled test data, is it possible to a) reliably rank them, and b) construct a meta-classifier more accurate than any individual classifier in the ensemble? In this talk we'll show that under standard independence assumptions between classifier errors, this high dimensional data hides a simple low dimensional structure. In particular, we'll present a spectral approach to address the above questions, and derive a novel unsupervised spectral meta-learner (SML). On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point for iterative estimation of the maximum likelihood estimator than classical majority voting. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth. Joint work with Fabio Parisi, Francesco Strino and Yuval Kluger (Yale).   Robust dynamic decision makingThu, 02 Jan 2014, 10:00 Speaker: Shie Mannor , TechnionPlanning when the model parameters are not fully known is a common problem encountered in operations research, control, and artificial intelligence. I will start with demonstrating why planning with parameter uncertainty is an important issue. I will then describe several approaches: Bayesian uncertainty model over the unknown parameters, a robust approach that takes a worst case view, and a frequentist approach. I will outline the advantages and disadvantages of each approach and discuss its potential to scale-up to large problems. I will finally discuss the challenges that are posed by a higher level of uncertainty, where the model itself rather than its parameters may not be fully known.   On learning parametric-output HMMsThu, 26 Dec 2013, 10:00 Speaker: Aryeh Kontorovich , BGUWe present a novel approach for learning an HMM whose outputs are distributed according to a parametric family. This is done by {\em decoupling} the learning task into two steps: first estimating the output parameters, and then estimating the hidden states transition probabilities. The first step is accomplished by fitting a mixture model to the output stationary distribution. Given the parameters of this mixture model, the second step is formulated as the solution of an easily solvable convex quadratic program. We provide an error analysis for the estimated transition probabilities and show they are robust to small perturbations in the estimates of the mixture parameters. Finally, we support our analysis with some encouraging empirical results. Joint work with Boaz Nadler and Roi Weiss, ICML 2013.   Online Learning with k ActionsThu, 19 Dec 2013, 10:00 Speaker: Ofer Dekel , Microsoft Resarch“Online learning with k actions” is the simplest and most fundamental online learning setting. Two of its best-known special cases are the “adversarial multi-armed bandit” problem and the “predicting with expert advice” setting. In this talk, we present a rich and complete theoretical characterization of this setting, as we vary the power of the adversary and the amount of feedback given to the learning algorithm. Specifically, we show that the different variants of this setting can be arranged in three tiers of difficulty: easy problems (with sqrt(T) minimax regret), hard problems (with T^(2/3) minimax regret), and hopeless problems (with linear minimax regret). The class of hard online learning problems was previously unknown, but turns out to include quite a few interesting settings. As a corollary of our characterization, we provide the first formal confirmation that learning with bandit feedback can be strictly harder than learning with full information. Joint work with Raman Arora (TTI-C), Nicolo Cesa-Bianchi (Milano), Jian Ding (UChicago), Tomer Koren (Technion), Yuval Peres (MSR), Ohad Shamir (Weizmann) , and Ambuj Tewari (UMICH)   postponed due to the snowThu, 12 Dec 2013, 10:15 Speaker: Yair Wiener , Technion   K-means recovers ICA filters when independent components are sparseThu, 28 Nov 2013, 10:15 Speaker: Alon Vinnikov , HUJI   From Bandits to Experts: A Tale of Domination and IndependenceThu, 21 Nov 2013, 10:15 Speaker: Yishay Mansour , TAUWe consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir [NIPS 2011]. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph (which must be accessible before selecting an action). In the undirected case, we show that the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner. Joint work with Noga Alon, Nicolo Cesa-Bianchi and Claudio Gentile.   Coordinate-descent for learning orthogonal matrices through Givens rotationsThu, 14 Nov 2013, 10:15 Speaker: Uri Shalit , HUJIOptimizing over the set of orthogonal matrices is a central component in problems like sparse-PCA or tensor decomposition. Unfortunately, such optimization is hard since simple operations on orthogonal matrices easily break orthogonality, and correcting orthogonality usually costs a large amount of computation. We propose a framework for optimizing orthogonal matrices, that is the parallel of coordinate-descent in Euclidean spaces. It is based on Givens-rotations, a fast-to-compute operation that affects a small number of entries in the learned matrix, and preserves orthogonality. We show two applications of this approach: an algorithm for tensor decomposition that is used in learning mixture models, and an algorithm for sparse-PCA. We study the parameter regime where a Givens rotation approach converges faster and achieves a superior model on a genome-wide brain-wide mRNA expression dataset.   Vanishing Component AnalysisThu, 07 Nov 2013, 10:00 Speaker: Roi Livni , HUJIThe vanishing ideal of a set of points, S ⊂ Rn, is the set of all polynomials that attain the value of zero on all the points in S. Such ideals can be compactly represented using a small set of polynomials known as generators of the ideal. Here we describe and analyze an efficient procedure that constructs a set of generators of a vanishing ideal. Our procedure is numerically stable, and can be used to find approximately vanishing polynomials. The resulting polynomials capture nonlinear structure in data, and can for example be used within supervised learning. Empirical comparison with kernel methods show that our method constructs more compact classifiers with comparable accuracy.   A Bayesian Probability Calculus for Density MatricesThu, 31 Oct 2013, 10:15 Speaker: Manfred Warmuth , UC Santa CruzOne of the main concepts in quantum physics is a density matrix, which is a symmetric positive definite matrix of trace one. Finite probability distributions can be seen as a special case when the density matrix is restricted to be diagonal. We develop a probability calculus based on these more general distributions that includes definitions of joints, conditionals and formulas that relate these, including analogs of the Theorem of Total Probability and various Bayes rules for the calculation of posterior density matrices. The resulting calculus parallels the familiar `conventional'' probability calculus and always retains the latter as a special case when all matrices are diagonal. Whereas the conventional Bayesian methods maintain uncertainty about which model has the highest data likelihood, the generalization maintains uncertainty about which unit direction has the largest variance. Surprisingly the bounds also generalize: as in the conventional setting we upper bound the negative log likelihood of the data by the negative log likelihood of the MAP estimator. This is joint work with Dima Kuzmin. No background in Quantum Physics is required for this talk and we will give elaborate visualizations of all needed concepts. Some background in Bayesian analyses will be helpful for motivating the approach. A version of the talk can be found at http://www.cse.ucsc.edu/~manfred/pubs/C76talk.pdf   Crowd-sourcing epidemic detectionThu, 24 Oct 2013, 10:15 Speaker: Constantine Caramanis , The University of Texas at AustinThe history of infections and epidemics holds famous examples where understanding, containing and ultimately treating an outbreak began with understanding its mode of spread. The key question then, is: which network of interactions is the main cause of the spread? Our current approaches to understand and predict epidemics rely on the scarce, but exact/reliable, expert diagnoses. In this talk we explore a different way forward: use more readily available but also more noisy, incomplete and unreliable data to determine the causative network of an epidemic, thus making an accurate global diagnosis from highly unreliable local data. Spring 2012/2013 Semester   Method-of-Moments for Learning Diagnosis NetworksThu, 10 Oct 2013, 10:00 Speaker: Yoni Halpern , NYUI will present recent work on learning the parameters and structure of bipartite "noisy-or" bayesian networks used to express the relationships between diseases and symptoms in expert systems for medical diagnosis. Method-of-moments approaches provide efficient alternatives to EM for learning in these networks where exact inference is intractable.   Method-of-Moments for Learning Diagnosis NetworksThu, 10 Oct 2013, 10:15 Speaker: Yoni Halpern , NYUethod-of-Moments for Learning Diagnosis Networks I will present recent work on learning the parameters and structure of bipartite "noisy-or" bayesian networks used to express the relationships between diseases and symptoms in expert systems for medical diagnosis. Method-of-moments approaches provide efficient alternatives to EM for learning in these networks where exact inference is intractable. Yoni Halpern is a PhD student at New York University focusing on probabilistic graphical models for improving health care.   On MAP inference by MWSS on perfect graphsWed, 31 Jul 2013, 12:00 Speaker: Adrian Weller , ColumbiaFinding the most likely (MAP) configuration of a Markov random field (MRF) is NP-hard in general. A promising, recent technique is to reduce the problem to finding a maximum weight stable set (MWSS) on a derived weighted graph, which if perfect, allows inference in polynomial time. In this talk, we’ll review the approach and discuss new results, including a general decomposition theorem for MRFs of any order and number of labels, and a characterization of which binary pairwise MRFs can be efficiently solved with this method. This defines the power of the approach on this class of models and improves our toolbox. Joint work with Tony Jebara If time, I’ll also review the AISTATs work on using discrete methods to approximate the Bethe partition function, including a PTAS for attractive binary pairwise models with max degree O(log |V|).   Toward understanding the urban informationThu, 04 Jul 2013, 10:00 Speaker: Qiao Wang , Nanjing Institute of Communications TechnologiesThere are over one hundred big cities in China, each with population more than one million. It brings us both challenge and opportunity to develop information processing technologies for establishing the urban service system, based on understanding the behavior of people in these cities. In this talk, we will illustrate the evolution patterns concerning the business hot-pots, traffic, and/or flu propagation, according to analysis from Chinese social networks and other sources. The research is based on a typical big city, Nanjing, with 8 million people.   Computable Performance Analysis of Sparse Recovery with ApplicationsThu, 27 Jun 2013, 10:00 Speaker: Arye Nehorai , Washington University in St. LouisThe last decade has witnessed burgeoning developments in the reconstruction of signals based on exploiting their low-dimensional structures, particularly their sparsity, block-sparsity, and low-rankness. The reconstruction performance of these signals is heavily dependent on the structure of the operating matrix used in sensing. The quality of these matrices in the context of signal recovery is usually quantified by the restricted isometry constant and its variants. However, the restricted isometry constant and its variants are extremely difficult to compute. We present a framework for analytically computing the performance of the recovery of signals with sparsity structures. We define a family of incoherence measures to quantify the goodness of arbitrary sensing matrices. Our primary contribution is the design of efficient algorithms, based on linear programming and second order cone programming, to compute these incoherence measures. As a by-product, we implement efficient algorithms to verify sufficient conditions for exact signal recovery in the noise-free case. The utility of the proposed incoherence measures lies in their relationship to the performance of reconstruction methods. We derive closed-form expressions of bounds on the recovery errors of convex relaxation algorithms in terms of these measures. The second part of the talk applies the developed theory and algorithms to the optimal design of an OFDM radar system with multi-path components.   Building High-level features using large scale unsupervised learningThu, 20 Jun 2013, 10:00 Speaker: Yoel Sher , The Hebrew UniversityConsider the problem of building high level, class-speciﬁc feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? In this talk I will present Quoc V Le's and Andrew Ng's paper on building high-level features using large scale unsupervised learning (ICML 2012). In their paper they trained a 9-layer neural network with 1 billion connections on 10 million images from YouTube database. They trained this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, their experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. In this talk I will go over the concepts of unsupervised feature learning, sparse deep autoencoders and other building blocks in this model.   Prediction with Limited Advice and Multiarmed Bandits with Paid ObservationsThu, 06 Jun 2013, 10:00 Speaker: Yevgeny Seldin , Queensland University of Technology and UC BerkeleyWe study two basic questions in online learning. The first question is what happens between full-information and limited-feedback games and the second question is the cost of information acquisition in online learning. The questions are addressed by defining two variations of standard online learning games. In the first variation, \emph{prediction with limited advice}, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of $M$ out of $N$ experts. We present an algorithm that achieves $O(\sqrt{(N/M) T \ln N})$ regret on $T$ rounds of this game. The second variation, the \emph{multiarmed bandit with paid observations}, is a variant of the adversarial $N$-armed bandit game, where on round $t$ of the game, we can observe the reward of any number of arms, but each observation has a cost $c$. We present an algorithm that achieves $O((c N \ln N)^{1/3} T^{2/3})$ regret on $T$ rounds of this game. We present lower bounds that show that, apart from the logarithmic factors, these regret bounds cannot be improved.   Information, Complexity and LearningThu, 30 May 2013, 09:00 Speaker: - , The workshop will be held in honor of Prof. Naftali Tishby, marking his 60th birthday, and celebrating his influential research career   Metro Maps of InformationThu, 23 May 2013, 10:00 Speaker: Dafna Shahaf , Carnegie Mellon UniversityWhen information is abundant, it becomes increasingly difficult to fit nuggets of knowledge into a single coherent picture. Complex stories spaghetti into branches, side stories, and intertwining narratives; search engines, our most popular navigational tools, are limited in their capacity to explore such complex stories. We propose a methodology for creating structured summaries of information, which we call metro maps. Just as cartographic maps have been relied upon for centuries to help us understand our surroundings, metro maps can help us understand the relationships between many pieces of information. We formalize characteristics of good maps and formulate their construction as an optimization problem. We provide efficient, scalable methods with theoretical guarantees for generating maps. User studies over real-world datasets demonstrate that our method is able to produce maps which help users acquire knowledge efficiently.   projection-free online learningThu, 09 May 2013, 10:00 Speaker: Elad Hazan , The TechnionThe computational bottleneck in applying online learning to massive data sets is usually the projection step. We present efficient on- line learning algorithms that eschew projections in favor of much more efficient linear optimization steps. Our algorithms are based on a new algorithm for general convex optimization based on the Frank-Wolfe technique, which is of independent interest. joint work with Dan Garber, Technion   Adaptive Coding of Actions and ObservationsThu, 02 May 2013, 10:00 Speaker: Pedro Ortega , The Hebrew UniversityThe application of expected utility theory to construct adaptive agents is both computationally intractable and statistically questionable. To overcome these difficulties, agents need the ability to delay the choice of the optimal policy to a later stage when they have learned more about the environment. How should agents do this optimally? An information-theoretic answer to this question is given by the Bayesian control rule - the solution to the adaptive coding problem when there are not only observations but also actions. I will review the central ideas behind the Bayesian control rule.   Winning The Large Scale Visual Recognition Challenge with Dropout And Deep CNNWed, 01 May 2013, 10:00 Speaker: Lior Bar , HUJI In 2010 the team from Toronto won the LSVRC-2010 (Large Scale Visual Recognition Challenge) with more then 8% better labeling then all others. They classified 1000 Image classes using Deep Convolutional Neural Networks (CNN) and also applied the newly developed 'Dropout' that controls overfit by corrupting the training data. We will see the close connection between dropout and regularization, and why (in some cases) it's better. We will also talk about their network architecture, and the Technical optimizations that led them to the first place.   Optimizing the measure of performance in structured learningThu, 25 Apr 2013, 10:00 Speaker: Joseph Keshet , TTIThe goal of discriminative learning is to train a system to optimize a certain desired measure of performance. In simple classification we seek a function that assigns a binary label to a single object, and tries to minimize the error rate (correct or incorrect) on unseen data. In structured prediction we are interested in the prediction a structured label, where the input is a complex object. Typically, each structured prediction task has its own measure of performance, or cost function, such as word error rate in speech recognition, the BLEU score in machine translation or the intersection-over-union score in object segmentation. Not only that those cost functions are much more involved than the binary error rate, the structured prediction itself spans an exponentially large label space. In the talk, I will present two algorithms each designed to the minimize a given cost. First, I will present a new theorem stating that a general learning update rule for linear models directly corresponds to the gradient of the desired measure of performance, and will describe its proof technique. I will present empirical results on the task of phoneme-to-speech alignment, where the goal is to minimize the special alignment cost function. Then, I will show a generalization of the theorem to training non-linear models such as HMMs, and will present empirical results on phoneme recognition task which surpass results from HMMs trained with all other training techniques. In the second part of the talk, I will describe a new algorithm which aims to minimize a regularized cost function. The algorithm is derived by directly minimizing a generalization bound for structured prediction, which gives an upper-bound on the expected cost (risk) in terms of the empirical cost. The resulting algorithm is iterative and easy to implement, and as far as we know, the only algorithm that can handle a non-separable cost functions. We will present experimental results on the task of phoneme recognition, and will show that the algorithm achieves the lowest phoneme error rate (normalized edit distance) compared to other discriminative and generative models with the same expressive power.   A Hilbert Space Embedding for Distributions. Paper by Alex Smola, Arthur Gretton, Le Song, Bernhard Schölkopf.Thu, 18 Apr 2013, 10:00 Speaker: Yoav Wald , The Hebrew UniversityThe authors describe a technique for comparing distributions without the need for density estimation as an intermediate step. Their approach relies on mapping distributions into a reproducing kernel Hilbert space, the paper is an overview of methods proposed over several past works, all based on this approach. In this talk I'll describe the technique and its application to a few tasks including two-sample tests and measures of independence.   Adaptive Metric Dimensionality ReductionSun, 07 Apr 2013, 10:00 Speaker: Aryeh Kontorovich , Ben Gurion UniversityWe initiate the study of dimensionality reduction in general metric spaces in the context of supervised learning. Our statistical contribution consists of tight Rademacher bounds for Lipschitz functions in metric spaces that are doubling, or nearly doubling. As a by-product, we obtain a new theoretical explanation for the empirically reported improvements gained by pre-processing Euclidean data by PCA (Principal Components Analysis) prior to constructing a linear classifier. On the algorithmic front, we describe an analogue of PCA for metric spaces, namely an efficient procedure that approximates the data's intrinsic dimension, which is often much lower than the ambient dimension. Thus, our approach can exploit the dual benefits of low dimensionality: (1) more efficient proximity search algorithms, and (2) more optimistic generalization bounds. Joint work with Lee-Ad Gottlieb and Robert Krauthgamer.   Multiclass Learning Approaches: A Theoretical Comparison with Implications.Thu, 14 Mar 2013, 10:00 Speaker: Amit Daniely , The Hebrew UniversityWe outline a theoretical analysis and comparison of five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. The analysis is distribution free and "hypothesis class based" (similar to the PAC/VC theory). Every method naturally defines a hypothesis class of classifiers. For each such class, we study both the approximation error (the error of the best hypothesis in the class) and the estimation error (the difference between the error of the returned classifier and the approximation error). Similar to binary classification, the estimation error is evaluated using the Graph Dimension, a generalization of theVC dimension. For the approximation error, we give two kind of result - the first shows that certain classes will always have better approximation error than other classes. The second shows that certain classes are very likely to have very large approximation error (close to 1/2). The analysis yields several conclusions of practical relevance, and reveals some phenomenons that do not happen in binary classification. Joint work with Sivan Sabato and Shai Shalev-Shwartz   D.C. Programming and DCAThu, 07 Mar 2013, 10:00 Speaker: Nadav Cohen , The Hebrew UniversityD.C. programming addresses the problem of minimizing a function f=g-h, g and h being convex. We will discuss the properties of D.C. programs and D.C. functions, including global and local optimality conditions and D.C. duality. The DCA (D.C. algorithm) is based on local optimality conditions and duality, and tackles a D.C. program with a convex analysis approach. It is a generalization of CCCP (concave-convex procedure), although historically it preceded the latter. We will discuss the merits and drawbacks of DCA, and in particular its convergence properties. Finally, we will concentrate on the class of polyhedral D.C. programs, which arise naturally in many cases, and on which DCA exhibits finite convergence.   Exact Lifted Probabilistic InferenceThu, 28 Feb 2013, 10:00 Speaker: Udi Aspel , Ben Gurion UniversityRelational graphical models extend the descriptive scope of graphical models by using a first-order language. Compared with "regular" models that use propositional language, relational models are able to compactly represent large scale problems, by capturing repeated patterns and relations between domain entities. E.g., a friend of a smoker is likely to be a smoker as well. A specialized class of inference algorithms called "lifted inference" algorithms, allows inference to be applied directly on probabilistic relational models, and has proven to be vastly superior to the alternative, where a relational model is extracted into a large (and exponentially less efficient) equivalent model of propositional language. The talk will include: i. A brief introduction to Variable Elimination - an exact inference algorithm for propositional models. ii. Introduction to Relational Probabilistic Models. iii. An overview of First Order Variable Elimination - an exact inference algorithm for Relational Probabilistic Models. iv. A brief overview of our contribution. Joint work with Prof. Ronen Brafman. Fall 2012/2013 Semester   TBAThu, 07 Feb 2013, 10:00 Speaker: Edo Liberty , Yahoo!   Convergence rate of coordinate descent for MAPThu, 24 Jan 2013, 10:00 Speaker: Ofer Meshi , The Hebrew UniversityFinding maximum a posteriori (MAP) assignments in graphical models is an important task in many applications. Since the problem is generally hard, linear programming (LP) relaxations are often used. Solving these relaxations efﬁciently is thus an important practical problem. In recent years, several authors have proposed message passing updates corresponding to coordinate descent in the dual LP. However, these are generally not guaranteed to converge to a global optimum. One approach to remedy this is to smooth the LP, and perform coordinate descent on the smoothed dual. However, little was known about the convergence rate of this procedure. We perform a thorough rate analysis of such schemes and derive primal and dual convergence rates. We also provide a simple dual to primal mapping that yields feasible primal solutions with a guaranteed rate of convergence. This is a joint work with Tommi Jaakkola and Amir Globerson. Presented at NIPS 2012.   On Measure Transformed Canonical Correlation Analysis Thu, 17 Jan 2013, 10:00 Speaker: Koby Todros , University of MichiganIn this work linear canonical correlation analysis (LCCA) is generalized by applying a structured transform to the joint probability distribution of the considered pair of random vectors, i.e., a transformation of the joint probability measure defined on their joint observation space. This framework, called measure transformed canonical correlation analysis (MTCCA), applies LCCA to the data after transformation of the joint probability measure. We show that judicious choice of the transform leads to a modified canonical correlation analysis, which, in contrast to LCCA, is capable of detecting non-linear relationships between the considered pair of random vectors. Unlike kernel canonical correlation analysis, where the transformation is applied to the random vectors, in MTCCA the transformation is applied to their joint probability distribution. This results in performance advantages and reduced implementation complexity. The proposed approach is illustrated for graphical model selection in simulated data having non-linear dependencies, and for measuring long-term associations between companies traded in the NASDAQ and NYSE stock markets.   Nonlinear Signal Processing Based on Empirical Intrinsic Geometry Thu, 10 Jan 2013, 10:00 Speaker: Ronen Talmon , Yale UniversityIn this talk, I will present a method for nonlinear signal processing based on empirical intrinsic geometry (EIG). This method provides a convenient framework for combining geometric and statistical analysis and incorporates concepts from information geometry. Unlike classic information geometry that assumes known probabilistic models, we empirically infer an intrinsic model of distribution estimates, while maintaining similar theoretical guarantees. The key observation is that the probability distributions of signals, rather than specific realizations, uncover relevant geometric information. The proposed modeling exhibits two important properties which demonstrate its advantage compared to common geometric algorithms. We show that our model is noise resilient and invariant under different observation and instrumental modalities. In addition, we show that it can be extended efficiently to newly acquired measurements in a sequential manner. These two properties make the proposed model especially suitable for signal processing. We revisit the Bayesian approach and incorporate statistical dynamics and empirical intrinsic geometric models into a unified nonlinear filtering framework. We then apply the proposed method to nonlinear and non-Gaussian filtering problems. In addition, we show applications to biomedical signal analysis and acoustic signal processing.   Multi-Target Radar Detection with Almost Linear ComplexitySun, 06 Jan 2013, 14:00 Speaker: Alexander Fish , University of WisconsinWe would like to know the distances to moving objects and their velocities. The radar system is built to fulfill this task. The radar transmits a waveform S which bounds back from the objects and the echo R is received. In practice we can work in the digital model, namely S and R are sequences of N complex numbers (e.g., N=1023). THE RADAR PROBLEM IS: Design S, and an effective method of extracting, using S and R, the distances and velocities of all targets. In many applications the current sequences S which are used are pseudo-random and the algorithm they support takes O(N²logN) arithmetic operations. In the lecture we will introduce the Heisenberg sequences, and a much faster detection algorithm called the Cross Method. It solves the Radar Problem in O(NlogN+m²) operations for m objects. This is a joint work with S. Gurevich (Math, Madison), A. Sayeed (EE, Madison), K. Scheim (General Motors, Herzeliya), O. Schwartz (EECS, Berkeley).   Causal Reasoning and Learning SystemsThu, 03 Jan 2013, 10:00 Speaker: Elon Portugaly , Microsoft Research Cambridge / Bing Ads In complex real world systems, machine learning is used to influence actions, rather than just provide predictions. Those actions in turn influence the environment of the system. The goal of machine learning in these systems is therefore causal rather than correlational. E.g. what would be the survival chance of patient A if we gave them drug B (causal question); what is the survival chance of the patient A knowing that they were given drug B (correlational question). By injecting noise into actions taken by the system, we can collect data that allows us to infer causality, and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select changes that improve both the short-term and long-term performance of such systems. This work provides a framework (which can be viewed as a generalization of the A/B testing framework) for counterfactual causal inference in complex systems. Parts of this framework were implemented in Microsotf’s bing search advertising system, and I will show data from this implementation.   Distributed Learning and Communication ComplexityThu, 27 Dec 2012, 10:00 Speaker: Yonatan Kibarski , The Hebrew UniversityConsider a dataset distributed over several computers around the world each containing several terabytes each. We describe a learning problem that learns a hypothesis over the unified dataset and show general bounds on the amount of data transfer and specific efficient algorithms. Paper by Maria-Florina Balcan, Maria-Florina Balcan, Avrim Blum, Shai Fine, Yishay Mansour.   Robust Subspace ModelingThu, 20 Dec 2012, 10:00 Speaker: Gilad Lerman , University Of MinnesotaConsider a dataset of vector-valued observations that consists of a modest number of noisy inliers, which are explained well by a low-dimensional subspace, along with a large number of outliers, which have no linear structure. We describe a convex optimization problem that can reliably fit a low-dimensional model to this type of data. When the inliers are contained in a low-dimensional subspace we provide a rigorous theory that describes when this optimization can recover the subspace exactly. We present an efficient algorithm for solving this optimization problem, whose computational cost is comparable to that of the non-truncated SVD. We also show that the sample complexity of the proposed subspace recovery is of the same order as PCA subspace recovery and we consequently obtain some nontrivial robustness to noise. This presentation is based on three joint works: 1) with Teng Zhang, 2) with Michael McCoy, Joel Tropp and Teng Zhang, and 3) with Matthew Coudron.   Learning patterns in Big data from small data using core-sets Thu, 13 Dec 2012, 10:00 Speaker: Dan Feldman , MITWhen we need to solve an optimization problem we usually use the best available algorithm/software or try to improve it. In recent years we have started exploring a different approach: instead of improving the algorithm, reduce the input data and run the existing algorithm on the reduced data to obtain the desired output much faster on a streaming input, using a manageable amount of memory, and in parallel (say, using Hadoop, cloud service, or GPUs). A core-set for a given problem is a semantic compression of its input, in the sense that a solution for the problem with the (small) coreset as input yields a provable approximate solution to the problem with the original (Big) data. In this talk I will describe how we applied this magical paradigm to obtain algorithmic achievements with performance guarantees in iDiary: a system that combines sensor networks, computer vision, differential privacy, and text mining. It turns large signals collected from smart-phones or robots sensors into textual descriptions of the trajectories. The system features a user interface similar to Google Search that allows users to type text queries on their activities (e.g., "Where did I have dinner last time I visited Paris?"") and receive textual answers based on their signals.   no talk this wee due to NIPSThu, 06 Dec 2012, 10:00 Speaker: -------- ,   What Cannot be Learned with Bethe ApproximationsThu, 22 Nov 2012, 10:00 Speaker: Uri Heinemann , The Hebrew University   Learning the experts for online sequence predictionThu, 08 Nov 2012, 10:00 Speaker: Elad Eban , The Hebrew University   Learning Halfspaces with the Zero-One Loss: Time-Accuracy TradeoffsThu, 01 Nov 2012, 10:00 Speaker: Aharon Birnbaum , The Hebrew University   Learnability beyond uniform convergenceThu, 25 Oct 2012, 10:00 Speaker: Shai Shalev-Swartz , The Hebrew University Spring 2011/2012 Semester   Efficient Inference and Learning with Cardinality-like PotentialsWed, 20 Jun 2012, 14:00 Speaker: Danny Tarlow , University of Toronto)Special Talk   Structured Prediction CascadesThu, 14 Jun 2012, 10:15 Speaker: Aviv Peretz , The Hebrew UniversityStudent Talk   A Spectral Algorithm for Learning Hidden Markov ModelsThu, 07 Jun 2012, 10:15 Speaker: Hillel Taub-Tabib , The Hebrew UniversityStudent Talk   What happens when design students hear about machine learning.Thu, 31 May 2012, 10:15 Speaker: Michael Fink , Google   Pricing Derivatives Using Regret MinimizationThu, 24 May 2012, 10:15 Speaker: Eyal Gofer , Tel Aviv University   Log-sum-exp, convexity and covariance estimationThu, 17 May 2012, 10:15 Speaker: Ami Wiesel , The Hebrew University   Agnostic Active LearningThu, 10 May 2012, 10:15 Speaker: Noam Horev , The Hebrew UniversityStudent Talk   Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial GrammarsThu, 03 May 2012, 10:15 Speaker: Miri Cohen , The Hebrew UniversityStudent Talk   Norm-based Adaboost-like algorithmsThu, 19 Apr 2012, 10:15 Speaker: Dan Gutfreund , IBM Resarch   The Isotron Algorithm: High-Dimensional Isotonic RegressionThu, 29 Mar 2012, 10:15 Speaker: Hila Gonen , The Hebrew UniversityStudent Talk   Globally Optimizing Graph Partitioning Problems Using Message PassingThu, 22 Mar 2012, 10:15 Speaker: Elad Mezuman , The Hebrew University   Bounded Planning in Passive POMDPsThu, 15 Mar 2012, 10:15 Speaker: Roy Fox , The Hebrew University Fall 2011/2012 Semester   Combining One-Class Classifiers via Meta-LearningThu, 02 Feb 2012, 10:15 Speaker: Eitan Menahem , BGU   Learning Large Scale Music RecommendationsThu, 26 Jan 2012, 10:15 Speaker: Noam Koenigstein , Tel Aviv University   GraphLab: Parallel Large Scale Machine LearningThu, 19 Jan 2012, 10:15 Speaker: Danny Bickson , Carnegie Mellon University   TBAWed, 18 Jan 2012, 15:00 Speaker: Greg Shakhnarovich , Toyota Technological Institute at ChicagoNote the special day   Active Learning under Margin AssumptionsSun, 15 Jan 2012, 13:00 Speaker: Sivan Sabato , The Hebrew UniversityNote the special day   Decomposing Documents into Authorial ThreadsThu, 12 Jan 2012, 10:15 Speaker: Moshe Koppel , Bar-Ilan University   CancledWed, 11 Jan 2012, 15:00 Speaker: Ilya Stuskever , Univ. of Toronto -> Stanfordcancled   Programming by Example: from Pipe Dreams to the FutureWed, 04 Jan 2012, 15:00 Speaker: Adam Kalai , Microsoft ResearchNote the special day   The Median HypothesisThu, 22 Dec 2011, 10:15 Speaker: Ran Gilad-Bachrach , Microsoft Research   Graph-guided Importance samplingThu, 08 Dec 2011, 10:15 Speaker: Rina Dechter , University of California Irvine (UCI)Joint work with Vibhav Gogate. Rina is visiting HUJI for the year.   Fast Effective Clustering for Graphs and Documents Thu, 01 Dec 2011, 10:15 Speaker: William Cohen , Carnegie Mellon University   Architecture 101 for Machine Learning Thu, 24 Nov 2011, 10:15 Speaker: Uri Weiser , The Technion   Learning a Classifi er When the Labeling Is KnownThu, 17 Nov 2011, 10:15 Speaker: Shalev Ben David , MITSecond of the week   Optimization, Learning and the Universality of Mirror DescentWed, 16 Nov 2011, 15:00 Speaker: Nathan Srebro , Toyota Technological Institute at ChicagoNotice the special date and time!   Multiclass learnability and the ERM principleThu, 10 Nov 2011, 10:15 Speaker: Amit Daniely , The Hebrew University   Fast Effective Clustering for Graphs and Documents Thu, 03 Nov 2011, 10:15 Speaker: William Cohen , Carnegie Mellon Universitypostponed to Dec. 1st   Domain Adaptation–Can Quantity compensate for Quality Thu, 03 Nov 2011, 10:15 Speaker: Ruth Urner , The University of Waterloo Spring 2010/2011 Semester   Simplified PAC-Bayesian Margin BoundsThu, 26 May 2011, 10:15 Speaker: Alon Gonen , The Hebrew UniversityStudent Talk   Unsupervised Supervised Learning: Who Needs Labels Anyway? Thu, 19 May 2011, 10:15 Speaker: Guy Lebanon , Georgia Institute of Technology   Learning rates and principled reward discounting in RLThu, 12 May 2011, 10:15 Speaker: Tali Tishby , The Hebrew University   Sublinear optimization for machine learning Thu, 05 May 2011, 10:15 Speaker: Elad Hazan , Technion - Israel Institute of Technology     Learning Quickly When Irrelevant Attributes AboundThu, 31 Mar 2011, 10:15 Speaker: Amit Beka , The Hebrew University   Tight Sample Complexity of Large-Margin LearningThu, 24 Mar 2011, 10:15 Speaker: Sivan Sabato , The Hebrew University   Efficient learning for structured predictionThu, 17 Mar 2011, 10:15 Speaker: Ofer Meshi , The Hebrew University   Efficient blue-noise sampling using multiscale MCMC samplerThu, 10 Mar 2011, 10:15 Speaker: Raanan Fattal , The Hebrew University     Convergent Message Passing Algorithms - A Unifying ViewThu, 17 Feb 2011, 10:15 Speaker: Talya Meltzer , The Hebrew University Fall 2010/2011 Semester   Multi-view learning of speech feature spacesThu, 13 Jan 2011, 10:15 Speaker: Karen Livescu , TTI, Chicago     Robust High-dimensional Principal Component AnalysisThu, 30 Dec 2010, 10:15 Speaker: Shie Mannor , The Technion.   Optimal Distributed Online Prediction using Mini-BatchesThu, 23 Dec 2010, 10:15 Speaker: Ran Gilad-Bachrach , Microsoft.   Semi-Supervised Learning - A transductive graph-based algorithm and a theoretical modelThu, 16 Dec 2010, 10:15 Speaker: Nir Rosenfeld , HUJIStudent seminar talk   A Tale of Two NormsThu, 02 Dec 2010, 10:15 Speaker: Nathan Srebro , TTI, Chicago   Natural Image Denoising: Optimality and Inherent BoundsThu, 25 Nov 2010, 10:15 Speaker: Anat Levin , Weizmann Institute of Science   Evaluating Recommender SystemsThu, 18 Nov 2010, 10:15 Speaker: Guy Shani , Ben Gurion University   Approximated Learning and Inference in Large Scale Graphical ModelsWed, 17 Nov 2010, 13:00 Speaker: Tamir Hazan , TTI, ChicagoNote special date and time!   General Classes of Performance Bounds for Parameter EstimationThu, 11 Nov 2010, 10:15 Speaker: Koby Todros , Ben Gurion University   Uncertainty Estimates, Classification and Active LearningThu, 04 Nov 2010, 10:15 Speaker: Boaz Nadler , Weizmann Institute of Science     Machine Learning in the Data Revolution EraThu, 14 Oct 2010, 10:15 Speaker: Shai Shalev-Shwartz , The Hebrew University Spring 2009/2010 Semester   Which clustering algorithm should I use? Towards principled guidelinesThu, 07 Oct 2010, 10:15 Speaker: Shai Ben-David , University of Waterloo   Continuous-Time Belief PropagationThu, 17 Jun 2010, 10:15 Speaker: Tal El-Hay , The Hebrew University   Copula NetworksThu, 10 Jun 2010, 10:15 Speaker: Gal Elidan , The Hebrew University   The value of future-information – the missing piece in reinforcement learningThu, 03 Jun 2010, 10:15 Speaker: Naftali Tishby , The Hebrew University   Query by CommitteeThu, 27 May 2010, 10:15 Speaker: Adam Nitzan , The Hebrew University   Learning and Domain AdaptationThu, 13 May 2010, 10:15 Speaker: Yishay Mansour , Tel-Aviv University   Learning to Predict by the Methods of Temporal DifferencesThu, 06 May 2010, 10:15 Speaker: Inbal Marhaim , The Hebrew University   Learning Linear Classifiers with ConfidenceThu, 29 Apr 2010, 10:15 Speaker: Koby Crammer , The Technion   When Quantity Makes Quality: Learning with Information ConstraintsThu, 22 Apr 2010, 10:15 Speaker: Ohad Shamir , The Hebrew University   A Joint Factor-Network Model for Social Networks EvolutionThu, 15 Apr 2010, 10:15 Speaker: Amit Gruber , University of Toronto   Locality Sensitive HashingThu, 18 Mar 2010, 10:15 Speaker: Roie Kliper , The Hebrew University   Long time asymptotic in Hidden Markov ModelsThu, 11 Mar 2010, 10:15 Speaker: Pavel Chigansky , The Hebrew University   Belief Propagation and BeyondThu, 04 Mar 2010, 10:15 Speaker: Michael Chertkov , Los Alamos National Laboratory   Learning from Labeled and Unlabeled Data, Global vs. Multiscale Approaches.Thu, 25 Feb 2010, 10:15 Speaker: Boaz Nadler , The Weizmann Institute Fall 2009/2010 Semester   Combining One-Class Classifiers via Meta-LearningTue, 02 Feb 2010, 10:15 Speaker: Eitan Menahem , Ben-Gurion University   TBATue, 02 Feb 2010, 10:15 Speaker: Eitan Menahem , Ben-Gurion University of the Negev   The Multiarmed Bandit ProblemThu, 14 Jan 2010, 10:15 Speaker: Tomer Ezra , The Hebrew University   Combining Labeled and unlabeled examples with CO- TrainingThu, 07 Jan 2010, 10:15 Speaker: Cobi Cario , The Hebrew University   Recent Advances in Solving Combinatorial Optimization Tasks over Graphical Models Thu, 31 Dec 2009, 10:15 Speaker: Rina Dechter , UC Irvine   A Belief-Propagation Algorithm for Constrained Linear Least-Squares ProblemsThu, 24 Dec 2009, 10:15 Speaker: Jacob Goldberger , Bar-Ilan University   No Free Lunch theorems for Domain Adaptation LearningThu, 17 Dec 2009, 10:15 Speaker: Shai Ben-David , University of Waterloo   From "GooGoo images" to "Big Bird" - 10 Interactive Projects Bridging Machine Learning and Design Thu, 10 Dec 2009, 10:15 Speaker: Michael Fink , Google   The "tree-dependent components" of natural scenes are edge filtersThu, 03 Dec 2009, 10:15 Speaker: Daniel Zoran , The Hebrew University   An LP View of the M-best MAP problemThu, 26 Nov 2009, 10:15 Speaker: Menachem Fromer , The Hebrew University   Learning in Real-Time and Adaptivity of Online AlgorithmsThu, 19 Nov 2009, 10:15 Speaker: Elad Hazan , IBM     Modeling Natural Image Statistics with Markov Random FieldsThu, 05 Nov 2009, 10:15 Speaker: Urs Koster , University of Helsinki   Efficient Learning of Sparse PredictorsThu, 29 Oct 2009, 10:15 Speaker: Shai Shalev-Shwartz , The Hebrew University   Reducing Label Complexity by Learning From BagsThu, 22 Oct 2009, 10:15 Speaker: Sivan Sabato , The Hebrew University Spring 2008/2009 Semester   Probabilistic Models for Holistic Scene UnderstandingThu, 25 Jun 2009, 10:15 Speaker: Daphne Koller , Stanford University   Markov Logic NetworksThu, 18 Jun 2009, 10:15 Speaker: Gil Ben-Zvi , The Hebrew University   Algorithms for Transfer Learning and Semi-Supervised Learning (A review of a paper by Ando & Zhang)Thu, 04 Jun 2009, 10:15 Speaker: Ido Ginodi , The Hebrew University   Information theoretic model validation in clusteringTue, 19 May 2009, 10:15 Speaker: Joachim Buhmann , ETH Zurich(Special Date)   Mean Field Variational Approximation for Continuous-Time Bayesian NetworksThu, 14 May 2009, 10:15 Speaker: Ido Cohn , The Hebrew University   Generalization in LearningThu, 07 May 2009, 10:15 Speaker: Yevgeny Seldin , The Hebrew University   Generalization in Reinforcement LearningThu, 23 Apr 2009, 10:15 Speaker: Peter Stone , The University of Texas at Austin   Learning similarities on a large scale through rankingThu, 02 Apr 2009, 10:15 Speaker: Gal Chechik , Google   Energy Minimization Methods and Graph CutsThu, 26 Mar 2009, 10:15 Speaker: Danny Rosenberg , The Hebrew University     PostponedThu, 12 Mar 2009, 10:15 Speaker: Yevgeny Seldin , The Hebrew University Fall 2008/2009 Semester   Revealing Modularity and Organization in Biological Networks Using Bi-clusteringThu, 05 Feb 2009, 10:15 Speaker: Asaf Pe'er , The Hebrew University   Stochastic Convex OptimizationThu, 29 Jan 2009, 10:15 Speaker: Nati Srebro , Toyota Technological Insitute - Chicago     All Learning Is RobustThu, 15 Jan 2009, 10:15 Speaker: Shie Mannor , The Technion   Support Vector Machines with a Reject OptionThu, 08 Jan 2009, 10:15 Speaker: Joseph Keshet , IDIAP Research Institute   Nonlinearity, memory, and phase transitions in learningThu, 01 Jan 2009, 10:15 Speaker: Ilya Nemenman , Los Alamos National Laboratory   Tightening LP Relaxations for MAP using Message PassingThu, 18 Dec 2008, 10:15 Speaker: Talya Meltzer , The Hebrew University   Machine Learning of Evaluation (with Applications to Computer Chess)Thu, 04 Dec 2008, 10:15 Speaker: Amir Ban , The Hebrew University   From Co-Occurrence to CorrespondenceThu, 27 Nov 2008, 10:15 Speaker: Ben Taskar , University of Pennsylvania   Learning Bounded Treewidth Bayesian NetworksThu, 13 Nov 2008, 10:15 Speaker: Gal Elidan , The Hebrew University Spring 2007/2008 Semester   * Pre-ICML-UAI-COLT talks - Part IIITue, 01 Jul 2008, 10:30 Speaker: Amit Gruber, Ohad Shamir, Tamir Hazan , The Hebrew UniversityNote special day and time!   * Non-Parametric Modeling and Visualization of Partially Ranked DataMon, 30 Jun 2008, 10:15 Speaker: Guy Lebanon , Purdue UniversityNote special day!   * Pre-ICML-UAI-COLT talks - Part ISun, 29 Jun 2008, 10:00 Speaker: Tal El-Hay , The Hebrew UniversityNote special day and time!   * Pre-ICML-UAI-COLT talks - Part IISun, 29 Jun 2008, 16:15 Speaker: Yevgeny Seldin, Ohad Shamir , The Hebrew UniversityNote special day and time!   Active Sampling for Multiple Output IdentificationThu, 26 Jun 2008, 10:15 Speaker: Oded Margalit , IBM   Latent Topic Models for HypertextThu, 19 Jun 2008, 10:15 Speaker: Amit Gruber , The Hebrew University     Multiple Instance Learning for CAD (Computer Aided Diagnosis)Thu, 15 May 2008, 10:15 Speaker: Inna Stainvas , Siemens Fall 2007/2008 Semester   On the Exchange Rate between Equivalence Constraints and LabelsThu, 01 May 2008, 10:15 Speaker: Liat Ein-Dor , Intel   Selective-PAC learningThu, 17 Apr 2008, 10:15 Speaker: Yair Wiener , Technion   Robust Real Time Pattern Matching using Bayesian Sequential Hypothesis TestingThu, 10 Apr 2008, 10:15 Speaker: Ofir Pele , The Hebrew University     Online Prediction: The Role of Convexity and RandomizationThu, 20 Mar 2008, 10:15 Speaker: Shai Shalev-Shwartz , TTI     * Does a large data-set mean more, or less, work?Sun, 24 Feb 2008, 10:15 Speaker: Nathan Srebro , Toyota Technological Institute at ChicagoJoint learning-vision seminar (in Ross 201).     Learning in High Dimensions, Sparsity and TreeletsThu, 24 Jan 2008, 10:15 Speaker: Boaz Nadler , Weizmann Institute of Science   * The Empirical Minimization Algorithm - Upper and Lower BoundsWed, 23 Jan 2008, 10:30 Speaker: Shahar Mendelson , ANU, TechnionJoint learning-theory seminar (in Ross 201).     On LASSO type estimatorsThu, 10 Jan 2008, 10:15 Speaker: Yaacov Ritov , The Hebrew University   Robust Inference in Bayesian Networks, with Application to Gene Expression Time-Series DataThu, 03 Jan 2008, 10:15 Speaker: Omer Berkman , Tel-Aviv University   A perceptron-like algorithm for regressionThu, 27 Dec 2007, 10:15 Speaker: Adam Kalai , Georgia Tech   Effective Optimization Procedures for Machine Learning Based on Data-dependent Error BoundsThu, 20 Dec 2007, 10:15 Speaker: Dori Peleg , Technion   Protein Structuring from Random Cryo-EM ImagesThu, 13 Dec 2007, 10:15 Speaker: Yoel Shkolnisky , Yale University   Data Clustering and Stability for Finite SamplesThu, 29 Nov 2007, 10:15 Speaker: Ohad Shamir , The Hebrew University   Continuous Time Markov NetworksThu, 22 Nov 2007, 10:15 Speaker: Tal El-Hay , The Hebrew University   The tempotron: applications to visual and auditory processingThu, 15 Nov 2007, 10:15 Speaker: Robert Guetig , The Hebrew University     Nonnegative Sparse PCAThu, 01 Nov 2007, 10:15 Speaker: Ron Zass , The Hebrew University   Some thoughts on the mathematical foundations of machine learningThu, 25 Oct 2007, 10:15 Speaker: Nati Linial , The Hebrew University Spring 2006/2007 Semester   Learnability beyond uniform convergenceSun, 12 Oct 0025, 10:00 Speaker: Shai Shalev-Shwartz , HUJI * indicates a special talk, not on the regular time-slot or place 1