Ross building, The Edmond J. Safra Campus. Picture: Dror Bar-Natan, Fall 2001
The Learning Club

Time: Thursdays 10:15 - 11:30 (refreshments at 10:00)
Place: Ross 63, The Edmond J. Safra Campus
Coordinators: Prof. Amir Globerson, Elad Eban

Future talks

Fall 2013/2014 Semester

TBA
Thu, 12 Dec 2013, 10:15
Speaker: Yair Wiener, Technion

* indicates a special talk, not on the regular time-slot or place

Past talks

Fall 2013/2014 Semester

TBA
Thu, 28 Nov 2013, 10:15
Speaker: Alon Vinnikov, HUJI

From Bandits to Experts: A Tale of Domination and Independence
Thu, 21 Nov 2013, 10:15
Speaker: Yishay Mansour, TAU
We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir [NIPS 2011]. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph (which must be accessible before selecting an action). In the undirected case, we show that the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner. Joint work with Noga Alon, Nicolo Cesa-Bianchi and Claudio Gentile.

Coordinate-descent for learning orthogonal matrices through Givens rotations
Thu, 14 Nov 2013, 10:15
Speaker: Uri Shalit, HUJI
Optimizing over the set of orthogonal matrices is a central component in problems like sparse-PCA or tensor decomposition. Unfortunately, such optimization is hard, since simple operations on orthogonal matrices easily break orthogonality, and correcting orthogonality usually costs a large amount of computation. We propose a framework for optimizing over orthogonal matrices that is the parallel of coordinate descent in Euclidean spaces.
It is based on Givens rotations, a fast-to-compute operation that affects a small number of entries in the learned matrix and preserves orthogonality. We show two applications of this approach: an algorithm for tensor decomposition that is used in learning mixture models, and an algorithm for sparse-PCA. We study the parameter regime where a Givens rotation approach converges faster, and achieves a superior model on a genome-wide, brain-wide mRNA expression dataset.

Vanishing Component Analysis
Thu, 07 Nov 2013, 10:00
Speaker: Roi Livni, HUJI
The vanishing ideal of a set of points S ⊂ R^n is the set of all polynomials that attain the value of zero on all the points in S. Such ideals can be compactly represented using a small set of polynomials known as generators of the ideal. Here we describe and analyze an efficient procedure that constructs a set of generators of a vanishing ideal. Our procedure is numerically stable and can be used to find approximately vanishing polynomials. The resulting polynomials capture nonlinear structure in data, and can, for example, be used within supervised learning. Empirical comparison with kernel methods shows that our method constructs more compact classifiers with comparable accuracy.

A Bayesian Probability Calculus for Density Matrices
Thu, 31 Oct 2013, 10:15
Speaker: Manfred Warmuth, UC Santa Cruz
One of the main concepts in quantum physics is a density matrix, which is a symmetric positive definite matrix of trace one. Finite probability distributions can be seen as a special case when the density matrix is restricted to be diagonal. We develop a probability calculus based on these more general distributions that includes definitions of joints, conditionals and formulas that relate these, including analogs of the Theorem of Total Probability and various Bayes rules for the calculation of posterior density matrices.
The resulting calculus parallels the familiar "conventional" probability calculus and always retains the latter as a special case when all matrices are diagonal. Whereas conventional Bayesian methods maintain uncertainty about which model has the highest data likelihood, the generalization maintains uncertainty about which unit direction has the largest variance. Surprisingly, the bounds also generalize: as in the conventional setting, we upper bound the negative log likelihood of the data by the negative log likelihood of the MAP estimator. This is joint work with Dima Kuzmin. No background in quantum physics is required for this talk, and we will give elaborate visualizations of all needed concepts. Some background in Bayesian analysis will be helpful for motivating the approach. A version of the talk can be found at http://www.cse.ucsc.edu/~manfred/pubs/C76talk.pdf

Crowd-sourcing epidemic detection
Thu, 24 Oct 2013, 10:15
Speaker: Constantine Caramanis, The University of Texas at Austin
The history of infections and epidemics holds famous examples where understanding, containing and ultimately treating an outbreak began with understanding its mode of spread. The key question, then, is: which network of interactions is the main cause of the spread? Our current approaches to understanding and predicting epidemics rely on scarce, but exact and reliable, expert diagnoses. In this talk we explore a different way forward: use more readily available, but also more noisy, incomplete and unreliable, data to determine the causative network of an epidemic, thus making an accurate global diagnosis from highly unreliable local data.

Method-of-Moments for Learning Diagnosis Networks
Thu, 10 Oct 2013, 10:00
Speaker: Yoni Halpern, NYU
I will present recent work on learning the parameters and structure of bipartite "noisy-or" Bayesian networks used to express the relationships between diseases and symptoms in expert systems for medical diagnosis.
Method-of-moments approaches provide efficient alternatives to EM for learning in these networks, where exact inference is intractable. Yoni Halpern is a PhD student at New York University focusing on probabilistic graphical models for improving health care.

Spring 2012/2013 Semester

On MAP inference by MWSS on perfect graphs
Wed, 31 Jul 2013, 12:00
Speaker: Adrian Weller, Columbia
Finding the most likely (MAP) configuration of a Markov random field (MRF) is NP-hard in general. A promising recent technique is to reduce the problem to finding a maximum weight stable set (MWSS) on a derived weighted graph, which, if perfect, allows inference in polynomial time. In this talk, we'll review the approach and discuss new results, including a general decomposition theorem for MRFs of any order and number of labels, and a characterization of which binary pairwise MRFs can be efficiently solved with this method. This defines the power of the approach on this class of models and improves our toolbox. Joint work with Tony Jebara. If time permits, I'll also review the AISTATS work on using discrete methods to approximate the Bethe partition function, including a PTAS for attractive binary pairwise models with max degree O(log |V|).

Toward understanding the urban information
Thu, 04 Jul 2013, 10:00
Speaker: Qiao Wang, Nanjing Institute of Communications Technologies
There are over one hundred big cities in China, each with a population of more than one million.
This brings both challenges and opportunities in developing information processing technologies for establishing an urban service system, based on understanding the behavior of people in these cities. In this talk, we will illustrate evolution patterns concerning business hot-spots, traffic, and flu propagation, according to analysis of Chinese social networks and other sources. The research is based on a typical big city, Nanjing, with 8 million people.

Computable Performance Analysis of Sparse Recovery with Applications
Thu, 27 Jun 2013, 10:00
Speaker: Arye Nehorai, Washington University in St. Louis
The last decade has witnessed burgeoning developments in the reconstruction of signals based on exploiting their low-dimensional structures, particularly their sparsity, block-sparsity, and low-rankness. The reconstruction performance of these signals is heavily dependent on the structure of the operating matrix used in sensing. The quality of these matrices in the context of signal recovery is usually quantified by the restricted isometry constant and its variants. However, the restricted isometry constant and its variants are extremely difficult to compute. We present a framework for analytically computing the performance of the recovery of signals with sparsity structures. We define a family of incoherence measures to quantify the goodness of arbitrary sensing matrices. Our primary contribution is the design of efficient algorithms, based on linear programming and second-order cone programming, to compute these incoherence measures. As a by-product, we implement efficient algorithms to verify sufficient conditions for exact signal recovery in the noise-free case. The utility of the proposed incoherence measures lies in their relationship to the performance of reconstruction methods. We derive closed-form expressions of bounds on the recovery errors of convex relaxation algorithms in terms of these measures.
The second part of the talk applies the developed theory and algorithms to the optimal design of an OFDM radar system with multi-path components.

Building High-level Features Using Large Scale Unsupervised Learning
Thu, 20 Jun 2013, 10:00
Speaker: Yoel Sher, The Hebrew University
Consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? In this talk I will present Quoc V. Le's and Andrew Ng's paper on building high-level features using large scale unsupervised learning (ICML 2012). In the paper they trained a 9-layer neural network with 1 billion connections on 10 million images from YouTube. They trained this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, their experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. In this talk I will go over the concepts of unsupervised feature learning, sparse deep autoencoders and other building blocks of this model.

Prediction with Limited Advice and Multiarmed Bandits with Paid Observations
Thu, 06 Jun 2013, 10:00
Speaker: Yevgeny Seldin, Queensland University of Technology and UC Berkeley
We study two basic questions in online learning. The first question is what happens between full-information and limited-feedback games, and the second is the cost of information acquisition in online learning. The questions are addressed by defining two variations of standard online learning games. In the first variation, "prediction with limited advice", we consider a game of prediction with expert advice where on each round of the game we query the advice of a subset of M out of N experts.
We present an algorithm that achieves O(√((N/M) T ln N)) regret on T rounds of this game. The second variation, the "multiarmed bandit with paid observations", is a variant of the adversarial N-armed bandit game where, on round t of the game, we can observe the reward of any number of arms, but each observation has a cost c. We present an algorithm that achieves O((cN ln N)^(1/3) T^(2/3)) regret on T rounds of this game. We present lower bounds showing that, apart from logarithmic factors, these regret bounds cannot be improved.

Information, Complexity and Learning
Thu, 30 May 2013, 09:00
A workshop held in honor of Prof. Naftali Tishby, marking his 60th birthday and celebrating his influential research career.

Metro Maps of Information
Thu, 23 May 2013, 10:00
Speaker: Dafna Shahaf, Carnegie Mellon University
When information is abundant, it becomes increasingly difficult to fit nuggets of knowledge into a single coherent picture. Complex stories spaghetti into branches, side stories, and intertwining narratives; search engines, our most popular navigational tools, are limited in their capacity to explore such complex stories. We propose a methodology for creating structured summaries of information, which we call metro maps. Just as cartographic maps have been relied upon for centuries to help us understand our surroundings, metro maps can help us understand the relationships between many pieces of information. We formalize characteristics of good maps and formulate their construction as an optimization problem. We provide efficient, scalable methods with theoretical guarantees for generating maps. User studies over real-world datasets demonstrate that our method is able to produce maps which help users acquire knowledge efficiently.
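As a toy illustration of the "prediction with limited advice" setting described in Yevgeny Seldin's abstract above, here is a minimal sketch of exponential weights with importance-weighted loss estimates when only M of the N experts are queried per round. This is a generic sketch for intuition, not the algorithm from the talk; the function name, the uniform querying scheme, and the losses are illustrative assumptions.

```python
import math
import random

def limited_advice_hedge(expert_losses, M, eta, seed=0):
    """Exponential-weights sketch with limited advice.

    expert_losses: T rows of N losses in [0, 1].
    Each round we query only M of the N experts, chosen uniformly at
    random, and update with importance-weighted loss estimates.
    Returns the learner's total expected loss over the T rounds.
    """
    rng = random.Random(seed)
    N = len(expert_losses[0])
    log_w = [0.0] * N  # log-weights, for numerical stability
    total_loss = 0.0
    for losses in expert_losses:
        # normalize weights into the playing distribution p
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        Z = sum(w)
        p = [wi / Z for wi in w]
        total_loss += sum(pi * li for pi, li in zip(p, losses))
        # query M experts; each expert is queried with probability M/N,
        # so dividing by M/N gives an unbiased loss estimate
        for i in rng.sample(range(N), M):
            log_w[i] -= eta * losses[i] * N / M
    return total_loss
```

Even though each expert is observed only a fraction M/N of the time, the unbiased estimates let the weights concentrate on the best expert, which is the mechanism behind regret bounds that degrade gracefully as M shrinks.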
Projection-free Online Learning
Thu, 09 May 2013, 10:00
Speaker: Elad Hazan, The Technion
The computational bottleneck in applying online learning to massive data sets is usually the projection step. We present efficient online learning algorithms that eschew projections in favor of much more efficient linear optimization steps. Our algorithms are based on a new algorithm for general convex optimization based on the Frank-Wolfe technique, which is of independent interest. Joint work with Dan Garber, Technion.

Adaptive Coding of Actions and Observations
Thu, 02 May 2013, 10:00
Speaker: Pedro Ortega, The Hebrew University
The application of expected utility theory to construct adaptive agents is both computationally intractable and statistically questionable. To overcome these difficulties, agents need the ability to delay the choice of the optimal policy to a later stage, when they have learned more about the environment. How should agents do this optimally? An information-theoretic answer to this question is given by the Bayesian control rule - the solution to the adaptive coding problem when there are not only observations but also actions. I will review the central ideas behind the Bayesian control rule.

Optimizing the measure of performance in structured learning
Thu, 25 Apr 2013, 10:00
Speaker: Joseph Keshet, TTI
The goal of discriminative learning is to train a system to optimize a certain desired measure of performance. In simple classification we seek a function that assigns a binary label to a single object and tries to minimize the error rate (correct or incorrect) on unseen data. In structured prediction we are interested in the prediction of a structured label, where the input is a complex object. Typically, each structured prediction task has its own measure of performance, or cost function, such as word error rate in speech recognition, the BLEU score in machine translation or the intersection-over-union score in object segmentation.
Not only are those cost functions much more involved than the binary error rate, but the structured prediction itself spans an exponentially large label space. In the talk, I will present two algorithms, each designed to minimize a given cost. First, I will present a new theorem stating that a general learning update rule for linear models directly corresponds to the gradient of the desired measure of performance, and will describe its proof technique. I will present empirical results on the task of phoneme-to-speech alignment, where the goal is to minimize a special alignment cost function. Then, I will show a generalization of the theorem to training non-linear models such as HMMs, and will present empirical results on a phoneme recognition task which surpass results from HMMs trained with all other training techniques. In the second part of the talk, I will describe a new algorithm which aims to minimize a regularized cost function. The algorithm is derived by directly minimizing a generalization bound for structured prediction, which gives an upper bound on the expected cost (risk) in terms of the empirical cost. The resulting algorithm is iterative and easy to implement, and, as far as we know, is the only algorithm that can handle non-separable cost functions. We will present experimental results on the task of phoneme recognition, and will show that the algorithm achieves the lowest phoneme error rate (normalized edit distance) compared to other discriminative and generative models with the same expressive power.

A Hilbert Space Embedding for Distributions (paper by Alex Smola, Arthur Gretton, Le Song, Bernhard Schölkopf)
Thu, 18 Apr 2013, 10:00
Speaker: Yoav Wald, The Hebrew University
The authors describe a technique for comparing distributions without the need for density estimation as an intermediate step.
Their approach relies on mapping distributions into a reproducing kernel Hilbert space; the paper is an overview of methods proposed over several past works, all based on this approach. In this talk I'll describe the technique and its application to a few tasks, including two-sample tests and measures of independence.

Adaptive Metric Dimensionality Reduction
Sun, 07 Apr 2013, 10:00
Speaker: Aryeh Kontorovich, Ben Gurion University
We initiate the study of dimensionality reduction in general metric spaces in the context of supervised learning. Our statistical contribution consists of tight Rademacher bounds for Lipschitz functions in metric spaces that are doubling, or nearly doubling. As a by-product, we obtain a new theoretical explanation for the empirically reported improvements gained by pre-processing Euclidean data with PCA (Principal Components Analysis) prior to constructing a linear classifier. On the algorithmic front, we describe an analogue of PCA for metric spaces, namely an efficient procedure that approximates the data's intrinsic dimension, which is often much lower than the ambient dimension. Thus, our approach can exploit the dual benefits of low dimensionality: (1) more efficient proximity search algorithms, and (2) more optimistic generalization bounds. Joint work with Lee-Ad Gottlieb and Robert Krauthgamer.

Multiclass Learning Approaches: A Theoretical Comparison with Implications
Thu, 14 Mar 2013, 10:00
Speaker: Amit Daniely, The Hebrew University
We outline a theoretical analysis and comparison of five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. The analysis is distribution-free and "hypothesis class based" (similar to the PAC/VC theory). Every method naturally defines a hypothesis class of classifiers.
For each such class, we study both the approximation error (the error of the best hypothesis in the class) and the estimation error (the difference between the error of the returned classifier and the approximation error). As in binary classification, the estimation error is evaluated using the graph dimension, a generalization of the VC dimension. For the approximation error, we give two kinds of results: the first shows that certain classes will always have better approximation error than other classes; the second shows that certain classes are very likely to have very large approximation error (close to 1/2). The analysis yields several conclusions of practical relevance, and reveals some phenomena that do not occur in binary classification. Joint work with Sivan Sabato and Shai Shalev-Shwartz.

D.C. Programming and DCA
Thu, 07 Mar 2013, 10:00
Speaker: Nadav Cohen, The Hebrew University
D.C. programming addresses the problem of minimizing a function f = g - h, with g and h convex. We will discuss the properties of D.C. programs and D.C. functions, including global and local optimality conditions and D.C. duality. The DCA (D.C. algorithm) is based on local optimality conditions and duality, and tackles a D.C. program with a convex analysis approach. It is a generalization of CCCP (the concave-convex procedure), although historically it preceded the latter. We will discuss the merits and drawbacks of DCA, and in particular its convergence properties. Finally, we will concentrate on the class of polyhedral D.C. programs, which arise naturally in many cases, and on which DCA exhibits finite convergence.

Exact Lifted Probabilistic Inference
Thu, 28 Feb 2013, 10:00
Speaker: Udi Aspel, Ben Gurion University
Relational graphical models extend the descriptive scope of graphical models by using a first-order language.
Compared with "regular" models that use propositional language, relational models are able to compactly represent large-scale problems by capturing repeated patterns and relations between domain entities; e.g., a friend of a smoker is likely to be a smoker as well. A specialized class of inference algorithms, called "lifted inference" algorithms, allows inference to be applied directly to probabilistic relational models, and has proven to be vastly superior to the alternative, where a relational model is grounded into a large (and exponentially less efficient) equivalent propositional model. The talk will include:
(i) a brief introduction to Variable Elimination, an exact inference algorithm for propositional models;
(ii) an introduction to relational probabilistic models;
(iii) an overview of First-Order Variable Elimination, an exact inference algorithm for relational probabilistic models;
(iv) a brief overview of our contribution.
Joint work with Prof. Ronen Brafman.

Fall 2012/2013 Semester

TBA
Thu, 07 Feb 2013, 10:00
Speaker: Edo Liberty, Yahoo!

Convergence rate of coordinate descent for MAP
Thu, 24 Jan 2013, 10:00
Speaker: Ofer Meshi, The Hebrew University
Finding maximum a posteriori (MAP) assignments in graphical models is an important task in many applications. Since the problem is generally hard, linear programming (LP) relaxations are often used. Solving these relaxations efficiently is thus an important practical problem. In recent years, several authors have proposed message passing updates corresponding to coordinate descent in the dual LP. However, these are generally not guaranteed to converge to a global optimum. One approach to remedy this is to smooth the LP and perform coordinate descent on the smoothed dual. However, little was known about the convergence rate of this procedure. We perform a thorough rate analysis of such schemes and derive primal and dual convergence rates.
We also provide a simple dual-to-primal mapping that yields feasible primal solutions with a guaranteed rate of convergence. This is joint work with Tommi Jaakkola and Amir Globerson. Presented at NIPS 2012.

On Measure Transformed Canonical Correlation Analysis
Thu, 17 Jan 2013, 10:00
Speaker: Koby Todros, University of Michigan
In this work, linear canonical correlation analysis (LCCA) is generalized by applying a structured transform to the joint probability distribution of the considered pair of random vectors, i.e., a transformation of the joint probability measure defined on their joint observation space. This framework, called measure transformed canonical correlation analysis (MTCCA), applies LCCA to the data after transformation of the joint probability measure. We show that a judicious choice of the transform leads to a modified canonical correlation analysis which, in contrast to LCCA, is capable of detecting non-linear relationships between the considered pair of random vectors. Unlike kernel canonical correlation analysis, where the transformation is applied to the random vectors, in MTCCA the transformation is applied to their joint probability distribution. This results in performance advantages and reduced implementation complexity. The proposed approach is illustrated for graphical model selection in simulated data having non-linear dependencies, and for measuring long-term associations between companies traded in the NASDAQ and NYSE stock markets.

Nonlinear Signal Processing Based on Empirical Intrinsic Geometry
Thu, 10 Jan 2013, 10:00
Speaker: Ronen Talmon, Yale University
In this talk, I will present a method for nonlinear signal processing based on empirical intrinsic geometry (EIG). This method provides a convenient framework for combining geometric and statistical analysis and incorporates concepts from information geometry.
Unlike classic information geometry, which assumes known probabilistic models, we empirically infer an intrinsic model of distribution estimates, while maintaining similar theoretical guarantees. The key observation is that the probability distributions of signals, rather than specific realizations, uncover relevant geometric information. The proposed modeling exhibits two important properties which demonstrate its advantage compared to common geometric algorithms. We show that our model is noise resilient and invariant under different observation and instrumental modalities. In addition, we show that it can be extended efficiently to newly acquired measurements in a sequential manner. These two properties make the proposed model especially suitable for signal processing. We revisit the Bayesian approach and incorporate statistical dynamics and empirical intrinsic geometric models into a unified nonlinear filtering framework. We then apply the proposed method to nonlinear and non-Gaussian filtering problems. In addition, we show applications to biomedical signal analysis and acoustic signal processing.

Multi-Target Radar Detection with Almost Linear Complexity
Sun, 06 Jan 2013, 14:00
Speaker: Alexander Fish, University of Wisconsin
We would like to know the distances to moving objects and their velocities; the radar system is built to fulfill this task. The radar transmits a waveform S, which bounces back from the objects, and the echo R is received. In practice we can work in the digital model, namely S and R are sequences of N complex numbers (e.g., N = 1023). The radar problem is: design S, and an effective method of extracting, using S and R, the distances and velocities of all targets. In many applications the sequences S currently in use are pseudo-random, and the algorithm they support takes O(N²logN) arithmetic operations. In the lecture we will introduce the Heisenberg sequences, and a much faster detection algorithm called the Cross Method.
It solves the radar problem in O(NlogN + m²) operations for m objects. This is joint work with S. Gurevich (Math, Madison), A. Sayeed (EE, Madison), K. Scheim (General Motors, Herzeliya), and O. Schwartz (EECS, Berkeley).

Causal Reasoning and Learning Systems
Thu, 03 Jan 2013, 10:00
Speaker: Elon Portugaly, Microsoft Research Cambridge / Bing Ads
In complex real-world systems, machine learning is used to influence actions, rather than just provide predictions. Those actions in turn influence the environment of the system. The goal of machine learning in these systems is therefore causal rather than correlational. E.g., what would be the survival chance of patient A if we gave them drug B (a causal question), versus what is the survival chance of patient A knowing that they were given drug B (a correlational question). By injecting noise into the actions taken by the system, we can collect data that allows us to infer causality and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select changes that improve both the short-term and long-term performance of such systems. This work provides a framework (which can be viewed as a generalization of the A/B testing framework) for counterfactual causal inference in complex systems. Parts of this framework were implemented in Microsoft's Bing search advertising system, and I will show data from this implementation.

Distributed Learning and Communication Complexity
Thu, 27 Dec 2012, 10:00
Speaker: Yonatan Kibarski, The Hebrew University
Consider a dataset distributed over several computers around the world, each containing several terabytes. We describe a learning problem that learns a hypothesis over the unified dataset, and show general bounds on the amount of data transfer as well as specific efficient algorithms. Paper by Maria-Florina Balcan, Avrim Blum, Shai Fine, and Yishay Mansour.
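The noise-injection idea in Elon Portugaly's abstract above (randomize actions, then reason counterfactually from the logs) is commonly made concrete with an inverse-propensity-score estimator. The sketch below is a generic illustration of that standard estimator under simplifying assumptions (logged propensities, fixed reward distribution), not the framework from the talk; all names are hypothetical.

```python
def ips_estimate(logs, target_policy):
    """Inverse-propensity-score estimate of a new policy's average reward.

    logs: list of (context, action, reward, propensity) tuples collected
    by a system that randomized its actions (the "injected noise");
    propensity is the probability the logging system took that action.
    target_policy(context, action) -> probability that the candidate
    policy would take `action` in `context`.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # reweight each logged outcome by how much more (or less) likely
        # the candidate policy is to take the logged action
        total += reward * target_policy(context, action) / propensity
    return total / len(logs)
```

Because the logging system randomized, the reweighted average is an unbiased answer to the causal question "what reward would the candidate policy have earned?", without ever deploying it.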
Robust Subspace Modeling
Thu, 20 Dec 2012, 10:00
Speaker: Gilad Lerman, University of Minnesota
Consider a dataset of vector-valued observations that consists of a modest number of noisy inliers, which are explained well by a low-dimensional subspace, along with a large number of outliers, which have no linear structure. We describe a convex optimization problem that can reliably fit a low-dimensional model to this type of data. When the inliers are contained in a low-dimensional subspace, we provide a rigorous theory that describes when this optimization can recover the subspace exactly. We present an efficient algorithm for solving this optimization problem, whose computational cost is comparable to that of the non-truncated SVD. We also show that the sample complexity of the proposed subspace recovery is of the same order as PCA subspace recovery, and we consequently obtain some nontrivial robustness to noise. This presentation is based on three joint works: (1) with Teng Zhang, (2) with Michael McCoy, Joel Tropp and Teng Zhang, and (3) with Matthew Coudron.

Learning patterns in Big Data from small data using core-sets
Thu, 13 Dec 2012, 10:00
Speaker: Dan Feldman, MIT
When we need to solve an optimization problem, we usually use the best available algorithm/software or try to improve it. In recent years we have started exploring a different approach: instead of improving the algorithm, reduce the input data and run the existing algorithm on the reduced data to obtain the desired output much faster, on a streaming input, using a manageable amount of memory, and in parallel (say, using Hadoop, a cloud service, or GPUs). A core-set for a given problem is a semantic compression of its input, in the sense that a solution for the problem with the (small) core-set as input yields a provable approximate solution to the problem with the original (Big) data.
In this talk I will describe how we applied this magical paradigm to obtain algorithmic achievements with performance guarantees in iDiary: a system that combines sensor networks, computer vision, differential privacy, and text mining. It turns large signals collected from smartphone or robot sensors into textual descriptions of the trajectories. The system features a user interface similar to Google Search that allows users to type text queries about their activities (e.g., "Where did I have dinner last time I visited Paris?") and receive textual answers based on their signals.

No talk this week due to NIPS
Thu, 06 Dec 2012

What Cannot be Learned with Bethe Approximations
Thu, 22 Nov 2012, 10:00
Speaker: Uri Heinemann, The Hebrew University

Learning the experts for online sequence prediction
Thu, 08 Nov 2012, 10:00
Speaker: Elad Eban, The Hebrew University

Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs
Thu, 01 Nov 2012, 10:00
Speaker: Aharon Birnbaum, The Hebrew University

Learnability beyond uniform convergence
Thu, 25 Oct 2012, 10:00
Speaker: Shai Shalev-Shwartz, The Hebrew University

Spring 2011/2012 Semester

Efficient Inference and Learning with Cardinality-like Potentials
Wed, 20 Jun 2012, 14:00
Speaker: Danny Tarlow, University of Toronto
Special talk

Structured Prediction Cascades
Thu, 14 Jun 2012, 10:15
Speaker: Aviv Peretz, The Hebrew University
Student talk

A Spectral Algorithm for Learning Hidden Markov Models
Thu, 07 Jun 2012, 10:15
Speaker: Hillel Taub-Tabib, The Hebrew University
Student talk

What happens when design students hear about machine learning
Thu, 31 May 2012, 10:15
Speaker: Michael Fink, Google

Pricing Derivatives Using Regret Minimization
Thu, 24 May 2012, 10:15
Speaker: Eyal Gofer, Tel Aviv University

Log-sum-exp, convexity and covariance estimation
Thu, 17 May 2012, 10:15
Speaker: Ami Wiesel, The Hebrew University

Agnostic Active Learning
Thu, 10 May 2012, 10:15
Speaker: Noam Horev , The Hebrew UniversityStudent Talk   Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial GrammarsThu, 03 May 2012, 10:15 Speaker: Miri Cohen , The Hebrew UniversityStudent Talk   Norm-based Adaboost-like algorithmsThu, 19 Apr 2012, 10:15 Speaker: Dan Gutfreund , IBM Research   The Isotron Algorithm: High-Dimensional Isotonic RegressionThu, 29 Mar 2012, 10:15 Speaker: Hila Gonen , The Hebrew UniversityStudent Talk   Globally Optimizing Graph Partitioning Problems Using Message PassingThu, 22 Mar 2012, 10:15 Speaker: Elad Mezuman , The Hebrew University   Bounded Planning in Passive POMDPsThu, 15 Mar 2012, 10:15 Speaker: Roy Fox , The Hebrew University Fall 2011/2012 Semester   Combining One-Class Classifiers via Meta-LearningThu, 02 Feb 2012, 10:15 Speaker: Eitan Menahem , BGU   Learning Large Scale Music RecommendationsThu, 26 Jan 2012, 10:15 Speaker: Noam Koenigstein , Tel Aviv University   GraphLab: Parallel Large Scale Machine LearningThu, 19 Jan 2012, 10:15 Speaker: Danny Bickson , Carnegie Mellon University   TBAWed, 18 Jan 2012, 15:00 Speaker: Greg Shakhnarovich , Toyota Technological Institute at ChicagoNote the special day   Active Learning under Margin AssumptionsSun, 15 Jan 2012, 13:00 Speaker: Sivan Sabato , The Hebrew UniversityNote the special day   Decomposing Documents into Authorial ThreadsThu, 12 Jan 2012, 10:15 Speaker: Moshe Koppel , Bar-Ilan University   CancelledWed, 11 Jan 2012, 15:00 Speaker: Ilya Sutskever , Univ. of Toronto -> Stanfordcancelled   Programming by Example: from Pipe Dreams to the FutureWed, 04 Jan 2012, 15:00 Speaker: Adam Kalai , Microsoft ResearchNote the special day   The Median HypothesisThu, 22 Dec 2011, 10:15 Speaker: Ran Gilad-Bachrach , Microsoft Research   Graph-guided Importance samplingThu, 08 Dec 2011, 10:15 Speaker: Rina Dechter , University of California Irvine (UCI)Joint work with Vibhav Gogate. Rina is visiting HUJI for the year.   
Fast Effective Clustering for Graphs and Documents Thu, 01 Dec 2011, 10:15 Speaker: William Cohen , Carnegie Mellon University   Architecture 101 for Machine Learning Thu, 24 Nov 2011, 10:15 Speaker: Uri Weiser , The Technion   Learning a Classifier When the Labeling Is KnownThu, 17 Nov 2011, 10:15 Speaker: Shalev Ben David , MITSecond of the week   Optimization, Learning and the Universality of Mirror DescentWed, 16 Nov 2011, 15:00 Speaker: Nathan Srebro , Toyota Technological Institute at ChicagoNotice the special date and time!   Multiclass learnability and the ERM principleThu, 10 Nov 2011, 10:15 Speaker: Amit Daniely , The Hebrew University   Fast Effective Clustering for Graphs and Documents Thu, 03 Nov 2011, 10:15 Speaker: William Cohen , Carnegie Mellon Universitypostponed to Dec. 1st   Domain Adaptation - Can Quantity Compensate for Quality?Thu, 03 Nov 2011, 10:15 Speaker: Ruth Urner , The University of Waterloo Spring 2010/2011 Semester   Simplified PAC-Bayesian Margin BoundsThu, 26 May 2011, 10:15 Speaker: Alon Gonen , The Hebrew UniversityStudent Talk   Unsupervised Supervised Learning: Who Needs Labels Anyway? 
Thu, 19 May 2011, 10:15 Speaker: Guy Lebanon , Georgia Institute of Technology   Learning rates and principled reward discounting in RLThu, 12 May 2011, 10:15 Speaker: Tali Tishby , The Hebrew University   Sublinear optimization for machine learning Thu, 05 May 2011, 10:15 Speaker: Elad Hazan , Technion - Israel Institute of Technology     Learning Quickly When Irrelevant Attributes AboundThu, 31 Mar 2011, 10:15 Speaker: Amit Beka , The Hebrew University   Tight Sample Complexity of Large-Margin LearningThu, 24 Mar 2011, 10:15 Speaker: Sivan Sabato , The Hebrew University   Efficient learning for structured predictionThu, 17 Mar 2011, 10:15 Speaker: Ofer Meshi , The Hebrew University   Efficient blue-noise sampling using multiscale MCMC samplerThu, 10 Mar 2011, 10:15 Speaker: Raanan Fattal , The Hebrew University     Convergent Message Passing Algorithms - A Unifying ViewThu, 17 Feb 2011, 10:15 Speaker: Talya Meltzer , The Hebrew University Fall 2010/2011 Semester   Multi-view learning of speech feature spacesThu, 13 Jan 2011, 10:15 Speaker: Karen Livescu , TTI, Chicago     Robust High-dimensional Principal Component AnalysisThu, 30 Dec 2010, 10:15 Speaker: Shie Mannor , The Technion.   Optimal Distributed Online Prediction using Mini-BatchesThu, 23 Dec 2010, 10:15 Speaker: Ran Gilad-Bachrach , Microsoft.   Semi-Supervised Learning - A transductive graph-based algorithm and a theoretical modelThu, 16 Dec 2010, 10:15 Speaker: Nir Rosenfeld , HUJIStudent seminar talk   A Tale of Two NormsThu, 02 Dec 2010, 10:15 Speaker: Nathan Srebro , TTI, Chicago   Natural Image Denoising: Optimality and Inherent BoundsThu, 25 Nov 2010, 10:15 Speaker: Anat Levin , Weizmann Institute of Science   Evaluating Recommender SystemsThu, 18 Nov 2010, 10:15 Speaker: Guy Shani , Ben Gurion University   Approximated Learning and Inference in Large Scale Graphical ModelsWed, 17 Nov 2010, 13:00 Speaker: Tamir Hazan , TTI, ChicagoNote special date and time!   
General Classes of Performance Bounds for Parameter EstimationThu, 11 Nov 2010, 10:15 Speaker: Koby Todros , Ben Gurion University   Uncertainty Estimates, Classification and Active LearningThu, 04 Nov 2010, 10:15 Speaker: Boaz Nadler , Weizmann Institute of Science     Machine Learning in the Data Revolution EraThu, 14 Oct 2010, 10:15 Speaker: Shai Shalev-Shwartz , The Hebrew University Spring 2009/2010 Semester   Which clustering algorithm should I use? Towards principled guidelinesThu, 07 Oct 2010, 10:15 Speaker: Shai Ben-David , University of Waterloo   Continuous-Time Belief PropagationThu, 17 Jun 2010, 10:15 Speaker: Tal El-Hay , The Hebrew University   Copula NetworksThu, 10 Jun 2010, 10:15 Speaker: Gal Elidan , The Hebrew University   The value of future-information – the missing piece in reinforcement learningThu, 03 Jun 2010, 10:15 Speaker: Naftali Tishby , The Hebrew University   Query by CommitteeThu, 27 May 2010, 10:15 Speaker: Adam Nitzan , The Hebrew University   Learning and Domain AdaptationThu, 13 May 2010, 10:15 Speaker: Yishay Mansour , Tel-Aviv University   Learning to Predict by the Methods of Temporal DifferencesThu, 06 May 2010, 10:15 Speaker: Inbal Marhaim , The Hebrew University   Learning Linear Classifiers with ConfidenceThu, 29 Apr 2010, 10:15 Speaker: Koby Crammer , The Technion   When Quantity Makes Quality: Learning with Information ConstraintsThu, 22 Apr 2010, 10:15 Speaker: Ohad Shamir , The Hebrew University   A Joint Factor-Network Model for Social Networks EvolutionThu, 15 Apr 2010, 10:15 Speaker: Amit Gruber , University of Toronto   Locality Sensitive HashingThu, 18 Mar 2010, 10:15 Speaker: Roie Kliper , The Hebrew University   Long time asymptotic in Hidden Markov ModelsThu, 11 Mar 2010, 10:15 Speaker: Pavel Chigansky , The Hebrew University   Belief Propagation and BeyondThu, 04 Mar 2010, 10:15 Speaker: Michael Chertkov , Los Alamos National Laboratory   Learning from Labeled and Unlabeled Data, Global vs. 
Multiscale Approaches.Thu, 25 Feb 2010, 10:15 Speaker: Boaz Nadler , The Weizmann Institute Fall 2009/2010 Semester   Combining One-Class Classifiers via Meta-LearningTue, 02 Feb 2010, 10:15 Speaker: Eitan Menahem , Ben-Gurion University   The Multiarmed Bandit ProblemThu, 14 Jan 2010, 10:15 Speaker: Tomer Ezra , The Hebrew University   Combining Labeled and Unlabeled Examples with Co-TrainingThu, 07 Jan 2010, 10:15 Speaker: Cobi Cario , The Hebrew University   Recent Advances in Solving Combinatorial Optimization Tasks over Graphical Models Thu, 31 Dec 2009, 10:15 Speaker: Rina Dechter , UC Irvine   A Belief-Propagation Algorithm for Constrained Linear Least-Squares ProblemsThu, 24 Dec 2009, 10:15 Speaker: Jacob Goldberger , Bar-Ilan University   No Free Lunch theorems for Domain Adaptation LearningThu, 17 Dec 2009, 10:15 Speaker: Shai Ben-David , University of Waterloo   From "GooGoo images" to "Big Bird" - 10 Interactive Projects Bridging Machine Learning and Design Thu, 10 Dec 2009, 10:15 Speaker: Michael Fink , Google   The "tree-dependent components" of natural scenes are edge filtersThu, 03 Dec 2009, 10:15 Speaker: Daniel Zoran , The Hebrew University   An LP View of the M-best MAP problemThu, 26 Nov 2009, 10:15 Speaker: Menachem Fromer , The Hebrew University   Learning in Real-Time and Adaptivity of Online AlgorithmsThu, 19 Nov 2009, 10:15 Speaker: Elad Hazan , IBM     Modeling Natural Image Statistics with Markov Random FieldsThu, 05 Nov 2009, 10:15 Speaker: Urs Koster , University of Helsinki   Efficient Learning of Sparse PredictorsThu, 29 Oct 2009, 10:15 Speaker: Shai Shalev-Shwartz , The Hebrew University   Reducing Label Complexity by Learning From BagsThu, 22 Oct 2009, 10:15 Speaker: Sivan Sabato , The Hebrew University Spring 2008/2009 Semester   Probabilistic Models for Holistic Scene UnderstandingThu, 25 Jun 2009, 10:15 Speaker: Daphne Koller , Stanford 
University   Markov Logic NetworksThu, 18 Jun 2009, 10:15 Speaker: Gil Ben-Zvi , The Hebrew University   Algorithms for Transfer Learning and Semi-Supervised Learning (A review of a paper by Ando & Zhang)Thu, 04 Jun 2009, 10:15 Speaker: Ido Ginodi , The Hebrew University   Information theoretic model validation in clusteringTue, 19 May 2009, 10:15 Speaker: Joachim Buhmann , ETH Zurich(Special Date)   Mean Field Variational Approximation for Continuous-Time Bayesian NetworksThu, 14 May 2009, 10:15 Speaker: Ido Cohn , The Hebrew University   Generalization in LearningThu, 07 May 2009, 10:15 Speaker: Yevgeny Seldin , The Hebrew University   Generalization in Reinforcement LearningThu, 23 Apr 2009, 10:15 Speaker: Peter Stone , The University of Texas at Austin   Learning similarities on a large scale through rankingThu, 02 Apr 2009, 10:15 Speaker: Gal Chechik , Google   Energy Minimization Methods and Graph CutsThu, 26 Mar 2009, 10:15 Speaker: Danny Rosenberg , The Hebrew University     PostponedThu, 12 Mar 2009, 10:15 Speaker: Yevgeny Seldin , The Hebrew University Fall 2008/2009 Semester   Revealing Modularity and Organization in Biological Networks Using Bi-clusteringThu, 05 Feb 2009, 10:15 Speaker: Asaf Pe'er , The Hebrew University   Stochastic Convex OptimizationThu, 29 Jan 2009, 10:15 Speaker: Nati Srebro , Toyota Technological Institute - Chicago     All Learning Is RobustThu, 15 Jan 2009, 10:15 Speaker: Shie Mannor , The Technion   Support Vector Machines with a Reject OptionThu, 08 Jan 2009, 10:15 Speaker: Joseph Keshet , IDIAP Research Institute   Nonlinearity, memory, and phase transitions in learningThu, 01 Jan 2009, 10:15 Speaker: Ilya Nemenman , Los Alamos National Laboratory   Tightening LP Relaxations for MAP using Message PassingThu, 18 Dec 2008, 10:15 Speaker: Talya Meltzer , The Hebrew University   Machine Learning of Evaluation (with Applications to Computer Chess)Thu, 04 Dec 2008, 10:15 Speaker: Amir Ban , The Hebrew University   From Co-Occurrence 
to CorrespondenceThu, 27 Nov 2008, 10:15 Speaker: Ben Taskar , University of Pennsylvania   Learning Bounded Treewidth Bayesian NetworksThu, 13 Nov 2008, 10:15 Speaker: Gal Elidan , The Hebrew University Spring 2007/2008 Semester   * Pre-ICML-UAI-COLT talks - Part IIITue, 01 Jul 2008, 10:30 Speaker: Amit Gruber, Ohad Shamir, Tamir Hazan , The Hebrew UniversityNote special day and time!   * Non-Parametric Modeling and Visualization of Partially Ranked DataMon, 30 Jun 2008, 10:15 Speaker: Guy Lebanon , Purdue UniversityNote special day!   * Pre-ICML-UAI-COLT talks - Part ISun, 29 Jun 2008, 10:00 Speaker: Tal El-Hay , The Hebrew UniversityNote special day and time!   * Pre-ICML-UAI-COLT talks - Part IISun, 29 Jun 2008, 16:15 Speaker: Yevgeny Seldin, Ohad Shamir , The Hebrew UniversityNote special day and time!   Active Sampling for Multiple Output IdentificationThu, 26 Jun 2008, 10:15 Speaker: Oded Margalit , IBM   Latent Topic Models for HypertextThu, 19 Jun 2008, 10:15 Speaker: Amit Gruber , The Hebrew University     Multiple Instance Learning for CAD (Computer Aided Diagnosis)Thu, 15 May 2008, 10:15 Speaker: Inna Stainvas , Siemens Fall 2007/2008 Semester   On the Exchange Rate between Equivalence Constraints and LabelsThu, 01 May 2008, 10:15 Speaker: Liat Ein-Dor , Intel   Selective-PAC learningThu, 17 Apr 2008, 10:15 Speaker: Yair Wiener , Technion   Robust Real Time Pattern Matching using Bayesian Sequential Hypothesis TestingThu, 10 Apr 2008, 10:15 Speaker: Ofir Pele , The Hebrew University     Online Prediction: The Role of Convexity and RandomizationThu, 20 Mar 2008, 10:15 Speaker: Shai Shalev-Shwartz , TTI     * Does a large data-set mean more, or less, work?Sun, 24 Feb 2008, 10:15 Speaker: Nathan Srebro , Toyota Technological Institute at ChicagoJoint learning-vision seminar (in Ross 201).     
Learning in High Dimensions, Sparsity and TreeletsThu, 24 Jan 2008, 10:15 Speaker: Boaz Nadler , Weizmann Institute of Science   * The Empirical Minimization Algorithm - Upper and Lower BoundsWed, 23 Jan 2008, 10:30 Speaker: Shahar Mendelson , ANU, TechnionJoint learning-theory seminar (in Ross 201).     On LASSO type estimatorsThu, 10 Jan 2008, 10:15 Speaker: Yaacov Ritov , The Hebrew University   Robust Inference in Bayesian Networks, with Application to Gene Expression Time-Series DataThu, 03 Jan 2008, 10:15 Speaker: Omer Berkman , Tel-Aviv University   A perceptron-like algorithm for regressionThu, 27 Dec 2007, 10:15 Speaker: Adam Kalai , Georgia Tech   Effective Optimization Procedures for Machine Learning Based on Data-dependent Error BoundsThu, 20 Dec 2007, 10:15 Speaker: Dori Peleg , Technion   Protein Structuring from Random Cryo-EM ImagesThu, 13 Dec 2007, 10:15 Speaker: Yoel Shkolnisky , Yale University   Data Clustering and Stability for Finite SamplesThu, 29 Nov 2007, 10:15 Speaker: Ohad Shamir , The Hebrew University   Continuous Time Markov NetworksThu, 22 Nov 2007, 10:15 Speaker: Tal El-Hay , The Hebrew University   The tempotron: applications to visual and auditory processingThu, 15 Nov 2007, 10:15 Speaker: Robert Guetig , The Hebrew University     Nonnegative Sparse PCAThu, 01 Nov 2007, 10:15 Speaker: Ron Zass , The Hebrew University   Some thoughts on the mathematical foundations of machine learningThu, 25 Oct 2007, 10:15 Speaker: Nati Linial , The Hebrew University * indicates a special talk, not on the regular time-slot or place