Object Recognition

The ability to recognize 3D objects from 2D images is one that any young child possesses, yet computers still cannot do it well. The main problem is that the appearance of a given object in an image depends on many unknown factors of the imaging process, such as camera viewpoint and lighting. Given an image, we would like to select the object model that is most similar to it. Given an object, we would like to know the optimal way of representing it for this recognition task. We provide rigorous mathematical and statistical definitions of image similarity and view likelihood. We also define geometric, model-based invariants and constraints for a given object under camera transformations. Based on these constraints, we have also developed a more efficient way of indexing into our object database.

Image Similarity

Psychophysical experiments show that humans have clear criteria for deciding when two images are similar, or when an image could originate from a given object. Our framework is as follows. We have a model of our object and an image of some unknown object. The image is recognized as our object if there exists a viewpoint from which the model "reasonably" aligns with the image. When are two images similar? When are an image and an object similar? We replace the vague words "reasonably" and "similar" with rigorous mathematical measures that assess the similarity between objects and images under different assumptions. For example, between an object and an image we define a transformation metric that penalizes the non-rigidity of the optimal affine deformation of the object, that is, the deformation that best aligns the model with the image.
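The Python sketch below is a rough illustration of the idea, not the metric defined in the Basri and Weinshall paper. It fits the least-squares affine camera `A`, `t` that maps 3D model points to 2D image points with known correspondences. It then scores how far `A` is from a scaled orthographic (rigid) projection. The helper names are hypothetical, and so is the particular penalty: (l1 - l2) / (l1 + l2) for the eigenvalues l1 >= l2 of A A^T. This penalty is 0 when the two rows of `A` are orthogonal with equal norms, and it grows toward 1 as the fitted deformation becomes less rigid.

```python
import math

def _inv3(m):
    """Inverse of a 3x3 matrix via its adjugate; m must be non-singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]

def fit_affine(model, image):
    """Least-squares affine camera with image_k ~ A @ model_k + t (A is 2x3).

    model: 3D points; image: corresponding 2D points (correspondence assumed known).
    """
    n = len(model)
    mc = [sum(p[j] for p in model) / n for j in range(3)]  # model centroid
    ic = [sum(q[j] for q in image) / n for j in range(2)]  # image centroid
    X = [[p[j] - mc[j] for j in range(3)] for p in model]
    x = [[q[j] - ic[j] for j in range(2)] for q in image]
    # Normal equations on centered data: A = (x^T X)(X^T X)^{-1}.
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xtX = [[sum(x[k][i] * X[k][j] for k in range(n)) for j in range(3)] for i in range(2)]
    inv = _inv3(XtX)
    A = [[sum(xtX[i][k] * inv[k][j] for k in range(3)) for j in range(3)] for i in range(2)]
    t = [ic[i] - sum(A[i][j] * mc[j] for j in range(3)) for i in range(2)]
    return A, t

def non_rigidity(A):
    """(l1 - l2) / (l1 + l2) for the eigenvalues l1 >= l2 of A A^T.

    Zero when the rows of A are orthogonal with equal norms (scaled orthographic
    projection); approaches 1 as the affine map degenerates.
    """
    r1, r2 = A
    p = sum(u * u for u in r1)
    q = sum(u * u for u in r2)
    s = sum(u * v for u, v in zip(r1, r2))
    return math.sqrt((p - q) ** 2 + 4 * s * s) / (p + q)

def transformation_penalty(model, image):
    """Hypothetical score: non-rigidity of the best-fitting affine camera."""
    A, _ = fit_affine(model, image)
    return non_rigidity(A)
```

Both point sets are centered, which removes the translation. The fit therefore needs at least four non-coplanar model points so that X^T X is invertible. A projection that stretches one image axis by a factor of 2 relative to a rigid one scores 0.6.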
When an image is compared against a database of objects, many interpretations are plausible. We developed a general framework, based on maximum likelihood, to deal with this ambiguity. We define view likelihood, the probability that a certain view of a given object is observed, and view stability, a measure of how little the image changes as the viewpoint moves. We plug our image similarity metric into an algorithm that evaluates these new, robust measures for recognition. Finally, we can use this likelihood framework to increase the robustness of our object recognition systems.
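Both notions can be estimated numerically. The sketch below is a Monte Carlo illustration under simplifying assumptions of its own, not the definitions used in the view-likelihood paper. It assumes orthographic projection and labeled points. Its image "signature" is the list of projected squared pairwise distances, which ignores image-plane rotation and translation. The hypothetical `view_likelihood` is the fraction of uniformly sampled viewing directions whose image falls within `eps` of the reference image. The hypothetical `view_instability` is the mean image change under small random perturbations of the viewing direction, so a lower value means a more stable view.

```python
import math
import random

def image_signature(points, v):
    """Squared pairwise distances after orthographic projection along unit vector v.

    Invariant to image-plane rotation and translation; point labels are assumed known.
    """
    sig = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = [a - b for a, b in zip(points[i], points[j])]
            along = sum(di * vi for di, vi in zip(d, v))
            sig.append(sum(di * di for di in d) - along * along)
    return sig

def image_distance(s1, s2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

def random_direction(rng):
    """Uniform sample on the unit sphere (normalized Gaussian vector)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(x * x for x in g))
        if n > 1e-12:
            return [x / n for x in g]

def view_likelihood(points, v, eps, samples=2000, seed=0):
    """Fraction of random viewing directions whose image is within eps of the view from v."""
    rng = random.Random(seed)
    ref = image_signature(points, v)
    hits = sum(image_distance(image_signature(points, random_direction(rng)), ref) < eps
               for _ in range(samples))
    return hits / samples

def view_instability(points, v, delta, samples=200, seed=0):
    """Mean image change when v is perturbed by roughly delta; lower means more stable."""
    rng = random.Random(seed)
    ref = image_signature(points, v)
    total = 0.0
    for _ in range(samples):
        g = random_direction(rng)
        w = [vi + delta * gi for vi, gi in zip(v, g)]
        n = math.sqrt(sum(x * x for x in w))
        total += image_distance(image_signature(points, [x / n for x in w]), ref)
    return total / samples
```

For a flat object, the frontal view changes only to second order under small tilts. Its instability is therefore much lower than that of an edge-on view, which matches the intuition that stable views are the ones that survive small changes of viewpoint.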
To apply the theory of view stability and likelihood, and to detect the canonical views of an object (its most stable and likely views), we need a similarity measure between images. We define several similarity measures for silhouettes of curved objects, where shape is the only information available in the image (color and texture cues are missing). More recently, a new similarity measure based on partial curve matching has been defined. We show how this measure can be used to learn representative views from shape examples.
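The sketch below illustrates partial curve matching with a generic technique in place of the measure in the cited ECCV-96 paper. Each closed silhouette (a polygon) is reduced to its sequence of turning angles. Two sequences are then compared by a Smith-Waterman-style local alignment, so a good match between parts of the contours is enough. One sequence is doubled so that a match can wrap around the arbitrary starting point of a closed contour. The hypothetical `representative` then picks the example with the largest total similarity to the others (a medoid), as one simple way of choosing a representative view. The scoring function and gap penalty are illustrative assumptions.

```python
import math

def turning_angles(polygon):
    """Signed turning angle at each vertex of a closed polygon given as (x, y) tuples."""
    n = len(polygon)
    angles = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        ux, uy, wx, wy = x1 - x0, y1 - y0, x2 - x1, y2 - y1
        angles.append(math.atan2(ux * wy - uy * wx, ux * wx + uy * wy))
    return angles

def local_alignment(a, b, gap=0.5):
    """Best local-alignment score between any contiguous parts of sequences a and b."""
    best = 0.0
    prev = [0.0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            match = 1.0 - abs(a[i - 1] - b[j - 1]) / math.pi  # in [-1, 1]
            cur[j] = max(0.0, prev[j - 1] + match, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best

def silhouette_similarity(p, q):
    """Partial-match similarity in [0, 1]; doubling lets a match wrap around the contour."""
    a, b = turning_angles(p), turning_angles(q)
    return max(local_alignment(a + a, b) / len(b), local_alignment(b + b, a) / len(a))

def representative(silhouettes):
    """Index of the example with the largest total similarity to all others (a medoid)."""
    scores = [sum(silhouette_similarity(s, t) for j, t in enumerate(silhouettes) if j != i)
              for i, s in enumerate(silhouettes)]
    return max(range(len(silhouettes)), key=scores.__getitem__)
```

Turning angles ignore edge lengths, so this sketch is only meaningful for contours resampled at roughly uniform arc length. A unit square and a scaled square match perfectly (similarity 1.0), while a square and an equilateral triangle score about 0.83.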

Tracing Birds

Selected Papers

Ronen Basri and Daphna Weinshall. Distance Metric between 3D Models and 2D Images for Recognition and Classification. T-PAMI 18(4), 1996.

Michael Werman and Daphna Weinshall. Similarity and Affine Invariant Distance Between Point Sets. T-PAMI 17(8), August 1995.

Daphna Weinshall and Michael Werman. On View Likelihood and Stability. To appear in T-PAMI.

Daphna Weinshall and Michael Werman. A computational theory of canonical views. ARPA Image Understanding Workshop, February 1996.

Yoram Gdalyahu and Daphna Weinshall. Measures for Silhouettes Resemblance and Representative Silhouettes of Curved Objects. ECCV-96, Cambridge, April 1996.

Object Invariants and Reconstruction

To understand shape, for the purposes of both recognition and reconstruction, we study the relation between a 3D shape and its 2D projections. Our goal is to find constraints between model points and image measurements that are independent of the imaging parameters. These relations are termed model-based invariants. We search for these relations, and for their optimal representations, with the aim of building efficient recognition and reconstruction algorithms.
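The sketch below shows a textbook model-based invariant. It assumes an affine camera x = A X + t; this is an assumption of the sketch and not necessarily the camera model of the cited papers. Write a model point in affine coordinates (a, b, c) relative to four basis points P0..P3. Its image must then satisfy x = x0 + a(x1 - x0) + b(x2 - x0) + c(x3 - x0) for every A and t, so the constraint involves only model quantities and image measurements. The code computes (a, b, c) from the model and reports the largest violation of this constraint in a given image. The function names are hypothetical.

```python
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def affine_coords(model, k):
    """(a, b, c) with model[k] = P0 + a(P1-P0) + b(P2-P0) + c(P3-P0), by Cramer's rule."""
    p0 = model[0]
    M = [[model[j][r] - p0[r] for j in (1, 2, 3)] for r in range(3)]  # basis as columns
    rhs = [model[k][r] - p0[r] for r in range(3)]
    D = det3(M)  # non-zero when P0..P3 are not coplanar
    return [det3([[rhs[r] if col == j else M[r][col] for col in range(3)] for r in range(3)]) / D
            for j in range(3)]

def invariant_residual(model, image):
    """Largest violation of x_k = x0 + a(x1-x0) + b(x2-x0) + c(x3-x0) over points k >= 4.

    Under any affine camera x = A X + t the residual is zero, whatever A and t are.
    """
    x0 = image[0]
    worst = 0.0
    for k in range(4, len(model)):
        a, b, c = affine_coords(model, k)
        for r in range(2):
            pred = x0[r] + sum(w * (image[j][r] - x0[r]) for w, j in zip((a, b, c), (1, 2, 3)))
            worst = max(worst, abs(image[k][r] - pred))
    return worst
```

A residual near zero means the image is consistent with the model under some affine camera. Moving a single image point off its predicted position raises the residual by the size of the displacement.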
Given a sequence of images of a set of 3D points taken by unknown cameras, two fundamental questions need to be answered:
What is the structure of the set of points in 3D?
What are the positions of the cameras relative to the points?
For a projective camera, we show that these two problems are dual. The imaging of a set of points in space by multiple cameras can be captured by constraint equations involving space points, camera centers and image points, in which the space points and the camera centers play symmetric roles. This formalism, in which space points and camera centers are interchangeable, allows both seemingly different problems to be solved with the same algorithms. For the case of camera centers, the corresponding dual algebraic formalizations are the fundamental matrix, the trilinear tensor and the quadlinear tensor. We can use this approach both in algorithms that reconstruct shape and in algorithms that learn invariant relations for indexing into an object database.
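The sketch below makes the point/camera symmetry concrete. It assumes a reduced camera form obtained by choosing four of the space points as a projective basis in space, and their images as a basis in each image; the exact normalization used in the cited technical report may differ. Under this form, a camera with parameters u = (u1, u2, u3, u4) acts on a homogeneous point through a 3x4 matrix with u1, u2, u3 on the diagonal and u4 in the last column. The check shows that swapping the roles of the camera vector and the point vector leaves the image point unchanged. This is why one algorithm can serve both problems. The helper names and values are hypothetical.

```python
def reduced_camera(u):
    """3x4 matrix with u1, u2, u3 on the diagonal and u4 in the last column (assumed form)."""
    u1, u2, u3, u4 = u
    return [[u1, 0, 0, u4],
            [0, u2, 0, u4],
            [0, 0, u3, u4]]

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

# Hypothetical camera parameters and homogeneous space point.
camera = [2, -1, 3, 5]
point = [1, 4, -2, 1]

# Projecting the point with the camera gives the same image point as
# "projecting" the camera with the point: the roles are interchangeable.
assert mat_vec(reduced_camera(camera), point) == mat_vec(reduced_camera(point), camera)
print(mat_vec(reduced_camera(camera), point))  # [7, 1, -1]
```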

Selected Papers

D. Weinshall, M. Werman, and A. Shashua. Shape Tensors for Efficient and Learnable Indexing. IEEE Workshop on Visual Scene Representation, Boston, June 1995.

Daphna Weinshall. Model-based invariants for 3-D vision. IJCV 10(1), 1993.

Stefan Carlsson and Daphna Weinshall. Dual Computation of Projective Shape and Camera Positions from Multiple Images. HU TR 96-6, 1996.

Indexing

In real-world applications, the task is more often not to decide whether an image shows a specific object, but, given an image, to recognize the corresponding object in a large database. Our approach is to use as an index a shape invariant that can be computed from image measurements. Because of the ambiguity introduced by projecting 3D onto 2D, there is no shape invariant function under which each object corresponds to a single value. However, we can find invariant functions under which the set of all points corresponding to feasible images of an object forms a manifold. For a given invariant function, this manifold is unique to each object, and any image of the object must map to a point on it.
For this method we study different candidate invariant functions, such as the shape tensor, as well as efficient methods to find and represent these manifolds. Then, given an image measurement, we can determine which objects it may represent by finding the manifolds on which the corresponding point lies.
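The sketch below gives one concrete instance of such manifolds. It assumes an affine camera and known point correspondences, and it is not necessarily the shape-tensor manifolds described above. Take image points 0, 1, 2 as a 2D affine basis and record the affine coordinates (alpha_k, beta_k) of the remaining points. Suppose model point k has 3D affine coordinates (a_k, b_k, c_k) relative to model points 0..3. Then alpha = a + u c and beta = b + v c for unknown scalars u, v, where (u, v) are the image coordinates of point 3. Every image of the object therefore lies on a pair of lines, one in alpha-space and one in beta-space. The code stores these lines per object and returns the objects whose lines pass through the query point. All names are hypothetical, and the linear scan stands in for an efficient index structure.

```python
import math

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def model_lines(model):
    """Vectors a, b, c of 3D affine coords of points k >= 3 w.r.t. model points 0..3."""
    p0 = model[0]
    M = [[model[j][r] - p0[r] for j in (1, 2, 3)] for r in range(3)]  # basis as columns
    D = det3(M)
    a, b, c = [], [], []
    for k in range(3, len(model)):
        rhs = [model[k][r] - p0[r] for r in range(3)]
        sol = [det3([[rhs[r] if col == j else M[r][col] for col in range(3)] for r in range(3)]) / D
               for j in range(3)]
        a.append(sol[0])
        b.append(sol[1])
        c.append(sol[2])
    return a, b, c  # c always starts with 1 (point 3), so it is never the zero vector

def image_coords(image):
    """2D affine coords (alpha_k, beta_k) of image points k >= 3 w.r.t. points 0, 1, 2."""
    x0 = image[0]
    e1 = [image[1][r] - x0[r] for r in range(2)]
    e2 = [image[2][r] - x0[r] for r in range(2)]
    d = e1[0] * e2[1] - e1[1] * e2[0]
    alpha, beta = [], []
    for k in range(3, len(image)):
        w = [image[k][r] - x0[r] for r in range(2)]
        alpha.append((w[0] * e2[1] - w[1] * e2[0]) / d)
        beta.append((e1[0] * w[1] - e1[1] * w[0]) / d)
    return alpha, beta

def dist_to_line(y, a, c):
    """Distance from vector y to the line {a + s*c : s real}."""
    r = [yi - ai for yi, ai in zip(y, a)]
    s = sum(ri * ci for ri, ci in zip(r, c)) / sum(ci * ci for ci in c)
    return math.sqrt(sum((ri - s * ci) ** 2 for ri, ci in zip(r, c)))

def lookup(database, image, tol=1e-6):
    """Names of objects whose alpha- and beta-lines both pass near the query point."""
    alpha, beta = image_coords(image)
    return [name for name, (a, b, c) in database.items()
            if max(dist_to_line(alpha, a, c), dist_to_line(beta, b, c)) < tol]
```

Each stored model costs two lines, regardless of how many views of it are expected. With noisy measurements, `tol` would have to reflect the expected measurement error rather than floating-point round-off.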

Selected Papers

D. Weinshall, M. Werman, and A. Shashua. Shape Tensors for Efficient and Learnable Indexing. IEEE Workshop on Visual Scene Representation, Boston, June 1995.

Michael Werman and Daphna Weinshall. Complexity of Indexing: Efficient and Learnable Large Database Indexing. ECCV-96, Cambridge, April 1996.
