The UCCA Resource Webpage

Universal Conceptual Cognitive Annotation (UCCA) is a novel semantic approach to grammatical representation. It was developed in the Computational Linguistics Lab of the Computer Science Department of the Hebrew University by Omri Abend and Ari Rappoport.

The central idea of the project is to analyze and annotate natural languages using purely semantic categories and structure (a graph). Syntactic categories and structure are automatically deduced from the semantic ones using learning algorithms. The basic set of semantic categories (the foundational layer) is inspired by work in linguistic typology, cognitive grammar, and neuroscience.

We have annotated 160K tokens from English Wikipedia with the UCCA scheme, as well as a 30K-token English-French parallel corpus based on Jules Verne's Twenty Thousand Leagues Under the Sea. Work on compiling a German corpus is underway, and pilot studies have been conducted on several other languages as well.

The annotation has so far focused on argument-structure and linkage phenomena. Because the linguistic system is complex, a given text often admits many applicable annotations (cf. R.W. Langacker, A Dynamic Usage-Based Model, 2000). For practical reasons, we select a small set of highly useful distinctions and apply them to provide one plausible annotation.

This page contains links to all of UCCA's downloadable resources. Note that some of the code is hosted on GitHub (in such cases, the repository is referenced). If you use these resources in your research, please cite this paper:

Universal Conceptual Cognitive Annotation (UCCA)
Omri Abend and Ari Rappoport, ACL 2013
[Paper: pdf]


Web Application

The graphical web application used for compiling the UCCA corpora can be found here (to avoid registering, log in as "guest" with the password "tseug"). Tomer Eshet took part in developing the web application.

A newer version of the application, described in detail in this demo paper, can be found here.


Corpora and Annotation Guidelines

The following are the files constituting release 1.0 of the UCCA corpus. The distribution contains about 160K tokens annotated with UCCA's foundational layer. The corpus is taken from English Wikipedia and is released under the Creative Commons Attribution-ShareAlike 3.0 Unported license.

The following is an English-French parallel corpus of about 25K tokens based on the first five chapters of "Twenty Thousand Leagues Under the Sea" by Jules Verne. It is also released under the Creative Commons Attribution-ShareAlike 3.0 Unported license, and was annotated using the same guidelines as the corpus above.

The most up-to-date guidelines are here.


Source Code

The following is Python source code for reading and manipulating UCCA structures. The code was written by Amit Beka and Daniel Hershcovich and is released under the GNU General Public License, version 3.0 or later (license included in the bundle).
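To give a rough sense of what such structures look like, here is a minimal sketch that is not taken from the released code. It models a UCCA-style annotation as a directed acyclic graph whose edges carry semantic category labels, lists the parents of a unit, and checks that the graph has no cycles. The class name `SemanticGraph`, its methods, the node ids, and the category labels in the toy example ("Scene", "Participant", "Process") are hypothetical and chosen for illustration only; they are not the interface of the UCCA Python package, and the toy annotation is not drawn from the corpus.

```python
from collections import defaultdict


class SemanticGraph:
    """Hypothetical container for a UCCA-style annotation: a directed acyclic
    graph whose edges carry semantic category labels. Illustration only,
    NOT the API of the released UCCA Python code."""

    def __init__(self):
        # parent node id -> list of (category label, child node id)
        self.edges = defaultdict(list)

    def add_edge(self, parent, category, child):
        self.edges[parent].append((category, child))

    def nodes(self):
        # Every node that appears as a parent or as a child.
        found = set(self.edges)
        for kids in self.edges.values():
            found.update(child for _, child in kids)
        return found

    def parents(self, node):
        # A node may have several parents; this sharing makes it a DAG, not a tree.
        return [p for p, kids in self.edges.items() if any(c == node for _, c in kids)]

    def is_acyclic(self):
        # Depth-first search: "active" holds nodes on the current path,
        # "done" holds nodes whose descendants were fully explored.
        active, done = set(), set()

        def visit(node):
            if node in done:
                return True
            if node in active:
                return False  # back edge, so the graph has a cycle
            active.add(node)
            for _, child in self.edges.get(node, []):
                if not visit(child):
                    return False
            active.discard(node)
            done.add(node)
            return True

        return all(visit(n) for n in self.nodes())


# Toy, illustrative annotation of "John kissed Mary and left" (hypothetical labels).
g = SemanticGraph()
g.add_edge("root", "Scene", "s1")
g.add_edge("root", "Scene", "s2")
g.add_edge("s1", "Participant", "John")
g.add_edge("s1", "Process", "kissed")
g.add_edge("s1", "Participant", "Mary")
g.add_edge("s2", "Process", "left")
g.add_edge("s2", "Participant", "John")  # "John" is shared by both scenes

print(g.parents("John"))  # ['s1', 's2']
print(g.is_acyclic())     # True
```

In this toy example the unit "John" takes part in both scenes, so it has two parents; that kind of sharing is what turns the structure into a directed acyclic graph rather than a tree.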

Publications

Reference-less Measure of Faithfulness for Grammatical Error Correction
Leshem Choshen and Omri Abend. NAACL 2018 (short paper).
[Paper: pdf] [Supp. Material: pdf] [Code: github]

Semantic Structural Evaluation for Text Simplification
Elior Sulem, Omri Abend and Ari Rappoport. NAACL 2018 (long paper).
[Paper: pdf] [Data & Code: github]

A Transition-Based Directed Acyclic Graph Parser for UCCA
Daniel Hershcovich, Omri Abend and Ari Rappoport. ACL 2017 (long paper). Outstanding Paper Award.
[Paper: pdf] [Supp. Material: pdf] [Code & Data: github] [Demo]

UCCAApp: Web-application for Syntactic and Semantic Phrase-based Annotation
Omri Abend, Shai Yerushalmi and Ari Rappoport. ACL 2017 (demo paper).
[Paper: pdf] [Code: github] [Demo]

The State of the Art in Semantic Representation
Omri Abend and Ari Rappoport. ACL 2017 (long paper).
[Paper: pdf]

HUME: Human UCCA-Based Evaluation of Machine Translation
Alexandra Birch, Omri Abend, Ondřej Bojar and Barry Haddow, EMNLP 2016 (long paper).
[Paper: pdf] [Data: github] [Demo]

Conceptual Annotations Preserve Structure Across Translations: A French-English Case Study
Elior Sulem, Omri Abend and Ari Rappoport,
ACL 2015 Workshop on Semantics-Driven Statistical Machine Translation (S2MT).
[Paper: pdf]

Universal Conceptual Cognitive Annotation (UCCA)
Omri Abend and Ari Rappoport, ACL 2013 (long paper)
[Paper: pdf]

UCCA: A Semantics-based Grammatical Annotation Scheme
Omri Abend and Ari Rappoport, IWCS 2013 (long paper)
[Paper: pdf]


Theses

Measuring Semantic Preservation in Machine Translation with HCOMET: Human Cognitive Metric for Evaluating Translation
Pedro Marinotti, MSc Thesis,
The University of Edinburgh, 2014
[Paper: pdf]

Integration of a cognitive annotation into machine translation: Theoretical foundations and bilingual corpus analysis
Elior Sulem, MSc Thesis,
The Hebrew University of Jerusalem, 2014
[Paper: pdf]

Semi-supervised identification of scene-evoking nouns in UCCA
Amit Beka, MSc Thesis,
The Hebrew University of Jerusalem, 2013
[Paper: pdf]

Grammatical Annotation Founded on Semantics: A Cognitive Linguistics Approach to Grammatical Corpus Annotation
Omri Abend, PhD Thesis,
The Hebrew University of Jerusalem, 2013
[Paper: pdf]


Contact

For any questions or feedback, please email Omri Abend at oabend@cs.huji.ac.il.