The UCCA Resource Webpage

Please consider submitting your parser to the SemEval 2019 shared task on UCCA parsing.

Universal Conceptual Cognitive Annotation (UCCA) is a novel semantic approach to grammatical representation. It was developed in the Computational Linguistics Lab of the Hebrew University by Omri Abend and Ari Rappoport.

The central idea of the project is to analyze and annotate natural languages using purely semantic categories and structure (a graph). Syntactic categories and structure are not part of the manual annotation, and are ideally learned implicitly by the parsers. The basic set of semantic categories (the foundational layer) is inspired by work in linguistic typology, cognitive grammar, and neuroscience. The development of additional layers, such as semantic roles and super-senses (adapted from the CARMLS project), is underway.
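As a rough illustration of what a purely semantic graph annotation looks like, the Python sketch below builds a small labeled structure for the sentence "John kicked his ball". It is not the project's own code: the `Node` class and the `build_example` and `labeled_edges` helpers are hypothetical names made up for this example, and the edge labels (H for Parallel Scene, A for Participant, P for Process, E for Elaborator, C for Center) are meant as an illustrative reading of the foundational layer rather than a gold annotation. The sketch is also restricted to a tree for brevity, whereas UCCA structures are graphs in general.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A unit in the sketch: a terminal (token) or a non-terminal."""
    text: str = ""                              # set only for terminals
    edges: list = field(default_factory=list)   # outgoing (label, child) pairs

    def add(self, label, child):
        """Attach `child` under this node with a semantic category label."""
        self.edges.append((label, child))
        return child

    def terminals(self):
        """Return the tokens covered by this unit, in left-to-right order."""
        if not self.edges:
            return [self.text]
        tokens = []
        for _, child in self.edges:
            tokens.extend(child.terminals())
        return tokens


def build_example():
    """Hypothetical annotation of "John kicked his ball" (illustrative labels)."""
    root = Node()
    scene = root.add("H", Node())        # the whole sentence is one scene
    scene.add("A", Node("John"))         # participant
    scene.add("P", Node("kicked"))       # process (the main relation)
    obj = scene.add("A", Node())         # second participant: "his ball"
    obj.add("E", Node("his"))            # elaborator
    obj.add("C", Node("ball"))           # center
    return root


def labeled_edges(node):
    """Yield (label, covered text) for every edge, depth-first."""
    for label, child in node.edges:
        yield label, " ".join(child.terminals())
        yield from labeled_edges(child)


if __name__ == "__main__":
    for label, span in labeled_edges(build_example()):
        print(f"{label}: {span}")
```

Note that no syntactic categories (noun phrase, verb, determiner) appear anywhere in the structure; every edge carries only a semantic label, which is the point made in the paragraph above.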

The annotation has so far focused on argument-structure and linkage phenomena. We build primarily on Basic Linguistic Theory (R.M.W. Dixon, 2010a; 2010b; 2012), a widely used approach for language description. We acknowledge that there are many applicable analyses for a given sentence, but for practical reasons we select a small set of highly useful distinctions and apply them to provide one plausible annotation.

We have annotated 160K tokens from English Wikipedia with the UCCA scheme, as well as a 30K English-French parallel corpus based on Jules Verne's "20K Leagues Under The Sea", and a 120K-token corpus of the entire book in German. Pilot studies were conducted on several other languages as well.

This page contains links to all of UCCA's resources: corpora, annotation guidelines, parser and code. If you use these resources in your research, please cite the following or other relevant publications:

Universal Conceptual Cognitive Annotation (UCCA)
Omri Abend and Ari Rappoport, ACL 2013
[Paper: pdf]

Annotation Web-App

The graphical web application used for compiling the UCCA corpora can be found here (to skip registration, log in as "guest" with password "tseug").

A newer version of the application: [Paper] [Demo] [Code]

Guidelines

Each UCCA-annotated corpus includes, in its repository, the version of the guidelines it was compiled with. The most up-to-date guidelines are available here (the most recent version is generally in draft mode, but see releases): [pdf].

UCCA-Annotated Corpora

All corpora are publicly available under a Creative Commons Attribution-ShareAlike 3.0 Unported license. The guidelines with which each of them was annotated can be found in its repository.

  • English Wikipedia corpus: [github]
  • English 20K Leagues Under The Sea corpus: [github]
  • German 20K Leagues Under The Sea corpus: [github]
  • French 20K Leagues Under The Sea corpus: [github]
  • Excerpt of the PTB WSJ corpus: [github]

UCCA Parser

TUPA is a transition-based parser for Universal Conceptual Cognitive Annotation (UCCA), developed by Daniel Hershcovich, Omri Abend and Ari Rappoport. [Code] [Demo]

Source Code

Python source code for reading and manipulating the UCCA structures. The code was written by Amit Beka and Daniel Hershcovich. [Code]

Publications

Multitask Parsing Across Semantic Representations.
Daniel Hershcovich, Omri Abend and Ari Rappoport. ACL 2018 (long paper).
[Paper: pdf] [Supp. Material: pdf] [Code: github]

Simple and Effective Text Simplification using Semantic and Neural Methods.
Elior Sulem, Omri Abend and Ari Rappoport. ACL 2018 (long paper).
[Paper: pdf] [Data: github]

Reference-less Measure of Faithfulness for Grammatical Error Correction
Leshem Choshen and Omri Abend. NAACL 2018 (short paper).
[Paper: pdf] [Supp. Material: pdf] [Code: github]

Semantic Structural Evaluation for Text Simplification
Elior Sulem, Omri Abend and Ari Rappoport. NAACL 2018 (long paper).
[Paper: pdf] [Data & Code: github]

A Transition-Based Directed Acyclic Graph Parser for UCCA.
Daniel Hershcovich, Omri Abend and Ari Rappoport. ACL 2017 (long paper). Outstanding Paper Award.
[Paper: pdf] [Supp. Material: pdf] [Code & Data: github] [Demo]

UCCAApp: Web-application for Syntactic and Semantic Phrase-based Annotation.
Omri Abend, Shai Yerushalmi and Ari Rappoport. ACL 2017 (demo paper).
[Paper: pdf] [Code: github] [Demo]

The State of the Art in Semantic Representation.
Omri Abend and Ari Rappoport. ACL 2017 (long paper).
[Paper: pdf]

HUME: Human UCCA-Based Evaluation of Machine Translation
Alexandra Birch, Omri Abend, Ondřej Bojar and Barry Haddow, EMNLP 2016 (long paper).
[Paper: pdf] [Data: github] [Demo]

Conceptual Annotations Preserve Structure Across Translations: A French-English Case Study
Elior Sulem, Omri Abend and Ari Rappoport,
ACL 2015 Workshop on Semantics-Driven Statistical Machine Translation (S2MT).
[Paper: pdf]

Universal Conceptual Cognitive Annotation (UCCA)
Omri Abend and Ari Rappoport, ACL 2013 (long paper)
[Paper: pdf]

UCCA: A Semantics-based Grammatical Annotation Scheme
Omri Abend and Ari Rappoport, IWCS 2013 (long paper)
[Paper: pdf]

Theses

Measuring Semantic Preservation in Machine Translation with HCOMET: Human Cognitive Metric for Evaluating Translation
Pedro Marinotti, MSc Thesis,
The University of Edinburgh, 2014
[Paper: pdf]

Integration of a cognitive annotation into machine translation: Theoretical foundations and bilingual corpus analysis
Elior Sulem, MSc Thesis,
The Hebrew University of Jerusalem, 2014
[Paper: pdf]

Semi-supervised identification of scene-evoking nouns in UCCA
Amit Beka, MSc Thesis,
The Hebrew University of Jerusalem, 2013
[Paper: pdf]

Grammatical Annotation Founded on Semantics: A Cognitive Linguistics Approach to Grammatical Corpus Annotation
Omri Abend, PhD Thesis,
The Hebrew University of Jerusalem, 2013
[Paper: pdf]

Contact

For any questions or feedback, please email Omri Abend at oabend@cs.huji.ac.il.