The old UCCA Resource Page (redirecting)

The UCCA Resource Webpage

Please consider participating in the
CoNLL 2019 Shared Task on Cross-Framework Meaning Representation Parsing
(includes SDP, EDS, AMR and UCCA parsing)

Universal Conceptual Cognitive Annotation (UCCA) is a novel semantic approach to grammatical representation. It was developed in the Computational Linguistics Lab of the Hebrew University by Omri Abend and Ari Rappoport.

The central idea of the project is to analyze and annotate natural languages using purely semantic categories and structure (a graph). Syntactic categories and structure are not part of the manual annotation, and are ideally learned implicitly by the parsers. The basic set of semantic categories (the foundational layer) is inspired by work in linguistic typology, cognitive grammar, and neuroscience. The development of additional layers, such as semantic roles and super-senses (adapted from the CARMLS project), is underway.

The annotation has so far focused on argument-structure and linkage phenomena. We build primarily on Basic Linguistic Theory (R.M.W. Dixon, 2010a; 2010b; 2012), a widely used approach for language description. We acknowledge that there are many applicable analyses for a given sentence, but select, for practical reasons, a small set of highly useful distinctions, and apply them to provide one plausible annotation.

We have annotated 160K tokens from English Wikipedia with the UCCA scheme, as well as a 30K-token English-French parallel corpus based on Jules Verne's "20K Leagues Under The Sea", and a 120K-token corpus of the entire book in German. Pilot studies were conducted on several other languages as well.

This page contains links to all of UCCA's resources: corpora, annotation guidelines, parser and code. If you use these resources in your research, please cite the following or other relevant publications:

Universal Conceptual Cognitive Annotation (UCCA)
Omri Abend and Ari Rappoport, ACL 2013
[Paper: pdf]

Annotation Web-App

The UCCAApp web application for phrase-based annotation in general, and UCCA parsing in particular, can be found here.

Formally, it supports DAG structures, discontiguous units and multiple categories.
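To make these three properties concrete, here is a minimal Python sketch of a graph in which a unit can have more than one parent (a DAG rather than a tree), can cover non-adjacent tokens, and can be attached by an edge carrying several categories. The class names `Unit` and `Edge`, the placeholder labels, and the example sentence's analysis are all assumptions made for illustration; this is not UCCAApp's data model and not an authoritative UCCA annotation.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Edge:
    """Labeled link from a parent unit to a child unit (hypothetical class)."""
    child: Unit
    categories: frozenset[str]  # one edge may carry several categories


@dataclass
class Unit:
    """A node in the annotation graph (hypothetical class)."""
    name: str
    positions: tuple[int, ...] = ()  # token positions, set on terminals only
    edges: list[Edge] = field(default_factory=list)

    def add(self, child: Unit, *categories: str) -> None:
        self.edges.append(Edge(child, frozenset(categories)))

    def span(self) -> list[int]:
        """Sorted token positions covered by this unit and its descendants."""
        covered = set(self.positions)
        for edge in self.edges:
            covered.update(edge.child.span())
        return sorted(covered)

    def is_discontiguous(self) -> bool:
        """True if the covered positions contain a gap."""
        s = self.span()
        return any(b - a > 1 for a, b in zip(s, s[1:]))


tokens = ["John", "gave", "everything", "up"]
terminals = [Unit(tok, (i,)) for i, tok in enumerate(tokens)]

# Discontiguous unit: "gave ... up" skips over "everything".
gave_up = Unit("gave_up")
gave_up.add(terminals[1], "part")
gave_up.add(terminals[3], "part")

# Placeholder labels; the second edge carries two categories at once.
scene = Unit("scene")
scene.add(terminals[0], "label_1")
scene.add(gave_up, "label_2", "label_3")
scene.add(terminals[2], "label_1")

# A second parent for "John" turns the structure into a DAG, not a tree.
other = Unit("other")
other.add(terminals[0], "label_1")

all_units = [scene, gave_up, other] + terminals
john_parents = sum(
    any(edge.child is terminals[0] for edge in unit.edges) for unit in all_units
)
```

Storing categories on the edge rather than on the node is one possible design choice: it lets the same unit play a different role under each of its parents.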

The app supports configurable multi-layer annotation and task management, and is written in Django and AngularJS.

[Paper] [Demo] [Code]

Guidelines

Each UCCA-annotated corpus includes, in its repository, the version of the guidelines it was compiled with. The most up-to-date guidelines are available here (the most recent one is generally in draft mode, but see releases): [pdf].

UCCA-Annotated Corpora

All corpora are publicly available under a Creative Commons Attribution-ShareAlike 3.0 Unported license. The guidelines with which each of them was annotated can be found in its repository.

  • English Wikipedia corpus: [github]
  • English Web Corpus (reviews section): [github]
  • English 20K Leagues Under The Sea corpus: [github]
  • German 20K Leagues Under The Sea corpus: [github]
  • French 20K Leagues Under The Sea corpus: [github]
  • German Little Prince corpus: [github]
  • Hebrew Little Prince corpus: [github]
  • Russian Little Prince corpus: [github]
  • English Little Prince corpus: [github]
  • Excerpt of the PTB WSJ corpus: [github]

UCCA Parser

TUPA is a transition-based parser for Universal Conceptual Cognitive Annotation (UCCA), developed by Daniel Hershcovich, Omri Abend and Ari Rappoport. [Code] [Demo]

Source Code

Python source code for reading and manipulating UCCA structures. The code was written by Amit Beka and Daniel Hershcovich. [Code]

Publications

Semantics-aware Attention Improves Neural Machine Translation.
Aviv Slobodkin, Leshem Choshen and Omri Abend. *SEM 2022 (long paper).

Multitask Parsing Across Semantic Representations.
Daniel Hershcovich, Omri Abend and Ari Rappoport. ACL 2018 (long paper).
[Paper: pdf] [Supp. Material: pdf] [Code: github]

Simple and Effective Text Simplification using Semantic and Neural Methods.
Elior Sulem, Omri Abend and Ari Rappoport. ACL 2018 (long paper).
[Paper: pdf] [Data: github]

Reference-less Measure of Faithfulness for Grammatical Error Correction
Leshem Choshen and Omri Abend. NAACL 2018 (short paper).
[Paper: pdf] [Supp. Material: pdf] [Code: github]

Semantic Structural Evaluation for Text Simplification
Elior Sulem, Omri Abend and Ari Rappoport. NAACL 2018 (long paper).
[Paper: pdf] [Data & Code: github]

A Transition-Based Directed Acyclic Graph Parser for UCCA.
Daniel Hershcovich, Omri Abend and Ari Rappoport. ACL 2017 (long paper). Outstanding Paper Award.
[Paper: pdf] [Supp. Material: pdf] [Code & Data: github] [Demo]

UCCAApp: Web-application for Syntactic and Semantic Phrase-based Annotation.
Omri Abend, Shai Yerushalmi and Ari Rappoport. ACL 2017 (demo paper).
[Paper: pdf] [Code: github] [Demo]

The State of the Art in Semantic Representation.
Omri Abend and Ari Rappoport. ACL 2017 (long paper).
[Paper: pdf]

HUME: Human UCCA-Based Evaluation of Machine Translation
Alexandra Birch, Omri Abend, Ondřej Bojar and Barry Haddow, EMNLP 2016 (long paper).
[Paper: pdf] [Data: github] [Demo]

Conceptual Annotations Preserve Structure Across Translations: A French-English Case Study
Elior Sulem, Omri Abend and Ari Rappoport,
ACL 2015 Workshop on Semantics-Driven Statistical Machine Translation (S2MT).
[Paper: pdf]

Universal Conceptual Cognitive Annotation (UCCA)
Omri Abend and Ari Rappoport, ACL 2013 (long paper)
[Paper: pdf]

UCCA: A Semantics-based Grammatical Annotation Scheme
Omri Abend and Ari Rappoport, IWCS 2013 (long paper)
[Paper: pdf]

Theses

Measuring Semantic Preservation in Machine Translation with HCOMET: Human Cognitive Metric for Evaluating Translation
Pedro Marinotti, MSc Thesis,
The University of Edinburgh, 2014
[Paper: pdf]

Integration of a cognitive annotation into machine translation: Theoretical foundations and bilingual corpus analysis
Elior Sulem, MSc Thesis,
The Hebrew University of Jerusalem, 2014
[Paper: pdf]

Semi-supervised identification of scene-evoking nouns in UCCA
Amit Beka, MSc Thesis,
The Hebrew University of Jerusalem, 2013
[Paper: pdf]

Grammatical Annotation Founded on Semantics: A Cognitive Linguistics Approach to Grammatical Corpus Annotation
Omri Abend, PhD Thesis,
The Hebrew University of Jerusalem, 2013
[Paper: pdf]

Reports

Distinguishing Human Translations and Machine Outputs with UCCA
Michal Kessler, Lab Report,
The Hebrew University of Jerusalem, 2019
[Paper: pdf]

Contact

For any questions or feedback, please email Omri Abend at oabend@cs.huji.ac.il.