About

Data Example

The training data will consist of text snippets from sustainability reports, along with their corresponding content classification labels. Each snippet is 3–5 sentences long and corresponds to one of the reporting criterion sections, such as "Ressourcenmanagement" (Resource Management) or "Wesentlichkeit" (Materiality). Furthermore, the last sentence of each snippet is annotated with respect to its verifiability. You can find some trial data here. A sample snippet and its classification might look like this:
[Example text snippet]

Content Class:
Resource Management

Verifiability Rating:
0.8
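In machine-readable form, a single sample might look roughly like the following Python sketch. The field names below are illustrative assumptions made for this example, not the official data schema of the shared task.

```python
# Illustrative only: a hypothetical representation of one sample.
# Field names are assumptions for this sketch, not the official schema.
sample = {
    "context": [
        "Sentence 1 of the snippet ...",
        "Sentence 2 of the snippet ...",
        "Sentence 3 of the snippet ...",
    ],
    "target_sentence": "The last sentence, which is rated for verifiability ...",
    "task_a_label": "Resource Management",  # DNK reporting criterion (Task A)
    "task_b_rating": 0.8,                   # verifiability score in [0.0, 1.0] (Task B)
}
```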

The training data and development data are constructed with permission from publicly available German-language company reports indexed in the German Sustainability Code (Deutscher Nachhaltigkeitskodex, DNK). Text snippets are sampled semi-automatically and then processed to ensure well-formedness and to exclude personally identifiable information.

Dataset Information

Data Source

The training data (ca. 1000 samples), development data (ca. 300 samples), and evaluation data (ca. 500 samples) are constructed from publicly available German-language company reports indexed in the German Sustainability Code (Deutscher Nachhaltigkeitskodex, DNK).

DNK reports always follow the same structure, consisting of 20 sections, each corresponding to a reporting criterion (e.g. "Incentive Systems" or "Usage of Natural Resources"). Each criterion section not only deals with a separate topic, but also fulfills a particular communicative purpose, which is reflected in the hierarchical structure of the report outline.

One goal of this shared task is to determine the extent to which the texts pertaining to the different sections diverge not only in content but also in style and other linguistic properties.

Each input to be analyzed in Tasks A and B is a text snippet of 4 consecutive sentences. Text snippets are selected semi-automatically, based mostly on balanced random sampling, with some filtering steps to exclude structured data (such as tables) and personally identifiable information.

Anonymization

The text snippets are preprocessed with a named entity recognition (NER) tool, and then checked manually for further personally identifiable information. Personally identifiable words or phrases are replaced by one of the tags below:

  • [ORG] - Names of companies and organizations
  • [PERSON] - Names of persons
  • [PRODUCT] - Products of a company or organization, e.g. name of a software
  • [KONTAKT] - Addresses, phone numbers, emails
  • [LINK] - Links
  • [NAME] - Everything else that might identify a company or person but does not fit one of the other categories

Location names (e.g. Berlin) and general terms for types of companies (e.g. Sparkassen) are not anonymized, unless they are part of the name of a specific organization (e.g. Stadtsparkasse Augsburg). Certain large governmental and non-governmental organizations are also not anonymized when they are referenced in their role of establishing sustainability reporting standards such as laws and certificates.
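The exact anonymization pipeline is not published here; the following is only a minimal sketch of the kind of NER-based tag substitution described above. It assumes spaCy with the German model de_core_news_sm as the NER tool, which is an assumption for illustration, not the organizers' actual setup, and it does not cover the subsequent manual check or tags such as [KONTAKT] and [LINK].

```python
# Minimal sketch of NER-based tag substitution (not the organizers' actual pipeline).
# Assumes spaCy and the German model de_core_news_sm are installed.
import spacy

nlp = spacy.load("de_core_news_sm")

# Hypothetical mapping from spaCy entity labels to the anonymization tags above.
TAG_MAP = {"ORG": "[ORG]", "PER": "[PERSON]"}

def anonymize(text: str) -> str:
    doc = nlp(text)
    out = text
    # Replace entities from right to left so character offsets stay valid.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        tag = TAG_MAP.get(ent.label_)
        if tag:
            out = out[:ent.start_char] + tag + out[ent.end_char:]
    return out

print(anonymize("Die Stadtsparkasse Augsburg berichtet über ihre Klimaziele."))
```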

Data Annotation

Task A: The challenge is to assign a suitable content class to each text sample. The label for each instance is the name of the DNK reporting criterion section the text snippet was sampled from.
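To make the task format concrete, Task A can be treated as standard multi-class text classification. The following is only a sketch of one plausible, very simple baseline (TF-IDF features with logistic regression), not an official baseline of the shared task; the example texts and labels are placeholders.

```python
# Sketch of a trivial Task A baseline: TF-IDF features + logistic regression.
# The texts and labels below are placeholders, not real training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "Wir reduzieren unseren Wasserverbrauch ...",
    "Die Wesentlichkeitsanalyse ergab ...",
]
train_labels = ["Usage of Natural Resources", "Materiality"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)
print(clf.predict(["Unser Ziel ist es, den Energieverbrauch zu senken."]))
```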

Task B: The challenge is to rate the verifiability of the statement (e.g. goal or state description) made in the last sentence of each text snippet, with the previous sentences given as context for better understanding. We use a numerical score between 0.0 (not verifiable) and 1.0 (clearly verifiable), and predictions will be evaluated by their Kendall Tau-b rank correlation with human ratings.
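As an illustration of the Task B evaluation metric, Kendall's Tau-b between predicted and gold ratings can be computed with SciPy; the numbers below are made up.

```python
# Example: Kendall Tau-b rank correlation between predictions and gold ratings.
from scipy.stats import kendalltau

gold = [0.0, 0.33, 0.67, 1.0, 0.67]  # made-up human ratings
pred = [0.1, 0.40, 0.50, 0.9, 0.70]  # made-up system predictions

tau, p_value = kendalltau(gold, pred, variant="b")  # variant "b" accounts for ties
print(f"Kendall Tau-b: {tau:.3f}")
```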

Task B ratings are derived from human annotation on a four-point scale:

  • Not verifiable (0.0)
  • Rather not verifiable (0.33)
  • Somewhat verifiable (0.67)
  • Clearly verifiable (1.0)

The annotation was executed via paid crowdsourcing. We collected ~5 crowd annotations per sample, took the majority vote, and, in cases where the vote was tied, computed the arithmetic mean of the tied values. In these cases, we also report the standard deviation over the tied values (in cases where there was a unique majority vote, the standard deviation is 0.0). The information about the standard deviation is not strictly part of the shared task, but may be used by participants to gain insight into the uncertainty/difficulty of individual samples.
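The aggregation described above might look roughly like the following sketch. The tie handling follows the description; the rating lists are invented, and whether a population or sample standard deviation is used is an assumption of this sketch.

```python
# Sketch of the described aggregation: majority vote over ~5 crowd ratings;
# on a tie, the arithmetic mean of the tied values plus their standard deviation.
from collections import Counter
from statistics import mean, pstdev

def aggregate(ratings):
    counts = Counter(ratings)
    top = max(counts.values())
    tied = [value for value, count in counts.items() if count == top]
    if len(tied) == 1:
        return tied[0], 0.0          # unique majority vote; stdev reported as 0.0
    # Population stdev is an assumption; the organizers may compute it differently.
    return mean(tied), pstdev(tied)

print(aggregate([0.67, 0.67, 1.0, 0.67, 0.33]))  # -> (0.67, 0.0)
print(aggregate([0.33, 0.67, 0.33, 0.67, 1.0]))  # -> approx. (0.5, 0.17)
```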

IMPORTANT: Participants should not rely on the standard deviation (task_b_stdev) or the publication year of the sample to make their predictions, as this information may not be given in the final evaluation data. The evaluation data will also contain reports from the years 2022 and 2023, whereas the training, development, and validation data only span 2017–2021.

Questions & Answers

Contact

  • Shared Task Email (contact for all questions): sustaineval@gmail.com
  • Jakob Prange, Universität Augsburg (contact for task specific questions): jakob.prange@uni-a.de
  • Charlott Jakob, TU Berlin (contact for organisational questions): c.jakob@tu-berlin.de
  • Annemarie Friedrich, Universität Augsburg