Example labels for a sample text snippet: Content Class: Resource Management; Verifiability Rating: 0.8
The training data (ca. 1000 samples), development data (ca. 300 samples), and evaluation data (ca. 500 samples) are constructed from publicly available German-language company reports indexed in the German Sustainability Code (Deutscher Nachhaltigkeitskodex, DNK).
DNK reports always follow the same structure, consisting of 20 sections, each corresponding to a reporting criterion (e.g. "Incentive Systems" or "Usage of Natural Resources"). Each criterion section not only deals with a separate topic, but also fulfills a particular communicative purpose, which is reflected in the hierarchical structure of the report outline.
One goal of this shared task is to determine the extent to which the texts from the different sections diverge not only in content but also in style and other linguistic properties.
Each input to be analyzed in Tasks A and B is a text snippet of 4 consecutive sentences. Text snippets are selected semi-automatically, based mostly on balanced random sampling, with some filtering steps to exclude structured data (such as tables) and personally identifiable information.
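For illustration, a minimal sketch of such a selection procedure is shown below (in Python). The function names, the table heuristic, and the per-section quota are hypothetical and do not reflect the organizers' actual pipeline.

```python
import random

def looks_like_table(snippet: str) -> bool:
    # Crude heuristic (an assumption): many digits or layout characters
    # suggest tabular rather than running text.
    digit_ratio = sum(ch.isdigit() for ch in snippet) / max(len(snippet), 1)
    return digit_ratio > 0.2 or "\t" in snippet or "|" in snippet

def balanced_sample(snippets_by_section, per_section, seed=0):
    # Draw roughly the same number of snippets from each DNK criterion
    # section, skipping snippets that look like structured data.
    rng = random.Random(seed)
    sampled = []
    for section, snippets in snippets_by_section.items():
        candidates = [s for s in snippets if not looks_like_table(s)]
        rng.shuffle(candidates)
        sampled.extend((section, s) for s in candidates[:per_section])
    return sampled
```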
The text snippets are preprocessed with a named entity recognition (NER) tool, and then checked manually for further personally identifiable information. Personally identifiable words or phrases are replaced by one of the tags below:
Location names (e.g. Berlin) and general terms for types of companies (e.g. Sparkassen) are not anonymized, unless they are part of the name of a specific organization (e.g. Stadtsparkasse Augsburg). Certain large governmental and non-governmental organizations that are referenced in their role of establishing sustainability reporting standards, such as laws and certificates, are not anonymized either.
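A minimal sketch of this kind of NER-based anonymization is given below, using spaCy's German model. The task's actual tool and the official tag inventory are not specified here, so the placeholder tags in the code are hypothetical.

```python
import spacy

nlp = spacy.load("de_core_news_sm")  # a German NER model; the task's actual tool is unspecified

# Hypothetical placeholder tags, NOT the official anonymization tags.
PLACEHOLDER = {"PER": "[PERSON]", "ORG": "[ORGANIZATION]"}

def anonymize(text: str) -> str:
    # Replace person and organization mentions; locations (e.g. Berlin)
    # are deliberately left untouched, mirroring the rule described above.
    doc = nlp(text)
    out = text
    # Replace from the end so earlier character offsets remain valid.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        tag = PLACEHOLDER.get(ent.label_)
        if tag:
            out = out[:ent.start_char] + tag + out[ent.end_char:]
    return out
```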
Task A: The challenge is to assign a suitable content class to each text sample. The label for each instance is the name of the DNK reporting criterion section the text snippet was sampled from.
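As an illustration of the classification setup, a minimal baseline sketch is shown below. The toy snippets and labels are invented stand-ins for the released data; this is not an official baseline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for training snippets and their DNK criterion labels.
train_texts = [
    "Wir reduzieren unseren Wasserverbrauch kontinuierlich.",
    "Die Vergütung des Vorstands ist an Nachhaltigkeitsziele gekoppelt.",
]
train_labels = ["Usage of Natural Resources", "Incentive Systems"]

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)
print(baseline.predict(["Der Stromverbrauch sank im Berichtsjahr deutlich."]))
```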
Task B: The challenge is to rate the verifiability of the statement (e.g. goal or state description) made in the last sentence of each text snippet, with the previous sentences given as context for better understanding. We use a numerical score between 0.0 (not verifiable) and 1.0 (clearly verifiable), and predictions will be evaluated by their Kendall Tau-b rank correlation with human ratings.
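The ranking-based evaluation can be reproduced with SciPy, for example as sketched below; the score vectors are hypothetical.

```python
from scipy.stats import kendalltau

predicted = [0.9, 0.1, 0.5, 0.7, 0.3]  # hypothetical system scores
gold = [1.0, 0.0, 0.33, 0.67, 0.33]    # hypothetical human ratings
tau, p_value = kendalltau(predicted, gold, variant="b")  # Tau-b handles ties
print(f"Kendall Tau-b: {tau:.3f}")
```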
Task B ratings are derived from human annotation on a four-point scale:
The annotation was executed via paid crowdsourcing. We collected ~5 crowd annotations per sample and took the majority vote; in cases where the vote was tied, we computed the arithmetic mean of the tied values. In these cases, we also report the standard deviation over the tied values (in cases where there was a unique majority vote, the standard deviation is 0.0). The standard deviation is not strictly part of the shared task, but may be used by participants to gain insight into the uncertainty/difficulty of individual samples.
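A small sketch of this aggregation is given below, under the assumption that the four scale points are mapped to numeric values and that the reported figure is the population standard deviation; both conventions are assumptions, as are the example rating values.

```python
from collections import Counter
from statistics import mean, pstdev

def aggregate(ratings):
    # Majority vote over ~5 crowd ratings; on a tie, take the arithmetic
    # mean of the tied values and report their standard deviation
    # (0.0 when the majority vote is unique).
    counts = Counter(ratings)
    top = max(counts.values())
    tied = [value for value, count in counts.items() if count == top]
    if len(tied) == 1:
        return tied[0], 0.0
    return mean(tied), pstdev(tied)

print(aggregate([1.0, 1.0, 1.0, 0.67, 0.33]))   # unique majority -> (1.0, 0.0)
print(aggregate([1.0, 1.0, 0.67, 0.67, 0.33]))  # tie between 1.0 and 0.67
```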
IMPORTANT: Participants should not rely on the standard deviation (task_b_stdev) or the publication year of a sample to make their predictions, as this information may not be given in the final evaluation data. The evaluation data will also contain reports from the years 2022 and 2023, whereas the training and development data only span 2017-2021.
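For instance, these fields can simply be dropped before training. A minimal sketch assuming a JSON Lines release is shown below; only task_b_stdev is named above, the file name and the year column name are assumptions.

```python
import pandas as pd

# Hypothetical file and column names; only "task_b_stdev" is named in the
# task description, "year" and "train.jsonl" are assumptions.
train = pd.read_json("train.jsonl", lines=True)
train = train.drop(columns=["task_b_stdev", "year"], errors="ignore")
```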
Can I submit my results to special tracks?