Description
Effective data stewardship in research depends on a consistent, FAIR (Findable, Accessible, Interoperable, Reusable) representation of scientific variables across environmental disciplines. Within the Helmholtz Earth and Environment DataHub initiative, we are therefore developing an approach that uses Large Language Models (LLMs) to support data producers by automating the semantic annotation of research data. Our service builds on the community-driven I-ADOPT framework, which decomposes natural-language variable descriptions into atomic components and thereby supports consistent naming and interoperability.
In this poster, we present our approach to developing an LLM-based annotation service, highlighting key challenges and solutions as well as its integration into higher-level infrastructures of the Helmholtz DataHub and beyond. The proposed annotation framework streamlines the integration and harmonization of environmental data descriptions across domains such as climate, biodiversity, and atmospheric sciences, in line with the objectives of the German National Research Data Infrastructure (NFDI) and the European Open Science Cloud (EOSC).
This contribution demonstrates how semantic annotation tools can support data stewardship in practical research contexts, improving reproducibility, interoperability, and collaboration within the scientific community.
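To make the I-ADOPT decomposition mentioned above more concrete, the following is a minimal sketch of how an annotated variable might be represented once it has been split into components. The component roles (property, object of interest, matrix, context object, constraint) follow the I-ADOPT framework; the class names, field names, and the example variable are hypothetical and hand-written for illustration. They are not output of the annotation service, and no LLM call is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Constraint:
    """A restriction placed on one of the variable's entities."""
    label: str
    constrains: str  # label of the entity this constraint applies to


@dataclass(frozen=True)
class Variable:
    """Hypothetical container mirroring the I-ADOPT component roles."""
    description: str                 # original natural-language description
    property: str                    # the characteristic being observed
    object_of_interest: str          # the entity whose property is observed
    matrix: Optional[str] = None     # the medium containing the object
    context_objects: tuple = ()      # further entities giving context
    constraints: tuple = ()          # Constraint instances

    def components(self) -> dict:
        """Return only the components that are actually filled in."""
        parts = {
            "hasProperty": self.property,
            "hasObjectOfInterest": self.object_of_interest,
            "hasMatrix": self.matrix,
            "hasContextObject": list(self.context_objects),
            "hasConstraint": [(c.label, c.constrains) for c in self.constraints],
        }
        # Drop empty optional components (None or empty lists).
        return {key: value for key, value in parts.items() if value}


# Hand-written example decomposition (an assumption, for illustration only).
example = Variable(
    description="concentration of dissolved nitrate in river water",
    property="concentration",
    object_of_interest="nitrate",
    matrix="river water",
    constraints=(Constraint(label="dissolved", constrains="nitrate"),),
)

if __name__ == "__main__":
    for role, value in example.components().items():
        print(f"{role}: {value}")
```

In such a design, an annotation service would be responsible for proposing the component values from the free-text description, and each component would typically be linked to a term from a controlled vocabulary rather than stored as a plain string.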
| Abstract | Poster |
|---|---|