Preserving biodiversity requires research into its underlying mechanisms, and such research depends on integrated data. Biodiversity research generates and publicly shares an increasing amount of heterogeneous data.
There are also many efforts to semantically describe biodiversity datasets and research outputs. Multiple ontologies, such as ENVO and IOBC, model specific parts of the domain.
However, to support integrative biodiversity research, there is a growing need to bridge the more refined biodiversity concepts and the general concepts provided by foundational ontologies.
We propose the design of a core ontology for the biodiversity domain using a semi-automatic approach.
We use the fusion/merge strategy, in which the new ontology is developed by assembling and reusing one or more existing ontologies.
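To illustrate the idea of fusing ontologies, the following is a minimal sketch, not the authors' implementation. It assumes each source ontology is given as a set of (subject, predicate, object) triples and that an equivalence mapping between terms is already known (for example, produced by alignment and confirmed by experts). The function name `merge_ontologies` and all term identifiers such as `A:Soil` are hypothetical.

```python
from typing import Dict, Set, Tuple

# A triple is (subject, predicate, object); identifiers here are hypothetical.
Triple = Tuple[str, str, str]


def merge_ontologies(
    ontologies: Dict[str, Set[Triple]],
    equivalences: Dict[str, str],
) -> Set[Triple]:
    """Fuse several ontologies (sets of triples) into a single one.

    `equivalences` maps a term to the canonical term it is merged into;
    terms that are not in the mapping are reused unchanged.
    """
    def canon(term: str) -> str:
        # Single-hop lookup: assumes the mapping already points at canonical terms.
        return equivalences.get(term, term)

    merged: Set[Triple] = set()
    for triples in ontologies.values():
        for s, p, o in triples:
            # Rewrite subject and object so equivalent concepts collapse into one node.
            merged.add((canon(s), p, canon(o)))
    return merged


# Hypothetical example: ontology B's SoilMaterial is declared equivalent to A's Soil.
onto_a = {("A:Soil", "subClassOf", "A:EnvironmentalMaterial")}
onto_b = {
    ("B:SoilSample", "subClassOf", "B:Sample"),
    ("B:SoilSample", "derivesFrom", "B:SoilMaterial"),
}
merged = merge_ontologies({"A": onto_a, "B": onto_b}, {"B:SoilMaterial": "A:Soil"})
for triple in sorted(merged):
    print(triple)
```

Here the triple from ontology B now points at `A:Soil`, so the reused concepts from both sources meet in one node of the fused ontology.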
Our design is guided by data from several databases in the biodiversity field.
We develop a four-stage pipeline involving biodiversity experts and computer scientists at different phases.
Using automated clustering approaches together with input from biodiversity experts, we generate the list of core concepts.
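The clustering algorithm is not specified here, so the following is only an illustrative stand-in. It assumes candidate terms and their frequencies are already available, groups terms by single-linkage clustering on word-level Jaccard similarity (via union-find), and proposes the most frequent term of each cluster as a candidate core concept for expert review. The threshold of 0.5 and all function names are hypothetical choices.

```python
from itertools import combinations
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two terms."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster_terms(terms: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Single-linkage clustering: terms with similarity >= threshold are
    placed (transitively) in the same cluster using union-find."""
    parent = list(range(len(terms)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(terms)), 2):
        if jaccard(terms[i], terms[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, term in enumerate(terms):
        groups.setdefault(find(i), []).append(term)
    return list(groups.values())


def candidate_concepts(terms: List[str], freq: Dict[str, int],
                       threshold: float = 0.5) -> List[str]:
    """Most frequent term of each cluster, proposed to experts as a core concept."""
    return [max(c, key=lambda t: freq.get(t, 0))
            for c in cluster_terms(terms, threshold)]


# Hypothetical extracted terms and frequencies, for illustration only.
terms = ["soil carbon", "carbon", "soil organic carbon",
         "plant biomass", "biomass", "earthworms"]
freq = {"carbon": 12, "soil carbon": 7, "soil organic carbon": 3,
        "biomass": 9, "plant biomass": 4, "earthworms": 5}
print(cluster_terms(terms))
print(candidate_concepts(terms, freq))
```

Single linkage is used only because it is simple to show without external libraries. In this sketch the automated step merely proposes candidates, and the final decision on each concept rests with the domain experts.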
The following describes, step by step, how we reach the final outcome.
The aim of this step is to obtain sufficient data sources from which relevant terms can be extracted.
Example keywords include ‘abundance’, ‘benthic’, ‘biomass’, ‘carbon’, ‘climate change’, ‘decomposition’, ‘earthworms’, and ‘ecosystem’.
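One way to use such keywords to select data sources is to match them against dataset metadata. The sketch below assumes each dataset is represented by a free-text description and keeps those that mention at least `min_hits` keywords as whole words, ignoring case. The dataset names, descriptions, and function names are hypothetical; only the keyword list comes from the text above.

```python
import re
from typing import Dict, List

# Keywords taken from the text; the matching strategy itself is an assumption.
KEYWORDS = ["abundance", "benthic", "biomass", "carbon", "climate change",
            "decomposition", "earthworms", "ecosystem"]


def matched_keywords(text: str, keywords: List[str] = KEYWORDS) -> List[str]:
    """Return the keywords occurring in `text` as whole words (case-insensitive)."""
    return [kw for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)]


def select_sources(datasets: Dict[str, str], min_hits: int = 1) -> Dict[str, List[str]]:
    """Keep datasets whose metadata mentions at least `min_hits` keywords."""
    selected: Dict[str, List[str]] = {}
    for name, metadata in datasets.items():
        hits = matched_keywords(metadata)
        if len(hits) >= min_hits:
            selected[name] = hits
    return selected


# Hypothetical dataset descriptions, for illustration only.
datasets = {
    "ds1": "Earthworms and litter decomposition in grassland plots",
    "ds2": "Benthic invertebrate abundance under Climate Change scenarios",
    "ds3": "Survey of museum visitor numbers",
}
print(select_sources(datasets))
```

Raising `min_hits` makes the selection stricter; the selected descriptions would then serve as input for term extraction.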
The authors thank the Carl Zeiss Foundation for the financial support of the project “A Virtual Werkstatt for Digitization in the Sciences (K3, P5)” within the scope of the program line “Breakthroughs: Exploring Intelligent Systems for Digitization - explore the basics, use applications”. Alsayed Algergawy’s work has been funded by the Deutsche Forschungsgemeinschaft (DFG) as part of CRC 1076 AquaDiva. Our sincere thanks go to Tina Heger (Berlin-Brandenburg Institute of Advanced Biodiversity Research (BBIB)) for serving as the domain expert.