BiodivOnto: Towards a Core Ontology for Biodiversity

We aim to develop a core ontology for the biodiversity domain establishing links between the foundational and domain-specific ontologies.
Keywords
Biodiversity. Knowledge Representation. Core Ontology.

Introduction

To preserve biodiversity, we need research into its underlying mechanisms, and that research requires integrated data. Biodiversity research generates and publicly shares an increasing amount of heterogeneous data. There are also many efforts to describe biodiversity datasets and research outputs semantically. Multiple ontologies, such as ENVO and IOBC, model specific parts of the domain. However, to support integrative biodiversity research, there is a growing need to bridge the more refined biodiversity concepts and the general concepts provided by foundational ontologies.
We propose the design of a core ontology for the biodiversity domain using a semi-automatic approach.
We use the fusion/merge strategy, in which the new ontology is developed by assembling and reusing one or more existing ontologies.
Our design is guided by data from several databases in the biodiversity field.
We develop a four-stage pipeline that involves biodiversity experts and computer scientists at different phases.
Using automated clustering and the help of biodiversity experts, we generate the list of core concepts.

Methodology

How do we reach the final outcome, step by step ...

The aim of this step is to get sufficient data sources from which we can extract relevant terms.

Search keywords, such as ‘abundance’, ‘benthic’, ‘biomass’, ‘carbon’, ‘climate change’, ‘decomposition’, ‘earthworms’, and ‘ecosystem’.

Data sources: Semedico, BEFChina, and data.world.

Output: 100 abstracts, more than 50 tables.

Step 1

Data Acquisition


Manual annotation of the collected data following the annotation scheme in QEMP [1]

Ontologies: OBC, SWEET, ECOCORE, ECSO, CBO, BCO, and the Biodiversity A-Z dictionary

Compound keywords are expanded: "photosynthetic O2 production" -> ["photosynthetic", "O2", "O2 production", "photosynthetic O2 production"]

Keywords from external sources are included: QEMP, AquaDiva [2], and soil-related keywords [3]
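
The compound-keyword expansion above can be illustrated with a minimal sketch. It assumes that expansion means enumerating contiguous sub-phrases of a multi-word keyword; the function name expand_compound is hypothetical. Note that this enumeration yields six candidates for the example, while the example above lists only four of them, so some further selection (not specified here) presumably takes place.

```python
def expand_compound(keyword: str) -> list[str]:
    """Return every contiguous sub-phrase of a multi-word keyword.

    Assumption: the actual expansion rule is not spelled out in the
    source; this enumerates all contiguous word n-grams as candidates.
    """
    tokens = keyword.split()
    phrases = []
    # Shorter phrases first, full keyword last.
    for length in range(1, len(tokens) + 1):
        for start in range(len(tokens) - length + 1):
            phrases.append(" ".join(tokens[start:start + length]))
    return phrases


candidates = expand_compound("photosynthetic O2 production")
```

The four phrases given in the example ("photosynthetic", "O2", "O2 production", and the full keyword) are all among the candidates; the remaining two would have to be dropped by a later filter or by the annotators.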

Step 2

Keywords Extraction


Keywords normalization.

Case insensitive.

Singular Form.

Exclude spelling mistakes.

1107 unique keywords (1.8 × the QEMP corpus).
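
As a rough illustration of the normalization rules above, the following sketch lowercases each keyword, reduces its last word to a singular form, and removes duplicates. The singularize helper is a deliberately naive assumption (it mishandles words such as "species"), and the exclusion of spelling mistakes is not covered; a real pipeline would likely rely on a lemmatizer and a spell checker.

```python
def singularize(word: str) -> str:
    """Very rough English singularization (an assumption, not the authors' rule)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    # Keep words like "biomass", "genus", "analysis" unchanged.
    if word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def normalize(keywords):
    """Lowercase, singularize the last word, and drop duplicates (order kept)."""
    seen = set()
    result = []
    for kw in keywords:
        tokens = kw.strip().lower().split()
        if not tokens:
            continue
        tokens[-1] = singularize(tokens[-1])
        norm = " ".join(tokens)
        if norm not in seen:
            seen.add(norm)
            result.append(norm)
    return result


raw = ["Earthworms", "earthworm", "Biomass", "biomass", "Ecosystems", "climate change"]
unique = normalize(raw)
```

Here the six raw keywords collapse to four unique ones, which is the kind of reduction the filtration step is meant to achieve.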

Step 3

Keywords Filtration


Distance-based clustering using word2vec [4] representation

Manual revision of the clustered keywords
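
To make the idea of distance-based clustering concrete, here is a small sketch. The three-dimensional vectors are invented stand-ins for word2vec embeddings, and the greedy threshold scheme and the threshold value 0.2 are assumptions chosen for illustration; the clustering algorithm actually used is not specified here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster(vectors, threshold):
    """Greedy clustering: a word joins the first cluster whose seed
    (its first member) lies within `threshold`, else starts a new one."""
    clusters = []  # list of (seed_word, members)
    for word, vec in vectors.items():
        for seed, members in clusters:
            if cosine_distance(vectors[seed], vec) <= threshold:
                members.append(word)
                break
        else:
            clusters.append((word, [word]))
    return [members for _, members in clusters]


# Toy vectors (hypothetical); real ones would come from a word2vec model.
toy_vectors = {
    "mammal": [0.9, 0.1, 0.0],
    "insect": [0.8, 0.2, 0.1],
    "carbon": [0.1, 0.9, 0.1],
    "H2O": [0.0, 0.8, 0.2],
    "grassland": [0.1, 0.1, 0.9],
    "forest": [0.2, 0.0, 0.9],
}
groups = cluster(toy_vectors, threshold=0.2)
```

With these toy vectors the keywords fall into three groups, which a domain expert could then revise manually as described above.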

WordNet [5] similarity is measured among the final seeds. If the similarity is 0.0, the seed is taken as a core concept; otherwise, BioPortal is checked for a common ancestor.

Relations are determined by a biodiversity expert.
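
The seed decision rule above can be sketched as follows. The sketch assumes that "similarity 0.0" means a seed has zero similarity to every other seed; the function triage_seeds, the toy scores, and toy_similarity are hypothetical placeholders for a real WordNet similarity measure, and the BioPortal lookup itself is not implemented.

```python
def triage_seeds(seeds, similarity):
    """Split seeds into core concepts and seeds needing an ancestor lookup.

    Assumption: a seed is a core concept when its similarity to every
    other seed is exactly 0.0; all remaining seeds would be checked in
    BioPortal for a common ancestor (not done here).
    """
    core, needs_lookup = [], []
    for seed in seeds:
        others = [s for s in seeds if s != seed]
        if all(similarity(seed, other) == 0.0 for other in others):
            core.append(seed)
        else:
            needs_lookup.append(seed)
    return core, needs_lookup


# Invented scores for illustration only; unlisted pairs score 0.0.
TOY_SCORES = {frozenset(("biome", "habitat")): 0.5}


def toy_similarity(a, b):
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


core, needs_lookup = triage_seeds(["organism", "biome", "habitat"], toy_similarity)
```

In this toy run "organism" is unrelated to the other seeds and becomes a core concept, while "biome" and "habitat" are flagged for the common-ancestor check.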

Step 4

Concepts & Relations Determination


Core Concepts in Existing Ontologies

Category | Ontology modules | Sample terms in category
Environment | ENVO, ECOCORE, ECSO, PATO | groundwater, garden
Organism | ENVO, ECOCORE, ECSO, BCO | mammal, insect
Phenomena | ENVO, PATO, BCO | decomposition, colonization
Quality | ENVO, PATO, CBO, ECSO | volume, age
Landscape | ENVO | grassland, forest
Trait | BCO | texture, structure
Ecosystem | ENVO, ECOCORE, ECSO, PATO | biome, habitat
Matter | ENVO, ECSO | carbon, H2O

Team

FUSION Group
Ph.D. Student
FUSION Group
Post Doc.

FUSION Group
Post Doc.
FUSION Group
Professor

Acknowledgments

The authors thank the Carl Zeiss Foundation for the financial support of the project “A Virtual Werkstatt for Digitization in the Sciences (K3, P5)” within the scope of the program line “Breakthroughs: Exploring Intelligent Systems for Digitization - explore the basics, use applications”. Alsayed Algergawy’s work has been funded by the Deutsche Forschungsgemeinschaft (DFG) as part of CRC 1076 AquaDiva. Our sincere thanks to Tina Heger (Berlin-Brandenburg Institute of Advanced Biodiversity Research (BBIB)) as the domain expert.

References

  1. Löffler, F., Abdelmageed, N., Babalou, S., Kaur, P., König-Ries, B.: Tag me if you can! Semantic annotation of biodiversity metadata with the QEMP corpus and the BiodivTagger. In: Proceedings of the 12th Language Resources and Evaluation Conference. pp. 4557–4564 (2020)
  2. AquaDiva project, http://www.aquadiva.uni-jena.de/
  3. Udovenko, V., Algergawy, A.: Entity extraction in the ecological domain – a practical guide. BTW 2019 – Workshopband (2019)
  4. Goldberg, Y., Levy, O.: word2vec explained: Deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722 (2014)
  5. Pedersen, T., Patwardhan, S., Michelizzi, J., et al.: WordNet::Similarity: Measuring the relatedness of concepts. In: AAAI. vol. 4, pp. 25–29 (2004)