BiodivOnto: Towards a Core Ontology for Biodiversity

Introduction

Preserving biodiversity requires research into its underlying mechanisms, and that research depends on integrated data. Biodiversity research generates and publicly shares a growing amount of heterogeneous data, and there are many efforts to describe biodiversity datasets and research outputs semantically. Multiple ontologies, such as ENVO and IOBC, model specific parts of the domain. However, supporting integrative biodiversity research requires a bridge between these more refined biodiversity concepts and the general concepts provided by foundational ontologies.

We propose the design of a core ontology for the biodiversity domain using a semi-automatic approach.

We use the fusion/merge strategy, in which the new ontology is developed by assembling and reusing one or more existing ontologies.

Our design is guided by data from several databases in the biodiversity field.

We develop a four-stage pipeline involving biodiversity experts and computer scientists at different phases.

Using automated clustering and the help of biodiversity experts, we generate the list of core concepts.

Step 1

Data Acquisition

The aim of this step is to collect enough data sources from which relevant terms can be extracted.

Example keywords: ‘abundance’, ‘benthic’, ‘biomass’, ‘carbon’, ‘climate change’, ‘decomposition’, ‘earthworms’, ‘ecosystem’.

Data sources: Semedico, BEFChina, and data.world.

Output: 100 abstracts and more than 50 tables.

Step 2

Keywords Extraction

Manual annotation of the collected data, following the annotation scheme of QEMP [1].

Ontologies: OBC, SWEET, ECOCORE, ECSO, CBO, BCO, and the Biodiversity A-Z dictionary.

Compound keywords are expanded, e.g. "photosynthetic O2 production" -> ["photosynthetic", "O2", "O2 production", "photosynthetic O2 production"].

Keywords from external sources are included: QEMP, AquaDiva [2], and soil-related keywords [3].
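The compound keyword expansion in the keywords extraction step can be illustrated with a small sketch. The page does not state the rule that selects which sub-phrases are kept (its example keeps four of the six possible ones), so this sketch only enumerates every contiguous sub-phrase as a candidate list for later manual selection. The function name `expand_compound` is hypothetical.

```python
def expand_compound(keyword):
    """Return every contiguous sub-phrase of a multi-word keyword.

    Assumption: this is a candidate generator only; the selection of
    which sub-phrases to keep is not specified in the source.
    """
    words = keyword.split()
    phrases = []
    for size in range(1, len(words) + 1):
        for start in range(len(words) - size + 1):
            phrases.append(" ".join(words[start:start + size]))
    return phrases


candidates = expand_compound("photosynthetic O2 production")
for phrase in candidates:
    print(phrase)
```

The four expansions listed in the example above ("photosynthetic", "O2", "O2 production", and the full phrase) all appear among these candidates, alongside "production" and "photosynthetic O2", which the source example does not keep.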

Step 3

Keywords Filtration

Keywords normalization:

Case-insensitive matching.

Singular form.

Spelling mistakes excluded.

Output: 1107 unique keywords (1.8 times the QEMP corpus).
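The normalization rules of the keywords filtration step (case-insensitive, singular form, spelling mistakes excluded) could look roughly like the sketch below. It rests on several assumptions that are not from the source: the singularization is a deliberately naive suffix rule, and spelling mistakes are detected by checking each word against a hypothetical `known_words` vocabulary, whereas the actual filtering may well have been manual.

```python
# Words whose singular and plural forms are identical (illustrative only).
INVARIANT = {"species", "series"}


def singularize(word):
    """Naive English singularization (an assumption, not the authors' rule)."""
    if word in INVARIANT:
        return word
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def normalize_keywords(keywords, known_words):
    """Lowercase, singularize the head noun, drop misspellings and duplicates."""
    seen = set()
    result = []
    for keyword in keywords:
        words = keyword.lower().split()
        if not words:
            continue
        # Only the last word of a phrase is singularized.
        words[-1] = singularize(words[-1])
        if any(word not in known_words for word in words):
            continue  # treated as a spelling mistake
        normalized = " ".join(words)
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result


# Hypothetical vocabulary and raw keywords for illustration.
known_words = {"earthworm", "biomass", "climate", "change"}
raw = ["Earthworms", "earthworm", "Biomass", "biomas", "Climate Change"]
normalized = normalize_keywords(raw, known_words)
print(normalized)
```

In this toy run, "Earthworms" and "earthworm" collapse into one entry and the misspelled "biomas" is dropped.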

Step 4

Concepts & Relations Determination

Distance-based clustering of the keywords using their word2vec [4] representation.

Manual revision of the clustered keywords.

WordNet [5] similarity is measured among the final seeds. If the similarity is 0.0, the seed is taken as a core concept; otherwise, BioPortal is checked for a common ancestor.

Relations are determined by a biodiversity expert.
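The clustering and seed-selection logic of the concepts and relations step can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the two-dimensional vectors stand in for real word2vec embeddings, the greedy threshold clustering is one simple distance-based scheme among many, and `toy_similarity` stands in for a WordNet similarity measure. The rule "similarity 0.0" is interpreted here as "similarity 0.0 to every other seed".

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster(vectors, threshold):
    """Greedy clustering: join the first cluster whose seed is close enough."""
    clusters = []  # each cluster is a list of words; the first word is its seed
    for word, vec in vectors.items():
        for members in clusters:
            if cosine_distance(vec, vectors[members[0]]) <= threshold:
                members.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def split_seeds(seeds, similarity):
    """Separate seeds with zero similarity to all others from the rest."""
    core, to_check = [], []
    for seed in seeds:
        others = [s for s in seeds if s != seed]
        if all(similarity(seed, other) == 0.0 for other in others):
            core.append(seed)  # taken directly as a core concept
        else:
            to_check.append(seed)  # look for a common ancestor, e.g. in BioPortal
    return core, to_check


# Hypothetical embeddings (real word2vec vectors have hundreds of dimensions).
toy_vectors = {
    "forest": (1.0, 0.1),
    "grassland": (0.9, 0.2),
    "carbon": (0.1, 1.0),
    "H2O": (0.2, 0.9),
}
clusters = cluster(toy_vectors, threshold=0.2)

# Hypothetical pairwise similarities; unlisted pairs default to 0.0.
toy_scores = {frozenset(("forest", "grassland")): 0.4}


def toy_similarity(a, b):
    return toy_scores.get(frozenset((a, b)), 0.0)


core, to_check = split_seeds(["forest", "grassland", "carbon"], toy_similarity)
print(clusters, core, to_check)
```

Manual revision by experts, which the pipeline places between clustering and seed selection, is not modeled here; the seed list passed to `split_seeds` is simply assumed to be its result.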

Core Concepts in Existing Ontologies

Category	Ontology Modules	Sample terms in category
Environment	ENVO, ECOCORE, ECSO, PATO	groundwater, garden
Organism	ENVO, ECOCORE, ECSO, BCO	mammal, insect
Phenomena	ENVO, PATO, BCO	decomposition, colonization
Quality	ENVO, PATO, CBO, ECSO	volume, age
Landscape	ENVO	grassland, forest
Trait	BCO	texture, structure
Ecosystem	ENVO, ECOCORE, ECSO, PATO	biome, habitat
Matter	ENVO, ECSO	carbon, H2O

Team

Nora Abdelmageed

FUSION Group

Ph.D. Student

nora-abdelmageed

NoYo25

@Nora.Youssef

Alsayed Algergawy

FUSION Group

Post Doc.

alsayed-algergawy

alsayedal

@AAlgergawy

Sheeba Samuel

FUSION Group

Post Doc.

sheeba-samuel

Sheeba-Samuel

@sheebasamuel

Birgitta König-Ries

FUSION Group

Professor

birgitta-konig-ries

fusion-jena

@birgittaries

Acknowledgments

The authors thank the Carl Zeiss Foundation for the financial support of the project “A Virtual Werkstatt for Digitization in the Sciences (K3, P5)” within the scope of the program line “Breakthroughs: Exploring Intelligent Systems” for “Digitization - explore the basics, use applications”. Alsayed Algergawy’s work has been funded by the Deutsche Forschungsgemeinschaft (DFG) as part of CRC 1076 AquaDiva. Our sincere thanks go to Tina Heger (Berlin-Brandenburg Institute of Advanced Biodiversity Research (BBIB)), who served as the domain expert.

References

[1] Löffler, F., Abdelmageed, N., Babalou, S., Kaur, P., König-Ries, B.: Tag me if you can! Semantic annotation of biodiversity metadata with the QEMP corpus and the BiodivTagger. In: Proceedings of the 12th Language Resources and Evaluation Conference. pp. 4557–4564 (2020)
[2] AquaDiva project, http://www.aquadiva.uni-jena.de/
[3] Udovenko, V., Algergawy, A.: Entity extraction in the ecological domain - a practical guide. BTW 2019 - Workshopband (2019)
[4] Goldberg, Y., Levy, O.: word2vec explained: deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722 (2014)
[5] Pedersen, T., Patwardhan, S., Michelizzi, J., et al.: WordNet::Similarity - measuring the relatedness of concepts. In: AAAI. vol. 4, pp. 25–29 (2004)
