The aim of this step is to collect enough data sources from which relevant terms can be extracted.
Search keywords include ‘abundance’, ‘benthic’, ‘biomass’, ‘carbon’, ‘climate change’, ‘decomposition’, ‘earthworms’, and ‘ecosystem’.
Sources: Semedico, BEFChina, and data.world.
Output: 100 abstracts and more than 50 tables.
Manual annotation of the collected data following the annotation scheme in QEMP [1]
Ontologies: OBC, SWEET, ECOCORE, ECSO, CBO, BCO, and the Biodiversity A-Z dictionary
Compound keywords are expanded, e.g. "photosynthetic O2 production" -> ["photosynthetic", "O2", "O2 production", "photosynthetic O2 production"]
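The expansion rule itself is not spelled out; only the single example above is given. The sketch below is one rule that reproduces that example: keep each modifier word on its own, and keep every multi-word suffix that ends in the head noun. The function name `expand_compound` and the rule are assumptions inferred from the example, not the pipeline's actual code.

```python
from typing import List


def expand_compound(keyword: str) -> List[str]:
    """Expand a compound keyword into candidate sub-keywords.

    Assumed rule (inferred from one example in the text):
      * every word except the last (the head noun) is kept on its own;
      * every multi-word suffix ending in the head noun is kept.
    """
    tokens = keyword.split()
    if len(tokens) < 2:
        return [keyword]  # nothing to expand for single-word keywords

    modifiers = tokens[:-1]
    # Suffixes of length 2, 3, ..., len(tokens), shortest first.
    suffixes = [" ".join(tokens[i:]) for i in range(len(tokens) - 2, -1, -1)]

    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(modifiers + suffixes))


print(expand_compound("photosynthetic O2 production"))
# ['photosynthetic', 'O2', 'O2 production', 'photosynthetic O2 production']
```

Under this rule the bare head noun ("production") and the modifier-only phrase ("photosynthetic O2") are not generated, which matches the example but may not hold for other keywords.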
Keywords from external sources are included: QEMP, AquaDiva [2], and soil-related keywords [3]
Keyword normalization:
Case-insensitive matching.
Singular form.
Misspelled keywords are excluded.
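The normalization steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's implementation: the singularization rules are deliberately naive, and the `known_words` vocabulary used to drop misspellings is a hypothetical stand-in (the text does not say how spelling mistakes were detected, and it may have been done by hand).

```python
# Words that look plural but should not be changed (illustrative list only).
INVARIANT = {"species", "series"}


def singularize(word):
    """Very naive English singularization, for illustration only."""
    if word in INVARIANT or len(word) <= 3:
        return word
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith(("sses", "xes", "ches", "shes")):
        return word[:-2]
    if word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def normalize_keywords(keywords, known_words):
    """Lowercase, singularize the last word, drop misspellings and duplicates.

    `known_words` is a hypothetical set of correctly spelled, singular words.
    """
    result = []
    seen = set()
    for keyword in keywords:
        tokens = keyword.lower().split()
        if not tokens:
            continue
        tokens[-1] = singularize(tokens[-1])
        # Treat a keyword as misspelled if any of its words is unknown.
        if any(tok not in known_words for tok in tokens):
            continue
        normalized = " ".join(tokens)
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result


known = {"earthworm", "biomass", "ecosystem", "climate", "change"}
raw = ["Earthworms", "earthworm", "Biomass", "ecosytem", "Climate Changes"]
print(normalize_keywords(raw, known))
# ['earthworm', 'biomass', 'climate change']
```

Here "ecosytem" is dropped as a misspelling, and "Earthworms" and "earthworm" collapse into one entry.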
Result: 1107 unique keywords (1.8 times the QEMP corpus).
Distance-based clustering using word2vec [4] representation
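The clustering algorithm and distance threshold are not named in the text. As one possible reading, the sketch below groups keywords by single-linkage clustering on cosine distance: two keywords end up in the same cluster whenever a chain of pairwise distances below the threshold connects them. The three-dimensional vectors are made-up stand-ins for real word2vec embeddings, and `threshold=0.2` is an arbitrary illustrative value.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def cluster_by_distance(vectors, threshold):
    """Single-linkage clustering via union-find over close pairs."""
    words = list(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the cluster representative.
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine_distance(vectors[a], vectors[b]) < threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return sorted(sorted(members) for members in clusters.values())


# Toy embeddings (NOT real word2vec output).
toy_vectors = {
    "forest": [0.9, 0.1, 0.0],
    "grassland": [0.8, 0.2, 0.1],
    "mammal": [0.1, 0.9, 0.1],
    "insect": [0.0, 0.8, 0.2],
    "carbon": [0.1, 0.1, 0.9],
}
print(cluster_by_distance(toy_vectors, threshold=0.2))
# [['carbon'], ['forest', 'grassland'], ['insect', 'mammal']]
```

With real embeddings, the threshold would need tuning, which is presumably one reason the clusters are revised manually afterwards.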
Manual revision of the clustered keywords
WordNet [5] similarity is measured among the final seeds. If the similarity is 0.0, the seed is taken as a core concept; otherwise, BioPortal is checked for a common ancestor
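The decision rule can be sketched as below, under one assumed reading: a seed counts as a core concept only if its similarity to every other seed is 0.0, and any seed with a non-zero similarity is set aside for a BioPortal lookup. The similarity scores here are invented placeholders; computing them with WordNet and querying BioPortal are both outside this sketch, and `classify_seeds` is a hypothetical helper name.

```python
def classify_seeds(seeds, similarity):
    """Split seeds into core concepts and seeds needing an ancestor lookup.

    `similarity` maps frozenset({a, b}) to a precomputed score;
    missing pairs are treated as 0.0 (an assumption of this sketch).
    """
    core_concepts = []
    needs_ancestor_check = []
    for seed in seeds:
        scores = [
            similarity.get(frozenset((seed, other)), 0.0)
            for other in seeds
            if other != seed
        ]
        if all(score == 0.0 for score in scores):
            core_concepts.append(seed)
        else:
            needs_ancestor_check.append(seed)
    return core_concepts, needs_ancestor_check


# Placeholder scores, not real WordNet output.
seeds = ["organism", "mammal", "carbon"]
scores = {frozenset(("organism", "mammal")): 0.5}

core, to_check = classify_seeds(seeds, scores)
print(core)      # ['carbon']
print(to_check)  # ['organism', 'mammal']
```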
Relations are determined by a biodiversity expert
Category | Ontology Modules | Sample terms in category |
---|---|---|
Environment | ENVO, ECOCORE, ECSO, PATO | groundwater, garden |
Organism | ENVO, ECOCORE, ECSO, BCO | mammal, insect |
Phenomena | ENVO, PATO, BCO | decomposition, colonization |
Quality | ENVO, PATO, CBO, ECSO | volume, age |
Landscape | ENVO | grassland, forest |
Trait | BCO | texture, structure |
Ecosystem | ENVO, ECOCORE, ECSO, PATO | biome, habitat |
Matter | ENVO, ECSO | carbon, H2O |