The aim of this step is to collect enough data sources from which relevant terms can be extracted.
Search keywords include ‘abundance’, ‘benthic’, ‘biomass’, ‘carbon’, ‘climate change’, ‘decomposition’, ‘earthworms’, and ‘ecosystem’.
Data sources: Semedico, BEFChina, and data.world.
Output: 100 abstracts and more than 50 tables.
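How the three sources were queried is not described here. As a rough, hypothetical sketch, assuming the candidate abstracts have already been retrieved as plain strings, a whole-word keyword filter could look like this (the function names and the `limit` parameter are illustrative, not part of the original workflow):

```python
import re

# Search keywords from the list above.
KEYWORDS = ["abundance", "benthic", "biomass", "carbon",
            "climate change", "decomposition", "earthworms", "ecosystem"]


def matches_keywords(text, keywords=KEYWORDS):
    """Return the keywords that occur in `text` as whole words, ignoring case."""
    found = []
    for kw in keywords:
        # \b word boundaries stop 'carbon' from matching inside 'carbonate'.
        if re.search(r"\b" + re.escape(kw) + r"\b", text, flags=re.IGNORECASE):
            found.append(kw)
    return found


def select_abstracts(abstracts, limit=100):
    """Keep abstracts that mention at least one keyword, up to `limit` of them."""
    return [a for a in abstracts if matches_keywords(a)][:limit]


abstracts = [
    "Climate change alters benthic biomass in shallow lakes.",
    "Carbonate rocks of the Jurassic period.",
]
print(select_abstracts(abstracts))  # ['Climate change alters benthic biomass in shallow lakes.']
```

Whole-word matching is a deliberate choice in this sketch; plain substring matching would also pull in abstracts that only mention words such as "carbonate".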
Manual annotation of the collected data, following the QEMP annotation scheme.
Ontologies: OBC, SWEET, ECOCORE, ECSO, CBO, BCO, and the Biodiversity A-Z dictionary.
Compound keywords are expanded into their sub-terms, e.g., "photosynthetic O2 production" -> ["photosynthetic", "O2", "O2 production", "photosynthetic O2 production"].
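Only one expansion example is given. One rule consistent with it (an assumption, not a documented rule) is: keep each modifier token on its own, leave out the bare head word, and add every trailing sub-phrase of two or more tokens. A minimal sketch of that rule, with a hypothetical function name:

```python
def expand_compound(keyword):
    """Expand a compound keyword under an assumed rule inferred from one example.

    Returns every token except the final head word, followed by every
    trailing sub-phrase of two or more tokens, shortest first.
    """
    tokens = keyword.split()
    if len(tokens) < 2:
        return tokens  # single words (or empty input) are not expanded
    modifiers = tokens[:-1]
    suffixes = [" ".join(tokens[i:]) for i in range(len(tokens) - 2, -1, -1)]
    return modifiers + suffixes


print(expand_compound("photosynthetic O2 production"))
# ['photosynthetic', 'O2', 'O2 production', 'photosynthetic O2 production']
```

Under this rule a two-word keyword such as "climate change" yields ["climate", "climate change"]; whether the original workflow treated two-word keywords the same way is not stated.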
Keywords from external sources are also included: QEMP, AquaDiva, and soil-related keywords.
Keywords with spelling mistakes are excluded.
The result is 1,107 unique keywords, about 1.8 times the size of the QEMP corpus.
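The merging and cleaning steps can be sketched as follows. This is a hypothetical illustration: the source does not say how spelling mistakes were detected, so the check is passed in as a caller-supplied predicate, and the toy keyword lists are invented. Deduplication here is case-insensitive and keeps the first spelling seen; the original may have normalized keywords differently. Calling `len()` on the result gives the unique-keyword count.

```python
def merge_keywords(sources, is_misspelled):
    """Merge keyword lists, drop misspelled entries, and remove duplicates.

    `sources` maps a source name to its list of keywords; `is_misspelled` is a
    caller-supplied predicate (hypothetical, e.g. a dictionary lookup).
    """
    unique = {}
    for keywords in sources.values():
        for kw in keywords:
            kw = " ".join(kw.split())  # collapse stray whitespace
            if not kw or is_misspelled(kw):
                continue
            # Case-insensitive key; the first spelling seen is kept.
            unique.setdefault(kw.lower(), kw)
    return list(unique.values())


# Toy keyword lists (invented for illustration).
sources = {
    "collected": ["Biomass", "carbon", "ecosytem"],
    "QEMP": ["biomass", "groundwater"],
    "AquaDiva": ["aquifer", "Carbon"],
}
known_typos = {"ecosytem"}
merged = merge_keywords(sources, lambda kw: kw.lower() in known_typos)
print(merged)  # ['Biomass', 'carbon', 'groundwater', 'aquifer']
```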
The keywords are clustered by distance, using their word2vec representations.
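The clustering algorithm and distance measure are not specified. A minimal sketch, assuming cosine distance and single-linkage clustering with a hypothetical threshold of 0.2: in practice the vectors would come from a trained word2vec model, while the 3-dimensional vectors below are invented toy values.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def cluster_keywords(vectors, threshold):
    """Single-linkage clustering: keywords closer than `threshold` share a cluster."""
    words = list(vectors)
    parent = {w: w for w in words}  # union-find forest

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine_distance(vectors[a], vectors[b]) < threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())


# Toy vectors standing in for word2vec embeddings (invented values).
vectors = {
    "mammal": [0.9, 0.1, 0.0],
    "insect": [0.8, 0.2, 0.1],
    "groundwater": [0.0, 0.9, 0.3],
    "aquifer": [0.1, 0.8, 0.4],
    "decomposition": [0.1, 0.1, 0.9],
}
print(cluster_keywords(vectors, threshold=0.2))
# [['mammal', 'insect'], ['groundwater', 'aquifer'], ['decomposition']]
```

Single linkage is chosen here only for brevity; it can chain loosely related keywords together, which is one reason a manual revision pass over the clusters is useful.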
The clustered keywords are then revised manually.
WordNet similarity is computed among the final seeds. If the similarity is 0.0, the seed is taken as a core concept; otherwise, BioPortal is checked for a common ancestor.
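The seed-selection rule can be sketched under one reading of it: a seed whose WordNet similarity to every other seed is 0.0 becomes a core concept, and each pair of seeds with non-zero similarity is checked for shared ancestors. The similarity scores and ancestor sets below are hypothetical toy values standing in for a real WordNet measure and for BioPortal lookups; which WordNet measure was used is not stated.

```python
from itertools import combinations

# Hypothetical non-zero WordNet similarity scores; any pair not listed scores 0.0.
SIMILARITY = {frozenset({"mammal", "insect"}): 0.6}

# Hypothetical ancestor sets standing in for BioPortal lookups (not real data).
ANCESTORS = {
    "mammal": {"organism", "animal"},
    "insect": {"organism", "animal"},
}


def similarity(a, b):
    """Look up the toy similarity of two seeds (0.0 when unrelated)."""
    return SIMILARITY.get(frozenset({a, b}), 0.0)


def select_core_concepts(seeds):
    """Split seeds into core concepts and similar pairs with their shared ancestors."""
    core = [s for s in seeds
            if all(similarity(s, o) == 0.0 for o in seeds if o != s)]
    shared = {}
    for a, b in combinations(seeds, 2):
        if similarity(a, b) > 0.0:
            # Ancestors common to both seeds are candidate parent concepts.
            shared[(a, b)] = sorted(ANCESTORS.get(a, set()) & ANCESTORS.get(b, set()))
    return core, shared


core, shared = select_core_concepts(["groundwater", "decomposition", "mammal", "insect"])
print(core)    # ['groundwater', 'decomposition']
print(shared)  # {('mammal', 'insect'): ['animal', 'organism']}
```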
Relations are determined by a biodiversity expert.
| Category | Ontology modules | Sample terms in category |
|---|---|---|
| Environment | ENVO, ECOCORE, ECSO, PATO | groundwater, garden |
| Organism | ENVO, ECOCORE, ECSO, BCO | mammal, insect |
| Phenomena | ENVO, PATO, BCO | decomposition, colonization |
| Quality | ENVO, PATO, CBO, ECSO | volume, age |
| Ecosystem | ENVO, ECOCORE, ECSO, PATO | biome, habitat |
| Matter | ENVO, ECSO | carbon, H2O |