IMPACT NLP BERT-E: Training a Language Model Tailored for Earth Science Use Cases

The Machine Learning Team at IMPACT has trained an Earth-science-focused language model called BERT-E. It will be used for NASA-related downstream tasks such as GCMD keyword tagging, building knowledge discoverability frameworks, and domain entity extraction. To train the model, we collected about 270,000 full-text papers directly from the American Geophysical Union and the American Meteorological Society. In testing, we found that BERT-E beats SciBERT, a widely used general-science language model, by 2% on the GCMD keyword recommendation task. The model is also publicly available on Hugging Face's Model Hub.
