Release of Science Mission Directorate (SMD) Large Language Model (LLM) Built by NASA SMD and IBM Research

NASA’s SMD Artificial Intelligence (AI) and Machine Learning (ML) working group, in collaboration with IBM Research, has developed a specialized language model. The model is trained on a scientific corpus drawn from sources such as the NASA Astrophysics Data System (ADS), the American Geophysical Union (AGU), the American Meteorological Society (AMS), and PubMed. It shows improved performance on scientific benchmarks such as BLURB (Biomedical Language Understanding and Reasoning Benchmark) and SQuAD 2.0 (Stanford Question Answering Dataset). The initial version is available on Hugging Face: https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1. A sentence-transformer model built on this domain-adapted encoder is also available on Hugging Face: https://huggingface.co/nasa-impact/nasa-smd-ibm-st.

Plans are underway to use these models for data stewardship activities such as assigning dataset keywords, improving documentation, and enhancing search and discovery. The models are currently being integrated into the workflows of NASA’s Science Discovery Engine to improve search results.
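
As a rough illustration of how a sentence-transformer model can support search, the sketch below ranks candidate documents against a query by the cosine similarity of their embeddings. It is not a description of how the Science Discovery Engine integrates the models, which this announcement does not detail. The helper names (`rank_documents`, `load_smd_embedder`) are hypothetical, and the loader assumes the `sentence-transformers` package is installed and that the model can be downloaded from Hugging Face.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Model ID taken from the Hugging Face URL given in the announcement.
MODEL_ID = "nasa-impact/nasa-smd-ibm-st"

# An embedder maps a list of texts to a list of equal-length vectors.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query: str, documents: Sequence[str], embed: Embedder
) -> List[Tuple[str, float]]:
    """Return (document, score) pairs, most similar to the query first."""
    # Embed the query and the documents in a single call.
    vectors = embed([query] + list(documents))
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [
        (doc, cosine_similarity(query_vec, vec))
        for doc, vec in zip(documents, doc_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def load_smd_embedder() -> Embedder:
    """Hypothetical helper: wrap the released model as an Embedder.

    Assumes `pip install sentence-transformers` and network access
    to download the model weights on first use.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    # encode() returns a NumPy array by default; convert to plain lists.
    return lambda texts: model.encode(list(texts)).tolist()
```

With the model available, a call might look like `rank_documents("sea surface temperature anomalies", dataset_descriptions, load_smd_embedder())`, where `dataset_descriptions` is a placeholder list of strings. Passing the embedder in as a function keeps the ranking logic independent of any particular model.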
