Developing Best Practices for Large Language Models in Environmental Science
Short Title: EnviroLLM Guidelines
PI: Charlotte H. Chang (Pomona College)
Co-PIs: Brian Robinson (McGill University), J.T. Erbaugh (The Nature Conservancy & Dartmouth College)
Environmental challenges are increasingly complex and urgent, requiring rigorous and rapid synthesis of broad bodies of research to support evidence-based action. To meet this need, scientists are turning to artificial intelligence to analyze vast amounts of research and policy documents. However, they lack clear guidelines for using these powerful tools effectively and ethically. Our working group brings together experts from research institutions, policy think tanks, conservation organizations, and a primarily undergraduate institution (PUI) to develop best practices for using AI-powered text analysis in environmental evidence synthesis and policy analysis. By combining high-performance computing resources with undergraduate research experiences, we aim to create a model for inclusive environmental data science that bridges the gap between large research universities and PUIs. Working with undergraduate students through course-based research experiences, we will develop and test user-friendly tools for analyzing conservation literature and environmental policies. This approach not only advances environmental science but also creates new pathways for undergraduates to participate in cutting-edge research using NSF’s advanced computing infrastructure, and it helps train the scientific workforce of the 21st century. The resulting guidelines and tools will help researchers worldwide use AI more easily and thoughtfully for environmental evidence and policy syntheses, while our educational model will show how to involve undergraduate researchers in advanced computational text analysis projects. This work represents a crucial step toward more inclusive, ethical, and effective use of AI in environmental science, and it will produce materials for training diverse undergraduate students in environmental data science research.