
Nima Shoghi
ML PhD Student
Atlanta, GA
I'm a second-year ML PhD student at Georgia Tech, advised by Pan Li and Victor Fung. My research focuses on AI for science, developing foundation models and generative models for atomic-scale simulation.
Research areas:
- Scientific foundation models: Pre-training for atomistic property prediction (ICLR 2024, Digital Discovery 2025, ACS Catalysis 2023, J. Chem. Phys. 2022, NeurIPS 2025) and protein conformational dynamics (ICLR 2026)
- Spatio-temporal generative models: SE(3)-equivariant diffusion for long-horizon scientific simulation (ICLR 2026)
My latest work, STAR-MD (ICLR 2026), is the first generative model to simulate stable protein dynamics at microsecond timescales.
Recent Updates
STAR-MD, a generative model for long-horizon protein dynamics, was accepted at ICLR 2026!
Our paper on robust fine-tuning for molecular graph foundation models was accepted at NeurIPS 2025!
Started a Research Scientist Internship at ByteDance Research (ByteDance Seed), working on generative models for protein dynamics.
MatterTune, a platform for fine-tuning atomistic foundation models, was accepted at Digital Discovery!
Gave invited talks on pre-training for chemistry, including a keynote at the 2024 Machine Learning for Materials and Molecular Discoveries Symposium in Gothenburg and talks at the AI for Science Institute (AISI) Beijing, KAUST, and SES AI.
JMP, a framework for pre-training large generalizable models for molecules and materials, was accepted at ICLR 2024!
Featured Publications
Selected research publications
Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics
Nima Shoghi, Yuxuan Liu, Yuning Shen, Rob Brekelmans, Pan Li, and Quanquan Gu
Introduces STAR-MD, a scalable SE(3)-equivariant diffusion model that generates physically plausible protein trajectories over microsecond timescales using a causal diffusion transformer with joint spatiotemporal attention.
From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction
Nima Shoghi, Adeesh Kolluru, John Kitchin, Zachary Ulissi, C. Lawrence Zitnick, and Brandon Wood
Introduces Joint Multi-domain Pre-training (JMP), a supervised pre-training strategy that leverages diverse data to advance atomic property prediction across chemical domains, achieving state-of-the-art performance on 34 out of 40 downstream tasks.
The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts
Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Félix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, and C. Lawrence Zitnick
Introduces the Open Catalyst 2022 (OC22) dataset, consisting of 62,331 DFT relaxations, to accelerate machine learning for oxide electrocatalysts and establish benchmarks for the field.
MatterTune: An Integrated, User-Friendly Platform for Fine-Tuning Atomistic Foundation Models
Lingyu Kong, Nima Shoghi, Guoxiang Hu, Pan Li, and Victor Fung
Introduces MatterTune, a modular platform that enables fine-tuning of pre-trained atomistic foundation models for materials science applications.
Experience
Selected professional and research experience

Research Scientist Intern, AI for Science, ByteDance Research (ByteDance Seed)
- Developed STAR-MD, the first generative molecular dynamics model to simulate stable protein dynamics at microsecond timescales, extrapolating up to 100x beyond its training data where all prior methods fail catastrophically. Achieves SOTA on ATLAS with ~33% higher coverage and ~60% higher structural quality, providing up to 1000x speedup over traditional MD. (ICLR 2026)
- Designed an SE(3)-equivariant diffusion architecture with novel joint spatiotemporal attention that avoids cubic memory bottlenecks, scaling to proteins with 1000+ residues and rollouts of thousands of frames (10+ μs).
- Built an end-to-end pipeline for generative atomic dynamics: distributed training, physics-based relaxation, and quality/diversity evaluation.

AI Resident, FAIR Chemistry Team, Meta AI
- Led the development of Joint Multi-domain Pre-training (JMP), a foundation model for atomic property prediction pre-trained on 120M+ structures from diverse chemical domains (catalysis and small molecules). Achieved SOTA on 34 of 40 downstream tasks, including out-of-distribution domains (large molecules, materials, protein-ligand complexes). (ICLR 2024*)
- Co-authored a perspective on challenges for large-scale generalizable ML potentials in catalyst discovery (ACS Catalysis 2022), and co-developed attention-based transfer learning methods for GNNs across molecular and catalyst domains (J. Chem. Phys. 2022).
- Contributed to the Open Catalyst 2022 paper by running baseline model benchmarks for oxide electrocatalysts. (ACS Catalysis 2023)

Graduate Research Assistant, Georgia Tech
- Researching robust fine-tuning strategies for large-scale pre-trained GNN models.
- Co-developed MatterTune, an open-source platform for fine-tuning atomistic foundation models (UMA, JMP, EquiformerV2, MACE, etc.) with parameter-efficient methods; achieved near-SOTA on the MatBench Discovery benchmark. (Digital Discovery 2025)
- Contributed to a benchmark study of 8 robust fine-tuning methods across 6 molecular graph foundation models on 12 downstream tasks, informing the design of improved fine-tuning strategies for molecular property prediction. (NeurIPS 2025)
- Prior work at GT HPArch Lab: Developed efficient inference strategies for diffusion models (latent-space sampling, quantization) and memory-efficient training methods, achieving ~7x memory reduction during training and inference. (IEEE CAL 2021*, MemSys 2020*)
Education

PhD in Machine Learning, Georgia Tech, 4.0 GPA

M.S. in Computer Science (ML Focus), 4.0 GPA
