Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics

We present STAR-MD, a scalable SE(3)-equivariant diffusion model that generates physically plausible protein trajectories over microsecond timescales.

1 ByteDance Seed, 2 Georgia Tech, 3 UCLA
Overview

Abstract

Molecular dynamics (MD) simulations remain the gold standard for studying protein dynamics, but their computational cost limits access to biologically relevant timescales. Recent generative models have shown promise in accelerating simulations, yet they struggle with long-horizon generation due to architectural constraints, error accumulation, and inadequate modeling of spatiotemporal dynamics.

We present STAR-MD (Spatio-Temporal Autoregressive Rollout for Molecular Dynamics), a scalable SE(3)-equivariant diffusion model that generates physically plausible protein trajectories over microsecond timescales. Our key innovation is a causal diffusion transformer with joint spatiotemporal attention that efficiently captures complex space-time dependencies while avoiding the memory bottlenecks of existing methods.

On the standard ATLAS benchmark, STAR-MD achieves state-of-the-art performance across all metrics, substantially improving conformational coverage, structural validity, and dynamic fidelity over previous methods. It also extrapolates to stable microsecond-scale trajectories where baseline methods fail catastrophically, maintaining high structural quality throughout the extended rollout.

Our comprehensive evaluation reveals severe limitations in current models for long-horizon generation, while demonstrating that STAR-MD's joint spatiotemporal modeling enables robust dynamics simulation at biologically relevant timescales, paving the way for accelerated exploration of protein function.

Architecture

Method Overview

STAR-MD uses a causal diffusion transformer with joint spatio-temporal attention for autoregressive protein trajectory generation.
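One way to picture "causal" and "joint" together is as a single attention mask over all (frame, residue) tokens. The sketch below is an illustration under assumptions, not STAR-MD's implementation: it assumes tokens are flattened frame by frame, that each residue may attend to every residue in its own frame and in earlier frames, and that nothing attends to later frames. The function name `block_causal_mask` and the toy sizes are hypothetical.

```python
def block_causal_mask(num_frames: int, num_residues: int) -> list[list[bool]]:
    """Return mask[q][k], True when query token q may attend to key token k.

    Assumption: token index = frame * num_residues + residue, and attention
    is allowed within a frame and toward earlier frames only.
    """
    n = num_frames * num_residues
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        q_frame = q // num_residues
        for k in range(n):
            k_frame = k // num_residues
            mask[q][k] = k_frame <= q_frame
    return mask


# Toy example: 3 frames of 2 residues each gives a 6 x 6 mask.
mask = block_causal_mask(num_frames=3, num_residues=2)
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
# 11....
# 11....
# 1111..
# 1111..
# 111111
# 111111
```

Under this reading, space and time are handled in one attention pass rather than in separate spatial and temporal passes, and the lower block-triangular shape is what allows keys and values from past frames to be cached and reused.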

Figure 1: Overview of STAR-MD generation. The input contains a protein sequence and a starting conformation. During autoregressive diffusion generation, structural information from previously generated conformations and the current noisy conformation is encoded into single and pair representations. A joint spatiotemporal attention block captures context information to update the single representation of the current frame.
[Diagram labels: Inputs (Seq: MGLSD...PVEK; Structure; Frozen OpenFold); Autoregressive Loop (STAR-MD Model; History and Current frames; KV Cache; singles, pairs); Diffusion Loop, N x (Invariant Point Attention; Joint S x T Attention; Backbone Update); New Frame; Output: Trajectory (0 ns, 2 ns, ..., 996 ns, 1 us)]
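The control flow in Figure 1 (an outer autoregressive loop over frames, an inner diffusion loop per frame, and a history of past frames used as context) can be sketched with a toy stand-in for the network. Everything here is a labeled assumption: `toy_denoise` is a hypothetical placeholder that blends noise toward the most recent frame, frames are flat lists of floats rather than SE(3) backbone frames, and the `history` list only stands in for the KV cache.

```python
import random


def toy_denoise(noisy, history, step, num_steps):
    """Hypothetical placeholder for the learned denoiser.

    Blends the noisy frame toward the most recent frame in the history.
    The weight stays below 1.0, so a small residual keeps frames distinct.
    """
    target = history[-1]
    weight = (step + 1) / (num_steps + 1)
    return [(1 - weight) * x + weight * y for x, y in zip(noisy, target)]


def rollout(start_frame, num_new_frames, num_denoise_steps=4, seed=0):
    """Generate frames one at a time, each conditioned on all earlier frames."""
    rng = random.Random(seed)
    history = [list(start_frame)]  # stands in for cached context of past frames
    for _ in range(num_new_frames):
        # Each new frame starts from Gaussian noise.
        frame = [rng.gauss(0.0, 1.0) for _ in start_frame]
        # Inner diffusion loop: refine the current frame given the history.
        for step in range(num_denoise_steps):
            frame = toy_denoise(frame, history, step, num_denoise_steps)
        # The finished frame becomes context for the next one.
        history.append(frame)
    return history


trajectory = rollout(start_frame=[0.0, 1.0, 2.0], num_new_frames=5)
print(len(trajectory))  # 6: the starting frame plus 5 generated frames
```

The point of the sketch is the loop structure only: finished frames are appended once and never revisited, which is the property that lets a real model reuse cached keys and values instead of re-encoding the whole trajectory at every step.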
Benchmarks

Results

We evaluate STAR-MD on the ATLAS benchmark across multiple timescales (100 ns, 240 ns, and 1 us). Results show that STAR-MD achieves state-of-the-art performance in conformational coverage, structural validity, and dynamic fidelity compared to existing methods. Best non-oracle values are highlighted with *.

Table 1: ATLAS 100 ns Evaluation

Coverage & Validity (JSD, Rec), Dynamic Fidelity (tICA, RMSD, AutoCor, VAMP-2), and Structural Validity (CA%, AA%, CA+AA%) metrics.

Model               JSD (down)     Rec (up)       tICA (up)      RMSD (down)    AutoCor (down) VAMP-2 (down)  CA% (up)       AA% (up)       CA+AA% (up)
MD (Oracle)         0.31           0.67           0.17           0.00           0.00           0.02           98.37          98.07          96.43
(unnamed baseline)  0.56+/-0.01    0.28+/-0.01    0.12+/-0.00    0.38+/-0.01    0.05+/-0.00    0.38+/-0.01    71.83+/-1.90   95.03+/-0.59   68.31+/-2.20
(unnamed baseline)  0.59+/-0.01    0.20+/-0.01    N/A            3.31+/-0.06    0.12+/-0.01    1.56+/-0.01    10.58+/-0.09   0.82+/-0.10    0.47+/-0.04
(unnamed baseline)  0.52+/-0.01    0.36+/-0.01    0.15+/-0.01    0.20+/-0.00    0.08+/-0.00    0.47+/-0.02    56.94+/-0.52   92.47+/-0.25   52.06+/-0.36
STAR-MD (Ours)      *0.43+/-0.01   *0.54+/-0.01   *0.17+/-0.00   *0.07+/-0.02   *0.02+/-0.00   *0.10+/-0.02   *86.81+/-0.64  *98.18+/-0.05  *85.29+/-0.62
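
For readers unfamiliar with the JSD column, the following is a minimal sketch of a Jensen-Shannon divergence between two histograms, assuming that is what JSD denotes here. The page does not say which structural features are histogrammed or how they are binned, so the counts below are hypothetical, and the helper names `normalize`, `kl`, and `jsd` are illustrative only.

```python
import math


def normalize(counts):
    """Turn non-negative histogram counts into probabilities."""
    total = sum(counts)
    return [c / total for c in counts]


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical bin counts for one feature in reference MD and generated frames.
reference = normalize([10, 30, 40, 20])
generated = normalize([5, 25, 45, 25])
print(round(jsd(reference, generated), 4))

# Sanity checks at the two extremes.
print(jsd([0.5, 0.5], [0.5, 0.5]))  # 0.0 for identical distributions
print(jsd([1.0, 0.0], [0.0, 1.0]))  # 1.0 for disjoint distributions
```

With base-2 logarithms the value lies in [0, 1], which is consistent with lower being better in the tables: 0 means the generated distribution matches the reference exactly.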

Table 2a: ATLAS 240 ns Evaluation

Short-horizon evaluation comparing trajectory quality over 240 nanoseconds.

Model               JSD (down)     Rec (up)       RMSD (down)    AutoCor (down) CA% (up)       AA% (up)       CA+AA% (up)
MD (Oracle)         0.26           0.75           0.01           0.00           99.53          96.83          96.36
(unnamed baseline)  0.52+/-0.01    0.38+/-0.01    0.48+/-0.01    0.25+/-0.01    63.25+/-2.10   87.83+/-1.13   56.60+/-2.10
(unnamed baseline)  0.57+/-0.03    0.20+/-0.03    1.76+/-0.05    0.14+/-0.01    8.96+/-0.25    0.94+/-0.12    0.63+/-0.16
(unnamed baseline)  0.51+/-0.01    0.42+/-0.02    0.35+/-0.01    0.39+/-0.01    44.71+/-1.55   73.13+/-0.84   36.51+/-1.22
STAR-MD (Ours)      *0.44+/-0.01   *0.59+/-0.01   *0.20+/-0.02   *0.03+/-0.01   *85.16+/-1.91  *97.57+/-0.13  *83.15+/-1.99

Table 2b: ATLAS 1 us Evaluation

Long-horizon evaluation demonstrating STAR-MD's stability over microsecond timescales.

Model               JSD (down)     Rec (up)       RMSD (down)    AutoCor (down) CA% (up)       AA% (up)       CA+AA% (up)
MD (Oracle)         0.23           0.91           0.00           0.00           96.25          86.50          82.75
(unnamed baseline)  0.56+/-0.01    0.36+/-0.03    0.37+/-0.02    0.39+/-0.01    36.11+/-7.34   56.99+/-4.52   24.81+/-4.30
(unnamed baseline)  0.65           0.20           0.78+/-0.03    *0.04+/-0.01   9.64+/-0.02    0.19+/-0.00    0.06+/-0.00
(unnamed baseline)  0.55+/-0.02    0.45+/-0.02    0.33+/-0.02    0.38+/-0.03    54.74+/-1.79   62.32+/-3.43   36.91+/-1.39
STAR-MD (Ours)      *0.46+/-0.01   *0.61+/-0.02   *0.13+/-0.02   0.10+/-0.02    *88.47+/-1.09  *89.81+/-0.65  *79.93+/-1.04

Figure 3: Long-horizon Validity Analysis

Comparison of CA + AA Validity (%) across different timescales (100 ns, 240 ns, and 1 us). STAR-MD (red) maintains high structural validity even over microsecond-scale rollouts, significantly outperforming baseline methods which degrade rapidly. Shaded regions indicate standard error across multiple runs.

[Plot panels: 100 ns, 240 ns, 1000 ns; x-axis: Time (ns). Legend: MD (Oracle), MDGen, AlphaFolding, ConfRover, STAR-MD]
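
As a reminder of what a standard-error band represents, the sketch below computes a mean and its standard error from several runs. The validity percentages are hypothetical placeholders, not values from this page, and the helper name `mean_and_standard_error` is illustrative.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return the mean and the standard error of the mean.

    Standard error = sample standard deviation / sqrt(number of runs).
    """
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical CA + AA validity (%) from four independent runs.
runs = [84.0, 86.0, 85.0, 87.0]
mean, sem = mean_and_standard_error(runs)
print(f"{mean:.2f}+/-{sem:.2f}")  # 85.50+/-0.65
```

Because the standard error shrinks with the square root of the number of runs, a narrow band indicates agreement between runs rather than closeness to the MD reference.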
Visualizations

Generated Trajectories

Explore generated protein dynamics trajectories in 3D. STAR-MD produces stable, physically plausible conformational changes.

