OmniCode: A Benchmark for Evaluating Software Development Agents
Published in: Submitted to ICLR 2026 (Under Review), 2026
We propose a benchmark containing a broader and more diverse set of tasks for evaluating code-generating AI agents.
Recommended citation: Sonwane, Atharv*, Eng-Shen Tu*, Wei-Chung Lu*, Claas Beger*, Carter Larsen, Debjit Dhar, Rachel Chen, et al. "OmniCode: A Benchmark for Evaluating Software Engineering Agents." arXiv preprint arXiv:2602.02262 (2026).
Download Paper
