Real compiler implementation — no fill-in-the-blank scaffolding. Agents must build the entire compiler from scratch in isolated container environments.
| # | Model | T0 | T1 | T2 | T3 | T4 | T5 | Mean Reward | Pipeline | 🔄 |
|---|
YatCC-Hard removes all code logic from the original YatCC tasks, keeping only file dependency relationships. Agents receive no compiler construction knowledge or implementation framework — they must implement the full compiler from scratch.
Each evaluation runs in an isolated container environment that is destroyed after completion, ensuring zero cross-run contamination.
Powered by EvoBench