Leaderboard

How well can AI agents build a compiler from scratch? These are real results from OpenHands SDK and CLI agents running with infinite self-loop iteration.

YatCC | YatCC-Hard
# | Model | Backend | T0 | T1 | T2 | T3 | T4 | T5 | Mean Reward | Pass Score | Pipeline | 🔄

📊 Metrics Formula

Mean Reward = Σ(score[i] × weight[i] × bonus[i]) / Σ(weight[i])

weight = [5%, 20%, 20%, 15%, 30%, 10%] for T0 through T5 | bonus = 1.2 (no resurrection) / 1.0 (resurrected)
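
The Mean Reward formula can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's actual scoring code: the function name `mean_reward`, the `resurrected` flag list, and the 0 to 100 score scale are assumptions made for the example, and the index i is assumed to run over the six tasks T0 to T5.

```python
# Per-task weights as listed above, as fractions (assumed order: T0..T5).
WEIGHTS = [0.05, 0.20, 0.20, 0.15, 0.30, 0.10]


def mean_reward(scores, resurrected):
    """Weighted mean of per-task scores with the no-resurrection bonus.

    scores      -- six per-task scores (scale assumed to be 0..100)
    resurrected -- six booleans, True if the task was resurrected
    """
    if len(scores) != len(WEIGHTS) or len(resurrected) != len(WEIGHTS):
        raise ValueError("expected one score and one flag per task")
    # bonus = 1.2 when the task was not resurrected, 1.0 otherwise
    bonuses = [1.0 if r else 1.2 for r in resurrected]
    weighted = sum(s * w * b for s, w, b in zip(scores, WEIGHTS, bonuses))
    return weighted / sum(WEIGHTS)


# Example: full marks everywhere, only T0 resurrected.
print(mean_reward([100] * 6, [True] + [False] * 5))
```

Because the bonus multiplies the score before averaging, a run with no resurrections can exceed the nominal maximum of the score scale by up to 20%.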

Pass Score = Σ(pass[i] × pass_bonus[i]) / (6 × 1.5) × 100

pass_bonus = 1.5 (no resurrection) / 1.0 (resurrected)
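
The Pass Score formula can be sketched the same way. Again this is only an illustration under stated assumptions: `pass_score`, `passed`, and `resurrected` are hypothetical names, and pass[i] is assumed to be 1 for a passed task and 0 otherwise.

```python
NUM_TASKS = 6          # T0..T5
MAX_PASS_BONUS = 1.5   # bonus for a pass without resurrection


def pass_score(passed, resurrected):
    """Pass Score on a 0..100 scale.

    passed      -- six booleans, True if the task passed (assumed 0/1 pass flag)
    resurrected -- six booleans, True if the task was resurrected
    """
    if len(passed) != NUM_TASKS or len(resurrected) != NUM_TASKS:
        raise ValueError("expected one flag per task")
    # pass_bonus = 1.5 without resurrection, 1.0 with; failed tasks add 0
    total = sum(
        (1.0 if r else MAX_PASS_BONUS)
        for p, r in zip(passed, resurrected)
        if p
    )
    return total / (NUM_TASKS * MAX_PASS_BONUS) * 100


# Example: all six tasks pass, none resurrected.
print(pass_score([True] * 6, [False] * 6))
```

The denominator 6 × 1.5 is the best possible total, so the score reaches 100 only when every task passes without resurrection; passing all six tasks after resurrection gives about 66.7.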

Last updated: | Powered by EvoBench