GRAIL is better than LRMs across all model sizes
We compared the win rate of the GRAIL agent with that of the reasoning-model agent across different model sizes. We also ran two cross-over configurations: GRAIL built on DeepSeek-R1 models, and the reasoning agent built on non-reasoning LLMs. The results show that the GRAIL agent achieves high win rates across model sizes, whereas the same cannot be said for the reasoning agent.
