Researchers from Stanford, Princeton, and Cornell have developed a new benchmark to more accurately evaluate the coding abilities of large language models (LLMs). Called CodeClash, the new benchmark ...
Grok 4 is a huge leap from Grok 3, but how good is it compared to other models in the market, such as Gemini 2.5 Pro? We now have answers, thanks to new independent benchmarks. LMArena.ai, which is an ...
OpenAI’s GPT-5.5 has overtaken Google’s Gemini 3.1 Pro in key AI benchmarks, scoring 59 on the Intelligence Index versus Gemini’s 57. The April 2026 release brings faster performance, stronger coding ...
A new report today from code quality testing startup SonarSource SA is warning that while the latest large language models may be getting better at passing coding benchmarks, at the same time they are ...
Microsoft has unveiled a groundbreaking artificial intelligence model, ...
ChatGPT 4.1 is now rolling out, and it's a significant leap from GPT-4o, but it fails to beat the benchmark set by Google Gemini. Yesterday, OpenAI confirmed that developers with API access can try as ...
PewDiePie has revealed he spent months fine-tuning his own AI model, claiming it temporarily outperformed ChatGPT on a coding benchmark. In a new YouTube video, the creator explained that the project ...
OpenAI's newly released ChatGPT 5.5 outperforms Claude Opus 4.7 in coding benchmarks and allows developers to build complex 3D ...