Video Coding Benchmarks

CodeClash Benchmarks LLMs through Multi-Round Coding Competitions

Researchers from Stanford, Princeton, and Cornell have developed a new benchmark to more accurately evaluate the coding abilities of large language models (LLMs). Called CodeClash, the new benchmark ...

Bleeping Computer

Grok 4 benchmark results: Tops math, ranks second in coding

Grok 4 is a huge leap from Grok 3, but how good is it compared to other models in the market, such as Gemini 2.5 Pro? We now have answers, thanks to new independent benchmarks. LMArena.ai, which is an ...

Hosted on MSN

GPT-5.5 edges past Gemini in latest AI benchmarks

OpenAI’s GPT-5.5 has overtaken Google’s Gemini 3.1 Pro in key AI benchmarks, scoring 59 on the Intelligence Index versus Gemini’s 57. The April 2026 release brings faster performance, stronger coding ...

SiliconANGLE

Study finds newer LLMs introduce more severe coding bugs despite higher benchmark scores

A new report today from code quality testing startup SonarSource SA is warning that while the latest large language models may be getting better at passing coding benchmarks, at the same time they are ...

VentureBeat

Microsoft’s GRIN-MoE AI model takes on coding and math, beating competitors in key benchmarks

Microsoft has unveiled a groundbreaking artificial intelligence model, ...

Bleeping Computer

ChatGPT 4.1 early benchmarks compared against Google Gemini

ChatGPT 4.1 is now rolling out, and it's a significant leap from GPT 4o, but it fails to beat the benchmark set by Google Gemini. Yesterday, OpenAI confirmed that developers with API access can try as ...

The Express Tribune

PewDiePie details DIY AI project that he claims rivaled ChatGPT on coding tests

PewDiePie has revealed he spent months fine-tuning his own AI model, claiming it temporarily outperformed ChatGPT on a coding benchmark. In a new YouTube video, the creator explained that the project ...

10d

How ChatGPT 5.5 Automates Repetitive Coding Tasks to Save You Time

OpenAI's newly released ChatGPT 5.5 outperforms Claude Opus 4.7 in coding benchmarks and allows developers to build complex 3D ...
