
It’s not simple being one in every of Silicon Valley’s favourite benchmarks.
SWE-Bench (pronounced “swee bench”) launched in November 2024 as a technique to consider an AI mannequin’s coding ability. It has since rapidly turn into some of the standard exams in AI. A SWE-Bench rating has turn into a mainstay of main mannequin releases from OpenAI, Anthropic, and Google—and outdoors of basis fashions, the fine-tuners at AI companies are in fixed competitors to see who can rise above the pack.
Regardless of all of the fervor, this isn’t precisely a truthful evaluation of which mannequin is “higher.” Entrants have begun to sport the system—which is pushing many others to wonder if there’s a greater technique to truly measure AI achievement. Learn the complete story.
—Russell Brandom
Did solar energy trigger Spain’s blackout?
At roughly noon on Monday, April 28, the lights went out in Spain. The grid blackout, which prolonged into elements of Portugal and France, affected tens of hundreds of thousands of individuals—flights have been grounded, cell networks went down, and companies closed for the day.
Over every week later, officers nonetheless aren’t solely certain what occurred, however some have advised that renewables could have performed a task, as a result of simply earlier than the outage occurred, wind and photo voltaic accounted for about 70% of electrical energy era. Others, together with Spanish authorities officers, insist that it’s too early to assign blame.
It’ll take weeks to get the complete report, however we do know just a few issues about what occurred. Listed below are just a few takeaways that would assist our future grid.