It’s not straightforward being certainly one of Silicon Valley’s favourite benchmarks.
SWE-Bench (pronounced “swee bench”) launched in November 2024 as a strategy to consider an AI mannequin’s coding talent. It has since shortly turn out to be one of the crucial well-liked checks in AI. A SWE-Bench rating has turn out to be a mainstay of main mannequin releases from OpenAI, Anthropic, and Google—and outdoors of basis fashions, the fine-tuners at AI corporations are in fixed competitors to see who can rise above the pack.
Regardless of all of the fervor, this isn’t precisely a truthful evaluation of which mannequin is “higher.” Entrants have begun to recreation the system—which is pushing many others to wonder if there’s a greater strategy to really measure AI achievement. Learn the total story.
—Russell Brandom
Did solar energy trigger Spain’s blackout?
At roughly noon on Monday, April 28, the lights went out in Spain. The grid blackout, which prolonged into components of Portugal and France, affected tens of tens of millions of individuals—flights had been grounded, cell networks went down, and companies closed for the day.
Over per week later, officers nonetheless aren’t totally positive what occurred, however some have instructed that renewables might have performed a task, as a result of simply earlier than the outage occurred, wind and photo voltaic accounted for about 70% of electrical energy technology. Others, together with Spanish authorities officers, insist that it’s too early to assign blame.
It’ll take weeks to get the total report, however we do know a couple of issues about what occurred. Listed here are a couple of takeaways that might assist our future grid.