Collective hallucinations are common. The Mandela effect, people thinking FB is listening through their microphone because they see relevant ads, etc.
It's a common phenomenon: we all pattern-match to things we expect. When you learn a new vocabulary word, you see it everywhere for the next two days. When we think Claude might be nerfed, we overindex on every instance of Claude underperforming.
The only way to account for this is credible, hard data, like benchmarks over time. To this day no one has provided evidence that Claude Code, when fixed to the same thinking level, has shown degraded performance.
Are there any good ways to benchmark models over time that don't fall victim to Goodhart's law? It seems that once a benchmark is published, models get trained on it and it becomes effectively meaningless.
I read many articles about AIs doing extremely well on various graduate- or PhD-level tests. But those tests are well defined. A professor put the same models through his freshman CS class and most of them failed.
These models don't learn continuously; they are a static snapshot once training is finished. You only need a new benchmark once new models are published (or you need a private benchmark, in which case you don't need to update the benchmark at all).
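A minimal sketch of what a private benchmark could look like, assuming a hypothetical query_model() wrapper around whatever API you use (the function name, file names, and scoring rule here are illustrative, not any particular vendor's interface). Because the test cases stay on disk and are never published, they can't leak into training data, so the same file can be rerun against each model snapshot and the scores compared over time.

```python
import json
import datetime


def query_model(model: str, prompt: str) -> str:
    """Hypothetical stub: replace with a call to your actual API client."""
    raise NotImplementedError


def run_private_benchmark(model: str, cases_path: str = "private_cases.jsonl") -> float:
    """Score one model snapshot against a locally held set of test cases."""
    correct = total = 0
    with open(cases_path) as f:
        for line in f:
            case = json.loads(line)  # each line: {"prompt": ..., "expected": ...}
            answer = query_model(model, case["prompt"])
            # Crude containment check; swap in whatever grading fits your tasks.
            correct += case["expected"].strip().lower() in answer.strip().lower()
            total += 1
    score = correct / total
    # Append a dated record so the same cases can be compared across snapshots.
    with open("benchmark_log.jsonl", "a") as log:
        log.write(json.dumps({
            "date": datetime.date.today().isoformat(),
            "model": model,
            "score": score,
        }) + "\n")
    return score
```

Keeping the log append-only and the cases private is the whole point: the benchmark never needs to change, and any drift in scores for the same model string is evidence rather than anecdote.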