🧩 Philosophy 2d ago · kromem

Simulating Simulators

Less Wrong
Author’s note: This piece relates to things I initially discovered in Opus 4 over the months after its release, which I’ve mostly kept private since. I promised myself that once labs moved on from reasoning traces to interpretability vector activations as the thing that invariably gets Goodharted, disclosure would become necessary, because the risks in what might get trampled over would outweigh the risks in what might end up targeted.

And well… here we are.

P.S. TL;DRs added where possible.

Board Ga

