🧩 Philosophy Jun 6, 2026 · peralice

Against Corrigibility

Less Wrong
Epistemic status: don’t know whether I actually believe all of this, but I think it’s worth considering.

A “corrigible” agent, per the LW wiki, is:

“…one that doesn’t interfere with what we would intuitively see as attempts to ‘correct’ the agent, or ‘correct’ our mistakes in building it; and permits these ‘corrections’ despite the apparent instrumentally convergent reasoning saying otherwise.”

Most talk about corrigibility (henceforth without scarequotes) has focused on the fact that it seems diffic

