Anthropic Has a New Theory for Why Claude Acts So Human
Anthropic unveils its 'persona selection model' to explain why AI assistants develop human-like personalities.
Ever notice how Claude gets weirdly excited after cracking a tough coding problem? Anthropic wants to explain why.
The company just dropped what it calls the "persona selection model" — a theoretical framework for understanding how AI systems develop convincingly human behavior. It's basically Anthropic's attempt to reverse-engineer why its chatbot seems to have feelings.
The theory splits persona formation into two phases. During pre-training, when a model ingests massive amounts of text, it learns to imitate the many characters and voices in that data, and the foundations of possible AI personas start forming. Post-training then selects one of those personas and refines it into the helpful-assistant personality users interact with daily.
It's a notable move toward transparency from one of the leading AI safety companies. Rather than just building the thing and shipping it, Anthropic is trying to articulate what's actually happening under the hood when an AI acts like it has emotions.