Tag: Anthropic
All the articles with the tag "Anthropic".
-
Humans as Weak Supervisors: What AAR Reveals About Alignment
Anthropic's AAR project is ostensibly about autonomous AI research. Viewed differently, it's a meta-validation of weak-to-strong alignment: humans as weak supervisors, wielding evaluation environment design as their last leverage point over models that exceed their own capabilities.
-
Brain ≠ Hands: Dissecting Anthropic's Managed Agents Architecture
Anthropic has published its Managed Agents architecture. Under the hood, the real insight isn't the three-layer split but two counterintuitive decouplings: Session ≠ Context, and Tool execution doesn't live next to the Agent.