Tag: Anthropic

All the articles with the tag "Anthropic".

Humans as Weak Supervisors: What AAR Reveals About Alignment

15 Apr, 2026

Anthropic's AAR project is ostensibly about autonomous AI research. Viewed differently, it's a meta-validation of weak-to-strong alignment: humans as weak supervisors, wielding evaluation environment design as their last leverage point over models that exceed their own capabilities.

Brain ≠ Hands: Dissecting Anthropic's Managed Agents Architecture

9 Apr, 2026

Anthropic published their Managed Agents architecture. Under the hood, the real insight isn't the three-layer split — it's two counterintuitive decouplings: Session ≠ Context, and Tool execution doesn't live next to the Agent.
