Tag: Research
All the articles with the tag "Research".
Humans as Weak Supervisors: What AAR Reveals About Alignment
Anthropic's AAR project is ostensibly about autonomous AI research. Viewed from another angle, it is a meta-validation of weak-to-strong alignment: humans act as weak supervisors, and designing evaluation environments is their last leverage point over models whose capabilities exceed their own.