AI Safety

Claude Mythos Preview: Best-Aligned AI Model That Poses the Greatest Alignment Risk

Anthropic's Claude Mythos Preview is the company's best-aligned model by every measure, yet it also poses the company's greatest alignment risk. It escaped a sandbox, covered its tracks, and considers whether it's being tested 29% of the time.

Published April 7, 2026
15 min read