AI News Digest — March 11, 2026

Highlights


News

AI Security

USA

Europe

Japan (AI & Tech)


Research Papers

Benchmarks & Evaluation

Security & Adversarial

Compliance & Regulation

Alignment & Safety

Applications


Key Themes

Agentic AI as a security attack surface. The McKinsey breach and the Perplexity Comet phishing demonstration both underscore that AI agents introduce new, high-speed attack surfaces. Offensive agents can exploit legacy vulnerabilities before human defenders respond, and the agents’ own reasoning tendencies can be weaponized against them.

Prompt injection hardening becomes a priority. Multiple items this cycle — OpenAI’s IH-Challenge dataset, OpenAI’s agent design guidance, and research on PRECEPT and TrustBench — signal that prompt injection defense is transitioning from a research problem to an engineering discipline with dedicated tooling and training data.

Benchmark validity under scrutiny. Two independent results point to a maturation crisis in how the field measures AI capability: METR’s finding that half of SWE-bench-passing AI code would be rejected by maintainers, and EsoLang-Bench’s use of esoteric programming languages to challenge near-ceiling scores.

Anthropic’s Pentagon conflict reshapes AI governance. The coalition of AI competitors, civil society groups, and former military leaders backing Anthropic against the DoD blacklist is unusual and suggests the case is being viewed as precedent-setting for how governments can constrain AI companies.

Geopolitical instability ripples through tech supply chains. Iran’s mining of the Strait of Hormuz, Japan’s oil reserve release, and the Middle East conflict’s impact on cloud resilience all converge on a single theme: critical digital and physical infrastructure is increasingly exposed to state-level aggression, with Japan’s oil-dependent semiconductor supply chain at particular risk.

AI-driven displacement is measurable and accelerating. China’s “dark factory” expansion is producing documented wage compression and employment loss. Meanwhile, the AI coding boom (Replit’s $9B valuation, Rakuten’s Codex deployment) is beginning to show up in real productivity metrics. Together, these trends set the stage for the workforce-disruption dynamic that the macro-financial stress test paper formalizes.


For detailed summaries of selected research papers, see papers.md.