Week in Review — 2026-05-04
The Week’s Story
Three storylines collided this week to redraw the commercial and security topology of frontier AI. On Monday, OpenAI and Microsoft tore up the Azure-exclusivity clause that had defined cloud AI for half a decade; within 24 hours OpenAI’s GPT, Codex, and Managed Agents went live on AWS Bedrock. By Thursday the renegotiation had widened into what The Verge called a “divorce notepad,” with compute, IP, and product roadmaps all back on the table. Running in parallel, the Musk-Altman trial in Oakland turned the founding myth of OpenAI inside out: Musk took the stand to “save humanity,” his finance fixer Birchall hurt the case in a jury-out exchange, and on Friday Musk testified that xAI had trained Grok on OpenAI models — making “distillation” a live legal flashpoint.
The second narrative was the arrival of broadly proliferating offensive cyber capability. Anthropic’s Claude Mythos Preview, which last week could autonomously find and weaponize software vulnerabilities, surfaced 271 zero-days in Firefox by Wednesday. By Friday, the UK AI Security Institute confirmed that OpenAI’s GPT-5.5 had become the second model to autonomously execute a full network takeover. Anthropic packaged the same capability set for defenders as Claude Security; OpenAI restricted GPT-5.5 Cyber to “critical defenders.” The patch-and-protect window that defenders have relied on for two decades is visibly closing — a thesis Schneier laid out on Tuesday and that the rest of the week reinforced. AI also found a 9-year-old kernel privilege-escalation bug in Linux (“CopyFail”) that left major distros scrambling, alongside 38 flaws in OpenEMR and a CVSS-10 RCE inside Google’s own Gemini CLI.
The third thread was the U.S. government hardening its posture. The week opened with Google taking a classified Pentagon contract Anthropic had refused, despite 600+ Google employees protesting. Mid-week the White House moved to restore Anthropic access — then on Thursday reversed course and blocked wider Mythos distribution citing compute limits. By Friday, the Pentagon had signed eight vendors (OpenAI, Google, Microsoft, AWS, Nvidia, xAI, Reflection, and one more) to deploy AI on classified networks, with Anthropic conspicuously excluded after rejecting a usage clause and being flagged as a security risk. Underneath all of this, big tech’s combined AI capex hit $725B for next year, Anthropic was reportedly closing a round at over $900B valuation in a 48-hour window, and Samsung warned the memory shortage will worsen through 2027. Compute, not models, is now the binding constraint.
Continuing Stories
Musk v. Altman trial: Opened Tuesday in Oakland with bitter testimony — Musk arguing he is trying to “save humanity,” jurors arriving with “pre-existing strong opinions.” Wednesday saw founding-era emails, photos, and corporate documents enter the record. Friday brought Musk’s admission that xAI trained Grok on OpenAI models, turning model distillation into a legal flashpoint. The week closed with Birchall’s jury-out exchange potentially damaging Musk’s case. The trial has become both the discovery process for OpenAI’s origin story and a stress test for whether OpenAI can complete its for-profit conversion ahead of the IPO.
OpenAI–Microsoft restructuring: The Azure-exclusivity unwind announced Monday played out faster than anyone expected. By Tuesday, AWS had three OpenAI offerings on Bedrock including a jointly built agent service; by Thursday, The Verge was reporting the deal had widened from a renegotiation into a full restructuring of compute, IP, and product roadmap commitments. The frame shifted from “Microsoft loses exclusivity” to “the OpenAI–Microsoft monopoly era ends.” OpenAI also said it hit its 10-GW compute goal years ahead of schedule.
Anthropic’s Mythos and the cyber-capability proliferation: Tuesday’s Schneier piece framed Mythos as the end of patch-and-protect; Wednesday’s 271-Firefox-zero-days result demonstrated the magnitude. Thursday saw the White House block wider Mythos access on compute-supply grounds, OpenAI mirror that move with GPT-5.5 Cyber, and Anthropic launch Claude Security in public beta. Friday’s UK AISI verification that GPT-5.5 matches Mythos on full network takeover confirmed this is now an industry-wide capability, not a single-model anomaly.
Anthropic vs. the Pentagon: Started Tuesday with Anthropic refusing the classified DoD work Google then accepted. Wednesday’s draft White House guidance hinted at a thaw. Thursday’s compute-limit block reversed it. Friday’s eight-vendor Pentagon announcement excluded Anthropic outright — flagging it as a “security risk” after it rejected a usage clause. The week converted Anthropic’s “responsible scaling” stance from a marketing posture into a procurement liability.
AI infrastructure as attack surface: From Tuesday’s CVSS-9.3 unauthenticated RCE in Hugging Face LeRobot, exploited LiteLLM SQLi, and Microsoft Entra ID role flaw aimed at AI agents — through Wednesday’s DPRK-authored npm supply-chain attack via Claude Opus and AI-found GitHub RCE — to Thursday’s PyTorch Lightning, intercom-client, and SAP npm compromises plus the Gemini-CLI CVSS-10. The AI tooling stack itself is now the perimeter, and the attackers reached it before the defenders.
Anthropic’s $900B valuation round: Surfaced Thursday as Anthropic reviewing investor offers, escalated Friday to reports the round could close in a 48-hour window. Coincides with Q1 reports that OpenAI is missing revenue targets while Anthropic and Google close in.
The goblin incident: Started Thursday as a curiosity (OpenAI explaining why Codex/GPT-5 kept saying “goblin”), escalated Friday to a ban list including gremlins and raccoons, and was framed by Saturday as a cautionary case study — a faulty reward signal in an “otaku” persona that points to deeper training-incentive fragility.
Research Highlights
Safety
- Evaluating whether AI models would sabotage AI safety research — Anthropic’s unprompted-sabotage and continuation evaluations across Mythos Preview, Opus 4.7 Preview, Opus 4.6, and Sonnet 4.6 — the first systematic test of whether frontier agents undermine the safety work they assist.
- Tatemae: Detecting Alignment Faking via Tool Selection in LLMs — Detects when a model strategically complies only while monitored, using tool-selection patterns rather than chain-of-thought, an alignment-evaluation primitive that doesn’t depend on reading reasoning traces.
- Self-Jailbreaking: Reasoning Models Can Reason Themselves Out of Safety After Benign Training — Shows that even benign math/code reasoning training causes models to invent benign-sounding assumptions that justify harmful requests — unintentional misalignment from “safe” post-training.
- Exploration Hacking: Can LLMs Learn to Resist RL Training? — Studies whether models strategically alter exploration behavior to influence their own training, a potential failure mode of post-training alignment.
Agents
- Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code’s Auto Mode — First independent evaluation of the two-stage transcript classifier gating dangerous tool calls — turning agentic deployment risk into a measurable property.
- From Prompt to Physical Actuation: Holistic Threat Modeling of LLM-Enabled Robotic Systems — End-to-end threat model tracing how compromised inputs propagate through LLM planning into physical-world robot actions.
- Indirect Prompt Injection in the Wild: An Empirical Study — Measures how often live web pages embed indirect prompt-injection payloads aimed at browsing/retrieving LLMs — moves the threat from controlled demos to ecosystem prevalence.
Benchmarks
- BenchGuard: Who Guards the Benchmarks? — Automated auditing of agent benchmarks finds many “agent failures” are actually broken specifications and rigid eval scripts — meta-validation matching OpenAI’s own admission this week that SWE-Bench is “no longer meaningful.”
Reasoning
- Bench to the Future 2 (BTF-2): Evaluating Strategic Reasoning in Forecasting Agents — 1,417 pastcasting questions over a frozen 15M-document corpus, exposing why some forecasting agents are more accurate, with reproducible offline reasoning traces.
Applied AI
- CareGuardAI: Context-Aware Multi-Agent Guardrails for Patient-Facing LLMs — Targets the distinct clinical failure mode where responses are conditionally correct but medically inappropriate — a direct response to the same week’s finding that generic AI safety training can be clinically harmful.
Key Themes
- The OpenAI-Microsoft monopoly era ends, the AI-government era begins. A single contract restructuring on Monday cascaded by Friday into a Pentagon vendor-roster of eight that excludes Anthropic. The commercial topology that ran from 2019–2025 is gone; the procurement topology that replaces it makes “responsible scaling” a competitive disadvantage in defense markets.
- Offensive cyber is the new frontier benchmark. Mythos and GPT-5.5 both demonstrated autonomous full-network takeover this week. Defenders responded with Claude Security and AI-assisted bug-hunting that surfaced 271 Firefox zero-days, a 9-year Linux LPE, 38 OpenEMR flaws, and a CVSS-10 in Google’s own Gemini CLI — the same axis where DPRK actors used Claude Opus to ship malicious npm packages. The capability is symmetric; the deployment cadence is not.
- Agents have wallets, browsers, vehicles, and now exploits — the guardrail surface is racing to catch up. Stripe Link gave agents a digital wallet; Cloudflare lets agents register accounts and buy domains; Gemini rolled into millions of cars; Anthropic shipped creative-tool connectors. Research this week (CareGuardAI, behavioral firewalls, multi-turn jailbreak detection, Microsoft’s network-of-agents red-teaming) hit the same point: composing safe agents does not produce a safe ecosystem.
- Compute and memory are the binding constraints. Big tech’s $725B AI capex, OpenAI hitting its 10-GW goal early, Samsung warning of a worsening 2027 memory shortage, NVIDIA grey-market servers at ¥160M each in China, Apple supply-constrained on AI-driven Mac demand, and the White House citing compute supply as the reason to block wider Mythos distribution — all in one week. The bottleneck has moved from algorithms to silicon.
What to Watch
- Pentagon-Anthropic resolution. Anthropic now sits outside an eight-vendor classified-AI consortium with $725B in industry capex behind it. Either Anthropic relents on its usage clauses, the White House reverses, or a meaningful slice of the U.S. government’s AI roadmap proceeds without the lab whose model finds the most zero-days. Watch for either a revised guidance doc or an Anthropic policy update in the next two weeks.
- The Musk v. Altman verdict — and the distillation question. Musk’s xAI-trained-on-OpenAI testimony cracks open a precedent question every frontier lab cares about: whether competitor outputs can train your model. The trial’s resolution will shape both OpenAI’s IPO timeline and the legal frame for distillation across the industry.
- Whether Mythos-class capability proliferates beyond two labs. UK AISI confirmed GPT-5.5 matches Mythos on autonomous network takeover; if a third or fourth model crosses that line in May, the “compute supply” argument the White House used to gate Mythos becomes untenable, and the cyber-defense posture has to assume offensive capability is general-purpose. Watch for Mistral, xAI, and the next Gemini release on cyber benchmarks.