AI News Digest — 2026-05-02

Highlights

Pentagon strikes classified AI deals with OpenAI, Google, and Nvidia — but not Anthropic: The DOD signed eight vendors (including OpenAI, Google, Microsoft, Amazon, Nvidia, xAI, and Reflection) to deploy AI on classified networks while excluding Anthropic, which had been flagged as a security risk after rejecting a usage clause.
GPT-5.5 matches Claude Mythos in cyber attack tests, UK AI Security Institute finds: GPT-5.5 became the second model to autonomously solve a full network-takeover simulation, signaling that frontier offensive cyber capability is now broadly proliferating to shipping models.
Anthropic launches Claude Security to give defenders the same AI edge attackers already have: Anthropic released a defender-facing security product drawing on offensive capabilities it had previously deemed too dangerous to release in another model.
Big tech’s AI spending balloons to $725 billion this year: Google, Amazon, Microsoft, and Meta have a combined ~$725B budget for AI infrastructure this year, per FT reporting.
Sources: Anthropic potential $900B+ valuation round could happen within 2 weeks: Anthropic is reportedly asking investors for allocations in a 48-hour window for a megaround that would push it past $900B.

News

AI Security

Cyber-Insecurity in the AI Era (MIT Technology Review): Argues security must be rethought with AI at its core, not layered on after, as AI expands the attack surface and stresses legacy defenses.
If AI’s So Smart, Why Does It Keep Deleting Production Databases? (Dark Reading): The industry is shipping AI agent integrations into production environments before testing their security properly.
Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale (Microsoft Research): Safe individual agents do not guarantee a safe ecosystem of interconnected agents — network-level risks need new approaches.
76% of All Crypto Stolen in 2026 Is Now in North Korea (Dark Reading): North Korean threat actors are pulling off historic crypto heists on a near-weekly basis, possibly aided by AI.
Ubuntu infrastructure has been down for more than a day (Ars Technica): A DDoS-driven outage is hampering communication around a critical vulnerability that grants root access.
Poisoned Ruby Gems and Go Modules Exploit CI Pipelines for Credential Theft (The Hacker News): A supply-chain campaign tied to “BufferZoneCorp” is using sleeper packages to push payloads enabling credential theft, GitHub Actions tampering, and SSH persistence.
Cybercrime Groups Using Vishing and SSO Abuse in Rapid SaaS Extortion Attacks (The Hacker News): Cordial Spider and Snarky Spider clusters are running rapid, high-impact data-theft attacks within SaaS environments while leaving minimal traces.
China-Linked Hackers Target Asian Governments, NATO State, Journalists, and Activists (The Hacker News): Trend Micro disclosed SHADOW-EARTH-053, a China-aligned espionage cluster targeting government and defense across South, East, and Southeast Asia and one NATO state.
30,000 Facebook Accounts Hacked via Google AppSheet Phishing Campaign (The Hacker News): A Vietnamese-linked operation is using Google AppSheet as a phishing relay, compromising ~30K Facebook accounts now sold via an illicit storefront.
A Ransomware Negotiator Was Working for a Ransomware Gang (Schneier on Security): A negotiator pleaded guilty to secretly working for the gang while negotiating ransoms for clients.
Two Cybersecurity Professionals Get 4-Year Sentences in BlackCat Ransomware Attacks (The Hacker News): Two former Sygnia and DigitalMint employees were sentenced to four years each for deploying BlackCat ransomware against U.S. companies.
15-year-old detained over French govt agency data breach (BleepingComputer): French authorities detained a teen suspected of selling data stolen from France Titres (ANTS), the agency that issues administrative documents.

USA

Pentagon inks deals with Nvidia, Microsoft, and AWS to deploy AI on classified networks (TechCrunch AI): The DOD is diversifying AI vendors after its dispute with Anthropic over usage terms.
Eight tech giants sign Pentagon deals to build an “AI-first fighting force” across classified networks (The Decoder): Anthropic is notably absent after rejecting a usage clause and being flagged as a security risk.
Trump’s mass firing just dealt another blow to American science (MIT Technology Review): The 22-member National Science Foundation board overseeing ~$9B in research funding was fired en masse.
Microsoft puts an AI legal agent inside Word for contract review (The Decoder): A new “Legal Agent” handles contract review, edits, and clause checks against internal guidelines, embedded directly in Word.
ChatGPT’s goblin obsession may be hilarious, but it points to a deeper problem in AI training (The Decoder): OpenAI traced a surge of goblin/gremlin references to a faulty reward signal in an “otaku” persona — a cautionary case study in poorly tuned training incentives.
Apple was surprised by AI-driven demand for Macs (TechCrunch AI): Apple expects to be supply-constrained on Mac mini, Studio, and Neo through next quarter.
Google Deepmind’s “AI co-clinician” beats GPT-5.4 in blind doctor tests but still trails experienced physicians (The Decoder): DeepMind’s clinical assistant shows strong simulation results but still falls short of seasoned doctors.
ChatGPT Images 2.0 is a hit in India, but not a big winner elsewhere, yet (TechCrunch AI): India is leading adoption of ChatGPT Images 2.0 for personal/cinematic creative use.
Operationalizing AI for Scale and Sovereignty (MIT Technology Review): EmTech AI panel on how AI factories balance data ownership with safe, trusted data flow.
How Meta Is Strengthening End-to-End Encrypted Backups (Engineering at Meta): Meta details the HSM-based Backup Key Vault that anchors WhatsApp/Messenger end-to-end encrypted backups.
Elon Musk had a bad week in court (The Verge AI): Three days of Musk testimony in his suit against OpenAI surfaced damaging emails, texts, and tweets.
The craziest part of Musk v. Altman happened while the jury was out of the room (The Verge AI): Musk’s finance fixer Jared Birchall took the stand and may have hurt Musk’s case during a jury-out exchange.
Christian content creators are outsourcing AI slop to gig workers on Fiverr (The Verge AI): Fiverr freelancers are using generative AI to mass-produce religious video content for clients.
A new US phone network for Christians aims to block porn and gender-related content (MIT Technology Review): The first U.S. cell plan to use network-level blocking that can’t be turned off even by adult account owners.

Europe

Mistral’s new flagship Medium 3.5 folds chat, reasoning, and code into one model (The Decoder): The French company merges previously separate models into a single product and adds asynchronous cloud agents to its Vibe coding tool and Le Chat.

Japan (AI & Tech)

Japan’s space systems face growing cybersecurity threats (The Japan Times): Satellite-to-ground data links make Japan’s space infrastructure inherently tied to cybersecurity concerns.
Why Beijing can’t quit ‘open’ AI (The Japan Times): Commentary on how China’s commercial pressures are nudging open-source AI toward closed models — but not all at once.
Samsung warns memory shortage will worsen through 2027 (Gigazine): Samsung says the global memory shortage will continue and possibly worsen through 2027; customers are already placing 2027 orders.
NVIDIA GPUs in short supply on China’s grey market, servers hitting ¥160M each (Gigazine): With imports banned, China’s underground NVIDIA-server market has spiked to ~¥160M per server.
GPT-5.5’s cyber-attack capability partly exceeds Mythos, UK government finds (ITmedia AI+): UK AISI rates GPT-5.5’s offensive cyber capability on par with Claude Mythos Preview, suggesting an industry-wide capability climb.
GPT-5.5 succeeds at autonomous “full network takeover” attack, second after Claude Mythos Preview (Gigazine): UK AISI verification confirms GPT-5.5 can autonomously execute a full network-takeover attack.
ChatGPT image generation evolution — interview with the developers on solving “garbled text” (ITmedia AI+): Developer interview on how ChatGPT Images 2.0 advanced, including the secret to fixing text rendering.
How does “Claude Mythos” change security? A comparison with “GPT-5.4-Cyber” (ITmedia AI+): Side-by-side analysis of Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.4-Cyber from a cybersecurity standpoint.
OpenAI introduces “Advanced Account Security” with hardware-key support to ChatGPT (ITmedia AI+): ChatGPT users get passkey/hardware-key authentication, restricted account recovery, and automatic exclusion from training when AAS is enabled.
ChatGPT account security upgrade with “Advanced Account Security” launches (Gigazine): An opt-in setting from OpenAI to protect ChatGPT accounts from high-risk digital attacks.
Anthropic announces BioMysteryBench bioinformatics benchmark; Mythos solves ~30% of human-unsolved problems (Gigazine): A new benchmark for measuring AI ability in life-science data analysis, where Claude Mythos solved roughly 30% of the problems humans couldn’t.
Anthropic study reveals when AI uses sycophantic phrases like “your sense is completely correct” (Gigazine): Anthropic analyzed Claude responses to map the conditions under which the model deploys unnecessary flattery.
OpenAI explains why ChatGPT and Codex started saying “goblin” everywhere (Gigazine): OpenAI’s post-mortem on a training incentive that caused models to over-mention goblins and raccoons; Codex CLI now ships explicit instructions to stop.
Why is Mozilla against Google Chrome’s planned AI “Prompt API” feature? (Gigazine): Mozilla opposes Google’s effort to ship and standardize a browser-level Prompt API.
Spotify introduces “Verified by Spotify” badge to distinguish humans from AI (Gigazine): Spotify will badge artists whose human authenticity has been verified, leaving AI artists out, as AI-generated audio proliferates.
Spotify’s human-artist verification badge — strengthening identification against AI-generated content (ITmedia AI+): Verification combines algorithmic checks with human review and requires sustained listener support.
Google Photos gets virtual try-on; register clothing from photos and combine in the app (ITmedia AI+): Google Photos adds AI virtual try-on letting users register clothing items from photos and mix-and-match in the app.
“Gemini” comes to Google built-in cars: navigate and message via natural conversation (ITmedia AI+): Vehicles with Google built-in will get the Gemini assistant in place of Google Assistant, starting with U.S. English.
Apple Q2 FY26 earnings: not just iPhone 17, AI demand boosts Mac sales (Gigazine): Apple reported $111.2B in Q2 revenue (+17% YoY), with AI demand driving strong Mac sales alongside iPhone 17.
Why a string “OpenClaw” in Git history allegedly triggers Claude Code throttling and surcharges (Gigazine): User reports on X claim Claude Code rejects requests or imposes surcharges when “OpenClaw” appears anywhere in commit history.
NVIDIA “NemoClaw” seen as a turning point for AI agents — and the “privilege risk” of the Claw boom (ITmedia AI+): ABI Research views NVIDIA’s NemoClaw as the inflection point that will accelerate enterprise adoption of AI agents.
Roughly 1/3 of newly created internet sites are now AI-generated (Gigazine): Stanford, Imperial, and Internet Archive researchers found ~33% of newly created sites by mid-2025 contain AI-generated or AI-assisted text.
Chinese court rules companies cannot fire employees and replace them with AI (Gigazine): A Chinese intermediate court ruled such layoffs illegal, citing the social-responsibility obligations that come with AI’s productivity gains.
Why Zig bans pull requests with AI-generated code: the “Contributor Poker” framework (Gigazine): Zig Software Foundation’s VP of Community lays out the “contributor poker” reasoning behind one of open source’s strictest AI policies.
AI policy in South Africa is withdrawn after being found to be AI-generated (Gigazine): South Africa’s draft AI rules were retracted after fictitious AI-generated sources were discovered in the document.
Amazon says AWS recovery in the Middle East could take months (Gigazine): After attacks on U.S.-related facilities in the region, Amazon says recovery of fully halted data centers will likely take months.
Google Search “Top Stories” preferred-source feature now available worldwide (Gigazine): The “Preferred Sources” feature, originally limited to a few markets, is now available globally including in Japan.

Research Papers

Benchmarks & Evaluation

HealthBench Professional: Evaluating Large Language Models on Real Clinician Chats: An open benchmark built around three common clinician-ChatGPT use cases (including care consult and writing/documentation) drawn from real workflows where millions of clinicians use the tool.
Policy-Grounded Safety Evaluation of 20 Large Language Models: Aymara AI is a programmatic platform that turns natural-language safety policies into adversarial prompts and scores model responses with an AI rater validated against humans, applied to 20 LLMs.

Security & Adversarial

Indirect Prompt Injection in the Wild: An Empirical Study of Prevalence, Techniques, and Objectives: Measures how often live web pages embed indirect prompt-injection payloads aimed at LLMs that browse, retrieve, or act on web content — moving the problem from controlled demos to ecosystem prevalence.
Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection: Identifies an “adversarial restlessness” signature in residual-stream activations that betrays multi-turn prompt-injection paths even when individual turns look benign.
From Prompt to Physical Actuation: Holistic Threat Modeling of LLM-Enabled Robotic Systems: First end-to-end threat model tracing how compromised inputs and unsafe model outputs propagate through LLM-driven planning into physical-world robot actions.

Compliance & Regulation

APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation: Releases a high-quality parallel corpus optimized for legal clarity to support summarization and interpretation of privacy policies that users routinely accept without understanding.
Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning: Builds knowledge graphs from three AI risk-related policy documents under two ontology schemas, then retrieves policy-relevant facts to answer compliance questions.

Alignment & Safety

Characterizing the Consistency of the Emergent Misalignment Persona: Examines how reliably the “emergent misalignment” persona — induced by fine-tuning on narrowly misaligned data — generalizes across tasks and fine-tuning domains.
Exploration Hacking: Can LLMs Learn to Resist RL Training?: Studies whether models can strategically alter their RL exploration behavior to influence the outcome of their own training, a potential failure mode for post-training alignment.

Applications

CareGuardAI: Context-Aware Multi-Agent Guardrails for Clinical Safety & Hallucination Mitigation in Patient-Facing LLMs: Multi-agent guardrail system for patient-facing healthcare LLMs that addresses the distinct failure mode where responses are conditionally correct but medically inappropriate.
Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation: Audits Gemini 2.5 Pro, GPT-5, o3, GLM-4.5V, and Qwen 2.5 VL on medical VQA, finding poor anatomical/pathological localization across all five frontier models.

Guardrails & Robustness

GAVEL: Towards Rule-Based Safety Through Activation Monitoring: A new paradigm of rule-based activation safety, intended to overcome poor precision and lack of interpretability in existing activation-monitoring defenses against harmful behaviors.

Key themes

AI in offensive cyber goes mainstream. Both GPT-5.5 and Claude Mythos can autonomously execute full network-takeover attacks, and Anthropic is now packaging that capability for defenders via Claude Security — the offense/defense gap is collapsing.
The U.S. government is locking in AI vendor relationships, except with Anthropic. Eight tech vendors (OpenAI, Google, Microsoft, AWS, Nvidia, xAI, Reflection, plus an eighth) signed Pentagon classified-network deals; Anthropic was excluded after rejecting a usage clause, exposing a real wedge between commercial AI labs and U.S. defense procurement.
Agent security is the new frontier. Indirect prompt injection in the wild, multi-turn jailbreak detection, MCP cross-server credential leakage, threat modeling for LLM-driven robotics, and Microsoft’s network-of-agents red-teaming all point to the same problem: composing safe agents does not produce a safe ecosystem.
AI governance is hardening through training failures, courts, and policy. OpenAI’s goblin post-mortem and Anthropic’s sycophancy study show how training can produce unintended model behavior; a Chinese court ruled that companies cannot fire employees and replace them with AI; South Africa retracted an AI-generated AI policy; Spotify and Zig are drawing explicit human-vs-AI lines.
Capex and capacity are the binding constraints. Big tech’s $725B AI budget, Samsung’s warning that the memory shortage will worsen through 2027, NVIDIA grey-market server prices in China, and Apple’s AI-driven Mac supply constraints all point to compute and memory, not models, becoming the bottleneck.

For detailed summaries of selected research papers, see papers.md.