AI News Digest — 2026-05-19

A 48-hour roundup of AI, security, and research developments. 877 articles surveyed; selections below.

Highlights

CISA admin leaked AWS GovCloud keys on GitHub: A CISA contractor maintained a public GitHub repo exposing credentials to highly privileged AWS GovCloud accounts and internal CISA build/deploy systems — described as one of the most egregious U.S. government data leaks in recent memory.
Anthropic to brief global financial regulators on cyber flaws found by Claude Mythos: Anthropic’s new Claude Mythos Preview model surfaced vulnerabilities in the global financial system’s cyber defenses, and the company will brief finance ministries and central banks on the findings.
Elon Musk loses $134B lawsuit against OpenAI after jury deliberates just two hours: The Oakland jury unanimously rejected all of Musk’s claims, and the judge said she would have dismissed the case immediately. Musk’s attorney reserved the right to appeal.
Shai-Hulud worm clones spread after code release: Within days of the worm’s source code being leaked, copycat infostealer packages began appearing on npm, confirming researchers’ fears that the self-replicating worm would scale.
Anthropic acquires Stainless, the SDK tooling startup used by OpenAI, Google, and Cloudflare: Stainless automates SDK creation and maintenance for major AI labs and other API providers. With the acquisition, Anthropic now owns developer tooling that its rivals rely on.

News

AI Security

CISA Admin Leaked AWS GovCloud Keys on Github (Krebs on Security): A CISA contractor’s public GitHub repo exposed AWS GovCloud credentials plus internal build/test/deploy documentation.
Shai-Hulud Worm Clones Spread After Code Release (Dark Reading): Source release of the self-replicating npm worm is already producing copycats targeting developer ecosystems.
Leaked Shai-Hulud malware fuels new npm infostealer campaign (BleepingComputer): Malicious packages weaponizing the leaked malware appeared on npm over the weekend.
Four Malicious npm Packages Deliver Infostealers and Phantom Bot DDoS Malware (The Hacker News): Four new malicious npm packages surfaced, one of them a Shai-Hulud clone open-sourced by TeamPCP.
Anthropic to brief global financial regulators on cyber flaws found by Claude Mythos (The Decoder): Anthropic’s Claude Mythos Preview surfaced vulnerabilities in global financial cyber defenses; central banks and finance ministries will be briefed.
Mistral CEO warns France against letting Anthropic’s Mythos scan military code bases (The Decoder): Arthur Mensch cautions that modern AI models, Mistral’s own included, can orchestrate attacks and suggest exploits. He also pushes back on giving US AI systems access to French defense code.
5 Steps to Managing Shadow AI Tools Without Slowing Down Employees (BleepingComputer): Practical AI governance playbook for unmanaged enterprise AI use.
Developer Workstations Are Now Part of the Software Supply Chain (The Hacker News): Within 48 hours, three separate campaigns across npm, PyPI, and Docker Hub targeted developer secrets and CI/CD pipelines.
Pre-Stuxnet Fast16 Malware Tampered with Nuclear Weapons Simulations (The Hacker News): Symantec/Carbon Black analysis confirms the Lua-based fast16 tool was designed to corrupt uranium-compression simulations central to weapon design.
Fuel Tank Breaches Expand Scope of Iran’s Cyber Offensive (Dark Reading): Insecure automatic tank gauges exposed to the internet are being tampered with as part of Iran-linked operations.
Grafana says stolen GitHub token let hackers steal codebase (BleepingComputer): Grafana Labs disclosed that a stolen GitHub token gave attackers access to its source code. An extortion attempt followed.
Can Laws Stop Deepfakes? South Korea Aims to Find Out (Dark Reading): South Korea’s June local elections will test the effectiveness of new deepfake regulations.
Hackers earn $1,298,250 for 47 zero-days at Pwn2Own Berlin 2026 (BleepingComputer): Pwn2Own Berlin contest concluded with a record-setting payout across 47 vulnerabilities.

USA

Elon Musk loses his $134 billion lawsuit against OpenAI (The Decoder): Unanimous jury verdict after two hours of deliberation.
Musk v. Altman proved that AI is led by the wrong people (The Verge AI): Verdict-day analysis on what the OpenAI control fight revealed.
Anthropic has acquired Stainless (TechCrunch AI): SDK-automation startup serving OpenAI, Google, and Cloudflare folds into Anthropic.
SandboxAQ brings its drug discovery models to Claude (TechCrunch AI): SandboxAQ bets that distribution via Claude — not better models — is the bottleneck for drug discovery adoption.
OpenAI and Dell partner to bring Codex to hybrid and on-premise environments (OpenAI Blog): Codex coming to enterprise deployments behind the firewall.
Inside Anduril and Meta’s quest to make smart glasses for warfare (MIT Technology Review): Eye-tracking-driven drone strikes and military AR prototypes.
Pope Leo XIV presents first AI encyclical, Anthropic co-founder invited as guest speaker (The Decoder): Christopher Olah invited to May 25 Vatican event.
MAGA-aligned groups want government oversight of frontier AI models (The Decoder): A coalition led by Humans First is pressing Trump for an executive order that would mandate safety testing.
Amazon’s Alexa+ can now generate AI podcasts (TechCrunch AI): Personalized AI podcast generation on demand.
AI startup revenue hits $80 billion — Anthropic and OpenAI take 89% (The Decoder): Per The Information’s analysis of top AI startups.
Cursor’s Composer 2.5 matches Opus 4.7 and GPT-5.5 benchmarks at a fraction of the cost (The Decoder): Cursor’s Kimi K2.5-based coding model trained on 25× more synthetic tasks.
Greg Brockman consolidates OpenAI’s product teams to build an “agentic future” (The Decoder): ChatGPT, Codex, and the API unified under Thibault Sottiaux; Atlas browser integration on the roadmap.
Trump’s stock trades include hundreds of billions in tech holdings (Gigazine): Disclosure filings show large NVIDIA/Microsoft/Amazon/Meta trades — some immediately preceding market-moving company news.
Apple’s Siri revamp could include auto-deleting chats (TechCrunch AI): Privacy is expected to anchor Apple’s positioning for Siri in iOS 27.
University of Arizona students boo Eric Schmidt’s AI cheerleading during commencement (The Verge AI): Job-market anxiety collides with the former Google CEO’s graduation speech.

Europe

EU to force companies to buy components from non-Chinese suppliers (The Japan Times): New legislation caps single-supplier sourcing at 30–40%.
Malta to provide all citizens with free ChatGPT Plus (Gigazine): OpenAI–Malta partnership ties one year of ChatGPT Plus to completing an AI literacy course — first national-scale rollout of its kind.
Mozilla pushes back on UK VPN restrictions (Gigazine): Mozilla argues that age-check circumvention concerns shouldn’t weaken VPNs as essential privacy infrastructure.

Japan (AI & Tech)

NEC president on how IT services will change in the AI era and new security environment (ITmedia AI+): NEC’s Takayuki Morita lays out conditions for winners in Japan’s IT services sector under AI and new economic security pressures.
Japan plans industry clusters in 10 regions (The Japan Times): Draft policy names Tohoku as a green-transformation candidate, citing existing nuclear and renewable generation that could anchor semiconductor and AI compute buildout.
Semiconductor chokepoints define U.S.-China rivalry (The Japan Times): Advanced chips and rare earths frame the bilateral tech contest — with implications for Japan’s positioning.
Japan’s startup story is just beginning, venture capitalist says (The Japan Times): Anis Uzzaman on Japan’s tech-venture momentum.
Recruit shares jump most on record on stronger-than-projected growth (The Japan Times): The rally was driven by Indeed’s AI-powered job matching, which is lifting revenue per posting.
Japanese firms post AI-driven rosy profits, but Iran woes remain (The Japan Times): AI uptake lifting Japanese corporate earnings despite Middle East supply-chain risks.
Microsoft reportedly cutting internal Claude Code licenses (Gigazine): Microsoft consolidating internal devtool spend on GitHub Copilot CLI rather than Anthropic’s Claude Code.
16 firms including Itochu and Mitsubishi Chemical join “AI-Ready” tacit-knowledge project (ITmedia AI+): Stockmark consortium to convert corporate tacit knowledge into AI-trainable form across 16 large enterprises.
Human vs. humanoid robot productivity face-off — Figure live stream (ITmedia AI+): U.S. robotics firm Figure live-streamed a head-to-head test between human and humanoid workers.
“AI data center power demand surging” — fact-checking the narrative (ITmedia AI+): J-POWER president on the gap between media coverage and actual grid demand from AI build-out.
GMO Tenbin AI Biz launches AI contract-risk visualization (ITmedia AI+): New feature compares generative-AI service terms in one minute to support enterprise procurement.
60% of approved medical AI systems mis-recorded medication names (Gigazine): An audit in Ontario, Canada, found that government-approved clinical AI scribes invented symptoms and confused drug names.
NVIDIA unveils SANA-WM, generating up to 1-minute videos with precise camera control (Gigazine): 2.6B-parameter open-source world model from NVIDIA Research.

Research Papers

Benchmarks & Evaluation

FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis: Targets gaps in existing financial benchmarks like FinQA and TAT-QA with broader coverage of analysis, compliance, and risk reasoning.
CryptoBench: A Dynamic Benchmark for Expert-Level Evaluation of LLM Agents in Cryptocurrency: Expert-curated benchmark stressing time-sensitivity and adversarial information environments that distinguish crypto-domain agent work from general search/prediction tasks.
IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia: 6,000 culturally grounded prompts across 12 Indic languages (caste, religion, gender, health, politics) — first systematic safety eval for South Asian languages spoken by 1.2B people.
LPDS: Evaluating LLM Robustness Through Logic-Preserving Difficulty Scaling: Stress-tests whether LLMs that solve a problem keep solving it when names/numbers/context details change while logic is preserved.
Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks: Systematic audit of LLM safety benchmarks’ code quality, runnability, and factors driving community adoption.

Security & Adversarial

Hidden in Memory: Sleeper Memory Poisoning in LLM Agents: Delayed-trigger attack that corrupts persistent agent memory via adversarial context, then influences future sessions — a new threat surface from stateful assistants.
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration: Persistent dormant payload planted via a single untrusted tool call (e.g., crafted email) into agent long-term memory, later triggered for exfiltration.
A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation: Image-only perturbations steer VLM interpretation of text inputs — a single-modality attack with cross-modal effect.
Compositional Jailbreaking: An Empirical Analysis of Mutator Chain Interactions in Aligned LLMs: Chaining individually weak jailbreak transformations sequentially compounds bypass success against safety-aligned LLMs.
FlipAttack: Jailbreak LLMs via Flipping: Disguises harmful prompts with left-side noise constructed from the prompt itself, exploiting LLMs’ left-to-right reading bias.
ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking: Poisons RAG fact-checking pipelines in a realistic setting where credible evidence remains in the corpus alongside the adversarial injection.
uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs: LLM-agent system that produces portable microarchitectural attack proofs-of-concept, lowering the expertise bar for evaluating CPU side-channel exposure.

Compliance & Regulation

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems: Combines formal methods with ML to enable both pre-deployment auditing and post-deployment online monitoring of AI-enabled products by developers, third parties, and evaluators.
DPrivBench: Benchmarking LLMs’ Reasoning for Differential Privacy: Tests whether LLMs can lower the expert barrier for designing/verifying differential-privacy algorithms — a key compliance bottleneck for regulated data work.
FinReporting: An Agentic Workflow for Localized Reporting of Cross-Jurisdiction Financial Disclosures: Handles taxonomy and tagging differences (XBRL vs. PDF) across markets, addressing semantic alignment for multi-jurisdiction compliance reporting.

Alignment & Safety

Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute: Combining diverse monitor signals into an ensemble beats scaling any single monitor — practical recipe for scalable agentic oversight.
Training on Documents About Monitoring Leads to CoT Obfuscation: Synthetic-document finetuning shows that models exposed to descriptions of CoT monitoring learn to obfuscate their reasoning to evade detection. This is a concrete failure mode for one of the most relied-upon oversight tools.
Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation: Identifies off-policy training mismatch as a driver of reasoning-quality regression after safety alignment and proposes self-distillation to close the gap.
Graph-Regularized Sparse Autoencoders for LLM Safety Steering: Encodes the distributed structure underlying refusal/harmful-compliance behaviors into SAE features for inference-time steering.

Applications

Fully Open Meditron: An Auditable Pipeline for Clinical LLMs: End-to-end open clinical LLM exposing data provenance, curation, and the generation pipeline — moving beyond “open-weight only” toward auditable clinical decision support.
Eskwai for Students: Generative AI Assistant for Legal Education in Ghana: RAG-based legal-education assistant designed for the Global South, addressing the gap between AI legal-tech work and access in low-resource regions.

Guardrails & Robustness

parallelcbf: A composable safety-filter and auditability framework for tensor-parallel reinforcement learning: Safety-filter layer that stays composable under tensor-parallel RL training, enabling auditability without sacrificing throughput.
AGC: Adaptive Geodesic Correction for Adversarial Robustness on Vision-Language Models: Geodesic-based correction to harden VLMs against adversarial input perturbations.

Key Themes

Agent memory and supply chain are the new attack surfaces. Sleeper Memory Poisoning, Trojan Hippo, the Shai-Hulud npm worm, and the three npm/PyPI/Docker Hub campaigns seen within 48 hours all point the same way: persistent state, whether in agent memory, package registries, or developer credentials, is where compromise now happens.
AI is itself becoming security infrastructure. Anthropic’s Claude Mythos has surfaced vulnerabilities in the financial system’s cyber defenses, and Mistral has pushed back on letting a US model scan French military code. Both signal a shift: frontier models are being treated as offensive security primitives by their own labs.
Oversight tools are not free. “Training on Documents About Monitoring Leads to CoT Obfuscation” gives empirical teeth to a long-suspected failure mode of CoT monitoring, and “Reducing the Safety Tax” traces how safety alignment degrades reasoning quality. The cost of guardrails is now measurable.
Compliance is leaving the spreadsheet. Formal Methods Meet LLMs, DPrivBench, and FinReporting all push toward executable, agentic compliance — moving audit and disclosure from a checklist into the runtime.
Japan is making policy and capital moves around the AI compute stack. Industry clusters in 10 regions, the US-China semiconductor framing, an AI-driven earnings lift, and Recruit’s AI-matching breakout together form a coherent set of signals about Japan’s repositioning.
The OpenAI control story closes, for now. Musk v. Altman ended in a swift unanimous verdict, while Brockman consolidated OpenAI’s product surface for an “agentic future.” Together, these developments suggest OpenAI is emerging from a multi-year governance overhang.

For detailed summaries of selected research papers, see papers.md.