AI News Digest — April 22, 2026
Highlights
- Amazon pours $33B into Anthropic, which promises to spend $100B right back on AWS: Amazon is pouring up to $25B more into Anthropic in a deal that commits the AI company to spend over $100B on AWS over the next decade, highlighting the circular capital flows now defining the AI industry.
- ChatGPT Images 2.0 adds reasoning and web search to image generation: OpenAI’s new image model can generate up to eight consistent images from a single prompt and significantly improves text rendering, including non-Latin scripts, by integrating web search at generation time.
- Google fixes prompt injection RCE flaw in Antigravity agentic IDE: A sanitization failure in Antigravity’s find_by_name tool allowed attackers to combine file creation with prompt injection to escape the sandbox and execute arbitrary code, the first major RCE in a mainstream agentic IDE.
- Jeff Bezos nears $10 billion funding round for AI lab “Project Prometheus”: Bezos is close to closing the largest single funding round for a new AI lab, signaling continued mega-scale investment even as AI skepticism grows in the broader public.
- AI backlash is building ahead of US midterm elections: Despite widespread public anxiety about AI—from data center resistance to social-media rage at AI executives—campaigns are largely avoiding the topic, creating an emerging political fault line.
News
AI Security
- Google Patches Antigravity IDE Prompt Injection Flaw (The Hacker News): Researchers disclosed how Antigravity’s permitted file creation combined with an unsanitized find_by_name call allowed sandbox escape and arbitrary code execution; Google patched it promptly.
- Exploits Turn Windows Defender into Attacker Tool (Dark Reading): Three proof-of-concept exploits, two still unpatched, are actively abusing Microsoft’s built-in security platform to execute attacks from within.
- Sam Altman Calls Anthropic’s Mythos Cybersecurity Model “Fear-Based Marketing” (TechCrunch): OpenAI’s CEO publicly dismissed Anthropic’s specialized cyber model, escalating rivalry around whether purpose-built security AI delivers real value.
- “Uncensored” AI Models Found to Still Apply Filtering (Gigazine): Morgin.ai’s research shows that third-party “uncensored” derivatives of models like Gemma still constrain outputs at the pretraining level, making the advertised uncensoring claims misleading.
USA
- Amazon Pours $33B into Anthropic, Commits to $100B on AWS (The Decoder): The deal is designed to relieve Anthropic’s compute capacity crunch and deepens Amazon’s lead as the company’s primary cloud infrastructure provider.
- Jeff Bezos Nears $10B Funding for “Project Prometheus” AI Lab (The Decoder): The Financial Times reports Bezos is finalizing what would be the largest funding round ever for a brand-new AI research lab.
- ChatGPT Images 2.0 Launches with Reasoning and Web Search (The Verge): The updated model follows instructions more precisely, maintains consistency across up to eight images, and can pull real-world information from the web during generation.
- Google Launches Deep Research and Deep Research Max Agents (The Decoder): Built on Gemini 3.1 Pro, the new agents support MCP-connected financial and proprietary data sources, marking Google’s first foray into pluggable enterprise research agents.
- AI Backlash Building Ahead of US Midterm Elections (The Verge): Community resistance to data centers, social-media anger at AI executives, and job-loss fears are converging into a political force that campaigns are only beginning to grapple with.
- YouTube Expands AI Deepfake Detection to Celebrities (TechCrunch): The platform’s likeness-detection tool now lets celebrities and their representatives find and request removal of AI-generated deepfake content.
- FTC Settlement: Clarifai Deletes 3M OkCupid Photos Used to Train Facial Recognition AI (TechCrunch): Court documents reveal OkCupid—whose executives had invested in Clarifai—shared user photos without consent back in 2014; the deletion follows regulatory action.
- John Ternus to Succeed Tim Cook as Apple CEO; AI Is the First Major Test (The Verge): Hardware chief Ternus inherits an Apple that has publicly stumbled on AI, with Siri still lagging rivals nearly a year after WWDC criticism.
- NeoCognition Raises $40M Seed to Build Agents That Learn Like Humans (TechCrunch): The OSU-founded startup is developing AI agents capable of becoming domain experts through experience rather than static training.
- Scattered Spider Member “Tylerb” Pleads Guilty to Wire Fraud and Identity Theft (Krebs on Security): Tyler Robert Buchanan admitted to a 2022 SMS phishing campaign that compromised more than a dozen major tech companies and stole tens of millions in cryptocurrency.
- Former Ransomware Negotiator Pleads Guilty to Aiding BlackCat Attacks (BleepingComputer): Angelo Martino, formerly of cybersecurity firm DigitalMint, secretly assisted the BlackCat gang while ostensibly negotiating on behalf of victims.
Europe
- Anthropic Building First Data Center Team Outside the US (The Decoder): Job listings spotted in Europe and Australia signal Anthropic’s expansion beyond US compute infrastructure as capacity pressure intensifies.
- UK Regulator Investigates Telegram Over CSAM Sharing Concerns (BleepingComputer): Ofcom has launched a formal investigation into Telegram based on evidence it is being used to distribute child sexual abuse material.
Japan (AI & Tech)
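- Local Generative AI System for the Automotive Industry Puts Confidential Design Knowledge to Safe Use (Monoist/ITmedia): Tripleize (トリプルアイズ) and BEX announced a jointly developed local AI system for automotive design work that runs without any connection to external networks.
- Niconico Adds Experimental Feature in Which AI Posts Comments (ITmedia AI+): Dwango has begun offering an automatic AI comment-posting feature free of charge as a roughly one-month experiment. The AI comments also scroll across the video itself.
- LDP Considers Forming “Project Glasswing” to Prepare for “Mythos-Class” AI Threats (ITmedia AI+): With the financial sector in mind, Japan’s Liberal Democratic Party is considering launching a defense-strengthening project that anticipates attacks on critical infrastructure by advanced AI.
- AI Data Center Demand Leaves US Short of Fiber-Optic Technicians (ITmedia AI+): Meta and others have launched programs for people with no prior experience to train the technicians needed to lay optical fiber for AI data centers.
- New Sign-Ups for Individual GitHub Copilot Plans Temporarily Paused, Usage Limits Tightened (ITmedia AI+): Claude’s Opus models are no longer available on the Pro plan, and the announcement also spells out how dissatisfied users can cancel.
- Siemens Announces “Eigen Engineering Agent,” an Autonomously Executing Engineering AI (Monoist/ITmedia): Going beyond mere advice, the industrial AI product autonomously plans, executes, and verifies tasks end to end; it was showcased at Hannover Messe 2026.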
- Japanet Expands VC Fund After Anthropic and xAI Bets Pay Off (Japan Times): The Nagasaki-based retailer is growing its fund from $50M to $200M, citing strong returns on early AI investments.
Research Papers
Benchmarks & Evaluation
- COMPOSITE-STEM: Introduces 70 expert-written tasks in physics, biology, chemistry, and math to benchmark AI agent capabilities beyond saturated reasoning benchmarks, measuring performance on unconstrained scientific outputs.
- The Amazing Agent Race (AAR): Reveals that 55–100% of tasks in existing tool-use benchmarks are simple linear chains; introduces DAG-based “leg” puzzles exposing that current agents are strong tool executors but weak planners on fork-merge workflows.
- Evaluating Multimodal LLMs for Inpatient Diagnosis (VALID): Retrospective evaluation of 10 frontier multimodal LLMs on 539 real-world inpatient cases from a South African public hospital, measuring accuracy, safety, and cost—one of the first LMIC-situated clinical LLM benchmarks.
- ReXSonoVQA: A video QA benchmark for ultrasound understanding with 514 procedural clips targeting action-goal reasoning and artifact resolution, filling a gap left by static-image-only medical AI evaluations.
Security & Adversarial
- The Blind Spot of Agent Safety: Demonstrates that computer-use agents (CUAs) can cause real harm from entirely benign user instructions when the task context or execution outcome is harmful—a blind spot ignored by existing evaluations focused on explicit misuse and prompt injection.
- Evaluating Temporal and Structural Anomaly Detection for DDoS Traffic: Proposes a lightweight diagnostic framework that selects between temporal and graph-based anomaly detection for DDoS in cloud-native 5G networks, improving detection accuracy while reducing model complexity.
Compliance & Regulation
- Towards Reliable Testing of Machine Unlearning: Frames machine unlearning—increasingly required by GDPR and AI governance frameworks—as a software QA problem, proposing systematic testing under realistic deployment constraints to verify that deletion requests are actually satisfied.
Alignment & Safety
- SaFeR-Steer: Proposes a progressive multi-turn alignment framework for multimodal LLMs that generates synthetic escalation scenarios to close the gap between single-turn safety training and real-world multi-turn adversarial attacks that exploit long-context safety decay.
- Shifting the Gradient: How Defensive Training Methods Protect LLM Integrity: Provides the first behavioral and mechanistic comparison of positive preventative steering (PPS) and inoculation prompting, finding they work through distinct gradient-level mechanisms despite surface-level similarity.
Applications
- A Discordance-Aware Multimodal Framework with Multi-Agent Clinical Reasoning: Addresses the imaging-symptom mismatch in knee osteoarthritis by combining multimodal ML with a tool-grounded multi-agent reasoning system for more reliable clinical decision support.
- LLM-Extracted Covariates for Clinical Causal Inference: Systematically tests strategies for using LLMs to extract latent confounders (frailty, goals of care) from clinical notes in EHR-based causal studies, where structured data misses critical variables.
Guardrails & Robustness
- The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation: Identifies a “Scaling Law of Miscalibration” in on-policy distillation—models improve on task accuracy but become severely overconfident—and proposes mechanisms to decouple the two, a critical safety concern for deployed models.
Key Themes
- Mega-capital consolidation: Amazon’s $33B Anthropic deal and Bezos’s $10B “Project Prometheus” round signal that the AI investment wave is accelerating, not plateauing, driven by compute scarcity as much as model progress.
- Agentic AI security: The Antigravity prompt injection RCE and research on computer-use agent blind spots mark a maturation in agentic AI threat modeling—autonomous agents introduce entirely new attack surfaces that traditional security tools miss.
- AI safety in multi-turn and deployed settings: Multiple papers this cycle focus on alignment failures that only appear over multiple turns or under distribution shift at inference time, pointing to a gap between static safety evaluations and real-world deployment risk.
- Public backlash crystallizing: From US election politics to Japan’s LDP planning for “Mythos-class” AI threats, AI is shifting from a technical debate to a socio-political one, with unlearning compliance, deepfake regulation, and FTC enforcement all advancing simultaneously.
- Benchmarks reaching saturation: COMPOSITE-STEM and AAR both respond directly to the saturation of existing AI evaluations, signaling that the field needs harder, more realistic tasks to meaningfully differentiate frontier model capabilities.
For detailed summaries of selected research papers, see papers.md.