AI News Digest — April 2, 2026
Highlights
- OpenAI Closes $122B Mega-Round, Unveils ChatGPT Super App: OpenAI officially confirms a $122 billion funding round at an $852 billion valuation, backed by SoftBank, Amazon, and NVIDIA, and signals a hard pivot toward a unified ChatGPT super app.
- Anthropic’s Leaked Claude Code Source Cloned 8,000+ Times: An npm packaging error exposed Claude Code’s source code, which spread to over 8,000 GitHub forks despite mass takedown efforts, revealing details of an unreleased “Mythos” model.
- Google DeepMind Identifies Six “Traps” That Hijack Autonomous AI Agents: A new study documents six environmental manipulation patterns that reliably redirect real-world AI agents browsing the web, handling email, and executing transactions.
- EU Bars AI-Generated Content from Official Communications: The European Commission, Parliament, and Council have prohibited their press teams from publishing fully AI-generated content, citing authenticity and accountability concerns.
- North Korea Linked to Axios npm Supply Chain Compromise: Google formally attributes the supply chain attack on the widely used Axios npm package to North Korean group UNC1069, marking a significant escalation in open-source ecosystem targeting.
News
AI Security
- Claude Code Source Leaked via npm Packaging Error (The Hacker News) — Anthropic confirms internal Claude Code source was inadvertently bundled into a public npm package, exposing proprietary logic and references to unreleased models.
- Claude Chrome Extension “ShadowPrompt” Vulnerability (ITmedia Enterprise) — A critical flaw in Claude’s Chrome extension allows a visited webpage to hijack AI behavior without user interaction, exploiting a design-level gap in extension sandboxing.
- Google DeepMind: Six Traps for Hijacking AI Agents in the Wild (The Decoder) — Researchers categorize six injection patterns — from hidden page instructions to malicious email content — that consistently redirect agent actions in deployed systems.
- Mercor Cyberattack Tied to LiteLLM Open-Source Compromise (TechCrunch AI) — AI recruiting startup Mercor confirms a security incident after attackers compromised the widely used LiteLLM project as an entry vector.
- Google Attributes Axios npm Attack to North Korean Group UNC1069 (The Hacker News) — The attribution implicates a financially motivated North Korean threat actor in a supply chain attack affecting millions of downstream npm dependents.
- CrowdStrike: AI-Accelerated Attacks Now Average 29-Minute Breakout, Fastest 27 Seconds (ITmedia Enterprise) — The 2026 threat report documents how widespread AI tooling has compressed the average attacker breakout time to under 30 minutes, with the fastest observed case completing initial intrusion in just 27 seconds.
- Is “Hackback” Now Official US Cybersecurity Strategy? (Schneier on Security) — Bruce Schneier analyzes the 2026 US Cyber Strategy document and its implications for offensive cyber doctrine.
- New EvilTokens Service Fuels Microsoft Device Code Phishing (BleepingComputer) — A new MaaS kit automates device code phishing to hijack Microsoft OAuth tokens, lowering the bar for enterprise account takeover.
- ‘NoVoice’ Android Malware Infected 2.3 Million Devices via Google Play (BleepingComputer) — Malware hidden in over 50 Play Store apps silently exfiltrates data; apps have been pulled but infection persists on user devices.
- New Chrome Zero-Day CVE-2026-5281 Under Active Exploitation (The Hacker News) — Google patches the fourth Chrome zero-day exploited in the wild so far in 2026; update is immediately available.
- CERT-UA Impersonation Campaign Spread AGEWHEEZE Malware to 1 Million Emails (The Hacker News) — Attackers spoofed Ukraine’s computer emergency response team to distribute new malware at scale.
- Microsoft Warns of WhatsApp-Delivered VBS Malware with UAC Bypass (The Hacker News) — A new campaign uses WhatsApp messages to deliver VBScript payloads that execute with elevated privileges, bypassing User Account Control.
- Perplexity AI Sued Over Alleged Data Sharing with Meta and Google (The Decoder) — A class-action lawsuit accuses Perplexity of sharing personally identifiable user chat data with third-party advertisers.
- FBI Warns Against Chinese Mobile Apps Over Privacy Risks (BleepingComputer) — The FBI issues a formal advisory cautioning Americans about data collection practices in foreign-developed applications.
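Several of the agent-hijacking patterns above (notably DeepMind’s hidden page instructions and the ShadowPrompt extension flaw) rely on text a human reader never sees. As an illustration only, not any vendor’s actual defense, here is a minimal Python pre-processing sketch that strips visually hidden HTML regions before page text reaches an agent; all class and function names are hypothetical:

```python
from html.parser import HTMLParser

# Style fragments that commonly hide text from human readers.
HIDDEN_STYLES = ("display:none", "visibility:hidden", "font-size:0")

class VisibleTextExtractor(HTMLParser):
    """Collect only the text a human reader would plausibly see.

    Assumes reasonably well-formed markup; unclosed tags would
    desynchronize the stack and need extra handling in practice.
    """
    def __init__(self):
        super().__init__()
        self._stack = []         # True for each open element that hides content
        self._hidden_depth = 0   # how many enclosing elements are hidden
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        style = (d.get("style") or "").replace(" ", "").lower()
        hidden = (tag in ("script", "style")
                  or "hidden" in d
                  or any(s in style for s in HIDDEN_STYLES))
        self._stack.append(hidden)
        if hidden:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        # Pop the matching start tag's hidden flag.
        if self._stack and self._stack.pop():
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0 and data.strip():
            self._chunks.append(data.strip())

def visible_text(html: str) -> str:
    """Return page text with hidden regions stripped out."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    return " ".join(parser._chunks)
```

This catches only the crudest hiding tricks; CSS classes, off-screen positioning, and text baked into images all evade it, which is why the research argues for defense in depth rather than input filtering alone.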
USA
- OpenAI Confirms $122B Round and ChatGPT Super App (The Decoder) — The raise, backed by SoftBank, Amazon, and NVIDIA, values OpenAI at $852 billion; the company signals a pivot away from API-first toward an integrated consumer super app.
- Meta’s Hyperion AI Datacenter Powered by 10 New Natural Gas Plants (TechCrunch AI) — Meta’s planned flagship AI datacenter will rely on newly built gas generation capacity equivalent to powering an entire US state.
- Cognichip Raises $60M to Use AI for Chip Design (TechCrunch AI) — The startup claims its AI can cut chip development costs by 75% and halve the design timeline, targeting the next generation of AI accelerators.
- Salesforce Announces 30 New AI Features for Slack (TechCrunch AI) — The overhaul deeply integrates Salesforce’s Agentforce platform into Slack, enabling autonomous agents to surface insights and take actions within conversations.
- Gradient Labs Gives Every Bank Customer an AI Account Manager (OpenAI Blog) — Built on GPT-4.1 and GPT-5.4, Gradient Labs deploys AI agents that automate banking support workflows, targeting sub-second response latency at enterprise scale.
- Holo3: Breaking the Computer Use Frontier (Hugging Face Blog) — H Company releases Holo3, a new computer-use model claiming state-of-the-art performance on desktop and browser automation benchmarks.
- Microsoft Research ADeLe: Predicting and Explaining AI Performance Across Tasks (Microsoft Research Blog) — ADeLe provides a framework to forecast LLM task performance and explain failures, moving beyond aggregate benchmark scores.
- America’s AI Boom Is Leaving the Rest of the World Behind (Rest of World) — Analysis shows US AI investment is widening rather than closing the global gap, with capital and talent concentrating in a handful of American firms.
- Gig Workers Are Now Training Humanoid Robots at Home (MIT Technology Review) — A growing gig economy is emerging around collecting embodied AI training data via teleoperation rigs, with workers in Nigeria and elsewhere earning per-task fees.
- Elgato Stream Deck Gets MCP Support for AI Agent Control (The Verge AI) — Elgato adds Model Context Protocol integration, allowing AI agents to trigger physical Stream Deck buttons and read their state.
- Anthropic Is Having a Month (TechCrunch AI) — A second human-error incident within a single week prompts TechCrunch to chronicle Anthropic’s streak of internal missteps.
- Baidu’s Robotaxis Froze in Traffic, Creating Chaos (The Verge AI) — Multiple Apollo Go robotaxis stalled mid-journey in a Chinese city, trapping passengers and blocking intersections before operators remotely intervened.
- Google Releases March 2026 AI News Roundup (Google AI Blog) — Monthly update covering Gemini, Search, Workspace, and Cloud AI releases from March.
Europe
- EU Bars AI-Generated Content from Official Communications (The Decoder) — All three major EU institutions have instructed press teams to avoid fully AI-generated material, a de facto content policy ahead of formal AI Act enforcement.
Japan (AI & Tech)
- The .claude Folder: Structure and Usage — An Introduction to Claude Code (ITmedia AI+) — A beginner’s guide to Claude Code’s .claude folder structure, explaining how CLAUDE.md and project-level configuration shape agent behavior.
- University of Tokyo Unveils Open-Source Quadruped Robot “MEVIUS2” (ITmedia AI+) — The University of Tokyo unveils MEVIUS2, an open-source quadruped robot with parts orderable online, demonstrated climbing stairs in public.
- NotebookLM Cuts Work Time by 95% — Municipalities and Enterprises “Return to Google” (ITmedia Enterprise) — Japanese municipalities and enterprises report up to 95% reduction in document processing time after deploying NotebookLM and other Google AI tools.
- Slackbot Significantly Strengthens AI Agent Coordination (ITmedia AI+) — Slack announces Slackbot’s expansion into a multi-agent coordinator, including a “Deep Thoughts” strategic analysis feature.
- NTT Data-Affiliated Weather Company Launches MCP Server (ITmedia AI+) — An NTT Data weather subsidiary launches an MCP server exposing real-time meteorological data for AI agent pipelines in retail, distribution, and construction.
- “Off Grid” Review: Running AI Models Locally on a Smartphone (Gigazine) — Review of Off Grid, a free iOS/Android app running open-source LLMs and image generation models fully on-device without cloud dependency.
- Using AI for Stock Investing: Claude and Gemini’s “Personality Differences” (ITmedia AI+) — A University of Tokyo research group finds Claude adopts a conservative incremental investment strategy while Gemini takes bolder positions when automated to trade equities.
- “Explainable AI” Is Key — By 2028, 50% of Companies Deploying Generative AI Will Invest in LLM Observability (ITmedia Enterprise) — Gartner predicts half of generative AI deployments will require explainability and observability tooling by 2028, driven by governance requirements.
- Initial Intrusion in as Little as 27 Seconds — Cyberattacks Accelerate as AI Spreads (ITmedia Enterprise) — CrowdStrike’s threat report finds AI-assisted attackers completing initial access in as little as 27 seconds, with the average dropping to 29 minutes.
- KDDI Discloses ¥246.1 Billion in Overstated Revenue via Fictitious Transactions (The Japan Times) — KDDI discloses ¥246.1 billion in overstated revenues via fictitious transactions; eight executives including the CEO will return portions of their remuneration.
- A Welder Built an App in Six Hours — Shizuoka Factory Invests ¥5 Million in Generative AI Training (ITmedia Business) — A 13-person factory in Shizuoka invested ¥5 million in generative AI training and reports a welder built a business app in six hours with no prior coding experience.
- Apple Co-Founder Wozniak, Who Shuns AI, Voices Unease with Technology on Apple’s 50th Anniversary (ITmedia Business) — On Apple’s 50th anniversary, Steve Wozniak says AI-generated content feels “too perfect to be human,” expressing discomfort with the loss of personality in technology.
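For orientation on the MCP items above: Model Context Protocol messages ride a JSON-RPC 2.0 envelope, and tools are invoked with a `tools/call` request. The sketch below builds such a request in Python; the tool name and arguments are hypothetical, since the article does not document the actual interface the NTT Data subsidiary exposes.

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request (JSON-RPC 2.0 envelope)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical tool name and arguments for a weather MCP server.
req = mcp_tool_call(1, "get_forecast", {"city": "Tokyo", "hours": 24})
```

An agent framework would send this over the server’s transport (stdio or HTTP) and read back a matching JSON-RPC response carrying the tool result.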
Research Papers
Benchmarks & Evaluation
- Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents — Argues that current benchmarks measure single-attempt capability but not production reliability; proposes a framework distinguishing capability from consistency across repeated runs, critical for deployment decisions.
- BenchScope: How Many Independent Signals Does Your Benchmark Provide? — Introduces Effective Dimensionality to measure how many genuinely independent signals a benchmark suite provides, revealing that many popular eval sets are heavily redundant and inflate apparent coverage.
- ELT-Bench-Verified: Benchmark Quality Issues Underestimate AI Agent Capabilities — Demonstrates that errors in benchmark construction itself cause AI performance to appear lower than it is; after correcting labeling flaws, agent scores on ELT pipeline tasks improve substantially.
- SciVisAgentBench: Evaluating Scientific Data Analysis and Visualization Agents — Introduces a benchmark testing agents on end-to-end scientific workflows — from raw data ingestion to publication-quality visualization — revealing consistent failure modes in multi-step reasoning.
- SkillTester: Benchmarking Utility and Security of Agent Skills — A dual-axis evaluation framework that tests both functional utility and security vulnerabilities of AI agent “skills” (tools/plugins), including a curated security probe suite.
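The capability-versus-consistency gap that “Beyond pass@1” targets can be made concrete with a toy calculation. This is my own illustration, not the paper’s formalism: per-attempt success rate versus the chance that every step of a long workflow succeeds, under a simplifying independence assumption.

```python
def pass_at_1(successes: int, runs: int) -> float:
    """Single-attempt success rate over repeated runs of one task."""
    return successes / runs

def long_horizon_reliability(p: float, steps: int) -> float:
    """Probability that `steps` attempts ALL succeed: p ** steps.
    Assumes independent attempts; real failures often correlate."""
    return p ** steps

p = pass_at_1(9, 10)                  # 0.9: a strong-looking benchmark score
r = long_horizon_reliability(p, 20)   # ~0.12 for a 20-step agent workflow
```

A model that clears 90% on a single-attempt benchmark still fails the large majority of 20-step runs, which is why the paper argues deployment decisions need reliability measured across repeated runs, not one-shot capability.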
Security & Adversarial
- Trojan-Speak: Bypassing Constitutional Classifiers via Adversarial Finetuning — Shows that fine-tuning APIs allow adversaries to embed triggers that bypass content classifiers with no perceptible “jailbreak tax” — outputs appear normal until a trigger phrase is present.
- Architecting Secure AI Agents: System-Level Defenses Against Indirect Prompt Injection — Presents architectural patterns — input sanitization pipelines, context compartmentalization, and permission hierarchies — that reduce susceptibility to injected instructions from untrusted environments.
- GUARD-SLM: Token Activation-Based Defense Against Jailbreak Attacks for Small LMs — Proposes a lightweight defense mechanism for edge-deployed small language models that monitors internal token activations to detect and block jailbreak attempts without retraining.
- CivicShield: Defense-in-Depth for Government AI Chatbots vs. Multi-Turn Adversarial Attacks — Documents that multi-turn adversarial attacks succeed over 90% of the time against current government chatbot deployments and proposes a layered defense framework reducing success to under 15%.
- Security in LLM-as-a-Judge: A Comprehensive SoK — Systematizes knowledge on attack surfaces in LLM-based evaluation pipelines, covering prompt injection, output manipulation, and gaming strategies that undermine automated quality assessment.
- Design Principles for a Security Operations Benchmark for Multi-Agent AI Systems — Proposes evaluation criteria for AI systems performing cybersecurity operations, addressing gaps in existing benchmarks that fail to capture adversarial conditions or multi-agent coordination requirements.
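The permission-hierarchy pattern from the secure-agents paper can be sketched with a toy policy. This is a minimal illustration under my own assumptions, not the paper’s design; the tool names and trust levels are hypothetical. The idea: every tool request carries the trust level of the context that produced it, and privileged tools refuse requests originating from untrusted content such as fetched web pages.

```python
from enum import IntEnum

class Trust(IntEnum):
    UNTRUSTED = 0   # e.g. text fetched from an arbitrary web page
    USER = 1        # typed directly by the human operator
    SYSTEM = 2      # the deployer's own configuration

# Hypothetical policy: minimum trust level required to invoke each tool.
TOOL_POLICY = {
    "read_page": Trust.UNTRUSTED,
    "send_email": Trust.USER,
    "transfer_funds": Trust.SYSTEM,
}

def authorize(tool: str, origin: Trust) -> bool:
    """Allow a tool call only if the requesting context is trusted enough.
    Unknown tools default to the highest requirement (default-deny)."""
    required = TOOL_POLICY.get(tool, Trust.SYSTEM)
    return origin >= required
```

Under this gate, an injected instruction inside a fetched page can still trigger harmless reads, but cannot escalate to sending mail or moving money, which is the compartmentalization property the paper’s architectural patterns aim for.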
Alignment & Safety
- Extending MONA: Reward-Hacking Mitigation via Myopic Optimization with Non-Myopic Approval — Reproduces and extends MONA, a technique that limits an agent’s planning horizon while allowing a human-in-the-loop approver to evaluate longer-term consequences, reducing multi-step reward hacking.
- Robust Safety Monitoring of Language Models via Activation Watermarking — Embeds verifiable watermarks into model activations to enable reliable detection of misuse (e.g., weapon instructions, malware generation) even when outputs are obfuscated at the text level.
- SafeClaw-R: Towards Safe and Secure Multi-Agent Personal Assistants — Addresses safety and authorization gaps in multi-agent personal assistant frameworks (e.g., AutoGPT-style systems), proposing sandboxing and inter-agent trust hierarchies.
Guardrails & Robustness
- Towards Policy-Adaptive Image Guardrail: Benchmark and Method — Introduces a content moderation framework that adapts rejection thresholds to context-specific policies rather than applying fixed rules, with an accompanying benchmark for evaluating policy-conditioned refusals.
- Uncertainty Gating for Cost-Aware Explainable AI — Uses epistemic uncertainty as a proxy for when explanations are unreliable, gating expensive explanation generation to cases where it is most needed — particularly relevant for high-stakes clinical and legal deployments.
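The gating idea in the uncertainty paper can be illustrated with a short sketch. This is my simplification: the paper targets epistemic uncertainty specifically, while plain predictive entropy stands in for it here, and the threshold value is an arbitrary placeholder.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (nats) of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_explain(probs, threshold=0.5):
    """Gate expensive explanation generation to uncertain predictions.
    The 0.5-nat threshold is a placeholder, not a recommended value."""
    return predictive_entropy(probs) > threshold

# A confident prediction skips the explainer; an ambiguous one triggers it.
confident = [0.97, 0.02, 0.01]   # entropy ~0.15 nats
ambiguous = [0.40, 0.35, 0.25]   # entropy ~1.08 nats
```

The cost argument is that explanations are generated only where they carry decision-relevant information, which matters most in the clinical and legal settings the paper highlights.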
Applications
- Symphony for Medical Coding: Agentic System for Scalable and Explainable Medical Coding — Presents an agentic pipeline that translates free-text clinical notes into standardized billing codes with higher accuracy and built-in audit trails, addressing a major bottleneck in healthcare administration.
Key Themes
- AI infrastructure concentration: Meta’s natural gas buildout and the OpenAI mega-round signal that AI’s resource demands are now a macroeconomic factor, with capital and energy access shaping who can compete.
- Agentic security as the new frontier: The Claude Code leak, Google DeepMind’s agent hijacking research, ShadowPrompt, and four new papers on prompt injection underscore that agents operating in the real world introduce a qualitatively different attack surface from static LLMs.
- Benchmark credibility under scrutiny: Multiple papers this cycle challenge the validity of existing benchmarks — from redundancy (BenchScope) to labeling errors (ELT-Bench) to the capability/reliability gap (Beyond pass@1) — signaling growing methodological maturity in AI evaluation.
- Regulatory and institutional guardrails taking shape: The EU content ban and Gartner’s LLM observability forecast indicate that governance requirements are moving from abstract to operational, creating demand for explainability and audit infrastructure.
- North Korea’s escalation in open-source targeting: The Axios npm attack extends a pattern of supply chain operations to the most critical shared infrastructure of AI/ML development, raising the risk profile for the entire open-source ecosystem.
For detailed summaries of selected research papers, see papers.md.