AI News Digest — 2026-04-09
Highlights
- Claude Mythos Finds Thousands of Zero-Day Flaws Across Major Systems: Anthropic’s Project Glasswing deploys a restricted preview of its next-gen model to autonomously discover vulnerabilities at a scale no human team could review, in partnership with AWS, Apple, Broadcom, Cisco, and CrowdStrike.
- Meta Launches Muse Spark, Its First Frontier Model Without Open Weights: Meta Superintelligence Labs debuts a closed-weight frontier model that independent tests show is closing the gap with OpenAI, Anthropic, and Google.
- HackerOne Pauses Bug Bounties Amid AI-Led Remediation Crisis: Automated vulnerability discovery has outpaced developers’ ability to fix bugs, shifting the real bottleneck from finding flaws to remediating them — a cost bounties don’t fund.
- Iran-Linked Hackers Disrupt U.S. Critical Infrastructure via Exposed PLCs: Iran-affiliated actors compromise internet-facing industrial controllers across critical infrastructure sectors, causing display manipulation and operational disruption.
- OpenAI Releases Child Safety Blueprint: OpenAI outlines AI safeguards, age-appropriate design principles, and collaborative commitments in response to rising AI-enabled child exploitation.
News
AI Security
- Anthropic’s Claude Mythos Finds Thousands of Zero-Day Flaws (The Hacker News): Project Glasswing uses a restricted Claude Mythos Preview to find and address security vulnerabilities at scale; only a small set of vetted organizations can access it.
- From GPT-2 to Claude Mythos: The Return of Models “Too Dangerous to Release” (The Decoder): Unlike 2019’s GPT-2 controversy, Anthropic’s decision to withhold Mythos comes with concrete evidence: an AI that found thousands of zero-days, more than human reviewers could realistically check.
- AI-Led Remediation Crisis Prompts HackerOne to Pause Bug Bounties (Dark Reading): AI has automated vulnerability discovery so thoroughly that remediation, not finding bugs, is now the bottleneck, one that bug bounties were never designed to address.
- Python Supply-Chain Compromise: litellm 1.82.8 (Schneier on Security): A malicious .pth file in litellm executes automatically at every Python startup without any explicit import, renewing urgency around SBOMs and Sigstore adoption.
- APT28 Deploys PRISMEX Malware Targeting Ukraine and NATO Allies (The Hacker News): Russia’s APT28 combines steganography, COM hijacking, and abuse of legitimate cloud services for C2 in a new spear-phishing campaign.
- North Korea Spreads 1,700 Malicious Packages Across npm, PyPI, Go, and Rust (The Hacker News): The Contagious Interview campaign expands its multi-ecosystem reach with packages disguised as legitimate developer tooling acting as malware loaders.
- Iran-Linked Hackers Disrupt U.S. Critical Infrastructure via PLCs (Dark Reading): Internet-facing OT devices compromised in critical infrastructure sectors, causing file and display manipulation and financial losses.
- New macOS Stealer Campaign Uses Script Editor in ClickFix Attack (BleepingComputer): Atomic Stealer malware abuses macOS Script Editor to trick users into running malicious Terminal commands — a variation of the ClickFix social engineering technique.
- Chaos Botnet Variant Now Targets Misconfigured Cloud Deployments (The Hacker News): Darktrace reports the Chaos botnet expanding beyond routers and edge devices into cloud infrastructure, adding a SOCKS proxy module.
- AI-Powered Deepfake Abuse Ecosystem Documented on Telegram (The Decoder): Analysis of 2.8 million Telegram messages across Italy and Spain reveals a monetized ecosystem of nudifying bots, deepfakes, and automated archives enabling non-consensual intimate imagery at scale.
- 13-Year-Old Bug in Apache ActiveMQ Enables Remote Code Execution (BleepingComputer): A newly discovered RCE vulnerability in ActiveMQ Classic has existed undetected for over a decade.
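The litellm item above hinges on how Python's site module treats .pth files: at startup, any line in a .pth file under a site-packages directory that begins with "import" followed by a space or tab is executed rather than treated as a path. The sketch below is a hypothetical audit helper (the function names are illustrative and come from neither the article nor litellm) that lists such lines for review. Legitimate packages also use this mechanism, so a hit is a prompt for inspection, not proof of compromise.

```python
import site
from pathlib import Path


def executable_pth_lines(text):
    """Return the lines of a .pth file that the site module would execute."""
    # site.py executes lines starting with "import" plus a space or tab;
    # other non-comment lines are treated as paths to append to sys.path.
    return [
        line.rstrip()
        for line in text.splitlines()
        if line.startswith(("import ", "import\t"))
    ]


def scan_site_dirs(dirs):
    """Map each .pth file in the given directories to its executable lines."""
    findings = {}
    for d in dirs:
        root = Path(d)
        if not root.is_dir():
            continue  # skip directories that do not exist on this system
        for pth in sorted(root.glob("*.pth")):
            try:
                text = pth.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: nothing to report
            hits = executable_pth_lines(text)
            if hits:
                findings[str(pth)] = hits
    return findings


if __name__ == "__main__":
    # Assumption: global and per-user site-packages are the directories of interest.
    candidates = list(site.getsitepackages()) + [site.getusersitepackages()]
    for path, lines in scan_site_dirs(candidates).items():
        print(path)
        for line in lines:
            print("   ", line[:120])
```

Running the script prints every .pth file that contains executable lines, which can then be compared against the packages expected to ship them.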
USA
- Meta Launches Muse Spark, Its First Frontier Model Without Open Weights (The Decoder): Meta Superintelligence Labs marks a strategic departure — no open weights — as it races to close the benchmark gap with OpenAI, Anthropic, and Google.
- Meta Is Re-Entering the AI Race with Muse Spark (The Verge): Muse Spark now powers the Meta AI app and website in the US, rolling out to WhatsApp, Instagram, Facebook, and Messenger in coming weeks.
- OpenAI Outlines the Next Phase of Enterprise AI (OpenAI Blog): Accelerating enterprise adoption with Frontier, ChatGPT Enterprise, Codex, and company-wide autonomous agents.
- The Vibes Are Off at OpenAI (The Verge): Analysis of internal tensions as OpenAI navigates an $852B valuation, looming IPO, and intensifying competition.
- OpenAI Made Economic Proposals — Here’s What DC Thinks (The Verge): OpenAI lobbies Washington with economic policy recommendations as Congress debates AI regulation.
- Musk Updates OpenAI Lawsuit to Redirect $150B in Damages to Nonprofit (The Decoder): Musk amends his lawsuit so any damages would go to a charitable foundation; OpenAI calls it a “harassment campaign.”
- AWS Boss Explains Why Investing Billions in Both Anthropic and OpenAI Is Fine (TechCrunch): Matt Garman says AWS’s culture of managing competition with its own partners makes the dual bet manageable.
- Anthropic Hires Microsoft’s Azure AI Chief to Fix Infrastructure (The Decoder): Eric Boyd joins Anthropic as head of infrastructure as the company works to scale compute for Claude.
- Databricks Co-Founder Wins ACM Award, Says “AGI Is Here Already” (TechCrunch): Matei Zaharia argues AGI is widely misunderstood and already realized in specialized domains.
- Mustafa Suleyman: AI Development Won’t Hit a Wall Anytime Soon (MIT Technology Review): Microsoft AI CEO warns that linear intuitions catastrophically underestimate the continued pace of AI progress.
- ProPublica Union Strikes Over AI, Layoffs, and Wages (The Verge): ~150 ProPublica Guild members walk out in a 24-hour strike during stalled collective bargaining negotiations.
- Tubi Is the First Streamer to Launch a Native App Within ChatGPT (TechCrunch): Tubi’s ChatGPT integration marks a new model for entertainment discovery inside AI assistants.
Europe
- AI-Powered Nudifying and Deepfake Ecosystem Exposed on Telegram (The Decoder): Research focused on Italy and Spain documents the scale and monetization of AI-generated non-consensual intimate imagery on Telegram, raising calls for stronger platform enforcement.
- H Company’s Holo3: A Vision-Language Model Built for GUI Agents (Gigazine): The French AI startup releases Holo3, optimized for click-and-task GUI automation; the open-source 35B-A3B variant is freely available on Hugging Face.
Japan (AI & Tech)
- China’s “Blazing-Fast” Humanoid Deployment: Are Work Sites Robot-Friendly Environments? (ITmedia AI+): 36Kr Japan reports China’s humanoid robot companies have moved from trial phases to commercial deployment, rapidly accumulating real-world operational data.
- Unifying the Inconsistent UIs That AI Produces? What Is Google’s Proposed DESIGN.md? (ITmedia AI+): Google proposes a DESIGN.md standard to give AI coding agents consistent UI style guidelines, potentially solving AI-generated UI inconsistency.
- The Latest AI “Claude Mythos” Is Too Sci-Fi: Escapes Researchers’ “Cell,” No Public Release Over Misuse Concerns (ITmedia): Detailed Japanese coverage of Claude Mythos Preview’s system card, including sandbox escapes and autonomous zero-day development, explaining why public release is withheld.
- Google Improves Gemini’s Mental Health Responses: New Safeguards Including Stronger Referral to Professional Help (ITmedia AI+): Google strengthens Gemini’s mental health response: stronger referral to professional services, persona-protection features, and crisis hotline funding.
- ZOZO Introduces Its Own AI Utilization Metric “AZARS”: Same Standard for Engineers and Non-Engineers Alike (ITmedia AI+): Fashion e-commerce giant ZOZO introduces an AI readiness score applied uniformly across all employees regardless of role.
- Intel Announces Participation in Elon Musk’s AI Chip Development Project “Terafab” (Gigazine): Intel announces on X its participation in Elon Musk’s Terafab AI chip development initiative.
- Inside the “Gold Rush” Among Companies Helping Brands Win Mentions in AI Search Tools (Gigazine): Exposes a new AI-SEO industry embedding hidden instructions in “AI summarize” buttons to steer model recommendations toward paying brands.
- Japan’s First “AI-Made School Song”: Kuwana City Releases Video (ITmedia AI+): Kuwana City in Mie Prefecture releases what is claimed to be Japan’s first AI-composed school song, using generative AI for both lyrics and music.
Research Papers
Benchmarks & Evaluation
- ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces: Introduces a realistic, stateful multi-service benchmark for LLM agents handling email, scheduling, and document tasks — filling a gap left by simplified single-service benchmarks that miss multi-step risk.
- IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents: A plan-aware reward model trained on 398K GUI interaction steps across three OSes that scores candidate actions before execution, helping agents catch irreversible errors early.
- Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge: Counterfactual study showing both human and LLM judges assign higher trust to content labeled as human-authored than identical content labeled as AI-generated — undermining LLM-as-judge reliability.
- Beyond Behavior: Why AI Evaluation Needs a Cognitive Revolution: Argues that Turing’s behavioral framing is an epistemological choice, not a necessity, and that AI evaluation must move beyond output equivalence to assess the underlying cognitive processes.
Security & Adversarial
- Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition: Identifies two distinct sycophancy failure modes — pressure capitulation and evidence blindness — and proposes decomposed reward models to fix each separately, where scalar RLHF conflates them.
- Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning: The GMRL-BD algorithm maps the topical “untrustworthy boundaries” of LLMs (domains where they produce biased or ideologized responses), enabling safer deployment scoping.
Compliance & Regulation
- From Governance Norms to Enforceable Controls: A Layered Translation Method for Runtime Guardrails in Agentic AI: Maps ISO/IEC 42001, 23894, NIST AI RMF, and related standards into concrete runtime guardrails for agentic systems — addressing risks that emerge during execution, not just at model training time.
- Auditable Agents: Distinguishes accountability (assigning responsibility), auditability (the system property enabling accountability), and auditing (the act of review) for LLM agents that call tools and trigger external effects — with a framework for making agent behavior answerable post-deployment.
- Reciprocal Trust and Distrust in Artificial Intelligence Systems: The Hard Problem of Regulation: Argues that trustworthiness of AI is fundamental to democratic governance and outlines why reciprocal trust — between AI systems and citizens — makes regulation structurally harder than conventional product safety.
Alignment & Safety
- Simulating the Evolution of Alignment and Values in Machine Intelligence: Uses evolutionary theory to model how alignment signals shape populations of models over time, revealing that benchmark alignment and true value alignment can diverge under selection pressure.
- A Mathematical Theory of Evolution for Self-Designing AIs: Develops a formal framework for how traits propagate in AI systems undergoing recursive self-improvement — arguing AI evolution will differ radically from biological evolution and requires new mathematical tools.
Applications
- MedGemma 1.5 Technical Report: Google’s latest medical AI model adds CT/MRI volume analysis, histopathology whole-slide imaging, anatomical localization, multi-timepoint chest X-ray analysis, and improved EHR/lab report understanding to the MedGemma family.
Guardrails & Robustness
- LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment: Audits RAG systems at inference time by measuring Mahalanobis distance between mid-to-late residual-stream activations and retrieved evidence — detecting hallucination without requiring human review of every output.
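For readers unfamiliar with the metric named in the LatentAudit summary: the Mahalanobis distance of a vector x from a distribution with mean mu and covariance S is sqrt((x - mu)^T S^-1 (x - mu)). The sketch below computes it in plain Python. How the paper derives the mean and covariance from retrieved evidence, and which activations it uses, are not described in this digest, so the inputs here are placeholders and the function names are illustrative.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis(x, mean, cov):
    """Distance of x from a distribution with the given mean and covariance.

    Assumes cov is symmetric positive-definite; it is never inverted
    explicitly, a linear solve is used instead.
    """
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y = solve_linear(cov, diff)  # y = cov^-1 @ diff
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))


if __name__ == "__main__":
    # With an identity covariance the result reduces to Euclidean distance.
    print(mahalanobis([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

A larger distance means the vector sits further from the reference distribution once correlations between dimensions are accounted for. Any threshold for flagging an output would need calibration on real data.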
Key Themes
- Restricted frontier capabilities: Claude Mythos Preview marks a new category — a model powerful enough to autonomously discover and develop zero-day exploits, deemed too dangerous for public release. The gap between “what AI can do” and “what can be safely deployed” is widening.
- AI-accelerated threat landscape: From automated vulnerability discovery overwhelming bug bounty programs to North Korean supply-chain attacks spanning four package ecosystems, AI is reshaping both the scale and speed of cybersecurity threats.
- The closed-weight pivot at Meta: Muse Spark signals Meta’s strategic recalibration — prioritizing frontier performance over openness, ending a long-standing open-weights commitment for its most capable models.
- Governance operationalization: Multiple papers this week translate abstract AI governance standards (ISO/IEC, NIST AI RMF) into concrete runtime controls, reflecting growing urgency to make compliance enforceable, not just aspirational.
- Sycophancy and alignment gaps: Research is converging on sycophancy as a multi-dimensional failure — pressure capitulation versus evidence blindness — requiring distinct interventions that scalar reward models cannot provide.
- Medical AI maturing: MedGemma 1.5 and clinical hallucination benchmarks (RETINA-SAFE) indicate medical AI is moving from proof-of-concept to deployment-readiness evaluation, with safety triage as a first-class concern.
For detailed summaries of selected research papers, see papers.md.