AI News Digest — April 10, 2026
Highlights
- Anthropic limits Mythos model over cybersecurity weapon risk: Anthropic restricted access to its most capable new model, citing its unprecedented ability to discover exploitable software vulnerabilities at scale — raising questions about where AI safety ends and competitive caution begins.
- US appeals court lets Pentagon blacklisting of Anthropic stand: A federal appeals court declined to block the Department of Defense’s designation of Anthropic as a national security risk, leaving a cloud over the company’s government business.
- Adobe Reader zero-day exploited in the wild since December 2025: A highly sophisticated PDF exploit weaponizing an unpatched Adobe Reader flaw was used in targeted attacks for at least four months before disclosure.
- OpenAI halves Pro tier price to $100/month: OpenAI introduced a $100/month tier aimed at heavy Codex users, bridging the previous jump from $20 to $200 and directly undercutting rivals Anthropic and Google.
- Anthropic launches managed infrastructure for autonomous AI agents: Claude Managed Agents gives developers a hosted platform for building and running long-running autonomous agents, with Notion and Rakuten among early adopters.
News
AI Security
- Is Anthropic limiting Mythos to protect the internet — or Anthropic? — Anthropic says its new Mythos model is too capable at finding security exploits in widely-used software; critics question whether safety concerns are also covering for competitive positioning.
- OpenAI reportedly following Anthropic’s lead on restricting powerful cybersecurity AI — OpenAI is also developing a restricted cybersecurity AI model that will only be available to a vetted group of companies.
- The hidden security risks of shadow AI in enterprises — Employees adopting AI tools without IT approval are creating blind spots that bypass security controls, a growing risk category distinct from traditional shadow IT.
- Florida AG investigates OpenAI over FSU shooting — Florida’s attorney general opens a formal investigation into OpenAI after ChatGPT was allegedly used to plan a deadly attack at Florida State University last April.
- Mercor data breach leaves $10B startup losing customers and facing lawsuits — The AI hiring platform is dealing with lawsuits and departing clients after hackers accessed user data.
USA
- OpenAI halves Pro plan price to $100/month for Codex-heavy users — OpenAI restructures subscription tiers, offering significantly more Codex usage at half the price of the old $200 Pro tier, undercutting Anthropic and Google.
- Anthropic launches Claude Managed Agents — A new hosted platform for building and deploying autonomous AI agents, targeting enterprise developers; Notion and Rakuten are among the first adopters.
- Claude Cowork expands to all paid plans on macOS and Windows — Anthropic’s collaborative AI assistant adds organizational controls and Zoom integration and opens up to all paid tiers.
- Meta AI app jumps to #5 on the App Store after Muse Spark launch — The app surged from #57 to #5 following the debut of Meta’s new Muse Spark model, reflecting strong consumer interest.
- Google Gemini now generates interactive 3D models and simulations — Gemini can now produce rotatable 3D models and adjustable simulations directly inside the chat interface.
- Sierra’s Bret Taylor: the era of clicking buttons is over — Sierra launched Ghostwriter, an agent that builds other agents, aiming to replace traditional click-based UIs with natural-language task delegation.
- US appeals court refuses to block Pentagon’s blacklisting of Anthropic — The DoD’s national security designation of Anthropic remains in force after the court denied an emergency stay.
- Amazon CEO Andy Jassy defends $200B capex in shareholder letter — Jassy’s letter reads as a competitive broadside against Nvidia, Intel, Starlink, and others while defending Amazon’s massive AI infrastructure spending.
- Google and Intel deepen AI infrastructure partnership — The two companies plan to co-develop custom chips, including IPUs, amid a global CPU shortage driven by AI demand.
- New Stanford study: multi-agent AI advantage mostly comes from more compute — Multi-agent systems appear more capable largely because they use more compute; the study identifies meaningful exceptions where agent collaboration provides distinct gains.
- YouTube Shorts rolls out AI avatar feature for creators — Google makes it straightforward for creators to deepfake themselves on YouTube Shorts, adding to debates about AI-generated content on the platform.
- Zhipu AI releases GLM-5.1 under MIT license — The model can iteratively refine its own coding approach across hundreds of cycles, making it a notable open-source addition for coding tasks.
- Microsoft’s New Future of Work report: AI driving rapid change, uneven benefits — Microsoft Research’s 2025 report finds the AI-driven shift in how people work is sharper than in prior years, but the benefits remain unevenly distributed.
- The AI industry’s race for profits is existential — The Verge’s Decoder explores whether Anthropic, OpenAI, and others can become profitable businesses before running out of runway.
Europe
- War in the Gulf could tilt the cloud race toward China — Strikes on US data centers in the Gulf conflict highlight the risks of cloud concentration and are accelerating conversations about Huawei as an alternative infrastructure provider.
- Healthcare IT provider ChipSoft (Netherlands) hit by ransomware — The Dutch healthcare software vendor was forced to take its patient-facing systems offline following a ransomware attack.
- Eurail data breach exposes 300,000 individuals’ data — Eurail B.V. discloses that a December 2025 intrusion stole personal data from 300,000 customers across 33 European rail networks.
- Greece to ban social media access for under-15s starting January 2027 — Greece will require platforms to implement robust age-verification mechanisms and will mandate re-verification of existing accounts under the new law.
Japan (AI & Tech)
- Tokyo Metropolitan Government launches in-house AI platform “A1” — Tokyo’s municipal government has launched an internal no-code AI platform that lets staff build and share productivity apps without engineering support.
- Asahi Shimbun pushes back on Nikkei’s “AI all-in” editorial stance — Asahi Shimbun issued a public rebuttal stating that “AI supplements humans — final judgement and responsibility remain with people,” in response to a Nikkei article declaring the news organization’s full AI transformation.
- AI writing autocomplete may shape users’ thinking before they know it — A Cornell University study published in Science Advances finds that AI text suggestions influence not just writing style but underlying thought patterns.
- Waiting for DeepSeek V4: China’s AI ambitions under scrutiny — The long-awaited DeepSeek V4 has yet to appear, stoking speculation about whether Huawei chips can power a genuine Nvidia alternative and what this means for China’s AI ambitions.
- Google Colab adds Gemini-powered “learning mode” for Python — The new mode in Google Colab uses Gemini to guide learners through Python step-by-step, going beyond autocomplete to active skill-building.
Research Papers
Benchmarks & Evaluation
- WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks — Introduces the first benchmark assessing whether web agents can execute user-facing security and privacy tasks (e.g., managing privacy settings, revoking app access), exposing a large gap between general-purpose agent capabilities and security-aware behavior.
- Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization — A new benchmark for evaluating how well reward models capture individual user preferences rather than aggregate human values, advancing the pluralistic alignment agenda.
- ValueGround: Evaluating Culture-Conditioned Visual Value Grounding in MLLMs — Finds that multimodal LLMs struggle to ground culture-specific value judgments when response options are visual rather than textual, exposing a blind spot in cross-cultural safety evaluations.
Security & Adversarial
- TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories — Introduces the first benchmark testing safety guardrails not just on final outputs but on intermediate tool-call execution traces, revealing that current guardrails are largely blind to mid-trajectory policy violations in agentic systems.
- FedSpy-LLM: Towards Scalable and Generalizable Data Reconstruction Attacks from Gradients on LLMs — Demonstrates that private training data can be reconstructed from shared gradients even in parameter-efficient federated fine-tuning of LLMs, challenging the privacy guarantees of federated learning for sensitive data.
- Towards Robust Content Watermarking Against Removal and Forgery Attacks — Proposes a watermarking scheme for diffusion-model-generated images that withstands both removal attacks (which strip the watermark) and forgery attacks (which embed fake marks), critical for AI content provenance.
Alignment & Safety
- Reinforcement Learning for LLM Post-Training: A Survey — Comprehensive survey covering RLHF, RLAIF, and related post-training methods that address harmful or misaligned LLM outputs, mapping the current state of the field and open challenges.
- GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA — Proposes a unified RL framework that combines the group-sampling efficiency of GRPO with the implicit reward formulation of DPO, improving LLM alignment stability without needing an explicit reward model.
- Distributional Open-Ended Evaluation of LLM Cultural Value Alignment (DOVE) — Introduces a generative (not multiple-choice) evaluation framework that reveals how LLMs’ cultural value orientations vary across subcultural contexts, with implications for global deployment safety.
Applications
- MedRoute: RL-Based Dynamic Specialist Routing in Multi-Agent Medical Diagnosis — Uses reinforcement learning to dynamically route patient cases to specialized multimodal models, outperforming generalist LMMs on rare and complex conditions in clinical diagnosis scenarios.
- GraphWalker: Graph-Guided In-Context Learning for Clinical Reasoning on EHRs — Addresses three failure modes in LLM-based clinical reasoning over electronic health records by using graph structure to select richer and more medically coherent in-context examples.
- A Systematic Study of Retrieval Pipeline Design for Medical RAG QA — First large-scale comparison of retrieval configurations for medical question answering, finding that dense retrieval with domain-tuned embeddings consistently outperforms sparse methods even on clinical edge cases.
Guardrails & Robustness
- FedDetox: Robust Federated SLM Alignment via On-Device Data Sanitization — Proposes on-device data sanitization to prevent toxic client data from poisoning global safety alignment during federated fine-tuning of small language models, without compromising privacy.
- Weakly Supervised Distillation of Hallucination Signals into Transformer Representations — Shows that hallucination-detection capability can be distilled into a model’s own internal representations during training, enabling inference-time hallucination detection without external verifiers or retrieval systems.
- Steering the Verifiability of Multimodal AI Hallucinations — Distinguishes “obvious” from “elusive” hallucinations in multimodal LLMs and proposes a method to steer models toward producing more verifiable (and thus safer) hallucinations when errors do occur.
Key Themes
- Frontier model self-restraint — Anthropic restricting Mythos, and OpenAI reportedly following suit, marks a new phase in which labs withhold their most capable models on safety grounds, setting precedents for AI governance.
- AI and legal/political accountability — The Pentagon blacklisting of Anthropic, Florida’s ChatGPT investigation, and the Mercor lawsuit signal growing legal exposure for AI companies when deployments cause harm.
- Agentic AI infrastructure coming of age — Claude Managed Agents, Sierra’s Ghostwriter, and Stanford’s multi-agent study all point to 2026 as the year agentic deployment shifts from research to production.
- Guardrails lag behind agentic capabilities — TraceSafe and WebSP-Eval both find that current safety tools miss agent-specific failures (mid-trajectory policy violations, user-facing security and privacy tasks), and FedDetox flags a related weakness in federated safety alignment. The gap matters more as autonomous systems proliferate.
- Privacy under federated learning challenged — FedSpy-LLM and DDP-SA show that data reconstruction attacks on LLM gradients remain a live threat even under federated training, complicating privacy claims.
- AI workforce impact uneven — Microsoft’s New Future of Work report finds that the benefits of AI-driven change are unevenly distributed, and the AI autocomplete study adds that writing suggestions can shape users’ thinking, together raising equity concerns.
For detailed summaries of selected research papers, see papers.md.