AI News Digest — 2026-03-24
Highlights
- Jensen Huang Claims AGI Has Been Achieved: Nvidia CEO Jensen Huang told the Lex Fridman podcast “I think we’ve achieved AGI,” reigniting debate over the vaguely defined term and what it means for the industry.
- Senator Warren Calls Pentagon’s Anthropic Ban ‘Retaliation’: Sen. Elizabeth Warren wrote to Defense Secretary Hegseth calling the DOD’s “supply-chain risk” label on Anthropic politically motivated retaliation rather than a legitimate security concern.
- CanisterWorm Wiper Targets Iran via Cloud Services: A financially motivated threat actor released a worm that spreads through misconfigured cloud services and wipes data on systems configured with Iran’s timezone or the Farsi locale — part of a broader TeamPCP campaign that also poisoned the Trivy security scanner.
- OpenAI Guarantees 17.5% Returns to Court Private Equity: OpenAI is offering private equity firms a guaranteed minimum return on enterprise joint ventures as it races to secure infrastructure partnerships ahead of Anthropic.
- White House Unveils AI Policy: The White House released its formal AI policy framework, coinciding with broader debates about how AI is shaping geopolitics, energy, and cognition.
News
AI Security
- We Found Eight Attack Vectors Inside AWS Bedrock — Researchers identified eight exploitation paths in Amazon’s Bedrock AI platform, where agents’ direct access to enterprise data sources (Salesforce, SharePoint, Lambda) creates novel attack surfaces unique to AI-connected systems.
- Varonis Atlas: Securing AI and the Data That Powers It — Varonis Atlas addresses the challenge of AI agents directly accessing sensitive organizational data, arguing that data security is now inseparable from AI security.
USA
- Jensen Huang Says ‘I Think We’ve Achieved AGI’ — On the Lex Fridman podcast, Nvidia’s CEO made a sweeping claim about AGI, fueling debate about what the term actually means and whether it’s being redefined to suit industry narratives.
- Senator Warren Calls Pentagon’s Anthropic Decision ‘Retaliation’ — Warren argued in a letter to Defense Secretary Hegseth that labeling Anthropic a supply-chain risk goes beyond contract termination into politically motivated punishment.
- OpenAI Lures Private Equity with Guaranteed Returns — To win enterprise joint venture partners, OpenAI is sweetening deals with a 17.5% minimum return guarantee as it competes with Anthropic for infrastructure and distribution.
- Meta Acqui-Hires Dreamer Team to Boost AI Agent Ambitions — Meta Superintelligence Labs absorbs the entire Dreamer startup team, including former Meta VP Hugo Barra, in its second agent-focused acquisition this year.
- Zuckerberg Builds Personal AI Agent, Plans Flatter Org Structure — Mark Zuckerberg is reportedly building an AI agent to help run Meta while the company explores deep cuts to management layers.
- OpenAI’s Sam Altman Steps Down as Helion Board Chair Amid Power Deal Talks — Altman is exiting the Helion board as reports emerge that OpenAI is negotiating to purchase 12.5% of the fusion startup’s power output.
- Apple Sets WWDC 2026 for June 8, Promises ‘AI Advancements’ — Apple confirmed its developer conference week and is expected to unveil significant Siri upgrades with advanced AI capabilities.
- Luma AI’s Uni-1 Challenges Google’s Image Generation Dominance — Luma AI’s Uni-1 model combines image understanding and generation in a single architecture with built-in reasoning, positioning itself as a serious challenger to OpenAI and Google.
- Gimlet Labs Raises $80M to Solve AI Inference Across Chip Vendors — The startup’s technology lets AI models run simultaneously across NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix chips, addressing a key bottleneck in multi-vendor inference.
- OpenSeeker Open-Sources Competitive AI Search Agent — With just 11,700 training samples and one training run, OpenSeeker matches proprietary solutions from Alibaba, with all data, code, and model weights released publicly.
- Littlebird Raises $11M for Screen-Reading AI Context Tool — Littlebird reads the user’s screen in real time to capture work context and answer queries without screenshots, raising $11M to expand its privacy-preserving approach.
- Lovable Vibe-Coding Startup Hunts Acquisitions — The fast-growing vibe-coding company is actively seeking startups and teams to bring in-house as it scales its position in AI-assisted development.
- The Hardest Question About AI-Fueled Delusions — MIT Technology Review explores the psychological effects of AI-enabled delusional thinking and why clear causal lines are proving so difficult to draw.
- Bernie Sanders AI ‘Gotcha’ Video Flops — Sanders attempted to expose industry secrets by prompting Claude, but the episode mostly revealed how agreeable chatbots can appear without actually confirming anything.
- The Gulf Was Silicon Valley’s AI Bet — Trump Put It in the Crosshairs — The same geographic choke points that made the Persian Gulf the world’s energy hub now threaten its role as a hub of AI infrastructure investment.
- Microsoft Researchers Debate Whether Machines Can Ever Be Intelligent — AI researchers Subutai Ahmad and Nicolò Fusi compare transformer architectures with the human brain, exploring continual learning, efficiency, and the limits of current AI paradigms.
Japan (AI & Tech)
- Preferred Networks Releases PLaMo 3.0 Prime — Japan’s First Reasoning LLM Built from Scratch — PFN’s PLaMo 3.0 Prime is Japan’s first domestically built large language model with extended reasoning (long-thought) capability, developed without fine-tuning from existing models and competitive with Qwen3-235B and GPT-oss-120b.
- Tokyo University and NEC Sign AI Industry-Academia Partnership — The University of Tokyo and NEC formed a joint research agreement focused on “trustworthy AI,” aiming to produce research with global impact on some of AI’s hardest open problems.
- AI Analyzes 341 Job Types: Which Will Grow, Which Face Crisis? — An AI-driven analysis of 341 occupations classifies them as “growing,” “at risk,” or “in between,” providing a methodology for workers to understand their exposure to automation.
- WordPress.com Formally Supports AI Agents for Content Creation and SEO — Automattic’s WordPress.com announced official support for AI agent-driven post creation, SEO improvement, comment management, and metadata updates.
- OpenCode: Free Open-Source AI Coding Agent for Terminal and IDE — OpenCode supports Claude, GPT, Gemini and local models, offering multi-agent parallel execution, LSP support, and GitHub Copilot integration for cross-platform AI-assisted development.
Research Papers
Benchmarks & Evaluation
- ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with LLMs — A benchmark integrating multiple verbal and non-verbal reasoning and planning tasks (framed as travel itinerary planning) to evaluate LLM cognitive capabilities in complex real-world contexts.
- GeoChallenge: A Multi-Answer Multiple-Choice Benchmark for Geometric Reasoning — A large-scale benchmark of 90K geometry problems testing symbolic reasoning through multi-step proofs grounded in both text and diagrams, exposing gaps in current LLM geometric understanding.
- URAG: A Benchmark for Uncertainty Quantification in RAG Systems — Comprehensive benchmark for assessing the reliability and confidence calibration of retrieval-augmented generation systems across multiple domains, addressing a key gap in RAG evaluation.
- FDARxBench: Benchmarking Regulatory and Clinical Reasoning on FDA Drug Assessment — A real-world benchmark using FDA generic drug label documents, developed with regulatory assessors, for evaluating document-grounded QA in clinical and compliance contexts.
Security & Adversarial
- When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of LLMs — Studies adaptive adversaries that iteratively refine prompts to evade LLM safeguards, revealing that realistic jailbreaking scenarios are far more dangerous than static harmful prompt collections suggest.
- LSR: Linguistic Safety Robustness Benchmark for Low-Resource West African Languages — Measures cross-lingual safety degradation in LLMs, showing that refusal mechanisms trained on high-resource languages fail systematically for Yoruba, Hausa, Igbo, and Igala.
- The Autonomy Tax: Defense Training Breaks LLM Agents — Reveals a capability-alignment paradox: training agents to resist prompt injection attacks degrades their autonomy and tool-use effectiveness, creating a measurable “autonomy tax.”
- Zero-Day Attack Detection in IDS Using Self-Attention and Jensen-Shannon Divergence in WGAN-GP — Applies Wasserstein GANs with gradient penalty to generate synthetic network traffic for training intrusion detection systems against previously unseen zero-day attacks.
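The zero-day detection paper above names the Jensen-Shannon divergence as one of its building blocks. As a quick refresher (a generic textbook sketch, not the paper’s implementation), the JS divergence between two discrete distributions is the average KL divergence of each distribution to their midpoint:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete
    distributions, with a small epsilon to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)  # midpoint distribution
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike plain KL, JS divergence is symmetric and bounded by ln 2, which makes it a convenient signal for comparing observed traffic histograms against GAN-generated ones.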
Compliance & Regulation
- A Framework for Formalizing LLM Agent Security — Proposes formal contextual security definitions for LLM agents, addressing the lack of rigorous attack definitions needed for compliance frameworks and security assurance in agentic deployments.
- MAPLE: Metadata Augmented Private Language Evolution — A differentially private LLM fine-tuning framework using synthetic data generation, enabling privacy-preserving model adaptation suitable for regulated industries handling sensitive data.
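For readers unfamiliar with differential privacy, the core idea behind frameworks like MAPLE is noise calibrated to a query’s sensitivity. The sketch below shows the standard Gaussian mechanism for (ε, δ)-DP — a generic illustration of the concept, not MAPLE’s actual method:

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=None):
    """Release `value` with (epsilon, delta)-differential privacy by adding
    Gaussian noise scaled to the query's L2 sensitivity (classic calibration,
    valid for epsilon <= 1)."""
    rng = rng or np.random.default_rng()
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return value + rng.normal(0.0, sigma, size=np.shape(value))
```

In DP fine-tuning this same calibration is applied per-step to clipped gradients (DP-SGD); synthetic-data approaches instead spend the privacy budget once at generation time.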
Alignment & Safety
- Do Post-Training Algorithms Actually Differ? Scale-Dependent Ranking Inversions — Controlled evaluation of 51 post-training alignment algorithms (DPO, SimPO, KTO, GRPO) across model scales reveals that effectiveness rankings reverse depending on model size — a critical finding for practitioners choosing alignment methods.
- Generative Active Testing: Efficient LLM Evaluation via Proxy Task Adaptation — An efficient framework for building task-specific benchmarks through active sample selection and proxy task adaptation, dramatically reducing the annotation cost of robust LLM evaluation.
Key Themes
- AGI discourse is intensifying — Jensen Huang’s AGI claim and ongoing debates about AI consciousness and intelligence reflect a field grappling with how to define its own milestones.
- AI and geopolitics are inseparable — The Pentagon/Anthropic dispute, Gulf infrastructure risks, and Iran-targeted cyberattacks all illustrate how AI infrastructure has become a geopolitical flashpoint.
- Supply chain attacks are escalating — The TeamPCP/CanisterWorm/Trivy campaign demonstrates how a single supply chain compromise can cascade across Docker, GitHub, Kubernetes, and cloud services.
- AI security has unique attack surfaces — AWS Bedrock attack vectors and the “Autonomy Tax” paper highlight that AI agents introduce security challenges qualitatively different from traditional software.
- Japan is building AI independence — PLaMo 3.0 Prime and the Tokyo University–NEC partnership signal Japan’s intent to develop sovereign AI capabilities rather than depend entirely on US or Chinese models.
- Alignment methods are scale-dependent — Research showing that post-training algorithm rankings invert across model sizes has direct implications for how labs choose and apply safety training techniques.
For detailed summaries of selected research papers, see papers.md.