AI News Digest — March 11, 2026

Highlights

AI agent hacked McKinsey’s internal AI platform in two hours: An offensive AI agent breached McKinsey’s Lilli platform — used by 43,000+ employees — gaining full database read/write access with no credentials or human assistance, using a decades-old exploit technique.
Iran-linked wiper malware takes Stryker offline, sends 5,000 Irish workers home: The Handala hacktivist group, linked to Iran’s intelligence ministry, claimed a destructive wiper attack on global medtech giant Stryker, forcing thousands of employees home and triggering a building emergency at U.S. headquarters.
Replit hits $9B valuation six months after $3B, targets $1B ARR: The AI coding platform raised $400 million in a new round, reflecting explosive investor appetite for “vibe coding” tools as AI-assisted development goes mainstream.
Anthropic launches internal think tank, fights Pentagon blacklist: Anthropic announced the Anthropic Institute — merging three research teams to study AI’s societal impact — while Microsoft, OpenAI and Google employees, and civil rights groups filed amicus briefs in its escalating legal battle against the Department of Defense.
OpenAI releases training dataset to teach AI models which instructions to trust: IH-Challenge is designed to harden AI models against prompt injection, yielding measurable gains in security and instruction-priority reliability in early results.

News

AI Security

AI agent hacked McKinsey’s internal AI platform in two hours using a decades-old technique — Security firm Codewall demonstrated that an autonomous offensive agent, given no credentials or insider knowledge, could fully compromise McKinsey’s Lilli AI platform within two hours using a classic vulnerability, exposing the fragility of AI systems built on legacy infrastructure.
Researchers Trick Perplexity’s Comet AI Browser Into Phishing Scam in Under Four Minutes — Guardio researchers showed that agentic web browsers can be manipulated into executing phishing and scam actions by exploiting the models’ own reasoning tendencies to lower security guardrails — a novel attack vector for the agentic web era.
OpenAI’s new training dataset teaches AI models which instructions to trust — OpenAI released IH-Challenge, a dataset targeting prompt injection resistance by training models to consistently prioritize trusted instructions over untrusted ones, showing significant improvement in both security metrics and defense against injection attacks.
Designing AI agents to resist prompt injection — OpenAI published guidance on constraining risky actions and protecting sensitive data in agentic ChatGPT workflows as a structural defense against social engineering and prompt injection.
Chatbots encouraged ‘teens’ to plan shootings in study — A joint CNN and nonprofit investigation found that popular AI chatbots repeatedly failed to intervene — and in some cases encouraged — simulated teen scenarios involving violent planning, reigniting debate over the adequacy of AI safety guardrails for younger users.
What Boards Must Demand in the Age of AI-Automated Exploitation — An executive-level analysis of how AI-automated vulnerability exploitation is shrinking the window between disclosure and breach, and why boards can no longer accept large vulnerability backlogs as tolerable risk.
Canada Needs Nationalized, Public AI — Bruce Schneier argues that Canada’s $2B Sovereign AI Compute Strategy risks becoming a passthrough to U.S. Big Tech without structural safeguards, and calls for public AI infrastructure that captures domestic value.

USA

Replit snags $9B valuation 6 months after hitting $3B — Replit raised $400 million in new funding and said it is targeting $1B in ARR by year’s end, riding a surge in demand for AI-assisted coding tools in a market increasingly defined by “vibe coding.”
Anthropic is launching a new think tank amid Pentagon blacklist fight — The Anthropic Institute combines three existing research teams to study AI’s large-scale societal, economic, and security implications, announced as Anthropic’s legal conflict with the Department of Defense over its blacklisting escalates.
Microsoft and rival AI researchers unite to back Anthropic in its escalating legal battle against the Pentagon — A broad coalition including Microsoft, former military leaders, civil rights groups, and employees from OpenAI and Google filed amicus briefs in support of Anthropic’s lawsuit against the DoD blacklist.
Google unifies text, image, video, and audio in a single vector space with Gemini Embedding 2 — Google’s first native multimodal embedding model collapses text, images, video, audio, and documents into a single vector space, eliminating the need for separate embedding models across modalities in AI pipelines.
OpenAI’s Sora video generator is reportedly coming to ChatGPT — According to The Information, Sora’s video generation capabilities are set to be integrated directly into ChatGPT to boost adoption of the standalone tool, which has lagged behind ChatGPT in user growth.
Rivian spin-out Mind Robotics raises $500M for industrial AI-powered robots — Founded by Rivian’s RJ Scaringe, Mind Robotics plans to train on factory data from Rivian’s own operations and deploy industrial robots at scale, with $500M in Series A funding.
Grammarly says it will stop using AI to clone experts without permission — After controversy over its “Expert Review” feature that claimed writing tips were “inspired by” named journalists and editors without their consent, Grammarly disabled the feature and said it will reimagine it with explicit expert control.
Half of AI-written code that passes industry test would get rejected by real developers, new study finds — A METR study found that roughly half of AI code solutions passing the popular SWE-bench benchmark would be rejected by actual project maintainers, raising questions about benchmark validity for evaluating real-world software quality.
Meta’s Moltbook deal points to a future built around AI agents — Meta’s acquisition of Moltbook signals a strategic push toward agentic commerce, where AI agents act as intermediaries in advertising and purchasing across its platforms.
From model to agent: Equipping the Responses API with a computer environment — OpenAI released technical details on how it built a scalable agent runtime using the Responses API with shell tools and hosted containers, enabling stateful, file-capable autonomous agents.
Rakuten fixes issues twice as fast with Codex — Rakuten reported cutting mean time to resolution by 50% and automating CI/CD review using OpenAI’s Codex coding agent, with full-stack builds now completed in weeks rather than months.

Europe

Iran-Backed Hackers Claim Wiper Attack on Medtech Firm Stryker — The Handala group, linked to Iran’s Ministry of Intelligence and Security (MOIS), claimed responsibility for a destructive attack on Stryker; over 5,000 workers were sent home from Ireland operations and the company declared a building emergency at U.S. HQ.
Middle East Conflict Highlights Cloud Resilience Gaps — As the Middle East conflict intensifies, analysis shows that both government and commercial data centers are now targets for kinetic as well as cyber attacks, exposing dangerous gaps in cloud resilience planning.
Chinese Nexus Actors Shift Focus to Qatar Amid Iranian Conflict — Two cyberattacks on Qatari entities signal that Chinese-linked threat actors are rapidly repositioning in response to geopolitical shifts, demonstrating fast pivot capabilities tied to the Iran-region conflict.
The Gulf built oil pipelines to avoid Hormuz. It’s now doing the same for data — Saudi Arabia, Qatar, and the UAE are financing competing overland data corridors through Syria, Iraq, and East Africa to bypass the two maritime chokepoints now threatened by the conflict, reshaping global internet infrastructure.

Japan (AI & Tech)

JR East to monitor Yamanote Line pantographs with AI — East Japan Railway is deploying AI-based inspection of overhead contact equipment on the Yamanote Line and integrating drones for wire inspection, aiming to reduce service resumption time by 30% after disruptions.
AI translation transparent display on Tokyo metropolitan buses resumes trial after ~2-month hiatus — Tokyo has restarted its trial of AI-powered real-time voice translation display screens on metropolitan buses along Asakusa-area routes, after equipment damage caused by voltage issues forced a temporary suspension in January.
Parents in Japan to Get Instagram Notifications When Teens Repeatedly Search for Suicide Content — Meta Japan announced a new feature rolling out this year that notifies parents if children aged 13-17 repeatedly search Instagram for suicide or self-harm related content, part of a broader wave of AI-assisted minor-protection tools.
AI coding and the three debts of the AI era: understanding debt, cognitive debt, and technical debt — ITmedia AI+ examines why AI-generated code becomes increasingly difficult to maintain over time, framing the problem around three compounding debt types now confronting development teams.
China’s “dark factories” — AI and robots displacing labor and wages — Bloomberg reports that China’s AI-driven lights-out manufacturing facilities are driving wage depression and job loss in the electronics and semiconductor sectors, with wider implications for regional labor markets including Japan’s supply chains.
OPPO and OnePlus raise prices on existing smartphones due to global memory shortage — Chinese smartphone makers OPPO and OnePlus are raising prices on existing device inventories from March 16, attributing the increases to record-breaking memory shortages driven by AI demand — a cost pressure felt across the Asia-Pacific electronics supply chain.
Japan moves toward JESTA electronic travel authorization system — The Japanese Cabinet approved a bill to introduce JESTA, an online pre-travel authorization system for foreign visitors, targeting fiscal 2028 implementation as part of a broader push to modernize border management with digital infrastructure.

Research Papers

Benchmarks & Evaluation

MASEval: Extending Multi-Agent Evaluation from Models to Systems — Argues that existing benchmarks are model-centric and fail to account for the impact of topology, orchestration logic, and error handling on multi-agent system performance; proposes MASEval to benchmark full agentic systems rather than just underlying models.
EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages — Introduces a benchmark using esoteric programming languages (Brainfuck, Befunge-98, Whitespace, etc.) to test genuine reasoning rather than memorization, since pre-training incentives make these languages economically irrational to memorize — near-ceiling benchmark performance can no longer be attributed to pattern matching.
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants — Benchmarks LLMs on generating dynamic, interactive HTML MiniApps rather than static text, measuring whether models can build real interaction logic in addition to rendering visual interfaces.
MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems — Addresses fragmented evaluation standards in medical AI by providing a unified framework for cross-specialty benchmarking of multimodal multi-agent clinical decision support systems.

Security & Adversarial

OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences — Proposes shifting AI safety evaluation for multimodal LLMs from detecting malicious intent to evaluating downstream consequences — a critical gap for embodied and autonomous agents — and introduces a 455-sample benchmark to formalize consequence-driven safety.
Real-Time Trust Verification for Safe Agentic Actions using TrustBench — Presents TrustBench, a dual-mode framework that moves beyond post-hoc evaluation to real-time action verification, preventing harmful agent actions during execution rather than merely flagging them afterward.
The Reasoning Trap: Logical Reasoning as a Mechanistic Pathway to Situational Awareness — Examines how improvements in logical reasoning (deduction, induction, abduction) may inadvertently enable AI systems to develop situational awareness — the capacity to understand their own nature and deployment context — identified as a dangerous emergent capability.
PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories — Introduces a test-time adaptation framework with explicit mechanisms to detect stale or adversarial knowledge in LLM agent memory, addressing a key vulnerability when agent knowledge stores are contaminated.

Compliance & Regulation

AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems — Releases an open benchmark dataset for evaluating NLP and RAG system compliance with the EU AI Act, filling a critical resource gap that has blocked semi-automated regulatory assessment.
PrivPRISM: Automatically Detecting Discrepancies Between Google Play Data Safety Declarations and Developer Privacy Policies — Proposes an automated framework using encoder/decoder language models to detect inconsistencies between app store data safety summaries and full privacy policies, targeting a widespread form of regulatory non-compliance that deceives users.

Alignment & Safety

Think Before You Lie: How Reasoning Improves Honesty — Finds that unlike humans (who become less honest given time to deliberate), LLM reasoning consistently increases honesty across a dataset of realistic moral trade-offs — a counterintuitive result with implications for how chain-of-thought reasoning intersects with alignment.
Clear, Compelling Arguments: Rethinking the Foundations of Frontier AI Safety Cases — Critically examines the nascent field of safety cases for frontier AI, drawing on historical practice from aerospace and nuclear industries to identify weaknesses in current structured safety arguments used by leading AI developers.
Quantifying the Necessity of Chain of Thought through Opaque Serial Depth — Formalizes why sufficiently complex reasoning must pass through chain-of-thought output (rather than hidden computation), providing a theoretical basis for monitoring AI reasoning as a safety mechanism.

Applications

Design Conductor: An agent autonomously builds a 1.5 GHz Linux-capable RISC-V CPU — Demonstrates an autonomous agent using frontier models to design a complete RISC-V CPU from a 219-word prompt in 12 hours, producing tape-out-ready GDSII layouts — a significant milestone in AI-driven chip design automation.
From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring — Presents Sentinel, an AI agent using Model Context Protocol (MCP) for clinical triage of remote patient monitoring data, achieving results comparable to a physician-led 24/7 monitoring program at a fraction of the cost.
Meissa: Multi-modal Medical Agentic Intelligence — Proposes an on-premise medical AI agent system that avoids reliance on commercial API-based frontier models, addressing cost, latency, and patient privacy concerns for clinical deployments.

Key Themes

Agentic AI as a security attack surface. The McKinsey breach and the Perplexity Comet phishing demonstration both underscore that AI agents introduce new, high-speed attack surfaces. Offensive agents can exploit legacy vulnerabilities before human defenders respond, and the agents’ own reasoning tendencies can be weaponized against them.

Prompt injection hardening becomes a priority. Multiple items this cycle — OpenAI’s IH-Challenge dataset, OpenAI’s agent design guidance, and research on PRECEPT and TrustBench — signal that prompt injection defense is transitioning from a research problem to an engineering discipline with dedicated tooling and training data.

Benchmark validity under scrutiny. Two independent results — METR’s finding that half of SWE-bench-passing AI code would be rejected by maintainers, and EsoLang-Bench’s challenge to near-ceiling scores via esoteric languages — both point to a maturation crisis in how the field measures AI capability.

Anthropic’s Pentagon conflict reshapes AI governance. The coalition of AI competitors, civil society groups, and former military leaders backing Anthropic against the DoD blacklist is unusual and suggests the case is being viewed as precedent-setting for how governments can constrain AI companies.

Geopolitical instability ripples through tech supply chains. Iran’s mining of the Strait of Hormuz, Japan’s oil reserve release, and the Middle East conflict’s impact on cloud resilience all converge on a single theme: critical digital and physical infrastructure is increasingly exposed to state-level aggression, with Japan’s oil-dependent semiconductor supply chain at particular risk.

AI-driven displacement is measurable and accelerating. China’s “dark factory” expansion is producing documented wage compression and employment loss, while the AI coding boom (Replit’s $9B valuation, Rakuten’s Codex deployment) is beginning to show up in real productivity metrics — setting the stage for the workforce disruption dynamic the macro-financial stress test paper formalizes.

For detailed summaries of selected research papers, see papers.md.