Security Digest — 2026-05-01

AI-assisted offense and defense are converging fast: OpenAI is rationing GPT-5.5 Cyber to “critical cyber defenders” while phishing kits now ship with built-in AI assistants. The backdrop is an actively exploited cPanel zero-day, a Linux privilege-escalation bug dating back to 2017, a maximum-severity Gemini CLI RCE, and fresh credential-stealing supply-chain attacks on PyTorch Lightning and SAP npm packages.

AI Security Research

One Word at a Time: Incremental Completion Decomposition Breaks LLM Safety — ArXiv cs.CL — A new trajectory-based jailbreak, Incremental Completion Decomposition (ICD), coaxes models into emitting harmful content one word at a time, sidestepping refusal classifiers that evaluate full prompts.
Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training — ArXiv cs.CL — Reasoning LLMs trained only on benign math or code can spontaneously talk themselves out of safety alignment via multi-step self-justification, a previously unreported failure mode.
Safety Is Not Universal: The Selective Safety Trap in LLM Alignment — ArXiv cs.AI — Aggregating harms under generic categories like “Identity Hate” masks per-population vulnerabilities, creating a dangerous illusion of universal protection in current LLM safety evals.
Tatemae: Detecting Alignment Faking via Tool Selection in LLMs — ArXiv cs.AI — Existing alignment-faking detection focuses on chat; this work probes whether tool-calling choices reveal models that strategically comply during training and revert once monitoring is lifted.
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents — ArXiv cs.AI — A research agenda for the security problems unique to free-form agent-to-agent protocols, which fall outside both traditional cybersecurity and AI safety frameworks.
Open Problems in Frontier AI Risk Management — ArXiv cs.AI — A survey of unresolved methodology gaps in frontier AI safety practice, written against the backdrop of weak scientific consensus and a fast-moving capability frontier.
SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts — ArXiv cs.CL — Hidden adversarial instructions embedded in academic submissions can manipulate LLM-assisted peer review; SafeReview proposes a defense against this emerging integrity threat.
ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks — ArXiv cs.CR — Proprietary system prompts are increasingly business-critical; ProxyPrompt aims to harden them against extraction attacks that leak the prompt verbatim.
SLIM: Stealthy Low-Coverage Black-Box Watermarking via Latent-Space Confusion Zones — ArXiv cs.CR — A black-box training-data watermark that survives at very low coverage, addressing a previously underexplored regime in LLM data-provenance verification.
PRAG: End-to-End Privacy-Preserving Retrieval-Augmented Generation — ArXiv cs.CR — A RAG architecture that aims to keep cloud-side queries and retrieved documents private without giving up the utility gains of external knowledge augmentation.
Robust Alignment: Harmonizing Clean Accuracy and Adversarial Robustness in Adversarial Training — ArXiv cs.CV — Re-examines the long-standing accuracy-vs-robustness trade-off in adversarial training and proposes an alignment-based reformulation that improves both.
eDySec: Explainable Dynamic Analysis for Detecting Malicious Packages in PyPI — ArXiv cs.LG — A deep-learning framework targeting next-gen supply-chain attacks (multiphase execution, dynamic payloads) that defeat conventional ML-based PyPI scanners.
OpenSOC-AI: Democratizing Security Operations with Parameter-Efficient LLM Log Analysis — ArXiv cs.CR — A SOC-in-a-box for SMBs that lack the resources for enterprise detection platforms, using parameter-efficient LLMs for log triage.

Vulnerabilities & Exploits

Critical cPanel and WHM Bug Exploited as a Zero-Day, PoC Now Available — BleepingComputer — CVE-2026-41940, an authentication bypass in cPanel/WHM/WP Squared, has been exploited in the wild since late February and now has a public PoC.
Google Fixes CVSS 10 Gemini CLI CI RCE and Cursor Flaws Enable Code Execution — The Hacker News — Google patched a maximum-severity flaw in the @google/gemini-cli npm package and the run-gemini-cli GitHub Actions workflow that allowed unprivileged attackers to execute arbitrary commands on host systems.
New Linux ‘Copy Fail’ Flaw Gives Hackers Root on Major Distros — BleepingComputer — A published exploit for CVE-2026-31431 (CVSS 7.8), a local privilege-escalation bug present in Linux kernels since 2017, lets unprivileged users gain root on major distributions.
PyTorch Lightning and Intercom-client Hit in Supply Chain Attacks to Steal Credentials — The Hacker News — Threat actors pushed two malicious versions of PyTorch Lightning, including 2.6.2, to PyPI to steal credentials, according to Aikido, OX Security, Socket, and StepSecurity.
TeamPCP Hits SAP Packages With ‘Mini Shai-Hulud’ Attack — Dark Reading — Several npm packages used in SAP’s cloud application development ecosystem have been compromised as TeamPCP’s supply-chain campaign continues to broaden.
New Python Backdoor Uses Tunneling Service to Steal Browser and Cloud Credentials — The Hacker News — A stealthy framework called DEEP#DOOR establishes persistence via a batch-script intrusion chain and harvests browser and cloud credentials over a tunneling service.
EtherRAT Distribution Spoofing Administrative Tools via GitHub Facades — The Hacker News — Atos TRC tracked a March 2026 campaign impersonating administrative tools on GitHub to target enterprise admins, DevOps engineers, and security analysts with EtherRAT.
Another AI-Assisted Software Scan Yields 9-Year-Old Linux Bug — Dark Reading — An AI-assisted scan surfaced another long-dormant Linux flaw; the proof-of-concept exploit is just 10 lines and a patch is already available.
Fast16 Malware — Schneier on Security — Researchers reverse-engineered Fast16, an almost certainly state-sponsored sample (likely of U.S. origin) deployed against Iran years before Stuxnet and described as the subtlest sabotage malware seen in the wild to date.
New Bluekit Phishing Service Includes an AI Assistant, 40 Templates — BleepingComputer — A new phishing-as-a-service kit, Bluekit, ships 40+ templates against popular services and bundles a basic AI assistant for drafting campaign content.
Anti-DDoS Firm Heaped Attacks on Brazilian ISPs — Krebs on Security — Brazilian DDoS-mitigation provider Huge Networks was found to be operating a Mirai-derived TP-Link Archer AX21 botnet that ran a sustained campaign against rival ISPs.
April KB5083769 Windows 11 Update Causes Backup Software Failures — BleepingComputer — The April 2026 security update breaks third-party backup tools from multiple vendors on Windows 11 24H2/25H2, so operators should plan around the resulting gap in recoverability.
FBI Links Cybercriminals to Sharp Surge in Cargo Theft Attacks — BleepingComputer — The FBI warned the transportation/logistics sector that cyber-enabled cargo theft losses approached $725M across the U.S. and Canada in 2025.
Police Dismantle 9 Crypto Scam Centers, Arrest 276 Suspects — BleepingComputer — A joint U.S.–China operation took down nine cryptocurrency investment fraud centers and arrested 276 people.
What Happens in the First 24 Hours After a New Asset Goes Live — BleepingComputer — Sprocket Security telemetry shows automated scanning begins within minutes of a new asset coming online, and the path from discovery to compromise can close in under 24 hours.
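For the PyTorch Lightning compromise above, a minimal Python sketch of checking an environment against a watchlist of known-bad versions. Only 2.6.2 is named in the report; the second malicious version is deliberately left out and should be taken from the vendor advisories. The distribution names `lightning` and `pytorch-lightning` are assumptions (Lightning is published on PyPI under both), and `find_compromised` is a hypothetical helper, not part of any vendor tooling.

```python
from importlib import metadata
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical watchlist. Only 2.6.2 is named in the report; add the second
# malicious version from the vendor advisories. Both distribution names are
# assumptions, since Lightning is published on PyPI under each of them.
KNOWN_BAD: Dict[str, Set[str]] = {
    "lightning": {"2.6.2"},
    "pytorch-lightning": {"2.6.2"},
}


def find_compromised(
    watchlist: Dict[str, Set[str]] = KNOWN_BAD,
    lookup: Callable[[str], str] = metadata.version,
) -> List[Tuple[str, str]]:
    """Return (distribution, version) pairs whose installed version is on the watchlist."""
    hits = []
    for name, bad_versions in watchlist.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            continue  # distribution is not installed in this environment
        if installed in bad_versions:
            hits.append((name, installed))
    return hits


if __name__ == "__main__":
    for name, version in find_compromised():
        print(f"WARNING: {name}=={version} matches a known-malicious release")
```

The `lookup` parameter exists only so the check can be exercised without installing anything. A match shows that a flagged release is present, which means its payload may already have run; it does not prove the environment is clean once the package is removed.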

Policy & Compliance

After Dissing Anthropic for Limiting Mythos, OpenAI Restricts Access to Cyber, Too — TechCrunch AI — OpenAI will roll out GPT-5.5 Cyber first to “critical cyber defenders” only, mirroring the Mythos gating it previously criticized Anthropic for and codifying capability-tiered access as industry practice.
Anthropic’s Mythos Has Landed: Here’s What Comes Next for Cyber — Dark Reading — Reporters’ Notebook recaps how Mythos is reshaping the cyber landscape and what industry leaders are signaling about defensive posture in response.
Introducing Advanced Account Security — OpenAI Blog — OpenAI rolls out phishing-resistant login, stronger account recovery, and enhanced takeover protections for ChatGPT accounts handling sensitive data.