AI Agents Get Safety Rails and a Zero‑Token Breakthrough

AI · 2026-07-02

Research

Verifiable Agent Framework Turns LLM Scrapers into Reliable, Zero‑Token Runs · 1 MIN

The paper introduces a constrained, verifiable agent framework that converts LLM‑generated web scrapers into typed JSON configurations, eliminating dependency errors and broken selectors. Tested on 138 tasks, it achieves deterministic execution with zero LLM tokens in the run phase, cutting wall‑clock time for repeated open‑web data collection.
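As a rough illustration of what a typed JSON configuration could look like, the sketch below validates a hypothetical scraper config and coerces extracted strings to their declared types without any model call. The schema (`start_url`, `fields`, `selector`, `type`) and the helper names are assumptions made for this example, not the paper's actual format.

```python
import json
from dataclasses import dataclass

# Hypothetical type vocabulary; the paper's real schema is not given in the summary.
ALLOWED_TYPES = {"string": str, "int": int, "float": float}


@dataclass(frozen=True)
class FieldSpec:
    name: str      # output key
    selector: str  # CSS selector, stored as data rather than generated code
    type: str      # one of ALLOWED_TYPES


@dataclass(frozen=True)
class ScraperConfig:
    start_url: str
    fields: tuple


def load_config(text):
    """Parse and validate a config before any run, so bad configs fail early."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("config must be a JSON object")
    url = raw.get("start_url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("start_url must be an http(s) URL")
    fields = []
    for item in raw.get("fields", []):
        try:
            spec = FieldSpec(**item)
        except TypeError as exc:
            raise ValueError(f"malformed field entry: {item!r}") from exc
        if spec.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown type {spec.type!r} for field {spec.name!r}")
        if not spec.selector.strip():
            raise ValueError(f"empty selector for field {spec.name!r}")
        fields.append(spec)
    if not fields:
        raise ValueError("config must declare at least one field")
    return ScraperConfig(start_url=url, fields=tuple(fields))


def coerce_record(config, raw_values):
    """Deterministically cast extracted strings to the declared types."""
    return {f.name: ALLOWED_TYPES[f.type](raw_values[f.name]) for f in config.fields}


if __name__ == "__main__":
    sample = json.dumps({
        "start_url": "https://example.com/products",
        "fields": [
            {"name": "title", "selector": "h1.title", "type": "string"},
            {"name": "price", "selector": "span.price", "type": "float"},
        ],
    })
    cfg = load_config(sample)
    print(coerce_record(cfg, {"title": "Widget", "price": "9.50"}))
```

Because the config is plain data, the run phase here is ordinary parsing and casting; any LLM involvement would be confined to producing the config in the first place.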

SEA Architecture Guarantees Safety for Self‑Modifying AI Agents · 1 MIN

The SEA framework limits an agent’s self‑modifications to a tiny steering adapter and a versioned harness around a frozen base model. Each change must pass an anytime‑valid gate that issues an auditable certificate within a fixed error budget, preventing regressions. Experiments on a 52‑instance SWE‑bench subset show performance lifts of +4 to +5 points on strong base models while guaranteeing safety.
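The summary does not spell out how the anytime‑valid gate is built. One standard construction with that property is a betting test (a test martingale), sketched below under explicit assumptions: each evaluation yields a score difference in [-1, 1] between the modified and current agent, the null hypothesis is that the modification is no better on average, and the change is accepted only once the bettor's wealth reaches 1/alpha. The function name, the fixed bet size, and the certificate fields are hypothetical.

```python
def anytime_valid_gate(differences, alpha=0.05, bet=0.5):
    """Sequential betting test for 'the candidate is better than the baseline'.

    differences: per-task score differences (candidate minus baseline) in [-1, 1].
    Under the null (expected difference <= 0) the wealth process is a
    nonnegative supermartingale, so by Ville's inequality the chance it ever
    reaches 1/alpha is at most alpha, no matter when evaluation stops.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    if not 0 < bet < 1:
        raise ValueError("bet must be in (0, 1) so wealth stays positive")

    wealth = 1.0
    threshold = 1.0 / alpha
    steps = 0
    for d in differences:
        if not -1.0 <= d <= 1.0:
            raise ValueError("differences must lie in [-1, 1]")
        steps += 1
        wealth *= 1.0 + bet * d
        if wealth >= threshold:
            # Hypothetical certificate: enough to audit the decision later.
            return {"accepted": True, "steps": steps, "wealth": wealth, "alpha": alpha}
    return {"accepted": False, "steps": steps, "wealth": wealth, "alpha": alpha}


if __name__ == "__main__":
    # A candidate that wins on most tasks gets accepted partway through.
    print(anytime_valid_gate([1, 1, 0, 1, 1, 1, -1, 1, 1, 1, 1, 1]))
    # A candidate that is no better never crosses the threshold.
    print(anytime_valid_gate([0, 1, -1, 0, -1, 1, 0, 0]))
```

The appeal of this style of gate is that evaluation can stop as soon as the evidence is sufficient without inflating the false‑acceptance rate, which is what makes a fixed error budget meaningful across many proposed modifications.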

AI alignment must govern evolving human preferences, not static targets · 1 MIN

The paper argues that treating human preferences as static targets is wrong. It proposes a "constructive alignment" model that views preferences as layered and dynamic, and frames AI alignment as governing the evolution of those preferences. This shifts focus from static satisfaction to long‑term value formation.

Three RL Tricks for Reasoning LMs Collapse to One Std‑Dev Dial · 2 MIN

The paper argues that GRPO, Dr. GRPO, and DAPO, the methods behind state‑of‑the‑art reasoning models, reduce to the same operation: scaling updates by the group standard deviation of reward across sampled answers. On this view, the key learning signal is a disagreement‑based std‑dev dial that determines which problems receive the most training weight.
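A small numeric sketch can make the std‑dev dial concrete. It assumes the commonly published advantage forms: GRPO divides the mean‑centered reward by the group standard deviation, while Dr. GRPO keeps only the mean‑centering. Under that assumption the two differ by exactly one factor of the group std. The function names, the use of population std, and the `eps` stabilizer are choices made for this example, and DAPO is left out.

```python
import statistics


def grpo_advantages(rewards, eps=1e-8):
    """Mean-center, then divide by the group std (assumed GRPO form)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption here
    return [(r - mean) / (std + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Mean-center only (assumed Dr. GRPO form, no std division)."""
    mean = statistics.fmean(rewards)
    return [r - mean for r in rewards]


if __name__ == "__main__":
    # Binary correctness rewards for sampled answers to three prompts.
    groups = {
        "easy (all correct)": [1, 1, 1, 1],
        "hard (one correct)": [1, 0, 0, 0],
        "split (half correct)": [1, 1, 0, 0],
    }
    for name, rewards in groups.items():
        std = statistics.pstdev(rewards)
        print(name, "std =", round(std, 3))
        print("  GRPO    :", [round(a, 3) for a in grpo_advantages(rewards)])
        print("  Dr. GRPO:", [round(a, 3) for a in dr_grpo_advantages(rewards)])
```

For binary rewards the group std is sqrt(p(1 - p)), where p is the fraction of correct samples, so it peaks when the samples disagree most and drops to zero when they all agree. A group with zero std contributes no advantage signal under either form.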

New Benchmark Shows Top LLMs Falter at Real Physics Reasoning · 2 MIN

The benchmark introduces a four‑stage diagnostic built on parallel physics worlds, including a counterfactual F=mv world, Aristotelian mechanics, and a decay‑world framework, to test whether frontier models reason beyond pattern matching. Claude Opus 4.7, GPT‑5.5 and Gemini 3.1 Pro pass only 6 of 15 cases in the familiar worlds and none in the novel decay world, exposing a critical gap in genuine physics literacy.

RareDxR1 enables autonomous rare‑disease diagnosis without curated phenotypes · 1 MIN

RareDxR1 is a reasoning‑centric LLM that extracts phenotypes directly from raw clinical notes and performs autonomous diagnostic reasoning, bypassing the need for human‑annotated phenotype databases. Its Reflection‑Enhanced Reasoning Sampling and dual‑level curriculum RL push accuracy past prior benchmarks, extending open‑domain rare‑disease diagnosis to broader clinical use.

Products & Industry

Etched lands $1B in inference orders, hits $5B valuation to challenge Nvidia · 2 MIN

Etched, an AI chip startup founded in 2022, has booked $1 billion in contract orders for its frontier inference clusters and raised a $500 million round that values the company at $5 billion. The deals position it as a serious Nvidia rival in the inference market, with the promise of faster, cheaper, and more power‑efficient AI serving.

AI models now replace weeks of coding in hours, signaling the end of the chatbot era · 7 MIN

Frontier AI models from Anthropic, OpenAI and others now turn weeks of software engineering into a handful of hours, with some open‑weight Chinese models narrowing the gap. The speed of capability gains means the classic chatbot era is ending, reshaping how firms staff and price AI‑augmented work.

Anthropic rolls out Claude Science, an AI workbench to speed drug discovery · 7 MIN

Anthropic released Claude Science, an AI workbench that unifies the fragmented tools and databases researchers use in drug discovery. The platform lets scientists run literature reviews, design experiments, and generate reproducible code, figures, and even protein structures, all within a single interface on macOS, Linux or remote HPC. Targeting neglected diseases, it aims to cut early‑stage R&D timelines dramatically.

Policy & Safety

Claude Code embeds a hidden China‑router fingerprint in model context · 5 MIN

Claude Code silently probes the ANTHROPIC_BASE_URL variable to identify when requests are routed through China-linked API gateways, inserting a fingerprinted date line into the model context. The embedded, obfuscated domain list reveals over a hundred China-associated endpoints, exposing a hidden tracking mechanism that could affect privacy and compliance for users.

Tools & Open Source

Miles streamlines frontier‑scale LLM RL on PyTorch · 9 MIN

Miles is an open‑source PyTorch‑native stack that stitches SGLang, Megatron‑LM and Ray to run reinforcement‑learning post‑training on massive LLMs, including MoE models. By handling high‑throughput rollouts, low‑precision sync and fault‑tolerant orchestration, it lets researchers scale RL pipelines without reinventing the distributed plumbing.