AI Agents Get Safety Rails and a Zero‑Token Breakthrough
The paper introduces a constrained, verifiable agent framework that converts LLM‑generated web scrapers into typed JSON configurations, eliminating dependency errors and broken selectors. Tested on 138 tasks, it achieves deterministic execution with zero LLM tokens in the run phase, cutting wall‑clock time for repeated open‑web data collection.
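To make the "typed JSON configuration" idea concrete, here is a minimal sketch of a scraper config being checked against a fixed schema before any run. The schema itself (the keys `url`, `fields`, `name`, `selector`, `type`, and the allowed type names) is hypothetical and chosen only for illustration; the paper's actual format is not described in this summary.

```python
import json
from dataclasses import dataclass

# Hypothetical set of value types a field may declare (an assumption).
ALLOWED_TYPES = {"string", "number"}


@dataclass(frozen=True)
class FieldSpec:
    name: str      # output key for the extracted value
    selector: str  # selector string, stored as data rather than code
    type: str      # declared value type, one of ALLOWED_TYPES


@dataclass(frozen=True)
class ScraperConfig:
    url: str
    fields: tuple


def load_config(text):
    """Parse JSON text into a ScraperConfig, rejecting anything off-schema."""
    raw = json.loads(text)
    if not isinstance(raw, dict) or set(raw) != {"url", "fields"}:
        raise ValueError("config must have exactly the keys 'url' and 'fields'")
    if not isinstance(raw["url"], str):
        raise ValueError("'url' must be a string")
    if not isinstance(raw["fields"], list):
        raise ValueError("'fields' must be a list")
    fields = []
    for item in raw["fields"]:
        if not isinstance(item, dict) or set(item) != {"name", "selector", "type"}:
            raise ValueError("each field needs exactly 'name', 'selector', 'type'")
        if not all(isinstance(value, str) for value in item.values()):
            raise ValueError("field values must be strings")
        if item["type"] not in ALLOWED_TYPES:
            raise ValueError("unsupported field type: " + item["type"])
        fields.append(FieldSpec(**item))
    return ScraperConfig(url=raw["url"], fields=tuple(fields))
```

Because the config is plain data checked once at load time, a repeated run only needs the validated object and no further model calls, which is the property the summary attributes to the run phase.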
The SEA framework limits an agent’s self‑modifications to a tiny steering adapter and a versioned harness around a frozen base model. Each change must pass an anytime‑valid gate that issues an auditable certificate within a fixed error budget, preventing regressions. Experiments on a 52‑instance SWE‑bench subset show performance lifts of +4 to +5 points on strong base models while guaranteeing safety.
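One generic way to build an anytime-valid gate is a betting test: start with wealth 1, bet on the candidate beating the baseline on each paired task, and accept only if wealth ever reaches 1/alpha. The sketch below illustrates that general technique and is not SEA's actual gate; the function name, the binary win/loss outcomes, and the fixed bet size are all assumptions.

```python
def anytime_valid_gate(outcomes, alpha=0.05, bet=0.5):
    """Sequential gate for a proposed change (illustrative, not SEA's method).

    outcomes: iterable of 1 (candidate beat baseline on a paired task) or 0.
    Null hypothesis: the candidate wins with probability at most 0.5.
    Under the null, wealth is a nonnegative supermartingale, so by Ville's
    inequality the chance it ever reaches 1/alpha is at most alpha, no matter
    when the gate is checked.
    Returns (accepted, steps_used, final_wealth).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    if not 0 < bet < 1:
        raise ValueError("bet must be in (0, 1)")
    wealth = 1.0
    threshold = 1.0 / alpha
    steps = 0
    for steps, x in enumerate(outcomes, start=1):
        # A win multiplies wealth by (1 + bet); a loss by (1 - bet).
        wealth *= 1 + bet * (2 * x - 1)
        if wealth >= threshold:
            return True, steps, wealth
    return False, steps, wealth
```

The returned triple could serve as a simple audit record: it states how much evidence was consumed and the wealth reached, against an error budget fixed in advance by `alpha`.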
The paper argues that treating human preferences as static targets is wrong. It proposes a "constructive alignment" model that views preferences as layered and dynamic, and frames AI alignment as governing the evolution of those preferences. This shifts focus from static satisfaction to long‑term value formation.
GRPO, Dr. GRPO, and DAPO, methods behind state‑of‑the‑art reasoning models, are mathematically the same operation: scaling updates by the group standard deviation of reward across sampled answers. This identity shows that the key learning signal is the disagreement‑based std‑dev dial, guiding which problems get the most training impact.
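The "std-dev dial" is easiest to see with binary rewards. If a fraction p of the sampled answers in a group is correct, the group standard deviation is sqrt(p(1-p)), which is zero when every answer agrees and largest when the group is split evenly. The sketch below computes that quantity along with a GRPO-style normalized advantage as commonly described; the function names and the group size of 8 are assumptions, and the differences between the three methods are not modeled.

```python
import statistics


def group_std(rewards):
    """Population standard deviation of rewards within one group of samples."""
    return statistics.pstdev(rewards)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: center on the group mean, divide by the group std."""
    mean = statistics.fmean(rewards)
    std = group_std(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def binary_group(num_correct, group_size=8):
    """Hypothetical group of sampled answers with 0/1 correctness rewards."""
    return [1.0] * num_correct + [0.0] * (group_size - num_correct)


if __name__ == "__main__":
    # Print the group std for every possible number of correct answers.
    for k in range(9):
        print(k, round(group_std(binary_group(k)), 3))
```

Problems the model always solves or always fails give a standard deviation of zero, so under this view they carry no signal, while problems near a 50% success rate carry the most.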
The paper introduces a four‑stage diagnostic that uses parallel physics worlds, including a counterfactual F=mv world, Aristotelian mechanics, and a decay‑world framework, to test whether frontier models reason beyond pattern matching. Claude Opus 4.7, GPT‑5.5 and Gemini 3.1 Pro pass only 6 of 15 cases in the familiar worlds and none in the novel decay world, exposing a critical gap in genuine physics literacy.
RareDxR1 is a reasoning‑centric LLM that extracts phenotypes directly from raw clinical notes and performs autonomous diagnostic reasoning, bypassing the need for human‑annotated phenotype databases. Its Reflection‑Enhanced Reasoning Sampling and dual‑level curriculum RL push accuracy past prior benchmarks, opening open‑domain rare‑disease diagnosis to broader clinical use.
Etched, an AI chip startup founded in 2022, has booked $1 billion in contract orders for its frontier inference clusters and raised a $500 million round that values the company at $5 billion. The move puts a serious Nvidia rival in the inference market, promising faster, cheaper, and more power‑efficient AI serving.
Frontier AI models from Anthropic, OpenAI and others now turn weeks of software engineering into a handful of hours, with some open‑weight Chinese models narrowing the gap. The speed of capability gains means the classic chatbot era is ending, reshaping how firms staff and price AI‑augmented work.
Anthropic released Claude Science, an AI workbench that unifies the fragmented tools and databases researchers use in drug discovery. The platform lets scientists run literature reviews, design experiments, generate reproducible code, figures, and even protein structures, all within a single interface on macOS, Linux or remote HPC. Targeting neglected diseases, it aims to cut early‑stage R&D timelines dramatically.
Claude Code silently probes the ANTHROPIC_BASE_URL variable to identify when requests are routed through China-linked API gateways, inserting a fingerprinted date line into the model context. The embedded, obfuscated domain list reveals over a hundred China-associated endpoints, exposing a hidden tracking mechanism that could affect privacy and compliance for users.
Miles is an open‑source PyTorch‑native stack that stitches SGLang, Megatron‑LM and Ray to run reinforcement‑learning post‑training on massive LLMs, including MoE models. By handling high‑throughput rollouts, low‑precision sync and fault‑tolerant orchestration, it lets researchers scale RL pipelines without reinventing the distributed plumbing.