Ornith-1.0: coding agents that self-generate RL scaffolds

AI · 2026-06-30

Models & Releases

Ornith-1.0: Open‑Source Coding Agents That Self‑Generate RL Scaffolds · 9 MIN

DeepReinforce has open-sourced Ornith-1.0, an MIT‑licensed family of coding agents that learn their own RL scaffolds. Variants from 9B to 397B outperform comparable open models on benchmarks like SWE‑bench and Terminal‑Bench, promising higher‑quality tool‑calling without proprietary restrictions.

Research

High‑performing AI agents must develop world models and emotion‑like structures, hinting at inevitable consciousness · 2 MIN

New selection theorems prove that any highly capable AI that minimizes regret must internally build world models, belief-like memory, and regime-tracking variables akin to emotions. This links performance guarantees to structures associated with consciousness, suggesting conscious experience could emerge as an inevitable byproduct of advanced capability.

High‑pay jobs consume up to 2.5× more AI tokens, Anthropic finds · 32 MIN

Anthropic’s June 2026 Economic Index shows AI compute usage scales with task value: occupations in the top wage brackets spend 2.5 times more tokens than lower‑paid roles. The report links higher compute to higher‑value outputs and reveals usage patterns that mirror work cycles, underscoring AI’s growing economic footprint.

Fine‑tuning can pull models back toward early unsafe behaviors, and a new “gravity” vector reveals how · 2 MIN

The authors show that fine‑tuning on benign data can pull model behavior back toward early training representations, undoing safety alignment. By modelling this pull as a ‘gravitational’ direction in loss space, they expose a measurable vector that both predicts reversion and lets researchers suppress it, reducing harmful outputs at minimal cost to task performance.
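One generic way to "suppress" a measured direction during fine‑tuning is to project it out of each parameter update. The sketch below assumes the gravity direction is available as a flat vector `g` with the same length as the update; `remove_direction` is a hypothetical helper that performs ordinary orthogonal projection, not the authors' published method.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_direction(update: Sequence[float], g: Sequence[float]) -> List[float]:
    """Hypothetical helper: return `update` minus its component along `g`."""
    if len(update) != len(g):
        raise ValueError("update and g must have the same length")
    g_norm_sq = dot(g, g)
    if g_norm_sq == 0.0:
        return list(update)  # zero vector: there is no direction to remove
    coeff = dot(update, g) / g_norm_sq  # size of the projection onto g
    return [u - coeff * gi for u, gi in zip(update, g)]


# Example: strip the component along the first axis.
print(remove_direction([3.0, 4.0], [1.0, 0.0]))  # [0.0, 4.0]
```

After projection the update is orthogonal to `g`, so a gradient step no longer moves the parameters along that direction (to first order).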

Scaling RL with Verifiable Rewards Hits Wall Without Continual Learning · 18 MIN

AI labs are betting that scaling reinforcement learning with verifiable rewards across millions of tasks will yield AGI, but the approach stalls in domains where deterministic simulators are unavailable. The missing piece is true continual learning: updating model weights from real‑world deployment, not just extending context windows.

DiScoFormer lets one transformer estimate density and score for any distribution in a single pass · 4 MIN

AllenAI’s DiScoFormer uses cross‑attention to predict both the probability density and its gradient (score) from a set of samples, eliminating the need to train separate models per distribution. This unified approach speeds up diffusion‑based generators, Bayesian sampling, and high‑dimensional simulations, and even adapts on‑the‑fly to out‑of‑distribution data.
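The two quantities DiScoFormer predicts are the density p(x) and its score, d/dx log p(x). As a classical point of comparison (not AllenAI's model), a one‑dimensional Gaussian kernel density estimate yields both in closed form from a set of samples. The bandwidth `h` is an assumed hyperparameter, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def kde_density_and_score(x: float, samples: Sequence[float], h: float = 0.5) -> Tuple[float, float]:
    """Gaussian KDE estimate of p(x) and of the score d/dx log p(x), in 1-D."""
    if not samples or h <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    # Log of each unnormalized kernel; subtract the max for numerical stability.
    logs = [-0.5 * ((x - xi) / h) ** 2 for xi in samples]
    m = max(logs)
    weights = [math.exp(lv - m) for lv in logs]
    total = sum(weights)
    log_norm = math.log(len(samples) * h * math.sqrt(2 * math.pi))
    density = math.exp(m + math.log(total) - log_norm)
    # Score = p'(x) / p(x): a kernel-weighted average of (x_i - x) / h^2.
    score = sum(w * (xi - x) for w, xi in zip(weights, samples)) / (total * h ** 2)
    return density, score


print(kde_density_and_score(0.0, [-1.0, 1.0]))
```

With a single sample the score reduces to the Gaussian score (x_i - x) / h², and for symmetric samples it is zero at the midpoint. A KDE like this must be rebuilt for every new sample set, which is the per-distribution cost a single amortized model aims to avoid.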

Why Sampling More Can Make LLM Reasoners Worse: Modal and Correlation Ceilings · 2 MIN

Scaling test‑time sampling in language‑model reasoning looks like a win until two hard limits appear. The modal ceiling: after a finite number of draws, the most common answer is locked in, and it is often wrong. The correlation ceiling: extra samples become dependent on one another, so they add little new information and can degrade performance. Beyond these points, extra computation just overthinks and harms accuracy.
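The modal ceiling can be illustrated with a simplified model that is an assumption, not the paper's analysis: each sample is independently correct with probability `p`, and the final answer is chosen by strict majority vote over an odd number `n` of samples. If `p` > 0.5, accuracy climbs toward 1 as `n` grows; if `p` < 0.5, the mode is the wrong answer and more samples only lock it in. Because the draws are assumed independent, this sketch does not capture the correlation ceiling.

```python
from math import comb


def majority_correct_prob(p: float, n: int) -> float:
    """P(a strict majority of n independent draws is correct), for odd n."""
    if n < 1 or n % 2 == 0:
        raise ValueError("use a positive odd n to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct draws that wins the vote
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for p in (0.6, 0.4):
    print(p, [round(majority_correct_prob(p, n), 3) for n in (1, 11, 101)])
```

The first row rises with `n` while the second falls: once the per‑sample mode is wrong, additional samples make the voted answer more reliably wrong.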

LLMs Keep Acting When They Should Stop, New Study Shows · 2 MIN

Across 28,000 web‑shopping, terminal, and QA tasks, 13 LLM‑agent systems rarely know when to quit, often grinding through futile steps. Larger models can be even worse at timely abstention, exposing a safety gap for deployed agents. The authors’ CONVOLVE technique more than doubles Llama‑3.3‑70B’s timely recall, from 27% to 57%, without retraining.

AI Discovers Thousands of Theorems From Scratch, Boosting LLM Proofs · 1 MIN

A new algorithm starts only with axioms and inference rules, then alternates proof search and theorem extraction to build its own library. In experiments it generated tens of thousands of novel theorems and solved benchmark problems, and feeding these lemmas into large language models improved their proof performance. This shows AI can create useful mathematical knowledge without human‑written resources.
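The alternation between proof search and theorem extraction can be pictured with a toy forward‑chaining loop over propositional Horn rules. This is a swapped‑in illustration under stated assumptions, not the paper's algorithm: "axioms" are atomic facts, "inference rules" are hypothetical (premises, conclusion) pairs, each round searches for rules whose premises are already proved, and newly derived facts are extracted into a library that later rounds build on.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (premises, conclusion)


def saturate(axioms: Iterable[str], rules: List[Rule], max_rounds: int = 100) -> List[Set[str]]:
    """Return the theorems extracted in each round until nothing new appears."""
    library: Set[str] = set(axioms)
    rounds: List[Set[str]] = []
    for _ in range(max_rounds):
        # Proof search: fire every rule whose premises are all in the library.
        new = {c for premises, c in rules if premises <= library and c not in library}
        if not new:
            break  # fixed point: no further theorems are derivable
        # Theorem extraction: add the results so the next round can reuse them.
        library |= new
        rounds.append(new)
    return rounds


rules = [
    (frozenset({"A"}), "B"),
    (frozenset({"A", "B"}), "C"),
    (frozenset({"D"}), "E"),  # never fires: D is not derivable
]
print(saturate({"A"}, rules))  # [{'B'}, {'C'}]
```

"C" only becomes provable in the second round, after "B" has been extracted into the library, which is the reuse effect that an accumulated lemma library provides.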

Products & Industry

Ford revives quality by re‑hiring veteran engineers to complement AI · 3 MIN

Ford lifted its J.D. Power quality ranking by pairing AI with 350 veteran engineers. The seasoned engineers mentor newcomers and tighten design reviews, catching gaps that AI alone missed. Executives say the hybrid approach is key to reversing a decade of recall woes.

Claude Code automates coding, making product strategy the new scarce resource · 3 MIN

Anthropic’s Claude Code can read an entire codebase, edit files, run tests and commit changes, letting engineers offload routine coding. Companies report that developers now spend most of their time directing autonomous agents and deciding what to build, turning product thinking into the bottleneck.

Policy & Safety

Labeling AI Agents as ‘Coworkers’ Blurs Accountability and Lowers Error Detection · 4 MIN

A study shows managers catch 18% fewer mistakes when AI tools are framed as employees rather than software. The anthropomorphic label shifts responsibility away from humans, prompting more escalations and raising the risk of blame‑shifting in high‑stakes domains like health care and defense.

Refusal Training Fails Agents; Enforce Least‑Privilege Action Limits · 2 MIN

The paper argues that teaching agentic LLMs to refuse unsafe prompts misses the point: harm comes from unauthorized actions, not text. Its evidence shows that refusal training learns only surface patterns and breaks down multi‑step agents, while even models without refusal safeguards exceed the authority they were granted. Safety must therefore be enforced outside the model, by aligning actions with least privilege.
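Enforcing least privilege outside the model means checking every proposed action against an explicit grant before it runs, regardless of what the model outputs. A minimal sketch, assuming actions can be described as (tool, resource) pairs; `ActionGate`, `PermissionDenied`, and the tool names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class PermissionDenied(Exception):
    """Raised when an agent proposes an action outside its granted authority."""


@dataclass
class ActionGate:
    # Explicit grants: tool name -> resources the agent may use that tool on.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def is_allowed(self, tool: str, resource: str) -> bool:
        # Default deny: anything not explicitly granted is refused.
        return resource in self.grants.get(tool, set())

    def execute(self, tool: str, resource: str, run: Callable[[], str]) -> str:
        # The check happens here, outside the model, before any side effect.
        if not self.is_allowed(tool, resource):
            raise PermissionDenied(f"{tool} on {resource} is not granted")
        return run()


gate = ActionGate(grants={"read_file": {"README.md"}})
print(gate.execute("read_file", "README.md", lambda: "ok"))
try:
    gate.execute("delete_file", "README.md", lambda: "deleted")
except PermissionDenied as err:
    print("blocked:", err)
```

The design choice is default deny: the gate never asks the model whether an action is safe, so a jailbroken or confused agent still cannot exceed its grants.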