LLMs Gain World Models, Hallucinations Slashed 80%
The paper introduces a three‑stage training pipeline (World Model Mid‑Training, Format‑Eliciting SFT, and Foresight‑Conditioned RL) that teaches a single autoregressive LLM to generate prospective state rollouts and plan‑conditioned success estimates. The resulting agents outperform baselines on search and mathematical reasoning, shifting LLM behavior from reactive responses to deliberative planning.
Production RLHF pipelines often receive reward signals minutes after a rollout, breaking PPO’s synchronous‑reward assumption. The paper introduces Retroactive Advantage Correction (RAC), a two‑line patch that injects delayed rewards via a non‑negative kernel, provably removing bias and cutting policy error by up to 48× in a tabular MDP. This enables faster, cheaper learning when human or compute‑heavy feedback arrives late.
A fresh causal analysis shows activation patching’s natural indirect effect conflates true component influence with interaction effects from other parts of the network. Those hidden interactions can hide or exaggerate a component’s role, calling into question many prior mechanistic interpretability results and suggesting new diagnostic metrics.
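To make the interaction point concrete, here is a toy sketch. It is not the paper's analysis: the two-component function `toy_output`, the clean/corrupted values, and the helper `patch_effect` are all hypothetical and chosen only so the arithmetic is easy to follow. The sketch patches component `a` in both directions and shows that the measured effect depends on the state of the other component `b`.

```python
# Toy illustration only: a made-up "network" with two components, a and b,
# whose output includes an interaction term a * b.
def toy_output(a: float, b: float) -> float:
    return a + b + a * b

# Hypothetical activations on a clean and a corrupted input.
CLEAN = {"a": 1.0, "b": 1.0}
CORRUPT = {"a": 0.0, "b": 0.0}

def patch_effect(base: dict, donor: dict, component: str) -> float:
    """Change in output when `component` in the base run is replaced
    by its value from the donor run."""
    patched = dict(base)
    patched[component] = donor[component]
    return toy_output(**patched) - toy_output(**base)

# Patch the clean value of a into the corrupted run (b stays corrupted).
effect_into_corrupt = patch_effect(CORRUPT, CLEAN, "a")

# Patch the corrupted value of a into the clean run (b stays clean);
# negate so both effects measure "what a's clean value contributes".
effect_into_clean = -patch_effect(CLEAN, CORRUPT, "a")

# The gap between the two measurements comes entirely from the a * b term.
interaction = effect_into_clean - effect_into_corrupt

print(effect_into_corrupt, effect_into_clean, interaction)
```

In this toy, the same component appears to contribute 1.0 or 2.0 depending on which run it is patched into, which is the kind of ambiguity the blurb describes.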
A new codec blends compressed discrete audio tokens with low‑dimensional continuous residuals, letting LLMs run autoregressive steps on the discrete side while upsampling continuous details later. This hybrid approach preserves speaker traits far better than pure tokenization and cuts the required autoregressive steps, making speech‑language models cheaper and higher‑fidelity.
Autonomous agents with memory, tools, and multi‑agent coordination face new attacks: memory poisoning, tool‑chain manipulation, and protocol hijacking. The paper proposes the Agent‑Native Immune System (ANIS), a six‑layer, biologically‑inspired defense embedded in the agent’s reasoning loop, with a taxonomy of agent viruses and vaccines. ANIS shifts security from static alignment to active runtime enforcement, enabling safer self‑monitoring AI.
The paper introduces Grounded Iterative Language Planning (GILP), a hybrid that blends a tiny parameterized transition model with GPT-4o-mini reasoning. By letting the backbone flag implausible drafts, hallucinated-state rates fall from 17.6% to 3.5% and planning success climbs to 84%, with only modest extra API calls.
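As a quick check on how the two reported hallucinated-state rates relate to each other, the relative reduction can be computed directly. This is a minimal sketch; the helper name `relative_reduction` and the variable names are illustrative, and only the 17.6% and 3.5% figures come from the summary above.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before

baseline_rate = 17.6  # hallucinated-state rate before, in percent
gilp_rate = 3.5       # hallucinated-state rate after, in percent

absolute_drop = baseline_rate - gilp_rate              # percentage points
reduction = relative_reduction(baseline_rate, gilp_rate)

print(f"absolute drop: {absolute_drop:.1f} percentage points")
print(f"relative reduction: {reduction:.1%}")
```

The drop is about 14.1 percentage points, or roughly an 80% relative reduction.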
The new NormAct benchmark tests whether multimodal LLM planners respect invisible social rules, like not entering an occupied bathroom. State‑of‑the‑art models hit the explicit goal 67% of the time but obey hidden norms only 26% of the time, exposing a safety gap. Adding a norm‑cue generator (NormPerceptor) lifts overall task success toward 47%.
The paper proposes the Agentic Publication Protocol (APP), turning a version‑controlled repository into a publication object that bundles code, data, environment specs, and an LLM‑friendly instruction file. By letting AI agents reproduce results and suggest next steps, scientific communication could shift from static PDFs to fully executable research pipelines.
JD.com’s Oxygen AI Item Center (AIIC) uses LLMs/VLMs to generate high‑quality knowledge for tens of billions of SKUs, handling hundreds of millions of daily updates with 94.2% precision. Deployed across search, recommendation and operations, it lifts search‑traffic coverage to 80.4% and cuts item‑info quality issues by 37%.
Cerebras has signed a multi‑year, >$20 billion contract to provide OpenAI with 750 MW of inference compute, enough to serve its largest language models. The agreement effectively exhausts Cerebras’ available capacity, meaning the company’s waitlist for other customers is now moot. Smaller players seeking low‑latency token generation, like real‑time coding agents, are locked out.
AI labs are likely to let the quality of their system cards slide, eroding transparency and amplifying safety risks. Independent reviewers can spot the decay and pressure labs to maintain rigor, making external scrutiny the most effective safeguard right now. The essay lays out concrete ways for outsiders to do this.
The authors argue that calling any LLM data removal “machine unlearning” is misleading, because models cannot erase learned knowledge the way the phrase suggests. They limit “unlearning” to true dataset‑defined deletion and call for new terms (alignment, editing, suppression) for other safety or copyright fixes. Clear terminology will prevent bogus benchmarks and regulatory confusion.
Darts now ships a unified FoundationModel interface that wraps Chronos‑2, TimesFM 2.5, TiRex and PatchTST‑FM. Users can drop in zero‑shot or fine‑tuned forecasts, uncertainty estimates and backtesting with just a name change, bringing state‑of‑the‑art pre‑trained forecasters into standard pipelines.