LLMs Gain World Models, Hallucinations Slashed 80%

AI · 2026-06-29

Research

LLMs Gain Internal World Models for Long‑Horizon Planning · 1 MIN

The paper introduces a three‑stage training pipeline (World Model Mid‑Training, Format‑Eliciting SFT, and Foresight‑Conditioned RL) that teaches a single autoregressive LLM to generate prospective state rollouts and plan‑conditioned success estimates. The resulting agents outperform baselines on search and mathematical reasoning, shifting LLM behavior from reactive to deliberative planning.

Closed-Form V-Trace Fix Eliminates Bias from Delayed RLHF Rewards · 1 MIN

Production RLHF pipelines often receive reward signals minutes after a rollout, breaking PPO’s synchronous‑reward assumption. The paper introduces Retroactive Advantage Correction (RAC), a two‑line patch that injects delayed rewards via a non‑negative kernel, provably removing bias and cutting policy error by up to 48× in a tabular MDP. This enables faster, cheaper learning when human or compute‑heavy feedback lags behind the rollout.

Hidden Interaction Effects Break Activation Patching’s Causal Estimates · 2 MIN

A new causal analysis shows that the natural indirect effect measured by activation patching conflates a component’s true influence with interaction effects from other parts of the network. Those hidden interactions can mask or exaggerate a component’s role, calling into question many prior mechanistic interpretability results and suggesting new diagnostic metrics.

HybridCodec Boosts Speech LLMs by Marrying Discrete Tokens with Continuous Residuals · 1 MIN

A new codec blends compressed discrete audio tokens with low‑dimensional continuous residuals, letting LLMs run autoregressive steps on the discrete side while upsampling continuous details later. This hybrid approach preserves speaker traits far better than pure tokenization and cuts the required autoregressive steps, making speech‑language models cheaper and higher‑fidelity.
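To make the "discrete token plus continuous residual" idea concrete, here is a minimal sketch. It is not HybridCodec's architecture. It assumes a plain nearest-neighbour codebook lookup, and the names `encode`, `decode`, `codebook`, and `frame` are hypothetical. The residual is kept at full dimension, whereas the summary describes low‑dimensional residuals, and no upsampling step is modelled.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(frame: Sequence[float],
           codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Split a frame into a discrete token and a continuous residual.

    The token is the index of the nearest codebook entry; the residual is
    whatever that entry fails to capture.
    """
    token = min(range(len(codebook)),
                key=lambda i: squared_distance(frame, codebook[i]))
    residual = [x - c for x, c in zip(frame, codebook[token])]
    return token, residual


def decode(token: int, residual: Sequence[float],
           codebook: Sequence[Sequence[float]]) -> List[float]:
    """Rebuild the frame by adding the residual back onto the codebook entry."""
    return [c + r for c, r in zip(codebook[token], residual)]


# Toy data (hypothetical): three 2-D codebook entries and one audio "frame".
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
frame = [0.75, 1.25]

token, residual = encode(frame, codebook)
restored = decode(token, residual, codebook)
print(token, residual, restored)
```

In a split like this, an autoregressive model would only need to predict the integer `token` stream, while the residual carries the fine detail that pure tokenization would discard.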

Embedding an Immune System Inside AI Agents to Block Runtime Hijacks · 2 MIN

Autonomous agents with memory, tools, and multi‑agent coordination face new attacks: memory poisoning, tool‑chain manipulation, and protocol hijacking. The paper proposes the Agent‑Native Immune System (ANIS), a six‑layer, biologically‑inspired defense embedded in the agent’s reasoning loop, with a taxonomy of agent viruses and vaccines. ANIS shifts security from static alignment to active runtime enforcement, enabling safer self‑monitoring AI.

Hybrid world model slashes LLM hallucinations by 80% in multi-step planning · 1 MIN

The paper introduces Grounded Iterative Language Planning (GILP), a hybrid that blends a tiny parameterized transition model with GPT-4o-mini reasoning. By letting the backbone flag implausible drafts, hallucinated-state rates fall from 17.6% to 3.5% and planning success climbs to 84%, with only modest extra API calls.
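The "80%" in the headline is a relative reduction, not a drop in percentage points. A quick check using the two rates quoted above (the function name `relative_reduction` is just an illustrative helper):

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Hallucinated-state rates quoted in the summary, in percent.
rate_before = 17.6
rate_after = 3.5

absolute_drop = rate_before - rate_after
relative_drop = relative_reduction(rate_before, rate_after)

print(f"absolute drop: {absolute_drop:.1f} percentage points")  # 14.1
print(f"relative drop: {relative_drop:.1%}")                    # 80.1%
```

So the rate falls by 14.1 percentage points, which is about 80% of the starting value.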

NormAct Reveals Embodied Agents Fail Hidden Social Norms, Proposes Fix · 2 MIN

The new NormAct benchmark tests whether multimodal LLM planners respect invisible social rules, like not entering an occupied bathroom. State‑of‑the‑art models hit the explicit goal 67% of the time but obey hidden norms only 26% of the time, exposing a safety gap. Adding a norm‑cue generator (NormPerceptor) lifts overall task success toward 47%.

Agentic Publication Protocol makes papers executable, not just readable · 1 MIN

The paper proposes the Agentic Publication Protocol (APP), turning a version‑controlled repository into a publication object that bundles code, data, environment specs, and an LLM‑friendly instruction file. By letting AI agents reproduce results and suggest next steps, scientific communication could shift from static PDFs to fully executable research pipelines.

Products & Industry

JD’s Oxygen AIIC Scales LLM/VLM Item Knowledge to Billions of SKUs, Boosting Search Coverage to 80% · 2 MIN

JD.com’s Oxygen AI Item Center (AIIC) uses LLMs/VLMs to generate high‑quality knowledge for tens of billions of SKUs, handling hundreds of millions of daily updates with 94.2% precision. Deployed across search, recommendation and operations, it lifts search‑traffic coverage to 80.4% and cuts item‑info quality issues by 37%.

Cerebras’ $20B OpenAI Deal Grabs Inference Capacity, Leaves Others on Hold · 7 MIN

Cerebras has signed a multi‑year, >$20 billion contract to provide OpenAI with 750 MW of inference compute, enough to serve its largest language models. The agreement effectively exhausts Cerebras’ available capacity, meaning the company’s waitlist for other customers is now moot. Smaller players that need low‑latency token generation, for example for real‑time coding agents, are locked out.

Policy & Safety

Third‑party audits are the best hope to keep AI system cards honest · 1 MIN

AI labs are likely to let the quality of their system cards slide, eroding transparency and amplifying safety risks. Independent reviewers can spot the decay and pressure labs to maintain rigor, making external scrutiny the most effective safeguard right now. The essay lays out concrete ways for outsiders to do this.

Why “Machine Unlearning” Misses the Mark for LLMs · 1 MIN

The authors argue that calling any LLM data removal “machine unlearning” is misleading, because models cannot erase learned knowledge the way the phrase suggests. They limit “unlearning” to true dataset‑defined deletion and call for distinct terms (alignment, editing, suppression) for other safety or copyright fixes. Clear terminology will prevent bogus benchmarks and regulatory confusion.

Tools & Open Source

Darts now ships unified zero‑shot forecasting with foundation models · 1 MIN

Darts now ships a unified FoundationModel interface that wraps Chronos‑2, TimesFM 2.5, TiRex and PatchTST‑FM. Users can drop in zero‑shot or fine‑tuned forecasts, uncertainty estimates and backtesting by changing only the model name, bringing state‑of‑the‑art pre‑trained forecasters into standard pipelines.