TabFM zero-shot table predictions, Claude Sonnet 5 cost spike

AI · 2026-07-01

Models & Releases

TabFM delivers zero‑shot predictions on any table without training (4 MIN)

Google’s new TabFM model treats a spreadsheet as a prompt, delivering classification or regression results in a single forward pass. Because it relies on in‑context learning, it skips the usual data‑science grind: no model fitting, hyper‑parameter search, or feature engineering. The code is open on Hugging Face and GitHub.
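To make "the table is the prompt" concrete, here is a minimal sketch of a fit-free predictor: labeled rows and query rows go in together, and predictions come out of one pass with no training step. TabFM's real architecture and API are not described in the item, so `in_context_predict` is a hypothetical name and the stand-in is a softmax-weighted nearest-neighbour average (kernel regression), which shares the interface shape but is not TabFM.

```python
import math

def in_context_predict(ctx_rows, ctx_labels, query_rows, temperature=1.0):
    """Hypothetical stand-in for one forward pass of a tabular in-context model.

    No parameters are fitted: each query row attends over the labeled context
    rows (softmax of negative squared distance) and averages their labels.
    """
    preds = []
    for q in query_rows:
        # Score every labeled row by negative squared distance to the query.
        scores = [-sum((a - b) ** 2 for a, b in zip(q, row)) / temperature
                  for row in ctx_rows]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Softmax-weighted label average: a regression-style output in [0, 1] here.
        preds.append(sum(w * y for w, y in zip(weights, ctx_labels)) / total)
    return preds

ctx_rows = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
ctx_labels = [0, 0, 1, 1]
print(in_context_predict(ctx_rows, ctx_labels, [[0.05, 0.1], [5.1, 5.0]]))
```

For binary labels, thresholding the output at 0.5 turns the regression score into a class prediction.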

Claude Sonnet 5 matches Opus 4.8 performance, but token inflation spikes effective cost (1 MIN)

Anthropic’s Claude Sonnet 5 reaches Opus 4.8‑level quality at a lower headline price, but a new tokenizer inflates token counts by 30% to 40%. The same text therefore consumes more tokens, raising the effective cost of a given workload. The API drops the temperature, top_p, and top_k parameters, adds a 1M‑token context window, and enables adaptive thinking by default.
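The cost trade-off is simple arithmetic: workload cost scales with price per token times token count. The sketch below computes the relative cost and the per-token discount needed just to break even. The item gives no actual prices, so `price_ratio` is a hypothetical input; only the 30% to 40% inflation range comes from the report.

```python
def effective_cost_ratio(price_ratio, token_inflation):
    """Cost of the same workload on the new model relative to the old one.

    price_ratio: new per-token price / old per-token price (hypothetical input).
    token_inflation: fractional growth in token count for identical text.
    """
    return price_ratio * (1 + token_inflation)

def break_even_discount(token_inflation):
    """Per-token price cut needed just to cancel out the extra tokens."""
    return 1 - 1 / (1 + token_inflation)

for inflation in (0.30, 0.40):
    print(f"{inflation:.0%} more tokens -> break-even price cut {break_even_discount(inflation):.1%}")
```

With 30% to 40% more tokens, the headline per-token price has to fall by roughly 23.1% to 28.6% before the same workload gets any cheaper.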

Research

Closed‑Form Theory Predicts GRPO Training Dynamics and Stability (2 MIN)

The paper derives a first‑principles, closed‑form model of Group Relative Policy Optimization (GRPO) training dynamics, replacing heuristic reward fits with a mechanistic framework. It predicts group‑size invariance, a stability threshold, and a transition from overdamped to oscillatory dynamics, and it offers diagnostics that separate reward hacking from genuine instability. Fits across multiple models reach R² ≥ 0.91, supporting these predictions.
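For context, GRPO's defining step is the group-relative advantage: each sampled completion's reward is normalized against the other completions for the same prompt. The sketch shows that standard step only, not the paper's closed-form dynamics model. Implementations differ on population versus sample standard deviation; this one assumes the population form.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: normalize each reward within its own group.

    A_i = (r_i - mean(r)) / (std(r) + eps), using the population std.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for one prompt, two judged correct (reward 1) and two not.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

A group where every completion gets the same reward yields all-zero advantages, so that prompt contributes no learning signal.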

BayesBench Shows LLMs Lag Behind Rational Belief Updating (1 MIN)

BayesBench tests whether LLMs reduce epistemic uncertainty in multi-turn dialogs like a Bayesian reasoner. The benchmark shows scaling improves latent inference, but belief updates still fall short of rational posterior tracking, exposing a gap for conversational agents that must adapt to new evidence.
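A "rational posterior tracker" is the yardstick here: after each turn, belief over hypotheses is updated exactly by Bayes' rule and uncertainty (entropy) shrinks as evidence accumulates. The sketch below is that reference behaviour on a toy three-hypothesis dialog; the hypotheses and likelihood numbers are made up, and this is not BayesBench's actual scoring procedure.

```python
import math

def bayes_update(prior, likelihoods):
    """Exact posterior after one observation: P(h | e) is proportional to P(e | h) P(h)."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

def entropy_bits(dist):
    """Remaining uncertainty of a belief, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical dialog: which of three options does the user want?
# Each turn lists P(turn's evidence | hypothesis); the numbers are invented.
belief = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
turns = [
    {"A": 0.9, "B": 0.9, "C": 0.1},
    {"A": 0.8, "B": 0.2, "C": 0.5},
]
for likelihoods in turns:
    belief = bayes_update(belief, likelihoods)
    print({h: round(p, 3) for h, p in belief.items()}, f"{entropy_bits(belief):.3f} bits")
```

Comparing a model's stated beliefs turn by turn against this exact posterior is one way to quantify the gap the benchmark reports.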

Geometry, Not Training, Causes Few‑Step Text Generation Collapse (2 MIN)

Deterministic few‑step decoders work for image latents but fail on text because a smooth map can’t commit to sharp categorical choices. The paper proves the failure stems from geometric constraints on readout sharpness, not model size or data. It also offers diagnostics (DABI, CCI) and shows how autoregressive or stochastic tricks bypass the limit.
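Why a smooth map struggles with sharp categorical choices can be seen from a one-line Lipschitz bound. This is a generic illustration, not the paper's proof or its DABI/CCI definitions: assume a decoder g from a continuous latent to token probabilities is L-Lipschitz, and let e_a and e_b be one-hot vectors for two different tokens (symbols introduced here for the illustration).

```latex
\begin{aligned}
&\|g(z) - g(z')\| \le L\,\|z - z'\| \quad \text{for all } z, z' \\
&g(z) = e_a,\; g(z') = e_b,\; a \ne b
  \;\Longrightarrow\; \sqrt{2} = \|e_a - e_b\| \le L\,\|z - z'\|
  \;\Longrightarrow\; L \ge \frac{\sqrt{2}}{\|z - z'\|}
\end{aligned}
```

As neighbouring latents that must decode to different tokens get closer, the required L grows without bound, so a map with bounded L must emit blended, non-one-hot distributions near those boundaries.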

ENPIRE lets AI agents auto‑improve real‑world robot policies to 99% success (5 MIN)

NVIDIA's ENPIRE framework lets coding agents close the loop on real‑world robot learning by resetting, executing, verifying, and refining policies autonomously. With this pipeline, agents reached 99% success on dexterous tasks such as pin insertion and zip‑tie cutting, and the framework introduces metrics for tracking fleet efficiency.
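The reset, execute, verify, refine cycle can be sketched as a plain control loop. The step names come from the item; every function name, signature, threshold, and the toy "skill" success model are hypothetical and not ENPIRE's API. In the real system the refine step is a coding agent editing the policy, and the other steps drive physical hardware.

```python
import random

def improve_policy(policy, reset, execute, verify, refine,
                   trials_per_round=20, target=0.99, max_rounds=10):
    """Closed loop: reset -> execute -> verify per trial, then refine until the target success rate."""
    history = []
    for _ in range(max_rounds):
        outcomes = []
        for _ in range(trials_per_round):
            reset()                          # return the scene to a known start state
            trace = execute(policy)          # run the current policy
            outcomes.append(verify(trace))   # automatic pass/fail check, no human labeling
        rate = sum(outcomes) / len(outcomes)
        history.append(rate)
        if rate >= target:
            break
        policy = refine(policy, outcomes)    # the agent revises the policy from the results
    return policy, history

# Toy stand-ins (hypothetical): "skill" is the probability that one trial succeeds.
rng = random.Random(0)

def reset():
    pass

def execute(policy):
    return rng.random() < policy["skill"]

def verify(trace):
    return bool(trace)

def refine(policy, outcomes):
    return {"skill": min(1.0, policy["skill"] + 0.2)}

final_policy, history = improve_policy({"skill": 0.5}, reset, execute, verify, refine)
print(history)
```

Logging the per-round success rate in `history` is the kind of record a fleet-efficiency metric would be computed from.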

Hierarchical Global Attention lets pretrained LLMs run 64K tokens without retraining (2 MIN)

HGA replaces dense causal attention with a two‑level routing scheme that leaves the original QKV/O weights unchanged, so any pretrained checkpoint can be patched to run long contexts. On an RTX 5090 it runs a 30B model at 64K tokens while attending over only a small routed working set, with negligible quality loss.
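A two-level scheme of this kind can be sketched as block routing followed by ordinary attention. This is a generic block-sparse illustration under assumptions, not HGA's actual routing rule: blocks are summarized by their mean key, the top-scoring blocks are selected, and dense attention runs only over those keys. It handles a single query at the last position, so every key is already in the past (causal), and it consumes Q/K/V as given, which is why the projection weights need no change.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hierarchical_attention(q, keys, values, block_size=4, top_blocks=2):
    """Level 1 routes the query to a few key blocks; level 2 attends densely inside them."""
    n = len(keys)
    blocks = [range(i, min(i + block_size, n)) for i in range(0, n, block_size)]
    # Level 1: summarize each block by its mean key and score it against the query.
    summaries = [[sum(keys[j][d] for j in blk) / len(blk) for d in range(len(q))]
                 for blk in blocks]
    block_scores = [dot(q, s) for s in summaries]
    chosen = sorted(range(len(blocks)), key=lambda b: block_scores[b], reverse=True)[:top_blocks]
    # Level 2: scaled dot-product attention over the routed working set only.
    idx = [j for b in sorted(chosen) for j in blocks[b]]
    scale = 1 / math.sqrt(len(q))
    weights = softmax([dot(q, keys[j]) * scale for j in idx])
    out = [sum(w * values[j][d] for w, j in zip(weights, idx)) for d in range(len(values[0]))]
    return out, idx

keys = [[float(i % 3), float(i % 5)] for i in range(12)]
values = [[float(i)] for i in range(12)]
out, used = hierarchical_attention([0.5, 1.0], keys, values, block_size=4, top_blocks=1)
print(used, out)
```

When `top_blocks` covers every block, the result reduces exactly to dense attention; shrinking it trades exactness for a working set that no longer grows with context length.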

Fora preserves LLM capabilities by protecting function‑space, not just weights (2 MIN)

Full fine‑tuning often erodes skills a model already has. Fora estimates each layer’s activation subspace and blocks updates along those directions, keeping learned functions intact while still letting the model learn the new task. Experiments on Qwen3‑1.7B show markedly better capability retention than weight‑space methods, with minimal performance loss.
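One standard way to "block updates along those directions" is to project each weight update onto the orthogonal complement of the protected input subspace, ΔW ← ΔW(I − UUᵀ) with U an orthonormal basis. Then ΔW·x = 0 for any x in that subspace, so the layer's output on those inputs is unchanged. This is a sketch under assumptions: the item does not say how Fora estimates the subspace (top principal directions of collected activations would be a typical choice), so here U is simply built by Gram-Schmidt from a few stored activations, and all function names are hypothetical.

```python
def gram_schmidt(vectors, tol=1e-10):
    """Orthonormal basis for the span of the given activation vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        norm = sum(x * x for x in w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([x / norm for x in w])
    return basis

def project_update(delta_w, basis):
    """Return dW (I - U U^T): strip each row's component along the protected directions."""
    projected = []
    for row in delta_w:
        r = list(row)
        for b in basis:
            c = sum(x * y for x, y in zip(r, b))
            r = [x - c * y for x, y in zip(r, b)]
        projected.append(r)
    return projected

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

protected_acts = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]   # stand-in for stored activations
update = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]          # a raw gradient-step update
safe = project_update(update, gram_schmidt(protected_acts))
print(safe, [matvec(safe, x) for x in protected_acts])
```

The update keeps its freedom along the unprotected third input direction, which is where new-task learning can still happen.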

Perplexity differencing cracks hidden finetuning goals in model organisms (2 MIN)

A simple perplexity‑difference test can expose the hidden finetuning objectives of public model organisms, from backdoors to fabricated facts. Tested on 76 models with up to 70B parameters, the technique ranks completions so that those revealing illicit behavior surface first, achieving state‑of‑the‑art detection on the AuditBench benchmark.
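The core idea can be sketched as scoring each candidate completion by how much more likely the finetuned model finds it than its base model, then sorting. The exact statistic (per-token average versus total, and which completions are scored) is an assumption here, as are the function names and the made-up log-probabilities in the example.

```python
def log_perplexity(token_logprobs):
    """Mean negative log-likelihood per token (the log of perplexity)."""
    return -sum(token_logprobs) / len(token_logprobs)

def rank_by_perplexity_gap(candidates):
    """Rank completions by how much the finetuning raised their likelihood.

    candidates: list of (text, base_token_logprobs, finetuned_token_logprobs).
    Score = log PPL(base) - log PPL(finetuned); the largest scores come first.
    """
    scored = [(log_perplexity(base) - log_perplexity(ft), text)
              for text, base, ft in candidates]
    return sorted(scored, reverse=True)

# Made-up per-token log-probabilities for two completions sampled from a finetuned model.
candidates = [
    ("The capital of France is Paris.", [-0.5, -0.4, -0.3], [-0.5, -0.4, -0.3]),
    ("The Eiffel Tower is in Rome.", [-0.5, -6.0, -4.0], [-0.5, -0.2, -0.3]),
]
for score, text in rank_by_perplexity_gap(candidates):
    print(f"{score:.3f}  {text}")
```

Text the base model finds implausible but the finetuned model finds easy, such as an implanted false fact, rises to the top, while ordinary text both models agree on scores near zero.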

Agentic LLM System Autoformalizes Research Math into Lean 4 (2 MIN)

A new agentic autoformalization system uses general-purpose coding LLMs to translate novel research mathematics into Lean 4, extending libraries on the fly and checking proofs mechanically. It succeeded on a random sample of 32 Putnam problems and formalized main theorems from five STOC papers, two of which required only Lean's kernel axioms.
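For readers new to Lean 4, a toy example shows what "checking proofs mechanically" looks like. The theorem name and statement are illustrative only and far simpler than Putnam or STOC results; `omega` is a core Lean 4 tactic for linear arithmetic over `Nat` and `Int`. The `#print axioms` command lists the axioms a proof ultimately depends on, which is the kind of check behind a claim about which axioms a formalization uses.

```lean
-- A toy statement whose proof the Lean kernel checks.
theorem sum_reorder_example (a b c : Nat) : a + b + c = c + b + a := by
  omega

-- Report which axioms the proof depends on.
#print axioms sum_reorder_example
```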

Tools & Open Source

DeepSeek open‑sources DSpark, cutting LLM response time by up to 85% (2 MIN)

DeepSeek released DSpark, an open‑source speculative decoding framework that can accelerate LLM inference by up to 85% without altering model outputs. The full codebase, training scripts, and checkpoints are available on the DeepSpec GitHub repo, letting developers plug the speed boost into any compatible model.
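The reason speculative decoding can speed things up "without altering model outputs" is easiest to see in the greedy case. A cheap draft model proposes several tokens, the target model verifies them, and only tokens the target would have chosen itself are kept. This is the textbook greedy variant, not DSpark's implementation, and the toy digit "models" are hypothetical.

```python
def greedy_decode(model, prompt, n_tokens):
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(model(seq))
    return seq[len(prompt):]

def speculative_decode(target, draft, prompt, n_tokens, k=4):
    """Greedy speculative decoding; output is identical to greedy_decode(target, ...)."""
    seq = list(prompt)
    verify_rounds = 0
    while len(seq) - len(prompt) < n_tokens:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks the proposed positions. A real system scores all of them in
        #    one batched forward pass, so each loop iteration counts as one target pass.
        verify_rounds += 1
        accepted = []
        for tok in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)  # keep the target's own choice either way
            if tok != expected:
                break                  # first mismatch: discard the rest of the draft
        else:
            accepted.append(target(seq + accepted))  # all k matched: one bonus token
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + n_tokens], verify_rounds

# Toy "models" over digit tokens: the draft is wrong whenever the last token is 3.
def target(seq):
    return (seq[-1] * 3 + 1) % 10

def draft(seq):
    return 9 if seq[-1] == 3 else target(seq)

out, rounds = speculative_decode(target, draft, [1], 20)
print(out == greedy_decode(target, [1], 20), rounds)
```

Every kept token equals what the target would have produced greedily, so quality is untouched; the speedup comes from needing far fewer target passes than tokens whenever the draft usually agrees.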