Grok 4's smartest claim unravels, cost spiral threatens AI

AI · 2026-06-28

Models & Releases

Grok 4’s “world’s smartest AI” claim unravels: benchmarks vs reality · 11 MIN

xAI touts Grok 4 as the world’s smartest AI, citing record ARC‑AGI scores. A deep dive finds the model uneven: strong on benchmark‑like tasks but undercooked elsewhere, with headline‑grabbing numbers that are inflated. Investors may get a PR win, but developers should temper expectations.

Research

Google’s frozen Multi-Token Prediction slashes Gemini Nano latency on Pixel phones · 5 MIN

Google retrofitted its frozen Gemini Nano v3 models with Multi-Token Prediction, letting the phone generate several tokens per forward pass instead of one. The change cuts latency and power use on Pixel 9/10, making on‑device features like notification summaries and proofread feel instant without extra memory‑hungry draft models.

Heavy‑Tailed Weight Spectra Reveal a Deep‑Learning Bias Toward Sparse Representations · 27 MIN

Power-law distributions seen in neural network weights, activations, and attention can be traced to generalized central limit theorem universality, positioning heavy‑tailed spectra as a natural bridge between sparsity and Gaussianity. The paper argues this mechanism drives networks toward sparse, factored representations, offering a statistical‑physics lens on modern deep‑learning empirics.
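
For readers unfamiliar with the generalized central limit theorem, the textbook statement is sketched below. This is the standard result, not the paper's own formulation; the symbols $X_i$, $S_n$, $b_n$, $c$, and $\alpha$ are notation chosen here for illustration, and the usual tail-balance conditions are assumed.

```latex
% Generalized CLT (standard form, notation assumed for illustration).
% X_1, X_2, ... are i.i.d. with power-law tails of index alpha:
\[
  \Pr\bigl(|X_i| > x\bigr) \sim c\, x^{-\alpha}
  \quad (x \to \infty), \qquad 0 < \alpha < 2 .
\]
% Then the centered sum, rescaled by n^{1/alpha} rather than n^{1/2},
% converges in distribution to an alpha-stable law:
\[
  \frac{S_n - b_n}{n^{1/\alpha}} \;\xrightarrow{\;d\;}\; \mathcal{S}_\alpha ,
  \qquad S_n = \sum_{i=1}^{n} X_i .
\]
% When the variance is finite, the classical CLT applies instead:
% the scaling is n^{1/2} and the limit is Gaussian.
```

The heavy-tailed regime ($\alpha < 2$) is the one in which a few large terms dominate the sum, which is the intuition behind linking heavy tails to sparsity.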

Products & Industry

NVIDIA restarts H20 GPU sales in China, unveils RTX PRO for AI · 1 MIN

CEO Jensen Huang announced that the U.S. government will grant the licenses needed for NVIDIA to ship its H20 data‑center GPUs to China again. At the same briefing he introduced a new RTX PRO GPU built to meet Chinese compliance for industrial AI workloads such as digital twins and logistics.

Policy & Safety

Frontier AI’s Cost Spiral Threatens Viability, Says Dean Ball · 18 MIN

Dean Ball argues that massive training expenses for frontier models are recouped in just months, after which newer models undercut prices, creating an unsustainable race. He warns the current U.S. licensing regime lacks clear safety standards, leaving future releases in limbo and amplifying economic and security risks.

Tools & Open Source

Pure‑C CPU Inference Engine for Qwen 3 4B Runs Without Dependencies · 2 MIN

A tiny C program lets you run Qwen 3 4B (and smaller) on a plain CPU, no GPU, no Python, no libraries. It loads FP16 weights, does FP32 accumulation, includes its own tokenizer and generation loop, and compiles in a single command. With ~6 GB RAM it’s usable for short chats on Linux or macOS.
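
To make "FP16 weights, FP32 accumulation" concrete, here is a minimal sketch of what such an inner loop can look like in plain C. It is not taken from the project: the function names `half_to_float` and `dot_f16_f32` are hypothetical, and the sketch assumes weights are stored as raw IEEE 754 binary16 bit patterns and activations as `float`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one IEEE 754 binary16 value (raw bits) to a 32-bit float.
 * Hypothetical helper for illustration; handles zero, subnormals,
 * normal numbers, infinity and NaN. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (uint32_t)(h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                          /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            uint32_t shift = 0;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                shift++;
            }
            mant &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13); /* infinity or NaN */
    } else {
        /* Normal number: rebias exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);                  /* well-defined type pun */
    return f;
}

/* Dot product of FP16 weights with FP32 activations.
 * Each weight is widened to float and the running sum stays in float,
 * so rounding error does not build up at half precision. */
static float dot_f16_f32(const uint16_t *w, const float *x, size_t n)
{
    assert(w != NULL && x != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += half_to_float(w[i]) * x[i];
    return acc;
}
```

Storing weights in half precision halves the memory footprint relative to FP32, while widening at use keeps the arithmetic in single precision. A matrix-vector product is then one `dot_f16_f32` call per output row.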