GLM-5.2 1M context, Claude hides reasoning, China H100 chips
Z.ai released GLM-5.2, a 753B-parameter open-weight model with a 1M-token context window and a new IndexShare architecture that reduces FLOPs by a factor of 2.9. On coding benchmarks it outperforms its predecessor, GLM-5.1, and narrows the gap to closed‑source leaders, scoring 81.0 on Terminal‑Bench 2.1, four points shy of Claude Opus.
Vanilla conditional diffusion models collapse when asked to generate samples from compositional distributions that lie outside their training support. The authors prove this failure theoretically and confirm it with synthetic and realistic experiments, showing that inference‑time corrections cannot overcome the fundamental score‑estimation error. This reveals a core obstacle to compositional generalization in generative AI.
The authors prompt a black‑box LLM to label notable snippets of chat transcripts, embed the resulting descriptors, and cluster them into semantic groups. This unsupervised pipeline surfaces unexpected Gemini behaviors without inspecting model internals, offering a cheap qualitative tool for deployment‑ or RL‑training‑distribution analysis.
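For readers who want a feel for the label, embed, and cluster steps, here is a minimal sketch. It is not the authors' code. The `label_fn` argument is a hypothetical stand-in for the black-box LLM call, the bag-of-words `embed` replaces a real embedding model, and the greedy cosine-threshold grouping (with an assumed threshold of 0.5) is a swapped-in simplification of whatever clustering method the paper actually uses.

```python
import math
from collections import Counter


def embed(text):
    # Toy embedding (assumption): bag-of-words counts instead of a learned model.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(descriptors, threshold=0.5):
    # Greedy single pass: join the most similar existing cluster if its
    # similarity reaches the threshold, otherwise start a new cluster.
    clusters = []
    for d in descriptors:
        v = embed(d)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(v, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(v), "members": [d]})
        else:
            best["centroid"].update(v)  # centroid is the summed word counts
            best["members"].append(d)
    return [c["members"] for c in clusters]


def surface_behaviors(snippets, label_fn, threshold=0.5):
    # label_fn is hypothetical: in practice it would prompt an LLM to
    # describe what is notable about each transcript snippet.
    descriptors = [label_fn(s) for s in snippets]
    return cluster(descriptors, threshold)
```

Calling `surface_behaviors(snippets, label_fn)` returns groups of descriptors, each group standing for one candidate behavior to review by hand. The threshold controls how coarse the groups are and would need tuning on real data.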
Anthropic encrypts the full chain‑of‑thought behind Claude Code and only returns a 600‑character signature plus a brief summary via the API. Users can’t retrieve the actual reasoning unless they have an enterprise agreement, meaning audit trails or debugging must rely on incomplete data.
ASML’s new high‑NA EUV lithography tool weighs 150 tons, costs $400 million, and can print features just eight nanometers wide, roughly the width of 40 silicon atoms. That resolution keeps Moore’s Law alive and feeds the AI‑driven demand for ever denser chips, but only the richest fabs can afford the machine.
Seven Chinese firms are already shipping AI chips that match Nvidia's H100/H200 class, with most having IPO'd in the past six months. The new map groups them into three leading "dragons" and four "snakes," showing a home‑grown ecosystem that could shrink Nvidia's share in China to under 10%.
Anthropic halted its new Mythos and Fable models after a U.S. export‑control order flagged them as national‑security risks. The move sparked a broader backlash, with European leaders citing it as a wake‑up call and firms eyeing cheap Chinese open‑source alternatives. The dispute foreshadows tougher AI regulation and a shift in global AI sourcing.
LemonHarness is an execution framework that forces LLM agents to operate inside a defined workspace, converting file writes, installs, and temporary artifacts into structured tool calls. By exposing state changes and a time‑aware budget, it lifts long‑horizon task accuracy from 84.5% to 86.5% on Terminal‑Bench 2.0, suggesting that tighter runtime control improves stability.
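To make the idea concrete, here is a minimal sketch of a workspace-bound tool call with a time budget. It is an illustration only and does not reflect LemonHarness's actual API. The class name `WorkspaceHarness`, the `write_file` method, and the log record fields are all hypothetical, and `Path.is_relative_to` assumes Python 3.9 or later.

```python
import time
from pathlib import Path


class WorkspaceHarness:
    """Hypothetical harness: every file write is a logged, bounded tool call."""

    def __init__(self, root, budget_seconds, clock=time.monotonic):
        self.root = Path(root).resolve()
        self.clock = clock
        self.deadline = clock() + budget_seconds
        self.log = []  # structured record of every state change

    def remaining(self):
        # Seconds left in the budget, which the agent can inspect before acting.
        return max(0.0, self.deadline - self.clock())

    def write_file(self, relative_path, content):
        if self.remaining() <= 0:
            raise TimeoutError("time budget exhausted")
        target = (self.root / relative_path).resolve()
        # Reject any path that resolves outside the workspace root.
        if not target.is_relative_to(self.root):
            raise PermissionError(f"{relative_path} escapes the workspace")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        record = {
            "tool": "write_file",
            "path": target.relative_to(self.root).as_posix(),
            "chars": len(content),
        }
        self.log.append(record)
        return record
```

Because each write returns a record and appends it to `log`, the runtime can replay or audit what the agent changed, and the injectable `clock` makes the budget easy to test.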