LODE

AI — 2026-06-09

Models & Releases
Gemma 4 adds dedicated “thinking” token to its chat template · 6 min read

Google’s Gemma 4 model now includes special control tokens that separate internal reasoning (“thinking”) from final answers, letting developers preserve the model’s thought process in prompts. The new chat template defines <|think|>‑style delimiters for agentic workflows and tool use.
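As a rough illustration of what separating reasoning from the final answer can look like on the developer side, here is a minimal sketch. The delimiter strings `<|think|>` and `<|/think|>` and the helper `split_thinking` are hypothetical placeholders, not the confirmed Gemma 4 tokens; check the published chat template for the real ones.

```python
import re

# Hypothetical delimiters: the actual Gemma 4 control tokens may differ.
THINK_OPEN = "<|think|>"
THINK_CLOSE = "<|/think|>"


def split_thinking(text, open_tok=THINK_OPEN, close_tok=THINK_CLOSE):
    """Split raw model output into (thinking, answer).

    Every delimited span is collected as reasoning; whatever remains
    outside the delimiters is treated as the final answer.
    """
    pattern = re.escape(open_tok) + r"(.*?)" + re.escape(close_tok)
    thoughts = [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]
    answer = re.sub(pattern, "", text, flags=re.DOTALL).strip()
    return "\n".join(thoughts), answer


raw = "<|think|>2 + 2 is 4.<|/think|>The answer is 4."
thinking, answer = split_thinking(raw)
print(thinking)  # reasoning span
print(answer)    # final answer only
```

Keeping the two parts separate lets an agent loop decide whether to feed the reasoning back into the next prompt or show only the answer.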

Research
Anthropic's Claude Matches Conventional Tools in Predicting NMR Spectra · 8 min read

Anthropic’s research shows its Claude model can accurately interpret and predict NMR spectra, performing on par with—or better than—established chemistry software. The study highlights Claude’s potential to streamline analytical workflows for chemists, bridging AI reasoning with complex molecular data.

Tools & Open Source
Packed Twin Inference nearly doubles LLM throughput on MI50 without an extra model copy · 11 min read

Packed Twin Inference (PTI) runs multiple token streams in parallel through llama.cpp's batch decoding, so the streams share each weight load instead of requiring a second model copy. On an AMD Instinct MI50, it achieves a 1.96× throughput speedup for Qwen3.6-27B (38.1 vs. 19.4 tok/s) with only ~0.2 GiB of extra VRAM.
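The reported figures can be sanity-checked with a few lines of arithmetic. This is only a sketch: the stream count of two is an assumption taken from the name "Twin", and the function names are illustrative.

```python
def throughput_speedup(baseline_tok_s, packed_tok_s):
    """Ratio of packed throughput to single-stream throughput."""
    return packed_tok_s / baseline_tok_s


def per_stream_rate(packed_tok_s, num_streams):
    """Average tokens per second each stream receives."""
    return packed_tok_s / num_streams


BASELINE = 19.4  # tok/s, single stream (from the summary above)
PACKED = 38.1    # tok/s, combined across packed streams (from the summary above)
STREAMS = 2      # assumption: "Twin" means two streams

speedup = throughput_speedup(BASELINE, PACKED)
per_stream = per_stream_rate(PACKED, STREAMS)

print(f"speedup: {speedup:.2f}x")
print(f"per-stream rate: {per_stream:.2f} tok/s")
```

Under that two-stream assumption, each stream runs at nearly the single-stream rate, which is what one would expect if the weights are read once per step and shared across streams.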

llama.cpp patch cuts KV cache copies, boosting Gemma 4 MTP speed by up to 43% · 1 min read

A new KV-cache patch for llama.cpp eliminates costly KV cell copies, restoring performance for Gemma 4 models. Benchmarks on an RTX 5090 show structured decoding rising from 104 tok/s to 149 tok/s (+43%) and free-text decoding speeding up by ~20% at 64k context.

© 2026 LODE