Gemma 4 31B Matches Claude Sonnet 4.6
BenchLM’s head‑to‑head benchmark pits Google’s Gemma 4 31B (FP8) against Anthropic’s Claude Sonnet 4.6. Sonnet scores higher overall (83 vs 64) and leads on coding, but Gemma narrows the gap on multimodal and knowledge tasks, costs nothing per token, and offers a 256K context window.
The paper shows that attackers who selectively decide when to launch or abort attacks can dramatically reduce the measured safety of AI control frameworks, lowering safety by up to 28 percentage points even with unchanged attack capabilities. This reveals a blind spot in current evaluations, and the authors urge including attack selection to obtain realistic safety estimates.
The paper presents SafeGene, a reusable safety‑adapter module that can be applied across tasks and model families to counteract safety degradation when open‑weight LLMs are fine‑tuned. By decoupling safety alignment from task updates via layer‑wise coefficient recalibration, SafeGene reduces harmful responses while maintaining downstream performance.
Lean4Agent is the first framework that uses the dependently typed language Lean 4 to formally model and verify multi‑step LLM agent workflows and execution trajectories. Experiments on SWE‑Bench‑Verified and ELAIP‑Bench show that workflows passing verification improve performance by ~12%, and the LeanEvolve component adds another ~7% gain.
AEGIS uses a lightweight probe on a weak robot policy’s activations to detect high‑risk steps and switches to a stronger policy only when needed. This selective escalation recovers about 10% of trajectories the weak policy loses, while the strong policy runs on just 38% of steps.
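To make the selective-escalation idea concrete, here is a minimal Python sketch. It is not the AEGIS implementation: it assumes the probe is a simple logistic model over an activation vector, and the names `probe_risk`, `select_policy`, `run_episode`, along with the weights and the 0.5 threshold, are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def probe_risk(activations: Sequence[float],
               weights: Sequence[float],
               bias: float) -> float:
    """Hypothetical logistic probe: maps activations to a risk score in (0, 1)."""
    z = sum(w * a for w, a in zip(weights, activations)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def select_policy(activations: Sequence[float],
                  weights: Sequence[float],
                  bias: float,
                  threshold: float = 0.5) -> str:
    """Escalate to the strong policy only when the probe flags the step as risky."""
    risk = probe_risk(activations, weights, bias)
    return "strong" if risk >= threshold else "weak"


def run_episode(steps: Sequence[Sequence[float]],
                weights: Sequence[float],
                bias: float,
                threshold: float = 0.5) -> Tuple[List[str], float]:
    """Pick a policy per step and report the fraction of steps run on the strong policy."""
    choices = [select_policy(a, weights, bias, threshold) for a in steps]
    if not choices:
        return choices, 0.0
    return choices, choices.count("strong") / len(choices)


if __name__ == "__main__":
    # Assumed toy activations: only the first step looks risky to the probe.
    toy_steps = [[2.0, 0.0], [0.0, 2.0], [0.0, 2.0], [0.0, 2.0]]
    choices, strong_fraction = run_episode(toy_steps, [1.0, -1.0], 0.0)
    print(choices, strong_fraction)
```

Raising the threshold in this sketch sends fewer steps to the strong policy, which is the cost-versus-recovery trade-off the summary describes.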
Fans of K‑pop are condemning the rise of AI‑generated deepfake videos that depict idols in non‑consensual, sexualized scenarios. The backlash on forums and Reddit highlights concerns about consent, privacy, and the potential for harmful exploitation of minors, prompting calls for tighter safeguards and platform enforcement.