Sequent launches alignment nonprofit; attention‑head fix curbs overconfidence
Sequent, a new nonprofit backed by UK AISI’s Alignment Team and Timaeus, aims to scale theory‑driven, automated alignment research, growing to 40‑80 staff within two years. By betting on a portfolio of provable‑safety approaches, it hopes to build a higher‑confidence path to safety before artificial superintelligence arrives, which it expects could happen within years.
Researchers introduce Probe‑Conditioned Head Intervention (PCHI), an inference‑time method that uses a probe to detect overconfident wrong answers and then rescales the outputs of specific attention heads. On Qwen‑3‑4B‑Instruct, PCHI flips 82% of erroneous high‑confidence predictions and halves calibration error, while degrading the confidence of only 5% of correct predictions.
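A minimal sketch of the probe‑gated idea, assuming a linear probe (a sigmoid over a weighted sum of one hidden‑state vector) that predicts whether the current high‑confidence answer is wrong, plus a fixed list of target heads whose outputs are multiplied by a scalar when the probe fires. The names `probe_fires` and `intervene`, the 0.5 threshold, and the scale factor are hypothetical illustrations, not PCHI’s published implementation.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def probe_fires(hidden: Sequence[float], weights: Sequence[float],
                bias: float, threshold: float = 0.5) -> bool:
    """Hypothetical linear probe: True if it predicts an overconfident error."""
    logit = sum(h * w for h, w in zip(hidden, weights)) + bias
    return sigmoid(logit) >= threshold


def intervene(head_outputs: List[List[float]], target_heads: Sequence[int],
              scale: float, hidden: Sequence[float], weights: Sequence[float],
              bias: float) -> List[List[float]]:
    """Rescale the chosen heads' outputs only when the probe fires.

    Always returns a new list; the input head outputs are never modified.
    """
    rescaled = [list(h) for h in head_outputs]
    if not probe_fires(hidden, weights, bias):
        return rescaled  # probe silent: no intervention
    for i in target_heads:
        rescaled[i] = [scale * x for x in rescaled[i]]
    return rescaled


if __name__ == "__main__":
    heads = [[1.0, 2.0], [3.0, 4.0]]
    # Probe fires (sigmoid(4.0) is about 0.98), so head 1 is halved.
    print(intervene(heads, [1], 0.5, hidden=[1.0], weights=[4.0], bias=0.0))
    # prints [[1.0, 2.0], [1.5, 2.0]]
    # Probe stays silent (sigmoid(-4.0) is about 0.02): outputs unchanged.
    print(intervene(heads, [1], 0.5, hidden=[1.0], weights=[-4.0], bias=0.0))
```

Gating on the probe is what lets a method like this leave most correct, confident predictions alone; an ungated rescaling would shift every prediction.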
TD‑Grokking introduces training‑time decomposition: problems the model cannot yet solve, which earn zero reward, are broken into verifiable subproblems that can earn non‑zero rewards. This turns RLVR’s sparse reward into a usable training signal and lifts LLM performance on challenging math and medical benchmarks beyond vanilla GRPO and other baselines.
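Why zero‑reward problems stall training is visible in the group‑relative advantage used by GRPO: each sampled answer’s reward is normalized against its group’s mean and standard deviation, so a group in which every answer scores 0 yields all‑zero advantages and no gradient. The sketch contrasts that with a decomposed reward. Scoring a rollout as the fraction of verified subproblems it solves is a hypothetical aggregation chosen for illustration, not necessarily the one TD‑Grokking uses.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def decomposed_reward(subproblems_passed: List[bool]) -> float:
    """Hypothetical partial credit: share of verified subproblems solved."""
    return sum(subproblems_passed) / len(subproblems_passed)


if __name__ == "__main__":
    # Four rollouts on a hard problem, all failing the final-answer check.
    binary = [0.0, 0.0, 0.0, 0.0]
    print(group_advantages(binary))  # prints [0.0, 0.0, 0.0, 0.0]

    # Hypothetical checks on three verifiable subproblems per rollout.
    checks = [[True, False, False], [True, True, False],
              [False, False, False], [False, True, False]]
    partial = [decomposed_reward(c) for c in checks]
    # Non-zero advantages: rollouts that solved more subproblems are favored.
    print(group_advantages(partial))
```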
Rotate2Think shows that hidden states during reasoning cluster tightly along a distinct direction, separate from the input embeddings. It estimates a rotation via orthogonal Procrustes from a few solved examples and injects the rotated vector at inference, improving accuracy in 30 of 32 model‑benchmark combinations across math, science, code, and even multimodal tasks.
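Orthogonal Procrustes finds the orthogonal matrix R that minimizes the squared distance between rotated source vectors and their targets. A minimal two‑dimensional sketch, restricted to proper rotations, where the optimum has a closed form: θ = atan2(Σ a×b, Σ a·b). Treating the sources as input‑side vectors and the targets as reasoning‑state vectors from solved examples is an assumption, and the additive `steer` step with its `strength` parameter is a hypothetical stand‑in for however Rotate2Think actually injects the rotated vector. Real hidden states need the general SVD‑based Procrustes solution.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def fit_rotation_2d(sources: List[Vec], targets: List[Vec]) -> float:
    """Angle of the 2D rotation R minimizing sum ||R a - b||^2.

    Since ||R a|| = ||a||, minimizing the loss means maximizing
    sum b . (R a) = cos(t) * dot + sin(t) * cross, whose maximizer is
    t = atan2(cross, dot).
    """
    dot = sum(a[0] * b[0] + a[1] * b[1] for a, b in zip(sources, targets))
    cross = sum(a[0] * b[1] - a[1] * b[0] for a, b in zip(sources, targets))
    return math.atan2(cross, dot)


def rotate(v: Vec, theta: float) -> Vec:
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def steer(hidden: Vec, theta: float, strength: float) -> Vec:
    """Hypothetical injection: add a scaled, rotated copy of the state."""
    r = rotate(hidden, theta)
    return (hidden[0] + strength * r[0], hidden[1] + strength * r[1])


if __name__ == "__main__":
    # Toy "solved examples": targets are the sources turned by 90 degrees.
    src = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    tgt = [rotate(a, math.pi / 2) for a in src]
    theta = fit_rotation_2d(src, tgt)
    print(math.degrees(theta))  # approximately 90
    print(steer((2.0, 0.0), theta, strength=0.5))  # approximately (2.0, 1.0)
```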
ComBench introduces 100 human‑annotated, Olympiad‑style combinatorics problems, split into analysis‑centric problems graded on their proofs and construction‑centric problems whose answers are checked by verification. Even top frontier models reach only 65.4% overall accuracy, highlighting a persistent gap in rigorous reasoning and creative construction. The benchmark offers a clear diagnostic for future model improvements.
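To make construction‑centric verification concrete: for a construction problem the model must output an object, and a checker confirms it satisfies every constraint. A minimal sketch using a hypothetical problem chosen for illustration (not taken from ComBench): color the integers 1..n with two colors so that no color class contains x, y, z with x + y = z. The function `verify_sum_free_coloring` is an illustrative name.

```python
from itertools import product
from typing import Dict


def verify_sum_free_coloring(n: int, coloring: Dict[int, str]) -> bool:
    """Accept iff every integer 1..n gets one of at most two colors and no
    color class contains x, y, z (x may equal y) with x + y = z."""
    if set(coloring) != set(range(1, n + 1)):
        return False  # missing or extra numbers
    if len(set(coloring.values())) > 2:
        return False  # more than two colors used
    for x, y in product(range(1, n + 1), repeat=2):
        z = x + y
        if z <= n and coloring[x] == coloring[y] == coloring[z]:
            return False  # monochromatic solution of x + y = z
    return True


if __name__ == "__main__":
    # A valid 2-coloring of 1..4: {1, 4} red, {2, 3} blue.
    print(verify_sum_free_coloring(4, {1: "red", 2: "blue", 3: "blue", 4: "red"}))  # True
    # Invalid: 1 + 1 = 2 with all three red.
    print(verify_sum_free_coloring(3, {1: "red", 2: "red", 3: "blue"}))  # False
```

A check like this is fully mechanical, which is what distinguishes verifying a construction from grading a proof.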
Fine‑tuning aligned LLMs often erodes their safety. DualSelect jointly selects task data and safety reference examples, periodically refreshing the references and using them both to steer updates and to filter for task examples that are compatible with safety. Experiments on 1B to 8B models show a 5‑point safety gain over the best baselines with minimal loss in task performance.
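One plausible way to filter task examples against safety references is gradient alignment: keep an example only if its loss gradient does not point against the average gradient of the safety reference set. This is an assumption for illustration; DualSelect’s actual criterion may differ, and `reference_direction`, `select_compatible`, and the cosine threshold are hypothetical. Recomputing the reference direction each round stands in for the refreshing step.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def reference_direction(ref_grads: List[List[float]]) -> List[float]:
    """Mean gradient over the current (refreshed) safety references."""
    k = len(ref_grads)
    return [sum(col) / k for col in zip(*ref_grads)]


def select_compatible(task_grads: List[Tuple[str, List[float]]],
                      ref_dir: Sequence[float],
                      min_cos: float = 0.0) -> List[str]:
    """Keep task examples whose gradient does not oppose the safety direction."""
    return [name for name, g in task_grads if cosine(g, ref_dir) >= min_cos]


if __name__ == "__main__":
    refs = [[1.0, 0.2], [1.0, -0.2]]     # hypothetical safety-reference gradients
    ref_dir = reference_direction(refs)  # [1.0, 0.0]
    task = [("a", [1.0, 0.0]), ("b", [-1.0, 0.0]), ("c", [0.5, 0.5])]
    print(select_compatible(task, ref_dir))  # ['a', 'c']
```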
In its “GitLab Act 2” announcement, the company says the core Git protocol is being rebuilt to handle machine‑scale workloads, targeting AI agents that will generate and merge code at rates far beyond human teams. This architectural shift aims to keep GitLab usable as AI‑driven development becomes the dominant workflow.
NVIDIA’s Confidential Computing‑enabled GPUs now power Apple’s Private Cloud Compute inference, extending from Apple data centers to Google Cloud. The partnership lets Apple and Google run foundation‑model workloads in trusted execution environments, preserving user data privacy while delivering high‑performance AI services.
The post formalizes a model of the safety‑usefulness tradeoff, showing how developers with limited willingness to sacrifice usefulness choose safety measures. It distinguishes rushed‑but‑reasonable contexts from low‑political‑will ones, and argues that better safety techniques or bigger safety budgets shift the Pareto frontier, which in turn guides risk‑reduction strategy.
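A toy version of this decision, with made‑up numbers: each candidate safety measure has a hypothetical risk‑reduction score and a usefulness cost, the developer discards options that another option beats on both axes, and then picks the safest remaining option whose cost fits within its willingness to give up usefulness. In this toy model a larger `max_cost` lets the developer pick a safer point, and a better technique adds an option that dominates older ones. The measure names and scores are illustrative, not taken from the post.

```python
from typing import List, NamedTuple, Optional


class Measure(NamedTuple):
    name: str
    safety: float  # hypothetical risk-reduction score, higher is better
    cost: float    # hypothetical fraction of usefulness given up


def pareto_frontier(options: List[Measure]) -> List[Measure]:
    """Drop any option that another option matches or beats on both axes
    while being strictly better on at least one."""
    def dominated(o: Measure) -> bool:
        return any(p.safety >= o.safety and p.cost <= o.cost
                   and (p.safety > o.safety or p.cost < o.cost)
                   for p in options)
    return [o for o in options if not dominated(o)]


def choose(options: List[Measure], max_cost: float) -> Optional[Measure]:
    """Safest frontier option whose usefulness cost fits the budget."""
    feasible = [o for o in pareto_frontier(options) if o.cost <= max_cost]
    return max(feasible, key=lambda o: o.safety, default=None)


if __name__ == "__main__":
    options = [
        Measure("none", 0.0, 0.0),
        Measure("weak filter", 0.3, 0.10),  # dominated by "monitoring"
        Measure("monitoring", 0.5, 0.05),
        Measure("sandboxing", 0.7, 0.20),
    ]
    print(choose(options, max_cost=0.10).name)  # monitoring
    print(choose(options, max_cost=0.25).name)  # sandboxing
    # A better technique adds a point that moves the frontier outward.
    options.append(Measure("better monitoring", 0.65, 0.05))
    print(choose(options, max_cost=0.10).name)  # better monitoring
```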
The 0.32a3 release of the open‑source LLM CLI tool was written almost entirely by Claude Fable 5 through Claude Code, an example of AI authoring production code for widely used tooling. The accompanying blog post details the release and what it means for AI‑assisted development.