Sequent launches alignment nonprofit; head fix for overconfidence

AI · 2026-06-10

Research

Sequent launches nonprofit to automate alignment research for higher confidence · 27 MIN

Sequent, a new nonprofit backed by UK AISI’s Alignment Team and Timaeus, aims to scale theory‑driven, automated alignment work, growing to 40‑80 staff within two years. By betting on a portfolio of provable safety approaches, it seeks to build a higher‑confidence bridge to safety before the likely arrival of artificial superintelligence within years.

Targeted Head Intervention Cuts LLM Overconfidence Without Bleeding Accuracy · 1 MIN

Researchers introduce Probe‑Conditioned Head Intervention (PCHI), an inference‑time tweak that detects overconfident wrong answers and rescales specific attention‑head outputs. On Qwen‑3‑4B‑Instruct, PCHI flips 82% of erroneous high‑confidence predictions and halves calibration error, while hurting confidence on only 5% of correct answers.

TD‑Grokking lets LLMs crack zero‑reward RL tasks · 1 MIN

TD‑Grokking introduces training‑time decomposition, breaking impossible zero‑reward problems into verifiable subproblems that yield non‑zero rewards. This transforms sparse‑signal RLVR into a usable training signal, boosting LLM performance on challenging math and medical benchmarks beyond vanilla GRPO and other baselines.

Orthogonal rotation primes LLM reasoning, boosting accuracy on 30 of 32 benchmarks · 2 MIN

Rotate2Think shows that hidden states during reasoning form a distinct, tightly clustered direction separate from input embeddings. By estimating a rotation via orthogonal Procrustes from a few solved examples and injecting the rotated vector at inference, it lifts accuracy in 30 of 32 model‑benchmark combos across math, science, code, and even multimodal tasks.
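
The orthogonal Procrustes step named in the summary above can be sketched as follows. This is a minimal illustration on synthetic vectors, assuming NumPy; the function name `fit_rotation` and the toy data are hypothetical and not taken from Rotate2Think, and the summary does not say which representations the method pairs or how the rotated vector is injected.

```python
import numpy as np


def fit_rotation(source, target):
    """Return the orthogonal matrix R minimising ||source @ R - target||_F.

    source, target: arrays of shape (n_examples, dim), one vector per row.
    """
    # Cross-covariance between the two sets of row vectors.
    cross = source.T @ target
    # The optimum is U @ Vt, where cross = U @ diag(S) @ Vt.
    u, _, vt = np.linalg.svd(cross)
    return u @ vt


# Toy data (assumption): targets are an exact rotation of the sources.
rng = np.random.default_rng(0)
dim, n_examples = 8, 32
true_rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
source = rng.standard_normal((n_examples, dim))
target = source @ true_rotation

rotation = fit_rotation(source, target)
rotated = source @ rotation  # apply the estimated rotation to each row
```

On this noise-free toy problem the estimate recovers the planted rotation; with real hidden states the fit would only be approximate.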

ComBench Shows LLMs Still Lag on Olympiad‑Level Combinatorics Proofs · 1 MIN

ComBench introduces 100 human‑annotated Olympiad‑style combinatorics problems, split into analysis‑centric proof grading and construction‑centric verification. Top frontier models hit only 65.4% overall accuracy, highlighting a persistent gap in rigorous reasoning and creative construction. The benchmark gives a clear diagnostic for future model improvements.

DualSelect Keeps LLM Safety Intact While Fine‑Tuning on New Tasks · 1 MIN

Fine‑tuning aligned LLMs often sacrifices safety. DualSelect jointly picks task data and safety references, refreshing references to steer updates and filter compatible examples. Experiments on models from 1B to 8B parameters show a 5‑point safety gain over the best baselines with minimal performance loss.

Products & Industry

GitLab rewrites Git for AI‑agent scale, betting on an ‘agentic’ future · 12 MIN

In its “GitLab Act 2” announcement, the company says the core Git protocol is being rebuilt to handle machine‑scale workloads, targeting AI agents that will generate and merge code at rates far beyond human teams. This architectural shift aims to keep GitLab usable as AI‑driven development becomes the dominant workflow.

Apple Leverages NVIDIA Confidential Computing for Private AI Inference on Google Cloud · 1 MIN

NVIDIA’s Confidential Computing‑enabled GPUs now power Apple’s Private Cloud Compute inference, extending from Apple data centers to Google Cloud. The partnership lets Apple and Google run foundation‑model workloads in trusted execution environments, preserving user data privacy while delivering high‑performance AI services.

Policy & Safety

How Limited Willingness to Sacrifice Usefulness Shapes AI Safety Decisions · 20 MIN

The post formalizes a safety‑usefulness tradeoff model, showing how developers with constrained willingness to cut usefulness decide on safety measures. It distinguishes rushed‑reasonable versus low‑political‑will contexts and argues that safety tech improvements or bigger safety budgets shift the Pareto frontier, guiding risk‑reduction strategies.

Tools & Open Source

Simon Willison’s LLM CLI 0.32a3: AI‑generated release proves AI‑assisted dev works · 1 MIN

The 0.32a3 update of the open‑source LLM CLI tool was written almost entirely by Claude Fable 5 through Claude Code, showing how AI can author production‑grade code. The blog post details the release and its significance for AI‑assisted development in widely‑used tooling.