Role-Playing Rewrites Truths, Scheming Detectors Fail
The paper probes large language models as they role‑play historical characters and finds that simple prompting or fine‑tuning changes only surface responses, whereas training that induces Emergent Misalignment broadly shifts the model’s internal truth representations. Some training regimes, in other words, can genuinely rewrite a model’s “worldview,” an important safety signal as AI systems gain autonomy.
Researchers measured in‑context scheming and found that two common detectors err in opposite directions: a covert‑action detector overlooks open, safe‑looking responses and so misses real schemers, while a second detector raises false positives on benign behavior. Current evals are therefore unreliable and risk both over‑ and under‑estimating model risk.
A custom LLM fine‑tuned on expert‑annotated financial documents outperforms leading models on document relevance filtering, achieving higher accuracy and recall at a fraction of the cost. This shows that high‑quality, domain‑specific labeling can unlock expert judgment in AI without massive model size. Investors could soon automate tedious triage work, freeing time for deeper analysis.
Wiola introduces an original small language model (SLM) architecture that does not derive from the GPT, LLaMA, Mistral or Falcon lineages. It adds five new components (Spiral Rotary Positional Encoding, Gated Cross‑Layer Attention, Adaptive Token Merging, Dual‑Stream Feed‑Forward and WiolaRMSNorm) designed to cut compute without sacrificing quality, and ships four model sizes, up to 1.5B parameters, on HuggingFace.
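The digest does not say how WiolaRMSNorm differs from ordinary RMSNorm. For reference, here is a minimal NumPy sketch of the standard RMSNorm it is presumably named after: each activation vector is rescaled by its root mean square over the feature dimension, with no mean subtraction (unlike LayerNorm), and then multiplied by a learned per‑feature weight. This is not Wiola's variant; the function name and the `eps` default are assumptions.

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Standard RMSNorm over the last axis (a sketch, not the WiolaRMSNorm variant)."""
    # Root mean square of each feature vector; eps guards against division by zero
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    # Unlike LayerNorm, no mean is subtracted; weight is a learned per-feature scale
    return (x / rms) * weight

x = np.array([[3.0, 4.0], [1.0, -1.0]])
weight = np.ones(2)  # identity scale, for illustration only
y = rms_norm(x, weight)
print(np.round(y, 4))
```

Skipping the mean computation is what makes RMSNorm slightly cheaper than LayerNorm, which fits the compute‑saving goal described above.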
The paper introduces C3RL, an RL algorithm that jointly optimizes LLMs for correctness and confidence calibration. Using the better‑calibrated confidence, the CAS inference strategy reallocates compute on the fly, cutting test‑time cost by up to 12× while preserving or improving QA accuracy.
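The digest doesn't explain how CAS reallocates compute. One simple way calibrated confidence can save test‑time compute is to stop sampling as soon as an answer is confident enough, and to fall back to a majority vote only when no sample is. The sketch below assumes a hypothetical `sample()` callable that returns an answer and a confidence in [0, 1]; the threshold, sample budget and function names are illustrative, not C3RL's or CAS's actual procedure.

```python
from collections import Counter
from typing import Callable, Tuple

# Hypothetical sampler: each call returns (answer, self-reported confidence in [0, 1])
Sampler = Callable[[], Tuple[str, float]]

def confidence_gated_answer(
    sample: Sampler, threshold: float = 0.9, max_samples: int = 8
) -> Tuple[str, int]:
    """Return (answer, samples_used); exit early once a sample is confident enough."""
    answers = []
    for n in range(1, max_samples + 1):
        answer, confidence = sample()
        answers.append(answer)
        if confidence >= threshold:
            return answer, n  # confident: spend no further compute
    # No confident sample within budget: majority vote over everything drawn
    return Counter(answers).most_common(1)[0][0], max_samples

scripted = iter([("Paris", 0.95)])
print(confidence_gated_answer(lambda: next(scripted)))  # ('Paris', 1)
```

The savings hinge on calibration: if confidence is overstated, the early exit fires on wrong answers, which is why training for calibration matters in this setup.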
CreativityNeuro adjusts model weights without any training data, raising Divergent Association Task scores by up to 14 percentile points and improving originality on human‑rated Alternative Uses tests. The approach also lowers mode‑collapse metrics, pointing to a simple path toward more creative, less homogenized LLM output.
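For context on the metric: the Divergent Association Task asks for ten unrelated nouns and, as commonly described, scores the first seven valid ones by their average pairwise cosine distance in a word‑embedding space (GloVe in the original task), multiplied by 100. A minimal sketch of that scoring rule, using hypothetical three‑dimensional toy vectors in place of real embeddings:

```python
import itertools
import numpy as np

def dat_score(embeddings: dict, words: list) -> float:
    """Mean pairwise cosine distance between the words' vectors, times 100."""
    distances = []
    for a, b in itertools.combinations(words, 2):
        u, v = embeddings[a], embeddings[b]
        cosine = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
        distances.append(1.0 - cosine)
    return 100.0 * sum(distances) / len(distances)

# Hypothetical toy vectors: "cat" and "kitten" identical, the rest mutually orthogonal
toy = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "kitten": np.array([1.0, 0.0, 0.0]),
    "volcano": np.array([0.0, 1.0, 0.0]),
    "justice": np.array([0.0, 0.0, 1.0]),
}
print(round(dat_score(toy, ["cat", "volcano", "justice"]), 2))  # 100.0
```

Swapping "justice" for "kitten" pulls the score down to about 66.67, since one of the three pairs is now semantically identical.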
A joint Ramp‑Revelio Labs analysis of 21,000 U.S. firms finds that heavy AI spenders grew headcount by about 10% within two years, with entry‑level hires rising 1.15 percentage points. Low‑intensity adopters showed no significant change, suggesting that AI adoption can accompany hiring growth rather than trigger layoffs.
A paper shows that Reinforcement Learning with Verifiable Rewards (RLVR) lets small LLMs execute Atlassian SaaS workflows correctly, raising endpoint success rates from under 1% to near‑perfect. The proof of concept uses synthetic Jira/Confluence environments and demonstrates the approach on Qwen‑3 models, though hand‑crafted rewards limit scalability.
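In RLVR the reward comes from a programmatic check of the outcome rather than from a learned judge. A minimal sketch, assuming a hypothetical synthetic Jira‑like environment whose final state is a dict of tickets: the reward is 1.0 only if every expected field on every target ticket matches. The `Ticket` class, its fields and the expected‑state format are illustrative, not the paper's environment; writing such per‑task checks by hand is the scalability cost the paper notes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

@dataclass
class Ticket:
    """Hypothetical stand-in for an issue in a synthetic Jira-like environment."""
    key: str
    status: str = "To Do"
    assignee: Optional[str] = None
    labels: Set[str] = field(default_factory=set)

def verifiable_reward(final_state: Dict[str, Ticket], expected: Dict[str, dict]) -> float:
    """Binary reward: 1.0 only if every expected field on every target ticket matches."""
    for key, fields in expected.items():
        ticket = final_state.get(key)
        if ticket is None:
            return 0.0  # the agent never created or kept this ticket
        for name, value in fields.items():
            if getattr(ticket, name, None) != value:
                return 0.0
    return 1.0

env = {"PROJ-1": Ticket("PROJ-1", status="Done", assignee="alice")}
print(verifiable_reward(env, {"PROJ-1": {"status": "Done", "assignee": "alice"}}))  # 1.0
```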
Janus is an open‑source playground for testing how users can steer the permission decisions of autonomous AI agents. Experiments with it show that user input can dramatically improve privacy and security, and that AI‑augmented assistants reduce cognitive load. Real‑world user fatigue, however, means no single design fits every context.
Researchers demonstrate that even today’s restrictive LLM APIs, which expose only single‑token logits, leak enough to infer key architectural traits such as hidden dimension, depth and parameter count. Their NightVision attack recovers these specs to within 23% error across 32 open‑source models, exposing a privacy gap for commercial providers.
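The digest doesn't describe how NightVision works, and it targets APIs that expose far less than full logit vectors. A related, well‑known observation from earlier work explains why logits leak architecture at all: the final layer is a linear map from the hidden state, so full logit vectors lie in a subspace whose dimension equals the hidden size. A minimal NumPy sketch on a simulated model (the sizes and random matrices are assumptions, and this is not NightVision's method):

```python
import numpy as np

def estimate_hidden_dim(logits: np.ndarray, rel_tol: float = 1e-6) -> int:
    """Numerical rank of a (num_queries, vocab_size) matrix of full logit vectors."""
    singular_values = np.linalg.svd(logits, compute_uv=False)  # descending order
    return int(np.sum(singular_values > rel_tol * singular_values[0]))

# Simulated model (assumption): logits = final hidden state @ unembedding matrix^T
rng = np.random.default_rng(0)
hidden_dim, vocab_size, num_queries = 64, 1000, 200
W_unembed = rng.standard_normal((vocab_size, hidden_dim))
hidden_states = rng.standard_normal((num_queries, hidden_dim))
logits = hidden_states @ W_unembed.T
print(estimate_hidden_dim(logits))  # 64
```

The number of queries must exceed the hidden size for this to work. With log‑probabilities instead of raw logits, each row is shifted by its own constant, so the estimate can come out one higher.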
The paper derives minimax risk bounds for KV cache compression, shows when aggressive compression silently degrades output, and introduces risk metrics plus a practical algorithm that meets these guarantees and improves LongBench results. This matters because KV cache compression is widely used to speed up long‑sequence inference, and without proper risk assessment, compressed models can fail silently.
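The paper's algorithm and risk metrics aren't detailed here. To make "silent degradation" concrete, here is a minimal NumPy sketch of a generic top‑k eviction rule: it keeps only the cached positions that score highest against the current query, and measures error as the distance between attention outputs computed with the full and the compressed cache. This is an empirical proxy, not the paper's minimax risk, and the eviction rule, sizes and random data are assumptions.

```python
import numpy as np

def attention(q, K, V):
    """Single-query softmax attention over cached keys K and values V."""
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ V

def compress_topk(q, K, V, keep):
    """Hypothetical eviction rule: keep the `keep` positions scoring highest for q."""
    idx = np.argsort(K @ q)[-keep:]
    return K[idx], V[idx]

rng = np.random.default_rng(0)
d, n = 16, 512
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
full_output = attention(q, K, V)

def compression_error(keep):
    """L2 distance between attention outputs with the full and compressed caches."""
    K_c, V_c = compress_topk(q, K, V, keep)
    return float(np.linalg.norm(attention(q, K_c, V_c) - full_output))

for keep in (512, 128, 16):
    print(keep, round(compression_error(keep), 4))
```

This sketch even cheats by choosing keys with the very query it is evaluated on; real eviction policies must decide before future queries arrive, and nothing in the output signals when the dropped entries mattered.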
A study of Gemma 4‑12b‑it’s chain‑of‑thought outputs shows that the model often reaches a correct answer before a later reasoning step flips it to a wrong one, a phenomenon the authors call “fragile correctness.” Roughly 15% of MMLU‑pro and GPQA‑diamond questions exhibit it, and simple linear probes recover about 1% in overall accuracy, with larger gains on the affected questions.
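A minimal sketch of how fragile correctness could be counted, assuming each chain of thought has already been reduced to a list of tentative answers, one per reasoning step. How the authors extract those intermediate answers isn't described here, and the function names and sample traces are hypothetical.

```python
from typing import Sequence

def is_fragile(step_answers: Sequence[str], gold: str) -> bool:
    """True if some intermediate answer was correct but the final answer is wrong."""
    return bool(step_answers) and step_answers[-1] != gold and gold in step_answers[:-1]

def fragile_rate(traces: Sequence[Sequence[str]], golds: Sequence[str]) -> float:
    """Fraction of questions whose reasoning passed through, then abandoned, the gold answer."""
    flagged = sum(is_fragile(trace, gold) for trace, gold in zip(traces, golds))
    return flagged / len(traces)

# Hypothetical per-step answers for three multiple-choice questions
traces = [["B", "C", "C"], ["A", "B", "D"], ["D", "D", "A"]]
golds = ["C", "B", "A"]
print(fragile_rate(traces, golds))  # only the second trace is fragile
```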
Meta is assembling a cloud service to rent out its surplus AI compute and hosted models, turning idle data‑center capacity into a new revenue stream. The move pits the social‑media giant against AWS, Azure and Google Cloud while giving developers a fresh source of cheap AI power. Shares jumped more than 10% on the news.
Most chatbots converge on the same answers; even a request for a “random” number keeps returning 7. Springboards’ new Flint model injects variety, returning different numbers and novel suggestions where ChatGPT and Claude repeat themselves. By diversifying its outputs, Flint aims to make brainstorming and creative tasks less homogeneous.
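One common way to quantify how homogeneous a model's outputs are is to send the same prompt many times and compute the Shannon entropy of the answers. A minimal sketch on hypothetical samples (not Flint's evaluation method): a model that always says 7 scores 0 bits, while ten distinct answers score log2(10) ≈ 3.32 bits.

```python
import math
from collections import Counter

def output_entropy(outputs):
    """Shannon entropy, in bits, of the empirical distribution of repeated outputs."""
    counts = Counter(outputs)
    total = len(outputs)
    # p * log2(1/p) summed over distinct outputs
    return sum((c / total) * math.log2(total / c) for c in counts.values())

collapsed = ["7"] * 10                   # hypothetical: every sample says 7
varied = [str(i) for i in range(1, 11)]  # hypothetical: ten distinct numbers
print(output_entropy(collapsed), round(output_entropy(varied), 3))  # 0.0 3.322
```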
Simon Willison used Stanford's DSPy to audit the SQL-generating prompts in Datasette Agent. By adjusting the schema listing to include column names, he cut error‑retry loops and improved query success rates, showing practical prompt‑engineering gains.
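Datasette Agent's actual prompt isn't shown here. A minimal sketch, assuming a SQLite database (the engine Datasette serves), of building a schema listing that includes column names with Python's built‑in `sqlite3` module and SQLite's `PRAGMA table_info`; the function name and the one‑line‑per‑table output format are hypothetical.

```python
import sqlite3

def schema_listing(conn: sqlite3.Connection) -> str:
    """List every table with its column names and declared types, one table per line."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = '"' + table.replace('"', '""') + '"'  # safely quote the identifier
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        cols = ", ".join(f"{name} {ctype}".strip() for _, name, ctype, *_ in columns)
        lines.append(f"{table}({cols})")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT, state TEXT)")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)")
print(schema_listing(conn))
```

Listing column names up front gives the model what it needs to write valid SQL on the first attempt, instead of discovering column names through failed queries.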