QUEST‑35B goes open source, Emyx slashes enzyme‑design cost

AI · 2026-06-19

Models & Releases

QUEST‑35B: Open‑source Deep Research Agent Trained on 32 H100s · 8 MIN

Ohio State’s NLP group released QUEST‑35B, a 35‑billion‑parameter deep‑research agent trained on 32 H100 GPUs using about 8K synthetic tasks. The team open‑sourced the model weights, training scripts, data pipeline and benchmarks, so others can reproduce its frontier‑level research‑agent results. This lowers the barrier to building citation‑grounded, fact‑seeking agents.

Emyx slashes training cost while beating RFdiffusion in all-atom enzyme design · 1 MIN

Emyx is a 140M‑parameter conditional flow‑matching model that generates all‑atom protein structures for enzyme design. It satisfies sparse geometric constraints and trains in just 682 GPU‑hours, roughly a quarter of RFdiffusion3’s budget, yet outperforms both Proteína‑Complexa and RFdiffusion3 on the AME benchmark in fold recovery, catalytic geometry, and structural diversity.
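Emyx’s exact objective is not described in this summary. As a rough illustration of what “conditional flow matching” means, the sketch below uses the common straight‑line (rectified‑flow style) path: draw noise `x0` and a data sample `x1`, pick a time `t`, form `x_t = (1 - t) * x0 + t * x1`, and regress a velocity model onto `x1 - x0`. The function names, the `cond` argument standing in for geometric constraints, and the Euler sampler are all hypothetical illustration choices, not Emyx’s implementation.

```python
import numpy as np

def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Conditional flow-matching loss on a straight-line noise-to-data path.

    velocity_fn(x_t, t, cond) -> predicted velocity, same shape as x_t.
    x1: (batch, dim) data samples, e.g. flattened atom coordinates.
    cond: conditioning features (hypothetical stand-in for constraints).
    """
    x0 = rng.standard_normal(x1.shape)          # noise sample
    t = rng.uniform(size=(x1.shape[0], 1))      # one time per example in [0, 1)
    x_t = (1.0 - t) * x0 + t * x1               # point on the straight path
    target = x1 - x0                            # constant velocity of that path
    pred = velocity_fn(x_t, t, cond)
    return np.mean((pred - target) ** 2)

def sample(velocity_fn, cond, shape, rng, steps=50):
    """Generate by Euler-integrating dx/dt = v(x, t, cond) from t=0 to t=1."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((shape[0], 1), i * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x
```

When every training example equals `cond`, the exact velocity is `(cond - x) / (1 - t)`, so plugging that “oracle” in drives the loss to zero and Euler sampling lands on `cond`. A real model would replace the oracle with a neural network trained on `flow_matching_loss`.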

Research

ITNet collapses CNNs, Transformers and RNNs into one learnable integral transform · 1 MIN

ITNet introduces a learnable integral transform that subsumes convolution, self‑attention, and recurrence. With a small MLP‑based kernel, a single model matches or beats specialized CNN, Transformer, and RNN baselines on ImageNet, GLUE, ModelNet40, VQA v2 and NLVR2, suggesting that one unified operator can stand in for all three architectural families.
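The summary does not spell out ITNet’s operator. One plausible reading of “a learnable integral transform with an MLP kernel” is a discretized sum `y_i = (1/n) * sum_j k(i, j) x_j`, where a tiny MLP produces each kernel weight. The hypothetical NumPy sketch below illustrates that reading only: a kernel that sees just the relative offset between positions is translation‑invariant (convolution‑like), feeding it the pairwise similarity `x_i · x_j` makes it content‑dependent (attention‑like), and a causal mask restricts each output to earlier positions (the ordering a recurrence imposes). All names and parameter shapes are our own assumptions.

```python
import numpy as np

def mlp_kernel(features, w1, b1, w2, b2):
    """Tiny 2-layer MLP mapping pair features (..., 2) to a scalar weight."""
    h = np.tanh(features @ w1 + b1)
    return (h @ w2 + b2)[..., 0]

def integral_transform(x, pos, params, causal=False, use_content=False):
    """Discretized integral transform y_i = (1/n) * sum_j k(i, j) x_j.

    x: (n, d) features sampled at positions pos: (n,).
    params: (w1 (2, h), b1 (h,), w2 (h, 1), b2 (1,)) for the kernel MLP.
    """
    n = x.shape[0]
    offset = pos[:, None] - pos[None, :]                 # relative offsets (n, n)
    if use_content:
        content = x @ x.T                                # pairwise similarity
    else:
        content = np.zeros((n, n))                       # keep input width at 2
    pair = np.stack([offset, content], axis=-1)          # (n, n, 2)
    k = mlp_kernel(pair, *params)                        # learned kernel (n, n)
    if causal:
        idx = np.arange(n)
        k = np.where(idx[:, None] >= idx[None, :], k, 0.0)  # drop j > i
    return k @ x / n
```

Setting `causal=True` guarantees that perturbing the last input leaves every earlier output unchanged, which is easy to check numerically.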

Briefcase benchmark crowns Claude Fable and GLM‑5.2 as top planners · 1 MIN

Artificial Analysis launched Briefcase, an agentic benchmark that evaluates LLMs on real‑world planning and execution. Claude Fable and GLM‑5.2 outperformed their peers, earning higher Elo scores while keeping runtimes modest, which flags efficiency as a new axis of performance.
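Briefcase’s exact rating procedure is not described here. For readers unfamiliar with Elo, the standard update after one head‑to‑head comparison is sketched below; the K‑factor of 32 and the 400‑point scale are conventional chess defaults, assumed rather than taken from Briefcase.

```python
def expected_score(r_a, r_b):
    """Expected score of A against B under the Elo model (0..1)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0):
    """Return new (r_a, r_b) after one game; score_a is 1 win, 0.5 tie, 0 loss."""
    e_a = expected_score(r_a, r_b)
    delta = k * (score_a - e_a)          # zero-sum: B loses what A gains
    return r_a + delta, r_b - delta
```

Two models rated 1000 that play once, with A winning, move to 1016 and 984. An upset against a higher‑rated opponent shifts ratings more than an expected win does.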

AI model adds 18 new rare‑disease diagnoses to stalled child cases · 9 MIN

OpenAI’s o3 Deep Research model re‑examined 376 unsolved pediatric rare‑disease cases and helped clinicians confirm 18 new genetic diagnoses, an additional diagnostic yield of 4.8%. The AI supplied evidence‑linked hypotheses, letting experts focus testing on the most promising leads and showing how periodic AI‑assisted reanalysis can surface answers that earlier workups missed.
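For reference, the 4.8% figure is the fraction of the 376 reanalyzed cases that received a new confirmed diagnosis:

```latex
\text{yield} = \frac{18}{376} \approx 0.0479 \approx 4.8\%
```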

Text Collapse Sabotages Multimodal Forecasts, New Residual Supervision Fixes It · 2 MIN

The authors expose “text collapse”, a failure mode in which adding domain reports actually hurts multimodal time‑series forecasts because the model ignores the text branch. Their REST‑TS method trains the text branch to predict the residual of the numeric forecast, substantially improving accuracy and showing that text helps when it is supervised correctly.
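REST‑TS’s architecture is not detailed in this summary. The sketch below shows the general residual‑supervision idea under simplifying assumptions: the numeric forecast is fixed, and the text branch is a ridge regression on hypothetical report embeddings. The text branch is fit to `y - y_num` rather than to `y`, and its output is added back as a correction. The function names and the linear text model are illustrative only.

```python
import numpy as np

def fit_text_residual(text_emb, y, y_num, ridge=1e-3):
    """Fit a linear text branch to the numeric model's residual (y - y_num).

    text_emb: (n, d) embeddings of the domain reports (hypothetical features).
    y, y_num: (n,) ground truth and numeric-only forecast.
    Returns w such that the combined forecast is y_num + text_emb @ w.
    """
    residual = y - y_num                                 # what the numbers missed
    d = text_emb.shape[1]
    a = text_emb.T @ text_emb + ridge * np.eye(d)
    return np.linalg.solve(a, text_emb.T @ residual)     # closed-form ridge fit

def combined_forecast(text_emb, y_num, w):
    """Numeric forecast plus the text branch's predicted correction."""
    return y_num + text_emb @ w
```

Because the text branch only has to explain what the numeric model got wrong, its training signal cannot be swamped by the numeric target, which is the intuition for why a residual target counters text collapse.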

Diffusion Language Models Show Competitive Accuracy but Trade Speed for Parallel Generation · 1 MIN

A new paper benchmarks eight diffusion language models (DLMs) on eight tasks, from reasoning to coding, measuring both quality and compute cost. It finds that DLMs can rival autoregressive LLMs on some benchmarks but require careful inference design, exposing a clear trade‑off between parallel‑generation speed and output quality.
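To make the speed side of that trade‑off concrete, below is a toy version of confidence‑based parallel unmasking, a common decoding strategy for masked diffusion LMs (not necessarily the one used by any model in the paper). A hypothetical `predict` function returns per‑position probabilities; each step commits the `k` most confident masked positions, so a length‑`L` sequence needs `ceil(L / k)` forward passes instead of `L`, at the cost of committing tokens before their neighbors are known.

```python
import numpy as np

MASK = -1  # sentinel for a not-yet-decoded position

def parallel_unmask(predict, length, k):
    """Decode `length` tokens, unmasking the k most confident positions per step.

    predict(tokens) -> (length, vocab) array of probabilities (hypothetical model).
    Returns the final tokens and the number of forward passes used.
    """
    tokens = np.full(length, MASK)
    steps = 0
    while (tokens == MASK).any():
        probs = predict(tokens)
        steps += 1
        best_tok = probs.argmax(axis=1)
        conf = probs.max(axis=1)
        conf[tokens != MASK] = -np.inf               # only rank masked slots
        masked_left = int((tokens == MASK).sum())
        chosen = np.argsort(-conf)[: min(k, masked_left)]
        tokens[chosen] = best_tok[chosen]            # commit in parallel
    return tokens, steps
```

With `k = 1` this degenerates to one token per pass, like autoregressive decoding; larger `k` cuts passes but gives the model less context when it commits each token.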

RL on realistic scenarios boosts alignment across dozens of benchmarks · 1 MIN

Researchers show that reinforcement learning on realistic tasks targeting helpful traits yields broad alignment gains that hold up under adversarial pressure and transfer to new domains. This suggests a concrete training path for models that stay honest, safe and useful even in unseen, high‑stakes settings.

Transformer FFNs Vary Widely in Linearity; Training, Not Architecture, Determines It · 2 MIN

The authors measure each feed‑forward block’s linear recoverability (R²_lin) via a closed‑form least‑squares fit and find a stark, non‑monotonic spread, from near‑linear (>0.99) to highly nonlinear (<0.3), across blocks in GPT‑2, Pythia‑160M and LLaMA‑160M. This variability isn’t set by the activation function; it’s a property learned during training, opening new avenues for targeted compression and analysis.
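Linear recoverability can be computed in closed form. The sketch below follows our own assumptions, since the paper’s exact definition (for example, whether it includes a bias term or how it aggregates over output dimensions) may differ: collect FFN inputs `x` and outputs `y` over a batch of tokens, fit `y ≈ x W + b` by ordinary least squares, and report `R² = 1 - SS_res / SS_tot` pooled over all output dimensions.

```python
import numpy as np

def linear_recoverability(x, y):
    """R^2 of the best affine map from FFN inputs x (n, d_in) to outputs y (n, d_out)."""
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])        # append a bias column
    w, *_ = np.linalg.lstsq(x1, y, rcond=None)           # closed-form least squares
    ss_res = np.sum((y - x1 @ w) ** 2)                   # unexplained variation
    ss_tot = np.sum((y - y.mean(axis=0)) ** 2)           # total variation
    return 1.0 - ss_res / ss_tot
```

For a real model, `x` and `y` would be the activations entering and leaving one FFN block, captured over a sample of tokens (for instance with framework forward hooks). An exactly affine block scores near 1, while an even function such as `abs(x) @ A` has no linear component and scores near 0.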

Google’s AMIE matches PCPs in disease‑management reasoning, study shows · 2 MIN

A Nature paper reports that Google’s Articulate Medical Intelligence Explorer (AMIE), built on Gemini’s long‑context models, performed on par with 21 primary‑care physicians (PCPs) in a blinded, multi‑visit OSCE (objective structured clinical examination) study and outperformed them on treatment precision and guideline alignment. The results suggest conversational AI could soon support clinicians in chronic disease management.

Thermodynamic Diagnostics Boost LLM Hallucination Detection by 6.5 AUROC Points · 2 MIN

The paper introduces Free‑Energy Signatures (Fes), a thermodynamic descriptor that treats each layer’s attention Laplacian as a Hamiltonian and extracts partition functions, entropy, heat capacity, and random‑matrix spectral form factors. Across six open‑weight LLMs and multiple benchmarks, a lightweight probe on Fes features outperforms previous attention‑spectral detectors, raising average AUROC by 6.5 points and reaching an AUROC of 0.71 in a fully unsupervised setting.
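The paper’s precise construction is not given in this summary. The sketch below is one standard way to obtain these quantities from a single attention map, under assumptions of our own: symmetrize the attention matrix `A`, form the graph Laplacian `L = D - A` (with `D` the diagonal degree matrix), treat its eigenvalues as energy levels, and at inverse temperature `beta` compute the partition function `Z = sum(exp(-beta * lam))`, entropy `S = beta * <E> + ln Z`, heat capacity `C = beta^2 * (<E^2> - <E>^2)`, and spectral form factor `K(t) = |sum(exp(-i * lam * t))|^2 / n^2`. The values of `beta` and `t` and the normalization are illustrative choices.

```python
import numpy as np

def free_energy_features(attn, beta=1.0, t=1.0):
    """Thermodynamic summary of one (n x n) attention map.

    Treats the Laplacian of the symmetrized attention graph as a Hamiltonian.
    """
    a = 0.5 * (attn + attn.T)                        # symmetric weighted adjacency
    lap = np.diag(a.sum(axis=1)) - a                 # graph Laplacian L = D - A
    lam = np.linalg.eigvalsh(lap)                    # real "energy levels"
    logw = -beta * lam
    shift = logw.max()
    w = np.exp(logw - shift)                         # numerically stable weights
    log_z = shift + np.log(w.sum())                  # log partition function
    p = w / w.sum()                                  # Gibbs distribution
    e_mean = np.sum(p * lam)
    e2_mean = np.sum(p * lam ** 2)
    entropy = beta * e_mean + log_z                  # S = beta<E> + ln Z
    heat_capacity = beta ** 2 * (e2_mean - e_mean ** 2)
    sff = np.abs(np.exp(-1j * lam * t).sum()) ** 2 / lam.size ** 2
    return {"log_z": log_z, "entropy": entropy,
            "heat_capacity": heat_capacity, "sff": sff}
```

A detector along these lines would compute the features per layer and head, concatenate them, and train a small classifier on top. For uniform attention over four tokens the Laplacian has eigenvalues 0, 1, 1, 1, so `Z = 1 + 3e^{-1}` at `beta = 1`.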

Policy & Safety

Distillation’s Double Bind: Capabilities Transfer or Misalignment Persists? · 12 MIN

Distilling a dangerous, misaligned AI puts it in a double bind: if its misalignment leaks into the student, the student model can serve as incriminating evidence; if it doesn’t, we gain a capable but benign model. The post sketches concrete distillation techniques that favor capability transfer while curbing misalignment, sharpening a key AI‑safety lever.

DeepMind unveils tiered AI control roadmap to curb rogue agents · 1 MIN

Google DeepMind released version 0.1 of its GDM AI Control Roadmap, outlining a four‑pillar strategy: misuse prevention, human oversight, robust deployment, and control science. The report maps threat models to concrete defenses, from low‑cost chain‑of‑thought monitoring to future real‑time access controls, and ties each mitigation tier to model capability.