QUEST‑35B goes open source, Emyx slashes enzyme‑design compute
Ohio State’s NLP group released QUEST‑35B, a 35‑billion‑parameter deep‑research agent trained on 32 H100 GPUs using about 8K synthetic tasks. The team open‑sourced the model weights, training scripts, data pipeline and benchmarks, so others can reproduce its frontier‑level research‑agent performance. This lowers the barrier to building citation‑grounded, fact‑seeking agents.
Emyx is a 140M‑parameter conditional flow‑matching model that generates all‑atom proteins for enzyme design. It satisfies sparse geometric constraints and needs just 682 GPU‑hours, about a quarter of RFdiffusion3's compute, yet outperforms both Proteína‑Complexa and RFdiffusion3 on the AME benchmark in fold recovery, catalytic geometry, and structural diversity.
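Emyx's network and training details aren't given here, so this is only a hedged sketch of generic conditional flow matching with a straight‑line (rectified‑flow‑style) path on toy vectors rather than all‑atom proteins. The `velocity` callable stands in for a trained network; in an enzyme model it would also receive the geometric constraints as conditioning input, which this sketch omits.

```python
import numpy as np

def cfm_training_pairs(x1, rng):
    """Conditional flow-matching pairs for the straight-line path
    x_t = (1 - t) * x0 + t * x1, whose target velocity is u_t = x1 - x0."""
    x0 = rng.standard_normal(x1.shape)          # noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1))      # one time per sample, in [0, 1)
    xt = (1.0 - t) * x0 + t * x1                # point on the path
    ut = x1 - x0                                # regression target for v(x_t, t)
    return xt, t, ut

def cfm_loss(velocity, xt, t, ut):
    """Mean squared error between the model's velocity and the target velocity."""
    return np.mean((velocity(xt, t) - ut) ** 2)

def euler_sample(velocity, x0, steps=100):
    """Generate samples by integrating dx/dt = v(x, t) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity(x, t)
    return x
```

Training would minimize `cfm_loss` over network parameters; sampling then starts from Gaussian noise and follows the learned velocity field to t = 1.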
ITNet introduces a learnable integral transform that subsumes convolution, self‑attention, and recurrence as special cases. Using a small MLP to parameterize the kernel, a single model matches or beats specialized CNN, Transformer, and RNN baselines on ImageNet, GLUE, ModelNet40, VQA v2 and NLVR2, suggesting that one unified operator can replace three architectural families.
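ITNet's actual kernel parameterization isn't given here, so the following is a hedged sketch of the general idea: a discrete integral transform y_i = Σ_j K(i, j, x) x_j, where the choice of kernel recovers each family. An offset‑only kernel with finite support gives a convolution, a content‑dependent softmax kernel gives (projection‑free) self‑attention, and a causal decaying kernel a^(i−j) unrolls a linear recurrence. `mlp_offset_kernel` is a hypothetical stand‑in for a learnable MLP kernel, not ITNet's design.

```python
import numpy as np

def integral_transform(x, kernel):
    """Discrete integral transform y_i = sum_j K(i, j, x) * x_j.
    x has shape (n, d); kernel(i, j, x) returns a scalar weight."""
    n = x.shape[0]
    K = np.array([[kernel(i, j, x) for j in range(n)] for i in range(n)])
    return K @ x

def conv_kernel(w):
    """Convolution: weight depends only on the offset i - j, with finite support."""
    def k(i, j, x):
        off = i - j
        return w[off] if 0 <= off < len(w) else 0.0
    return k

def attention_kernel(i, j, x):
    """Self-attention without projections: softmax over j of x_i . x_j / sqrt(d)."""
    s = x @ x[i] / np.sqrt(x.shape[1])
    e = np.exp(s - s.max())
    return e[j] / e.sum()

def recurrent_kernel(a):
    """Linear recurrence h_i = a * h_{i-1} + x_i, unrolled as the causal kernel a**(i - j)."""
    def k(i, j, x):
        return a ** (i - j) if j <= i else 0.0
    return k

def mlp_offset_kernel(w1, b1, w2):
    """Hypothetical learnable kernel: a one-hidden-layer MLP mapping the offset i - j to a weight."""
    def k(i, j, x):
        h = np.tanh(w1 * (i - j) + b1)   # w1, b1, w2 all have shape (hidden,)
        return float(h @ w2)
    return k
```

For example, `integral_transform(x, mlp_offset_kernel(*rng.standard_normal((3, 16))))` applies a randomly initialized MLP kernel; in training, the MLP weights would be learned like any other layer.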
Artificial Analysis launched Briefcase, an agentic benchmark that evaluates LLMs on real‑world planning and execution. Claude Fable and GLM‑5.2 outperformed their peers, earning higher Elo scores while keeping runtime modest, which flags efficiency as a further axis of performance.
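Briefcase's rating procedure isn't described here. Assuming it uses standard pairwise Elo, each head‑to‑head result moves both ratings by K times the gap between the actual and expected score. The model names and K = 32 below are illustrative assumptions.

```python
def expected_score(r_a, r_b):
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    """Return updated (r_a, r_b); score_a is 1 for a win, 0.5 for a tie, 0 for a loss."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

def rate(matches, initial=1500.0, k=32):
    """Apply Elo updates in order over (model_a, model_b, score_a) results."""
    ratings = {}
    for a, b, s in matches:
        ra = ratings.get(a, initial)
        rb = ratings.get(b, initial)
        ratings[a], ratings[b] = elo_update(ra, rb, s, k)
    return ratings

print(elo_update(1500, 1500, 1.0))  # (1516.0, 1484.0)
```

Because every update is zero‑sum, the average rating stays at the starting value; an efficiency axis like the one Briefcase highlights would have to be reported alongside Elo rather than folded into it.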
OpenAI’s o3 Deep Research model re‑examined 376 unsolved pediatric rare‑disease cases and helped clinicians confirm 18 new genetic diagnoses, a 4.8% diagnostic yield (18 of 376). The AI supplied evidence‑linked hypotheses so experts could focus testing on the most promising leads, showing how periodic AI‑assisted reanalysis can uncover diagnoses that earlier workups missed.
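The 4.8% figure is simply 18 of 376 reanalyzed cases. To convey how much that proportion could move with a cohort of this size, the sketch adds a Wilson score interval, a standard binomial confidence interval that the study itself may not report.

```python
import math

def diagnostic_yield(new_diagnoses, cases):
    """Fraction of reanalyzed cases that received a new diagnosis."""
    return new_diagnoses / cases

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

print(f"{diagnostic_yield(18, 376):.1%}")  # 4.8%
```

For 18 of 376, the 95% Wilson interval is roughly 3.0% to 7.4%.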
The authors identify 'text collapse', a failure mode in which adding domain reports actually hurts multimodal time‑series forecasts because the model learns to ignore the text branch. Their REST‑TS method instead trains the text model to predict the residual of the numeric forecast, markedly improving accuracy and showing that text helps when it is supervised correctly.
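REST‑TS's architecture isn't specified here; this sketch only illustrates the residual‑target idea with two closed‑form linear models. Stage 1 fits a numeric forecaster on lagged values, and stage 2 fits the text branch on the numeric model's residual, so the text branch cannot be ignored: its only job is to explain what the numbers missed. The lag features and the "text embedding" matrix are hypothetical synthetic stand‑ins.

```python
import numpy as np

def fit_linear(X, y):
    """Closed-form least squares with a bias column; returns a predict function."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((Z.shape[0], 1))]) @ coef

def residual_text_forecast(num_feats, text_feats, y):
    """Two-stage forecast: numeric model first, text model on its residual."""
    numeric = fit_linear(num_feats, y)       # stage 1: numbers only
    base = numeric(num_feats)
    text = fit_linear(text_feats, y - base)  # stage 2: text predicts the residual
    return base, base + text(text_feats)
```

Because stage 2 can always fall back to predicting a zero residual, the combined in‑sample error can never exceed the numeric‑only error.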
A new paper benchmarks eight diffusion language models (DLMs) on eight tasks, from reasoning to coding, measuring both quality and compute cost. It finds that DLMs can rival autoregressive LLMs on certain benchmarks but require careful inference design, exposing a clear trade‑off between parallel generation speed and output quality.
Researchers show that reinforcement learning on realistic tasks targeting helpful traits yields broad alignment gains that hold up under adversarial pressure and transfer to new domains. This suggests a concrete training path toward models that stay honest, safe and useful even in unseen, high‑stakes settings.
The authors measure each feed‑forward block’s linear recoverability (R²_lin) via a closed‑form least‑squares fit and find that it varies sharply and non‑monotonically across blocks, from near‑linear (>0.99) to highly nonlinear (<0.3), in GPT‑2, Pythia‑160M and LLaMA‑160M. This variability isn’t set by the activation function; it is learned during training, which opens new avenues for targeted compression and analysis.
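One plausible reading of R²_lin, sketched under assumptions (an affine map with bias, R² pooled over all output dimensions): collect a block's inputs X and outputs Y, solve the least‑squares problem in closed form, and report 1 − SS_res/SS_tot. In practice X and Y would be hidden states captured at the input and output of one feed‑forward block; synthetic data stands in here.

```python
import numpy as np

def linear_recoverability(X, Y):
    """R^2 of the best affine map from block inputs X (n, d_in) to outputs Y (n, d_out),
    fit in closed form by least squares and pooled over output dimensions."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    resid = Y - Xb @ W
    ss_res = (resid ** 2).sum()
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot
```

A purely linear block scores close to 1, while a strongly oscillating map of the same inputs scores close to 0, which brackets the range the paper reports.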
A Nature paper demonstrates that Google’s Articulate Medical Intelligence Explorer (AMIE), built on Gemini’s long‑context models, performed on par with 21 primary‑care physicians in a blinded multi‑visit OSCE study, and beat them on treatment precision and guideline alignment. The results suggest conversational AI could soon support clinicians in chronic disease management.
The paper introduces Free‑Energy Signatures (Fes), a thermodynamic descriptor that treats each layer’s attention Laplacian as a Hamiltonian and extracts partition functions, entropy, heat capacity, and random‑matrix spectral form factors. Across six open‑weight LLMs and several benchmarks, a lightweight probe using Fes outperforms previous attention‑spectral detectors, raising average AUROC by 6.5 points and reaching an AUROC of 0.71 in a fully unsupervised setting.
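The paper's exact construction isn't reproduced here. Assuming textbook definitions, this sketch symmetrizes an attention map A, forms the Laplacian L = D − W, treats its eigenvalues λ_k as energy levels, and computes log Z(β) = log Σ_k e^(−βλ_k), mean energy U, entropy S = log Z + βU, heat capacity C = β² Var(λ), and the spectral form factor |Σ_k e^(−iλ_k t)|². The symmetrization, the β and t grids, and the unnormalized form factor are all assumptions.

```python
import numpy as np

def attention_laplacian(A):
    """Graph Laplacian L = D - W of a symmetrized attention map (symmetrizing keeps the spectrum real)."""
    W = 0.5 * (A + A.T)
    return np.diag(W.sum(axis=1)) - W

def free_energy_signature(A, betas, times):
    """Return rows [log Z, U, S, C] per inverse temperature, plus the spectral form factor."""
    lam = np.linalg.eigvalsh(attention_laplacian(A))  # "energy levels"
    rows = []
    for b in betas:
        w = np.exp(-b * (lam - lam.min()))   # shifted Boltzmann weights for stability
        p = w / w.sum()                      # Gibbs distribution over levels
        log_z = np.log(w.sum()) - b * lam.min()
        u = p @ lam                          # mean energy
        s = log_z + b * u                    # Gibbs entropy
        c = b ** 2 * (p @ lam ** 2 - u ** 2) # heat capacity = beta^2 * Var(energy)
        rows.append([log_z, u, s, c])
    sff = np.abs(np.exp(-1j * np.outer(times, lam)).sum(axis=1)) ** 2
    return np.array(rows), sff
```

A probe would then concatenate these features across layers and train a small classifier on them.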
Distilling a dangerous, misaligned AI has two possible outcomes, both useful: if misalignment leaks into the student, the student can serve as incriminating evidence against the teacher; if it doesn’t, we gain a capable but benign model. The post sketches concrete distillation techniques that aim to transfer capabilities while curbing the transfer of misalignment, sharpening a key AI‑safety lever.
Google DeepMind released version 0.1 of its GDM AI Control Roadmap, outlining a four‑pillar strategy: misuse prevention, human oversight, robust deployment, and control science. The report maps threat models to concrete defenses, from low‑cost chain‑of‑thought monitoring to future real‑time access controls, and ties mitigation tiers to model capability.