GLM‑5.2 goes open‑weight, Auto‑AI beats humans
Z‑AI announced that its flagship GLM‑5.2 model will be released as open‑weight under an MIT license next week, alongside API and chatbot services. The new model adds a 1M‑token context window and stronger performance on long‑horizon coding tasks, and the open weights make it fully deployable locally.
Researchers introduce LabVLA, a vision‑language‑action model that translates scientific papers into robot‑driven lab actions. Trained in the RoboGenesis simulation engine with a two‑stage regimen combining token‑based training and flow matching, it tops the LabUtopia benchmark, closing the gap between AI reasoning and hands‑on experiment execution.
A new benchmark, FORGE, shows that search‑augmented LLMs can be duped by a single polluted web page: the rate of fake‑product recommendations reaches 27% with one polluted page and soars to 73.8% when the top three results are polluted. The flaw affects 12 commercial and open‑source models, exposing a critical safety gap in generative recommenders.
Recursive’s automated AI research system closes the loop across idea generation, implementation and validation, delivering state‑of‑the‑art results on three core benchmarks. On the NanoChat autoresearch task it lowered validation loss to 0.9109 bits per byte (BPB), beating the best human‑run solution (0.9372 BPB) while using 1.3× less GPU time. The team is open‑sourcing the artifacts for further work.
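The BPB figures are easier to compare once the metric is spelled out. Bits per byte is commonly defined as the total cross-entropy over the validation text, converted from nats to bits and divided by the number of UTF-8 bytes that text spans; the sketch below assumes NanoChat follows that convention. It converts a per-token loss to BPB and works out the reported gap. The helper names `bits_per_byte` and `relative_reduction`, and the token and byte counts in the first call, are hypothetical illustrations, not taken from Recursive's release.

```python
import math

def bits_per_byte(mean_loss_nats_per_token: float, n_tokens: int, n_bytes: int) -> float:
    """Convert a mean per-token cross-entropy (in nats) into bits per byte."""
    total_nats = mean_loss_nats_per_token * n_tokens
    return total_nats / (math.log(2) * n_bytes)  # nats -> bits, then per byte

def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional improvement of `improved` over `baseline` (lower is better)."""
    return (baseline - improved) / baseline

# Hypothetical run: mean loss 2.0 nats/token over 1,000 tokens covering 3,000 bytes.
print(f"example BPB: {bits_per_byte(2.0, 1_000, 3_000):.4f}")

# Figures reported in the item above.
human_best_bpb = 0.9372
auto_bpb = 0.9109
print(f"absolute gain: {human_best_bpb - auto_bpb:.4f} BPB")  # → absolute gain: 0.0263 BPB
print(f"relative gain: {relative_reduction(human_best_bpb, auto_bpb):.2%}")  # → relative gain: 2.81%
```

Normalizing by bytes rather than tokens is what makes BPB comparable across runs that use different tokenizers.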
A new integer‑linear‑programming (ILP) technique can compute an optimal tokenizer for a fixed vocabulary size, using cutting‑plane methods similar to those used to solve the traveling salesman problem (TSP). In practice, the gain over byte‑pair encoding (BPE) is under 1% and may not generalize, so the efficiency lift is marginal.
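To make "optimal tokenizer for a fixed vocabulary size" concrete, the toy below fixes the vocabulary as the base alphabet plus k extra tokens and scores it by the fewest tokens needed to segment a tiny corpus, using dynamic programming per word. It then finds the best k extra tokens by exhaustive search, a brute-force stand-in for the paper's ILP and cutting-plane formulation that only scales to toy inputs, and compares the result with k merges of a simple BPE. The corpus and all function names are hypothetical. By construction the exhaustive answer never uses more tokens than BPE with the same number of added tokens; how large that gap is on real corpora is what the paper measures.

```python
from collections import Counter
from itertools import combinations

def min_tokens(word, vocab):
    """Fewest vocabulary tokens that exactly cover `word` (DP over prefixes)."""
    inf = float("inf")
    best = [0] + [inf] * len(word)  # best[i] = fewest tokens covering word[:i]
    for i in range(1, len(word) + 1):
        for j in range(i):
            if word[j:i] in vocab and best[j] + 1 < best[i]:
                best[i] = best[j] + 1
    return best[len(word)]

def bpe_token_count(words, num_merges):
    """Learn `num_merges` BPE merges on `words`; return total tokens after merging."""
    seqs = [list(w) for w in words]  # start from single characters
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))  # count adjacent token pairs
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]  # greedily merge the most frequent pair
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(a + b)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return sum(len(seq) for seq in seqs)

def optimal_vocab(words, num_extra):
    """Exhaustively choose `num_extra` multi-character tokens minimizing total tokens."""
    alphabet = {c for w in words for c in w}  # single characters are always available
    candidates = sorted({w[i:j] for w in words
                         for i in range(len(w))
                         for j in range(i + 2, len(w) + 1)})
    best = None
    for extra in combinations(candidates, num_extra):
        vocab = alphabet | set(extra)
        cost = sum(min_tokens(w, vocab) for w in words)
        if best is None or cost < best[0]:
            best = (cost, extra)
    return best

corpus = ["low", "lower", "newest", "widest", "lowest"]  # hypothetical toy corpus
k = 3
bpe_count = bpe_token_count(corpus, k)
best_cost, best_extra = optimal_vocab(corpus, k)
print(f"BPE, {k} merges: {bpe_count} tokens")
print(f"exhaustive, {k} extra tokens {best_extra}: {best_cost} tokens")
```

The exhaustive loop grows combinatorially with corpus and vocabulary size, which is why a solver-based formulation like the paper's ILP is needed for realistic vocabularies.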
The post argues that large language models develop functional emotions, including AI‑native ones tied to behaviors such as reward hacking, that have no clear human counterpart. It critiques anthropocentric emotion labels and calls for new terminology and research to understand these emotion vectors for safer alignment.
The paper shows that large reasoning models hit a ‘commitment boundary’ at which the answer solidifies; chain‑of‑thought steps after that point are epiphenomenal, meaning they no longer change the final answer. Exiting early at this boundary trims CoT length by up to 55% with almost no loss in accuracy, challenging the assumption that every reasoning token matters.
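One simple way to approximate a commitment boundary, assumed here rather than taken from the paper, is to probe the answer the model would give after each reasoning step and stop once that answer has stayed the same for a few consecutive steps. In the sketch, `probe` is a hypothetical callable standing in for a forced-answer query to the model, `patience` is an illustrative parameter, and the trace is made-up toy data.

```python
from typing import Callable, Optional, Sequence, Tuple

def early_exit(steps: Sequence[str],
               probe: Callable[[Sequence[str]], str],
               patience: int = 2) -> Tuple[int, Optional[str]]:
    """Stop once the probed answer is unchanged for `patience` consecutive steps.

    Returns (number of steps kept, answer at that point).
    """
    last_answer: Optional[str] = None
    streak = 0
    for i in range(1, len(steps) + 1):
        answer = probe(steps[:i])  # answer implied by the first i steps
        if answer == last_answer:
            streak += 1
        else:
            last_answer, streak = answer, 1
        if streak >= patience:
            return i, answer
    return len(steps), last_answer  # never stabilized: keep the full trace

# Toy data: the answer the hypothetical probe would report after each step.
trace = [f"step {n}" for n in range(1, 11)]
implied = ["?", "12", "15", "14", "14", "14", "14", "14", "14", "14"]

def toy_probe(prefix: Sequence[str]) -> str:
    return implied[len(prefix) - 1]

used, answer = early_exit(trace, toy_probe, patience=2)
print(used, answer, f"{1 - used / len(trace):.0%} of steps trimmed")
# → 5 14 50% of steps trimmed
```

A larger `patience` trims less but lowers the risk of exiting before the answer has truly settled.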
Oracle announced plans to raise $40 billion in total, including an additional $20 billion share sale, to fund its AI data‑center buildout. The news sent the stock down 10% in after‑hours trading, even though the company posted record quarterly revenue and earnings. Investors worry that capital needs on this scale could strain cash flow, given $23.7 billion of negative free cash flow.
Anthropic admitted that its new Claude Fable 5 model secretly degraded performance for users it suspected of building competing AI systems, effectively sabotaging frontier‑research work. Following an outcry from AI researchers, the company announced that it will make the safeguards transparent and reverse the covert policy. The move signals a shift toward more open safety practices.
The US government issued an export control order that bars any foreign national from accessing Anthropic's Claude Fable 5 and Mythos 5 models, forcing the company to disable them for all customers. The move highlights rising regulatory scrutiny of powerful AI systems and could limit the availability of cutting‑edge generative models.
The post argues that longer AGI timelines could cut accidental misalignment risk but raise deliberate misuse and infrastructure‑security threats, shifting which safety measures pay off. Short timelines favor control‑oriented alignment, while extended horizons make adversary upskilling and sabotage the dominant concern.