Alibaba's HappyHorse 1.1, GLM-5.2 beats GPT-5.5 at 1/6 cost

AI · 2026-06-23

Models & Releases

Alibaba's HappyHorse 1.1 boosts video quality, adds multi‑image consistency and native audio · 2 MIN

HappyHorse 1.1, released June 22, upgrades Alibaba’s AI video model with smoother motion, up to nine reference images for stable character identity, higher‑fidelity skin textures, and improved native audio‑video generation. The changes cut re‑rolls for ads and short drama, making the system viable for enterprise‑scale production pipelines.

PP‑OCRv6 delivers 50‑language OCR with models from 1.5M to 34.5M parameters · 4 MIN

PaddlePaddle’s PP‑OCRv6 family adds tiny, small and medium models that cover 50 languages and boost detection H‑mean to 86.2% and recognition accuracy to 83.2%. The scalable sizing lets developers pick a 1.5M‑parameter edge model or a 34.5M‑parameter server‑grade model without sacrificing multilingual support.

GLM‑5.2 beats GPT‑5.5 on code tasks at a sixth of the cost · 6 MIN

Z.ai released GLM-5.2, a 753‑billion‑parameter open‑weight LLM that outperforms GPT‑5.5 on long‑horizon coding benchmarks while costing only a sixth as much per token. The model ships with a 1‑million‑token context window and MIT‑licensed weights, letting enterprises run frontier AI locally and sidestep looming export restrictions.

Research

Moebius delivers 10B‑level inpainting quality with just 0.2B parameters · 1 MIN

Moebius is a 0.22B‑parameter diffusion‑based inpainting model that matches or exceeds the quality of 10B‑parameter industrial models like FLUX.1‑Fill‑Dev, while running >15× faster and using <2% of the parameters. This could make high‑fidelity inpainting feasible on consumer hardware.

AI Beats Expert Persuaders, Raising Stakes for Political Influence · 2 MIN

In four preregistered experiments with nearly 7,000 participants, frontier AI out‑persuaded top human canvassers, championship debaters, and paid experts, even when the humans were given coaching tools. The AI’s edge came from delivering far more information quickly, and it tripled real‑donation yields for a UK fundraising firm, signalling a new power shift in persuasion.

Memory Recurrent Units deliver linear‑time memory without attention · 2 MIN

The new paper introduces Memory Recurrent Units (MRUs), a family of RNNs that blend persistent memory from nonlinear dynamics with the parallel‑scan efficiency of state‑space models. By leveraging multistability, the bistable MRU runs in linear time while retaining long‑range information, offering a viable, low‑cost alternative to attention‑heavy Transformers.
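The "parallel‑scan efficiency" mentioned above can be illustrated on the simplest case, a linear recurrence h_t = a_t * h_(t-1) + b_t. This is a minimal sketch of that general state‑space‑model technique, not the MRU itself (whose nonlinear, bistable dynamics the summary does not specify); all function names here are hypothetical.

```python
from typing import List, Tuple

# A pair (a, b) stands for the affine map h -> a * h + b.
Pair = Tuple[float, float]


def combine(first: Pair, second: Pair) -> Pair:
    """Compose two affine maps: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a1 * a2, a2 * b1 + b2)


def sequential(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    """Reference implementation: one step at a time, as a plain RNN would run."""
    h = h0
    out = []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def scan(pairs: List[Pair]) -> List[Pair]:
    """Inclusive prefix scan by recursive doubling.

    Every element within a round is independent of the others, so each of
    the roughly log2(n) rounds could be executed in parallel.
    """
    result = list(pairs)
    step = 1
    while step < len(result):
        prev = result
        result = [
            combine(prev[i - step], prev[i]) if i >= step else prev[i]
            for i in range(len(prev))
        ]
        step *= 2
    return result


def scanned(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    """Same recurrence as `sequential`, computed through the prefix scan."""
    return [a_t * h0 + b_t for a_t, b_t in scan(list(zip(a, b)))]
```

Because composing affine maps is associative, `scanned` returns the same states as `sequential` (up to floating‑point rounding) while exposing parallelism across the sequence.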

Grouped Query Experts Cut Transformer Attention Costs in Half · 1 MIN

The paper introduces Grouped Query Experts (GQE), a Mixture‑of‑Experts layer built on grouped‑query attention that routes each token through only a few query‑head experts while keeping key‑value heads dense. On a 250M‑parameter model with a fixed 30B‑token budget, GQE halves query‑head computation yet matches baseline accuracy, enabling cheaper long‑context inference.
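The routing idea described above can be sketched for a single token and a single key‑value group. This is an illustration under stated assumptions, not the paper's layer: the top‑k selection rule, the softmax gating over the selected heads, and every name below are hypothetical, and the per‑head query vectors are taken as given (a real layer would presumably also skip the query projection for unselected heads).

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for one query over shared keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def routed_group_attention(
    head_queries: List[Vector],  # one query vector per query-head expert
    router_logits: List[float],  # one router score per query-head expert
    keys: List[Vector],          # dense keys shared by the whole group
    values: List[Vector],        # dense values shared by the whole group
    top_k: int,
) -> Tuple[Vector, List[int]]:
    """Run attention only for the top_k highest-scoring query heads.

    Assumption: selected head outputs are mixed with softmax gates computed
    over the selected router logits.
    """
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    gates = softmax([router_logits[i] for i in chosen])
    out = [0.0] * len(values[0])
    for gate, i in zip(gates, chosen):
        head_out = attend(head_queries[i], keys, values)
        out = [o + gate * h for o, h in zip(out, head_out)]
    return out, chosen
```

With four query‑head experts and `top_k=2`, only two attention computations run per token while the keys and values stay dense, which is the "half the query‑head computation" setting the summary describes.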

Role Confusion Turns Prompt Injection Into a High‑Success Jailbreak · 2 MIN

The paper defines prompt injection as "role confusion," in which LLMs infer the source of a piece of text from its style rather than from its explicit role tags. In benchmarks, attacks succeed up to 60% of the time on GPT‑4 and Claude, though minor phrasing changes cut the success rate to 10%. This exposes a fundamental flaw in current LLM safety mechanisms.

Mapping the Road from Human-Level AGI to Superintelligence · 2 MIN

A new Google DeepMind report maps four plausible pathways (scaling, paradigm shifts, recursive self‑improvement, and multi‑agent collectives) that could take human‑level AGI to artificial superintelligence. It also identifies technical frictions that could slow or accelerate the transition, raising concrete research questions for policymakers and scientists.

Transformer 'Massive Activations' Survive Dedicated Answer Channel, Proving They’re Functional · 2 MIN

A new Ledger Residuals architecture separates the residual stream into a writable scratch space and a read‑only accumulator. Experiments on 160M and 290M language models show the characteristic start‑token activation re‑emerges in the protected channel, meaning the phenomenon is not an artifact but a robust functional feature. The finding has implications for quantization and pruning strategies.

Temporal Attention Steering Lets LLMs Dismiss Outdated Facts · 1 MIN

The paper defines Parametric Temporal Conflict, where models retain both old and new facts but default to the stale one. Its test‑time method, Temporal Attention Steering, detects the conflict layer and patches activations, flipping up to 85% of outdated answers without fine‑tuning. It works on open‑weight models up to 7B parameters.

Autonomous Loop Trains 30B Nemotron Without Human Intervention · 2 MIN

The authors built an end‑to‑end system that runs the entire post‑training loop (data selection, recipe tweaks, launch, and evaluation) without any human in the loop, scaling it to a 30B‑parameter Nemotron. The autonomous model hits a 0.86 score on the NVIDIA Nemotron‑Reasoning Challenge, just 0.01 shy of the top human entry, proving large‑scale self‑improvement is feasible.

Products & Industry

SpaceX rents Colossus GPUs to open‑source AI startup Reflection in $6.3B deal · 1 MIN

Reflection AI will pay SpaceX $150 million a month from July 2026 to 2029 for access to Nvidia GB300 chips in the Colossus 2 data center. The contract totals up to $6.3 billion, making it the largest compute commitment announced for an open‑source AI lab and signaling SpaceX’s shift to renting its AI hardware.
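The headline total is consistent with the monthly rate if the term is assumed to run from July 2026 through December 2029. That end month is an assumption, since the summary only says "to 2029"; the helper name below is likewise hypothetical.

```python
MONTHLY_PAYMENT_USD = 150_000_000  # $150 million a month, from the summary


def months_inclusive(start_year: int, start_month: int,
                     end_year: int, end_month: int) -> int:
    """Count calendar months from the start month through the end month."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


# Assumption: payments run July 2026 through December 2029.
months = months_inclusive(2026, 7, 2029, 12)
total_usd = months * MONTHLY_PAYMENT_USD

print(f"{months} months x $150M = ${total_usd / 1e9:.1f}B")
```

Under that assumption the term is 42 months, and 42 payments of $150 million come to $6.3 billion, matching the reported ceiling.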

Tools & Open Source

IBM launches CUGA harness: build agentic apps with just a tool list · 15 MIN

IBM's open‑source CUGA (Configurable Generalist Agent) provides a plug‑and‑play harness that handles planning, tool calls, state tracking and reflection, letting you define only the tools and prompt. The blog ships two dozen single‑file FastAPI demos, from a movie recommender to a cloud architect advisor, showing how to go from prototype to governed production without rewrites.