Zalando hits 1M RPS, Context Graph beats vector RAG

Data · 2026-06-26

Data Engineering

Zalando’s in‑process load balancer shatters the 1M RPS barrier · 30 MIN

Zalando swapped the shared Skipper edge router for an in‑process client‑side balancer on its Product Read API, handling over a million internal requests per second. The new stack adds only microseconds of latency, cuts infrastructure costs and isolates failures, making the service more resilient and keeping checkout flows fast.
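As a rough illustration of what "in‑process client‑side balancing" means, the sketch below picks a backend inside the calling process instead of sending every request through a shared router. The round‑robin policy, the `ClientSideBalancer` class, and the endpoint names are assumptions made for illustration; the summary does not say which selection policy Zalando uses.

```python
class ClientSideBalancer:
    """Hypothetical in-process balancer: round-robin with endpoint ejection."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._unhealthy = set()
        self._cursor = 0  # not thread-safe; a real client would guard this

    def mark_down(self, endpoint):
        # Stop routing to an endpoint after the caller observes failures.
        self._unhealthy.add(endpoint)

    def mark_up(self, endpoint):
        self._unhealthy.discard(endpoint)

    def pick(self):
        # Try each endpoint at most once per call, skipping ejected ones.
        n = len(self._endpoints)
        for _ in range(n):
            endpoint = self._endpoints[self._cursor % n]
            self._cursor += 1
            if endpoint not in self._unhealthy:
                return endpoint
        raise RuntimeError("no healthy endpoints")
```

Because the choice is a local function call, it adds no network hop, and a failing backend is ejected per client rather than at a shared edge component.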

Netflix replaces custom batch scheduler with Kueue for Kubernetes‑native scaling · 7 MIN

Netflix swapped its homegrown Compute Managed Batch system for Kueue, a Kubernetes‑native job queue. By preserving API compatibility, it migrated millions of batch jobs with minimal disruption, gaining tighter integration with its Titus platform and easier scaling. The move signals a broader shift toward cloud‑native scheduling in large‑scale media pipelines.

Meta's Hybrid LLM‑Human Pipeline Classifies Data Assets for Scalable Privacy Controls · 23 MIN

Meta announced a hybrid system that pairs LLM‑driven interpretation with human‑reviewed, versioned rules to classify any data asset, from table columns to embeddings, across its AI‑native stack. The pipeline delivers precise, low‑latency privacy enforcement (retention, access, purpose, sharing) while keeping humans in the loop for novel or ambiguous cases, turning noisy signals into auditable compliance evidence.

Analytics & Visualization

Climate.us launches nonprofit hub to keep climate data public after Climate.gov shutdown · 17 MIN

Former NOAA staff revived the trusted Climate.gov content as Climate.us, a nonprofit site that aggregates climate data, assessments, and visual tools. Launched in June 2026, it safeguards resources like the deleted Fifth National Climate Assessment, giving the public a stable, non‑partisan source for climate literacy and decision‑making.

ML & AI for Data

Context Graph Boosts Multi-Agent Memory Accuracy to 89% vs 50% for Vector RAG · 11 MIN

A benchmark of three memory architectures shows flat transcripts hit 61% accuracy, vector-only RAG drops to 50%, while a context‑graph layer reaches 88.9% with only 27 tokens per query. The graph stores entities and relations, solving the structural blind spot that makes multi‑agent recall brittle.
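To make "stores entities and relations" concrete, here is a minimal sketch of a graph memory that keeps facts as (subject, relation, object) triples and answers a query by walking outward from an entity. The `ContextGraph` class, its methods, and the example facts are hypothetical; the benchmarked system's actual schema and retrieval logic are not described in this summary.

```python
from collections import defaultdict


class ContextGraph:
    """Hypothetical entity-relation memory, stored as outgoing edges per entity."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, subject, relation, obj):
        self._out[subject].append((relation, obj))

    def facts(self, entity, hops=1):
        # Breadth-first walk: collect triples reachable within `hops` edges.
        seen = {entity}
        frontier = [entity]
        result = []
        for _ in range(hops):
            next_frontier = []
            for node in frontier:
                for relation, obj in self._out.get(node, []):
                    result.append((node, relation, obj))
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
        return result
```

A query returns only the handful of triples linked to the entity rather than whole transcript chunks, which is one plausible reason a graph lookup can stay small in tokens.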

Hybrid LLMs Beat Transformers on Meaningful Tokens, Lose on Exact Repeats · 6 MIN

AllenAI’s Olmo Hybrid outperforms its pure‑transformer sibling on tokens that carry semantic weight (nouns, verbs, adjectives, and coreferential pronouns), while the transformer wins on verbatim copy tasks. The gap stems from the hybrid’s recurrent layers, which excel at sequential reasoning but sacrifice exact recall. The result suggests that hybrid architectures trade exact recall for deeper understanding.

AI‑crafted stories expose what language patches of the brain actually detect · 5 MIN

Microsoft Research’s generative causal testing (GCT) turns opaque language‑model predictions into short, testable hypotheses about cortical patches, such as “food preparation” or “clock times”. By presenting LLM‑written stories to fMRI subjects, researchers show that the targeted region lights up only when the explanation is correct, closing the gap between prediction and neuroscientific understanding.

Zepto’s Dual‑Sequence Ranker Merges History and Session Data for Millisecond‑Scale Personalisation · 10 MIN

Zepto introduced a dual‑sequence re‑ranker that encodes a shopper’s long‑term purchase history and their in‑session actions with separate transformer encoders. A cross‑attention fusion layer merges the two streams, delivering personalized rankings in under 50 ms for millions of daily users. This architecture lets the app instantly shift recommendations between routine repurchases and context‑driven cross‑sells, boosting conversion rates.
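The fusion step can be pictured with plain scaled dot‑product cross‑attention, where each in‑session state queries the encoded purchase history. This is a simplified single‑head sketch with no learned projections; the function names and the choice of session states as queries are assumptions, not details of Zepto's model.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query vector returns a weighted average of the value vectors."""
    dim = len(keys[0])
    width = len(values[0])
    fused = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return fused
```

Called as `cross_attention(session_states, history_states, history_states)`, it yields one fused vector per session step, each a convex combination of the history vectors.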

Monte Carlo’s Agent Health flags stale LLM in its own troubleshooting bot · 5 MIN

Monte Carlo’s internal Agent Health monitor caught its Troubleshooting Agent using an outdated Anthropic Haiku LLM in 595 daily calls. The silent drift could have wasted credits and degraded diagnostics, showing that continuous trace analysis is essential to keep production agents up to date and reliable.
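A check of this kind can be as simple as counting model identifiers in agent traces and flagging any that are not on an approved list. The trace schema (a `model` key per call), the function name, and the placeholder model names below are hypothetical and do not reflect Monte Carlo's actual implementation.

```python
from collections import Counter


def stale_model_calls(traces, approved_models):
    """Return {model: call_count} for models not in the approved set."""
    counts = Counter(trace["model"] for trace in traces)
    return {
        model: count
        for model, count in counts.items()
        if model not in approved_models
    }


# Hypothetical usage with placeholder model names.
traces = [
    {"agent": "troubleshooting", "model": "model-old"},
    {"agent": "troubleshooting", "model": "model-old"},
    {"agent": "summarizer", "model": "model-current"},
]
flagged = stale_model_calls(traces, approved_models={"model-current"})
```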

Gemma models recall facts via a three‑phase circuit that leans on the residual stream · 8 MIN

Activation‑patching on Gemma‑2B and Gemma‑12B‑IT shows that factual recall proceeds in three stages (encoding, routing, and read‑out), mostly via the residual stream. The study maps these phases across transformer layers, revealing a consistent circuit that scales with model size. Code and data are released for reproducibility.

Databases & Storage

Linear elastic caching trims cloud cache costs with rental‑style memory pricing · 6 MIN

Google Research’s linear elastic caching reframes in‑memory cache size as a variable, rental‑style cost and uses a ski‑rental algorithm to set per‑item TTLs. The approach cuts total cache spend while keeping latency low, and the paper shows measurable savings on cloud workloads.
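The ski‑rental framing can be illustrated with the classic break‑even rule: keep an item cached until the rent paid since its last access equals the cost of one miss, then evict. This is the textbook deterministic rule, used here as an assumption; the paper's actual algorithm and cost model may differ, and both function names are hypothetical.

```python
def break_even_ttl(miss_cost, rent_per_second):
    """TTL at which accumulated rent equals the cost of one cache miss."""
    return miss_cost / rent_per_second


def ttl_policy_cost(access_times, ttl, rent_per_second, miss_cost):
    """Total cost for one item when its TTL resets on every access.

    access_times must be sorted and non-empty.
    """
    cost = miss_cost  # the first access always misses
    for prev, nxt in zip(access_times, access_times[1:]):
        gap = nxt - prev
        if gap <= ttl:
            cost += gap * rent_per_second  # still cached: pay rent only
        else:
            cost += ttl * rent_per_second + miss_cost  # expired, then missed
    cost += ttl * rent_per_second  # rent paid after the final access
    return cost
```

With a miss cost of 10 and rent of 1 per second, the break‑even TTL is 10 seconds; accesses at t = 0, 5, and 30 then cost 45 in total, counting two misses and 25 seconds of rent.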