Netflix kills Kafka, MCP reshapes BI, Sail 0.3 speeds Spark

Data · 2026-06-27

Data Engineering

Pinterest rolls out Moka, a Spark‑on‑EKS platform to trim costs and boost scale · 11 MIN

Pinterest’s Big Data Platform team built Moka, a Spark‑on‑EKS service that runs batch Spark jobs in containers on AWS. By moving from Hadoop to Kubernetes, Moka promises better performance, lower ops costs, and easier scaling for non‑sensitive data workloads.

Netflix replaces Kafka with Raw Hollow for faster, simpler Tudum updates · 7 MIN

Netflix swapped Tudum's event‑stream backbone from Kafka to Raw Hollow, a snapshot‑oriented data format that pushes up‑to‑date content directly to client devices. The move cuts latency, simplifies the read path, and scales more predictably for millions of monthly fans.

Sail 0.3 adds Rust‑native Spark engine, slashing latency for Spark 4.0 · 4 MIN

Sail 0.3 replaces the Java Spark server with a Rust‑native implementation that speaks the Spark Connect protocol. It supports Spark 4.0 and 3.5, cuts object‑store latency, and ships a lightweight PySpark client, giving data teams faster, cheaper batch and streaming workloads without code changes.

Analytics & Visualization

MCP gives AI direct data access, reshaping BI and visualization · 7 MIN

Model Context Protocol (MCP) lets LLMs like Claude hook straight into PDFs, databases and BI tools via a single open spec. That eliminates custom connectors, letting AI fetch and visualize data on the fly, which could sideline many visualization and reporting roles.

ML & AI for Data

Frozen Multi‑Token Prediction Slashes Gemini Nano Latency on Pixel Phones · 5 MIN

Google Research unveiled frozen Multi-Token Prediction (MTP), a retrofit that speeds up Gemini Nano on Pixel phones without fine‑tuning separate draft models. By integrating a lightweight transformer into the frozen model, MTP cuts inference latency and power use, making on‑device AI features like notification summaries and proofreading snappier and more battery‑friendly.

Run Three LLM Agents on a Single 8 GB GPU with C++ Multiplexing · 11 MIN

A tiny C++ daemon multiplexes transformer layers and enforces admission control so that three distinct LLMs (SmolLM, Qwen2, and Llama) can share an 8 GB GTX 1080 without out‑of‑memory crashes. The trick lets developers run parallel agents on legacy GPUs and avoid costly upgrades.

NUMA‑aware scheduling can double PyTorch throughput on multi‑socket servers · 11 MIN

Binding processes to the correct NUMA nodes can double PyTorch throughput on multi-socket servers. The article shows how NUMA‑aware CPU‑GPU coordination reduces memory latency and boosts training speed, turning a common bottleneck into a performance lever for large‑scale deep learning workloads.

Why RAG Benchmarks Can Mislead: Hidden Overfitting Inflates Scores · 10 MIN

RAG benchmarks often become accidental training sets when developers tweak models based on the same test queries. This hidden overfitting inflates scores but masks real retrieval failures, meaning deployed systems may miss critical information. The post shows how to break the cycle and keep evaluation truly unseen.

Databases & Storage

How PostgreSQL Outpaced Kafka to Handle 100k Events/sec · 19 MIN

RudderStack proved PostgreSQL can sustain 100,000 events per second as a streaming queue, sidestepping Kafka. They tamed table bloat, rewrote indexing, and managed retry storms, turning a simple relational DB into a high‑throughput, resilient backbone. The playbook shows teams can leverage existing SQL skills to cut ops complexity while scaling massive event pipelines.

ClickHouse Cloud adopts fully stateless compute, cutting disk dependency · 20 MIN

ClickHouse Cloud now runs compute without any local disks, using a new in‑memory engine backed by a Shared Catalog that centralizes metadata. This eliminates warm‑up, enables instant elastic scaling, and adds atomic DDL features like cross‑db renames and UNDROP, boosting reliability and speed for cloud workloads.