Half of adults can’t afford healthcare

Data · 2026-06-24

Data Engineering

SQLBuild trims dbt rebuilds by reusing unchanged models · 7 MIN

SQLBuild hooks into any dbt project and skips rebuilding unchanged models by fingerprinting them in the warehouse and reusing production‑shaped tables. This sharply reduces development build time without altering your dbt files or adding extra models, so teams can iterate faster on data pipelines.
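The fingerprint‑and‑skip idea can be illustrated with a minimal sketch. This is not SQLBuild's actual implementation: the functions `fingerprint` and `models_to_rebuild`, the choice of SHA‑256, and the model names are all assumptions made for illustration. The sketch hashes each model's SQL together with the fingerprints of its upstream models, then rebuilds only the models whose fingerprint differs from the stored one.

```python
import hashlib


def fingerprint(model_sql, upstream_fingerprints):
    """Hash a model's SQL together with the fingerprints of its upstream models.

    Hypothetical helper: whitespace is collapsed naively, which would also
    alter whitespace inside SQL string literals.
    """
    h = hashlib.sha256()
    h.update(" ".join(model_sql.split()).encode("utf-8"))
    for fp in sorted(upstream_fingerprints):
        h.update(fp.encode("utf-8"))
    return h.hexdigest()


def models_to_rebuild(models, stored):
    """Return (names needing a rebuild, current fingerprints).

    models: dict of name -> (sql, [upstream names]), listed in dependency order.
    stored: dict of name -> fingerprint recorded at the last successful build.
    """
    current = {}
    rebuild = []
    for name, (sql, deps) in models.items():
        current[name] = fingerprint(sql, [current[d] for d in deps])
        if stored.get(name) != current[name]:
            rebuild.append(name)
    return rebuild, current


# Made-up models for illustration only.
models = {
    "stg_orders": ("select * from raw.orders", []),
    "fct_orders": ("select order_id from stg_orders", ["stg_orders"]),
    "dim_customers": ("select * from raw.customers", []),
}

first_rebuild, stored = models_to_rebuild(models, {})        # nothing stored yet
second_rebuild, _ = models_to_rebuild(models, stored)        # nothing changed

models["stg_orders"] = ("select order_id, amount from raw.orders", [])
third_rebuild, _ = models_to_rebuild(models, stored)         # one model edited
```

Because upstream fingerprints feed into each downstream hash, editing `stg_orders` also invalidates `fct_orders`, while the unrelated `dim_customers` is left alone.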

Postgres‑native Iceberg tiering removes ETL bottlenecks · 8 MIN

ColdFront lets PostgreSQL store recent rows natively while archiving older data to an Apache Iceberg lake, all behind a single SQL surface. The extension rewrites DML to the correct tier, removing the need for separate ETL pipelines and enabling transparent hot‑cold queries directly from Postgres.

Idempotent pipelines stop silent duplicate data disasters · 11 MIN

A single retry in an Airflow DAG doubled order rows, exposing how silent duplicates slip past logs. The article shows that making each load idempotent guarantees the same end state no matter how many times it runs, eliminating hidden data corruption. Adopt idempotent patterns to keep pipelines reliable.
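One common idempotent pattern is an upsert keyed on a natural identifier. The sketch below is an assumption‑laden illustration, not the article's code: it uses an in‑memory SQLite table, and the `orders` schema, the `load_orders` function, and the sample rows are hypothetical. Running the same load twice, as a retry would, leaves the table in the same state.

```python
import sqlite3


def load_orders(conn, rows):
    """Upsert order rows so that re-running the same batch is a no-op.

    Hypothetical loader: relies on order_id being the primary key.
    """
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO orders (order_id, amount) VALUES (?, ?) "
            "ON CONFLICT(order_id) DO UPDATE SET amount = excluded.amount",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount REAL)")

batch = [("o-1", 10.0), ("o-2", 25.5)]  # made-up sample rows
load_orders(conn, batch)
load_orders(conn, batch)  # simulated retry of the same task

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

A plain `INSERT` without the conflict clause would either fail on the retry or, without a primary key, double the rows, which is the failure mode described above.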

Shopify’s Sidekick learns to refuse impossible requests through data curation · 9 MIN

Shopify’s Sidekick AI assistant was initially trained only on successful merchant queries, so it returned empty results instead of refusing impossible requests. By building a curation system that injects refusal examples and filters blind‑spot data, the team taught the model to say ‘no’, boosting reliability at scale.

Analytics & Visualization

Healthcare affordability plummets: under half of adults now cost‑secure · 1 MIN

A Gallup poll shows under 50% of U.S. adults can consistently afford quality healthcare, a five‑year low. The biggest drops hit ages 18‑29, falling from 46% to 32%, and seniors 65+, slipping from 73% to 61%. The trend signals widening affordability gaps across generations.

Netflix uses predictive models to cut content launch delays · 7 MIN

Netflix built a statistical model that predicts whether a content title will miss its planned launch date, based on historical delivery data of locked cuts and final IMF assets. By flagging high‑risk titles early, production teams can adjust schedules, reducing last‑minute compression and improving overall launch reliability.

ML & AI for Data

PP-OCRv6 adds 50‑language OCR across edge‑to‑server model sizes · 4 MIN

PaddlePaddle’s PP-OCRv6 delivers a 50‑language OCR suite in three size tiers, from 1.5 M to 34.5 M parameters, so you can pick a model that fits edge, mobile, or server needs. The medium tier hits 86.2% detection H‑mean and 83.2% recognition accuracy, a solid boost over v5, making large‑scale document ingestion more reliable.

Practice & Datasets

Cross‑Origin Storage cuts duplicate model downloads in browser‑based Transformers · 12 MIN

Hugging Face demonstrates the Cross‑Origin Storage API in Transformers.js to share model caches across domains. By deduplicating up to 177 MB of model files, apps load instantly on repeat visits, cutting bandwidth and storage. This paves the way for faster, cheaper client‑side inference at scale.

Cross‑agency violation dataset connects federal contractors to 8 regulators · 3 MIN

The new 2.8 MB CSV joins every federal contract award with enforcement actions from OSHA, WHD, MSHA, EPA ECHO, NLRB, SEC, the UVA Corporate Prosecution Registry and the SAM.gov exclusion list. Each row flags companies cited by multiple agencies, giving analysts a single view of systemic compliance problems for risk‑screening and policy research.
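As a rough illustration of how a multi‑agency flag could be derived, the sketch below groups enforcement records by company and keeps those cited by at least two distinct agencies. The column names `company` and `agency`, the function `multi_agency_companies`, and the sample rows are all hypothetical; the real CSV's schema is not described here and may differ.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; not taken from the dataset.
SAMPLE = """company,agency
Acme Corp,OSHA
Acme Corp,EPA ECHO
Beta LLC,NLRB
Acme Corp,OSHA
"""


def multi_agency_companies(csv_text, min_agencies=2):
    """Map each company cited by at least `min_agencies` distinct agencies
    to the sorted list of those agencies."""
    agencies = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        agencies[row["company"]].add(row["agency"])
    return {
        company: sorted(cited)
        for company, cited in agencies.items()
        if len(cited) >= min_agencies
    }


flagged = multi_agency_companies(SAMPLE)
```

Using a set per company means repeat citations from the same agency count once, so the flag reflects breadth across regulators rather than the raw number of actions.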