Claude audit catches data leaks; LoRA falls short

Data · 2026-06-19

Analytics & Visualization

Claude‑Powered Audit Tool Flags Hidden Data Mistakes Across the Entire Workflow · 6 MIN

Datapitfalls uses Claude to audit every step of a data workflow, from question framing to final chart, spotting the hidden biases, technical slips and statistical missteps that typical chart linters miss. It returns a structured report with concrete fixes, turning an LLM into a thinking partner for honest data work.

ML & AI for Data

Beyond LoRA: Benchmarks Reveal Faster, Smaller Fine‑tuning Wins · 13 MIN

Hugging Face’s PEFT team benchmarked five alternatives to LoRA (DoRA, LoKr, LoHa, AdaLoRA and FourierFT) across classification, generation and retrieval tasks. Several methods topped LoRA in accuracy while cutting parameter counts by up to 50%, showing you can fine‑tune larger models with less memory. This expands the toolbox for anyone needing efficient adaptation.
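For a sense of scale, here is a minimal sketch of how LoRA's own adapter size is usually counted: a rank-r update to a d_out × d_in weight matrix adds r × (d_in + d_out) trainable parameters. The layer size and ranks below are illustrative assumptions, not figures from the benchmark, and the other methods in the comparison parameterize their updates differently.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in a LoRA adapter: B is (d_out, rank), A is (rank, d_in)."""
    return rank * (d_in + d_out)


# Hypothetical 4096 x 4096 projection layer (an assumption for illustration).
d = 4096
full = full_param_count(d, d)
lora_r16 = lora_param_count(d, d, rank=16)
lora_r8 = lora_param_count(d, d, rank=8)

print(f"full matrix : {full:,} parameters")
print(f"LoRA r=16   : {lora_r16:,} ({lora_r16 / full:.2%} of full)")
print(f"LoRA r=8    : {lora_r8:,} ({lora_r8 / full:.2%} of full)")
```

In this toy setting, halving the rank halves the adapter size, which is the order of saving the blurb describes. Whether accuracy holds at the smaller size is what a benchmark has to establish.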

MosaicLeaks reveals AI agents leak private data via web queries · 9 MIN

ServiceNow introduces MosaicLeaks, a benchmark that measures how multi‑step AI research agents unintentionally expose confidential information through their external queries. Across tested models, agents frequently leaked intent, answers, or full facts, and standard training worsened the problem. A privacy‑aware RL fine‑tuning method cuts full‑information leakage from 34% to under 10%.

When to Use JSON Mode vs. Function Calling for Reliable LLM Outputs · 10 MIN

LLMs can now return machine‑readable data via two distinct mechanisms: JSON mode, which forces a parsable JSON blob, and function calling, which enforces a predefined schema and can trigger downstream actions. The article maps each approach to real‑world use cases, helping engineers pick the right tool for reliable data pipelines.
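The difference can be sketched locally without calling any provider. In the hypothetical example below, the model outputs are hard-coded strings, and the tool name `record_temperature`, its schema and the `dispatch` helper are illustrative assumptions rather than any vendor's API. Real function calling has the provider constrain the arguments; here a small validator stands in for that step.

```python
import json

# JSON-mode style: the output is guaranteed to parse, but nothing checks its shape.
json_mode_output = '{"city": "Paris", "temp_c": "mild"}'  # hard-coded stand-in
parsed = json.loads(json_mode_output)  # parses fine, yet temp_c is not a number

# Function-calling style: a declared schema plus a callable to run (hypothetical).
TOOL_SCHEMA = {
    "name": "record_temperature",
    "parameters": {"city": str, "temp_c": float},
}


def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of schema violations (empty if the arguments conform)."""
    params = schema["parameters"]
    errors = []
    for key, expected in params.items():
        if key not in args:
            errors.append(f"missing argument: {key}")
        elif not isinstance(args[key], expected):
            errors.append(f"{key} should be {expected.__name__}")
    for key in args:
        if key not in params:
            errors.append(f"unexpected argument: {key}")
    return errors


def record_temperature(city: str, temp_c: float) -> str:
    """The downstream action triggered by a valid tool call."""
    return f"{city}: {temp_c:.1f} C"


TOOLS = {"record_temperature": record_temperature}


def dispatch(tool_call_json: str) -> str:
    """Validate a tool call against the schema, then run the matching function."""
    call = json.loads(tool_call_json)
    if call["name"] not in TOOLS:
        raise ValueError(f"unknown tool: {call['name']}")
    errors = validate_args(TOOL_SCHEMA, call["arguments"])
    if errors:
        raise ValueError("; ".join(errors))
    return TOOLS[call["name"]](**call["arguments"])


tool_call = '{"name": "record_temperature", "arguments": {"city": "Paris", "temp_c": 21.5}}'
print(dispatch(tool_call))
print(validate_args(TOOL_SCHEMA, parsed))  # the JSON-mode blob fails the same schema
```

The JSON-mode blob parses but would fail downstream, whereas the schema-checked call either runs the action or is rejected with a specific error.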

ForecastOps gives Python forecasts a local UI and full run tracking · 2 MIN

ForecastOps adds a local-first observability layer to any Python forecasting code. A single line added after .predict() captures each run, stores it in DuckDB, and spins up a read‑only UI with metrics, diagnostics, and comparisons, with no cloud dependency and no data exfiltration. It lets data teams track reproducibility and model performance in production without rebuilding pipelines.

Practice & Datasets

Granular Polymarket crypto order‑book & trade data now openly available · 3 MIN

A public, MIT‑licensed dataset delivers over 26 M high‑frequency order‑book snapshots and 23 M trade records from Polymarket's crypto up/down markets. It spans BTC, ETH, SOL, XRP (and more) across 5‑minute and 15‑minute windows, giving researchers a rare, granular view of prediction‑market microstructure.