✦ Writing
Recent posts
A pedagogical paradox: the strongest teacher agent (Claude Opus 4.6) produces the weakest students, while the weakest teacher (DeepSeek-V3.2) produces the strongest. Environment-Grounded Supervision explains why, and turns that finding into agent training that is 30× more data-efficient. 8–10 min read.
SWE-Review-30B-A3B is an agentic reviewer that explores the repo, traces the root cause, and returns structured feedback — lifting resolve rate by up to +8.3pp and reaching 38.4% with just 2.44 samples on average. 7–9 min read.
A close read of the paper behind the pipeline — 32k curated instances, 18k validated trajectories, error masking, a difficulty curriculum, and a test-time verifier that lifts an 8B model to 49.6% on SWE-bench Verified, no RL required. 8–10 min read.