"Which city in Paris are you staying in?" - A post-training study on small-model specialization with an autoresearch loop

Across 77 experiments, covering every lever from prompting and retrieval to fine-tuning and 2025's rubric-grounded RL, an on-device 3B model landed within about one rubric point of a frontier cloud model on every quality dimension: close, but consistently a notch below. None of the post-training playbook closed the last notch, and the controlled tests show why: the gap is a broad capability limit, not a single missing skill.
Digital garden
alexisrondeau on Obsidian Publish
A public thinking space: interconnected notes for ideas, work, and the things worth remembering.
A few worth starting with