The wall that wasn't coverage: A post-training study on small-model specialization with an autoresearch loopAcross 77 experiments — every prompting, retrieval, decoding, pipeline, fine-tuning, preference-learning, reinforcement-learning and adversarial-distillation lever we could find — an on-device 3B model came within about one rubric point of a frontier cloud model on every quality dimension: close, but consistently a notch below, across the board. This is the record of how far it got: we applied the current post-training playbook — from established fine-tuning to 2025's rubric-grounded RL — and none of it closed the last notch. The controlled tests then show the gap is a broad capability limit, not a single missing skill.
Digital garden
alexisrondeau on Obsidian PublishA public thinking space — interconnected notes for ideas, work, and the things worth remembering.A few worth starting with