More data doesn't fix SGD on algorithmic tasks
BLOOM-75M trained with up to 100k examples. LLM-PV needs only 200.