Length generalization: train at n=10, test at n=10…100

All methods are trained on 200 examples at input length 10. LLM-PV synthesizes programs that work at any input length.
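The evaluation protocol can be sketched as follows. This is a minimal illustration, not the actual experimental code: the parity task, the helper names (`sample`, `synthesized_program`, `accuracy`), the step of 10 between test lengths, and the 100 test examples per length are all assumptions made for the example. It shows only why a program written as a loop over its input is not tied to the training length.

```python
import random

def parity(bits):
    # Hypothetical target task (an assumption): 1 if the count of ones is odd.
    return sum(bits) % 2

def sample(n, count, rng):
    # Draw `count` random bit strings of length n, each paired with its label.
    data = []
    for _ in range(count):
        x = [rng.randint(0, 1) for _ in range(n)]
        data.append((x, parity(x)))
    return data

def synthesized_program(bits):
    # Stand-in for a synthesized program: it iterates over the input,
    # so nothing in it depends on the training length of 10.
    acc = 0
    for b in bits:
        acc ^= b
    return acc

def accuracy(predict, data):
    # Fraction of examples on which the prediction matches the label.
    return sum(predict(x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = sample(10, 200, rng)        # 200 training examples at n=10
test_lengths = range(10, 101, 10)   # n = 10, 20, ..., 100 (step is an assumption)
results = {n: accuracy(synthesized_program, sample(n, 100, rng))
           for n in test_lengths}

for n, acc in results.items():
    print(f"n={n:3d}  accuracy={acc:.2f}")
```

A fixed-input-size model would need a separate mechanism to accept inputs longer than 10, whereas the looped program above applies unchanged at every test length.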