Task:
1. Training data. Split the m = 200 labeled examples into a training set S_tr (100 examples) and a validation set S_val (100 examples).
2. Build prompt from S_tr. Format the input/output examples in S_tr into a code-generation prompt for the LLM.
3. LLM proposes k = 5 candidate programs. The programs are sampled independently: there are no gradient updates and no adaptive feedback between samples.
4. Execute and score on S_val. Compile each candidate in a sandbox and compute its validation error on S_val.
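5. Select the best program by validation error. Return h* = argmin of the validation error over the k candidates, i.e. the candidate with the lowest error on S_val. This is standard empirical risk minimization (ERM) selection.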
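The five steps above can be sketched end to end in Python. This is a minimal offline sketch under stated assumptions, not the method's actual implementation: `propose_programs` is a hypothetical stand-in for the LLM that returns a fixed pool of five candidate sources, the toy task (label 1 if and only if x is a multiple of 3) is invented for illustration, the seeded shuffle is just one way to perform the split, and running `exec` with a reduced builtins table only imitates sandboxed compilation; it provides no real isolation. Validation error is taken to be the fraction of S_val a candidate mislabels, with compile failures and runtime crashes counted as errors.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[int, int]  # (input x, label y)


def split_data(data: Sequence[Example], n_train: int,
               seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Step 1: shuffle with a fixed seed (assumption), then split into S_tr and S_val."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def build_prompt(train: Sequence[Example]) -> str:
    """Step 2: format the I/O examples in S_tr as a code-generation prompt."""
    lines = ["Write a Python function f(x) that reproduces these examples:"]
    lines += [f"f({x}) == {y}" for x, y in train]
    return "\n".join(lines)


def propose_programs(prompt: str, k: int) -> List[str]:
    """Step 3 (HYPOTHETICAL stand-in for the LLM): return k candidate sources.

    A real system would sample k independent completions for `prompt`;
    a fixed pool is returned here so the sketch runs offline.
    """
    pool = [
        "def f(x):\n    return x % 2",
        "def f(x):\n    return int(x % 3 == 0)",
        "def f(x):\n    return 0",
        "def f(x) return x",             # syntax error: fails to compile
        "def f(x):\n    return 1 // 0",  # compiles, but crashes when called
    ]
    return pool[:k]


def compile_candidate(source: str) -> Optional[Callable[[int], int]]:
    """Step 4a: exec the source with a tiny builtins table.

    NOTE: this is NOT a real security sandbox; it only keeps the sketch short.
    """
    env = {"__builtins__": {"int": int}}
    try:
        exec(source, env)
    except Exception:
        return None
    f = env.get("f")
    return f if callable(f) else None


def validation_error(source: str, val: Sequence[Example]) -> float:
    """Step 4b: fraction of S_val mislabeled; crashes count as mistakes."""
    f = compile_candidate(source)
    if f is None:
        return 1.0  # a candidate that does not compile gets the worst error
    wrong = 0
    for x, y in val:
        try:
            wrong += f(x) != y
        except Exception:
            wrong += 1
    return wrong / len(val)


def select_best(candidates: Sequence[str],
                val: Sequence[Example]) -> Tuple[str, float, List[float]]:
    """Step 5: h* = argmin of validation error (ties go to the earliest candidate)."""
    errors = [validation_error(c, val) for c in candidates]
    best = min(range(len(candidates)), key=errors.__getitem__)
    return candidates[best], errors[best], errors


# Toy task (hypothetical): the label is 1 iff x is a multiple of 3.
data = [(x, int(x % 3 == 0)) for x in range(200)]   # m = 200
s_tr, s_val = split_data(data, n_train=100)          # 100 / 100
candidates = propose_programs(build_prompt(s_tr), k=5)
h_star, best_err, all_errors = select_best(candidates, s_val)
print(h_star)
print(best_err, all_errors)
```

Because the k candidates are generated once and only scored on S_val, the last step is plain ERM over a finite set of five hypotheses. Assigning the worst possible error to candidates that fail to compile or crash keeps one bad sample from aborting the whole run.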