Revisiting ERM in the LLM Era

March 20, 2026

How pretrained LLMs can serve as search priors for empirical risk minimization over programs, recovering exact algorithmic rules from a handful of examples.


Why SGD Prefers Low-Rank Neural Networks

March 11, 2026

Why do trained neural networks often end up low rank? Mini-batch SGD and weight decay together create a built-in pressure toward compressible layers.


Why Pretrained Classifiers Work So Well in Few-Shot Learning

March 11, 2026

A geometric explanation for why ordinary supervised pretraining can transfer remarkably well to new classes with only a few labeled examples.


Self-Supervised Learning ≈ Supervised Learning

March 11, 2026

Contrastive learning is often much closer to supervised contrastive learning than it first appears, both in its objective and in the geometry of the representations it learns.