Machine learning

Visualizing random vs grid search

Random search usually works better than grid search for hyperparameter optimization. This brief post suggests a geometric way to visualize why.

Dec 22, 2021 2 min read
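
As a rough illustration of one common explanation for the claim above (an assumption here; it is not necessarily the visualization the post itself uses): with the same budget of trials, a grid reuses the same few values along each axis, while random sampling tries a fresh value on every axis in every trial. The helper names below are hypothetical, and the search space is taken to be the unit square.

```python
import itertools
import random


def grid_points(n_per_axis):
    """Evenly spaced grid on the unit square (hypothetical 2-D search space)."""
    axis = [(i + 0.5) / n_per_axis for i in range(n_per_axis)]
    return list(itertools.product(axis, axis))


def random_points(n_points, seed=0):
    """Uniform random samples on the same unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_points)]


def distinct_values_per_axis(points):
    """Count how many different values each hyperparameter actually gets."""
    return tuple(len({p[d] for p in points}) for d in range(2))


grid = grid_points(3)     # 9 trials arranged as a 3 x 3 grid
rand = random_points(9)   # 9 trials drawn at random

print(distinct_values_per_axis(grid))  # (3, 3)
# Random draws almost never repeat a coordinate, so each axis
# sees far more distinct values than the grid's 3.
print(distinct_values_per_axis(rand))
```

If only one of the two hyperparameters matters much, the grid has effectively spent 9 trials to test 3 settings of it, which is the intuition this sketch is meant to convey.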

Troubles with the Bias-Variance tradeoff

The bias-variance tradeoff is a key idea in machine learning. But I’ll argue that we know surprisingly little about it: when does it hold? How does it relate to the Double Descent phenomenon? And what do we even formally mean when we talk about it?

Apr 7, 2021 13 min read

L1 regularization: sparsity through singularities

L1 regularization is famous for leading to sparse optima, in contrast to L2 regularization. There are several ways of understanding this, but I’ll argue that it’s really all about one fact: the L1 norm has a singularity at the origin, while the L2 norm does not. And this is not just true for L1 and L2 regularization: singularities are always necessary to get sparse weights.

Feb 17, 2021 6 min read
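
A minimal one-dimensional sketch of the contrast described above (not taken from the post; the function names and the quadratic loss are assumptions chosen for illustration). Minimizing 0.5*(w - a)^2 plus an L1 penalty gives the soft-thresholding rule, which returns exactly zero whenever |a| <= lam, because of the kink of |w| at the origin. The smooth L2 penalty only rescales a and reaches zero only if a is already zero.

```python
def l1_minimizer(a, lam):
    """argmin over w of 0.5*(w - a)**2 + lam*abs(w), i.e. soft-thresholding."""
    magnitude = abs(a) - lam
    if magnitude <= 0:
        return 0.0  # the kink at w = 0 pins the optimum exactly at zero
    return magnitude if a > 0 else -magnitude


def l2_minimizer(a, lam):
    """argmin over w of 0.5*(w - a)**2 + 0.5*lam*w**2, a plain rescaling."""
    return a / (1.0 + lam)


for a in (2.0, 0.5, -0.3):
    print(a, l1_minimizer(a, lam=1.0), l2_minimizer(a, lam=1.0))
```

With lam = 1.0, the inputs 0.5 and -0.3 both map to exactly 0.0 under the L1 penalty, while the L2 penalty merely halves them.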
