
Swoosh: Rethinking Activation Functions

April 1, 2023

Introducing the new Swoosh activation function. Perfect test set generalization guaranteed.

Einsum is easy and useful

November 5, 2022

einsum is one of the most useful functions in NumPy/PyTorch/TensorFlow, and yet many people don't use it. It seems to have a reputation for being difficult to understand and use, which is completely backwards in my view: the reason einsum is great is precisely that it is easier to use and reason about than the alternatives. So this post tries to set the record straight and show how simple einsum really is.
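
For a quick taste (my own toy examples, not taken from the post): the subscript string spells out exactly which indices get summed.

    import numpy as np

    A = np.random.rand(3, 4)
    B = np.random.rand(4, 5)

    # Matrix product: the repeated index j is summed over.
    C = np.einsum('ij,jk->ik', A, B)
    assert np.allclose(C, A @ B)

    # Per-matrix trace of a batch: repeat i, keep the batch index b.
    X = np.random.rand(7, 3, 3)
    traces = np.einsum('bii->b', X)
    assert np.allclose(traces, np.trace(X, axis1=1, axis2=2))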

Distributions Part II: What can we do with distributions?

March 5, 2022
Math

As promised in part I, we can do a lot of the same things with Schwartz distributions as with classical functions. To see how, we'll cover derivatives, convolutions, and Fourier transforms of distributions.
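
To give away one standard definition (the post may set it up differently): integration by parts motivates defining the derivative of a distribution T by T'(phi) := -T(phi') for every test function phi.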

Visualizing random vs grid search

December 22, 2021
Machine learning

Random search usually works better than grid search for hyperparameter optimization. This brief post suggests a way to visualize the reason for this geometrically.
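
The usual intuition, in a toy sketch of my own (not necessarily the post's figure): if only one of two hyperparameters actually matters, a 3x3 grid probes just 3 distinct values of it, while 9 random points probe 9.

    import numpy as np

    rng = np.random.default_rng(0)

    # 3x3 grid vs. 9 random points in the unit square.
    grid = np.array([(x, y) for x in (0.1, 0.5, 0.9) for y in (0.1, 0.5, 0.9)])
    rand = rng.random((9, 2))

    # Distinct values tried along the (hypothetically) important first axis:
    print(np.unique(grid[:, 0]).size)  # 3
    print(np.unique(rand[:, 0]).size)  # 9, almost surely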

Extensions of Karger's algorithm

September 10, 2021

If you prefer videos, check out our ICCV presentation, which covers much the same content as this blog post. For more details, see our paper....

Distributions Part I: the Delta distribution

July 6, 2021
Math

Did you always want to know what kind of object this weird Dirac delta "function" actually is? Well, it's a Schwartz distribution. If that doesn't help much, then keep reading.
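
(Spoiler: the whole definition fits in one line. As a distribution, delta is just the evaluation map delta(phi) = phi(0) on test functions phi; no "infinite spike" at a point is needed.)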

Scripting for personal productivity

April 14, 2021
Productivity

If you can program, you can use that to support your habits and automate some routines. This post gives a few examples.

Troubles with the Bias-Variance tradeoff

April 7, 2021
Machine learning

The bias-variance tradeoff is a key idea in machine learning. But I'll argue that we know surprisingly little about it: When does it hold? How does it relate to the Double Descent phenomenon? And what do we even formally mean when we talk about it?
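
(For reference, the classical squared-loss statement reads E[(y - f_hat(x))^2] = (E[f_hat(x)] - f(x))^2 + Var(f_hat(x)) + sigma^2, where y = f(x) + noise of variance sigma^2 and the expectations are over the draw of the training set.)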

Collection of quick computer tips

March 31, 2021
Productivity

Many of us spend a lot of time working with our computer, so it's worth spending some time to make that experience as pleasant and productive as possible. This is a collection of tips that are relatively quick to implement and still, in my opinion, very valuable in the long run. Mainly geared towards developers and others who work with the shell a lot.

State formally, reason informally

March 24, 2021
Math

There's a style of teaching mathematics that I really like: stating definitions and theorems as formally as in any textbook, but focusing on informal arguments for why they should be true.

Emacs as an amazing LaTeX editor

March 17, 2021
Productivity

Emacs has some really amazing features for writing LaTeX; this post gives an overview of some of them, either to convince you to give Emacs a try, or to make you aware that these features exist if you're already using Emacs but didn't know about them.

Perspectives on spherical harmonics

March 10, 2021
Math

Spherical harmonics are ubiquitous in math and physics, in part because they naturally appear as solutions to several problems; in particular they are the eigenfunctions of the spherical Laplacian and the irreducible representations of SO(3). But why should the solutions to these problems be the same? And why are they called spherical harmonics?
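
(Concretely, the two characterizations from the teaser read: Delta_{S^2} Y_l^m = -l(l+1) Y_l^m on the sphere, and for fixed l the 2l+1 functions Y_l^m, m = -l, ..., l, span an irreducible representation of SO(3).)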

Deep Implicit layers

March 3, 2021
Deep learning

Several new architectures for neural networks, such as Neural ODEs and deep equilibrium models, can be understood as replacing classical layers, which explicitly specify how to compute the output, with implicit layers. These layers describe the conditions the output should satisfy but leave the actual computation to a solver that can be chosen freely. This post contains a brief introduction to the main ideas behind implicit layers.
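
A minimal sketch of the idea (my own toy example; the weights, the tanh fixed-point condition, and the naive solver are all illustrative choices, not anything from the post):

    import numpy as np

    def implicit_layer(W, x, n_iter=100):
        # The layer's output z is *defined* by the condition z = tanh(W @ z + x);
        # the fixed-point iteration below is just one possible solver.
        z = np.zeros_like(x)
        for _ in range(n_iter):
            z = np.tanh(W @ z + x)
        return z

    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((4, 4))  # small weights so the iteration contracts
    x = rng.standard_normal(4)
    z = implicit_layer(W, x)
    assert np.allclose(z, np.tanh(W @ z + x))  # z satisfies the implicit condition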

Building Blocks of RL Part III: Model-based RL

February 24, 2021
Reinforcement learning

Reinforcement Learning consists of a few key building blocks that can be combined to create many of the well-known algorithms. Framing RL in terms of these building blocks can give a good overview and better understanding of these algorithms. This is the conclusion of a series with such an overview, covering model-based RL.

L1 regularization: sparsity through singularities

February 17, 2021
Machine learning

L1 regularization is famous for leading to sparse optima, in contrast to L2 regularization. There are several ways of understanding this but I'll argue that it's really all about one fact: the L1 norm has a singularity at the origin, while the L2 norm does not. And this is not just true for L1 and L2 regularization: singularities are always necessary to get sparse weights.
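
A one-dimensional sketch of the argument (my illustration): for min over w of (w - a)^2/2 + λ|w|, optimality at w = 0 requires 0 ∈ -a + λ[-1, 1], i.e. |a| ≤ λ, so a whole interval of inputs a gets mapped exactly to w = 0. With the smooth penalty λw^2/2 instead, the optimum is w = a/(1+λ), which is zero only when a = 0.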

Boring numbers, complexity and Chaitin's incompleteness theorem

February 10, 2021

There is a "complexity barrier": a number such that we can't prove the Kolmogorov complexity of any specific string to be larger than that. The proof of this astonishing fact is closely related to some famous paradoxa and we'll use this connection to get a better intuition for why the complexity barrier exists.

Building Blocks of RL Part II: Policy Optimization

February 3, 2021
Reinforcement learning

Reinforcement Learning consists of a few key building blocks that can be combined to create many of the well-known algorithms. Framing RL in terms of these building blocks can give a good overview and better understanding of these algorithms. This is part 2 of a series with such an overview, covering some policy optimization methods.

Too much structure

January 27, 2021
Structure, Math

Proving things for objects that have a lot of structure can be harder than for objects with less structure, simply because the tree of possible proofs is much wider. This is probably why trying to prove a more general case is sometimes a helpful strategy.

Asymmetry between position and momentum in physics

January 19, 2021
Physics

In both classical mechanics and QM, there are transformations between position-based and momentum-based representations that preserve the dynamical laws. So from a mathematical perspective, position and momentum seem to play equivalent roles in physics. But they don't play equivalent roles in our cognition, which is part of the physical universe -- seemingly a paradox.

Building Blocks of RL Part I: Value-based methods

January 13, 2021
Reinforcement learning

Reinforcement Learning consists of a few key building blocks that can be combined to create many of the well-known algorithms. Framing RL in terms of these building blocks can give a good overview and better understanding of these algorithms. This is part 1 of a series with such an overview, covering value-based methods (mainly in a tabular setting).

VAEs from a generative perspective

January 6, 2021
Deep learning

Variational autoencoders are usually introduced as a probabilistic extension of autoencoders with regularization. An alternative view is that the encoder arises naturally as a tool for efficiently training the decoder. This is the perspective I take in this post, deriving VAEs without assuming an autoencoder architecture a priori.
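
(For orientation, the standard objective that comes out of such a derivation is the ELBO: log p(x) >= E_{q(z|x)}[log p(x|z)] - KL(q(z|x) || p(z)), where the encoder q enters only to make training the decoder p tractable.)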

Ways to think about structure in mathematics

December 29, 2020
Structure, Math

"Structure" is a concept that keeps popping up when thinking about mathematics but it's hard to pin down what it is exactly. I discuss several different perspectives for thinking about it.

Trading off speed against the probability of success in the Karger-Stein Algorithm

December 6, 2020
Graphs

The Karger-Stein algorithm is an improvement over Karger's beautiful contraction algorithm for minimum graph cuts. In this post, I show how it finds the perfect tradeoff between finding a mincut with high probability and finding it quickly. In the course of doing so, we will also understand where the somewhat opaque factor of sqrt(2) comes from.

Discounting in a relativistic universe

June 20, 2020
Physics

For people who want to discount the future, special relativity creates some challenges. There are different ways to handle them, but none seems completely satisfactory, which may be yet another argument against discounting pure utilities.