Swoosh: Rethinking Activation Functions
Introducing the new Swoosh activation function. Perfect test set generalization guaranteed.
I'm a Research Scientist at Google DeepMind working on AGI Safety & Alignment. Previously, I did part of a PhD at the Center for Human-Compatible AI at UC Berkeley.
See Google Scholar for papers I've written, and the Alignment Forum for some blog posts about earlier-stage work.
einsum is one of the most useful functions in NumPy/PyTorch/TensorFlow and yet many people don't use it. It seems to have a reputation as being difficult to understand and use, which is completely backwards in my view: the reason einsum is great is precisely because it is easier to use and reason about than the alternatives. So this post tries to set the record straight and show how simple einsum really is.
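As a rough illustration of that claim (a minimal sketch, not taken from the post itself), here is a batched matrix multiply written with einsum and with matmul. The array names, shapes, and index letters are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical example data; shapes and names are illustrative only.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))  # axes: batch b, rows i, inner k
B = rng.standard_normal((4, 5, 2))  # axes: batch b, inner k, cols j

# einsum: every axis is named. The index k appears in both inputs but not
# in the output, so it is summed over; b, i, j are kept in the stated order.
C_einsum = np.einsum("bik,bkj->bij", A, B)

# matmul: same result, but you have to remember that the last two axes are
# the matrix axes and that leading axes are treated as batch dimensions.
C_matmul = np.matmul(A, B)

# Asking for a different output layout is just a change to the index string.
C_transposed = np.einsum("bik,bkj->bji", A, B)
```

With einsum the index string doubles as documentation of what each axis means, whereas the matmul version needs a separate swapaxes or transpose call to produce the second layout.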
As promised in part I, we can do a lot of the same things with Schwartz distributions as with classical functions. To see how, we'll cover derivatives, convolutions, and Fourier transforms of distributions.