Distributions Part II: What can we do with distributions?


Last time, we introduced Schwartz distributions. Let's briefly recap: a distribution is a function that maps certain types of functions to real numbers. We write $\mathcal{D}(U)$ for the space of (compactly supported, smooth) functions on a space $U \subseteq \mathbb{R}^n$, called test functions. Distributions are continuous linear maps $T: \mathcal{D}(U) \to \mathbb{R}$, and we write $\mathcal{D}'(U)$ for the space of these distributions.

An important aspect of distributions is that they generalize functions $f: U \to \mathbb{R}$. This is not obvious at first---distributions have an entirely different type signature, after all, so how can they be a generalization? What we mean by that is that there is a natural way to embed the space of (locally integrable) functions on $U$ into the space of distributions $\mathcal{D}'(U)$. Namely, each function $f: U \to \mathbb{R}$ induces a distribution $T_f$ defined by $$\langle T_f, \varphi \rangle := \int_U f\varphi \, d\lambda^n.$$ Here, $\langle T_f, \varphi \rangle$ is just a commonly used notation for $T_f(\varphi)$, i.e. the distribution $T_f$ applied to the test function $\varphi$.
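The embedding $f \mapsto T_f$ is easy to play with numerically. Here's a minimal sketch (my own, not from the post): we approximate $\langle T_f, \varphi \rangle$ with quadrature on a truncated domain, using a Gaussian as a stand-in test function (not compactly supported, but it decays fast enough for a numerical check).

```python
import numpy as np
from scipy.integrate import quad

def T(f):
    """Embed a classical function f as a distribution: phi -> <T_f, phi>.
    The integral over U = R is truncated to [-10, 10] for quadrature."""
    return lambda phi: quad(lambda x: f(x) * phi(x), -10, 10)[0]

phi = lambda x: np.exp(-x**2)   # stand-in test function (rapidly decaying)
f = lambda x: x**2

# <T_f, phi> = int x^2 exp(-x^2) dx = sqrt(pi)/2
print(T(f)(phi), np.sqrt(np.pi) / 2)   # both ~ 0.8862
```

Of course this only represents $T_f$ by its action on test functions, but that is exactly the point: a distribution *is* nothing more than that action.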

For “classical” functions, we can do things like add them, convolve them, take their derivatives, and apply many other operations. Given that distributions generalize classical functions, it's natural to ask whether we can also generalize these operations. It turns out that this is possible in a (to me) surprising number of cases, and in a surprisingly simple way!

Addition

Let's start with a warm-up: just like we can add classical functions, we can add two distributions $S$ and $T$. Specifically, we define $S + T$ as the distribution given by $$\langle S + T, \varphi \rangle := \langle S, \varphi \rangle + \langle T, \varphi \rangle.$$ In general, this will be how we define distributions: we just say how to evaluate them on an arbitrary function $\varphi$. If this were a math textbook, we'd also have to show that the distributions we define this way are indeed continuous and linear in $\varphi$, but since this is a blog post, we'll skip that part. The goal here is only to get a good understanding of the definitions.

Now we get to an important theme for this post: we've just defined addition of distributions, but does this definition really generalize the definition for classical functions? In other words: say we have two functions $f$ and $g$, both from $U$ to $\mathbb{R}$. There are now two things we could do:

  1. Add $f$ and $g$ as functions, then turn the result into a distribution, i.e. $T_{f + g}$.
  2. Turn $f$ and $g$ into distributions, then add them as distributions, i.e. $T_f + T_g$.

We really, really want 1. and 2. to be equivalent! This is a good example of how definitions in math can be good or bad: in principle, we’re free to define addition of distributions however we want---but if our definition doesn’t generalize the definition we already use for classical functions, then it will be really hard to work with and probably just not useful.

Luckily, it’s easy to check that our definition is a good one: we have

$$\begin{aligned} \langle T_{f + g}, \varphi\rangle &= \int_U (f + g)\varphi \, d\lambda^n\\ &= \int_U f\varphi \, d\lambda^n + \int_U g\varphi \, d\lambda^n\\ &= \langle T_f, \varphi \rangle + \langle T_g, \varphi\rangle\\ &= \langle T_f + T_g, \varphi\rangle. \end{aligned}$$

This implies that $T_{f + g} = T_f + T_g$ (since distributions are themselves just functions, and if two functions are equal on all inputs, they're the same).
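This equality is also easy to sanity-check numerically. A quick sketch (my own example functions; quadrature on a truncated domain as before): embed first and add, or add first and embed---the pairing with a test function comes out the same.

```python
import numpy as np
from scipy.integrate import quad

# <T_f, phi>, approximated on [-10, 10]
pair = lambda f, phi: quad(lambda x: f(x) * phi(x), -10, 10)[0]

f = lambda x: np.sin(x)
g = lambda x: x**2
phi = lambda x: np.exp(-x**2)   # stand-in test function

lhs = pair(lambda x: f(x) + g(x), phi)   # add as functions, then embed: <T_{f+g}, phi>
rhs = pair(f, phi) + pair(g, phi)        # embed, then add: <T_f + T_g, phi>
print(abs(lhs - rhs))                    # ~ 0
```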

Derivatives

Addition was pretty easy, but how are we supposed to define derivatives of distributions? This seems really hard at first. Do we need some kind of limit of finite differences, like for classical functions? But what are these finite differences supposed to look like?

Recall that in Part I, we used the powerful tool of Wishful Thinking: just pretend that distributions behave like classical functions, calculate a bit, and then use the result as the definition.

We're now in a better position to understand what is actually going on there. Our central demand of any new definition is that it should generalize the corresponding definition for classical functions. In the case of derivatives, this means we want $\partial_i T_f = T_{\partial_i f}$ for all functions $f$ ($\partial_i := \frac{\partial}{\partial x_i}$ is the $i$-th partial derivative). So let's just consider the special case of distributions that are induced by classical functions for now (this is the “pretend that distributions behave like functions” step). We then have

$$\begin{aligned} \langle \partial_i T_f, \varphi\rangle &= \langle T_{\partial_i f}, \varphi\rangle\\ &= \int_U (\partial_i f) \varphi \, d\lambda^n\\ &= -\int_U f (\partial_i \varphi) \, d\lambda^n\\ &= -\langle T_f, \partial_i \varphi\rangle. \end{aligned}$$

We used integration by parts here, and made use of the fact that $\varphi$ is compactly supported and $U \subseteq \mathbb{R}^n$ is open, so the boundary term vanishes.

And now comes the second part of the Wishful Thinking strategy: we just forget that we restricted ourselves to distributions induced by classical functions, and instead define derivatives this way for all distributions. More specifically, we just replace $T_f$ by $T$ in the equation above and get our definition: $$\langle \partial_i T, \varphi\rangle := -\langle T, \partial_i \varphi\rangle.$$ Voilà, that's our definition for derivatives of distributions! By construction, this definition extends the one for classical functions; that was the entire point of our calculation.
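To see the definition in action, here is a numerical sketch (my own example) with the classically non-differentiable function $f(x) = |x|$: pairing $-\langle T_f, \varphi'\rangle$ against a test function agrees with pairing the sign function against it, i.e. the distributional derivative of $|x|$ is $\operatorname{sign}(x)$.

```python
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.exp(-(x - 1)**2)                  # stand-in test function
dphi = lambda x: -2 * (x - 1) * np.exp(-(x - 1)**2)  # its derivative

# <d/dx T_|x|, phi> := -<T_|x|, phi'>, quadrature split at the kink x = 0
lhs = -quad(lambda x: np.abs(x) * dphi(x), -10, 10, points=[0.0])[0]
# <T_sign, phi>, quadrature split at the jump x = 0
rhs = quad(lambda x: np.sign(x) * phi(x), -10, 10, points=[0.0])[0]
print(lhs, rhs)   # agree up to quadrature error
```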

One cool thing to note here is that since test functions are infinitely differentiable (by assumption), every distribution is infinitely differentiable too! Since distributions generalize classical functions, this means we can take derivatives of functions that we'd normally call non-differentiable. The catch is that this derivative will not necessarily be a function itself. For example, the derivative of the following step function is the delta distribution that we saw in Part I:

(figure: step function)

The delta distribution is not induced by any function, as we discussed back then.
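This step-function claim can be checked directly from the definition. A small sketch (my own): for the Heaviside step $H$, we get $\langle H', \varphi\rangle = -\langle H, \varphi'\rangle = -\int_0^\infty \varphi'(x)\,dx = \varphi(0) = \langle \delta, \varphi\rangle$, and the quadrature below reproduces exactly that.

```python
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.exp(-(x - 1)**2)                  # stand-in test function
dphi = lambda x: -2 * (x - 1) * np.exp(-(x - 1)**2)  # its derivative

# -<H, phi'>: the step function H kills the x < 0 part of the integral
step_deriv = -quad(dphi, 0, 10)[0]
print(step_deriv, phi(0))   # both ~ phi(0) = e^{-1} ~ 0.3679
```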

Convolutions

We know the recipe now, so let's just run through two more examples, starting with convolutions. As a quick reminder, the convolution of two functions $f$ and $g$ is defined as the function $f * g$ given by $$(f * g)(x) := \int_U f(y)g(x - y)\,dy.$$

In this post, we'll just look at convolving a distribution with a classical function (rather than convolving two distributions). To do that, let's introduce some notation: write $\check{g}$ for the reflection of a function $g$, i.e. $\check{g}(y) := g(-y)$. Furthermore, we write $\tau_x g$ for the function $g$ shifted by $x \in \mathbb{R}^n$, i.e. $(\tau_x g)(y) := g(y - x)$. Then note that we can write the term from the convolution above as $$g(x - y) = \check{g}(y - x) = (\tau_x\check{g})(y).$$

The point of this is that we can now rewrite the definition of a convolution as $$(f * g)(x) = \langle f, \tau_x \check{g}\rangle,$$ without explicitly writing out an integral. And this definition again has a form that suggests a straightforward generalization to the case where $f$ is replaced by a distribution, namely $$(T * g)(x) := \langle T, \tau_x \check{g}\rangle.$$ So the convolution of a distribution with a function is again a classical function (we evaluate it at points $x \in \mathbb{R}^n$, rather than on test functions).
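This definition is short enough to implement directly. A sketch (my own example, with the delta distribution): since $\langle \delta, \psi\rangle = \psi(0)$ and $(\tau_x\check{g})(y) = g(x - y)$, we get $(\delta * g)(x) = g(x - 0) = g(x)$, i.e. convolving with delta gives back $g$ unchanged.

```python
import numpy as np

# the delta distribution: evaluate the test function at 0
delta = lambda psi: psi(0.0)

def convolve(T, g):
    """Convolution of a distribution T with a function g:
    (T * g)(x) = <T, tau_x g_check>, where (tau_x g_check)(y) = g(x - y)."""
    return lambda x: T(lambda y: g(x - y))

g = lambda x: np.cos(x)
h = convolve(delta, g)
print(h(0.5), g(0.5))   # identical: delta acts as the identity for convolution
```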

Fourier transforms

We'll finish with a pretty cool operation: even Fourier transforms work for distributions. Well, at least for some distributions; we'll get back to that in a moment. First, let's recall the definition of the Fourier transform for functions: the Fourier transform of a function $f$ is again a function, $\mathcal{F}(f)$, given by $$\mathcal{F}(f)(\xi) := (2\pi)^{-n/2}\int_U f(x) \exp(-ix \cdot \xi)\,dx.$$ (There are other definitions that differ slightly, but we'll go with this one.)

We want to define the Fourier transform $\mathcal{F}(T)$ of a distribution $T$, so we need to define $\langle \mathcal{F}(T), \varphi\rangle$ for arbitrary test functions $\varphi$. Let's assume again that our distribution is induced by a function, i.e. $T = T_f$ for some function $f$. In this case, we want our new definition to extend the old one, i.e. $\mathcal{F}(T_f) = T_{\mathcal{F}(f)}$. That gives us

$$\begin{aligned} \langle \mathcal{F}(T_f), \varphi\rangle &= \langle T_{\mathcal{F}(f)}, \varphi\rangle\\ &= \int_U \left((2\pi)^{-n/2}\int_U f(x)\exp(-ix \cdot \xi)\,dx\right)\varphi(\xi)\,d\xi \\ &= \int_U\left((2\pi)^{-n/2}\int_U \varphi(\xi) \exp(-i\xi \cdot x) \,d\xi\right)f(x)\,dx\\ &= \langle T_f, \mathcal{F}(\varphi)\rangle. \end{aligned}$$

(In the third step, we swapped the order of integration.)

The final step should be familiar by now: we use this result, which we derived for functions, as the definition for general distributions: $$\langle \mathcal{F}(T), \varphi\rangle := \langle T, \mathcal{F}(\varphi)\rangle.$$ And now the promised caveat about “some distributions”: the Fourier transform of a compactly supported test function is never itself compactly supported, so $\mathcal{F}(\varphi)$ is not a test function, and $\langle T, \mathcal{F}(\varphi)\rangle$ doesn't make sense for arbitrary $T \in \mathcal{D}'(U)$. The standard fix is to enlarge the space of test functions to the Schwartz space of rapidly decaying smooth functions, which the Fourier transform does map to itself; the distributions that remain continuous on this larger space are called tempered distributions, and those are exactly the ones we can Fourier transform.
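As a closing numerical sketch (my own, in one dimension): with this convention, the Gaussian $f(x) = e^{-x^2/2}$ is its own Fourier transform, so we can check the compatibility condition $\langle T_{\mathcal{F}(f)}, \varphi\rangle = \langle T_f, \mathcal{F}(\varphi)\rangle$ by quadrature, computing $\mathcal{F}(\varphi)$ numerically for an even real test function (where the imaginary part of the transform vanishes).

```python
import numpy as np
from scipy.integrate import quad

def fourier(h, xi):
    """F(h)(xi) for an even real h: the imaginary part integrates to 0,
    so the cosine transform suffices. Domain truncated to [-10, 10]."""
    return (2 * np.pi)**-0.5 * quad(lambda x: h(x) * np.cos(xi * x), -10, 10)[0]

f = lambda x: np.exp(-x**2 / 2)   # F(f) = f under this convention
phi = lambda x: np.exp(-x**2)     # stand-in (even) test function

lhs = quad(lambda xi: f(xi) * phi(xi), -10, 10)[0]        # <T_{F(f)}, phi>, using F(f) = f
rhs = quad(lambda x: f(x) * fourier(phi, x), -10, 10)[0]  # <T_f, F(phi)>
print(lhs, rhs)   # equal: sqrt(2*pi/3) ~ 1.4472
```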