Distributions Part I: the Delta distribution

Math

Schwartz distributions are a generalization of functions from $\mathbb{R}^n$ to $\mathbb{R}$: strictly speaking, they aren’t such functions themselves, but you can do a lot of the same stuff with them that you can do with normal functions, such as taking derivatives, computing convolutions, and even Fourier transforms (at least in certain cases). And in some ways, they even make life easier compared to functions. For example, every distribution is infinitely differentiable! But of course, we do have to give up some things: distributions can’t be evaluated at a single point and it’s in general impossible to multiply two distributions.

In this series, we’ll try to understand all of these properties of distributions and more. I will focus on intuition but still give formal definitions of all the concepts we look at. As a secondary purpose, studying distributions will also be an excellent opportunity to practice finding good definitions. We will introduce many different operations on distributions and in each case, we will try to understand how one could come up with the definition in a natural way.

Motivation

In electrostatics, charge densities are used to model the amount of electric charge in different places. Such a charge density is a function $\rho: \mathbb{R}^3 \to \mathbb{R}$ that assigns an amount of charge per volume to every point $x \in \mathbb{R}^3$. From an experimental standpoint, these densities are only useful abstractions; what we can measure is at best the total charge in some volume. This charge $Q$ is given by the integral of the density over the volume:

$$Q(V) = \int_V \rho(x)\,dx$$

for any subset $V \subseteq \mathbb{R}^3$. You can even think of this as the definition of the density $\rho$: the only thing we care about is that when we measure the charge $Q(V)$ in any volume $V$, we get $\int_V \rho(x)\,dx$.

Now assume we observe the following: $Q(V) = 1$ for any volume $V$ that contains the origin but $Q(V) = 0$ if $V$ does not contain the origin. Intuitively, we conclude that there is a point charge with value 1 at the origin and no charge anywhere else. But how can we model this using a density $\rho$? If $\rho$ is any (integrable) function, as we originally assumed, then we must have $\rho(x) = 0$ for $x \neq 0$.¹ But in that case, $\int_V \rho(x)\,dx = 0$ for all volumes $V$, which contradicts our first observation.

For now, let’s just “define this problem away”: we’ll say that $\rho(x) = \delta(x)$, where $\delta(x)$ is an object such that

$$\int_V \delta(x)\,dx := 1 \text{ if } 0 \in V, \text{ otherwise } 0.$$

The word “object” here is code for “we’re pretty confused and don’t know what this thing is but we’d like to have something that behaves this way”.
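One way to make the mysterious object feel more concrete is to approximate it: a normalized Gaussian of tiny width integrates to roughly 1 over any interval containing the origin and to roughly 0 over intervals away from it. A minimal 1D sketch of this idea (the function name and parameters are ours, purely illustrative; the Gaussian integral is expressed exactly via `math.erf`):

```python
import math

def gaussian_charge(a, b, eps=1e-3):
    """Total charge in the interval [a, b] for a normalized Gaussian
    density of width eps centered at the origin (a 1D stand-in for rho).
    The integral of a Gaussian is known in closed form via erf."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / (eps * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

print(gaussian_charge(-1.0, 1.0))  # interval containing 0: charge ≈ 1
print(gaussian_charge(2.0, 3.0))   # interval away from 0: charge ≈ 0
```

As the width shrinks, these numbers converge to the wished-for values 1 and 0; the catch is that no single limiting *function* exists, which is exactly the problem δ is meant to solve.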

We’ll develop a formal definition of $\delta$ soon. But first, let’s extend the original example a bit: suppose instead of being interested only in the charge inside some volume, we now introduce a charged test particle and want to know the potential energy it has due to the charge density $\rho$. This potential is given by

$$\Phi \propto \int_{\mathbb{R}^3} \frac{1}{|x_0 - x|}\,\rho(x)\,dx$$

for a test particle at position $x_0$. So what is the potential energy if we have the point charge from before, $\rho(x) = \delta(x)$? So far, we have only defined $\int_V \delta(x)\,dx$, and if $\delta(x)$ appears anywhere else, we don’t really know what to do with it. Remember, $\int_V \delta(x)\,dx$ is just notation we introduced to mean “1 if $0 \in V$ and 0 otherwise”; it’s not actually an integral in any usual sense.

So we will apply a powerful technique: wishful thinking. We just assume that $\delta(x)$ behaves the way we would intuitively like it to, and then worry later about constructing something that actually does behave that way. Since for $\rho(x) = \delta(x)$ there is no charge outside the origin, all parts of the integral above except for $x = 0$ ought to vanish. So let’s just write

$$\int_{\mathbb{R}^3} \frac{1}{|x_0 - x|}\,\delta(x)\,dx = \int_{\{0\}} \frac{1}{|x_0 - x|}\,\delta(x)\,dx.$$

Since we’re only integrating over $\{0\}$ now, we can set $x = 0$ in $|x_0 - x|$. Then this part doesn’t depend on $x$ anymore and we get

$$\int_{\{0\}} \frac{1}{|x_0 - x|}\,\delta(x)\,dx = \frac{1}{|x_0|} \int_{\{0\}} \delta(x)\,dx.$$

But we know what to do with that last part: it’s 1! So the potential should be $\Phi \propto \frac{1}{|x_0|}$.

We can apply the same argument more generally to $\int \varphi(x)\,\delta(x)\,dx$ for other functions $\varphi$. So let’s “wish” that

$$\int_{\mathbb{R}^3} \varphi(x)\,\delta(x)\,dx := \varphi(0)$$

holds for all functions $\varphi$. This contains our original definition of $\delta(x)$ as a special case, namely for the indicator function $\varphi = 1_V$.
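This “sifting” wish can also be checked numerically: replace δ by a narrow normalized Gaussian and pair it with a smooth φ on a fine grid; the result is approximately φ(0). A hedged 1D sketch (the function name, grid, and width parameters are our choices):

```python
import math

def pair_with_delta(phi, eps=1e-4, half_width=5e-3, n=200_001):
    """Approximate the pairing of phi with delta by replacing delta with a
    normalized Gaussian of width eps and summing on a fine grid (1D sketch)."""
    h = 2.0 * half_width / (n - 1)
    norm = 1.0 / (eps * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n):
        x = -half_width + i * h
        total += phi(x) * norm * math.exp(-x * x / (2.0 * eps * eps)) * h
    return total

phi = lambda x: math.cos(x) + x**3  # phi(0) = 1
print(pair_with_delta(phi))          # ≈ 1.0 = phi(0)
```

The approximation error shrinks like the square of the Gaussian width, so for any smooth φ the limit really is φ(0).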

Schwartz distributions

The defining property of $\delta(x)$ that we would like to have is

$$\int_{\mathbb{R}^3} \delta(x)\,\varphi(x)\,dx := \varphi(0)$$

for arbitrary functions $\varphi$. We have already noted that this cannot be an actual (Lebesgue) integral, so it makes sense to get rid of that notation. Instead, we will write

$$\langle \delta, \varphi \rangle := \varphi(0).$$

This highlights the important part: $\delta$ takes any function $\varphi$ and maps it to its value $\varphi(0)$ at the origin. So $\delta$ is a function after all; just not from $\mathbb{R}^3$ to $\mathbb{R}$, but from the space of functions on $\mathbb{R}^3$ to $\mathbb{R}$!

$\delta$ is one example of a Schwartz distribution, or distribution for short: a map from a space of functions to the real numbers. Let’s make this more precise:

Definition: Let $U \subseteq \mathbb{R}^n$ be an open subset. A test function on $U$ is a smooth, compactly supported function $\varphi: U \to \mathbb{R}$, and we write $\mathcal{D}(U)$ for the space of all such test functions. A Schwartz distribution on $U$ is then a continuous linear function $T: \mathcal{D}(U) \to \mathbb{R}$. We write $\mathcal{D}'(U)$ for the space of all such distributions on $U$.

This definition requires some clarifications. First, Schwartz distributions are not at all the same thing as probability distributions, and when I say “distribution” in this series, I will always mean a Schwartz distribution. Second, if we want to talk about continuity, we of course need to define a topology on the space $\mathcal{D}(U)$ of test functions. The topology we use here is called the canonical LF topology, but we won’t discuss it any further in this post.

The name test function comes from the fact that these are the functions on which we can “test”, i.e. evaluate, distributions. In our first example about the total charge in some volume, we used indicator functions $1_V$ as test functions. The $\delta$ distribution would in principle work on any space of test functions, but it turns out that a good choice for the general definition is smooth compactly supported functions, because this makes a lot of the theory very nice.

We will write $\langle T, \varphi \rangle$ for the distribution $T$ applied to the test function $\varphi$. But why did we write $\int \delta(x)\,\varphi(x)\,dx$ before? What does all of this have to do with integrals? The reason is the following: let $f: U \to \mathbb{R}$ be any locally integrable (read “somewhat reasonable”) function. Then the map

$$\varphi \mapsto \int_U f(x)\,\varphi(x)\,dx$$

defines a distribution on $U$, which we denote by $T_f$. This is the sense in which distributions are generalized functions: each classical function induces a distribution. So when we write $\int \delta(x)\,\varphi(x)\,dx$, we are essentially pretending that the delta distribution is induced by a function $\delta(x)$. There is no such function, but the notation is used very often anyway; probably in part for historical reasons and in part because it turns out to work surprisingly well, as we’ll see next.
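Both kinds of distribution, δ and those induced by an ordinary function f, can be sketched in code: a distribution is simply a callable that eats a test function and returns a number. A rough Python illustration (the quadrature scheme and the names `delta`, `induced_by`, `bump` are ours, not standard):

```python
import math

# In this sketch, a "distribution" is any callable taking a test function
# to a number. The delta distribution just evaluates at the origin:
def delta(phi):
    return phi(0.0)

def induced_by(f, a=-10.0, b=10.0, n=100_001):
    """T_f: the pairing phi -> integral of f(x) phi(x), via a plain
    Riemann sum (1D sketch). Assumes phi is supported inside [a, b]."""
    h = (b - a) / (n - 1)
    def T_f(phi):
        return sum(f(a + i * h) * phi(a + i * h) for i in range(n)) * h
    return T_f

def bump(x):
    """A smooth, compactly supported test function (support [-1, 1])."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

T_one = induced_by(lambda x: 1.0)  # distribution induced by f = 1
print(delta(bump))                 # bump(0) = e^{-1} ≈ 0.368
print(T_one(bump))                 # the integral of the bump function
```

Note that `delta` needs no integration at all: evaluation at the origin is the whole definition, which is exactly why no inducing function exists for it.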

We will revisit distributions in general in the next post, but for now, we focus on the $\delta$ distribution again.

Variations of the $\delta$ distribution

We now have a formal understanding of terms of the form $\int \delta(x)\,\varphi(x)\,dx$. But in practice, the $\delta$ distribution often appears in modified versions, such as in terms like

$$\int \delta(x - x_0)\,\varphi(x)\,dx$$

or

$$\int \delta(ax)\,\varphi(x)\,dx.$$

So far, we haven’t formally defined these terms. That means it’s time to apply the Power of Wishful Thinking again, in order to find good definitions for them.

It’s pretty clear what $\delta(x - x_0)$ should mean: it’s just a shifted version of $\delta(x)$, with its “peak” at $x_0$ instead of $0$. More explicitly, it makes sense to demand that

$$\int \delta(x - x_0)\,\varphi(x)\,dx = \int \delta(x)\,\varphi(x + x_0)\,dx$$

as would be the case if $\delta$ were a regular function (all integrals are assumed to be over all of $\mathbb{R}^n$). Then we can see that

$$\int \delta(x - x_0)\,\varphi(x)\,dx = \varphi(x_0).$$
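In the functional notation, this definition is a one-liner: $\delta(\cdot - x_0)$ pairs a test function with its value at $x_0$. An illustrative sketch (the name `shifted_delta` is ours):

```python
import math

def shifted_delta(x0):
    """The shifted delta as a functional: pairs a test
    function with its value at x0."""
    return lambda phi: phi(x0)

phi = lambda x: math.sin(x)
print(shifted_delta(math.pi / 2)(phi))  # sin(pi/2) = 1.0
```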

Let’s consider $\int \delta(ax)\,\varphi(x)\,dx$ instead. You might argue as follows: “$\delta(ax) = 0$ for $x \neq 0$, so we only need to consider $x = 0$. In that case, $ax = 0 = x$, so $\delta(ax)$ should be the same as $\delta(x)$.” But this is a misunderstanding caused by the (admittedly very confusing) notation often used for the $\delta$ distribution: $\delta(x)$ doesn’t mean that anything is actually being evaluated at $x$; it’s just a notational convention that only makes sense inside integrals. We don’t want to demand that $\delta(\cdot)$ behave like a function when we plug in different things, because we never have $\delta(x)$ appearing on its own anyway.

What we do want is for $\delta(x)$ to behave like a function inside an integral. In particular, for functions $f$ and $\varphi$ on $\mathbb{R}^n$ and a scalar $a \neq 0$, we have

$$\int f(ax)\,\varphi(x)\,dx = \frac{1}{|a|^n} \int f(x)\,\varphi\left(\frac{x}{a}\right)dx.$$

So since we want $\delta(x)$ to behave the way functions behave inside integrals, we define

$$\int \delta(ax)\,\varphi(x)\,dx := \frac{1}{|a|^n} \int \delta(x)\,\varphi\left(\frac{x}{a}\right)dx = \frac{1}{|a|^n}\,\varphi(0).$$
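The factor $\frac{1}{|a|^n}$ can be sanity-checked numerically by once more replacing δ with a narrow normalized Gaussian; in one dimension ($n = 1$) the prediction is $\varphi(0)/|a|$. A sketch (the function name and the grid parameters are our choices):

```python
import math

def pair_scaled(phi, a, eps=1e-4, half_width=2e-2, n=400_001):
    """Pair phi with delta(a*x), with delta replaced by a narrow normalized
    Gaussian and summed on a fine grid (1D, so the prediction is phi(0)/|a|)."""
    h = 2.0 * half_width / (n - 1)
    norm = 1.0 / (eps * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n):
        x = -half_width + i * h
        total += norm * math.exp(-(a * x) ** 2 / (2.0 * eps * eps)) * phi(x) * h
    return total

phi = lambda x: 2.0 + x         # phi(0) = 2
print(pair_scaled(phi, a=3.0))  # ≈ 2/3 = phi(0)/|a|
```

Intuitively, compressing the Gaussian peak by the factor $a$ shrinks its area by $\frac{1}{|a|}$, which is exactly where the prefactor comes from.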

In fact, we can generalize this argument: for any diffeomorphism $g$ of $\mathbb{R}^n$, we have

$$\int f(g(x))\,\varphi(x)\,dx = \int |\det Dg(g^{-1}(x))|^{-1}\,f(x)\,\varphi(g^{-1}(x))\,dx$$

where $Dg$ is the derivative (Jacobian) of $g$, evaluated at the substituted point. So in analogy, we can define $\delta(g(x))$ for any diffeomorphism $g$ by

$$\begin{aligned} \int \delta(g(x))\,\varphi(x)\,dx :&= \int |\det Dg(g^{-1}(x))|^{-1}\,\delta(x)\,\varphi(g^{-1}(x))\,dx \\ &= |\det Dg(g^{-1}(0))|^{-1}\,\varphi(g^{-1}(0)). \end{aligned}$$
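The same numerical sanity check works for a nonlinear diffeomorphism: replace δ with a narrow Gaussian and compare against the predicted value. Here we take $g(x) = 2x + \sin(x)$, an increasing diffeomorphism of $\mathbb{R}$ with root $0$ and $g'(0) = 3$, so the prediction is $\varphi(0)/3$ (a 1D sketch; all names and parameters are ours):

```python
import math

def pair_composed(phi, g, eps=1e-4, a=-1.0, b=1.0, n=400_001):
    """Pair phi with delta(g(x)), with delta replaced by a narrow normalized
    Gaussian, summed over [a, b] (assumed to contain the root of g)."""
    h = (b - a) / (n - 1)
    norm = 1.0 / (eps * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n):
        x = a + i * h
        total += norm * math.exp(-g(x) ** 2 / (2.0 * eps * eps)) * phi(x) * h
    return total

g = lambda x: 2.0 * x + math.sin(x)  # diffeomorphism of R, root 0, g'(0) = 3
phi = lambda x: math.cos(x) + 2.0    # phi(0) = 3
print(pair_composed(phi, g))         # ≈ 3/3 = 1.0
```

Near its root, $g$ looks like multiplication by its derivative there, so this is really the scaling rule from before in local form.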

I want to stress again that none of these arguments are “proofs” or “derivations”: in the end, we have to choose how to define all of these terms. But some definitions clearly make more sense than others, and in the examples here there is one “right” way to define what $\delta(g(x))$ etc. should mean. This will become even more clear in the next post: we will continue the theme of finding good definitions via “wishful thinking”, only this time for arbitrary distributions and for many more types of operations.


  1. Actually, only for almost all $x$, but that doesn’t change anything.