(Note: this is an analysis of one aspect of the Karger-Stein algorithm, it’s not meant to be a beginner-friendly introduction)
Karger’s algorithm randomly contracts a graph and surprisingly, this can be used
to find a minimum cut with probability at least $\binom{n}{2}^{-1} = \Omega(1/n^2)$, where $n$ is
the number of vertices of the graph. This is a much, much higher probability
than sampling a graph cut uniformly at random would give! But it still means we
need to run the algorithm $O(n^2 \log n)$ times to get a high success
probability. Karger’s algorithm can be implemented in $O(n^2)$
time, which gives an overall runtime of $O(n^4 \log n)$ – not great;
there are much faster deterministic algorithms.
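To make the setup concrete, here is a minimal sketch of the contraction step (the edge-list representation, the union-find bookkeeping, and all names are my own choices for illustration, not taken from Karger's paper). A single run of Karger's algorithm is then just `contract(edges, 2)`, and the number of edges that survive is the size of the cut it found:

```python
import random

def contract(edges, target):
    """Randomly contract a connected multigraph, given as a list of (u, v)
    edges over hashable vertex labels, until `target` super-vertices remain.
    Returns the contracted graph's edge list (self-loops removed, endpoints
    relabeled to their super-vertex)."""
    vertices = {v for edge in edges for v in edge}
    parent = {v: v for v in vertices}  # union-find over super-vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    remaining = len(vertices)
    while remaining > target:
        u, v = random.choice(edges)   # sample one of the original edges uniformly
        ru, rv = find(u), find(v)
        if ru != rv:                  # edges inside a super-vertex (self-loops) are
            parent[ru] = rv           # skipped, so accepted picks are uniform over
            remaining -= 1            # the edges still crossing super-vertices
    # relabel endpoints and drop self-loops
    return [(find(u), find(v)) for (u, v) in edges if find(u) != find(v)]


# One run of Karger's algorithm on a square with one diagonal (min cut = 2):
example = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(len(contract(example, 2)))  # size of the cut this run found: 2 or 3
```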
To understand how the Karger-Stein algorithm improves upon that, we need
the following key result that forms the foundation for Karger’s algorithm:
Theorem:
When Karger’s algorithm contracts a graph from $n$ to $t$ vertices,
any given mincut survives with probability at least $\binom{t}{2}\big/\binom{n}{2} = \frac{t(t-1)}{n(n-1)}$.
In particular, for $t = 2$, we get the probability $\binom{n}{2}^{-1}$ mentioned above.
But note the following: if we make only a few contractions, i.e. $t$ is close to $n$, then
mincuts are almost guaranteed to survive! For example, contracting from $n = 1000$ down to
$t = 990$ vertices preserves any given mincut with probability at least $\frac{990 \cdot 989}{1000 \cdot 999} \approx 0.98$.
This is the key insight that allows us to improve the runtime of Karger’s algorithm and leads to the Karger-Stein algorithm.
The idea is the following: first we contract the graph down to roughly $n/\beta$ vertices,
where $\beta$ is small enough that mincuts are very likely to survive. Then we branch: we again
contract the graph down by a factor of $\beta$, but we do so $k$ times independently
from one another. $k$ needs to be chosen high enough that mincuts are very likely
to survive in at least one of the branches. We repeat this process until we have contracted
the graph down to just 2 vertices.
If we choose $\beta$ and $k$ right, at least one of the final leaves of our computation
tree will likely contain a mincut. So we return the best cut we’ve found among all
the leaves.
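Here is a rough sketch of one run of this recursive scheme, reusing the `contract` function from the sketch above; the base-case threshold and the exact contraction target are placeholder choices of mine, picked to match the bound used in the analysis below:

```python
import math

def generalized_karger_stein(edges, k, beta):
    """One run of the generalized recursive contraction scheme.
    Returns the smallest cut found among all leaves of the recursion."""
    n = len({v for edge in edges for v in edge})
    target = math.ceil(n / beta + 1)      # contract down to roughly n/beta vertices
    if n <= 6 or target >= n:             # small base case: just finish contracting
        return contract(edges, 2)
    branches = [contract(edges, target) for _ in range(k)]  # k independent contractions
    return min((generalized_karger_stein(b, k, beta) for b in branches), key=len)
```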
The Karger-Stein algorithm as it was originally described and as it is usually presented
uses $k = 2$ and $\beta = \sqrt{2}$. So we always split the computation into two branches
and reduce the number of vertices by a factor of $\sqrt{2}$ before branching again.
But in this post, I would like to motivate where these numbers come from, as well
as show that they’re not the only ones that work. So in the following, we’re going to
analyze the “generalized Karger-Stein algorithm” with arbitrary $k$ and $\beta$.
Success probability
As mentioned above, any minimum cut survives a contraction from $n$
to $t$ vertices with probability at least $\frac{t(t-1)}{n(n-1)}$.
I said we first contract to “roughly” $n/\beta$ vertices – to be precise we contract
until $\lceil n/\beta + 1 \rceil$ vertices are left; this will give us a nice bound.
The probability that a mincut survives this contraction is at least:

$$\frac{\left\lceil \frac{n}{\beta} + 1 \right\rceil \left(\left\lceil \frac{n}{\beta} + 1 \right\rceil - 1\right)}{n(n-1)} \;\geq\; \frac{\left(\frac{n}{\beta} + 1\right)\frac{n}{\beta}}{n(n-1)} \;=\; \frac{1}{\beta^2} \cdot \frac{n+\beta}{n-1} \;\geq\; \frac{1}{\beta^2}$$
We can now apply this bound recursively: After we have contracted to $\lceil n/\beta + 1 \rceil$
vertices, we can forget that this is a partially contracted graph, and just treat
this number as the “new $n$”.
We will write $p_d$ for the survival probability if there are $d$ levels of recursion
left before we reach the leaves of the tree. So $p_0$ will be the probability in the
leaves of the recursion tree. Depending on when precisely we stop the recursion
and what method we use to finish the contraction, $p_0$ might take different values,
but all that matters for us is that it is some constant.
Using the bound we found above, we get the following recurrence:

$$p_{d+1} \;\geq\; 1 - \left(1 - \frac{p_d}{\beta^2}\right)^{k}$$
What’s going on here? $\frac{p_d}{\beta^2}$ is a lower bound on the probability
that any given mincut survives in one particular branch. So $\left(1 - \frac{p_d}{\beta^2}\right)^k$
is an upper bound on the probability that the mincut survives in none of the
$k$ branches, and consequently $1 - \left(1 - \frac{p_d}{\beta^2}\right)^k$ is a lower
bound on the probability that it survives in at least one.
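As a quick numerical sanity check (not part of the proof), we can simply iterate this lower-bound recurrence; the leaf value $p_0 = 0.5$ is an arbitrary placeholder for the constant discussed above:

```python
def iterate_recurrence(k, beta, p0=0.5, depth=100):
    """Iterate p <- 1 - (1 - p / beta**2)**k for `depth` levels."""
    p = p0
    for _ in range(depth):
        p = 1 - (1 - p / beta ** 2) ** k
    return p

print(iterate_recurrence(k=2, beta=2 ** 0.5))  # k = beta^2: decays only like ~1/depth
print(iterate_recurrence(k=1, beta=2 ** 0.5))  # k < beta^2: decays like (k/beta^2)^depth
```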
This recurrence doesn’t have an obvious solution we can just read off, but with
some rewriting, we can get something that’s good enough for our purposes.
Substituting $z_d := 1/p_d$, we get

$$z_{d+1} = \frac{1}{p_{d+1}} \;\leq\; \frac{1}{1 - \left(1 - \frac{1}{\beta^2 z_d}\right)^k} \;\leq\; \frac{\beta^2}{k} z_d + c,$$

where we used $(1 - x)^k \leq e^{-kx} \leq \frac{1}{1 + kx}$ (with $x = \frac{1}{\beta^2 z_d}$) in the last step. The constant term $c$ may depend
on $k$ and $\beta$, but not on $d$.
If $k \geq \beta^2$, then $z_{d+1} \leq z_d + c$, which means that $z_d \leq z_0 + c \cdot d = O(d)$, i.e. $p_d = \Omega(1/d)$.
The depth of recursion for a graph with $n$ vertices is $\log_\beta n = O(\log n)$,
so the overall success probability is $\Omega\!\left(\frac{1}{\log n}\right)$.
What this means in words: if we create enough branches (at least $\beta^2$) compared
to how long we contract before branching again, then we get quite a high success
probability – $\Omega(1/\log n)$ means that $O(\log^2 n)$ runs are enough
to get an overall success probability that approaches 1 as $n \to \infty$: if each run succeeds
with probability at least $c/\log n$, then $\log^2 n$ independent runs all fail with probability
at most $(1 - c/\log n)^{\log^2 n} \leq e^{-c \log n} = n^{-c}$.
But what if $k < \beta^2$, i.e. if we don’t have enough branches at each stage?
Then the inequality derived above only yields

$$p_d \;\geq\; \left(\frac{k}{\beta^2}\right)^{d} \cdot \Omega(1),$$

so the success probability can be exponentially low in the recursion depth $d$.
This means we’d have to repeat the algorithm a potentially exponential (in $d$)
number of times, which would make it useless.
That still leaves the question: why does the Karger-Stein algorithm use
$k = \beta^2$ in particular, when $k > \beta^2$ would give a success probability at least as high?
For that we need to turn to the runtime complexity.
Runtime
The runtime of the Karger-Stein algorithm can be described with the following
recurrence:

$$T(n) \;=\; k \cdot T\!\left(\frac{n}{\beta}\right) + O(n^2)$$

The $O(n^2)$ term is for contracting down to roughly $n/\beta$
vertices. At that point, we solve $k$ smaller versions of the original problem,
each of size $n/\beta$. That leads to the $k \cdot T(n/\beta)$ term.
This kind of recurrence is exactly what the Master theorem is for. In this case, if we
choose $k = \beta^2$, we get a runtime of $O(n^2 \log n)$ for a single run of
the Karger-Stein algorithm. We already saw that with $k < \beta^2$, we get an exponentially
low success probability, so that choice isn’t interesting anyway. Finally, if $k > \beta^2$,
we get a high success probability, but the runtime becomes $O(n^c)$,
where $c = \log_\beta k > 2$.
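Spelled out, applying the standard Master theorem to this recurrence gives the three cases

$$T(n) \;=\;
\begin{cases}
O(n^2) & \text{if } k < \beta^2, \\
O(n^2 \log n) & \text{if } k = \beta^2, \\
O\!\left(n^{\log_\beta k}\right) & \text{if } k > \beta^2.
\end{cases}$$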
We can summarize all our results (and a few I didn’t mention) in one table:
| Condition | Time for single run | Success probability | Total runtime | Comment |
|---|---|---|---|---|
| $k < \beta^2$ | $O(n^2)$ | exponentially low in $d$ | exponential in $d$ | Too little branching |
| $k = \beta^2$ | $O(n^2 \log n)$ | $\Omega(1/\log n)$ | $O(n^2 \log^3 n)$ | Just right |
| $k > \beta^2$ | $O(n^c)$ with $c = \log_\beta k > 2$ | $\Omega(1/\log n)$ | $O(n^c \log^2 n)$ with $c = \log_\beta k > 2$ | Unnecessarily much branching |
The “total runtime” column contains the runtime that is needed to achieve a high success probability
by repeating the Karger-Stein algorithm often enough (at least if $n$ is large enough). This is the complexity
that we want to minimize in practice.
We can now see that combining the analysis of the success probability with the runtime analysis
explains why the Karger-Stein algorithm uses $k = \beta^2$: in the other cases, we are either very
unlikely to succeed and therefore need too many runs, or we are taking unnecessarily long for
a single run.
But also note that any choice of a “branching factor” $k \geq 2$ works, as long as we then choose
$\beta = \sqrt{k}$. So splitting the computation up into just two subproblems is a reasonable
and simple choice, but from a purely asymptotic perspective it is arbitrary.
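To tie the two analyses together, here is a (hypothetical) top-level driver: pick any branching factor $k$, set $\beta = \sqrt{k}$, and repeat the run on the order of $\log^2 n$ times, keeping the best cut found. The exact repetition count below is a placeholder consistent with the $\Omega(1/\log n)$ success probability, not a tuned constant:

```python
import math

def min_cut(edges, k=2, repetitions=None):
    """Repeat the generalized Karger-Stein run ~log^2(n) times with beta = sqrt(k)."""
    n = len({v for edge in edges for v in edge})
    if repetitions is None:
        repetitions = max(1, math.ceil(math.log(n) ** 2))
    beta = math.sqrt(k)
    runs = (generalized_karger_stein(edges, k, beta) for _ in range(repetitions))
    return min(runs, key=len)
```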