2018-05-31
[last updated 2018-06-12]

*tl;dr* We’ve typically considered it a deal-breaker to discover that
an algorithm we care about is *NP*-hard. We’ll go to great lengths to simplify
the problem domain so that we can switch to a polynomial-time algorithm. But if
we simplify too much, then we run the risk that our solution is no longer
useful. Luckily, we might not have to! *NP*-hard is a ** worst-case** bound.
If we can convince ourselves that we won’t encounter pathological inputs, the

Complexity theory is the study of how difficult certain problems are. We
typically measure a problem’s complexity by estimating the amount of time and
memory that a computer would need to solve the problem for an arbitrary input.
We also like to lump together problems with basically the same difficulty into
** complexity classes**. For instance, there is a complexity class named

Another important complexity class is ** NP**, which is where things first
start to get difficult for our poor little computers. A problem in

The last wrinkle is that even after decades upon decades, we still don’t know
whether or not *P* and *NP* are the same thing. We ** believe** that there are
no easy solutions to any

All of this means that we have an (understandably) innate fear of *NP*-hard
problems. With an algorithm that’s just in *NP*, there’s always a chance that
we just haven’t thought about things long enough or hard enough — if we keep at
it, we’d see that while *this algorithm* is hard, the underlying problem is not,
and we’d find a different algorithm that solves the problem more easily. On the
other hand, if we discover that our algorithm is *NP*-hard, we’re hosed. To
crack this nut, we’d have to do something that none of the greatest minds in the
history of our field have done.

So instead of trying to solve the unsolvable, we look for a simpler problem to
solve. For example, dependency management (like that provided by build tools
like cargo and npm) is a provably *NP*-hard problem — you
can construct a set of libraries whose dependency relationships with each other
would cause `npm install`

to run effectively forever. This is…not the best
usability story, to say the least!

While researching this problem for the Go ecosystem, Russ Cox pointed out that
you can simplify the problem by prohibiting certain kinds of negative
dependencies (e.g., “the `kitten`

package doesn’t work with version 1.2.6 of the
`catnip`

package”). That in turn, moves the dependency management problem from
*NP*-hard to *P*. Perfect! Now our tool will never blow up on us.

Unfortunately, by simplifying the problem, we run the risk that our solution is no longer useful. Sam Boyer suggests this is the case for Go dependency management: that there are real use cases where the community will need the kinds of negative dependencies that would be outlawed.

That leaves us with a conundrum. On the one hand, we can’t simplify our problem without sacrificing real use cases. On the other, we have an algorithm that might not finish in any reasonable amount of time. What can we do?

I would argue that there’s a third possibilty, which is to just…shrug.
*NP*-hard is a ** worst-case** bound on your algorithm. It might only apply to
“pathological” inputs. For dependency management, for instance, we said that

`npm install`

to blow up.
But how many hoops do you have to jump through to construct that graph? Are you
Now, this doesn’t necessarily apply to all *NP*-hard problems. You might be
dealing with something so fundamentally difficult that all interesting inputs
take exponential time to solve. If that’s the case…well, this post isn’t
going to help you all that much. You’ll still have to simplify your problem
domain somewhow to make any forward progress.

But before you go down that road, it’s worth thinking through some real-world use cases. You might be pleasantly surprised!

If you have a hankering for formal methods, check out this paper that
I wrote during my grad school days, where I use a CSP refinement checker to
explore the complexity space of an interesting *NP*-hard problem, trying to find
what kinds of inputs lead to the worst-case exponential behavior.