¶NP-hard isn't the end of the world
2018-05-31 (last updated 2018-06-12)
We've typically considered it a deal-breaker to discover that an algorithm we care about is NP-hard. We'll go to great lengths to simplify the problem domain so that we can switch to a polynomial-time algorithm. But if we simplify too much, then we run the risk that our solution is no longer useful. Luckily, we might not have to! NP-hard is a _worst-case_ bound. If we can convince ourselves that we won't encounter pathological inputs, the NP-hard algorithm might be just fine in practice.
Complexity theory is the study of how difficult certain problems are. We typically measure a problem's complexity by estimating the amount of time and memory that a computer would need to solve the problem for an arbitrary input. We also like to lump together problems with basically the same difficulty into
Another important complexity class is NP, which is where things first start to get difficult for our poor little computers. A problem in NP is hard to
The last wrinkle is that even after decades upon decades, we still don't know whether or not P and NP are the same thing. We
All of this means that we have an (understandably) innate fear of NP-hard problems. With an algorithm that's just in NP, there's always a chance that we just haven't thought about things long enough or hard enough — if we keep at it, we'd see that while
So instead of trying to solve the unsolvable, we look for a simpler problem to solve. For example, dependency management (like that provided by build tools like cargo and npm) is a provably NP-hard problem — you can construct a set of libraries whose dependency relationships with each other would cause npm install to run effectively forever. This is...not the best usability story, to say the least!
While researching this problem for the Go ecosystem, Russ Cox pointed out that you can simplify the problem by prohibiting certain kinds of negative dependencies (e.g., "the kitten package doesn't work with version 1.2.6 of the catnip package"). That in turn, moves the dependency management problem from NP-hard to P. Perfect! Now our tool will never blow up on us.
Minimal Version Selection [Cox]
Unfortunately, by simplifying the problem, we run the risk that our solution is no longer useful. Sam Boyer suggests this is the case for Go dependency management: that there are real use cases where the community will need the kinds of negative dependencies that would be outlawed.
That leaves us with a conundrum. On the one hand, we can't simplify our problem without sacrificing real use cases. On the other, we have an algorithm that might not finish in any reasonable amount of time. What can we do?
I would argue that there's a third possibilty, which is to just...shrug. NP-hard is a
Now, this doesn't necessarily apply to all NP-hard problems. You might be dealing with something so fundamentally difficult that all interesting inputs take exponential time to solve. If that's the case...well, this post isn't going to help you all that much. You'll still have to simplify your problem domain somewhow to make any forward progress.
But before you go down that road, it's worth thinking through some real-world use cases. You might be pleasantly surprised!
If you have a hankering for formal methods, check out [Creager2007b]: a paper I wrote during my grad school days, where I use a CSP refinement checker to explore the complexity space of an interesting NP-hard problem, trying to find what kinds of inputs lead to the worst-case exponential behavior.
[Creager2007b] Empirical analysis and optimization of an NP-hard algorithm using CSP and FDR