CS302 Lecture notes -- NP Completeness
- James S. Plank
- December 1, 2009.
- Latest revision:
Wed Nov 28 20:09:28 EST 2018
This is not a complete treatment of NP-Completeness. Like the Halting Problem
lecture notes, these notes introduce you to a concept that you will see later in
your CS careers and will provide you with fodder for endless conversations
around the family dinner table.
As always, you can spend quite a bit of time reading Wikipedia on the
subject. Their page on NP-Completeness is not required reading, but (as of 2015)
is a nice treatment of the topic.
P, NP, NP-Complete and NP-Hard are sets of problems, defined as follows:
- P: problems that can be solved in time that is polynomial in the size of their inputs.
- NP: problems whose solutions can be verified in polynomial time.
(NP stands for non-deterministic polynomial time).
- NP-Complete: A collection of problems in NP whose solutions may or may not be
computable in polynomial time. We don't know. However, if we can prove that one of them may be solved
in polynomial time, then all of them can.
- NP-Hard: A collection of problems that do not have to be in NP,
whose solutions are at least as hard as the NP-Complete problems. If a problem is in NP, and
it's NP-Hard, then it is also NP-Complete.
In this lecture, we are going to see what it takes to prove that problems belong
to these sets.
Suppose you have a problem to solve, and you want to prove that it is NP-Complete.
This takes two steps:
- Prove that it is in NP. Typically the problem is couched as
a yes or no problem involving a data structure, such
as "does there exist a simple cycle through a
given directed graph that visits all the nodes?"
To prove it is in NP, you need to show that
a yes solution can be checked in polynomial time.
In the above example, you can check to see if a given path through the graph
is indeed a simple cycle in linear time. Therefore, the problem is in
NP. You don't have to prove anything about the no solutions,
and you don't have to prove anything about how you'd calculate a solution.
- Transform a known NP-Complete problem to this one in polynomial time.
Suppose the problem in question is Q,
and that L is a well-known NP-Complete problem like
the 3-satisfiability problem. You need to show that if you have
any instance of problem L, you can transform it into an instance
of problem Q in polynomial time. Thus, if you could solve problem
Q in polynomial time, you could solve problem L in polynomial
time.
If you can do both of these things, then you have proved that a problem is
NP-Complete. If you can prove that either of these things cannot be done, then you
have proved that a problem is not NP-Complete. Sometimes you can't come up
with good proofs, and you just don't know.
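To make step 1 concrete, here is a minimal sketch of a checker for the simple-cycle example above. The adjacency-matrix representation and the names Graph and is_hamiltonian_cycle are assumptions made for illustration; they are not from any library. The point is only that checking a proposed "yes" answer is cheap, even though finding one may not be.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: adj[u][v] is true when the directed edge
// u -> v exists. The matrix is assumed to be square.
using Graph = std::vector<std::vector<bool>>;

// Returns true when `cycle` lists every node exactly once and each node has
// an edge to the next one, wrapping around from the last node to the first.
bool is_hamiltonian_cycle(const Graph &adj, const std::vector<int> &cycle)
{
  const std::size_t n = adj.size();
  if (n == 0 || cycle.size() != n) return false;

  // Every node must appear, and none may repeat.
  std::vector<bool> seen(n, false);
  for (int v : cycle) {
    if (v < 0 || static_cast<std::size_t>(v) >= n || seen[v]) return false;
    seen[v] = true;
  }

  // Every consecutive pair must be joined by an edge.
  for (std::size_t i = 0; i < n; i++) {
    int from = cycle[i];
    int to = cycle[(i + 1) % n];
    assert(adj[from].size() == n);
    if (!adj[from][to]) return false;
  }
  return true;
}
```

With an adjacency matrix, both loops are linear in the number of nodes. Note that nothing here says how to find the cycle; the checker only confirms one that is handed to it.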
The complexity classes P and NP-Hard may be put in terms of the above:
- P: If we can prove that the solution to a problem may be calculated
in polynomial time, then the problem is in P. All of the algorithms that we
have studied in this class, with the exception of enumeration, are in P.
- NP-Hard: These are problems that are at least as hard to solve as NP-Complete
problems. If they are in NP, then they are NP-Complete problems.
We prove that a problem is NP-Hard by
performing the transformation in step 2 from a known NP-Complete problem to the problem
at hand. That is how we demonstrate that they are "at least as hard to solve as NP-Complete
problems."
3-SAT - A Canonical NP-Complete Problem
3-SAT is a very simple NP-Complete problem. You are given a boolean expression,
which is a big AND (∧) of clauses:
E = C_0 ∧ C_1 ∧ ... ∧ C_{m-1}
Each clause C_i is the OR (∨) of three literals, where a literal is
either a variable x_j or the negation of a variable ¬x_j
(sometimes the negation of a is instead denoted with an overbar, ā).
Here is an example with three clauses and three variables. To make it easier to read, I'm simply
calling the variables a, b and c .
E = ( a ∨ b ∨ c ) ∧ ( a ∨ b ∨ c ) ∧ ( a ∨ b ∨ c )
Given this definition, 3-SAT is simple -- is there an assignment of the variables so that E
is true? In the above example, it's easy to find such an assignment. For example, set a
and c to TRUE and b to FALSE (I'm coloring the true statements red -- you can
see that there is always at least one TRUE in each clause).
E = ( a ∨ b ∨ c ) ∧ ( a ∨ b ∨ c ) ∧ ( a ∨ b ∨ c )
In general, 3-SAT can be a very difficult problem to solve. Here's a harder example
with seven clauses and four variables.
E = ( a ∨ b ∨ c ) ∧ ( a ∨ b ∨ d ) ∧ ( a ∨ c ∨ d ) ∧ ( b ∨ c ∨ d )
    ∧ ( a ∨ b ∨ c ) ∧ ( b ∨ c ∨ d ) ∧ ( b ∨ c ∨ d )
One correct assignment is setting a and c to FALSE,
and b and d to TRUE:
E = ( a ∨ b ∨ c ) ∧ ( a ∨ b ∨ d ) ∧ ( a ∨ c ∨ d ) ∧ ( b ∨ c ∨ d )
    ∧ ( a ∨ b ∨ c ) ∧ ( b ∨ c ∨ d ) ∧ ( b ∨ c ∨ d )
From our lecture notes on enumeration, we can
answer whether an instance of 3-SAT is true or false with a simple power
set enumeration. That enumerates all possible true/false settings of the variables,
and for each setting, you can test to see whether the expression is true.
Of course, if there are n variables, the power set enumeration will
enumerate 2^n settings, so this is definitely not polynomial
time.
Is there a polynomial time solution? No one knows.
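The power set enumeration described above can be sketched as follows. The Literal and Clause types and the function names are hypothetical, chosen only for this illustration, and the sketch assumes fewer than 63 variables so that one setting fits in a 64-bit mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical representation: variable number plus a "negated" flag.
struct Literal { int var; bool negated; };
using Clause = std::array<Literal, 3>;

// True when every clause has at least one literal that evaluates to true.
// A literal is true when its variable's value differs from its negated flag.
bool satisfies(const std::vector<Clause> &expr, const std::vector<bool> &value)
{
  for (const Clause &c : expr) {
    bool clause_true = false;
    for (const Literal &lit : c) {
      if (value[lit.var] != lit.negated) clause_true = true;
    }
    if (!clause_true) return false;
  }
  return true;
}

// Tries all 2^n settings; bit i of `mask` is the value of variable i.
// On success, leaves the satisfying setting in `value` and returns true.
bool brute_force_3sat(const std::vector<Clause> &expr, int n,
                      std::vector<bool> &value)
{
  assert(n >= 0 && n < 63);
  const std::uint64_t limit = std::uint64_t{1} << n;
  for (std::uint64_t mask = 0; mask < limit; mask++) {
    value.assign(n, false);
    for (int i = 0; i < n; i++) value[i] = ((mask >> i) & 1) != 0;
    if (satisfies(expr, value)) return true;
  }
  return false;
}
```

Each call to satisfies is linear in the number of clauses, so all of the expense is in the 2^n iterations of the outer loop.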
It is an easy matter to prove that 3-SAT is in NP. How many different clauses can there
be? (4/3) * n * (n-1) * (n-2) -- we'll go over that in class. That's a polynomial of
n. If we have a solution, we can test its validity by simply setting the variables and
seeing if E is true. That test is polynomial time, so 3-SAT is in NP.
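The clause count quoted above can be reproduced with a short counting argument. This is a sketch under two assumptions: a clause uses three distinct variables, and the order of the literals within a clause does not matter. Choose the three variables, then choose for each one whether it is negated:

```latex
\binom{n}{3} \cdot 2^{3}
  = \frac{n(n-1)(n-2)}{6} \cdot 8
  = \frac{4}{3}\, n (n-1)(n-2)
```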
As for proving that 3-SAT is NP-Complete, that is well beyond the scope of this class. However,
3-SAT is a very popular problem for proving that other problems are NP-Complete.
How would we do that?
Suppose I have a problem, like
The Independent Set Decision Problem (ISDP): Given a graph G
and a number k, can we find a set of k vertices in G such that there are
no edges between any two of the vertices? Here's an example:
The yellow nodes are an independent set of size 5. There is no independent set of size 6.
Here's how we use 3-SAT to prove that ISDP is NP-Complete.
First, prove it's in NP: If you give me a set of k vertices, I can easily check to verify
that there are no edges between any two nodes in the set. That will be O(|E|) in the
worst case, which is most definitely polynomial in |V|.
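Here is a minimal sketch of that check. The edge-list representation and the name is_independent_set are assumptions made for illustration. It marks the chosen vertices and then makes one pass over the edges, which is where the O(|E|) comes from.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns true when `chosen` holds exactly k distinct vertices of a graph
// with num_vertices vertices, and no edge in `edges` joins two of them.
bool is_independent_set(int num_vertices,
                        const std::vector<std::pair<int, int>> &edges,
                        const std::vector<int> &chosen, int k)
{
  if (static_cast<int>(chosen.size()) != k) return false;

  // Mark the chosen vertices, rejecting bad or repeated vertex numbers.
  std::vector<bool> in_set(num_vertices, false);
  for (int v : chosen) {
    if (v < 0 || v >= num_vertices || in_set[v]) return false;
    in_set[v] = true;
  }

  // One pass over the edges: fail if both endpoints were chosen.
  for (const auto &e : edges) {
    assert(e.first >= 0 && e.first < num_vertices);
    assert(e.second >= 0 && e.second < num_vertices);
    if (in_set[e.first] && in_set[e.second]) return false;
  }
  return true;
}
```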
Next, I need to figure out how to take an instance of 3-SAT, and convert it into an instance
of ISDP, so that if you can solve the ISDP instance in polynomial time, then you can solve the
instance of 3-SAT in polynomial time. Here's one way:
- Turn each clause into three nodes, and label the nodes with their literals (including
the not). Add an edge between each of these nodes.
- For every pair of nodes with the same, but negated, literals, add an edge between that
pair of nodes.
- Any independent set of size k=m (one node per clause) will correspond to an assignment of the variables for
which the 3-SAT expression is true.
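The construction can be sketched in code. The Literal, Clause and IsdpInstance types and the node numbering (node 3*i + j stands for literal j of clause i) are assumptions made for this illustration. The target size is taken to be the number of clauses, since an independent set can hold at most one node from each clause's triangle.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical representation: variable number plus a "negated" flag.
struct Literal { int var; bool negated; };
using Clause = std::array<Literal, 3>;

struct IsdpInstance {
  int num_vertices;
  std::vector<std::pair<int, int>> edges;
  int k;   // Size of the independent set that we ask for.
};

IsdpInstance reduce_3sat_to_isdp(const std::vector<Clause> &expr)
{
  const int m = static_cast<int>(expr.size());
  IsdpInstance g{3 * m, {}, m};

  // Step 1: the three nodes of each clause form a triangle.
  for (int i = 0; i < m; i++) {
    g.edges.push_back({3 * i, 3 * i + 1});
    g.edges.push_back({3 * i, 3 * i + 2});
    g.edges.push_back({3 * i + 1, 3 * i + 2});
  }

  // Step 2: connect nodes in different clauses that hold the same variable
  // with opposite signs. (Nodes in the same clause are already connected.)
  for (int u = 0; u < 3 * m; u++) {
    for (int v = u + 1; v < 3 * m; v++) {
      if (u / 3 == v / 3) continue;
      const Literal &a = expr[u / 3][u % 3];
      const Literal &b = expr[v / 3][v % 3];
      assert(a.var >= 0 && b.var >= 0);
      if (a.var == b.var && a.negated != b.negated) g.edges.push_back({u, v});
    }
  }
  return g;
}
```

The double loop looks at every pair of nodes once, so the whole transformation is O(m^2), which is what the reduction needs.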
Here's the simple three-clause 3-SAT problem above, converted to a graph, with an example
3-node independent set colored magenta. You'll note that the set corresponds to a setting of
the variables that makes the 3-SAT equation true:
Below, I also convert the more complicated 7-clause expression to a graph for the ISDP
problem.
I have the clauses clumped together going clockwise around the graph, starting at roughly
1:00. I also have colored inter-clause edges according to the literals that they connect:
I've colored the nodes in the Independent Set gray. You should be able to verify that:
- The set is indeed independent.
- The assignment of literals makes the expression true.
This works because you can only have one node per clause in the Independent Set. Moreover,
if you have a in the set, then you cannot have
¬a, and vice versa.
Finally, think about the size of the graph. It will have 3m nodes, and it cannot have more
than one edge per pair of nodes, or 3m(3m-1)/2 edges, which is clearly polynomial in m.
Thus, if I can solve ISDP in polynomial time, then I can solve 3-SAT in polynomial time.
Neat, no?
Who Cares?
NP-Complete problems usually have easy-to-write exponential solutions. However, we cannot prove
that they do not have polynomial time solutions. This is embodied in the equation:
P = NP?
It is a famous open question in theoretical computer science. Does its solution have practical
worth? Maybe -- a lot of these problems pop up very naturally (Spellseeker from Lab B
comes to mind...), and if we could solve them in polynomial time rather than
exponential, then that would be something!
Well-Known NP-Complete Problems
- Boolean Satisfiability: This is the general form of 3-SAT, and may be
reduced to 3-SAT.
- The Clique Problem: Given a graph G and a number k, does the
graph contain a clique of size k? A clique is a set of nodes where there is an
edge between every pair of nodes.
- Hamiltonian Path: Given a graph G, does there exist a simple path
that contains all of the nodes in G?
- Travelling Salesman: Given a weighted graph G and a path length L,
is there a simple cycle through all of the nodes whose total length is less than or equal to L?
- Subgraph Isomorphism: Given two graphs G and H, does G
contain a subgraph that is isomorphic to H? Isomorphism means that there
is a 1-to-1 correspondence between the nodes and edges of the two graphs.
- Knapsack Problem: Given a set of items, each of which has a weight and a
value, and given a total weight W and a total value V, can you select
items from the set whose total weight is less than or equal to W and whose total
value is greater than or equal to V?
The nice thing about this problem is that it allows for polynomial time approximations,
and if the weights and values are integers, then it becomes easier. A practical variant
of this is the "DraftKings" problem (golfing version): Given a collection of
golfers, each of whom has a salary and a score, which "team" of golfers, whose total salary
is bounded by a salary cap, has the maximum score? You'll note that this is really not
couched as a proper NP-Complete problem. However, you can turn it into one if you bound
the score, and then say "is there a team that achieves at least the bounded score?"
You can then do binary search on the score, and the NP-complete decision problem will
solve the maximization problem.
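The binary search idea can be sketched like this. The function name and the `decide` callback are hypothetical: `decide(s)` stands in for whatever answers "is there a team that achieves at least score s?". The sketch assumes that the answer is monotone, meaning that once it becomes false it stays false for every larger score.

```cpp
#include <cassert>
#include <functional>

// Returns the largest score s in [lo, hi] for which decide(s) is true,
// or lo - 1 when decide is false over the whole range.
int max_achievable_score(int lo, int hi,
                         const std::function<bool(int)> &decide)
{
  assert(lo <= hi);
  int best = lo - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (decide(mid)) {
      best = mid;      // mid is achievable, so look for something higher.
      lo = mid + 1;
    } else {
      hi = mid - 1;    // mid is too ambitious, so look lower.
    }
  }
  return best;
}
```

This makes O(log(hi - lo)) calls to the decision problem, so a polynomial time decision procedure would give a polynomial time maximizer.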
Fortunately, for the problem sizes that you encounter in DraftKings golf (your team
has to be 6 golfers, and the field is 155 golfers or fewer), a nice dynamic program
solves the DraftKings problem quickly. What's the recursion?
int max_score(int index, int golfers_left, int salary_remaining);
In this, index is an index into a vector of golfers,
golfers_left is the number of golfers left to fill in your team, and
salary_remaining is how much salary you can devote to the rest of your
team.
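Here is one possible recursion for that prototype, sketched with memoization. This is only an illustration: the Golfer struct, the DraftKings class, the cache, and the IMPOSSIBLE sentinel are all assumptions, and the sketch assumes that scores are non-negative so that -1 can safely mean "no legal team." At each index, you either skip the golfer or put the golfer on the team, and you keep the better of the two.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

struct Golfer { int salary; int score; };

const int IMPOSSIBLE = -1;   // No legal team exists from this state.

class DraftKings {
  public:
    explicit DraftKings(std::vector<Golfer> g) : golfers(std::move(g)) {}
    int max_score(int index, int golfers_left, int salary_remaining);
  private:
    std::vector<Golfer> golfers;
    std::map<std::tuple<int, int, int>, int> cache;
};

int DraftKings::max_score(int index, int golfers_left, int salary_remaining)
{
  assert(index >= 0 && golfers_left >= 0 && salary_remaining >= 0);

  // Base cases: the team is full, or we ran out of golfers to consider.
  if (golfers_left == 0) return 0;
  if (index == static_cast<int>(golfers.size())) return IMPOSSIBLE;

  auto key = std::make_tuple(index, golfers_left, salary_remaining);
  auto it = cache.find(key);
  if (it != cache.end()) return it->second;

  // Choice 1: leave golfer `index` off the team.
  int best = max_score(index + 1, golfers_left, salary_remaining);

  // Choice 2: put golfer `index` on the team, if the salary fits.
  const Golfer &g = golfers[index];
  if (g.salary <= salary_remaining) {
    int rest = max_score(index + 1, golfers_left - 1,
                         salary_remaining - g.salary);
    if (rest != IMPOSSIBLE && g.score + rest > best) best = g.score + rest;
  }

  cache[key] = best;
  return best;
}
```

A call such as max_score(0, 6, cap) would then ask for the best six-golfer team under the cap. The number of distinct states is bounded by (number of golfers) x (team size) x (distinct salary values), which is why this runs quickly at DraftKings sizes even though the general problem is hard.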