Реферат: Алгоритмы
Реферат: Алгоритмы
The Efficiency Of Algorithms.
Some mathematical problems can be solved only by methods too slow for even
the fastest computers. More efficient methods have not been found, but
neither has it been proved, that there are no better methods by Harry R.Lewis
and Christos H. Paradimitrou
Suppose you were asked to plan an itinerary for a traveling salesman who must
visit a number of cities. You are given a map on which the distances between
the cities are marked and you are asked to find the shortest route that
passes through all the cities and returns to the starting point. An approach
to this problem that is certain to give the correct answer is to trace all
the possible routes, measure their length and pick the shortest one. If the
tour included more than a few cities, however, hundreds or thousands of
routes would have to be checked. If there were 100 cities, then even the
fastest computers would require weeks of calculation to find the shortest
path.
In the search for a quicker solution you might try some less rigorous
methods. One idea that seems reasonable is always to visit nearby cities
before going on to those farther away. You would soon discover, however, that
this procedure does not invariably yield the correct answer. Other shortcuts
also fail. In fact, the best methods known for solving the problem are not
much better than the obvious but laborious procedure of checking all the
possible itineraries. Mathematicians now suspect that this problem and many
others like it may forever remain beyond our ability to solve in any
efficient way. That speculation itself, however, is unconfirmed; although no
faster methods of solution have been found, neither has it been proved that
faster methods do not exist.
In the problem of the traveling salesman's tour it is not the solution for a
particular set of cities that is of the greatest importance but a general
method for finding the solution for any cities. Such a method is called an
algorithm: it is a precisely stated procedure or set of instructions that can
be applied in the same way to all instances of a problem. If the problem is
to be solved with the aid of a computer, an algorithm is indispensable,
because only those procedures that can be stated in the explicit and
unambiguous form of an algorithm can be presented to a computer. Instructions
that are vague or that rely on intuition are unacceptable.
An example of an algorithm is the procedure taught in the schools for the
subtraction of whole numbers. If each of the steps in this procedure is
applied correctly one at a time, the algorithm will always yield the correct
result. What is more, once the algorithm has been learned or stored in the
memory of a computer or embodied in the circuitry of an electronic
calculator, it can be applied to an infinite set of subtraction problems.
With this one algorithm the difference between any two whole numbers can be
determined.
In principle any problem for which an algorithm can be devised can be solved
mechanically. It may therefore seem surprising that there are problems for
which algorithms exist but for which we so far have no practical general
solution. The algorithms for solving these problems always give a correct
answer, but they often require an inordinate amount of time. The problem of
the traveling salesman's tour is among these intractable tasks.
The efficiency of computer algorithms is a topic of obvious practical
importance. It is also of interest in more formal areas of mathematics. There
are some problems in mathematics and logic for which no algorithm can ever be
written, and there are many others for which efficient, fast algorithms are
already known. Between these two groups is a third class of problems that can
always be solved in principle, but for which there are only inefficient (and
therefore largely unusable) algorithms. For some of these difficult problems
mathematicians have been able to demonstrate that efficient algorithms can
never be designed. For many of the most important problems, however, there is
only the suspicion that good algorithms are impossible.
A given problem can have more than one algorithm for its solution. For
example, children in Europe learn a procedure for subtraction slightly
different from the one taught in the U.S. Both of the subtraction algorithms,
however, give the same result in the same amount of time. That is not
invariably the case with different algorithms for solving the same problem.
One celebrated problem that can be solved by either a "fast" algorithm or a
"slow" one is the problem of the Koenigsberg bridges.
In the I8th-century German city of Koenigsberg (which is now the Russian city
of Kaliningrad) a park was built on the banks of the river Pregel and on two
islands in the river. Within the park seven bridges connected the islands and
the riverbanks. A popular puzzle of the time asked if it was possible to walk
through the park by a route that crossed each of the bridges once and only
once.
For the solution of the problem the size and shape of the islands and the
length of the bridges are immaterial: the only essential information is the
pattern of interconnections. This information can be presented compactly in
the mathematical structure known as a graph, which is merely a set of points
with lines drawn to join them. In the case of the Koenigsberg park each of
the riverbanks, and each of the islands is condensed to a single point, and
each of the bridges is represented by a line between two points. Thus the
graph consists of four points and seven lines. If the lines are labeled, any
path through the park can be specified by a simple listing of labels.
The obvious approach to the problem is to list all the paths that cross all
the bridges and to eliminate from consideration those that cross any bridge
more than once. This is the technique of exhaustive search, similar to the
one employed in the problem of the traveling salesman. When the mathematician
Leonard Euler was presented with the problem of the Koenigsberg bridges, he
recognized the limitations of the technique and found another method. In
recognition of his contribution a path that traverses each line of a graph
exactly once is now called an Eulerian path.
Euler wrote: "The particular problem of the seven bridges of Koenigsberg
could be solved by carefully tabulating all possible paths, thereby
ascertaining by inspection which of them, if any, met the requirement. This
method of solution, however, is too tedious and too difficult because of the
large number of possible combinations, and in other problems where many more
bridges are involved it could not be used at all."
Euler's alternative method is much simpler. He showed that a tour of the kind
sought must exist if the graph meets two conditions. First, it must be
possible to go from any point in the graph to any other point by following
the lines of the graph: in other words, the graph may not be disconnected.
Second, every point of the graph, with two possible exceptions, must be at
the junction of an even number of lines.
It is not hard to understand why a graph cannot have an Eulerian path unless
it meets these conditions. All regions of the graph must be connected to one
another if there is to be any path that traverses all the lines. Each point
must have an even number of lines because half of them are required to reach
the point and the other half to leave it. Two points with an odd number of
lines can be allowed if they are chosen as the starting and finishing points
of the path. Demonstrating that any graph that meets these conditions
actually has an Eulerian path requires a somewhat more complicated argument,
which we shall not present here, but Euler was able to give a rigorous proof.
It is an easy matter to express Euler's solution to this problem in an
algorithm that could be executed by a computer. The first requirement,
connectivity, can be established by marking some point of the graph, then
similarly marking all points connected to it by lines, then all the points
connected to the newly marked points, and so on. The graph is connected if at
the end all points have been marked. The second requirement is just as easily
tested: the machine is instructed to examine each point of the graph and
count the number of lines that terminate at that point. If no more than two
of the points have an odd number of lines, the graph has an Eulerian path.
The park at Koenigsberg met the first condition but not the second, and so
there was no Eulerian tour of the seven bridges.
Euler's method is unquestionably the more economical approach to the problem
of Koenigsberg bridges: it requires that each point and line of the graph be
listed just once, whereas the exhaustive search is not completed until every
path that crosses all the bridges has been listed. The number of such paths
is much larger than the number of points and lines in the graph. In that
sense Euler's method is the better algorithm, but how much better? How can
the difference be measured and how can one fell if the difference is
significant?
For a graph with only four points and seven lines both techniques are fast
enough to be considered practical. Suppose, however, that more islands and
bridges were added to the park, or in other words that more points and lines
were added to graph. If the problem is being solved by Euler's method, each
new point merely adds one item to the list of points that must be checked. If
the paths are to be examined by exhaustive search, on the other hand, then
with each new point and line the size of the list is multiplied by some
factor. A moderate increase in the size of the graph results in an explosive
increase in the number of paths. Ultimately the list of paths 'must become
prohibitively long.
In this comparison of the two solutions to Euler's problem there is the basis
for a completely general method of evaluating the speed or practicality or
efficiency of any algorithm. We imagine that the algorithm is supplied with
larger and larger inputs, and we note the rate at which the execution time of
the algorithm increases. In this way it is possible to make unequivocal
judgments of algorithms. Exhaustive search not only is a slower method; in
general it is too slow to be of any value. Euler's method remains practical
for problems of essentially unlimited size.
As the size of the graphs being examined increases, the lists produced by the
method of exhaustive search grow exponentially. Each time some fixed number of
points and lines are added to the graph the size of the list doubles. Growth of
this kind can be described by a mathematical function such as 2^n,
where n is some measure of the size of the graph. Many other functions
have similar or even higher rates of growth. Among them are n^n and
n! ( which is read as "n factorial" and signifies n multiplied by
all the integers between 1 and n). For the purposes of this
discussion all these functions can be regarded as having the same property of
exponential growth.
Mathematical functions of another kind are known as polynomials. The simplest
members of this class are linear functions, such as 3n, which
designate a simple relation of proportionality. The time needed to solve the
problem of Koenigsberg bridges by Euler's method increases as a linear function
of the size of the graph. Other polynomials are n^2, n^3 and so
on, and the sums of such functions. What distinguishes polynomials from
exponential functions is that n never appears in an exponent.
For sufficiently large values of n any exponential function will
overtake and exceed any polynomial function. It was the certainty of this
result that dismayed 'Ihomas Malthus when he compared the exponential rate of
population growth with the polynomial rate of increase in the food supply. For
small values of n a given polynomial function may well exceed a given
exponential one, but there is always a value of n beyond which the
exponential function is the greater. The exact form of the polynomial makes
little difference, except in changing the point at which the polynomial
function is overtaken.
It is now generally agreed by computer scientists that algorithms whose
execution time increases exponentially as a function of the size of the input
are not of practical value. We shall call algorithms of this kind
"exponential time" algorithms, or simply inefficient algorithms. The only
algorithms that are considered fast or efficient enough for general
application are "polynomial time" algorithms.
Of course, even among efficient algorithms some are faster than others, but
for the purposes of this discussion it is important only to distinguish
polynomial-time algorithms as a class from exponential-time algorithms.
Moreover, this system of classification has the advantage that it makes the
speed of an algorithm a property of the algorithm itself and independent of
the machine on which it is executed. For sufficiently large problems a
polynomial-time algorithm executed on even the slowest machine will find an
answer sooner than an exponential-time algorithm on the fastest computer.
The mathematical properties of algorithms were studied in the 1930's by the
British mathematician A. M. Turing, the inventor of the imaginary computer,
now called a Turing machine. The Turing machine was conceived to be an
automaton equipped with an infinite supply of paper tape marked off in square
regions. The machine was capable of just four actions: it could move the tape
one square, it could place a mark in a square, it could erase a mark already
present and at the end of a calculation it could halt. These operations were
to be performed according to a sequence of instructions built into the
internal mechanism. Of course, Turing never built such a machine; it was
merely a conceptual device for automatically solving problems in mathematics
and logic. Indeed, Turing was interested not in actually solving problems
with the machine but rather in investigating what kinds of problems it could
solve and what kinds it could not.
Turing discovered that even a machine as simple as this one could solve any
problem for which an algorithm could be devised. The computation might be
laborious and indirect, but given enough time and paper tape the machine
would eventually find the solution and halt. Reduced to its essentials the
Turing machine is a language for stating algorithms, as powerful in principle
as the more sophisticated languages now employed for communicating with
computers.
In addition to conceiving, these machines Turing demonstrated, their
limitations. In 1936 he showed that there are problems that cannot be solved
by Turing machines, and it follows that these problems cannot be solved by
any automatic computer. They are problems for which algorithms cannot be
written, even in principle. The example first studied by Turing is the
problem of predicting whether a particular Turing machine, once it is set in
motion, will ever finish its calculation and halt. Through an analysis of
this problem he was able to show that there can be no general procedure for
telling whether mathematical propositions are true or false. Since then a
variety of other problems with the same properties have been proposed.
One result of Turing's work was to divide all imaginable problems in
mathematics into two classes. Those problems for which algorithms can never
be written are in a formal sense permanently unsolvable. Some instances of
these problems may be solved by rare perception or by luck, but a general
method for their solution will never be found.
All other problems in mathematics and logic can be solved by algorithms. As
we have seen, however, some algorithms are more useful than others. The class
of solvable problems can therefore be divided into two subgroups: those for
which there are efficient, polynomial-time algorithms and those for which
there are only exponential-time algorithms. Euler's problem is a member of
the class with polynomial-time solutions, since Euler's method is itself a
polynomial-time algorithm. Problems that can be proved to have only
exponential-time algorithms are also known, although they are rather obscure.
Although these two groups of problems are quite distinct, it is not always a
straightforward task to assign a problem to one group or the other. Indeed, a
very interesting class of problems seems to fall somewhere between them. For
these problems no efficient algorithms are known and the best available
solutions require exponentially increasing time, yet no one has been able to
prove that the problems do not have polynomial-time solutions.
One such problem was considered in the 19th century by the Irish
mathematician William Rowan Hamilton. Superficially Hamilton's problem is much
like Euler's .The problem is to decide whether a given graph has a path that
takes in each point exactly once (whereas Euler looked for a path that
traversed each line once). Actually the tasks are quite different, and Euler's
method cannot be applied to Hamilton's problem. The graph derived from the plan
of the park at Koenigsberg has a Hamiltonian path, although, as we have seen,
it has no Eulerian path. On the other hand, removing two lines results in a
graph that has an Eulerian path but not a Hamiltonian one. Many other graphs
have neither kind of path.
Hamilton's problem can be solved by exhaustive search: indeed, the procedure
is not substantially different from that employed in listing all the possible
paths that might have the Eulerian property. For Hamilton's problem, however,
no efficient algorithm comparable to Euler's method has been found. The
problem has been pondered by many of the best mathematicians of the past
century, but the most efficient methods available today are fundamentally no
better than exhaustive tabulation. On the other hand, all attempts to prove
that there is no better method have also failed, and it must be considered a
possibility that an efficient algorithm will be discovered tomorrow.
Problems that are known to have polynomial-time solutions, such as Euler's
problem, are said to be members of the class P (for polynomial). Hamilton's
problem is a member of another class, designated by the letters NP,
signifying "nondeterministic polynomial." The class NP encompasses all the
problems in P, or in other words P is a subset of NP. In addition NP includes
problems whose status is less certain. They are all solvable problems in
principle: they have algorithms, but for now only exponential-time algorithms
are known. They may also have polynomial-time algorithms (in which case NP
and P are identical) or they may prove to be permanently intractable, with
only inefficient solutions.
The problems considered here, and all problems classified in this way, can be
described as an infinite set of similar questions each of which can be
Another way of defining NP is as the class of yes-or-no problems that can be
solved by guessing certificates. If one is given an instance of a problem in
NP for which the answer happens to be yes, then with luck one may discover
the required certificate fairly quickly by making a sequence of guesses; if
the answer is no, guessing cannot possibly yield an answer any faster than an
exhaustive search could. For example, in solving Hamilton's problem one might
find a correct path (if there is one) on the first try by tracing small
portions of the path and guessing at each stage how to proceed. Such a
procedure, it should be emphasized, is not an algorithm. It could be made
into an algorithm only by crossing off each trial path as it is tested and
checking all possible paths, but that is equivalent to the method of
exhaustive search.
A mathematical procedure defined in terms of luckly guesses may seem bizarre,
but it is a quite legitimate approach to defining the problems in the class NP.
In principle the procedure could even be mechanized by building a device called
a nondeterministic Turing machine. This device can do all that an ordinary
Turing machine can do; in addition, at some points in its operation it
may have more than one choice of what to do next. Such a machine would be
considered to answer yes to a question if there were some sequence of choices
that could lead it to a yes conclusion. NP, the class of nondeterministic
polynomial-time problems, consists of precisely those problems whose yes
instances can be identified by, machines making comparatively short guessing
computations.
The inclusion of guessing in the definition of these problems suggests strongly
to many mathematicians that P and NP are not the same set and
hence that efficient algorithms can never be found for the intractable problems
in the class NP. If every problem in NP were actually in P,
then all the guesswork and luck could be replaced by some systematic procedure
without great sacrifice in time. It is hard to believe the ability to guess and
be lucky could win so little.
The class NP includes a variety of commonly encountered problems that
seem to defy efficient solution. We have already mentioned Hamilton's problem
and the problem of composite numbers. Another example is known as the matching
problem. It can be considered in terms of the task faced by the colleges every
September, when a new class of freshmen must be assigned to shared dormitory
rooms.
For the sake of simplicity let us assume that all the information gathered
about the students' smoking habits, bed time hours, taste in music and so
forth results in a single yes-or-no decision as to the compatibility of each
possible pair of students. The entire class can then be represented as a
graph in which the points correspond to students and a line is drawn answered
yes or no. For problems that are formally unsolvable, such as the problem of
predicting whether a Turing machine will halt, these questions simply cannot
be answered by any algorithmic procedure. For problems of the class P the
questions can invariably be answered, whether the answer turns out to be yes
or no, by an efficient procedure. In order for a problem to qualify for the
class NP there need not be an efficient means of answering the yes-or-no
questions. What is required is that whenever the answer is yes there be a
short and convincing argument proving it.
Hamilton's problem, for example, meets this condition. It is not possible to
tell by any efficient means known today whether a graph has a Hamiltonian
path, but if it does, then the path itself can be exhibited. Hence for every
Hamiltonian graph it is possible to issue a "certificate" that proves its
membership in this special class of graphs. Such a certificate would name the
lines in the graph in the order the path traverses them. Finding the path
might require weeks of tabulation, but once it has been found it can easily
be exhibited. Another problem that belongs to the class NP is the question of
whether a whole number is composite, that is, whether it can be written as
the product of two other numbers. Again there is no known efficient procedure
for answering the question, but if the number is indeed composite there is a
succinct proof of that fact, namely a correctly worked-out multiplication
with the number on the bottom line.
Care must be taken when asking the yes-or-no question of a problem in the
class NP, since the complementary no-or-yes problem might not be in the same
class. For example, the complement of Hamilton's problem, in which one is
asked to show that a graph does not have a path passing once through each
point, may well not be in the class NP. For now the only way to demonstrate
the absence of such a path is to list all possible paths, and such a proof is
too lengthy to qualify as a certificate of membership in NP. On the other
hand, the complement of the composite-number problem, which asks if a number
is prime, turns out to be in the class NP. The reason, which is far from
obvious, is that relatively short proofs demonstrating that a number has no
factors other than 1 and itself were discovered in 1975 by Vaughan Pratt of
the Massachusetts Institute of Technology. Still, it is not known whether the
composite-number problem and its complement are in the class P.
It is easy to show that every problem in the class P is also in the class NP.
If a problem is in P, then by definition there is an efficient algorithm for
it. To produce a short and convincing proof that the answer to some instance
of the problem is yes, all we need to do is follow the algorithm: a record of
its operation constitutes the required certificate.
Another way of defining NP is as the class of yes-or-no problems that can be
solved by guessing certificates. If one is given an instance of a problem in
NP for which the answer happens to be yes, then with luck one may discover
the required certificate fairly quickly by making a sequence of guesses; if
the answer is no, guessing cannot possibly yield an answer any faster than an
exhaustive search could. For example, in solving Hamilton's problem one might
find a correct path (if there is one) on the first try by tracing small
portions of the path and guessing at each stage how to proceed. Such a
procedure, it should be emphasized, is not an algorithm. It could be made
into an algorithm only by crossing off each trial path as it is tested and
checking all possible paths, but that is equivalent to the method of
exhaustive search.
A mathematical procedure defined in terms of luckly guesses may seem bizarre,
but it is a quite legitimate approach to defining the problems in the class
NP. In principle the procedure could even be mechanized by building a device
called a nondeterministic Turing machine. This device can do all that an
ordinary Turing machine can do; in addition, at some points in its operation
it may have more than one choice of what to do next. Such a machine would be
considered to answer yes to a question if there were some sequence of choices
that could lead it to a yes conclusion. NP, the class of nondeterministic
polynomial-time problems, consists of precisely those problems whose yes
instances can, be identified by machines making comparatively short guessing
computations.
The inclusion of guessing in the definition of these problems suggests strongly
to many mathematicians that P and NP are not the same set and
hence that efficient algorithms can never be found for the intractable problems
in the class NP. If every problem in NP were actually in P,
then all the guesswork and luck could be replaced by some systematic procedure
without great sacrifice in time. It is hard to believe the ability to guess and
be lucky could win so little.
The class NP includes a variety of commonly encountered problems that
seem to defy efficient solution. We have already mentioned Hamilton's problem
and the problem of composite numbers. Another example is known as the matching
problem. It can be considered in terms of the task faced by the colleges every
September, when a new class of freshmen must be assigned to shared dormitory
rooms.
For the sake of simplicity let us assume that all the information gathered about
the students' smoking habits, bed time hours, taste in music and so forth
results in a single yes-or-no decision as to the compatibility of each possible
pair of students. The entire class can then be represented as a graph in which
the points correspond to students and a line is drawn connecting every two
students who can be placed in the same room. If each room holds just two
students, the assignment can be made efficiently by a clever polynomial-time
algorithm discovered by Jack Edmonds of the University of Waterloo. If each
room is to be shared by three students, however, there is no known efficient
algorithm. The problem is in the class NP, since all yes instances have
succinct certificates: an acceptable room assignment, once it is discovered,
can easily be exhibited. Of course, a solution could be found by exhaustive
search, albeit inefficiently. With luck a suitable assignment, if there is one,
can be guessed quickly.
Map coloring is a problem in the class NP that concerns mathematicians
more than it does cartographers. The question is whether the countries on a
given map can be colored with a given number of colors so that no two countries
that share a border have the same color. It is easy to find out if a map can be
colored with two colors, it can be if there are no places on the map where an
odd number of countries meet at one point. It is even easier to tell if a map
can be colored with four colors; indeed, there is no need even to look at the
map, since Kenneth Appel and Wolfgang Haken of the University of Illinois
proved in 1975 that four colours suffice for any map. Surprisingly, however,
no efficient algorithm is known for determining whether three colors are enough
for a given map. The problem is in the class NP, since a correctly
colored map can-serve to certify a yes answer.
Map coloring can be regarded as a special case of another problem called graph
coloring. Any map can be converted into a graph by reducing each country to a
point and drawing a line between two points if the corresponding countries
share a border. Coloring the graph is then equivalent to coloring the map,
subject to the rule that two points connected by a line cannot have the same
color. Graph coloring, however, is a more general problem, with applications
outside graph theory. For example, a graph can represent the scheduling of work
in a factory. Each point of the graph stands for some job to be done, and two
points are connected by a line, if the jobs cannot be done concurrently,
perhaps because they require the same piece of machinery. A coloring of the
graph with three colors would then supply a schedule dividing the work of the
factory into three shifts. Like map coloring, the graph-coloring problem is in
the class NP.
It often happens that if one problem can be solved efficiently, so can many
others. For example, if an efficient algorithm could be found for the problem
of graph coloring, it could be applied with only minor modifications to the
problems of map coloring and factory scheduling. Map coloring and factory
scheduling are therefore said to be efficiently reducible to graph coloring. In
the past several years it has become apparent that some of the problems in the
class NP have a remarkable property: all the problems in NP are
efficiently reducible to them. These elite problems within the class NP
are called NP-complete. If any one of them has an efficient algorithm, then
every problem in NP can be solved efficiently.
The first proof that a problem is NP-complete was presented in 1971 by
Stephen A. Cook of the University of Toronto. His reasoning follows a path
essentially parallel to the path of Turing's earlier work on mathematical
machines and their relation to problems of formal logic. Cook stated his
proof in terms of the prepositional calculus, the formal language in which
separate logical statements, which individually may be either true or false,
are joined together by the lexical elements "and," "or" and "not." In general
a sentence in the propositional calculus can be shown to be either true or
false depending on which of its component statements are assumed to be true
or false. Certain sentences, however, cannot be true under any interpretation
because they are self-contradictory. Sentences that cannot be made true are
said to be unsatisfactory.
Cook employed the propositional calculus to describe the operation of the
nondeterministic Turing machines, the mechanized guessing devices essential to
the definition of the class NP. He showed that the calculations of any
such machine can be described succinctly by sentences of the propositional
calculus. When the machine is given a yes insfance of a problem in NP, its
operation is described by a satisfiable sentence, whereas the operation of a
machine given a no instance is described by a sentence that cannot be
satisfied.
It follows from Cook's proof that if one could efficiently determine whether a
sentence in the propositional calculus can be satisfied, one could also
determine efficiently in advance whether the problem presented to a
nondeterministic Turing machine will be answered yes or no. Since the problems
in the class NP are by definition all those that can be solved by
nondeterministic Turing machines, one would then have an efficient method for
solving all those problems. The catch, of course, is that there is no known
efficient method of determining whether a sentence in the propositional
calculus can be satisfied.
Cook's argument states in essence that the propositional calculus is a universal
language for describing problems in the class NP. Every instance of
such a problem corresponds to a sentence in that language, and if the sentence
is satisfiable, the instance has a yes answer. Many other problems have since
been shown to be NP-complete because the satisfiability problem can
efficiently be reduced to them. Hamilton's problem, the problem of matching
groups of three roommates and the problem of coloring graphs with three colors
are all NP-complete. The first to point out the broad applicability of this
theory was Richard M. Karp of the University of California at Berkeley. Similar
investigations were independently conducted by the Russian mathematician P. A.
Levin. Since NP-complete problems capture the difficulty of all other problems
in NP, it is widely thought today that all NP-complete problems are
computationally intractable. A proof that a problem is NP-complete is
usually considered a strong argument for abandoning further efforts to devise
an efficient algorithm for its solution.
Even the assumption that all NP-complete problems are intractable would not
settle all questions about the class NP. In addition to the mystery of the
NP-complete problems there is an even more obscure area: problems in NP for
which no efficient algorithms are known but which have not been proved to be
NP-complete either. The problem of composite numbers is one of these.
Not all problems that can be solved by a computer are of the yes-or-no. type.
Another common type is the optimization problem. For example, suppose one is
given the positions of some cities on a map and asked to find the shortest
possible railroad network connecting them. In one version of this problem one
is allowed to lay down a straight section of track between any two cities,
but one is not allowed to install isolated junction points; tracks can be
joined only at cities. One property of the solution to this problem is
immediately apparent: the optimum network can never include a closed circuit,
because if it did, the network could be made shorter simply by omitting one
link in the circuit. Thus the best net-work always branches like a tree, and
the problem itself is called the spanning-tree problem.
The spanning-tree problem can be solved correctly and quite efficiently by a
method called the greedy algorithm, devised by Joseph B. Kruskal of Bell
Laboratories. The procedure is simply to connect the closest pair of cities,
then the next-closest and so on without adding any superfluous lines (lines
joining cities that are already linked indirectly). It is far from obvious
that this method always yields the shortest network, but it does, and it has
the pleasant property of requiring no foresight and no reconsideration of
earlier decisions.
The greedy algorithm can be relied on to find the shortest network between
cities under the rules specified, but in general that network will not be the
shortest possible one. Further savings can be achieved by establishing
junction points between cities. The properties of networks with such
junctions were studied by the Swiss mathematician Jakob Steiner. It can be
shown that any shortest network must be arranged so that each junction point
is made up of three lines that meet at angles of 120 degrees. This rule
provides some guidance in evaluating networks, but there are many possible
networks with Steiner junction points. No algorithm has been discovered that
finds the best network quickly.
The problem of the traveling salesman's tour is closely related. Again one is
given a set of cities, but now one is asked to find the shortest round-trip
tour. As a first guess the greedy algorithm suggests that perhaps the
salesman should always go to the nearest city he has not yet visited, but
this procedure does not work. Indeed, the problem is notorious for having
resisted all attempts to find an efficient solution.
Optimization problems are not fundamentally different from those that ask
yes-or-no questions; in fact, every optimization problem can be rewritten in
a yes-or-no form. In the traveling salesman problem, for example, we might be
given a length along with a set of cities and asked to state whether a tour
can be constructed that does not exceed the specified length. This yes-or-no
problem cannot be harder than the associated optimization problem, because if
the optimum tour could be found by some efficient method, it would be a
trivial task to determine whether it exceeds a given number. Hence if the
yes-or-no version is computationally intractable, one cannot hope to solve
the optimization problem itself efficiently. For this reason certain
optimization problems, such as that of the traveling salesman's tour and that
of placing Steiner junction points, are said to be
NP-complete.
Optimization problems are encountered often in engineering, economics,
operations research and other fields. The discovery that at least some of
these problems are NP-complete is therefore of considerable practical
interest. Since the NP-complete problems probably have no efficient
algorithms, there would seem to be little point in expending further effort
in seeking optimum solutions. An alternative that has recently been adopted
is to seek approximate solutions that are good even if they are not precisely
optimal.
One technique that has been applied to the traveling salesman problem offers
a solution that may not be optimum but is guaranteed to be no worse than
twice the optimum path. The procedure starts with the shortest spanning tree,
which can be generated efficiently by the greedy algorithm. This network can
be converted into a tour of the cities by traversing each line twice and
returning to the origin. It is known that the optimum spanning tree must be
shorter than any possible tour of the cities, since a tour can be converted
into a spanning tree (albeit one without any branches) by simply omitting one
segment. Thus twice the length of the optimum spanning tree cannot be longer
than twice the optimum tour. The method is a polynomial-time algorithm.
Recently Nicos Christofides of the Imperial College of Science and Technology
in London has found a way to improve the algorithm so that it yields a tour
guaranteed to be no more than half again as long as the optimum.
A more profound compromise gives up not only the requirement that a solution
be optimal but also the insistence that a less than optimum solution be
guaranteed to fall within a specified range. Instead the assurance is given
that the solution will not often deviate far from the optimum. An underlying
assumption of such techniques is that the maps encountered in practice are
not concocted to confound basically plausible techniques; such maps are
encountered frequently only when they are constructed by computer scientists
to reveal the flaws in methods proposed by their colleagues. Indeed, if the
salesman's tour algorithms discussed above are applied to "natural" maps,
they deliver far more than they promise. The resulting tours are not 100
percent or 50 percent longer than the optimum but closer to 5 percent.
A reasonable assumption about the properties of many maps is that cities are
randomly placed. A theorem describing the statistical properties of optimum
tours through such randomly distributed points was proved in 1958 by Jillian
Beardwood, J. H. Halton and John M. Hammersley of the University of Oxford.
Relying on that theorem, Karp has shown that a simple method of constructing
tours almost always yields near-optimum results, when it is applied to maps
with many cities.
Karp begins by dividing the map into many small regions. Within each of these
regions the cities are sufficiently few to find the optimum tour by
exhaustive search, even though that method involves an exponential-time
algorithm. The tours of the small areas are then linked by a variant of the
greedy algorithm. Perhaps significantly, the method is not very different
from the method usually adopted by people solving the problem manually.
Efficient but approximate solutions can be found for many NP-complete
optimization problems. From the standpoint of mathematics, however, the
important question is whether NP is identical with P. The repeated failure of
attempts to find an efficient algorithm for the NP-complete problems has
created considerable confidence that NP and P are not the same. There is now
suspicion that they are not identical, but the proof of their distinctness
may be beyond present mathematical capabilities. The question may join that
select group of mathematical enigmas that remain unresolved for decades, and
the solution may have to await the development of new methods in mathematics. |