
Maths 1900-Present

PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sun, 13 Jan 2013 05:42:31 UTC

Contents
Articles
Russell's paradox
Principia Mathematica
Koch snowflake
Axiom of choice
Jordan curve theorem
Special relativity
Intuitionism
Intuitionistic logic
Heyting arithmetic
Intuitionistic type theory
Constructive set theory
Constructive analysis
Zermelo–Fraenkel set theory
Hairy ball theorem
General relativity
Hilbert's program
Gödel's incompleteness theorems
Travelling salesman problem
Turing machine
Binary number
Ham sandwich theorem
Enigma machine
Colossus computer
Game theory
ENIAC
Prisoner's dilemma
Calculator
George Pólya
How to Solve It
Erdős number
Chaos theory
Secretary problem
Catastrophe theory
Conway's Game of Life
Diophantine set
P versus NP problem
Public-key cryptography
Fractal
Four color theorem
Logistic map
Kepler conjecture
Wiles's proof of Fermat's Last Theorem
Millennium Prize Problems
Hodge conjecture
Poincaré conjecture
Riemann hypothesis
Yang–Mills existence and mass gap
Navier–Stokes existence and smoothness
Birch and Swinnerton-Dyer conjecture
Grigori Perelman

References
Article Sources and Contributors
Image Sources, Licenses and Contributors

Article Licenses
License

Russell's paradox

In the foundations of mathematics, Russell's paradox (also known as Russell's antinomy), discovered by Bertrand Russell in 1901, showed that the naive set theory created by Georg Cantor leads to a contradiction. The same paradox had been discovered a year before by Ernst Zermelo, but he did not publish the idea, which remained known only to Hilbert, Husserl and other members of the University of Göttingen.

According to naive set theory, any definable collection is a set. Let R be the set of all sets that are not members of themselves. If R is a member of itself, it contradicts its own definition as the set of all sets that are not members of themselves. On the other hand, if R is not a member of itself, it qualifies as a member of itself by the same definition. This contradiction is Russell's paradox. Symbolically:

\[ \text{let } R = \{\, x \mid x \notin x \,\}, \text{ then } R \in R \iff R \notin R. \]

In 1908, two ways of avoiding the paradox were proposed: Russell's type theory and Zermelo set theory, the first constructed axiomatic set theory. Zermelo's axioms went well beyond Frege's axioms of extensionality and unlimited set abstraction, and evolved into the now-canonical Zermelo–Fraenkel set theory (ZF).[1]

Informal presentation
Let us call a set "abnormal" if it is a member of itself, and "normal" otherwise. For example, take the set of all geometrical squares. That set is not itself a square, and therefore is not a member of the set of all squares. So it is "normal". On the other hand, if we take the complementary set that contains all non-squares, that set is itself not a square and so should be one of its own members. It is "abnormal". Now we consider the set of all normal sets, R. Determining whether R is normal or abnormal is impossible: If R were a normal set, it would be contained in the set of normal sets (itself), and therefore be abnormal; and if R were abnormal, it would not be contained in the set of all normal sets (itself), and therefore be normal. This leads to the conclusion that R is neither normal nor abnormal: Russell's paradox.
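The informal argument can be mirrored in a short Python sketch. It rests on a modelling assumption that is not part of the article: a "set" is represented by its membership test, a function that takes a set and returns True or False. The names squares, non_squares and is_normal are hypothetical. Asking whether R is normal then never settles on an answer. Python reports this as a RecursionError, which is only an analogue of the logical contradiction, not the contradiction itself.

```python
def squares(s):
    # Membership test for "the set of all squares": no set is a square,
    # so no set passed in here is a member.
    return False


def non_squares(s):
    # Membership test for "the set of all non-squares": every set is a member.
    return True


def is_normal(s):
    # A set is "normal" when it is not a member of itself.
    return not s(s)


# R, the "set of all normal sets", has is_normal as its membership test.
R = is_normal

print(is_normal(squares))      # True: the set of squares is normal
print(is_normal(non_squares))  # False: the set of non-squares is abnormal

try:
    R(R)  # "is R normal?" asks "is R normal?" again, without end
    verdict = "decided"
except RecursionError:
    verdict = "undecidable by evaluation"
print(verdict)

# No truth value b can satisfy b == (not b), which is what R in R would need.
print(any(b == (not b) for b in (True, False)))  # False
```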

Formal presentation
Define Naive Set Theory (NST) as the theory of predicate logic with a binary predicate $\in$ and the following axiom schema of unrestricted comprehension:

\[ \exists y \, \forall x \, (x \in y \iff P(x)) \]

for any formula P with only the variable x free. Substitute $x \notin x$ for $P(x)$. Then by existential instantiation (reusing the symbol y) and universal instantiation we have

\[ y \in y \iff y \notin y, \]

a contradiction. Therefore NST is inconsistent.
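As a finite sanity check of this argument, the following Python sketch enumerates every binary "membership" relation on small domains and looks for an element y with x ∈ y if and only if x ∉ x for all x. This is the comprehension instance obtained by taking P(x) to be x ∉ x. The function names and the encoding of a relation as a dictionary are illustrative assumptions. The search is no substitute for the proof, but it shows the same obstruction: the case x = y can never be satisfied.

```python
from itertools import product


def has_russell_set(domain, member):
    """Return True if some y satisfies: for all x, (x in y) <-> not (x in x)."""
    return any(
        all(member[(x, y)] == (not member[(x, x)]) for x in domain)
        for y in domain
    )


def count_models_with_russell_set(n):
    """Count membership relations on {0, ..., n-1} that contain a Russell set."""
    domain = range(n)
    pairs = list(product(domain, repeat=2))
    count = 0
    # Every assignment of True/False to the pairs is one candidate relation.
    for bits in product([False, True], repeat=len(pairs)):
        member = dict(zip(pairs, bits))
        if has_russell_set(domain, member):
            count += 1
    return count


for n in (1, 2, 3):
    print(n, count_models_with_russell_set(n))  # the count is 0 for each n
```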

Set-theoretic responses
In 1908, Ernst Zermelo proposed an axiomatization of set theory that avoided the paradoxes of naive set theory by replacing arbitrary set comprehension with weaker existence axioms, such as his axiom of separation (Aussonderung). Modifications to this axiomatic theory proposed in the 1920s by Abraham Fraenkel, Thoralf Skolem, and by Zermelo himself resulted in the axiomatic set theory called ZFC. This theory became widely accepted once Zermelo's axiom of choice ceased to be controversial, and ZFC has remained the canonical axiomatic set theory down to the present day.

ZFC does not assume that, for every property, there is a set of all things satisfying that property. Rather, it asserts that given any set X, any subset of X definable using first-order logic exists. The object R discussed above cannot be constructed in this fashion, and is therefore not a ZFC set. In some extensions of ZFC, objects like R are called proper classes. ZFC is silent about types, although some argue that Zermelo's axioms tacitly presuppose a background type theory.

In ZFC, given a set A, it is possible to define a set B that consists of exactly the sets in A that are not members of themselves. B cannot be in A, by the same reasoning as in Russell's paradox. This variation of Russell's paradox shows that no set contains everything.

Through the work of Zermelo and others, especially John von Neumann, the structure of what some see as the "natural" objects described by ZFC eventually became clear; they are the elements of the von Neumann universe, V, built up from the empty set by transfinitely iterating the power set operation. It is thus now possible again to reason about sets in a non-axiomatic fashion without running afoul of Russell's paradox, namely by reasoning about the elements of V. Whether it is appropriate to think of sets in this way is a point of contention among the rival points of view on the philosophy of mathematics.

Other resolutions to Russell's paradox, more in the spirit of type theory, include the axiomatic set theories New Foundations and Scott–Potter set theory.
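The first finite stages of the von Neumann universe and the separation argument can be illustrated together in a minimal Python sketch that uses frozenset for hereditarily finite sets. The helper names power_set, von_neumann_stage and russell_subset are hypothetical, and only the stages V_0 to V_4 are built, since the transfinite iteration cannot be carried out in code. For every set A in V_4, the subset B of members of A that are not members of themselves is not in A.

```python
from itertools import combinations


def power_set(s):
    """Return the set of all subsets of s, as a frozenset of frozensets."""
    items = list(s)
    return frozenset(
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    )


def von_neumann_stage(n):
    """Build V_n by applying the power set operation n times to the empty set."""
    stage = frozenset()
    for _ in range(n):
        stage = power_set(stage)
    return stage


def russell_subset(a):
    """Separation: the members of a that are not members of themselves."""
    return frozenset(x for x in a if x not in x)


sizes = [len(von_neumann_stage(n)) for n in range(5)]
print(sizes)  # [0, 1, 2, 4, 16]

v4 = von_neumann_stage(4)
# For each set A in V_4, the Russell subset B of A is not a member of A.
print(all(russell_subset(a) not in a for a in v4))  # True
```

Because these finite sets are well-founded, no set here is a member of itself, so B coincides with A in every case. The check therefore also confirms that no set in V_4 contains itself.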

History
Russell discovered the paradox in May or June 1901.[2] By his own admission in his 1919 Introduction to Mathematical Philosophy, he "attempted to discover some flaw in Cantor's proof that there is no greatest cardinal".[3] In a 1902 letter,[4] he announced to Gottlob Frege the discovery of the paradox in Frege's 1879 Begriffsschrift and framed the problem in terms of both logic and set theory, and in particular in terms of Frege's definition of function; in the following, p. 17 refers to a page in the original Begriffsschrift, and page 23 refers to the same page in van Heijenoort 1967:

"There is just one point where I have encountered a difficulty. You state (p. 17 [p. 23 above]) that a function too, can act as the indeterminate element. This I formerly believed, but now this view seems doubtful to me because of the following contradiction. Let w be the predicate: to be a predicate that cannot be predicated of itself. Can w be predicated of itself? From each answer its opposite follows. Therefore we must conclude that w is not a predicate. Likewise there is no class (as a totality) of those classes which, each taken as a totality, do not belong to themselves. From this I conclude that under certain circumstances a definable collection [Menge] does not form a totality."[5]

Russell would go on to cover it at length in his 1903 The Principles of Mathematics, where he repeats his first encounter with the paradox:[6]

"Before taking leave of fundamental questions, it is necessary to examine more in detail the singular contradiction, already mentioned, with regard to predicates not predicable of themselves. ... I may mention that I was led to it in the endeavour to reconcile Cantor's proof...."

Russell wrote to Frege about the paradox just as Frege was preparing the second volume of his Grundgesetze der Arithmetik.[7] Frege did not waste time responding to Russell; his letter dated 22 June 1902 appears, with van Heijenoort's commentary, in van Heijenoort 1967:126–127.
Frege then wrote an appendix admitting to the paradox,[8] and proposed a solution that Russell would endorse in his Principles of Mathematics,[9] but that was later considered by some to be unsatisfactory.[10] For his part, Russell had his work at the printers and he added an appendix on the doctrine of types.[11]

Ernst Zermelo, in his (1908) A new proof of the possibility of a well-ordering (published at the same time he published "the first axiomatic set theory"),[12] laid claim to prior discovery of the antinomy in Cantor's naive set theory. He states: "And yet, even the elementary form that Russell⁹ gave to the set-theoretic antinomies could have persuaded them [J. König, Jourdain, F. Bernstein] that the solution of these difficulties is not to be sought in the surrender of well-ordering but only in a suitable restriction of the notion of set".[13] Footnote 9 is where he stakes his claim:

⁹ 1903, pp. 366–368. I had, however, discovered this antinomy myself, independently of Russell, and had communicated it prior to 1903 to Professor Hilbert among others.[14]

A written account of Zermelo's actual argument was discovered in the Nachlass of Edmund Husserl.[15]

It is also known that unpublished discussions of set-theoretical paradoxes took place in the mathematical community at the turn of the century. van Heijenoort, in his commentary before Russell's 1902 Letter to Frege, states that Zermelo "had discovered the paradox independently of Russell and communicated it to Hilbert, among others, prior to its publication by Russell".[16]

In 1923, Ludwig Wittgenstein proposed to "dispose" of Russell's paradox as follows:

The reason why a function cannot be its own argument is that the sign for a function already contains the prototype of its argument, and it cannot contain itself. For let us suppose that the function F(fx) could be its own argument: in that case there would be a proposition 'F(F(fx))', in which the outer function F and the inner function F must have different meanings, since the inner one has the form φ(fx) and the outer one has the form ψ(φ(fx)). Only the letter 'F' is common to the two functions, but the letter by itself signifies nothing. This immediately becomes clear if instead of 'F(Fu)' we write '(∃φ) : F(φu) . φu = Fu'. That disposes of Russell's paradox. (Tractatus Logico-Philosophicus, 3.333)

Russell and Alfred North Whitehead wrote their three-volume Principia Mathematica (PM) hoping to achieve what Frege had been unable to do. They sought to banish the paradoxes of naive set theory by employing a theory of types they devised for this purpose. While they succeeded in grounding arithmetic in a fashion, it is not at all evident that they did so by purely logical means. While PM avoided the known paradoxes and allows the derivation of a great deal of mathematics, its system gave rise to new problems.
In any event, Kurt Gödel in 1930–31 proved that while the logic of much of PM, now known as first-order logic, is complete, Peano arithmetic is necessarily incomplete if it is consistent. This is very widely, though not universally, regarded as having shown the logicist program of Frege to be impossible to complete.

Applied versions
There are some versions of this paradox that are closer to real-life situations and may be easier to understand for non-logicians. For example, the Barber paradox supposes a barber who shaves all men who do not shave themselves and only men who do not shave themselves. When one thinks about whether the barber should shave himself or not, the paradox begins to emerge. As another example, consider five lists of encyclopedia entries within the same encyclopedia:
List of articles about people: Ptolemy VII of Egypt; Hermann Hesse; Don Nix; Don Knotts; Nikola Tesla; Sherlock Holmes; Emperor Kōnin
List of articles starting with the letter L: L; L!VE TV; L&H; ...; List of articles starting with the letter K; List of articles starting with the letter L; List of articles starting with the letter M; ...
List of articles about places: Leivonmäki; Katase River; Enoshima
List of articles about Japan: Emperor Showa; Katase River; Enoshima
List of all lists that do not contain themselves: List of articles about Japan; List of articles about places; List of articles about people; List of articles starting with the letter K; List of articles starting with the letter M; ...; List of all lists that do not contain themselves?

If the "List of all lists that do not contain themselves" contains itself, then it does not belong to itself and should be removed. However, if it does not list itself, then it should be added to itself.

While appealing, these layman's versions of the paradox share a drawback: an easy refutation of the Barber paradox seems to be that such a barber does not exist, or at least does not shave (a variant of which is that the barber is a woman). The whole point of Russell's paradox is that the answer "such a set does not exist" means the definition of the notion of set within a given theory is unsatisfactory. Note the difference between the statements "such a set does not exist" and "it is an empty set". It is like the difference between saying, "There is no bucket", and saying, "The bucket is empty".

A notable exception to the above may be the Grelling–Nelson paradox, in which words and meaning are the elements of the scenario rather than people and hair-cutting. Though it is easy to refute the Barber paradox by saying that such a barber does not (and cannot) exist, it is impossible to say something similar about a meaningfully defined word.

One way that the paradox has been dramatised is as follows: Suppose that every public library has to compile a catalog of all its books. Since the catalog is itself one of the library's books, some librarians include it in the catalog for completeness, while others leave it out, as its being one of the library's books is self-evident. Now imagine that all these catalogs are sent to the national library. Some of them include themselves in their listings, others do not. The national librarian compiles two master catalogs: one of all the catalogs that list themselves, and one of all those that don't. The question is: should these catalogs list themselves? The 'Catalog of all catalogs that list themselves' is no problem.
If the librarian doesn't include it in its own listing, it is still a true catalog of those catalogs that do include themselves. If he does include it, it remains a true catalog of those that list themselves. However, just as the librarian cannot go wrong with the first master catalog, he is doomed to fail with the second. When it comes to the 'Catalog of all catalogs that don't list themselves', the librarian cannot include it in its own listing, because then it would include itself. But in that case, it should belong to the other catalog, that of catalogs that do include themselves. However, if the librarian leaves it out, the catalog is incomplete. Either way, it can never be a true catalog of catalogs that do not list themselves.
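The librarian's two cases can be checked mechanically. The sketch below is illustrative only: consistent_choices and the two rule functions are hypothetical names. It tries both options for a master catalog (list itself or not) and keeps those under which the catalog obeys its own inclusion rule.

```python
def consistent_choices(rule):
    """Return the self-listing choices under which a master catalog is 'true'.

    rule(lists_itself) says whether a catalog with that behaviour belongs in
    the master catalog. The master catalog is consistent when what it does
    (list itself or not) agrees with what its own rule demands.
    """
    return [choice for choice in (True, False) if choice == rule(choice)]


def lists_themselves(lists_itself):
    # Rule of the first master catalog: include catalogs that list themselves.
    return lists_itself


def do_not_list_themselves(lists_itself):
    # Rule of the second master catalog: include catalogs that do not.
    return not lists_itself


print(consistent_choices(lists_themselves))        # [True, False]
print(consistent_choices(do_not_list_themselves))  # []
```

The first master catalog is consistent under either choice. The second has no consistent choice at all, which is the librarian's predicament.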

Applications and related topics


Russell-like paradoxes
As illustrated above for the Barber paradox, Russell's paradox is not hard to extend. Take:
A transitive verb <V> that can be applied to its substantive form.
Form the sentence:
The <V>er that <V>s all (and only those) who don't <V> themselves.
Sometimes the "all" is replaced by "all <V>ers".
An example would be "paint":
The painter that paints all (and only those) that don't paint themselves.
or "elect":
The elector (representative) that elects all that don't elect themselves.
Paradoxes that fall in this scheme include:
The barber with "shave".
The original Russell's paradox with "contain": The container (Set) that contains all (containers) that don't contain themselves.
The Grelling–Nelson paradox with "describer": The describer (word) that describes all words that don't describe themselves.
Richard's paradox with "denote": The denoter (number) that denotes all denoters (numbers) that don't denote themselves. (In this paradox, all descriptions of numbers get an assigned number. The term "that denotes all denoters (numbers) that don't denote themselves" is here called Richardian.)

Related paradoxes
The liar paradox and Epimenides paradox, whose origins are ancient
The Kleene–Rosser paradox, showing that the original lambda calculus is inconsistent, by means of a self-negating statement
Curry's paradox (named after Haskell Curry), which does not require negation
The smallest uninteresting integer paradox

Notes
[1] Set theory paradoxes (http://www.suitcaseofdreams.net/Set_theory_Paradox.htm)
[2] Godehard Link (2004), One hundred years of Russell's paradox (http://books.google.com/?id=Xg6QpedPpcsC&pg=PA350), p. 350, ISBN 978-3-11-017438-0
[3] Russell 1920:136
[4] Gottlob Frege, Michael Beaney (1997), The Frege reader (http://books.google.com/?id=4ktC0UrG4V8C&pg=PA253), p. 253, ISBN 978-0-631-19445-3. Also van Heijenoort 1967:124–125
[5] Remarkably, this letter was unpublished until van Heijenoort 1967; it appears with van Heijenoort's commentary at van Heijenoort 1967:124–125.
[6] Russell 1903:101
[7] cf. van Heijenoort's commentary before Frege's Letter to Russell in van Heijenoort 1967:126.
[8] van Heijenoort's commentary, cf. van Heijenoort 1967:126; Frege starts his analysis with this exceptionally honest comment: "Hardly anything more unfortunate can befall a scientific writer than to have one of the foundations of his edifice shaken after the work is finished. This was the position I was placed in by a letter of Mr Bertrand Russell, just when the printing of this volume was nearing its completion" (Appendix of Grundgesetze der Arithmetik, vol. II, in The Frege Reader, p. 279, translation by Michael Beaney).
[9] cf. van Heijenoort's commentary, van Heijenoort 1967:126. The added text reads as follows: "Note. The second volume of Gg., which appeared too late to be noticed in the Appendix, contains an interesting discussion of the contradiction (pp. 253–265), suggesting that the solution is to be found by denying that two propositional functions that determine equal classes must be equivalent. As it seems very likely that this is the true solution, the reader is strongly recommended to examine Frege's argument on the point" (Russell 1903:522). The abbreviation Gg. stands for Frege's Grundgesetze der Arithmetik. Begriffsschriftlich abgeleitet. Vol. I. Jena, 1893. Vol. II. 1903.
[10] Livio states that "While Frege did make some desperate attempts to remedy his axiom system, he was unsuccessful. The conclusion appeared to be disastrous...." Livio 2009:188. But van Heijenoort in his commentary before Frege's (1902) Letter to Russell describes Frege's proposed "way out" in some detail; the matter has to do with the "'transformation of the generalization of an equality into an equality of courses-of-values'. For Frege a function is something incomplete, 'unsaturated'"; this seems to contradict the contemporary notion of a "function in extension"; see Frege's wording at page 128: "Incidentally, it seems to me that the expression 'a predicate is predicated of itself' is not exact. ...Therefore I would prefer to say that 'a concept is predicated of its own extension' [etc]". But he waffles at the end of his suggestion that a function-as-concept-in-extension can be written as predicated of its function. van Heijenoort cites Quine: "For a late and thorough study of Frege's "way out", see Quine 1955": "On Frege's way out", Mind 64, 145–159; reprinted in Quine 1955b: Appendix. Completeness of quantification theory. Loewenheim's theorem, enclosed as a pamphlet with part of the third printing (1955) of Quine 1950 and incorporated in the revised edition (1959), 253–260" (cf. REFERENCES in van Heijenoort 1967:649)
[11] Russell mentions this fact to Frege, cf. van Heijenoort's commentary before Frege's (1902) Letter to Russell in van Heijenoort 1967:126
[12] van Heijenoort's commentary before Zermelo (1908a) Investigations in the foundations of set theory I in van Heijenoort 1967:199
[13] van Heijenoort 1967:190–191. In the section before this he objects strenuously to the notion of impredicativity as defined by Poincaré (and soon to be taken by Russell, too, in his 1908 Mathematical logic as based on the theory of types, cf. van Heijenoort 1967:150–182).
[14] Ernst Zermelo (1908) A new proof of the possibility of a well-ordering in van Heijenoort 1967:183–198. Livio 2009:191 reports that Zermelo "discovered Russell's paradox independently as early as 1900"; Livio in turn cites Ewald 1996 and van Heijenoort 1967 (cf. Livio 2009:268).
[15] B. Rang and W. Thomas, "Zermelo's discovery of the 'Russell Paradox'", Historia Mathematica, v. 8 n. 1, 1981, pp. 15–22. doi:10.1016/0315-0860(81)90002-1
[16] van Heijenoort 1967:124

References
Potter, Michael (15 January 2004), Set Theory and its Philosophy, Clarendon Press (Oxford University Press), ISBN 978-0-19-926973-0
van Heijenoort, Jean (1967, third printing 1976), From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, Cambridge, Massachusetts: Harvard University Press, ISBN 0-674-32449-8
Livio, Mario (6 January 2009), Is God a Mathematician?, New York: Simon & Schuster, ISBN 978-0-7432-9405-8

External links
Russell's Paradox (http://www.cut-the-knot.org/selfreference/russell.shtml) at Cut-the-Knot
Stanford Encyclopedia of Philosophy: "Russell's Paradox" (http://plato.stanford.edu/entries/russell-paradox/) by A. D. Irvine.
Inconsistent countable set, J. Foukzon (http://ru.scribd.com/doc/115667544?secret_password=2gzzmxsoylip718oxbvd)

Principia Mathematica
The Principia Mathematica is a three-volume work on the foundations of mathematics, written by Alfred North Whitehead and Bertrand Russell and published in 1910, 1912, and 1913. In 1927, it appeared in a second edition with an important Introduction To the Second Edition, an Appendix A that replaced ✸9 and an all-new Appendix C.

[Figure: The title page of the shortened version of the Principia Mathematica to *56]

PM, as it is often abbreviated, was an attempt to describe a set of axioms and inference rules in symbolic logic from which all mathematical truths could in principle be proven. As such, this ambitious project is of great importance in the history of mathematics and philosophy,[1] being one of the foremost products of the belief that such an undertaking may have been achievable. However, in 1931, Gödel's incompleteness theorem proved definitively that PM, and in fact any other attempt, could never achieve this lofty goal; that is, for any set of axioms and inference rules proposed to encapsulate mathematics, there would in fact be some truths of mathematics which could not be deduced from them.

One of the main inspirations and motivations for PM was the earlier work of Gottlob Frege on logic, which Russell discovered allowed for the construction of paradoxical sets. PM sought to avoid this problem by ruling out the unrestricted creation of arbitrary sets. This was achieved by replacing the notion of a general set with the notion of a hierarchy of sets of different 'types', a set of a certain type only allowed to contain sets of strictly lower types. Contemporary mathematics, however, avoids paradoxes such as Russell's in less unwieldy ways, such as the system of Zermelo–Fraenkel set theory.

PM is not to be confused with Russell's 1903 Principles of Mathematics. PM states: "The present work was originally intended by us to be comprised in a second volume of Principles of Mathematics... But as we advanced, it became increasingly evident that the subject is a very much larger one than we had supposed; moreover on many fundamental questions which had been left obscure and doubtful in the former work, we have now arrived at what we believe to be satisfactory solutions."

The Modern Library placed it 23rd in a list of the top 100 English-language nonfiction books of the twentieth century.[2]

Scope of foundations laid


The Principia covered only set theory, cardinal numbers, ordinal numbers, and real numbers. Deeper theorems from real analysis were not included, but by the end of the third volume it was clear to experts that a large amount of known mathematics could in principle be developed in the adopted formalism. It was also clear how lengthy such a development would be. A fourth volume on the foundations of geometry had been planned, but the authors admitted to intellectual exhaustion upon completion of the third.

The construction of the theory of PM


As noted in the criticism of the theory by Kurt Gödel (below), unlike a Formalist theory, the "logicistic" theory of PM has no "precise statement of the syntax of the formalism". Another observation is that almost immediately in the theory, interpretations (in the sense of model theory) are presented in terms of truth-values for the behavior of the symbols "⊦" (assertion of truth), "~" (logical not), and "∨" (logical inclusive OR).

Truth-values: PM embeds the notions of "truth" and "falsity" in the notion "primitive proposition". A raw (pure) Formalist theory would not provide the meaning of the symbols that form a "primitive proposition"; the symbols themselves could be absolutely arbitrary and unfamiliar. The theory would specify only how the symbols behave based on the grammar of the theory. Then later, by assignment of "values", a model would specify an interpretation of what the formulas are saying. Thus in the formal Kleene symbol set below, the "interpretation" of what the symbols commonly mean, and by implication how they end up being used, is given in parentheses, e.g., "¬ (not)". But this is not a pure Formalist theory.

The contemporary construction of a formal theory


The following formalist theory is offered as contrast to the logicistic theory of PM. A contemporary formal system would be constructed as follows:
1. Symbols used: This set is the starting set, and other symbols can appear but only by definition from these beginning symbols. A starting set might be the following set derived from Kleene 1952: logical symbols "⊃" (implies, IF-THEN, "→"), "&" (and), "∨" (or), "¬" (not), "∀" (for all), "∃" (there exists); predicate symbol "=" (equals); function symbols "+" (arithmetic addition), "·" (arithmetic multiplication), "'" (successor); individual symbol "0" (zero); variables "a", "b", "c", etc.; and parentheses "(" and ")".[3]
2. Symbol strings: The theory will build "strings" of these symbols by concatenation (juxtaposition).[4]
3. Formation rules: The theory specifies the rules of syntax (rules of grammar), usually as a recursive definition that starts with "0" and specifies how to build acceptable strings or "well-formed formulas" (wffs).[5] This includes a rule for "substitution"[6] of strings for the symbols called "variables" (as opposed to the other symbol-types).
4. Transformation rule(s): The axioms that specify the behaviors of the symbols and symbol sequences.
5. Rule of inference, detachment, modus ponens: The rule that allows the theory to "detach" a "conclusion" from the "premises" that led up to it, and thereafter to discard the "premises" (symbols to the left of the line "│", or symbols above the line if horizontal). If this were not the case, then substitution would result in longer and longer strings that have to be carried forward. Indeed, after the application of modus ponens, nothing is left but the conclusion; the rest disappears forever.

Contemporary theories often specify as their first axiom the classical modus ponens, or "the rule of detachment":
A, A ⊃ B │ B
The symbol "│" is usually written as a horizontal line; here "⊃" means "implies". The symbols A and B are "stand-ins" for strings; this form of notation is called an "axiom schema" (i.e., there is a countable number of specific forms the notation could take). This can be read in a manner similar to IF-THEN but with a difference: given symbol string IF A and A implies B THEN B (and retain only B for further use). But the symbols have no "interpretation" (e.g., no "truth table" or "truth values" or "truth functions") and modus ponens proceeds mechanistically, by grammar alone.
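The purely grammatical character of such a system can be sketched in Python. The encoding is an assumption made for illustration: formulas are nested tuples rather than concatenated symbol strings, only the variables a, b, c are allowed, and the names is_term, is_wff and modus_ponens are hypothetical. The functions inspect only the shape of a formula and never assign it a truth value.

```python
VARIABLES = {"a", "b", "c"}


def is_term(t):
    """Formation rule for terms: 0, a variable, a successor, a sum or a product."""
    if t == "0" or t in VARIABLES:
        return True
    if isinstance(t, tuple):
        if len(t) == 2 and t[0] == "'":
            return is_term(t[1])
        if len(t) == 3 and t[0] in {"+", "·"}:
            return is_term(t[1]) and is_term(t[2])
    return False


def is_wff(f):
    """Formation rule for well-formed formulas, defined recursively."""
    if not isinstance(f, tuple):
        return False
    if len(f) == 3 and f[0] == "=":
        return is_term(f[1]) and is_term(f[2])
    if len(f) == 2 and f[0] == "¬":
        return is_wff(f[1])
    if len(f) == 3 and f[0] in {"⊃", "&", "∨"}:
        return is_wff(f[1]) and is_wff(f[2])
    if len(f) == 3 and f[0] in {"∀", "∃"}:
        return f[1] in VARIABLES and is_wff(f[2])
    return False


def modus_ponens(a, implication):
    """From A and (A ⊃ B) detach B, by matching shapes only; otherwise None."""
    if (is_wff(a) and is_wff(implication)
            and implication[0] == "⊃" and implication[1] == a):
        return implication[2]
    return None


A = ("=", "a", "a")                 # a = a
B = ("=", ("'", "a"), ("'", "a"))   # a' = a'
A_implies_B = ("⊃", A, B)

print(modus_ponens(A, A_implies_B) == B)  # True: only B is retained
print(modus_ponens(B, A_implies_B))       # None: B does not match the premise
```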

The logicistic construction of the theory of PM


The theory of PM has both significant similarities to, and significant differences from, a contemporary formal theory. Kleene states that "this deduction of mathematics from logic was offered as intuitive axiomatics. The axioms were intended to be believed, or at least to be accepted as plausible hypotheses concerning the world".[7] Indeed, unlike a Formalist theory that manipulates symbols according to rules of grammar, PM introduces the notion of "truth-values", i.e., truth and falsity in the real-world sense, and the "assertion of truth" almost immediately as the fifth and sixth elements in the structure of the theory (PM 1962:4–36):
1. Variables.
2. Uses of various letters.
3. The fundamental functions of propositions: "the Contradictory Function" symbolized by "~" and the "Logical Sum or Disjunctive Function" symbolized by "∨" being taken as primitive and logical implication defined (the following example also used to illustrate 9. Definition below) as p ⊃ q .=. ~ p ∨ q Df. (PM 1962:11) and logical product defined as p . q .=. ~(~p ∨ ~q) Df. (PM 1962:12) (See more about the confusing "dots" used both as a grammatical device and to symbolize logical conjunction (logical AND) at the section on notation.)
4. Equivalence: Logical equivalence, not arithmetic equivalence: "≡" given as a demonstration of how the symbols are used, i.e., "Thus ' p ≡ q ' stands for '( p ⊃ q ) . ( q ⊃ p )'." (PM 1962:7). Notice that to discuss a notation PM identifies a "meta"-notation with "[space] ... [space]":[8] Logical equivalence appears again as a definition: p ≡ q .=. ( p ⊃ q ) . ( q ⊃ p ) (PM 1962:12). Notice the appearance of parentheses. This grammatical usage is not specified and appears sporadically; parentheses do play an important role in symbol strings, however, e.g., the notation "(x)" for the contemporary "∀x".
5. Truth-values: "The 'Truth-value' of a proposition is truth if it is true, and falsehood if it is false" (this phrase is due to Frege) (PM 1962:7).
6. Assertion-sign: "'⊦. p' may be read 'it is true that' ... thus '⊦: p .⊃. q' means 'it is true that p implies q', whereas '⊦. p .⊃⊦. q' means 'p is true; therefore q is true'. The first of these does not necessarily involve the truth either of p or of q, while the second involves the truth of both" (PM 1962:92).
7. Inference: PM's version of modus ponens. "[If] '⊦. p' and '⊦ (p ⊃ q)' have occurred, then '⊦. q' will occur if it is desired to put it on record. The process of the inference cannot be reduced to symbols. Its sole record is the occurrence of '⊦. q' [in other words, the symbols on the left disappear or can be erased]" (PM 1962:9).
8. The Use of Dots: See the section on notation.
9. Definitions: These use the "=" sign with "Df" at the right end. See the section on notation.
10. Summary of preceding statements: brief discussion of the primitive ideas "~ p" and "p ∨ q" and "⊦" prefixed to a proposition.
11. Primitive propositions: the axioms or postulates. This was significantly modified in the 2nd edition.
12. Propositional functions: The notion of "proposition" was significantly modified in the 2nd edition, including the introduction of "atomic" propositions linked by logical signs to form "molecular" propositions, and the use of substitution of molecular propositions into atomic or molecular propositions to create new expressions.
13. The range of values and total variation.
14. Ambiguous assertion and the real variable: This and the next two sections were modified or abandoned in the 2nd edition. In particular, the distinction between the concepts defined in sections 15. Definition and the real variable and 16. Propositions connecting real and apparent variables was abandoned in the second edition.
17. Formal implication and formal equivalence.
18. Identity: See the section on notation. The symbol "=" indicates "predicate" or arithmetic equality.
19. Classes and relations.
20. Various descriptive functions of relations.
21. Plural descriptive functions.
22. Unit classes.

Primitive ideas
Cf. PM 1962:90-94, for the first edition:
(1) Elementary propositions.
(2) Elementary propositions of functions.
(3) Assertion: introduces the notions of "truth" and "falsity".
(4) Assertion of a propositional function.
(5) Negation: "If p is any proposition, the proposition 'not-p', or 'p is false,' will be represented by '~p'".
(6) Disjunction: "If p and q are any propositions, the proposition 'p or q', i.e., 'either p is true or q is true,' where the alternatives are to be not mutually exclusive, will be represented by 'p ∨ q'". (cf. section B)

Primitive propositions (Pp)


The first edition (see the discussion relative to the second edition, below) begins with a definition of the sign "⊃":
*1.01. p ⊃ q .=. ~ p ∨ q. Df.
*1.1. Anything implied by a true elementary proposition is true. Pp modus ponens
(*1.11 was abandoned in the second edition.)
*1.2. ⊢: p ∨ p .⊃. p. Pp principle of tautology
*1.3. ⊢: q .⊃. p ∨ q. Pp principle of addition
*1.4. ⊢: p ∨ q .⊃. q ∨ p. Pp principle of permutation
*1.5. ⊢: p ∨ ( q ∨ r ) .⊃. q ∨ ( p ∨ r ). Pp associative principle
*1.6. ⊢:. q ⊃ r .⊃: p ∨ q .⊃. p ∨ r. Pp principle of summation
*1.7. If p is an elementary proposition, ~p is an elementary proposition. Pp
*1.71. If p and q are elementary propositions, p ∨ q is an elementary proposition. Pp
*1.72. If φp and ψp are elementary propositional functions which take elementary propositions as arguments, φp ∨ ψp is an elementary proposition. Pp

Together with the "Introduction to the Second Edition", the second edition's Appendix A abandons the entire section *9. This includes six primitive propositions *9 through *9.15 together with the Axioms of reducibility. The revised theory is made difficult by the introduction of the Sheffer stroke ("|") to symbolize "incompatibility" (i.e., if both elementary propositions p and q are true, their "stroke" p | q is false), the contemporary logical NAND (not-AND). In the revised theory, the Introduction presents the notion of "atomic proposition", a "datum" that "belongs to the philosophical part of logic". These have no parts that are propositions and do not contain the notions "all" or "some". For example: "this is red", or "this is earlier than that". Such things can exist ad infinitum, i.e., even an "infinite enumeration" of them can replace "generality" (i.e., the notion of "for all").[9] PM then "advance[s] to molecular propositions" that are all linked by "the stroke". Definitions give equivalences for "~", "∨", "⊃", and ".".

The new introduction defines "elementary propositions" as atomic and molecular propositions together. It then replaces all the primitive propositions *1.2 to *1.72 with a single primitive proposition framed in terms of the stroke: "If p, q, r are elementary propositions, given p and p|(q|r), we can infer r. This is a primitive proposition." The new introduction keeps the notation for "there exists" (now recast as "sometimes true") and "for all" (recast as "always true"). Appendix A strengthens the notion of "matrix" or "predicative function" (a "primitive idea", PM 1962:164) and presents four new Primitive propositions as *8.1–*8.13.
*88. Multiplicative axiom
*120. Axiom of infinity
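The propositional part of both editions can be checked mechanically with truth tables. The following is a minimal sketch, not PM's own method: it confirms that the first-edition primitive propositions 1.2 through 1.6 are tautologies once implication is read as "~p ∨ q", and that the second edition's single rule ("given p and p|(q|r), we can infer r") is truth-preserving. The helper names (`implies`, `stroke`, `is_tautology`) are hypothetical, and the particular stroke encodings of "~", "∨", "⊃" and "." are the standard NAND encodings, assumed here because the text only says that such definitions are given.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication, read as in definition 1.01: p ⊃ q := ~p ∨ q."""
    return (not p) or q


def stroke(p: bool, q: bool) -> bool:
    """Sheffer stroke p | q (NAND): false only when both p and q are true."""
    return not (p and q)


def is_tautology(formula) -> bool:
    """True if a three-variable formula holds on every row of its truth table."""
    return all(formula(p, q, r) for p, q, r in product([False, True], repeat=3))


# First-edition primitive propositions 1.2 to 1.6 as Boolean functions.
PRIMITIVES = {
    "1.2 tautology": lambda p, q, r: implies(p or p, p),
    "1.3 addition": lambda p, q, r: implies(q, p or q),
    "1.4 permutation": lambda p, q, r: implies(p or q, q or p),
    "1.5 associative": lambda p, q, r: implies(p or (q or r), q or (p or r)),
    "1.6 summation": lambda p, q, r: implies(implies(q, r), implies(p or q, p or r)),
}

# Assumed (standard) stroke encodings of the older connectives.
STROKE_DEFINITIONS = {
    "negation": lambda p, q, r: stroke(p, p) == (not p),
    "disjunction": lambda p, q, r: stroke(stroke(p, p), stroke(q, q)) == (p or q),
    "implication": lambda p, q, r: stroke(p, stroke(q, q)) == implies(p, q),
    "product": lambda p, q, r: stroke(stroke(p, q), stroke(p, q)) == (p and q),
}

primitive_results = {name: is_tautology(f) for name, f in PRIMITIVES.items()}
stroke_results = {name: is_tautology(f) for name, f in STROKE_DEFINITIONS.items()}

# Second-edition rule: whenever p and p|(q|r) are both true, r is true.
inference_is_sound = is_tautology(
    lambda p, q, r: implies(p and stroke(p, stroke(q, r)), r)
)

for name, ok in {**primitive_results, **stroke_results}.items():
    print(f"{name}: {'holds' if ok else 'FAILS'}")
print("stroke inference rule sound:", inference_is_sound)
```

A truth-table check of this kind shows only that the propositions are valid in two-valued logic; it says nothing about the non-propositional items 1.1, 1.7, 1.71 and 1.72, which are rules about what counts as an elementary proposition.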


Notation used in PM
One author[1] observes that "The notation in that work has been superseded by the subsequent development of logic during the 20th century, to the extent that the beginner has trouble reading PM at all"; while much of the symbolic content can be converted to modern notation, the original notation itself is "a subject of scholarly dispute", and some notation "embod[y] substantive logical doctrines so that it cannot simply be replaced by contemporary symbolism".[10]

Kurt Gödel was harshly critical of the notation: "It is to be regretted that this first comprehensive and thorough-going presentation of a mathematical logic and the derivation of mathematics from it [is] so greatly lacking in formal precision in the foundations (contained in *1–*21 of Principia [i.e., sections *1–*5 (propositional logic), *8–*14 (predicate logic with identity/equality), *20 (introduction to set theory), and *21 (introduction to relations theory)]) that it represents in this respect a considerable step backwards as compared with Frege. What is missing, above all, is a precise statement of the syntax of the formalism. Syntactical considerations are omitted even in cases where they are necessary for the cogency of the proofs".[11] This is reflected in the example below of the symbols "p", "q", "r" and "⊃" that can be formed into the string "p ⊃ q ⊃ r". PM requires a definition of what this symbol-string means in terms of other symbols; in contemporary treatments the "formation rules" (syntactical rules leading to "well formed formulas") would have prevented the formation of this string.

Source of the notation: Chapter I "Preliminary Explanations of Ideas and Notations" begins with the source of the notation: "The notation adopted in the present work is based upon that of Peano, and the following explanations are to some extent modelled on those which he prefixes to his Formulario Mathematico [i.e., Peano 1889]. His use of dots as brackets is adopted, and so are many of his symbols" (PM 1927:4).[12] PM adopts the assertion sign "⊢" from Frege's 1879 Begriffsschrift:[13] "(I)t may be read 'it is true that'"[14] Thus to assert a proposition p PM writes: "⊢. p." (PM 1927:92) (Observe that, as in the original, the left dot is square and of greater size than the period on the right.)


An introduction to the notation of "Section A Mathematical Logic" (formulas *1–*5.71)


PM's dots[15] are used in a manner similar to parentheses. Later in section *14, brackets "[ ]" appear, and in sections *20 and following, braces "{ }" appear. Whether these symbols have specific meanings or are just for visual clarification is unclear. More than one dot indicates the "depth" of the parentheses, e.g., ".", ":" or ":.", "::", etc. Unfortunately for contemporary readers, the single dot (but also ":", ":.", "::", etc.) is also used to symbolize "logical product" (contemporary logical AND, often symbolized by "&" or "∧").

Logical implication is represented by Peano's "Ɔ" simplified to "⊃", logical negation is symbolized by an elongated tilde, i.e., "~" (contemporary "~" or "¬"), and the logical OR by "v". The symbol "=" together with "Df" is used to indicate "is defined as", whereas in sections *13 and following, "=" is defined as (mathematically) "identical with", i.e., contemporary mathematical "equality" (cf. discussion in section *13). Logical equivalence is represented by "≡" (contemporary "if and only if"); "elementary" propositional functions are written in the customary way, e.g., "f(p)", but later the function sign appears directly before the variable without parentheses, e.g., "φx", "χx", etc.

Example: PM introduces the definition of "logical product" as follows:
*3.01. p . q .=. ~(~p v ~q) Df. where "p . q" is the logical product of p and q.
*3.02. p ⊃ q ⊃ r .=. p ⊃ q . q ⊃ r Df. This definition serves merely to abbreviate proofs.

Translation of the formulas into contemporary symbols: Various authors use alternate symbols, so no definitive translation can be given. However, because of criticisms such as that of Kurt Gödel below, the best contemporary treatments will be very precise with respect to the "formation rules" (the syntax) of the formulas. The first formula might be converted into modern symbolism as follows:[16]
(p & q) =df (~(~p v ~q))
alternately (p & q) =df (¬(¬p v ¬q))
alternately (p ∧ q) =df (¬(¬p v ¬q))
etc.
The second formula might be converted as follows:
(p ⊃ q ⊃ r) =df (p ⊃ q) & (q ⊃ r)
But note that this is not (logically) equivalent to (p ⊃ (q ⊃ r)) nor to ((p ⊃ q) ⊃ r), and these two are not logically equivalent to each other either.
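The two claims above (that the defined logical product behaves like the modern AND, and that the chained implication differs from both ways of bracketing it) can be verified by brute force. The sketch below is illustrative only; the function names `implies` and `truth_table` are hypothetical helpers, and two-valued truth tables are assumed.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def truth_table(formula, arity: int) -> list:
    """Values of `formula` on every assignment, in a fixed row order."""
    return [formula(*row) for row in product([False, True], repeat=arity)]


# 3.01: p . q  =df  ~(~p v ~q), compared with the modern conjunction.
product_by_definition = truth_table(lambda p, q: not ((not p) or (not q)), 2)
modern_conjunction = truth_table(lambda p, q: p and q, 2)

# 3.02: p ⊃ q ⊃ r  =df  (p ⊃ q) . (q ⊃ r), compared with both bracketings.
chained = truth_table(lambda p, q, r: implies(p, q) and implies(q, r), 3)
right_nested = truth_table(lambda p, q, r: implies(p, implies(q, r)), 3)
left_nested = truth_table(lambda p, q, r: implies(implies(p, q), r), 3)

print("3.01 matches AND:          ", product_by_definition == modern_conjunction)
print("chained == p ⊃ (q ⊃ r):    ", chained == right_nested)
print("chained == (p ⊃ q) ⊃ r:    ", chained == left_nested)
print("p ⊃ (q ⊃ r) == (p ⊃ q) ⊃ r:", right_nested == left_nested)
```

For instance, with p true, q false and r true, the chained form is false while p ⊃ (q ⊃ r) is true, which is one row on which the readings come apart.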


An introduction to the notation of "Section B Theory of Apparent Variables" (formulas *8–*14.34)


These sections concern what is now known as predicate logic, and predicate logic with identity (equality).

NB: As a result of criticism and advances, the second edition of PM (1927) replaces *9 with a new *8 (Appendix A). This new section eliminates the first edition's distinction between real and apparent variables, and it eliminates "the primitive idea 'assertion of a propositional function'".[17] To add to the complexity of the treatment, *8 introduces the notion of substituting a "matrix", and the Sheffer stroke:
Matrix: In contemporary usage, PM's matrix is (at least for propositional functions) a truth table, i.e., all truth-values of a propositional or predicate function.
Sheffer stroke: This is the contemporary logical NAND (NOT-AND), i.e., "incompatibility". Given two propositions p and q, "p | q" means "proposition p is incompatible with proposition q", i.e., p | q evaluates as false exactly when both p and q evaluate as true. After section *8 the Sheffer stroke sees no usage.

Section *10: The existential and universal "operators": PM adds "(x)" to represent the contemporary symbolism "for all x", i.e., "∀x", and it uses a backwards serifed E to represent "there exists an x", i.e., "(Ǝx)", i.e., the contemporary "∃x". The typical notation would be similar to the following:
"(x) . φx" means "for all values of variable x, function φ evaluates to true"
"(Ǝx) . φx" means "for some value of variable x, function φ evaluates to true"

Sections *10, *11, *12: Properties of a variable extended to all individuals: section *10 introduces the notion of "a property" of a "variable". PM gives the example: φ is a function that indicates "is a Greek", ψ indicates "is a man", and χ indicates "is a mortal"; these functions then apply to a variable x. PM can now write, and evaluate:
(x) . ψx
The notation above means "for all x, x is a man". Given a collection of individuals, one can evaluate the above formula for truth or falsity. For example, given the restricted collection of individuals { Socrates, Plato, Russell, Zeus } the above evaluates to "true" if we allow for Zeus to be a man. But it fails for:
(x) . φx
because Russell is not Greek. And it fails for
(x) . χx
because Zeus is not a mortal.

Equipped with this notation PM can create formulas to express the following: "If all Greeks are men and if all men are mortals then all Greeks are mortals". (PM 1962:138)
(x) . φx ⊃ ψx :(x). ψx ⊃ χx :⊃: (x) . φx ⊃ χx

Another example: the formula:
*10.01. (Ǝx). φx . = . ~(x) . ~φx Df.
means "The symbols representing the assertion 'There exists at least one x that satisfies function φ' is defined by the symbols representing the assertion 'It's not true that, given all values of x, there are no values of x satisfying φ'".

The symbolisms "⊃_x" and "≡_x" (with the variable written as a subscript) appear at *10.02 and *10.03. Both are abbreviations for universality (i.e., for all) that bind the variable x to the logical operator. Contemporary notation would have simply used parentheses outside of the equality ("=") sign:
*10.02 φx ⊃_x ψx .=. (x). φx ⊃ ψx Df
Contemporary notation: ∀x(φ(x) → ψ(x)) (or a variant)
*10.03 φx ≡_x ψx .=. (x). φx ≡ ψx Df
Contemporary notation: ∀x(φ(x) ↔ ψ(x)) (or a variant)
PM attributes the first symbolism to Peano.

Section *11 applies this symbolism to two variables. Thus the following notations: ⊃_x, ⊃_y, ⊃_{x,y} could all appear in a single formula.

Section *12 reintroduces the notion of "matrix" (contemporary truth table), the notion of logical types, and in particular the notions of first-order and second-order functions and propositions. New symbolism "φ ! x" represents any value of a first-order function. If a circumflex "^" is placed over a variable, then this is an "individual" value of y, meaning that "ŷ" indicates "individuals" (e.g., a row in a truth table); this distinction is necessary because of the matrix/extensional nature of propositional functions.

Now equipped with the matrix notion, PM can assert its controversial axiom of reducibility: a function of one or two variables (two being sufficient for PM's use) where all its values are given (i.e., in its matrix) is (logically) equivalent ("≡") to some "predicative" function of the same variables. The one-variable definition is given below as an illustration of the notation (PM 1962:166-167):
*12.1 ⊢: (Ǝ f): φx .≡_x. f ! x Pp
Pp is a "Primitive proposition" ("Propositions assumed without proof") (PM 1962:12, i.e., contemporary "axioms"), adding to the 7 defined in section *1 (starting with *1.1 modus ponens). These are to be distinguished from the "primitive ideas" that include the assertion sign "⊢", negation "~", logical OR "V", the notions of "elementary proposition" and "elementary propositional function"; these are as close as PM comes to rules of notational formation, i.e., syntax.

This means: "We assert the truth of the following: There exists a function f with the property that: given all values of x, their evaluations in function φ (i.e., resulting in their matrix) is logically equivalent to some f evaluated at those same values of x (and vice versa, hence logical equivalence)". In other words: given a matrix determined by property φ applied to variable x, there exists a function f that, when applied to x, is logically equivalent to the matrix. Or: every matrix φx can be represented by a function f applied to x, and vice versa.

*13: The identity operator "=": This is a definition that uses the sign in two different ways, as noted by the quote from PM:
*13.01. x = y .=: (φ): φ ! x . ⊃ . φ ! y Df
means: "This definition states that x and y are to be called identical when every predicative function satisfied by x is also satisfied by y ... Note that the second sign of equality in the above definition is combined with "Df", and thus is not really the same symbol as the sign of equality which is defined." The not-equals sign "≠" makes its appearance as a definition at *13.02.

*14: Descriptions: "A description is a phrase of the form 'the term y which satisfies φŷ', where φŷ is some function satisfied by one and only one argument."[18] From this PM employs two new symbols, a forward "E" and an inverted iota "℩". Here is an example:
*14.02. E ! ( ℩y) (φy) .=: ( Ǝb): φy . ≡_y . y = b Df.
This has the meaning: "'The y satisfying φŷ exists,' which holds when, and only when φŷ is satisfied by one value of y and by no other value." (PM 1967:173-174)
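Returning to the Socrates example above, the evaluation of quantified formulas over a restricted collection of individuals can be sketched in a few lines. This is only an illustration under stated assumptions: the three sets below are hypothetical extensions chosen to agree with the text (Zeus counts as a man and a Greek but not as a mortal; Russell is not Greek), and `for_all` and `exists` are stand-ins for PM's "(x)" and its backwards-E quantifier over this finite domain.

```python
DOMAIN = ("Socrates", "Plato", "Russell", "Zeus")

# Assumed extensions of the three properties, chosen to match the text.
GREEKS = {"Socrates", "Plato", "Zeus"}
MEN = {"Socrates", "Plato", "Russell", "Zeus"}  # allowing Zeus to be a man
MORTALS = {"Socrates", "Plato", "Russell"}


def is_greek(x: str) -> bool:
    return x in GREEKS


def is_man(x: str) -> bool:
    return x in MEN


def is_mortal(x: str) -> bool:
    return x in MORTALS


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def for_all(prop) -> bool:
    """'(x) . prop x': prop holds for every individual in DOMAIN."""
    return all(prop(x) for x in DOMAIN)


def exists(prop) -> bool:
    """'There exists an x with prop x': prop holds for some individual."""
    return any(prop(x) for x in DOMAIN)


all_are_men = for_all(is_man)        # holds if Zeus is allowed to be a man
all_are_greek = for_all(is_greek)    # fails: Russell is not Greek
all_are_mortal = for_all(is_mortal)  # fails: Zeus is not a mortal

# "If all Greeks are men and all men are mortals then all Greeks are mortals."
syllogism = implies(
    for_all(lambda x: implies(is_greek(x), is_man(x)))
    and for_all(lambda x: implies(is_man(x), is_mortal(x))),
    for_all(lambda x: implies(is_greek(x), is_mortal(x))),
)

# Definition 10.01: "some x is Greek" is "not (for all x, x is not Greek)".
duality_holds = exists(is_greek) == (not for_all(lambda x: not is_greek(x)))

print(all_are_men, all_are_greek, all_are_mortal, syllogism, duality_holds)
```

On this domain the syllogism comes out true only vacuously, because the premise "all men are mortals" fails for Zeus; the formula itself is valid on any domain.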


Introduction to the notation of the theory of classes and relations


The text leaps from section *14 directly to the foundational sections *20 GENERAL THEORY OF CLASSES and *21 GENERAL THEORY OF RELATIONS. "Relations" are what are known in contemporary set theory as ordered pairs. Sections *20 and *22 introduce many of the symbols still in contemporary usage. These include the symbols "ε", "⊂", "∩", "∪", "–", "Λ", and "V": "ε" signifies "is an element of" (PM 1962:188); "⊂" (*22.01) signifies "is contained in", "is a subset of"; "∩" (*22.02) signifies the intersection (logical product) of classes (sets); "∪" (*22.03) signifies the union (logical sum) of classes (sets); "–" (*22.03) signifies negation of a class (set); "Λ" signifies the null class; and "V" signifies the universal class or universe of discourse.

Small Greek letters (other than "ε", "ι", "π", "φ", "ψ", "χ", and "θ") represent classes (e.g., "α", "β", "γ", "δ", etc.) (PM 1962:188):
x ε α
"The use of single letter in place of symbols such as ẑ(φz) or ẑ(φ ! z) is practically almost indispensable, since otherwise the notation rapidly becomes intolerably cumbrous. Thus 'x ε α' will mean 'x is a member of the class α'". (PM 1962:188)
α ∪ –α = V
The union of a set and its inverse is the universal (completed) set.[19]
α ∩ –α = Λ
The intersection of a set and its inverse is the null (empty) set.

When applied to relations in section *23 CALCULUS OF RELATIONS, the symbols "⊂", "∩", "∪", and "–" acquire a dot: for example: "⊍", "∸".[20]

The notion, and notation, of "a class" (set): In the first edition PM asserts that no new primitive ideas are necessary to define what is meant by "a class", and only two new "primitive propositions" called the axioms of reducibility for classes and relations respectively (PM 1962:25).[21] But before this notion can be defined, PM feels it necessary to create a peculiar notation "ẑ(φz)" that it calls a "fictitious object". (PM 1962:188)
⊢: x ε ẑ(φz) .≡. (φx)
"i.e., 'x is a member of the class determined by (φẑ)' is [logically] equivalent to 'x satisfies (φẑ),' or to '(φx) is true.'". (PM 1962:25)

At least PM can tell the reader how these fictitious objects behave, because "A class is wholly determinate when its membership is known, that is, there cannot be two different classes having the same membership" (PM 1962:26). This is symbolized by the following equality (similar to *13.01 above):
ẑ(φz) = ẑ(ψz) . ≡ : (x): φx .≡. ψx
"This last is the distinguishing characteristic of classes, and justifies us in treating ẑ(ψz) as the class determined by [the function] ψẑ." (PM 1962:188)

Perhaps the above can be made clearer by the discussion of classes in the Introduction to the 2nd Edition, which disposes of the Axiom of Reducibility and replaces it with the notion: "All functions of functions are extensional" (PM 1962:xxxix), i.e.,
φx ≡_x ψx .⊃. (x): ƒ(φẑ) ≡ ƒ(ψẑ) (PM 1962:xxxix)
This has the reasonable meaning that "IF for all values of x the truth-values of the functions φ and ψ of x are [logically] equivalent, THEN the function ƒ of a given φẑ and ƒ of ψẑ are [logically] equivalent." PM asserts this is "obvious":
"This is obvious, since φ can only occur in ƒ(φẑ) by the substitution of values of φ for p, q, r, ... in a [logical-] function, and, if φx ≡ ψx, the substitution of φx for p in a [logical-] function gives the same truth-value to the truth-function as the substitution of ψx. Consequently there is no longer any reason to distinguish between functions and classes, for we have, in virtue of the above,
φx ≡_x ψx .⊃. (x). φẑ = . ψẑ".
Observe the change to the equality "=" sign on the right. PM goes on to state that it will continue to hang onto the notation "ẑ(φz)", but this is merely equivalent to φẑ, and this is a class. (all quotes: PM 1962:xxxix).
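The extensional view of classes described above has a direct finite analogue in ordinary set operations. The sketch below assumes a small hypothetical universe `V` of ten integers (nothing in PM restricts the universe this way) and uses Python's `frozenset` to stand in for a class. `class_of` plays the role of the "class determined by a function", and `complement` plays the role of class negation.

```python
# Hypothetical finite universe of discourse and the null class.
V = frozenset(range(10))
NULL_CLASS = frozenset()


def class_of(propositional_function) -> frozenset:
    """The class determined by a function: all x in V that satisfy it."""
    return frozenset(x for x in V if propositional_function(x))


def complement(alpha: frozenset) -> frozenset:
    """Negation of a class, taken relative to the universe V."""
    return V - alpha


# Two differently written functions with the same membership.
alpha = class_of(lambda x: x % 2 == 0)
beta = class_of(lambda x: (x * x) % 2 == 0)

union_is_universe = (alpha | complement(alpha)) == V
intersection_is_null = (alpha & complement(alpha)) == NULL_CLASS

# Extensionality: the same membership means the same class.
same_class = alpha == beta

# "Is contained in": multiples of four form a subclass of the evens.
multiples_of_four = class_of(lambda x: x % 4 == 0)
is_subclass = multiples_of_four <= alpha

print(union_is_universe, intersection_is_null, same_class, is_subclass)
```

The comparison `alpha == beta` succeeds even though the two lambdas are different objects, which mirrors the point that a class is wholly determined by its membership rather than by the function used to describe it.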


Consistency and criticisms


According to Carnap's "Logicist Foundations of Mathematics", Russell wanted a theory that could plausibly be said to derive all of mathematics from purely logical axioms. However, Principia Mathematica required, in addition to the basic axioms of type theory, three further axioms that seemed to not be true as mere matters of logic, namely the axiom of infinity, the axiom of choice, and the axiom of reducibility. Since the first two were existential axioms, Russell phrased mathematical statements depending on them as conditionals. But reducibility was required to be sure that the formal statements even properly express statements of real analysis, so that statements depending on it could not be reformulated as conditionals. Frank P. Ramsey tried to argue that Russell's ramification of the theory of types was unnecessary, so that reducibility could be removed, but these arguments seemed inconclusive. Beyond the status of the axioms as logical truths, the questions remained: whether a contradiction could be derived from the Principia's axioms (the question of inconsistency), and whether there exists a mathematical statement which could neither be proven nor disproven in the system (the question of completeness). Propositional logic itself was known to be consistent, but the same had not been established for Principia's axioms of set theory. (See Hilbert's second problem.)

Gödel 1930, 1931


In 1930, Gödel's completeness theorem showed that first-order predicate logic itself was complete in a much weaker sense: any sentence that is unprovable from a given set of axioms must actually be false in some model of the axioms. However, this is not the stronger sense of completeness desired for Principia Mathematica, since a given system of axioms (such as those of Principia Mathematica) may have many models, in some of which a given statement is true and in others of which that statement is false, so that the statement is left undecided by the axioms.

Gödel's incompleteness theorems cast unexpected light on these two related questions. Gödel's first incompleteness theorem showed that Principia could not be both consistent and complete. According to the theorem, within every sufficiently powerful logical system (such as Principia), there exists a statement G that essentially reads, "The statement G cannot be proved." Such a statement is a sort of Catch-22: if G is provable, then it is false, and the system is therefore inconsistent; and if G is not provable, then it is true, and the system is therefore incomplete.

Gödel's second incompleteness theorem (1931) shows that no formal system extending basic arithmetic can be used to prove its own consistency. Thus, the statement "there are no contradictions in the Principia system" cannot be proven in the Principia system unless there are contradictions in the system (in which case it can be proven both true and false).

Wittgenstein 1919, 1939


By the second edition of PM, Russell had replaced his axiom of reducibility with a new axiom (although he does not state it as such). Gödel 1944:126 describes it this way: "This change is connected with the new axiom that functions can occur in propositions only "through their values", i.e., extensionally . . . [this is] quite unobjectionable even from the constructive standpoint . . . provided that quantifiers are always restricted to definite orders". This change from a quasi-intensional stance to a fully extensional stance also restricts predicate logic to the second order, i.e. functions of functions: "We can decide that mathematics is to confine itself to functions of functions which obey the above assumption" (PM 2nd Edition p.401, Appendix C).

This new proposal resulted in a dire outcome. An "extensional stance" and restriction to a second-order predicate logic means that a propositional function extended to all individuals such as "All 'x' are blue" now has to list all of the 'x' that satisfy (are true in) the proposition, listing them in a possibly infinite conjunction: e.g. x1 ∧ x2 ∧ . . . ∧ xn ∧ . . .. Ironically, this change came about as the result of criticism from Wittgenstein in his 1919 Tractatus Logico-Philosophicus. As described by Russell in the Preface to the 2nd edition of PM:
"There is another course, recommended by Wittgenstein (Tractatus Logico-Philosophicus, *5.54ff) for philosophical reasons. This is to assume that functions of propositions are always truth-functions, and that a function can only occur in a proposition through its values. . . . [Working through the consequences] it appears that everything in Vol. I remains true . . . the theory of inductive cardinals and ordinals survives; but it seems that the theory of infinite Dedekindian and well-ordered series largely collapses, so that irrationals, and real numbers generally, can no longer be adequately dealt with. Also Cantor's proof that 2^n > n breaks down unless n is finite." (PM 2nd edition reprinted 1962:xiv, also cf. new Appendix C).
In other words, the fact that an infinite list cannot realistically be specified means that the concept of "number" in the infinite sense (i.e. the continuum) cannot be described by the new theory proposed in PM Second Edition.

Wittgenstein in his Lectures on the Foundations of Mathematics, Cambridge 1939 criticised Principia on various grounds, such as:
It purports to reveal the fundamental basis for arithmetic. However, it is our everyday arithmetical practices such as counting which are fundamental; for if a persistent discrepancy arose between counting and Principia, this would be treated as evidence of an error in Principia (e.g., that Principia did not characterize numbers or addition correctly), not as evidence of an error in everyday counting.
The calculating methods in Principia can only be used in practice with very small numbers. To calculate using large numbers (e.g., billions), the formulae would become too long, and some short-cut method would have to be used, which would no doubt rely on everyday techniques such as counting (or else on non-fundamental and hence questionable methods such as induction). So again Principia depends on everyday techniques, not vice versa.
Wittgenstein did, however, concede that Principia may nonetheless make some aspects of everyday arithmetic clearer.


Gödel 1944
In his 1944 Russell's mathematical logic, Gödel offers a "critical but sympathetic discussion of the logicistic order of ideas":[22] "It is to be regretted that this first comprehensive and thorough-going presentation of a mathematical logic and the derivation of mathematics from it [is] so greatly lacking in formal precision in the foundations (contained in *1-*21 of Principia) that it represents in this respect a considerable step backwards as compared with Frege. What is missing, above all, is a precise statement of the syntax of the formalism. Syntactical considerations are omitted even in cases where they are necessary for the cogency of the proofs . . . The matter is especially doubtful for the rule of substitution and of replacing defined symbols by their definiens . . . it is chiefly the rule of substitution which would have to be proved" (Gödel 1944:124)[23]


Quotations
"From this proposition it will follow, when arithmetical addition has been defined, that 1+1=2." Volume I, 1st edition, page 379 [24] (page 362 in 2nd edition; page 360 in abridged version). (The proof is actually completed in Volume II, 1st edition, page 86 [25], accompanied by the comment, "The above proposition is occasionally useful.")

54.43: From this proposition it will follow, ... that 1+1=2

Footnotes
[1] Irvine, Andrew D. (2003-05-01). "Principia Mathematica (Stanford Encyclopedia of Philosophy)" (http://plato.stanford.edu/entries/principia-mathematica/#SOPM). Metaphysics Research Lab, CSLI, Stanford University. Retrieved 2009-08-05.
[2] "The Modern Library's Top 100 Nonfiction Books of the Century" (http://www.nytimes.com/library/books/042999best-nonfiction-list.html). The New York Times Company. 1999-04-30. Retrieved 2009-08-05.
[3] This set is taken from Kleene 1952:69 substituting → for ⊃.
[4] Kleene 1952:71, Enderton 2001:15
[5] Enderton 2001:16
[6] This is the word used by Kleene 1952:78
[7] Quote from Kleene 1952:45. See discussion LOGICISM at pages 43-46.
[8] In his section 8.5.4 Groping towards metalogic Grattan-Guinness 2000:454ff discusses the American logicians' critical reception of the second edition of PM. For instance Sheffer "puzzled that 'In order to give an account of logic, we must presuppose and employ logic'" (p. 452). And Bernstein ended his 1926 review with the comment that "This distinction between the propositional logic as a mathematical system and as a language must be made, if serious errors are to be avoided; this distinction the Principia does not make" (p. 454).
[9] This idea is due to Wittgenstein's Tractatus. See the discussion at PM 1962:xiv–xv.
[10] http://plato.stanford.edu/entries/pm-notation/
[11] Kurt Gödel 1944 "Russell's mathematical logic" appearing at page 120 in Feferman et al. 1990 Kurt Gödel Collected Works Volume II, Oxford University Press, NY, ISBN 978-0-19-514721-6 (v.2.pbk.).
[12] For comparison, see the translated portion of Peano 1889 in van Heijenoort 1967:81ff. About the only major change I can see is the substitution of ⊃ for Ɔ as used by Peano.
[13] This work can be found at van Heijenoort 1967:1ff.
[14] And see footnote, both at PM 1927:92
[15] The original typography is a square of a heavier weight than the conventional period.
[16] The first example comes from plato.stanford.edu (loc.cit.).
[17] page xiii of 1927 appearing in the 1962 paperback edition to *56.
[18] The original typography employs an x with a circumflex rather than ŷ; this continues below.
[19] See the ten postulates of Huntington, in particular postulates IIa and IIb at PM 1962:205 and discussion at page 206.
[20] The "⊂" sign has a dot inside it, and the intersection sign "∩" has a dot above it; these are not available in the Arial Unicode MS font.
[21] Wiener 1914 "A simplification of the logic of relations" (van Heijenoort 1967:224ff) disposed of the second of these when he showed how to reduce the theory of relations to that of classes.
[22] Kleene 1952:46.
[23] Gödel 1944 Russell's mathematical logic in Kurt Gödel: Collected Works Volume II, Oxford University Press, New York, NY, ISBN 0-19-514721 .
[24] http://quod.lib.umich.edu/cgi/t/text/pageviewer-idx?c=umhistmath&cc=umhistmath&idno=aat3201.0001.001&frm=frameset&view=image&seq=401
[25] http://quod.lib.umich.edu/cgi/t/text/pageviewer-idx?c=umhistmath&cc=umhistmath&idno=aat3201.0002.001&frm=frameset&view=image&seq=126


References
Primary:
Whitehead, Alfred North, and Bertrand Russell. Principia Mathematica, 3 vols, Cambridge University Press, 1910, 1912, and 1913. Second edition, 1925 (Vol. 1), 1927 (Vols 2, 3). Abridged as Principia Mathematica to *56, Cambridge University Press, 1962.
Alfred North Whitehead; Bertrand Russell (February 2009). Principia Mathematica. Volume One. Merchant Books. ISBN 978-1-60386-182-3.
Alfred North Whitehead; Bertrand Russell (February 2009). Principia Mathematica. Volume Two. Merchant Books. ISBN 978-1-60386-183-0.
Alfred North Whitehead; Bertrand Russell (February 2009). Principia Mathematica. Volume Three. Merchant Books. ISBN 978-1-60386-184-7.
Secondary:
Stephen Kleene 1952 Introduction to Meta-Mathematics, 6th Reprint, North-Holland Publishing Company, Amsterdam NY, ISBN 0-7204-2103-9.
Stephen Cole Kleene; Michael Beeson (March 2009). Introduction to Metamathematics (Paperback ed.). Ishi Press. ISBN 978-0-923891-57-2.
Ivor Grattan-Guinness (2000) The Search for Mathematical Roots 1870-1940, Princeton University Press, Princeton N.J., ISBN 0-691-05857-1 (alk. paper).
Ludwig Wittgenstein 2009 Major Works: Selected Philosophical Writings, HarperCollins, NY, NY, ISBN 978-0-06-155024-9. In particular: Tractatus Logico-Philosophicus (Vienna 1918, original publication in German).
Jean van Heijenoort editor 1967 From Frege to Gödel: A Source book in Mathematical Logic, 1879-1931, 3rd printing, Harvard University Press, Cambridge MA, ISBN 0-674-32449-8 (pbk.)

External links
Stanford Encyclopedia of Philosophy: Principia Mathematica (http://plato.stanford.edu/entries/principia-mathematica/) by A. D. Irvine.
The Notation in Principia Mathematica (http://plato.stanford.edu/entries/pm-notation/) by Bernard Linsky.
Principia Mathematica online (University of Michigan Historical Math Collection): Volume I (http://www.hti.umich.edu/cgi/b/bib/bibperm?q1=AAT3201.0001.001), Volume II (http://www.hti.umich.edu/cgi/b/bib/bibperm?q1=AAT3201.0002.001), Volume III (http://www.hti.umich.edu/cgi/b/bib/bibperm?q1=AAT3201.0003.001)
Proposition 54.43 (http://us.metamath.org/mpegif/pm54.43.html) in a more modern notation (Metamath)

Koch snowflake

The Koch snowflake (also known as the Koch star and Koch island[1]) is a mathematical curve and one of the earliest fractal curves to have been described. It is based on the Koch curve, which appeared in a 1904 paper titled "On a continuous curve without tangents, constructible from elementary geometry" (original French title: Sur une courbe continue sans tangente, obtenue par une construction géométrique élémentaire) by the Swedish mathematician Helge von Koch.

Construction
The Koch snowflake can be constructed by starting with an equilateral triangle, then recursively altering each line segment as follows: 1. divide the line segment into three segments of equal length. 2. draw an equilateral triangle that has the middle segment from step 1 as its base and points outward. 3. remove the line segment that is the base of the triangle from step 2. After one iteration of this process, the resulting shape is the outline of a hexagram. The Koch snowflake is the limit approached as the above steps are followed over and over again. The Koch curve originally described by Koch is constructed with only one of the three sides of the original triangle. In other words, three Koch curves make a Koch snowflake.
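The three construction steps above translate directly into a short routine on polygon vertices. The following is a minimal sketch, assuming a unit-side starting triangle listed counterclockwise and representing points as complex numbers; the function names (`koch_step`, `koch_snowflake`, `perimeter`, `polygon_area`) are hypothetical and chosen only for this illustration.

```python
import cmath
import math


def koch_step(points: list) -> list:
    """Apply one construction step to every edge of a closed polygon.

    `points` must list the vertices counterclockwise, so that rotating an
    edge direction by -60 degrees points away from the interior.
    """
    outward_turn = cmath.exp(-1j * math.pi / 3)
    refined = []
    count = len(points)
    for i in range(count):
        start, end = points[i], points[(i + 1) % count]
        third = (end - start) / 3            # step 1: split into three parts
        first_cut = start + third
        second_cut = start + 2 * third
        apex = first_cut + third * outward_turn  # step 2: outward triangle
        # Step 3: the middle segment is dropped by routing through the apex.
        refined.extend([start, first_cut, apex, second_cut])
    return refined


def koch_snowflake(iterations: int) -> list:
    """Vertices of the snowflake after the given number of iterations."""
    points = [0j, 1 + 0j, cmath.exp(1j * math.pi / 3)]  # unit triangle, CCW
    for _ in range(iterations):
        points = koch_step(points)
    return points


def perimeter(points: list) -> float:
    count = len(points)
    return sum(abs(points[(i + 1) % count] - points[i]) for i in range(count))


def polygon_area(points: list) -> float:
    """Shoelace formula; positive for counterclockwise vertex order."""
    count = len(points)
    twice_area = sum(
        (points[i].conjugate() * points[(i + 1) % count]).imag
        for i in range(count)
    )
    return twice_area / 2


if __name__ == "__main__":
    for n in range(4):
        shape = koch_snowflake(n)
        print(n, len(shape), round(perimeter(shape), 6), round(polygon_area(shape), 6))
```

After one iteration the twelve vertices trace the outline of a hexagram, as stated above. A finite number of iterations only approximates the snowflake, which is the limit of this process.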

The first four iterations of the Koch snowflake

Properties
The Koch curve has an infinite length because each time the steps above are performed on each line segment of the figure there are four times as many line segments, the length of each being one-third the length of the segments in the previous stage. Hence, the total length increases by one third and thus the length at step n will be $(4/3)^n$ of the original triangle perimeter. The fractal dimension is $\log 4/\log 3 \approx 1.26186$, greater than the dimension of a line (1) but less than Peano's space-filling curve (2). The Koch curve is continuous everywhere but differentiable nowhere.

The first seven iterations in animation
The Koch curve

Taking s as the side length, the original triangle area is $A_0 = \frac{\sqrt{3}}{4}s^2$. The side length of each successive small triangle is 1/3 of those in the previous iteration; because the area of the added triangles is proportional to the square of its side length, the area of each triangle added in the nth step is 1/9 of that in the (n-1)th step. In each iteration after the first, 4 times as many triangles are added as in the previous iteration; because the first iteration adds 3 triangles, the nth iteration will add $3 \cdot 4^{n-1}$ triangles. Combining these two formulae gives the iteration formula:

$$A_n = A_{n-1} + \frac{3 \cdot 4^{n-1}}{9^n} A_0, \quad n \ge 1,$$

where $A_0$ is area of the original triangle. Substituting in $A_1 = \frac{4}{3} A_0$ and expanding yields:

$$A_n = \frac{4}{3} A_0 + \sum_{k=2}^{n} \frac{3 \cdot 4^{k-1}}{9^k} A_0 = A_0 \left( 1 + \frac{3}{4} \sum_{k=1}^{n} \left( \frac{4}{9} \right)^k \right).$$

In the limit, as n goes to infinity, the limit of the sum of the powers of 4/9 is 4/5, so

$$\lim_{n \to \infty} A_n = A_0 \left( 1 + \frac{3}{4} \cdot \frac{4}{5} \right) = \frac{8}{5} A_0.$$

So the area of a Koch snowflake is 8/5 of the area of the original triangle, or $\frac{2\sqrt{3}}{5}s^2$.[2] Therefore the infinite perimeter of the Koch triangle encloses a finite area.

It is possible to tessellate the plane by copies of Koch snowflakes in two different sizes. However, such a tessellation is not possible using only snowflakes of the same size as each other. Since each Koch snowflake in the tessellation can be subdivided into seven smaller snowflakes of two different sizes, it is also possible to find tessellations that use more than two sizes at once.[3]
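The contrast between the unbounded perimeter and the bounded area can be checked numerically. This sketch assumes that step n adds 3·4^(n-1) triangles, each with 1/9^n of the original area, which follows from the counting argument above; the function names are illustrative only. Exact fractions are used so that rounding does not obscure the pattern.

```python
from fractions import Fraction

def koch_perimeter_ratio(n):
    """Perimeter after n steps, as a multiple of the original perimeter."""
    return Fraction(4, 3) ** n

def koch_area_ratio(n):
    """Area after n steps, as a multiple of the original triangle area."""
    ratio = Fraction(1)
    for k in range(1, n + 1):
        # Step k adds 3 * 4**(k-1) triangles, each of relative area 1 / 9**k.
        ratio += Fraction(3 * 4 ** (k - 1), 9 ** k)
    return ratio

for n in (0, 1, 2, 5, 20):
    print(n, float(koch_perimeter_ratio(n)), float(koch_area_ratio(n)))
```

The perimeter column grows without bound while the area column approaches 8/5.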

Thue-Morse Sequence and Turtle graphics


A Turtle Graphic is the curve that is generated if an automaton is programmed with a sequence. If the Thue–Morse sequence members are used in order to select program states:
If t(n) = 0, move ahead by one unit,
If t(n) = 1, rotate counterclockwise by an angle of π/3,
the resulting curve converges to the Koch snowflake.

Tessellation by two sizes of Koch snowflake
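The turtle rule just described can be sketched as follows. This is an illustration under stated assumptions: t(n) is computed as the parity of the number of 1 bits in n (a standard characterization of the Thue–Morse sequence), the turtle starts at the origin heading along the positive x axis, and the function names are hypothetical. The code only generates the path; it does not itself demonstrate the convergence claim.

```python
import math

def thue_morse(n):
    """t(n): parity of the number of 1 bits in the binary expansion of n."""
    return bin(n).count("1") % 2

def thue_morse_turtle(steps):
    """Return the list of points visited after reading t(0), ..., t(steps - 1)."""
    x, y, heading = 0.0, 0.0, 0.0
    path = [(x, y)]
    for n in range(steps):
        if thue_morse(n) == 0:
            # t(n) = 0: move ahead by one unit.
            x += math.cos(heading)
            y += math.sin(heading)
            path.append((x, y))
        else:
            # t(n) = 1: rotate counterclockwise by pi/3.
            heading += math.pi / 3
    return path

path = thue_morse_turtle(4 ** 5)
```

Plotting `path` for increasing powers of 4 is one way to inspect the shape the curve takes.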


Representation as Lindenmayer system


The Koch Curve can be expressed by a rewrite system (Lindenmayer system).
Alphabet : F
Constants : +, −
Axiom : F++F++F
Production rules: F → F−F++F−F
Here, F means "draw forward", + means "turn right 60°", and − means "turn left 60°".
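A rewrite system of this kind is straightforward to run mechanically. The sketch below assumes the axiom F++F++F and the production rule F → F−F++F−F, writes the minus sign as the ASCII character "-", and uses hypothetical helper names (`expand`, `interpret`, `KOCH_RULES`).

```python
import math

KOCH_RULES = {"F": "F-F++F-F"}  # constants "+" and "-" are copied unchanged

def expand(axiom, rules, iterations):
    """Apply the production rules to every symbol, `iterations` times."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

def interpret(commands, angle_deg=60.0):
    """Turn a command string into turtle positions (unit step per F)."""
    x, y, heading = 0.0, 0.0, 0.0
    pts = [(x, y)]
    turn = math.radians(angle_deg)
    for ch in commands:
        if ch == "F":      # draw forward
            x += math.cos(heading)
            y += math.sin(heading)
            pts.append((x, y))
        elif ch == "+":    # turn right
            heading -= turn
        elif ch == "-":    # turn left
            heading += turn
    return pts

outline = interpret(expand("F++F++F", KOCH_RULES, 3))
```

Because the step length is fixed at one unit, the figure grows by a factor of three per iteration; dividing the coordinates by 3 to the power of the iteration count gives the usual fixed-size picture.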

Variants of the Koch curve


Following von Koch's concept, several variants of the Koch curve were designed, considering right angles (quadratic), other angles (Cesàro) or circles and their extensions to higher dimensions (Sphereflake):

Variant | Illustration | Construction
1D, 85° angle | Cesàro fractal | The Cesàro fractal is a variant of the Koch curve with an angle between 60° and 90° (here 85°).
1D, 90° angle | Quadratic type 1 curve | The first 2 iterations.
1D, 90° angle | Quadratic type 2 curve | The first 2 iterations. Its fractal dimension equals 1.5 and is exactly half-way between dimension 1 and 2. It is therefore often chosen when studying the physical properties of non-integer fractal objects.
1D, ln 3/ln √5 | Quadratic flake | The first 2 iterations. Its fractal dimension equals ln 3/ln √5 = 1.37.
1D, ln 3.33/ln √5 | Quadratic Cross | Another variation. Its fractal dimension equals ln 3.33/ln √5 = 1.49.
2D, triangles | von Koch surface | The first 3 iterations of a natural extension of the Koch curve in 2 dimensions.
2D, 90° angle | Quadratic type 1 surface | Extension of the quadratic type 1 curve. The illustration shows the fractal after the second iteration. Animation quadratic surface.
2D, 90° angle | Quadratic type 2 surface | Extension of the quadratic type 2 curve. The illustration shows the fractal after the first iteration.
3D, spheres | Closeup of Haines sphereflake | Eric Haines has developed the sphereflake fractal, which is a three-dimensional version of the Koch snowflake, using spheres.
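The dimensions quoted for the Koch curve and its variants can be recomputed with the similarity-dimension formula log N / log(1/r) for a curve made of N copies of itself, each scaled by r. This is a small sketch; the piece counts used for the Koch curve (4 pieces at scale 1/3) and for the quadratic type 2 curve (8 pieces at scale 1/4) are assumptions about the standard constructions rather than values stated in the text.

```python
import math

def similarity_dimension(pieces, scale):
    """log(N) / log(1/r) for N self-similar pieces, each scaled by r."""
    return math.log(pieces) / math.log(1.0 / scale)

koch_dim = similarity_dimension(4, 1 / 3)         # about 1.26186
quadratic_type2_dim = similarity_dimension(8, 1 / 4)  # about 1.5

# The values quoted in the form ln a / ln sqrt(5) can be evaluated directly.
quadratic_flake_dim = math.log(3) / math.log(math.sqrt(5))
quadratic_cross_dim = math.log(3.33) / math.log(math.sqrt(5))
```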

References
[1] Addison, Paul S. Fractals and Chaos - An Illustrated Course. Institute of Physics (IoP) Publishing (1997) ISBN 0-7503-0400-6 - Page 19
[2] Koch Snowflake (http://ecademy.agnesscott.edu/~lriddle/ifs/ksnow/ksnow.htm)
[3] Burns, Aidan (1994), "78.13 Fractal tilings", Mathematical Gazette 78 (482): 193–196, JSTOR 3618577.

Edward Kasner & James Newman, Mathematics and the Imagination, Dover Press reprint of Simon & Schuster (1940) ISBN 0-486-41703-4, pp. 344–51.

External links
von Koch Curve (http://www.efg2.com/Lab/FractalsAndChaos/vonKochCurve.htm)
The Koch snowflake in Mathworld (http://mathworld.wolfram.com/KochSnowflake.html)
Application of the Koch curve to an antenna (http://www.qsl.net/kb7qhc/antenna/fractal/Triadic Koch/review.htm)
"A mathematical analysis of the Koch curve and quadratic Koch curve" (https://ujdigispace.uj.ac.za/bitstream/handle/10210/1941/2Mathematical.pdf?sequence=2) (pdf). Retrieved 22 November 2011.

Axiom of choice
In mathematics, the axiom of choice, or AC, is an axiom of set theory equivalent to the statement that "the product of a collection of non-empty sets is non-empty". More explicitly, it states that for every indexed family $(S_i)_{i \in I}$ of nonempty sets there exists an indexed family $(x_i)_{i \in I}$ of elements such that $x_i \in S_i$ for every $i \in I$. The axiom of choice was formulated in 1904 by Ernst Zermelo in order to formalize his proof of the well-ordering theorem.[1]

Informally put, the axiom of choice says that given any collection of bins, each containing at least one object, it is possible to make a selection of exactly one object from each bin. In many cases such a selection can be made without invoking the axiom of choice; this is in particular the case if the number of bins is finite, or if a selection rule is available: a distinguishing property that happens to hold for exactly one object in each bin. For example, for any (even infinite) collection of pairs of shoes, one can pick out the left shoe from each pair to obtain an appropriate selection, but for an infinite collection of pairs of socks (assumed to have no distinguishing features), such a selection can be obtained only by invoking the axiom of choice.

Although originally controversial, the axiom of choice is now used without reservation by most mathematicians,[2] and it is included in ZFC, the standard form of axiomatic set theory. One motivation for this use is that a number of generally accepted mathematical results, such as Tychonoff's theorem, require the axiom of choice for their proofs. Contemporary set theorists also study axioms that are not compatible with the axiom of choice, such as the axiom of determinacy. The axiom of choice is avoided in some varieties of constructive mathematics, although there are varieties of constructive mathematics in which the axiom of choice is embraced.

Statement
A choice function is a function f, defined on a collection X of nonempty sets, such that for every set s in X, f(s) is an element of s. With this concept, the axiom can be stated:
For any set X of nonempty sets, there exists a choice function f defined on X.
Thus the negation of the axiom of choice states that there exists a set of nonempty sets which has no choice function.

Each choice function on a collection X of nonempty sets is an element of the Cartesian product of the sets in X. This is not the most general situation of a Cartesian product of a family of sets, where the same set can occur more than once as a factor; however, one can focus on elements of such a product that select the same element every time a given set appears as factor, and such elements correspond to an element of the Cartesian product of all distinct sets in the family. The axiom of choice asserts the existence of such elements; it is therefore equivalent to:
Given any family of nonempty sets, their Cartesian product is a nonempty set.

Nomenclature ZF, AC, and ZFC


In this article and other discussions of the Axiom of Choice the following abbreviations are common:
AC: the Axiom of Choice.
ZF: Zermelo–Fraenkel set theory omitting the Axiom of Choice.
ZFC: Zermelo–Fraenkel set theory, extended to include the Axiom of Choice.

Variants
There are many other equivalent statements of the axiom of choice. These are equivalent in the sense that, in the presence of other basic axioms of set theory, they imply the axiom of choice and are implied by it.

One variation avoids the use of choice functions by, in effect, replacing each choice function with its range.
Given any set X of pairwise disjoint non-empty sets, there exists at least one set C that contains exactly one element in common with each of the sets in X.[3]
This guarantees for any partition of a set X the existence of a subset C of X containing exactly one element from each part of the partition.

Another equivalent axiom only considers collections X that are essentially powersets of other sets:
For any set A, the power set of A (with the empty set removed) has a choice function.
Authors who use this formulation often speak of the choice function on A, but be advised that this is a slightly different notion of choice function. Its domain is the powerset of A (with the empty set removed), and so makes sense for any set A, whereas with the definition used elsewhere in this article, the domain of a choice function on a collection of sets is that collection, and so only makes sense for sets of sets. With this alternate notion of choice function, the axiom of choice can be compactly stated as
Every set has a choice function.[4]
which is equivalent to
For any set A there is a function f such that for any non-empty subset B of A, f(B) lies in B.
The negation of the axiom can thus be expressed as:
There is a set A such that for all functions f (on the set of non-empty subsets of A), there is a B such that f(B) does not lie in B.
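Written with quantifiers, the last formulation and its negation make the pattern of the negation explicit: each quantifier flips and the implication becomes a conjunction with the negated conclusion. The display below is a sketch in which f is understood to range over functions defined on the non-empty subsets of A; that restriction is left implicit in the symbols.

```latex
% Axiom of choice, power-set form:
\[
\forall A \;\exists f \;\forall B \,
  \bigl( B \subseteq A \wedge B \neq \emptyset \;\rightarrow\; f(B) \in B \bigr)
\]
% Its negation:
\[
\exists A \;\forall f \;\exists B \,
  \bigl( B \subseteq A \wedge B \neq \emptyset \wedge f(B) \notin B \bigr)
\]
```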

Restriction to finite sets


The statement of the axiom of choice does not specify whether the collection of nonempty sets is finite or infinite, and thus implies that every finite collection of nonempty sets has a choice function. However, that particular case is a theorem of Zermelo–Fraenkel set theory without the axiom of choice (ZF); it is easily proved by mathematical induction.[5] In the even simpler case of a collection of one set, a choice function just corresponds to an element, so this instance of the axiom of choice says that every nonempty set has an element; this holds trivially. The axiom of choice can be seen as asserting the generalization of this property, already evident for finite collections, to arbitrary collections.


Usage
Until the late 19th century, the axiom of choice was often used implicitly, although it had not yet been formally stated. For example, after having established that the set X contains only non-empty sets, a mathematician might have said "let F(s) be one of the members of s for all s in X." In general, it is impossible to prove that F exists without the axiom of choice, but this seems to have gone unnoticed until Zermelo.

Not every situation requires the axiom of choice. For finite sets X, the axiom of choice follows from the other axioms of set theory. In that case it is equivalent to saying that if we have several (a finite number of) boxes, each containing at least one item, then we can choose exactly one item from each box. Clearly we can do this: We start at the first box, choose an item; go to the second box, choose an item; and so on. The number of boxes is finite, so eventually our choice procedure comes to an end. The result is an explicit choice function: a function that takes the first box to the first element we chose, the second box to the second element we chose, and so on. (A formal proof for all finite sets would use the principle of mathematical induction to prove "for every natural number k, every family of k nonempty sets has a choice function.")

This method cannot, however, be used to show that every countable family of nonempty sets has a choice function, as is asserted by the axiom of countable choice. If the method is applied to an infinite sequence $(X_i : i \in \omega)$ of nonempty sets, a function is obtained at each finite stage, but there is no stage at which a choice function for the entire family is constructed, and no "limiting" choice function can be constructed, in general, in ZF without the axiom of choice.
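The box-by-box procedure for a finite family can be sketched directly. This is only an illustration of the finite case, which needs no axiom of choice: the function name is hypothetical, boxes are modelled as Python frozensets, and "choose an item" is taken to mean the first element the iterator yields.

```python
def finite_choice_function(boxes):
    """Build a choice function for a finite list of non-empty boxes.

    Returns a dict mapping each box to one of its own elements.
    """
    choice = {}
    for box in boxes:                  # visit the boxes one after another
        if not box:
            raise ValueError("every box must be non-empty")
        choice[box] = next(iter(box))  # pick some element of this box
    return choice                      # the loop ends because the list is finite

boxes = [frozenset({1, 2, 3}), frozenset({"a"}), frozenset({4.5, 6.5})]
f = finite_choice_function(boxes)
```

The loop terminates only because the list of boxes is finite, which mirrors why the same procedure does not settle the countable case.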

Examples
The nature of the individual nonempty sets in the collection may make it possible to avoid the axiom of choice even for certain infinite collections. For example, suppose that each member of the collection X is a nonempty subset of the natural numbers. Every such subset has a smallest element, so to specify our choice function we can simply say that it maps each set to the least element of that set. This gives us a definite choice of an element from each set, and makes it unnecessary to apply the axiom of choice.

The difficulty appears when there is no natural choice of elements from each set. If we cannot make explicit choices, how do we know that our set exists? For example, suppose that X is the set of all non-empty subsets of the real numbers. First we might try to proceed as if X were finite. If we try to choose an element from each set, then, because X is infinite, our choice procedure will never come to an end, and consequently, we will never be able to produce a choice function for all of X. Next we might try specifying the least element from each set. But some subsets of the real numbers do not have least elements. For example, the open interval (0,1) does not have a least element: if x is in (0,1), then so is x/2, and x/2 is always strictly smaller than x. So this attempt also fails.

Additionally, consider for instance the unit circle S, and the action on S by a group G consisting of all rational rotations. Namely, these are rotations by angles which are rational multiples of π. Here G is countable while S is uncountable. Hence S breaks up into uncountably many orbits under G. Using the axiom of choice, we could pick a single point from each orbit, obtaining an uncountable subset X of S with the property that all of its translates by G are disjoint from X. The set of those translates partitions the circle into a countable collection of disjoint sets, which are all pairwise congruent. Since X isn't measurable for any rotation-invariant countably additive finite measure on S, finding an algorithm to select a point in each orbit requires the axiom of choice. See non-measurable set for more details.

The reason that we are able to choose least elements from subsets of the natural numbers is the fact that the natural numbers are well-ordered: every nonempty subset of the natural numbers has a unique least element under the natural ordering. One might say, "Even though the usual ordering of the real numbers does not work, it may be possible to find a different ordering of the real numbers which is a well-ordering. Then our choice function can choose the least element of every set under our unusual ordering." The problem then becomes that of constructing a well-ordering, which turns out to require the axiom of choice for its existence; every set can be well-ordered if and only if the axiom of choice holds.
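The two cases contrasted above, subsets of the natural numbers versus the open interval (0,1), can be illustrated in a few lines. This is a sketch with hypothetical function names; Python sets are finite, so the first function only shows the rule "take the least element" on finite samples, and the second shows why no candidate for a least element of (0,1) survives.

```python
from fractions import Fraction

def least_element_choice(family):
    """Choice function on non-empty sets of naturals: pick the least element."""
    return {s: min(s) for s in family}

def smaller_element_in_unit_interval(x):
    """Given x in (0, 1), return an element of (0, 1) strictly below x."""
    if not (0 < x < 1):
        raise ValueError("x must lie in the open interval (0, 1)")
    return x / 2

family = [frozenset({5, 3, 9}), frozenset({0, 7}), frozenset({42})]
choice = least_element_choice(family)

candidate = Fraction(1, 1000)
witness = smaller_element_in_unit_interval(candidate)
```

Whatever `candidate` is proposed as the least element of (0,1), `witness` is a smaller member of the interval, so the "least element" rule gives no choice function for arbitrary sets of reals.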

Criticism and acceptance


A proof requiring the axiom of choice may establish the existence of an object without explicitly defining the object in the language of set theory. For example, while the axiom of choice implies that there is a well-ordering of the real numbers, there are models of set theory with the axiom of choice in which no well-ordering of the reals is definable. Similarly, although a subset of the real numbers that is not Lebesgue measurable can be proven to exist using the axiom of choice, it is consistent that no such set is definable.

The axiom of choice produces these intangibles (objects that are proven to exist, but which cannot be explicitly constructed), which may conflict with some philosophical principles. Because there is no canonical well-ordering of all sets, a construction that relies on a well-ordering may not produce a canonical result, even if a canonical result is desired (as is often the case in category theory). This has been used as an argument against the use of the axiom of choice.

Another argument against the axiom of choice is that it implies the existence of counterintuitive objects. One example is the Banach–Tarski paradox which says that it is possible to decompose ("carve up") the 3-dimensional solid unit ball into finitely many pieces and, using only rotations and translations, reassemble the pieces into two solid balls each with the same volume as the original. The pieces in this decomposition, constructed using the axiom of choice, are non-measurable sets.

Despite these facts, most mathematicians accept the axiom of choice as a valid principle for proving new results in mathematics. The debate is interesting enough, however, that it is considered of note when a theorem in ZFC (ZF plus AC) is logically equivalent (with just the ZF axioms) to the axiom of choice, and mathematicians look for results that require the axiom of choice to be false, though this type of deduction is less common than the type which requires the axiom of choice to be true.

It is possible to prove many theorems using neither the axiom of choice nor its negation; such statements will be true in any model of Zermelo–Fraenkel set theory (ZF), regardless of the truth or falsity of the axiom of choice in that particular model. The restriction to ZF renders any claim that relies on either the axiom of choice or its negation unprovable. For example, the Banach–Tarski paradox is neither provable nor disprovable from ZF alone: it is impossible to construct the required decomposition of the unit ball in ZF, but also impossible to prove there is no such decomposition. Similarly, all the statements listed below which require choice or some weaker version thereof for their proof are unprovable in ZF, but since each is provable in ZF plus the axiom of choice, there are models of ZF in which each statement is true. Statements such as the Banach–Tarski paradox can be rephrased as conditional statements, for example, "If AC holds, the decomposition in the Banach–Tarski paradox exists." Such conditional statements are provable in ZF when the original statements are provable from ZF and the axiom of choice.

In constructive mathematics
As discussed above, in ZFC, the axiom of choice is able to provide "nonconstructive proofs" in which the existence of an object is proved although no explicit example is constructed. ZFC, however, is still formalized in classical logic. The axiom of choice has also been thoroughly studied in the context of constructive mathematics, where non-classical logic is employed. The status of the axiom of choice varies between different varieties of constructive mathematics.

In Martin-Löf type theory and higher-order Heyting arithmetic, the appropriate statement of the axiom of choice is (depending on approach) included as an axiom or provable as a theorem.[6] Errett Bishop argued that the axiom of choice was constructively acceptable, saying "A choice function exists in constructive mathematics, because a choice is implied by the very meaning of existence."[7]

In constructive set theory, however, Diaconescu's theorem shows that the axiom of choice implies the law of the excluded middle (unlike in Martin-Löf type theory, where it does not). Thus the axiom of choice is not generally available in constructive set theory. A cause for this difference is that the axiom of choice in type theory does not have the extensionality properties that the axiom of choice in constructive set theory does.[8]

Some results in constructive set theory use the axiom of countable choice or the axiom of dependent choice, which do not imply the law of the excluded middle in constructive set theory. Although the axiom of countable choice in particular is commonly used in constructive mathematics, its use has also been questioned.[9]

Independence
Assuming ZF is consistent, Kurt Gödel showed that the negation of the axiom of choice is not a theorem of ZF by constructing an inner model (the constructible universe) which satisfies ZFC and thus showing that ZFC is consistent. Assuming ZF is consistent, Paul Cohen employed the technique of forcing, developed for this purpose, to show that the axiom of choice itself is not a theorem of ZF by constructing a much more complex model which satisfies ZF¬C (ZF with the negation of AC added as axiom) and thus showing that ZF¬C is consistent. Together these results establish that the axiom of choice is logically independent of ZF. The assumption that ZF is consistent is harmless because adding another axiom to an already inconsistent system cannot make the situation worse.

Because of independence, the decision whether to use the axiom of choice (or its negation) in a proof cannot be made by appeal to other axioms of set theory. The decision must be made on other grounds. One argument given in favor of using the axiom of choice is that it is convenient to use it because it allows one to prove some simplifying propositions that otherwise could not be proved. Many theorems which are provable using choice are of an elegant general character: every ideal in a ring is contained in a maximal ideal, every vector space has a basis, and every product of compact spaces is compact. Without the axiom of choice, these theorems may not hold for mathematical objects of large cardinality.

The proof of the independence result also shows that a wide class of mathematical statements, including all statements that can be phrased in the language of Peano arithmetic, are provable in ZF if and only if they are provable in ZFC.[10] Statements in this class include the statement that P = NP, the Riemann hypothesis, and many other unsolved mathematical problems. When one attempts to solve problems in this class, it makes no difference whether ZF or ZFC is employed if the only question is the existence of a proof. It is possible, however, that there is a shorter proof of a theorem from ZFC than from ZF.

The axiom of choice is not the only significant statement which is independent of ZF. For example, the generalized continuum hypothesis (GCH) is not only independent of ZF, but also independent of ZFC. However, ZF plus GCH implies AC, making GCH a strictly stronger claim than AC, even though they are both independent of ZF.

Stronger axioms
The axiom of constructibility and the generalized continuum hypothesis both imply the axiom of choice, but are strictly stronger than it. In class theories such as von Neumann–Bernays–Gödel set theory and Morse–Kelley set theory, there is a possible axiom called the axiom of global choice which is stronger than the axiom of choice for sets because it also applies to proper classes. The axiom of global choice in turn follows from the axiom of limitation of size.


Equivalents
There are important statements that, assuming the axioms of ZF but neither AC nor ¬AC, are equivalent to the axiom of choice. The most important among them are Zorn's lemma and the well-ordering theorem. In fact, Zermelo initially introduced the axiom of choice in order to formalize his proof of the well-ordering theorem.

Set theory
Well-ordering theorem: Every set can be well-ordered. Consequently, every cardinal has an initial ordinal.
Tarski's theorem: For every infinite set A, there is a bijective map between the sets A and A×A.
Trichotomy: If two sets are given, then either they have the same cardinality, or one has a smaller cardinality than the other.
The Cartesian product of any family of nonempty sets is nonempty.
König's theorem: Colloquially, the sum of a sequence of cardinals is strictly less than the product of a sequence of larger cardinals. (The reason for the term "colloquially" is that the sum or product of a "sequence" of cardinals cannot be defined without some aspect of the axiom of choice.)
Every surjective function has a right inverse.

Order theory
Zorn's lemma: Every non-empty partially ordered set in which every chain (i.e. totally ordered subset) has an upper bound contains at least one maximal element.
Hausdorff maximal principle: In any partially ordered set, every totally ordered subset is contained in a maximal totally ordered subset. The restricted principle "Every partially ordered set has a maximal totally ordered subset" is also equivalent to AC over ZF.
Tukey's lemma: Every non-empty collection of finite character has a maximal element with respect to inclusion.
Antichain principle: Every partially ordered set has a maximal antichain.

Abstract algebra
Every vector space has a basis.[11]
Every unital ring other than the trivial ring contains a maximal ideal.
For every non-empty set S there is a binary operation defined on S that makes it a group.[12] (A cancellative binary operation is enough.)

Functional analysis
The closed unit ball of the dual of a normed vector space over the reals has an extreme point.

General topology
Tychonoff's theorem stating that every product of compact topological spaces is compact.
In the product topology, the closure of a product of subsets is equal to the product of the closures.

Mathematical logic
If S is a set of sentences of first-order logic and B is a consistent subset of S, then B is included in a set that is maximal among consistent subsets of S. The special case where S is the set of all first-order sentences in a given signature is weaker, equivalent to the Boolean prime ideal theorem; see the section "Weaker forms" below.
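One of the equivalents listed above, "every surjective function has a right inverse", is easy to illustrate for finite domains, where no axiom of choice is needed. The sketch below uses a hypothetical helper `right_inverse` that keeps the first preimage it encounters for each value; in the infinite case it is exactly this selection of one preimage per value that can require the axiom.

```python
def right_inverse(f, domain):
    """Return g (as a dict) with f(g[y]) == y for every y in the image of f.

    Works for a finite iterable `domain`; f is surjective onto its image.
    """
    g = {}
    for x in domain:
        g.setdefault(f(x), x)  # keep the first preimage found for each value
    return g

def mod3(x):
    return x % 3

g = right_inverse(mod3, range(10))
```

Here `g` maps each residue 0, 1, 2 to one number with that residue, so composing `mod3` after `g` gives the identity on the image.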


Category theory
There are several results in category theory which invoke the axiom of choice for their proof. These results might be weaker than, equivalent to, or stronger than the axiom of choice, depending on the strength of the technical foundations. For example, if one defines categories in terms of sets, that is, as sets of objects and morphisms (usually called a small category), or even locally small categories, whose hom-objects are sets, then there is no category of all sets, and so it is difficult for a category-theoretic formulation to apply to all sets. On the other hand, other foundational descriptions of category theory are considerably stronger, and an identical category-theoretic statement of choice may be stronger than the standard formulation, à la class theory, mentioned above.

Examples of category-theoretic statements which require choice include:
Every small category has a skeleton.
If two small categories are weakly equivalent, then they are equivalent.
Every continuous functor on a small-complete category which satisfies the appropriate solution set condition has a left-adjoint (the Freyd adjoint functor theorem).

Weaker forms
There are several weaker statements that are not equivalent to the axiom of choice, but are closely related. One example is the axiom of dependent choice (DC). A still weaker example is the axiom of countable choice (ACω or CC), which states that a choice function exists for any countable set of nonempty sets. These axioms are sufficient for many proofs in elementary mathematical analysis, and are consistent with some principles, such as the Lebesgue measurability of all sets of reals, that are disprovable from the full axiom of choice. Other choice axioms weaker than the axiom of choice include the Boolean prime ideal theorem and the axiom of uniformization. The former is equivalent in ZF to the existence of an ultrafilter containing each given filter, proved by Tarski in 1930.

Results requiring AC (or weaker forms) but weaker than it


One of the most interesting aspects of the axiom of choice is the large number of places in mathematics that it shows up. Here are some statements that require the axiom of choice in the sense that they are not provable from ZF but are provable from ZFC (ZF plus AC). Equivalently, these statements are true in all models of ZFC but false in some models of ZF.

Set theory
Any union of countably many countable sets is itself countable.
If the set A is infinite, then there exists an injection from the natural numbers N to A (see Dedekind infinite).
Every infinite game $G_S$ in which $S$ is a Borel subset of Baire space is determined.

Measure theory
The Vitali theorem on the existence of non-measurable sets which states that there is a subset of the real numbers that is not Lebesgue measurable.
The Hausdorff paradox.
The Banach–Tarski paradox.
The Lebesgue measure of a countable disjoint union of measurable sets is equal to the sum of the measures of the individual sets.

Algebra
Every field has an algebraic closure.
Every field extension has a transcendence basis.
Stone's representation theorem for Boolean algebras needs the Boolean prime ideal theorem.
The Nielsen–Schreier theorem, that every subgroup of a free group is free.
The additive groups of R and C are isomorphic.[13][14]

Functional analysis
The Hahn–Banach theorem in functional analysis, allowing the extension of linear functionals.
The theorem that every Hilbert space has an orthonormal basis.
The Banach–Alaoglu theorem about compactness of sets of functionals.
The Baire category theorem about complete metric spaces, and its consequences, such as the open mapping theorem and the closed graph theorem.
On every infinite-dimensional topological vector space there is a discontinuous linear map.

General topology
A uniform space is compact if and only if it is complete and totally bounded.
Every Tychonoff space has a Stone–Čech compactification.

Mathematical logic
Gödel's completeness theorem for first-order logic: every consistent set of first-order sentences has a completion. That is, every consistent set of first-order sentences can be extended to a maximal consistent set.

Stronger forms of the negation of AC


Now, consider stronger forms of the negation of AC. For example, if we abbreviate by BP the claim that every set of real numbers has the property of Baire, then BP is stronger than ¬AC, which asserts the nonexistence of any choice function on perhaps only a single set of nonempty sets. Note that strengthened negations may be compatible with weakened forms of AC. For example, ZF + DC[15] + BP is consistent, if ZF is.

It is also consistent with ZF + DC that every set of reals is Lebesgue measurable; however, this consistency result, due to Robert M. Solovay, cannot be proved in ZFC itself, but requires a mild large cardinal assumption (the existence of an inaccessible cardinal). The much stronger axiom of determinacy, or AD, implies that every set of reals is Lebesgue measurable, has the property of Baire, and has the perfect set property (all three of these results are refuted by AC itself). ZF + DC + AD is consistent provided that a sufficiently strong large cardinal axiom is consistent (the existence of infinitely many Woodin cardinals).

Statements consistent with the negation of AC


There are models of Zermelo-Fraenkel set theory in which the axiom of choice is false. We will abbreviate "Zermelo-Fraenkel set theory plus the negation of the axiom of choice" by ZFC. For certain models of ZFC, it is possible to prove the negation of some standard facts. Note that any model of ZFC is also a model of ZF, so for each of the following statements, there exists a model of ZF in which that statement is true. There exists a model of ZFC in which there is a function f from the real numbers to the real numbers such that f is not continuous at a, but f is sequentially continuous at a, i.e., for any sequence {xn} converging to a, limn f(xn)=f(a). There exists a model of ZFC which has an infinite set of real numbers without a countably infinite subset. There exists a model of ZFC in which real numbers are a countable union of countable sets.[16] There exists a model of ZFC in which there is a field with no algebraic closure. In all models of ZFC there is a vector space with no basis. There exists a model of ZFC in which there is a vector space with two bases of different cardinalities. There exists a model of ZFC in which there is a free complete boolean algebra on countably many generators.[17] For proofs, see Thomas Jech, The Axiom of Choice, American Elsevier Pub. Co., New York, 1973.

There exists a model of ZF¬C in which every set in $\mathbb{R}^n$ is measurable. Thus it is possible to exclude counterintuitive results like the Banach–Tarski paradox which are provable in ZFC. Furthermore, this is possible whilst assuming the axiom of dependent choice, which is weaker than AC but sufficient to develop most of real analysis.
In all models of ZF¬C, the generalized continuum hypothesis does not hold.


Quotes
"The Axiom of Choice is obviously true, the well-ordering principle obviously false, and who can tell about Zorn's lemma?" (Jerry Bona)
This is a joke: although the three are all mathematically equivalent, many mathematicians find the axiom of choice to be intuitive, the well-ordering principle to be counterintuitive, and Zorn's lemma to be too complex for any intuition.

"The Axiom of Choice is necessary to select a set from an infinite number of socks, but not an infinite number of shoes." (Bertrand Russell)
The observation here is that one can define a function to select from an infinite number of pairs of shoes by stating, for example, to choose the left shoe. Without the axiom of choice, one cannot assert that such a function exists for pairs of socks, because left and right socks are (presumably) indistinguishable from each other.

"Tarski tried to publish his theorem [the equivalence between AC and 'every infinite set A has the same cardinality as A×A', see above] in Comptes Rendus, but Fréchet and Lebesgue refused to present it. Fréchet wrote that an implication between two well known [true] propositions is not a new result, and Lebesgue wrote that an implication between two false propositions is of no interest".
Polish-American mathematician Jan Mycielski relates this anecdote in a 2006 article in the Notices of the AMS.

"The axiom gets its name not because mathematicians prefer it to other axioms." (A. K. Dewdney)
This quote comes from the famous April Fools' Day article in the computer recreations column of the Scientific American, April 1989.

Notes
[1] Zermelo, Ernst (1904). "Beweis, dass jede Menge wohlgeordnet werden kann" (http://gdz.sub.uni-goettingen.de/no_cache/en/dms/load/img/?IDDOC=28526) (reprint). Mathematische Annalen 59 (4): 514–16. doi:10.1007/BF01445300.
[2] Jech, 1977, p. 348ff; Martin-Löf 2008, p. 210.
[3] Herrlich, p. 9.
[4] Patrick Suppes, "Axiomatic Set Theory", Dover, 1972 (1960), ISBN 0-486-61630-4, p. 240
[5] Tourlakis (2003), pp. 209–210, 215–216.
[6] Per Martin-Löf, Intuitionistic type theory (http://www.cs.cmu.edu/afs/cs/Web/People/crary/819-f09/Martin-Lof80.pd), 1980. Anne Sjerp Troelstra, Metamathematical investigation of intuitionistic arithmetic and analysis, Springer, 1973.
[7] Errett Bishop and Douglas S. Bridges, Constructive analysis, Springer-Verlag, 1985.
[8] Per Martin-Löf, "100 Years of Zermelo's Axiom of Choice: What was the Problem with It?", The Computer Journal (2006) 49 (3): 345–350. doi:10.1093/comjnl/bxh162
[9] Fred Richman, Constructive mathematics without choice, in: Reuniting the Antipodes: Constructive and Nonstandard Views of the Continuum (P. Schuster et al., eds), Synthèse Library 306, 199–205, Kluwer Academic Publishers, Amsterdam, 2001.
[10] This is because arithmetical statements are absolute to the constructible universe L. Shoenfield's absoluteness theorem gives a more general result.
[11] Blass, Andreas (1984). "Existence of bases implies the axiom of choice". Contemporary mathematics 31.
[12] A. Hajnal, A. Kertész: Some new algebraic equivalents of the axiom of choice, Publ. Math. Debrecen, 19 (1972), 339–340; see also H. Rubin, J. Rubin, Equivalents of the axiom of choice, II, North-Holland, 1985, p. 111.
[13] http://www.cs.nyu.edu/pipermail/fom/2006-February/009959.html
[14] http://journals.cambridge.org/action/displayFulltext?type=1&fid=4931240&aid=4931232
[15] Axiom of dependent choice
[16] Jech, Thomas (1973) "The axiom of choice", ISBN 0-444-10484-4, CH. 10, p. 142.
[17] Stavi, Jonathan (1974). "A model of ZF with an infinite free complete Boolean algebra" (http://www.springerlink.com/content/d5710380t753621u/) (reprint). Israel Journal of Mathematics 20 (2): 149–163. doi:10.1007/BF02757883.

References
Horst Herrlich, Axiom of Choice, Springer Lecture Notes in Mathematics 1876, Springer Verlag Berlin Heidelberg (2006). ISBN 3-540-30989-6.
Paul Howard and Jean Rubin, "Consequences of the Axiom of Choice". Mathematical Surveys and Monographs 59; American Mathematical Society; 1998.
Thomas Jech, "About the Axiom of Choice." Handbook of Mathematical Logic, John Barwise, ed., 1977.
Per Martin-Löf, "100 years of Zermelo's axiom of choice: What was the problem with it?", in Logicism, Intuitionism, and Formalism: What Has Become of Them?, Sten Lindström, Erik Palmgren, Krister Segerberg, and Viggo Stoltenberg-Hansen, editors (2008). ISBN 1-4020-8925-2
Gregory H Moore, "Zermelo's axiom of choice, Its origins, development and influence", Springer; 1982. ISBN 0-387-90670-3
Herman Rubin, Jean E. Rubin: Equivalents of the axiom of choice. North Holland, 1963. Reissued by Elsevier, April 1970. ISBN 0-7204-2225-6.
Herman Rubin, Jean E. Rubin: Equivalents of the Axiom of Choice II. North Holland/Elsevier, July 1985, ISBN 0-444-87708-8.
George Tourlakis, Lectures in Logic and Set Theory. Vol. II: Set Theory, Cambridge University Press, 2003. ISBN 0-511-06659-7
Ernst Zermelo, "Untersuchungen über die Grundlagen der Mengenlehre I," Mathematische Annalen 65: (1908) pp. 261–81. PDF download via digizeitschriften.de (http://www.digizeitschriften.de/no_cache/home/jkdigitools/loader/?tx_jkDigiTools_pi1[IDDOC]=361762)
Translated in: Jean van Heijenoort, 2002. From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931. New edition. Harvard University Press. ISBN 0-674-32449-8
    1904. "Proof that every set can be well-ordered," 139-41.
    1908. "Investigations in the foundations of set theory I," 199-215.

External links
Hazewinkel, Michiel, ed. (2001), "Axiom of choice" (http://www.encyclopediaofmath.org/index.php?title=p/a014270), Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4
Axiom of Choice and Its Equivalents at ProvenMath (http://www.apronus.com/provenmath/choice.htm) includes formal statement of the Axiom of Choice, Hausdorff's Maximal Principle, Zorn's Lemma and formal proofs of their equivalence down to the finest detail.
Consequences of the Axiom of Choice (http://www.math.purdue.edu/~hrubin/JeanRubin/Papers/conseq.html), based on the book by Paul Howard (http://www.emunix.emich.edu/~phoward/) and Jean Rubin.
The Axiom of Choice (http://plato.stanford.edu/entries/axiom-choice) entry by John Lane Bell in the Stanford Encyclopedia of Philosophy


Jordan curve theorem


In topology, a Jordan curve is a non-self-intersecting continuous loop in the plane, and another name for a Jordan curve is a simple closed curve. The Jordan curve theorem asserts that every Jordan curve divides the plane into an "interior" region bounded by the curve and an "exterior" region containing all of the nearby and far away exterior points, so that any continuous path connecting a point of one region to a point of the other intersects with that loop somewhere. While the statement of this theorem seems to be intuitively obvious, it takes quite a bit of ingenuity to prove it by elementary means. More transparent proofs rely on the mathematical machinery of algebraic topology, and these lead to generalizations to higher-dimensional spaces. The Jordan curve theorem is named after the mathematician Camille Jordan, who found its first proof. For decades, it was generally thought that this proof was flawed and that the first rigorous proof was carried out by Oswald Veblen. However, this notion has been challenged by Thomas C. Hales and others.
Illustration of the Jordan curve theorem. The Jordan curve (drawn in black) divides the plane into an "inside" region (light blue) and an "outside" region (pink).

Definitions and the statement of the Jordan theorem


A Jordan curve or a simple closed curve in the plane $\mathbb{R}^2$ is the image C of an injective continuous map of a circle into the plane, $\varphi\colon S^1 \to \mathbb{R}^2$. A Jordan arc in the plane is the image of an injective continuous map of a closed interval into the plane. Alternatively, a Jordan curve is the image of a continuous map $\varphi\colon [0,1] \to \mathbb{R}^2$ such that $\varphi(0) = \varphi(1)$ and the restriction of $\varphi$ to $[0,1)$ is injective. The first two conditions say that C is a continuous loop, whereas the last condition stipulates that C has no self-intersection points.
Let C be a Jordan curve in the plane $\mathbb{R}^2$. Then its complement, $\mathbb{R}^2 \setminus C$, consists of exactly two connected components. One of these components is bounded (the interior) and the other is unbounded (the exterior), and the curve C is the boundary of each component. Furthermore, the complement of a Jordan arc in the plane is connected.
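For the special case of a simple polygon, the interior/exterior dichotomy in the theorem can be tested computationally with the even-odd (ray casting) rule: a ray from the query point crosses the boundary an odd number of times exactly when the point is inside. The following is a minimal sketch in Python, not a rigorous treatment; the function name `point_in_polygon` and the sample polygons are illustrative assumptions, and points lying exactly on the boundary are not handled specially.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Even-odd test: cast a horizontal ray from p towards +x and
    toggle 'inside' each time the ray crosses a polygon edge."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge crosses the horizontal line through p only if its
        # endpoints lie on opposite sides (half-open rule, so a vertex
        # shared by two edges is not counted twice).
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


# Illustrative polygons (assumed data, not from the article).
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
l_shape = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0), (2.0, 4.0), (0.0, 4.0)]

print(point_in_polygon((2.0, 2.0), square))   # interior point of the square
print(point_in_polygon((5.0, 2.0), square))   # exterior point of the square
print(point_in_polygon((3.0, 3.0), l_shape))  # in the notch, so exterior
```

The parity argument works for polygons because each crossing of an edge switches between the two components of the complement; for general Jordan curves no such finite count is available, which is part of why the full theorem is harder.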

Proof and generalizations


The Jordan curve theorem was independently generalized to higher dimensions by H. Lebesgue and L.E.J. Brouwer in 1911, resulting in the Jordan–Brouwer separation theorem. Let X be a topological sphere in the (n+1)-dimensional Euclidean space $\mathbb{R}^{n+1}$, i.e. the image of an injective continuous mapping of the n-sphere $S^n$ into $\mathbb{R}^{n+1}$. Then the complement Y of X in $\mathbb{R}^{n+1}$ consists of exactly two connected components. One of these components is bounded (the interior) and the other is unbounded (the exterior). The set X is their common boundary. The proof uses homology theory. It is first established that, more generally, if X is homeomorphic to the k-sphere, then the reduced integral homology groups of $Y = \mathbb{R}^{n+1} \setminus X$ are as follows:

$$\tilde{H}_q(Y) = \begin{cases} \mathbb{Z}, & q = n-k \text{ or } q = n, \\ \{0\}, & \text{otherwise.} \end{cases}$$

This is proved by induction in k using the Mayer–Vietoris sequence. When n = k, the zeroth reduced homology of Y has rank 1, which means that Y has 2 connected components (which are, moreover, path connected), and with a bit of extra work, one shows that their common boundary is X. A further generalization was found by J. W. Alexander, who established the Alexander duality between the reduced homology of a compact subset X of $\mathbb{R}^{n+1}$ and the reduced cohomology of its complement. If X is an n-dimensional compact connected submanifold of $\mathbb{R}^{n+1}$ (or $S^{n+1}$) without boundary, its complement has 2 connected components.
There is a strengthening of the Jordan curve theorem, called the Jordan–Schönflies theorem, which states that the interior and the exterior planar regions determined by a Jordan curve in $\mathbb{R}^2$ are homeomorphic to the interior and exterior of the unit disk. In particular, for any point P in the interior region and a point A on the Jordan curve, there exists a Jordan arc connecting P with A and, with the exception of the endpoint A, completely lying in the interior region. An alternative and equivalent formulation of the Jordan–Schönflies theorem asserts that any Jordan curve $\varphi\colon S^1 \to \mathbb{R}^2$, where $S^1$ is viewed as the unit circle in the plane, can be extended to a homeomorphism $\psi\colon \mathbb{R}^2 \to \mathbb{R}^2$ of the plane. Unlike Lebesgue's and Brouwer's generalization of the Jordan curve theorem, this statement becomes false in higher dimensions: while the exterior of the unit ball in $\mathbb{R}^3$ is simply connected, because it retracts onto the unit sphere, the Alexander horned sphere is a subset of $\mathbb{R}^3$ homeomorphic to a sphere, but so twisted in space that the unbounded component of its complement in $\mathbb{R}^3$ is not simply connected, and hence not homeomorphic to the exterior of the unit ball.


History and further proofs


The statement of the Jordan curve theorem may seem obvious at first, but it is a rather difficult theorem to prove. Bernard Bolzano was the first to formulate a precise conjecture, observing that it was not a self-evident statement, but that it required a proof. It is easy to establish this result for polygonal lines, but the problem came in generalizing it to all kinds of badly behaved curves, which include nowhere differentiable curves, such as the Koch snowflake and other fractal curves, or even a Jordan curve of positive area constructed by Osgood (1903). The first proof of this theorem was given by Camille Jordan in his lectures on real analysis, and was published in his book Cours d'analyse de l'École Polytechnique.[1] There is some controversy about whether Jordan's proof was complete: the majority of commenters on it have claimed that the first complete proof was given later by Oswald Veblen, who said the following about Jordan's proof:

His proof, however, is unsatisfactory to many mathematicians. It assumes the theorem without proof in the important special case of a simple polygon, and of the argument from that point on, one must admit at least that all details are not given.[2]

However, Thomas C. Hales wrote:


Nearly every modern citation that I have found agrees that the first correct proof is due to Veblen... In view of the heavy criticism of Jordan's proof, I was surprised when I sat down to read his proof to find nothing objectionable about it. Since then, I have contacted a number of the authors who have criticized Jordan, and each case the author has admitted to having no direct knowledge of an error in Jordan's proof.[3]

Hales also pointed out that the special case of simple polygons is not only an easy exercise, but was not really used by Jordan anyway, and quoted Michael Reeken as saying:
Jordan's proof is essentially correct... Jordan's proof does not present the details in a satisfactory way. But the idea is right, and with some polishing the proof would be impeccable.[4]

Jordan's proof and another early proof by de la Vallée-Poussin were later critically analyzed and completed by Schoenflies (1924). Due to the importance of the Jordan curve theorem in low-dimensional topology and complex analysis, it received much attention from prominent mathematicians of the first half of the 20th century. Various proofs of the theorem and its generalizations were constructed by J. W. Alexander, Louis Antoine, Bieberbach, Luitzen Brouwer, Denjoy, Hartogs, Kerékjártó, Alfred Pringsheim, and Schoenflies. Some new elementary proofs of the Jordan curve theorem, as well as simplifications of the earlier proofs, continue to be carried out.
A short elementary proof of the Jordan curve theorem was presented by A. F. Filippov in 1950.[5]
A proof using the Brouwer fixed point theorem by Maehara (1984).
A proof using non-standard analysis by Narens (1971).
A proof using constructive mathematics by Gordon O. Berg, W. Julian, and R. Mines et al. (1975).
A proof using non-planarity of the complete bipartite graph $K_{3,3}$ was given by Thomassen (1992).
A simplification of the proof by Helge Tverberg.[6]

The first formal proof of the Jordan curve theorem was created by Hales (2007a) in the HOL Light system, in January 2005, and contained about 60,000 lines. Another rigorous 6,500-line formal proof was produced in 2005 by an international team of mathematicians using the Mizar system. Both the Mizar and the HOL Light proofs rely on libraries of previously proved theorems, so these two sizes are not comparable. Nobuyuki Sakamoto and Keita Yokoyama (2007) showed that the Jordan curve theorem is equivalent in proof-theoretic strength to weak König's lemma.

Notes
[1] Camille Jordan (1887)
[2] Oswald Veblen (1905)
[3] Hales (2007b)
[4] Ibid.
[5] A. F. Filippov, An elementary proof of Jordan's theorem, Uspekhi Mat. Nauk, 5:5(39) (1950), 173–176 (http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=rm&paperid=8482&option_lang=eng)
[6] Czes Kosniowski, A First Course in Algebraic Topology

References
Berg, Gordon O.; Julian, W.; Mines, R.; Richman, Fred (1975), "The constructive Jordan curve theorem", Rocky Mountain Journal of Mathematics 5 (2): 225–236, doi:10.1216/RMJ-1975-5-2-225, ISSN 0035-7596, MR 0410701
Hales, Thomas C. (2007a), "The Jordan curve theorem, formally and informally", The American Mathematical Monthly 114 (10): 882–894, ISSN 0002-9890, MR 2363054
Hales, Thomas (2007b), "Jordan's proof of the Jordan Curve theorem" (http://mizar.org/trybulec65/4.pdf), Studies in Logic, Grammar and Rhetoric 10 (23)
Jordan, Camille (1887), Cours d'analyse (http://www.maths.ed.ac.uk/~aar/jordan/jordan.pdf), pp. 587–594
Maehara, Ryuji (1984), "The Jordan Curve Theorem Via the Brouwer Fixed Point Theorem", The American Mathematical Monthly (Mathematical Association of America) 91 (10): 641–643, doi:10.2307/2323369, ISSN 0002-9890, JSTOR 2323369, MR 0769530
Narens, Louis (1971), "A nonstandard proof of the Jordan curve theorem" (http://projecteuclid.org/euclid.pjm/1102971282), Pacific Journal of Mathematics 36: 219–229, ISSN 0030-8730, MR 0276940
Osgood, William F. (1903), "A Jordan Curve of Positive Area", Transactions of the American Mathematical Society (Providence, R.I.: American Mathematical Society) 4 (1): 107–112, ISSN 0002-9947, JFM 34.0533.02, JSTOR 1986455
Ross, Fiona; Ross, William T. (2011), "The Jordan curve theorem is non-trivial" (http://www.tandfonline.com/doi/abs/10.1080/17513472.2011.634320), Journal of Mathematics and the Arts (Taylor & Francis) 5 (4): 213–219, doi:10.1080/17513472.2011.634320. author's site (https://facultystaff.richmond.edu/~wross/PDF/Jordan-revised.pdf)
Sakamoto, Nobuyuki; Yokoyama, Keita (2007), "The Jordan curve theorem and the Schönflies theorem in weak second-order arithmetic", Archive for Mathematical Logic 46 (5): 465–480, doi:10.1007/s00153-007-0050-6, ISSN 0933-5846, MR 2321588
Thomassen, Carsten (1992), "The Jordan–Schönflies theorem and the classification of surfaces", American Mathematical Monthly 99 (2): 116–130, doi:10.2307/2324180, JSTOR 2324180
Veblen, Oswald (1905), "Theory on Plane Curves in Non-Metrical Analysis Situs", Transactions of the American Mathematical Society (Providence, R.I.: American Mathematical Society) 6 (1): 83–98, ISSN 0002-9947, JSTOR 1986378

External links
M.I. Voitsekhovskii (2001), "Jordan theorem" (http://www.encyclopediaofmath.org/index.php?title=j/j054370), in Hazewinkel, Michiel, Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4
The full 6,500 line formal proof of Jordan's curve theorem (http://mizar.uwb.edu.pl/version/7.11.07_4.156.1112/html/jordan.html) in Mizar.
Collection of proofs of the Jordan curve theorem (http://www.maths.ed.ac.uk/~aar/jordan) at Andrew Ranicki's homepage
A simple proof of Jordan curve theorem (http://www.math.auckland.ac.nz/class750/section5.pdf) (PDF) by David B. Gauld
Application of the theorem in computer science: Determining If A Point Lies On The Interior Of A Polygon (http://local.wasp.uwa.edu.au/~pbourke/geometry/insidepoly/) by Paul Bourke

Special relativity
Special relativity (SR, also known as the special theory of relativity or STR) is the physical theory of measurement in an inertial frame of reference proposed in 1905 by Albert Einstein in the paper "On the Electrodynamics of Moving Bodies".[1] It extends Galileo's principle of relativity (that all uniform motion is relative, and that there is no absolute and well-defined state of rest, i.e. no privileged reference frames) to account for the constant speed of light,[2] which was previously observed in the Michelson–Morley experiment, and postulates that it holds for all the laws of physics, including both the laws of mechanics and of electrodynamics, whatever they may be.[3]

USSR postage stamp dedicated to Albert Einstein

This theory has a wide range of consequences which have been experimentally verified,[4] including counter-intuitive ones such as length contraction, time dilation and relativity of simultaneity. It has replaced the classical notion of invariant time interval for two events with the notion of invariant space-time interval. Combined with other laws of physics, the two postulates of special relativity predict the equivalence of mass and energy, as expressed in the mass–energy equivalence formula $E = mc^2$, where c is the speed of light in vacuum.[5][6] The predictions of special relativity agree well with Newtonian mechanics in their common realm of applicability, specifically in experiments in which all velocities are small compared with the speed of light. Special relativity reveals that c is not just the velocity of a certain phenomenon, namely the propagation of electromagnetic radiation (light), but rather a fundamental feature of the way space and time are unified as spacetime. One of the consequences of the theory is that it is impossible for any particle that has rest mass to be accelerated to the speed of light.
The theory was originally termed "special" because it applied the principle of relativity only to the special case of inertial reference frames, i.e. frames of reference in uniform relative motion with respect to each other.[7] Einstein developed general relativity to apply the principle in the more general case, that is, to any frame so as to handle general coordinate transformations, and that theory includes the effects of gravity. The term is currently used more generally to refer to any case in which gravitation is not significant. General relativity is the generalization of special relativity to include gravitation. In general relativity, gravity is described using non-Euclidean geometry, so that gravitational effects are represented by curvature of spacetime; special relativity is restricted to flat spacetime. Just as the curvature of the earth's surface is not noticeable in everyday life, the curvature of spacetime can be neglected on small scales, so that locally, special relativity is a valid approximation to general relativity.[8] The presence of gravity becomes undetectable in a sufficiently small, free-falling laboratory.

Postulates

Reflections of this type made it clear to me as long ago as shortly after 1900, i.e., shortly after Planck's trailblazing work, that neither mechanics nor electrodynamics could (except in limiting cases) claim exact validity. Gradually I despaired of the possibility of discovering the true laws by means of constructive efforts based on known facts. The longer and the more desperately I tried, the more I came to the conviction that only the discovery of a universal formal principle could lead us to assured results... How, then, could such a universal principle be found?
Albert Einstein: Autobiographical Notes[9]

Einstein discerned two fundamental propositions that seemed to be the most assured, regardless of the exact validity of the (then) known laws of either mechanics or electrodynamics. These propositions were the constancy of the speed of light and the independence of physical laws (especially the constancy of the speed of light) from the choice of inertial system. In his initial presentation of special relativity in 1905 he expressed these postulates as:[1]
The Principle of Relativity: The laws by which the states of physical systems undergo change are not affected, whether these changes of state be referred to the one or the other of two systems in uniform translatory motion relative to each other.[1]
The Principle of Invariant Light Speed: "... light is always propagated in empty space with a definite velocity [speed] c which is independent of the state of motion of the emitting body." (from the preface).[1] That is, light in vacuum propagates with the speed c (a fixed constant, independent of direction) in at least one system of inertial coordinates (the "stationary system"), regardless of the state of motion of the light source.
The derivation of special relativity depends not only on these two explicit postulates, but also on several tacit assumptions (made in almost all theories of physics), including the isotropy and homogeneity of space and the independence of measuring rods and clocks from their past history.[10] Following Einstein's original presentation of special relativity in 1905, many different sets of postulates have been proposed in various alternative derivations.[11] However, the most common set of postulates remains those employed by Einstein in his original paper.
A more mathematical statement of the Principle of Relativity made later by Einstein, which introduces the concept of simplicity not mentioned above, is:
Special principle of relativity: If a system of coordinates K is chosen so that, in relation to it, physical laws hold good in their simplest form, the same laws hold good in relation to any other system of coordinates K' moving in uniform translation relatively to K.[12]

Henri Poincaré provided the mathematical framework for relativity theory by proving that Lorentz transformations are a subset of his Poincaré group of symmetry transformations. Einstein later derived these transformations from his axioms. Many of Einstein's papers present derivations of the Lorentz transformation based upon these two principles.[13] Einstein consistently based the derivation of Lorentz invariance (the essential core of special relativity) on just the two basic principles of relativity and light-speed invariance. He wrote:
The insight fundamental for the special theory of relativity is this: The assumptions relativity and light speed invariance are compatible if relations of a new type ("Lorentz transformation") are postulated for the conversion of coordinates and times of events... The universal principle of the special theory of relativity is contained in the postulate: The laws of physics are invariant with respect to Lorentz transformations (for the transition from one inertial system to any other arbitrarily chosen inertial system). This is a restricting principle for natural laws...[9]
Thus many modern treatments of special relativity base it on the single postulate of universal Lorentz covariance, or, equivalently, on the single postulate of Minkowski spacetime.[14][15] From the principle of relativity alone without assuming the constancy of the speed of light (i.e. using the isotropy of space and the symmetry implied by the principle of special relativity) one can show that the space-time transformations between inertial frames are either Euclidean, Galilean, or Lorentzian. In the Lorentzian case, one can then obtain relativistic interval conservation and a certain finite limiting speed. Experiments suggest that this speed is the speed of light in vacuum.[16][17] The constancy of the speed of light was motivated by Maxwell's theory of electromagnetism and the lack of evidence for the luminiferous ether.
There is conflicting evidence on the extent to which Einstein was influenced by the null result of the Michelson–Morley experiment.[18][19] In any case, the null result of the Michelson–Morley experiment helped the notion of the constancy of the speed of light gain widespread and rapid acceptance.

Lack of an absolute reference frame


The principle of relativity, which states that there is no preferred inertial reference frame, dates back to Galileo, and was incorporated into Newtonian physics. However, in the late 19th century, the existence of electromagnetic waves led physicists to suggest that the universe was filled with a substance known as "aether", which would act as the medium through which these waves, or vibrations, travelled. The aether was thought to constitute an absolute reference frame against which speeds could be measured, and could be considered fixed and motionless. Aether supposedly had some wonderful properties: it was sufficiently elastic that it could support electromagnetic waves, and those waves could interact with matter, yet it offered no resistance to bodies passing through it. The results of various experiments, including the Michelson–Morley experiment, indicated that the Earth was always 'stationary' relative to the aether, something that was difficult to explain, since the Earth is in orbit around the Sun. Einstein's solution was to discard the notion of an aether and an absolute state of rest. Special relativity is formulated so as to not assume that any particular frame of reference is special; rather, in relativity, any reference frame moving with uniform motion will observe the same laws of physics. In particular, the speed of light in vacuum is always measured to be c, even when measured by multiple systems that are moving at different (but constant) velocities.


Reference frames, coordinates and the Lorentz transformation


Relativity theory depends on "reference frames". The term reference frame as used here is an observational perspective in space which is not undergoing any change in motion (acceleration), from which a position can be measured along 3 spatial axes. In addition, a reference frame has the ability to determine measurements of the time of events using a 'clock' (any reference device with uniform periodicity).
An event is an occurrence that can be assigned a single unique time and location in space relative to a reference frame: it is a "point" in space-time. Since the speed of light is constant in relativity in each and every reference frame, pulses of light can be used to unambiguously measure distances and refer back the times that events occurred to the clock, even though light takes time to reach the clock after the event has transpired.
For example, the explosion of a firecracker may be considered to be an "event". We can completely specify an event by its four space-time coordinates: the time of occurrence and its 3-dimensional spatial location define a reference point. Let's call this reference frame S.
In relativity theory we often want to calculate the position of a point from a different reference point. Suppose we have a second reference frame S′, whose spatial axes and clock exactly coincide with those of S at time zero, but which is moving at a constant velocity v with respect to S along the x-axis.
The primed system is in motion relative to the unprimed system with constant speed v only along the x-axis, from the perspective of an observer stationary in the unprimed system. By the principle of relativity, an observer stationary in the primed system will view a likewise construction except that the speed they record will be -v. The changing of the speed of propagation of interaction from infinite in non-relativistic mechanics to a finite value will require a modification of the transformation equations mapping events in one frame to another.
Since there is no absolute reference frame in relativity theory, a concept of 'moving' doesn't strictly exist, as everything is always moving with respect to some other reference frame. Instead, any two frames that move at the same speed in the same direction are said to be comoving. Therefore S and S′ are not comoving. Define the event to have space-time coordinates $(t, x, y, z)$ in system S and $(t', x', y', z')$ in S′. Then the Lorentz transformation specifies that these coordinates are related in the following way:

$$t' = \gamma\left(t - \frac{vx}{c^2}\right), \quad x' = \gamma(x - vt), \quad y' = y, \quad z' = z,$$

where

$$\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}$$

is the Lorentz factor and c is the speed of light in vacuum, and the velocity v of S′ is parallel to the x-axis. The y and z coordinates are unaffected; only the x and t coordinates are transformed. These Lorentz transformations form a one-parameter group of linear mappings, that parameter being called rapidity. There is nothing special about the x-axis: the transformation can apply to the y or z axes, or indeed in any direction, which can be done by treating separately the directions parallel to the motion (which are warped by the γ factor) and perpendicular to it; see the main article for details. A quantity invariant under Lorentz transformations is known as a Lorentz scalar. Writing the Lorentz transformation and its inverse in terms of coordinate differences, where for instance one event has coordinates $(x_1, t_1)$ and $(x'_1, t'_1)$, another event has coordinates $(x_2, t_2)$ and $(x'_2, t'_2)$, and the differences are defined as

$$\Delta x' = x'_2 - x'_1, \quad \Delta x = x_2 - x_1, \quad \Delta t' = t'_2 - t'_1, \quad \Delta t = t_2 - t_1,$$

we get

$$\Delta t' = \gamma\left(\Delta t - \frac{v\,\Delta x}{c^2}\right), \quad \Delta x' = \gamma(\Delta x - v\,\Delta t),$$

$$\Delta t = \gamma\left(\Delta t' + \frac{v\,\Delta x'}{c^2}\right), \quad \Delta x = \gamma(\Delta x' + v\,\Delta t').$$

These effects are not merely appearances; they are explicitly related to our way of measuring time intervals between events which occur at the same place in a given coordinate system (called "co-local" events). These time intervals will be different in another coordinate system moving with respect to the first, unless the events are also simultaneous. Similarly, these effects also relate to our measured distances between separated but simultaneous events in a given coordinate system of choice. If these events are not co-local, but are separated by distance (space), they will not occur at the same spatial distance from each other when seen from another moving coordinate system. However, the space-time interval will be the same for all observers. The underlying reality remains the same. Only our perspective changes.
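The claim that the space-time interval is the same for all observers can be checked numerically. The sketch below assumes the standard boost along the x-axis, t′ = γ(t − vx/c²) and x′ = γ(x − vt), and the sign convention s² = (cΔt)² − Δx² for the interval; the function names `lorentz_factor`, `boost_x` and `interval_squared` and the sample event are illustrative choices, not part of the article.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s (exact SI value)


def lorentz_factor(v: float) -> float:
    """Lorentz factor gamma = 1 / sqrt(1 - v^2 / c^2) for speed v in m/s."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def boost_x(t: float, x: float, v: float) -> tuple:
    """Transform (t, x) from frame S to a frame S' moving at velocity v along x."""
    g = lorentz_factor(v)
    return g * (t - v * x / C**2), g * (x - v * t)


def interval_squared(dt: float, dx: float) -> float:
    """Squared space-time interval, using the (c dt)^2 - dx^2 convention."""
    return (C * dt) ** 2 - dx**2


# Assumed sample separation between two events in S: 2 microseconds and 300 m.
dt, dx = 2.0e-6, 300.0
v = 0.6 * C  # assumed relative velocity of S'

dt_prime, dx_prime = boost_x(dt, dx, v)

print("interval^2 in S :", interval_squared(dt, dx))
print("interval^2 in S':", interval_squared(dt_prime, dx_prime))

# Boosting back with -v should recover the original coordinates.
dt_back, dx_back = boost_x(dt_prime, dx_prime, -v)
print("round trip:", dt_back, dx_back)
```

The two printed interval values agree up to floating-point rounding even though Δt and Δx individually change, which is the sense in which only the perspective changes.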

Consequences derived from the Lorentz transformation


The consequences of special relativity can be derived from the Lorentz transformation equations.[20] These transformations, and hence special relativity, lead to different physical predictions than those of Newtonian mechanics when relative velocities become comparable to the speed of light. The speed of light is so much larger than anything humans encounter that some of the effects predicted by relativity are initially counterintuitive.

Relativity of simultaneity
Two events happening in two different locations that occur simultaneously in the reference frame of one inertial observer may occur non-simultaneously in the reference frame of another inertial observer (lack of absolute simultaneity). From the first equation of the Lorentz transformation in terms of coordinate differences

$$\Delta t' = \gamma \left( \Delta t - \frac{v\, \Delta x}{c^2} \right)$$

it is clear that two events that are simultaneous in frame S (satisfying Δt = 0) are not necessarily simultaneous in another inertial frame S′ (satisfying Δt′ = 0). Only if these events are also colocal in frame S (satisfying Δx = 0) will they be simultaneous in another frame S′.

Event B is simultaneous with A in the green reference frame, but it occurred before in the blue frame, and will occur later in the red frame.

Time dilation
The time lapse between two events is not invariant from one observer to another, but is dependent on the relative speeds of the observers' reference frames (e.g., the twin paradox which concerns a twin who flies off in a spaceship traveling near the speed of light and returns to discover that his or her twin sibling has aged much more).

Suppose a clock is at rest in the unprimed system S. Two different ticks of this clock are then characterized by Δx = 0. To find the relation between the times between these ticks as measured in both systems, the first equation of the Lorentz transformation in terms of coordinate differences, Δt′ = γ(Δt − vΔx/c²), can be used to find:

$$\Delta t' = \gamma\, \Delta t \qquad \text{for events satisfying } \Delta x = 0.$$

This shows that the time (Δt′) between the two ticks as seen in the frame in which the clock is moving (S′) is longer than the time (Δt) between these ticks as measured in the rest frame of the clock (S). Time dilation explains a number of physical phenomena; for example, the decay rate of muons produced by cosmic rays impinging on the Earth's atmosphere.[21]
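As a rough illustration of the muon example, the sketch below applies Δt′ = γΔt to a muon's mean lifetime. The muon speed of 0.998 c is an assumed value chosen for illustration, the rest lifetime is the rounded textbook figure of about 2.197 microseconds, and the function name `dilated_time` is hypothetical.

```python
import math

C = 299_792_458.0          # speed of light, m/s
MUON_LIFETIME = 2.197e-6   # mean muon lifetime at rest, s (rounded)

def dilated_time(rest_interval, v):
    """Time between two ticks of a clock moving at speed v, as seen by the observer."""
    return rest_interval / math.sqrt(1.0 - (v / C) ** 2)

v = 0.998 * C                                  # assumed muon speed
lab_lifetime = dilated_time(MUON_LIFETIME, v)  # lifetime seen from the ground
naive_range = v * MUON_LIFETIME                # distance covered without time dilation
dilated_range = v * lab_lifetime               # distance covered with time dilation
print(lab_lifetime, naive_range, dilated_range)
```

Under these assumptions the typical travel distance grows from under a kilometre to roughly ten kilometres, which is why such muons can be detected near the ground.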


Length contraction
The dimensions (e.g., length) of an object as measured by one observer may be smaller than the results of measurements of the same object made by another observer (e.g., the ladder paradox involves a long ladder traveling near the speed of light and being contained within a smaller garage).

Similarly, suppose a measuring rod is at rest and aligned along the x-axis in the unprimed system S. In this system, the length of this rod is written as Δx. To measure the length of this rod in the system S′, in which the rod is moving, the distances x′ to the end points of the rod must be measured simultaneously in that system S′. In other words, the measurement is characterized by Δt′ = 0, which can be combined with the fourth equation, Δx = γ(Δx′ + vΔt′), to find the relation between the lengths Δx and Δx′:

$$\Delta x' = \frac{\Delta x}{\gamma} \qquad \text{for events satisfying } \Delta t' = 0.$$

This shows that the length (Δx′) of the rod as measured in the frame in which it is moving (S′) is shorter than its length (Δx) in its own rest frame (S).
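A short numerical sketch of the ladder example follows. The ladder length of 10 m and the speed of 0.6 c are assumed values, and `contracted_length` is a hypothetical helper name.

```python
import math

def contracted_length(rest_length, beta):
    """Length of a rod moving at speed beta = v/c along its own axis."""
    return rest_length * math.sqrt(1.0 - beta ** 2)

# A ladder of rest length 10 m moving at 0.6 c (gamma = 1.25).
ladder_in_garage_frame = contracted_length(10.0, 0.6)
print(ladder_in_garage_frame)
```

With these numbers the ladder measures about 8 m in the garage frame, so it can momentarily fit inside a garage shorter than its rest length.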

Composition of velocities
Velocities (and speeds) do not simply add. If the observer in S measures an object moving along the x axis at velocity u, then the observer in the S′ system, a frame of reference moving at velocity v in the x direction with respect to S, will measure the object moving with velocity u′ where (from the Lorentz transformations above):

$$u' = \frac{dx'}{dt'} = \frac{\gamma \left( dx - v\, dt \right)}{\gamma \left( dt - v\, dx / c^2 \right)} = \frac{u - v}{1 - u v / c^2}.$$

The other frame S will measure:

$$u = \frac{dx}{dt} = \frac{\gamma \left( dx' + v\, dt' \right)}{\gamma \left( dt' + v\, dx' / c^2 \right)} = \frac{u' + v}{1 + u' v / c^2}.$$

Notice that if the object were moving at the speed of light in the S system (i.e. u = c), then it would also be moving at the speed of light in the S′ system. Also, if both u and v are small with respect to the speed of light, we recover the intuitive Galilean transformation of velocities, u′ ≈ u − v.

The usual example given is that of a train (frame S′ above) traveling due east with a velocity v with respect to the tracks (frame S). A child inside the train throws a baseball due east with a velocity u′ with respect to the train. In classical physics, an observer at rest on the tracks will measure the velocity of the baseball (due east) as u = u′ + v, while in special relativity this is no longer true; instead the velocity of the baseball (due east) is given by the second equation: u = (u′ + v)/(1 + u′v/c²). Again, there is nothing special about the x or east directions. This formalism applies to any direction by considering motion parallel and perpendicular to the direction of the relative velocity v; see the main article for details.

Einstein's addition of colinear velocities is consistent with the Fizeau experiment which determined the speed of light in a fluid moving parallel to the light, but no experiment has ever tested the formula for the general case of non-parallel velocities.
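The colinear addition rule can be tried out with a few values. This is a small sketch in units where c = 1 by default; the function name `add_velocities` and the sample speeds are assumptions for illustration.

```python
import math

def add_velocities(u_prime, v, c=1.0):
    """Velocity in S of an object moving at u_prime in S', where S' moves at v relative to S."""
    return (u_prime + v) / (1.0 + u_prime * v / c ** 2)

light_from_train = add_velocities(1.0, 0.5)   # light emitted from a train moving at 0.5 c
two_half_c = add_velocities(0.5, 0.5)         # 0.5 c "plus" 0.5 c
slow = add_velocities(1.0e-6, 2.0e-6)         # everyday speeds, nearly Galilean
print(light_from_train, two_half_c, slow)
```

Light still moves at c in both frames, two speeds of 0.5 c combine to 0.8 c rather than c, and for slow speeds the result is indistinguishable from the Galilean sum.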


Other consequences
Thomas rotation
The orientation of an object (i.e. the alignment of its axes with the observer's axes) may be different for different observers. Unlike other relativistic effects, this effect becomes quite significant at fairly low velocities as can be seen in the spin of moving particles.

Equivalence of mass and energy


As an object's speed approaches the speed of light from an observer's point of view, its relativistic mass increases, thereby making it more and more difficult to accelerate it from within the observer's frame of reference.

The energy content of an object at rest with mass m equals mc². Conservation of energy implies that, in any reaction, a decrease of the sum of the masses of particles must be accompanied by an increase in kinetic energies of the particles after the reaction. Similarly, the mass of an object can be increased by taking in kinetic energies.

In addition to the papers referenced above, which give derivations of the Lorentz transformation and describe the foundations of special relativity, Einstein also wrote at least four papers giving heuristic arguments for the equivalence (and transmutability) of mass and energy, for E = mc².

Mass–energy equivalence is a consequence of special relativity. The energy and momentum, which are separate in Newtonian mechanics, form a four-vector in relativity, and this relates the time component (the energy) to the space components (the momentum) in a nontrivial way. For an object at rest, the energy-momentum four-vector is (E, 0, 0, 0): it has a time component which is the energy, and three space components which are zero. By changing frames with a Lorentz transformation in the x direction with a small value of the velocity v, the energy-momentum four-vector becomes (E, Ev/c², 0, 0). The momentum is equal to the energy multiplied by the velocity divided by c². As such, the Newtonian mass of an object, which is the ratio of the momentum to the velocity for slow velocities, is equal to E/c².

The energy and momentum are properties of matter and radiation, and it is impossible to deduce that they form a four-vector just from the two basic postulates of special relativity by themselves, because these don't talk about matter or radiation; they only talk about space and time. The derivation therefore requires some additional physical reasoning. In his 1905 paper, Einstein used the additional principles that Newtonian mechanics should hold for slow velocities, so that there is one energy scalar and one three-vector momentum at slow velocities, and that the conservation law for energy and momentum is exactly true in relativity. Furthermore, he assumed that the energy of light is transformed by the same Doppler-shift factor as its frequency, which he had previously shown to be true based on Maxwell's equations.[1]

The first of Einstein's papers on this subject was "Does the Inertia of a Body Depend upon its Energy Content?" in 1905.[22] Although Einstein's argument in this paper is nearly universally accepted by physicists as correct, even self-evident, many authors over the years have suggested that it is wrong.[23] Other authors suggest that the argument was merely inconclusive because it relied on some implicit assumptions.[24]

Einstein acknowledged the controversy over his derivation in his 1907 survey paper on special relativity. There he notes that it is problematic to rely on Maxwell's equations for the heuristic mass–energy argument. The argument in his 1905 paper can be carried out with the emission of any massless particles, but the Maxwell equations are implicitly used to make it obvious that the emission of light in particular can be achieved only by doing work. To emit electromagnetic waves, all you have to do is shake a charged particle, and this is clearly doing work, so that the emission is of energy.[25][26]
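The step from the rest-frame four-vector (E, 0, 0, 0) to a small momentum Ev/c² can be checked numerically. The sketch below works in units where c = 1; the function name `boost_energy_momentum` and the chosen values are assumptions for illustration only.

```python
import math

def boost_energy_momentum(energy, momentum, beta):
    """Energy and x-momentum of a system after it is given velocity beta (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return g * (energy + beta * momentum), g * (momentum + beta * energy)

rest_energy = 1.0   # arbitrary units with c = 1
beta = 1.0e-3       # slow velocity, v = 0.001 c
E, p = boost_energy_momentum(rest_energy, 0.0, beta)

newtonian_mass = p / beta   # ratio of momentum to velocity at slow speed
print(E, p, newtonian_mass)
```

For slow velocities the ratio p/v reproduces the rest energy (that is, E/c² with c = 1), and E² − p² stays equal to the squared rest energy.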


How far can one travel from the Earth?


Since one cannot travel faster than light, one might conclude that a human can never travel further from Earth than 40 light years if the traveler is active between the ages of 20 and 60. One would easily think that a traveler would never be able to reach more than the very few solar systems which exist within the limit of 20–40 light years from the Earth. But that would be a mistaken conclusion. Because of time dilation, a hypothetical spaceship can travel thousands of light years during the pilot's 40 active years. If a spaceship could be built that accelerates at a constant 1 g, it will after a little less than a year be traveling at almost the speed of light as seen from Earth. Time dilation will increase the traveler's life span as seen from the reference system of the Earth, but the lifespan measured by a clock traveling with the traveler will not thereby change. During the journey, people on Earth will experience more time than the traveler does. A 5-year round trip for the traveler will take 6½ Earth years and cover a distance of over 6 light-years. A 20-year round trip (5 years accelerating, 5 decelerating, twice each) will land the traveler back on Earth having traveled for 335 Earth years and a distance of 331 light years.[27] A full 40-year trip at 1 g will appear on Earth to last 58,000 years and cover a distance of 55,000 light years. A 40-year trip at 1.1 g will take 148,000 Earth years and cover about 140,000 light years. This same time dilation is why a muon traveling close to c is observed to travel much further than c times its half-life (when at rest).[28]
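The 20-year figures can be reproduced with the standard constant-proper-acceleration formulas, t = (c/a) sinh(aT/c) and d = (c²/a)(cosh(aT/c) − 1), where T is the traveler's proper time for one leg starting from rest. The sketch below is illustrative only: it works in years and light years with c = 1, assumes 1 g ≈ 1.03 ly/yr², and the function names `leg` and `round_trip` are hypothetical.

```python
import math

A_1G = 1.03  # roughly 1 g expressed in light years per year squared (c = 1 ly/yr)

def leg(proper_years, a=A_1G):
    """Earth time and distance for one leg of constant proper acceleration from rest."""
    earth_time = math.sinh(a * proper_years) / a
    distance = (math.cosh(a * proper_years) - 1.0) / a
    return earth_time, distance

def round_trip(total_proper_years, a=A_1G):
    """Four equal legs: accelerate, decelerate, then the same again on the way back."""
    t, d = leg(total_proper_years / 4.0, a)
    return 4.0 * t, 4.0 * d

earth_years, light_years = round_trip(20.0)
print(round(earth_years), round(light_years))
```

Splitting the trip into four equal legs mirrors the "5 years accelerating, 5 decelerating, twice each" description; by symmetry a deceleration leg takes the same Earth time and covers the same distance as an acceleration leg.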

Causality and prohibition of motion faster than light


In diagram 2 the interval AB is 'time-like'; i.e., there is a frame of reference in which events A and B occur at the same location in space, separated only by occurring at different times. If A precedes B in that frame, then A precedes B in all frames. It is hypothetically possible for matter (or information) to travel from A to B, so there can be a causal relationship (with A the cause and B the effect). The interval AC in the diagram is 'space-like'; i.e., there is a frame of reference in which events A and C occur simultaneously, separated only in space. There are also frames in which A precedes C (as shown) and frames in which C precedes A. If it were possible for a cause-and-effect relationship to exist between events A and C, then paradoxes of causality would result. For example, if A was the cause, and C the effect, then there would be frames of reference in which the effect preceded the cause. Although this in itself won't give rise to a paradox, one can show[29][30] that faster than light signals can be sent back into one's own past. A causal paradox can then be constructed by sending the signal if and only if no signal was received previously.
Diagram 2. Light cone

Therefore, if causality is to be preserved, one of the consequences of special relativity is that no information signal or material object can travel faster than light in vacuum. However, some "things" can still move faster than light. For example, the location where the beam of a search light hits the bottom of a cloud can move faster than light when the search light is turned rapidly.[31]

Even without considerations of causality, there are other strong reasons why faster-than-light travel is forbidden by special relativity. For example, if a constant force is applied to an object for a limitless amount of time, then integrating F = dp/dt gives a momentum that grows without bound, but this is simply because p = γmv approaches infinity as v approaches c. To an observer who is not accelerating, it appears as though the object's inertia is increasing, so as to produce a smaller acceleration in response to the same force. This behavior is in fact observed in particle accelerators, where each charged particle is accelerated by the electromagnetic force.

Theoretical and experimental tunneling studies carried out by Günter Nimtz and Petrissa Eckle wrongly claimed that under special conditions signals may travel faster than light.[32][33][34][35] It was reported that fiber digital signals were traveling at up to 5 times c and that a zero-time tunneling electron carried the information that the atom is ionized, with photons, phonons and electrons spending zero time in the tunneling barrier. According to Nimtz and Eckle, in this superluminal process only Einstein causality and special relativity, but not primitive causality, are violated: superluminal propagation does not result in any kind of time travel.[36][37] Several scientists have stated not only that Nimtz's interpretations were erroneous, but also that the experiment actually provided a trivial experimental confirmation of the special theory of relativity.[38][39][40]


Geometry of space-time
Comparison between flat Euclidean space and Minkowski space
Special relativity uses a 'flat' 4-dimensional Minkowski space, an example of a space-time. Minkowski spacetime appears to be very similar to the standard 3-dimensional Euclidean space, but there is a crucial difference with respect to time. In 3D space, the differential of distance (line element) ds is defined by

$$ds^2 = d\mathbf{x} \cdot d\mathbf{x} = dx_1^2 + dx_2^2 + dx_3^2,$$

where dx = (dx1, dx2, dx3) are the differentials of the three spatial dimensions. In Minkowski geometry, there is an extra dimension with coordinate x0 derived from time, such that the distance differential fulfills

$$ds^2 = -dx_0^2 + dx_1^2 + dx_2^2 + dx_3^2,$$

where dx = (dx0, dx1, dx2, dx3) are the differentials of the four spacetime dimensions.

Orthogonality and rotation of coordinate systems compared between left: Euclidean space through circular angle φ, right: in Minkowski spacetime through hyperbolic angle φ (red lines labelled c denote the worldlines of a light signal; a vector is orthogonal to itself if it lies on this line).[41]

This suggests a deep theoretical insight: special relativity is simply a rotational symmetry of our space-time, analogous to the rotational symmetry of Euclidean space (see image right).[42] Just as Euclidean space uses a Euclidean metric, so space-time uses a Minkowski metric. Basically, special relativity can be stated as the invariance of any space-time interval (that is, the 4D distance between any two events) when viewed from any inertial reference frame. All equations and effects of special relativity can be derived from this rotational symmetry (the Poincaré group) of Minkowski space-time.

The actual form of ds above depends on the metric and on the choices for the x0 coordinate. To make the time coordinate look like the space coordinates, it can be treated as imaginary: x0 = ict (this is called a Wick rotation). According to Misner, Thorne and Wheeler (1971, §2.3), ultimately the deeper understanding of both special and general relativity will come from the study of the Minkowski metric (described below) and from taking x0 = ct, rather than from a "disguised" Euclidean metric using ict as the time coordinate.

Some authors use x0 = t, with factors of c elsewhere to compensate; for instance, spatial coordinates are divided by c or factors of c² are included in the metric tensor.[43] These numerous conventions can be superseded by using natural units where c = 1. Then space and time have equivalent units, and no factors of c appear anywhere.


3D spacetime
If we reduce the spatial dimensions to 2, so that we can represent the physics in a 3D space

$$ds^2 = dx_1^2 + dx_2^2 - c^2\, dt^2,$$

we see that the null geodesics lie along a dual-cone (see image right) defined by the equation

$$ds^2 = 0 = dx_1^2 + dx_2^2 - c^2\, dt^2,$$

or simply

$$dx_1^2 + dx_2^2 = c^2\, dt^2,$$

which is the equation of a circle of radius c dt.


Three dimensional dual-cone.

4D spacetime
If we extend this to three spatial dimensions, the null geodesics are the 4-dimensional cone:

$$ds^2 = 0 = dx_1^2 + dx_2^2 + dx_3^2 - c^2\, dt^2,$$

so

$$dx_1^2 + dx_2^2 + dx_3^2 = c^2\, dt^2.$$

Null spherical space.

This null dual-cone represents the "line of sight" of a point in space. That is, when we look at the stars and say "The light from that star which I am receiving is X years old", we are looking down this line of sight: a null geodesic. We are looking at an event a distance $d = \sqrt{x_1^2 + x_2^2 + x_3^2}$ away and a time d/c in the past. For this reason the null dual cone is also known as the 'light cone'. (The point in the lower left of the picture below represents the star, the origin represents the observer, and the line represents the null geodesic "line of sight".)

The cone in the −t region is the information that the point is 'receiving', while the cone in the +t section is the information that the point is 'sending'.

The geometry of Minkowski space can be depicted using Minkowski diagrams, which are useful also in understanding many of the thought-experiments in special relativity.
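Whether two events are separated by a time-like, space-like or null (light-cone) interval follows from the sign of the squared interval. The sketch below uses units where c = 1 (years and light years); the function name `classify_interval` and the sample separations are assumptions for illustration.

```python
def classify_interval(dt, dx, dy=0.0, dz=0.0, c=1.0):
    """Classify the separation between two events by the sign of ds^2 = -(c dt)^2 + dx^2 + dy^2 + dz^2."""
    s2 = -(c * dt) ** 2 + dx ** 2 + dy ** 2 + dz ** 2
    if s2 < 0:
        return "time-like"    # a material object could travel between the events
    if s2 > 0:
        return "space-like"   # no signal can connect the events
    return "null"             # the events lie on each other's light cone

# Light received now from a star 4 light years away was emitted 4 years ago.
starlight = classify_interval(dt=4.0, dx=4.0)
slower_probe = classify_interval(dt=10.0, dx=4.0)
too_far = classify_interval(dt=1.0, dx=4.0)
print(starlight, slower_probe, too_far)
```

The exact comparison with zero is adequate for these simple values; with measured or rounded data a tolerance would be needed to decide whether an interval counts as null.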

Physics in spacetime
The equations of special relativity can be written in a manifestly covariant form. The position of an event in spacetime is given by a contravariant four-vector with components:

$$x^\nu = \left( x^0, x^1, x^2, x^3 \right) = \left( ct, x, y, z \right).$$

We define x⁰ = ct so that the time coordinate has the same dimension of distance as the other spatial dimensions, so that space and time are treated equally.[44][45][46] Superscripts are contravariant indices in this section rather than exponents, except when they indicate a square (it should be clear from the context). Subscripts are covariant indices which also range from zero to three, as with the four-gradient of a scalar field φ:

$$\partial_0 \phi = \frac{1}{c} \frac{\partial \phi}{\partial t}, \qquad \partial_1 \phi = \frac{\partial \phi}{\partial x}, \qquad \partial_2 \phi = \frac{\partial \phi}{\partial y}, \qquad \partial_3 \phi = \frac{\partial \phi}{\partial z}.$$

Transformations of physical quantities between reference frames


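Coordinate transformations between inertial reference frames are given by the Lorentz transformation tensor Λ. For the special case of motion along the x-axis:

$$\Lambda^{\mu'}{}_{\nu} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

which is simply the matrix of a boost (like a rotation) between the x and ct coordinates, where μ′ indicates the row and ν indicates the column, and

$$\beta = \frac{v}{c}, \qquad \gamma = \frac{1}{\sqrt{1 - \beta^2}}.$$

This can be generalized to a boost in any direction, and further to include rotations, at the cost of using spinors and gyrovectors; see Lorentz transformation for details.

A transformation of a four-vector from one inertial frame to another (ignoring translations for simplicity) is given by the Lorentz transformation:

$$T^{\mu'} = \Lambda^{\mu'}{}_{\nu}\, T^{\nu},$$

where there is an implied summation over the repeated index ν from 0 to 3. The inverse transformation is:

$$\Lambda_{\mu'}{}^{\nu}\, T^{\mu'} = T^{\nu},$$

where $\Lambda_{\mu'}{}^{\nu}$ is the reciprocal matrix of $\Lambda^{\mu'}{}_{\nu}$.

In the case of the Lorentz transformations above in the x-direction:

$$\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \gamma ct - \gamma\beta x \\ \gamma x - \beta\gamma ct \\ y \\ z \end{pmatrix}.$$

More generally, most physical quantities are best described as (components of) tensors. So to transform from one frame to another, we use the well-known tensor transformation law[47]

$$T^{\alpha' \beta' \cdots \zeta'}{}_{\theta' \iota' \cdots \kappa'} = \Lambda^{\alpha'}{}_{\mu} \Lambda^{\beta'}{}_{\nu} \cdots \Lambda^{\zeta'}{}_{\rho}\; \Lambda_{\theta'}{}^{\sigma} \Lambda_{\iota'}{}^{\upsilon} \cdots \Lambda_{\kappa'}{}^{\phi}\; T^{\mu \nu \cdots \rho}{}_{\sigma \upsilon \cdots \phi},$$

where $\Lambda_{\chi'}{}^{\psi}$ is the reciprocal matrix of $\Lambda^{\chi'}{}_{\psi}$. All tensors transform by this rule.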

Metric
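Given the four-dimensional nature of spacetime, the Minkowski metric η has components (valid in any inertial reference frame) which can be arranged in a 4 × 4 matrix:

$$\eta_{\alpha\beta} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

which is equal to its reciprocal, $\eta^{\alpha\beta}$, in those frames.

The Poincaré group is the most general group of transformations which preserves the Minkowski metric

$$\eta_{\alpha\beta} = \eta_{\mu'\nu'}\, \Lambda^{\mu'}{}_{\alpha}\, \Lambda^{\nu'}{}_{\beta},$$

and this is the physical symmetry underlying special relativity.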
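That a boost along x preserves the Minkowski metric can be verified by direct matrix multiplication, checking that the product of the transposed boost matrix, the metric and the boost matrix gives back the metric. The sketch below uses plain Python lists rather than a numerical library; the helper names (`boost_x`, `matmul`, `transpose`) and the value β = 0.6 are assumptions for illustration.

```python
import math

# Minkowski metric with signature (-, +, +, +), as in the text.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def boost_x(beta):
    """Matrix of a boost with speed beta = v/c along the x-axis, acting on (ct, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, -beta * g, 0.0, 0.0],
            [-beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    """Transpose of a matrix stored as nested lists."""
    return [list(row) for row in zip(*a)]

boost = boost_x(0.6)
preserved = matmul(transpose(boost), matmul(ETA, boost))  # should reproduce ETA
for row in preserved:
    print(row)
```

The printed matrix equals the metric up to rounding, which is the component form of the preservation condition for this particular boost.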


Invariance
The squared length of the differential of the position four-vector $dx^\mu$, constructed using

$$d\mathbf{x}^2 = \eta_{\mu\nu}\, dx^\mu\, dx^\nu = -(c\, dt)^2 + (dx)^2 + (dy)^2 + (dz)^2,$$

is an invariant. Being invariant means that it takes the same value in all inertial frames, because it is a scalar (0 rank tensor), and so no Λ appears in its trivial transformation. Notice that when the line element dx² is negative, √(−dx²) is c times the differential of proper time, while when dx² is positive, √(dx²) is the differential of the proper distance.

The primary value of expressing the equations of physics in a tensor form is that they are then manifestly invariant under the Poincaré group, so that we do not have to do a special and tedious calculation to check that fact. Also, in constructing such equations we often find that equations previously thought to be unrelated are, in fact, closely connected, being part of the same tensor equation.

Velocity and acceleration in 4D


Recognising other physical quantities as tensors also simplifies their transformation laws. First note that the velocity four-vector $U^\mu$ is given by

$$U^\mu = \frac{dx^\mu}{d\tau} = \left( \gamma c,\ \gamma v_x,\ \gamma v_y,\ \gamma v_z \right).$$

Recognising this, we can turn the awkward looking law about composition of velocities into a simple statement about transforming the velocity four-vector of one particle from one frame to another. $U^\mu$ also has an invariant form:

$$\mathbf{U}^2 = \eta_{\nu\mu}\, U^\nu\, U^\mu = -c^2.$$

So all velocity four-vectors have a magnitude of c. This is an expression of the fact that there is no such thing as being at coordinate rest in relativity: at the least, you are always moving forward through time. The acceleration 4-vector is given by

$$A^\mu = \frac{dU^\mu}{d\tau}.$$

Given this, differentiating the above equation by τ produces

$$2\, \eta_{\mu\nu}\, A^\mu\, U^\nu = 0.$$

So in relativity, the acceleration four-vector and the velocity four-vector are orthogonal.
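The invariant magnitude of the velocity four-vector can be confirmed for an arbitrary three-velocity. The sketch below uses units where c = 1, so the expected value of the squared magnitude is −1; the function names and the sample velocity are assumptions for illustration.

```python
import math

def four_velocity(vx, vy, vz):
    """Velocity four-vector (gamma, gamma*vx, gamma*vy, gamma*vz) in units with c = 1."""
    g = 1.0 / math.sqrt(1.0 - (vx ** 2 + vy ** 2 + vz ** 2))
    return (g, g * vx, g * vy, g * vz)

def minkowski_dot(a, b):
    """Inner product of two four-vectors with the (-, +, +, +) metric."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

u = four_velocity(0.3, 0.4, 0.5)   # a particle with speed sqrt(0.5) c
u_rest = four_velocity(0.0, 0.0, 0.0)
print(minkowski_dot(u, u), minkowski_dot(u_rest, u_rest))
```

Both products come out as −1 (that is, −c²), regardless of the particle's speed or direction; for the particle at rest the only nonzero component is the time component.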

Momentum in 4D
The momentum and energy combine into a covariant 4-vector:

$$p_\nu = m\, \eta_{\nu\mu}\, U^\mu = \left( -E/c,\ p_x,\ p_y,\ p_z \right),$$

where m is the invariant mass. The invariant magnitude of the momentum 4-vector is:

$$\mathbf{p}^2 = \eta^{\mu\nu}\, p_\mu\, p_\nu = -(E/c)^2 + p^2.$$

We can work out what this invariant is by first arguing that, since it is a scalar, it doesn't matter in which reference frame we calculate it, and then by transforming to a frame where the total momentum is zero:

$$\mathbf{p}^2 = -(E_\mathrm{rest}/c)^2 = -(m c)^2.$$

We see that the rest energy is an independent invariant. A rest energy can be calculated even for particles and systems in motion, by translating to a frame in which momentum is zero. The rest energy is related to the mass according to the celebrated equation discussed above:

$$E_\mathrm{rest} = m c^2.$$

Note that the mass of systems measured in their center of momentum frame (where total momentum is zero) is given by the total energy of the system in this frame. It may not be equal to the sum of individual system masses measured in other frames.
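A simple case in which the system mass differs from the sum of the individual masses is a pair of photons moving in opposite directions. The sketch below uses units where c = 1 and contravariant components (E, px, py, pz); the function name `invariant_mass` and the photon energies are assumptions for illustration.

```python
import math

def invariant_mass(particles):
    """Invariant mass of a system from the four-momenta (E, px, py, pz) of its parts, with c = 1."""
    energy = sum(p[0] for p in particles)
    px = sum(p[1] for p in particles)
    py = sum(p[2] for p in particles)
    pz = sum(p[3] for p in particles)
    # max(..., 0.0) guards against a tiny negative value caused by rounding.
    return math.sqrt(max(energy ** 2 - (px ** 2 + py ** 2 + pz ** 2), 0.0))

photon_right = (1.0, 1.0, 0.0, 0.0)    # massless: E = |p|
photon_left = (1.0, -1.0, 0.0, 0.0)

single = invariant_mass([photon_right])
pair = invariant_mass([photon_right, photon_left])
print(single, pair)
```

Each photon on its own has zero invariant mass, yet the pair, whose total momentum vanishes, has a mass equal to its total energy (divided by c²).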

Force in 4D
To use Newton's third law of motion, both forces must be defined as the rate of change of momentum with respect to the same time coordinate. That is, it requires the 3D force defined above. Unfortunately, there is no tensor in 4D which contains the components of the 3D force vector among its components.

If a particle is not traveling at c, one can transform the 3D force from the particle's co-moving reference frame into the observer's reference frame. This yields a 4-vector called the four-force. It is the rate of change of the above energy-momentum four-vector with respect to proper time. The covariant version of the four-force is:

$$F_\nu = \frac{dp_\nu}{d\tau} = \left( -\frac{d(E/c)}{d\tau},\ \frac{dp_x}{d\tau},\ \frac{dp_y}{d\tau},\ \frac{dp_z}{d\tau} \right),$$

where τ is the proper time.

In the rest frame of the object, the time component of the four-force is zero unless the "invariant mass" of the object is changing (this requires a non-closed system in which energy/mass is being directly added or removed from the object), in which case it is the negative of that rate of change of mass, times c. In general, though, the components of the four-force are not equal to the components of the three-force, because the three-force is defined by the rate of change of momentum with respect to coordinate time, i.e. dp/dt, while the four-force is defined by the rate of change of momentum with respect to proper time, i.e. dp/dτ.

In a continuous medium, the 3D density of force combines with the density of power to form a covariant 4-vector. The spatial part is the result of dividing the force on a small cell (in 3-space) by the volume of that cell. The time component is −1/c times the power transferred to that cell divided by the volume of the cell. This will be used below in the section on electromagnetism.

Relativity and unifying electromagnetism


Theoretical investigation in classical electromagnetism led to the discovery of wave propagation. Equations generalizing the electromagnetic effects found that finite propagation speed of the E and B fields required certain behaviors on charged particles. The general study of moving charges forms the Liénard–Wiechert potential, which is a step towards special relativity.

The Lorentz transformation of the electric field of a moving charge into a non-moving observer's reference frame results in the appearance of a mathematical term commonly called the magnetic field. Conversely, the magnetic field generated by a moving charge disappears and becomes a purely electrostatic field in a comoving frame of reference. Maxwell's equations are thus simply an empirical fit to special relativistic effects in a classical model of the Universe. As electric and magnetic fields are reference frame dependent and thus intertwined, one speaks of electromagnetic fields. Special relativity provides the transformation rules for how an electromagnetic field in one inertial frame appears in another inertial frame.

Maxwell's equations in the 3D form are already consistent with the physical content of special relativity, although they are easier to manipulate in a manifestly covariant form, i.e. in the language of tensor calculus.[48] See main links for more detail.


Status
Special relativity in its Minkowski spacetime is accurate only when the absolute value of the gravitational potential is much less than c² in the region of interest.[49] In a strong gravitational field, one must use general relativity. General relativity becomes special relativity at the limit of weak field. At very small scales, such as at the Planck length and below, quantum effects must be taken into consideration, resulting in quantum gravity. However, at macroscopic scales and in the absence of strong gravitational fields, special relativity is experimentally tested to an extremely high degree of accuracy (10⁻²⁰)[50] and thus accepted by the physics community. Experimental results which appear to contradict it are not reproducible and are thus widely believed to be due to experimental errors.

Special relativity is mathematically self-consistent, and it is an organic part of all modern physical theories, most notably quantum field theory, string theory, and general relativity (in the limiting case of negligible gravitational fields).

Newtonian mechanics mathematically follows from special relativity at small velocities (compared to the speed of light); thus Newtonian mechanics can be considered as a special relativity of slow moving bodies. See classical mechanics for a more detailed discussion.

Several experiments predating Einstein's 1905 paper are now interpreted as evidence for relativity. Of these it is known Einstein was aware of the Fizeau experiment before 1905,[51] and historians have concluded that Einstein was at least aware of the Michelson–Morley experiment as early as 1899 despite claims he made in his later years that it played no role in his development of the theory.[19]

The Fizeau experiment (1851, repeated by Michelson and Morley in 1886) measured the speed of light in moving media, with results that are consistent with relativistic addition of colinear velocities.
The famous Michelson–Morley experiment (1881, 1887) gave further support to the postulate that detecting an absolute reference velocity was not achievable. It should be stated here that, contrary to many alternative claims, it said little about the invariance of the speed of light with respect to the source and observer's velocity, as both source and observer were travelling together at the same velocity at all times.
The Trouton–Noble experiment (1903) showed that the torque on a capacitor is independent of position and inertial reference frame.
The experiments of Rayleigh and Brace (1902, 1904) showed that length contraction doesn't lead to birefringence for a co-moving observer, in accordance with the relativity principle.

Particle accelerators routinely accelerate and measure the properties of particles moving at near the speed of light, where their behavior is completely consistent with relativity theory and inconsistent with the earlier Newtonian mechanics. These machines would simply not work if they were not engineered according to relativistic principles. In addition, a considerable number of modern experiments have been conducted to test special relativity. Some examples:

Tests of relativistic energy and momentum: testing the limiting speed of particles
Ives–Stilwell experiment: testing relativistic Doppler effect and time dilation
Time dilation of moving particles: relativistic effects on a fast-moving particle's half-life
Kennedy–Thorndike experiment: time dilation in accordance with Lorentz transformations
Hughes–Drever experiment: testing isotropy of space and mass
Modern searches for Lorentz violation: various modern tests
Experiments to test emission theory: demonstrated that the speed of light is independent of the speed of the emitter
Experiments to test the aether drag hypothesis: no "aether flow obstruction"


Relativistic quantum mechanics


Special relativity can be combined with quantum theory to form relativistic quantum mechanics. It is an unsolved question how general relativity and quantum mechanics can be unified; quantum gravitation is an active area in theoretical research.

The early Bohr–Sommerfeld atomic model explained the fine structure of alkali atoms by using both special relativity and the preliminary knowledge of quantum mechanics of the time.

Paul Dirac developed a relativistic wave equation, now known as the Dirac equation in his honour,[52] fully compatible both with special relativity and with the final version of quantum theory existing after 1926. This theory not only explained the intrinsic angular momentum of the electrons called spin, a property which can only be stated, but not explained, by non-relativistic quantum mechanics, but also led to the prediction of the antiparticle of the electron, the positron.[52][53] Also, the fine structure could only be fully explained with special relativity.

On the other hand, the existence of antiparticles leads to the conclusion that a naive unification of quantum mechanics (as originally formulated by Erwin Schrödinger, Werner Heisenberg, and many others) with special relativity is not possible. Instead, a theory of quantized fields is necessary, where particles can be created and destroyed throughout space, as in quantum electrodynamics and quantum chromodynamics. These elements merge in the standard model of particle physics.

References
[1] Albert Einstein (1905) "Zur Elektrodynamik bewegter Körper (http://www.pro-physik.de/Phy/pdfs/ger_890_921.pdf)", Annalen der Physik 17: 891; English translation On the Electrodynamics of Moving Bodies (http://www.fourmilab.ch/etexts/einstein/specrel/www/) by George Barker Jeffery and Wilfrid Perrett (1923); Another English translation On the Electrodynamics of Moving Bodies by Megh Nad Saha (1920).
[2] Edwin F. Taylor and John Archibald Wheeler (1992). Spacetime Physics: Introduction to Special Relativity. W. H. Freeman. ISBN 0-7167-2327-1.
[3] Wolfgang Rindler (1977). Essential Relativity (http://books.google.com/?id=0J_dwCmQThgC&pg=PT148). Birkhäuser. p. 1,11 p. 7. ISBN 3-540-07970-X.
[4] Tom Roberts and Siegmar Schleif (October 2007). "What is the experimental basis of Special Relativity?" (http://www.edu-observatory.org/physics-faq/Relativity/SR/experiments.html). Usenet Physics FAQ. Retrieved 2008-09-17.
[5] Albert Einstein (2001). Relativity: The Special and the General Theory (http://books.google.com/?id=idb7wJiB6SsC&pg=PA50) (Reprint of 1920 translation by Robert W. Lawson ed.). Routledge. p. 48. ISBN 0-415-25384-5.
[6] Richard Phillips Feynman (1998). Six Not-so-easy Pieces: Einstein's relativity, symmetry, and space-time (http://books.google.com/?id=ipY8onVQWhcC&pg=PA68) (Reprint of 1995 ed.). Basic Books. p. 68. ISBN 0-201-32842-9.
[7] Albert Einstein, Relativity: The Special and General Theory, chapter 18 (http://www.marxists.org/reference/archive/einstein/works/1910s/relative/ch18.htm)
[8] Charles W. Misner, Kip S. Thorne & John A. Wheeler, Gravitation, pg 172, 6.6 The local coordinate system of an accelerated observer, ISBN 0-7167-0344-0
[9] Einstein, Autobiographical Notes, 1949.
[10] Einstein, "Fundamental Ideas and Methods of the Theory of Relativity", 1920
[11] For a survey of such derivations, see Lucas and Hodgson, Spacetime and Electromagnetism, 1990
[12] Einstein, A., Lorentz, H. A., Minkowski, H., & Weyl, H. (1952). The Principle of Relativity: a collection of original memoirs on the special and general theory of relativity (http://books.google.com/?id=yECokhzsJYIC&pg=PA111). Courier Dover Publications. p. 111. ISBN 0-486-60081-5.
[13] Einstein, On the Relativity Principle and the Conclusions Drawn from It, 1907; "The Principle of Relativity and Its Consequences in Modern Physics", 1910; "The Theory of Relativity", 1911; Manuscript on the Special Theory of Relativity, 1912; Theory of Relativity, 1913; Einstein, Relativity, the Special and General Theory, 1916; The Principle Ideas of the Theory of Relativity, 1916; What Is The Theory of Relativity?, 1919; The Principle of Relativity (Princeton Lectures), 1921; Physics and Reality, 1936; The Theory of Relativity, 1949.
[14] Das, A., The Special Theory of Relativity, A Mathematical Exposition, Springer, 1993.
[15] Schutz, J., Independent Axioms for Minkowski Spacetime, 1997.
[16] Yaakov Friedman, Physical Applications of Homogeneous Balls, Progress in Mathematical Physics 40, Birkhäuser, Boston, 2004, pages 1-21.
[17] David Morin, Introduction to Classical Mechanics, Cambridge University Press, Cambridge, 2007, chapter 11, Appendix I
[18] Michael Polanyi, Personal Knowledge: Towards a Post-Critical Philosophy, 1974, ISBN 0-226-67288-3, footnote page 10-11: Einstein reports, via Dr N Balzas in response to Polanyi's query, that "The Michelson–Morley experiment had no role in the foundation of the theory." and "..the theory of relativity was not founded to explain its outcome at all." (http://books.google.com/books?id=0Rtu8kCpvz4C&lpg=PP1&pg=PT19#v=onepage&q=&f=false)
[19] Dongen, Jeroen van (2009). "On the role of the Michelson–Morley experiment: Einstein in Chicago" (http://philsci-archive.pitt.edu/4778/1/Einstein_Chicago_Web2.pdf). arXiv:0908.1545. Bibcode 2009arXiv0908.1545V.
[20] Resnick, Robert (1968). Introduction to special relativity (http://books.google.com/books?id=fsIRAQAAIAAJ). Wiley. pp. 62–63.
[21] Kleppner, Daniel; Kolenkow, David (1973). An Introduction to Mechanics. pp. 468–70.
[22] Does the inertia of a body depend upon its energy content? (http://www.fourmilab.ch/etexts/einstein/E_mc2/www/) A. Einstein, Annalen der Physik. 18:639, 1905 (English translation by W. Perrett and G.B. Jeffery)
[23] Max Jammer (1997). Concepts of Mass in Classical and Modern Physics (http://books.google.com/?id=lYvz0_8aGsMC&pg=PA177). Courier Dover Publications. pp. 177–178. ISBN 0-486-29998-8.
[24] John J. Stachel (2002). Einstein from B to Z (http://books.google.com/?id=OAsQ_hFjhrAC&pg=PA215). Springer. p. 221. ISBN 0-8176-4143-2.
[25] On the Inertia of Energy Required by the Relativity Principle (http://www.webcitation.org/query?url=http://www.geocities.com/physics_world/abstracts/Einstein_1907A_abstract.htm&date=2009-10-26+00:34:19), A. Einstein, Annalen der Physik 23 (1907): 371-384
[26] In a letter to Carl Seelig in 1955, Einstein wrote "I had already previously found that Maxwell's theory did not account for the micro-structure of radiation and could therefore have no general validity.", Einstein letter to Carl Seelig, 1955.
[27] Gibbs, Philip; Koks, Don. "The Relativistic Rocket" (http://math.ucr.edu/home/baez/physics/Relativity/SR/rocket.html). Retrieved 30 August 2012.
[28] http:/ / library. thinkquest. org/ C0116043/ specialtheorytext. 
htm Thinkquest org [29] R. C. Tolman, The theory of the Relativity of Motion, (Berkeley 1917), p. 54 [30] G. A. Benford, D. L. Book, and W. A. Newcomb, The Tachyonic Antitelephone, Phys. Rev. D 2, 263265 (1970) article (http:/ / link. aps. org/ abstract/ PRD/ v2/ p263) [31] Salmon, Wesley C. (2006). Four Decades of Scientific Explanation (http:/ / books. google. com/ books?id=FHqOXCd06e8C). University of Pittsburgh. p.107. ISBN0-8229-5926-7. ., Section 3.7 page 107 (http:/ / books. google. com/ books?id=FHqOXCd06e8C& pg=PA107) [32] F. Low and P. Mende, A Note on the Tunneling Time Problem, Ann. Phys. NY, 210, 380-387 (1991) [33] A. Enders and G. Nimtz, On superluminal barrier traversal, J. Phys. I, France 2, 1693-1698 (1992) [34] S. Longhi et al., Measurement of superluminal optical tunneling times in double-barrier photonic band gaps, Phys.Rev. E, 65, 06610 1-6 (2002) [35] P. Eckle et al., Attosecond Ionization and Tunneling Delay Time Measurements in Helium, Science, 322, 1525-1529 (2008) [36] G. Nimtz, Do Evanescent Modes Violate Relativistic Causality?, Lect.Notes Phys. 702, 506-531 (2006) [37] G. Nimtz, Tunneling Violates Special Relativity, arXiv:1003.3944v1 [38] Herbert Winful (2007-09-18). "Comment on "Macroscopic violation of special relativity" by Nimtz and Stahlhofen". arXiv:0709.2736[quant-ph]. [39] Chris Lee (2007-08-16). "Latest "faster than the speed of light" claims wrong (again)" (http:/ / arstechnica. com/ news. ars/ post/ 20070816-faster-than-the-speed-of-light-no-i-dont-think-so. html). . [40] Winful, Herbert G. (December 2006). "Tunneling time, the Hartman effect, and superluminality: A proposed resolution of an old paradox" (http:/ / sitemaker. umich. edu/ herbert. winful/ files/ physics_reports_review_article__2006_. pdf). Physics Reports 436 (1-2): 169. Bibcode2006PhR...436....1W. doi:10.1016/j.physrep.2006.09.002. . [41] J.A. Wheeler, C. Misner, K.S. Thorne (1973). Gravitation. W.H. Freeman & Co. p.58. ISBN0-7167-0344-0. [42] J.R. 
Forshaw, A.G. Smith (2009). Dynamics and Relativity. Wiley. p.247. ISBN978-0-470-01460-8. [43] R. Penrose (2007). The Road to Reality. Vintage books. ISBN0-679-77631-1. [44] Jean-Bernard Zuber & Claude Itzykson, Quantum Field Theory, pg 5 , ISBN 0-07-032071-3 [45] Charles W. Misner, Kip S. Thorne & John A. Wheeler,Gravitation, pg 51, ISBN 0-7167-0344-0 [46] George Sterman, An Introduction to Quantum Field Theory, pg 4 , ISBN 0-521-31132-2 [47] M. Carroll, Sean (2004). Spacetime and Geometry: An Introduction to General Relativity (http:/ / books. google. com/ books?id=1SKFQgAACAAJ) (illustrated ed.). Addison Wesley. p.22. ISBN0-8053-8732-3. . [48] E. J. Post (1962). Formal Structure of Electromagnetics: General Covariance and Electromagnetics. Dover Publications Inc.. ISBN0-486-65427-3. [49] Grn, yvind; Hervik, Sigbjrn (2007). Einstein's general theory of relativity: with modern applications in cosmology (http:/ / books. google. com/ books?id=IyJhCHAryuUC). Springer. p.195. ISBN0-387-69199-5. ., Extract of page 195 (with units where c=1) (http:/ / books. google. com/ books?id=IyJhCHAryuUC& pg=PA195) [50] The number of works is vast, see as example: Sidney Coleman, Sheldon L. Glashow, Cosmic Ray and Neutrino Tests of Special Relativity, Phys. Lett. B405 (1997) 249-252, online (http:/ / arxiv. org/ abs/ hep-ph/ 9703240) An overview can be found on this page (http:/ / www. edu-observatory. org/ physics-faq/ Relativity/ SR/ experiments. html) [51] Norton, John D., John D. (2004), "Einstein's Investigations of Galilean Covariant Electrodynamics prior to 1905" (http:/ / philsci-archive. pitt. edu/ archive/ 00001743/ ), Archive for History of Exact Sciences 59: 45105, Bibcode2004AHES...59...45N, doi:10.1007/s00407-004-0085-6,

52

Special relativity
[52] Dirac, P.A.M. (1930). "A Theory of Electrons and Protons". Proc. R. Soc. A126: 360. Bibcode1930RSPSA.126..360D. doi:10.1098/rspa.1930.0013. JSTOR95359. [53] C.D. Anderson: The Positive Electron. Phys. Rev. 43, 491-494 (1933)

53

Textbooks
Einstein, Albert (1920). Relativity: The Special and General Theory.
Einstein, Albert (1996). The Meaning of Relativity. Fine Communications. ISBN 1-56731-136-9.
Freund, Jürgen (2008). Special Relativity for Beginners: A Textbook for Undergraduates (http://www.relativity.ch). World Scientific. ISBN 981-277-160-3.
Logunov, Anatoly A. (2005). Henri Poincaré and the Relativity Theory (http://arxiv.org/pdf/physics/0408077) (transl. from Russian by G. Pontocorvo and V. O. Soleviev, edited by V. A. Petrov). Nauka, Moscow.
Charles Misner, Kip Thorne, and John Archibald Wheeler (1973). Gravitation. W. H. Freeman & Co. ISBN 0-7167-0344-0.
Post, E. J., 1997 (1962). Formal Structure of Electromagnetics: General Covariance and Electromagnetics. Dover Publications.
Wolfgang Rindler (1991). Introduction to Special Relativity (2nd ed.). Oxford University Press. ISBN 978-0-19-853952-0; ISBN 0-19-853952-5.
Harvey R. Brown (2005). Physical Relativity: Space-time Structure from a Dynamical Perspective. Oxford University Press. ISBN 0-19-927583-1; ISBN 978-0-19-927583-0.
Qadir, Asghar (1989). Relativity: An Introduction to the Special Theory (http://books.google.com/?id=X5YofYrqFoAC&printsec=frontcover&dq=Relativity:+An+Introduction+to+the+Special+Theory+by+Asghar+Qadir#v=onepage&q&f=false). Singapore: World Scientific Publications. pp. 128. ISBN 9971-5-0612-2.
Silberstein, Ludwik (1914). The Theory of Relativity.
Lawrence Sklar (1977). Space, Time and Spacetime (http://books.google.com/?id=cPLXqV3QwuMC&pg=PA206). University of California Press. ISBN 0-520-03174-1.
Lawrence Sklar (1992). Philosophy of Physics (http://books.google.com/?id=L3b_9PGnkMwC&pg=PA74). Westview Press. ISBN 0-8133-0625-6.
Taylor, Edwin, and John Archibald Wheeler (1992). Spacetime Physics (2nd ed.). W. H. Freeman & Co. ISBN 0-7167-2327-1.
Tipler, Paul, and Llewellyn, Ralph (2002). Modern Physics (4th ed.). W. H. Freeman & Co. ISBN 0-7167-4345-0.

Journal articles
Alväger, T.; Farley, F. J. M.; Kjellman, J.; Wallin, L. (1964). "Test of the Second Postulate of Special Relativity in the GeV region". Physics Letters 12 (3): 260. Bibcode 1964PhL....12..260A. doi:10.1016/0031-9163(64)91095-9.
Darrigol, Olivier (2004). "The Mystery of the Poincaré–Einstein Connection". Isis 95 (4): 614–26. doi:10.1086/430652. PMID 16011297.
Feigenbaum, Mitchell (2008). "The Theory of Relativity - Galileo's Child". arXiv:0806.1234. Bibcode 2008arXiv0806.1234F.
Gulevich, D. R.; Kusmartsev, F. V.; Savel'ev, Sergey; Yampol'skii, V. A.; Nori, Franco (2008). "Shape waves in 2D Josephson junctions: Exact solutions and time dilation". Phys. Rev. Lett. 101 (12): 127002. arXiv:0808.1514. Bibcode 2008PhRvL.101l7002G. doi:10.1103/PhysRevLett.101.127002. PMID 18851404.
Rizzi, G. et al. (2005). "Synchronization Gauges and the Principles of Special Relativity". Found. Phys. 34: 1835–87. arXiv:gr-qc/0409105. Bibcode 2004FoPh...34.1835R. doi:10.1007/s10701-004-1624-3.
Wolf, Peter; Petit, Gerard (1997). "Satellite test of Special Relativity using the Global Positioning System". Physical Review A 56 (6): 4405–09. Bibcode 1997PhRvA..56.4405W. doi:10.1103/PhysRevA.56.4405.


External links
Original works
Zur Elektrodynamik bewegter Körper (http://www.physik.uni-augsburg.de/annalen/history/einstein-papers/1905_17_891-921.pdf) Einstein's original work in German, Annalen der Physik, Bern 1905.
On the Electrodynamics of Moving Bodies (http://www.fourmilab.ch/etexts/einstein/specrel/specrel.pdf) English translation as published in the 1923 book The Principle of Relativity.

Special relativity for a general audience (no mathematical knowledge required)


Wikibooks: Special Relativity (http://en.wikibooks.org/wiki/Special_Relativity)
Einstein Light (http://www.phys.unsw.edu.au/einsteinlight) An award (http://www.sciam.com/article.cfm?chanID=sa004&articleID=0005CFF9-524F-1340-924F83414B7F0000)-winning, non-technical introduction (film clips and demonstrations) supported by dozens of pages of further explanations and animations, at levels with or without mathematics.
Einstein Online (http://www.einstein-online.info/en/elementary/index.html) Introduction to relativity theory, from the Max Planck Institute for Gravitational Physics.
Audio: Cain/Gay (2006) - Astronomy Cast (http://www.astronomycast.com/astronomy/einsteins-theory-of-special-relativity/). Einstein's Theory of Special Relativity.

Special relativity explained (using simple or more advanced mathematics)


Greg Egan's Foundations (http://gregegan.customer.netspace.net.au/FOUNDATIONS/01/found01.html).
The Hogg Notes on Special Relativity (http://cosmo.nyu.edu/hogg/sr/) A good introduction to special relativity at the undergraduate level, using calculus.
Relativity Calculator: Special Relativity (http://www.relativitycalculator.com/E=mc2.shtml) - An algebraic and integral calculus derivation for E = mc².
Motion Mountain, Volume II (http://www.motionmountain.net/download.html) - A modern introduction to relativity, including its visual effects.
MathPages - Reflections on Relativity (http://www.mathpages.com/rr/rrtoc.htm) A complete online book on relativity with an extensive bibliography.
Relativity (http://www.lightandmatter.com/html_books/0sn/ch07/ch07.html) An introduction to special relativity at the undergraduate level, without calculus.
Relativity: the Special and General Theory at Project Gutenberg, by Albert Einstein.
Special Relativity Lecture Notes (http://www.phys.vt.edu/~takeuchi/relativity/notes) A standard introduction to special relativity containing illustrative explanations based on drawings and spacetime diagrams, from Virginia Polytechnic Institute and State University.
Understanding Special Relativity (http://www.rafimoor.com/english/SRE.htm) The theory of special relativity in an easily understandable way.
An Introduction to the Special Theory of Relativity (http://digitalcommons.unl.edu/physicskatz/49/) (1964) by Robert Katz, "an introduction ... that is accessible to any student who has had an introduction to general physics and some slight acquaintance with the calculus" (130 pp; pdf format).
Lecture Notes on Special Relativity (http://www.physics.mq.edu.au/~jcresser/Phys378/LectureNotes/VectorsTensorsSR.pdf) by J. D. Cresser, Department of Physics, Macquarie University.


Visualization
Raytracing Special Relativity (http://www.hakenberg.de/diffgeo/special_relativity.htm) Software visualizing several scenarios under the influence of special relativity.
Real Time Relativity (http://www.anu.edu.au/Physics/Savage/RTR/) The Australian National University. Relativistic visual effects experienced through an interactive program.
Spacetime travel (http://www.spacetimetravel.org) A variety of visualizations of relativistic effects, from relativistic motion to black holes.
Through Einstein's Eyes (http://www.anu.edu.au/Physics/Savage/TEE/) The Australian National University. Relativistic visual effects explained with movies and images.
Warp Special Relativity Simulator (http://www.adamauton.com/warp/) A computer program to show the effects of traveling close to the speed of light.
Animation clip (http://www.youtube.com/watch?v=C2VMO7pcWhg) visualizing the Lorentz transformation.
Original interactive FLASH Animations (http://math.ucr.edu/~jdp/Relativity/SpecialRelativity.html) from John de Pillis illustrating Lorentz and Galilean frames, Train and Tunnel Paradox, the Twin Paradox, Wave Propagation, Clock Synchronization, etc.
Relativistic Optics at the ANU (http://www.anu.edu.au/physics/Searle/)

Intuitionism
In the philosophy of mathematics, intuitionism, or neointuitionism (opposed to preintuitionism), is an approach to mathematics as the constructive mental activity of humans. That is, mathematics does not consist of analytic activities wherein deep properties of existence are revealed and applied. Instead, logic and mathematics are the application of internally consistent methods to realize more complex mental constructs.

Truth and proof


The fundamental distinguishing characteristic of intuitionism is its interpretation of what it means for a mathematical statement to be true. In Brouwer's original intuitionism, the truth of a mathematical statement is a subjective claim: a mathematical statement corresponds to a mental construction, and a mathematician can assert the truth of a statement only by verifying the validity of that construction by intuition. The vagueness of the intuitionistic notion of truth often leads to misinterpretations about its meaning. Kleene formally defined intuitionistic truth from a realist position, yet Brouwer would likely reject this formalization as meaningless, given his rejection of the realist/Platonist position. Intuitionistic truth therefore remains somewhat ill-defined.

Regardless of how it is interpreted, intuitionism does not equate the truth of a mathematical statement with its provability. However, because the intuitionistic notion of truth is more restrictive than that of classical mathematics, the intuitionist must reject some assumptions of classical logic to ensure that everything proved is in fact intuitionistically true. This gives rise to intuitionistic logic.

To an intuitionist, the claim that an object with certain properties exists is a claim that an object with those properties can be constructed. Any mathematical object is considered to be a product of a construction of a mind, and therefore the existence of an object is equivalent to the possibility of its construction. This contrasts with the classical approach, which states that the existence of an entity can be proved by refuting its non-existence. For the intuitionist, this is not valid; the refutation of the non-existence does not mean that it is possible to find a construction for the putative object, as is required in order to assert its existence. As such, intuitionism is a variety of mathematical constructivism, but it is not the only kind.

The interpretation of negation is different in intuitionist logic than in classical logic. In classical logic, the negation of a statement asserts that the statement is false; to an intuitionist, it means the statement is refutable[1] (e.g., that there is a counterexample). There is thus an asymmetry between a positive and negative statement in intuitionism. If a statement P is provable, then it is certainly impossible to prove that there is no proof of P. But even if it can be shown that no disproof of P is possible, we cannot conclude from this absence that there is a proof of P. Thus P is a stronger statement than not-not-P.

Similarly, to assert that A or B holds, to an intuitionist, is to claim that either A or B can be proved. In particular, the law of excluded middle, "A or not A", is not accepted as a valid principle. For example, if A is some mathematical statement that an intuitionist has not yet proved or disproved, then that intuitionist will not assert the truth of "A or not A". However, the intuitionist will accept that "A and not A" cannot be true. Thus the connectives "and" and "or" of intuitionistic logic do not satisfy De Morgan's laws as they do in classical logic.

Intuitionistic logic substitutes constructibility for abstract truth and is associated with a transition from the proof to model theory of abstract truth in modern mathematics. The logical calculus preserves justification, rather than truth, across transformations yielding derived propositions. It has been taken as giving philosophical support to several schools of philosophy, most notably the anti-realism of Michael Dummett. Thus, contrary to the first impression its name might convey, and as realized in specific approaches and disciplines (e.g. Fuzzy Sets and Systems), intuitionist mathematics is more rigorous than conventionally founded mathematics, where, ironically, the foundational elements which intuitionism attempts to construct, refute or refound are taken as intuitively given.


Intuitionism and infinity


Among the different formulations of intuitionism, there are several different positions on the meaning and reality of infinity.

The term potential infinity refers to a mathematical procedure in which there is an unending series of steps. After each step has been completed, there is always another step to be performed. For example, consider the process of counting: 1, 2, 3, …

The term actual infinity refers to a completed mathematical object which contains an infinite number of elements. An example is the set of natural numbers, N = {1, 2, …}.

In Cantor's formulation of set theory, there are many different infinite sets, some of which are larger than others. For example, the set of all real numbers R is larger than N, because any procedure that you attempt to use to put the natural numbers into one-to-one correspondence with the real numbers will always fail: there will always be an infinite number of real numbers "left over". Any infinite set that can be placed in one-to-one correspondence with the natural numbers is said to be "countable" or "denumerable". Infinite sets larger than this are said to be "uncountable".

Cantor's set theory led to the axiomatic system of ZFC, now the most common foundation of modern mathematics. Intuitionism was created, in part, as a reaction to Cantor's set theory.

Modern constructive set theory does include the axiom of infinity from Zermelo–Fraenkel set theory (or a revised version of this axiom), and includes the set N of natural numbers. Most modern constructive mathematicians accept the reality of countably infinite sets (however, see Alexander Esenin-Volpin for a counter-example).

Brouwer rejected the concept of actual infinity, but admitted the idea of potential infinity.

"According to Weyl 1946, 'Brouwer made it clear, as I think beyond any doubt, that there is no evidence supporting the belief in the existential character of the totality of all natural numbers ... the sequence of numbers which grows beyond any stage already reached by passing to the next number, is a manifold of possibilities open towards infinity; it remains forever in the status of creation, but is not a closed realm of things existing in themselves. That we blindly converted one into the other is the true source of our difficulties, including the antinomies, a source of more fundamental nature than Russell's vicious circle principle indicated. Brouwer opened our eyes and made us see how far classical mathematics, nourished by a belief in the 'absolute' that transcends all human possibilities of realization, goes beyond such statements as can claim real meaning and truth founded on evidence.'" (Kleene (1952): Introduction to Metamathematics, pp. 48–49)
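The contrast between potential and actual infinity, and the claim that any attempted listing of the reals leaves some "left over", can be illustrated with a small sketch. It rests on assumptions that are not in the text above: the names `naturals` and `diagonal_escape` are hypothetical, real numbers are stood in for by binary digit sequences, and the construction shown is Cantor's diagonal argument applied to a finite prefix of a listing, which the text does not name explicitly.

```python
from itertools import count, islice


def naturals():
    """Potential infinity: a process that can always produce one more number."""
    return count(1)


def diagonal_escape(rows):
    """Build a binary sequence that differs from rows[i] at position i.

    Each row must have at least len(rows) digits. The result cannot equal
    any row in the listing, however the listing was chosen.
    """
    return [1 - row[i] for i, row in enumerate(rows)]


# Only ever a finite prefix is materialised; the process itself never ends.
first_five = list(islice(naturals(), 5))

# A hypothetical attempt to list binary sequences (finite prefixes shown).
listing = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
escaped = diagonal_escape(listing)
```

The generator never holds "all natural numbers" at once, which is the sense in which counting is only potentially infinite. The diagonal sequence disagrees with the i-th row in its i-th digit, so it is missing from the listing; extending the listing does not help, since the same construction applies again.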

Finitism is an extreme version of Intuitionism that rejects the idea of potential infinity. According to Finitism, a mathematical object does not exist unless it can be constructed from the natural numbers in a finite number of steps.


History of Intuitionism
Intuitionism's history can be traced to two controversies in nineteenth-century mathematics.

The first of these was the invention of transfinite arithmetic by Georg Cantor and its subsequent rejection by a number of prominent mathematicians, most famously his teacher Leopold Kronecker, a confirmed finitist.

The second of these was Gottlob Frege's effort to reduce all of mathematics to a logical formulation via set theory and its derailing by a youthful Bertrand Russell, the discoverer of Russell's paradox. Frege had planned a three-volume definitive work, but just as the second volume was going to press, Russell sent Frege a letter outlining his paradox, which demonstrated that one of Frege's rules of self-reference was self-contradictory. Frege, the story goes, plunged into depression and did not publish the third volume of his work as he had planned. For more, see Davis (2000), Chapters 3 and 4: Frege: From Breakthrough to Despair and Cantor: Detour through Infinity. See van Heijenoort for the original works and van Heijenoort's commentary.

These controversies are strongly linked, as the logical methods used by Cantor in proving his results in transfinite arithmetic are essentially the same as those used by Russell in constructing his paradox. Hence how one chooses to resolve Russell's paradox has direct implications for the status accorded to Cantor's transfinite arithmetic.

In the early twentieth century L. E. J. Brouwer represented the intuitionist position and David Hilbert the formalist position (see van Heijenoort). Kurt Gödel offered opinions referred to as Platonist (see various sources re Gödel). Alan Turing considers: "non-constructive systems of logic with which not all the steps in a proof are mechanical, some being intuitive" (Turing 1939, reprinted in Davis 2004, p. 210). Later, Stephen Cole Kleene brought forth a more rational consideration of intuitionism in his Introduction to Metamathematics (1952).

Contributors to intuitionism
L. E. J. Brouwer
Michael Dummett
Arend Heyting
Stephen Kleene

Branches of intuitionistic mathematics


Intuitionistic logic
Intuitionistic arithmetic
Intuitionistic type theory
Intuitionistic set theory
Intuitionistic analysis


References
[1] Imre Lakatos (1976) Proofs and Refutations

Further reading
"Analysis." Encyclopædia Britannica. 2006. Encyclopædia Britannica 2006 Ultimate Reference Suite DVD, 15 June 2006. "Constructive analysis" (Ian Stewart, author).
W. S. Anglin, Mathematics: A Concise History and Philosophy, Springer-Verlag, New York, 1994. In Chapter 39, Foundations, with respect to the 20th century, Anglin gives very precise, short descriptions of Platonism (with respect to Gödel), Formalism (with respect to Hilbert), and Intuitionism (with respect to Brouwer).
Martin Davis (ed.) (1965), The Undecidable, Raven Press, Hewlett, NY. Compilation of original papers by Gödel, Church, Kleene, Turing, Rosser, and Post. Republished as Davis, Martin, ed. (2004). The Undecidable. Courier Dover Publications. ISBN 978-0-486-43228-1.
Martin Davis (2000). Engines of Logic: Mathematicians and the Origin of the Computer (1st ed.). W. W. Norton & Company, New York. ISBN 0-393-32229-7 (pbk.).
John W. Dawson Jr., Logical Dilemmas: The Life and Work of Kurt Gödel, A. K. Peters, Wellesley, MA, 1997. Less readable than Goldstein but, in Chapter III, Excursus, Dawson gives an excellent "A Capsule History of the Development of Logic to 1928".
Rebecca Goldstein, Incompleteness: The Proof and Paradox of Kurt Gödel, Atlas Books, W. W. Norton, New York, 2005. In Chapter II, Hilbert and the Formalists, Goldstein gives further historical context. As a Platonist, Gödel was reticent in the presence of the logical positivism of the Vienna Circle. She discusses Wittgenstein's impact and the impact of the formalists. Goldstein notes that the intuitionists were even more opposed to Platonism than Formalism.
van Heijenoort, J., From Frege to Gödel, A Source Book in Mathematical Logic, 1879-1931, Harvard University Press, Cambridge, MA, 1967. Reprinted with corrections, 1977. The following papers appear in van Heijenoort:
L. E. J. Brouwer, 1923, On the significance of the principle of excluded middle in mathematics, especially in function theory [reprinted with commentary, p. 334, van Heijenoort]
Andrei Nikolaevich Kolmogorov, 1925, On the principle of excluded middle [reprinted with commentary, p. 414, van Heijenoort]
L. E. J. Brouwer, 1927, On the domains of definitions of functions [reprinted with commentary, p. 446, van Heijenoort]. Although not directly germane, in his (1923) Brouwer uses certain words defined in this paper.
L. E. J. Brouwer, 1927(2), Intuitionistic reflections on formalism [reprinted with commentary, p. 490, van Heijenoort]
Jacques Herbrand, (1931b), "On the consistency of arithmetic" [reprinted with commentary, p. 618ff, van Heijenoort]. From van Heijenoort's commentary it is unclear whether or not Herbrand was a true "intuitionist"; Gödel (1963) asserted that indeed "...Herbrand was an intuitionist". But van Heijenoort says Herbrand's conception was "on the whole much closer to that of Hilbert's word 'finitary' ('finit') than to "intuitionistic" as applied to Brouwer's doctrine".
Hesseling, Dennis E. (2003). Gnomes in the Fog. The Reception of Brouwer's Intuitionism in the 1920s. Birkhäuser. ISBN 3-7643-6536-6.
Heyting, Arend (1971) [1956]. Intuitionism: An Introduction (3rd rev. ed.). Amsterdam: North-Holland Pub. Co. ISBN 0-7204-2239-6.
Kleene, Stephen C. (1991) [1952]. Introduction to Meta-Mathematics (Tenth impression 1991 ed.). Amsterdam NY: North-Holland Pub. Co. ISBN 0-7204-2103-9. In Chapter III, A Critique of Mathematic Reasoning, §11, The paradoxes, Kleene discusses Intuitionism and Formalism in depth. Throughout the rest of the book he treats, and compares, both Formalist (classical) and Intuitionist logics with an emphasis on the former. Extraordinary writing by an extraordinary mathematician.
Stephen Cole Kleene and Richard Eugene Vesley, The Foundations of Intuitionistic Mathematics, North-Holland Publishing Co., Amsterdam, 1965. The lead sentence tells it all: "The constructive tendency in mathematics...". A text for specialists, but written in Kleene's wonderfully clear style.
Hilary Putnam and Paul Benacerraf, Philosophy of Mathematics: Selected Readings, Englewood Cliffs, N.J.: Prentice-Hall, 1964. 2nd ed., Cambridge: Cambridge University Press, 1983. ISBN 0-521-29648-X. Part I, The foundation of mathematics, Symposium on the foundations of mathematics:
Rudolf Carnap, The logicist foundations of mathematics, p. 41
Arend Heyting, The intuitionist foundations of mathematics, p. 52
Johann von Neumann, The formalist foundations of mathematics, p. 61
Arend Heyting, Disputation, p. 66
L. E. J. Brouwer, Intuitionism and formalism, p. 77
L. E. J. Brouwer, Consciousness, philosophy, and mathematics, p. 90
Constance Reid, Hilbert, Copernicus - Springer-Verlag, 1st edition 1970, 2nd edition 1996. Definitive biography of Hilbert; places his "Program" in historical context together with the subsequent fighting, sometimes rancorous, between the Intuitionists and the Formalists.
Paul Rosenbloom, The Elements of Mathematical Logic, Dover Publications Inc, Mineola, New York, 1950. In a style more of Principia Mathematica: many symbols, some antique, some from German script. Very good discussions of intuitionism in the following locations: pages 51-58 in Section 4, Many Valued Logics, Modal Logics, Intuitionism; pages 69-73, Chapter III, The Logic of Propositional Functions, Section 1, Informal Introduction; and pp. 146-151, Section 7, the Axiom of Choice.


Secondary references
A. A. Markov (1954). Theory of Algorithms. [Translated by Jacques J. Schorr-Kon and PST staff.] Imprint: Moscow, Academy of Sciences of the USSR, 1954 [i.e. Jerusalem, Israel Program for Scientific Translations, 1961; available from the Office of Technical Services, U.S. Dept. of Commerce, Washington]. Description: 444 p., 28 cm. Added t.p. in Russian. Translation of Works of the Mathematical Institute, Academy of Sciences of the USSR, v. 42. Original title: Teoriya algorifmov. [QA248.M2943 Dartmouth College library. U.S. Dept. of Commerce, Office of Technical Services, number OTS 60-51085.] A secondary reference for specialists: Markov opined that "The entire significance for mathematics of rendering more precise the concept of algorithm emerges, however, in connection with the problem of a constructive foundation for mathematics...." [p. 3, italics added.] Markov believed that further applications of his work "merit a special book, which the author hopes to write in the future" (p. 3). Sadly, said work apparently never appeared.
Turing, Alan M. (1939). Systems of Logic Based on Ordinals.


External links
Ten Questions about Intuitionism (http://www.intuitionism.org/)

Intuitionistic logic
Intuitionistic logic, or constructive logic, is a symbolic logic system that differs from classical logic in its definition of what it means for a statement to be true. In classical logic, all well-formed statements are assumed to be either true or false, even if we do not have a proof of either. In constructive logic, a statement is true only if there is a constructive proof that it is true, and false only if there is a constructive proof that it is false. Operations in constructive logic preserve justification, rather than truth.

Syntactically, intuitionistic logic is a restriction of classical logic in which the law of excluded middle and double negation elimination are not axioms of the system, and cannot be proved. There are several semantics commonly employed. One mirrors classical Boolean-valued semantics but uses Heyting algebras in place of Boolean algebras. Another uses Kripke models.

Constructive logic is practically useful because its restrictions produce proofs that have the existence property, making it also suitable for other forms of mathematical constructivism. Informally, this means that a constructive proof that an object exists can be turned into an algorithm for generating an example of it.

Formalized intuitionistic logic was originally developed by Arend Heyting to provide a formal basis for Brouwer's programme of intuitionism.
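The Kripke semantics mentioned above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than a standard library: formulas are encoded as nested tuples, the class name `KripkeModel` is hypothetical, and the two-world model is chosen by hand. Because Kripke semantics is sound for intuitionistic logic, a formula that is not forced at some world of some model has no intuitionistic proof.

```python
# Formulas as nested tuples (an encoding assumed for this sketch).
BOT = ("bot",)


def Var(name):
    return ("var", name)


def And(a, b):
    return ("and", a, b)


def Or(a, b):
    return ("or", a, b)


def Imp(a, b):
    return ("imp", a, b)


def Not(a):
    # Negation is an abbreviation for "a implies falsum".
    return Imp(a, BOT)


class KripkeModel:
    def __init__(self, successors, valuation):
        # successors[w]: every world v with w <= v, including w itself
        # (the order must be reflexive and transitive).
        # valuation[w]: atoms true at w; must only grow along the order.
        self.successors = successors
        self.valuation = valuation

    def forces(self, w, f):
        tag = f[0]
        if tag == "var":
            return f[1] in self.valuation[w]
        if tag == "bot":
            return False
        if tag == "and":
            return self.forces(w, f[1]) and self.forces(w, f[2])
        if tag == "or":
            return self.forces(w, f[1]) or self.forces(w, f[2])
        if tag == "imp":
            # An implication must survive in every later state of knowledge.
            return all(
                (not self.forces(v, f[1])) or self.forces(v, f[2])
                for v in self.successors[w]
            )
        raise ValueError(f"unknown connective: {tag!r}")


# World 0: p not yet established. World 1 (a possible future): p established.
model = KripkeModel(
    successors={0: [0, 1], 1: [1]},
    valuation={0: set(), 1: {"p"}},
)
p = Var("p")

excluded_middle_at_0 = model.forces(0, Or(p, Not(p)))
dn_elimination_at_0 = model.forces(0, Imp(Not(Not(p)), p))
dn_introduction_at_0 = model.forces(0, Imp(p, Not(Not(p))))
```

At world 0 neither p nor its negation is forced (p may still become true later), so excluded middle and double negation elimination both fail there, while double negation introduction holds.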

Syntax
The syntax of formulas of intuitionistic logic is similar to propositional logic or first-order logic. However, intuitionistic connectives are not definable in terms of each other in the same way as in classical logic, hence their choice matters. In intuitionistic propositional logic it is customary to use →, ∧, ∨, ⊥ as the basic connectives, treating ¬A as an abbreviation for (A → ⊥). In intuitionistic first-order logic both quantifiers ∃, ∀ are needed.

[Figure: The Rieger–Nishimura lattice. Its nodes are the propositional formulas in one variable up to intuitionistic logical equivalence, ordered by intuitionistic logical implication.]

Many tautologies of classical logic can no longer be proven within intuitionistic logic. Examples include not only the law of excluded middle p ∨ ¬p, but also Peirce's law ((p → q) → p) → p, and even double negation elimination. In classical logic, both p → ¬¬p and also ¬¬p → p are theorems. In intuitionistic logic, only the former is a theorem: double negation can be introduced, but it cannot be eliminated. Rejecting p ∨ ¬p may seem strange to those more familiar with classical logic, but proving this statement in constructive logic would require producing a proof for the truth or falsity of all possible statements, which is impossible for a variety of reasons.

Because many classically valid tautologies are not theorems of intuitionistic logic, but all theorems of intuitionistic logic are valid classically, intuitionistic logic can be viewed as a weakening of classical logic, albeit one with many useful properties.
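As an illustration of the asymmetry described above, the following Lean 4 sketch proves double negation introduction, and the double negation of excluded middle, without appealing to any classical axiom. Lean is used here only as a convenient checker for constructive proofs; the choice of tool and the variable names are assumptions of this example, not part of the original text.

```lean
-- Double negation introduction: from a proof of p, refute any refutation of p.
example (p : Prop) : p → ¬¬p :=
  fun hp hnp => hnp hp

-- Excluded middle itself is not provable constructively,
-- but its double negation is.
example (p : Prop) : ¬¬(p ∨ ¬p) :=
  fun h => h (Or.inr (fun hp => h (Or.inl hp)))
```

Neither proof term mentions `Classical.em` or `Classical.byContradiction`; a proof of ¬¬p → p for arbitrary p would need one of them.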

Sequent calculus
Gentzen discovered that a simple restriction of his system LK (his sequent calculus for classical logic) results in a system which is sound and complete with respect to intuitionistic logic. He called this system LJ. In LK any number of formulas is allowed to appear on the conclusion side of a sequent; in contrast LJ allows at most one formula in this position. Other derivatives of LK are limited to intuitionistic derivations but still allow multiple conclusions in a sequent. LJ' [1] is one example.

Hilbert-style calculus
Intuitionistic logic can be defined using the following Hilbert-style calculus. Compare with the deduction system at Propositional calculus#Alternative calculus. In propositional logic, the inference rule is modus ponens
MP: from φ → ψ and φ infer ψ
and the axioms are
THEN-1: φ → (χ → φ)
THEN-2: (φ → (χ → ψ)) → ((φ → χ) → (φ → ψ))
AND-1: φ ∧ χ → φ
AND-2: φ ∧ χ → χ
AND-3: φ → (χ → (φ ∧ χ))
OR-1: φ → φ ∨ χ
OR-2: χ → φ ∨ χ
OR-3: (φ → ψ) → ((χ → ψ) → (φ ∨ χ → ψ))
FALSE: ⊥ → φ
To make this a system of first-order predicate logic, the generalization rules
∀-GEN: from ψ → φ infer ψ → (∀x φ), if x is not free in ψ
∃-GEN: from φ → ψ infer (∃x φ) → ψ, if x is not free in ψ
are added, along with the axioms
PRED-1: (∀x φ(x)) → φ(t), if the term t is free for substitution for the variable x in φ (i.e., if no occurrence of any variable in t becomes bound in φ(t))
PRED-2: φ(t) → (∃x φ(x)), with the same restriction as for PRED-1
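To show how derivations in such a calculus proceed, here is a short worked example: a proof of φ → φ using only modus ponens and the two implication axioms. It assumes THEN-1 is the schema φ → (χ → φ) and THEN-2 is the schema (φ → (χ → ψ)) → ((φ → χ) → (φ → ψ)); the step numbering is introduced only for this sketch.

```latex
\begin{align*}
1.\;& \varphi \to ((\varphi \to \varphi) \to \varphi)
    && \text{THEN-1 with } \chi := \varphi \to \varphi \\
2.\;& (\varphi \to ((\varphi \to \varphi) \to \varphi)) \to
      ((\varphi \to (\varphi \to \varphi)) \to (\varphi \to \varphi))
    && \text{THEN-2 with } \chi := \varphi \to \varphi,\ \psi := \varphi \\
3.\;& (\varphi \to (\varphi \to \varphi)) \to (\varphi \to \varphi)
    && \text{MP from 1 and 2} \\
4.\;& \varphi \to (\varphi \to \varphi)
    && \text{THEN-1 with } \chi := \varphi \\
5.\;& \varphi \to \varphi
    && \text{MP from 4 and 3}
\end{align*}
```

No axiom about ⊥, ∧ or ∨ is used, so the same derivation goes through in minimal logic and in classical logic.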

Optional connectives
Negation
If one wishes to include a connective ¬ for negation rather than consider it an abbreviation for φ → ⊥, it is enough to add:
NOT-1': (φ → ⊥) → ¬φ
NOT-2': ¬φ → (φ → ⊥)
There are a number of alternatives available if one wishes to omit the connective ⊥ (false). For example, one may replace the three axioms FALSE, NOT-1', and NOT-2' with the two axioms
NOT-1: (φ → χ) → ((φ → ¬χ) → ¬φ)
NOT-2: φ → (¬φ → χ)
as at Propositional calculus#Axioms. Alternatives to NOT-1 are (φ → ¬χ) → (χ → ¬φ) or (φ → ¬φ) → ¬φ.
Equivalence
The connective ↔ for equivalence may be treated as an abbreviation, with φ ↔ χ standing for (φ → χ) ∧ (χ → φ). Alternatively, one may add the axioms
IFF-1: (φ ↔ χ) → (φ → χ)
IFF-2: (φ ↔ χ) → (χ → φ)
IFF-3: (φ → χ) → ((χ → φ) → (φ ↔ χ))
IFF-1 and IFF-2 can, if desired, be combined into a single axiom (φ ↔ χ) → ((φ → χ) ∧ (χ → φ)) using conjunction.
Relation to classical logic
The system of classical logic is obtained by adding any one of the following axioms:
φ ∨ ¬φ (Law of the excluded middle. May also be formulated as (φ → χ) → ((¬φ → χ) → χ).)
¬¬φ → φ (Double negation elimination)
((φ → χ) → φ) → φ (Peirce's law)

In general, one may take as the extra axiom any classical tautology that is not valid in the two-element Kripke frame (in other words, that is not included in Smetanich's logic). Another relationship is given by the Gödel–Gentzen negative translation, which provides an embedding of classical first-order logic into intuitionistic logic: a first-order formula is provable in classical logic if and only if its Gödel–Gentzen translation is provable intuitionistically. Therefore intuitionistic logic can instead be seen as a means of extending classical logic with constructive semantics. In 1932, Kurt Gödel defined a system of Gödel logics intermediate between classical and intuitionistic logic; such logics are known as intermediate logics.
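The negative translation mentioned above can be sketched as a recursive function on formulas. The version below follows the usual textbook clauses (atoms are doubly negated, disjunction and the existential quantifier are rewritten through negation, the other connectives are translated componentwise). The tuple encoding of formulas and the function names are assumptions made for this example.

```python
# Formulas are nested tuples (an encoding assumed for this sketch):
#   ("atom", name)  ("bot",)  ("not", A)  ("and", A, B)  ("or", A, B)
#   ("imp", A, B)   ("forall", var, A)    ("exists", var, A)

def neg(a):
    """Build the negation of formula a."""
    return ("not", a)

def negative_translation(f):
    """Return the Gödel–Gentzen negative translation of formula f."""
    tag = f[0]
    if tag == "atom":
        return neg(neg(f))                       # P is sent to ¬¬P
    if tag == "bot":
        return f                                 # ⊥ is left unchanged
    if tag == "not":
        return neg(negative_translation(f[1]))
    if tag in ("and", "imp"):                    # translated componentwise
        return (tag, negative_translation(f[1]), negative_translation(f[2]))
    if tag == "or":                              # A ∨ B is sent to ¬(¬A' ∧ ¬B')
        a, b = negative_translation(f[1]), negative_translation(f[2])
        return neg(("and", neg(a), neg(b)))
    if tag == "forall":
        return ("forall", f[1], negative_translation(f[2]))
    if tag == "exists":                          # ∃x A is sent to ¬∀x ¬A'
        return neg(("forall", f[1], neg(negative_translation(f[2]))))
    raise ValueError(f"unknown connective: {tag!r}")

def show(f):
    """Render a formula as a string for inspection."""
    tag = f[0]
    if tag == "atom":
        return f[1]
    if tag == "bot":
        return "⊥"
    if tag == "not":
        return "¬" + show(f[1])
    if tag in ("and", "or", "imp"):
        op = {"and": "∧", "or": "∨", "imp": "→"}[tag]
        return f"({show(f[1])} {op} {show(f[2])})"
    quantifier = "∀" if tag == "forall" else "∃"
    return f"{quantifier}{f[1]}.{show(f[2])}"

p = ("atom", "p")
excluded_middle = ("or", p, neg(p))
print(show(negative_translation(excluded_middle)))
```

Applied to p ∨ ¬p, the translation yields a formula built only from ¬ and ∧, which is intuitionistically provable even though p ∨ ¬p itself is not.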

Relation to many-valued logic
Kurt Gödel in 1932 showed that intuitionistic logic is not a finitely-many valued logic. (See the section titled Heyting algebra semantics below for a sort of "infinitely-many valued logic" interpretation of intuitionistic logic.)

Non-interdefinability of operators
In classical propositional logic, it is possible to take one of conjunction, disjunction, or implication as primitive, and define the other two in terms of it together with negation, such as in Łukasiewicz's three axioms of propositional logic. It is even possible to define all four in terms of a sole sufficient operator such as the Peirce arrow (NOR) or Sheffer stroke (NAND). Similarly, in classical first-order logic, one of the quantifiers can be defined in terms of the other and negation. These are fundamentally consequences of the law of bivalence, which makes all such connectives merely Boolean functions. The law of bivalence does not hold in intuitionistic logic, only the law of non-contradiction. As a result none of the basic connectives can be dispensed with, and the above axioms are all necessary. Most of the classical identities are only theorems of intuitionistic logic in one direction, although some are theorems in both directions. They are as follows:
Conjunction versus disjunction:
(φ ∧ ψ) → ¬(¬φ ∨ ¬ψ)
(φ ∨ ψ) → ¬(¬φ ∧ ¬ψ)
(¬φ ∨ ¬ψ) → ¬(φ ∧ ψ)
(¬φ ∧ ¬ψ) ↔ ¬(φ ∨ ψ)
Conjunction versus implication:
(φ ∧ ψ) → ¬(φ → ¬ψ)
(φ → ψ) → ¬(φ ∧ ¬ψ)
(φ ∧ ¬ψ) → ¬(φ → ψ)
(φ → ¬ψ) ↔ ¬(φ ∧ ψ)
Disjunction versus implication:
(φ ∨ ψ) → (¬φ → ψ)
(¬φ ∨ ψ) → (φ → ψ)
Universal versus existential quantification:
(∀x φ(x)) → ¬(∃x ¬φ(x))
(∃x φ(x)) → ¬(∀x ¬φ(x))
(∃x ¬φ(x)) → ¬(∀x φ(x))
(∀x ¬φ(x)) ↔ ¬(∃x φ(x))
So, for example, "a or b" is a stronger statement than "if not a, then b", whereas these are classically interchangeable. On the other hand, "not (a or b)" is equivalent to "not a, and also not b". If we include equivalence in the list of connectives, some of the connectives become definable from others:
(φ ↔ ψ) ↔ ((φ → ψ) ∧ (ψ → φ))
(φ → ψ) ↔ ((φ ∨ ψ) ↔ ψ)
(φ → ψ) ↔ ((φ ∧ ψ) ↔ φ)
(φ ∧ ψ) ↔ ((φ → ψ) ↔ φ)
(φ ∧ ψ) ↔ (((φ ∨ ψ) ↔ ψ) ↔ φ)
In particular, {∨, ↔, ⊥} and {∨, ↔, ¬} are complete bases of intuitionistic connectives.

As shown by Alexander Kuznetsov, either of the following connectives (the first one ternary, the second one quinary) is by itself functionally complete: either one can serve the role of a sole sufficient operator for intuitionistic propositional logic, thus forming an analog of the Sheffer stroke from classical propositional logic:[2]
((p ∨ q) ∧ ¬r) ∨ (¬p ∧ (q ↔ r))
p → (q ∧ ¬r ∧ (s ∨ t))

Semantics
The semantics are rather more complicated than for the classical case. A model theory can be given by Heyting algebras or, equivalently, by Kripke semantics.

Heyting algebra semantics


In classical logic, we often discuss the truth values that a formula can take. The values are usually chosen as the members of a Boolean algebra. The meet and join operations in the Boolean algebra are identified with the ∧ and ∨ logical connectives, so that the value of a formula of the form A ∧ B is the meet of the value of A and the value of B in the Boolean algebra. Then we have the useful theorem that a formula is a valid sentence of classical logic if and only if its value is 1 for every valuation, that is, for any assignment of values to its variables.

A corresponding theorem is true for intuitionistic logic, but instead of assigning each formula a value from a Boolean algebra, one uses values from a Heyting algebra, of which Boolean algebras are a special case. A formula is valid in intuitionistic logic if and only if it receives the value of the top element for any valuation on any Heyting algebra.

It can be shown that to recognize valid formulas, it is sufficient to consider a single Heyting algebra whose elements are the open subsets of the real line R.[3] In this algebra, the ∧ and ∨ operations correspond to set intersection and union, and the value assigned to a formula A → B is int(A^C ∪ B), the interior of the union of the value of B and the complement of the value of A. The bottom element is the empty set ∅, and the top element is the entire line R. The negation ¬A of a formula A is (as usual) defined to be A → ∅. The value of ¬A then reduces to int(A^C), the interior of the complement of the value of A, also known as the exterior of A.

With these assignments, intuitionistically valid formulas are precisely those that are assigned the value of the entire line.[3] For example, the formula ¬(A ∧ ¬A) is valid, because no matter what set X is chosen as the value of the formula A, the value of ¬(A ∧ ¬A) can be shown to be the entire line:
Value(¬(A ∧ ¬A))
= int((Value(A ∧ ¬A))^C)
= int((Value(A) ∩ Value(¬A))^C)
= int((X ∩ int((Value(A))^C))^C)
= int((X ∩ int(X^C))^C)
A theorem of topology tells us that int(X^C) is a subset of X^C, so the intersection is empty, leaving:
int(∅^C) = int(R) = R
So the valuation of this formula is true, and indeed the formula is valid.

But the law of the excluded middle, A ∨ ¬A, can be shown to be invalid by letting the value of A be {y : y > 0}. Then the value of ¬A is the interior of {y : y ≤ 0}, which is {y : y < 0}, and the value of the formula is the union of {y : y > 0} and {y : y < 0}, which is {y : y ≠ 0}, not the entire line.

The interpretation of any intuitionistically valid formula in the infinite Heyting algebra described above results in the top element, representing true, as the valuation of the formula, regardless of what values from the algebra are assigned to the variables of the formula.[3] Conversely, for every invalid formula, there is an assignment of values to the variables that yields a valuation that differs from the top element.[4][5] No finite Heyting algebra has both these properties.[3]
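The same kind of calculation can be carried out mechanically in a much smaller algebra. The sketch below swaps the open sets of the real line for the three-element chain 0 < 1/2 < 1, which is also a Heyting algebra: meet is min, join is max, and a → b is the top element when a ≤ b and b otherwise. Because the algebra is finite it can only refute formulas; a formula that passes here is not thereby shown to be intuitionistically valid. All function names are assumptions of this example.

```python
from fractions import Fraction
from itertools import product

# Truth values of the three-element Heyting chain 0 < 1/2 < 1.
VALUES = (Fraction(0), Fraction(1, 2), Fraction(1))
BOTTOM, TOP = VALUES[0], VALUES[-1]

def meet(a, b):
    return min(a, b)

def join(a, b):
    return max(a, b)

def imp(a, b):
    # Relative pseudo-complement on a chain: the largest c with min(a, c) <= b.
    return TOP if a <= b else b

def neg(a):
    # Negation is implication into the bottom element.
    return imp(a, BOTTOM)

def is_valid(formula, arity):
    """True if the formula evaluates to TOP under every assignment from VALUES."""
    return all(formula(*vs) == TOP for vs in product(VALUES, repeat=arity))

def excluded_middle(p):
    return join(p, neg(p))

def non_contradiction(p):
    return neg(meet(p, neg(p)))

def dn_introduction(p):
    return imp(p, neg(neg(p)))

def dn_elimination(p):
    return imp(neg(neg(p)), p)

def peirce(p, q):
    return imp(imp(imp(p, q), p), p)

for name, formula, arity in [
    ("excluded middle", excluded_middle, 1),
    ("non-contradiction", non_contradiction, 1),
    ("double negation introduction", dn_introduction, 1),
    ("double negation elimination", dn_elimination, 1),
    ("Peirce's law", peirce, 2),
]:
    print(name, "holds in the chain:", is_valid(formula, arity))
```

The middle value 1/2 plays the role that the set {y : y > 0} plays in the topological example: its negation is the bottom element, so the join of the two falls short of the top.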

Kripke semantics
Building upon his work on semantics of modal logic, Saul Kripke created another semantics for intuitionistic logic, known as Kripke semantics or relational semantics.[6]
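A minimal sketch of the relational idea, under the usual textbook forcing clauses (which the text above does not spell out, so they are an assumption here): a model is a set of worlds ordered by a reflexive, transitive relation, atoms once true stay true at later worlds, and an implication holds at a world when it is respected at every later world. The two-world model and all names below are hypothetical.

```python
# Two worlds with w0 below w1; FUTURE lists the worlds accessible from each world.
FUTURE = {"w0": ("w0", "w1"), "w1": ("w1",)}

# The atom p becomes true only at the later world (valuations are upward closed).
VALUATION = {"p": {"w1"}}

def neg(a):
    """Negation as implication into falsity."""
    return ("imp", a, ("bot",))

def forces(world, f):
    """Decide whether the given world forces formula f in the model above."""
    tag = f[0]
    if tag == "atom":
        return world in VALUATION[f[1]]
    if tag == "bot":
        return False
    if tag == "and":
        return forces(world, f[1]) and forces(world, f[2])
    if tag == "or":
        return forces(world, f[1]) or forces(world, f[2])
    if tag == "imp":
        # Must hold at every world reachable from here, not just the current one.
        return all(not forces(v, f[1]) or forces(v, f[2]) for v in FUTURE[world])
    raise ValueError(f"unknown connective: {tag!r}")

p = ("atom", "p")
excluded_middle = ("or", p, neg(p))

print(forces("w0", excluded_middle))            # is p ∨ ¬p forced at the root?
print(forces("w0", neg(neg(excluded_middle))))  # and its double negation?
```

At the root, p is not yet established, but ¬p fails too because p holds at a later world; this is the standard countermodel to excluded middle.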

Relation to other logics


Intuitionistic logic is related by duality to a paraconsistent logic known as Brazilian, anti-intuitionistic or dual-intuitionistic logic.[7] The subsystem of intuitionistic logic with the FALSE axiom removed is known as minimal logic.

Notes
[1] Proof Theory by G. Takeuti, ISBN 0-444-10492-5
[2] Alexander Chagrov, Michael Zakharyaschev, Modal Logic, vol. 35 of Oxford Logic Guides, Oxford University Press, 1997, pp. 58–59. ISBN 0-19-853779-4.
[3] Sørensen, Morten Heine B; Paweł Urzyczyn (2006). Lectures on the Curry-Howard Isomorphism. Studies in Logic and the Foundations of Mathematics. Elsevier. p. 42. ISBN 0-444-52077-5.
[4] Alfred Tarski, Der Aussagenkalkül und die Topologie, Fundamenta Mathematicae 31 (1938), 103–134. (http://matwbn.icm.edu.pl/tresc.php?wyd=1&tom=31)
[5] Rasiowa, Helena; Roman Sikorski (1963). The Mathematics of Metamathematics. Monografie matematyczne. Warsaw: Państwowe Wydawn. Naukowe. pp. 385–386.
[6] Intuitionistic Logic (http://plato.stanford.edu/entries/logic-intuitionistic/). Written by Joan Moschovakis (http://www.math.ucla.edu/~joan/). Published in Stanford Encyclopedia of Philosophy.
[7] Aoyama, Hiroshi (2004). "LK, LJ, Dual Intuitionistic Logic, and Quantum Logic". Notre Dame Journal of Formal Logic 45 (4): 193–213. doi:10.1305/ndjfl/1099238445.

References
Van Dalen, Dirk, 2001, "Intuitionistic Logic", in Goble, Lou, ed., The Blackwell Guide to Philosophical Logic. Blackwell.
Morten H. Sørensen, Paweł Urzyczyn, 2006, Lectures on the Curry-Howard Isomorphism (chapter 2: "Intuitionistic Logic"). Studies in Logic and the Foundations of Mathematics vol. 149, Elsevier.
W. A. Carnielli (with A. B.M. Brunner). "Anti-intuitionism and paraconsistency" (http://dx.doi.org/10.1016/j.jal.2004.07.016). Journal of Applied Logic Volume 3, Issue 1, March 2005, pages 161-184.

External links
Stanford Encyclopedia of Philosophy: "Intuitionistic Logic" (http://plato.stanford.edu/entries/logic-intuitionistic/) by Joan Moschovakis.
Intuitionistic Logic (http://www.cs.le.ac.uk/people/nb118/Publications/ESSLLI'05.pdf) by Nick Bezhanishvili and Dick de Jongh (from the Institute for Logic, Language and Computation at the University of Amsterdam)
Semantical Analysis of Intuitionistic Logic I (https://www.princeton.edu/~hhalvors/restricted/kripke_intuitionism.pdf) by Saul A. Kripke from Harvard University, Cambridge, Mass., USA
Intuitionistic Logic (http://www.phil.uu.nl/~dvdalen/articles/Blackwell(Dalen).pdf) by Dirk van Dalen
The discovery of E.W. Beth's semantics for intuitionistic logic (http://www.illc.uva.nl/j50/contribs/troelstra/troelstra.pdf) by A.S. Troelstra and P. van Ulsen
Expressing Database Queries with Intuitionistic Logic (ftp://ftp.cs.toronto.edu/pub/bonner/papers/hypotheticals/naclp89.ps) (FTP one-click download) by Anthony J. Bonner, L. Thorne McCarty, Kumar Vadaparty. Rutgers University, Department of Computer Science.

Heyting arithmetic
In mathematical logic, Heyting arithmetic (sometimes abbreviated HA) is an axiomatization of arithmetic in accordance with the philosophy of intuitionism (Troelstra 1973:18). It is named after Arend Heyting, who first proposed it. Heyting arithmetic adopts the axioms of Peano arithmetic (PA), but uses intuitionistic logic as its rules of inference. In particular, the law of the excluded middle does not hold in general, though the induction axiom can be used to prove many specific cases. For instance, one can prove that ∀ x, y ∈ N : x = y ∨ x ≠ y is a theorem (any two natural numbers are either equal to each other, or not equal to each other). In fact, since "=" is the only predicate symbol in Heyting arithmetic, it then follows that, for any quantifier-free formula p, ∀ x, y, z, … ∈ N : p ∨ ¬p is a theorem (where x, y, z, … are the free variables in p).

Kurt Gödel studied the relationship between Heyting arithmetic and Peano arithmetic. He used the Gödel–Gentzen negative translation to prove in 1933 that if HA is consistent, then PA is also consistent.

Heyting arithmetic should not be confused with Heyting algebras, which are the intuitionistic analogue of Boolean algebras.
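The decidability of equality on the natural numbers can be illustrated in a proof assistant. The Lean 4 sketch below is an analogy rather than a formalization of HA itself: it relies on Lean's built-in `Nat` and on the core lemma `Decidable.em`, which gives excluded middle only for propositions that come with a decision procedure and so does not invoke any classical axiom.

```lean
-- Equality of natural numbers is decidable, so this instance of
-- excluded middle is available constructively.
example (x y : Nat) : x = y ∨ x ≠ y :=
  Decidable.em (x = y)
```

The decision procedure behind the scenes compares the two numbers by recursion, which mirrors the way the induction axiom is used to prove this case in Heyting arithmetic.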

References
Ulrich Kohlenbach (2008), Applied proof theory, Springer.
Anne S. Troelstra, ed. (1973), Metamathematical investigation of intuitionistic arithmetic and analysis, Springer, 1973.

External links
Stanford Encyclopedia of Philosophy: "Intuitionistic Number Theory [1]" by Joan Moschovakis. Fragments of Heyting Arithmetic [2] by Wolfgang Burr

References
[1] http://plato.stanford.edu/entries/logic-intuitionistic/#IntNumTheHeyAri
[2] http://wwwmath.uni-muenster.de/u/burr/HA.ps

Intuitionistic type theory


Intuitionistic type theory, or constructive type theory, or Martin-Löf type theory or just Type Theory is a logical system and a set theory based on the principles of mathematical constructivism. Intuitionistic type theory was introduced by Per Martin-Löf, a Swedish mathematician and philosopher, in 1972. Martin-Löf has modified his proposal a few times; his 1971 impredicative formulation was inconsistent as demonstrated by Girard's paradox. Later formulations were predicative. He proposed both intensional and extensional variants of the theory.

Intuitionistic type theory is based on a certain analogy or isomorphism between propositions and types: a proposition is identified with the type of its proofs. This identification is usually called the Curry–Howard isomorphism, which was originally formulated for intuitionistic logic and simply typed lambda calculus. Type Theory extends this identification to predicate logic by introducing dependent types, that is, types which contain values. Type Theory internalizes the interpretation of intuitionistic logic proposed by Brouwer, Heyting and Kolmogorov, the so-called BHK interpretation. The types of Type Theory play a similar role to sets in set theory, but functions definable in Type Theory are always computable.

Connectives of type theory


In the context of Type Theory a connective is a way of constructing types, possibly using already given types. The basic connectives of Type Theory are:

Π-types
Π-types, also called dependent product types, are analogous to the indexed products of sets. As such, they generalize the normal function space to model functions whose result type may vary on their input. E.g. writing Vec(ℝ, n) for n-tuples of real numbers, Π n:ℕ . Vec(ℝ, n) stands for the type of a function that, given a natural number n, returns an n-tuple of real numbers. The usual function space arises as a special case when the range type does not actually depend on the input, e.g., Π n:ℕ . ℝ is the type of functions from natural numbers to the real numbers, which is also written as ℕ → ℝ. Using the Curry–Howard isomorphism, Π-types also serve to model implication and universal quantification: e.g., a term inhabiting Π m,n:ℕ . m + n = n + m is a function which assigns to any pair of natural numbers a proof that addition is commutative for that pair, and hence can be considered as a proof that addition is commutative for all natural numbers.

The generalisation from function type to dependent product type is analogous to the generalisation from exponentiation of natural numbers to indexed products of them. Consider the expression n^m = ∏_{i=1}^{m} n: in this case, the dummy variable i of the product is not mentioned within the term n. However, it is clear that exponentiation can be generalised by allowing terms in the product to mention the dummy variable, i.e., by allowing indexing. In its general form, such a product then becomes: ∏_{i=1}^{m} n_i.

Σ-types
Σ-types, also called dependent sum types, are analogous to the indexed disjoint unions of sets. As such, they generalize the usual Cartesian product to model pairs where the type of the second component depends on the first. For example, the type Σ n:ℕ . Vec(ℝ, n) stands for the type of pairs of a natural number n and an n-tuple of real numbers, i.e., this type can be used to model sequences of arbitrary length (usually called lists). The conventional Cartesian product type arises as a special case when the type of the second component doesn't actually depend on the first, e.g., Σ n:ℕ . ℝ is the type of pairs of a natural number and a real number, which is also written as ℕ × ℝ. Again, using the Curry–Howard isomorphism, Σ-types also serve to model conjunction and existential quantification. The generalisation of the Cartesian product by the Σ-type is analogous to the generalisation of regular multiplication by the indexed sum.
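Dependent products and sums can be tried out directly in a dependently typed language. The Lean 4 sketch below uses natural-number entries instead of reals to stay within Lean's core library; the names `Vec`, `zeros` and `twoNumbers` are hypothetical and chosen only for this illustration.

```lean
-- Length-indexed vectors: an inductive family indexed by a natural number.
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons : {n : Nat} → α → Vec α n → Vec α (n + 1)

-- A dependent product (Π-type): the result type Vec Nat n depends on the input n.
def zeros : (n : Nat) → Vec Nat n
  | 0     => Vec.nil
  | n + 1 => Vec.cons 0 (zeros n)

-- A dependent sum (Σ-type): the type of the second component depends on the first.
def twoNumbers : (n : Nat) × Vec Nat n :=
  ⟨2, Vec.cons 1 (Vec.cons 2 Vec.nil)⟩

-- Under propositions-as-types, a Π-type over an equation is a universal statement.
example : (m n : Nat) → m + n = n + m :=
  Nat.add_comm
```

The last line reads the core lemma `Nat.add_comm` as a function that, given any two numbers, returns a proof of commutativity for that pair.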

Finite types
Of special importance are 0 or ⊥ (the empty type), 1 or ⊤ (the unit type) and 2 (the type of Booleans or classical truth values). Invoking the Curry–Howard isomorphism again, ⊥ stands for False and ⊤ for True. Using finite types we can define negation as ¬A ≡ A → ⊥.

Equality type
Given a, b : A, then a = b is the type of equality proofs that a is equal to b. There is only one (canonical) inhabitant of a = a and this is the proof of reflexivity refl : Π a:A . a = a.

Inductive types
A prime example of an inductive type is the type of natural numbers ℕ, which is generated by 0 : ℕ and succ : ℕ → ℕ. An important application of the propositions as types principle is the identification of (dependent) primitive recursion and induction by one elimination constant:
ℕ-elim : P(0) → (Π n:ℕ . P(n) → P(succ(n))) → Π n:ℕ . P(n)
for any given type P(n) indexed by n : ℕ. In general inductive types can be defined in terms of W-types, the type of well-founded trees.

An important class of inductive types are inductive families like the type of vectors Vec(A, n) mentioned above, which is inductively generated by the constructors vnil : Vec(A, 0) and vcons : A → Π n:ℕ . Vec(A, n) → Vec(A, succ(n)). Applying the Curry–Howard isomorphism once more, inductive families correspond to inductively defined relations.

Universes
An example of a universe is U_0, the universe of all small types, which contains names for all the types introduced so far. To every name a : U_0 we associate a type El(a), its extension or meaning. It is standard to assume a predicative hierarchy of universes: U_n for every natural number n : ℕ, where the universe U_{n+1} contains a code for the previous universe, i.e., we have u_n : U_{n+1} with El(u_n) = U_n. (A hierarchy with this property is called "cumulative".) Stronger universe principles have been investigated, for example super universes and the Mahlo universe. In 1992 Huet and Coquand introduced the calculus of constructions, a type theory with an impredicative universe, thus combining Type Theory with Girard's System F. This extension is not universally accepted by Intuitionists since it allows impredicative, i.e., circular, constructions, which are often identified with classical reasoning.

Formalisation of type theory


This formalization is based on the discussion in Nordström, Petersson, and Smith. The formal theory works with types and objects.
A type is declared by:
A ∈ Type
An object exists and is in a type if:
a ∈ A
Objects can be equal
a = b
and types can be equal
A = B
A type that depends on an object from another type is declared
(x ∈ A)B
and removed by substitution
B[x / a], replacing the variable x with the object a in B.

An object that depends on an object from another type can be done two ways. If the object is "abstracted", then it is written
[x]b
and removed by substitution
b[x / a], replacing the variable x with the object a in b.

The object-depending-on-object can also be declared as a constant as part of a recursive type. An example of a recursive type is:
0 ∈ ℕ
succ ∈ ℕ → ℕ
Here, succ is a constant object-depending-on-object. It is not associated with an abstraction. Constants like succ can be removed by defining equality. Here the relationship with addition is defined using equality and using pattern matching to handle the recursive aspect of succ.
add ∈ (ℕ × ℕ) → ℕ
add(0, b) = b
add(succ(a), b) = succ(add(a, b))
succ is manipulated as an opaque constant: it has no internal structure for substitution.

So, objects and types and these relations are used to express formulae in the theory. The following styles of judgements are used to create new objects, types and relations from existing ones.
Γ ⊢ σ Type: σ is a well-formed type in context Γ.
Γ ⊢ t : σ: t is a well-formed term of type σ in context Γ.
Γ ⊢ σ ≡ τ: σ and τ are equal types in context Γ.
Γ ⊢ t ≡ u : σ: t and u are equal terms of type σ in context Γ.
⊢ Γ Context: Γ is a well-formed context of typing assumptions.

By convention, there is a type that represents all other types. It is called U (or Set). Since U is a type, the members of it are objects. There is a dependent type El that maps each object to its corresponding type. In most texts El is never written. From the context of the statement, a reader can almost always tell whether A refers to a type, or whether it refers to the object in U that corresponds to the type.

This is the complete foundation of the theory. Everything else is derived.

To implement logic, each proposition is given its own type. The objects in those types represent the different possible ways to prove the proposition. Obviously, if there is no proof for the proposition, then the type has no objects in it. Operators like "and" and "or" that work on propositions introduce new types and new objects. So A × B is a type that depends on the type A and the type B. The objects in that dependent type are defined to exist for every pair of objects in A and B. Obviously, if A or B has no proof and is an empty type, then the new type representing A × B is also empty.

This can be done for other types (booleans, natural numbers, etc.) and their operators.
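The ideas in this section can be sketched in Lean 4, used here as a stand-in for the formal theory rather than as the system of Nordström, Petersson and Smith. The type `N`, the function `add` and the hypothesis names are assumptions of the example: natural numbers are generated by two constants, addition is defined by pattern matching on them, and a proof of a conjunction is a pair of proofs.

```lean
-- A recursive type generated by two constants.
inductive N where
  | zero : N
  | succ : N → N

-- Addition defined by equations that pattern match on the first argument.
def add : N → N → N
  | N.zero,   b => b
  | N.succ a, b => N.succ (add a b)

-- A proof of "A and B" is a pair consisting of a proof of A and a proof of B.
example (A B : Prop) (a : A) (b : B) : A ∧ B :=
  ⟨a, b⟩

-- If A has no proofs, then the type of proofs of "A and B" is empty as well.
example (A B : Prop) (hA : A → False) : (A ∧ B) → False :=
  fun ab => hA ab.left
```

In the second example, the absence of proofs of A is itself expressed as a function into the empty proposition `False`.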

Categorical models of type theory


Using the language of category theory, R.A.G. Seely introduced the notion of a locally cartesian closed category (LCCC) as the basic model of Type Theory. This has been refined by Hofmann and Dybjer to Categories with Families or Categories with Attributes based on earlier work by Cartmell.

A category with families is a category C of contexts (in which the objects are contexts, and the context morphisms are substitutions), together with a functor T : C^op → Fam(Set). Fam(Set) is the category of families of Sets, in which objects are pairs (A,B) of an "index set" A and a function B: X → A, and morphisms are pairs of functions f : A → A' and g : X → X', such that B' ∘ g = f ∘ B; in other words, g maps the fibre B_a into the fibre B'_f(a). The functor T assigns to a context G a set Ty(G) of types, and for each A : Ty(G), a set Tm(G,A) of terms. The axioms for a functor require that these play harmoniously with substitution. Substitution is usually written in the form Af or af, where A is a type in Ty(G) and a is a term in Tm(G,A), and f is a substitution from D to G. Here Af : Ty(D) and af : Tm(D,Af).

The category C must contain a terminal object (the empty context), and a final object for a form of product called comprehension, or context extension, in which the right element is a type in the context of the left element. If G is a context, and A : Ty(G), then there should be an object (G,A) final among contexts D with mappings p : D → G, q : Tm(D,Ap).

A logical framework, such as Martin-Löf's, takes the form of closure conditions on the context dependent sets of types and terms: that there should be a type called Set, and for each set a type, that the types should be closed under forms of dependent sum and product, and so forth. A theory such as that of predicative set theory expresses closure conditions on the types of sets and their elements: that they should be closed under operations that reflect dependent sum and product, and under various forms of inductive definition.

Extensional versus intensional


A fundamental distinction is extensional vs intensional Type Theory. In extensional Type Theory, definitional (i.e., computational) equality is not distinguished from propositional equality, which requires proof. As a consequence, type checking becomes undecidable in extensional type theory. This is because relying on computational equality means that the equality depends on computations that could be Turing complete in general, and thus the equality itself is undecidable due to the halting problem. Some type theories enforce the restriction that all computations be decidable so that definitional equality may be used. In contrast, in intensional Type Theory type checking is decidable, but the representation of standard mathematical concepts is somewhat complex, since extensional reasoning requires using setoids or similar constructions. It is a subject of current discussion whether this tradeoff is unavoidable and whether the lack of extensional principles in intensional Type Theory is a feature or a bug.
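The gap between definitional and propositional equality can be seen in a small Lean 4 example. Lean is an intensional system, so it serves here only as an illustration of the intensional side of the distinction; the two lemmas below are from Lean's core library, where addition on `Nat` recurses on its second argument.

```lean
-- n + 0 reduces to n by computation, so reflexivity is accepted by the checker.
example (n : Nat) : n + 0 = n := rfl

-- 0 + n does not reduce for a variable n; the equation still holds,
-- but only propositionally, via a lemma proved by induction.
example (n : Nat) : 0 + n = n := Nat.zero_add n
```

In an extensional theory the second equation could be used wherever a definitional equality is expected, which is what makes type checking there depend on arbitrary proofs.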

Implementations of type theory


Type Theory has been the base of a number of proof assistants, such as NuPRL, LEGO and Coq. Recently, dependent types also featured in the design of programming languages such as ATS, Cayenne, Epigram and Agda.

References
Per Martin-Löf (1984). Intuitionistic Type Theory [1] Bibliopolis. ISBN 88-7088-105-9.

Further reading
Bengt Nordström; Kent Petersson; Jan M. Smith (1990). Programming in Martin-Löf's Type Theory. Oxford University Press. The book is out of print, but a free version can be picked up from here [2].
Thompson, Simon (1991). Type Theory and Functional Programming [3] Addison-Wesley. ISBN 0-201-41667-0.
Granström, Johan G. (2011). Treatise on Intuitionistic Type Theory [4] Springer. ISBN 978-94-007-1735-0.

External links
EU Types Project: Tutorials [5] - lecture notes and slides from the Types Summer School 2005 n-Categories - Sketch of a Definition [6] - letter from John Baez and James Dolan to Ross Street, November 29, 1995

References
[1] http://intuitionistic.files.wordpress.com/2010/07/martin-lof-tt.pdf
[2] http://www.cs.chalmers.se/Cs/Research/Logic/book/
[3] http://www.cs.kent.ac.uk/people/staff/sjt/TTFP/
[4] http://www.springer.com/philosophy/book/978-94-007-1735-0
[5] http://www.cs.chalmers.se/Cs/Research/Logic/Types/tutorials.html
[6] http://math.ucr.edu/home/baez/ncat.def.html

Constructive set theory


Constructive set theory is an approach to mathematical constructivism following the program of axiomatic set theory. That is, it uses the usual first-order language of classical set theory, and although of course the logic is constructive, there is no explicit use of constructive types. Rather, there are just sets, thus it can look very much like classical mathematics done on the most common foundations, namely the Zermelo–Fraenkel axioms (ZFC).

Intuitionistic Zermelo–Fraenkel
In 1973, John Myhill proposed a system of set theory based on intuitionistic logic[1] taking the most common foundation, ZFC, and throwing away the axiom of choice (AC) and the law of the excluded middle (LEM), leaving everything else as is. However, different forms of some of the ZFC axioms which are equivalent in the classical setting are inequivalent in the constructive setting, and some forms imply LEM. The system, which has come to be known as IZF, or Intuitionistic Zermelo–Fraenkel (ZF refers to ZFC without the axiom of choice), has the usual axioms of extensionality, pairing, union, infinity, separation and power set. The axiom of regularity is stated in the form of an axiom schema of set induction. Also, while Myhill used the axiom schema of replacement in his system, IZF usually stands for the version with collection.

While the axiom of replacement requires the relation φ to be a function over the set A (that is, for every x in A there is associated exactly one y), the axiom of collection does not: it merely requires there be associated at least one y, and it asserts the existence of a set which collects at least one such y for each such x. The axiom of regularity as it is normally stated implies LEM, whereas the form of set induction does not. The formal statements of these two schemata are:
Collection: ∀A [(∀x ∈ A ∃y φ(x,y)) → ∃B ∀x ∈ A ∃y ∈ B φ(x,y)]
Set induction: (∀x [(∀y ∈ x φ(y)) → φ(x)]) → ∀x φ(x)

Adding LEM back to IZF results in ZF, as LEM makes collection equivalent to replacement and set induction equivalent to regularity. Even without LEM, IZF's proof-theoretical power equals that of ZF.

Predicativity
While IZF is based on constructive rather than classical logic, it is considered impredicative. It allows formation of sets using the axiom of separation with any proposition, including ones which contain quantifiers which are not bounded. Thus new sets can be formed in terms of the universe of all sets. Additionally the power set axiom implies the existence of a set of truth values. In the presence of LEM, this set exists and has two elements. In the absence of it, the set of truth values is also considered impredicative.

Myhill's constructive set theory


The subject was begun by John Myhill to provide a formal foundation for Errett Bishop's program of constructive mathematics. As he presented it, Myhill's system CST is a constructive first-order logic with three sorts: natural numbers, functions, and sets. The system is:
Constructive first-order predicate logic with identity, and basic axioms related to the three sorts.
The usual Peano axioms for natural numbers.
The usual axiom of extensionality for sets, as well as one for functions, and the usual axiom of union.
A form of the axiom of infinity asserting that the collection of natural numbers (for which he introduces a constant N) is in fact a set.
Axioms asserting that the domain and range of a function are both sets. Additionally, an axiom of non-choice asserts the existence of a choice function in cases where the choice is already made. Together these act like the usual replacement axiom in classical set theory.
The axiom of exponentiation, asserting that for any two sets, there is a third set which contains all (and only) the functions whose domain is the first set, and whose range is the second set. This is a greatly weakened form of the axiom of power set in classical set theory, to which Myhill, among others, objected on the grounds of its impredicativity.
The axiom of restricted, or predicative, separation, which is a weakened form of the separation axiom in classical set theory, requiring that any quantifications be bounded to another set.
An axiom of dependent choice, which is much weaker than the usual axiom of choice.

Aczel's constructive Zermelo–Fraenkel


Peter Aczel's constructive Zermelo-Fraenkel,[2] or CZF, is essentially IZF with its impredicative features removed. It strengthens the collection scheme, and then drops the impredicative power set axiom and replaces it with another collection scheme. Finally the separation axiom is restricted, as in Myhill's CST. This theory has a relatively simple interpretation in a version of constructive type theory and has modest proof-theoretic strength as well as a fairly direct constructive and predicative justification, while retaining the language of set theory. Adding LEM to this theory also recovers full ZF.

The collection axioms are:

Strong collection schema: This is the constructive replacement for the axiom schema of replacement. It states that if φ is a binary relation between sets which is total over a certain domain set (that is, it has at least one image of every element in the domain), then there exists a set which contains at least one image under φ of every element of the domain, and only images of elements of the domain. Formally, for any formula φ:

\[ \forall a\,\Bigl[\bigl(\forall x \in a\;\exists y\;\varphi(x,y)\bigr) \to \exists b\,\Bigl(\bigl(\forall x \in a\;\exists y \in b\;\varphi(x,y)\bigr) \land \bigl(\forall y \in b\;\exists x \in a\;\varphi(x,y)\bigr)\Bigr)\Bigr] \]

Subset collection schema: This is the constructive version of the power set axiom. Formally, for any formula ψ:

\[ \forall a\,\forall b\;\exists u\;\forall z\,\Bigl[\bigl(\forall x \in a\;\exists y \in b\;\psi(x,y,z)\bigr) \to \exists v \in u\,\Bigl(\bigl(\forall x \in a\;\exists y \in v\;\psi(x,y,z)\bigr) \land \bigl(\forall y \in v\;\exists x \in a\;\psi(x,y,z)\bigr)\Bigr)\Bigr] \]

This is equivalent to a single and somewhat clearer axiom of fullness: between any two sets a and b, there is a set c which contains a total subrelation of any total relation between a and b that can be encoded as a set of ordered pairs. Formally:

\[ \forall a\,\forall b\;\exists c\,\Bigl[c \subseteq P(a,b) \land \forall R \in P(a,b)\;\exists S \in c\;(S \subseteq R)\Bigr] \]

where the references to P(a,b) are defined by:

\[ R \in P(a,b) \iff \bigl(\forall x \in a\;\exists y \in b\;\langle x,y\rangle \in R\bigr) \land \bigl(\forall z \in R\;\exists x \in a\;\exists y \in b\;z = \langle x,y\rangle\bigr) \]
\[ c \subseteq P(a,b) \iff \forall R \in c\;\bigl(R \in P(a,b)\bigr) \]

and some set-encoding of the ordered pair <x,y> is assumed. The axiom of fullness implies CST's axiom of exponentiation: given two sets, the collection of all total functions from one to the other is also in fact a set.

The remaining axioms of CZF are: the axioms of extensionality, pairing, union, and infinity are the same as in ZF; and set induction and predicative separation are the same as above.


Interpretability in type theory


In 1977 Aczel showed that CZF can be interpreted in Martin-Löf type theory[3] (using the now standard propositions-as-types approach), providing what is now seen as a standard model of CZF in type theory.[4] In 1989 Ingrid Lindström showed that non-well-founded sets, obtained by replacing the axiom of foundation in CZF with Aczel's anti-foundation axiom (CZFA), can also be interpreted in Martin-Löf type theory.[5]

Interpretability in category theory


Presheaf models for constructive set theory were introduced by Nicola Gambino in 2004. They are analogous to the presheaf models for intuitionistic set theory developed by Dana Scott in the 1980s (which remained unpublished).[6][7]

References
[1] Myhill, "Some properties of Intuitionistic Zermelo-Fraenkel set theory", Proceedings of the 1971 Cambridge Summer School in Mathematical Logic (Lecture Notes in Mathematics 337) (1973), pp. 206-231.
[2] Peter Aczel and Michael Rathjen, Notes on Constructive Set Theory (http://www.ml.kva.se/preprints/meta/AczelMon_Sep_24_09_16_56.rdf.html), Reports Institut Mittag-Leffler, Mathematical Logic - 2000/2001, No. 40.
[3] Aczel, Peter: 1978. The type theoretic interpretation of constructive set theory. In: A. MacIntyre et al. (eds.), Logic Colloquium '77, Amsterdam: North-Holland, 55-66.
[4] Rathjen, M. (2004), "Predicativity, Circularity, and Anti-Foundation" (http://www1.maths.leeds.ac.uk/~rathjen/russelle.pdf), in Link, Godehard, One Hundred Years of Russell's Paradox: Mathematics, Logic, Philosophy, Walter de Gruyter, ISBN 978-3-11-019968-0.
[5] Lindström, Ingrid: 1989. A construction of non-well-founded sets within Martin-Löf type theory. Journal of Symbolic Logic 54: 57-64.
[6] Gambino, N. (2005). "Presheaf models for constructive set theories" (http://www.math.unipa.it/~ngambino/Research/Papers/presheaf.pdf). In Laura Crosilla and Peter Schuster. From Sets and Types to Topology and Analysis. pp. 62-96. doi:10.1093/acprof:oso/9780198566519.003.0004. ISBN 9780198566519.
[7] Scott, D. S. (1985). Category-theoretic models for Intuitionistic Set Theory. Manuscript slides of a talk given at Carnegie Mellon University.

Further reading
Troelstra, Anne; van Dalen, Dirk (1988). Constructivism in Mathematics, Vol. 2. Studies in Logic and the Foundations of Mathematics. p. 619. ISBN 0-444-70358-6.
Aczel, P. and Rathjen, M. (2001). Notes on constructive set theory (http://www.ml.kva.se/preprints/meta/AczelMon_Sep_24_09_16_56.rdf.html). Technical Report 40, 2000/2001. Mittag-Leffler Institute, Sweden.

External links
Laura Crosilla, Set Theory: Constructive and Intuitionistic ZF (http://plato.stanford.edu/entries/set-theory-constructive/), Stanford Encyclopedia of Philosophy, Feb 20, 2009.
Benno van den Berg, Constructive set theory: an overview (http://www.illc.uva.nl/KNAW/Heyting/uploaded_files/inlineitem/vdberg-slides.pdf), slides from Heyting dag, Amsterdam, 7 September 2012.


Constructive analysis
In mathematics, constructive analysis is mathematical analysis done according to the principles of constructive mathematics. This contrasts with classical analysis, which (in this context) simply means analysis done according to the (ordinary) principles of classical mathematics. Generally speaking, constructive analysis can reproduce theorems of classical analysis, but only in application to separable spaces; also, some theorems may need to be approached by approximations. Furthermore, many classical theorems can be stated in ways that are logically equivalent according to classical logic, but not all of these forms will be valid in constructive analysis, which uses intuitionistic logic.

Examples
The intermediate value theorem
For a simple example, consider the intermediate value theorem (IVT). In classical analysis, IVT says that, given any continuous function f from a closed interval [a,b] to the real line R, if f(a) is negative while f(b) is positive, then there exists a real number c in the interval such that f(c) is exactly zero. In constructive analysis, this does not hold, because the constructive interpretation of existential quantification ("there exists") requires one to be able to construct the real number c (in the sense that it can be approximated to any desired precision by a rational number). But if f hovers near zero during a stretch along its domain, then this cannot necessarily be done.

However, constructive analysis provides several alternative formulations of IVT, all of which are equivalent to the usual form in classical analysis, but not in constructive analysis. For example, under the same conditions on f as in the classical theorem, given any natural number n (no matter how large), there exists (that is, we can construct) a real number c_n in the interval such that the absolute value of f(c_n) is less than 1/n. That is, we can get as close to zero as we like, even if we can't construct a c that gives us exactly zero.

Alternatively, we can keep the same conclusion as in the classical IVT (a single c such that f(c) is exactly zero) while strengthening the conditions on f. We require that f be locally non-zero, meaning that given any point x in the interval [a,b] and any natural number m, there exists (we can construct) a real number y in the interval such that |y - x| < 1/m and |f(y)| > 0. In this case, the desired number c can be constructed. This is a complicated condition, but there are several other conditions which imply it and which are commonly met; for example, every analytic function is locally non-zero (assuming that it already satisfies f(a) < 0 and f(b) > 0).
For another way to view this example, notice that according to classical logic, if the locally non-zero condition fails, then it must fail at some specific point x; and then f(x) will equal 0, so that IVT is valid automatically. Thus in classical analysis, which uses classical logic, in order to prove the full IVT, it is sufficient to prove the constructive version. From this perspective, the full IVT fails in constructive analysis simply because constructive analysis does not accept classical logic. Conversely, one may argue that the true meaning of IVT, even in classical mathematics, is the constructive version involving the locally non-zero condition, with the full IVT following by "pure logic" afterwards. Some logicians, while accepting that classical mathematics is correct, still believe that the constructive approach gives a better insight into the true meaning of theorems, in much this way.
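The approximate form of IVT described above (finding c_n with |f(c_n)| < 1/n) can be illustrated by a bisection search. The following is only a minimal sketch under simplifying assumptions: f is assumed to be continuous and exactly computable on rational inputs, so that the sign tests are decidable, which is not true of arbitrary constructive real functions. The names approx_root and max_steps are hypothetical and chosen for illustration.

```python
from fractions import Fraction

def approx_root(f, a, b, n, max_steps=10_000):
    """Return a rational c in [a, b] with |f(c)| < 1/n.

    Assumes f(a) < 0 < f(b) and that f is continuous and can be
    evaluated exactly on rationals (an assumption of this sketch).
    """
    lo, hi = Fraction(a), Fraction(b)
    eps = Fraction(1, n)
    if not (f(lo) < 0 < f(hi)):
        raise ValueError("need f(a) < 0 < f(b)")
    for _ in range(max_steps):
        mid = (lo + hi) / 2
        value = f(mid)
        if abs(value) < eps:
            return mid          # close enough to zero: this is c_n
        if value < 0:
            lo = mid            # keep the invariant f(lo) < 0
        else:
            hi = mid            # keep the invariant f(hi) > 0
    raise RuntimeError("step limit reached")

# Example: f(x) = x^2 - 2 on [0, 2], accuracy 1/1000.
c = approx_root(lambda x: x * x - 2, 0, 2, 1000)
```

The search never claims to find an exact zero; it stops as soon as the value at the midpoint is within 1/n of zero, which matches the weaker conclusion of the constructive theorem.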

The least upper bound principle and compact sets


Another difference between classical and constructive analysis is that constructive analysis does not accept the least upper bound principle, that any subset of the real line R has a least upper bound (or supremum), possibly infinite. However, as with the intermediate value theorem, an alternative version survives; in constructive analysis, any located subset of the real line has a supremum. (Here a subset S of R is located if, whenever x < y are real numbers, either there exists an element s of S such that x < s, or y is an upper bound of S.) Again, this is classically equivalent to the full least upper bound principle, since every set is located in classical mathematics. And again, while the definition of located set is complicated, it is nevertheless satisfied by several commonly studied sets, including all intervals and compact sets.

Closely related to this, in constructive mathematics, fewer characterisations of compact spaces are constructively valid; or, from another point of view, there are several different concepts which are classically equivalent but not constructively equivalent. Indeed, if the interval [a,b] were sequentially compact in constructive analysis, then the classical IVT would follow from the first constructive version in the example; one could find c as a cluster point of the infinite sequence (c_n).

Uncountability of the real numbers


A constructive version of "the famous theorem of Cantor, that the real numbers are uncountable" is: "Let {a_n} be a sequence of real numbers. Let x_0 and y_0 be real numbers, x_0 < y_0. Then there exists a real number x with x_0 ≤ x ≤ y_0 and x ≠ a_n (n ∈ Z+)... The proof is essentially Cantor's 'diagonal' proof." (Theorem 1 in Errett Bishop, Foundations of Constructive Analysis, 1967, page 25.)
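One common way to realise such a construction is by nested intervals: at step n the current interval is cut into thirds and a closed third that misses a_n is kept. The sketch below is an illustration under simplifying assumptions, not Bishop's proof as written: it works with a finite list of rationals so that comparisons are decidable, and the name avoid_sequence is hypothetical.

```python
from fractions import Fraction

def avoid_sequence(seq, x0, y0):
    """Shrink [x0, y0] to a closed subinterval containing no term of seq.

    Assumes x0 < y0 and that the terms are rationals (an assumption of
    this sketch, so that the comparison below is decidable).
    """
    lo, hi = Fraction(x0), Fraction(y0)
    if not lo < hi:
        raise ValueError("need x0 < y0")
    for a in seq:
        third = (hi - lo) / 3
        if a > lo + third:
            hi = lo + third     # a lies to the right of the left third
        else:
            lo = hi - third     # a <= lo + third, so a misses the right third
    return lo, hi

terms = [Fraction(k, 7) for k in range(8)]
lo, hi = avoid_sequence(terms, 0, 1)
```

Every point of the returned interval differs from every term processed so far; for an infinite sequence the nested intervals shrink to a single real number x with the required property.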

Zermelo-Fraenkel set theory


In mathematics, Zermelo-Fraenkel set theory with the axiom of choice, named after mathematicians Ernst Zermelo and Abraham Fraenkel and commonly abbreviated ZFC, is one of several axiomatic systems that were proposed in the early twentieth century to formulate a theory of sets without the paradoxes of naive set theory such as Russell's paradox. Specifically, ZFC does not allow unrestricted comprehension. Today ZFC is the standard form of axiomatic set theory and as such is the most common foundation of mathematics.

ZFC is intended to formalize a single primitive notion, that of a hereditary well-founded set, so that all entities in the universe of discourse are such sets. Thus the axioms of ZFC refer only to sets, not to urelements (elements of sets which are not themselves sets) or classes (collections of mathematical objects defined by a property shared by their members). The axioms of ZFC prevent its models from containing urelements, and proper classes can only be treated indirectly.

Formally, ZFC is a one-sorted theory in first-order logic. The signature has equality and a single primitive binary relation, set membership, which is usually denoted ∈. The formula a ∈ b means that the set a is a member of the set b (which is also read, "a is an element of b" or "a is in b").

There are many equivalent formulations of the ZFC axioms. Most of the ZFC axioms state the existence of particular sets defined from other sets. For example, the axiom of pairing says that given any two sets a and b there is a new set {a, b} containing exactly a and b. Other axioms describe properties of set membership. A goal of the ZFC axioms is that each axiom should be true if interpreted as a statement about the collection of all sets in the von Neumann universe (also known as the cumulative hierarchy).

The metamathematics of ZFC has been extensively studied. Landmark results in this area established the independence of the continuum hypothesis from ZFC, and of the axiom of choice from the remaining ZFC axioms.

History
In 1908, Ernst Zermelo proposed the first axiomatic set theory, Zermelo set theory. However, as first pointed out by Abraham Fraenkel in a 1921 letter to Zermelo, this theory was incapable of proving the existence of certain sets and cardinal numbers whose existence was taken for granted by most set theorists of the time, notably, the cardinal number ℵ_ω and, where Z_0 is any infinite set and 𝒫 is the power set operation, the set {Z_0, 𝒫(Z_0), 𝒫(𝒫(Z_0)), ...} (Ebbinghaus 2007, p. 136). Moreover, one of Zermelo's axioms invoked a concept, that of a "definite" property, whose operational meaning was not clear. In 1922, Fraenkel and Thoralf Skolem independently proposed operationalizing a "definite" property as one that could be formulated as a first-order formula whose atomic formulas were limited to set membership and identity. They also independently proposed replacing the axiom schema of specification with the axiom schema of replacement. Appending this schema, as well as the axiom of regularity (first proposed by Dimitry Mirimanoff in 1917), to Zermelo set theory yields the theory denoted by ZF. Adding to ZF either the axiom of choice (AC) or a statement that is equivalent to it yields ZFC.

The axioms
There are many equivalent formulations of the ZFC axioms; for a rich but somewhat dated discussion of this fact, see Fraenkel et al. (1973). The following particular axiom set is from Kunen (1980). The axioms per se are expressed in the symbolism of first-order logic. The associated English prose is only intended to aid the intuition.

All formulations of ZFC imply that at least one set exists. Kunen includes an axiom that directly asserts the existence of a set, in addition to the axioms given below (although he notes that he does so only for emphasis (ibid., p. 10)). Its omission here can be justified in two ways. First, in the standard semantics of first-order logic in which ZFC is typically formalized, the domain of discourse must be nonempty. Hence, it is a logical theorem of first-order logic that something exists, usually expressed as the assertion that something is identical to itself, ∃x(x=x). Consequently, it is a theorem of every first-order theory that something exists. However, as noted above, because in the intended semantics of ZFC there are only sets, the interpretation of this logical theorem in the context of ZFC is that some set exists. Hence, there is no need for a separate axiom asserting that a set exists. Second, even if ZFC is formulated in so-called free logic, in which it is not a theorem that something exists, the axiom of infinity (below) asserts that an infinite set exists. This implies that a set exists and so, once again, it is superfluous to include an axiom asserting as much.

1. Axiom of extensionality
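Two sets are equal (are the same set) if they have the same elements.

\[ \forall x\,\forall y\,\bigl[\forall z\,(z \in x \Leftrightarrow z \in y) \Rightarrow x = y\bigr] \]

The converse of this axiom follows from the substitution property of equality. If the background logic does not include equality "=", x=y may be defined as an abbreviation for the following formula (Hatcher 1982, p. 138, def. 1):

\[ \forall z\,[z \in x \Leftrightarrow z \in y] \land \forall w\,[x \in w \Leftrightarrow y \in w] \]

In this case, the axiom of extensionality can be reformulated as

\[ \forall x\,\forall y\,\bigl[\forall z\,(z \in x \Leftrightarrow z \in y) \Rightarrow \forall w\,(x \in w \Leftrightarrow y \in w)\bigr] \]

which says that if x and y have the same elements, then they belong to the same sets (Fraenkel et al. 1973).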

2. Axiom of regularity (also called the Axiom of foundation)


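Every non-empty set x contains a member y such that x and y are disjoint sets.

\[ \forall x\,\bigl[\exists a\,(a \in x) \Rightarrow \exists y\,\bigl(y \in x \land \lnot\exists z\,(z \in y \land z \in x)\bigr)\bigr] \]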

3. Axiom schema of specification (also called the axiom schema of separation or of restricted comprehension)
If z is a set, and φ is any property which may characterize the elements x of z, then there is a subset y of z containing those x in z which satisfy the property. The "restriction" to z is necessary to avoid Russell's paradox and its variants. More formally, let φ be any formula in the language of ZFC with free variables among x, z, w_1, ..., w_n, so that y is not free in φ. Then:

\[ \forall z\,\forall w_1 \ldots \forall w_n\;\exists y\;\forall x\,\bigl[x \in y \Leftrightarrow (x \in z \land \varphi)\bigr] \]

In some other axiomatizations of ZF, this axiom is redundant in that it follows from the axiom schema of replacement.

The set constructed by the axiom of specification is often denoted using set builder notation. Given a set z and a formula φ(x) with one free variable x, the set of all x in z that satisfy φ is denoted

\[ \{x \in z : \varphi(x)\} \]

The axiom of specification can be used to prove the existence of the empty set, denoted ∅, once the existence of at least one set is established (see above). A common way to do this is to use an instance of specification for a property which no set has. For example, if w is a set which already exists, the empty set can be constructed as

\[ \varnothing = \{u \in w \mid (u \in u) \land \lnot(u \in u)\} \]

If the background logic includes equality, it is also possible to define the empty set as

\[ \varnothing = \{u \in w \mid \lnot(u = u)\} \]

Thus the axiom of the empty set is implied by the nine axioms presented here. The axiom of extensionality implies the empty set is unique (does not depend on w). It is common to make a definitional extension that adds the symbol ∅ to the language of ZFC.
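For finite sets, specification behaves like a filter over an existing set. The following sketch is only an analogy, assuming sets are modelled as Python frozensets and properties as Boolean functions; the name specify is hypothetical.

```python
def specify(z, phi):
    """Return {x in z : phi(x)}; the result is always a subset of z."""
    return frozenset(x for x in z if phi(x))

# Any existing set w yields the empty set via a property nothing satisfies.
w = frozenset({1, 2, 3})
empty = specify(w, lambda u: u != u)

# An ordinary instance: the even members of {0, ..., 9}.
evens = specify(frozenset(range(10)), lambda x: x % 2 == 0)
```

Because the result is carved out of a set that already exists, no "set of all sets with property φ" is ever formed, which is the point of the restriction to z.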

4. Axiom of pairing
If x and y are sets, then there exists a set which contains x and y as elements.

\[ \forall x\,\forall y\;\exists z\,(x \in z \land y \in z) \]

The axiom schema of specification must be used to reduce this to a set with exactly these two elements. This axiom is part of Z, but is redundant in ZF because it follows from the axiom schema of replacement, if we are given a set with at least two elements. The existence of a set with at least two elements is assured by either the axiom of infinity, or by the axiom schema of specification and the axiom of the power set applied twice to any set.

5. Axiom of union
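For any set ℱ there is a set A containing every set that is a member of some member of ℱ.

\[ \forall \mathcal{F}\;\exists A\;\forall Y\,\forall x\,\bigl[(x \in Y \land Y \in \mathcal{F}) \Rightarrow x \in A\bigr] \]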

6. Axiom schema of replacement


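Let φ be any formula in the language of ZFC whose free variables are among x, y, A, w_1, ..., w_n, so that in particular B is not free in φ. Then:

\[ \forall A\,\forall w_1 \ldots \forall w_n\,\Bigl[\forall x\,\bigl(x \in A \Rightarrow \exists!\,y\;\varphi\bigr) \Rightarrow \exists B\;\forall x\,\bigl(x \in A \Rightarrow \exists y\,(y \in B \land \varphi)\bigr)\Bigr] \]

Less formally, this axiom states that if the domain of a definable function f (represented here by the relation φ) is a set (denoted here by A), and f(x) is a set for any x in that domain, then the range of f is a subclass of a set (where the set is denoted here by B). The form stated here, in which B may be larger than strictly necessary, is sometimes called the axiom schema of collection.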


7. Axiom of infinity
Let S(w) abbreviate w ∪ {w}, where w is some set. (We can see that {w} is a valid set by applying the axiom of pairing with x = y = w, so that the set z is {w}.) Then there exists a set X such that the empty set ∅ is a member of X and, whenever a set y is a member of X, then S(y) is also a member of X.

\[ \exists X\,\bigl[\varnothing \in X \land \forall y\,(y \in X \Rightarrow S(y) \in X)\bigr] \]

More colloquially, there exists a set X having infinitely many members. The minimal set X satisfying the axiom of infinity is the von Neumann ordinal ω, which can also be thought of as the set of natural numbers ℕ.
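The successor operation S(w) = w ∪ {w} can be tried out on hereditarily finite sets. The sketch below assumes sets are modelled as nested Python frozensets; the names successor and von_neumann are hypothetical, and only finitely many ordinals can be built this way, whereas the axiom asserts a set containing all of them.

```python
def successor(w):
    """S(w) = w ∪ {w}."""
    return w | frozenset({w})

def von_neumann(n):
    """The n-th finite von Neumann ordinal: 0 = ∅, n + 1 = S(n)."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x

three = von_neumann(3)
```

In this encoding the ordinal n has exactly n members, namely the ordinals below it, so "m < n" coincides with both "m ∈ n" and "m is a proper subset of n".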

8. Axiom of power set


Let z ⊆ x abbreviate ∀q(q ∈ z ⇒ q ∈ x). For any set x, there is a set y which is a superset of the power set of x. The power set of x is the class whose members are all of the subsets of x.

\[ \forall x\;\exists y\;\forall z\,\bigl[z \subseteq x \Rightarrow z \in y\bigr] \]

Axioms 1-8 define ZF. Alternative forms of these axioms are often encountered, some of which are listed in Jech (2003). Some ZF axiomatizations include an axiom asserting that the empty set exists. The axioms of pairing, union, replacement, and power set are often stated so that the members of the set x whose existence is being asserted are just those sets which the axiom asserts x must contain.

The following axiom is added to turn ZF into ZFC:

9. Well-ordering theorem
For any set X, there is a binary relation R which well-orders X. This means R is a linear order on X such that every nonempty subset of X has a member which is minimal under R.

\[ \forall X\;\exists R\,(R \text{ well-orders } X) \]

Given axioms 1-8, there are many statements provably equivalent to axiom 9, the best known of which is the axiom of choice (AC), which goes as follows. Let X be a set whose members are all non-empty. Then there exists a function f from X to the union of the members of X, called a "choice function", such that for all Y ∈ X one has f(Y) ∈ Y. Since the existence of a choice function when X is a finite set is easily proved from axioms 1-8, AC only matters for certain infinite sets. AC is characterized as nonconstructive because it asserts the existence of a choice set but says nothing about how the choice set is to be "constructed." Much research has sought to characterize the definability (or lack thereof) of certain sets whose existence AC asserts.
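The finite case mentioned above can be made concrete: when an explicit selection rule is available, a choice function can simply be written down. This is a minimal sketch, assuming a finite family of non-empty finite sets of integers and using "take the least element" as the rule; the name choice_function is hypothetical.

```python
def choice_function(family):
    """Return a dict f with f[Y] in Y for every Y in a finite family."""
    f = {}
    for Y in family:
        if not Y:
            raise ValueError("every member of the family must be non-empty")
        f[Y] = min(Y)   # an explicit rule, so no appeal to AC is needed
    return f

X = {frozenset({3, 1}), frozenset({2}), frozenset({5, 4, 9})}
f = choice_function(X)
```

AC becomes a genuine assumption only when no such uniform rule can be defined, as can happen for infinite families of sets with no distinguished elements.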

Motivation via the cumulative hierarchy


One motivation for the ZFC axioms is the cumulative hierarchy of sets introduced by John von Neumann (Shoenfield 1977, sec. 2). In this viewpoint, the universe of set theory is built up in stages, with one stage for each ordinal number. At stage 0 there are no sets yet. At each following stage, a set is added to the universe if all of its elements have been added at previous stages. Thus the empty set is added at stage 1, and the set containing the empty set is added at stage 2; see Hinman (2005, p. 467). The collection of all sets that are obtained in this way, over all the stages, is known as V. The sets in V can be arranged into a hierarchy by assigning to each set the first stage at which that set was added to V.

It is provable that a set is in V if and only if the set is pure and well-founded; and provable that V satisfies all the axioms of ZFC, if the class of ordinals has appropriate reflection properties. For example, suppose that a set x is added at stage α, which means that every element of x was added at a stage earlier than α. Then every subset of x is also added at stage α, because all elements of any subset of x were also added before stage α. This means that any subset of x which the axiom of separation can construct is added at stage α, and that the powerset of x will be added at the next stage after α. For a complete argument that V satisfies ZFC see Shoenfield (1977).
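The first few finite stages of the hierarchy can be computed directly. The sketch below assumes sets are modelled as nested Python frozensets and takes stage n + 1 to be the power set of stage n, matching the description above for finite stages; the names powerset and stage are hypothetical.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, each as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return frozenset(frozenset(c) for c in subsets)

def stage(n):
    """Finite stage n: stage 0 is empty, stage k + 1 is the power set of stage k."""
    v = frozenset()
    for _ in range(n):
        v = powerset(v)
    return v

sizes = [len(stage(n)) for n in range(5)]
```

The sizes grow as 0, 1, 2, 4, 16, and the next stage already has 65536 members, so only the very bottom of the hierarchy can be enumerated this way.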

The picture of the universe of sets stratified into the cumulative hierarchy is characteristic of ZFC and related axiomatic set theories such as von Neumann-Bernays-Gödel set theory (often called NBG) and Morse-Kelley set theory. The cumulative hierarchy is not compatible with other set theories such as New Foundations.

It is possible to change the definition of V so that at each stage, instead of adding all the subsets of the union of the previous stages, subsets are only added if they are definable in a certain sense. This results in a more "narrow" hierarchy which gives the constructible universe L, which also satisfies all the axioms of ZFC, including the axiom of choice. It is independent from the ZFC axioms whether V = L. Although the structure of L is more regular and well behaved than that of V, few mathematicians argue that V = L should be added to ZFC as an additional axiom.

Metamathematics
The axiom schemata of replacement and separation each contain infinitely many instances. Montague (1961) included a result first proved in his 1957 Ph.D. thesis: if ZFC is consistent, it is impossible to axiomatize ZFC using only finitely many axioms. On the other hand, von Neumann-Bernays-Gödel set theory (NBG) can be finitely axiomatized. The ontology of NBG includes proper classes as well as sets; a set is any class that can be a member of another class. NBG and ZFC are equivalent set theories in the sense that any theorem not mentioning classes and provable in one theory can be proved in the other.

Gödel's second incompleteness theorem says that a recursively axiomatizable system that can interpret Robinson arithmetic can prove its own consistency only if it is inconsistent. Moreover, Robinson arithmetic can be interpreted in general set theory, a small fragment of ZFC. Hence the consistency of ZFC cannot be proved within ZFC itself (unless it is actually inconsistent). Thus, to the extent that ZFC is identified with ordinary mathematics, the consistency of ZFC cannot be demonstrated in ordinary mathematics. The consistency of ZFC does follow from the existence of a weakly inaccessible cardinal, which is unprovable in ZFC if ZFC is consistent. Nevertheless, it is deemed unlikely that ZFC harbors an unsuspected contradiction; it is widely believed that if ZFC were inconsistent, that fact would have been uncovered by now. This much is certain: ZFC is immune to the classic paradoxes of naive set theory: Russell's paradox, the Burali-Forti paradox, and Cantor's paradox.

Abian and LaMacchia (1978) studied a subtheory of ZFC consisting of the axioms of extensionality, union, powerset, replacement, and choice. Using models, they proved this subtheory consistent, and proved that each of the axioms of extensionality, replacement, and power set is independent of the four remaining axioms of this subtheory. If this subtheory is augmented with the axiom of infinity, each of the axioms of union, choice, and infinity is independent of the five remaining axioms. Because there are non-well-founded models that satisfy each axiom of ZFC except the axiom of regularity, that axiom is independent of the other ZFC axioms.

If consistent, ZFC cannot prove the existence of the inaccessible cardinals that category theory requires. Huge sets of this nature are possible if ZF is augmented with Tarski's axiom (Tarski 1939). Assuming that axiom turns the axioms of infinity, power set, and choice (7-9 above) into theorems.

Independence in ZFC
Many important statements are independent of ZFC (see list of statements undecidable in ZFC). The independence is usually proved by forcing, whereby it is shown that every countable transitive model of ZFC (sometimes augmented with large cardinal axioms) can be expanded to satisfy the statement in question. A different expansion is then shown to satisfy the negation of the statement. An independence proof by forcing automatically proves independence from arithmetical statements, other concrete statements, and large cardinal axioms. Some statements independent of ZFC can be proven to hold in particular inner models, such as in the constructible universe. However, some statements that are true about constructible sets are not consistent with hypothesized large cardinal axioms.

Forcing proves that the following statements are independent of ZFC:
Continuum hypothesis
Diamond principle
Suslin hypothesis
Martin's axiom (which is not a ZFC axiom)
Axiom of Constructibility (V=L) (which is also not a ZFC axiom).

Remarks:
The consistency of V=L is provable by inner models but not forcing: every model of ZF can be trimmed to become a model of ZFC+V=L.
The Diamond Principle implies the Continuum Hypothesis and the negation of the Suslin Hypothesis.
Martin's axiom plus the negation of the Continuum Hypothesis implies the Suslin Hypothesis.
The constructible universe satisfies the Generalized Continuum Hypothesis, the Diamond Principle, Martin's Axiom and the Kurepa Hypothesis.
The failure of the Kurepa hypothesis is equiconsistent with the existence of a strongly inaccessible cardinal.

A variation on the method of forcing can also be used to demonstrate the consistency and unprovability of the axiom of choice, i.e., that the axiom of choice is independent of ZF. The consistency of choice can be (relatively) easily verified by proving that the inner model L satisfies choice. (Thus every model of ZF contains a submodel of ZFC, so that Con(ZF) implies Con(ZFC).) Since forcing preserves choice, we cannot directly produce a model contradicting choice from a model satisfying choice. However, we can use forcing to create a model which contains a suitable submodel, namely one satisfying ZF but not C.

Another method of proving independence results, one owing nothing to forcing, is based on Gödel's second incompleteness theorem. This approach employs the statement whose independence is being examined to prove the existence of a set model of ZFC, in which case Con(ZFC) is true. Since ZFC satisfies the conditions of Gödel's second theorem, the consistency of ZFC is unprovable in ZFC (provided that ZFC is, in fact, consistent). Hence no statement allowing such a proof can be proved in ZFC. This method can prove that the existence of large cardinals is not provable in ZFC, but cannot prove that assuming such cardinals, given ZFC, is free of contradiction.

Criticisms
For criticism of set theory in general, see Objections to set theory.

ZFC has been criticized both for being excessively strong and for being excessively weak, as well as for its failure to capture objects such as proper classes and the universal set.

Many mathematical theorems can be proven in much weaker systems than ZFC, such as Peano arithmetic and second-order arithmetic (as explored by the program of reverse mathematics). Saunders Mac Lane and Solomon Feferman have both made this point. Some of "mainstream mathematics" (mathematics not directly connected with axiomatic set theory) is beyond Peano arithmetic and second-order arithmetic, but still, all such mathematics can be carried out in ZC (Zermelo set theory with choice), another theory weaker than ZFC. Much of the power of ZFC, including the axiom of regularity and the axiom schema of replacement, is included primarily to facilitate the study of the set theory itself.

On the other hand, among axiomatic set theories, ZFC is comparatively weak. Unlike New Foundations, ZFC does not admit the existence of a universal set. Hence the universe of sets under ZFC is not closed under the elementary operations of the algebra of sets. Unlike von Neumann-Bernays-Gödel set theory and Morse-Kelley set theory (MK), ZFC does not admit the existence of proper classes. These ontological restrictions are required for ZFC to avoid Russell's paradox, but critics argue these restrictions make the ZFC axioms fail to capture the informal concept of set. A further comparative weakness of ZFC is that the axiom of choice included in ZFC is weaker than the axiom of global choice included in MK.

There are numerous mathematical statements undecidable in ZFC. These include the continuum hypothesis, the Whitehead problem, and the Normal Moore space conjecture. Some of these conjectures are provable with the addition to ZFC of axioms such as Martin's axiom or large cardinal axioms. Some others are decided in ZF+AD, where AD is the axiom of determinacy, a strong supposition incompatible with choice. One attraction of large cardinal axioms is that they enable many results from ZF+AD to be established in ZFC adjoined by some large cardinal axiom (see projective determinacy). The Mizar system has adopted Tarski-Grothendieck set theory instead of ZFC so that proofs involving Grothendieck universes (encountered in category theory and algebraic geometry) can be formalized.

References
Alexander Abian, 1965. The Theory of Sets and Transfinite Arithmetic. W B Saunders.
Alexander Abian and Samuel LaMacchia, 1978, "On the Consistency and Independence of Some Set-Theoretical Axioms" [1], Notre Dame Journal of Formal Logic 19: 155-58.
Keith Devlin, 1996 (1984). The Joy of Sets. Springer.
Heinz-Dieter Ebbinghaus, 2007. Ernst Zermelo: An Approach to His Life and Work. Springer. ISBN 978-3-540-49551-2.
Abraham Fraenkel, Yehoshua Bar-Hillel, and Azriel Levy, 1973 (1958). Foundations of Set Theory. North-Holland. Fraenkel's final word on ZF and ZFC.
Hatcher, William, 1982 (1968). The Logical Foundations of Mathematics. Pergamon Press.
Peter Hinman, 2005. Fundamentals of Mathematical Logic. A K Peters. ISBN 978-1-56881-262-5.
Thomas Jech, 2003. Set Theory: The Third Millennium Edition, Revised and Expanded. Springer. ISBN 3-540-44085-2.
Kenneth Kunen, 1980. Set Theory: An Introduction to Independence Proofs. Elsevier. ISBN 0-444-86839-9.
Richard Montague, 1961, "Semantic closure and non-finite axiomatizability" in Infinistic Methods. London: Pergamon Press: 45-69.
Patrick Suppes, 1972 (1960). Axiomatic Set Theory. Dover reprint. Perhaps the best exposition of ZFC before the independence of AC and the Continuum hypothesis, and the emergence of large cardinals. Includes many theorems.
Gaisi Takeuti and Zaring, W M, 1971. Introduction to Axiomatic Set Theory. Springer-Verlag.
Alfred Tarski, 1939, "On well-ordered subsets of any set", Fundamenta Mathematicae 32: 176-83.
Tiles, Mary, 2004 (1989). The Philosophy of Set Theory. Dover reprint. Weak on metatheory; the author is not a mathematician.
Tourlakis, George, 2003. Lectures in Logic and Set Theory, Vol. 2. Cambridge University Press.
Jean van Heijenoort, 1967. From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931. Harvard University Press. Includes annotated English translations of the classic articles by Zermelo, Fraenkel, and Skolem bearing on ZFC.
Zermelo, Ernst (1908), "Untersuchungen über die Grundlagen der Mengenlehre I", Mathematische Annalen 65: 261-281, doi:10.1007/BF01449999. English translation in Heijenoort, Jean van (1967), "Investigations in the foundations of set theory", From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, Source Books in the History of the Sciences, Harvard University Press, pp. 199-215, ISBN 978-0-674-32449-7.
Zermelo, Ernst (1930), "Über Grenzzahlen und Mengenbereiche" [2], Fundamenta Mathematicae 16: 29-47, ISSN 0016-2736.


External links
Hazewinkel, Michiel, ed. (2001), "ZFC" [3], Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4.
Stanford Encyclopedia of Philosophy articles by Thomas Jech: Set Theory [4]; Axioms of Zermelo–Fraenkel Set Theory [5].
Metamath version of the ZFC axioms [6]: a concise and nonredundant axiomatization. The background first-order logic is defined especially to facilitate machine verification of proofs.
A derivation [7] in Metamath of a version of the separation schema from a version of the replacement schema.
Zermelo-Fraenkel Axioms [8], PlanetMath.org.

References
[1] http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.ndjfl/1093888220
[2] http://matwbn.icm.edu.pl/tresc.php?wyd=1&tom=16
[3] http://www.encyclopediaofmath.org/index.php?title=p/z130100
[4] http://plato.stanford.edu/entries/set-theory/
[5] http://plato.stanford.edu/entries/set-theory/ZF.html
[6] http://us.metamath.org/mpegif/mmset.html#staxioms
[7] http://us.metamath.org/mpegif/axsep.html
[8] http://planetmath.org/?op=getobj&from=objects&id=317

Hairy ball theorem


The hairy ball theorem of algebraic topology states that there is no nonvanishing continuous tangent vector field on even-dimensional n-spheres. For the ordinary sphere, or 2-sphere, if f is a continuous function that assigns a vector in R^3 to every point p on a sphere such that f(p) is always tangent to the sphere at p, then there is at least one p such that f(p) = 0. In other words, whenever one attempts to comb a hairy ball flat, there will always be at least one tuft of hair at one point on the ball. This is famously stated as "you can't comb a hairy ball flat without creating a cowlick", or sometimes "you can't comb the hair on a coconut". The theorem was first stated by Henri Poincaré in the late 19th century and was first proved in 1912 by Brouwer.[1]
A failed attempt to comb a hairy 3-ball (2-sphere), leaving an uncomfortable tuft at each pole

Counting zeros
From a more advanced point of view: every isolated zero of a vector field has an integer "index", and it can be shown that the sum of the indices at all of the zeros must be two. (This is because the Euler characteristic of the 2-sphere is two.) Therefore there must be at least one zero. This is a consequence of the Poincaré–Hopf theorem. In the case of the torus, the Euler characteristic is 0, and it is possible to "comb a hairy doughnut flat". More generally, on any compact regular 2-dimensional manifold with non-zero Euler characteristic, every continuous tangent vector field has at least one zero.
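As a rough numerical illustration of the index (a sketch, not part of any proof), the index of an isolated zero of a planar field can be estimated as the winding number of the field's direction around a small circle enclosing the zero. The function name `index_of_zero`, the three sample fields, and the circle radius are assumptions made for this example; a planar chart stands in for a small patch of the surface.

```python
import math

def index_of_zero(field, center=(0.0, 0.0), radius=1e-3, samples=2000):
    """Estimate the index of an isolated zero at `center` as the number of
    turns the field direction makes along a small circle around it."""
    cx, cy = center
    total = 0.0
    prev = None
    for k in range(samples + 1):
        t = 2.0 * math.pi * k / samples
        u, v = field(cx + radius * math.cos(t), cy + radius * math.sin(t))
        angle = math.atan2(v, u)
        if prev is not None:
            # Map the angle increment into [-pi, pi) to undo atan2's branch cut.
            step = (angle - prev + math.pi) % (2.0 * math.pi) - math.pi
            total += step
        prev = angle
    return round(total / (2.0 * math.pi))

def source(x, y):
    """Field pointing radially outward."""
    return (x, y)

def saddle(x, y):
    """Field with a saddle point at the origin."""
    return (x, -y)

def dipole(x, y):
    """Dipole-like field: its direction turns twice per loop."""
    return (x * x - y * y, 2.0 * x * y)

print(index_of_zero(source), index_of_zero(saddle), index_of_zero(dipole))
```

On the 2-sphere the indices must total 2. This can happen with two zeros of index 1 (for example, the two poles of a rotation about an axis) or with a single zero of index 2, as in a dipole field.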


Cyclone consequences
A curious meteorological application of this theorem involves considering the wind as a vector defined at every point continuously over the surface of a planet with an atmosphere. As an idealisation, take wind to be a two-dimensional vector: suppose that relative to the planetary diameter of the Earth, its vertical (i.e., non-tangential) motion is negligible. One scenario, in which there is absolutely no wind (air movement), corresponds to a field of zero-vectors. This scenario is uninteresting from the point of view of this theorem, and physically unrealistic (there will always be wind). In the case where there is at least some wind, the Hairy Ball Theorem dictates that at all times there must be at least one point on a planet with no wind at all and therefore a tuft. This corresponds to the above statement that there will always be p such that f(p) = 0. In a physical sense, this zero-wind point will be the eye of a cyclone or anticyclone. (Like the swirled hairs on the tennis ball, the wind will spiral around this zero-wind point - under our assumptions it cannot flow into or out of the point.) In brief, then, the Hairy Ball Theorem dictates that, given at least some wind on Earth, there must at all times be a cyclone somewhere. Note that the eye can be arbitrarily large or small and the magnitude of the wind surrounding it is irrelevant.
A hairy doughnut (2-torus), on the other hand, is quite easily combable.

A continuous tangent vector field on a 2-sphere with only one pole, in this case a dipole field with index 2. See also an animated version of this graphic.

This is not strictly true as the air above the earth has multiple layers, but for each layer there must be a point with zero horizontal windspeed.

Application to computer graphics


A common problem in computer graphics is to generate a non-zero vector in R^3 that is orthogonal to a given non-zero one. There is no single continuous function that can do this for all non-zero vector inputs. This is a corollary of the hairy ball theorem. To see this, consider the given vector as the radius of a sphere and note that finding a non-zero vector orthogonal to the given one is equivalent to finding a non-zero vector that is tangent to the surface of that sphere. However, the hairy ball theorem says there exists no continuous function that can do this for every point on the sphere (i.e. every given vector).
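A minimal sketch of one common workaround follows: cross the input with the coordinate axis it is least aligned with. The function name `orthogonal` and the branch order are assumptions for illustration. The result is always non-zero and orthogonal to the input, but, as the theorem requires of any such function, it jumps where the branch changes.

```python
import math

def orthogonal(v):
    """Return a non-zero vector orthogonal to the non-zero 3-vector v.

    The vector is the cross product of v with the coordinate axis along
    which v has its smallest absolute component.
    """
    x, y, z = v
    if x == 0 and y == 0 and z == 0:
        raise ValueError("the zero vector has no orthogonal direction")
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax <= ay and ax <= az:
        return (0.0, z, -y)   # v x e_x
    if ay <= az:
        return (-z, 0.0, x)   # v x e_y
    return (y, -x, 0.0)       # v x e_z

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Two nearly identical inputs that fall into different branches.
a = (1.0, 1e-9, 2e-9)
b = (1.0, 2e-9, 1e-9)
pa, pb = orthogonal(a), orthogonal(b)
jump = math.dist(pa, pb)
print(dot(a, pa), dot(b, pb), jump)
```

The inputs `a` and `b` differ by about 1e-9, yet their outputs differ by more than 1. That jump is the discontinuity the theorem says cannot be avoided.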

Lefschetz connection
There is a closely related argument from algebraic topology, using the Lefschetz fixed point theorem. Since the Betti numbers of a 2-sphere are 1, 0, 1, 0, 0, ... the Lefschetz number (total trace on homology) of the identity mapping is 2. By integrating a vector field we get (at least a small part of) a one-parameter group of diffeomorphisms on the sphere, and all of the mappings in it are homotopic to the identity. Therefore they all have Lefschetz number 2 as well. Hence they have fixed points (since the Lefschetz number is nonzero). Some more work would be needed to show that this implies there must actually be a zero of the vector field. It does suggest the correct statement of the more general Poincaré–Hopf index theorem.

Corollary
A consequence of the hairy ball theorem is that any continuous function that maps an even-dimensional sphere into itself has either a fixed point or a point that maps onto its own antipodal point. This can be seen by transforming the function into a tangential vector field as follows. Let s be the function mapping the sphere to itself, and let v be the tangential vector function to be constructed. For each point p, construct the stereographic projection of s(p) with p as the point of tangency. Then v(p) is the displacement vector of this projected point relative to p. According to the hairy ball theorem, there is a p such that v(p) = 0, so that s(p) = p. This argument breaks down only if there exists a point p for which s(p) is the antipodal point of p, since such a point is the only one that cannot be stereographically projected onto the tangent plane of p.

Higher dimensions
The connection with the Euler characteristic suggests the correct generalisation: the 2n-sphere has no non-vanishing vector field for n ≥ 1. The difference between even and odd dimensions is that the Betti numbers of the m-sphere are 0 except in dimensions 0 and m, where they are 1. Therefore their alternating sum is 2 for m even, and 0 for m odd.
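The parity argument can be checked with a short sketch. The first function forms the alternating sum of the Betti numbers of the m-sphere. The second builds a standard non-vanishing tangent field on an odd-dimensional sphere by rotating each coordinate pair by 90 degrees. Both function names are assumptions made for this example.

```python
def euler_characteristic_sphere(m):
    """Alternating sum of the Betti numbers of the m-sphere, for m >= 1."""
    if m < 1:
        raise ValueError("m must be at least 1")
    betti = [0] * (m + 1)
    betti[0] = 1  # one connected component
    betti[m] = 1  # one m-dimensional "hole"
    return sum((-1) ** k * b for k, b in enumerate(betti))

def rotation_field(x):
    """Tangent vector at a point x of an odd-dimensional unit sphere.

    x has an even number of coordinates (x1, x2, ..., x_2n); the field is
    (-x2, x1, ..., -x_2n, x_2n-1), which has the same length as x and is
    orthogonal to it.
    """
    if len(x) % 2:
        raise ValueError("an even number of coordinates is required")
    v = []
    for a, b in zip(x[0::2], x[1::2]):
        v.extend((-b, a))
    return v

point = [0.5, 0.5, 0.5, 0.5]   # a point on the unit 3-sphere in R^4
vector = rotation_field(point)
print([euler_characteristic_sphere(m) for m in range(1, 6)], vector)
```

No such pairing exists for the 2n-sphere, whose ambient space has an odd number of coordinates, which is consistent with the Euler characteristic being 2 there.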

Notes
[1] Georg-August-Universität Göttingen (http://dz-srv1.sub.uni-goettingen.de/sub/digbib/loader?ht=VIEW&did=D28661)

References
Murray Eisenberg, Robert Guy, A Proof of the Hairy Ball Theorem, The American Mathematical Monthly, Vol. 86, No. 7 (Aug. - Sep., 1979), pp. 571–574

Further reading
Tyler Jarvis and James Tanton (2003-07-23) (PDF). The Hairy Ball Theorem via Sperner's Lemma (http://math.byu.edu/~jarvis/sperner.pdf).
Richeson, David S. (2008). Euler's Gem: The Polyhedron Formula and the Birth of Topology (http://www.eulersgem.com/). Princeton University Press. ISBN 0-691-12677-1. See Chapter 19, "Combing the Hair on a Coconut", pp. 202–218.
Reich, Henry (2011). "One-Minute Math: Why you can't comb a hairy ball" (http://www.newscientist.com/blogs/nstv/2011/12/one-minute-math-why-you-cant-comb-a-hairy-ball.html). New Scientist TV.

General relativity

General relativity, or the general theory of relativity, is the geometric theory of gravitation published by Albert Einstein in 1916[1] and the current description of gravitation in modern physics. General relativity generalises special relativity and Newton's law of universal gravitation, providing a unified description of gravity as a geometric property of space and time, or spacetime. In particular, the curvature of spacetime is directly related to the energy and momentum of whatever matter and radiation are present. The relation is specified by the Einstein field equations, a system of partial differential equations.
A simulated black hole of 10 solar masses as seen from a distance of 600 kilometers with the Milky Way in the background.

Some predictions of general relativity differ significantly from those of classical physics, especially concerning the passage of time, the geometry of space, the motion of bodies in free fall, and the propagation of light. Examples of such differences include gravitational time dilation, gravitational lensing, the gravitational redshift of light, and the gravitational time delay. The predictions of general relativity have been confirmed in all observations and experiments to date. Although general relativity is not the only relativistic theory of gravity, it is the simplest theory that is consistent with experimental data. However, unanswered questions remain, the most fundamental being how general relativity can be reconciled with the laws of quantum physics to produce a complete and self-consistent theory of quantum gravity.

Einstein's theory has important astrophysical implications. For example, it implies the existence of black holes (regions of space in which space and time are distorted in such a way that nothing, not even light, can escape) as an end-state for massive stars. There is ample evidence that the intense radiation emitted by certain kinds of astronomical objects is due to black holes; for example, microquasars and active galactic nuclei result from the presence of stellar black holes and black holes of a much more massive type, respectively. The bending of light by gravity can lead to the phenomenon of gravitational lensing, in which multiple images of the same distant astronomical object are visible in the sky. General relativity also predicts the existence of gravitational waves, which have since been observed indirectly; a direct measurement is the aim of projects such as LIGO, the NASA/ESA Laser Interferometer Space Antenna, and various pulsar timing arrays. In addition, general relativity is the basis of current cosmological models of a consistently expanding universe.


History
Soon after publishing the special theory of relativity in 1905, Einstein started thinking about how to incorporate gravity into his new relativistic framework. In 1907, beginning with a simple thought experiment involving an observer in free fall, he embarked on what would be an eight-year search for a relativistic theory of gravity. After numerous detours and false starts, his work culminated in the presentation to the Prussian Academy of Science in November 1915 of what are now known as the Einstein field equations. These equations specify how the geometry of space and time is influenced by whatever matter is present, and form the core of Einstein's general theory of relativity.[2]

Albert Einstein developed the theories of special and general relativity. Picture from 1921.

The Einstein field equations are nonlinear and very difficult to solve. Einstein used approximation methods in working out initial predictions of the theory. But as early as 1916, the astrophysicist Karl Schwarzschild found the first non-trivial exact solution to the Einstein field equations, the so-called Schwarzschild metric. This solution laid the groundwork for the description of the final stages of gravitational collapse, and the objects known today as black holes. In the same year, the first steps towards generalizing Schwarzschild's solution to electrically charged objects were taken, which eventually resulted in the Reissner–Nordström solution, now associated with electrically charged black holes.[3]

In 1917, Einstein applied his theory to the universe as a whole, initiating the field of relativistic cosmology. In line with contemporary thinking, he assumed a static universe, adding a new parameter to his original field equations, the cosmological constant, to reproduce that "observation".[4] By 1929, however, the work of Hubble and others had shown that our universe is expanding. This is readily described by the expanding cosmological solutions found by Friedmann in 1922, which do not require a cosmological constant. Lemaître used these solutions to formulate the earliest version of the Big Bang models, in which our universe has evolved from an extremely hot and dense earlier state.[5] Einstein later declared the cosmological constant the biggest blunder of his life.[6]

During that period, general relativity remained something of a curiosity among physical theories. It was clearly superior to Newtonian gravity, being consistent with special relativity and accounting for several effects unexplained by the Newtonian theory. Einstein himself had shown in 1915 how his theory explained the anomalous perihelion advance of the planet Mercury without any arbitrary parameters ("fudge factors").[7] Similarly, a 1919 expedition led by Eddington confirmed general relativity's prediction for the deflection of starlight by the Sun during the total solar eclipse of May 29, 1919,[8] making Einstein instantly famous.[9] Yet the theory entered the mainstream of theoretical physics and astrophysics only with the developments between approximately 1960 and 1975, now known as the golden age of general relativity.[10] Physicists began to understand the concept of a black hole, and to identify quasars as one of these objects' astrophysical manifestations.[11] Ever more precise solar system tests confirmed the theory's predictive power,[12] and relativistic cosmology, too, became amenable to direct observational tests.[13]


From classical mechanics to general relativity


General relativity can be understood by examining its similarities with and departures from classical physics. The first step is the realization that classical mechanics and Newton's law of gravity admit of a geometric description. The combination of this description with the laws of special relativity results in a heuristic derivation of general relativity.[14]

Geometry of Newtonian gravity


At the base of classical mechanics is the notion that a body's motion can be described as a combination of free (or inertial) motion, and deviations from this free motion. Such deviations are caused by external forces acting on a body in accordance with Newton's second law of motion, which states that the net force acting on a body is equal to that body's (inertial) mass multiplied by its acceleration.[15] The preferred inertial motions are related to the geometry of space and time: in the standard reference frames of classical mechanics, objects in free motion move along straight lines at constant speed. In modern parlance, their paths are geodesics, straight world lines in curved spacetime.[16]

Conversely, one might expect that inertial motions, once identified by observing the actual motions of bodies and making allowances for the external forces (such as electromagnetism or friction), can be used to define the geometry of space, as well as a time coordinate. However, there is an ambiguity once gravity comes into play. According to Newton's law of gravity, and independently verified by experiments such as that of Eötvös and its successors (see Eötvös experiment), there is a universality of free fall (also known as the weak equivalence principle, or the universal equality of inertial and passive-gravitational mass): the trajectory of a test body in free fall depends only on its position and initial speed, but not on any of its material properties.[17] A simplified version of this is embodied in Einstein's elevator experiment, illustrated in the figure on the right: for an observer in a small enclosed room, it is impossible to decide, by mapping the trajectory of bodies such as a dropped ball, whether the room is at rest in a gravitational field, or in free space aboard an accelerating rocket generating a force equal to gravity.[18] Given the universality of free fall, there is no observable distinction between inertial motion and motion under the influence of the gravitational force. This suggests the definition of a new class of inertial motion, namely that of objects in free fall under the influence of gravity. This new class of preferred motions, too, defines a geometry of space and time: in mathematical terms, it is the geodesic motion associated with a specific connection which depends on the gradient of the gravitational potential. Space, in this construction, still has the ordinary Euclidean geometry. However, spacetime as a whole is more complicated.
As can be shown using simple thought experiments following the free-fall trajectories of different test particles, the result of transporting spacetime vectors that can denote a particle's velocity (time-like vectors) will vary with the particle's trajectory; mathematically speaking, the Newtonian connection is not integrable. From this, one can deduce that spacetime is curved. The result is a geometric formulation of Newtonian gravity using only covariant concepts, i.e. a description which is valid in any desired coordinate system.[19] In this geometric description, tidal effects (the relative acceleration of bodies in free fall) are related to the derivative of the connection, showing how the modified geometry is caused by the presence of mass.[20]

According to general relativity, objects in a gravitational field behave similarly to objects within an accelerating enclosure. For example, an observer will see a ball fall the same way in a rocket (left) as it does on Earth (right), provided that the acceleration of the rocket provides the same relative force.


Relativistic generalization
As intriguing as geometric Newtonian gravity may be, its basis, classical mechanics, is merely a limiting case of (special) relativistic mechanics.[21] In the language of symmetry: where gravity can be neglected, physics is Lorentz invariant as in special relativity rather than Galilei invariant as in classical mechanics. (The defining symmetry of special relativity is the Poincaré group, which also includes translations and rotations.) The differences between the two become significant when we are dealing with speeds approaching the speed of light, and with high-energy phenomena.[22]

With Lorentz symmetry, additional structures come into play. They are defined by the set of light cones (see the image on the left). The light-cones define a causal structure: for each event A, there is a set of events that can, in principle, either influence or be influenced by A via signals or interactions that do not need to travel faster than light (such as event B in the image), and a set of events for which such an influence is impossible (such as event C in the image). These sets are observer-independent.[23] In conjunction with the world-lines of freely falling particles, the light-cones can be used to reconstruct the spacetime's semi-Riemannian metric, at least up to a positive scalar factor. In mathematical terms, this defines a conformal structure.[24]

Light cone

Special relativity is defined in the absence of gravity, so for practical applications, it is a suitable model whenever gravity can be neglected. Bringing gravity into play, and assuming the universality of free fall, an analogous reasoning as in the previous section applies: there are no global inertial frames. Instead there are approximate inertial frames moving alongside freely falling particles.
Translated into the language of spacetime: the straight time-like lines that define a gravity-free inertial frame are deformed to lines that are curved relative to each other, suggesting that the inclusion of gravity necessitates a change in spacetime geometry.[25] A priori, it is not clear whether the new local frames in free fall coincide with the reference frames in which the laws of special relativity hold: that theory is based on the propagation of light, and thus on electromagnetism, which could have a different set of preferred frames. But using different assumptions about the special-relativistic frames (such as their being earth-fixed, or in free fall), one can derive different predictions for the gravitational redshift, that is, the way in which the frequency of light shifts as the light propagates through a gravitational field (cf. below). The actual measurements show that free-falling frames are the ones in which light propagates as it does in special relativity.[26] The generalization of this statement, namely that the laws of special relativity hold to good approximation in freely falling (and non-rotating) reference frames, is known as the Einstein equivalence principle, a crucial guiding principle for generalizing special-relativistic physics to include gravity.[27] The same experimental data shows that time as measured by clocks in a gravitational field (proper time, to give the technical term) does not follow the rules of special relativity. In the language of spacetime geometry, it is not measured by the Minkowski metric. As in the Newtonian case, this is suggestive of a more general geometry. At small scales, all reference frames that are in free fall are equivalent, and approximately Minkowskian. Consequently, we are now dealing with a curved generalization of Minkowski space.
The metric tensor that defines the geometry (in particular, how lengths and angles are measured) is not the Minkowski metric of special relativity; it is a generalization known as a semi- or pseudo-Riemannian metric. Furthermore, each Riemannian metric is naturally associated with one particular kind of connection, the Levi-Civita connection, and this is, in fact, the connection that satisfies the equivalence principle and makes space locally Minkowskian (that is, in suitable locally inertial coordinates, the metric is Minkowskian, and its first partial derivatives and the connection coefficients vanish).[28]
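The causal structure defined by the light cones can be illustrated in flat (Minkowski) spacetime, which is the local model described above. The sketch below classifies the separation between two events by the sign of the interval. The sign convention (-, +, +, +), the function names, and the sample events are assumptions made for this example.

```python
C = 299_792_458.0  # speed of light in m/s

def interval_squared(dt, dx, dy=0.0, dz=0.0):
    """Minkowski interval s^2 = -(c dt)^2 + dx^2 + dy^2 + dz^2.

    dt is the time difference in seconds; dx, dy, dz are in metres.
    """
    return -(C * dt) ** 2 + dx ** 2 + dy ** 2 + dz ** 2

def causal_relation(dt, dx, dy=0.0, dz=0.0):
    """Classify the separation of two events relative to the light cone."""
    s2 = interval_squared(dt, dx, dy, dz)
    if s2 < 0:
        return "timelike"    # inside the light cone: influence is possible
    if s2 > 0:
        return "spacelike"   # outside the light cone: no influence possible
    return "lightlike"       # on the light cone: connected by a light signal

inside = causal_relation(1.0, 1.0)      # 1 s apart, 1 m apart
outside = causal_relation(1e-9, 1.0)    # 1 ns apart, 1 m apart
on_cone = causal_relation(1.0, C)       # separated by exactly one light-second
print(inside, outside, on_cone)
```

Because the sign of the interval is the same for all inertial observers, the classification is observer-independent, in line with the statement about the sets of events above.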


Einstein's equations
Having formulated the relativistic, geometric version of the effects of gravity, the question of gravity's source remains. In Newtonian gravity, the source is mass. In special relativity, mass turns out to be part of a more general quantity called the energy–momentum tensor, which includes both energy and momentum densities as well as stress (that is, pressure and shear).[29] Using the equivalence principle, this tensor is readily generalized to curved space-time. Drawing further upon the analogy with geometric Newtonian gravity, it is natural to assume that the field equation for gravity relates this tensor and the Ricci tensor, which describes a particular class of tidal effects: the change in volume for a small cloud of test particles that are initially at rest, and then fall freely. In special relativity, conservation of energy–momentum corresponds to the statement that the energy–momentum tensor is divergence-free. This formula, too, is readily generalized to curved spacetime by replacing partial derivatives with their curved-manifold counterparts, covariant derivatives studied in differential geometry. With this additional condition (the covariant divergence of the energy–momentum tensor, and hence of whatever is on the other side of the equation, is zero), the simplest set of equations are what are called Einstein's (field) equations:

$$G_{ab} \equiv R_{ab} - \tfrac{1}{2} R\, g_{ab} = \kappa\, T_{ab}.$$

On the left-hand side is the Einstein tensor $G_{ab}$, a specific divergence-free combination of the Ricci tensor $R_{ab}$ and the metric $g_{ab}$. In particular, $R = R_{cd}\, g^{cd}$ is the curvature scalar. The Ricci tensor itself is related to the more general Riemann curvature tensor as

$$R_{ab} = {R^{d}}_{adb}.$$

On the right-hand side, $T_{ab}$ is the energy–momentum tensor. All tensors are written in abstract index notation.[30] Matching the theory's prediction to observational results for planetary orbits (or, equivalently, assuring that the weak-gravity, low-speed limit is Newtonian mechanics), the proportionality constant can be fixed as $\kappa = 8\pi G/c^4$, with G the gravitational constant and c the speed of light.[31] When there is no matter present, so that the energy–momentum tensor vanishes, the result is the vacuum Einstein equations,

$$R_{ab} = 0.$$

There are alternatives to general relativity built upon the same premises, which include additional rules and/or constraints, leading to different field equations. Examples are Brans–Dicke theory, teleparallelism, and Einstein–Cartan theory.[32]
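To give a feeling for the size of the coupling between matter and curvature in Einstein's equations, the following sketch evaluates the constant 8πG/c^4 in SI units. The value of G is the CODATA 2018 recommended value; the variable names are assumptions made for this example.

```python
import math

G = 6.67430e-11      # gravitational constant, m^3 kg^-1 s^-2 (CODATA 2018)
C = 299_792_458.0    # speed of light, m/s (exact by definition)

# Coupling constant of the field equations, in s^2 m^-1 kg^-1 (that is, 1/N).
kappa = 8.0 * math.pi * G / C ** 4

print(f"kappa = {kappa:.4e} s^2 m^-1 kg^-1")
```

The result is of order 10^-43 in SI units, which is one way to see why enormous energy densities are needed to produce noticeable spacetime curvature.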

Definition and basic applications


The derivation outlined in the previous section contains all the information needed to define general relativity, describe its key properties, and address a question of crucial importance in physics, namely how the theory can be used for model-building.

Definition and basic properties


General relativity is a metric theory of gravitation. At its core are Einstein's equations, which describe the relation between the geometry of a four-dimensional, pseudo-Riemannian manifold representing spacetime, and the energy–momentum contained in that spacetime.[33] Phenomena that in classical mechanics are ascribed to the action of the force of gravity (such as free-fall, orbital motion, and spacecraft trajectories) correspond to inertial motion within a curved geometry of spacetime in general relativity; there is no gravitational force deflecting objects from their natural, straight paths. Instead, gravity corresponds to changes in the properties of space and time, which in turn change the straightest-possible paths that objects will naturally follow.[34] The curvature is, in turn, caused by the energy–momentum of matter. Paraphrasing the relativist John Archibald Wheeler, spacetime tells matter how to move; matter tells spacetime how to curve.[35]

While general relativity replaces the scalar gravitational potential of classical physics by a symmetric rank-two tensor, the latter reduces to the former in certain limiting cases. For weak gravitational fields and slow speed relative to the speed of light, the theory's predictions converge on those of Newton's law of universal gravitation.[36] As it is constructed using tensors, general relativity exhibits general covariance: its laws (and further laws formulated within the general relativistic framework) take on the same form in all coordinate systems.[37] Furthermore, the theory does not contain any invariant geometric background structures, i.e. it is background independent. It thus satisfies a more stringent general principle of relativity, namely that the laws of physics are the same for all observers.[38] Locally, as expressed in the equivalence principle, spacetime is Minkowskian, and the laws of physics exhibit local Lorentz invariance.[39]

Model-building
The core concept of general-relativistic model-building is that of a solution of Einstein's equations. Given both Einstein's equations and suitable equations for the properties of matter, such a solution consists of a specific semi-Riemannian manifold (usually defined by giving the metric in specific coordinates), and specific matter fields defined on that manifold. Matter and geometry must satisfy Einstein's equations, so in particular, the matter's energy–momentum tensor must be divergence-free. The matter must, of course, also satisfy whatever additional equations were imposed on its properties. In short, such a solution is a model universe that satisfies the laws of general relativity, and possibly additional laws governing whatever matter might be present.[40] Einstein's equations are nonlinear partial differential equations and, as such, difficult to solve exactly.[41] Nevertheless, a number of exact solutions are known, although only a few have direct physical applications.[42] The best-known exact solutions, and also those most interesting from a physics point of view, are the Schwarzschild solution, the Reissner–Nordström solution and the Kerr metric, each corresponding to a certain type of black hole in an otherwise empty universe,[43] and the Friedmann–Lemaître–Robertson–Walker and de Sitter universes, each describing an expanding cosmos.[44] Exact solutions of great theoretical interest include the Gödel universe (which opens up the intriguing possibility of time travel in curved spacetimes), the Taub-NUT solution (a model universe that is homogeneous, but anisotropic), and Anti-de Sitter space (which has recently come to prominence in the context of what is called the Maldacena conjecture).[45] Given the difficulty of finding exact solutions, Einstein's field equations are also solved frequently by numerical integration on a computer, or by considering small perturbations of exact solutions.
In the field of numerical relativity, powerful computers are employed to simulate the geometry of spacetime and to solve Einstein's equations for interesting situations such as two colliding black holes.[46] In principle, such methods may be applied to any system, given sufficient computer resources, and may address fundamental questions such as naked singularities. Approximate solutions may also be found by perturbation theories such as linearized gravity[47] and its generalization, the post-Newtonian expansion, both of which were developed by Einstein. The latter provides a systematic approach to solving for the geometry of a spacetime that contains a distribution of matter that moves slowly compared with the speed of light. The expansion involves a series of terms; the first terms represent Newtonian gravity, whereas the later terms represent ever smaller corrections to Newton's theory due to general relativity.[48] An extension of this expansion is the parametrized post-Newtonian (PPN) formalism, which allows quantitative comparisons between the predictions of general relativity and alternative theories.[49]
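The statement that the later terms are small corrections to Newton's theory can be made concrete with the best-known leading-order relativistic effect on an orbit, the perihelion advance per revolution, 6πGM / (c^2 a (1 - e^2)). The sketch below evaluates it for Mercury. The formula is the standard leading-order result; the orbital elements are approximate values assumed for illustration, and the function names are hypothetical.

```python
import math

C = 299_792_458.0           # speed of light, m/s
GM_SUN = 1.32712440018e20   # gravitational parameter of the Sun, m^3/s^2

# Approximate orbital elements of Mercury (assumed illustrative values).
A_MERCURY = 5.7909e10       # semi-major axis, m
E_MERCURY = 0.2056          # eccentricity
PERIOD_DAYS = 87.969        # orbital period, days

def expansion_parameter(gm, a):
    """Dimensionless size GM / (a c^2) of the relativistic corrections."""
    return gm / (a * C ** 2)

def perihelion_advance_per_orbit(gm, a, e):
    """Leading-order relativistic perihelion advance per orbit, in radians."""
    return 6.0 * math.pi * gm / (C ** 2 * a * (1.0 - e ** 2))

epsilon = expansion_parameter(GM_SUN, A_MERCURY)
orbits_per_century = 36525.0 / PERIOD_DAYS
advance_rad = perihelion_advance_per_orbit(GM_SUN, A_MERCURY, E_MERCURY) * orbits_per_century
advance_arcsec = math.degrees(advance_rad) * 3600.0

print(f"GM/(a c^2) = {epsilon:.2e}")
print(f"advance = {advance_arcsec:.1f} arcseconds per century")
```

With these inputs the correction parameter is of order 10^-8, and the accumulated advance comes out at roughly 43 arcseconds per century, the size of the anomaly in Mercury's orbit that the theory explains.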


Consequences of Einstein's theory


General relativity has a number of physical consequences. Some follow directly from the theory's axioms, whereas others have become clear only in the course of the ninety years of research that followed Einstein's initial publication.

Gravitational time dilation and frequency shift


Assuming that the equivalence principle holds,[50] gravity influences the passage of time. Light sent down into a gravity well is blueshifted, whereas light sent in the opposite direction (i.e., climbing out of the gravity well) is redshifted; collectively, these two effects are known as the gravitational frequency shift. More generally, processes close to a massive body run more slowly when compared with processes taking place farther away; this effect is known as gravitational time dilation.[51] Gravitational redshift has been measured in the laboratory[52] and using astronomical observations.[53] Gravitational time dilation in the Earth's gravitational field has been measured numerous times using atomic clocks,[54] while ongoing validation is provided as a side effect of the operation of the Global Positioning System (GPS).[55] Tests in stronger gravitational fields are provided by the observation of binary pulsars.[56] All results are in agreement with general relativity.[57] However, at the current level of accuracy, these observations cannot distinguish between general relativity and other theories in which the equivalence principle is valid.[58]

Schematic representation of the gravitational redshift of a light wave escaping from the surface of a massive body
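The size of the effect for GPS can be estimated in the weak-field approximation, where the fractional rate difference between two clocks at rest is the difference in Newtonian potential divided by c^2. The sketch below assumes a mean Earth radius and an approximate GPS orbital radius; both values and the function name are assumptions for illustration. It covers only the gravitational part, not the additional special-relativistic slowing due to the satellite's orbital speed.

```python
C = 299_792_458.0          # speed of light, m/s
GM_EARTH = 3.986004418e14  # gravitational parameter of the Earth, m^3/s^2

R_GROUND = 6.371e6         # mean Earth radius, m (assumed)
R_GPS = 2.656e7            # approximate GPS orbital radius, m (assumed)

def gravitational_rate_offset(gm, r_low, r_high):
    """Fractional amount by which a clock at r_high runs faster than one at r_low.

    Weak-field approximation: (Phi(r_high) - Phi(r_low)) / c^2 with Phi = -GM/r.
    """
    return gm / C ** 2 * (1.0 / r_low - 1.0 / r_high)

offset = gravitational_rate_offset(GM_EARTH, R_GROUND, R_GPS)
microseconds_per_day = offset * 86400.0 * 1e6

print(f"fractional offset = {offset:.3e}")
print(f"satellite clock gains about {microseconds_per_day:.1f} microseconds per day")
```

Under these assumptions the satellite clock runs ahead of a ground clock by a few tens of microseconds per day, which is large compared with the nanosecond-level timing that satellite positioning relies on.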

Light deflection and gravitational time delay


General relativity predicts that the path of light is bent in a gravitational field; light passing a massive body is deflected towards that body. This effect has been confirmed by observing the light of stars or distant quasars being deflected as it passes the Sun.[59] This and related predictions follow from the fact that light follows what is called a light-like or null geodesic, a generalization of the straight lines along which light travels in classical physics. Such geodesics are the generalization of the invariance of lightspeed in special relativity.[60] As one examines suitable model spacetimes (either the exterior Schwarzschild solution or, for more than a single mass, the post-Newtonian expansion),[61] several effects of gravity on light propagation emerge. Although the bending of light can also be derived by extending the universality of free fall to light,[62] the angle of deflection resulting from such calculations is only half the value given by general relativity.[63]

Deflection of light (sent out from the location shown in blue) near a compact body (shown in gray)

Closely related to light deflection is the gravitational time delay (or Shapiro delay), the phenomenon that light signals take longer to move through a gravitational field than they would in the absence of that field. There have been numerous successful tests of this prediction.[64] In the parameterized post-Newtonian formalism (PPN), measurements of both the deflection of light and the gravitational time delay determine a parameter called γ, which encodes the influence of gravity on the geometry of space.[65]
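The factor of two between the free-fall estimate and the general-relativistic deflection can be made concrete with the standard weak-field formula α = (1 + γ)/2 · 4GM/(c²b), where b is the distance of closest approach and γ is the PPN parameter (γ = 1 in general relativity). The sketch below is illustrative only: the function name and the choice of a ray grazing the solar limb are assumptions made for this example.

```python
import math

GM_SUN = 1.32712440018e20  # gravitational parameter of the Sun, m^3/s^2
C = 299_792_458.0          # speed of light, m/s
R_SUN = 6.957e8            # nominal solar radius, m


def deflection_arcsec(impact_parameter_m, gamma=1.0, gm=GM_SUN):
    """Weak-field deflection angle in arcseconds for a ray passing at distance b.

    Uses alpha = (1 + gamma) / 2 * 4 * GM / (c^2 * b); gamma = 1 is general
    relativity, gamma = 0 reproduces the free-fall-only (half) value.
    """
    alpha_rad = (1.0 + gamma) / 2.0 * 4.0 * gm / (C**2 * impact_parameter_m)
    return math.degrees(alpha_rad) * 3600.0


gr_value = deflection_arcsec(R_SUN)               # about 1.75 arcseconds
half_value = deflection_arcsec(R_SUN, gamma=0.0)  # half of the GR value

print(f"GR: {gr_value:.2f} arcsec, free-fall only: {half_value:.2f} arcsec")
```

Writing the result in terms of γ shows how a measured deflection angle translates directly into a constraint on that parameter.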

Gravitational waves
One of several analogies between weak-field gravity and electromagnetism is that, analogous to electromagnetic waves, there are gravitational waves: ripples in the metric of spacetime that propagate at the speed of light.[66] The simplest type of such a wave can be visualized by its action on a ring of freely floating particles. A sine wave propagating through such a ring towards the reader distorts the ring in a characteristic, rhythmic fashion (see image).[67] Since Einstein's equations are non-linear, arbitrarily strong gravitational waves do not obey linear superposition, making their description difficult. However, for weak fields, a linear approximation can be made. Such linearized gravitational waves are sufficiently accurate to describe the exceedingly weak waves that are expected to arrive here on Earth from far-off cosmic events, which typically result in relative distances increasing and decreasing by 10⁻²¹ or less.

Ring of test particles influenced by gravitational wave
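The distortion of the ring can be sketched numerically in the linearized approximation. The example below assumes a "plus"-polarized wave of dimensionless amplitude h travelling perpendicular to the ring, for which, to first order in h, x-coordinates are stretched by a factor 1 + (h/2)cos(phase) while y-coordinates are squeezed by 1 - (h/2)cos(phase). The function names, the exaggerated amplitude used for the ring, and the 4 km arm length are assumptions chosen for illustration.

```python
import math


def ring_positions(h, phase, n=8, radius=1.0):
    """Positions of n ring particles under a plus-polarized wave (first order in h)."""
    stretch = 0.5 * h * math.cos(phase)
    points = []
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        x0 = radius * math.cos(angle)
        y0 = radius * math.sin(angle)
        # x is stretched while y is squeezed, and vice versa half a period later.
        points.append((x0 * (1.0 + stretch), y0 * (1.0 - stretch)))
    return points


def arm_length_change(h, arm_length_m):
    """Change in length of a detector arm aligned with the stretching direction."""
    return 0.5 * h * arm_length_m


# Exaggerated amplitude so that the distortion is visible in the numbers.
squeezed_ring = ring_positions(h=0.2, phase=0.0, n=4)

# Realistic order of magnitude: strain 1e-21 acting on an assumed 4 km arm.
delta_l = arm_length_change(1e-21, 4000.0)

print(squeezed_ring)
print(f"arm length change: {delta_l:.1e} m")
```

For a strain of 10⁻²¹ the arm length changes by roughly 2 × 10⁻¹⁸ m, far smaller than an atomic nucleus, which indicates why direct detection is so demanding.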

Data-analysis methods routinely make use of the fact that these linearized waves can be Fourier decomposed.[68] Some exact solutions describe gravitational waves without any approximation, e.g., a wave train traveling through empty space[69] or so-called Gowdy universes, varieties of an expanding cosmos filled with gravitational waves.[70] But for gravitational waves produced in astrophysically relevant situations, such as the merger of two black holes, numerical methods are presently the only way to construct appropriate models.[71]

Orbital effects and the relativity of direction


General relativity differs from classical mechanics in a number of predictions concerning orbiting bodies. It predicts an overall rotation (precession) of planetary orbits, as well as orbital decay caused by the emission of gravitational waves and effects related to the relativity of direction.

Precession of apsides

In general relativity, the apsides of any orbit (the point of the orbiting body's closest approach to the system's center of mass) will precess; the orbit is not an ellipse, but akin to an ellipse that rotates on its focus, resulting in a rose curve-like shape (see image). Einstein first derived this result by using an approximate metric representing the Newtonian limit and treating the orbiting body as a test particle. For him, the fact that his theory gave a straightforward explanation of the anomalous perihelion shift of the planet Mercury, discovered earlier by Urbain Le Verrier in 1859, was important evidence that he had at last identified the correct form of the gravitational field equations.[72] The effect can also be derived by using either the exact Schwarzschild metric (describing spacetime around a spherical mass)[73] or the much more general post-Newtonian formalism.[74] It is due to the influence of gravity on the geometry of space and to the contribution of self-energy to a body's gravity (encoded in the nonlinearity of Einstein's equations).[75] Relativistic precession has been observed for all planets that allow for accurate precession measurements (Mercury, Venus and the Earth),[76] as well as in binary pulsar systems, where it is larger by five orders of magnitude.[77]

Newtonian (red) vs. Einsteinian orbit (blue) of a lone planet orbiting a star
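The size of the relativistic perihelion shift can be estimated from the standard test-particle result Δφ = 6πGM / (c²a(1 - e²)) per orbit, where a is the semi-major axis and e the eccentricity. The following sketch applies it to Mercury; the orbital elements are approximate values assumed for this example, and the function name is illustrative.

```python
import math

GM_SUN = 1.32712440018e20  # gravitational parameter of the Sun, m^3/s^2
C = 299_792_458.0          # speed of light, m/s


def perihelion_shift_per_orbit(semi_major_axis_m, eccentricity, gm=GM_SUN):
    """Relativistic perihelion advance per orbit, in radians (test-particle limit)."""
    return 6.0 * math.pi * gm / (
        C**2 * semi_major_axis_m * (1.0 - eccentricity**2)
    )


# Approximate orbital elements of Mercury (assumed values).
MERCURY_A = 5.7909e10          # semi-major axis, m
MERCURY_E = 0.2056             # eccentricity
MERCURY_PERIOD_DAYS = 87.969   # orbital period, days

orbits_per_century = 36_525.0 / MERCURY_PERIOD_DAYS
shift_rad = perihelion_shift_per_orbit(MERCURY_A, MERCURY_E) * orbits_per_century
shift_arcsec = math.degrees(shift_rad) * 3600.0

print(f"Mercury: {shift_arcsec:.1f} arcseconds per century")
```

The result, about 43 arcseconds per century, is the anomalous part of Mercury's perihelion shift that Newtonian perturbation theory could not account for.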

Orbital decay

According to general relativity, a binary system will emit gravitational waves, thereby losing energy. Due to this loss, the distance between the two orbiting bodies decreases, and so does their orbital period. Within the solar system or for ordinary double stars, the effect is too small to be observable. This is not the case for a close binary pulsar, a system of two orbiting neutron stars, one of which is a pulsar: from the pulsar, observers on Earth receive a regular series of radio pulses that can serve as a highly accurate clock, which allows precise measurements of the orbital period. Since the neutron stars are very compact, significant amounts of energy are emitted in the form of gravitational radiation.[79]

Orbital decay for PSR 1913+16: time shift in seconds, tracked over three decades.[78]

The first observation of a decrease in orbital period due to the emission of gravitational waves was made by Hulse and Taylor, using the binary pulsar PSR 1913+16 they had discovered in 1974. This was the first detection of gravitational waves, albeit indirect, for which they were awarded the 1993 Nobel Prize in physics.[80] Since then, several other binary pulsars have been found, in particular the double pulsar PSR J0737-3039, in which both stars are pulsars.[81]

Geodetic precession and frame-dragging

Several relativistic effects are directly related to the relativity of direction.[82] One is geodetic precession: the axis direction of a gyroscope in free fall in curved spacetime will change when compared, for instance, with the direction of light received from distant stars, even though such a gyroscope represents the way of keeping a direction as stable as possible ("parallel transport").[83] For the Moon–Earth system, this effect has been measured with the help of lunar laser ranging.[84] More recently, it has been measured for test masses aboard the satellite Gravity Probe B to a precision of better than 0.3%.[85][86]

Near a rotating mass, there are so-called gravitomagnetic or frame-dragging effects. A distant observer will determine that objects close to the mass get "dragged around". This is most extreme for rotating black holes where, for any object entering a zone known as the ergosphere, rotation is inevitable.[87] Such effects can again be tested through their influence on the orientation of gyroscopes in free fall.[88] Somewhat controversial tests have been performed using the LAGEOS satellites, confirming the relativistic prediction.[89] Also the Mars Global Surveyor probe around Mars has been used.[90][91]
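Returning to the orbital decay of a binary pulsar discussed above, the expected rate of change of the orbital period can be sketched with the leading-order (quadrupole) formula for two point masses on an eccentric orbit. The system parameters below are rounded, approximate values for PSR 1913+16 quoted as assumptions for this example, and the function name is hypothetical.

```python
import math

T_SUN = 4.925490947e-6  # GM_sun / c^3 in seconds


def period_derivative(p_orbit_s, ecc, m1_msun, m2_msun):
    """Leading-order dP/dt (dimensionless, s/s) from gravitational-wave emission.

    Quadrupole formula for point masses; masses are given in solar masses.
    """
    ecc_factor = (1.0 + 73.0 / 24.0 * ecc**2 + 37.0 / 96.0 * ecc**4) / (
        1.0 - ecc**2
    ) ** 3.5
    mass_factor = m1_msun * m2_msun / (m1_msun + m2_msun) ** (1.0 / 3.0)
    return (
        -192.0 * math.pi / 5.0
        * (2.0 * math.pi * T_SUN / p_orbit_s) ** (5.0 / 3.0)
        * mass_factor
        * ecc_factor
    )


# Rounded, assumed parameters: period in seconds, eccentricity, masses in M_sun.
pdot = period_derivative(27_907.0, 0.617, 1.44, 1.39)

print(f"dP/dt = {pdot:.2e} s/s")
print(f"period change per year: {pdot * 3.15576e7 * 1e6:.0f} microseconds")
```

The period shrinks by only a few tens of microseconds per year, which is why a pulsar's clock-like regularity and decades of timing data are needed to reveal the effect.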

Astrophysical applications
Gravitational lensing
The deflection of light by gravity is responsible for a new class of astronomical phenomena. If a massive object is situated between the astronomer and a distant target object with appropriate mass and relative distances, the astronomer will see multiple distorted images of the target. Such effects are known as gravitational lensing.[92] Depending on the configuration, scale, and mass distribution, there can be two or more images, a bright ring known as an Einstein ring, or partial rings called arcs.[93] The earliest example was discovered in 1979;[94] since then, more than a hundred gravitational lenses have been observed.[95] Even if the multiple images are too close to each other to be resolved, the effect can still be measured, e.g., as an overall brightening of the target object; a number of such "microlensing events" have been observed.[96]

Einstein cross: four images of the same astronomical object, produced by a gravitational lens

Gravitational lensing has developed into a tool of observational astronomy. It is used to detect the presence and distribution of dark matter, provide a "natural telescope" for observing distant galaxies, and to obtain an independent estimate of the Hubble constant. Statistical evaluations of lensing data provide valuable insight into the structural evolution of galaxies.[97]
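The angular scale of lensing by a point-like mass is set by the Einstein radius, θ_E = sqrt(4GM/c² · D_ls / (D_l D_s)), where D_l, D_s and D_ls are the distances to the lens, to the source, and between lens and source. The sketch below evaluates it for a microlensing-like configuration; the one-solar-mass lens and the 4 kpc and 8 kpc distances are assumed illustrative values, and taking D_ls = D_s - D_l is only appropriate for nearby (non-cosmological) distances.

```python
import math

GM_SUN = 1.32712440018e20  # gravitational parameter of the Sun, m^3/s^2
C = 299_792_458.0          # speed of light, m/s
KPC = 3.0857e19            # one kiloparsec in metres


def einstein_radius_arcsec(mass_msun, d_lens_m, d_source_m):
    """Einstein radius of a point lens, in arcseconds.

    Assumption: Euclidean distances, so the lens-source distance is simply
    d_source - d_lens. Cosmological lenses require angular-diameter distances.
    """
    d_ls = d_source_m - d_lens_m
    theta_sq = 4.0 * GM_SUN * mass_msun / C**2 * d_ls / (d_lens_m * d_source_m)
    return math.degrees(math.sqrt(theta_sq)) * 3600.0


theta_e = einstein_radius_arcsec(1.0, 4.0 * KPC, 8.0 * KPC)

print(f"Einstein radius: {theta_e * 1e3:.2f} milliarcseconds")
```

An Einstein radius of about a milliarcsecond is too small for the separate images to be resolved, consistent with stellar-mass lenses showing up as an overall brightening rather than as multiple images.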

Gravitational wave astronomy


Observations of binary pulsars provide strong indirect evidence for the existence of gravitational waves (see Orbital decay, above). However, gravitational waves reaching us from the depths of the cosmos have not been detected directly, which is a major goal of current relativity-related research.[98] Several land-based gravitational wave detectors are currently in operation, most notably the interferometric detectors GEO 600, LIGO (three detectors), TAMA 300 and VIRGO.[99] Various pulsar timing arrays are using millisecond pulsars to detect gravitational waves in the 10⁻⁹ to 10⁻⁶ hertz frequency range, which originate from binary supermassive black holes.[100] A European space-based detector, eLISA/NGO, is currently under development,[101] with a precursor mission (LISA Pathfinder) due for launch in 2014.[102]

Artist's impression of the space-borne gravitational wave detector LISA

Observations of gravitational waves promise to complement observations in the electromagnetic spectrum.[103] They are expected to yield information about black holes and other dense objects such as neutron stars and white dwarfs, about certain kinds of supernova implosions, and about processes in the very early universe, including the signature of certain types of hypothetical cosmic string.[104]

Black holes and other compact objects


Whenever the ratio of an object's mass to its radius becomes sufficiently large, general relativity predicts the formation of a black hole, a region of space from which nothing, not even light, can escape. In the currently accepted models of stellar evolution, neutron stars of around 1.4 solar masses, and stellar black holes with a few to a few dozen solar masses, are thought to be the final state for the evolution of massive stars.[105] Usually a galaxy has one supermassive black hole with a few million to a few billion solar masses in its center,[106] and its presence is thought to have played an important role in the formation of the galaxy and larger cosmic structures.[107]

Simulation based on the equations of general relativity: a star collapsing to form a black hole while emitting gravitational waves

Astronomically, the most important property of compact objects is that they provide a supremely efficient mechanism for converting gravitational energy into electromagnetic radiation.[108] Accretion, the falling of dust or gaseous matter onto stellar or supermassive black holes, is thought to be responsible for some spectacularly luminous astronomical objects, notably diverse kinds of active galactic nuclei on galactic scales and stellar-size objects such as microquasars.[109] In particular, accretion can lead to relativistic jets, focused beams of highly energetic particles that are being flung into space at almost light speed.[110] General relativity plays a central role in modelling all these phenomena,[111] and observations provide strong evidence for the existence of black holes with the properties predicted by the theory.[112]

Black holes are also sought-after targets in the search for gravitational waves (cf. Gravitational waves, above). Merging black hole binaries should lead to some of the strongest gravitational wave signals reaching detectors here on Earth, and the phase directly before the merger ("chirp") could be used as a "standard candle" to deduce the distance to the merger events, and hence serve as a probe of cosmic expansion at large distances.[113] The gravitational waves produced as a stellar black hole plunges into a supermassive one should provide direct information about the supermassive black hole's geometry.[114]
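The efficiency of accretion mentioned above can be given a rough number. The sketch below makes a simplifying assumption that is not part of the text: matter spirals inward through circular orbits around a non-rotating (Schwarzschild) black hole and radiates away its binding energy down to the innermost stable circular orbit at r = 6GM/c², after which it falls in without radiating further. The comparison value for hydrogen fusion is an approximate textbook figure.

```python
import math


def circular_orbit_energy(r):
    """Specific energy E/(m c^2) of a circular orbit around a Schwarzschild black hole.

    r is the orbital radius in units of GM/c^2 and must exceed 3.
    """
    return (1.0 - 2.0 / r) / math.sqrt(1.0 - 3.0 / r)


R_ISCO = 6.0  # innermost stable circular orbit, in units of GM/c^2

# Fraction of the rest-mass energy released by the time matter reaches the ISCO.
accretion_efficiency = 1.0 - circular_orbit_energy(R_ISCO)

FUSION_EFFICIENCY = 0.007  # approximate rest-mass fraction released by hydrogen fusion

print(f"accretion efficiency: {accretion_efficiency:.3f}")
print(f"ratio to fusion: {accretion_efficiency / FUSION_EFFICIENCY:.1f}")
```

Under these assumptions nearly 6% of the rest mass is released, several times more than nuclear fusion yields; rotating black holes can do better still, which this simple sketch does not model.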

Cosmology
The current models of cosmology are based on Einstein's field equations, which include the cosmological constant Λ since it has an important influence on the large-scale dynamics of the cosmos,

$$R_{ab} - \tfrac{1}{2} R\, g_{ab} + \Lambda\, g_{ab} = \frac{8\pi G}{c^{4}}\, T_{ab}$$

where $g_{ab}$ is the spacetime metric.[115] Isotropic and homogeneous solutions of these enhanced equations, the Friedmann–Lemaître–Robertson–Walker solutions,[116] allow physicists to model a universe that has evolved over the past 14 billion years from a hot, early Big Bang phase.[117] Once a small number of parameters (for example the universe's mean matter density) have been fixed by astronomical observation,[118] further observational data can be used to put the models to the test.[119] Predictions, all successful, include the initial abundance of chemical elements formed in a period of primordial nucleosynthesis,[120] the large-scale structure of the universe,[121] and the existence and properties of a "thermal echo" from the early cosmos, the cosmic background radiation.[122]

This blue horseshoe is a distant galaxy that has been magnified and warped into a nearly complete ring by the strong gravitational pull of the massive foreground luminous red galaxy.

Astronomical observations of the cosmological expansion rate allow the total amount of matter in the universe to be estimated, although the nature of that matter remains mysterious in part. About 90% of all matter appears to be so-called dark matter, which has mass (or, equivalently, gravitational influence), but does not interact electromagnetically and, hence, cannot be observed directly.[123] There is no generally accepted description of this new kind of matter, within the framework of known particle physics[124] or otherwise.[125] Observational evidence from redshift surveys of distant supernovae and measurements of the cosmic background radiation also show that the evolution of our universe is significantly influenced by a cosmological constant resulting in an acceleration of cosmic expansion or, equivalently, by a form of energy with an unusual equation of state, known as dark energy, the nature of which remains unclear.[126]

A so-called inflationary phase,[127] an additional phase of strongly accelerated expansion at cosmic times of around 10⁻³³ seconds, was hypothesized in 1980 to account for several puzzling observations that were unexplained by classical cosmological models, such as the nearly perfect homogeneity of the cosmic background radiation.[128] Recent measurements of the cosmic background radiation have resulted in the first evidence for this scenario.[129] However, there is a bewildering variety of possible inflationary scenarios, which cannot be restricted by current observations.[130] An even larger question is the physics of the earliest universe, prior to the inflationary phase and close to where the classical models predict the big bang singularity. An authoritative answer would require a complete theory of quantum gravity, which has not yet been developed[131] (cf. the section on quantum gravity, below).
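To illustrate how a handful of parameters fixes the history of such a model universe, the sketch below computes the age of a spatially flat universe containing only matter and a cosmological constant, for which the Friedmann equation integrates to t₀ = 2 / (3 H₀ √Ω_Λ) · asinh(√(Ω_Λ/Ω_m)). The Hubble constant of 70 km/s/Mpc and the matter density parameter of 0.3 are assumed round numbers, radiation is neglected, and the function name is illustrative.

```python
import math

MPC_M = 3.0857e22   # one megaparsec in metres
GYR_S = 3.15576e16  # one billion Julian years in seconds


def age_flat_lcdm_gyr(h0_km_s_mpc, omega_m):
    """Age in Gyr of a flat universe with matter and a cosmological constant only."""
    omega_lambda = 1.0 - omega_m              # flatness assumption
    h0_per_s = h0_km_s_mpc * 1.0e3 / MPC_M    # Hubble constant in 1/s
    t0_s = (
        2.0 / (3.0 * h0_per_s * math.sqrt(omega_lambda))
        * math.asinh(math.sqrt(omega_lambda / omega_m))
    )
    return t0_s / GYR_S


age = age_flat_lcdm_gyr(70.0, 0.3)

print(f"age of the model universe: {age:.1f} billion years")
```

With these assumed inputs the model gives an age between 13 and 14 billion years, in line with the figure quoted above.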

Advanced concepts
Causal structure and global geometry
In general relativity, no material body can catch up with or overtake a light pulse. No influence from an event A can reach any other location X before light sent out at A to X. In consequence, an exploration of all light worldlines (null geodesics) yields key information about the spacetime's causal structure. This structure can be displayed using Penrose–Carter diagrams in which infinitely large regions of space and infinite time intervals are shrunk ("compactified") so as to fit onto a finite map, while light still travels along diagonals as in standard spacetime diagrams.[132]

Penrose–Carter diagram of an infinite Minkowski universe

Aware of the importance of causal structure, Roger Penrose and others developed what is known as global geometry. In global geometry, the object of study is not one particular solution (or family of solutions) to Einstein's equations. Rather, relations that hold true for all geodesics, such as the Raychaudhuri equation, and additional non-specific assumptions about the nature of matter (usually in the form of so-called energy conditions) are used to derive general results.[133]

Horizons
Using global geometry, some spacetimes can be shown to contain boundaries called horizons, which demarcate one region from the rest of spacetime. The best-known examples are black holes: if mass is compressed into a sufficiently compact region of space (as specified in the hoop conjecture, the relevant length scale is the Schwarzschild radius[134]), no light from inside can escape to the outside. Since no object can overtake a light pulse, all interior matter is imprisoned as well. Passage from the exterior to the interior is still possible, showing that the boundary, the black hole's horizon, is not a physical barrier.[135]
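The Schwarzschild radius mentioned above is given by r_s = 2GM/c². As a small illustration of how compact an object must be before a horizon forms, the following sketch evaluates it for the Sun and the Earth; the function name is illustrative and the gravitational parameters are standard approximate values.

```python
C = 299_792_458.0            # speed of light, m/s
GM_SUN = 1.32712440018e20    # gravitational parameter of the Sun, m^3/s^2
GM_EARTH = 3.986004418e14    # gravitational parameter of the Earth, m^3/s^2


def schwarzschild_radius_m(gm):
    """Schwarzschild radius in metres for a body with gravitational parameter GM."""
    return 2.0 * gm / C**2


sun_radius_km = schwarzschild_radius_m(GM_SUN) / 1.0e3
earth_radius_mm = schwarzschild_radius_m(GM_EARTH) * 1.0e3

print(f"Sun:   {sun_radius_km:.2f} km")
print(f"Earth: {earth_radius_mm:.2f} mm")
```

The Sun would have to be squeezed into a sphere about 3 km in radius, and the Earth into one under a centimetre, which gives a sense of the densities involved.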

Early studies of black holes relied on explicit solutions of Einstein's equations, notably the spherically symmetric Schwarzschild solution (used to describe a static black hole) and the axisymmetric Kerr solution (used to describe a rotating, stationary black hole, and introducing interesting features such as the ergosphere). Using global geometry, later studies have revealed more general properties of black holes. In the long run, they are rather simple objects characterized by eleven parameters specifying energy, linear momentum, angular momentum, location at a specified time and electric charge. This is stated by the black hole uniqueness theorems: "black holes have no hair", that is, no distinguishing marks like the hairstyles of humans. Irrespective of the complexity of a gravitating object collapsing to form a black hole, the object that results (having emitted gravitational waves) is very simple.[136]

The ergosphere of a rotating black hole, which plays a key role when it comes to extracting energy from such a black hole

Even more remarkably, there is a general set of laws known as black hole mechanics, which is analogous to the laws of thermodynamics. For instance, by the second law of black hole mechanics, the area of the event horizon of a general black hole will never decrease with time, analogous to the entropy of a thermodynamic system. This limits the energy that can be extracted by classical means from a rotating black hole (e.g. by the Penrose process).[137] There is strong evidence that the laws of black hole mechanics are, in fact, a subset of the laws of thermodynamics, and that the black hole area is proportional to its entropy.[138] This leads to a modification of the original laws of black hole mechanics: for instance, as the second law of black hole mechanics becomes part of the second law of thermodynamics, it is possible for black hole area to decrease, as long as other processes ensure that, overall, entropy increases. As thermodynamical objects with non-zero temperature, black holes should emit thermal radiation. Semi-classical calculations indicate that indeed they do, with the surface gravity playing the role of temperature in Planck's law. This radiation is known as Hawking radiation (cf. the quantum theory section, below).[139]

There are other types of horizons. In an expanding universe, an observer may find that some regions of the past cannot be observed ("particle horizon"), and some regions of the future cannot be influenced (event horizon).[140] Even in flat Minkowski space, when described by an accelerated observer (Rindler space), there will be horizons associated with a semi-classical radiation known as Unruh radiation.[141]
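The thermodynamic quantities discussed above can be evaluated for a non-rotating, uncharged black hole using the standard expressions T = ħc³ / (8πGMk_B) for the Hawking temperature and S = k_B c³ A / (4Għ) for the entropy, with A the horizon area. The sketch below applies them to one solar mass; the function names are illustrative, and restricting to the Schwarzschild case is an assumption made to keep the example short.

```python
import math

G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
C = 299_792_458.0        # speed of light, m/s
HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J/K
M_SUN = 1.98847e30       # solar mass, kg


def hawking_temperature_k(mass_kg):
    """Hawking temperature in kelvin of a Schwarzschild black hole."""
    return HBAR * C**3 / (8.0 * math.pi * G * mass_kg * K_B)


def entropy_in_units_of_kb(mass_kg):
    """Horizon entropy S / k_B of a Schwarzschild black hole."""
    r_s = 2.0 * G * mass_kg / C**2        # Schwarzschild radius
    area = 4.0 * math.pi * r_s**2         # horizon area
    return area * C**3 / (4.0 * G * HBAR)


temperature = hawking_temperature_k(M_SUN)
entropy = entropy_in_units_of_kb(M_SUN)

print(f"T = {temperature:.2e} K")
print(f"S / k_B = {entropy:.2e}")
```

A solar-mass black hole comes out far colder than the cosmic background radiation, so its thermal emission is utterly negligible, while its entropy is enormous. Because the temperature scales as 1/M and the entropy as M², lighter black holes are hotter and carry less entropy.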

Singularities
Another general (and quite disturbing) feature of general relativity is the appearance of spacetime boundaries known as singularities. Spacetime can be explored by following up on timelike and lightlike geodesics, that is, all possible ways that light and particles in free fall can travel. But some solutions of Einstein's equations have "ragged edges": regions known as spacetime singularities, where the paths of light and falling particles come to an abrupt end, and geometry becomes ill-defined. In the more interesting cases, these are "curvature singularities", where geometrical quantities characterizing spacetime curvature, such as the Ricci scalar, take on infinite values.[142] Well-known examples of spacetimes with future singularities, where worldlines end, are the Schwarzschild solution, which describes a singularity inside an eternal static black hole,[143] or the Kerr solution with its ring-shaped singularity inside an eternal rotating black hole.[144] The Friedmann–Lemaître–Robertson–Walker solutions and other spacetimes describing universes have past singularities on which worldlines begin, namely Big Bang singularities, and some have future singularities (Big Crunch) as well.[145]

Given that these examples are all highly symmetric, and thus simplified, it is tempting to conclude that the occurrence of singularities is an artifact of idealization.[146] The famous singularity theorems, proved using the methods of global geometry, say otherwise: singularities are a generic feature of general relativity, and unavoidable once the collapse of an object with realistic matter properties has proceeded beyond a certain stage[147] and also at the beginning of a wide class of expanding universes.[148] However, the theorems say little about the properties of singularities, and much of current research is devoted to characterizing these entities' generic structure (hypothesized e.g. by the so-called BKL conjecture).[149] The cosmic censorship hypothesis states that all realistic future singularities (no perfect symmetries, matter with realistic properties) are safely hidden away behind a horizon, and thus invisible to all distant observers. While no formal proof yet exists, numerical simulations offer supporting evidence of its validity.[150]

Evolution equations
Each solution of Einstein's equation encompasses the whole history of a universe; it is not just some snapshot of how things are, but a whole, possibly matter-filled, spacetime. It describes the state of matter and geometry everywhere and at every moment in that particular universe. Due to its general covariance, Einstein's theory is not sufficient by itself to determine the time evolution of the metric tensor. It must be combined with a coordinate condition, which is analogous to gauge fixing in other field theories.[151]

To understand Einstein's equations as partial differential equations, it is helpful to formulate them in a way that describes the evolution of the universe over time. This is done in so-called "3+1" formulations, where spacetime is split into three space dimensions and one time dimension. The best-known example is the ADM formalism.[152] These decompositions show that the spacetime evolution equations of general relativity are well-behaved: solutions always exist, and are uniquely defined, once suitable initial conditions have been specified.[153] Such formulations of Einstein's field equations are the basis of numerical relativity.[154]
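For orientation, a 3+1 split is commonly written as the line element below. This is a sketch in one widely used textbook notation, assumed here rather than taken from the text: N is the lapse function, N^i the shift vector, h_ij the metric induced on each spatial slice, the signature is (-,+,+,+), and units with c = 1 are used.

```latex
ds^2 = -N^2\,dt^2 + h_{ij}\,\bigl(dx^i + N^i\,dt\bigr)\bigl(dx^j + N^j\,dt\bigr)
```

In this form the lapse and shift express the freedom to choose how successive slices and their coordinates are laid out (the coordinate condition mentioned above), while the spatial metric h_ij is the quantity evolved from one slice to the next.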

Global and quasi-local quantities


The notion of evolution equations is intimately tied in with another aspect of general relativistic physics. In Einstein's theory, it turns out to be impossible to find a general definition for a seemingly simple property such as a system's total mass (or energy). The main reason is that the gravitational field, like any physical field, must be ascribed a certain energy, but it proves to be fundamentally impossible to localize that energy.[155] Nevertheless, there are possibilities to define a system's total mass, either using a hypothetical "infinitely distant observer" (ADM mass)[156] or suitable symmetries (Komar mass).[157] If one excludes from the system's total mass the energy being carried away to infinity by gravitational waves, the result is the so-called Bondi mass at null infinity.[158] Just as in classical physics, it can be shown that these masses are positive.[159] Corresponding global definitions exist for momentum and angular momentum.[160] There have also been a number of attempts to define quasi-local quantities, such as the mass of an isolated system formulated using only quantities defined within a finite region of space containing that system. The hope is to obtain a quantity useful for general statements about isolated systems, such as a more precise formulation of the hoop conjecture.[161]

Relationship with quantum theory


If general relativity is considered one of the two pillars of modern physics, quantum theory, the basis of understanding matter from elementary particles to solid state physics, is the other.[162] However, it is still an open question as to how the concepts of quantum theory can be reconciled with those of general relativity.

Quantum field theory in curved spacetime


Ordinary quantum field theories, which form the basis of modern elementary particle physics, are defined in flat Minkowski space, which is an excellent approximation when it comes to describing the behavior of microscopic particles in weak gravitational fields like those found on Earth.[163] In order to describe situations in which gravity is strong enough to influence (quantum) matter, yet not strong enough to require quantization itself, physicists have formulated quantum field theories in curved spacetime. These theories rely on classical general relativity to describe a curved background spacetime, and define a generalized quantum field theory to describe the behavior of quantum matter within that spacetime.[164] Using this formalism, it can be shown that black holes emit a blackbody spectrum of particles known as Hawking radiation, leading to the possibility that they evaporate over time.[165] As briefly mentioned above, this radiation plays an important role for the thermodynamics of black holes.[166]

Quantum gravity
The demand for consistency between a quantum description of matter and a geometric description of spacetime,[167] as well as the appearance of singularities (where curvature length scales become microscopic), indicate the need for a full theory of quantum gravity: for an adequate description of the interior of black holes, and of the very early universe, a theory is required in which gravity and the associated geometry of spacetime are described in the language of quantum physics.[168] Despite major efforts, no complete and consistent theory of quantum gravity is currently known, even though a number of promising candidates exist.[169] Attempts to generalize ordinary quantum field theories, used in elementary particle physics to describe fundamental interactions, so as to include gravity have led to serious problems. At low energies, this approach proves successful, in that it results in an acceptable effective (quantum) field theory of gravity.[170] At very high energies, however, the results are models devoid of all predictive power ("non-renormalizability").[171]

Projection of a Calabi–Yau manifold, one of the ways of compactifying the extra dimensions posited by string theory

One attempt to overcome these limitations is string theory, a quantum theory not of point particles, but of minute one-dimensional extended objects.[172] The theory promises to be a unified description of all particles and interactions, including gravity;[173] the price to pay is unusual features such as six extra dimensions of space in addition to the usual three.[174] In what is called the second superstring revolution, it was conjectured that both string theory and a unification of general relativity and supersymmetry known as supergravity[175] form part of a hypothesized eleven-dimensional model known as M-theory, which would constitute a uniquely defined and consistent theory of quantum gravity.[176]

Another approach starts with the canonical quantization procedures of quantum theory. Using the initial-value-formulation of general relativity (cf. evolution equations above), the result is the Wheeler–deWitt equation (an analogue of the Schrödinger equation) which, regrettably, turns out to be ill-defined.[177] However, with the introduction of what are now known as Ashtekar variables,[178] this leads to a promising model known as loop quantum gravity. Space is represented by a web-like structure called a spin network, evolving over time in discrete steps.[179]

Simple spin network of the type used in loop quantum gravity

Depending on which features of general relativity and quantum theory are accepted unchanged, and on what level changes are introduced,[180] there are numerous other attempts to arrive at a viable theory of quantum gravity, some examples being dynamical triangulations,[181] causal sets,[182] twistor models[183] or the path-integral based models of quantum cosmology.[184] All candidate theories still have major formal and conceptual problems to overcome. They also face the common problem that, as yet, there is no way to put quantum gravity predictions to experimental tests (and thus to decide between the candidates where their predictions vary), although there is hope for this to change as future data from cosmological observations and particle physics experiments becomes available.[185]

Current status
General relativity has emerged as a highly successful model of gravitation and cosmology, which has so far passed many unambiguous observational and experimental tests. However, there are strong indications the theory is incomplete.[186] The problem of quantum gravity and the question of the reality of spacetime singularities remain open.[187] Observational data that is taken as evidence for dark energy and dark matter could indicate the need for new physics.[188] Even taken as is, general relativity is rich with possibilities for further exploration. Mathematical relativists seek to understand the nature of singularities and the fundamental properties of Einstein's equations,[189] and increasingly powerful computer simulations (such as those describing merging black holes) are run.[190] The race for the first direct detection of gravitational waves continues,[191] in the hope of creating opportunities to test the theory's validity for much stronger gravitational fields than has been possible to date.[192] More than ninety years after its publication, general relativity remains a highly active area of research.[193]

Notes
[1] "Nobel Prize Biography" (http:/ / nobelprize. org/ nobel_prizes/ physics/ laureates/ 1921/ einstein-bio. html). Nobel Prize Biography. Nobel Prize. . Retrieved 25 February 2011. [2] Pais 1982, ch. 9 to 15, Janssen 2005; an up-to-date collection of current research, including reprints of many of the original articles, is Renn 2007; an accessible overview can be found in Renn 2005, pp.110ff. An early key article is Einstein 1907, cf. Pais 1982, ch. 9. The publication featuring the field equations is Einstein 1915, cf. Pais 1982, ch. 1115 [3] Schwarzschild 1916a, Schwarzschild 1916b and Reissner 1916 (later complemented in Nordstrm 1918) [4] Einstein 1917, cf. Pais 1982, ch. 15e [5] Hubble's original article is Hubble 1929; an accessible overview is given in Singh 2004, ch. 24 [6] As reported in Gamow 1970. Einstein's condemnation would prove to be premature, cf. the section Cosmology, below [7] Pais 1982, pp.253254 [8] Kennefick 2005, Kennefick 2007 [9] Pais 1982, ch. 16 [10] Thorne, Kip (2003). "Warping spacetime" (http:/ / books. google. com/ books?id=yLy4b61rfPwC). The future of theoretical physics and cosmology: celebrating Stephen Hawking's 60th birthday. Cambridge University Press. p.74. ISBN0-521-82081-2. ., Extract of page 74 (http:/ / books. google. com/ books?id=yLy4b61rfPwC& pg=PA74) [11] Israel 1987, ch. 7.87.10, Thorne 1994, ch. 39 [12] Sections Orbital effects and the relativity of direction, Gravitational time dilation and frequency shift and Light deflection and gravitational time delay, and references therein [13] Section Cosmology and references therein; the historical development is in Overbye 1999 [14] The following exposition re-traces that of Ehlers 1973, sec. 1 [15] Arnold 1989, ch. 1 [16] Ehlers 1973, pp.5f [17] Will 1993, sec. 2.4, Will 2006, sec. 2 [18] Wheeler 1990, ch. 2 [19] Ehlers 1973, sec. 1.2, Havas 1964, Knzle 1972. The simple thought experiment in question was first described in Heckmann & Schcking 1959

[20] Ehlers 1973, pp. 10f
[21] Good introductions are, in order of increasing presupposed knowledge of mathematics, Giulini 2005, Mermin 2005, and Rindler 1991; for accounts of precision experiments, cf. part IV of Ehlers & Lämmerzahl 2006
[22] An in-depth comparison between the two symmetry groups can be found in Giulini 2006a
[23] Rindler 1991, sec. 22, Synge 1972, ch. 1 and 2
[24] Ehlers 1973, sec. 2.3
[25] Ehlers 1973, sec. 1.4, Schutz 1985, sec. 5.1
[26] Ehlers 1973, pp. 17ff; a derivation can be found in Mermin 2005, ch. 12. For the experimental evidence, cf. the section Gravitational time dilation and frequency shift, below
[27] Rindler 2001, sec. 1.13; for an elementary account, see Wheeler 1990, ch. 2; there are, however, some differences between the modern version and Einstein's original concept used in the historical derivation of general relativity, cf. Norton 1985
[28] Ehlers 1973, sec. 1.4; for the experimental evidence, see once more the section Gravitational time dilation and frequency shift. Choosing a different connection with non-zero torsion leads to a modified theory known as Einstein–Cartan theory
[29] Ehlers 1973, p. 16, Kenyon 1990, sec. 7.2, Weinberg 1972, sec. 2.8
[30] Ehlers 1973, pp. 19–22; for similar derivations, see sections 1 and 2 of ch. 7 in Weinberg 1972. The Einstein tensor is the only divergence-free tensor that is a function of the metric coefficients, their first and second derivatives at most, and allows the spacetime of special relativity as a solution in the absence of sources of gravity, cf. Lovelock 1972. The tensors on both sides are of second rank, that is, they can each be thought of as 4×4 matrices, each of which contains ten independent terms; hence, the above represents ten coupled equations. The fact that, as a consequence of geometric relations known as Bianchi identities, the Einstein tensor satisfies a further four identities reduces these to six independent equations, e.g. Schutz 1985, sec. 8.3
[31] Kenyon 1990, sec. 7.4
[32] Brans & Dicke 1961, Weinberg 1972, sec. 3 in ch. 7, Goenner 2004, sec. 7.2, and Trautman 2006, respectively
[33] Wald 1984, ch. 4, Weinberg 1972, ch. 7 or, in fact, any other textbook on general relativity
[34] At least approximately, cf. Poisson 2004
[35] Wheeler 1990, p. xi
[36] Wald 1984, sec. 4.4
[37] Wald 1984, sec. 4.1
[38] For the (conceptual and historical) difficulties in defining a general principle of relativity and separating it from the notion of general covariance, see Giulini 2006b
[39] Section 5 in ch. 12 of Weinberg 1972
[40] Introductory chapters of Stephani et al. 2003
[41] A review showing Einstein's equation in the broader context of other PDEs with physical significance is Geroch 1996
[42] For background information and a list of solutions, cf. Stephani et al. 2003; a more recent review can be found in MacCallum 2006
[43] Chandrasekhar 1983, ch. 3, 5, 6
[44] Narlikar 1993, ch. 4, sec. 3.3
[45] Brief descriptions of these and further interesting solutions can be found in Hawking & Ellis 1973, ch. 5
[46] Lehner 2002
[47] For instance Wald 1984, sec. 4.4
[48] Will 1993, sec. 4.1 and 4.2
[49] Will 2006, sec. 3.2, Will 1993, ch. 4
[50] Rindler 2001, pp. 24–26 vs. pp. 236–237 and Ohanian & Ruffini 1994, pp. 164–172. Einstein derived these effects using the equivalence principle as early as 1907, cf. Einstein 1907 and the description in Pais 1982, pp. 196–198
[51] Rindler 2001, pp. 24–26; Misner, Thorne & Wheeler 1973, § 38.5
[52] Pound-Rebka experiment, see Pound & Rebka 1959, Pound & Rebka 1960; Pound & Snider 1964; a list of further experiments is given in Ohanian & Ruffini 1994, table 4.1 on p. 186
[53] Greenstein, Oke & Shipman 1971; the most recent and most accurate Sirius B measurements are published in Barstow, Bond et al. 2005.
[54] Starting with the Hafele-Keating experiment, Hafele & Keating 1972a and Hafele & Keating 1972b, and culminating in the Gravity Probe A experiment; an overview of experiments can be found in Ohanian & Ruffini 1994, table 4.1 on p. 186
[55] GPS is continually tested by comparing atomic clocks on the ground and aboard orbiting satellites; for an account of relativistic effects, see Ashby 2002 and Ashby 2003
[56] Stairs 2003 and Kramer 2004
[57] General overviews can be found in section 2.1 of Will 2006; Will 2003, pp. 32–36; Ohanian & Ruffini 1994, sec. 4.2
[58] Ohanian & Ruffini 1994, pp. 164–172
[59] Cf. Kennefick 2005 for the classic early measurements by the Eddington expeditions; for an overview of more recent measurements, see Ohanian & Ruffini 1994, ch. 4.3. For the most precise direct modern observations using quasars, cf. Shapiro et al. 2004
[60] This is not an independent axiom; it can be derived from Einstein's equations and the Maxwell Lagrangian using a WKB approximation, cf. Ehlers 1973, sec. 5
[61] Blanchet 2006, sec. 1.3

[62] Rindler 2001, sec. 1.16; for the historical examples, Israel 1987, pp. 202–204; in fact, Einstein published one such derivation as Einstein 1907. Such calculations tacitly assume that the geometry of space is Euclidean, cf. Ehlers & Rindler 1997
[63] From the standpoint of Einstein's theory, these derivations take into account the effect of gravity on time, but not its consequences for the warping of space, cf. Rindler 2001, sec. 11.11
[64] For the Sun's gravitational field using radar signals reflected from planets such as Venus and Mercury, cf. Shapiro 1964, Weinberg 1972, ch. 8, sec. 7; for signals actively sent back by space probes (transponder measurements), cf. Bertotti, Iess & Tortora 2003; for an overview, see Ohanian & Ruffini 1994, table 4.4 on p. 200; for more recent measurements using signals received from a pulsar that is part of a binary system, the gravitational field causing the time delay being that of the other pulsar, cf. Stairs 2003, sec. 4.4
[65] Will 1993, sec. 7.1 and 7.2
[66] These have been indirectly observed through the loss of energy in binary pulsar systems such as the Hulse–Taylor binary, the subject of the 1993 Nobel Prize in physics. A number of projects are underway to attempt to observe directly the effects of gravitational waves. For an overview, see Misner, Thorne & Wheeler 1973, part VIII. Unlike electromagnetic waves, the dominant contribution for gravitational waves is not the dipole, but the quadrupole; see Schutz 2001
[67] Most advanced textbooks on general relativity contain a description of these properties, e.g. Schutz 1985, ch. 9
[68] For example Jaranowski & Królak 2005
[69] Rindler 2001, ch. 13
[70] Gowdy 1971, Gowdy 1974
[71] See Lehner 2002 for a brief introduction to the methods of numerical relativity, and Seidel 1998 for the connection with gravitational wave astronomy
[72] Schutz 2003, pp. 48–49, Pais 1982, pp. 253–254
[73] Rindler 2001, sec. 11.9
[74] Will 1993, pp. 177–181
[75] In consequence, in the parameterized post-Newtonian formalism (PPN), measurements of this effect determine a linear combination of the terms β and γ, cf. Will 2006, sec. 3.5 and Will 1993, sec. 7.3
[76] The most precise measurements are VLBI measurements of planetary positions; see Will 1993, ch. 5, Will 2006, sec. 3.5, Anderson et al. 1992; for an overview, Ohanian & Ruffini 1994, pp. 406–407
[77] Kramer et al. 2006
[78] A figure that includes error bars is fig. 7 in Will 2006, sec. 5.1
[79] Stairs 2003, Schutz 2003, pp. 317–321, Bartusiak 2000, pp. 70–86
[80] Weisberg & Taylor 2003; for the pulsar discovery, see Hulse & Taylor 1975; for the initial evidence for gravitational radiation, see Taylor 1994
[81] Kramer 2004
[82] Penrose 2004, § 14.5, Misner, Thorne & Wheeler 1973, § 11.4
[83] Weinberg 1972, sec. 9.6, Ohanian & Ruffini 1994, sec. 7.8
[84] Bertotti, Ciufolini & Bender 1987, Nordtvedt 2003
[85] Kahn 2007
[86] A mission description can be found in Everitt et al. 2001; a first post-flight evaluation is given in Everitt, Parkinson & Kahn 2007; further updates will be available on the mission website Kahn 1996–2012.
[87] Townsend 1997, sec. 4.2.1, Ohanian & Ruffini 1994, pp. 469–471
[88] Ohanian & Ruffini 1994, sec. 4.7, Weinberg 1972, sec. 9.7; for a more recent review, see Schäfer 2004
[89] Ciufolini & Pavlis 2004, Ciufolini, Pavlis & Peron 2006, Iorio 2009
[90] Iorio L. (August 2006), "COMMENTS, REPLIES AND NOTES: A note on the evidence of the gravitomagnetic field of Mars", Classical Quantum Gravity 23 (17): 5451–5454, arXiv:gr-qc/0606092, Bibcode:2006CQGra..23.5451I, doi:10.1088/0264-9381/23/17/N01
[91] Iorio L. (June 2010), "On the Lense–Thirring test with the Mars Global Surveyor in the gravitational field of Mars", Central European Journal of Physics 8 (3): 509–513, arXiv:gr-qc/0701146, Bibcode:2010CEJPh...8..509I, doi:10.2478/s11534-009-0117-6
[92] For overviews of gravitational lensing and its applications, see Ehlers, Falco & Schneider 1992 and Wambsganss 1998
[93] For a simple derivation, see Schutz 2003, ch. 23; cf. Narayan & Bartelmann 1997, sec. 3
[94] Walsh, Carswell & Weymann 1979
[95] Images of all the known lenses can be found on the pages of the CASTLES project, Kochanek et al. 2007
[96] Roulet & Mollerach 1997
[97] Narayan & Bartelmann 1997, sec. 3.7
[98] Barish 2005, Bartusiak 2000, Blair & McNamara 1997
[99] Hough & Rowan 2000
[100] Hobbs, George. "The international pulsar timing array project: using pulsars as a gravitational wave detector". arXiv:0911.5206.
[101] Danzmann & Rüdiger 2003
[102] "LISA pathfinder overview" (http://www.esa.int/esaSC/120397_index_0_m.html). ESA. Retrieved 2012-04-23.
[103] Thorne 1995
[104] Cutler & Thorne 2002
[105] Miller 2002, lectures 19 and 21

[106] Celotti, Miller & Sciama 1999, sec. 3
[107] Springel et al. 2005 and the accompanying summary Gnedin 2005
[108] Blandford 1987, sec. 8.2.4
[109] For the basic mechanism, see Carroll & Ostlie 1996, sec. 17.2; for more about the different types of astronomical objects associated with this, cf. Robson 1996
[110] For a review, see Begelman, Blandford & Rees 1984. To a distant observer, some of these jets even appear to move faster than light; this, however, can be explained as an optical illusion that does not violate the tenets of relativity, see Rees 1966
[111] For stellar end states, cf. Oppenheimer & Snyder 1939 or, for more recent numerical work, Font 2003, sec. 4.1; for supernovae, there are still major problems to be solved, cf. Buras et al. 2003; for simulating accretion and the formation of jets, cf. Font 2003, sec. 4.2. Also, relativistic lensing effects are thought to play a role for the signals received from X-ray pulsars, cf. Kraus 1998
[112] The evidence includes limits on compactness from the observation of accretion-driven phenomena ("Eddington luminosity"), see Celotti, Miller & Sciama 1999, observations of stellar dynamics in the center of our own Milky Way galaxy, cf. Schödel et al. 2003, and indications that at least some of the compact objects in question appear to have no solid surface, which can be deduced from the examination of X-ray bursts for which the central compact object is either a neutron star or a black hole; cf. Remillard et al. 2006 for an overview, Narayan 2006, sec. 5. Observations of the "shadow" of the Milky Way galaxy's central black hole horizon are eagerly sought, cf. Falcke, Melia & Agol 2000
[113] Dalal et al. 2006
[114] Barack & Cutler 2004
[115] Originally Einstein 1917; cf. Pais 1982, pp. 285–288
[116] Carroll 2001, ch. 2
[117] Bergström & Goobar 2003, ch. 9–11; use of these models is justified by the fact that, at large scales of around a hundred million light-years and more, our own universe indeed appears to be isotropic and homogeneous, cf. Peebles et al. 1991
[118] E.g. with WMAP data, see Spergel et al. 2003
[119] These tests involve the separate observations detailed further on, see, e.g., fig. 2 in Bridle et al. 2003
[120] Peebles 1966; for a recent account of predictions, see Coc, Vangioni-Flam et al. 2004; an accessible account can be found in Weiss 2006; compare with the observations in Olive & Skillman 2004, Bania, Rood & Balser 2002, O'Meara et al. 2001, and Charbonnel & Primas 2005
[121] Lahav & Suto 2004, Bertschinger 1998, Springel et al. 2005
[122] Alpher & Herman 1948; for a pedagogical introduction, see Bergström & Goobar 2003, ch. 11; for the initial detection, see Penzias & Wilson 1965 and, for precision measurements by satellite observatories, Mather et al. 1994 (COBE) and Bennett et al. 2003 (WMAP). Future measurements could also reveal evidence about gravitational waves in the early universe; this additional information is contained in the background radiation's polarization, cf. Kamionkowski, Kosowsky & Stebbins 1997 and Seljak & Zaldarriaga 1997
[123] Evidence for this comes from the determination of cosmological parameters and additional observations involving the dynamics of galaxies and galaxy clusters, cf. Peebles 1993, ch. 18, evidence from gravitational lensing, cf. Peacock 1999, sec. 4.6, and simulations of large-scale structure formation, see Springel et al. 2005
[124] Peacock 1999, ch. 12, Peskin 2007; in particular, observations indicate that all but a negligible portion of that matter is not in the form of the usual elementary particles ("non-baryonic matter"), cf. Peacock 1999, ch. 12
[125] Namely, some physicists have questioned whether or not the evidence for dark matter is, in fact, evidence for deviations from the Einsteinian (and the Newtonian) description of gravity, cf. the overview in Mannheim 2006, sec. 9
[126] Carroll 2001; an accessible overview is given in Caldwell 2004. Here, too, scientists have argued that the evidence indicates not a new form of energy, but the need for modifications in our cosmological models, cf. Mannheim 2006, sec. 10; the aforementioned modifications need not be modifications of general relativity; they could, for example, be modifications in the way we treat the inhomogeneities in the universe, cf. Buchert 2007
[127] A good introduction is Linde 1990; for a more recent review, see Linde 2005
[128] More precisely, these are the flatness problem, the horizon problem, and the monopole problem; a pedagogical introduction can be found in Narlikar 1993, sec. 6.4, see also Börner 1993, sec. 9.1
[129] Spergel et al. 2007, sec. 5, 6
[130] More concretely, the potential function that is crucial to determining the dynamics of the inflaton is simply postulated, but not derived from an underlying physical theory
[131] Brandenberger 2007, sec. 2
[132] Frauendiener 2004, Wald 1984, sec. 11.1, Hawking & Ellis 1973, sec. 6.8, 6.9
[133] Wald 1984, sec. 9.2–9.4 and Hawking & Ellis 1973, ch. 6
[134] Thorne 1972; for more recent numerical studies, see Berger 2002, sec. 2.1
[135] Israel 1987. A more exact mathematical description distinguishes several kinds of horizon, notably event horizons and apparent horizons, cf. Hawking & Ellis 1973, pp. 312–320 or Wald 1984, sec. 12.2; there are also more intuitive definitions for isolated systems that do not require knowledge of spacetime properties at infinity, cf. Ashtekar & Krishnan 2004
[136] For first steps, cf. Israel 1971; see Hawking & Ellis 1973, sec. 9.3 or Heusler 1996, ch. 9 and 10 for a derivation, and Heusler 1998 as well as Beig & Chruściel 2006 as overviews of more recent results
[137] The laws of black hole mechanics were first described in Bardeen, Carter & Hawking 1973; a more pedagogical presentation can be found in Carter 1979; for a more recent review, see Wald 2001, ch. 2. A thorough, book-length introduction including an introduction to the

necessary mathematics is Poisson 2004. For the Penrose process, see Penrose 1969
[138] Bekenstein 1973, Bekenstein 1974
[139] The fact that black holes radiate, quantum mechanically, was first derived in Hawking 1975; a more thorough derivation can be found in Wald 1975. A review is given in Wald 2001, ch. 3
[140] Narlikar 1993, sec. 4.4.4, 4.4.5
[141] Horizons: cf. Rindler 2001, sec. 12.4. Unruh effect: Unruh 1976, cf. Wald 2001, ch. 3
[142] Hawking & Ellis 1973, sec. 8.1, Wald 1984, sec. 9.1
[143] Townsend 1997, ch. 2; a more extensive treatment of this solution can be found in Chandrasekhar 1983, ch. 3
[144] Townsend 1997, ch. 4; for a more extensive treatment, cf. Chandrasekhar 1983, ch. 6
[145] Ellis & van Elst 1999; a closer look at the singularity itself is taken in Börner 1993, sec. 1.2
[146] Here one should recall the well-known fact that the important "quasi-optical" singularities of the so-called eikonal approximations of many wave equations, namely the "caustics", are resolved into finite peaks beyond that approximation.
[147] Namely when there are trapped null surfaces, cf. Penrose 1965
[148] Hawking 1966
[149] The conjecture was made in Belinskii, Khalatnikov & Lifschitz 1971; for a more recent review, see Berger 2002. An accessible exposition is given by Garfinkle 2007
[150] The restriction to future singularities naturally excludes initial singularities such as the big bang singularity, which could in principle be visible to observers at later cosmic time. The cosmic censorship conjecture was first presented in Penrose 1969; a textbook-level account is given in Wald 1984, pp. 302–305. For numerical results, see the review Berger 2002, sec. 2.1
[151] Hawking & Ellis 1973, sec. 7.1
[152] Arnowitt, Deser & Misner 1962; for a pedagogical introduction, see Misner, Thorne & Wheeler 1973, § 21.4–§ 21.7
[153] Fourès-Bruhat 1952 and Bruhat 1962; for a pedagogical introduction, see Wald 1984, ch. 10; an online review can be found in Reula 1998
[154] Gourgoulhon 2007; for a review of the basics of numerical relativity, including the problems arising from the peculiarities of Einstein's equations, see Lehner 2001
[155] Misner, Thorne & Wheeler 1973, § 20.4
[156] Arnowitt, Deser & Misner 1962
[157] Komar 1959; for a pedagogical introduction, see Wald 1984, sec. 11.2; although defined in a totally different way, it can be shown to be equivalent to the ADM mass for stationary spacetimes, cf. Ashtekar & Magnon-Ashtekar 1979
[158] For a pedagogical introduction, see Wald 1984, sec. 11.2
[159] Wald 1984, p. 295 and refs therein; this is important for questions of stability: if there were negative mass states, then flat, empty Minkowski space, which has mass zero, could evolve into these states
[160] Townsend 1997, ch. 5
[161] Such quasi-local mass–energy definitions are the Hawking energy, Geroch energy, or Penrose's quasi-local energy–momentum based on twistor methods; cf. the review article Szabados 2004
[162] An overview of quantum theory can be found in standard textbooks such as Messiah 1999; a more elementary account is given in Hey & Walters 2003
[163] Ramond 1990, Weinberg 1995, Peskin & Schroeder 1995; a more accessible overview is Auyang 1995
[164] Wald 1994, Birrell & Davies 1984
[165] For Hawking radiation, Hawking 1975, Wald 1975; an accessible introduction to black hole evaporation can be found in Traschen 2000
[166] Wald 2001, ch. 3
[167] Put simply, matter is the source of spacetime curvature, and once matter has quantum properties, we can expect spacetime to have them as well. Cf. Carlip 2001, sec. 2
[168] Schutz 2003, p. 407
[169] A timeline and overview can be found in Rovelli 2000
[170] Donoghue 1995
[171] In particular, a technique known as renormalization, an integral part of deriving predictions which take into account higher-energy contributions, cf. Weinberg 1996, ch. 17, 18, fails in this case; cf. Goroff & Sagnotti 1985
[172] An accessible introduction at the undergraduate level can be found in Zwiebach 2004; more complete overviews can be found in Polchinski 1998a and Polchinski 1998b
[173] At the energies reached in current experiments, these strings are indistinguishable from point-like particles, but, crucially, different modes of oscillation of one and the same type of fundamental string appear as particles with different (electric and other) charges, e.g. Ibanez 2000. The theory is successful in that one mode will always correspond to a graviton, the messenger particle of gravity, e.g. Green, Schwarz & Witten 1987, sec. 2.3, 5.3
[174] Green, Schwarz & Witten 1987, sec. 4.2
[175] Weinberg 2000, ch. 31
[176] Townsend 1996, Duff 1996
[177] Kuchař 1973, sec. 3
[178] These variables represent geometric gravity using mathematical analogues of electric and magnetic fields; cf. Ashtekar 1986, Ashtekar 1987

[179] For a review, see Thiemann 2006; more extensive accounts can be found in Rovelli 1998, Ashtekar & Lewandowski 2004 as well as in the lecture notes Thiemann 2003
[180] Isham 1994, Sorkin 1997
[181] Loll 1998
[182] Sorkin 2005
[183] Penrose 2004, ch. 33 and refs therein
[184] Hawking 1987
[185] Ashtekar 2007, Schwarz 2007
[186] Maddox 1998, pp. 52–59, 98–122; Penrose 2004, sec. 34.1, ch. 30
[187] Section Quantum gravity, above
[188] Section Cosmology, above
[189] Friedrich 2005
[190] For a review of the various problems and the techniques being developed to overcome them, see Lehner 2002
[191] See Bartusiak 2000 for an account up to that year; up-to-date news can be found on the websites of major detector collaborations such as GEO 600 (http://geo600.aei.mpg.de) and LIGO (http://www.ligo.caltech.edu/)
[192] For the most recent papers on gravitational wave polarizations of inspiralling compact binaries, see Blanchet et al. 2008, and Arun et al. 2007; for a review of work on compact binaries, see Blanchet 2006 and Futamase & Itoh 2006; for a general review of experimental tests of general relativity, see Will 2006
[193] See, e.g., the electronic review journal Living Reviews in Relativity (http://relativity.livingreviews.org)


References
Alpher, R. A.; Herman, R. C. (1948), "Evolution of the universe", Nature 162 (4124): 774–775, Bibcode:1948Natur.162..774A, doi:10.1038/162774b0
Anderson, J. D.; Campbell, J. K.; Jurgens, R. F.; Lau, E. L. (1992), "Recent developments in solar-system tests of general relativity", in Sato, H.; Nakamura, T., Proceedings of the Sixth Marcel Großmann Meeting on General Relativity, World Scientific, pp. 353–355, ISBN 981-02-0950-9
Arnold, V. I. (1989), Mathematical Methods of Classical Mechanics, Springer, ISBN 3-540-96890-3
Arnowitt, Richard; Deser, Stanley; Misner, Charles W. (1962), "The dynamics of general relativity", in Witten, Louis, Gravitation: An Introduction to Current Research, Wiley, pp. 227–265
Arun, K. G.; Blanchet, L.; Iyer, B. R.; Qusailah, M. S. S. (2007), Inspiralling compact binaries in quasi-elliptical orbits: The complete 3PN energy flux, arXiv:0711.0302, Bibcode:2008PhRvD..77f4035A, doi:10.1103/PhysRevD.77.064035
Ashby, Neil (2002), "Relativity and the Global Positioning System" (http://www.ipgp.jussieu.fr/~tarantola/Files/Professional/GPS/Neil_Ashby_Relativity_GPS.pdf) (PDF), Physics Today 55 (5): 41–47, Bibcode:2002PhT....55e..41A, doi:10.1063/1.1485583
Ashby, Neil (2003), "Relativity in the Global Positioning System" (http://relativity.livingreviews.org/Articles/lrr-2003-1/index.html), Living Reviews in Relativity (http://relativity.livingreviews.org) 6, retrieved 2007-07-06
Ashtekar, Abhay (1986), "New variables for classical and quantum gravity", Phys. Rev. Lett. 57 (18): 2244–2247, Bibcode:1986PhRvL..57.2244A, doi:10.1103/PhysRevLett.57.2244, PMID 10033673
Ashtekar, Abhay (1987), "New Hamiltonian formulation of general relativity", Phys. Rev. D36 (6): 1587–1602, Bibcode:1987PhRvD..36.1587A, doi:10.1103/PhysRevD.36.1587
Ashtekar, Abhay (2007), Loop Quantum Gravity: Four Recent Advances and a Dozen Frequently Asked Questions, arXiv:0705.2222, Bibcode:2008mgm..conf..126A, doi:10.1142/9789812834300_0008
Ashtekar, Abhay; Krishnan, Badri (2004), "Isolated and Dynamical Horizons and Their Applications" (http://www.livingreviews.org/lrr-2004-10), Living Rev. Relativity 7, retrieved 2007-08-28
Ashtekar, Abhay; Lewandowski, Jerzy (2004), "Background Independent Quantum Gravity: A Status Report", Class. Quant. Grav. 21 (15): R53–R152, arXiv:gr-qc/0404018, Bibcode:2004CQGra..21R..53A, doi:10.1088/0264-9381/21/15/R01
Ashtekar, Abhay; Magnon-Ashtekar, Anne (1979), "On conserved quantities in general relativity", Journal of Mathematical Physics 20 (5): 793–800, Bibcode:1979JMP....20..793A, doi:10.1063/1.524151

Auyang, Sunny Y. (1995), How is Quantum Field Theory Possible?, Oxford University Press, ISBN 0-19-509345-3
Bania, T. M.; Rood, R. T.; Balser, D. S. (2002), "The cosmological density of baryons from observations of 3He+ in the Milky Way", Nature 415 (6867): 54–57, Bibcode:2002Natur.415...54B, doi:10.1038/415054a, PMID 11780112
Barack, Leor; Cutler, Curt (2004), "LISA Capture Sources: Approximate Waveforms, Signal-to-Noise Ratios, and Parameter Estimation Accuracy", Phys. Rev. D69 (8): 082005, arXiv:gr-qc/031012, Bibcode:2004PhRvD..69h2005B, doi:10.1103/PhysRevD.69.082005
Bardeen, J. M.; Carter, B.; Hawking, S. W. (1973), "The Four Laws of Black Hole Mechanics" (http://projecteuclid.org/euclid.cmp/1103858973), Comm. Math. Phys. 31 (2): 161–170, Bibcode:1973CMaPh..31..161B, doi:10.1007/BF01645742
Barish, Barry (2005), "Towards detection of gravitational waves", in Florides, P.; Nolan, B.; Ottewil, A., General Relativity and Gravitation. Proceedings of the 17th International Conference, World Scientific, pp. 24–34, ISBN 981-256-424-1
Barstow, M; Bond, Howard E.; Holberg, J. B.; Burleigh, M. R.; Hubeny, I.; Koester, D. (2005), "Hubble Space Telescope Spectroscopy of the Balmer lines in Sirius B", Mon. Not. Roy. Astron. Soc. 362 (4): 1134–1142, arXiv:astro-ph/0506600, Bibcode:2005MNRAS.362.1134B, doi:10.1111/j.1365-2966.2005.09359.x
Bartusiak, Marcia (2000), Einstein's Unfinished Symphony: Listening to the Sounds of Space-Time, Berkley, ISBN 978-0-425-18620-6
Begelman, Mitchell C.; Blandford, Roger D.; Rees, Martin J. (1984), "Theory of extragalactic radio sources", Rev. Mod. Phys. 56 (2): 255–351, Bibcode:1984RvMP...56..255B, doi:10.1103/RevModPhys.56.255
Beig, Robert; Chruściel, Piotr T. (2006), "Stationary black holes", in Francoise, J.-P.; Naber, G.; Tsou, T.S., Encyclopedia of Mathematical Physics, Volume 2, Elsevier, arXiv:gr-qc/0502041, Bibcode:2005gr.qc.....2041B, ISBN 0-12-512660-3
Bekenstein, Jacob D. (1973), "Black Holes and Entropy", Phys. Rev. D7 (8): 2333–2346, Bibcode:1973PhRvD...7.2333B, doi:10.1103/PhysRevD.7.2333
Bekenstein, Jacob D. (1974), "Generalized Second Law of Thermodynamics in Black-Hole Physics", Phys. Rev. D9 (12): 3292–3300, Bibcode:1974PhRvD...9.3292B, doi:10.1103/PhysRevD.9.3292
Belinskii, V. A.; Khalatnikov, I. M.; Lifschitz, E. M. (1971), "Oscillatory approach to the singular point in relativistic cosmology", Advances in Physics 19 (80): 525–573, Bibcode:1970AdPhy..19..525B, doi:10.1080/00018737000101171; original paper in Russian: Belinsky, V. A.; Khalatnikov, I. M.; Lifshitz, E. M. (1970), Uspekhi Fizicheskikh Nauk 102(3) (11): 463–500, Bibcode:1970UsFiN.102..463B
Bennett, C. L.; Halpern, M.; Hinshaw, G.; Jarosik, N.; Kogut, A.; Limon, M.; Meyer, S. S.; Page, L. et al. (2003), "First Year Wilkinson Microwave Anisotropy Probe (WMAP) Observations: Preliminary Maps and Basic Results", Astrophys. J. Suppl. 148 (1): 1–27, arXiv:astro-ph/0302207, Bibcode:2003ApJS..148....1B, doi:10.1086/377253
Berger, Beverly K. (2002), "Numerical Approaches to Spacetime Singularities" (http://www.livingreviews.org/lrr-2002-1), Living Rev. Relativity 5, retrieved 2007-08-04
Bergström, Lars; Goobar, Ariel (2003), Cosmology and Particle Astrophysics (2nd ed.), Wiley & Sons, ISBN 3-540-43128-4
Bertotti, Bruno; Ciufolini, Ignazio; Bender, Peter L. (1987), "New test of general relativity: Measurement of de Sitter geodetic precession rate for lunar perigee", Physical Review Letters 58 (11): 1062–1065, Bibcode:1987PhRvL..58.1062B, doi:10.1103/PhysRevLett.58.1062, PMID 10034329
Bertotti, Bruno; Iess, L.; Tortora, P. (2003), "A test of general relativity using radio links with the Cassini spacecraft", Nature 425 (6956): 374–376, Bibcode:2003Natur.425..374B, doi:10.1038/nature01997, PMID 14508481


Bertschinger, Edmund (1998), "Simulations of structure formation in the universe", Annu. Rev. Astron. Astrophys. 36 (1): 599–654, Bibcode:1998ARA&A..36..599B, doi:10.1146/annurev.astro.36.1.599
Birrell, N. D.; Davies, P. C. (1984), Quantum Fields in Curved Space, Cambridge University Press, ISBN 0-521-27858-9
Blair, David; McNamara, Geoff (1997), Ripples on a Cosmic Sea. The Search for Gravitational Waves, Perseus, ISBN 0-7382-0137-5
Blanchet, L.; Faye, G.; Iyer, B. R.; Sinha, S. (2008), The third post-Newtonian gravitational wave polarisations and associated spherical harmonic modes for inspiralling compact binaries in quasi-circular orbits, arXiv:0802.1249, Bibcode:2008CQGra..25p5003B, doi:10.1088/0264-9381/25/16/165003
Blanchet, Luc (2006), "Gravitational Radiation from Post-Newtonian Sources and Inspiralling Compact Binaries" (http://www.livingreviews.org/lrr-2006-4), Living Rev. Relativity 9, retrieved 2007-08-07
Blandford, R. D. (1987), "Astrophysical Black Holes", in Hawking, Stephen W.; Israel, Werner, 300 Years of Gravitation, Cambridge University Press, pp. 277–329, ISBN 0-521-37976-8
Börner, Gerhard (1993), The Early Universe. Facts and Fiction, Springer, ISBN 0-387-56729-1
Brandenberger, Robert H. (2007), Conceptual Problems of Inflationary Cosmology and a New Approach to Cosmological Structure Formation, arXiv:hep-th/0701111, Bibcode:2008LNP...738..393B, doi:10.1007/978-3-540-74353-8_11
Brans, C. H.; Dicke, R. H. (1961), "Mach's Principle and a Relativistic Theory of Gravitation", Physical Review 124 (3): 925–935, Bibcode:1961PhRv..124..925B, doi:10.1103/PhysRev.124.925
Bridle, Sarah L.; Lahav, Ofer; Ostriker, Jeremiah P.; Steinhardt, Paul J. (2003), "Precision Cosmology? Not Just Yet", Science 299 (5612): 1532–1533, arXiv:astro-ph/0303180, Bibcode:2003Sci...299.1532B, doi:10.1126/science.1082158, PMID 12624255
Bruhat, Yvonne (1962), "The Cauchy Problem", in Witten, Louis, Gravitation: An Introduction to Current Research, Wiley, pp. 130, ISBN 978-1-114-29166-9
Buchert, Thomas (2007), "Dark Energy from Structure: A Status Report", General Relativity and Gravitation 40 (2-3): 467–527, arXiv:0707.2153, Bibcode:2008GReGr..40..467B, doi:10.1007/s10714-007-0554-8
Buras, R.; Rampp, M.; Janka, H.-Th.; Kifonidis, K. (2003), "Improved Models of Stellar Core Collapse and Still no Explosions: What is Missing?", Phys. Rev. Lett. 90 (24): 241101, arXiv:astro-ph/0303171, Bibcode:2003PhRvL..90x1101B, doi:10.1103/PhysRevLett.90.241101, PMID 12857181
Caldwell, Robert R. (2004), "Dark Energy", Physics World 17 (5): 37–42
Carlip, Steven (2001), "Quantum Gravity: a Progress Report", Rept. Prog. Phys. 64 (8): 885–942, arXiv:gr-qc/0108040, Bibcode:2001RPPh...64..885C, doi:10.1088/0034-4885/64/8/301
Carroll, Bradley W.; Ostlie, Dale A. (1996), An Introduction to Modern Astrophysics, Addison-Wesley, ISBN 0-201-54730-9
Carroll, Sean M. (2001), "The Cosmological Constant" (http://www.livingreviews.org/lrr-2001-1), Living Rev. Relativity 4, retrieved 2007-07-21
Carter, Brandon (1979), "The general theory of the mechanical, electromagnetic and thermodynamic properties of black holes", in Hawking, S. W.; Israel, W., General Relativity, an Einstein Centenary Survey, Cambridge University Press, pp. 294–369 and 860–863, ISBN 0-521-29928-4
Celotti, Annalisa; Miller, John C.; Sciama, Dennis W. (1999), "Astrophysical evidence for the existence of black holes", Class. Quant. Grav. 16 (12A): A3–A21, arXiv:astro-ph/9912186, doi:10.1088/0264-9381/16/12A/301
Chandrasekhar, Subrahmanyan (1983), The Mathematical Theory of Black Holes, Oxford University Press, ISBN 0-19-850370-9
Charbonnel, C.; Primas, F. (2005), "The Lithium Content of the Galactic Halo Stars", Astronomy & Astrophysics 442 (3): 961–992, arXiv:astro-ph/0505247, Bibcode:2005A&A...442..961C, doi:10.1051/0004-6361:20042491
Ciufolini, Ignazio; Pavlis, Erricos C. (2004), "A confirmation of the general relativistic prediction of the Lense-Thirring effect", Nature 431 (7011): 958–960, Bibcode:2004Natur.431..958C, doi:10.1038/nature03007, PMID 15496915
Ciufolini, Ignazio; Pavlis, Erricos C.; Peron, R. (2006), "Determination of frame-dragging using Earth gravity models from CHAMP and GRACE", New Astron. 11 (8): 527–550, Bibcode:2006NewA...11..527C, doi:10.1016/j.newast.2006.02.001
Coc, A.; Vangioni-Flam, Elisabeth; Descouvemont, Pierre; Adahchour, Abderrahim; Angulo, Carmen (2004), "Updated Big Bang Nucleosynthesis confronted to WMAP observations and to the Abundance of Light Elements", Astrophysical Journal 600 (2): 544–552, arXiv:astro-ph/0309480, Bibcode:2004ApJ...600..544C, doi:10.1086/380121
Cutler, Curt; Thorne, Kip S. (2002), "An overview of gravitational wave sources", in Bishop, Nigel; Maharaj, Sunil D., Proceedings of 16th International Conference on General Relativity and Gravitation (GR16), World Scientific, arXiv:gr-qc/0204090, Bibcode:2002gr.qc.....4090C, ISBN 981-238-171-6
Dalal, Neal; Holz, Daniel E.; Hughes, Scott A.; Jain, Bhuvnesh (2006), "Short GRB and binary black hole standard sirens as a probe of dark energy", Phys. Rev. D74 (6): 063006, arXiv:astro-ph/0601275, Bibcode:2006PhRvD..74f3006D, doi:10.1103/PhysRevD.74.063006
Danzmann, Karsten; Rüdiger, Albrecht (2003), "LISA Technology: Concepts, Status, Prospects" (http://www.srl.caltech.edu/lisa/documents/KarstenAlbrechtOverviewCQG20-2003.pdf) (PDF), Class. Quant. Grav. 20 (10): S1–S9, Bibcode:2003CQGra..20S...1D, doi:10.1088/0264-9381/20/10/301

Dirac, Paul (1996), General Theory of Relativity, Princeton University Press, ISBN 0-691-01146-X
Donoghue, John F. (1995), "Introduction to the Effective Field Theory Description of Gravity", in Cornet, Fernando, Effective Theories: Proceedings of the Advanced School, Almunecar, Spain, 26 June–1 July 1995, Singapore: World Scientific, arXiv:gr-qc/9512024, Bibcode:1995gr.qc....12024D, ISBN 981-02-2908-9
Duff, Michael (1996), "M-Theory (the Theory Formerly Known as Strings)", Int. J. Mod. Phys. A11 (32): 5623–5641, arXiv:hep-th/9608117, Bibcode:1996IJMPA..11.5623D, doi:10.1142/S0217751X96002583
Ehlers, Jürgen (1973), "Survey of general relativity theory", in Israel, Werner, Relativity, Astrophysics and Cosmology, D. Reidel, pp. 1–125, ISBN 90-277-0369-8
Ehlers, Jürgen; Falco, Emilio E.; Schneider, Peter (1992), Gravitational lenses, Springer, ISBN 3-540-66506-4
Ehlers, Jürgen; Lämmerzahl, Claus, eds. (2006), Special Relativity: Will it Survive the Next 101 Years?, Springer, ISBN 3-540-34522-1
Ehlers, Jürgen; Rindler, Wolfgang (1997), "Local and Global Light Bending in Einstein's and other Gravitational Theories", General Relativity and Gravitation 29 (4): 519–529, Bibcode:1997GReGr..29..519E, doi:10.1023/A:1018843001842
Einstein, Albert (1907), "Über das Relativitätsprinzip und die aus demselben gezogene Folgerungen" (http://www.soso.ch/wissen/hist/SRT/E-1907.pdf) (PDF), Jahrbuch der Radioaktivitaet und Elektronik 4: 411, retrieved 2008-05-05
Einstein, Albert (1915), "Die Feldgleichungen der Gravitation" (http://nausikaa2.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.x.cgi?dir=6E3MAXK4&step=thumb), Sitzungsberichte der Preussischen Akademie der Wissenschaften zu Berlin: 844–847, retrieved 2006-09-12
Einstein, Albert (1916), "Die Grundlage der allgemeinen Relativitätstheorie" (http://web.archive.org/web/20060829045130/http://www.alberteinstein.info/gallery/gtext3.html) (PDF), Annalen der Physik 49, archived from the original (http://www.alberteinstein.info/gallery/gtext3.html) on 2006-08-29, retrieved 2006-09-03
Einstein, Albert (1917), "Kosmologische Betrachtungen zur allgemeinen Relativitätstheorie", Sitzungsberichte der Preußischen Akademie der Wissenschaften: 142
Ellis, George F R; van Elst, Henk (1999), "Cosmological models (Cargèse lectures 1998)", in Lachièze-Rey, Marc, Theoretical and Observational Cosmology, Kluwer, pp. 1–116, arXiv:gr-qc/9812046, Bibcode:1999toc..conf....1E

Everitt, C. W. F.; Buchman, S.; DeBra, D. B.; Keiser, G. M. (2001), "Gravity Probe B: Countdown to launch", Gyros, Clocks, and Interferometers: Testing Relativistic Gravity in Space (Lecture Notes in Physics 562), Springer, pp. 52–82, ISBN 3-540-41236-0
Everitt, C. W. F.; Parkinson, Bradford; Kahn, Bob (2007) (PDF), The Gravity Probe B experiment. Post Flight Analysis: Final Report (Preface and Executive Summary) (http://einstein.stanford.edu/content/exec_summary/GP-B_ExecSum-scrn.pdf), Project Report: NASA, Stanford University and Lockheed Martin, retrieved 2007-08-05
Falcke, Heino; Melia, Fulvio; Agol, Eric (2000), "Viewing the Shadow of the Black Hole at the Galactic Center", Astrophysical Journal 528 (1): L13–L16, arXiv:astro-ph/9912263, Bibcode:2000ApJ...528L..13F, doi:10.1086/312423, PMID 10587484
Flanagan, Éanna É.; Hughes, Scott A. (2005), "The basics of gravitational wave theory", New J. Phys. 7: 204, arXiv:gr-qc/0501041, Bibcode:2005NJPh....7..204F, doi:10.1088/1367-2630/7/1/204
Font, José A. (2003), "Numerical Hydrodynamics in General Relativity" (http://www.livingreviews.org/lrr-2003-4), Living Rev. Relativity 6, retrieved 2007-08-19
Fourès-Bruhat, Yvonne (1952), "Théorème d'existence pour certains systèmes d'équations aux dérivées partielles non linéaires", Acta Mathematica 88 (1): 141–225, doi:10.1007/BF02392131
Frauendiener, Jörg (2004), "Conformal Infinity" (http://www.livingreviews.org/lrr-2004-1), Living Rev. Relativity 7, retrieved 2007-07-21
Friedrich, Helmut (2005), "Is general relativity 'essentially understood'?", Annalen Phys. 15 (1–2): 84–108, arXiv:gr-qc/0508016, Bibcode:2006AnP...518...84F, doi:10.1002/andp.200510173
Futamase, T.; Itoh, Y. (2006), "The Post-Newtonian Approximation for Relativistic Compact Binaries" (http://www.livingreviews.org/lrr-2007-2), Living Rev. Relativity 10, retrieved 2008-02-29
Gamow, George (1970), My World Line, Viking Press, ISBN 0-670-50376-2
Garfinkle, David (2007), "Of singularities and breadmaking" (http://www.einstein-online.info/en/spotlights/singularities_bkl/index.html), Einstein Online (http://www.einstein-online.info), retrieved 2007-08-03
Geroch, Robert (1996). "Partial Differential Equations of Physics". arXiv:gr-qc/9602055 [gr-qc].
Giulini, Domenico (2005), Special Relativity: A First Encounter, Oxford University Press, ISBN 0-19-856746-4
Giulini, Domenico (2006a), "Algebraic and Geometric Structures in Special Relativity", in Ehlers, Jürgen; Lämmerzahl, Claus, Special Relativity: Will it Survive the Next 101 Years?, Springer, pp. 45–111, arXiv:math-ph/0602018, Bibcode:2006math.ph...2018G, ISBN 3-540-34522-1
Giulini, Domenico (2006b), "Some remarks on the notions of general covariance and background independence", in Stamatescu, I. O., An assessment of current paradigms in the physics of fundamental interactions, Springer, arXiv:gr-qc/0603087, Bibcode:2007LNP...721..105G
Gnedin, Nickolay Y. (2005), "Digitizing the Universe", Nature 435 (7042): 572–573, Bibcode:2005Natur.435..572G, doi:10.1038/435572a, PMID 15931201
Goenner, Hubert F. M. (2004), "On the History of Unified Field Theories" (http://www.livingreviews.org/lrr-2004-2), Living Rev. Relativity 7, retrieved 2008-02-28
Goroff, Marc H.; Sagnotti, Augusto (1985), "Quantum gravity at two loops", Phys. Lett. 160B (1–3): 81–86, Bibcode:1985PhLB..160...81G, doi:10.1016/0370-2693(85)91470-4
Gourgoulhon, Eric (2007). "3+1 Formalism and Bases of Numerical Relativity". arXiv:gr-qc/0703035 [gr-qc].
Gowdy, Robert H. (1971), "Gravitational Waves in Closed Universes", Phys. Rev. Lett. 27 (12): 826–829, Bibcode:1971PhRvL..27..826G, doi:10.1103/PhysRevLett.27.826
Gowdy, Robert H. (1974), "Vacuum spacetimes with two-parameter spacelike isometry groups and compact invariant hypersurfaces: Topologies and boundary conditions", Ann. Phys. (N.Y.) 83 (1): 203–241, Bibcode:1974AnPhy..83..203G, doi:10.1016/0003-4916(74)90384-4
Green, M. B.; Schwarz, J. H.; Witten, E. (1987), Superstring theory. Volume 1: Introduction, Cambridge University Press, ISBN 0-521-35752-7

Greenstein, J. L.; Oke, J. B.; Shipman, H. L. (1971), "Effective Temperature, Radius, and Gravitational Redshift of Sirius B", Astrophysical Journal 169: 563, Bibcode:1971ApJ...169..563G, doi:10.1086/151174
Hafele, Joseph C.; Keating, Richard E. (July 14, 1972). "Around-the-World Atomic Clocks: Predicted Relativistic Time Gains". Science 177 (4044): 166–168. Bibcode:1972Sci...177..166H. doi:10.1126/science.177.4044.166. PMID 17779917.
Hafele, Joseph C.; Keating, Richard E. (July 14, 1972). "Around-the-World Atomic Clocks: Observed Relativistic Time Gains". Science 177 (4044): 168–170. Bibcode:1972Sci...177..168H. doi:10.1126/science.177.4044.168. PMID 17779918.
Havas, P. (1964), "Four-Dimensional Formulation of Newtonian Mechanics and Their Relation to the Special and the General Theory of Relativity", Rev. Mod. Phys. 36 (4): 938–965, Bibcode:1964RvMP...36..938H, doi:10.1103/RevModPhys.36.938
Hawking, Stephen W. (1966), "The occurrence of singularities in cosmology" (http://links.jstor.org/sici?sici=0080-4630(19661018)294:1439<511:TOOSIC>2.0.CO;2-Y), Proceedings of the Royal Society of London A294 (1439): 511–521
Hawking, S. W. (1975), "Particle Creation by Black Holes", Communications in Mathematical Physics 43 (3): 199–220, Bibcode:1975CMaPh..43..199H, doi:10.1007/BF02345020
Hawking, Stephen W. (1987), "Quantum cosmology", in Hawking, Stephen W.; Israel, Werner, 300 Years of Gravitation, Cambridge University Press, pp. 631–651, ISBN 0-521-37976-8
Hawking, Stephen W.; Ellis, George F. R. (1973), The large scale structure of space-time, Cambridge University Press, ISBN 0-521-09906-4
Heckmann, O. H. L.; Schücking, E. (1959), "Newtonsche und Einsteinsche Kosmologie", in Flügge, S., Encyclopedia of Physics, 53, p. 489
Heusler, Markus (1998), "Stationary Black Holes: Uniqueness and Beyond" (http://www.livingreviews.org/lrr-1998-6), Living Rev. Relativity 1, retrieved 2007-08-04
Heusler, Markus (1996), Black Hole Uniqueness Theorems, Cambridge University Press, ISBN 0-521-56735-1
Hey, Tony; Walters, Patrick (2003), The new quantum universe, Cambridge University Press, ISBN 0-521-56457-3
Hough, Jim; Rowan, Sheila (2000), "Gravitational Wave Detection by Interferometry (Ground and Space)" (http://www.livingreviews.org/lrr-2000-3), Living Rev. Relativity 3, retrieved 2007-07-21
Hubble, Edwin (1929), "A Relation between Distance and Radial Velocity among Extra-Galactic Nebulae" (http://www.pnas.org/cgi/reprint/15/3/168.pdf) (PDF), Proc. Nat. Acad. Sci. 15 (3): 168–173, Bibcode:1929PNAS...15..168H, doi:10.1073/pnas.15.3.168, PMC 522427, PMID 16577160
Hulse, Russell A.; Taylor, Joseph H. (1975), "Discovery of a pulsar in a binary system", Astrophys. J. 195: L51–L55, Bibcode:1975ApJ...195L..51H, doi:10.1086/181708
Ibanez, L. E. (2000), "The second string (phenomenology) revolution", Class. Quant. Grav. 17 (5): 1117–1128, arXiv:hep-ph/9911499, Bibcode:2000CQGra..17.1117I, doi:10.1088/0264-9381/17/5/321
Iorio, L. (2009), "An Assessment of the Systematic Uncertainty in Present and Future Tests of the Lense-Thirring Effect with Satellite Laser Ranging", Space Sci. Rev. 148 (1–4): 363, arXiv:0809.1373, Bibcode:2009SSRv..148..363I, doi:10.1007/s11214-008-9478-1
Isham, Christopher J. (1994), "Prima facie questions in quantum gravity", in Ehlers, Jürgen; Friedrich, Helmut, Canonical Gravity: From Classical to Quantum, Springer, ISBN 3-540-58339-4
Israel, Werner (1971), "Event Horizons and Gravitational Collapse", General Relativity and Gravitation 2 (1): 53–59, Bibcode:1971GReGr...2...53I, doi:10.1007/BF02450518
Israel, Werner (1987), "Dark stars: the evolution of an idea", in Hawking, Stephen W.; Israel, Werner, 300 Years of Gravitation, Cambridge University Press, pp. 199–276, ISBN 0-521-37976-8
Janssen, Michel (2005), "Of pots and holes: Einstein's bumpy road to general relativity" (https://netfiles.umn.edu/xythoswfs/webui/_xy-15267453_1-t_ycAqaW0A) (PDF), Ann. Phys. (Leipzig) 14 (S1): 58–85, Bibcode:2005AnP...517S..58J, doi:10.1002/andp.200410130
Jaranowski, Piotr; Królak, Andrzej (2005), "Gravitational-Wave Data Analysis. Formalism and Sample Applications: The Gaussian Case" (http://www.livingreviews.org/lrr-2005-3), Living Rev. Relativity 8, retrieved 2007-07-30
Kahn, Bob (1996–2012), Gravity Probe B Website (http://einstein.stanford.edu/), Stanford University, retrieved 2012-04-20
Kahn, Bob (April 14, 2007) (PDF), Was Einstein right? Scientists provide first public peek at Gravity Probe B results (Stanford University Press Release) (http://einstein.stanford.edu/content/press_releases/SU/pr-aps-041807.pdf), Stanford University News Service
Kamionkowski, Marc; Kosowsky, Arthur; Stebbins, Albert (1997), "Statistics of Cosmic Microwave Background Polarization", Phys. Rev. D55 (12): 7368–7388, arXiv:astro-ph/9611125, Bibcode:1997PhRvD..55.7368K, doi:10.1103/PhysRevD.55.7368
Kennefick, Daniel (2005), "Astronomers Test General Relativity: Light-bending and the Solar Redshift", in Renn, Jürgen, One hundred authors for Einstein, Wiley-VCH, pp. 178–181, ISBN 3-527-40574-7
Kennefick, Daniel (2007), "Not Only Because of Theory: Dyson, Eddington and the Competing Myths of the 1919 Eclipse Expedition", Proceedings of the 7th Conference on the History of General Relativity, Tenerife, 2005, arXiv:0709.0685, Bibcode:2007arXiv0709.0685K

Kenyon, I. R. (1990), General Relativity, Oxford University Press, ISBN 0-19-851996-6
Kochanek, C.S.; Falco, E.E.; Impey, C.; Lehar, J. (2007), CASTLES Survey Website (http://cfa-www.harvard.edu/castles), Harvard-Smithsonian Center for Astrophysics, retrieved 2007-08-21
Komar, Arthur (1959), "Covariant Conservation Laws in General Relativity", Phys. Rev. 113 (3): 934–936, Bibcode:1959PhRv..113..934K, doi:10.1103/PhysRev.113.934
Kramer, Michael (2004), "Millisecond Pulsars as Tools of Fundamental Physics", in Karshenboim, S. G., Astrophysics, Clocks and Fundamental Constants (Lecture Notes in Physics Vol. 648), Springer, pp. 33–54, arXiv:astro-ph/0405178, Bibcode:2004LNP...648...33K
Kramer, M.; Stairs, I. H.; Manchester, R. N.; McLaughlin, M. A.; Lyne, A. G.; Ferdman, R. D.; Burgay, M.; Lorimer, D. R. et al. (2006), "Tests of general relativity from timing the double pulsar", Science 314 (5796): 97–102, arXiv:astro-ph/0609417, Bibcode:2006Sci...314...97K, doi:10.1126/science.1132305, PMID 16973838
Kraus, Ute (1998), "Light Deflection Near Neutron Stars", Relativistic Astrophysics, Vieweg, pp. 66–81, ISBN 3-528-06909-0
Kuchař, Karel (1973), "Canonical Quantization of Gravity", in Israel, Werner, Relativity, Astrophysics and Cosmology, D. Reidel, pp. 237–288, ISBN 90-277-0369-8
Künzle, H. P. (1972), "Galilei and Lorentz Structures on spacetime: comparison of the corresponding geometry and physics" (http://www.numdam.org/item?id=AIHPA_1972__17_4_337_0), Ann. Inst. Henri Poincaré A 17: 337–362
Lahav, Ofer; Suto, Yasushi (2004), "Measuring our Universe from Galaxy Redshift Surveys" (http://www.livingreviews.org/lrr-2004-8), Living Rev. Relativity 7, retrieved 2007-08-19
Landgraf, M.; Hechler, M.; Kemble, S. (2005), "Mission design for LISA Pathfinder", Class. Quant. Grav. 22 (10): S487–S492, arXiv:gr-qc/0411071, Bibcode:2005CQGra..22S.487L, doi:10.1088/0264-9381/22/10/048
Lehner, Luis (2001), "Numerical Relativity: A review", Class. Quant. Grav. 18 (17): R25–R86, arXiv:gr-qc/0106072, Bibcode:2001CQGra..18R..25L, doi:10.1088/0264-9381/18/17/202
Lehner, Luis (2002), Numerical Relativity: Status and Prospects, arXiv:gr-qc/0202055, Bibcode:2002grg..conf..210L, doi:10.1142/9789812776556_0010
Linde, Andrei (1990), Particle Physics and Inflationary Cosmology, Harwood, arXiv:hep-th/0503203, Bibcode:2005hep.th....3203L, ISBN 3-7186-0489-2
Linde, Andrei (2005), "Towards inflation in string theory", J. Phys. Conf. Ser. 24: 151–160, arXiv:hep-th/0503195, Bibcode:2005JPhCS..24..151L, doi:10.1088/1742-6596/24/1/018

Loll, Renate (1998), "Discrete Approaches to Quantum Gravity in Four Dimensions" (http://www.livingreviews.org/lrr-1998-13), Living Rev. Relativity 1, retrieved 2008-03-09
Lovelock, David (1972), "The Four-Dimensionality of Space and the Einstein Tensor", J. Math. Phys. 13 (6): 874–876, Bibcode:1972JMP....13..874L, doi:10.1063/1.1666069
MacCallum, M. (2006), "Finding and using exact solutions of the Einstein equations", in Mornas, L.; Alonso, J. D., A Century of Relativity Physics (ERE05, the XXVIII Spanish Relativity Meeting), American Institute of Physics, arXiv:gr-qc/0601102, Bibcode:2006AIPC..841..129M, doi:10.1063/1.2218172
Maddox, John (1998), What Remains To Be Discovered, Macmillan, ISBN 0-684-82292-X
Mannheim, Philip D. (2006), "Alternatives to Dark Matter and Dark Energy", Prog. Part. Nucl. Phys. 56 (2): 340–445, arXiv:astro-ph/0505266, Bibcode:2006PrPNP..56..340M, doi:10.1016/j.ppnp.2005.08.001
Mather, J. C.; Cheng, E. S.; Cottingham, D. A.; Eplee, R. E.; Fixsen, D. J.; Hewagama, T.; Isaacman, R. B.; Jensen, K. A. et al. (1994), "Measurement of the cosmic microwave spectrum by the COBE FIRAS instrument", Astrophysical Journal 420: 439–444, Bibcode:1994ApJ...420..439M, doi:10.1086/173574
Mermin, N. David (2005), It's About Time. Understanding Einstein's Relativity, Princeton University Press, ISBN 0-691-12201-6
Messiah, Albert (1999), Quantum Mechanics, Dover Publications, ISBN 0-486-40924-4
Miller, Cole (2002), Stellar Structure and Evolution (Lecture notes for Astronomy 606) (http://www.astro.umd.edu/~miller/teaching/astr606/), University of Maryland, retrieved 2007-07-25
Misner, Charles W.; Thorne, Kip S.; Wheeler, John A. (1973), Gravitation, W. H. Freeman, ISBN 0-7167-0344-0
Møller, Christian (1952), The Theory of Relativity (http://archive.org/details/theoryofrelativi029229mbp) (3rd ed.), Oxford University Press
Narayan, Ramesh (2006), "Black holes in astrophysics", New Journal of Physics 7: 199, arXiv:gr-qc/0506078, Bibcode:2005NJPh....7..199N, doi:10.1088/1367-2630/7/1/199
Narayan, Ramesh; Bartelmann, Matthias (1997). "Lectures on Gravitational Lensing". arXiv:astro-ph/9606001 [astro-ph].
Narlikar, Jayant V. (1993), Introduction to Cosmology, Cambridge University Press, ISBN 0-521-41250-1
Nieto, Michael Martin (2006), "The quest to understand the Pioneer anomaly" (http://www.europhysicsnews.com/full/42/article4.pdf) (PDF), EurophysicsNews 37 (6): 30–34
Nordström, Gunnar (1918), "On the Energy of the Gravitational Field in Einstein's Theory" (http://www.digitallibrary.nl/proceedings/search/detail.cfm?pubid=2146&view=image&startrow=1), Verhandl. Koninkl. Ned. Akad. Wetenschap., 26: 1238–1245
Nordtvedt, Kenneth (2003). "Lunar Laser Ranging: a comprehensive probe of post-Newtonian gravity". arXiv:gr-qc/0301024 [gr-qc].
Norton, John D. (1985), "What was Einstein's principle of equivalence?" (http://www.pitt.edu/~jdnorton/papers/ProfE_re-set.pdf) (PDF), Studies in History and Philosophy of Science 16 (3): 203–246, doi:10.1016/0039-3681(85)90002-0, retrieved 2007-06-11
Ohanian, Hans C.; Ruffini, Remo (1994), Gravitation and Spacetime, W. W. Norton & Company, ISBN 0-393-96501-5
Olive, K. A.; Skillman, E. A. (2004), "A Realistic Determination of the Error on the Primordial Helium Abundance", Astrophysical Journal 617 (1): 29–49, arXiv:astro-ph/0405588, Bibcode:2004ApJ...617...29O, doi:10.1086/425170
O'Meara, John M.; Tytler, David; Kirkman, David; Suzuki, Nao; Prochaska, Jason X.; Lubin, Dan; Wolfe, Arthur M. (2001), "The Deuterium to Hydrogen Abundance Ratio Towards a Fourth QSO: HS0105+1619", Astrophysical Journal 552 (2): 718–730, arXiv:astro-ph/0011179, Bibcode:2001ApJ...552..718O, doi:10.1086/320579

Oppenheimer, J. Robert; Snyder, H. (1939), "On continued gravitational contraction", Physical Review 56 (5): 455–459, Bibcode:1939PhRv...56..455O, doi:10.1103/PhysRev.56.455
Overbye, Dennis (1999), Lonely Hearts of the Cosmos: the story of the scientific quest for the secret of the Universe, Back Bay, ISBN 0-316-64896-5
Pais, Abraham (1982), 'Subtle is the Lord...' The Science and life of Albert Einstein, Oxford University Press, ISBN 0-19-853907-X
Peacock, John A. (1999), Cosmological Physics, Cambridge University Press, ISBN 0-521-41072-X
Peebles, P. J. E. (1966), "Primordial Helium abundance and primordial fireball II", Astrophysical Journal 146: 542–552, Bibcode:1966ApJ...146..542P, doi:10.1086/148918
Peebles, P. J. E. (1993), Principles of physical cosmology, Princeton University Press, ISBN 0-691-01933-9
Peebles, P.J.E.; Schramm, D.N.; Turner, E.L.; Kron, R.G. (1991), "The case for the relativistic hot Big Bang cosmology", Nature 352 (6338): 769–776, Bibcode:1991Natur.352..769P, doi:10.1038/352769a0
Penrose, Roger (1965), "Gravitational collapse and spacetime singularities", Physical Review Letters 14 (3): 57–59, Bibcode:1965PhRvL..14...57P, doi:10.1103/PhysRevLett.14.57
Penrose, Roger (1969), "Gravitational collapse: the role of general relativity", Rivista del Nuovo Cimento 1: 252–276, Bibcode:1969NCimR...1..252P
Penrose, Roger (2004), The Road to Reality, A. A. Knopf, ISBN 0-679-45443-8
Penzias, A. A.; Wilson, R. W. (1965), "A measurement of excess antenna temperature at 4080 Mc/s", Astrophysical Journal 142: 419–421, Bibcode:1965ApJ...142..419P, doi:10.1086/148307
Peskin, Michael E.; Schroeder, Daniel V. (1995), An Introduction to Quantum Field Theory, Addison-Wesley, ISBN 0-201-50397-2
Peskin, Michael E. (2007), Dark Matter and Particle Physics, arXiv:0707.1536, Bibcode:2007JPSJ...76k1017P, doi:10.1143/JPSJ.76.111017
Poisson, Eric (2004), "The Motion of Point Particles in Curved Spacetime" (http://www.livingreviews.org/lrr-2004-6), Living Rev. Relativity 7, retrieved 2007-06-13
Poisson, Eric (2004), A Relativist's Toolkit. The Mathematics of Black-Hole Mechanics, Cambridge University Press, ISBN 0-521-83091-5
Polchinski, Joseph (1998a), String Theory Vol. I: An Introduction to the Bosonic String (http://en.wikipedia.org/wiki/Joseph_Polchinski), Cambridge University Press, ISBN 0-521-63303-6
Polchinski, Joseph (1998b), String Theory Vol. II: Superstring Theory and Beyond, Cambridge University Press, ISBN 0-521-63304-4
Pound, R. V.; Rebka, G. A. (1959), "Gravitational Red-Shift in Nuclear Resonance", Physical Review Letters 3 (9): 439–441, Bibcode:1959PhRvL...3..439P, doi:10.1103/PhysRevLett.3.439
Pound, R. V.; Rebka, G. A. (1960), "Apparent weight of photons", Phys. Rev. Lett. 4 (7): 337–341, Bibcode:1960PhRvL...4..337P, doi:10.1103/PhysRevLett.4.337
Pound, R. V.; Snider, J. L. (1964), "Effect of Gravity on Nuclear Resonance", Phys. Rev. Lett. 13 (18): 539–540, Bibcode:1964PhRvL..13..539P, doi:10.1103/PhysRevLett.13.539
Ramond, Pierre (1990), Field Theory: A Modern Primer, Addison-Wesley, ISBN 0-201-54611-6
Rees, Martin (1966), "Appearance of Relativistically Expanding Radio Sources", Nature 211 (5048): 468–470, Bibcode:1966Natur.211..468R, doi:10.1038/211468a0
Reissner, H. (1916), "Über die Eigengravitation des elektrischen Feldes nach der Einsteinschen Theorie", Annalen der Physik 355 (9): 106–120, Bibcode:1916AnP...355..106R, doi:10.1002/andp.19163550905
Remillard, Ronald A.; Lin, Dacheng; Cooper, Randall L.; Narayan, Ramesh (2006), "The Rates of Type I X-Ray Bursts from Transients Observed with RXTE: Evidence for Black Hole Event Horizons", Astrophysical Journal 646 (1): 407–419, arXiv:astro-ph/0509758, Bibcode:2006ApJ...646..407R, doi:10.1086/504862
Renn, Jürgen, ed. (2007), The Genesis of General Relativity (4 Volumes), Dordrecht: Springer, ISBN 1-4020-3999-9

Renn, Jürgen, ed. (2005), Albert Einstein: Chief Engineer of the Universe: Einstein's Life and Work in Context, Berlin: Wiley-VCH, ISBN 3-527-40571-2
Reula, Oscar A. (1998), "Hyperbolic Methods for Einstein's Equations" (http://www.livingreviews.org/lrr-1998-3), Living Rev. Relativity 1, retrieved 2007-08-29
Rindler, Wolfgang (2001), Relativity. Special, General and Cosmological, Oxford University Press, ISBN 0-19-850836-0
Rindler, Wolfgang (1991), Introduction to Special Relativity, Clarendon Press, Oxford, ISBN 0-19-853952-5
Robson, Ian (1996), Active galactic nuclei, John Wiley, ISBN 0-471-95853-0
Roulet, E.; Mollerach, S. (1997), "Microlensing", Physics Reports 279 (2): 67–118, arXiv:astro-ph/9603119, Bibcode:1997PhR...279...67R, doi:10.1016/S0370-1573(96)00020-8
Rovelli, Carlo (2000). "Notes for a brief history of quantum gravity". arXiv:gr-qc/0006061 [gr-qc].
Rovelli, Carlo (1998), "Loop Quantum Gravity" (http://www.livingreviews.org/lrr-1998-1), Living Rev. Relativity 1, retrieved 2008-03-13
Schäfer, Gerhard (2004), "Gravitomagnetic Effects", General Relativity and Gravitation 36 (10): 2223–2235, arXiv:gr-qc/0407116, Bibcode:2004GReGr..36.2223S, doi:10.1023/B:GERG.0000046180.97877.32
Schödel, R.; Ott, T.; Genzel, R.; Eckart, A.; Mouawad, N.; Alexander, T. (2003), "Stellar Dynamics in the Central Arcsecond of Our Galaxy", Astrophysical Journal 596 (2): 1015–1034, arXiv:astro-ph/0306214, Bibcode:2003ApJ...596.1015S, doi:10.1086/378122
Schutz, Bernard F. (1985), A first course in general relativity, Cambridge University Press, ISBN 0-521-27703-5
Schutz, Bernard F. (2001), "Gravitational radiation", in Murdin, Paul, Encyclopedia of Astronomy and Astrophysics, Grove's Dictionaries, ISBN 1-56159-268-4
Schutz, Bernard F. (2003), Gravity from the ground up, Cambridge University Press, ISBN 0-521-45506-5
Schwarz, John H. (2007), String Theory: Progress and Problems, arXiv:hep-th/0702219, Bibcode:2007PThPS.170..214S, doi:10.1143/PTPS.170.214
Schwarzschild, Karl (1916a), "Über das Gravitationsfeld eines Massenpunktes nach der Einsteinschen Theorie", Sitzungsber. Preuss. Akad. D. Wiss.: 189–196
Schwarzschild, Karl (1916b), "Über das Gravitationsfeld einer Kugel aus inkompressibler Flüssigkeit nach der Einsteinschen Theorie", Sitzungsber. Preuss. Akad. D. Wiss.: 424–434
Seidel, Edward (1998), "Numerical Relativity: Towards Simulations of 3D Black Hole Coalescence", in Narlikar, J. V.; Dadhich, N., Gravitation and Relativity: At the turn of the millennium (Proceedings of the GR-15 Conference, held at IUCAA, Pune, India, December 16–21, 1997), IUCAA, arXiv:gr-qc/9806088, Bibcode:1998gr.qc.....6088S, ISBN 81-900378-3-8
Seljak, Uros; Zaldarriaga, Matias (1997), "Signature of Gravity Waves in the Polarization of the Microwave Background", Phys. Rev. Lett. 78 (11): 2054–2057, arXiv:astro-ph/9609169, Bibcode:1997PhRvL..78.2054S, doi:10.1103/PhysRevLett.78.2054
Shapiro, S. S.; Davis, J. L.; Lebach, D. E.; Gregory, J. S. (2004), "Measurement of the solar gravitational deflection of radio waves using geodetic very-long-baseline interferometry data, 1979–1999", Phys. Rev. Lett. 92 (12): 121101, Bibcode:2004PhRvL..92l1101S, doi:10.1103/PhysRevLett.92.121101, PMID 15089661
Shapiro, Irwin I. (1964), "Fourth test of general relativity", Phys. Rev. Lett. 13 (26): 789–791, Bibcode:1964PhRvL..13..789S, doi:10.1103/PhysRevLett.13.789
Shapiro, I. I.; Pettengill, Gordon; Ash, Michael; Stone, Melvin; Smith, William; Ingalls, Richard; Brockelman, Richard (1968), "Fourth test of general relativity: preliminary results", Phys. Rev. Lett. 20 (22): 1265–1269, Bibcode:1968PhRvL..20.1265S, doi:10.1103/PhysRevLett.20.1265
Singh, Simon (2004), Big Bang: The Origin of the Universe, Fourth Estate, ISBN 0-00-715251-5
Sorkin, Rafael D. (2005), "Causal Sets: Discrete Gravity", in Gomberoff, Andres; Marolf, Donald, Lectures on Quantum Gravity, Springer, arXiv:gr-qc/0309009, Bibcode:2003gr.qc.....9009S, ISBN 0-387-23995-2

Sorkin, Rafael D. (1997), "Forks in the Road, on the Way to Quantum Gravity", Int. J. Theor. Phys. 36 (12): 2759–2781, arXiv:gr-qc/9706002, Bibcode:1997IJTP...36.2759S, doi:10.1007/BF02435709
Spergel, D. N.; Verde, L.; Peiris, H. V.; Komatsu, E.; Nolta, M. R.; Bennett, C. L.; Halpern, M.; Hinshaw, G. et al. (2003), "First Year Wilkinson Microwave Anisotropy Probe (WMAP) Observations: Determination of Cosmological Parameters", Astrophys. J. Suppl. 148 (1): 175–194, arXiv:astro-ph/0302209, Bibcode:2003ApJS..148..175S, doi:10.1086/377226
Spergel, D. N.; Bean, R.; Doré, O.; Nolta, M. R.; Bennett, C. L.; Dunkley, J.; Hinshaw, G.; Jarosik, N. et al. (2007), "Wilkinson Microwave Anisotropy Probe (WMAP) Three Year Results: Implications for Cosmology", Astrophysical Journal Supplement 170 (2): 377–408, arXiv:astro-ph/0603449, Bibcode:2007ApJS..170..377S, doi:10.1086/513700
Springel, Volker; White, Simon D. M.; Jenkins, Adrian; Frenk, Carlos S.; Yoshida, Naoki; Gao, Liang; Navarro, Julio; Thacker, Robert et al. (2005), "Simulations of the formation, evolution and clustering of galaxies and quasars", Nature 435 (7042): 629–636, arXiv:astro-ph/0504097, Bibcode:2005Natur.435..629S, doi:10.1038/nature03597, PMID 15931216
Stairs, Ingrid H. (2003), "Testing General Relativity with Pulsar Timing" (http://www.livingreviews.org/lrr-2003-5), Living Rev. Relativity 6, retrieved 2007-07-21
Stephani, H.; Kramer, D.; MacCallum, M.; Hoenselaers, C.; Herlt, E. (2003), Exact Solutions of Einstein's Field Equations (2 ed.), Cambridge University Press, ISBN 0-521-46136-7
Synge, J. L. (1972), Relativity: The Special Theory, North-Holland Publishing Company, ISBN 0-7204-0064-3
Szabados, László B. (2004), "Quasi-Local Energy-Momentum and Angular Momentum in GR" (http://www.livingreviews.org/lrr-2004-4), Living Rev. Relativity 7, retrieved 2007-08-23
Taylor, Joseph H. (1994), "Binary pulsars and relativistic gravity", Rev. Mod. Phys. 66 (3): 711–719, Bibcode:1994RvMP...66..711T, doi:10.1103/RevModPhys.66.711
Thiemann, Thomas (2006). "Loop Quantum Gravity: An Inside View". arXiv:hep-th/0608210. Bibcode:2007LNP...721..185T.
Thiemann, Thomas (2003), "Lectures on Loop Quantum Gravity", Lect. Notes Phys. 631: 41–135
Thorne, Kip S. (1972), "Nonspherical Gravitational Collapse: A Short Review", in Klauder, J., Magic without Magic, W. H. Freeman, pp. 231–258
Thorne, Kip S. (1994), Black Holes and Time Warps: Einstein's Outrageous Legacy, W W Norton & Company, ISBN 0-393-31276-3
Thorne, Kip S. (1995), Gravitational radiation, arXiv:gr-qc/9506086, Bibcode:1995pnac.conf..160T, ISBN 0-521-36853-7
Townsend, Paul K. (1997). "Black Holes (Lecture notes)". arXiv:gr-qc/9707012 [gr-qc].
Townsend, Paul K. (1996). "Four Lectures on M-Theory". arXiv:hep-th/9612121. Bibcode:1997hepcbconf..385T.
Traschen, Jenny (2000), "An Introduction to Black Hole Evaporation", in Bytsenko, A.; Williams, F., Mathematical Methods of Physics (Proceedings of the 1999 Londrina Winter School), World Scientific, arXiv:gr-qc/0010055, Bibcode:2000mmp..conf..180T
Trautman, Andrzej (2006), "Einstein-Cartan theory", in Francoise, J.-P.; Naber, G. L.; Tsou, S. T., Encyclopedia of Mathematical Physics, Vol. 2, Elsevier, pp. 189–195, arXiv:gr-qc/0606062, Bibcode:2006gr.qc.....6062T
Unruh, W. G. (1976), "Notes on Black Hole Evaporation", Phys. Rev. D 14 (4): 870–892, Bibcode:1976PhRvD..14..870U, doi:10.1103/PhysRevD.14.870
Valtonen, M. J.; Lehto, H. J.; Nilsson, K.; Heidt, J.; Takalo, L. O.; Sillanpää, A.; Villforth, C.; Kidger, M. et al. (2008), "A massive binary black-hole system in OJ 287 and a test of general relativity", Nature 452 (7189): 851–853, arXiv:0809.1280, Bibcode:2008Natur.452..851V, doi:10.1038/nature06896, PMID 18421348
Wald, Robert M. (1975), "On Particle Creation by Black Holes", Commun. Math. Phys. 45 (3): 9–34, Bibcode:1975CMaPh..45....9W, doi:10.1007/BF01609863
Wald, Robert M. (1984), General Relativity, University of Chicago Press, ISBN 0-226-87033-2

Wald, Robert M. (1994), Quantum field theory in curved spacetime and black hole thermodynamics, University of Chicago Press, ISBN 0-226-87027-8
Wald, Robert M. (2001), "The Thermodynamics of Black Holes" (http://www.livingreviews.org/lrr-2001-6), Living Rev. Relativity 4, retrieved 2007-08-08
Walsh, D.; Carswell, R. F.; Weymann, R. J. (1979), "0957 + 561 A, B: twin quasistellar objects or gravitational lens?", Nature 279 (5712): 381, Bibcode:1979Natur.279..381W, doi:10.1038/279381a0, PMID 16068158
Wambsganss, Joachim (1998), "Gravitational Lensing in Astronomy" (http://www.livingreviews.org/lrr-1998-12), Living Rev. Relativity 1, retrieved 2007-07-20
Weinberg, Steven (1972), Gravitation and Cosmology, John Wiley, ISBN 0-471-92567-5
Weinberg, Steven (1995), The Quantum Theory of Fields I: Foundations, Cambridge University Press, ISBN 0-521-55001-7
Weinberg, Steven (1996), The Quantum Theory of Fields II: Modern Applications, Cambridge University Press, ISBN 0-521-55002-5
Weinberg, Steven (2000), The Quantum Theory of Fields III: Supersymmetry, Cambridge University Press, ISBN 0-521-66000-9
Weisberg, Joel M.; Taylor, Joseph H. (2003), "The Relativistic Binary Pulsar B1913+16", in Bailes, M.; Nice, D. J.; Thorsett, S. E., Proceedings of "Radio Pulsars," Chania, Crete, August, 2002, ASP Conference Series
Weiss, Achim (2006), "Elements of the past: Big Bang Nucleosynthesis and observation" (http://www.einstein-online.info/en/spotlights/BBN_obs/index.html), Einstein Online (http://www.einstein-online.info) (Max Planck Institute for Gravitational Physics), retrieved 2007-02-24
Wheeler, John A. (1990), A Journey Into Gravity and Spacetime, Scientific American Library, San Francisco: W. H. Freeman, ISBN 0-7167-6034-7
Will, Clifford M. (1993), Theory and experiment in gravitational physics, Cambridge University Press, ISBN 0-521-43973-6
Will, Clifford M. (2006), "The Confrontation between General Relativity and Experiment" (http://www.livingreviews.org/lrr-2006-3), Living Rev. Relativity, retrieved 2007-06-12
Zwiebach, Barton (2004), A First Course in String Theory, Cambridge University Press, ISBN 0-521-83143-1

Further reading
Popular books
Geroch, R (1981), General Relativity from A to B, Chicago: University of Chicago Press, ISBN 0-226-28864-1
Lieber, Lillian (2008), The Einstein Theory of Relativity: A Trip to the Fourth Dimension, Philadelphia: Paul Dry Books, Inc., ISBN 978-1-58988-044-3
Wald, Robert M. (1992), Space, Time, and Gravity: the Theory of the Big Bang and Black Holes, Chicago: University of Chicago Press, ISBN 0-226-87029-4
Wheeler, John; Ford, Kenneth (1998), Geons, Black Holes, & Quantum Foam: a life in physics, New York: W. W. Norton, ISBN 0-393-31991-1

Beginning undergraduate textbooks
Callahan, James J. (2000), The Geometry of Spacetime: an Introduction to Special and General Relativity, New York: Springer, ISBN 0-387-98641-3
Taylor, Edwin F.; Wheeler, John Archibald (2000), Exploring Black Holes: Introduction to General Relativity, Addison Wesley, ISBN 0-201-38423-X

Advanced undergraduate textbooks
B. F. Schutz (2009), A First Course in General Relativity (Second Edition), Cambridge University Press, ISBN 978-0-521-88705-2
Cheng, Ta-Pei (2005), Relativity, Gravitation and Cosmology: a Basic Introduction, Oxford and New York: Oxford University Press, ISBN 0-19-852957-0
Gron, O.; Hervik, S. (2007), Einstein's General theory of Relativity, Springer, ISBN 978-0-387-69199-2
Hartle, James B. (2003), Gravity: an Introduction to Einstein's General Relativity, San Francisco: Addison-Wesley, ISBN 0-8053-8662-9
Hughston, L. P. & Tod, K. P. (1991), Introduction to General Relativity, Cambridge: Cambridge University Press, ISBN 0-521-33943-X
d'Inverno, Ray (1992), Introducing Einstein's Relativity, Oxford: Oxford University Press, ISBN 0-19-859686-3

Graduate-level textbooks
Carroll, Sean M. (2004), Spacetime and Geometry: An Introduction to General Relativity (http://spacetimeandgeometry.net/), San Francisco: Addison-Wesley, ISBN 0-8053-8732-3
Grøn, Øyvind; Hervik, Sigbjørn (2007), Einstein's General Theory of Relativity, New York: Springer, ISBN 978-0-387-69199-2
Landau, Lev D.; Lifshitz, Evgeny F. (1980), The Classical Theory of Fields (4th ed.), London: Butterworth-Heinemann, ISBN 0-7506-2768-9
Misner, Charles W.; Thorne, Kip S.; Wheeler, John A. (1973), Gravitation, W. H. Freeman, ISBN 0-7167-0344-0
Stephani, Hans (1990), General Relativity: An Introduction to the Theory of the Gravitational Field, Cambridge: Cambridge University Press, ISBN 0-521-37941-5
Wald, Robert M. (1984), General Relativity, University of Chicago Press, ISBN 0-226-87033-2


External links
• Relativity: The special and general theory (http://publicliterature.org/books/relativity/xaa.php) (PDF (http://publicliterature.org/pdf/relat10.pdf))
• Einstein Online (http://www.einstein-online.info): articles on a variety of aspects of relativistic physics for a general audience; hosted by the Max Planck Institute for Gravitational Physics
• NCSA Spacetime Wrinkles (http://archive.ncsa.uiuc.edu/Cyberia/NumRel/NumRelHome.html): produced by the numerical relativity group at the NCSA, with an elementary introduction to general relativity

Courses/Lectures/Tutorials
• Einstein's General Theory of Relativity (http://www.youtube.com/watch?v=hbmf0bB38h0&feature=BFa&list=EC6C8BDEEBA6BDC78D) from Leonard Susskind's Modern Physics lectures. Recorded September 22, 2008 at Stanford University
• Series of lectures on General Relativity (http://www.luth.obspm.fr/IHP06/) given in 2006 at the Institut Henri Poincaré (introductory courses and advanced ones).
• General Relativity Tutorials (http://www.math.ucr.edu/home/baez/gr/) by John Baez
• Brown, Kevin. "Reflections on relativity" (http://www.mathpages.com/rr/rrtoc.htm). Mathpages.com. Retrieved May 29, 2005.
• Carroll, Sean M. "Lecture Notes on General Relativity" (http://preposterousuniverse.com/grnotes/). Retrieved November 26, 2006.
• Moor, Rafi. "Understanding General Relativity" (http://www.rafimoor.com/english/GRE.htm). Retrieved July 11, 2006.
• Waner, Stefan. "Introduction to Differential Geometry and General Relativity" (http://people.hofstra.edu/faculty/Stefan_Waner/RealWorld/pdfs/DiffGeom.pdf) (PDF). Retrieved 2006-01-31.

Hilbert's program

In mathematics, Hilbert's program, formulated by German mathematician David Hilbert, was a proposed solution to the foundational crisis of mathematics, when early attempts to clarify the foundations of mathematics were found to suffer from paradoxes and inconsistencies. As a solution, Hilbert proposed to ground all existing theories in a finite, complete set of axioms, and to provide a proof that these axioms were consistent. Hilbert proposed that the consistency of more complicated systems, such as real analysis, could be proven in terms of simpler systems. Ultimately, the consistency of all of mathematics could be reduced to basic arithmetic.

However, some argue that Gödel's incompleteness theorems showed in 1931 that Hilbert's program was unattainable. In his first theorem, Gödel showed that any consistent system with a computable set of axioms which is capable of expressing arithmetic can never be complete: it is possible to construct a statement that can be shown to be true, but that cannot be derived from the formal rules of the system. In his second theorem, he showed that such a system could not prove its own consistency, so it certainly cannot be used to prove the consistency of anything stronger. This refuted Hilbert's assumption that a finitistic system could be used to prove the consistency of a stronger theory.

Statement of Hilbert's program


The main goal of Hilbert's program was to provide secure foundations for all mathematics. In particular this should include:
• A formalization of all mathematics; in other words all mathematical statements should be written in a precise formal language, and manipulated according to well defined rules.
• Completeness: a proof that all true mathematical statements can be proved in the formalism.
• Consistency: a proof that no contradiction can be obtained in the formalism of mathematics. This consistency proof should preferably use only "finitistic" reasoning about finite mathematical objects.
• Conservation: a proof that any result about "real objects" obtained using reasoning about "ideal objects" (such as uncountable sets) can be proved without using ideal objects.
• Decidability: there should be an algorithm for deciding the truth or falsity of any mathematical statement.

Gödel's incompleteness theorems

Kurt Gödel showed that most of the goals of Hilbert's program were impossible to achieve, at least if interpreted in the most obvious way. His second incompleteness theorem stated that any consistent theory powerful enough to encode addition and multiplication of integers cannot prove its own consistency. This wipes out most of Hilbert's program as follows:
• It is not possible to formalize all of mathematics, as any attempt at such a formalism will omit some true mathematical statements.
• An easy consequence of Gödel's incompleteness theorem is that there is no complete consistent extension of even Peano arithmetic with a recursively enumerable set of axioms, so in particular most interesting mathematical theories are not complete.
• A theory such as Peano arithmetic cannot even prove its own consistency, so a restricted "finitistic" subset of it certainly cannot prove the consistency of more powerful theories such as set theory.
• There is no algorithm to decide the truth (or provability) of statements in any consistent extension of Peano arithmetic. (Strictly speaking this result only appeared a few years after Gödel's theorem, because at the time the notion of an algorithm had not been precisely defined.)


Hilbert's program after Gödel

Many current lines of research in mathematical logic, proof theory and reverse mathematics can be viewed as natural continuations of Hilbert's original program. Much of it can be salvaged by changing its goals slightly (Zach 2005), and with the following modifications some of it was successfully completed:
• Although it is not possible to formalize all mathematics, it is possible to formalize essentially all the mathematics that anyone uses. In particular Zermelo–Fraenkel set theory, combined with first-order logic, gives a satisfactory and generally accepted formalism for essentially all current mathematics.
• Although it is not possible to prove completeness for systems at least as powerful as Peano arithmetic (at least if they have a computable set of axioms), it is possible to prove forms of completeness for many interesting systems. The first big success was by Gödel himself (before he proved the incompleteness theorems) who proved the completeness theorem for first-order logic, showing that any logical consequence of a series of axioms is provable. An example of a non-trivial theory for which completeness has been proved is the theory of algebraically closed fields of given characteristic.
• The question of whether there are finitary consistency proofs of strong theories is difficult to answer, mainly because there is no generally accepted definition of a "finitary proof". Most mathematicians in proof theory seem to regard finitary mathematics as being contained in Peano arithmetic, and in this case it is not possible to give finitary proofs of reasonably strong theories. On the other hand Gödel himself suggested the possibility of giving finitary consistency proofs using finitary methods that cannot be formalized in Peano arithmetic, so he seems to have had a more liberal view of what finitary methods might be allowed. A few years later, Gentzen gave a consistency proof for Peano arithmetic. The only part of this proof that was not clearly finitary was a certain transfinite induction up to the ordinal ε0. If this transfinite induction is accepted as a finitary method, then one can assert that there is a finitary proof of the consistency of Peano arithmetic. More powerful subsets of second order arithmetic have been given consistency proofs by Gaisi Takeuti and others, and one can again debate about exactly how finitary or constructive these proofs are. (The theories that have been proved consistent by these methods are quite strong, and include most "ordinary" mathematics.)
• Although there is no algorithm for deciding the truth of statements in Peano arithmetic, there are many interesting and non-trivial theories for which such algorithms have been found. For example, Tarski found an algorithm that can decide the truth of any statement in analytic geometry (more precisely, he proved that the theory of real closed fields is decidable). Given the Cantor–Dedekind axiom, this algorithm can be regarded as an algorithm to decide the truth of any statement in Euclidean geometry. This is substantial as few people would consider Euclidean geometry a trivial theory.

References
• G. Gentzen, 1936/1969. Die Widerspruchsfreiheit der reinen Zahlentheorie. Mathematische Annalen 112:493–565. Translated as 'The consistency of arithmetic', in The collected papers of Gerhard Gentzen, M. E. Szabo (ed.), 1969.
• D. Hilbert. 'Die Grundlagen Der Elementaren Zahlentheorie'. Mathematische Annalen 104:485–94. Translated by W. Ewald as 'The Grounding of Elementary Number Theory', pp. 266–273 in Mancosu (ed., 1998) From Brouwer to Hilbert: The debate on the foundations of mathematics in the 1920s, Oxford University Press. New York.
• S.G. Simpson, 1988. Partial realizations of Hilbert's program [1]. Journal of Symbolic Logic 53:349–363.
• R. Zach, 2005. Hilbert's Program Then and Now [2]. Manuscript, arXiv:math/0508572v1.


External links
Entry on Hilbert's program [3] at the Stanford Encyclopedia of Philosophy.

References
[1] http://www.math.psu.edu/simpson/papers/hilbert/hilbert.html
[2] http://arxiv.org/abs/math/0508572
[3] http://plato.stanford.edu/entries/hilbert-program/

Gödel's incompleteness theorems

Gödel's incompleteness theorems are two theorems of mathematical logic that establish inherent limitations of all but the most trivial axiomatic systems capable of doing arithmetic. The theorems, proven by Kurt Gödel in 1931, are important both in mathematical logic and in the philosophy of mathematics. The two results are widely, but not universally, interpreted as showing that Hilbert's program to find a complete and consistent set of axioms for all mathematics is impossible, giving a negative answer to Hilbert's second problem.

The first incompleteness theorem states that no consistent system of axioms whose theorems can be listed by an "effective procedure" (e.g., a computer program, but it could be any sort of algorithm) is capable of proving all truths about the relations of the natural numbers (arithmetic). For any such system, there will always be statements about the natural numbers that are true, but that are unprovable within the system. The second incompleteness theorem, an extension of the first, shows that such a system cannot demonstrate its own consistency.

Background
Because statements of a formal theory are written in symbolic form, it is possible to mechanically verify that a formal proof from a finite set of axioms is valid. This task, known as automatic proof verification, is closely related to automated theorem proving. The difference is that instead of constructing a new proof, the proof verifier simply checks that a provided formal proof (or, in some cases, a set of instructions that can be followed to create a formal proof) is correct. This process is not merely hypothetical; systems such as Isabelle or Coq are used today to formalize proofs and then check their validity.

Many theories of interest include an infinite set of axioms, however. To verify a formal proof when the set of axioms is infinite, it must be possible to determine whether a statement that is claimed to be an axiom is actually an axiom. This issue arises in first order theories of arithmetic, such as Peano arithmetic, because the principle of mathematical induction is expressed as an infinite set of axioms (an axiom schema).

A formal theory is said to be effectively generated if its set of axioms is a recursively enumerable set. This means that there is a computer program that, in principle, could enumerate all the axioms of the theory without listing any statements that are not axioms. This is equivalent to the existence of a program that enumerates all the theorems of the theory without enumerating any statements that are not theorems. Examples of effectively generated theories with infinite sets of axioms include Peano arithmetic and Zermelo–Fraenkel set theory.

In choosing a set of axioms, one goal is to be able to prove as many correct results as possible, without proving any incorrect results. A set of axioms is complete if, for any statement in the axioms' language, either that statement or its negation is provable from the axioms. A set of axioms is (simply) consistent if there is no statement such that both the statement and its negation are provable from the axioms. In the standard system of first-order logic, an inconsistent set of axioms will prove every statement in its language (this is sometimes called the principle of explosion), and is thus automatically complete. A set of axioms that is both complete and consistent, however, proves a maximal set of non-contradictory theorems. Gödel's incompleteness theorems show that in certain cases it is not possible to obtain an effectively generated, complete, consistent theory.
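The idea of mechanical proof verification with an infinite but recognizable set of axioms can be illustrated with a toy checker. The sketch below is only an illustration under stated assumptions: the tuple encoding of formulas, the single axiom schema A -> (B -> A), and the names is_axiom and check_proof are all hypothetical choices made here, and they do not reflect how Isabelle or Coq work internally.

```python
def implies(a, b):
    """Build the formula 'a -> b' as a nested tuple (hypothetical encoding)."""
    return ("->", a, b)

def is_axiom(f):
    """Decide membership in an infinite axiom set: all instances of A -> (B -> A)."""
    return (
        isinstance(f, tuple) and len(f) == 3 and f[0] == "->"
        and isinstance(f[2], tuple) and len(f[2]) == 3 and f[2][0] == "->"
        and f[2][2] == f[1]
    )

def check_proof(steps, premises=()):
    """Return True if every step is an axiom, a premise, or follows by modus ponens."""
    seen = []
    for f in steps:
        # Modus ponens: some earlier a and earlier (a -> f) together justify f.
        by_mp = any(("->", a, f) in seen for a in seen)
        if not (is_axiom(f) or f in premises or by_mp):
            return False
        seen.append(f)
    return True

# A three-step derivation of q -> p from the premise p.
proof = ["p", implies("p", implies("q", "p")), implies("q", "p")]
print(check_proof(proof, premises=("p",)))   # accepted
print(check_proof(["q"], premises=("p",)))   # rejected: q is unjustified
```

The checker never searches for a proof; it only confirms that each line is justified, which is why the ability to recognize axioms is the essential requirement.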


First incompleteness theorem


Gödel's first incompleteness theorem first appeared as "Theorem VI" in Gödel's 1931 paper On Formally Undecidable Propositions in Principia Mathematica and Related Systems I. The formal theorem is written in highly technical language. The broadly accepted natural language statement of the theorem is:

Any effectively generated theory capable of expressing elementary arithmetic cannot be both consistent and complete. In particular, for any consistent, effectively generated formal theory that proves certain basic arithmetic truths, there is an arithmetical statement that is true,[1] but not provable in the theory (Kleene 1967, p. 250).

The true but unprovable statement referred to by the theorem is often referred to as "the Gödel sentence" for the theory. The proof constructs a specific Gödel sentence for each effectively generated theory, but there are infinitely many statements in the language of the theory that share the property of being true but unprovable. For example, the conjunction of the Gödel sentence and any logically valid sentence will have this property.

For each consistent formal theory T having the required small amount of number theory, the corresponding Gödel sentence G asserts: "G cannot be proved within the theory T". This interpretation of G leads to the following informal analysis. If G were provable under the axioms and rules of inference of T, then T would have a theorem, G, which effectively contradicts itself, and thus the theory T would be inconsistent. This means that if the theory T is consistent then G cannot be proved within it, and so the theory T is incomplete. Moreover, the claim G makes about its own unprovability is correct. In this sense G is not only unprovable but true, and provability-within-the-theory-T is not the same as truth. This informal analysis can be formalized to make a rigorous proof of the incompleteness theorem, as described in the section "Proof sketch for the first theorem" below. The formal proof reveals exactly the hypotheses required for the theory T in order for the self-contradictory nature of G to lead to a genuine contradiction.

Each effectively generated theory has its own Gödel statement. It is possible to define a larger theory T′ that contains the whole of T, plus G as an additional axiom. This will not result in a complete theory, because Gödel's theorem will also apply to T′, and thus T′ cannot be complete. In this case, G is indeed a theorem in T′, because it is an axiom. Since G states only that it is not provable in T, no contradiction is presented by its provability in T′. However, because the incompleteness theorem applies to T′, there will be a new Gödel statement G′ for T′, showing that T′ is also incomplete. G′ will differ from G in that G′ will refer to T′, rather than T.

To prove the first incompleteness theorem, Gödel represented statements by numbers. Then the theory at hand, which is assumed to prove certain facts about numbers, also proves facts about its own statements, provided that it is effectively generated. Questions about the provability of statements are represented as questions about the properties of numbers, which would be decidable by the theory if it were complete. In these terms, the Gödel sentence states that no natural number exists with a certain, strange property. A number with this property would encode a proof of the inconsistency of the theory. If there were such a number then the theory would be inconsistent, contrary to the consistency hypothesis. So, under the assumption that the theory is consistent, there is no such number.
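The step of representing statements by numbers can be made concrete with a small sketch. The code below encodes a string of symbols as a product of prime powers, in the spirit of Gödel's coding. The symbol table SYMBOLS and its code values are hypothetical choices made for illustration only; they are not the assignment used in the 1931 paper.

```python
# Hypothetical alphabet; each symbol receives the code 1, 2, 3, ... in this order.
SYMBOLS = ["0", "S", "+", "*", "=", "(", ")", "x", "~", "A"]
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short strings)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(formula):
    """Encode symbols c1, c2, c3, ... as 2**c1 * 3**c2 * 5**c3 * ..."""
    g = 1
    for p, sym in zip(primes(), formula):
        g *= p ** CODE[sym]
    return g

def decode(g):
    """Recover the symbol string of a valid code by reading off prime exponents."""
    out = []
    for p in primes():
        if g == 1:
            break
        e = 0
        while g % p == 0:
            g //= p
            e += 1
        out.append(SYMBOLS[e - 1])
    return "".join(out)

n = godel_number("0=0")
print(n, decode(n))  # 2**1 * 3**5 * 5**1 = 2430, which decodes back to "0=0"
```

Because encoding and decoding are both mechanical, statements about formulas (and, with a similar coding of sequences, about proofs) turn into statements about natural numbers.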

Meaning of the first incompleteness theorem


Gödel's first incompleteness theorem shows that any consistent effective formal system that includes enough of the theory of the natural numbers is incomplete: there are true statements expressible in its language that are unprovable. Thus no formal system (satisfying the hypotheses of the theorem) that aims to characterize the natural numbers can actually do so, as there will be true number-theoretical statements which that system cannot prove. This fact is sometimes thought to have severe consequences for the program of logicism proposed by Gottlob Frege and Bertrand Russell, which aimed to define the natural numbers in terms of logic (Hellman 1981, pp. 451–468). Bob Hale and Crispin Wright argue that it is not a problem for logicism because the incompleteness theorems apply equally to second order logic as they do to arithmetic. They argue that only those who believe that the natural numbers are to be defined in terms of first order logic have this problem.

The existence of an incomplete formal system is, in itself, not particularly surprising. A system may be incomplete simply because not all the necessary axioms have been discovered. For example, Euclidean geometry without the parallel postulate is incomplete; it is not possible to prove or disprove the parallel postulate from the remaining axioms.

Gödel's theorem shows that, in theories that include a small portion of number theory, a complete and consistent finite list of axioms can never be created, nor even an infinite list that can be enumerated by a computer program. Each time a new statement is added as an axiom, there are other true statements that still cannot be proved, even with the new axiom. If an axiom is ever added that makes the system complete, it does so at the cost of making the system inconsistent.

There are complete and consistent lists of axioms for arithmetic that cannot be enumerated by a computer program. For example, one might take all true statements about the natural numbers to be axioms (and no false statements), which gives the theory known as "true arithmetic". The difficulty is that there is no mechanical way to decide, given a statement about the natural numbers, whether it is an axiom of this theory, and thus there is no effective way to verify a formal proof in this theory.

Many logicians believe that Gödel's incompleteness theorems struck a fatal blow to David Hilbert's second problem, which asked for a finitary consistency proof for mathematics. The second incompleteness theorem, in particular, is often viewed as making the problem impossible. Not all mathematicians agree with this analysis, however, and the status of Hilbert's second problem is not yet decided (see "Modern viewpoints on the status of the problem").


Relation to the liar paradox


The liar paradox is the sentence "This sentence is false." An analysis of the liar sentence shows that it cannot be true (for then, as it asserts, it is false), nor can it be false (for then, it is true). A Gödel sentence G for a theory T makes a similar assertion to the liar sentence, but with truth replaced by provability: G says "G is not provable in the theory T." The analysis of the truth and provability of G is a formalized version of the analysis of the truth of the liar sentence.

It is not possible to replace "not provable" with "false" in a Gödel sentence because the predicate "Q is the Gödel number of a false formula" cannot be represented as a formula of arithmetic. This result, known as Tarski's undefinability theorem, was discovered independently by Gödel (when he was working on the proof of the incompleteness theorem) and by Alfred Tarski.
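The parallel between the two sentences can be written side by side. In the display below, #(·) denotes the Gödel number of a formula, and True and Prov_T are assumed names for a truth predicate and for the provability predicate of T. By Tarski's theorem no arithmetical formula True with the required property exists, which is why only the right-hand sentence can actually be constructed.

```latex
\[
\text{Liar: } L \leftrightarrow \neg\,\mathrm{True}\bigl(\#(L)\bigr)
\qquad\qquad
\text{G\"odel: } T \vdash G \leftrightarrow \neg\,\mathrm{Prov}_T\bigl(\#(G)\bigr)
\]
```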

Extensions of Gödel's original result

Gödel demonstrated the incompleteness of the theory of Principia Mathematica, a particular theory of arithmetic, but a parallel demonstration could be given for any effective theory of a certain expressiveness. Gödel commented on this fact in the introduction to his paper, but restricted the proof to one system for concreteness. In modern statements of the theorem, it is common to state the effectiveness and expressiveness conditions as hypotheses for the incompleteness theorem, so that it is not limited to any particular formal theory. The terminology used to state these conditions was not yet developed in 1931 when Gödel published his results.

Gödel's original statement and proof of the incompleteness theorem requires the assumption that the theory is not just consistent but ω-consistent. A theory is ω-consistent if it is not ω-inconsistent, and is ω-inconsistent if there is a predicate P such that for every specific natural number n the theory proves ~P(n), and yet the theory also proves that there exists a natural number n such that P(n). That is, the theory says that a number with property P exists while denying that it has any specific value. The ω-consistency of a theory implies its consistency, but consistency does not imply ω-consistency. J. Barkley Rosser (1936) strengthened the incompleteness theorem by finding a variation of the proof (Rosser's trick) that only requires the theory to be consistent, rather than ω-consistent. This is mostly of technical interest, since all true formal theories of arithmetic (theories whose axioms are all true statements about natural numbers) are ω-consistent, and thus Gödel's theorem as originally stated applies to them. The stronger version of the incompleteness theorem that only assumes consistency, rather than ω-consistency, is now commonly known as Gödel's incompleteness theorem and as the Gödel–Rosser theorem.


Second incompleteness theorem


Gödel's second incompleteness theorem first appeared as "Theorem XI" in Gödel's 1931 paper On Formally Undecidable Propositions in Principia Mathematica and Related Systems I. The formal theorem is written in highly technical language. The broadly accepted natural language statement of the theorem is:

For any formal effectively generated theory T including basic arithmetical truths and also certain truths about formal provability, if T includes a statement of its own consistency then T is inconsistent.

This strengthens the first incompleteness theorem, because the statement constructed in the first incompleteness theorem does not directly express the consistency of the theory. The proof of the second incompleteness theorem is obtained by formalizing the proof of the first incompleteness theorem within the theory itself.

A technical subtlety in the second incompleteness theorem is how to express the consistency of T as a formula in the language of T. There are many ways to do this, and not all of them lead to the same result. In particular, different formalizations of the claim that T is consistent may be inequivalent in T, and some may even be provable. For example, first-order Peano arithmetic (PA) can prove that the largest consistent subset of PA is consistent. But since PA is consistent, the largest consistent subset of PA is just PA, so in this sense PA "proves that it is consistent". What PA does not prove is that the largest consistent subset of PA is, in fact, the whole of PA. (The term "largest consistent subset of PA" is technically ambiguous, but what is meant here is the largest consistent initial segment of the axioms of PA ordered according to specific criteria; i.e., by "Gödel numbers", the numbers encoding the axioms as per the scheme used by Gödel mentioned above.)

For Peano arithmetic, or any familiar explicitly axiomatized theory T, it is possible to canonically define a formula Con(T) expressing the consistency of T; this formula expresses the property that "there does not exist a natural number coding a sequence of formulas, such that each formula is either one of the axioms of T, a logical axiom, or an immediate consequence of preceding formulas according to the rules of inference of first-order logic, and such that the last formula is a contradiction".

The formalization of Con(T) depends on two factors: formalizing the notion of a sentence being derivable from a set of sentences and formalizing the notion of being an axiom of T. Formalizing derivability can be done in canonical fashion: given an arithmetical formula A(x) defining a set of axioms, one can canonically form a predicate ProvA(P) which expresses that P is provable from the set of axioms defined by A(x).

In addition, the standard proof of the second incompleteness theorem assumes that ProvA(P) satisfies the Hilbert–Bernays provability conditions. Letting #(P) represent the Gödel number of a formula P, the derivability conditions say:
1. If T proves P, then T proves ProvA(#(P)).
2. T proves 1.; that is, T proves that if T proves P, then T proves ProvA(#(P)). In other words, T proves that ProvA(#(P)) implies ProvA(#(ProvA(#(P)))).
3. T proves that if T proves that (P → Q) and T proves P then T proves Q. In other words, T proves that ProvA(#(P → Q)) and ProvA(#(P)) imply ProvA(#(Q)).
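With the provability predicate in hand, the consistency statement and the theorem itself can be written in one line. The display below is a sketch under one assumption: the "contradiction" in the definition of Con(T) is taken to be the specific false sentence 0 = 1, a common convention that the text above does not fix.

```latex
\[
\mathrm{Con}(T) \;:\equiv\; \neg\,\mathrm{Prov}_A\bigl(\#(0 = 1)\bigr),
\qquad\qquad
T \text{ consistent} \;\Longrightarrow\; T \nvdash \mathrm{Con}(T).
\]
```

The second form is the contrapositive of the natural language statement given above: a theory that does prove Con(T) must be inconsistent.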


Implications for consistency proofs


Gödel's second incompleteness theorem also implies that a theory T1 satisfying the technical conditions outlined above cannot prove the consistency of any theory T2 which proves the consistency of T1. This is because such a theory T1 can prove that if T2 proves the consistency of T1, then T1 is in fact consistent. For the claim that T1 is consistent has the form "for all numbers n, n has the decidable property of not being a code for a proof of contradiction in T1". If T1 were in fact inconsistent, then T2 would prove for some n that n is the code of a contradiction in T1. But if T2 also proved that T1 is consistent (that is, that there is no such n), then it would itself be inconsistent. This reasoning can be formalized in T1 to show that if T2 is consistent, then T1 is consistent. Since, by the second incompleteness theorem, T1 does not prove its consistency, it cannot prove the consistency of T2 either.

This corollary of the second incompleteness theorem shows that there is no hope of proving, for example, the consistency of Peano arithmetic using any finitistic means that can be formalized in a theory the consistency of which is provable in Peano arithmetic. For example, the theory of primitive recursive arithmetic (PRA), which is widely accepted as an accurate formalization of finitistic mathematics, is provably consistent in PA. Thus PRA cannot prove the consistency of PA. This fact is generally seen to imply that Hilbert's program, which aimed to justify the use of "ideal" (infinitistic) mathematical principles in the proofs of "real" (finitistic) mathematical statements by giving a finitistic proof that the ideal principles are consistent, cannot be carried out.

The corollary also indicates the epistemological relevance of the second incompleteness theorem. It would actually provide no interesting information if a theory T proved its consistency. This is because inconsistent theories prove everything, including their consistency. Thus a consistency proof of T in T would give us no clue as to whether T really is consistent; no doubts about the consistency of T would be resolved by such a consistency proof. The interest in consistency proofs lies in the possibility of proving the consistency of a theory T in some theory T′ which is in some sense less doubtful than T itself, for example weaker than T. For many naturally occurring theories T and T′, such as T = Zermelo–Fraenkel set theory and T′ = primitive recursive arithmetic, the consistency of T′ is provable in T, and thus T′ can't prove the consistency of T by the above corollary of the second incompleteness theorem.

The second incompleteness theorem does not rule out consistency proofs altogether, only consistency proofs that could be formalized in the theory that is proved consistent. For example, Gerhard Gentzen proved the consistency of Peano arithmetic (PA) in a different theory which includes an axiom asserting that the ordinal called ε0 is wellfounded; see Gentzen's consistency proof. Gentzen's theorem spurred the development of ordinal analysis in proof theory.

Examples of undecidable statements


There are two distinct senses of the word "undecidable" in mathematics and computer science. The first of these is the proof-theoretic sense used in relation to Gödel's theorems, that of a statement being neither provable nor refutable in a specified deductive system. The second sense, which will not be discussed here, is used in relation to computability theory and applies not to statements but to decision problems, which are countably infinite sets of questions each requiring a yes or no answer. Such a problem is said to be undecidable if there is no computable function that correctly answers every question in the problem set (see undecidable problem).

Because of the two meanings of the word undecidable, the term independent is sometimes used instead of undecidable for the "neither provable nor refutable" sense. The usage of "independent" is also ambiguous, however. Some use it to mean just "not provable", leaving open whether an independent statement might be refuted.

Undecidability of a statement in a particular deductive system does not, in and of itself, address the question of whether the truth value of the statement is well-defined, or whether it can be determined by other means. Undecidability only implies that the particular deductive system being considered does not prove the truth or falsity of the statement. Whether there exist so-called "absolutely undecidable" statements, whose truth value can never be known or is ill-specified, is a controversial point in the philosophy of mathematics.

The combined work of Gödel and Paul Cohen has given two concrete examples of undecidable statements (in the first sense of the term): The continuum hypothesis can neither be proved nor refuted in ZFC (the standard axiomatization of set theory), and the axiom of choice can neither be proved nor refuted in ZF (which is all the ZFC axioms except the axiom of choice). These results do not require the incompleteness theorem. Gödel proved in 1940 that neither of these statements could be disproved in ZF or ZFC set theory. In the 1960s, Cohen proved that neither is provable from ZF, and the continuum hypothesis cannot be proven from ZFC.

In 1973, the Whitehead problem in group theory was shown to be undecidable, in the first sense of the term, in standard set theory.

Gregory Chaitin produced undecidable statements in algorithmic information theory and proved another incompleteness theorem in that setting. Chaitin's incompleteness theorem states that for any theory that can represent enough arithmetic, there is an upper bound c such that no specific number can be proven in that theory to have Kolmogorov complexity greater than c. While Gödel's theorem is related to the liar paradox, Chaitin's result is related to Berry's paradox.


Undecidable statements provable in larger systems


These are natural mathematical equivalents of the Gödel "true but undecidable" sentence. They can be proved in a larger system which is generally accepted as a valid form of reasoning, but are undecidable in a more limited system such as Peano arithmetic.

In 1977, Paris and Harrington proved that the Paris–Harrington principle, a version of the Ramsey theorem, is undecidable in the first-order axiomatization of arithmetic called Peano arithmetic, but can be proven in the larger system of second-order arithmetic. Kirby and Paris later showed Goodstein's theorem, a statement about sequences of natural numbers somewhat simpler than the Paris–Harrington principle, to be undecidable in Peano arithmetic.

Kruskal's tree theorem, which has applications in computer science, is also undecidable from Peano arithmetic but provable in set theory. In fact Kruskal's tree theorem (or its finite form) is undecidable in a much stronger system codifying the principles acceptable based on a philosophy of mathematics called predicativism. The related but more general graph minor theorem (2003) has consequences for computational complexity theory.
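To make the "sequences of natural numbers" in Goodstein's theorem concrete, the following Python sketch computes the first terms of a Goodstein sequence. The definition used here is the standard one (write the current term in hereditary base b, replace every b by b + 1, subtract 1, then move on to base b + 1); it is not spelled out in the text above, and the names bump_base and goodstein are illustrative assumptions. Goodstein's theorem asserts that every such sequence eventually reaches 0, although the sequences can grow for an astronomically long time first.

```python
def bump_base(n, base):
    """Write n in hereditary base `base`, then replace every `base` by base + 1."""
    result, exponent = 0, 0
    while n > 0:
        n, digit = divmod(n, base)
        if digit:
            # The exponent is itself rewritten in hereditary notation.
            result += digit * (base + 1) ** bump_base(exponent, base)
        exponent += 1
    return result


def goodstein(start, max_terms):
    """Return at most max_terms terms of the Goodstein sequence of `start`."""
    terms, value, base = [], start, 2
    while len(terms) < max_terms:
        terms.append(value)
        if value == 0:
            break  # the sequence has terminated
        value = bump_base(value, base) - 1
        base += 1
    return terms


print(goodstein(3, 10))  # reaches 0 after a few steps
print(goodstein(4, 4))   # still growing after four terms
```

The sequence starting at 3 terminates almost immediately, while the one starting at 4 keeps growing far beyond anything this sketch could print, which gives some feeling for why the theorem is not provable by the elementary induction available in Peano arithmetic.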

Limitations of Gödel's theorems


The conclusions of Gödel's theorems are only proven for the formal theories that satisfy the necessary hypotheses. Not all axiom systems satisfy these hypotheses, even when these systems have models that include the natural numbers as a subset. For example, there are first-order axiomatizations of Euclidean geometry, of real closed fields, and of arithmetic in which multiplication is not provably total; none of these meet the hypotheses of Gödel's theorems. The key fact is that these axiomatizations are not expressive enough to define the set of natural numbers or develop basic properties of the natural numbers. Regarding the third example, Dan E. Willard (Willard 2001) has studied many weak systems of arithmetic which do not satisfy the hypotheses of the second incompleteness theorem, and which are consistent and capable of proving their own consistency (see self-verifying theories).

Gödel's theorems only apply to effectively generated (that is, recursively enumerable) theories. If all true statements about natural numbers are taken as axioms for a theory, then this theory is a consistent, complete extension of Peano arithmetic (called true arithmetic) for which none of Gödel's theorems apply in a meaningful way, because this theory is not recursively enumerable.

The second incompleteness theorem only shows that the consistency of certain theories cannot be proved from the axioms of those theories themselves. It does not show that the consistency cannot be proved from other (consistent) axioms. For example, the consistency of Peano arithmetic can be proved in Zermelo–Fraenkel set theory (ZFC), or in theories of arithmetic augmented with transfinite induction, as in Gentzen's consistency proof.


Relationship with computability


The incompleteness theorem is closely related to several results about undecidable sets in recursion theory.

Stephen Cole Kleene (1943) presented a proof of Gödel's incompleteness theorem using basic results of computability theory. One such result shows that the halting problem is undecidable: there is no computer program that can correctly determine, given a program P as input, whether P eventually halts when run with a particular given input. Kleene showed that the existence of a complete effective theory of arithmetic with certain consistency properties would force the halting problem to be decidable, a contradiction. This method of proof has also been presented by Shoenfield (1967, p.132); Charlesworth (1980); and Hopcroft and Ullman (1979).

Franzén (2005, p.73) explains how Matiyasevich's solution to Hilbert's 10th problem can be used to obtain a proof of Gödel's first incompleteness theorem. Matiyasevich proved that there is no algorithm that, given a multivariate polynomial p(x1, x2,...,xk) with integer coefficients, determines whether there is an integer solution to the equation p = 0. Because polynomials with integer coefficients, and integers themselves, are directly expressible in the language of arithmetic, if a multivariate integer polynomial equation p = 0 does have a solution in the integers then any sufficiently strong theory of arithmetic T will prove this. Moreover, if the theory T is ω-consistent, then it will never prove that a particular polynomial equation has a solution when in fact there is no solution in the integers. Thus, if T were complete and ω-consistent, it would be possible to determine algorithmically whether a polynomial equation has a solution by merely enumerating proofs of T until either "p has a solution" or "p has no solution" is found, in contradiction to Matiyasevich's theorem. Moreover, for each consistent effectively generated theory T, it is possible to effectively generate a multivariate polynomial p over the integers such that the equation p = 0 has no solutions over the integers, but the lack of solutions cannot be proved in T (Davis 2006:416, Jones 1980).

Smoryński (1977, p.842) shows how the existence of recursively inseparable sets can be used to prove the first incompleteness theorem. This proof is often extended to show that systems such as Peano arithmetic are essentially undecidable (see Kleene 1967, p.274).

Chaitin's incompleteness theorem gives a different method of producing independent sentences, based on Kolmogorov complexity. Like the proof presented by Kleene that was mentioned above, Chaitin's theorem only applies to theories with the additional property that all their axioms are true in the standard model of the natural numbers. Gödel's incompleteness theorem is distinguished by its applicability to consistent theories that nonetheless include statements that are false in the standard model; these theories are known as ω-inconsistent.
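The argument via Hilbert's 10th problem above rests on an asymmetry: a solution of p = 0, once found, can be checked by a finite calculation, whereas no finite search can by itself establish that there is no solution. The following Python sketch illustrates only the search half of that picture. The function name find_integer_solution, the max_bound cut-off and the two sample polynomials are illustrative assumptions and are not taken from the cited proofs.

```python
from itertools import product


def find_integer_solution(p, num_vars, max_bound):
    """Search for integers xs with p(*xs) == 0, trying |x_i| <= 0, 1, 2, ...

    Returns the first solution found, or None if there is none with all
    |x_i| <= max_bound. A None result says nothing about larger values.
    """
    for bound in range(max_bound + 1):
        for xs in product(range(-bound, bound + 1), repeat=num_vars):
            # Only test tuples that first appear at this bound.
            if max(map(abs, xs), default=0) == bound and p(*xs) == 0:
                return xs
    return None


def circle(x, y):
    # x^2 + y^2 - 25 = 0 has integer solutions such as (3, 4).
    return x * x + y * y - 25


def square_root_of_two(x):
    # x^2 - 2 = 0 has no integer solution.
    return x * x - 2


solution = find_integer_solution(circle, 2, 10)
no_solution = find_integer_solution(square_root_of_two, 1, 1000)
```

When a solution exists, the unbounded version of this search always finds it, which is why a sufficiently strong theory can prove every solvable instance. For the second polynomial the search simply comes back empty; turning "no solution was found" into "no solution exists" is exactly the step that Matiyasevich's theorem shows cannot be carried out by any algorithm in general.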

Proof sketch for the first theorem


The proof by contradiction has three essential parts. To begin, choose a formal system that meets the proposed criteria:

1. Statements in the system can be represented by natural numbers (known as Gödel numbers). The significance of this is that questions about properties of statements, such as their truth and falsehood, become questions about whether their Gödel numbers have certain properties, and that properties of the statements can therefore be demonstrated by examining their Gödel numbers. This part culminates in the construction of a formula expressing the idea that "statement S is provable in the system" (which can be applied to any statement "S" in the system).
2. In the formal system it is possible to construct a number whose matching statement, when interpreted, is self-referential and essentially says that it (i.e. the statement itself) is unprovable. This is done using a technique called "diagonalization" (so-called because of its origins as Cantor's diagonal argument).
3. Within the formal system this statement permits a demonstration that it is neither provable nor disprovable in the system, and therefore the system cannot in fact be ω-consistent. Hence the original assumption that the proposed system met the criteria is false.


Arithmetization of syntax
The main problem in fleshing out the proof described above is that it seems at first that to construct a statement p that is equivalent to "p cannot be proved", p would somehow have to contain a reference to p, which could easily give rise to an infinite regress. Gödel's ingenious technique is to show that statements can be matched with numbers (often called the arithmetization of syntax) in such a way that "proving a statement" can be replaced with "testing whether a number has a given property". This allows a self-referential formula to be constructed in a way that avoids any infinite regress of definitions. The same technique was later used by Alan Turing in his work on the Entscheidungsproblem.

In simple terms, a method can be devised so that every formula or statement that can be formulated in the system gets a unique number, called its Gödel number, in such a way that it is possible to mechanically convert back and forth between formulas and Gödel numbers. The numbers involved might be very long indeed (in terms of number of digits), but this is not a barrier; all that matters is that such numbers can be constructed. A simple example is the way in which English is stored as a sequence of numbers in computers using ASCII or Unicode: The word HELLO is represented by 72-69-76-76-79 using decimal ASCII, i.e. the number 7269767679. The logical statement x=y => y=x is represented by 120-061-121-032-061-062-032-121-061-120 using decimal ASCII codes padded to three digits, i.e. the number 120061121032061062032121061120.

In principle, proving a statement true or false can be shown to be equivalent to proving that the number matching the statement does or doesn't have a given property. Because the formal system is strong enough to support reasoning about numbers in general, it can support reasoning about numbers which represent formulae and statements as well. Crucially, because the system can support reasoning about properties of numbers, the results are equivalent to reasoning about provability of their equivalent statements.
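The ASCII illustration above can be tried out directly. The following Python sketch is a minimal version of that toy numbering, not Gödel's own scheme (which is built from prime factorizations): it pads every character code to three decimal digits so that the number can be decoded unambiguously. The function names godel_number and decode are illustrative assumptions.

```python
def godel_number(text):
    """Encode non-empty ASCII text as one integer of 3-digit decimal codes."""
    if not text or not text.isascii():
        raise ValueError("this sketch only handles non-empty ASCII text")
    return int("".join(f"{ord(ch):03d}" for ch in text))


def decode(number):
    """Recover the text from a number produced by godel_number.

    Leading zeros of the first code are lost in the integer, so the digit
    string is padded back to a multiple of three. (Text starting with the
    NUL character would not survive the round trip; the sketch ignores it.)
    """
    digits = str(number)
    width = -(-len(digits) // 3) * 3  # round up to a multiple of 3
    digits = digits.zfill(width)
    return "".join(chr(int(digits[i:i + 3])) for i in range(0, len(digits), 3))


statement = "x=y => y=x"
number = godel_number(statement)
print(number)          # 120061121032061062032121061120
print(decode(number))  # x=y => y=x
```

The fixed width is a design choice of the sketch: with unpadded codes, as in 7269767679 for HELLO, decoding works only as long as every code happens to have the same number of digits. What matters for the proof is merely that encoding and decoding are both mechanical.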

Construction of a statement about "provability"


Having shown that in principle the system can indirectly make statements about provability, by analyzing properties of those numbers representing statements, it is now possible to show how to create a statement that actually does this.

A formula F(x) that contains exactly one free variable x is called a statement form or class-sign. As soon as x is replaced by a specific number, the statement form turns into a bona fide statement, and it is then either provable in the system, or not. For certain formulas one can show that for every natural number n, F(n) is true if and only if it can be proven (the precise requirement in the original proof is weaker, but for the proof sketch this will suffice). In particular, this is true for every specific arithmetic operation between a finite number of natural numbers, such as "2×3=6".

Statement forms themselves are not statements and therefore cannot be proved or disproved. But every statement form F(x) can be assigned a Gödel number denoted by G(F). The choice of the free variable used in the form F(x) is not relevant to the assignment of the Gödel number G(F).

Now comes the trick: The notion of provability itself can also be encoded by Gödel numbers, in the following way. Since a proof is a list of statements which obey certain rules, the Gödel number of a proof can be defined. Now, for every statement p, one may ask whether a number x is the Gödel number of its proof. The relation between the Gödel number of p and x, the potential Gödel number of its proof, is an arithmetical relation between two numbers. Therefore there is a statement form Bew(y) that uses this arithmetical relation to state that a Gödel number of a proof of y exists:

Bew(y) = ∃x (y is the Gödel number of a formula and x is the Gödel number of a proof of the formula encoded by y).

The name Bew is short for beweisbar, the German word for "provable"; this name was originally used by Gödel to denote the provability formula just described. Note that "Bew(y)" is merely an abbreviation that represents a particular, very long, formula in the original language of T; the string "Bew" itself is not claimed to be part of this language.

An important feature of the formula Bew(y) is that if a statement p is provable in the system then Bew(G(p)) is also provable. This is because any proof of p would have a corresponding Gödel number, the existence of which causes Bew(G(p)) to be satisfied.
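The definition of Bew and the property just stated can be written compactly as follows. This is only a notational sketch: the predicate names Form (for "is the Gödel number of a formula") and Proof (for "x is the Gödel number of a proof of the formula encoded by y") are labels assumed here for readability and are not Gödel's original notation.

```latex
\[
  \mathrm{Bew}(y) \;\equiv\; \exists x \,\bigl( \mathrm{Form}(y) \wedge \mathrm{Proof}(x, y) \bigr)
\]
\[
  \text{if } T \vdash p \text{ then } T \vdash \mathrm{Bew}(G(p))
\]
```

The second line holds because a concrete proof of p has a concrete Gödel number n, and the system can verify the particular arithmetical fact Proof(n, G(p)) by calculation.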


Diagonalization
The next step in the proof is to obtain a statement that says it is unprovable. Although Gödel constructed this statement directly, the existence of at least one such statement follows from the diagonal lemma, which says that for any sufficiently strong formal system and any statement form F there is a statement p such that the system proves p ↔ F(G(p)). By letting F be the negation of Bew(x), we obtain the theorem p ↔ ~Bew(G(p)), and the p defined by this roughly states that its own Gödel number is the Gödel number of an unprovable formula.

The statement p is not literally equal to ~Bew(G(p)); rather, p states that if a certain calculation is performed, the resulting Gödel number will be that of an unprovable statement. But when this calculation is performed, the resulting Gödel number turns out to be the Gödel number of p itself. This is similar to the following sentence in English:

", when preceded by itself in quotes, is unprovable.", when preceded by itself in quotes, is unprovable.

This sentence does not directly refer to itself, but when the stated transformation is made the original sentence is obtained as a result, and thus this sentence asserts its own unprovability. The proof of the diagonal lemma employs a similar method.

Now, assume that the axiomatic system is ω-consistent, and let p be the statement obtained in the previous section.

If p were provable, then Bew(G(p)) would be provable, as argued above. But p asserts the negation of Bew(G(p)). Thus the system would be inconsistent, proving both a statement and its negation. This contradiction shows that p cannot be provable.

If the negation of p were provable, then Bew(G(p)) would be provable (because p was constructed to be equivalent to the negation of Bew(G(p))). However, for each specific number x, x cannot be the Gödel number of the proof of p, because p is not provable (from the previous paragraph). Thus on one hand the system proves there is a number with a certain property (that it is the Gödel number of the proof of p), but on the other hand, for every specific number x, we can prove that it does not have this property. This is impossible in an ω-consistent system. Thus the negation of p is not provable.

Thus the statement p is undecidable in our axiomatic system: it can neither be proved nor disproved within the system.

In fact, to show that p is not provable only requires the assumption that the system is consistent. The stronger assumption of ω-consistency is required to show that the negation of p is not provable. Thus, if p is constructed for a particular system: If the system is ω-consistent, it can prove neither p nor its negation, and so p is undecidable. If the system is consistent, it may have the same situation, or it may prove the negation of p. In the latter case, we have a statement ("not p") which is false but provable, and the system is not ω-consistent.

If one tries to "add the missing axioms" to avoid the incompleteness of the system, then one has to add either p or "not p" as axioms. But then the definition of "being a Gödel number of a proof" of a statement changes, which means that the formula Bew(x) is now different. Thus when we apply the diagonal lemma to this new Bew, we obtain a new statement p, different from the previous one, which will be undecidable in the new system if it is ω-consistent.
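The indirect self-reference of the English sentence quoted in this section can be checked mechanically. The following Python sketch is an analogy on strings only, not a formalization of the diagonal lemma; the names precede_by_own_quotation and described_sentence are illustrative assumptions. It builds the sentence from its fragment and then confirms that carrying out the transformation the sentence describes yields the sentence itself.

```python
def precede_by_own_quotation(fragment):
    """Return the fragment in double quotes, followed by the fragment itself."""
    return '"' + fragment + '"' + fragment


FRAGMENT = ", when preceded by itself in quotes, is unprovable."
SENTENCE = precede_by_own_quotation(FRAGMENT)


def described_sentence(sentence):
    """Carry out the transformation that the sentence talks about."""
    # The quoted fragment sits between the first two double-quote characters.
    quoted = sentence.split('"')[1]
    return precede_by_own_quotation(quoted)


print(SENTENCE)
print(described_sentence(SENTENCE) == SENTENCE)  # True
```

The sentence never contains a copy of itself; it contains only the fragment and an instruction. In the formal proof, the role of the instruction is played by a calculation on Gödel numbers, and the role of the fragment by the Gödel number of a statement form.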


Proof via Berry's paradox


George Boolos (1989) sketches an alternative proof of the first incompleteness theorem that uses Berry's paradox rather than the liar paradox to construct a true but unprovable formula. A similar proof method was independently discovered by Saul Kripke (Boolos 1998, p.383). Boolos's proof proceeds by constructing, for any computably enumerable set S of true sentences of arithmetic, another sentence which is true but not contained in S. This gives the first incompleteness theorem as a corollary. According to Boolos, this proof is interesting because it provides a "different sort of reason" for the incompleteness of effective, consistent theories of arithmetic (Boolos 1998, p.388).

Formalized proofs
Formalized proofs of versions of the incompleteness theorem have been developed by Natarajan Shankar in 1986 using Nqthm (Shankar 1994) and by Russell O'Connor in 2003 using Coq (O'Connor 2005).

Proof sketch for the second theorem


The main difficulty in proving the second incompleteness theorem is to show that various facts about provability used in the proof of the first incompleteness theorem can be formalized within the system using a formal predicate for provability. Once this is done, the second incompleteness theorem follows by formalizing the entire proof of the first incompleteness theorem within the system itself.

Let p stand for the undecidable sentence constructed above, and assume that the consistency of the system can be proven from within the system itself. The demonstration above shows that if the system is consistent, then p is not provable. The proof of this implication can be formalized within the system, and therefore the statement "p is not provable", that is ~Bew(G(p)), can be proven in the system. But this last statement is equivalent to p itself (and this equivalence can be proven in the system), so p can be proven in the system. This contradicts the fact that p is not provable in a consistent system, and so shows that the system must be inconsistent.
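The steps of this sketch can be laid out symbolically. The notation below is a summary under assumptions: T denotes the system, Con(T) a sentence of T expressing its consistency, and Bew and G are as in the proof sketch for the first theorem; the numbering of the steps is added here for readability.

```latex
\begin{align*}
  &(1)\quad T \vdash \mathrm{Con}(T) \rightarrow \neg\mathrm{Bew}(G(p))
      && \text{first theorem, formalized in } T \\
  &(2)\quad T \vdash p \leftrightarrow \neg\mathrm{Bew}(G(p))
      && \text{diagonal lemma} \\
  &(3)\quad T \vdash \mathrm{Con}(T)
      && \text{assumption} \\
  &(4)\quad T \vdash \neg\mathrm{Bew}(G(p))
      && \text{from (1) and (3)} \\
  &(5)\quad T \vdash p
      && \text{from (2) and (4)}
\end{align*}
```

Step (5) is impossible for a consistent T, since the first theorem shows that a consistent system does not prove p. Hence a system that proves its own consistency must be inconsistent.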

Discussion and implications


The incompleteness results affect the philosophy of mathematics, particularly versions of formalism, which use a single system of formal logic to define their principles. One can paraphrase the first theorem as saying the following: An all-encompassing axiomatic system can never be found that is able to prove all mathematical truths, but no falsehoods. On the other hand, from a strict formalist perspective this paraphrase would be considered meaningless because it presupposes that mathematical "truth" and "falsehood" are well-defined in an absolute sense, rather than relative to each formal system.

The following rephrasing of the second theorem is even more unsettling to the foundations of mathematics: If an axiomatic system can be proven to be consistent from within itself, then it is inconsistent. Therefore, to establish the consistency of a system S, one needs to use some other system T, but a proof in T is not completely convincing unless T's consistency has already been established without using S.

Theories such as Peano arithmetic, for which any computably enumerable consistent extension is incomplete, are called essentially undecidable or essentially incomplete.


Minds and machines


Authors including J. R. Lucas have debated what, if anything, Gödel's incompleteness theorems imply about human intelligence. Much of the debate centers on whether the human mind is equivalent to a Turing machine, or by the Church–Turing thesis, any finite machine at all. If it is, and if the machine is consistent, then Gödel's incompleteness theorems would apply to it.

Hilary Putnam (1960) suggested that while Gödel's theorems cannot be applied to humans, since they make mistakes and are therefore inconsistent, they may be applied to the human faculty of science or mathematics in general. Assuming that it is consistent, either its consistency cannot be proved or it cannot be represented by a Turing machine.

Avi Wigderson (2010) has proposed that the concept of mathematical "knowability" should be based on computational complexity rather than logical decidability. He writes that "when knowability is interpreted by modern standards, namely via computational complexity, the Gödel phenomena are very much with us."

Paraconsistent logic
Although Gödel's theorems are usually studied in the context of classical logic, they also have a role in the study of paraconsistent logic and of inherently contradictory statements (dialetheia). Graham Priest (1984, 2006) argues that replacing the notion of formal proof in Gödel's theorem with the usual notion of informal proof can be used to show that naive mathematics is inconsistent, and uses this as evidence for dialetheism. The cause of this inconsistency is the inclusion of a truth predicate for a theory within the language of the theory (Priest 2006:47). Stewart Shapiro (2002) gives a more mixed appraisal of the applications of Gödel's theorems to dialetheism. Carl Hewitt (2008) has proposed that (inconsistent) paraconsistent logics that prove their own Gödel sentences may have applications in software engineering.

Appeals to the incompleteness theorems in other fields


Appeals and analogies are sometimes made to the incompleteness theorems in support of arguments that go beyond mathematics and logic. Several authors have commented negatively on such extensions and interpretations, including Torkel Franzén (2005); Alan Sokal and Jean Bricmont (1999); and Ophelia Benson and Jeremy Stangroom (2006). Benson and Stangroom (2006, p.10), for example, quote from Rebecca Goldstein's comments on the disparity between Gödel's avowed Platonism and the anti-realist uses to which his ideas are sometimes put. Sokal and Bricmont (1999, p.187) criticize Régis Debray's invocation of the theorem in the context of sociology; Debray has defended this use as metaphorical (ibid.).

The role of self-reference


Torkel Franzén (2005, p.46) observes: "Gödel's proof of the first incompleteness theorem and Rosser's strengthened version have given many the impression that the theorem can only be proved by constructing self-referential statements [...] or even that only strange self-referential statements are known to be undecidable in elementary arithmetic. To counteract such impressions, we need only introduce a different kind of proof of the first incompleteness theorem." He then proposes the proofs based on computability, or on information theory, as described earlier in this article, as examples of proofs that should "counteract such impressions".


History
After Gödel published his proof of the completeness theorem as his doctoral thesis in 1929, he turned to a second problem for his habilitation. His original goal was to obtain a positive solution to Hilbert's second problem (Dawson 1997, p.63). At the time, theories of the natural numbers and real numbers similar to second-order arithmetic were known as "analysis", while theories of the natural numbers alone were known as "arithmetic".

Gödel was not the only person working on the consistency problem. Ackermann had published a flawed consistency proof for analysis in 1925, in which he attempted to use the method of ε-substitution originally developed by Hilbert. Later that year, von Neumann was able to correct the proof for a theory of arithmetic without any axioms of induction. By 1928, Ackermann had communicated a modified proof to Bernays; this modified proof led Hilbert to announce his belief in 1929 that the consistency of arithmetic had been demonstrated and that a consistency proof of analysis would likely soon follow. After the publication of the incompleteness theorems showed that Ackermann's modified proof must be erroneous, von Neumann produced a concrete example showing that its main technique was unsound (Zach 2006, p.418, Zach 2003, p.33).

In the course of his research, Gödel discovered that although a sentence which asserts its own falsehood leads to paradox, a sentence that asserts its own non-provability does not. In particular, Gödel was aware of the result now called Tarski's indefinability theorem, although he never published it. Gödel announced his first incompleteness theorem to Carnap, Feigl and Waismann on August 26, 1930; all four would attend a key conference in Königsberg the following week.

Announcement
The 1930 Königsberg conference was a joint meeting of three academic societies, with many of the key logicians of the time in attendance. Carnap, Heyting, and von Neumann delivered one-hour addresses on the mathematical philosophies of logicism, intuitionism, and formalism, respectively (Dawson 1996, p.69). The conference also included Hilbert's retirement address, as he was leaving his position at the University of Göttingen. Hilbert used the speech to argue his belief that all mathematical problems can be solved. He ended his address by saying, "For the mathematician there is no Ignorabimus, and, in my opinion, not at all for natural science either. ... The true reason why [no one] has succeeded in finding an unsolvable problem is, in my opinion, that there is no unsolvable problem. In contrast to the foolish Ignorabimus, our credo avers: We must know. We shall know!" This speech quickly became known as a summary of Hilbert's beliefs on mathematics (its final six words, "Wir müssen wissen. Wir werden wissen!", were used as Hilbert's epitaph in 1943). Although Gödel was likely in attendance for Hilbert's address, the two never met face to face (Dawson 1996, p.72).

Gödel announced his first incompleteness theorem at a roundtable discussion session on the third day of the conference. The announcement drew little attention apart from that of von Neumann, who pulled Gödel aside for conversation. Later that year, working independently with knowledge of the first incompleteness theorem, von Neumann obtained a proof of the second incompleteness theorem, which he announced to Gödel in a letter dated November 20, 1930 (Dawson 1996, p.70). Gödel had independently obtained the second incompleteness theorem and included it in his submitted manuscript, which was received by Monatshefte für Mathematik on November 17, 1930.

Gödel's paper was published in the Monatshefte in 1931 under the title Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I (On Formally Undecidable Propositions in Principia Mathematica and Related Systems I). As the title implies, Gödel originally planned to publish a second part of the paper; it was never written.


Generalization and acceptance


Gödel gave a series of lectures on his theorems at Princeton in 1933–1934 to an audience that included Church, Kleene, and Rosser. By this time, Gödel had grasped that the key property his theorems required is that the theory must be effective (at the time, the term "general recursive" was used). Rosser proved in 1936 that the hypothesis of ω-consistency, which was an integral part of Gödel's original proof, could be replaced by simple consistency, if the Gödel sentence was changed in an appropriate way. These developments left the incompleteness theorems in essentially their modern form.

Gentzen published his consistency proof for first-order arithmetic in 1936. Hilbert accepted this proof as "finitary" although (as Gödel's theorem had already shown) it cannot be formalized within the system of arithmetic that is being proved consistent.

The impact of the incompleteness theorems on Hilbert's program was quickly realized. Bernays included a full proof of the incompleteness theorems in the second volume of Grundlagen der Mathematik (1939), along with additional results of Ackermann on the ε-substitution method and Gentzen's consistency proof of arithmetic. This was the first full published proof of the second incompleteness theorem.

Criticisms
Finsler

Paul Finsler (1926) used a version of Richard's paradox to construct an expression that was false but unprovable in a particular, informal framework he had developed. Gödel was unaware of this paper when he proved the incompleteness theorems (Collected Works Vol. IV., p.9). Finsler wrote to Gödel in 1931 to inform him about this paper, which Finsler felt had priority for an incompleteness theorem. Finsler's methods did not rely on formalized provability, and had only a superficial resemblance to Gödel's work (van Heijenoort 1967:328). Gödel read the paper but found it deeply flawed, and his response to Finsler laid out concerns about the lack of formalization (Dawson:89). Finsler continued to argue for his philosophy of mathematics, which eschewed formalization, for the remainder of his career.

Zermelo

In September 1931, Ernst Zermelo wrote to Gödel to announce what he described as an "essential gap" in Gödel's argument (Dawson:76). In October, Gödel replied with a 10-page letter (Dawson:76, Grattan-Guinness:512-513). But Zermelo did not relent and published his criticisms in print with "a rather scathing paragraph on his young competitor" (Grattan-Guinness:513). Gödel decided that to pursue the matter further was pointless, and Carnap agreed (Dawson:77). Much of Zermelo's subsequent work was related to logics stronger than first-order logic, with which he hoped to show both the consistency and categoricity of mathematical theories.

Wittgenstein

Ludwig Wittgenstein wrote several passages about the incompleteness theorems that were published posthumously in his 1953 Remarks on the Foundations of Mathematics. Gödel was a member of the Vienna Circle during the period in which Wittgenstein's early ideal language philosophy and Tractatus Logico-Philosophicus dominated the circle's thinking. Writings in Gödel's Nachlass express the belief that Wittgenstein deliberately misread his ideas.

Multiple commentators have read Wittgenstein as misunderstanding Gödel (Rodych 2003), although Juliet Floyd and Hilary Putnam (2000), as well as Graham Priest (2004), have provided textual readings arguing that most commentary misunderstands Wittgenstein. On their release, Bernays, Dummett, and Kreisel wrote separate reviews on Wittgenstein's remarks, all of which were extremely negative (Berto 2009:208). The unanimity of this criticism caused Wittgenstein's remarks on the incompleteness theorems to have little impact on the logic community. In 1972, Gödel stated: "Has Wittgenstein lost his mind? Does he mean it seriously?" (Wang 1996:197). He also wrote to Karl Menger that Wittgenstein's comments demonstrate a willful misunderstanding of the incompleteness theorems, writing: "It is clear from the passages you cite that Wittgenstein did 'not' understand [the first incompleteness theorem] (or pretended not to understand it). He interpreted it as a kind of logical paradox, while in fact is just the opposite, namely a mathematical theorem within an absolutely uncontroversial part of mathematics (finitary number theory or combinatorics)." (Wang 1996:197)

Since the publication of Wittgenstein's Nachlass in 2000, a series of papers in philosophy have sought to evaluate whether the original criticism of Wittgenstein's remarks was justified. Floyd and Putnam (2000) argue that Wittgenstein had a more complete understanding of the incompleteness theorem than was previously assumed. They are particularly concerned with the interpretation of a Gödel sentence for an ω-inconsistent theory as actually saying "I am not provable", since the theory has no models in which the provability predicate corresponds to actual provability. Rodych (2003) argues that their interpretation of Wittgenstein is not historically justified, while Bays (2004) argues against Floyd and Putnam's philosophical analysis of the provability predicate. Berto (2009) explores the relationship between Wittgenstein's writing and theories of paraconsistent logic.


Notes
[1] The word "true" is used disquotationally here: the Gödel sentence is true in this sense because it "asserts its own unprovability and it is indeed unprovable" (Smoryński 1977 p. 825; also see Franzén 2005 pp. 28–33). It is also possible to read "G_T is true" in the formal sense that primitive recursive arithmetic proves the implication Con(T)→G_T, where Con(T) is a canonical sentence asserting the consistency of T (Smoryński 1977 p. 840, Kikuchi and Tanaka 1994 p. 403).

References
Articles by Gödel
1931, Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I. Monatshefte für Mathematik und Physik 38: 173–98.
1931, Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I. and On formally undecidable propositions of Principia Mathematica and related systems I in Solomon Feferman, ed., 1986. Kurt Gödel Collected works, Vol. I. Oxford University Press: 144–195. The original German with a facing English translation, preceded by a very illuminating introductory note by Kleene.
Hirzel, Martin, 2000, On formally undecidable propositions of Principia Mathematica and related systems I. (http://www.research.ibm.com/people/h/hirzel/papers/canon00-goedel.pdf). A modern translation by Hirzel.
1951, Some basic theorems on the foundations of mathematics and their implications in Solomon Feferman, ed., 1995. Kurt Gödel Collected works, Vol. III. Oxford University Press: 304–23.

Translations, during his lifetime, of Gödel's paper into English


None of the following agree in all translated words and in typography. The typography is a serious matter, because Gödel expressly wished to emphasize "those metamathematical notions that had been defined in their usual sense before . . ." (van Heijenoort 1967:595). Three translations exist. Of the first John Dawson states that: "The Meltzer translation was seriously deficient and received a devastating review in the Journal of Symbolic Logic"; "Gödel also complained about Braithwaite's commentary" (Dawson 1997:216). "Fortunately, the Meltzer translation was soon supplanted by a better one prepared by Elliott Mendelson for Martin Davis's anthology The Undecidable . . . he found the translation "not quite so good" as he had expected . . . [but because of time constraints he] agreed to its publication" (ibid). (In a footnote Dawson states that "he would regret his compliance, for the published volume was marred throughout by sloppy typography and numerous misprints" (ibid)). Dawson states that "The translation that Gödel favored was that by Jean van Heijenoort" (ibid).

For the serious student another version exists as a set of lecture notes recorded by Stephen Kleene and J. B. Rosser "during lectures given by Gödel at the Institute for Advanced Study during the spring of 1934" (cf commentary by Davis 1965:39 and beginning on p. 41); this version is titled "On Undecidable Propositions of Formal Mathematical Systems". In their order of publication:

B. Meltzer (translation) and R. B. Braithwaite (Introduction), 1962. On Formally Undecidable Propositions of Principia Mathematica and Related Systems, Dover Publications, New York (Dover edition 1992), ISBN 0-486-66980-7 (pbk.) This contains a useful translation of Gödel's German abbreviations on pp. 33–34. As noted above, typography, translation and commentary are suspect. Unfortunately, this translation was reprinted with all its suspect content by
Stephen Hawking editor, 2005. God Created the Integers: The Mathematical Breakthroughs That Changed History, Running Press, Philadelphia, ISBN 0-7624-1922-9. Gödel's paper appears starting on p. 1097, with Hawking's commentary starting on p. 1089.
Martin Davis editor, 1965. The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable problems and Computable Functions, Raven Press, New York, no ISBN. Gödel's paper begins on page 5, preceded by one page of commentary.
Jean van Heijenoort editor, 1967, 3rd edition 1967. From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, Harvard University Press, Cambridge Mass., ISBN 0-674-32449-8 (pbk). van Heijenoort did the translation. He states that "Professor Gödel approved the translation, which in many places was accommodated to his wishes." (p. 595). Gödel's paper begins on p. 595; van Heijenoort's commentary begins on p. 592.
Martin Davis editor, 1965, ibid. "On Undecidable Propositions of Formal Mathematical Systems." A copy with Gödel's corrections of errata and Gödel's added notes begins on page 41, preceded by two pages of Davis's commentary. Until Davis included this in his volume this lecture existed only as mimeographed notes.


Articles by others
George Boolos, 1989, "A New Proof of the Gödel Incompleteness Theorem", Notices of the American Mathematical Society v. 36, pp. 388–390 and p. 676, reprinted in Boolos, 1998, Logic, Logic, and Logic, Harvard Univ. Press. ISBN 0-674-53766-1
Arthur Charlesworth, 1980, "A Proof of Godel's Theorem in Terms of Computer Programs," Mathematics Magazine, v. 54 n. 3, pp. 109–121. JStor (http://links.jstor.org/sici?sici=0025-570X(198105)54:3<109:APOGTI>2.0.CO;2-1&size=LARGE&origin=JSTOR-enlargePage)
Martin Davis, "The Incompleteness Theorem (http://www.ams.org/notices/200604/fea-davis.pdf)", in Notices of the AMS vol. 53 no. 4 (April 2006), p. 414.
Jean van Heijenoort, 1963. "Gödel's Theorem" in Edwards, Paul, ed., Encyclopedia of Philosophy, Vol. 3. Macmillan: 348–57.
Geoffrey Hellman, How to Gödel a Frege-Russell: Gödel's Incompleteness Theorems and Logicism. Noûs, Vol. 15, No. 4, Special Issue on Philosophy of Mathematics. (Nov., 1981), pp. 451–468.
David Hilbert, 1900, "Mathematical Problems. (http://aleph0.clarku.edu/~djoyce/hilbert/problems.html#prob2)" English translation of a lecture delivered before the International Congress of Mathematicians at Paris, containing Hilbert's statement of his Second Problem.
Kikuchi, Makoto; Tanaka, Kazuyuki (1994), "On formalization of model-theoretic proofs of Gödel's theorems", Notre Dame Journal of Formal Logic 35 (3): 403–412, doi:10.1305/ndjfl/1040511346, ISSN 0029-4527, MR1326122
Stephen Cole Kleene, 1943, "Recursive predicates and quantifiers," reprinted from Transactions of the American Mathematical Society, v. 53 n. 1, pp. 41–73 in Martin Davis 1965, The Undecidable (loc. cit.) pp. 255–287.
John Barkley Rosser, 1936, "Extensions of some theorems of Gödel and Church," reprinted from the Journal of Symbolic Logic vol. 1 (1936) pp. 87–91, in Martin Davis 1965, The Undecidable (loc. cit.) pp. 230–235.

John Barkley Rosser, 1939, "An Informal Exposition of proofs of Gödel's Theorem and Church's Theorem", reprinted from the Journal of Symbolic Logic, vol. 4 (1939) pp. 53–60, in Martin Davis 1965, The Undecidable (loc. cit.) pp. 223–230
C. Smoryński, "The incompleteness theorems", in J. Barwise, ed., Handbook of Mathematical Logic, North-Holland 1982 ISBN 978-0-444-86388-1, pp. 821–866.
Dan E. Willard (2001), "Self-Verifying Axiom Systems, the Incompleteness Theorem and Related Reflection Principles (http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.jsl/1183746459)", Journal of Symbolic Logic, v. 66 n. 2, pp. 536–596. doi:10.2307/2695030
Zach, Richard (2003), "The Practice of Finitism: Epsilon Calculus and Consistency Proofs in Hilbert's Program" (http://www.ucalgary.ca/~rzach/static/conprf.pdf), Synthese (Berlin, New York: Springer-Verlag) 137 (1): 211–259, doi:10.1023/A:1026247421383, ISSN 0039-7857
Richard Zach, 2005, "Paper on the incompleteness theorems" in Grattan-Guinness, I., ed., Landmark Writings in Western Mathematics. Elsevier: 917–25.


Books about the theorems


Francesco Berto. There's Something about Gödel: The Complete Guide to the Incompleteness Theorem. John Wiley and Sons. 2010.
Domeisen, Norbert, 1990. Logik der Antinomien. Bern: Peter Lang. 142 pp. ISBN 3-261-04214-1. Zentralblatt MATH (http://www.zentralblatt-math.org/zbmath/search/?q=an:0724.03003)
Torkel Franzén, 2005. Gödel's Theorem: An Incomplete Guide to its Use and Abuse. A.K. Peters. ISBN 1-56881-238-8 MR2007d:03001
Douglas Hofstadter, 1979. Gödel, Escher, Bach: An Eternal Golden Braid. Vintage Books. ISBN 0-465-02685-0. 1999 reprint: ISBN 0-465-02656-7. MR80j:03009
Douglas Hofstadter, 2007. I Am a Strange Loop. Basic Books. ISBN 978-0-465-03078-1. ISBN 0-465-03078-5. MR2008g:00004
Stanley Jaki, OSB, 2005. The drama of the quantities. Real View Books. (http://www.realviewbooks.com/)
Per Lindström, 1997, Aspects of Incompleteness (http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.lnl/1235416274), Lecture Notes in Logic v. 10.
J.R. Lucas, FBA, 1970. The Freedom of the Will. Clarendon Press, Oxford, 1970.
Ernest Nagel, James Roy Newman, Douglas Hofstadter, 2002 (1958). Gödel's Proof, revised ed. ISBN 0-8147-5816-9. MR2002i:03001
Rudy Rucker, 1995 (1982). Infinity and the Mind: The Science and Philosophy of the Infinite. Princeton Univ. Press. MR84d:03012
Smith, Peter, 2007. An Introduction to Gödel's Theorems. (http://www.godelbook.net/) Cambridge University Press. MathSciNet (http://www.ams.org/mathscinet/search/publdoc.html?arg3=&co4=AND&co5=AND&co6=AND&co7=AND&dr=all&pg4=AUCN&pg5=AUCN&pg6=PC&pg7=ALLF&pg8=ET&s4=Smith, Peter&s5=&s6=&s7=&s8=All&yearRangeFirst=&yearRangeSecond=&yrop=eq&r=2&mx-pid=2384958)
N. Shankar, 1994. Metamathematics, Machines and Gödel's Proof, Volume 38 of Cambridge tracts in theoretical computer science. ISBN 0-521-58533-3
Raymond Smullyan, 1991. Godel's Incompleteness Theorems. Oxford Univ. Press.
Raymond Smullyan, 1994. Diagonalization and Self-Reference. Oxford Univ. Press. MR96c:03001
Hao Wang, 1997. A Logical Journey: From Gödel to Philosophy. MIT Press. ISBN 0-262-23189-1 MR97m:01090


Miscellaneous references
Francesco Berto. "The Gödel Paradox and Wittgenstein's Reasons" Philosophia Mathematica (III) 17. 2009.
John W. Dawson, Jr., 1997. Logical Dilemmas: The Life and Work of Kurt Gödel, A.K. Peters, Wellesley Mass, ISBN 1-56881-256-6.
Goldstein, Rebecca, 2005, Incompleteness: the Proof and Paradox of Kurt Gödel, W. W. Norton & Company. ISBN 0-393-05169-2
Juliet Floyd and Hilary Putnam, 2000, "A Note on Wittgenstein's 'Notorious Paragraph' About the Gödel Theorem", Journal of Philosophy v. 97 n. 11, pp. 624–632.
Carl Hewitt, 2008, "Large-scale Organizational Computing requires Unstratified Reflection and Strong Paraconsistency", Coordination, Organizations, Institutions, and Norms in Agent Systems III, Springer-Verlag.
David Hilbert and Paul Bernays, Grundlagen der Mathematik, Springer-Verlag.
John Hopcroft and Jeffrey Ullman 1979, Introduction to Automata theory, Addison-Wesley, ISBN 0-201-02988-X
James P. Jones, Undecidable Diophantine Equations (http://www.ams.org/bull/1980-03-02/S0273-0979-1980-14832-6/S0273-0979-1980-14832-6.pdf), Bulletin of the American Mathematical Society v. 3 n. 2, 1980, pp. 859–862.
Stephen Cole Kleene, 1967, Mathematical Logic. Reprinted by Dover, 2002. ISBN 0-486-42533-9
Russell O'Connor, 2005, "Essential Incompleteness of Arithmetic Verified by Coq (http://arxiv.org/abs/cs/0505034)", Lecture Notes in Computer Science v. 3603, pp. 245–260.
Graham Priest, 2006, In Contradiction: A Study of the Transconsistent, Oxford University Press, ISBN 0-19-926329-9
Graham Priest, 2004, Wittgenstein's Remarks on Gödel's Theorem in Max Kölbel, ed., Wittgenstein's lasting significance, Psychology Press, pp. 207–227.
Graham Priest, 1984, "Logic of Paradox Revisited", Journal of Philosophical Logic, v. 13, n. 2, pp. 153–179
Hilary Putnam, 1960, Minds and Machines in Sidney Hook, ed., Dimensions of Mind: A Symposium. New York University Press. Reprinted in Anderson, A. R., ed., 1964. Minds and Machines. Prentice-Hall: 77.
Rautenberg, Wolfgang (2010), A Concise Introduction to Mathematical Logic (http://www.springerlink.com/content/978-1-4419-1220-6/) (3rd ed.), New York: Springer Science+Business Media, doi:10.1007/978-1-4419-1221-3, ISBN 978-1-4419-1220-6.
Victor Rodych, 2003, "Misunderstanding Gödel: New Arguments about Wittgenstein and New Remarks by Wittgenstein", Dialectica v. 57 n. 3, pp. 279–313. doi:10.1111/j.1746-8361.2003.tb00272.x
Stewart Shapiro, 2002, "Incompleteness and Inconsistency", Mind, v. 111, pp. 817–32. doi:10.1093/mind/111.444.817
Alan Sokal and Jean Bricmont, 1999, Fashionable Nonsense: Postmodern Intellectuals' Abuse of Science, Picador. ISBN 0-312-20407-8
Joseph R. Shoenfield (1967), Mathematical Logic. Reprinted by A.K. Peters for the Association of Symbolic Logic, 2001. ISBN 978-1-56881-135-2
Jeremy Stangroom and Ophelia Benson, Why Truth Matters, Continuum. ISBN 0-8264-9528-1
George Tourlakis, Lectures in Logic and Set Theory, Volume 1, Mathematical Logic, Cambridge University Press, 2003. ISBN 978-0-521-75373-9
Wigderson, Avi (2010), "The Gödel Phenomena in Mathematics: A Modern View" (http://www.math.ias.edu/~avi/BOOKS/Godel_Widgerson_Text.pdf), Kurt Gödel and the Foundations of Mathematics: Horizons of Truth, Cambridge University Press
Hao Wang, 1996, A Logical Journey: From Gödel to Philosophy, The MIT Press, Cambridge MA, ISBN 0-262-23189-1.
Richard Zach, 2006, "Hilbert's program then and now" (http://www.ucalgary.ca/~rzach/static/hptn.pdf), in Philosophy of Logic, Dale Jacquette (ed.), Handbook of the Philosophy of Science, v. 5., Elsevier, pp. 411–447.


External links
Godel's Incompleteness Theorems (http://www.bbc.co.uk/programmes/b00dshx3) on In Our Time at the BBC. (listen now (http://www.bbc.co.uk/iplayer/console/b00dshx3/In_Our_Time_Godel's_Incompleteness_Theorems))
Stanford Encyclopedia of Philosophy: "Kurt Gödel (http://plato.stanford.edu/entries/goedel/)" by Juliette Kennedy.
MacTutor biographies: Kurt Gödel. (http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Godel.html) Gerhard Gentzen. (http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Gentzen.html)
What is Mathematics: Gödel's Theorem and Around (http://podnieks.id.lv/gt.html) by Karlis Podnieks. An online free book.
World's shortest explanation of Gödel's theorem (http://blog.plover.com/math/Gdl-Smullyan.html) using a printing machine as an example.
October 2011 RadioLab episode (http://www.radiolab.org/2011/oct/04/break-cycle/) about/including Gödel's Incompleteness theorem
Hazewinkel, Michiel, ed. (2001), "Gödel incompleteness theorem" (http://www.encyclopediaofmath.org/index.php?title=p/g044530), Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4

Travelling salesman problem


The travelling salesman problem (TSP) is an NP-hard problem in combinatorial optimization studied in operations research and theoretical computer science. Given a list of cities and their pairwise distances, the task is to find the shortest possible route that visits each city exactly once and returns to the origin city. It is a special case of the travelling purchaser problem.

The problem was first formulated in 1930 and is one of the most intensively studied problems in optimization. It is used as a benchmark for many optimization methods. Even though the problem is computationally difficult,[1] a large number of heuristics and exact methods are known, so that some instances with tens of thousands of cities can be solved.

The TSP has several applications even in its purest formulation, such as planning, logistics, and the manufacture of microchips. Slightly modified, it appears as a sub-problem in many areas, such as DNA sequencing. In these applications, the concept city represents, for example, customers, soldering points, or DNA fragments, and the concept distance represents travelling times or cost, or a similarity measure between DNA fragments. In many applications, additional constraints such as limited resources or time windows make the problem considerably harder.

In the theory of computational complexity, the decision version of the TSP (where, given a length L, the task is to decide whether there is a tour shorter than L) belongs to the class of NP-complete problems. Thus, unless P = NP, the worst-case running time of any algorithm for the TSP grows faster than any polynomial in the number of cities; it is widely believed to grow exponentially.


History
The origins of the travelling salesman problem are unclear. A handbook for travelling salesmen from 1832 mentions the problem and includes example tours through Germany and Switzerland, but contains no mathematical treatment.[2]

The travelling salesman problem was defined in the 1800s by the Irish mathematician W. R. Hamilton and by the British mathematician Thomas Kirkman. Hamilton's Icosian Game was a recreational puzzle based on finding a Hamiltonian cycle.[3] The general form of the TSP appears to have been first studied by mathematicians during the 1930s in Vienna and at Harvard, notably by Karl Menger, who defines the problem, considers the obvious brute-force algorithm, and observes the non-optimality of the nearest neighbour heuristic:

We denote by messenger problem (since in practice this question should be solved by each postman, anyway also by many travelers) the task to find, for finitely many points whose pairwise distances are known, the shortest route connecting the points. Of course, this problem is solvable by finitely many trials. Rules which would push the number of trials below the number of permutations of the given points, are not known. The rule that one first should go from the starting point to the closest point, then to the point closest to this, etc., in general does not yield the shortest route.[4]

Hassler Whitney at Princeton University introduced the name travelling salesman problem soon after.[5]

In the 1950s and 1960s, the problem became increasingly popular in scientific circles in Europe and the USA. Notable contributions were made by George Dantzig, Delbert Ray Fulkerson and Selmer M. Johnson at the RAND Corporation in Santa Monica, who expressed the problem as an integer linear program and developed the cutting plane method for its solution. With these new methods they solved an instance with 49 cities to optimality by constructing a tour and proving that no other tour could be shorter.

In the following decades, the problem was studied by many researchers from mathematics, computer science, chemistry, physics, and other sciences. Richard M. Karp showed in 1972 that the Hamiltonian cycle problem was NP-complete, which implies the NP-hardness of TSP. This supplied a mathematical explanation for the apparent computational difficulty of finding optimal tours.

Great progress was made in the late 1970s and 1980s, when Grötschel, Padberg, Rinaldi and others managed to exactly solve instances with up to 2392 cities, using cutting planes and branch-and-bound. In the 1990s, Applegate, Bixby, Chvátal, and Cook developed the program Concorde that has been used in many recent record solutions. Gerhard Reinelt published the TSPLIB in 1991, a collection of benchmark instances of varying difficulty, which has been used by many research groups for comparing results. In 2005, Cook and others computed an optimal tour through a 33,810-city instance given by a microchip layout problem, currently the largest solved TSPLIB instance. For many other instances with millions of cities, solutions can be found that are guaranteed to be within 1% of an optimal tour.


Description
As a graph problem
TSP can be modelled as an undirected weighted graph, such that cities are the graph's vertices, paths are the graph's edges, and a path's distance is the edge's length. It is a minimization problem starting and finishing at a specified vertex after having visited each other vertex exactly once. Often, the model is a complete graph (i.e. each pair of vertices is connected by an edge). If no path exists between two cities, adding an arbitrarily long edge will complete the graph without affecting the optimal tour.
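The completion step can be illustrated with a small sketch. The function names, the edge-list input format and the choice of "sum of all real edge lengths plus one" as the arbitrarily long length are assumptions made for this illustration only. The brute-force search is there just to confirm that the filler edge stays out of the optimal tour on a tiny instance.

```python
from itertools import permutations

def complete_distance_matrix(n, edges, big=None):
    """Return an n x n symmetric matrix; missing edges get a very long length."""
    if big is None:
        # Assumption: a length above the sum of all real edges can never be
        # part of an optimal tour when a tour of real edges exists.
        big = sum(w for _, _, w in edges) + 1
    dist = [[0 if i == j else big for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = dist[v][u] = w
    return dist

def tour_length(dist, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def brute_force_tsp(dist):
    """Try every tour that starts at city 0 (only sensible for tiny n)."""
    n = len(dist)
    best = min(((0,) + p for p in permutations(range(1, n))),
               key=lambda t: tour_length(dist, t))
    return list(best), tour_length(dist, best)

# A 4-cycle 0-1-2-3-0 plus the diagonal 0-2; the road 1-3 does not exist.
edges = [(0, 1, 2), (1, 2, 3), (2, 3, 2), (3, 0, 3), (0, 2, 4)]
dist = complete_distance_matrix(4, edges)
tour, length = brute_force_tsp(dist)
print(tour, length)
```

On this instance the optimal tour uses only real edges, so the filler edge between cities 1 and 3 does not affect the result.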

Asymmetric and symmetric

Symmetric TSP with four cities

In the symmetric TSP, the distance between two cities is the same in each opposite direction, forming an undirected graph. This symmetry halves the number of possible solutions. In the asymmetric TSP, paths may not exist in both directions or the distances might be different, forming a directed graph. Traffic collisions, one-way streets, and airfares for cities with different departure and arrival fees are examples of how this symmetry could break down.
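The halving can be checked by direct enumeration. In the sketch below (the function name is hypothetical), the start city is fixed at 0, which leaves (n − 1)! orderings in the asymmetric case; in the symmetric case a tour and its reversal have the same length, so only one of each pair is counted.

```python
from itertools import permutations
from math import factorial

def count_tours(n, symmetric):
    """Count distinct tours on n cities, fixing city 0 as the start."""
    seen = set()
    for perm in permutations(range(1, n)):
        if symmetric:
            # A tour and its reversal are the same undirected cycle:
            # keep one canonical representative of the pair.
            perm = min(perm, perm[::-1])
        seen.add(perm)
    return len(seen)

for n in (4, 5, 6):
    print(n, count_tours(n, symmetric=False), count_tours(n, symmetric=True))
```

For n cities (n ≥ 3) the counts are (n − 1)! and (n − 1)!/2 respectively.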

Related problems
An equivalent formulation in terms of graph theory is: Given a complete weighted graph (where the vertices would represent the cities, the edges would represent the roads, and the weights would be the cost or distance of that road), find a Hamiltonian cycle with the least weight.

The requirement of returning to the starting city does not change the computational complexity of the problem, see Hamiltonian path problem.

Another related problem is the bottleneck traveling salesman problem (bottleneck TSP): Find a Hamiltonian cycle in a weighted graph with the minimal weight of the weightiest edge. The problem is of considerable practical importance, apart from evident transportation and logistics areas. A classic example is in printed circuit manufacturing: scheduling of a route of the drill machine to drill holes in a PCB. In robotic machining or drilling applications, the "cities" are parts to machine or holes (of different sizes) to drill, and the "cost of travel" includes time for retooling the robot (single machine job sequencing problem).

The generalized traveling salesman problem deals with "states" that have (one or more) "cities" and the salesman has to visit exactly one "city" from each "state". It is also known as the "traveling politician problem". One application is encountered in ordering a solution to the cutting stock problem in order to minimise knife changes. Another is concerned with drilling in semiconductor manufacturing, see e.g. U.S. Patent 7,054,798 [6]. Surprisingly, Behzad and Modarres[7] demonstrated that the generalised traveling salesman problem can be transformed into a standard traveling salesman problem with the same number of cities, but a modified distance matrix.

The sequential ordering problem deals with the problem of visiting a set of cities where precedence relations between the cities exist.

The traveling purchaser problem deals with a purchaser who is charged with purchasing a set of products. He can purchase these products in several cities, but at different prices and not all cities offer the same products. The objective is to find a route between a subset of the cities, which minimizes total cost (travel cost + purchasing cost) and which enables the purchase of all required products.


ILP Formulation
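TSP can be formulated as an integer linear program.[8] Label the cities $0, \dots, n$.

\[
\begin{aligned}
\min\ & \sum_{i=0}^{n} \sum_{j=0,\, j \neq i}^{n} c_{ij} x_{ij} \\
\text{subject to}\ & x_{ij} \in \{0, 1\}, && i, j = 0, \dots, n \\
& u_i \in \mathbb{Z}, && i = 0, \dots, n \\
& \sum_{i=0,\, i \neq j}^{n} x_{ij} = 1, && j = 0, \dots, n \\
& \sum_{j=0,\, j \neq i}^{n} x_{ij} = 1, && i = 0, \dots, n \\
& u_i - u_j + n x_{ij} \leq n - 1, && 1 \leq i \neq j \leq n
\end{aligned}
\]

In the above formulation, $c_{ij}$ corresponds to the distance between cities $i$ and $j$, and $x_{ij}$ indicates whether the path from city $i$ to city $j$ is included in the tour. The last constraint enforces that there is only a single tour. To prove this, it must be shown that (1) every feasible solution is a tour and (2) that for every feasible tour, there are values for the $u_i$ that satisfy the constraints.

To prove that every feasible solution is a tour, it suffices to show that every subtour in a feasible solution passes through city 0, because this implies that there can only be a single tour which includes all the cities. To prove this, assume a sequence of $k$ cities forms a subtour that does not include city 0. Summing the corresponding constraints for these cities,

\[ u_i - u_j + n x_{ij} \leq n - 1, \]

over the $k$ arcs of the subtour (each with $x_{ij} = 1$), the $u$ terms cancel, which gives

\[ nk \leq (n - 1)k, \]

which is a contradiction. Thus, every subtour in a feasible solution passes through city 0 and so every feasible solution is a tour.

It now must be shown that for every feasible tour there are values for the $u_i$ that satisfy the constraints. Let the sequence of cities $0, i_1, \dots, i_n, 0$ be a tour and let $u_{i_t} = t$ for $t = 1, \dots, n$. Then if $x_{ij} = 0$, the following must be satisfied:

\[ u_i - u_j \leq n - 1, \]

which holds because $1 \leq u_i, u_j \leq n$ for every city other than city 0, and the constraint is not imposed on city 0. If $x_{ij} = 1$, which happens only if $i$ and $j$ are consecutive cities on the tour (and $j \neq 0$), the following must be satisfied:

\[ u_i - u_j + n \leq n - 1. \]

This holds because $u_j = u_i + 1$.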
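The role of the auxiliary variables can be checked numerically on a tiny instance. The sketch below assumes the constraints are in the Miller–Tucker–Zemlin form, u_i − u_j + n·x_ij ≤ n − 1 for 1 ≤ i ≠ j ≤ n, together with one incoming and one outgoing arc per city; the function names and the brute-force search over u are illustrative choices, not part of the cited formulation.

```python
from itertools import product

def mtz_feasible(n, x, u):
    """Check the degree and subtour constraints for cities 0..n.

    x[i][j] is 1 if the tour goes from i to j, else 0; u[i] is the auxiliary
    integer of city i (u[0] is not used by the subtour constraints).
    """
    cities = range(n + 1)
    for k in cities:
        if sum(x[i][k] for i in cities if i != k) != 1:  # one arc into k
            return False
        if sum(x[k][j] for j in cities if j != k) != 1:  # one arc out of k
            return False
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i != j and u[i] - u[j] + n * x[i][j] > n - 1:
                return False
    return True

def arcs_to_matrix(n, arcs):
    """Build the 0/1 matrix x from a list of directed arcs (i, j)."""
    x = [[0] * (n + 1) for _ in range(n + 1)]
    for i, j in arcs:
        x[i][j] = 1
    return x

def exists_u(n, x):
    """Search all u in {1..n}^n for a feasible assignment (tiny n only)."""
    return any(mtz_feasible(n, x, (0,) + u)
               for u in product(range(1, n + 1), repeat=n))

n = 4
single_tour = arcs_to_matrix(n, [(0, 2), (2, 4), (4, 1), (1, 3), (3, 0)])
two_subtours = arcs_to_matrix(n, [(0, 1), (1, 0), (2, 3), (3, 4), (4, 2)])
print(exists_u(n, single_tour), exists_u(n, two_subtours))
```

For the single tour, setting each u_i to the position of city i on the tour satisfies every constraint; for the pair of subtours no assignment works, because the cycle 2 → 3 → 4 → 2 avoids city 0.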


Computing a solution
The traditional lines of attack for the NP-hard problems are the following:
Devising algorithms for finding exact solutions (they will work reasonably fast only for small problem sizes).
Devising "suboptimal" or heuristic algorithms, i.e., algorithms that deliver either seemingly or probably good solutions, but which could not be proved to be optimal.
Finding special cases for the problem ("subproblems") for which either better heuristics or exact algorithms are possible.

Computational complexity
The problem has been shown to be NP-hard (more precisely, it is complete for the complexity class $\mathrm{FP}^{\mathrm{NP}}$; see function problem), and the decision problem version ("given the costs and a number x, decide whether there is a round-trip route cheaper than x") is NP-complete. The bottleneck travelling salesman problem is also NP-hard. The problem remains NP-hard even for the case when the cities are in the plane with Euclidean distances, as well as in a number of other restrictive cases. Removing the condition of visiting each city "only once" does not remove the NP-hardness, since it is easily seen that in the planar case there is an optimal tour that visits each city only once (otherwise, by the triangle inequality, a shortcut that skips a repeated visit would not increase the tour length).

Complexity of approximation
In the general case, finding a shortest travelling salesman tour is NPO-complete.[9] If the distance measure is a metric and symmetric, the problem becomes APX-complete[10] and Christofides's algorithm approximates it within 1.5.[11] If the distances are restricted to 1 and 2 (but still are a metric) the approximation ratio becomes 7/6. In the asymmetric, metric case, only logarithmic performance guarantees are known; the best current algorithm achieves performance ratio 0.814 log n;[12] it is an open question if a constant factor approximation exists. The corresponding maximization problem of finding the longest travelling salesman tour is approximable within 63/38.[13] If the distance function is symmetric, the longest tour can be approximated within 4/3 by a deterministic algorithm[14] and within $(33 + \varepsilon)/25$ by a randomised algorithm.[15]

Exact algorithms
The most direct solution would be to try all permutations (ordered combinations) and see which one is cheapest (using brute force search). The running time for this approach lies within a polynomial factor of $O(n!)$, the factorial of the number of cities, so this solution becomes impractical even for only 20 cities. One of the earliest applications of dynamic programming is the Held–Karp algorithm that solves the problem in time $O(n^2 2^n)$.[16] The dynamic programming solution requires exponential space. Using inclusion–exclusion, the problem can be solved in time within a polynomial factor of $2^n$ and polynomial space.[17] Improving these time bounds seems to be difficult. For example, it has not been determined whether an exact algorithm for TSP that runs in time $O(1.9999^n)$ exists.[18]

Other approaches include:
Various branch-and-bound algorithms, which can be used to process TSPs containing 40–60 cities.
Progressive improvement algorithms which use techniques reminiscent of linear programming. These work well for up to 200 cities.
Implementations of branch-and-bound and problem-specific cut generation (branch-and-cut); this is the method of choice for solving large instances. This approach holds the current record, solving an instance with 85,900 cities, see Applegate et al. (2006).

An exact solution for 15,112 German towns from TSPLIB was found in 2001 using the cutting-plane method proposed by George Dantzig, Ray Fulkerson, and Selmer M. Johnson in 1954, based on linear programming. The computations were performed on a network of 110 processors located at Rice University and Princeton University (see the Princeton external link). The total computation time was equivalent to 22.6 years on a single 500 MHz Alpha processor. In May 2004, the travelling salesman problem of visiting all 24,978 towns in Sweden was solved: a tour of length approximately 72,500 kilometers was found and it was proven that no shorter tour exists.[19] In March 2005, the travelling salesman problem of visiting all 33,810 points in a circuit board was solved using Concorde TSP Solver: a tour of length 66,048,945 units was found and it was proven that no shorter tour exists. The computation took approximately 15.7 CPU-years (Cook et al. 2006). In April 2006 an instance with 85,900 points was solved using Concorde TSP Solver, taking over 136 CPU-years, see Applegate et al. (2006).
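A minimal sketch of the dynamic programming approach, set against brute-force enumeration, may make the difference concrete. It assumes a full distance matrix `dist` (a name chosen here) and returns only the optimal length; the table stores, for each subset of cities and each end city, the shortest path from city 0 through that subset. The table has on the order of n·2^n entries, which is the exponential space mentioned above.

```python
from itertools import combinations, permutations

def held_karp(dist):
    """Length of a shortest tour via dynamic programming over subsets."""
    n = len(dist)
    if n == 1:
        return 0
    # best[(mask, j)]: shortest path that starts at city 0, visits exactly
    # the cities whose bits are set in mask, and ends at city j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask ^ (1 << j)  # same subset without the end city j
                best[(mask, j)] = min(best[(prev, k)] + dist[k][j]
                                      for k in subset if k != j)
    full = sum(1 << j for j in range(1, n))
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))

def brute_force(dist):
    """Try all (n - 1)! orderings of the cities after city 0."""
    n = len(dist)
    return min(sum(dist[a][b] for a, b in zip((0,) + p, p + (0,)))
               for p in permutations(range(1, n)))

# A small asymmetric instance; the values are made up for illustration.
dist = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
print(held_karp(dist), brute_force(dist))
```

Both functions agree on the optimum; the dynamic program trades memory for time, which pays off well before 20 cities.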


Heuristic and approximation algorithms


Various heuristics and approximation algorithms, which quickly yield good solutions, have been devised. Modern methods can find solutions for extremely large problems (millions of cities) within a reasonable time which are with a high probability just 2–3% away from the optimal solution. Several categories of heuristics are recognized.

Constructive heuristics
The nearest neighbour (NN) algorithm (or so-called greedy algorithm) lets the salesman choose the nearest unvisited city as his next move. This algorithm quickly yields an effectively short route. For N cities randomly distributed on a plane, the algorithm on average yields a path 25% longer than the shortest possible path.[20] However, there exist many specially arranged city distributions which make the NN algorithm give the worst route (Gutin, Yeo, and Zverovich, 2002). This is true for both asymmetric and symmetric TSPs (Gutin and Yeo, 2007). Rosenkrantz et al. [1977] showed that the NN algorithm has the approximation factor $\Theta(\log |V|)$ for instances satisfying the triangle inequality. A variation of the NN algorithm, called the Nearest Fragment (NF) operator, which connects a group (fragment) of nearest unvisited cities, can find a shorter route with successive iterations.[21] The NF operator can also be applied on an initial solution obtained by the NN algorithm for further improvement in an elitist model, where only better solutions are accepted.

Constructions based on a minimum spanning tree have an approximation ratio of 2. The Christofides algorithm achieves a ratio of 1.5.

The bitonic tour of a set of points is the minimum-perimeter monotone polygon that has the points as its vertices; it can be computed efficiently by dynamic programming.

Another constructive heuristic, Match Twice and Stitch (MTS) (Kahng, Reda 2004 [22]), performs two sequential matchings, where the second matching is executed after deleting all the edges of the first matching, to yield a set of cycles. The cycles are then stitched to produce the final tour.
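The nearest neighbour rule is short enough to sketch directly. The version below assumes cities given as points in the plane with Euclidean distances and breaks ties by city index; both are choices made for this example. The four collinear points are picked so that the greedy tour is visibly longer than the optimal one.

```python
import math

def nearest_neighbour_tour(points, start=0):
    """Greedy tour: always move to the closest city not yet visited."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        here = points[tour[-1]]
        # Ties are broken by the smaller city index (an arbitrary choice).
        nxt = min(unvisited, key=lambda j: (math.dist(here, points[j]), j))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def tour_length(points, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

# Four cities on a line: the best tour sweeps out and back (length 11),
# but the greedy rule zig-zags around the start.
points = [(0, 0), (1, 0), (-1.5, 0), (4, 0)]
tour = nearest_neighbour_tour(points)
print(tour, tour_length(points, tour))
```

Here the greedy tour has length 13, against an optimum of 11 for a tour that visits the cities in order along the line.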
Iterative improvement

Pairwise exchange
The pairwise exchange or 2-opt technique involves iteratively removing two edges and replacing these with two different edges that reconnect the fragments created by edge removal into a new and shorter tour. This is a special case of the k-opt method. Note that the label Lin–Kernighan is an often heard misnomer for 2-opt. Lin–Kernighan is actually the more general k-opt method.

k-opt heuristic, or Lin–Kernighan heuristics
Take a given tour and delete k mutually disjoint edges. Reassemble the remaining fragments into a tour, leaving no disjoint subtours (that is, don't connect a fragment's endpoints together). This in effect simplifies the TSP under consideration into a much simpler problem. Each fragment endpoint can be connected to 2k − 2 other possibilities: of 2k total fragment endpoints available, the two endpoints of the fragment under consideration are disallowed. Such a constrained 2k-city TSP can then be solved with brute force methods to find the least-cost recombination of the original fragments.

The k-opt technique is a special case of the V-opt or variable-opt technique. The most popular of the k-opt methods is 3-opt, introduced by Shen Lin of Bell Labs in 1965. There is a special case of 3-opt where the edges are not disjoint (two of the edges are adjacent to one another). In practice, it is often possible to achieve substantial improvement over 2-opt without the combinatorial cost of the general 3-opt by restricting the 3-changes to this special subset where two of the removed edges are adjacent. This so-called two-and-a-half-opt typically falls roughly midway between 2-opt and 3-opt, both in terms of the quality of tours achieved and the time required to achieve those tours.

V-opt heuristic
The variable-opt method is related to, and a generalization of, the k-opt method. Whereas the k-opt methods remove a fixed number (k) of edges from the original tour, the variable-opt methods do not fix the size of the edge set to remove. Instead they grow the set as the search process continues. The best known method in this family is the Lin–Kernighan method (mentioned above as a misnomer for 2-opt). Shen Lin and Brian Kernighan first published their method in 1972, and it was the most reliable heuristic for solving travelling salesman problems for nearly two decades. More advanced variable-opt methods were developed at Bell Labs in the late 1980s by David Johnson and his research team. These methods (sometimes called Lin–Kernighan–Johnson) build on the Lin–Kernighan method, adding ideas from tabu search and evolutionary computing. The basic Lin–Kernighan technique gives results that are guaranteed to be at least 3-opt.
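The pairwise exchange (2-opt) move described earlier in this section can be sketched as follows, assuming planar points with Euclidean distances and a first-improvement scan; these are illustrative choices rather than a reference implementation. Removing the edges (a, b) and (c, d) and adding (a, c) and (b, d) amounts to reversing the stretch of the tour between b and c.

```python
import math

def tour_length(points, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(points, tour):
    """Apply improving 2-opt moves until none is left (a local optimum)."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share the city tour[0]
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a,b),(c,d) become (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

# Corners of a unit square visited in an order whose edges cross.
points = [(0, 0), (1, 1), (1, 0), (0, 1)]
start = [0, 1, 2, 3]
better = two_opt(points, start)
print(tour_length(points, start), better, tour_length(points, better))
```

On the square, a single exchange removes the crossing and leaves the perimeter tour of length 4.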
The Lin-Kernighan-Johnson methods compute a Lin-Kernighan tour, and then perturb the tour by what has been described as a mutation that removes at least four edges and reconnects the tour in a different way, then v-opt the new tour. The mutation is often enough to move the tour from the local minimum identified by Lin-Kernighan. V-opt methods are widely considered the most powerful heuristics for the problem, and are able to address special cases, such as the Hamilton Cycle Problem and other non-metric TSPs that other heuristics fail on. For many years Lin-Kernighan-Johnson had identified optimal solutions for all TSPs where an optimal solution was known and had identified the best known solutions for all other TSPs on which the method had been tried.

Randomised improvement
Optimized Markov chain algorithms which use local searching heuristic sub-algorithms can find a route extremely close to the optimal route for 700 to 800 cities. TSP is a touchstone for many general heuristics devised for combinatorial optimization such as genetic algorithms, simulated annealing, tabu search, ant colony optimization, river formation dynamics (see swarm intelligence) and the cross entropy method.

Ant colony optimization
Artificial intelligence researcher Marco Dorigo described in 1997 a method of heuristically generating "good solutions" to the TSP using a simulation of an ant colony called ACS (Ant Colony System).[23] It models behavior observed in real ants to find short paths between food sources and their nest, an emergent behaviour resulting from each ant's preference to follow trail pheromones deposited by other ants. ACS sends out a large number of virtual ant agents to explore many possible routes on the map. Each ant probabilistically chooses the next city to visit based on a heuristic combining the distance to the city and the amount of virtual pheromone deposited on the edge to the city.
The ants explore, depositing pheromone on each edge that they cross, until they have all completed a tour. At this point the ant which completed the shortest tour deposits virtual pheromone along its complete tour route (global trail updating). The amount of pheromone deposited is inversely proportional to the tour length: the shorter the tour, the more it deposits.
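The two ingredients described above, the probabilistic choice of the next city and the global trail update, can be sketched as follows. This is a simplified illustration, not Dorigo's exact ACS rule: the weighting exponent `beta`, the `evaporation` rate and both function names are assumptions introduced for this example.

```python
import random

def choose_next_city(current, unvisited, dist, pheromone, beta=2.0, rng=random):
    """Pick the next city with probability proportional to
    pheromone * (1 / distance) ** beta (assumed weighting)."""
    cities = sorted(unvisited)
    weights = [pheromone[current][c] * (1.0 / dist[current][c]) ** beta
               for c in cities]
    return rng.choices(cities, weights=weights, k=1)[0]

def global_trail_update(pheromone, best_tour, best_length, evaporation=0.1):
    """Let the ant with the shortest tour deposit pheromone along its route.
    The deposit is inversely proportional to the tour length."""
    deposit = 1.0 / best_length
    n = len(best_tour)
    for k in range(n):
        i, j = best_tour[k], best_tour[(k + 1) % n]
        updated = (1.0 - evaporation) * pheromone[i][j] + evaporation * deposit
        pheromone[i][j] = pheromone[j][i] = updated  # symmetric instance assumed
```

A full simulation would repeat these steps over many iterations, with every ant building a complete tour before the best one is rewarded.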


Ant Colony Optimization Algorithm for a TSP with 7 cities: Red and thick lines in the pheromone map indicate presence of more pheromone

Special cases
Metric TSP
In the metric TSP, also known as delta-TSP or Δ-TSP, the intercity distances satisfy the triangle inequality. This is a very natural restriction of the TSP: the distances between cities form a metric, that is, the direct connection from A to B is never longer than the route via an intermediate city C:

$d_{AB} \le d_{AC} + d_{CB}$

The edge lengths then form a metric on the set of vertices. When the cities are viewed as points in the plane, many natural distance functions are metrics, and so many natural instances of TSP satisfy this constraint. The following are some examples of metric TSPs for various metrics.
- In the Euclidean TSP (see below) the distance between two cities is the Euclidean distance between the corresponding points.
- In the rectilinear TSP the distance between two cities is the sum of the absolute differences of their x- and y-coordinates. This metric is often called the Manhattan distance or city-block metric.
- In the maximum metric, the distance between two points is the maximum of the absolute values of the differences of their x- and y-coordinates.

The last two metrics appear, for example, in routing a machine that drills a given set of holes in a printed circuit board. The Manhattan metric corresponds to a machine that adjusts first one co-ordinate, and then the other, so the time to move to a new point is the sum of both movements. The maximum metric corresponds to a machine that adjusts both co-ordinates simultaneously, so the time to move to a new point is the slower of the two movements.
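The three distance functions just listed are easy to state in code. The sketch below is illustrative only; the function names are chosen for this example, and points are assumed to be (x, y) pairs.

```python
import math

def euclidean(p, q):
    """Straight-line distance between points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    """Rectilinear (city-block) distance: one co-ordinate is adjusted after the other."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def maximum_metric(p, q):
    """Maximum metric: both co-ordinates are adjusted simultaneously."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def satisfies_triangle_inequality(points, metric):
    """Brute-force check that d(a, b) <= d(a, c) + d(c, b) for all triples."""
    return all(metric(a, b) <= metric(a, c) + metric(c, b) + 1e-12
               for a in points for b in points for c in points)
```

For the points (0, 0) and (3, 4) these give 5.0, 7 and 4 respectively, which matches the drilling-machine interpretation: 7 units of travel when the axes move one after the other, 4 when they move together.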

In its definition, the TSP does not allow cities to be visited twice, but many applications do not need this constraint. In such cases, a symmetric, non-metric instance can be reduced to a metric one. This replaces the original graph with a complete graph in which the inter-city distance $d_{AB}$ is replaced by the length of the shortest path between A and B in the original graph.

The length of the minimum spanning tree of the network G is a natural lower bound for the length of the optimal route, because deleting any edge of the optimal route yields a Hamiltonian path, which is a spanning tree in G. In the TSP with triangle inequality case it is possible to prove upper bounds in terms of the minimum spanning tree and design an algorithm that has a provable upper bound on the length of the route. The first published (and the simplest) example follows:
1. Construct a minimum spanning tree T for G.
2. Duplicate all edges of T. That is, wherever there is an edge from u to v, add a second edge from v to u. This gives us an Eulerian graph H.
3. Find an Eulerian circuit C in H. Clearly, its length is twice the length of the tree.
4. Convert the Eulerian circuit C of H into a Hamiltonian cycle of G in the following way: walk along C, and each time you are about to come into an already visited vertex, skip it and try to go to the next one (along C).
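The four steps above can be sketched compactly. The version below is a minimal illustration that assumes a complete, symmetric distance matrix obeying the triangle inequality; it uses Prim's algorithm for step 1 (the text does not prescribe one) and relies on the fact that a depth-first preorder walk of the tree visits vertices in the same order as the shortcut Eulerian circuit of the doubled tree. The function names are made up for this example.

```python
import math

def tour_length(tour, dist):
    """Total length of the closed tour."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def mst_doubling_tour(dist):
    """Tree-doubling heuristic: at most twice the optimum on metric instances."""
    n = len(dist)
    # Step 1: minimum spanning tree rooted at vertex 0 (Prim's algorithm).
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [None] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    # Steps 2-4: walking the doubled tree and skipping visited vertices
    # yields exactly the preorder of the tree.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

# Usage: four corners of the unit square.
points = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in points] for p in points]
tour = mst_doubling_tour(dist)
```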

It is easy to prove that the last step works. Moreover, thanks to the triangle inequality, each skipping at Step 4 is in fact a shortcut; i.e., the length of the cycle does not increase. Hence it gives us a TSP tour no more than twice as long as the optimal one.

The Christofides algorithm follows a similar outline but combines the minimum spanning tree with a solution of another problem, minimum-weight perfect matching. This gives a TSP tour which is at most 1.5 times the optimal. The Christofides algorithm was one of the first approximation algorithms, and was in part responsible for drawing attention to approximation algorithms as a practical approach to intractable problems. As a matter of fact, the term "algorithm" was not commonly extended to approximation algorithms until later; the Christofides algorithm was initially referred to as the Christofides heuristic.

In the special case that distances between cities are all either one or two (and thus the triangle inequality is necessarily satisfied), there is a polynomial-time approximation algorithm that finds a tour of length at most 8/7 times the optimal tour length.[24] However, it is a long-standing (since 1975) open problem to improve the Christofides approximation factor of 1.5 for general metric TSP to a smaller constant. It is known that, unless P = NP, no polynomial-time algorithm can guarantee a tour shorter than 220/219 ≈ 1.00456 times the optimal tour's length.[25] In the case of bounded metrics it has been shown that, unless P = NP, no polynomial-time algorithm can guarantee a tour shorter than 321/320 = 1.003125 times the optimal tour's length.[26]

Euclidean TSP
The Euclidean TSP, or planar TSP, is the TSP with the distance being the ordinary Euclidean distance. The Euclidean TSP is a particular case of the metric TSP, since distances in a plane obey the triangle inequality.

Like the general TSP, the Euclidean TSP (and therefore the general metric TSP) is NP-hard.[27] However, in some respects it seems to be easier than the general metric TSP. For example, the minimum spanning tree of the graph associated with an instance of the Euclidean TSP is a Euclidean minimum spanning tree, and so can be computed in expected O(n log n) time for n points (considerably less than the number of edges). This enables the simple 2-approximation algorithm for TSP with triangle inequality above to operate more quickly.

In general, for any c > 0, where d is the number of dimensions in the Euclidean space, there is a polynomial-time algorithm that finds a tour of length at most (1 + 1/c) times the optimal for geometric instances of TSP in $O\left(n (\log n)^{(O(c\sqrt{d}))^{d-1}}\right)$ time; this is called a polynomial-time approximation scheme (PTAS).[28] Sanjeev Arora and Joseph S. B. Mitchell were awarded the Gödel Prize in 2010 for their concurrent discovery of a PTAS for the Euclidean TSP.

In practice, heuristics with weaker guarantees continue to be used.

Asymmetric TSP
In most cases, the distance between two nodes in the TSP network is the same in both directions. The case where the distance from A to B is not equal to the distance from B to A is called asymmetric TSP. A practical application of an asymmetric TSP is route optimisation using street-level routing (which is made asymmetric by one-way streets, slip-roads, motorways, etc.).

Solving by conversion to symmetric TSP
Solving an asymmetric TSP graph can be somewhat complex. The following is a 3×3 matrix containing all possible path weights between the nodes A, B and C. One option is to turn an asymmetric matrix of size N into a symmetric matrix of size 2N.[29]

Asymmetric path weights
      A     B     C
A           1     2
B     6           3
C     5     4

To double the size, each of the nodes in the graph is duplicated, creating a second ghost node. Using duplicate points with very low weights, such as −∞, provides a cheap route "linking" back to the real node and allowing symmetric evaluation to continue. The original 3×3 matrix shown above is visible in the bottom left and the transpose of the original in the top right. Both copies of the matrix have had their diagonals replaced by the low-cost hop paths, represented by −∞.

Symmetric path weights
      A     B     C     A′    B′    C′
A                       −∞    6     5
B                       1     −∞    4
C                       2     3     −∞
A′    −∞    1     2
B′    6     −∞    3
C′    5     4     −∞

The original 3×3 matrix would produce two Hamiltonian cycles (a path that visits every node once), namely A-B-C-A [score 9] and A-C-B-A [score 12]. Evaluating the 6×6 symmetric version of the same problem now produces many paths, such as A-A′-B-B′-C-C′-A, which scores 9 once the ghost-link weights are set aside. The important thing about each new sequence is that there will be an alternation between dashed (A′, B′, C′) and un-dashed nodes (A, B, C) and that the link to "jump" between any related pair (A-A′) is effectively free. A version of the algorithm could use any weight for the A-A′ path, as long as that weight is lower than all other path weights present in the graph. As the path weight to "jump" must effectively be "free", the value zero (0) could be used to represent this cost, if zero is not being used for another purpose already (such as designating invalid paths). In the two examples above, non-existent paths between nodes are shown as a blank square.
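The doubling construction can be checked on the three-node example with a short brute-force sketch. Everything here is illustrative: the function names are invented, blank squares are encoded as infinity, and a finite weight of -100 stands in for the very low ghost-link weight (an assumption made so that the arithmetic stays finite).

```python
from itertools import permutations

INF = float("inf")  # encodes a blank square (non-existent path)

def symmetrize(asym, ghost_weight):
    """Turn an N x N asymmetric matrix into a 2N x 2N symmetric one.
    Node i keeps index i; its ghost node i' gets index N + i."""
    n = len(asym)
    sym = [[INF] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            w = ghost_weight if i == j else asym[i][j]
            sym[n + i][j] = w  # bottom-left block: the original matrix
            sym[j][n + i] = w  # top-right block: its transpose
    return sym

def best_tour_cost(matrix):
    """Cost of the cheapest Hamiltonian cycle, by exhaustive search."""
    n = len(matrix)
    best = INF
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(matrix[tour[k]][tour[(k + 1) % n]] for k in range(n))
        best = min(best, cost)
    return best

asym = [[INF, 1, 2],
        [6, INF, 3],
        [5, 4, INF]]
ghost = -100
sym = symmetrize(asym, ghost)
# Removing the three ghost links from the symmetric optimum recovers
# the asymmetric optimum.
recovered = best_tour_cost(sym) - 3 * ghost
```

Exhaustive search is only viable for tiny instances; the point of the sketch is that any symmetric TSP solver could be applied to `sym` instead.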


Benchmarks
For benchmarking of TSP algorithms, TSPLIB [30] is a maintained library of sample instances of the TSP and related problems; see the TSPLIB external reference. Many of the instances are lists of actual cities and layouts of actual printed circuits.

Human performance on TSP


The TSP, in particular the Euclidean variant of the problem, has attracted the attention of researchers in cognitive psychology. It is observed that humans are able to produce good quality solutions quickly. The first issue of the Journal of Problem Solving [31] is devoted to the topic of human performance on TSP.

TSP path length for random pointset in a square


Suppose N points are randomly distributed in a 1 x 1 square with N>>1. Consider many such squares. Suppose we want to know the average of the shortest path length (i.e. TSP solution) of each square.

Lower bound
$\tfrac{1}{2}\sqrt{N}$ is a lower bound obtained by assuming that i is a point in the tour sequence and that i has its nearest neighbor as its next point in the path. $\left(\tfrac{1}{4} + \tfrac{3}{8}\right)\sqrt{N}$ is a better lower bound obtained by assuming that i's next point is i's nearest neighbor and i's previous point is i's second nearest. $\sqrt{N/2}$ is an even better lower bound obtained by dividing the path sequence into two parts as before_i and after_i with each part containing N/2 points, and then deleting the before_i part to form a diluted pointset (see discussion). David S. Johnson[32] obtained a lower bound by computer experiment, in which the constant 0.522 comes from the points near the square boundary, which have fewer neighbors. Christine L. Valenzuela and Antonia J. Jones[33] obtained another lower bound by computer experiment.
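A short derivation may help to see where the nearest-neighbor bounds come from. It rests on assumptions that the text does not state: for N ≫ 1 the points are treated as a planar Poisson process of density N, for which the mean distances to the nearest and second-nearest neighbor are $1/(2\sqrt{N})$ and $3/(4\sqrt{N})$, and boundary effects are ignored. Here $L$ is the tour length, $\ell_{\mathrm{prev}}(i)$ and $\ell_{\mathrm{next}}(i)$ are the two tour edges at point $i$, and $d_1(i)$, $d_2(i)$ are the distances from $i$ to its nearest and second-nearest neighbor.

```latex
\begin{align*}
L &= \sum_{i=1}^{N} \ell_{\mathrm{next}}(i) \;\ge\; \sum_{i=1}^{N} d_1(i)
  \;\approx\; N \cdot \frac{1}{2\sqrt{N}} \;=\; \tfrac{1}{2}\sqrt{N}, \\[4pt]
L &= \tfrac{1}{2}\sum_{i=1}^{N} \bigl(\ell_{\mathrm{prev}}(i) + \ell_{\mathrm{next}}(i)\bigr)
  \;\ge\; \tfrac{1}{2}\sum_{i=1}^{N} \bigl(d_1(i) + d_2(i)\bigr)
  \;\approx\; \frac{N}{2}\left(\frac{1}{2\sqrt{N}} + \frac{3}{4\sqrt{N}}\right)
  \;=\; \tfrac{5}{8}\sqrt{N}, \\[4pt]
L &\;\gtrsim\; N \cdot \frac{1}{2\sqrt{N/2}} \;=\; \sqrt{N/2} \;\approx\; 0.7071\,\sqrt{N}
  \quad\text{(diluted pointset of density } N/2\text{)}.
\end{align*}
```

The third line only reproduces the arithmetic of the dilution argument; the argument itself is heuristic.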

Upper bound
Applying the simulated annealing method to samples of N = 40000, computer analysis shows an upper bound in which the constant 0.72 comes from the boundary effect. Because the actual solution is the shortest path, for the purposes of programmatic search another upper bound is the length of any previously discovered approximation.


Analyst's travelling salesman problem


There is an analogous problem in geometric measure theory which asks the following: under what conditions may a subset E of Euclidean space be contained in a rectifiable curve (that is, when is there a curve with finite length that visits every point in E)? This problem is known as the analyst's travelling salesman problem or the geometric travelling salesman problem.

Free software for solving TSP


Name (alphabetically)   License             API language      Brief info
Concorde [34]           free for academic   only executable   requires a linear solver installation for its MILP subproblem
DynOpt [35]             ?                   C                 an ANSI C implementation of a dynamic programming based algorithm developed by Balas and Simonetti, approximate solution
LKH [36]                research only                         an effective implementation of the Lin-Kernighan heuristic for Euclidean traveling salesman problem
OpenOpt                 BSD                 Python            exact and approximate solvers, STSP / ATSP, can handle multigraphs, constraints, multiobjective problems, see its TSP page [37] for details and examples
tspg [38]               GPL                 C++               branch and bound algorithm
TSPGA [39]              ?                   C                 approximate solution of the STSP using the "pgapack" package

Popular Culture
Travelling Salesman, by director Timothy Lanzone, is the story of 4 mathematicians hired by the US Government to solve the most elusive problem in computer-science history: P vs. NP.[40]

Notes
[1] http://www.mjc2.com/logistics-planning-complexity.htm Why is vehicle routing hard - a simple explanation
[2] "Der Handlungsreisende wie er sein soll und was er zu thun [sic] hat, um Aufträge zu erhalten und eines glücklichen Erfolgs in seinen Geschäften gewiß zu sein von einem alten Commis-Voyageur" (The traveling salesman how he must be and what he should do in order to be sure to perform his tasks and have success in his business by a high commis-voyageur)
[3] A discussion of the early work of Hamilton and Kirkman can be found in Graph Theory 1736-1936
[4] Cited and English translation in Schrijver (2005). Original German: "Wir bezeichnen als Botenproblem (weil diese Frage in der Praxis von jedem Postboten, übrigens auch von vielen Reisenden zu lösen ist) die Aufgabe, für endlich viele Punkte, deren paarweise Abstände bekannt sind, den kürzesten die Punkte verbindenden Weg zu finden. Dieses Problem ist natürlich stets durch endlich viele Versuche lösbar. Regeln, welche die Anzahl der Versuche unter die Anzahl der Permutationen der gegebenen Punkte herunterdrücken würden, sind nicht bekannt. Die Regel, man solle vom Ausgangspunkt erst zum nächstgelegenen Punkt, dann zu dem diesem nächstgelegenen Punkt gehen usw., liefert im allgemeinen nicht den kürzesten Weg."
[5] A detailed treatment of the connection between Menger and Whitney as well as the growth in the study of TSP can be found in Alexander Schrijver's 2005 paper "On the history of combinatorial optimization (till 1960)". Handbook of Discrete Optimization (K. Aardal, G.L. Nemhauser, R. Weismantel, eds.), Elsevier, Amsterdam, 2005, pp. 1-68. PS (http://homepages.cwi.nl/~lex/files/histco.ps), PDF (http://homepages.cwi.nl/~lex/files/histco.pdf)
[6] http://www.google.com/patents?vid=7054798
[7] Behzad, Arash; Modarres, Mohammad (2002), "New Efficient Transformation of the Generalized Traveling Salesman Problem into Traveling Salesman Problem", Proceedings of the 15th International Conference of Systems Engineering (Las Vegas)
[8] Papadimitriou, C.H.; Steiglitz, K. (1998). Combinatorial optimization: algorithms and complexity. Mineola, NY: Dover.
[9] Orponen (1987)
[10] Papadimitriou (1983)
[11] Christofides (1976)
[12] Kaplan (2004)
[13] Kosaraju (1994)
[14] Serdyukov (1984)
[15] Hassin (2000)
[16] Bellman (1960), Bellman (1962), Held & Karp (1962)
[17] Kohn (1977), Karp (1982)
[18] Woeginger (2003)
[19] Work by David Applegate, AT&T Labs Research, Robert Bixby, ILOG and Rice University, Vašek Chvátal, Concordia University, William Cook, Georgia Tech, and Keld Helsgaun, Roskilde University is discussed on their project web page hosted by Georgia Tech and last updated in June 2004, here (http://www.tsp.gatech.edu/sweden/)
[20] Johnson, D.S. and McGeoch, L.A. "The traveling salesman problem: A case study in local optimization", Local search in combinatorial optimization, 1997, 215-310
[21] S. S. Ray, S. Bandyopadhyay and S. K. Pal, "Genetic Operators for Combinatorial Optimization in TSP and Microarray Gene Ordering," Applied Intelligence, 2007, 26(3), pp. 183-195.
[22] A. B. Kahng and S. Reda, "Match Twice and Stitch: A New TSP Tour Construction Heuristic," Operations Research Letters, 2004, 32(6), pp. 499-509. http://dx.doi.org/10.1016/j.orl.2004.04.001
[23] Marco Dorigo. Ant Colonies for the Traveling Salesman Problem. IRIDIA, Université Libre de Bruxelles. IEEE Transactions on Evolutionary Computation, 1(1):53-66, 1997. http://citeseer.ist.psu.edu/86357.html
[24] P. Berman, M. Karpinski, "8/7-Approximation Algorithm for (1,2)-TSP", Proc. 17th ACM-SIAM SODA (2006), pp. 641-648, ECCC TR05-069.
[25] C.H. Papadimitriou and Santosh Vempala. On the approximability of the traveling salesman problem (http://dx.doi.org/10.1007/s00493-006-0008-z), Combinatorica 26(1):101-120, 2006.
[26] L. Engebretsen, M. Karpinski, TSP with bounded metrics (http://dx.doi.org/10.1016/j.jcss.2005.12.001). Journal of Computer and System Sciences, 72(4):509-546, 2006.
[27] Christos H. Papadimitriou. "The Euclidean travelling salesman problem is NP-complete". Theoretical Computer Science 4:237-244, 1977. doi:10.1016/0304-3975(77)90012-3
[28] Sanjeev Arora. Polynomial Time Approximation Schemes for Euclidean Traveling Salesman and other Geometric Problems. Journal of the ACM, Vol. 45, Issue 5, pp. 753-782. ISSN 0004-5411. September 1998. http://citeseer.ist.psu.edu/arora96polynomial.html
[29] Roy Jonker and Ton Volgenant. "Transforming asymmetric into symmetric traveling salesman problems". Operations Research Letters 2:161-163, 1983. doi:10.1016/0167-6377(83)90048-2
[30] http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/
[31] http://docs.lib.purdue.edu/jps/
[32] David S. Johnson (http://www.research.att.com/~dsj/papers/HKsoda.pdf)
[33] Christine L. Valenzuela and Antonia J. Jones (http://users.cs.cf.ac.uk/Antonia.J.Jones/Papers/EJORHeldKarp/HeldKarp.pdf)
[34] http://www.tsp.gatech.edu/concorde.html
[35] http://www.andrew.cmu.edu/user/neils/tsp/
[36] http://www.akira.ruc.dk/~keld/research/LKH/
[37] http://openopt.org/TSP
[38] http://tspsg.info/
[39] http://www.rz.uni-karlsruhe.de/~lh71/
[40] Geere, Duncan. "'Travelling Salesman' movie considers the repercussions if P equals NP" (http://www.wired.co.uk/news/archive/2012-04/26/travelling-salesman). Wired. Retrieved 26 April 2012.


References
Applegate, D. L.; Bixby, R. E.; Chvátal, V.; Cook, W. J. (2006), The Traveling Salesman Problem, ISBN 0-691-12993-2.
Bellman, R. (1960), "Combinatorial Processes and Dynamic Programming", in Bellman, R., Hall, M., Jr. (eds.), Combinatorial Analysis, Proceedings of Symposia in Applied Mathematics 10, American Mathematical Society, pp. 217-249.
Bellman, R. (1962), "Dynamic Programming Treatment of the Travelling Salesman Problem", J. Assoc. Comput. Mach. 9: 61-63, doi:10.1145/321105.321111.
Christofides, N. (1976), Worst-case analysis of a new heuristic for the travelling salesman problem, Technical Report 388, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh.
Hassin, R.; Rubinstein, S. (2000), "Better approximations for max TSP", Information Processing Letters 75 (4): 181-186, doi:10.1016/S0020-0190(00)00097-1.
Held, M.; Karp, R. M. (1962), "A Dynamic Programming Approach to Sequencing Problems", Journal of the Society for Industrial and Applied Mathematics 10 (1): 196-210, doi:10.1137/0110015.
Kaplan, H.; Lewenstein, L.; Shafrir, N.; Sviridenko, M. (2004), "Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs", In Proc. 44th IEEE Symp. on Foundations of Comput. Sci, pp. 56-65.
Karp, R.M. (1982), "Dynamic programming meets the principle of inclusion and exclusion", Oper. Res. Lett. 1 (2): 49-51, doi:10.1016/0167-6377(82)90044-X.
Kohn, S.; Gottlieb, A.; Kohn, M. (1977), "A Generating Function Approach to the Traveling Salesman Problem", ACM Annual Conference, ACM Press, pp. 294-300.
Kosaraju, S. R.; Park, J. K.; Stein, C. (1994), "Long tours and short superstrings", Proc. 35th Ann. IEEE Symp. on Foundations of Comput. Sci, IEEE Computer Society, pp. 166-177.
Orponen, P.; Mannila, H. (1987), "On approximation preserving reductions: Complete problems and robust measures", Technical Report C-1987-28, Department of Computer Science, University of Helsinki.
Papadimitriou, C. H.; Yannakakis, M. (1993), "The traveling salesman problem with distances one and two", Math. Oper. Res. 18: 1-11, doi:10.1287/moor.18.1.1.
Serdyukov, A. I. (1984), "An algorithm with an estimate for the traveling salesman problem of the maximum", Upravlyaemye Sistemy 25: 80-86.
Woeginger, G.J. (2003), "Exact Algorithms for NP-Hard Problems: A Survey", Combinatorial Optimization - Eureka, You Shrink! Lecture notes in computer science, vol. 2570, Springer, pp. 185-207.


Further reading
Adleman, Leonard (1994), Molecular Computation of Solutions To Combinatorial Problems (http://www.usc.edu/dept/molecular-science/papers/fp-sci94.pdf)
Applegate, D. L.; Bixby, R. E.; Chvátal, V.; Cook, W. J. (2006), The Traveling Salesman Problem: A Computational Study, Princeton University Press, ISBN 978-0-691-12993-8.
Arora, S. (1998), "Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems" (http://graphics.stanford.edu/courses/cs468-06-winter/Papers/arora-tsp.pdf), Journal of the ACM 45 (5): 753-782, doi:10.1145/290179.290180.
Babin, Gilbert; Deneault, Stéphanie; Laporte, Gilbert (2005), Improvements to the Or-opt Heuristic for the Symmetric Traveling Salesman Problem (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.89.9953), Cahiers du GERAD, G-2005-02, Montreal: Group for Research in Decision Analysis.
Cook, William (2011), In Pursuit of the Travelling Salesman: Mathematics at the Limits of Computation, Princeton University Press, ISBN 978-0-691-15270-7.
Cook, William; Espinoza, Daniel; Goycoolea, Marcos (2007), "Computing with domino-parity inequalities for the TSP", INFORMS Journal on Computing 19 (3): 356-365, doi:10.1287/ijoc.1060.0204.
Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C. (2001), "35.2: The traveling-salesman problem", Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 1027-1033, ISBN 0-262-03293-7.
Dantzig, G. B.; Fulkerson, R.; Johnson, S. M. (1954), "Solution of a large-scale traveling salesman problem", Operations Research 2 (4): 393-410, doi:10.1287/opre.2.4.393, JSTOR 166695.
Garey, M. R.; Johnson, D. S. (1979), "A2.3: ND22-24", Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, pp. 211-212, ISBN 0-7167-1045-5.
Goldberg, D. E. (1989), Genetic Algorithms in Search, Optimization & Machine Learning, New York: Addison-Wesley, ISBN 0-201-15767-5.
Gutin, G.; Yeo, A.; Zverovich, A. (2002), "Traveling salesman should not be greedy: domination analysis of greedy-type heuristics for the TSP", Discrete Applied Mathematics 117 (1-3): 81-86, doi:10.1016/S0166-218X(01)00195-0.
Gutin, G.; Punnen, A. P. (2006), The Traveling Salesman Problem and Its Variations, Springer, ISBN 0-387-44459-9.
Johnson, D. S.; McGeoch, L. A. (1997), "The Traveling Salesman Problem: A Case Study in Local Optimization", in Aarts, E. H. L.; Lenstra, J. K., Local Search in Combinatorial Optimisation, John Wiley and Sons Ltd, pp. 215-310.
Lawler, E. L.; Lenstra, J. K.; Rinnooy Kan, A. H. G.; Shmoys, D. B. (1985), The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization, John Wiley & Sons, ISBN 0-471-90413-9.
MacGregor, J. N.; Ormerod, T. (1996), "Human performance on the traveling salesman problem" (http://www.psych.lancs.ac.uk/people/uploads/TomOrmerod20030716T112601.pdf), Perception & Psychophysics 58 (4): 527-539, doi:10.3758/BF03213088.
Mitchell, J. S. B. (1999), "Guillotine subdivisions approximate polygonal subdivisions: A simple polynomial-time approximation scheme for geometric TSP, k-MST, and related problems" (http://citeseer.ist.psu.edu/622594.html), SIAM Journal on Computing 28 (4): 1298-1309, doi:10.1137/S0097539796309764.
Rao, S.; Smith, W. (1998), "Approximating geometrical graphs via 'spanners' and 'banyans'", Proc. 30th Annual ACM Symposium on Theory of Computing, pp. 540-550.
Rosenkrantz, Daniel J.; Stearns, Richard E.; Lewis, Philip M., II (1977), "An Analysis of Several Heuristics for the Traveling Salesman Problem", SIAM Journal on Computing 6 (5): 563-581, doi:10.1137/0206041.
Vickers, D.; Butavicius, M.; Lee, M.; Medvedev, A. (2001), "Human performance on visually presented traveling salesman problems", Psychological Research 65 (1): 34-45, doi:10.1007/s004260000031, PMID 11505612.
Walshaw, Chris (2000), A Multilevel Approach to the Travelling Salesman Problem, CMS Press.
Walshaw, Chris (2001), A Multilevel Lin-Kernighan-Helsgaun Algorithm for the Travelling Salesman Problem, CMS Press.


External links
Traveling Salesman Problem (http://www.tsp.gatech.edu/index.html) at Georgia Tech
TSPLIB (http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/) at the University of Heidelberg
Traveling Salesman Problem (http://demonstrations.wolfram.com/TravelingSalesmanProblem/) by Jon McLoone
optimap (http://www.gebweb.net/optimap/), an approximation using ACO on GoogleMaps with JavaScript
tsp (http://travellingsalesmanproblem.appspot.com/), an exact solver using Constraint Programming on GoogleMaps
Demo applet of a genetic algorithm solving TSPs and VRPTW problems (http://www.dna-evolutions.com/dnaappletsample.html)
Source code library for the travelling salesman problem (http://www.adaptivebox.net/CILib/code/tspcodes_link.html)
TSP solvers in R (http://tsp.r-forge.r-project.org/) for symmetric and asymmetric TSPs. Implements various insertion, nearest neighbor and 2-opt heuristics and an interface to Georgia Tech's Concorde and Chained Lin-Kernighan heuristics.
Traveling Salesman (on IMDB) (http://www.imdb.com/title/tt1801123/)
Traveling Salesman Movie (http://www.travellingsalesmanmovie.com/), official webpage of the Traveling Salesman film (2012)

Turing machine

A Turing machine is a hypothetical device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer.

Figure: An artistic representation of a Turing machine (Rules table not represented)

The "Turing" machine was described in 1936 by Alan Turing[1] who called it an "a-machine" (automatic machine). The Turing machine is not intended as practical computing technology, but rather as a hypothetical device representing a computing machine. Turing machines help computer scientists understand the limits of mechanical computation.

Turing gave a succinct definition of the experiment in his 1948 essay, "Intelligent Machinery". Referring to his 1936 publication, Turing wrote that the Turing machine, here called a Logical Computing Machine, consisted of:

...an unlimited memory capacity obtained in the form of an infinite tape marked out into squares, on each of which a symbol could be printed. At any moment there is one symbol in the machine; it is called the scanned symbol. The machine can alter the scanned symbol and its behavior is in part determined by that symbol, but the symbols on the tape elsewhere do not affect the behaviour of the machine. However, the tape can be moved back and forth through the machine, this being one of the elementary operations of the machine. Any symbol on the tape may therefore eventually have an innings.[2] (Turing 1948, p. 61)

A Turing machine that is able to simulate any other Turing machine is called a universal Turing machine (UTM, or simply a universal machine). A more mathematically oriented definition with a similar "universal" nature was introduced by Alonzo Church, whose work on lambda calculus intertwined with Turing's in a formal theory of computation known as the Church-Turing thesis. The thesis states that Turing machines indeed capture the informal notion of effective method in logic and mathematics, and provide a precise definition of an algorithm or 'mechanical procedure'. Studying their abstract properties yields many insights into computer science and complexity theory.

Informal description
For visualizations of Turing machines, see Turing machine gallery. The Turing machine mathematically models a machine that mechanically operates on a tape. On this tape are symbols which the machine can read and write, one at a time, using a tape head. Operation is fully determined by a finite set of elementary instructions such as "in state 42, if the symbol seen is 0, write a 1; if the symbol seen is 1, change into state 17; in state 17, if the symbol seen is 0, write a 1 and change to state 6;" etc. In the original article ("On computable numbers, with an application to the Entscheidungsproblem", see also references below), Turing imagines not a mechanism, but a person whom he calls the "computer", who executes these deterministic mechanical rules slavishly (or as Turing puts it, "in a desultory manner").


More precisely, a Turing machine consists of:
1. A tape which is divided into cells, one next to the other. Each cell contains a symbol from some finite alphabet. The alphabet contains a special blank symbol (here written as '0') and one or more other symbols. The tape is assumed to be arbitrarily extendable to the left and to the right, i.e., the Turing machine is always supplied with as much tape as it needs for its computation. Cells that have not been written to before are assumed to be filled with the blank symbol. In some models the tape has a left end marked with a special symbol; the tape extends or is indefinitely extensible to the right.
2. A head that can read and write symbols on the tape and move the tape left and right one (and only one) cell at a time. In some models the head moves and the tape is stationary.
3. A state register that stores the state of the Turing machine, one of finitely many. There is one special start state with which the state register is initialized. These states, writes Turing, replace the "state of mind" a person performing computations would ordinarily be in.
4. A finite table (occasionally called an action table or transition function) of instructions (usually quintuples [5-tuples]: $q_i a_j \to q_{i1} a_{j1} d_k$, but sometimes 4-tuples) that, given the state ($q_i$) the machine is currently in and the symbol ($a_j$) it is reading on the tape (the symbol currently under the head), tells the machine to do the following in sequence (for the 5-tuple models):
   - Either erase or write a symbol (replacing $a_j$ with $a_{j1}$), and then
   - Move the head (which is described by $d_k$ and can have values: 'L' for one step left or 'R' for one step right or 'N' for staying in the same place), and then
   - Assume the same or a new state as prescribed (go to state $q_{i1}$).

In the 4-tuple models, erasing or writing a symbol ($a_{j1}$) and moving the head left or right ($d_k$) are specified as separate instructions. Specifically, the table tells the machine to (ia) erase or write a symbol or (ib) move the head left or right, and then (ii) assume the same or a new state as prescribed, but not both actions (ia) and (ib) in the same instruction. In some models, if there is no entry in the table for the current combination of symbol and state then the machine will halt; other models require all entries to be filled.

Note that every part of the machine (its state and symbol collections) and its actions (printing, erasing and tape motion) is finite, discrete and distinguishable; it is the potentially unlimited amount of tape that gives it an unbounded amount of storage space.

Figure: The head is always over a particular square of the tape; only a finite stretch of squares is shown. The instruction to be performed (q4) is shown over the scanned square. (Drawing after Kleene (1952) p. 375.)

Figure: Here, the internal state (q1) is shown inside the head, and the illustration describes the tape as being infinite and pre-filled with "0", the symbol serving as blank. The system's full state (its complete configuration) consists of the internal state, any non-blank symbols on the tape (in this illustration "11B"), and the position of the head relative to those symbols including blanks, i.e. "011B". (Drawing after Minsky (1967) p. 121.)
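The relationship between the 5-tuple and 4-tuple models can be illustrated with a small sketch. It shows one possible way (an assumption made for this example, not a construction given in the text) to split a quintuple into two 4-tuple instructions, one that writes and one that moves, linked by a hypothetical helper state. The tuple layouts and the name `split_quintuple` are likewise made up for illustration.

```python
def split_quintuple(q, a, q_next, a_next, d):
    """Split the quintuple (q, a -> q_next, a_next, d) into 4-tuples of the
    form (state, scanned symbol, action, next state)."""
    if d == "N":
        # No head movement: a single write instruction is enough.
        return [(q, a, ("write", a_next), q_next)]
    # Hypothetical helper state, unique per (state, symbol) pair; it is
    # assumed not to clash with any existing state name.
    mid = f"{q}/{a}"
    return [
        (q, a, ("write", a_next), mid),       # (ia) erase or write only
        (mid, a_next, ("move", d), q_next),   # (ib) move only
    ]

# Usage: "in state A scanning 0, write 1, move right, go to state B".
instructions = split_quintuple("A", 0, "B", 1, "R")
```

After the write instruction the head is scanning the newly written symbol, which is why the helper state only needs an entry for `a_next`.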


Formal definition
Hopcroft and Ullman (1979, p. 148) formally define a (one-tape) Turing machine as a 7-tuple $M = \langle Q, \Gamma, b, \Sigma, \delta, q_0, F \rangle$ where
- $Q$ is a finite, non-empty set of states
- $\Gamma$ is a finite, non-empty set of the tape alphabet/symbols
- $b \in \Gamma$ is the blank symbol (the only symbol allowed to occur on the tape infinitely often at any step during the computation)
- $\Sigma \subseteq \Gamma \setminus \{b\}$ is the set of input symbols
- $q_0 \in Q$ is the initial state
- $F \subseteq Q$ is the set of final or accepting states
- $\delta : (Q \setminus F) \times \Gamma \to Q \times \Gamma \times \{L, R\}$ is a partial function called the transition function, where L is left shift and R is right shift. (A relatively uncommon variant allows "no shift", say N, as a third element of the latter set.)

Anything that operates according to these specifications is a Turing machine. The 7-tuple for the 3-state busy beaver looks like this (see more about this busy beaver at Turing machine examples):
- $Q = \{A, B, C, \mathrm{HALT}\}$
- $\Gamma = \{0, 1\}$
- $b = 0$ ("blank")
- $\Sigma = \{1\}$
- $q_0 = A$ (the initial state)
- $F = \{\mathrm{HALT}\}$
- $\delta$ = see state-table below

Initially all tape cells are marked with 0.

State table for 3-state, 2-symbol busy beaver

Tape symbol   Current state A        Current state B        Current state C
              Write  Move  Next      Write  Move  Next      Write  Move  Next
              symbol tape  state     symbol tape  state     symbol tape  state
0             1      R     B         1      L     A         1      L     B
1             1      L     C         1      R     B         1      R     HALT
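The formal definition and the state table above can be exercised with a short simulator. The following is only a minimal sketch in Python, not a reference implementation: the names BUSY_BEAVER_3 and run are illustrative, the tape is modelled as a dictionary that returns the blank for unvisited cells, and L/R are assumed to move the head (reading them as tape motion instead gives the mirror-image tape with the same counts).

```python
from collections import defaultdict

# Transition function copied from the state table above:
# (current state, scanned symbol) -> (symbol to write, move, next state)
BUSY_BEAVER_3 = {
    ("A", 0): (1, "R", "B"),
    ("A", 1): (1, "L", "C"),
    ("B", 0): (1, "L", "A"),
    ("B", 1): (1, "R", "B"),
    ("C", 0): (1, "L", "B"),
    ("C", 1): (1, "R", "HALT"),
}


def run(delta, start="A", halt="HALT", blank=0, max_steps=10_000):
    """Run a one-tape machine from an all-blank tape.

    Returns (tape, head position, final state, number of steps taken).
    """
    tape = defaultdict(lambda: blank)  # unvisited cells read as blank
    head, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached; machine may not halt")
        key = (state, tape[head])
        if key not in delta:  # delta is partial: a missing entry halts
            break
        write, move, state = delta[key]
        tape[head] = write
        head += {"L": -1, "R": 1, "N": 0}[move]
        steps += 1
    return dict(tape), head, state, steps


tape, head, state, steps = run(BUSY_BEAVER_3)
ones = sum(1 for symbol in tape.values() if symbol == 1)
print(steps, ones, state)
```

Under these assumptions the machine halts after 13 transitions with six 1s on the tape. The max_steps guard is a practical safeguard for the sketch and is not part of the mathematical definition.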

Additional details required to visualize or implement Turing machines


In the words of van Emde Boas (1990), p. 6: "The set-theoretical object [his formal seven-tuple description similar to the above] provides only partial information on how the machine will behave and what its computations will look like." For instance,
- There will need to be many decisions on what the symbols actually look like, and a failproof way of reading and writing symbols indefinitely.
- The shift left and shift right operations may shift the tape head across the tape, but when actually building a Turing machine it is more practical to make the tape slide back and forth under the head instead.
- The tape can be finite, and automatically extended with blanks as needed (which is closest to the mathematical definition), but it is more common to think of it as stretching infinitely at both ends and being pre-filled with blanks except on the explicitly given finite fragment the tape head is on. (This is, of course, not implementable in practice.) The tape cannot be fixed in length, since that would not correspond to the given definition and would seriously limit the range of computations the machine can perform to those of a linear bounded automaton.


Alternative definitions
Definitions in the literature sometimes differ slightly, to make arguments or proofs easier or clearer, but this is always done in such a way that the resulting machine has the same computational power. For example, changing the set $\{L, R\}$ to $\{L, R, N\}$, where N ("None" or "No-operation") would allow the machine to stay on the same tape cell instead of moving left or right, does not increase the machine's computational power.

The most common convention represents each "Turing instruction" in a "Turing table" by one of nine 5-tuples, per the convention of Turing/Davis (Turing (1936) in Undecidable, pp. 126-127 and Davis (2000) p. 152):

(definition 1): (qi, Sj, Sk/E/N, L/R/N, qm)
( current state qi , symbol scanned Sj , print symbol Sk/erase E/none N , move_tape_one_square left L/right R/none N , new state qm )

Other authors (Minsky (1967) p. 119, Hopcroft and Ullman (1979) p. 158, Stone (1972) p. 9) adopt a different convention, with new state qm listed immediately after the scanned symbol Sj:

(definition 2): (qi, Sj, qm, Sk/E/N, L/R/N)
( current state qi , symbol scanned Sj , new state qm , print symbol Sk/erase E/none N , move_tape_one_square left L/right R/none N )

For the remainder of this article "definition 1" (the Turing/Davis convention) will be used.

Example: state table for the 3-state 2-symbol busy beaver reduced to 5-tuples

Current state   Scanned symbol   Print symbol   Move tape   Final (i.e. next) state   5-tuple
A               0                1              R           B                         (A, 0, 1, R, B)
A               1                1              L           C                         (A, 1, 1, L, C)
B               0                1              L           A                         (B, 0, 1, L, A)
B               1                1              R           B                         (B, 1, 1, R, B)
C               0                1              L           B                         (C, 0, 1, L, B)
C               1                1              N           H                         (C, 1, 1, N, H)

In the following table, Turing's original model allowed only the first three lines, which he called N1, N2, N3 (cf. Turing in Undecidable, p. 126). He allowed for erasure of the "scanned square" by naming a 0th symbol S0 = "erase" or "blank", etc. However, he did not allow for non-printing, so every instruction-line includes "print symbol Sk" or "erase" (cf. footnote 12 in Post (1947), Undecidable p. 300). The abbreviations are Turing's (Undecidable p. 119). Subsequent to Turing's original paper in 1936-1937, machine-models have allowed all nine possible types of five-tuples:
      Current            Tape     Print-       Tape-     Final
      m-configuration    symbol   operation    motion    m-configuration
      (Turing state)                                     (Turing state)    5-tuple                5-tuple comments              4-tuple
N1    qi                 Sj       Print(Sk)    Left L    qm                (qi, Sj, Sk, L, qm)    "blank" = S0, 1 = S1, etc.
N2    qi                 Sj       Print(Sk)    Right R   qm                (qi, Sj, Sk, R, qm)    "blank" = S0, 1 = S1, etc.
N3    qi                 Sj       Print(Sk)    None N    qm                (qi, Sj, Sk, N, qm)    "blank" = S0, 1 = S1, etc.    (qi, Sj, Sk, qm)
4     qi                 Sj       None N       Left L    qm                (qi, Sj, N, L, qm)                                   (qi, Sj, L, qm)
5     qi                 Sj       None N       Right R   qm                (qi, Sj, N, R, qm)                                   (qi, Sj, R, qm)
6     qi                 Sj       None N       None N    qm                (qi, Sj, N, N, qm)     Direct "jump"                 (qi, Sj, N, qm)
7     qi                 Sj       Erase        Left L    qm                (qi, Sj, E, L, qm)
8     qi                 Sj       Erase        Right R   qm                (qi, Sj, E, R, qm)
9     qi                 Sj       Erase        None N    qm                (qi, Sj, E, N, qm)                                   (qi, Sj, E, qm)

Any Turing table (list of instructions) can be constructed from the above nine 5-tuples. For technical reasons, the three non-printing or "N" instructions (4, 5, 6) can usually be dispensed with. For examples see Turing machine examples. Less frequently, 4-tuples are encountered: these represent a further atomization of the Turing instructions (cf. Post (1947), Boolos & Jeffrey (1974, 1999), Davis-Sigal-Weyuker (1994)); also see more at Post-Turing machine.
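The remark above that allowing N ("no move") does not increase computational power can be illustrated with a small table transformation. This is only a sketch of one standard construction, written in Python with hypothetical helper names (eliminate_no_move, run): each N-instruction is replaced by a step to the right into an auxiliary state, which then steps back left without changing the scanned symbol. L/R are assumed to move the head.

```python
from collections import defaultdict

# The busy beaver 5-tuples from the example table above, keyed as
# (current state, scanned symbol) -> (print symbol, move, next state).
WITH_N = {
    ("A", 0): (1, "R", "B"),
    ("A", 1): (1, "L", "C"),
    ("B", 0): (1, "L", "A"),
    ("B", 1): (1, "R", "B"),
    ("C", 0): (1, "L", "B"),
    ("C", 1): (1, "N", "H"),
}


def eliminate_no_move(delta, symbols):
    """Return an equivalent table whose moves are only L or R."""
    new_delta = {}
    for (state, scanned), (write, move, nxt) in delta.items():
        if move != "N":
            new_delta[(state, scanned)] = (write, move, nxt)
            continue
        back = ("back", nxt)  # auxiliary state (name is an assumption)
        new_delta[(state, scanned)] = (write, "R", back)
        for s in symbols:
            # Step back left, leaving whatever symbol is there untouched.
            new_delta[(back, s)] = (s, "L", nxt)
    return new_delta


def run(delta, start, halt, blank=0, max_steps=10_000):
    """Run from a blank tape; return (non-blank cells, head, state)."""
    tape = defaultdict(lambda: blank)
    head, state = 0, start
    for _ in range(max_steps):
        if state == halt:
            break
        write, move, state = delta[(state, tape[head])]
        tape[head] = write
        head += {"L": -1, "R": 1, "N": 0}[move]
    non_blank = {i: s for i, s in tape.items() if s != blank}
    return non_blank, head, state


LR_ONLY = eliminate_no_move(WITH_N, symbols=[0, 1])
result_with_n = run(WITH_N, start="A", halt="H")
result_lr_only = run(LR_ONLY, start="A", halt="H")
print(result_with_n == result_lr_only)
```

The transformed machine takes one extra step for every N-instruction it executes, so it is slower but computes the same final tape and head position.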

The "state"
The word "state" used in the context of Turing machines can be a source of confusion, as it can mean two things. Most commentators after Turing have used "state" to mean the name/designator of the current instruction to be performed, i.e. the contents of the state register. But Turing (1936) made a strong distinction between a record of what he called the machine's "m-configuration" (its internal state) and the machine's (or person's) "state of progress" through the computation, that is, the current state of the total system. What Turing called "the state formula" includes both the current instruction and all the symbols on the tape:

    Thus the state of progress of the computation at any stage is completely determined by the note of instructions and the symbols on the tape. That is, the state of the system may be described by a single expression (sequence of symbols) consisting of the symbols on the tape followed by $\Delta$ (which we suppose not to appear elsewhere) and then by the note of instructions. This expression is called the 'state formula'.
    (Undecidable, pp. 139-140, emphasis added)

Earlier in his paper Turing carried this even further: he gives an example where he places a symbol of the current "m-configuration" (the instruction's label) beneath the scanned square, together with all the symbols on the tape (Undecidable, p. 121); this he calls "the complete configuration" (Undecidable, p. 118). To print the "complete configuration" on one line he places the state-label/m-configuration to the left of the scanned symbol.

A variant of this is seen in Kleene (1952), where Kleene shows how to write the Gödel number of a machine's "situation": he places the "m-configuration" symbol q4 over the scanned square in roughly the center of the 6 non-blank squares on the tape (see the Turing-tape figure in this article) and puts it to the right of the scanned square. But Kleene refers to "q4" itself as "the machine state" (Kleene, pp. 374-375). Hopcroft and Ullman call this composite the "instantaneous description" and follow the Turing convention of putting the "current state" (instruction-label, m-configuration) to the left of the scanned symbol (p. 149).

Example: total state of 3-state 2-symbol busy beaver after 3 "moves" (taken from example "run" in the figure below):

    1A1

This means: after three moves the tape has ... 000110000 ... on it, the head is scanning the right-most 1, and the state is A. Blanks (in this case represented by "0"s) can be part of the total state as shown here: B01; the tape has a single 1 on it, but the head is scanning the 0 ("blank") to its left and the state is B.

"State" in the context of Turing machines should be clarified as to which is being described: (i) the current instruction, or (ii) the list of symbols on the tape together with the current instruction, or (iii) the list of symbols on the tape together with the current instruction placed to the left of the scanned symbol or to the right of the scanned symbol. Turing's biographer Andrew Hodges (1983: 107) has noted and discussed this confusion.
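The "complete configuration" notation described above, with the state label written to the left of the scanned symbol, can be produced mechanically. The following Python sketch is illustrative only: the function name instantaneous_description and the dictionary representation of the tape are assumptions made for the example, not notation from Turing or from Hopcroft and Ullman.

```python
def instantaneous_description(tape, head, state, blank="0"):
    """Render the tape with the state label left of the scanned symbol.

    tape  : dict mapping cell index -> symbol (a one-character string)
    head  : index of the scanned cell
    state : label of the current instruction (m-configuration)
    """
    non_blank = [i for i, symbol in tape.items() if symbol != blank]
    # Show every cell from the left-most to the right-most cell that is
    # either non-blank or currently scanned.
    lo = min(non_blank + [head])
    hi = max(non_blank + [head])
    parts = []
    for i in range(lo, hi + 1):
        if i == head:
            parts.append(state)
        parts.append(tape.get(i, blank))
    return "".join(parts)


# The two examples from the text above.
first = instantaneous_description({0: "1", 1: "1"}, head=1, state="A")
second = instantaneous_description({1: "1"}, head=0, state="B")
print(first, second)
```

The two calls reproduce the strings 1A1 and B01 discussed above; in the second case the scanned blank is included because the head position is part of the total state.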


Turing machine "state" diagrams

The table for the 3-state busy beaver ("P" = print/write a "1")

Tape symbol   Current state A        Current state B        Current state C
              Write  Move  Next      Write  Move  Next      Write  Move  Next
              symbol tape  state     symbol tape  state     symbol tape  state
0             P      R     B         P      L     A         P      L     B
1             P      L     C         P      R     B         P      R     HALT

To the right: the above TABLE as expressed as a "state transition" diagram. Usually large TABLES are better left as tables (Booth, p. 74). They are more readily simulated by computer in tabular form (Booth, p. 74). However, certain concepts, e.g. machines with "reset" states and machines with repeating patterns (cf. Hill and Peterson p. 244ff), can be more readily seen when viewed as a drawing. Whether a drawing represents an improvement on its TABLE must be decided by the reader for the particular context. See Finite state machine for more.

The reader should again be cautioned that such diagrams represent a snapshot of their TABLE frozen in time, not the course ("trajectory") of a computation through time and/or space. While every time the busy beaver machine "runs" it will always follow the same state-trajectory, this is not true for the "copy" machine that can be provided with variable input "parameters".

The diagram "Progress of the computation" shows the 3-state busy beaver's "state" (instruction) progress through its computation from start to finish. On the far right is the Turing "complete configuration" (Kleene "situation", Hopcroft-Ullman "instantaneous description") at each step. If the machine were to be stopped and cleared to blank both the "state register" and entire tape, these "configurations" could be used to rekindle a computation anywhere in its progress (cf. Turing (1936) Undecidable pp. 139-140).

[Figure: The "3-state busy beaver" Turing machine in a finite state representation. Each circle represents a "state" of the TABLE, an "m-configuration" or "instruction". "Direction" of a state transition is shown by an arrow. The label (e.g. 0/P,R) near the outgoing state (at the "tail" of the arrow) specifies the scanned symbol that causes a particular transition (e.g. 0) followed by a slash /, followed by the subsequent "behaviors" of the machine, e.g. "P Print" then move tape "R Right". No generally accepted format exists. The convention shown is after McCluskey (1965), Booth (1967), and Hill and Peterson (1974).]

[Figure: The evolution of the busy beaver's computation starts at the top and proceeds to the bottom.]

Models equivalent to the Turing machine model


Many machines that might be thought to have more computational capability than a simple universal Turing machine can be shown to have no more power (Hopcroft and Ullman p. 159, cf. Minsky (1967)). They might compute faster, perhaps, or use less memory, or their instruction set might be smaller, but they cannot compute more powerfully (i.e. more mathematical functions). (Recall that the Church-Turing thesis hypothesizes this to be true for any kind of machine: that anything that can be "computed" can be computed by some Turing machine.)

A Turing machine is equivalent to a pushdown automaton that has been made more flexible and concise by relaxing the last-in-first-out requirement of its stack.

At the other extreme, some very simple models turn out to be Turing-equivalent, i.e. to have the same computational power as the Turing machine model.

Common equivalent models are the multi-tape Turing machine, multi-track Turing machine, machines with input and output, and the non-deterministic Turing machine (NDTM), as opposed to the deterministic Turing machine (DTM), for which the action table has at most one entry for each combination of symbol and state.

Read-only, right-moving Turing machines are equivalent to NDFAs (as well as DFAs, by conversion using the NDFA to DFA conversion algorithm).

For practical and didactic purposes, the equivalent register machine can be used as a usual assembly programming language.

Choice c-machines, Oracle o-machines


Early in his paper (1936) Turing makes a distinction between an "automatic machine" (its "motion ... completely determined by the configuration") and a "choice machine":

    ...whose motion is only partially determined by the configuration ... When such a machine reaches one of these ambiguous configurations, it cannot go on until some arbitrary choice has been made by an external operator. This would be the case if we were using machines to deal with axiomatic systems.
    (Undecidable, p. 118)

Turing (1936) does not elaborate further except in a footnote in which he describes how to use an a-machine to "find all the provable formulae of the [Hilbert] calculus" rather than use a choice machine. He "suppose[s] that the choices are always between two possibilities 0 and 1. Each proof will then be determined by a sequence of choices $i_1, i_2, \ldots, i_n$ ($i_1 = 0$ or 1, $i_2 = 0$ or 1, ..., $i_n = 0$ or 1), and hence the number $2^n + i_1 2^{n-1} + i_2 2^{n-2} + \cdots + i_n$ completely determines the proof. The automatic machine carries out successively proof 1, proof 2, proof 3, ..." (footnote, Undecidable, p. 138)

This is indeed the technique by which a deterministic (i.e. a-) Turing machine can be used to mimic the action of a nondeterministic Turing machine; Turing solved the matter in a footnote and appears to dismiss it from further consideration.

An oracle machine or o-machine is a Turing a-machine that pauses its computation at state "o" while, to complete its calculation, it "awaits the decision" of "the oracle", an unspecified entity "apart from saying that it cannot be a machine" (Turing (1939), Undecidable pp. 166-168). The concept is now actively used by mathematicians.
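Turing's numbering of choice sequences can be checked with a few lines of arithmetic. The sketch below, in Python, uses the illustrative names proof_number and choices_from_number (they are not Turing's): the number $2^n + i_1 2^{n-1} + \cdots + i_n$ is simply the binary numeral formed by writing a leading 1 followed by the choices, so every positive integer corresponds to exactly one finite sequence of choices.

```python
def proof_number(choices):
    """Map a choice sequence i_1..i_n (each 0 or 1) to 2^n + i_1*2^(n-1) + ... + i_n."""
    n = len(choices)
    return 2**n + sum(bit * 2 ** (n - 1 - k) for k, bit in enumerate(choices))


def choices_from_number(number):
    """Invert proof_number: drop the leading 1 of the binary numeral."""
    if number < 1:
        raise ValueError("proof numbers start at 1")
    return [int(digit) for digit in bin(number)[3:]]  # skip '0b1'


# A deterministic machine can try "proof 1, proof 2, proof 3, ..." in order,
# which visits every finite choice sequence exactly once.
first_four = [choices_from_number(k) for k in range(1, 5)]
print(proof_number([0, 1]), first_four)
```

Here the sequence of choices 0, 1 receives the number 5, and the first four numbers decode to the empty sequence, [0], [1] and [0, 0]. Enumerating the integers in order therefore tries shorter choice sequences before longer ones.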


Universal Turing machines


As Turing wrote in Undecidable, p. 128 (italics added):

    It is possible to invent a single machine which can be used to compute any computable sequence. If this machine U is supplied with the tape on the beginning of which is written the string of quintuples separated by semicolons of some computing machine M, then U will compute the same sequence as M.

This finding is now taken for granted, but at the time (1936) it was considered astonishing. The model of computation that Turing called his "universal machine" ("U" for short) is considered by some (cf. Davis (2000)) to have been the fundamental theoretical breakthrough that led to the notion of the stored-program computer.

    Turing's paper ... contains, in essence, the invention of the modern computer and some of the programming techniques that accompanied it.
    (Minsky (1967), p. 104)

In terms of computational complexity, a multi-tape universal Turing machine need only be slower by a logarithmic factor compared to the machines it simulates. This result was obtained in 1966 by F. C. Hennie and R. E. Stearns (Arora and Barak, 2009, theorem 1.9).

Comparison with real machines


It is often said that Turing machines, unlike simpler automata, are as powerful as real machines, and are able to execute any operation that a real program can. What is missed in this statement is that, because a real machine can only be in finitely many configurations, in fact this "real machine" is nothing but a linear bounded automaton. On the other hand, Turing machines are equivalent to machines that have an unlimited amount of storage space for their computations. In fact, Turing machines are not intended to model computers, but rather they are intended to model computation itself; historically, computers, which compute only on their (fixed) internal storage, were developed only later.

A Turing machine realization in LEGO

There are a number of ways to explain why Turing machines are useful models of real computers:
1. Anything a real computer can compute, a Turing machine can also compute. For example: "A Turing machine can simulate any type of subroutine found in programming languages, including recursive procedures and any of the known parameter-passing mechanisms" (Hopcroft and Ullman p. 157). A large enough FSA can also model any real computer, disregarding I/O. Thus, a statement about the limitations of Turing machines will also apply to real computers.
2. The difference lies only with the ability of a Turing machine to manipulate an unbounded amount of data. However, given a finite amount of time, a Turing machine (like a real machine) can only manipulate a finite amount of data.
3. Like a Turing machine, a real machine can have its storage space enlarged as needed, by acquiring more disks or other storage media. If the supply of these runs short, the Turing machine may become less useful as a model. But the fact is that neither Turing machines nor real machines need astronomical amounts of storage space in order to perform useful computation. The processing time required is usually much more of a problem.
4. Descriptions of real machine programs using simpler abstract models are often much more complex than descriptions using Turing machines. For example, a Turing machine describing an algorithm may have a few hundred states, while the equivalent deterministic finite automaton (DFA) on a given real machine has quadrillions. This makes the DFA representation infeasible to analyze.
5. Turing machines describe algorithms independent of how much memory they use. There is a limit to the memory possessed by any current machine, but this limit can rise arbitrarily in time. Turing machines allow us to make statements about algorithms which will (theoretically) hold forever, regardless of advances in conventional computing machine architecture.
6. Turing machines simplify the statement of algorithms. Algorithms running on Turing-equivalent abstract machines are usually more general than their counterparts running on real machines, because they have arbitrary-precision data types available and never have to deal with unexpected conditions (including, but not limited to, running out of memory).

One way in which Turing machines are a poor model for programs is that many real programs, such as operating systems and word processors, are written to receive unbounded input over time, and therefore do not halt. Turing machines do not model such ongoing computation well (but can still model portions of it, such as individual procedures).

Limitations of Turing machines


Computational complexity theory

A limitation of Turing machines is that they do not model the strengths of a particular arrangement well. For instance, modern stored-program computers are actually instances of a more specific form of abstract machine known as the random access stored program machine or RASP machine model. Like the universal Turing machine, the RASP stores its "program" in "memory" external to its finite-state machine's "instructions". Unlike the universal Turing machine, the RASP has an infinite number of distinguishable, numbered but unbounded "registers": memory "cells" that can contain any integer (cf. Elgot and Robinson (1964), Hartmanis (1971), and in particular Cook-Reckhow (1973); references at random access machine). The RASP's finite-state machine is equipped with the capability for indirect addressing (e.g. the contents of one register can be used as an address to specify another register); thus the RASP's "program" can address any register in the register-sequence.

The upshot of this distinction is that there are computational optimizations that can be performed based on the memory indices, which are not possible in a general Turing machine; thus when Turing machines are used as the basis for bounding running times, a 'false lower bound' can be proven on certain algorithms' running times (due to the false simplifying assumption of a Turing machine). An example of this is binary search, an algorithm that can be shown to perform more quickly when using the RASP model of computation rather than the Turing machine model.

Concurrency

Another limitation of Turing machines is that they do not model concurrency well. For example, there is a bound on the size of integer that can be computed by an always-halting nondeterministic Turing machine starting on a blank tape. (See article on unbounded nondeterminism.) By contrast, there are always-halting concurrent systems with no inputs that can compute an integer of unbounded size. (A process can be created with local storage that is initialized with a count of 0 that concurrently sends itself both a stop and a go message. When it receives a go message, it increments its count by 1 and sends itself a go message. When it receives a stop message, it stops with an unbounded number in its local storage.)
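Returning to the binary search remark under computational complexity theory above, the gap between the two models can be made concrete with a toy cost model. The Python sketch below is an illustration under stated assumptions, not a rigorous complexity argument: it assumes one list element per tape cell, charges a random-access machine one unit per probe, and charges a single-tape machine one unit per cell the head must cross to reach each probed position. The function name binary_search_costs is hypothetical.

```python
def binary_search_costs(items, target):
    """Binary search that also reports two idealized costs.

    Returns (index or None, probes, head_travel), where probes is the
    number of cells inspected (random-access cost) and head_travel is
    the total distance a tape head starting at cell 0 would move.
    """
    lo, hi = 0, len(items) - 1
    head = probes = travel = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        travel += abs(mid - head)  # a tape head must walk to the cell
        head = mid
        if items[mid] == target:
            return mid, probes, travel
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes, travel


index, probes, travel = binary_search_costs(list(range(1024)), 0)
print(index, probes, travel)
```

For this input the search inspects 10 cells, while the simulated head crosses 1022 cells, roughly the length of the list. In this toy model the number of probes grows logarithmically with the list length but the head travel grows linearly, which is the kind of difference the indirect addressing of the RASP removes.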


History
Turing machines were described in 1936 by Alan Turing.

Historical background: computational machinery


Robin Gandy (1919-1995), a student of Alan Turing (1912-1954) and his lifelong friend, traces the lineage of the notion of "calculating machine" back to Babbage (circa 1834) and actually proposes "Babbage's Thesis":

    That the whole of development and operations of analysis are now capable of being executed by machinery.
    (italics in Babbage as cited by Gandy, p. 54)

Gandy's analysis of Babbage's Analytical Engine describes the following five operations (cf. pp. 52-53):
1. The arithmetic functions $+$, $-$, $\times$, where $-$ indicates "proper" subtraction: $x - y = 0$ if $y \geq x$
2. Any sequence of operations is an operation
3. Iteration of an operation (repeating n times an operation P)
4. Conditional iteration (repeating n times an operation P conditional on the "success" of test T)
5. Conditional transfer (i.e. conditional "goto").

Gandy states that "the functions which can be calculated by (1), (2), and (4) are precisely those which are Turing computable." (p. 53). He cites other proposals for "universal calculating machines", including those of Percy Ludgate (1909), Leonardo Torres y Quevedo (1914), Maurice d'Ocagne (1922), Louis Couffignal (1933), Vannevar Bush (1936), and Howard Aiken (1937). However:

    ...the emphasis is on programming a fixed iterable sequence of arithmetical operations. The fundamental importance of conditional iteration and conditional transfer for a general theory of calculating machines is not recognized ...
    (Gandy p. 55)

The Entscheidungsproblem (the "decision problem"): Hilbert's tenth question of 1900


With regard to Hilbert's problems, posed by the famous mathematician David Hilbert in 1900, an aspect of problem #10 had been floating about for almost 30 years before it was framed precisely. Hilbert's original expression for #10 is as follows:

    10. Determination of the solvability of a Diophantine equation. Given a Diophantine equation with any number of unknown quantities and with rational integral coefficients: To devise a process according to which it can be determined in a finite number of operations whether the equation is solvable in rational integers.

    The Entscheidungsproblem [decision problem for first-order logic] is solved when we know a procedure that allows for any given logical expression to decide by finitely many operations its validity or satisfiability ... The Entscheidungsproblem must be considered the main problem of mathematical logic.
    (quoted, with this translation and the original German, in Dershowitz and Gurevich, 2008)

By 1922, this notion of "Entscheidungsproblem" had developed a bit, and H. Behmann stated that

    ...most general form of the Entscheidungsproblem [is] as follows: A quite definite generally applicable prescription is required which will allow one to decide in a finite number of steps the truth or falsity of a given purely logical assertion ...
    (Gandy p. 57, quoting Behmann)

    Behmann remarks that ... the general problem is equivalent to the problem of deciding which mathematical propositions are true.
    (ibid.)

    If one were able to solve the Entscheidungsproblem then one would have a "procedure for solving many (or even all) mathematical problems".
    (ibid., p. 92)

By the 1928 international congress of mathematicians Hilbert "made his questions quite precise. First, was mathematics complete ... Second, was mathematics consistent ... And thirdly, was mathematics decidable?" (Hodges p. 91, Hawking p. 1121). The first two questions were answered in 1930 by Kurt Gödel at the very same meeting where Hilbert delivered his retirement speech (much to the chagrin of Hilbert); the third, the Entscheidungsproblem, had to wait until the mid-1930s.

The problem was that an answer first required a precise definition of "definite general applicable prescription", which Princeton professor Alonzo Church would come to call "effective calculability", and in 1928 no such definition existed. But over the next 6-7 years Emil Post developed his definition of a worker moving from room to room writing and erasing marks per a list of instructions (Post 1936), as did Church and his two students Stephen Kleene and J. B. Rosser by use of Church's lambda-calculus and Gödel's recursion theory (1934). Church's paper (published 15 April 1936) showed that the Entscheidungsproblem was indeed "undecidable" and beat Turing to the punch by almost a year (Turing's paper submitted 28 May 1936, published January 1937). In the meantime, Emil Post submitted a brief paper in the fall of 1936, so Turing at least had priority over Post. While Church refereed Turing's paper, Turing had time to study Church's paper and add an Appendix where he sketched a proof that Church's lambda-calculus and his machines would compute the same functions.

    But what Church had done was something rather different, and in a certain sense weaker. ... the Turing construction was more direct, and provided an argument from first principles, closing the gap in Church's demonstration.
    (Hodges p. 112)

And Post had only proposed a definition of calculability and criticized Church's "definition", but had proved nothing.


Alan Turing's a- (automatic-)machine


In the spring of 1935 Turing, as a young Master's student at King's College Cambridge, UK, took on the challenge; he had been stimulated by the lectures of the logician M. H. A. Newman "and learned from them of Gödel's work and the Entscheidungsproblem ... Newman used the word 'mechanical' ... In his obituary of Turing 1955 Newman writes:

    To the question 'what is a "mechanical" process?' Turing returned the characteristic answer 'Something that can be done by a machine' and he embarked on the highly congenial task of analysing the general notion of a computing machine."
    (Gandy, p. 74)

Gandy states that:

    I suppose, but do not know, that Turing, right from the start of his work, had as his goal a proof of the undecidability of the Entscheidungsproblem. He told me that the 'main idea' of the paper came to him when he was lying in Grantchester meadows in the summer of 1935. The 'main idea' might have either been his analysis of computation or his realization that there was a universal machine, and so a diagonal argument to prove unsolvability.
    (ibid., p. 76)

While Gandy believed that Newman's statement above is "misleading", this opinion is not shared by all. Turing had a lifelong interest in machines: "Alan had dreamt of inventing typewriters as a boy; [his mother] Mrs. Turing had a typewriter; and he could well have begun by asking himself what was meant by calling a typewriter 'mechanical'" (Hodges p. 96). While at Princeton pursuing his PhD, Turing built a Boolean-logic multiplier (see below). His PhD thesis, titled "Systems of Logic Based on Ordinals", contains the following definition of "a computable function":

    It was stated above that 'a function is effectively calculable if its values can be found by some purely mechanical process'. We may take this statement literally, understanding by a purely mechanical process one which could be carried out by a machine. It is possible to give a mathematical description, in a certain normal form, of the structures of these machines. The development of these ideas leads to the author's definition of a computable function, and to an identification of computability with effective calculability. It is not difficult, though somewhat laborious, to prove that these three definitions [the 3rd is the λ-calculus] are equivalent.
    (Turing (1939) in The Undecidable, p. 160)

When Turing returned to the UK he ultimately became jointly responsible for breaking the German secret codes created by encryption machines called "The Enigma"; he also became involved in the design of the ACE (Automatic Computing Engine): "[Turing's] ACE proposal was effectively self-contained, and its roots lay not in the EDVAC [the USA's initiative], but in his own universal machine" (Hodges p. 318). Arguments still continue concerning the origin and nature of what has been named by Kleene (1952) Turing's Thesis. But what Turing did prove with his computational-machine model appears in his paper On Computable Numbers, With an Application to the Entscheidungsproblem (1937):

    [that] the Hilbert Entscheidungsproblem can have no solution ... I propose, therefore to show that there can be no general process for determining whether a given formula U of the functional calculus K is provable, i.e. that there can be no machine which, supplied with any one U of these formulae, will eventually say whether U is provable.
    (from Turing's paper as reprinted in The Undecidable, p. 145)

Turing's example (his second proof): If one is to ask for a general procedure to tell us: "Does this machine ever print 0", the question is "undecidable".


1937-1970: The "digital computer", the birth of "computer science"


In 1937, while at Princeton working on his PhD thesis, Turing built a digital (Boolean-logic) multiplier from scratch, making his own electromechanical relays (Hodges p. 138). "Alan's task was to embody the logical design of a Turing machine in a network of relay-operated switches ..." (Hodges p. 138). While Turing might have been just initially curious and experimenting, quite earnest work in the same direction was going on in Germany (Konrad Zuse (1938)) and in the United States (Howard Aiken and George Stibitz (1937)); the fruits of their labors were used by the Axis and Allied military in World War II (cf. Hodges pp. 298-299).

In the early to mid-1950s Hao Wang and Marvin Minsky reduced the Turing machine to a simpler form (a precursor to the Post-Turing machine of Martin Davis); simultaneously European researchers were reducing the new-fangled electronic computer to a computer-like theoretical object equivalent to what was now being called a "Turing machine". In the late 1950s and early 1960s, the coincidentally parallel developments of Melzak and Lambek (1961), Minsky (1961), and Shepherdson and Sturgis (1961) carried the European work further and reduced the Turing machine to a more friendly, computer-like abstract model called the counter machine; Elgot and Robinson (1964), Hartmanis (1971), and Cook and Reckhow (1973) carried this work even further with the register machine and random access machine models, but basically all are just multi-tape Turing machines with an arithmetic-like instruction set.


1970-present: the Turing machine as a model of computation


Today the counter, register and random-access machines and their sire the Turing machine continue to be the models of choice for theorists investigating questions in the theory of computation. In particular, computational complexity theory makes use of the Turing machine:

    Depending on the objects one likes to manipulate in the computations (numbers like nonnegative integers or alphanumeric strings), two models have obtained a dominant position in machine-based complexity theory: the off-line multitape Turing machine..., which represents the standard model for string-oriented computation, and the random access machine (RAM) as introduced by Cook and Reckhow ..., which models the idealized Von Neumann style computer.
    (van Emde Boas 1990:4)

    Only in the related area of analysis of algorithms this role is taken over by the RAM model.
    (van Emde Boas 1990:16)

Notes
[1] The idea came to him in mid-1935 (perhaps, see more in the History section) after a question posed by M. H. A. Newman in his lectures: "Was there a definite method, or as Newman put it, a mechanical process which could be applied to a mathematical statement, and which would come up with the answer as to whether it was provable" (Hodges 1983:93). Turing submitted his paper on 31 May 1936 to the London Mathematical Society for its Proceedings (cf. Hodges 1983:112), but it was published in early 1937 and offprints were available in February 1937 (cf. Hodges 1983:129).
[2] See the definition of "innings" on Wiktionary

References
Primary literature, reprints, and compilations
B. Jack Copeland ed. (2004), The Essential Turing: Seminal Writings in Computing, Logic, Philosophy, Artificial Intelligence, and Artificial Life plus The Secrets of Enigma, Clarendon Press (Oxford University Press), Oxford UK, ISBN 0-19-825079-7. Contains the Turing papers plus a draft letter to Emil Post re his criticism of "Turing's convention", and Donald W. Davies' Corrections to Turing's Universal Computing Machine.
Martin Davis (ed.) (1965), The Undecidable, Raven Press, Hewlett, NY.
Emil Post (1936), "Finite Combinatory Processes: Formulation 1", Journal of Symbolic Logic, 1, 103–105, 1936. Reprinted in The Undecidable pp. 289ff.
Emil Post (1947), "Recursive Unsolvability of a Problem of Thue", Journal of Symbolic Logic, vol. 12, pp. 1–11. Reprinted in The Undecidable pp. 293ff. In the Appendix of this paper Post comments on and gives corrections to Turing's paper of 1936–1937. In particular see footnote 11 with corrections to the universal computing machine coding and footnote 14 with comments on Turing's first and second proofs.
Turing, A.M. (1936). "On Computable Numbers, with an Application to the Entscheidungsproblem". Proceedings of the London Mathematical Society. 2 42: 230–65. 1937. doi:10.1112/plms/s2-42.1.230. (and Turing, A.M. (1938). "On Computable Numbers, with an Application to the Entscheidungsproblem: A correction". Proceedings of the London Mathematical Society. 2 43 (6): 544–6. 1937. doi:10.1112/plms/s2-43.6.544.). Reprinted in many collections, e.g. in The Undecidable pp. 115–154; available on the web in many places, e.g. at Scribd (http://www.scribd.com/doc/2937039/Alan-M-Turing-On-Computable-Numbers).
Alan Turing, 1948, "Intelligent Machinery." Reprinted in "Cybernetics: Key Papers." Ed. C.R. Evans and A.D.J. Robertson. Baltimore: University Park Press, 1968. p. 31.
F. C. Hennie and R. E. Stearns. Two-tape simulation of multitape Turing machines. JACM, 13(4):533–546, 1966.


Computability theory
Boolos, George; Richard Jeffrey (1989, 1999). Computability and Logic (3rd ed.). Cambridge UK: Cambridge University Press. ISBN 0-521-20402-X.
Boolos, George; John Burgess, Richard Jeffrey (2002). Computability and Logic (4th ed.). Cambridge UK: Cambridge University Press. ISBN 0-521-00758-5 (pb.). Some parts have been significantly rewritten by Burgess. Presentation of Turing machines in context of Lambek "abacus machines" (cf. Register machine) and recursive functions, showing their equivalence.
Taylor L. Booth (1967), Sequential Machines and Automata Theory, John Wiley and Sons, Inc., New York. Graduate level engineering text; ranges over a wide variety of topics, Chapter IX Turing Machines includes some recursion theory.
Martin Davis (1958). Computability and Unsolvability. McGraw-Hill Book Company, Inc, New York. On pages 12–20 he gives examples of 5-tuple tables for Addition, The Successor Function, Subtraction (x ≥ y), Proper Subtraction (0 if x < y), The Identity Function and various identity functions, and Multiplication.
Davis, Martin; Ron Sigal, Elaine J. Weyuker (1994). Computability, Complexity, and Languages and Logic: Fundamentals of Theoretical Computer Science (2nd ed.). San Diego: Academic Press, Harcourt, Brace & Company. ISBN 0-12-206382-1.
Hennie, Fredrick (1977). Introduction to Computability. Addison-Wesley, Reading, Mass. On pages 90–103 Hennie discusses the UTM with examples and flow-charts, but no actual 'code'.
John Hopcroft and Jeffrey Ullman (1979). Introduction to Automata Theory, Languages and Computation (1st ed.). Addison-Wesley, Reading Mass. ISBN 0-201-02988-X. A difficult book. Centered around the issues of machine-interpretation of "languages", NP-completeness, etc.
Hopcroft, John E.; Rajeev Motwani, Jeffrey D. Ullman (2001). Introduction to Automata Theory, Languages, and Computation (2nd ed.). Reading Mass: Addison-Wesley. ISBN 0-201-44124-1. Distinctly different and less intimidating than the first edition.
Stephen Kleene (1952), Introduction to Metamathematics, North-Holland Publishing Company, Amsterdam Netherlands, 10th impression (with corrections of 6th reprint 1971). Graduate level text; most of Chapter XIII Computable functions is on Turing machine proofs of computability of recursive functions, etc.
Knuth, Donald E. (1973). Volume 1/Fundamental Algorithms: The Art of Computer Programming (2nd ed.). Reading, Mass.: Addison-Wesley Publishing Company. With reference to the role of Turing machines in the development of computation (both hardware and software) see 1.4.5 History and Bibliography pp. 225ff and 2.6 History and Bibliography pp. 456ff.
Zohar Manna, 1974, Mathematical Theory of Computation. Reprinted, Dover, 2003. ISBN 978-0-486-43238-0.
Marvin Minsky, Computation: Finite and Infinite Machines, Prentice-Hall, Inc., N.J., 1967. See Chapter 8, Section 8.2 "Unsolvability of the Halting Problem." Excellent, i.e. relatively readable, sometimes funny.
Christos Papadimitriou (1993). Computational Complexity (1st ed.). Addison Wesley. ISBN 0-201-53082-1. Chapter 2: Turing machines, pp. 19–56.
Michael Sipser (1997). Introduction to the Theory of Computation. PWS Publishing. ISBN 0-534-94728-X. Chapter 3: The Church–Turing Thesis, pp. 125–149.
Stone, Harold S. (1972). Introduction to Computer Organization and Data Structures (1st ed.). New York: McGraw-Hill Book Company. ISBN 0-07-061726-0.
Peter van Emde Boas 1990, Machine Models and Simulations, pp. 3–66, in Jan van Leeuwen, ed., Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, The MIT Press/Elsevier, [place?], ISBN 0-444-88071-2 (Volume A). QA76.H279 1990. Valuable survey, with 141 references.


Church's thesis
Nachum Dershowitz; Yuri Gurevich (September 2008). "A natural axiomatization of computability and proof of Church's Thesis" (http://research.microsoft.com/en-us/um/people/gurevich/Opera/188.pdf). Bulletin of Symbolic Logic 14 (3). Retrieved 2008-10-15.
Roger Penrose (1989, 1990). The Emperor's New Mind (2nd ed.). Oxford University Press, New York. ISBN 0-19-851973-7.

Small Turing machines


Rogozhin, Yurii, 1998, "A Universal Turing Machine with 22 States and 2 Symbols (http://web.archive.org/web/20050308141040/http://www.imt.ro/Romjist/Volum1/Vol1_3/turing.htm)", Romanian Journal Of Information Science and Technology, 1(3), 259–265, 1998. (surveys known results about small universal Turing machines)
Stephen Wolfram, 2002, A New Kind of Science (http://www.wolframscience.com/nksonline/page-707), Wolfram Media, ISBN 1-57955-008-8.
Brumfiel, Geoff, Student snags maths prize (http://www.nature.com/news/2007/071024/full/news.2007.190.html), Nature, October 24, 2007.
Jim Giles (2007), Simplest 'universal computer' wins student $25,000 (http://technology.newscientist.com/article/dn12826-simplest-universal-computer-wins-student-25000.html), New Scientist, October 24, 2007.
Alex Smith, Universality of Wolfram's 2, 3 Turing Machine (http://www.wolframscience.com/prizes/tm23/TM23Proof.pdf), Submission for the Wolfram 2, 3 Turing Machine Research Prize.
Vaughan Pratt, 2007, "Simple Turing machines, Universality, Encodings, etc. (http://cs.nyu.edu/pipermail/fom/2007-October/012156.html)", FOM email list. October 29, 2007.
Martin Davis, 2007, "Smallest universal machine (http://cs.nyu.edu/pipermail/fom/2007-October/012132.html)", and Definition of universal Turing machine (http://cs.nyu.edu/pipermail/fom/2007-October/012145.html), FOM email list. October 26–27, 2007.
Alasdair Urquhart, 2007, "Smallest universal machine (http://cs.nyu.edu/pipermail/fom/2007-October/012140.html)", FOM email list. October 26, 2007.
Hector Zenil (Wolfram Research), 2007, "smallest universal machine (http://cs.nyu.edu/pipermail/fom/2007-October/012163.html)", FOM email list. October 29, 2007.
Todd Rowland, 2007, "Confusion on FOM (http://forum.wolframscience.com/showthread.php?s=&threadid=1472)", Wolfram Science message board, October 30, 2007.

Other
Martin Davis (2000). Engines of Logic: Mathematicians and the origin of the Computer (1st ed.). W. W. Norton & Company, New York. ISBN 0-393-32229-7 pbk.
Robin Gandy, "The Confluence of Ideas in 1936", pp. 51–102 in Rolf Herken, see below.
Stephen Hawking (editor), 2005, God Created the Integers: The Mathematical Breakthroughs that Changed History, Running Press, Philadelphia, ISBN 978-0-7624-1922-7. Includes Turing's 1936–1937 paper, with brief commentary and biography of Turing as written by Hawking.
Rolf Herken (1995). The Universal Turing Machine: A Half-Century Survey. Springer Verlag. ISBN 3-211-82637-8.
Andrew Hodges, Alan Turing: The Enigma, Simon and Schuster, New York. Cf. Chapter "The Spirit of Truth" for a history leading to, and a discussion of, his proof.
Ivars Peterson (1988). The Mathematical Tourist: Snapshots of Modern Mathematics (1st ed.). W. H. Freeman and Company, New York. ISBN 0-7167-2064-7 (pbk.).

Paul Strathern (1997). Turing and the Computer: The Big Idea. Anchor Books/Doubleday. ISBN 0-385-49243-X.
Hao Wang, "A variant to Turing's theory of computing machines", Journal of the Association for Computing Machinery (JACM) 4, 63–92 (1957).
Charles Petzold, The Annotated Turing (http://www.theannotatedturing.com/), John Wiley & Sons, Inc., ISBN 0-470-22905-5.
Arora, Sanjeev; Barak, Boaz, "Complexity Theory: A Modern Approach" (http://www.cs.princeton.edu/theory/complexity/), Cambridge University Press, 2009, ISBN 978-0-521-42426-4, section 1.4, "Machines as strings and the universal Turing machine" and 1.7, "Proof of theorem 1.9".
Isaiah Pinchas Kantorovitz, "A note on turing machine computability of rule driven systems", ACM SIGACT News, December 2005.

External links
Turing Machine on Stanford Encyclopedia of Philosophy (http://plato.stanford.edu/entries/turing-machine/)
Detailed info on the Church–Turing Hypothesis (http://plato.stanford.edu/entries/church-turing/) (Stanford Encyclopedia of Philosophy)
Turing Machine-Like Models (http://www.weizmann.ac.il/mathusers/lbn/new_pages/Research_Turing.html) in Molecular Biology, to understand life mechanisms with a DNA-tape processor.
The Turing machine (http://www.SaschaSeidel.de/html/programmierung/download_The_Turing_machine.php): Summary about the Turing machine, its functionality and historical facts
The Wolfram 2,3 Turing Machine Research Prize (http://www.wolframscience.com/prizes/tm23/): Stephen Wolfram's $25,000 prize for the proof or disproof of the universality of the potentially smallest universal Turing Machine. The contest has ended, with the proof affirming the machine's universality.
"Turing Machine Causal Networks (http://demonstrations.wolfram.com/TuringMachineCausalNetworks/)" by Enrique Zeleny, Wolfram Demonstrations Project.
Turing Machines (http://www.dmoz.org/Computers/Computer_Science/Theoretical/Automata_Theory/Turing_Machines/) at the Open Directory Project
Purely mechanical Turing Machine (http://www.turing2012.fr/?p=530&lang=en)

Binary number

In mathematics and computer science, the binary numeral system, or base-2 numeral system, represents numeric values using two symbols: 0 and 1. More specifically, the usual base-2 system is a positional notation with a radix of 2. Numbers represented in this system are commonly called binary numbers. Because of its straightforward implementation in digital electronic circuitry using logic gates, the binary system is used internally by almost all modern computers and computer-based devices such as mobile phones.

History
The Indian scholar Pingala (around 5th–2nd centuries BC) developed mathematical concepts for describing prosody, and in doing so presented the first known description of a binary numeral system.[1][2] He used binary numbers in the form of short and long syllables (the latter equal in length to two short syllables), making it similar to Morse code.[3][4] Pingala's Hindu classic titled Chandaḥśāstra (8.23) describes the formation of a matrix in order to give a unique value to each meter. An example of such a matrix is as follows (note that these binary representations are "backwards" compared to modern, Western positional notation):[5][6]

0 0 0 0    numerical value 1
1 0 0 0    numerical value 2
0 1 0 0    numerical value 3
1 1 0 0    numerical value 4

A set of eight trigrams (Bagua) and a set of 64 hexagrams ("sixty-four" gua), analogous to the three-bit and six-bit binary numerals, were in usage at least as early as the Zhou Dynasty of ancient China through the classic text Yijing.

Daoist Bagua

In the 11th century, scholar and philosopher Shao Yong developed a method for arranging the hexagrams which corresponds, albeit unintentionally, to the sequence 0 to 63, as represented in binary, with yin as 0, yang as 1 and the least significant bit on top. The ordering is also the lexicographical order on sextuples of elements chosen from a two-element set.[7] Similar sets of binary combinations have also been used in traditional African divination systems such as Ifá as well as in medieval Western geomancy. The base-2 system utilized in geomancy had long been widely applied in sub-Saharan Africa.

Tibetan Buddhist "Mystic Tablet"


In 1605 Francis Bacon discussed a system whereby letters of the alphabet could be reduced to sequences of binary digits, which could then be encoded as scarcely visible variations in the font in any random text.[8] Importantly for the general theory of binary encoding, he added that this method could be used with any objects at all: "provided those objects be capable of a twofold difference only; as by Bells, by Trumpets, by Lights and Torches, by the report of Muskets, and any instruments of like nature".[8] (See Bacon's cipher.) The modern binary number system was studied by Gottfried Leibniz in 1679. See his article Explication de l'Arithmétique Binaire[9] (1703). Leibniz's system uses 0 and 1, like the modern binary numeral system. As a Sinophile, Leibniz was aware of the Yijing (or I-Ching) and noted with fascination how its hexagrams correspond to the binary numbers from 0 to 111111, and concluded that this mapping was evidence of major Chinese accomplishments in the sort of philosophical mathematics he admired.[10]

Gottfried Leibniz

In 1854, British mathematician George Boole published a landmark paper detailing an algebraic system of logic that would become known as Boolean algebra. His logical calculus was to become instrumental in the design of digital electronic circuitry.[11] In 1937, Claude Shannon produced his master's thesis at MIT that implemented Boolean algebra and binary arithmetic using electronic relays and switches for the first time in history. Entitled A Symbolic Analysis of Relay and Switching Circuits, Shannon's thesis essentially founded practical digital circuit design.[12] In November 1937, George Stibitz, then working at Bell Labs, completed a relay-based computer he dubbed the "Model K" (for "Kitchen", where he had assembled it), which calculated using binary addition.[13] Bell Labs thus authorized a full research programme in late 1938 with Stibitz at the helm. Their Complex Number Computer, completed 8 January 1940, was able to perform calculations with complex numbers. In a demonstration to the American Mathematical Society conference at Dartmouth College on 11 September 1940, Stibitz was able to send the Complex Number Calculator remote commands over telephone lines by a teletype. It was the first computing machine ever used remotely over a phone line. Among the conference participants who witnessed the demonstration were John von Neumann, John Mauchly and Norbert Wiener, who wrote about it in his memoirs.[14][15][16]

Representation
Any number can be represented by any sequence of bits (binary digits), which in turn may be represented by any mechanism capable of being in two mutually exclusive states. Each of the following sequences of symbols could be interpreted as the binary numeric value of 667:

1 0 1 0 0 1 1 0 1 1
| - | - - | | - | |
x o x o o x x o x x
y n y n n y y n y y


The numeric value represented in each case is dependent upon the value assigned to each symbol. In a computer, the numeric values may be represented by two different voltages; on a magnetic disk, magnetic polarities may be used. A "positive", "yes", or "on" state is not necessarily equivalent to the numerical value of one; it depends on the architecture in use. In keeping with customary representation of numerals using Arabic numerals, binary numbers are commonly written using the symbols 0 and 1. When written, binary numerals are often subscripted, prefixed or suffixed in order to indicate their base, or radix. The following notations are equivalent:

100101 binary (explicit statement of format)
100101b (a suffix indicating binary format)
100101B (a suffix indicating binary format)
bin 100101 (a prefix indicating binary format)
100101₂ (a subscript indicating base-2 (binary) notation)
%100101 (a prefix indicating binary format)
0b100101 (a prefix indicating binary format, common in programming languages)
6b100101 (a prefix indicating number of bits in binary format, common in programming languages)

When spoken, binary numerals are usually read digit-by-digit, in order to distinguish them from decimal numerals. For example, the binary numeral 100 is pronounced "one zero zero", rather than "one hundred", to make its binary nature explicit, and for purposes of correctness. Since the binary numeral 100 represents the value four, it would be confusing to refer to the numeral as "one hundred" (a word that represents a completely different value, or amount). Alternatively, the binary numeral 100 can be read out as "four" (the correct value), but this does not make its binary nature explicit.
A binary clock might use LEDs to express binary values. In this clock, each column of LEDs shows a binary-coded decimal numeral of the traditional sexagesimal time.

Counting in binary
Decimal pattern (hex value)    Binary numbers
0                              0
1                              1
2                              10
3                              11
4                              100
5                              101
6                              110
7                              111
8                              1000
9                              1001
10 (A)                         1010
11 (B)                         1011
12 (C)                         1100
13 (D)                         1101
14 (E)                         1110
15 (F)                         1111
16 (10)                        10000

Counting in binary is similar to counting in any other number system. Beginning with a single digit, counting proceeds through each symbol, in increasing order. Decimal counting uses the symbols 0 through 9, while binary only uses the symbols 0 and 1. When the symbols for the first digit are exhausted, the next-higher digit (to the left) is incremented, and counting starts over at 0. In decimal, counting proceeds like so:

000, 001, 002, ... 007, 008, 009, (rightmost digit starts over, and next digit is incremented)
010, 011, 012, ...
...
090, 091, 092, ... 097, 098, 099, (rightmost two digits start over, and next digit is incremented)
100, 101, 102, ...

After a digit reaches 9, an increment resets it to 0 but also causes an increment of the next digit to the left. In binary, counting is the same except that only the two symbols 0 and 1 are used. Thus after a digit reaches 1 in binary, an increment resets it to 0 but also causes an increment of the next digit to the left:

0000,
0001, (rightmost digit starts over, and next digit is incremented)
0010, 0011, (rightmost two digits start over, and next digit is incremented)
0100, 0101, 0110, 0111, (rightmost three digits start over, and the next digit is incremented)
1000, 1001, ...

Since binary is a base-2 system, each digit represents an increasing power of 2, with the rightmost digit representing $2^0$, the next representing $2^1$, then $2^2$, and so on. To determine the decimal representation of a binary number simply take the sum of the products of the binary digits and the powers of 2 which they represent. For example, the binary number 100101 is converted to decimal form by:

$[(1) \times 2^5] + [(0) \times 2^4] + [(0) \times 2^3] + [(1) \times 2^2] + [(0) \times 2^1] + [(1) \times 2^0] = [1 \times 32] + [0 \times 16] + [0 \times 8] + [1 \times 4] + [0 \times 2] + [1 \times 1] = 37$

To create higher numbers, additional digits are simply added to the left side of the binary representation.
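The counting rule and the sum-of-powers conversion described above can be sketched in a few lines of Python. This is only an illustration; the function names `increment` and `binary_to_decimal` are chosen here and do not come from the article.

```python
def increment(bits):
    """Add one to a binary numeral given as a string of '0'/'1' characters."""
    digits = list(bits)
    i = len(digits) - 1
    # Every trailing 1 is reset to 0, and the increment moves one place left.
    while i >= 0 and digits[i] == "1":
        digits[i] = "0"
        i -= 1
    if i >= 0:
        digits[i] = "1"
    else:
        digits.insert(0, "1")  # all digits were 1: a new digit is added on the left
    return "".join(digits)


def binary_to_decimal(bits):
    """Sum digit * 2**position, where position 0 is the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total


print(increment("0111"))            # 1000
print(binary_to_decimal("100101"))  # 37
```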

Fractions in binary
Fractions in binary only terminate if the denominator has 2 as the only prime factor. As a result, 1/10 does not have a finite binary representation, and this is why adding 0.1 ten times does not give a result precisely equal to 1 in floating point arithmetic. As an example, to interpret the binary expression for 1/3 = .010101..., this means:

$1/3 = 0 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3} + 1 \times 2^{-4} + \dots = 0.3125 + \dots$

An exact value cannot be found with a sum of a finite number of inverse powers of two, and zeros and ones alternate forever.


Fraction   Decimal                    Binary                            Fractional approximation
1/1        1 or 0.999...              1 or 0.111...                     1/2 + 1/4 + 1/8 . . .
1/2        0.5 or 0.4999...           0.1 or 0.0111...                  1/4 + 1/8 + 1/16 . . .
1/3        0.333...                   0.010101...                       1/4 + 1/16 + 1/64 . . .
1/4        0.25 or 0.24999...         0.01 or 0.00111...                1/8 + 1/16 + 1/32 . . .
1/5        0.2 or 0.1999...           0.00110011...                     1/8 + 1/16 + 1/128 . . .
1/6        0.1666...                  0.0010101...                      1/8 + 1/32 + 1/128 . . .
1/7        0.142857142857...          0.001001...                       1/8 + 1/64 + 1/512 . . .
1/8        0.125 or 0.124999...       0.001 or 0.000111...              1/16 + 1/32 + 1/64 . . .
1/9        0.111...                   0.000111000111...                 1/16 + 1/32 + 1/64 . . .
1/10       0.1 or 0.0999...           0.000110011...                    1/16 + 1/32 + 1/256 . . .
1/11       0.090909...                0.00010111010001011101...         1/16 + 1/64 + 1/128 . . .
1/12       0.08333...                 0.00010101...                     1/16 + 1/64 + 1/256 . . .
1/13       0.076923076923...          0.000100111011000100111011...     1/16 + 1/128 + 1/256 . . .
1/14       0.0714285714285...         0.0001001001...                   1/16 + 1/128 + 1/1024 . . .
1/15       0.0666...                  0.00010001...                     1/16 + 1/256 . . .
1/16       0.0625 or 0.0624999...     0.0001 or 0.0000111...            1/32 + 1/64 + 1/128 . . .
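The termination rule for binary fractions can be checked mechanically. The following Python sketch is an illustration only: the helper names are hypothetical, and the floating-point part assumes the usual IEEE 754 double-precision `float`. It tests whether a reduced fraction has a power-of-two denominator, and shows the rounding effect of repeatedly adding 0.1.

```python
from fractions import Fraction


def terminates_in_binary(value):
    """True if the fraction has a finite binary expansion.

    Fraction keeps its value in lowest terms, so the expansion is finite
    exactly when the denominator is a power of two.
    """
    d = value.denominator
    return (d & (d - 1)) == 0  # a power of two has a single 1 bit


def sum_of_tenths():
    """Add 0.1 ten times with ordinary float additions."""
    total = 0.0
    for _ in range(10):
        total += 0.1
    return total


print(terminates_in_binary(Fraction(1, 16)))  # True
print(terminates_in_binary(Fraction(1, 10)))  # False
print(sum_of_tenths() == 1.0)                 # False
```

An explicit loop is used for the sum because some library summation routines apply error compensation that can hide the effect.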

Binary arithmetic
Arithmetic in binary is much like arithmetic in other numeral systems. Addition, subtraction, multiplication, and division can be performed on binary numerals.

Addition
The simplest arithmetic operation in binary is addition. Adding two single-digit binary numbers is relatively simple, using a form of carrying:

0 + 0 → 0
0 + 1 → 1
1 + 0 → 1
1 + 1 → 0, carry 1 (since 1 + 1 = 0 + 1 × 10₂)

The circuit diagram for a binary half adder, which adds two bits together, producing sum and carry bits.

Adding two "1" digits produces a digit "0", while 1 will have to be added to the next column. This is similar to what happens in decimal when certain single-digit numbers are added together; if the result equals or exceeds the value of the radix (10), the digit to the left is incremented:

5 + 5 → 0, carry 1 (since 5 + 5 = 10)
7 + 9 → 6, carry 1 (since 7 + 9 = 16)

This is known as carrying. When the result of an addition exceeds the value of a digit, the procedure is to "carry" the excess amount divided by the radix (that is, 10/10) to the left, adding it to the next positional value. This is correct since the next position has a weight that is higher by a factor equal to the radix. Carrying works the same way in binary:

  1 1 1 1 1      (carried digits)
    0 1 1 0 1
+   1 0 1 1 1
-------------
= 1 0 0 1 0 0 = 36

In this example, two numerals are being added together: 01101₂ (13₁₀) and 10111₂ (23₁₀). The top row shows the carry bits used. Starting in the rightmost column, 1 + 1 = 10₂. The 1 is carried to the left, and the 0 is written at the bottom of the rightmost column. The second column from the right is added: 1 + 0 + 1 = 10₂ again; the 1 is carried, and 0 is written at the bottom. The third column: 1 + 1 + 1 = 11₂. This time, a 1 is carried, and a 1 is written in the bottom row. Proceeding like this gives the final answer 100100₂ (36 decimal). When computers must add two numbers, the rule that x xor y = (x + y) mod 2 for any two bits x and y allows for very fast calculation, as well. A simplification for many binary addition problems is the Long Carry Method or Brookhouse Method of Binary Addition. This method is generally useful in any binary addition where one of the numbers has a long string of 1 digits. For example the following large binary numbers can be added in two simple steps without multiple carries from one place to the next.
Traditional carry method:

  1 1 1   1 1 1 1 1      (carried digits)
    1 1 1 0 1 1 1 1 1 0
+   1 0 1 0 1 1 0 0 1 1
-----------------------
= 1 1 0 0 1 1 1 0 0 0 1

Versus the Long Carry Method:

    1 1 1 0 1 1 1 1 1 0    (add crossed out digits first)
+   1 0 1 0 1 1 0 0 1 1
+ 1 0 0 0 1 0 0 0 0 0 0    (= sum of crossed out digits; now add remaining digits)
-----------------------
= 1 1 0 0 1 1 1 0 0 0 1

In this example, two numerals are being added together: 1 1 1 0 1 1 1 1 1 0₂ (958₁₀) and 1 0 1 0 1 1 0 0 1 1₂ (691₁₀). The top row of the traditional method shows the carry bits used. Instead of the standard carry from one column to the next, the lowest place-valued "1" of a string of 1 digits, together with a "1" in the corresponding place value beneath it, may be added and a "1" carried to one digit past the end of the string. These digits must be crossed off since they are already added. Then simply add that result to the uncanceled digits in the second row. Proceeding like this gives the final answer 1 1 0 0 1 1 1 0 0 0 1₂ (1649₁₀).

Addition table

+    0    1
0    0    1
1    1   10

The binary addition table is similar, but not the same, as the truth table of the logical disjunction operation ∨. The difference is that 1 ∨ 1 = 1, while 1 + 1 = 10.
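Column-by-column addition with carrying can be written directly from the rules above. The Python sketch below is illustrative only (the name `add_binary` is chosen here); it forms each sum bit with xor, following the rule x xor y = (x + y) mod 2, and computes the carry with and/or.

```python
def add_binary(a, b):
    """Add two binary numerals given as strings, one column at a time."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter numeral with zeros
    carry = 0
    columns = []
    for x, y in zip(reversed(a), reversed(b)):  # rightmost column first
        x, y = int(x), int(y)
        columns.append(str(x ^ y ^ carry))       # sum bit: (x + y + carry) mod 2
        carry = (x & y) | (carry & (x ^ y))      # 1 when at least two inputs are 1
    if carry:
        columns.append("1")                      # final carry becomes a new digit
    return "".join(reversed(columns))


print(add_binary("01101", "10111"))            # 100100
print(add_binary("1110111110", "1010110011"))  # 11001110001
```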


Subtraction
Subtraction works in much the same way:

0 − 0 → 0
0 − 1 → 1, borrow 1
1 − 0 → 1
1 − 1 → 0

Subtracting a "1" digit from a "0" digit produces the digit "1", while 1 will have to be subtracted from the next column. This is known as borrowing. The principle is the same as for carrying. When the result of a subtraction is less than 0, the least possible value of a digit, the procedure is to "borrow" the deficit divided by the radix (that is, 10/10) from the left, subtracting it from the next positional value.

    *   * * *      (starred columns are borrowed from)
  1 1 0 1 1 1 0
−     1 0 1 1 1
---------------
= 1 0 1 0 1 1 1

Subtracting a positive number is equivalent to adding a negative number of equal absolute value; computers typically use two's complement notation to represent negative values. This notation eliminates the need for a separate "subtract" operation. Using two's complement notation, subtraction can be summarized by the following formula:

A − B = A + not B + 1

For further details, see two's complement.
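The formula A − B = A + not B + 1 only works within a fixed word size, which the text leaves implicit. The Python sketch below assumes an 8-bit word by default; the function name and the `bits` parameter are illustrative choices, not part of the article.

```python
def subtract_twos_complement(a, b, bits=8):
    """Compute a - b as a + (not b) + 1, keeping only the low `bits` bits."""
    mask = (1 << bits) - 1
    not_b = ~b & mask                # bitwise complement within the word
    return (a + not_b + 1) & mask    # the carry out of the word is discarded


# The worked example above: 1101110 - 10111 = 1010111 (110 - 23 = 87).
print(format(subtract_twos_complement(0b1101110, 0b10111), "b"))  # 1010111

# A negative result appears as its two's complement: 3 - 5 gives 11111110.
print(format(subtract_twos_complement(3, 5), "08b"))              # 11111110
```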

Multiplication
Multiplication in binary is similar to its decimal counterpart. Two numbers A and B can be multiplied by partial products: for each digit in B, the product of that digit in A is calculated and written on a new line, shifted leftward so that its rightmost digit lines up with the digit in B that was used. The sum of all these partial products gives the final result. Since there are only two digits in binary, there are only two possible outcomes of each partial multiplication:

If the digit in B is 0, the partial product is also 0
If the digit in B is 1, the partial product is equal to A

For example, the binary numbers 1011 and 1010 are multiplied as follows:

        1 0 1 1      (A)
      × 1 0 1 0      (B)
        -------
        0 0 0 0      (corresponds to a zero in B)
+     1 0 1 1        (corresponds to a one in B)
+   0 0 0 0
+ 1 0 1 1
---------------
= 1 1 0 1 1 1 0

Binary numbers can also be multiplied with bits after a binary point:

        1 0 1.1 0 1      (A) (5.625 in decimal)
      × 1 1 0.0 1        (B) (6.25 in decimal)
        -----------
            1.0 1 1 0 1  (corresponds to a one in B)
+         0 0.0 0 0 0    (corresponds to a zero in B)
+       0 0 0.0 0 0
+     1 0 1 1.0 1
+   1 0 1 1 0.1
-----------------------
= 1 0 0 0 1 1.0 0 1 0 1  (35.15625 in decimal)

See also Booth's multiplication algorithm.

Multiplication table

×    0    1
0    0    0
1    0    1

The binary multiplication table is the same as the truth table of the logical conjunction operation ∧.
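The partial-product procedure amounts to adding a shifted copy of A for every 1 digit in B. A minimal Python sketch follows; the name `multiply_binary` is illustrative, and Python's built-in integers stand in for the column additions.

```python
def multiply_binary(a, b):
    """Multiply two binary numerals (strings) by summing partial products."""
    a_value = int(a, 2)
    total = 0
    for shift, digit in enumerate(reversed(b)):  # rightmost digit of B first
        if digit == "1":
            total += a_value << shift  # partial product: A shifted left
        # a 0 digit contributes a partial product of 0
    return format(total, "b")


print(multiply_binary("1011", "1010"))     # 1101110
print(multiply_binary("101101", "11001"))  # 10001100101
```

The second call reproduces the fractional example with the binary points removed: 101.101 has three fractional bits and 110.01 has two, so the binary point belongs five places from the right of the product, giving 100011.00101.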

Division
Binary division is again similar to its decimal counterpart. Here, the divisor is 101₂, or 5 decimal, while the dividend is 11011₂, or 27 decimal. The procedure is the same as that of decimal long division; here, the divisor 101₂ goes into the first three digits 110₂ of the dividend one time, so a "1" is written on the top line. This result is multiplied by the divisor, and subtracted from the first three digits of the dividend; the next digit (a "1") is included to obtain a new three-digit sequence:

              1
          ___________
1 0 1   ) 1 1 0 1 1
        − 1 0 1
          -----
            0 1 1

The procedure is then repeated with the new sequence, continuing until the digits in the dividend have been exhausted:

              1 0 1
          ___________
1 0 1   ) 1 1 0 1 1
        − 1 0 1
          -----
            0 1 1
          − 0 0 0
            -----
              1 1 1
            − 1 0 1
              -----
                1 0

Thus, the quotient of 11011₂ divided by 101₂ is 101₂, as shown on the top line, while the remainder, shown on the bottom line, is 10₂. In decimal, 27 divided by 5 is 5, with a remainder of 2.
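Long division in binary needs only a comparison and a subtraction at each step, because each quotient digit is either 0 or 1. The following Python sketch is illustrative (the function name is chosen here); it brings down one dividend digit at a time, as in the layout above.

```python
def divide_binary(dividend, divisor):
    """Binary long division on numeral strings; returns (quotient, remainder)."""
    d = int(divisor, 2)
    if d == 0:
        raise ZeroDivisionError("division by zero")
    remainder = 0
    quotient_digits = []
    for digit in dividend:
        remainder = (remainder << 1) | int(digit)  # bring down the next digit
        if remainder >= d:
            remainder -= d                          # the divisor goes in once
            quotient_digits.append("1")
        else:
            quotient_digits.append("0")             # the divisor goes in zero times
    quotient = "".join(quotient_digits).lstrip("0") or "0"
    return quotient, format(remainder, "b")


print(divide_binary("11011", "101"))  # ('101', '10')
```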


Square root
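The digit-by-digit method for extracting a square root in binary is similar to its decimal counterpart, but simpler, because each digit of the root can only be 0 or 1. For example, the square root of 1010001₂ (81 decimal) is 1001₂ (9 decimal):

             1 0 0 1
             -------
           √ 1010001
             1
             -------
   101        01
               0
             -------
   1001        100
                 0
             -------
   10001       10001
               10001
             -------
                   0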
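The digit-by-digit square root can be sketched as follows. This Python version is an illustration under the assumption that the input is a non-negative integer and that only the integer part of the root is wanted; the name `isqrt_binary` is hypothetical. It processes the radicand two bits at a time and forms the same trial values (101, 1001, 10001) that appear in the left column of the worked example.

```python
def isqrt_binary(n):
    """Integer square root by the binary digit-by-digit method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    root = 0
    remainder = 0
    pairs = (n.bit_length() + 1) // 2  # number of two-bit groups in n
    for i in reversed(range(pairs)):   # most significant pair first
        # Bring down the next pair of bits.
        remainder = (remainder << 2) | ((n >> (2 * i)) & 0b11)
        trial = (root << 2) | 1        # current root followed by the digits 01
        root <<= 1                     # make room for the next root digit
        if remainder >= trial:
            remainder -= trial
            root |= 1                  # next root digit is 1, otherwise it stays 0
    return root


print(format(isqrt_binary(0b1010001), "b"))  # 1001
```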

Bitwise operations
Though not directly related to the numerical interpretation of binary symbols, sequences of bits may be manipulated using Boolean logical operators. When a string of binary symbols is manipulated in this way, it is called a bitwise operation; the logical operators AND, OR, and XOR may be performed on corresponding bits in two binary numerals provided as input. The logical NOT operation may be performed on individual bits in a single binary numeral provided as input. Sometimes, such operations may be used as arithmetic short-cuts, and may have other computational benefits as well. For example, an arithmetic shift left of a binary number is the equivalent of multiplication by a (positive, integral) power of 2.
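As a brief illustration of these operations, the Python sketch below applies AND, OR, XOR, NOT and a left shift to small values. The 4-bit mask used for NOT is an assumption made here, since NOT needs a fixed width to give a non-negative result in Python.

```python
def bitwise_examples():
    """Return a few bitwise results for the 4-bit values 1100 and 1010."""
    a, b = 0b1100, 0b1010
    return {
        "and": a & b,        # 1 only where both inputs have a 1
        "or": a | b,         # 1 where either input has a 1
        "xor": a ^ b,        # 1 where the inputs differ
        "not": ~a & 0b1111,  # complement of a, masked to four bits
        "shift": 0b101 << 3, # shifting left by 3 multiplies by 2**3
    }


for name, value in bitwise_examples().items():
    print(name, format(value, "b"))
```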

Conversion to and from other numeral systems


Decimal
To convert from a base-10 integer numeral to its base-2 (binary) equivalent, the number is divided by two, and the remainder is the least-significant bit. The (integer) result is again divided by two, and its remainder is the next least significant bit. This process repeats until the quotient becomes zero. Conversion from base-2 to base-10 proceeds by applying the preceding algorithm, so to speak, in reverse. The bits of the binary number are used one by one, starting with the most significant (leftmost) bit. Beginning with the value 0, repeatedly double the prior value and add the next bit to produce the next value. This can be organized in a multi-column table. For example, to convert 10010101101₂ to decimal:


Prior value × 2 + Next bit   Next value
          0 × 2 + 1          = 1
          1 × 2 + 0          = 2
          2 × 2 + 0          = 4
          4 × 2 + 1          = 9
          9 × 2 + 0          = 18
         18 × 2 + 1          = 37
         37 × 2 + 0          = 74
         74 × 2 + 1          = 149
        149 × 2 + 1          = 299
        299 × 2 + 0          = 598
        598 × 2 + 1          = 1197

The result is 1197₁₀. Note that the first Prior Value of 0 is simply an initial decimal value. This method is an application of the Horner scheme.

Binary:  1 0 0 1 0 1 0 1 1 0 1
Decimal: $1 \times 2^{10} + 0 \times 2^9 + 0 \times 2^8 + 1 \times 2^7 + 0 \times 2^6 + 1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 1197$
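Both directions of the integer conversion can be sketched in Python. The function names below are illustrative; `to_binary` uses repeated division by two, and `from_binary` uses the doubling (Horner) scheme from the table above.

```python
def to_binary(n):
    """Convert a non-negative integer to a binary numeral string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)  # the remainder is the next least significant bit
        bits.append(str(remainder))
    return "".join(reversed(bits))


def from_binary(bits):
    """Convert a binary numeral string to an integer by doubling and adding."""
    value = 0
    for bit in bits:                 # most significant bit first
        value = value * 2 + int(bit)
    return value


print(to_binary(1197))             # 10010101101
print(from_binary("10010101101"))  # 1197
```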

The fractional parts of a number are converted with similar methods. They are again based on the equivalence of shifting with doubling or halving. In a fractional binary number such as 0.11010110101₂, the first digit is worth 1/2, the second 1/4, etc. So if there is a 1 in the first place after the binary point, then the number is at least 1/2, and vice versa. Double that number is at least 1. This suggests the algorithm: Repeatedly double the number to be converted, record if the result is at least 1, and then throw away the integer part. For example, (1/3)₁₀, in binary, is:

Converting               Result
1/3                      0.
1/3 × 2 = 2/3 < 1        0.0
2/3 × 2 = 1 1/3 ≥ 1      0.01
1/3 × 2 = 2/3 < 1        0.010
2/3 × 2 = 1 1/3 ≥ 1      0.0101

Thus the repeating decimal fraction 0.3... is equivalent to the repeating binary fraction 0.01... . Or for example, 0.110, in binary, is:


Converting               Result
0.1                      0.
0.1 × 2 = 0.2 < 1        0.0
0.2 × 2 = 0.4 < 1        0.00
0.4 × 2 = 0.8 < 1        0.000
0.8 × 2 = 1.6 ≥ 1        0.0001
0.6 × 2 = 1.2 ≥ 1        0.00011
0.2 × 2 = 0.4 < 1        0.000110
0.4 × 2 = 0.8 < 1        0.0001100
0.8 × 2 = 1.6 ≥ 1        0.00011001
0.6 × 2 = 1.2 ≥ 1        0.000110011
0.2 × 2 = 0.4 < 1        0.0001100110

This is also a repeating binary fraction, 0.0001100110011... . It may come as a surprise that terminating decimal fractions can have repeating expansions in binary. It is for this reason that many are surprised to discover that 0.1 + ... + 0.1 (10 additions) differs from 1 in floating point arithmetic. In fact, the only binary fractions with terminating expansions are of the form of an integer divided by a power of 2, which 1/10 is not.

The final conversion is from binary to decimal fractions. The only difficulty arises with repeating fractions, but otherwise the method is to shift the fraction to an integer, convert it as above, and then divide by the appropriate power of two in the decimal base.
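The doubling method and the floating point surprise can both be reproduced with a short Python sketch. The function name `fraction_to_binary` and the digit limit are assumptions made for this illustration; exact rational arithmetic from the standard `fractions` module is used so that the doubling itself introduces no rounding.

```python
from fractions import Fraction


def fraction_to_binary(x: Fraction, max_digits: int = 20) -> str:
    """Doubling method for 0 <= x < 1: record the integer part, keep the fraction."""
    digits = []
    for _ in range(max_digits):
        if x == 0:
            break                 # the expansion terminates
        x *= 2
        if x >= 1:
            digits.append("1")
            x -= 1                # throw away the integer part
        else:
            digits.append("0")
    return "0." + "".join(digits)


print(fraction_to_binary(Fraction(1, 3), 4))     # first four digits of 1/3
print(fraction_to_binary(Fraction(1, 10), 10))   # first ten digits of 1/10
print(fraction_to_binary(Fraction(5, 8)))        # an integer over a power of 2 terminates

# Ten additions of 0.1 in binary floating point do not give exactly 1.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)
```

For a terminating binary fraction the reverse direction is a shift and a division: 0.101₂ shifted three places is 101₂ = 5, and 5 / 2³ = 0.625.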

Another way of converting from binary to decimal, often quicker for a person familiar with hexadecimal, is to do so indirectly: first converting the number from binary into hexadecimal, and then converting it from hexadecimal into decimal.

For very large numbers, these simple methods are inefficient because they perform a large number of multiplications or divisions where one operand is very large. A simple divide-and-conquer algorithm is more effective asymptotically: given a binary number, it is divided by 10ᵏ, where k is chosen so that the quotient and the remainder are roughly the same size; then each of these pieces is converted to decimal and the two are concatenated. Given a decimal number, it can be split into two pieces of about the same size, each of which is converted to binary, whereupon the first converted piece is multiplied by 10ᵏ and added to the second converted piece, where k is the number of decimal digits in the second, least-significant piece before conversion.
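The divide-and-conquer idea can be sketched as follows. This is only an illustration of the recursion, under several assumptions: the function names, the base-case threshold of nine digits, and the estimate of k from the bit length are all choices made here, and the base case simply falls back on Python's built-in conversion. The asymptotic benefit also depends on fast multiplication and division of large integers, which this sketch does not address.

```python
def to_decimal_string(n: int) -> str:
    """Convert a non-negative integer to decimal digits by splitting at 10**k."""
    if n < 10**9:
        return str(n)                              # small base case
    # Estimate the number of decimal digits (log10(2) is about 0.30103), then halve it.
    k = max(1, (n.bit_length() * 30103 // 100000) // 2)
    quotient, remainder = divmod(n, 10**k)
    # The remainder stands for exactly k digits, so restore its leading zeros.
    return to_decimal_string(quotient) + to_decimal_string(remainder).zfill(k)


def from_decimal_string(s: str) -> int:
    """Convert decimal digits to an integer by splitting the string in two."""
    if len(s) <= 9:
        return int(s)                              # small base case
    k = len(s) // 2                                # digits in the least-significant piece
    high, low = s[:-k], s[-k:]
    return from_decimal_string(high) * 10**k + from_decimal_string(low)


big = 3**500
print(to_decimal_string(big) == str(big))
print(from_decimal_string(str(big)) == big)
```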


Hexadecimal
Hex   Dec   Oct   Binary
0     0     0     0 0 0 0
1     1     1     0 0 0 1
2     2     2     0 0 1 0
3     3     3     0 0 1 1
4     4     4     0 1 0 0
5     5     5     0 1 0 1
6     6     6     0 1 1 0
7     7     7     0 1 1 1
8     8     10    1 0 0 0
9     9     11    1 0 0 1
A     10    12    1 0 1 0
B     11    13    1 0 1 1
C     12    14    1 1 0 0
D     13    15    1 1 0 1
E     14    16    1 1 1 0
F     15    17    1 1 1 1

Binary may be converted to and from hexadecimal somewhat more easily. This is because the radix of the hexadecimal system (16) is a power of the radix of the binary system (2). More specifically, 16 = 2⁴, so it takes four digits of binary to represent one digit of hexadecimal, as shown in the table above.

To convert a hexadecimal number into its binary equivalent, simply substitute the corresponding binary digits:

3A₁₆ = 0011 1010₂
E7₁₆ = 1110 0111₂

To convert a binary number into its hexadecimal equivalent, divide it into groups of four bits. If the number of bits isn't a multiple of four, simply insert extra 0 bits at the left (called padding). For example:

1010010₂ = 0101 0010 grouped with padding = 52₁₆
11011101₂ = 1101 1101 grouped = DD₁₆

To convert a hexadecimal number into its decimal equivalent, multiply the decimal equivalent of each hexadecimal digit by the corresponding power of 16 and add the resulting values:

C0E7₁₆ = (12 × 16³) + (0 × 16²) + (14 × 16¹) + (7 × 16⁰) = (12 × 4096) + (0 × 256) + (14 × 16) + (7 × 1) = 49,383₁₀
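The three hexadecimal procedures above translate almost word for word into code. The following Python sketch uses hypothetical function names and the examples from the text.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_binary(hex_numeral: str) -> str:
    """Substitute four binary digits for each hexadecimal digit."""
    return " ".join(format(HEX_DIGITS.index(d), "04b") for d in hex_numeral.upper())


def binary_to_hex(bits: str) -> str:
    """Pad on the left to a multiple of four bits, then map each group to one digit."""
    padded_length = -(-len(bits) // 4) * 4         # round up to a multiple of four
    padded = bits.zfill(padded_length)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_decimal(hex_numeral: str) -> int:
    """Multiply each digit value by the matching power of 16 and add the results."""
    return sum(HEX_DIGITS.index(d) * 16**i
               for i, d in enumerate(reversed(hex_numeral.upper())))


print(hex_to_binary("3A"))
print(binary_to_hex("1010010"))
print(hex_to_decimal("C0E7"))
```

Replacing the group size of four with three, and the digit string with "01234567", would give the corresponding octal conversions.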


Octal
Binary is also easily converted to the octal numeral system, since octal uses a radix of 8, which is a power of two (namely, 2³, so it takes exactly three binary digits to represent an octal digit). The correspondence between octal and binary numerals is the same as for the first eight digits of hexadecimal in the table above. Binary 000 is equivalent to the octal digit 0, binary 111 is equivalent to octal 7, and so forth.

Octal   Binary
0       000
1       001
2       010
3       011
4       100
5       101
6       110
7       111

Converting from octal to binary proceeds in the same fashion as it does for hexadecimal:

65₈ = 110 101₂
17₈ = 001 111₂

And from binary to octal:

101100₂ = 101 100₂ grouped = 54₈
10011₂ = 010 011₂ grouped with padding = 23₈

And from octal to decimal:

65₈ = (6 × 8¹) + (5 × 8⁰) = (6 × 8) + (5 × 1) = 53₁₀
127₈ = (1 × 8²) + (2 × 8¹) + (7 × 8⁰) = (1 × 64) + (2 × 8) + (7 × 1) = 87₁₀

Representing real numbers


Non-integers can be represented by using negative powers, which are set off from the other digits by means of a radix point (called a decimal point in the decimal system). For example, the binary number 11.01₂ thus means:

1 × 2¹    (1 × 2 = 2)       plus
1 × 2⁰    (1 × 1 = 1)       plus
0 × 2⁻¹   (0 × ½ = 0)       plus
1 × 2⁻²   (1 × ¼ = 0.25)

For a total of 3.25 decimal.

All dyadic rational numbers have a terminating binary numeral: the binary representation has a finite number of terms after the radix point. Other rational numbers have a binary representation, but instead of terminating, they recur, with a finite sequence of digits repeating indefinitely. For instance:

1/3 = 1₂/11₂ = 0.0101010101…₂


12/17 = 1100₂/10001₂ = 0.10110100 10110100 10110100...₂

The phenomenon that the binary representation of any rational is either terminating or recurring also occurs in other radix-based numeral systems. See, for instance, the explanation in decimal. Another similarity is the existence of alternative representations for any terminating representation, relying on the fact that 0.111111… is the sum of the geometric series 2⁻¹ + 2⁻² + 2⁻³ + ..., which is 1.

Binary numerals which neither terminate nor recur represent irrational numbers. For instance:

0.10100100010000100000100… does have a pattern, but it is not a fixed-length recurring pattern, so the number is irrational.
1.0110101000001001111001100110011111110… is the binary representation of √2, the square root of 2, another irrational. It has no discernible pattern. See irrational number.
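The positional reading of a binary numeral with a radix point, and the expansion of √2 quoted above, can be checked with exact integer and rational arithmetic. In this Python sketch the function names are hypothetical, and the use of the integer square root of 2 × 4ⁿ to obtain the first n fractional bits is a technique chosen here for illustration.

```python
from fractions import Fraction
from math import isqrt


def binary_to_fraction(numeral: str) -> Fraction:
    """Evaluate a binary numeral with a radix point as an exact fraction."""
    whole, _, frac = numeral.partition(".")
    value = Fraction(int(whole, 2) if whole else 0)
    for i, bit in enumerate(frac, start=1):
        value += Fraction(int(bit), 2**i)          # the i-th place is worth 2**-i
    return value


def sqrt2_bits(n: int) -> str:
    """First n fractional bits of the square root of 2."""
    digits = format(isqrt(2 * 4**n), "b")          # floor(sqrt(2) * 2**n) in binary
    return digits[0] + "." + digits[1:]


print(binary_to_fraction("11.01"))                 # 13/4, i.e. 3.25
print(1 - binary_to_fraction("0." + "1" * 20))     # the gap to 1 halves with each extra digit
print(sqrt2_bits(37))
```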




External links
A brief overview of Leibniz and the connection to binary numbers (http://www.kerryr.net/pioneers/leibniz.htm)
Binary System (http://www.cut-the-knot.org/do_you_know/BinaryHistory.shtml) at cut-the-knot
Conversion of Fractions (http://www.cut-the-knot.org/blue/frac_conv.shtml) at cut-the-knot
Binary Digits (http://www.mathsisfun.com/binary-digits.html) at Math Is Fun (http://www.mathsisfun.com/)
How to Convert from Decimal to Binary (http://www.wikihow.com/Convert-from-Decimal-to-Binary) at wikiHow
Learning exercise for children at CircuitDesign.info (http://www.circuitdesign.info/blog/2008/06/the-binary-number-system-part-2-binary-weighting/)
Binary Counter with Kids (http://gwydir.demon.co.uk/jo/numbers/binary/kids.htm)
Magic Card Trick (http://gwydir.demon.co.uk/jo/numbers/binary/cards.htm)
Quick reference on Howto read binary (http://www.mycomputeraid.com/networking-support/general-networking-support/howto-read-binary-basics/)
Binary converter to HEX/DEC/OCT with direct access to bits (http://calc.50x.eu/)
From one to another number system (https://www.codeproject.com/Articles/350252/From-one-to-another-number-system/), an article on writing a program that converts numbers between numeral systems, with source code in C#
From one to another number system (https://sites.google.com/site/periczeljkosmederevoenglish/matematika/conversion-from-one-to-another-number-system/From one to another number system.zip?attredirects=0/), a free program written in C# for converting numbers between numeral systems; requires .NET Framework 2.0
From one to another number system (https://sites.google.com/site/periczeljkosmederevoenglish/matematika/conversion-from-one-to-another-number-system/Solution with source code From one to another number system.zip?attredirects=0&d=1/), the full solution with open source code, written in C# with the SharpDevelop IDE, version 4.1


Ham sandwich theorem


In measure theory, a branch of mathematics, the ham sandwich theorem, also called the Stone–Tukey theorem after Arthur H. Stone and John Tukey, states that given n measurable "objects" in n-dimensional space, it is possible to divide all of them in half (with respect to their measure, i.e. volume) with a single (n − 1)-dimensional hyperplane. Here the "objects" should be sets of finite measure (or, in fact, just of finite outer measure) for the notion of "dividing the volume in half" to make sense.

Naming
The ham sandwich theorem takes its name from the case when n = 3 and the three objects of any shape are a chunk of ham and two chunks of bread (notionally, a sandwich), which can then all be simultaneously bisected with a single cut (i.e., a plane). In two dimensions, the theorem is known as the pancake theorem, from having to cut two infinitesimally thin pancakes on a plate each in half with a single cut (i.e., a straight line).

The ham sandwich theorem is also sometimes referred to as the "ham and cheese sandwich theorem", again referring to the special case when n = 3 and the three objects are
1. a chunk of ham,
2. a slice of cheese, and
3. two slices of bread (treated as a single disconnected object).
The theorem then states that it is possible to slice the ham and cheese sandwich in half such that each half contains the same amount of bread, cheese, and ham. It is possible to treat the two slices of bread as a single object, because the theorem only requires that the portion on each side of the plane vary continuously as the plane moves through 3-space.

The ham sandwich theorem has no relationship to the "squeeze theorem" (sometimes called the "sandwich theorem").

A ham sandwich

History
According to Beyer & Zardecki (2004), the earliest known paper about the ham sandwich theorem, specifically the n = 3 case of bisecting three solids with a plane, is by Steinhaus (1938). Beyer and Zardecki's paper includes a translation of the 1938 paper. It attributes the posing of the problem to Hugo Steinhaus, and credits Stefan Banach as the first to solve the problem, by a reduction to the Borsuk–Ulam theorem. The paper poses the problem in two ways: first, formally, as "Is it always possible to bisect three solids, arbitrarily located, with the aid of an appropriate plane?" and second, informally, as "Can we place a piece of ham under a meat cutter so that meat, bone, and fat are cut in halves?" Later, the paper offers a proof of the theorem.

A more modern reference is Stone & Tukey (1942), which is the basis of the name "Stone–Tukey theorem". This paper proves the n-dimensional version of the theorem in a more general setting involving measures. The paper attributes the n = 3 case to Stanislaw Ulam, based on information from a referee; but Beyer & Zardecki (2004) claim that this is incorrect, given Steinhaus's paper, although "Ulam did make a fundamental contribution in proposing" the Borsuk–Ulam theorem.


Reduction to the Borsuk–Ulam theorem


The ham sandwich theorem can be proved as follows using the Borsuk–Ulam theorem. This proof follows the one described by Steinhaus and others (1938), attributed there to Stefan Banach, for the n = 3 case.

Let A₁, A₂, …, Aₙ denote the n objects that we wish to simultaneously bisect. Let S be the unit (n − 1)-sphere embedded in n-dimensional Euclidean space ℝⁿ, centered at the origin. For each point p on the surface of the sphere S, we can define a continuum of oriented affine hyperplanes (not necessarily centered at 0) perpendicular to the (normal) vector from the origin to p, with the "positive side" of each hyperplane defined as the side pointed to by that vector. By the intermediate value theorem, every family of such hyperplanes contains at least one hyperplane that bisects the bounded object Aₙ: at one extreme translation, no volume of Aₙ is on the positive side, and at the other extreme translation, all of Aₙ's volume is on the positive side, so in between there must be a translation that has half of Aₙ's volume on the positive side. If there is more than one such hyperplane in the family, we can pick one canonically by choosing the midpoint of the interval of translations for which Aₙ is bisected. Thus we obtain, for each point p on the sphere S, a hyperplane π(p) that is perpendicular to the vector from the origin to p and that bisects Aₙ.

Now we define a function f from the (n − 1)-sphere S to (n − 1)-dimensional Euclidean space ℝⁿ⁻¹ as follows:

f(p) = (vol of A₁ on the positive side of π(p), vol of A₂ on the positive side of π(p), ..., vol of Aₙ₋₁ on the positive side of π(p)).

This function f is continuous. By the Borsuk–Ulam theorem, there are antipodal points p and q on the sphere S such that f(p) = f(q). Antipodal points p and q correspond to hyperplanes π(p) and π(q) that are equal except that they have opposite positive sides. Thus, f(p) = f(q) means that the volume of Aᵢ is the same on the positive and negative side of π(p) (or π(q)), for i = 1, 2, ..., n − 1. Thus, π(p) (or π(q)) is the desired ham sandwich cut that simultaneously bisects the volumes of A₁, A₂, …, Aₙ.

Measure theoretic versions


In measure theory, Stone & Tukey (1942) proved two more general forms of the ham sandwich theorem. Both versions concern the bisection of n subsets X₁, X₂, …, Xₙ of a common set X, where X has a Carathéodory outer measure and each Xᵢ has finite outer measure.

Their first general formulation is as follows: for any suitably restricted real function f : Sⁿ × X → ℝ, there is a point s of the n-sphere Sⁿ such that the surface f(s,x) = 0, dividing X into f(s,x) < 0 and f(s,x) > 0, simultaneously bisects the outer measure of X₁, X₂, …, Xₙ. The proof is again a reduction to the Borsuk–Ulam theorem. This theorem generalizes the standard ham sandwich theorem by letting f(s,x) = s₀ + s₁x₁ + ... + sₙxₙ.

Their second formulation is as follows: for any n + 1 measurable functions f₀, f₁, …, fₙ over X that are linearly independent over any subset of X of positive measure, there is a linear combination f = a₀f₀ + a₁f₁ + ... + aₙfₙ such that the surface f(x) = 0, dividing X into f(x) < 0 and f(x) > 0, simultaneously bisects the outer measure of X₁, X₂, …, Xₙ. This theorem generalizes the standard ham sandwich theorem by letting f₀(x) = 1 and letting fᵢ(x), for i > 0, be the ith coordinate of x.


Discrete and computational geometry versions


In discrete geometry and computational geometry, the ham sandwich theorem usually refers to the special case in which each of the sets being divided is a finite set of points. Here the relevant measure is the counting measure, which simply counts the number of points on either side of the hyperplane. In two dimensions, the theorem can be stated as follows: For a finite set of points in the plane, each colored "red" or "blue", there is a line that simultaneously bisects the red points and bisects the blue points, that is, the number of red points on either side of the line is equal and the number of blue points on either side of the line is equal.

A ham-sandwich cut of eight red points and seven blue points in the plane.

There is an exceptional case when points lie on the line. In this situation, we count each of these points as either being on one side, on the other, or on neither side of the line (possibly depending on the point), i.e. "bisecting" in fact means that each side contains at most half of the total number of points. This exceptional case is actually required for the theorem to hold: of course when the number of red points or the number of blue points is odd, but also in specific configurations with even numbers of points, for instance when all the points lie on the same line and the two colors are separated from each other (i.e. colors don't alternate along the line). A situation where the numbers of points on each side cannot match each other is provided by adding an extra point off the line in the previous configuration.

In computational geometry, this ham sandwich theorem leads to a computational problem, the ham sandwich problem. In two dimensions, the problem is this: given a finite set of n points in the plane, each colored "red" or "blue", find a ham sandwich cut for them. First, Megiddo (1985) described an algorithm for the special, separated case. Here all red points are on one side of some line and all blue points are on the other side, a situation where there is a unique ham sandwich cut, which Megiddo could find in linear time. Later, Edelsbrunner & Waupotitsch (1986) gave an algorithm for the general two-dimensional case; the running time of their algorithm is O(n log n), where the symbol O indicates the use of Big O notation. Finally, Lo & Steiger (1990) found an optimal O(n)-time algorithm. This algorithm was extended to higher dimensions by Lo, Matoušek & Steiger (1994). Given d sets of points in general position in d-dimensional space, the algorithm computes a (d − 1)-dimensional hyperplane that has equal numbers of points of each of the sets in each of its half-spaces, i.e., a ham-sandwich cut for the given points.
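For small inputs, the two-dimensional discrete problem can be explored with a brute-force search. The Python sketch below is not any of the published algorithms cited above: it tries every line through two of the given points, which takes cubic time, and it treats points lying on the line as belonging to neither side. Restricting the search to lines through two input points is an assumption of this sketch, and all function names and sample coordinates are hypothetical.

```python
from itertools import combinations


def side(a, b, p):
    """+1 if p is left of the directed line a->b, -1 if right, 0 if on the line."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)


def bisects(a, b, points):
    """True if each open side of the line through a and b holds at most half the points."""
    left = sum(1 for p in points if side(a, b, p) > 0)
    right = sum(1 for p in points if side(a, b, p) < 0)
    return 2 * left <= len(points) and 2 * right <= len(points)


def ham_sandwich_cut(red, blue):
    """Return two input points spanning a line that bisects both sets, or None."""
    for a, b in combinations(red + blue, 2):
        if a != b and bisects(a, b, red) and bisects(a, b, blue):
            return a, b
    return None


red = [(0, 0), (4, 0), (0, 4), (4, 4)]     # integer coordinates keep the test exact
blue = [(1, 2), (3, 2), (2, 1)]
print(ham_sandwich_cut(red, blue))
```

In this sample the horizontal line through (1, 2) and (3, 2) is one valid cut: two red points lie on each side, and the single remaining blue point is fewer than half of the three.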

References
Beyer, W. A.; Zardecki, Andrew (2004), "The early history of the ham sandwich theorem" [1], American Mathematical Monthly 111 (1): 58–61, doi:10.2307/4145019, JSTOR 4145019.
Edelsbrunner, H.; Waupotitsch, R. (1986), "Computing a ham sandwich cut in two dimensions", J. Symbolic Comput. 2: 171–178, doi:10.1016/S0747-7171(86)80020-7.
Lo, Chi-Yuan; Steiger, W. L. (1990), "An optimal time algorithm for ham-sandwich cuts in the plane", Proceedings of the Second Canadian Conference on Computational Geometry, pp. 5–9.
Lo, Chi-Yuan; Matoušek, Jiří; Steiger, William L. (1994), "Algorithms for Ham-Sandwich Cuts", Discrete and Computational Geometry 11: 433–452, doi:10.1007/BF02574017.
Megiddo, Nimrod (1985), "Partitioning with two lines in the plane", Journal of Algorithms 6: 430–433, doi:10.1016/0196-6774(85)90011-2.
Steinhaus, Hugo (1938), "A note on the ham sandwich theorem", Mathesis Polska 9: 26–28.
Stone, A. H.; Tukey, J. W. (1942), "Generalized "sandwich" theorems" [2], Duke Mathematical Journal 9: 356–359, doi:10.1215/S0012-7094-42-00925-6.


External links
Weisstein, Eric W., "Ham Sandwich Theorem" [3] from MathWorld.
ham sandwich theorem [4] on the Earliest known uses of some of the words of mathematics [5]
Ham Sandwich Cuts [6] by Danielle MacNevin
An interactive 2D demonstration [7]

References
[1] http://proquest.umi.com/pqdweb?did=526216421&Fmt=3&clientId=5482&RQT=309&VName=PQD
[2] http://projecteuclid.org/euclid.dmj/1077493229
[3] http://mathworld.wolfram.com/HamSandwichTheorem.html
[4] http://jeff560.tripod.com/h.html
[5] http://jeff560.tripod.com/mathword.html
[6] http://cgm.cs.mcgill.ca/~athens/cs507/Projects/2002/DanielleMacNevin/index.htm
[7] http://gfredericks.com/sandbox/ham_sandwich

Enigma machine
An Enigma machine is any of a family of related electro-mechanical rotor cipher machines used for the encryption and decryption of secret messages. Enigma was invented by German engineer Arthur Scherbius at the end of World War I.[1] The early models were used commercially from the early 1920s, and adopted by military and government services of several countries, most notably by Nazi Germany before and during World War II.[2] Several different Enigma models were produced, but the German military models are the ones most commonly discussed.

Military Enigma machine

In December 1932, the Polish Cipher Bureau first broke Germany's military Enigma ciphers. Five weeks before the outbreak of World War II, on 25 July 1939, in Warsaw, they presented their Enigma-decryption techniques and equipment to French and British military intelligence.[3][4][5] From 1938, additional complexity was repeatedly added to the machines, making the initial decryption techniques increasingly unsuccessful. Nonetheless, the Polish breakthrough represented a vital basis for the later British effort.[6]

During the war, British codebreakers were able to decrypt a vast number of messages that had been enciphered using the Enigma. The intelligence gleaned from this source, codenamed "Ultra" by the British, was a substantial aid to the Allied war effort.[7] The exact influence of Ultra on the course of the war is debated; an oft-repeated assessment is that decryption of German ciphers hastened the end of the European war by two years.[8][9][10] Winston Churchill told the United Kingdom's King George VI after World War II: "It was thanks to Ultra that we won the war."[11]

Although Enigma had some cryptographic weaknesses, in practice it was only in combination with procedural flaws, operator mistakes, captured key tables and hardware that Allied cryptanalysts were able to be so successful.[12]


Description
Like other rotor machines, the Enigma machine is a combination of mechanical and electrical subsystems. The mechanical subsystem consists of a keyboard; a set of rotating disks called rotors arranged adjacently along a spindle; and one of various stepping components to turn one or more of the rotors with each key press.

Electrical pathway
The mechanical parts act in such a way as to form a varying electrical circuit. When a key is pressed, a circuit is completed with current flowing through the various components in their current configuration and ultimately lighting one of the display lamps, indicating the output letter. For example, when encrypting a message starting ANX, the operator would first press the A key, and the Z lamp might light, so Z would be the first letter of the ciphertext. The operator would next press N, and then X in the same fashion, and so on. The detailed operation of Enigma is shown in the wiring diagram to the right. To simplify the example, only four components of a complete Enigma machine are shown. In reality, there are 26 lamps and keys, rotor wirings inside the rotors (of which there were either three or four) and between six and ten plug leads.

Enigma wiring diagram with arrows and the numbers 1 to 9 showing how current flows from key depression to a lamp being lit. The A key is encoded to the D lamp. D yields A, but A never yields A; this property was due to a patented feature unique to the Enigmas, and could be exploited by cryptanalysts in some situations.


Enigma in use, 1943

Current flowed from the battery (1) through a depressed bi-directional keyboard switch (2) to the plugboard (3). Next, it passed through the (unused in this instance, so shown closed) plug (3) via the entry wheel (4), through the wiring of the three (Wehrmacht Enigma) or four (Kriegsmarine M4 and Abwehr variants) installed rotors (5), and entered the reflector (6). The reflector returned the current, via an entirely different path, back through the rotors (5) and entry wheel (4), proceeding through plug 'S' (7) connected with a cable (8) to plug 'D', and another bi-directional switch (9) to light the appropriate lamp.[13]

The repeated changes of electrical path through an Enigma scrambler implemented a polyalphabetic substitution cipher, which provided Enigma's high security. The diagram on the left shows how the electrical pathway changed with each key depression, which caused rotation of at least the right hand rotor. Current passed into the set of rotors, into and back out of the reflector, and out through the rotors again. The greyed-out lines are some other possible paths within each rotor; these are hard-wired from one side of each rotor to the other. Letter A encrypts differently with consecutive key presses, first to G, and then to C. This is because the right hand rotor has stepped, sending the signal on a completely different route; eventually other rotors will also step with a key press.

The scrambling action of Enigma's rotors is shown for two consecutive letters with the right-hand rotor moving one position between them.
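The effect of the path described above (through a rotor, into the reflector, and back through the same rotor) can be imitated with a toy model. The Python sketch below uses a single rotor and no plugboard, and both wirings are hypothetical: the rotor wiring is simply the keyboard order and the reflector pairs each contact with the one 13 places away. Neither is the wiring of any historical component, so the letters produced differ from those in the diagram.

```python
import string

ALPHABET = string.ascii_uppercase

# Hypothetical wirings, not those of any historical rotor or reflector.
ROTOR = [ALPHABET.index(ch) for ch in "QWERTYUIOPASDFGHJKLZXCVBNM"]
ROTOR_INVERSE = [ROTOR.index(i) for i in range(26)]
REFLECTOR = [(i + 13) % 26 for i in range(26)]     # swaps contacts in 13 fixed pairs


def through(wiring, position, contact):
    """Pass a signal through a rotor that has been rotated by `position` steps."""
    return (wiring[(contact + position) % 26] - position) % 26


def encipher(letter: str, position: int) -> str:
    contact = ALPHABET.index(letter)
    contact = through(ROTOR, position, contact)            # forward through the rotor
    contact = REFLECTOR[contact]                           # turned around by the reflector
    contact = through(ROTOR_INVERSE, position, contact)    # back through the same rotor
    return ALPHABET[contact]


print(encipher("A", 0), encipher("A", 1))   # the same key at two consecutive rotor positions
print(encipher(encipher("A", 0), 0))        # enciphering twice at one position undoes itself
```

Because the reflector swaps contacts in pairs and never maps a contact to itself, the toy cipher is reciprocal at every position and no letter is ever enciphered to itself, which mirrors the property noted in the wiring diagram caption.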


Rotors
The rotors (alternatively wheels or drums, Walzen in German) formed the heart of an Enigma machine. Each rotor was a disc approximately 10 cm (3.9 in) in diameter made from hard rubber or bakelite with brass spring-loaded pins on one face arranged in a circle; on the other side are a corresponding number of circular electrical contacts. The pins and contacts represent the alphabet, typically the 26 letters A–Z (this will be assumed for the rest of this description). When the rotors were mounted side-by-side on the spindle, the pins of one rotor rest against the contacts of the neighbouring rotor, forming an electrical connection. Inside the body of the rotor, 26 wires connected each pin on one side to a contact on the other in a complex pattern. Most of the rotors were identified by Roman numerals and each issued copy of rotor I was wired identically to all others. The same was true of the special thin beta and gamma rotors used in the M4 naval variant.

Enigma rotor assembly. In the Wehrmacht Enigma, the three installed movable rotors are sandwiched between two fixed wheels: the entry wheel, on the right, and the reflector on the left.

By itself, a rotor will perform only a very simple type of encryption a simple substitution cipher. For example, the pin corresponding to the letter E might be wired to the contact for letter T on the opposite face, and so on. The Enigma's complexity, and cryptographic security, came from using several rotors in series (usually three or four) and the regular stepping movement of the rotors, thus implementing a poly-alphabetic substitution cipher. When placed in an Enigma, each rotor can be set to one of 26 possible positions. When inserted, it can be turned by hand using the grooved finger-wheel which protrudes from the internal Enigma cover when closed. So that the operator can know the rotor's position, each had an alphabet tyre (or letter ring) attached to the outside of the rotor disk, with 26 characters (typically letters); one of these could be seen through the window, thus indicating the rotational position of the rotor. In early Enigma models, the alphabet ring was fixed to the rotor disk. An improvement introduced in later variants was the ability to adjust the alphabet ring relative to the rotor disk. The position of the ring was known as the Ringstellung ("ring setting"), and was a part of the initial setting of an Enigma prior to an operating session. In modern terms it was a part of the initialization vector.
Three Enigma rotors and the shaft on which they are placed when in use.

The rotors each contained a notch (more than one for some rotors) which was used to control rotor stepping. In the military variants, the notches are located on the alphabet ring.

The Army and Air Force Enigmas were used with several rotors; when first issued, there were only three. On 15 December 1938, this changed to five, from which three were chosen for insertion in the machine for a particular operating session. Rotors were marked with Roman numerals to distinguish them: I, II, III, IV and V, all with single notches located at different points on the alphabet ring. This variation was probably intended as a security measure, but ultimately allowed the Polish Clock Method and British Banburismus attacks.

Two Enigma rotors showing electrical contacts, stepping ratchet (on the left) and notch (on the right hand rotor opposite D).

The Naval version of the Wehrmacht Enigma had always been issued with more rotors than the other services: at first six, then seven, and finally eight. The additional rotors were marked VI, VII and VIII, all with different wiring, and had two notches cut into them resulting in a more frequent turnover. The four-rotor Naval Enigma (M4) machine accommodated an extra rotor in the same space as the three-rotor version. This was accomplished by replacing the original reflector with a thinner one and by adding a special, also thin, fourth rotor. That fourth rotor was one of two types, Beta or Gamma, and never stepped, but it could be manually set to any of its 26 possible positions, one of which made the machine perform identically to the three-rotor machine.

Stepping
To avoid merely implementing a simple (and easily breakable) substitution cipher, every key press caused one or more rotors to step by one twenty-sixth of a full rotation, before the electrical connections were made. This changed the substitution alphabet used for encryption, ensuring that the cryptographic substitution was different at each new rotor position, producing a more formidable polyalphabetic substitution cipher. The stepping mechanism varied slightly from model to model. The right-hand rotor stepped once with each key stroke, and other rotors stepped less frequently.

Turnover
The advancement of a rotor other than the left-hand one was called a turnover by the cryptanalysts at Bletchley Park. This was achieved by a ratchet and pawl mechanism. Each rotor had a ratchet with 26 teeth and every time a key was pressed, the set of spring-loaded pawls moved forward in unison, trying to engage with a ratchet. The alphabet ring of the rotor to the right normally prevented this. As this ring rotated with its rotor, a notch machined into it would eventually align itself with the pawl, allowing it to engage with the ratchet, and advance the rotor on its left. The right-hand pawl, having no rotor and ring to its right, stepped its rotor with every key depression.[14] For a single-notch rotor in the right-hand position, the middle rotor stepped once for every 26 steps of the right-hand rotor. Similarly for rotors two and three. For a two-notch rotor, the rotor to its left would turn over twice for each rotation.

The first five rotors to be introduced (I to V) contained one notch each, while the additional naval rotors VI, VII and VIII each had two notches. The position of the notch on each rotor was determined by the letter ring, which could be adjusted in relation to the core containing the interconnections. The points on the rings at which they caused the next wheel to move were as follows.[15]

The Enigma stepping motion seen from the side away from the operator. All three ratchet pawls (green) push in unison as a key is depressed. For the first rotor (1), which to the operator is the right-hand rotor, the ratchet (red) is always engaged, and steps with each keypress. Here, the middle rotor (2) is engaged because the notch in the first rotor is aligned with the pawl; it will step (turn over) with the first rotor. The third rotor (3) is not engaged, because the notch in the second rotor is not aligned with the pawl, so the pawl will not engage with the ratchet.

Position of turnover notches

Rotor               Turnover position(s)   BP mnemonic
I                   R                      Royal
II                  F                      Flags
III                 W                      Wave
IV                  K                      Kings
V                   A                      Above
VI, VII and VIII    A and N

The design also included a feature known as double-stepping. This occurred because each pawl was aligned with both the ratchet of its rotor and the rotating notched ring of the neighbouring rotor. If a pawl was allowed to engage with a ratchet through alignment with a notch, as it moved forward it pushed against both the ratchet and the notch, advancing both rotors at the same time. In a three-rotor machine, double-stepping affected rotor two only. If, in moving forward, rotor two allowed the ratchet of rotor three to be engaged, rotor two would move again on the subsequent keystroke, resulting in two consecutive steps. Rotor two also pushed rotor one forward after 26 of its steps, but as rotor one moved forward with every keystroke anyway, there was no double-stepping.[14] This double-stepping caused the rotors to deviate from odometer-style regular motion.

With three wheels and only single notches in the first and second wheels, the machine had a period of 26 × 25 × 26 = 16,900 (not 26 × 26 × 26, because of the double-stepping of the second rotor).[14] Historically, messages were limited to a few hundred letters, so there was no chance of repeating any net combined rotor position during a single message session, and cryptanalysts were denied a valuable clue to the substitution used.

To make room for the Naval fourth rotors, Beta and Gamma (introduced in 1942), the reflector was changed by making it much thinner. The special fourth rotors fitted into the space made available. No changes were made to the rest of the mechanism, which eased the changeover to the new mode of operation. Since there were only three pawls, the fourth rotor never stepped, but could be manually set into one of its 26 possible positions.

A device that was designed, but not implemented before the war's end, was the Lückenfüllerwalze (gap-fill wheel), which implemented irregular stepping. It allowed field configuration of notches in all 26 positions. If the number of notches was relatively prime to 26 and the number of notches was different for each wheel, the stepping would be more unpredictable. Like the Umkehrwalze-D, it also allowed the internal wiring to be reconfigured.[16]
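The stepping rule and the period of 16,900 can be checked with a short simulation. The following is a minimal sketch, not a model of any particular machine: positions are numbered 0 to 25, the function names are hypothetical, and the two notch positions (4 and 21) are arbitrary values chosen for illustration rather than the notches of real rotors.

```python
def step(left, middle, right, middle_notch, right_notch):
    """Advance the three rotor positions (0-25) for one key press.

    A rotor sitting at its notch position lets the pawl to its left engage,
    so its left-hand neighbour steps on this key press.
    """
    if middle == middle_notch:
        # Double step: the pawl pushes the left rotor's ratchet and the
        # middle rotor's notch, so both rotors advance together.
        left = (left + 1) % 26
        middle = (middle + 1) % 26
    elif right == right_notch:
        # Ordinary turnover of the middle rotor.
        middle = (middle + 1) % 26
    # The right-hand rotor steps on every key press.
    right = (right + 1) % 26
    return left, middle, right


def period(middle_notch=4, right_notch=21):
    """Count key presses until the rotors return to the start position."""
    start = (0, 0, 0)
    state = step(*start, middle_notch, right_notch)
    count = 1
    while state != start:
        state = step(*state, middle_notch, right_notch)
        count += 1
    return count


if __name__ == "__main__":
    print(period())  # 16900, i.e. 26 * 25 * 26
```

The middle rotor spends only one key press at its own notch position, which is why one of its 26 positions is effectively skipped in the count.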


Entry wheel
The current entry wheel (Eintrittswalze in German), or entry stator, connects the plugboard, if present, or otherwise the keyboard and lampboard, to the rotor assembly. While the exact wiring used is of comparatively little importance to the security, it proved an obstacle in the progress of Polish cryptanalyst Marian Rejewski during his deduction of the rotor wirings. The commercial Enigma connects the keys in the order of their sequence on the keyboard: Q→A, W→B, E→C and so on. However, the military Enigma connects them in straight alphabetical order: A→A, B→B, C→C, and so on. It took an inspired piece of guesswork for Rejewski to realise the modification.

Reflector
With the exception of the early Enigma models A and B, the last rotor came before a reflector (German: Umkehrwalze, meaning reversal rotor), a patented feature distinctive of the Enigma family amongst the various rotor machines designed in the period. The reflector connected outputs of the last rotor in pairs, redirecting current back through the rotors by a different route. The reflector ensured that Enigma was self-reciprocal: conveniently, encryption was the same as decryption. However, the reflector also gave Enigma the property that no letter ever encrypted to itself. This was a severe conceptual flaw and a cryptological mistake subsequently exploited by codebreakers.

In the commercial Enigma model C, the reflector could be inserted in one of two different positions. In Model D, the reflector could be set in 26 possible positions, although it did not move during encryption. In the Abwehr Enigma, the reflector stepped during encryption in a manner like the other wheels.

In the German Army and Air Force Enigma, the reflector was fixed and did not rotate; there were four versions. The original version was marked A, and was replaced by Umkehrwalze B on 1 November 1937. A third version, Umkehrwalze C, was used briefly in 1940, possibly by mistake, and was solved by Hut 6.[17] The fourth version, first observed on 2 January 1944, had a rewireable reflector, called Umkehrwalze D, allowing the Enigma operator to alter the connections as part of the key settings.


Plugboard
The plugboard (Steckerbrett in German) permitted variable wiring that could be reconfigured by the operator (visible on the front panel of Figure 1; some of the patch cords can be seen in the lid). It was introduced on German Army versions in 1930, and was soon adopted by the Navy as well. The plugboard contributed a great deal to the strength of the machine's encryption: more than an extra rotor would have done. Enigma without a plugboard (known as unsteckered Enigma) can be solved relatively straightforwardly using hand methods; these techniques are generally defeated by the addition of a plugboard, and Allied cryptanalysts resorted to special machines to solve it.

A cable placed onto the plugboard connected letters up in pairs; for example, E and Q might be a steckered pair. The effect was to swap those letters before and after the main rotor scrambling unit. For example, when an operator pressed E, the signal was diverted to Q before entering the rotors. Up to 13 such steckered pairs might be used at one time, although normally only 10 were used. Current flowed from the keyboard through the plugboard, and proceeded to the entry-rotor or Eintrittswalze. Each letter on the plugboard had two jacks. Inserting a plug disconnected the upper jack (from the keyboard) and the lower jack (to the entry-rotor) of that letter. The plug at the other end of the crosswired cable was inserted into another letter's jacks, thus switching the connections of the two letters.
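The pairwise swapping can be pictured as a lookup table in which every cabled letter maps to its partner and every other letter maps to itself. The sketch below is illustrative only; the function name `make_plugboard` and the error handling are assumptions, and the example pairs are the ones mentioned in the text and in the photograph caption.

```python
import string


def make_plugboard(pairs):
    """Build a letter -> letter mapping from steckered pairs such as "EQ".

    Unplugged letters map to themselves. At most 13 cables fit 26 letters,
    and each letter can take only one plug.
    """
    if len(pairs) > 13:
        raise ValueError("at most 13 pairs can be plugged")
    mapping = {c: c for c in string.ascii_uppercase}
    for a, b in pairs:
        if a == b or mapping[a] != a or mapping[b] != b:
            raise ValueError(f"letter already plugged in pair {a}{b}")
        mapping[a], mapping[b] = b, a
    return mapping


if __name__ == "__main__":
    board = make_plugboard(["EQ", "AJ", "SO"])
    print(board["E"], board["Q"], board["B"])  # Q E B
```

Because the mapping is its own inverse, the same table serves both before the signal enters the rotors and after it leaves them.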

The plugboard (Steckerbrett) was positioned at the front of the machine, below the keys. When in use during World War II, there were ten connections. In this photograph, just two pairs of letters have been swapped (AJ and SO).

Accessories
A number of additional features were produced to make various Enigma machines more secure or more convenient to use.[18] One feature that was used on some M4 Enigmas was the Schreibmax, a small printer which could print the 26 letters on a narrow paper ribbon. This did away with the need for a second operator to read the lamps and write the letters down. The Schreibmax was placed on top of the Enigma machine and was connected to the lamp panel. To install the printer, the lamp cover and all light bulbs had to be removed. Besides its convenience, it could improve operational security; the printer could be installed remotely such that the signal officer operating the machine no longer had to see the decrypted plaintext information.

Another accessory was the remote lamp panel, the Fernlesegerät. For machines equipped with the extra panel, the wooden case of the Enigma was wider and could store the extra panel. There was a lamp panel version that could be connected afterwards, but that required, just as with the Schreibmax, that the lamp panel and lightbulbs be removed.[13] The remote panel made it possible for a person to read the decrypted plaintext without the operator seeing it.

The Schreibmax was a printing unit which could be attached to the Enigma, removing the need for laboriously writing down the letters indicated on the light panel.


In 1944, the Luftwaffe introduced a plugboard switch, called the Uhr (clock). This was a small box containing a switch with 40 positions. It replaced the standard plugs. After connecting the plugs, as determined in the daily key sheet, the operator turned the switch into one of the 40 positions, each position producing a different combination of plug wiring. Most of these plug connections were, unlike the default plugs, not pair-wise.[13] In one switch position, the Uhr did not swap any letters, but simply emulated the 13 stecker wires with plugs.

The Enigma Uhr attachment

Mathematical analysis
The Enigma transformation for each letter can be specified mathematically as a product of permutations.[19] Assuming a three-rotor German Army/Air Force Enigma, let $P$ denote the plugboard transformation, $U$ denote that of the reflector, and $L$, $M$, $R$ denote those of the left, middle and right rotors respectively. Then the encryption $E$ can be expressed as

$E = P R M L U L^{-1} M^{-1} R^{-1} P^{-1}$.

After each key press, the rotors turn, changing the transformation. For example, if the right-hand rotor $R$ is rotated $i$ positions, the transformation becomes $\rho^{i} R \rho^{-i}$, where $\rho$ is the cyclic permutation mapping A to B, B to C, and so forth. Similarly, the middle and left-hand rotors can be represented as $j$ and $k$ rotations of $M$ and $L$. The encryption transformation can then be described as

$E = P (\rho^{i} R \rho^{-i}) (\rho^{j} M \rho^{-j}) (\rho^{k} L \rho^{-k}) U (\rho^{k} L^{-1} \rho^{-k}) (\rho^{j} M^{-1} \rho^{-j}) (\rho^{i} R^{-1} \rho^{-i}) P^{-1}$.
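The product of permutations can be made concrete with a short sketch. It assumes the commonly published wirings of rotors I, II and III and of reflector B, with rotor I on the left and rotor III on the right, ring settings left at their default, and the permutations applied in the order the current meets them (plugboard, right rotor, middle rotor, left rotor, reflector, then back). The function names are hypothetical and rotor stepping is left to the caller.

```python
import string

A = string.ascii_uppercase

# Commonly published wirings (treated here as assumptions).
ROTOR_I = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"      # left rotor
ROTOR_II = "AJDKSIRUXBLHWTMCQGZNPYFVOE"     # middle rotor
ROTOR_III = "BDFHJLCPRTXVZNYEIWGAKMUSQO"    # right rotor
REFLECTOR_B = "YRUHQSLDPXNGOKMIEBFZCWVJAT"


def through(wiring, offset, c, inverse=False):
    """Pass contact index c through a rotor rotated by `offset` positions.

    This is the conjugation by the cyclic shift: shift in, apply the
    wiring (or its inverse on the return path), shift back out.
    """
    shifted = (c + offset) % 26
    if inverse:
        out = wiring.index(A[shifted])
    else:
        out = A.index(wiring[shifted])
    return (out - offset) % 26


def encipher(letter, k, j, i, plug=None):
    """Encipher one letter with left/middle/right rotors at positions k, j, i."""
    plug = plug or {}
    c = A.index(plug.get(letter, letter))
    for wiring, offset in ((ROTOR_III, i), (ROTOR_II, j), (ROTOR_I, k)):
        c = through(wiring, offset, c)
    c = A.index(REFLECTOR_B[c])
    for wiring, offset in ((ROTOR_I, k), (ROTOR_II, j), (ROTOR_III, i)):
        c = through(wiring, offset, c, inverse=True)
    out = A[c]
    return plug.get(out, out)


if __name__ == "__main__":
    # Right rotor steps before each letter, starting from position 0.
    print("".join(encipher("A", 0, 0, i) for i in range(1, 6)))  # BDZGO
```

With any fixed rotor positions, applying `encipher` twice returns the original letter and no letter is ever mapped to itself, which matches the two reflector properties described earlier.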

Operation
In use, the Enigma required a list of daily key settings as well as a number of auxiliary documents. The procedures for German Naval Enigma were more elaborate and more secure than the procedures used in other services. The Navy codebooks were also printed in red, water-soluble ink on pink paper so that they could easily be destroyed if they were at risk of being seized by the enemy. The codebook pictured was taken from the captured German submarine U-505.

German Kenngruppenheft (a U-boat codebook with grouped key codes)

In German military usage, communications were divided up into a number of different networks, all using different settings for their Enigma machines. These communication nets were termed keys at Bletchley Park, and were assigned code names, such as Red, Chaffinch, and Shark. Each unit operating on a network was assigned a settings list for its Enigma for a period of time. For a message to be correctly encrypted and decrypted, both sender and receiver had to set up their Enigma in the same way; the rotor selection and order, the starting position and the plugboard connections had to be identical. All these settings (together the key in modern terms) had to be established beforehand, and were distributed in codebooks.

An Enigma machine's initial state, the cryptographic key, has several aspects:
• Wheel order (Walzenlage): the choice of rotors and the order in which they are fitted.
• Initial position of the rotors: chosen by the operator, different for each message.
• Ring settings (Ringstellung): the position of the alphabet ring relative to the rotor wiring.
• Plug connections (Steckerverbindungen): the connections of the plugs in the plugboard.
• In very late versions, the wiring of the reconfigurable reflector.


Note that although the ring settings (Ringstellung) were a required part of the setup, they did not actually affect the message encryption because the rotors were positioned independently of the rings. The ring settings were only necessary to determine the initial rotor position based on the message setting which was transmitted at the beginning of a message, as described in the "Indicator" section, below. Once the receiver had set his rotors to the indicated positions, the ring settings no longer played any role in the encryption.

In modern cryptographic language, the ring settings did not actually contribute entropy to the key used for encrypting the message. Rather, the ring settings were part of a separate key (along with the rest of the setup such as wheel order and plug settings) used to encrypt an initialization vector for the message. The session key consisted of the complete setup except for the ring settings, plus the initial rotor positions chosen arbitrarily by the sender (the message setting). The important part of this session key was the rotor positions, not the ring positions. However, by encoding the rotor position into the ring position using the ring settings, additional variability was added to the encryption of the initialization vector.

Enigma was designed to be secure even if the rotor wiring was known to an opponent, although in practice there was considerable effort to keep the wiring secret. If the wiring is secret, the total number of possible configurations has been calculated to be around 10^114 (approximately 380 bits); with known wiring and other operational constraints, this is reduced to around 10^23 (76 bits).[9] Users of Enigma were confident of its security because of the large number of possibilities; it was not then feasible for an adversary to even begin to try every possible configuration in a brute force attack.
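The order of magnitude of the known-wiring figure can be reproduced with a little arithmetic. The breakdown below is an assumption made for illustration, not necessarily the one used by the cited source: three rotors chosen in order from five, 26 starting positions per rotor, two effective ring settings, and ten plugboard cables.

```python
from math import factorial, log2, perm


def plugboard_settings(pairs, letters=26):
    """Number of ways to connect `pairs` cables among `letters` sockets."""
    unplugged = letters - 2 * pairs
    return factorial(letters) // (
        factorial(unplugged) * factorial(pairs) * 2 ** pairs
    )


wheel_orders = perm(5, 3)            # ordered choice of 3 rotors from 5
rotor_positions = 26 ** 3            # starting position of each rotor
ring_settings = 26 ** 2              # assumption: only two rings matter
plugboard = plugboard_settings(10)   # ten steckered pairs

total = wheel_orders * rotor_positions * ring_settings * plugboard

if __name__ == "__main__":
    print(f"{total:.3e} settings, about {log2(total):.1f} bits")
```

Under these assumptions the product comes out on the order of 10^23, or roughly 76 bits, with the plugboard contributing by far the largest factor.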

Indicator
Most of the key was kept constant for a set time period, typically a day. However, a different initial rotor position was used for each message, a concept similar to an initialisation vector in modern cryptography. The reason for this is that, were a number of messages to be encrypted with identical or near-identical settings (termed in cryptanalysis as being in depth), it would be possible to attack the messages using a statistical procedure such as Friedman's Index of coincidence.[20] The starting position for the rotors was transmitted just before the ciphertext, usually after having been enciphered. The exact method used was termed the indicator procedure. It was design weakness and operator sloppiness in these indicator procedures that were two of the main reasons that breaking Enigma messages was possible.
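The statistical idea behind attacking messages in depth can be illustrated by counting coincidences between two aligned ciphertexts. This is a minimal sketch of that kind of test, not Friedman's procedure in full; the function name `coincidence_rate` is hypothetical.

```python
def coincidence_rate(first, second):
    """Fraction of aligned positions at which two texts share a letter.

    For unrelated, uniformly random letter streams the expected rate is
    1/26. Two messages enciphered in depth inherit the higher coincidence
    rate of the underlying plaintext language.
    """
    length = min(len(first), len(second))
    if length == 0:
        return 0.0
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return matches / length


if __name__ == "__main__":
    print(coincidence_rate("ABCDE", "ABXDZ"))  # 0.6
```

A rate well above 1/26 over a long overlap suggests that the two messages were enciphered with the same or nearly the same settings.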


One of the earliest indicator procedures was used by Polish cryptanalysts to make the initial breaks into the Enigma. The procedure was for the operator to set up his machine in accordance with his settings list, which included a global initial position for the rotors (Grundstellung, meaning ground setting), AOH, perhaps. The operator turned his rotors until AOH was visible through the rotor windows. At that point, the operator chose his own, arbitrary, starting position for that particular message. An operator might select EIN, and these became the message settings for that encryption session. The operator then typed EIN into the machine, twice, to allow for detection of transmission errors. The result was an encrypted indicator: the EIN typed twice might turn into XHTLOA, which would be transmitted along with the message. Finally, the operator spun the rotors to his message settings, EIN in this example, and typed the plaintext of the message.

Figure 2. With the inner lid down, the Enigma was ready for use. The finger wheels of the rotors protruded through the lid, allowing the operator to set the rotors, and their current position, here RDKP, was visible to the operator through a set of windows.

At the receiving end, the operation was reversed. The operator set the machine to the initial settings and typed in the first six letters of the message (XHTLOA). In this example, EINEIN emerged on the lamps. After moving his rotors to EIN, the receiving operator then typed in the rest of the ciphertext, deciphering the message.

The weakness in this indicator scheme came from two factors. The first was the use of a global ground setting; this was later changed so that the operator selected his own initial position to encrypt the indicator, and sent the initial position in the clear. The second problem was the repetition of the indicator, which was a serious security flaw. The message setting was encoded twice, resulting in a relation between the first and fourth, second and fifth, and third and sixth characters. This security problem enabled the Polish Cipher Bureau to break into the pre-war Enigma system as early as 1932. However, from 1940 on, the Germans changed the procedures to increase the security.

During World War II, codebooks were only used each day to set up the rotors, their ring settings and the plugboard. For each message, the operator selected a random start position, say WZA, and a random message key, perhaps SXT. He moved the rotors to the WZA start position and encoded the message key SXT. Assume the result was UHL. He then set up the message key, SXT, as the start position and encrypted the message. Next, he transmitted the start position, WZA, the encoded message key, UHL, and then the ciphertext. The receiver set up the start position according to the first trigram, WZA, and decoded the second trigram, UHL, to obtain the SXT message setting. Next, he used this SXT message setting as the start position to decrypt the message. This way, each ground setting was different and the new procedure avoided the security flaw of double encoded message settings.[21]

This procedure was used by the Wehrmacht and Luftwaffe only. The Kriegsmarine procedures for sending messages with the Enigma were far more complex and elaborate. Prior to encryption with the Enigma, the message was encoded using the Kurzsignalheft code book. The Kurzsignalheft contained tables to convert sentences into four-letter groups. A great many choices were included, for example, logistic matters such as refueling and rendezvous with supply ships, positions and grid lists, harbor names, countries, weapons, weather conditions, enemy positions and ships, and date and time tables. Another codebook contained the Kenngruppen and Spruchschlüssel: the key identification and message key.[22]


Some details
The Army Enigma machine used only the 26 alphabet characters. Signs were replaced with rare character combinations. A space was omitted or replaced with an X. The X was generally used as point or full-stop. Some signs were different in other parts of the armed forces. The Wehrmacht replaced a comma with ZZ and the question sign with FRAGE or FRAQ. The Kriegsmarine replaced the comma with Y and the question sign with UD. The combination CH, as in "Acht" (eight) or "Richtung" (direction), was replaced with Q (AQT, RIQTUNG). Two, three and four zeros were replaced with CENTA, MILLE and MYRIA. The Wehrmacht and the Luftwaffe transmitted messages in groups of five characters. The Kriegsmarine, using the four rotor Enigma, had four-character groups. Frequently used names or words were to be varied as much as possible. Words like Minensuchboot (minesweeper) could be written as MINENSUCHBOOT, MINBOOT, MMMBOOT or MMM354. To make cryptanalysis harder, it was forbidden to use more than 250 characters in a single message. Longer messages were divided into several parts, each using its own message key.[23][24]
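Some of these conventions can be sketched as a small text-preparation routine. This is only an illustration of the Wehrmacht-style rules listed above, under several assumptions: spaces are dropped rather than replaced with X, the question mark becomes FRAGE (not FRAQ), number words and abbreviations are ignored, and the function names are hypothetical.

```python
import re


def normalise(text):
    """Reduce a message to the 26 letters using the conventions above."""
    s = text.upper().replace("CH", "Q")   # CH was written as Q
    s = s.replace(",", "ZZ")              # comma
    s = s.replace("?", "FRAGE")           # question mark
    s = s.replace(".", "X")               # full stop
    return re.sub(r"[^A-Z]", "", s)       # drop spaces and anything else


def split_parts(letters, limit=250):
    """Split a long message into parts of at most `limit` characters."""
    return [letters[i:i + limit] for i in range(0, len(letters), limit)]


def in_groups(letters, size=5):
    """Format letters in groups for transmission (five for Army use)."""
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))


if __name__ == "__main__":
    print(in_groups(normalise("Richtung acht.")))  # RIQTU NGAQT X
```

Each part returned by `split_parts` would then be enciphered under its own message key, as described above.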

History of the machine


Far from being a single design, there are numerous models and variants of the Enigma family. The earliest Enigma machines were commercial models dating from the early 1920s. Starting in the mid-1920s, the various branches of the German military began to use Enigma, making a number of changes in order to increase its security. In addition, a number of other nations either adopted or adapted the Enigma design for their own cipher machines.

A selection of seven Enigma machines and paraphernalia exhibited at the USA's National Cryptologic Museum. From left to right, the models are: 1) Commercial Enigma; 2) Enigma T; 3) Enigma G; 4) Unidentified; 5) Luftwaffe (Air Force) Enigma; 6) Heer (Army) Enigma; 7) Kriegsmarine (Naval) Enigma M4.


Commercial Enigma
On 23 February 1918, German engineer Arthur Scherbius applied for a patent for a cipher machine using rotors and, with E. Richard Ritter, founded the firm of Scherbius & Ritter. They approached the German Navy and Foreign Office with their design, but neither was interested. They then assigned the patent rights to Gewerkschaft Securitas, who founded the Chiffriermaschinen Aktien-Gesellschaft (Cipher Machines Stock Corporation) on 9 July 1923; Scherbius and Ritter were on the board of directors.

Chiffriermaschinen AG began advertising a rotor machine, Enigma model A, which was exhibited at the Congress of the International Postal Union in 1923-1924. The machine was heavy and bulky, incorporating a typewriter. It measured 65 × 45 × 35 cm and weighed about 50 kilograms (110 lb).

Scherbius's Enigma patent: U.S. Patent 1,657,411, granted in 1928.[25]

In 1925 Enigma model B was introduced, and was of a similar construction.[26] While bearing the Enigma name, both models A and B were quite unlike later versions: they differed in physical size and shape, but also cryptographically, in that they lacked the reflector. The reflector, an idea suggested by Scherbius's colleague Willi Korn, was first introduced in the Enigma C (1926) model. The reflector is a key feature of the Enigma machines. Model C was smaller and more portable than its predecessors. It lacked a typewriter, relying instead on the operator reading the lamps; hence the alternative name of "glowlamp Enigma" to distinguish it from models A and B. The Enigma C quickly became extinct, giving way to the Enigma D (1927). This version was widely used, with examples going to Sweden, the Netherlands, United Kingdom, Japan, Italy, Spain, United States, and Poland.
A rare 8-rotor printing Enigma.

Military Enigma
The Navy was the first branch of the German military to adopt Enigma. This version, named Funkschlüssel C ("Radio cipher C"), had been put into production by 1925 and was introduced into service in 1926.[27] The keyboard and lampboard contained 29 letters (A-Z, Ä, Ö and Ü), which were arranged alphabetically, as opposed to the QWERTZU ordering.[28] The rotors had 28 contacts, with the letter X wired to bypass the rotors unencrypted.[29] Three rotors were chosen from a set of five[30] and the reflector could be inserted in one of four different positions, denoted α, β, γ and δ.[31] The machine was revised slightly in July 1933.[32]
Enigma in use on the Russian front

By 15 July 1928,[33] the German Army (Reichswehr) had introduced their own version of the Enigma, the Enigma G, revised to the Enigma I by June 1930.[34] Enigma I is also known as the Wehrmacht, or "Services" Enigma, and was used extensively by the German military services and other government organisations (such as the railways[35]), both before and during World War II. The major difference between Enigma I and commercial Enigma models was the addition of a plugboard to swap pairs of letters, greatly increasing the cryptographic strength of the machine. Other differences included the use of a fixed reflector, and the relocation of the stepping notches from the rotor body to the movable letter rings.[34] The machine measured 28 × 34 × 15 cm (11 in × 13.5 in × 6 in) and weighed around 12 kg (26 lb).[29]

By 1930, the Army had suggested that the Navy adopt their machine, citing the benefits of increased security (with the plugboard) and easier interservice communications.[36] The Navy eventually agreed and in 1934[37] brought into service the Navy version of the Army Enigma, designated Funkschlüssel M or M3. While the Army used only three rotors at that time, for greater security the Navy specified a choice of three from a possible five.[38] In December 1938, the Army issued two extra rotors so that the three rotors were chosen from a set of five.[34] In 1938, the Navy added two more rotors, and then another in 1939 to allow a choice of three rotors from a set of eight.[38] In August 1935, the Air Force also introduced the Wehrmacht Enigma for their communications.[34] A four-rotor Enigma was introduced by the Navy for U-boat traffic on 1 February 1942, called M4 (the network was known as Triton, or Shark to the Allies). The extra rotor was fitted in the same space by splitting the reflector into a combination of a thin reflector and a thin fourth rotor.

There was also a large, eight-rotor printing model, the Enigma II. In 1933 the Polish Cipher Bureau detected that it was in use for high-level military communications, but it was soon withdrawn from use after it was found to be unreliable and to jam frequently.[39]

Heinz Guderian in the Battle of France with the Enigma machine

The Abwehr used the Enigma G (the Abwehr Enigma). This Enigma variant was a four-wheel unsteckered machine with multiple notches on the rotors. This model was equipped with a counter which incremented upon each key press, and so is also known as the "counter machine" or the Zählwerk Enigma.


During World War II the Abwehr used these machines to control and report the locations of submarines in the Atlantic and to pass information about bombing raids, the movement of military units, and the location and cargo of military supply ships. Before the use of the enigma machine Britain was in danger of being starved into submission and after it the roles were virtually reversed. The British were now one step ahead of the Germans and sinking submarines faster than they could be built.[40] Other countries also used Enigma machines. The Italian Navy adopted the commercial Enigma as "Navy Cipher D"; the Spanish also used commercial Enigma during their Civil War. British codebreakers succeeded in breaking these machines, which lacked a plugboard.[41] The Swiss used a version of Enigma called model K or Swiss K for military and diplomatic use, which was very similar to the commercial Enigma D. The machine was broken by a number of parties, including Poland, France, the United Kingdom and the United States (the latter codenamed it INDIGO). An Enigma T model (codenamed Tirpitz) was manufactured for use by the Japanese.

It has been estimated that 100,000 Enigma machines were constructed.[42] After the end of World War II, the Allies sold captured Enigma machines, still widely considered secure, to a number of developing countries.[42]


Enigma G, used by the Abwehr, had four rotors, no plugboard, and multiple notches on the rotors.

The Enigma-K used by the Swiss Army sported three rotors and a reflector, and no plugboard. It was made in Germany, but had locally re-wired rotors and an additional lamp panel.

An Enigma model T (Tirpitz), a modified commercial Enigma K manufactured for use by the Japanese.

An Enigma machine in the UK's Imperial War Museum

Enigma in use in Russia (image Bundesarchiv)


Surviving machines
The effort to break the Enigma was not disclosed until the 1970s. Since then, interest in the Enigma machine has grown considerably and a number of Enigmas are on public display in museums around the world.

The Deutsches Museum in Munich has both the three- and four-rotor German military variants, as well as several older civilian versions. Several Enigma machines are exhibited at the National Codes Centre in Bletchley Park, the Science Museum in London, the Polish Institute and Sikorski Museum in London, the Polish Army Museum in Warsaw, the Armémuseum (Swedish Army Museum) in Stockholm, the National Signals Museum in Finland, and at the Australian War Memorial and in the foyer of the Defence Signals Directorate, both in Canberra, Australia.

In the United States, Enigma machines can be seen at the Computer History Museum in Mountain View, California, and at the National Security Agency's National Cryptologic Museum at Fort Meade, Maryland, where visitors can try their hand at encrypting messages and deciphering code. Two machines that were acquired after the capture of U-505 during World War II are on display at the Museum of Science and Industry in Chicago, Illinois. The now-defunct San Diego "Computer Museum of America" had an Enigma in its collection, which has since been given to the San Diego State University Library.

A four-rotor Kriegsmarine Enigma machine on display at the US National Cryptologic Museum

In Canada, a Swiss Army issue Enigma-K is in Calgary, Alberta. It is on permanent display at the Naval Museum of Alberta inside the Military Museums of Calgary. There is also a 3-rotor Enigma machine on display at the Communications and Electronics Engineering (CELE) Museum in Kingston, Ontario, at Canadian Forces Base (CFB) Kingston.

A number of machines are also in private hands. Occasionally, Enigma machines are sold at auction; prices have risen in recent years from US$40,000[43][44] to US$203,000[45] in September 2011. Replicas of the machine are available in various forms, including an exact reconstructed copy of the Naval M4 model, an Enigma implemented in electronics (Enigma-E), various computer software simulators and paper-and-scissors analogues.

A rare Abwehr Enigma machine, designated G312, was stolen from the Bletchley Park museum on 1 April 2000. In September, a man identifying himself as "The Master" sent a note demanding £25,000 and threatened to destroy the machine if the ransom was not paid. In early October 2000, Bletchley Park officials announced that they would pay the ransom, but the stated deadline passed with no word from the blackmailer. Shortly afterwards, the machine was sent anonymously to BBC journalist Jeremy Paxman, but three rotors were missing. In November 2000, an antiques dealer named Dennis Yates was arrested after telephoning The Sunday Times to arrange the return of the missing parts. The Enigma machine was returned to Bletchley Park after the incident. In October 2001, Yates was sentenced to 10 months in prison after admitting handling the stolen machine and demanding ransom for its return, although he maintained that he was acting as an intermediary for a third party.[46] Yates was released from prison after serving three months.

US Enigma replica on display at the National Cryptologic Museum in Fort Meade, Maryland, USA.

In October 2008, the Spanish daily newspaper El País reported that 28 Enigma machines had been discovered by chance in an attic of the Army headquarters in Madrid during inventory taking. These 4-rotor commercial machines had helped Franco's Nationalists win the Spanish Civil War because, although the British code breaker Alfred Dilwyn Knox broke the code generated by Franco's Enigma machines in 1937, this was not disclosed to the Republicans and they could not break the code. The Nationalist government continued to use Enigma machines into the 1950s, eventually having a total of 50. Some of the 28 machines are now on display in Spanish military museums.[47]

The military forces of Bulgaria used Enigma machines with a Cyrillic keyboard; one such machine is on display in the National Museum of Military History in Sofia.


Derivatives
The Enigma was influential in the field of cipher machine design, and a number of other rotor machines are derived from it. The British Typex was originally derived from the Enigma patents; Typex even includes features from the patent descriptions that were omitted from the actual Enigma machine. Owing to the need for secrecy about its cipher systems, no royalties were paid for the use of the patents by the British government. The Typex implementation of the Enigma transform is not the same as the transform found in almost all of the German or other Axis versions of the machine. A Japanese Enigma clone was codenamed GREEN by American cryptographers. Little used, it contained four rotors mounted vertically. In the U.S., cryptologist William Friedman designed the M-325, a machine similar to Enigma in logical operation, although not in construction. A unique rotor machine was constructed in 2002 by Netherlands-based Tatjana van Vark. This unusual device was inspired by Enigma but makes use of 40-point rotors, allowing letters, numbers and some punctuation to be used; each rotor contains 509 parts.[48] Machines like the SIGABA, NEMA, Typex and so forth, are deliberately not considered to be Enigma derivatives as their internal ciphering functions are not mathematically identical to the Enigma transform. Several software implementations of Enigma machines do exist, but not all are state machine compliant with the Enigma family. The most commonly used software derivative (that is not compliant with any hardware implementation of the Enigma) is at EnigmaCo.de [49]. Many Java applet Enigmas only accept single letter entry, making use difficult even if the applet is Enigma state machine compliant. Technically Enigma@home [50] is the largest scale deployment of a software Enigma, but the decoding software does not implement encipherment making it a derivative (as all original machines could cipher and decipher). 
On the other hand, there is a very user-friendly 3-rotor simulator at http://w1tp.com/enigma/enigma_w.zip in which users can select rotors, use the plugboard, and define new settings for the rotors and reflectors. The output appears in separate windows which can be independently made "invisible" to hide decryption. A 32-bit version of this simulator can be found at http://membres.lycos.fr/pc1/enigma and a more complex version at http://w1tp.com/enigma/EnigmaSim.zip, which includes an "autotyping" function that takes plaintext from the clipboard and converts it to ciphertext (or vice versa) at one of four speeds. The "very fast" option runs through 26 characters in less than one second. There are currently no known open source projects that implement the Enigma in logic gates at register-transfer level (RTL) in a hardware description language such as VHDL.


A Japanese Enigma clone, codenamed GREEN by American cryptographers.

Tatjana van Vark's Enigma-inspired rotor machine.

Electronic implementation of an Enigma machine, sold at the Bletchley Park souvenir shop

Fiction
The play Breaking the Code, dramatised by Hugh Whitemore, is about the life and death of Alan Turing, who was the central force in continuing to break the Enigma in the United Kingdom during World War II. Turing was played by Derek Jacobi, who also played Turing in a 1996 television adaptation of the play. Although it is a drama and thus takes artistic license, it is nonetheless a fundamentally accurate account.

Robert Harris's 1995 novel Enigma is set against the backdrop of World War II Bletchley Park and cryptologists working to read Naval Enigma in Hut 8. The book, with substantial changes in plot, was made into the 2001 film Enigma, directed by Michael Apted and starring Kate Winslet and Dougray Scott. The film has been criticized for many historical inaccuracies, including neglect of the role of Poland's Biuro Szyfrów in breaking the Enigma cipher and showing the British how to do it. The film, like the book, makes a Pole the villain, who seeks to betray the secret of Enigma decryption.[51]

An earlier Polish film dealing with Polish aspects of the subject was the 1979 Sekret Enigmy, whose title translates as The Enigma Secret.[52]

Wolfgang Petersen's 1981 film Das Boot includes an Enigma machine which is evidently a four-rotor Kriegsmarine variant. It appears in many scenes, which probably capture well the flavour of day-to-day Enigma use aboard a World War II U-boat.

The plot of U-571, released in 2000, revolves around an attempt to seize an Enigma machine from a German U-boat.

A re-imagined version of an Enigma machine is used heavily in the 'Enigma Conundrum' side-mission in the 2011 video game Batman: Arkham City.

An Enigma machine makes a very brief appearance in the 1980s TV show Whiz Kids, episode 12.
Neal Stephenson's novel Cryptonomicon prominently features the Enigma machine and efforts by British and American cryptologists to break variants of it, and portrays the German U-boat command under Karl Dönitz using it in apparently deliberate ignorance of its having been broken.

In the comedy war film All the Queen's Men, released in 2001 and starring Matt LeBlanc alongside Eddie Izzard, four World War II Allied soldiers are parachuted into Germany, where, dressed as women, they attempt to steal an Enigma machine. They eventually learn that the Allies already had the machine and that the mission was a ruse intended to mislead the Germans into thinking that Enigma was a closed book to the Allies.

EnigmaWarsaw is an outdoor city game in Warsaw organised by StayPoland travel agency. This treasure hunt game is devised to help the players imagine pre-war Warsaw. EnigmaWarsaw is named to commemorate the pioneering work of Polish cryptographers Marian Rejewski, Jerzy Różycki, and Henryk Zygalski at decrypting the Enigma machine cipher.

To Say Nothing of the Dog is a science fiction novel about time-traveling historians in which the importance of the Allies obtaining the German Enigma machine is heavily stressed.

In the TV show Sanctuary, the deciphering of the Enigma cipher is credited to Nikola Tesla during the episode "Into the Black", released on 20 June 2011.

In Ian Fleming's From Russia, with Love, released in 1957, the fictitious Lektor code machine, based on the real-life Enigma cipher machine, is used as a plot piece. This device would supposedly intercept Russian coded traffic and decipher it, allowing MI6's cryptanalysts to access decoded information.


Notes
[1] Singh, Simon (1999). The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography. London: Fourth Estate. p. 127. ISBN 1-85702-879-1.
[2] Lord, Bob (1998–2010). "1937 Enigma Manual by: Jasper Rosal - English Translation" (http://www.ilord.com/enigma-manual1937-english.html). Retrieved 31 May 2011.
[3] "Virtual Bletchley Park" (http://www.codesandciphers.org.uk/virtualbp/poles/poles.htm). Codesandciphers.org.uk. Retrieved 2012-07-17.
[4] Peter, Laurence (20 July 2009). "How Poles cracked Nazi Enigma secret" (http://news.bbc.co.uk/2/hi/europe/8158782.stm). BBC News.
[5] (http://vod.onet.pl/tajemnice-enigmy,41980,film.html#play)
[6] Gordon Welchman, who became head of Hut 6 at Bletchley Park, has written: "Hut 6 Ultra would never have gotten off the ground if we had not learned from the Poles, in the nick of time, the details both of the German military version of the commercial Enigma machine, and of the operating procedures that were in use." Gordon Welchman, The Hut Six Story, 1982, p. 289.
[7] Much of the German cipher traffic was encrypted on the Enigma machine, hence the term "Ultra" has often been used almost synonymously with "Enigma decrypts". However, Ultra also encompassed decrypts of the German Lorenz SZ 40 and 42 machines that were used by the German High Command, and decrypts of Hagelin ciphers and of other Italian ciphers and codes, as well as of Japanese ciphers and codes such as Purple and JN-25.
[8] Kahn (1991).
[9] Miller, A. Ray (2001). The Cryptographic Mathematics of Enigma (http://www.nsa.gov/about/_files/cryptologic_heritage/publications/wwii/engima_cryptographic_mathematics.pdf). National Security Agency.
[10] Bletchley Park veteran and historian F.H. Hinsley is often cited as an authority for the two-year estimate, yet his assessment in Codebreakers is much less definitive: "Would the Soviets meanwhile have defeated Germany, or Germany the Soviets, or would there have been stalemate on the eastern fronts? What would have been decided about the atom bomb? Not even counter-factual historians can answer such questions. They are questions which do not arise, because the war went as it did. But those historians who are concerned only with the war as it was must ask why it went as it did. And they need venture only a reasonable distance beyond the facts to recognise the extent to which the explanation lies in the influence of Ultra." F.H. Hinsley, "Introduction: The Influence of Ultra in the Second World War," Codebreakers: The Inside Story of Bletchley Park, edited by F.H. Hinsley and Alan Stripp, Oxford University Press, 1993, pp. 12–13.
[11] "Code Breaking - World War 2 on History" (http://www.history.co.uk/explore-history/ww2/code-breaking.html). History.co.uk. Retrieved 2012-07-17.
[12] Kahn (1991), Hinsley and Stripp (1993).
[13] Rijmenants, Dirk; Technical details of the Enigma machine (http://users.telenet.be/d.rijmenants/en/enigmatech.htm) Cipher Machines & Cryptology
[14] David Hamer, "Enigma: Actions Involved in the 'Double-Stepping' of the Middle Rotor", Cryptologia, 21(1), January 1997, pp. 47–50, Online version (zipped PDF) (http://web.archive.org/web/20110719081659/http://www.eclipse.net/~dhamer/downloads/rotorpdf.zip)
[15] Sale, Tony. "Technical specifications of the Enigma rotors" (http://www.codesandciphers.org.uk/enigma/rotorspec.htm). Technical Specification of the Enigma. Retrieved 15 November 2009.
[16] "Lückenfüllerwalze" (http://www.cryptomuseum.com/crypto/enigma/lf/index.htm). Cryptomuseum.com. Retrieved 2012-07-17.
[17] Philip Marks, "Umkehrwalze D: Enigma's Rewirable Reflector – Part I", Cryptologia 25(2), April 2001, pp. 101–141
[18] Reuvers, Paul (2008). "Enigma accessories" (http://www.jproc.ca/crypto/enigma_acc.html). Retrieved 22 July 2010.
[19] Rejewski 1980
[20] Friedman, W.F. (1922). The index of coincidence and its applications in cryptology. Department of Ciphers. Publ 22. Geneva, Illinois, USA: Riverbank Laboratories. OCLC 55786052.
[21] Rijmenants, Dirk; Enigma message procedures (http://users.telenet.be/d.rijmenants/en/enigmaproc.htm) Cipher Machines & Cryptology
[22] Rijmenants, Dirk; Kurzsignalen on German U-boats (http://users.telenet.be/d.rijmenants/en/kurzsignale.htm) Cipher Machines & Cryptology
[23] "The translated 1940 Enigma General Procedure" (http://www.codesandciphers.org.uk/documents/egenproc/eniggnix.htm). codesandciphers.org.uk. Retrieved 16 October 2006.
[24] "The translated 1940 Enigma Officer and Staff Procedure" (http://www.codesandciphers.org.uk/documents/officer/officerx.htm). codesandciphers.org.uk. Retrieved 16 October 2006.
[25] http://www.google.com/patents?vid=1657411
[26] "image of Enigma Type B" (http://www.armyradio.com/publish/Articles/The_Enigma_Code_Breach/Pictures/enigma_type_b.jpg).
[27] Kahn, 1991, pp. 39–41, 299
[28] Ulbricht, 2005, p. 4
[29] Stripp, 1993
[30] Kahn, 1991, pp. 40, 299
[31] Bauer, 2000, p. 108
[32] Hinsley and Stripp, 1993, plate 3
[33] Kahn, 1991, pp. 41, 299
[34] Deavours and Kruh, 1985, p. 97
[35] Michael Smith Station X, four books (macmillan) 1998, Paperback 2000, ISBN 0-7522-7148-2, Page 73
[36] Kahn, 1991, p. 43
[37] Kahn (1991, p. 43) says August 1934. Kruh and Deavours (2002, p. 15) say October 2004.
[38] Deavours and Kruh, 1985, p. 98
[39] Kozaczuk, 1984, p. 28.
[40] Adamy, Dave (July). "Bletchley Park". Journal of Electronic Defense: 16.
[41] Smith 2006, p. 23
[42] Bauer, 2000, p. 112
[43] Hamer, David; Enigma machines - known locations* (http://www.eclipse.net/~dhamer/location.htm)
[44] Hamer, David; Selling prices of Enigma and NEMA - all prices converted to US$ (http://www.eclipse.net/~dhamer/enigma_p.htm)
[45] Christie's; 3 Rotor enigma auction (http://www.christies.com/lotFinder/lot_details.aspx?intObjectID=5480138)
[46] "Man jailed over Enigma machine" (http://news.bbc.co.uk/1/hi/uk/1609168.stm). BBC News. 19 October 2001. Retrieved 2 May 2010.
[47] Graham Keeley. Nazi Enigma machines helped General Franco in Spanish Civil War (http://www.timesonline.co.uk/tol/news/world/europe/article5003411.ece) The Times 24 October 2008. p. 47
[48] van Vark, Tatjana. The coding machine (http://www.tatjavanvark.nl/tvv1/pht10.html)
[49] http://enigmaco.de/
[50] http://www.enigmaathome.net/
[51] Laurence Peter, How Poles cracked Nazi Enigma secret (http://news.bbc.co.uk/2/hi/europe/8158782.stm), BBC News, 20 July 2009
[52] Sekret Enigmy (http://imdb.com/title/tt0079878/) (Film 1979) Internet Movie Database


References
Bauer, F. L. (2000). Decrypted Secrets (Springer, 2nd edition). ISBN 3-540-66871-3
Hamer, David H.; Sullivan, Geoff; Weierud, Frode (July 1998). "Enigma Variations: an Extended Family of Machines", Cryptologia, 22(3). Online version (zipped PDF) (http://www.eclipse.net/~dhamer/downloads/enigvar1.zip).
Stripp, Alan. "The Enigma Machine: Its Mechanism and Use" in Hinsley, F. H.; and Stripp, Alan (editors), Codebreakers: The Inside Story of Bletchley Park (1993), pp. 83–88.
Kahn, David (1991). Seizing the Enigma: The Race to Break the German U-Boats Codes, 1939-1943. ISBN 0-395-42739-8.
Kozaczuk, Władysław, Enigma: How the German Machine Cipher Was Broken, and How It Was Read by the Allies in World War Two, edited and translated by Christopher Kasparek, Frederick, MD, University Publications of America, 1984, ISBN 0-89093-547-5.
Kozaczuk, Władysław. The origins of the Enigma/ULTRA (http://www.enigmahistory.org/text.html)
Kruh, Louis; Deavours, Cipher (2002). "The Commercial Enigma: Beginnings of Machine Cryptography", Cryptologia, 26(1), pp. 1–16. Online version (PDF) (http://www.dean.usma.edu/math/pubs/cryptologia/classics.htm)
Marks, Philip; Weierud, Frode (January 2000). "Recovering the Wiring of Enigma's Umkehrwalze A", Cryptologia 24(1), pp. 55–66.
Rejewski, Marian (1980). "An Application of the Theory of Permutations in Breaking the Enigma Cipher" (http://cryptocellar.org/). Applicationes mathematicae 16 (4). ISSN 1730-6280
Smith, Michael (1998). Station X (Macmillan) ISBN 0-7522-7148-2
Smith, Michael (2006). "How it began: Bletchley Park Goes to War". In Copeland, B Jack. Colossus: The Secrets of Bletchley Park's Codebreaking Computers. Oxford: Oxford University Press. ISBN 978-0-19-284055-4
Ulbricht, Heinz. Die Chiffriermaschine Enigma – Trügerische Sicherheit: Ein Beitrag zur Geschichte der Nachrichtendienste, PhD Thesis, 2005. Online version (http://opus.tu-bs.de/opus/volltexte/2005/705/pdf/enigmadiss.pdf). (In German)


Further reading
Richard J. Aldrich, GCHQ: The Uncensored Story of Britain's Most Secret Intelligence Agency, HarperCollins, July 2010.
Calvocoressi, Peter. Top Secret Ultra. Baldwin, new edn 2001. ISBN 978-0-947712-36-5
Cave Brown, Anthony. Bodyguard of Lies, 1975. A journalist's sensationalist best-seller that purported to give a history of Enigma decryption and its effect on the outcome of World War II. Worse than worthless on the seminal Polish work that made "Ultra" possible. See Richard Woytak, prefatory note (pp. 75–76) to Marian Rejewski, "Remarks on Appendix 1 to British Intelligence in the Second World War by F.H. Hinsley", Cryptologia, vol. 6, no. 1 (January 1982), pp. 76–83.
Garliński, Józef. Intercept, Dent, 1979. A superficial, sometimes misleading account of Enigma decryption before and during World War II, of equally slight value as to both the Polish and British phases. See Richard Woytak and Christopher Kasparek, "The Top Secret of World War II", The Polish Review, vol. XXVIII, no. 2, 1983, pp. 98–103 (specifically, about Garliński, pp. 101–3).
Herivel, John. Herivelismus and the German military Enigma. Baldwin, 2008. ISBN 978-0-947712-46-4
Keen, John. Harold 'Doc' Keen and the Bletchley Park Bombe. Baldwin, 2003. ISBN 978-0-947712-42-6
Large, Christine. Hijacking Enigma, 2003, ISBN 0-470-86347-1.
Marks, Philip. "Umkehrwalze D: Enigma's Rewirable Reflector – Part I", Cryptologia 25(2), April 2001, pp. 101–141.
Marks, Philip. "Umkehrwalze D: Enigma's Rewirable Reflector – Part II", Cryptologia 25(3), July 2001, pp. 177–212.
Marks, Philip. "Umkehrwalze D: Enigma's Rewirable Reflector – Part III", Cryptologia 25(4), October 2001, pp. 296–310.
Perera, Tom (2010). Inside ENIGMA. Bedford, UK: Radio Society of Great Britain. ISBN 978-1-905086-64-1.
Perera, Tom. The Story of the ENIGMA: History, Technology and Deciphering, 2nd Edition, CD-ROM, 2004, Artifax Books, ISBN 1-890024-06-6. Sample pages (http://w1tp.com/enigma/ecds.htm)
Rejewski, Marian. "How Polish Mathematicians Deciphered the Enigma" (http://chc60.fgcu.edu/images/articles/rejewski.pdf), Annals of the History of Computing 3, 1981. This article is regarded by Andrew Hodges, Alan Turing's biographer, as "the definitive account" (see Hodges' Alan Turing: The Enigma, Walker and Company, 2000 paperback edition, p. 548, footnote 4.5).
Quirantes, Arturo. "Model Z: A Numbers-Only Enigma Version", Cryptologia 28(2), April 2004.
Sebag-Montefiore, Hugh. Enigma: the battle for the code. Cassell Military Paperbacks, London, 2004. ISBN 978-1-407-22129-8
Ulbricht, Heinz. Enigma Uhr, Cryptologia, 23(3), April 1999, pp. 194–205.
Welchman, Gordon. The Hut Six Story: breaking the Enigma codes. Baldwin, new edition, 1997. ISBN 978-0-947712-34-1
Winterbotham, F.W, The Ultra Secret, Harper and Row, New York, 1974; Spanish edition Ultrasecreto, Ediciones Grijalbo, Madrid, 1975


External links
Bletchley Park National Code Center (http://www.bletchleypark.org.uk/) Home of the British codebreakers during the Second World War
Pictures of a four-rotor naval enigma, including Flash (SWF) views of the machine (http://cnm.open.ac.uk/projects/stationx/enigma/index.html)
Enigma Pictures and Demonstration by NSA Employee at RSA (http://www.cgisecurity.net/2008/04/getting-to-see-an-enigma-machine-at-rsa-2008-.html)
Enigma machine (http://www.dmoz.org/Science/Math/Applications/Communication_Theory/Cryptography/Historical//) at the Open Directory Project
An online Enigma Machine simulator (http://russells.freeshell.org/enigma/)
Online Enigma simulator (http://www.students.oamk.fi/~k0khro00/Enigma.html)
Kenngruppenheft (http://www.wwiiarchives.net/servlet/action/document/index/97/0)
Process of building an Enigma M4 replica (http://www.enigma-maschine.de/en/)
Breaking German Navy Ciphers (http://www.enigma.hoerenberg.com)


Colossus computer

A Colossus Mark 2 computer. The operators are (left to right) Dorothy Du Boisson and Elsie Booker. The slanted control panel on the left was used to set the "pin" (or "cam") patterns of the Lorenz. The "bedstead" paper tape transport is on the right.

Developer: Tommy Flowers
Manufacturer: Post Office Research Station
Type: Special-purpose electronic digital programmable computer
Generation: First-generation computer
Release date: Mk 1: December 1943; Mk 2: 1 June 1944
Discontinued: 8 June 1945
Units shipped: 10
Media: Paper tape, teleprinter output
CPU: Custom circuits using valves and thyratrons. A total of 1500 in Mk 1 and 2400 in Mk 2. Also relays and stepping switches
Storage capacity: 20,000 5-bit characters in paper tape loop
Memory: None (no RAM)
Display: Indicator lamp panel
Input: Console switches, plug panels and photocells reading paper tape

Colossus was the world's first electronic, digital, fixed-program, single-purpose computer with variable coefficients. The Colossus computers were used by British codebreakers during World War II to help in the cryptanalysis of the Lorenz cipher. Without them, the Allies would have been deprived of the very valuable intelligence that was obtained from reading the vast quantity of encrypted high-level telegraphic messages between the German High Command (OKW) and their army commands throughout occupied Europe. Colossus used thermionic valves (vacuum tubes) to perform Boolean operations and calculations. Colossus was designed by engineer Tommy Flowers, to solve a problem posed by mathematician Max Newman at the Government Code and Cypher School (GC&CS) at Bletchley Park. The prototype, Colossus Mark 1, was shown to be working in December 1943 and was operational at Bletchley Park by 5 February 1944.[1] An improved Colossus Mark 2 first worked on 1 June 1944,[2] just in time for the Normandy Landings. Ten Colossus computers were in use by the end of the war.

The destruction of most of the Colossus hardware and blueprints, as part of the effort to maintain a project secrecy that was kept up into the 1970s, deprived some of the Colossus creators of credit for their pioneering advancements in electronic digital computing during their lifetimes. A functioning replica of a Colossus computer was completed in 2007 and is on display at The National Museum of Computing at Bletchley Park.[3]

It has sometimes been erroneously stated that Alan Turing designed Colossus to aid the cryptanalysis of the Enigma machine.[4] Turing's machine that helped solve Enigma was the electromechanical Bombe, not Colossus.[5]


Purpose and origins


The Lorenz SZ machines had 12 wheels each with a different number of cams (or "pins").

The Colossus computers were used to help decipher (radio) teleprinter messages that had been encrypted using the electromechanical Lorenz SZ40/42 in-line cipher machine. To encipher a message with the Lorenz machine, the 5-bit plaintext characters were combined with a stream of key characters. The keystream was generated using twelve pinwheels. British codebreakers referred to encrypted German teleprinter traffic as "Fish" and called the SZ40/42 machine and the intercepted messages "Tunny".

Colossus was used for finding possible Lorenz key settings rather than completely decrypting the message. It compared two data streams, counting a statistic based on a programmable Boolean function. The ciphertext was read at high speed from a paper tape. The other stream was generated internally, and was an electronic simulation of part of the Lorenz machine. If the count for a setting was above a certain threshold, it would be sent as output to an electric typewriter.

The logical structure of the Lorenz machine was diagnosed at Bletchley Park without a machine being seen, something that did not happen until almost the end of the war.[6] First, John Tiltman, a very talented GC&CS cryptanalyst, derived a key stream of almost 4000 characters from a German operating blunder in August 1941. Then Bill Tutte, a newly arrived member of the Research Section, used this key stream to work out the logical structure of the Lorenz machine. He correctly deduced that it had twelve wheels in two groups of five, which he named the χ (chi) and ψ (psi) wheels, and the remaining two the μ (mu) or "motor" wheels. The χ wheels stepped regularly with each letter that was encrypted, while the ψ wheels stepped irregularly, under the control of the motor wheels.[7]

In order to read messages, two tasks had to be performed. The first task was "wheel breaking", which was discovering the pin patterns for all the wheels. These patterns were set up once on the Lorenz machine and then used for a fixed period of time and for a number of different messages. The second task was "wheel setting", which could be attempted once the pin patterns were known.[8] Each message encrypted using Lorenz was enciphered at a different start position for the wheels. Knowing that in German, as in other languages, there is a non-random distribution of the different letters, and that the psi wheels did not advance with each character, Tutte worked out that trying two impulses of the chi-stream against the ciphertext would produce a statistic that was non-random. This became known as Tutte's "1+2 break in".[9] The process of wheel setting found the start position for a message. Initially Colossus was used to work out the start positions of the chi wheels, but later a method was devised for it to be used for wheel breaking as well.
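The wheel-setting idea described above can be illustrated with a toy model. The sketch below is only an illustration under loud assumptions: it uses two short, made-up wheel patterns (`CHI1`, `CHI2`), ignores the psi and motor wheels entirely, and uses an artificially repetitive plaintext. None of the names or values come from the real Lorenz machine or from Colossus. It shows key bits being combined with message bits by exclusive-or, and a count being taken for each candidate pair of start positions, with only counts above a threshold reported.

```python
from itertools import product

# Hypothetical cam patterns, far shorter than the real wheels.
CHI1 = [1, 0, 1, 1, 0, 0, 1]
CHI2 = [0, 1, 1, 0, 1]


def wheel_stream(pattern, start, n):
    """Return n bits read from a wheel, beginning at position `start`."""
    return [pattern[(start + i) % len(pattern)] for i in range(n)]


def xor(a, b):
    """Combine two equal-length bit streams by exclusive-or."""
    return [x ^ y for x, y in zip(a, b)]


def delta(bits):
    """XOR each bit with its successor (the differenced stream)."""
    return [x ^ y for x, y in zip(bits, bits[1:])]


def score(z1, z2, s1, s2):
    """Count zeros in the differenced sum of two cipher impulses after
    stripping off the two chi streams at trial start positions s1, s2."""
    n = len(z1)
    stripped = xor(xor(z1, wheel_stream(CHI1, s1, n)),
                   xor(z2, wheel_stream(CHI2, s2, n)))
    return delta(stripped).count(0)


# Toy plaintext impulses: long runs, so successive bits usually agree.
p1 = [1] * 20 + [0] * 25 + [1] * 15
p2 = [0] * 30 + [1] * 30
TRUE_START = (3, 2)  # assumed "unknown" start positions for this toy

# Encipher: plaintext XOR key. Applying the same key again deciphers.
z1 = xor(p1, wheel_stream(CHI1, TRUE_START[0], len(p1)))
z2 = xor(p2, wheel_stream(CHI2, TRUE_START[1], len(p2)))

THRESHOLD = 50  # arbitrary cut-off chosen for this toy example
for s1, s2 in product(range(len(CHI1)), range(len(CHI2))):
    count = score(z1, z2, s1, s2)
    if count > THRESHOLD:
        print(f"chi start ({s1}, {s2}): count {count}")
```

In this toy the correct start positions score 56 out of a possible 59, because stripping the right key bits leaves only the (highly non-random) plaintext. With such short made-up wheels other settings may also exceed the threshold; the real statistic was far weaker and relied on long messages.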

Colossus was developed for the "Newmanry",[10] the section headed by the mathematician Max Newman at Bletchley Park responsible for machine methods against the Lorenz machine. The Colossus design arose out of a prior project that produced a counting machine dubbed "Heath Robinson". The main problems with the Heath Robinson were the relative slowness of electro-mechanical parts and the difficulty of synchronising two paper tapes, one punched with the enciphered message, the other representing the patterns produced by the wheels of the Lorenz machine. The tapes tended to stretch when being read, at some 2000 characters per second, resulting in unreliable counts.

Tommy Flowers of the Post Office Research Station at Dollis Hill had designed the "Combining Unit" of Heath Robinson.[11] He was not impressed by the system of a key tape that had to be kept synchronised with the message tape and, on his own initiative, designed an electronic machine which eliminated the need for the key tape. He presented this design to Max Newman in February 1943, but the idea that the one to two thousand thermionic valves (vacuum tubes and thyratrons) proposed could work together reliably was greeted with scepticism, so more Robinsons were ordered from Dollis Hill. Flowers, however, persisted with the idea and obtained support from the Director of the Research Station, W Gordon Radley.[12]


Construction
Tommy Flowers was a senior engineer at the Post Office Research Station, Dollis Hill, in northwest London.[13] He had previously been involved with GC&CS at Bletchley Park from February 1941 in an attempt to improve the Bombes that were used in the cryptanalysis of the German Enigma cipher machine.[14] He was recommended to Max Newman by Alan Turing[15] and spent eleven months from early February 1943 designing and building the first Colossus. His team included Sidney Broadhurst, William Chandler, Allen Coombs and Harry Fensom.[16][17] After a functional test in December 1943, Colossus was dismantled and shipped north to Bletchley Park, where it was delivered on 18 January 1944, assembled by Harry Fensom and Don Horwood,[18] and attacked its first message on 5 February.[1]

The first, prototype, Colossus (Mark 1) was followed by nine Mark 2 machines, the first being commissioned in June 1944, and the original Mark 1 machine was converted into a Mark 2. An eleventh Colossus was essentially finished at the end of the war. Colossus Mark 1 contained 1500 thermionic valves (tubes). Colossus Mark 2, with 2400 valves, was both 5 times faster and simpler to operate than Mark 1, greatly speeding the decoding process. Mark 2 was designed while Mark 1 was being constructed. Allen Coombs took over leadership of the Colossus Mark 2 project when Tommy Flowers moved on to other projects.[19] For comparison, later stored-program computers such as the Manchester Mark 1 of 1949 used 4,050 valves,[20] while ENIAC (1946) used 17,468 valves.

Colossus dispensed with the second tape of the Heath Robinson design by generating the wheel patterns electronically, and processed 5,000 characters per second with the paper tape moving at 40 ft/s (12.2 m/s or 27.3 mph). The circuits were synchronized by a clock signal generated by the sprocket holes of the punched tape. The speed of calculation was thus limited by the mechanics of the tape reader. Tommy Flowers tested the tape reader up to 9,700 characters per second (53 mph) before the tape disintegrated. He settled on 5,000 characters per second as the desirable speed for regular operation. Sometimes, two or more Colossus computers tried different possibilities simultaneously in what is now called parallel computing, speeding the decoding process by perhaps as much as double the rate of comparison.

Colossus included the first ever use of shift registers and systolic arrays, enabling five simultaneous tests, each involving up to 100 Boolean calculations, on each of the five channels on the punched tape (although in normal operation only one or two channels were examined in any run). Initially Colossus was only used to determine the initial wheel positions used for a particular message (termed wheel setting). The Mark 2 included mechanisms intended to help determine pin patterns (wheel breaking). Both models were programmable using switches and plug panels in a way the Robinsons had not been.
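The tape-speed figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below simply converts units and assumes that the character spacing on the tape was the same in the 9,700 characters per second test as in normal operation; the variable names are illustrative only.

```python
FEET_PER_MILE = 5280
METRES_PER_FOOT = 0.3048
SECONDS_PER_HOUR = 3600
INCHES_PER_FOOT = 12

tape_speed_ft_s = 40.0    # figure quoted in the text
chars_per_second = 5000   # figure quoted in the text

tape_speed_m_s = tape_speed_ft_s * METRES_PER_FOOT
tape_speed_mph = tape_speed_ft_s * SECONDS_PER_HOUR / FEET_PER_MILE

# Character density implied by the two quoted figures.
chars_per_inch = chars_per_second / (tape_speed_ft_s * INCHES_PER_FOOT)

# Tape speed during the 9,700 characters/second test, assuming the
# same character spacing on the tape.
test_speed_mph = tape_speed_mph * 9700 / chars_per_second

print(f"{tape_speed_m_s:.1f} m/s, {tape_speed_mph:.1f} mph")
print(f"{chars_per_inch:.1f} characters per inch")
print(f"{test_speed_mph:.0f} mph at 9,700 characters per second")
```

The results agree with the rounded values in the text (12.2 m/s, 27.3 mph and 53 mph) and imply a density of a little over ten characters per inch of tape.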


Design and operation


In 1994, a team led by Tony Sale (right) began a reconstruction of a Colossus at Bletchley Park. Here, in 2006, Sale supervises the breaking of an enciphered message with the completed machine.

Colossus used state-of-the-art vacuum tubes (thermionic valves), thyratrons and photomultipliers to optically read a paper tape and then applied a programmable logical function to every character, counting how often this function returned "true". Although machines with many valves were known to have high failure rates, it was recognised that valve failures occurred most frequently with the current surge when powering up, so the Colossus machines, once turned on, were never powered down unless they malfunctioned.[21]

Colossus was the first of the electronic digital machines with programmability, albeit limited by modern standards:[22] it had no internally stored programs. To set it up for a new task, the operator had to set up plugs and switches to alter the wiring. Colossus was not a general-purpose machine, being designed for a specific cryptanalytic task involving counting and Boolean operations. It was thus not a fully general Turing-complete computer, even though Alan Turing worked at Bletchley Park. It was not then realized that Turing completeness was significant; most of the other pioneering modern computing machines were also not Turing complete (e.g. the Atanasoff–Berry Computer, the Bell Labs relay machines (by George Stibitz et al.), or the first designs of Konrad Zuse). The notion of a computer as a general-purpose machine (that is, as more than a calculator devoted to solving difficult but specific problems) did not become prominent for several years.

Colossus was preceded by several computers, many of them first in some category. Zuse's Z3 was the first functional fully program-controlled computer, and was based on electromechanical relays, as were the (less advanced) Bell Labs machines of the late 1930s (George Stibitz, et al.). The Atanasoff–Berry Computer was electronic and binary (digital) but not programmable. Assorted analog computers were semiprogrammable; some of these much predated the 1930s (e.g., Vannevar Bush). Babbage's Analytical Engine design predated all these (in the mid-19th century); it was a decimal, programmable, entirely mechanical construction, but was only partially built and never functioned during Babbage's lifetime (the first complete mechanical Difference Engine No. 2, built in 1991, does work, however). Colossus was the first machine to combine being digital, (partially) programmable, and electronic. The first fully programmable digital electronic computer was the ENIAC, which was completed in 1946.
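The basic operation described in this section, applying a selectable logical function to every character on the tape and counting how often it returns "true", can be sketched in software. This is a minimal illustration, not a model of Colossus's actual plug-and-switch programming: the tape contents, the helper names `count_true` and `bit`, and the two example functions are all assumptions made for the example.

```python
def count_true(tape, predicate):
    """Count the characters on the tape for which predicate is true."""
    return sum(1 for ch in tape if predicate(ch))


def bit(ch, channel):
    """Return the bit (0 or 1) on channel 0-4 of a 5-bit character."""
    return (ch >> channel) & 1


# A made-up tape of 5-bit characters, held as integers 0..31.
tape = [0b00011, 0b10101, 0b00000, 0b11111, 0b01010, 0b00110]


def channels_agree(ch):
    """One possible 'program': channels 0 and 1 carry the same bit."""
    return bit(ch, 0) == bit(ch, 1)


def top_channel_set(ch):
    """A different 'program': channel 4 carries a 1."""
    return bit(ch, 4) == 1


print(count_true(tape, channels_agree))
print(count_true(tape, top_channel_set))
```

Swapping the predicate here stands in for re-plugging the machine: the tape is read in the same way, and only the function being counted changes.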


Influence and fate


The use to which the Colossus computers were put was of the highest secrecy, and the Colossus itself was highly secret, and remained so for many years after the War. Thus, Colossus could not be included in the history of computing hardware for many years, and Flowers and his associates were also deprived of the recognition they were due. Not being widely known, it had little direct influence on the development of later computers; EDVAC was the early design which had the most influence on subsequent computer architecture.

However, the technology of Colossus, and the knowledge that reliable high-speed electronic digital computing devices were feasible, had a significant influence on the development of early computers in the United Kingdom and probably in the US. A number of people who were associated with the project and knew all about Colossus played significant roles in early computer work in the UK. In 1972, Herman Goldstine wrote that: "Britain had such vitality that it could immediately after the war embark on so many well-conceived and well-executed projects in the computer field."[23] In writing that, Goldstine was unaware of Colossus, and its legacy to those projects of people such as Alan Turing (with the Pilot ACE and ACE), and Max Newman and I. J. Good (with the Manchester Mark 1 and other early Manchester computers). Brian Randell later wrote that: "the COLOSSUS project was an important source of this vitality, one that has been largely unappreciated, as has the significance of its places in the chronology of the invention of the digital computer."[24]

Colossus documentation and hardware were classified from the moment of their creation and remained so after the War, when Winston Churchill specifically ordered the destruction of most of the Colossus machines into "pieces no bigger than a man's hand"; Tommy Flowers was ordered to destroy all documentation and burnt it in a furnace at Dollis Hill. He later said of that order: "That was a terrible mistake. I was instructed to destroy all the records, which I did. I took all the drawings and the plans and all the information about Colossus on paper and put it in the boiler fire. And saw it burn."[25]

Some parts, sanitised as to their original use, were taken to Newman's Royal Society Computing Machine Laboratory at Manchester University.[26] The Colossus Mark 1 was dismantled and parts returned to the Post Office. Two Colossus computers, along with two replica Tunny machines, were retained, moving to GCHQ's new headquarters at Eastcote in April 1946, and moving again with GCHQ to Cheltenham between 1952 and 1954.[27] One of the Colossi, known as Colossus Blue, was dismantled in 1959; the other in 1960.[27] In their later years, the Colossi were used for training, but before that, there had been attempts to adapt them, with varying success, to other purposes.[28] Jack Good relates how he was the first to use it after the war, persuading NSA that Colossus could be used to perform a function for which they were planning to build a special-purpose machine.[27] Colossus was also used to perform character counts on one-time pad tape to test for non-randomness.[27]

Throughout this period the Colossus remained secret, long after any of its technical details were of any importance. This was due to the UK intelligence agencies' use of Enigma-like machines, which they promoted and sold to other governments and whose codes they then broke using a variety of methods. Had knowledge of the codebreaking machines been widespread, no one would have accepted these machines; rather, they would have developed their own methods for encryption, methods that the UK services might not have been able to break. The need for such secrecy ebbed away as communications moved to digital transmission and all-digital encryption systems became common in the 1960s.

Information about Colossus began to emerge publicly in the late 1970s, after the secrecy imposed was broken when Colonel Winterbotham published his book The Ultra Secret. More recently, a 500-page technical report on the Tunny cipher and its cryptanalysis, entitled General Report on Tunny, was released by GCHQ to the national Public Record Office in October 2000; the complete report is available online,[29] and it contains a fascinating paean to Colossus by the cryptographers who worked with it:

It is regretted that it is not possible to give an adequate idea of the fascination of a Colossus at work; its sheer bulk and apparent complexity; the fantastic speed of thin paper tape round the glittering pulleys; the childish pleasure of not-not, span, print main header and other gadgets; the wizardry of purely mechanical decoding letter by letter (one novice thought she was being hoaxed); the uncanny action of the typewriter in printing the correct scores without and beyond human aid; the stepping of the display; periods of eager expectation culminating in the sudden appearance of the longed-for score; and the strange rhythms characterizing every type of run: the stately break-in, the erratic short run, the regularity of wheel-breaking, the stolid rectangle interrupted by the wild leaps of the carriage-return, the frantic chatter of a motor run, even the ludicrous frenzy of hosts of bogus scores.[30]


Reconstruction
Colossus rebuild seen from the rear

Construction of a fully functional replica[31] of a Colossus Mark 2 was undertaken by a team led by Tony Sale. In spite of the blueprints and hardware being destroyed, a surprising amount of material survived, mainly in engineers' notebooks, but a considerable amount of it in the U.S. The optical tape reader might have posed the biggest problem, but Dr. Arnold Lynch, its original designer, was able to redesign it to his own original specification. The reconstruction is on display, in the historically correct place for Colossus No. 9, at The National Museum of Computing, in H Block Bletchley Park in Milton Keynes, Buckinghamshire. In November 2007, to celebrate the project completion and to mark the start of a fundraising initiative for The National Museum of Computing, a Cipher Challenge[32] pitted the rebuilt Colossus against radio amateurs worldwide in being first to receive and decode three messages enciphered using the Lorenz SZ42 and transmitted from radio station DL0HNF in the Heinz Nixdorf MuseumsForum [33] computer museum. The challenge was easily won by radio amateur Joachim Schüth, who had carefully prepared[34] for the event and developed his own signal processing and code-breaking code using Ada.[35] The Colossus team were hampered by their wish to use World War II radio equipment,[36] delaying them by a day because of poor reception conditions. Nevertheless, the victor's 1.4 GHz laptop, running his own code, took less than a minute to find the settings for all 12 wheels. The German codebreaker said: "My laptop digested ciphertext at a speed of 1.2 million characters per second, 240 times faster than Colossus. If you scale the CPU frequency by that factor, you get an equivalent clock of 5.8 MHz for Colossus. That is a remarkable speed for a computer built in 1944."[37] The Cipher Challenge verified the successful completion of the rebuild project. 
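The scaling in the codebreaker's remark can be checked with a few lines of arithmetic. The sketch below only divides the figures quoted in the text by the stated factor of 240; the variable names are illustrative and nothing here models how Colossus actually worked.

```python
# Figures quoted in the codebreaker's remark.
LAPTOP_CLOCK_HZ = 1.4e9        # 1.4 GHz laptop
LAPTOP_CHARS_PER_SEC = 1.2e6   # 1.2 million characters per second
SPEEDUP = 240                  # laptop throughput relative to Colossus

# Dividing the laptop figures by the quoted speed-up factor.
colossus_chars_per_sec = LAPTOP_CHARS_PER_SEC / SPEEDUP
colossus_equiv_clock_hz = LAPTOP_CLOCK_HZ / SPEEDUP

print(f"Implied Colossus throughput: {colossus_chars_per_sec:.0f} characters per second")
print(f"Equivalent clock: {colossus_equiv_clock_hz / 1e6:.1f} MHz")
```

The second figure rounds to the 5.8 MHz mentioned in the quotation.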
"On the strength of today's performance Colossus is as good as it was six decades ago", commented Tony Sale. "We are delighted to have produced a fitting tribute to the people who worked at Bletchley Park and whose brainpower devised these fantastic machines which broke these ciphers and shortened the war by many months."[38]


Footnotes
[1] Copeland 2006, p. 75
[2] Copeland 2006, p. 427
[3] The National Museum of Computing: The Colossus Gallery (http://www.tnmoc.org/explore/colossus-gallery), retrieved 18 October 2012
[4] See e.g. Golden, Frederic (March 29, 1999), "Who Built The First Computer?", Time Magazine 153 (12)
[5] Copeland, Jack, Colossus: The first large scale electronic computer (http://www.colossus-computer.com/colossus1.html#sdfootnote96sym), retrieved 21 October 2012
[6] Sale, Tony, The Lorenz Cipher and how Bletchley Park broke it (http://www.codesandciphers.org.uk/lorenz/fish.htm), retrieved 21 October 2010
[7] Tutte 2006, p. 357
[8] Good, Michie & Timms 1945, p. 15 in 1. Introduction: German Tunny
[9] Budiansky 2006, pp. 58–59
[10] Good, Michie & Timms 1945, p. 276 in 3. Organisation: Mr Newman's section
[11] Good, Michie & Timms 1945, p. 33 in 1. Introduction: Some historical notes
[12] Randell 2006
[13] Flowers was appointed MBE in June 1943.
[14] Randell, p. 9
[15] Budiansky 2000, p. 314
[16] "Bletchley's code-cracking Colossus" (http://news.bbc.co.uk/1/hi/technology/8492762.stm), BBC News, 2 February 2010, retrieved 19 October 2012
[17] Fensom 2010
[18] The Colossus Rebuild http://www.tnmoc.org/colossus-rebuild.aspx
[19] Randell, Brian; Fensom, Harry; Milne, Frank A. (15 March 1995), "Obituary: Allen Coombs" (http://www.independent.co.uk/news/people/obituary-allen-coombs-1611270.html), The Independent, retrieved 18 October 2012
[20] Lavington, S. H. (July 1977), The Manchester Mark 1 and Atlas: a Historical Perspective (http://www.cs.ucf.edu/courses/cda5106/summer03/papers/mark1.atlas.1.pdf), University of Central Florida, retrieved 8 February 2009
[21] Copeland 2006, p. 72
[22] A Brief History of Computing. Jack Copeland, June 2000 (http://www.alanturing.net/turing_archive/pages/Reference Articles/BriefHistofComp.html#Col)
[23] Goldstine, Herman H. (1980), The Computer from Pascal to von Neumann (New Ed edition (1 Oct 1980) ed.), Princeton University Press, p. 321, ISBN 978-0-691-02367-0
[24] B. Randell, "The Colossus", in A History of Computing in the Twentieth Century (N. Metropolis, J. Howlett and G. C. Rota, Eds.), pp. 47–92, Academic Press, New York, 1980, p. 87
[25] McKay 2010, pp. 270–271
[26] "A Brief History of Computing" (http://www.alanturing.net/turing_archive/pages/Reference Articles/BriefHistofComp.html#ACE). alanturing.net. Retrieved 26 January 2010.
[27] Copeland 2006, pp. 173–175
[28] Horwood 1973
[29] Good, Michie & Timms 1945
[30] Good, Michie & Timms 1945, p. 327 in 51. Introductory: Impressions of Colossus
[31] "The Colossus Rebuild Project by Tony Sale" (http://www.codesandciphers.org.uk/lorenz/rebuild.htm). Retrieved 30 October 2011
[32] "Cipher Challenge" (http://web.archive.org/web/20080801175842/http://www.tnmoc.org/cipher1.htm). Archived from the original (http://www.tnmoc.org/cipher1.htm) on 1 August 2008. Retrieved 1 February 2012.
[33] http://en.hnf.de/default.asp
[34] "SZ42 Codebreaking Software" (http://www.schlaupelz.de/SZ42/SZ42_software.html).
[35] "Cracking the Lorenz Code (interview with Schüth)" (http://www.adacore.com/home/ada_answers/lorenz-code/).
[36] Ward, Mark (16 November 2007). "BBC News Article" (http://news.bbc.co.uk/1/hi/technology/7098005.stm). Retrieved 2 January 2010.
[37] "German Codebreaker receives Bletchley Park Honours" (http://www.bletchleypark.org.uk/news/docview.rhtm/487682).
[38] "Latest Cipher Challenge News 16.11.2007" (http://www.tnmoc.org/cipher7.htm).


References
Budiansky, Stephen (2000), Battle of wits: The Complete Story of Codebreaking in World War II, Free Press, ISBN 978-0684859323
Budiansky, Stephen (2006), Colossus, Codebreaking, and the Digital Age in Copeland 2006, pp. 52–63
Chandler, W. W. (1983), "The Installation and Maintenance of Colossus", IEEE Annals of the History of Computing 5 (3): 260–262
Coombs, Allen W. M. (July 1983), "The Making of Colossus" (http://www.ivorcatt.com/47d.htm), IEEE Annals of the History of Computing 5 (3): 253–259
Copeland, B. Jack (2011) [2001], Colossus and the Dawning of the Computer Age in Erskine & Smith 2011, pp. 305–327
Copeland, B. J. (Oct.-Dec. 2004), "Colossus: its origins and originators", IEEE Annals of the History of Computing 26 (4): 38–45
Copeland, B. Jack, ed. (2006), Colossus: The Secrets of Bletchley Park's Codebreaking Computers, Oxford: Oxford University Press, ISBN 978-0-19-284055-4
Erskine, Ralph; Smith, Michael, eds. (2011), The Bletchley Park Codebreakers, Biteback Publishing Ltd, ISBN 978 184954078 0. Updated and extended version of Action This Day: From Breaking of the Enigma Code to the Birth of the Modern Computer, Bantam Press 2001
Fensom, Jim (8 November 2010), Harry Fensom obituary (http://www.guardian.co.uk/theguardian/2010/nov/08/harry-fensom-obituary), retrieved 17 October 2012
Good, Jack; Michie, Donald; Timms, Geoffrey (1945), General Report on Tunny: With Emphasis on Statistical Methods (http://www.alanturing.net/turing_archive/archive/index/tunnyreportindex.html), UK Public Record Office HW 25/4 and HW 25/5, retrieved 15 September 2010. That version is a facsimile copy, but there is a transcript of much of this document in '.pdf' format at: Sale, Tony (2001), Part of the "General Report on Tunny", the Newmanry History, formatted by Tony Sale (http://www.codesandciphers.org.uk/documents/newman/newman.pdf), retrieved 20 September 2010, and a web transcript of Part 1 at: Ellsbury, Graham, General Report on Tunny With Emphasis on Statistical Methods (http://www.ellsbury.com/tunny/tunny-001.htm), retrieved 3 November 2010
I. J. Good, Early Work on Computers at Bletchley (IEEE Annals of the History of Computing, Vol. 1 (No. 1), 1979, pp. 38–48)
I. J. Good, Pioneering Work on Computers at Bletchley (in Nicholas Metropolis, J. Howlett, Gian-Carlo Rota, (editors), A History of Computing in the Twentieth Century, Academic Press, New York, 1980)
T. H. Flowers, The Design of Colossus (http://www.ivorcatt.com/47c.htm) (Annals of the History of Computing, Vol. 5 (No. 3), 1983, pp. 239–252)
Flowers, Thomas H. (2006), D-Day at Bletchley Park in Copeland 2006, pp. 78–83
Horwood, D.C. (1973), A technical description of Colossus I: PRO HW 25/24 (http://www.youtube.com/watch?v=JF48sl15OCg)
McKay, Sinclair (2010), The Secret Life of Bletchley Park: The WWII Codebreaking Centre and the men and women who worked there, London: Aurum Press, ISBN 978 1 84513 539 3
Brian Randell, Colossus: Godfather of the Computer, 1977 (reprinted in The Origins of Digital Computers: Selected Papers, Springer-Verlag, New York, 1982)
Brian Randell, The COLOSSUS (http://www.cs.ncl.ac.uk/publications/books/papers/133.pdf) (in A History of Computing in the Twentieth Century)
Randell, Brian (2006), Of Men and Machines in Copeland 2006, pp. 141–149
Sale, Tony (2000), "The Colossus of Bletchley Park – The German Cipher System", in Rojas, Raúl; Hashagen, Ulf, The First Computers: History and Architecture, Cambridge, Massachusetts: The MIT Press, pp. 351–364, ISBN 0-262-18197-5
Albert W. Small, The Special Fish Report (http://www.codesandciphers.org.uk/documents/small/smallix.htm) (December, 1944) describes the operation of Colossus in breaking Tunny messages
Tutte, William T. (2006), Appendix 4: My Work at Bletchley Park in Copeland 2006, pp. 352–369


Further reading
Cragon, Harvey G. (2003), From Fish to Colossus: How the German Lorenz Cipher was Broken at Bletchley Park, Dallas: Cragon Books, ISBN 0-9743045-0-6. A detailed description of the cryptanalysis of Tunny, and some details of Colossus (contains some minor errors).
Enever, Ted (1999), Britain's Best Kept Secret: Ultra's Base at Bletchley Park (3rd ed.), Sutton Publishing, Gloucestershire, ISBN 978-0-7509-2355-2. A guided tour of the history and geography of the Park, written by one of the founder members of the Bletchley Park Trust.
Gannon, Paul (2007), Colossus: Bletchley Park's Greatest Secret, Atlantic Books, ISBN 978-1-84354-331-2
Rojas, R.; Hashagen, U. (2000), The First Computers: History and Architectures, MIT Press, ISBN 0-262-18197-5. Comparison of the first computers, with a chapter about Colossus and its reconstruction by Tony Sale.
Sale, Tony, The Colossus Computer 1943–1996: How It Helped to Break the German Lorenz Cipher in WWII (M.&M. Baldwin, Kidderminster, 2004; ISBN 0-947712-36-4). A slender (20 page) booklet, containing the same material as Tony Sale's website (see below).
Smith, Michael (2007) [1998], Station X: The Codebreakers of Bletchley Park, Pan Grand Strategy Series (Pan Books ed.), London: Pan McMillan Ltd, ISBN 978-0-330-41929-1

Other meanings
There was a fictional computer named Colossus in the movie Colossus: The Forbin Project. Also see List of fictional computers. Neal Stephenson's novel Cryptonomicon (1999) also contains a fictional treatment of the historical role played by Turing and Bletchley Park.

External links
The National Museum of Computing (http://www.tnmoc.org)
Tony Sale's Codes and Ciphers (http://www.codesandciphers.org.uk/index.htm) Contains a great deal of information, including:
  Colossus, the revolution in code breaking (http://www.codesandciphers.org.uk/virtualbp/fish/colossus.htm)
  Lorenz Cipher and the Colossus (http://www.codesandciphers.org.uk/lorenz/index.htm)
  The machine age comes to Fish codebreaking (http://www.codesandciphers.org.uk/lorenz/colossus.htm)
  The Colossus Rebuild Project (http://www.codesandciphers.org.uk/lorenz/rebuild.htm)
  The Colossus Rebuild Project: Evolving to the Colossus Mk 2 (http://www.codesandciphers.org.uk/lorenz/mk2.htm)
  Walk around Colossus (http://www.codesandciphers.org.uk/lorenz/colwalk/colossus.htm) A detailed tour of the replica Colossus; make sure to click on the "More Text" links on each image to see the informative detailed text about that part of Colossus
  IEEE lecture (http://www.codesandciphers.org.uk/lectures/ieee.txt) Transcript of a lecture Tony Sale gave describing the reconstruction project
BBC news article reporting on the replica Colossus (http://news.bbc.co.uk/1/hi/technology/3754887.stm)
BBC news article: "Colossus cracks codes once more" (http://news.bbc.co.uk/1/hi/technology/7094881.stm)
BBC news article: "Bletchley's code-cracking Colossus" with video interviews 2010-02-02 (http://news.bbc.co.uk/1/hi/technology/8492762.stm)
Website on Copeland's 2006 book (http://www.colossus-computer.com/contents.htm) with much information and links to recently declassified information
Was the Manchester Baby conceived at Bletchley Park? (http://www.bcs.org/upload/pdf/ewic_tur04_paper3.pdf)
Walk through video of the Colossus rebuild at Bletchley Park (http://www.youtube.com/watch?v=NWYzwIjSk6s)


Game theory
Game theory is the study of strategic decision making. More formally, it is "the study of mathematical models of conflict and cooperation between intelligent rational decision-makers."[1] An alternative term suggested "as a more descriptive name for the discipline" is interactive decision theory.[2] Game theory is mainly used in economics, political science, and psychology, as well as logic and biology. The subject first addressed zero-sum games, in which one person's gains exactly equal the net losses of the other participant(s). Today, however, game theory applies to a wide range of behavioral relations, and has developed into an umbrella term for the logical side of decision science, covering both humans and non-humans, such as computers. Classic uses include the notion of equilibrium in numerous games, where each player has found or developed a strategy that he cannot change to improve his own results, given the strategies of the others. Modern game theory began with the idea regarding the existence of mixed-strategy equilibria in two-person zero-sum games and its proof by John von Neumann. Von Neumann's original proof used Brouwer's fixed-point theorem on continuous mappings into compact convex sets, which became a standard method in game theory and mathematical economics. His paper was followed by his 1944 book Theory of Games and Economic Behavior, with Oskar Morgenstern, which considered cooperative games of several players. The second edition of this book provided an axiomatic theory of expected utility, which allowed mathematical statisticians and economists to treat decision-making under uncertainty. This theory was developed extensively in the 1950s by many scholars. Game theory was later explicitly applied to biology in the 1970s, although similar developments go back at least as far as the 1930s. Game theory has been widely recognized as an important tool in many fields. 
Eight game-theorists have won the Nobel Memorial Prize in Economic Sciences, and John Maynard Smith was awarded the Crafoord Prize for his application of game theory to biology.


Representation of games
The games studied in game theory are well-defined mathematical objects. A game consists of a set of players, a set of moves (or strategies) available to those players, and a specification of payoffs for each combination of strategies. Most cooperative games are presented in the characteristic function form, while the extensive and the normal forms are used to define noncooperative games.

Extensive form
The extensive form can be used to formalize games with a time sequencing of moves. Games here are played on trees (as pictured to the left). Here each vertex (or node) represents a point of choice for a player. The player is specified by a number listed by the vertex. The lines out of the vertex represent a possible action for that player. The payoffs are specified at the bottom of the tree. The extensive form can be viewed as a multi-player generalization of a decision tree. (Fudenberg & Tirole 1991, p.67)

An extensive form game

In the game pictured to the left, there are two players. Player 1 moves first and chooses either F or U. Player 2 sees Player 1's move and then chooses A or R. Suppose that Player 1 chooses U and then Player 2 chooses A: Player 1 then gets 8 and Player 2 gets 2. The extensive form can also capture simultaneous-move games and games with imperfect information. To represent imperfect information, either a dotted line connects different vertices to show that they are part of the same information set (i.e., the players do not know at which point they are), or a closed line is drawn around them. (See example in the imperfect information section.)

Normal form
                        Player 2 chooses Left   Player 2 chooses Right
Player 1 chooses Up     4, 3                    -1, -1
Player 1 chooses Down   0, 0                    3, 4

Normal form or payoff matrix of a 2-player, 2-strategy game

The normal (or strategic form) game is usually represented by a matrix which shows the players, strategies, and pay-offs (see the example to the right). More generally it can be represented by any function that associates a payoff for each player with every possible combination of actions. In the accompanying example there are two players; one chooses the row and the other chooses the column. Each player has two strategies, which are specified by the number of rows and the number of columns. The payoffs are provided in the interior. The first number is the payoff received by the row player (Player 1 in our example); the second is the payoff for the column player (Player 2 in our example). Suppose that Player 1 plays Up and that Player 2 plays Left. Then Player 1 gets a payoff of 4, and Player 2 gets 3. When a game is presented in normal form, it is presumed that each player acts simultaneously or, at least, without knowing the actions of the other. If players have some information about the choices of other players, the game is usually presented in extensive form.
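The idea that a normal-form game is just a function from action combinations to payoffs can be sketched in a few lines. This is a minimal illustration, not a standard library API: the names GAME and payoff are hypothetical, and the Up/Right entry is taken here as (-1, -1), which is an assumption about the sign of that cell.

```python
from typing import Dict, Tuple

Profile = Tuple[str, str]   # (Player 1's action, Player 2's action)
Payoffs = Tuple[int, int]   # (Player 1's payoff, Player 2's payoff)

# The 2-player, 2-strategy example as a lookup table.
GAME: Dict[Profile, Payoffs] = {
    ("Up", "Left"): (4, 3),
    ("Up", "Right"): (-1, -1),   # sign of this cell is assumed
    ("Down", "Left"): (0, 0),
    ("Down", "Right"): (3, 4),
}

def payoff(row_action: str, col_action: str) -> Payoffs:
    """Return (row player's payoff, column player's payoff) for one action profile."""
    return GAME[(row_action, col_action)]

print(payoff("Up", "Left"))
```

Looking up ("Up", "Left") gives Player 1 a payoff of 4 and Player 2 a payoff of 3, matching the example in the text.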

Every extensive-form game has an equivalent normal-form game; however, the transformation to normal form may result in an exponential blowup in the size of the representation, making it computationally impractical. (Leyton-Brown & Shoham 2008, p.35)
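The blowup can be seen by counting strategies. In the extensive-form example, Player 2 observes Player 1's move (F or U) before choosing A or R, so a full strategy for Player 2 must name a response to each observed move. The sketch below enumerates those plans; the helper names are hypothetical, and it deliberately leaves payoffs out because only the count matters here.

```python
from itertools import product

P1_ACTIONS = ["F", "U"]   # Player 1's moves in the extensive-form example
P2_ACTIONS = ["A", "R"]   # Player 2's replies

def complete_plans(observations, actions):
    """All ways to assign one action to every situation the player may observe."""
    return [dict(zip(observations, choice))
            for choice in product(actions, repeat=len(observations))]

# Player 2's strategies in the equivalent normal form: one reply per observed move.
p2_strategies = complete_plans(P1_ACTIONS, P2_ACTIONS)
for plan in p2_strategies:
    print(plan)

def num_strategies(n_decision_points: int, n_actions: int) -> int:
    """Number of complete plans grows exponentially in the number of decision points."""
    return n_actions ** n_decision_points

print(num_strategies(10, 2))
```

With two decision points Player 2 already has four strategies, so the normal form is a 2-by-4 matrix; with ten binary decision points a single player would have 1024.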


Characteristic function form


In games with transferable utility, separate rewards are not specified for individual players; rather, the characteristic function determines the payoff of each coalition. The idea is that the empty coalition, so to speak, does not receive a reward at all. The origin of this form is to be found in John von Neumann and Oskar Morgenstern's book; when looking at these instances, they assumed that when a coalition C forms, it plays against the complementary coalition N \ C as if two individuals were playing a normal game. The equilibrium payoff of C is its characteristic value. Although there are various ways to derive coalitional values from normal-form games, not every game in characteristic function form can be derived from one. Formally, a characteristic function game is a pair (N, v), where N represents the set of players and v: 2^N → R assigns a utility to each coalition. Such characteristic functions have been extended to describe games where there is no transferable utility.

Partition function form


The characteristic function form ignores the possible externalities of coalition formation. In the partition function form the payoff of a coalition depends not only on its members, but also on the way the rest of the players are partitioned (Thrall & Lucas 1963).

General and applied uses


As a method of applied mathematics, game theory has been used to study a wide variety of human and animal behaviors. It was initially developed in economics to understand a large collection of economic behaviors, including behaviors of firms, markets, and consumers. The use of game theory in the social sciences has expanded, and game theory has been applied to political, sociological, and psychological behaviors as well. Game-theoretic analysis was initially used to study animal behavior by Ronald Fisher in the 1930s (although even Charles Darwin makes a few informal game-theoretic statements). This work predates the name "game theory", but it shares many important features with this field. The developments in economics were later applied to biology largely by John Maynard Smith in his book Evolution and the Theory of Games. In addition to being used to describe, predict, and explain behavior, game theory has also been used to develop theories of ethical or normative behavior and to prescribe such behavior.[3] In economics and philosophy, scholars have applied game theory to help in the understanding of good or proper behavior. Game-theoretic arguments of this type can be found as far back as Plato.[4]


Description and modeling


A three stage Centipede Game

The first known use is to describe and model how human populations behave. Some scholars believe that by finding the equilibria of games they can predict how actual human populations will behave when confronted with situations analogous to the game being studied. This particular view of game theory has come under recent criticism. First, it is criticized because the assumptions made by game theorists are often violated. Game theorists may assume players always act in a way to directly maximize their wins (the Homo economicus model), but in practice, human behavior often deviates from this model. Explanations of this phenomenon are many: irrationality, new models of deliberation, or even different motives (like that of altruism). Game theorists respond by comparing their assumptions to those used in physics. Thus while their assumptions do not always hold, they can treat game theory as a reasonable scientific ideal akin to the models used by physicists. However, in the centipede game, guess 2/3 of the average game, and the dictator game, people regularly do not play Nash equilibria. These experiments have demonstrated that individuals do not play equilibrium strategies. There is an ongoing debate regarding the importance of these experiments.[5] Alternatively, some authors claim that Nash equilibria do not provide predictions for human populations, but rather provide an explanation for why populations that play Nash equilibria remain in that state. However, the question of how populations reach those points remains open. Some game theorists have turned to evolutionary game theory in order to resolve these issues. These models presume either no rationality or bounded rationality on the part of players. Despite the name, evolutionary game theory does not necessarily presume natural selection in the biological sense. 
Evolutionary game theory includes both biological as well as cultural evolution and also models of individual learning (for example, fictitious play dynamics).

Prescriptive or normative analysis


            Cooperate   Defect
Cooperate   -1, -1      -10, 0
Defect      0, -10      -5, -5

The Prisoner's Dilemma

On the other hand, some scholars see game theory not as a predictive tool for the behavior of human beings, but as a suggestion for how people ought to behave. Since a strategy corresponding to a Nash equilibrium of a game constitutes one's best response to the actions of the other players (provided they are in the same Nash equilibrium), playing a strategy that is part of a Nash equilibrium seems appropriate. However, the rationality of such a decision has been proved only for special cases. This normative use of game theory has also come under criticism. First, in some cases it is appropriate to play a non-equilibrium strategy if one expects others to play non-equilibrium strategies as well. For an example, see Guess 2/3 of the average. Second, the Prisoner's dilemma presents another potential counterexample. In the Prisoner's Dilemma, each player pursuing his own self-interest leads both players to be worse off than had they not pursued their own self-interests.
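The Prisoner's Dilemma claim can be checked directly against the payoff table. The sketch below assumes the matrix is read row by row (row player's action first, row player's payoff first); the names PD and best_reply_for_row are illustrative only.

```python
ACTIONS = ["Cooperate", "Defect"]

# (row action, column action) -> (row payoff, column payoff), read row by row.
PD = {
    ("Cooperate", "Cooperate"): (-1, -1),
    ("Cooperate", "Defect"): (-10, 0),
    ("Defect", "Cooperate"): (0, -10),
    ("Defect", "Defect"): (-5, -5),
}

def best_reply_for_row(col_action: str) -> str:
    """The row player's payoff-maximizing action against a fixed column action."""
    return max(ACTIONS, key=lambda a: PD[(a, col_action)][0])

# Defect is the best reply whatever the other player does...
replies = {c: best_reply_for_row(c) for c in ACTIONS}
print(replies)

# ...yet mutual defection leaves both players worse off than mutual cooperation.
both_defect = PD[("Defect", "Defect")]
both_cooperate = PD[("Cooperate", "Cooperate")]
print(both_defect, "versus", both_cooperate)
```

Because the game is symmetric, the same reasoning applies to the column player, so self-interested play leads to (-5, -5) even though (-1, -1) was available.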


Economics and business


Game theory is a major method used in mathematical economics and business for modeling competing behaviors of interacting agents.[6] Applications include a wide array of economic phenomena and approaches, such as auctions, bargaining, mergers & acquisitions pricing,[7] fair division, duopolies, oligopolies, social network formation, agent-based computational economics,[8] general equilibrium, mechanism design,[9] and voting systems,[10] and across such broad areas as experimental economics,[11] behavioral economics,[12] information economics,[13] industrial organization,[14] and political economy.[15][16] This research usually focuses on particular sets of strategies known as equilibria in games. These "solution concepts" are usually based on what is required by norms of rationality. In non-cooperative games, the most famous of these is the Nash equilibrium. A set of strategies is a Nash equilibrium if each represents a best response to the other strategies. So, if all the players are playing the strategies in a Nash equilibrium, they have no unilateral incentive to deviate, since their strategy is the best they can do given what others are doing.[7] The payoffs of the game are generally taken to represent the utility of individual players. Often in modeling situations the payoffs represent money, which presumably corresponds to an individual's utility. This assumption, however, can be faulty. A prototypical paper on game theory in economics begins by presenting a game that is an abstraction of a particular economic situation. One or more solution concepts are chosen, and the author demonstrates which strategy sets in the presented game are equilibria of the appropriate type. Naturally one might wonder to what use this information should be put. Economists and business professors suggest two primary uses (noted above): descriptive and prescriptive.[3]
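The best-response definition of a Nash equilibrium translates into a brute-force check for two-player games in pure strategies. This is a minimal sketch rather than an established solver: the function name is hypothetical, mixed strategies are ignored, and the example payoffs reuse the 2-player, 2-strategy matrix from the normal form section with the Up/Right cell assumed to be (-1, -1).

```python
from itertools import product

def pure_nash_equilibria(game, row_actions, col_actions):
    """Return action profiles where neither player gains by deviating alone."""
    equilibria = []
    for r, c in product(row_actions, col_actions):
        u_row, u_col = game[(r, c)]
        # Row player cannot do better by switching rows against column c.
        row_ok = all(game[(r2, c)][0] <= u_row for r2 in row_actions)
        # Column player cannot do better by switching columns against row r.
        col_ok = all(game[(r, c2)][1] <= u_col for c2 in col_actions)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

EXAMPLE = {
    ("Up", "Left"): (4, 3),
    ("Up", "Right"): (-1, -1),   # sign of this cell is assumed
    ("Down", "Left"): (0, 0),
    ("Down", "Right"): (3, 4),
}

found = pure_nash_equilibria(EXAMPLE, ["Up", "Down"], ["Left", "Right"])
print(found)
```

The example has two pure-strategy equilibria, (Up, Left) and (Down, Right), which illustrates that a solution concept may select more than one strategy set.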

Political science
The application of game theory to political science is focused in the overlapping areas of fair division, political economy, public choice, war bargaining, positive political theory, and social choice theory. In each of these areas, researchers have developed game-theoretic models in which the players are often voters, states, special interest groups, and politicians. For early examples of game theory applied to political science, see the work of Anthony Downs. In his book An Economic Theory of Democracy (Downs 1957), he applies the Hotelling firm location model to the political process. In the Downsian model, political candidates commit to ideologies on a one-dimensional policy space. Downs first shows how the political candidates will converge to the ideology preferred by the median voter if voters are fully informed, but then argues that voters choose to remain rationally ignorant, which allows for candidate divergence. A game-theoretic explanation for democratic peace is that public and open debate in democracies sends clear and reliable information regarding their intentions to other states. In contrast, it is difficult to know the intentions of nondemocratic leaders, what effect concessions will have, and if promises will be kept. Thus there will be mistrust and unwillingness to make concessions if at least one of the parties in a dispute is a non-democracy (Levy & Razin 2003).

Biology


       Hawk     Dove
Hawk   20, 20   80, 40
Dove   40, 80   60, 60

The hawk-dove game

Unlike economics, the payoffs for games in biology are often interpreted as corresponding to fitness. In addition, the focus has been less on equilibria that correspond to a notion of rationality, but rather on ones that would be maintained by evolutionary forces. The best known equilibrium in biology is known as the evolutionarily stable strategy (or ESS), and was first introduced in (Smith & Price 1973). Although its initial motivation did not involve any of the mental requirements of the Nash equilibrium, every ESS is a Nash equilibrium. In biology, game theory has been used to understand many different phenomena. It was first used to explain the evolution (and stability) of the approximate 1:1 sex ratios. (Fisher 1930) suggested that the 1:1 sex ratios are a result of evolutionary forces acting on individuals who could be seen as trying to maximize their number of grandchildren. Additionally, biologists have used evolutionary game theory and the ESS to explain the emergence of animal communication (Harper & Maynard Smith 2003). The analysis of signaling games and other communication games has provided insight into the evolution of communication among animals. For example, the mobbing behavior of many species, in which a large number of prey animals attack a larger predator, seems to be an example of spontaneous emergent organization. Ants have also been shown to exhibit feed-forward behavior akin to fashion, see Butterfly Economics. Biologists have used the game of chicken to analyze fighting behavior and territoriality. Maynard Smith, in the preface to Evolution and the Theory of Games, writes, "paradoxically, it has turned out that game theory is more readily applied to biology than to the field of economic behaviour for which it was originally designed". Evolutionary game theory has been used to explain many seemingly incongruous phenomena in nature.[17] One such phenomenon is known as biological altruism. 
This is a situation in which an organism appears to act in a way that benefits other organisms and is detrimental to itself. This is distinct from traditional notions of altruism because such actions are not conscious, but appear to be evolutionary adaptations to increase overall fitness. Examples can be found in species ranging from vampire bats that regurgitate blood they have obtained from a night's hunting and give it to group members who have failed to feed, to worker bees that care for the queen bee for their entire lives and never mate, to Vervet monkeys that warn group members of a predator's approach, even when it endangers that individual's chance of survival.[18] All of these actions increase the overall fitness of a group, but occur at a cost to the individual. Evolutionary game theory explains this altruism with the idea of kin selection. Altruists discriminate between the individuals they help and favor relatives. Hamilton's rule explains the evolutionary reasoning behind this selection with the inequality c < b × r, where the cost (c) to the altruist must be less than the benefit (b) to the recipient multiplied by the coefficient of relatedness (r). The more closely related two organisms are, the more the incidence of altruism increases, because they share many of the same alleles. This means that the altruistic individual, by ensuring that the alleles of its close relative are passed on (through survival of its offspring), can forgo the option of having offspring itself because the same number of alleles are passed on. Helping a sibling, for example (in diploid animals), has a coefficient of ½, because (on average) an individual shares ½ of the alleles in its sibling's offspring. Ensuring that enough of a sibling's offspring survive to adulthood precludes the necessity of the altruistic individual producing offspring.[18] The coefficient values depend heavily on the scope of the playing field; for example, if the choice of whom to favor includes all genetic living things, not just all relatives, and we assume the discrepancy between all humans accounts for only approximately 1% of the diversity in the playing field, a coefficient that was ½ in the smaller field becomes 0.995. Similarly, if it is considered that information other than that of a genetic nature (e.g. epigenetics, religion, science, etc.) persisted through time, the playing field becomes larger still, and the discrepancies smaller.

223
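Hamilton's rule, as stated above, can be checked numerically. The following is a minimal sketch; the function name altruism_favored and the cost and benefit figures are made up for illustration and do not come from any cited study.

```python
def altruism_favored(cost: float, benefit: float, relatedness: float) -> bool:
    """Return True when Hamilton's rule c < b * r holds."""
    return cost < benefit * relatedness


# Hypothetical numbers: an act that costs the altruist 1 unit of fitness
# and gives the recipient 3 units.
print(altruism_favored(cost=1.0, benefit=3.0, relatedness=0.5))    # full sibling: 1 < 1.5
print(altruism_favored(cost=1.0, benefit=3.0, relatedness=0.125))  # distant relative: 1 < 0.375 fails
```

With these assumed numbers the same act is favored toward a full sibling but not toward a distant relative, which is the discrimination the kin selection argument describes.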

Computer science and logic


Game theory has come to play an increasingly important role in logic and in computer science. Several logical theories have a basis in game semantics. In addition, computer scientists have used games to model interactive computations. Also, game theory provides a theoretical basis to the field of multi-agent systems. Separately, game theory has played a role in online algorithms, in particular in the k-server problem, which has in the past been referred to as games with moving costs and request-answer games (Ben David, Borodin & Karp et al. 1994). Yao's principle is a game-theoretic technique for proving lower bounds on the computational complexity of randomized algorithms, and especially of online algorithms. The emergence of the internet has motivated the development of algorithms for finding equilibria in games, markets, computational auctions, peer-to-peer systems, and security and information markets. Algorithmic game theory[19] and within it algorithmic mechanism design[20] combine computational algorithm design and analysis of complex systems with economic theory.[21]
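As a small illustration of what "finding equilibria" can mean computationally, the sketch below enumerates the pure-strategy Nash equilibria of a two-player game by brute force. It is not one of the algorithms from the cited literature; the function name and the payoff matrix are assumptions chosen for the example.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """List the pure-strategy Nash equilibria of a two-player game.

    payoffs[i][j] is a pair (row player's payoff, column player's payoff)
    when the row player picks strategy i and the column player picks j.
    """
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # Neither player can gain by deviating unilaterally.
        row_best = all(payoffs[i][j][0] >= payoffs[k][j][0] for k in range(n_rows))
        col_best = all(payoffs[i][j][1] >= payoffs[i][k][1] for k in range(n_cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria


# Illustrative coordination game (payoffs are assumptions for this sketch).
game = [[(3, 3), (0, 2)],
        [(2, 0), (2, 2)]]
print(pure_nash_equilibria(game))  # [(0, 0), (1, 1)]
```

Brute force checks every strategy profile, so it is only practical for small games; mixed-strategy equilibria are not covered by this sketch.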

Philosophy
          Stag    Hare
Stag      3, 3    0, 2
Hare      2, 0    2, 2

Stag hunt

Game theory has been put to several uses in philosophy. Responding to two papers by W.V.O. Quine (1960, 1967), Lewis (1969) used game theory to develop a philosophical account of convention. In so doing, he provided the first analysis of common knowledge and employed it in analyzing play in coordination games. In addition, he first suggested that one can understand meaning in terms of signaling games. This latter suggestion has been pursued by several philosophers since Lewis (Skyrms (1996), Grim, Kokalis, and Alai-Tafti et al. (2004)). Following Lewis's (1969) game-theoretic account of conventions, Edna Ullmann-Margalit (1977) and Bicchieri (2006) have developed theories of social norms that define them as Nash equilibria that result from transforming a mixed-motive game into a coordination game.[22][23] Game theory has also challenged philosophers to think in terms of interactive epistemology: what it means for a collective to have common beliefs or knowledge, and what are the consequences of this knowledge for the social outcomes resulting from agents' interactions. Philosophers who have worked in this area include Bicchieri (1989, 1993),[24] Skyrms (1990),[25] and Stalnaker (1999).[26] In ethics, some authors have attempted to pursue the project, begun by Thomas Hobbes, of deriving morality from self-interest. Since games like the Prisoner's dilemma present an apparent conflict between morality and self-interest, explaining why cooperation is required by self-interest is an important component of this project. This general strategy is a component of the general social contract view in political philosophy (for examples, see Gauthier (1986) and Kavka (1986)).[27] Other authors have attempted to use evolutionary game theory in order to explain the emergence of human attitudes about morality and corresponding animal behaviors. These authors look at several games including the Prisoner's dilemma, Stag hunt, and the Nash bargaining game as providing an explanation for the emergence of attitudes about morality (see, e.g., Skyrms (1996, 2004) and Sober and Wilson (1999)). Some assumptions used in some parts of game theory have been challenged in philosophy; for example, psychological egoism states that rationality reduces to self-interest, a claim debated among philosophers (see Psychological egoism#Criticisms).

Types of games
Cooperative or non-cooperative
A game is cooperative if the players are able to form binding commitments. For instance the legal system requires them to adhere to their promises. In noncooperative games this is not possible. Often it is assumed that communication among players is allowed in cooperative games, but not in noncooperative ones. However, this classification on two binary criteria has been questioned, and sometimes rejected (Harsanyi 1974). Of the two types of games, noncooperative games are able to model situations to the finest details, producing accurate results. Cooperative games focus on the game at large. Considerable efforts have been made to link the two approaches. The so-called Nash-programme has already established many of the cooperative solutions as noncooperative equilibria. Hybrid games contain cooperative and non-cooperative elements. For instance, coalitions of players are formed in a cooperative game, but these play in a non-cooperative fashion.

Symmetric and asymmetric


       E       F
E      1, 2    0, 0
F      0, 0    1, 2

An asymmetric game

A symmetric game is a game where the payoffs for playing a particular strategy depend only on the other strategies employed, not on who is playing them. If the identities of the players can be changed without changing the payoff to the strategies, then a game is symmetric. Many of the commonly studied 2×2 games are symmetric. The standard representations of chicken, the prisoner's dilemma, and the stag hunt are all symmetric games. Some scholars would consider certain asymmetric games as examples of these games as well. However, the most common payoffs for each of these games are symmetric. The most commonly studied asymmetric games are games where the strategy sets of the two players are not identical. For instance, the ultimatum game and similarly the dictator game have different strategies for each player. It is possible, however, for a game to have identical strategies for both players, yet be asymmetric. For example, the game tabulated above is asymmetric despite having identical strategy sets for both players.
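The symmetry condition can be tested mechanically for a two-player game: swapping the players' roles must swap their payoffs. The sketch below assumes both players' strategies are listed in the same order; the helper name is_symmetric and the prisoner's dilemma numbers are illustrative assumptions, while the E/F payoffs are those of the asymmetric game in this section.

```python
def is_symmetric(payoffs):
    """Check u1(i, j) == u2(j, i) for a two-player game.

    payoffs[i][j] is (row player's payoff, column player's payoff).
    Assumes both players' strategies are listed in the same order.
    """
    n = len(payoffs)
    if any(len(row) != n for row in payoffs):
        return False  # the strategy sets differ in size
    return all(payoffs[i][j][0] == payoffs[j][i][1]
               for i in range(n) for j in range(n))


# The E/F game from this section: identical strategy sets, yet asymmetric.
asymmetric_game = [[(1, 2), (0, 0)],
                   [(0, 0), (1, 2)]]
# Hypothetical prisoner's dilemma payoffs (cooperate, defect).
prisoners_dilemma = [[(-1, -1), (-3, 0)],
                     [(0, -3), (-2, -2)]]

print(is_symmetric(asymmetric_game))    # False
print(is_symmetric(prisoners_dilemma))  # True
```

A game that is symmetric only after relabeling one player's strategies would be reported as asymmetric by this check, so it is a sketch of the definition rather than a complete test.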


Zero-sum and non-zero-sum


       A        B
A      −1, 1    3, −3
B      0, 0     −2, 2

A zero-sum game

Zero-sum games are a special case of constant-sum games, in which choices by players can neither increase nor decrease the available resources. In zero-sum games the total benefit to all players in the game, for every combination of strategies, always adds to zero (more informally, a player benefits only at the equal expense of others). Poker exemplifies a zero-sum game (ignoring the possibility of the house's cut), because one wins exactly the amount one's opponents lose. Other zero-sum games include matching pennies and most classical board games including Go and chess. Many games studied by game theorists (including the infamous prisoner's dilemma) are non-zero-sum games, because the outcome has net results greater or less than zero. Informally, in non-zero-sum games, a gain by one player does not necessarily correspond with a loss by another. Constant-sum games correspond to activities like theft and gambling, but not to the fundamental economic situation in which there are potential gains from trade. It is possible to transform any game into a (possibly asymmetric) zero-sum game by adding an additional dummy player (often called "the board"), whose losses compensate the players' net winnings.
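The dummy-player transformation described above can be sketched in a few lines. The representation (a dictionary from strategy profiles to payoff tuples), the function names, and the prisoner's dilemma payoffs are assumptions made for this illustration.

```python
def is_zero_sum(outcomes):
    """True when the payoffs sum to zero for every strategy profile."""
    return all(sum(payoff) == 0 for payoff in outcomes.values())


def add_dummy_player(outcomes):
    """Append a passive player ("the board") whose payoff offsets the
    net winnings of the real players in every outcome."""
    return {profile: payoff + (-sum(payoff),)
            for profile, payoff in outcomes.items()}


# Hypothetical prisoner's dilemma payoffs, keyed by (player 1, player 2) moves.
pd = {("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
      ("D", "C"): (0, -3), ("D", "D"): (-2, -2)}

print(is_zero_sum(pd))                    # False
print(is_zero_sum(add_dummy_player(pd)))  # True
print(add_dummy_player(pd)[("C", "C")])   # (-1, -1, 2)
```

The dummy player has no choices of its own, so the strategic situation of the original players is unchanged; only the bookkeeping of total payoff differs.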

Simultaneous and sequential


Simultaneous games are games where both players move simultaneously, or if they do not move simultaneously, the later players are unaware of the earlier players' actions (making them effectively simultaneous). Sequential games (or dynamic games) are games where later players have some knowledge about earlier actions. This need not be perfect information about every action of earlier players; it might be very little knowledge. For instance, a player may know that an earlier player did not perform one particular action, while he does not know which of the other available actions the first player actually performed. The difference between simultaneous and sequential games is captured in the different representations discussed above. Often, normal form is used to represent simultaneous games, and extensive form is used to represent sequential ones. The transformation of extensive to normal form is one way, meaning that multiple extensive form games correspond to the same normal form. Consequently, notions of equilibrium for simultaneous games are insufficient for reasoning about sequential games; see subgame perfection. In short, the differences between sequential and simultaneous games are as follows:


                                       Sequential        Simultaneous
Normally denoted by:                   Decision trees    Payoff matrices
Prior knowledge of opponent's move:    Yes               No
Time axis:                             Yes               No
Also known as:                         Extensive game    Strategic game

Perfect information and imperfect information


[Figure: A game of imperfect information (the dotted line represents ignorance on the part of player 2, formally called an information set)]

An important subset of sequential games consists of games of perfect information. A game is one of perfect information if all players know the moves previously made by all other players. Thus, only sequential games can be games of perfect information because players in simultaneous games do not know the actions of the other players. Most games studied in game theory are imperfect-information games. Interesting examples of perfect-information games include the ultimatum game and centipede game. Recreational games of perfect information include chess, go, and mancala. Many card games are games of imperfect information, for instance poker or contract bridge. Perfect information is often confused with complete information, which is a similar concept. Complete information requires that every player know the strategies and payoffs available to the other players but not necessarily the actions taken. Games of incomplete information can be reduced, however, to games of imperfect information by introducing "moves by nature" (Leyton-Brown & Shoham 2008, p. 60).
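Finite games of perfect information can be solved by backward induction: each player, knowing all earlier moves, picks the action whose subtree gives them the best payoff. The sketch below assumes a simple tree encoding (nested dictionaries with payoff tuples at the leaves) and a heavily simplified, hypothetical ultimatum game; neither comes from the sources cited here.

```python
def backward_induction(node):
    """Return (payoffs, path) reached when every player maximizes their own payoff.

    A leaf is a tuple of payoffs, one entry per player.
    An internal node is {"player": index, "moves": {action: subtree}}.
    Ties are broken in favor of the first action listed.
    """
    if isinstance(node, tuple):
        return node, []
    player = node["player"]
    best_payoffs, best_path = None, None
    for action, subtree in node["moves"].items():
        payoffs, path = backward_induction(subtree)
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_payoffs, best_path = payoffs, [action] + path
    return best_payoffs, best_path


# Hypothetical mini ultimatum game: player 0 proposes a split of 10,
# player 1 accepts or rejects (rejection leaves both with nothing).
ultimatum = {"player": 0, "moves": {
    "fair":   {"player": 1, "moves": {"accept": (5, 5), "reject": (0, 0)}},
    "greedy": {"player": 1, "moves": {"accept": (9, 1), "reject": (0, 0)}},
}}
print(backward_induction(ultimatum))  # ((9, 1), ['greedy', 'accept'])
```

The procedure relies on every decision node being a singleton information set; with imperfect information, a player cannot condition on which node they are at, and this recursion no longer applies directly.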

Combinatorial games
Games in which the difficulty of finding an optimal strategy stems from the multiplicity of possible moves are called combinatorial games. Examples include chess and go. Games that involve imperfect or incomplete information may also have a strong combinatorial character, for instance backgammon. There is no unified theory addressing combinatorial elements in games. There are, however, mathematical tools that can solve particular problems and answer general questions.[28] Games of perfect information have been studied in combinatorial game theory, which has developed novel representations, e.g. surreal numbers, as well as combinatorial and algebraic (and sometimes non-constructive) proof methods to solve games of certain types, including "loopy" games that may result in infinitely long sequences of moves. These methods address games with higher combinatorial complexity than those usually considered in traditional (or "economic") game theory.[29][30] A typical game that has been solved this way is hex. A related field of study, drawing from computational complexity theory, is game complexity, which is concerned with estimating the computational difficulty of finding optimal strategies.[31] Research in artificial intelligence has addressed both perfect and imperfect (or incomplete) information games that have very complex combinatorial structures (like chess, go, or backgammon) for which no provable optimal strategies have been found. The practical solutions involve computational heuristics, like alpha-beta pruning or use of artificial neural networks trained by reinforcement learning, which make games more tractable in computing practice.[28][32]


Infinitely long games


Games, as studied by economists and real-world game players, are generally finished in finitely many moves. Pure mathematicians are not so constrained, and set theorists in particular study games that last for infinitely many moves, with the winner (or other payoff) not known until after all those moves are completed. The focus of attention is usually not so much on what is the best way to play such a game, but simply on whether one or the other player has a winning strategy. (It can be proven, using the axiom of choice, that there are games, even with perfect information and where the only outcomes are "win" or "lose", for which neither player has a winning strategy.) The existence of such strategies, for cleverly designed games, has important consequences in descriptive set theory.

Discrete and continuous games


Much of game theory is concerned with finite, discrete games that have a finite number of players, moves, events, outcomes, etc. Many concepts can be extended, however. Continuous games allow players to choose a strategy from a continuous strategy set. For instance, Cournot competition is typically modeled with players' strategies being any non-negative quantities, including fractional quantities.
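To make the Cournot example concrete, the sketch below assumes a textbook duopoly with linear inverse demand P = a − b(q1 + q2) and constant marginal cost c; these functional forms and the parameter values are assumptions for illustration. Under them, each firm's best response is q = (a − c − b·q_other) / (2b), and repeated best responses converge to the equilibrium quantity (a − c) / (3b).

```python
def best_response(q_other, a=100.0, b=1.0, c=10.0):
    """Profit-maximizing quantity against a rival producing q_other,
    assuming inverse demand P = a - b * (q1 + q2) and marginal cost c."""
    return max(0.0, (a - c - b * q_other) / (2 * b))


def cournot_equilibrium(iterations=100):
    """Iterate simultaneous best responses starting from zero output."""
    q1 = q2 = 0.0
    for _ in range(iterations):
        q1, q2 = best_response(q2), best_response(q1)
    return q1, q2


q1, q2 = cournot_equilibrium()
print(round(q1, 6), round(q2, 6))  # 30.0 30.0, i.e. (a - c) / (3 * b) each
```

The strategies here are real numbers rather than entries in a finite list, which is what distinguishes a continuous game from the discrete matrix games shown earlier.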

Differential games
Differential games such as the continuous pursuit and evasion game are continuous games where the evolution of the players' state variables is governed by differential equations. The problem of finding an optimal strategy in a differential game is closely related to optimal control theory. In particular, there are two types of strategies: the open-loop strategies are found using the Pontryagin Maximum Principle while the closed-loop strategies are found using Bellman's Dynamic Programming method. A particular case of differential games is games with a random time horizon.[33] In such games, the terminal time is a random variable with a given probability distribution function. Therefore, the players maximize the mathematical expectation of the cost function. It was shown that the modified optimization problem can be reformulated as a discounted differential game over an infinite time interval.
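The reformulation in the last sentence can be sketched as follows. The notation is an assumption introduced here: T is the random terminal time with distribution function F, and h_i(t) is player i's instantaneous payoff (or cost) along the trajectory, assumed integrable so that the order of expectation and integration can be exchanged.

```latex
\mathbb{E}\!\left[\int_0^{T} h_i(t)\,dt\right]
  = \mathbb{E}\!\left[\int_0^{\infty} \mathbf{1}\{t < T\}\, h_i(t)\,dt\right]
  = \int_0^{\infty} \bigl(1 - F(t)\bigr)\, h_i(t)\,dt .
```

The factor 1 − F(t) plays the role of a discount weight on an infinite horizon. In the special case of an exponentially distributed terminal time with rate λ, it equals e^{−λt}, which is the familiar constant-rate discounting.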

Many-player and population games


Games with an arbitrary, but finite, number of players are often called n-person games (Luce & Raiffa 1957). Evolutionary game theory considers games involving a population of decision makers, where the frequency with which a particular decision is made can change over time in response to the decisions made by all individuals in the population. In biology, this is intended to model (biological) evolution, where genetically programmed organisms pass along some of their strategy programming to their offspring. In economics, the same theory is intended to capture population changes because people play the game many times within their lifetime, and consciously (and perhaps rationally) switch strategies (Webb 2007).
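One common way to model how strategy frequencies change in a population is the replicator dynamic, in which a strategy's share grows when its payoff exceeds the population average. The sketch below uses a simple Euler discretization and hypothetical Hawk-Dove payoffs (resource value 2, fight cost 4); the function name, step size and numbers are assumptions, not taken from the sources cited above.

```python
def replicator_step(shares, payoff_matrix, dt=0.01):
    """One Euler step of the replicator dynamic x_i' = x_i * (f_i - average f)."""
    n = len(shares)
    fitness = [sum(payoff_matrix[i][j] * shares[j] for j in range(n))
               for i in range(n)]
    average = sum(s * f for s, f in zip(shares, fitness))
    return [s + dt * s * (f - average) for s, f in zip(shares, fitness)]


# Hypothetical Hawk-Dove payoffs to the row strategy (Hawk, Dove).
hawk_dove = [[-1.0, 2.0],
             [0.0, 1.0]]

shares = [0.1, 0.9]  # start with 10% hawks
for _ in range(5000):
    shares = replicator_step(shares, hawk_dove)
print([round(s, 3) for s in shares])  # approaches an even mix of hawks and doves
```

With these payoffs, hawks do better than doves when rare and worse when common, so the population settles at the mixed state where both strategies earn the same payoff.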

Stochastic outcomes (and relation to other fields)


Individual decision problems with stochastic outcomes are sometimes considered "one-player games". These situations are not considered game theoretical by some authors. They may be modeled using similar tools within the related disciplines of decision theory, operations research, and areas of artificial intelligence, particularly AI planning (with uncertainty) and multi-agent systems. Although these fields may have different motivators, the mathematics involved is substantially the same, e.g. using Markov decision processes (MDP). Stochastic outcomes can also be modeled in terms of game theory by adding a randomly acting player who makes "chance moves", also known as "moves by nature" (Osborne & Rubinstein 1994). This player is not typically considered a third player in what is otherwise a two-player game, but merely serves to provide a roll of the dice where required by the game. For some problems, different approaches to modeling stochastic outcomes may lead to different solutions. For example, the difference in approach between MDPs and the minimax solution is that the latter considers the worst case over a set of adversarial moves, rather than reasoning in expectation about these moves given a fixed probability distribution. The minimax approach may be advantageous where stochastic models of uncertainty are not available, but may also overestimate extremely unlikely (but costly) events, dramatically swaying the strategy in such scenarios if it is assumed that an adversary can force such an event to happen.[34] (See black swan theory for more discussion on this kind of modeling issue, particularly as it relates to predicting and limiting losses in investment banking.) General models that include all elements of stochastic outcomes, adversaries, and partial or noisy observability (of moves by other players) have also been studied. The "gold standard" is considered to be the partially observable stochastic game (POSG), but few realistic problems are computationally feasible in POSG representation.[34]
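The contrast between reasoning in expectation and reasoning about the worst case can be seen in a one-step toy decision. All names, costs and probabilities below are hypothetical and chosen only to make the two criteria disagree.

```python
# cost[action][event]: the event is either drawn from a known distribution
# (expectation, as in an MDP) or picked by an adversary (minimax).
cost = {"insure": {"calm": 2.0, "storm": 3.0},
        "gamble": {"calm": 0.0, "storm": 50.0}}
prob = {"calm": 0.99, "storm": 0.01}


def expected_cost(action):
    """Average cost when events follow the fixed distribution prob."""
    return sum(prob[event] * c for event, c in cost[action].items())


def worst_case_cost(action):
    """Cost when an adversary picks the most damaging event."""
    return max(cost[action].values())


best_in_expectation = min(cost, key=expected_cost)
best_worst_case = min(cost, key=worst_case_cost)
print(best_in_expectation, best_worst_case)  # gamble insure
```

The rare but costly storm barely moves the expected cost of gambling, yet it dominates the worst-case analysis, which is the overestimation effect described above.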

Metagames
These are games the play of which is the development of the rules for another game, the target or subject game. Metagames seek to maximize the utility value of the rule set developed. The theory of metagames is related to mechanism design theory. The term metagame analysis is also used to refer to a practical approach developed by Nigel Howard (Howard 1971) whereby a situation is framed as a strategic game in which stakeholders try to realise their objectives by means of the options available to them. Subsequent developments have led to the formulation of Confrontation analysis.

History
[Figure: John von Neumann]

Early discussions of examples of two-person games occurred long before the rise of modern, mathematical game theory. The first known discussion of game theory occurred in a letter written by James Waldegrave in 1713.[7] In this letter, Waldegrave provides a minimax mixed strategy solution to a two-person version of the card game le Her. James Madison made what we now recognize as a game-theoretic analysis of the ways states can be expected to behave under different systems of taxation.[35][36] In his 1838 Recherches sur les principes mathématiques de la théorie des richesses (Researches into the Mathematical Principles of the Theory of Wealth), Antoine Augustin Cournot considered a duopoly and presented a solution that is a restricted version of the Nash equilibrium. The Danish mathematician Zeuthen proved that the mathematical model had a winning strategy by using Brouwer's fixed point theorem. In his 1938 book Applications aux Jeux de Hasard and earlier notes, Émile Borel proved a minimax theorem for two-person zero-sum matrix games only when the pay-off matrix was symmetric. Borel conjectured that non-existence of mixed-strategy equilibria in two-person zero-sum games would occur, a conjecture that was proved false. Game theory did not really exist as a unique field until John von Neumann published a paper in 1928.[37] Von Neumann's original proof used Brouwer's fixed-point theorem on continuous mappings into compact convex sets, which became a standard method in game theory and mathematical economics. His paper was followed by his 1944 book Theory of Games and Economic Behavior. The second edition of this book provided an axiomatic theory of utility, which reincarnated Daniel Bernoulli's old theory of utility (of the money) as an independent discipline. Von Neumann's work in game theory culminated in this 1944 book. This foundational work contains the method for finding mutually consistent solutions for two-person zero-sum games. During the following time period, work on game theory was primarily focused on cooperative game theory, which analyzes optimal strategies for groups of individuals, presuming that they can enforce agreements between them about proper strategies.[38] In 1950, the first mathematical discussion of the prisoner's dilemma appeared, and an experiment was undertaken by notable mathematicians Merrill M. Flood and Melvin Dresher, as part of the RAND Corporation's investigations into game theory. RAND pursued the studies because of possible applications to global nuclear strategy.[7] Around this same time, John Nash developed a criterion for mutual consistency of players' strategies, known as Nash equilibrium, applicable to a wider variety of games than the criterion proposed by von Neumann and Morgenstern. This equilibrium is sufficiently general to allow for the analysis of non-cooperative games in addition to cooperative ones. Game theory experienced a flurry of activity in the 1950s, during which time the concepts of the core, the extensive form game, fictitious play, repeated games, and the Shapley value were developed. In addition, the first applications of game theory to philosophy and political science occurred during this time. In 1965, Reinhard Selten introduced his solution concept of subgame perfect equilibria, which further refined the Nash equilibrium (later he would introduce trembling hand perfection as well). In 1967, John Harsanyi developed the concepts of complete information and Bayesian games. Nash, Selten and Harsanyi became Economics Nobel Laureates in 1994 for their contributions to economic game theory. In the 1970s, game theory was extensively applied in biology, largely as a result of the work of John Maynard Smith and his evolutionarily stable strategy.
In addition, the concepts of correlated equilibrium, trembling hand perfection, and common knowledge[39] were introduced and analyzed. In 2005, game theorists Thomas Schelling and Robert Aumann followed Nash, Selten and Harsanyi as Nobel Laureates. Schelling worked on dynamic models, early examples of evolutionary game theory. Aumann contributed more to the equilibrium school, introducing an equilibrium coarsening, correlated equilibrium, and developing an extensive formal analysis of the assumption of common knowledge and of its consequences. In 2007, Leonid Hurwicz, together with Eric Maskin and Roger Myerson, was awarded the Nobel Prize in Economics "for having laid the foundations of mechanism design theory." Myerson's contributions include the notion of proper equilibrium, and an important graduate text: Game Theory, Analysis of Conflict (Myerson 1997). Hurwicz introduced and formalized the concept of incentive compatibility.


Popular culture
The life story of game theorist and mathematician John Nash was turned into a biopic, A Beautiful Mind starring Russell Crowe,[40] based on the namesake book by Sylvia Nasar.[41] "Games-theory" and "theory of games" are mentioned in the military science fiction novel Starship Troopers by Robert A. Heinlein.[42] In the 1997 film of the same name the character Carl Jenkins refers to his assignment, military intelligence, as "Games and Theory." One of the main gameplay decision-making mechanics of the video game Zero Escape: Virtue's Last Reward is based on game theory. Some of the characters even reference the Prisoner's Dilemma.


Notes
[1] Roger B. Myerson (1991). Game Theory: Analysis of Conflict, Harvard University Press, p. 1 (http://books.google.com/books?id=E8WQFRCsNr0C&printsec=find&pg=PA1). Chapter-preview links, pp. vii-xi (http://books.google.com/books?id=E8WQFRCsNr0C&printsec=find&pg=PR7).
[2] R. J. Aumann ([1987] 2008). "game theory," Introduction, The New Palgrave Dictionary of Economics, 2nd Edition. Abstract (http://www.dictionaryofeconomics.com/article?id=pde2008_G000007&q=game%20theory&topicid=&result_number=3).
[3] Colin F. Camerer (2003). Behavioral Game Theory: Experiments in Strategic Interaction, pp. 5-7 (scroll to 1.1 What Is Game Theory Good For? (http://press.princeton.edu/chapters/i7517.html)).
[4] Ross, Don. "Game Theory" (http://plato.stanford.edu/archives/spr2008/entries/game-theory/). The Stanford Encyclopedia of Philosophy (Spring 2008 Edition). Edward N. Zalta (ed.). Retrieved 2008-08-21.
[5] Experimental work in game theory goes by many names; experimental economics, behavioral economics, and behavioural game theory are several. For a recent discussion, see Colin F. Camerer (2003). Behavioral Game Theory: Experiments in Strategic Interaction (description (http://press.princeton.edu/titles/7517.html) and Introduction (http://press.princeton.edu/chapters/i7517.html), pp. 1-25).
[6] At JEL:C7 of the Journal of Economic Literature classification codes. R.J. Aumann (2008). "game theory," The New Palgrave Dictionary of Economics, 2nd Edition. Abstract (http://www.dictionaryofeconomics.com/article?id=pde2008_G000007&edition=current&q=game%20theory&topicid=&result_number=4). Martin Shubik (1981). "Game Theory Models and Methods in Political Economy," in Kenneth Arrow and Michael Intriligator, ed., Handbook of Mathematical Economics, v. 1, pp. 285-330. doi:10.1016/S1573-4382(81)01011-4. Carl Shapiro (1989). "The Theory of Business Strategy," RAND Journal of Economics, 20(1), pp. 125-137. JSTOR 2555656.
[7] Game-theoretic model to examine the two tradeoffs in the acquisition of information for a careful balancing act (http://www.insead.edu/facultyresearch/research/doc.cfm?did=46503). Research paper, INSEAD.
[8] Leigh Tesfatsion (2006). "Agent-Based Computational Economics: A Constructive Approach to Economic Theory," ch. 16, Handbook of Computational Economics, v. 2, pp. 831-880. doi:10.1016/S1574-0021(05)02016-2. Joseph Y. Halpern (2008). "computer science and game theory," The New Palgrave Dictionary of Economics, 2nd Edition. Abstract (http://www.dictionaryofeconomics.com/article?id=pde2008_C000566&edition=current&q=&topicid=&result_number=1).
[9] From The New Palgrave Dictionary of Economics (2008), 2nd Edition: Roger B. Myerson. "mechanism design." Abstract (http://www.dictionaryofeconomics.com/article?id=pde2008_M000132&edition=current&q=mechanism%20design&topicid=&result_number=3). _____. "revelation principle." Abstract (http://www.dictionaryofeconomics.com/article?id=pde2008_R000137&edition=current&q=moral&topicid=&result_number=1). Tuomas Sandholm. "computing in mechanism design." Abstract (http://www.dictionaryofeconomics.com/article?id=pde2008_C000563&edition=&field=keyword&q=algorithmic%20mechanism%20design&topicid=&result_number=1). Noam Nisan and Amir Ronen (2001). "Algorithmic Mechanism Design," Games and Economic Behavior, 35(1-2), pp. 166-196 (http://www.cs.cmu.edu/~sandholm/cs15-892F09/Algorithmic%20mechanism%20design.pdf). Noam Nisan et al., ed. (2007). Algorithmic Game Theory, Cambridge University Press. Description (http://www.cup.cam.ac.uk/asia/catalogue/catalogue.asp?isbn=9780521872829).
[10] R. Aumann and S. Hart, ed., 1994. Handbook of Game Theory with Economic Applications, v. 2, outline links, ch. 30: "Voting Procedures" (http://www.sciencedirect.com/science/article/pii/S1574000505800621) & ch. 31: "Social Choice" (http://www.sciencedirect.com/science/article/pii/S1574000505800633).
[11] Vernon L. Smith, 1992. "Game Theory and Experimental Economics: Beginnings and Early Influences," in E. R. Weintraub, ed., Towards a History of Game Theory, pp. 241-282 (http://books.google.com/books?hl=en&lr=&id=9CHY2Gozh1MC&oi=fnd&pg=PA241). _____, 2001. "Experimental Economics," International Encyclopedia of the Social & Behavioral Sciences, pp. 5100-5108. Abstract (http://www.sciencedirect.com/science/article/pii/B0080430767022324) per sect. 1.1 & 2.1. Charles R. Plott and Vernon L. Smith, ed., 2008. Handbook of Experimental Economics Results, v. 1, Elsevier, Part 4, Games, ch. 45-66 (http://www.sciencedirect.com/science/handbooks/15740722). Vincent P. Crawford (1997). "Theory and Experiment in the Analysis of Strategic Interaction," in Advances in Economics and Econometrics: Theory and Applications, pp. 206-242 (http://weber.ucsd.edu/~vcrawfor/CrawfordThExp97.pdf). Cambridge. Reprinted in Colin F. Camerer et al., ed. (2003). Advances in Behavioral Economics, Princeton. 1986-2003 papers. Description (http://press.princeton.edu/titles/8437.html), preview (http://books.google.com/books?id=sA4jJOjwCW4C&printsec=find&pg=PR7), Princeton, ch. 12. Martin Shubik, 2002. "Game Theory and Experimental Gaming," in R. Aumann and S. Hart, ed., Handbook of Game Theory with Economic Applications, Elsevier, v. 3, pp. 2327-2351. doi:10.1016/S1574-0005(02)03025-4.
[12] From The New Palgrave Dictionary of Economics (2008), 2nd Edition: Faruk Gul. "behavioural economics and game theory." Abstract (http://www.dictionaryofeconomics.com/article?id=pde2008_G000210&q=Behavioral%20economics%20&topicid=&result_number=2). Colin F. Camerer. "behavioral game theory." Abstract (http://www.dictionaryofeconomics.com/article?id=pde2008_B000302&q=Behavioral%20economics%20&topicid=&result_number=13). _____ (1997). "Progress in Behavioral Game Theory," Journal of Economic Perspectives, 11(4), p. 172, pp. 167-188 (http://authors.library.caltech.edu/22122/1/2138470[1].pdf). _____ (2003). Behavioral Game Theory, Princeton. Description (http://press.princeton.edu/chapters/i7517.html), preview (http://books.google.com/books?id=cr_Xg7cRvdcC&printsec=find&pg=PR7) ([ctrl]+), and ch. 1 link (http://press.princeton.edu/chapters/i7517.pdf). _____, George Loewenstein, and Matthew Rabin, ed. (2003). Advances in Behavioral Economics, Princeton. 1986-2003 papers. Description (http://press.princeton.edu/titles/8437.html), contents (http://books.google.com/books?id=sA4jJOjwCW4C&printsec=find&pg=PR7). Drew Fudenberg (2006). "Advancing Beyond Advances in Behavioral Economics," Journal of Economic Literature, 44(3), pp. 694-711. JSTOR 30032349.
[13] Eric Rasmusen (2007). Games and Information, 4th ed. Description (http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP001009.html) and chapter-preview (http://books.google.com/books?id=5XEMuJwnBmUC&printsec=fnd&pg=PR5). David M. Kreps (1990). Game Theory and Economic Modelling. Description (http://econpapers.repec.org/bookchap/oxpobooks/9780198283812.htm). R. Aumann and S. Hart, ed. (1992, 2002). Handbook of Game Theory with Economic Applications v. 1, ch. 3-6 (http://www.sciencedirect.com/science/handbooks/15740005/1) and v. 3, ch. 43 (http://www.sciencedirect.com/science/article/pii/S1574000502030060).
[14] Jean Tirole (1988). The Theory of Industrial Organization, MIT Press. Description (http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=8224) and chapter-preview links, pp. vii-ix (http://books.google.com/books?id=HIjsF0XONF8C&printsec=find&pg=PR7), "General Organization," pp. 5-6 (http://books.google.com/books?id=HIjsF0XONF8C&dq=find&pg=PA5), and "Non-Cooperative Game Theory: A User's Guide Manual," ch. 11, pp. 423-59 (http://books.google.com/books?id=HIjsF0XONF8C&dq=find&pg=PA423). Kyle Bagwell and Asher Wolinsky (2002). "Game theory and Industrial Organization," ch. 49, Handbook of Game Theory with Economic Applications, v. 3, pp. 1851-1895 (http://www.sciencedirect.com/science/article/pii/S1574000502030126). Martin Shubik (1959). Strategy and Market Structure: Competition, Oligopoly, and the Theory of Games, Wiley. Description (http://devirevues.demo.inist.fr/handle/2042/29380) and review extract (http://www.jstor.org/pss/40434883). _____ with Richard Levitan (1980). Market Structure and Behavior, Harvard University Press. Review extract (http://www.jstor.org/pss/2232276).
[15] Martin Shubik (1981). "Game Theory Models and Methods in Political Economy," in Handbook of Mathematical Economics, v. 1, pp. 285-330. doi:10.1016/S1573-4382(81)01011-4. _____ (1987). A Game-Theoretic Approach to Political Economy. MIT Press. Description (http://mitpress.mit.edu/catalog/item/default.asp?tid=5086&ttype=2).
[16] Martin Shubik (1978). "Game Theory: Economic Applications," in W. Kruskal and J.M. Tanur, ed., International Encyclopedia of Statistics, v. 2, pp. 372-78. Robert Aumann and Sergiu Hart, ed. Handbook of Game Theory with Economic Applications (scrollable to chapter-outline or abstract links): 1992. v. 1 (http://www.sciencedirect.com/science/handbooks/15740005/1); 1994. v. 2 (http://www.sciencedirect.com/science/handbooks/15740005/2); 2002. v. 3 (http://www.sciencedirect.com/science/handbooks/15740005/3).
[17] Evolutionary Game Theory (Stanford Encyclopedia of Philosophy) (http://plato.stanford.edu/entries/game-evolutionary/). Plato.stanford.edu. Retrieved on 2013-01-03.
[18] Biological Altruism (Stanford Encyclopedia of Philosophy) (http://www.seop.leeds.ac.uk/entries/altruism-biological/). Seop.leeds.ac.uk. Retrieved on 2013-01-03.
[19] Noam Nisan et al., ed. (2007). Algorithmic Game Theory, Cambridge University Press. Description (http://www.cup.cam.ac.uk/asia/catalogue/catalogue.asp?isbn=9780521872829).
[20] Nisan, Noam; Ronen, Amir (2001). "Algorithmic Mechanism Design" (http://www.cs.cmu.edu/~sandholm/cs15-892F09/Algorithmic%20mechanism%20design.pdf). Games and Economic Behavior 35 (1-2): 166-196. doi:10.1006/game.1999.0790.
[21] Joseph Y. Halpern (2008). "computer science and game theory," The New Palgrave Dictionary of Economics, 2nd Edition. Abstract (http://www.dictionaryofeconomics.com/article?id=pde2008_C000566&edition=current&q=&topicid=&result_number=1). Shoham, Yoav (2008). "Computer Science and Game Theory" (http://www.robotics.stanford.edu/~shoham/www%20papers/CSGT-CACM-Shoham.pdf). Communications of the ACM 51 (8): 75-79. Littman, Amy (2007). "Introduction to the Special Issue on Learning and Computational Game Theory". Machine Learning 67 (1-2): 3-6. doi:10.1007/s10994-007-0770-1.
[22] Ullmann-Margalit, E. (1977). The Emergence of Norms. Oxford University Press. ISBN 0198244118.
[23] Bicchieri, C. (2006). The Grammar of Society: the Nature and Dynamics of Social Norms. Cambridge University Press. ISBN 0521573726.
[24] "Self-Refuting Theories of Strategic Interaction: A Paradox of Common Knowledge". Erkenntnis 30 (1-2): 69-85. 1989. doi:10.1007/BF00184816. See also Rationality and Coordination. Cambridge University Press. 1993. ISBN 0521381231.
[25] The Dynamics of Rational Deliberation. Harvard University Press. 1990. ISBN 067421885X.
[26] Bicchieri, Cristina; Jeffrey, Richard; Skyrms, Brian, eds. (1999). 
"Knowledge, Belief, and Counterfactual Reasoning in Games". The Logic of Strategy. New York: Oxford University Press. ISBN0195117158.

Game theory
[27] For a more detailed discussion of the use of Game Theory in ethics see the Stanford Encyclopedia of Philosophy's entry game theory and ethics (http:/ / plato. stanford. edu/ entries/ game-ethics/ ). [28] Jrg Bewersdorff (2005). Luck, logic, and white lies: the mathematics of games. A K Peters, Ltd.. pp.ix-xii and chapter 31. ISBN978-1-56881-210-6. [29] Albert, Michael H.; Nowakowski, Richard J.; Wolfe, David (2007). Lessons in Play: In Introduction to Combinatorial Game Theory. A K Peters Ltd. pp.34. ISBN978-1-56881-277-9. [30] Beck, Jzsef (2008). Combinatorial games: tic-tac-toe theory. Cambridge University Press. pp.13. ISBN978-0-521-46100-9. [31] Robert A. Hearn; Erik D. Demaine (2009). Games, Puzzles, and Computation. A K Peters, Ltd.. ISBN978-1-56881-322-6. [32] M. Tim Jones (2008). Artificial Intelligence: A Systems Approach. Jones & Bartlett Learning. pp.106118. ISBN978-0-7637-7337-3. [33] Petrosjan, L.A. and Murzov, N.V. (1966). Game-theoretic problems of mechanics. Litovsk. Mat. Sb. 6, 423433 (in Russian). [34] Hugh Brendan McMahan (2006), Robust Planning in Domains with Stochastic Outcomes, Adversaries, and Partial Observability (http:/ / www. cs. cmu. edu/ ~mcmahan/ research/ mcmahan_thesis. pdf), CMU-CS-06-166, pp. 3-4 [35] James Madison, Vices of the Political System of the United States, April, 1787. Link (http:/ / www. constitution. org/ jm/ 17870400_vices. htm) [36] Jack Rakove, "James Madison and the Constitution", History Now, Issue 13 September 2007. Link (http:/ / www. historynow. org/ 09_2007/ historian2. html) [37] Neumann, J. v. (1928). "Zur Theorie der Gesellschaftsspiele". Mathematische Annalen 100 (1): 295320. doi:10.1007/BF01448847. English translation: Tucker, A. W.; Luce, R. D., eds. (1959). "On the Theory of Games of Strategy" (http:/ / books. google. com/ books?hl=en& lr=& id=9lSVFzsTGWsC& oi=fnd& pg=PA13& dq==P_RGaKOVtC& sig=J-QB_GglFSVWw9KfXjut62E6AmM#v=onepage& q& f=false). Contributions to the Theory of Games. 4. pp.1342. . 
[38] Leonard, Robert (2010). Von Neumann, Morgenstern, and the Creation of Game Theory. New York: Cambridge University Press. ISBN9780521562669. [39] Although common knowledge was first discussed by the philosopher David Lewis in his dissertation (and later book) Convention in the late 1960s, it was not widely considered by economists until Robert Aumann's work in the 1970s. [40] Simon Singh "Between Genius and Madness." (http:/ / www. nytimes. com/ books/ 98/ 06/ 14/ reviews/ 980614. 14singht. html) New York Times (June 14, 1998) [41] Sylvia Nasar, A Beautiful Mind, (1998). Simon & Schuster. ISBN 0-684-81906-6 [42] Heinlein, Robert A. (1959). Starship Troopers.

232

ENIAC
ENIAC (Electronic Numerical Integrator And Computer)[1][2] was the first electronic general-purpose computer. It was Turing-complete, digital, and capable of being reprogrammed to solve a full range of computing problems.[3] ENIAC was designed to calculate artillery firing tables for the United States Army's Ballistic Research Laboratory.[4][5] When ENIAC was announced in 1946 it was heralded in the press as a "Giant Brain". It was about one thousand times faster than the electro-mechanical machines of the day. This mathematical power, coupled with general-purpose programmability, excited scientists and industrialists. The inventors promoted the spread of these new ideas by conducting a series of lectures on computer architecture.

Glen Beck (background) and Betty Snyder (foreground) program ENIAC in BRL building 328. (U.S. Army photo)

ENIAC's design and construction were financed by the United States Army during World War II. The construction contract was signed on June 5, 1943, and work on the computer began in secret at the University of Pennsylvania's Moore School of Electrical Engineering the following month under the code name "Project PX". The completed machine was announced to the public on the evening of February 14, 1946[6] and formally dedicated the next day[7] at the University of Pennsylvania, having cost almost $500,000 (approximately $6,000,000 today). It was formally accepted by the U.S. Army Ordnance Corps in July 1946. ENIAC was shut down on November 9, 1946 for a refurbishment and a memory upgrade, and was transferred to Aberdeen Proving Ground, Maryland in 1947. There, on July 29, 1947, it was turned on and was in continuous operation until 11:45 p.m. on October 2, 1955.[2]

Programmers Betty Jean Jennings (left) and Fran Bilas (right) operate ENIAC's main control panel at the Moore School of Electrical Engineering. (U.S. Army photo from the archives of the ARL Technical Library)

ENIAC was conceived and designed by John Mauchly and J. Presper Eckert of the University of Pennsylvania.[8] The team of design engineers assisting the development included Robert F. Shaw (function tables), Jeffrey Chuan Chu (divider/square-rooter), Thomas Kite Sharpless (master programmer), Arthur Burks (multiplier), Harry Huskey (reader/printer) and Jack Davis (accumulators). ENIAC was named an IEEE Milestone in 1987.[9]

Description
ENIAC was a modular computer, composed of individual panels to perform different functions. Twenty of these modules were accumulators, which could not only add and subtract but also hold a ten-digit decimal number in memory. Numbers were passed between these units across a number of general-purpose buses, or trays, as they were called. In order to achieve its high speed, the panels had to send and receive numbers, compute, save the answer, and trigger the next operation, all without any moving parts. Key to its versatility was the ability to branch; it could trigger different operations that depended on the sign of a computed result.

ENIAC vacuum tubes in holders

ENIAC contained 17,468 vacuum tubes, 7,200 crystal diodes, 1,500 relays, 70,000 resistors, 10,000 capacitors and around 5 million hand-soldered joints. It weighed more than 30 short tons (27 t), was roughly 8 by 3 by 100 feet (2.4 m × 0.9 m × 30 m), took up 1800 square feet (167 m²), and consumed 150 kW of power.[10][11] This led to the rumor that whenever the computer was switched on, lights in Philadelphia dimmed.[12] Input was possible from an IBM card reader, and an IBM card punch was used for output. These cards could be used to produce printed output offline using an IBM accounting machine, such as the IBM 405.

ENIAC used ten-position ring counters to store digits; each digit used 36 vacuum tubes, 10 of which were the dual triodes making up the flip-flops of the ring counter. Arithmetic was performed by "counting" pulses with the ring counters and generating carry pulses if the counter "wrapped around", the idea being to emulate in electronics the operation of the digit wheels of a mechanical adding machine. ENIAC had twenty ten-digit signed accumulators which used ten's complement representation and could perform 5,000 simple addition or subtraction operations between any of them and a source (e.g., another accumulator, or a constant transmitter) every second. It was possible to connect several accumulators to run simultaneously, so the peak speed of operation was potentially much higher due to parallel operation. It was possible to wire the carry of one accumulator into another accumulator to perform double precision arithmetic, but the accumulator carry circuit timing prevented the wiring of three or more for even higher precision.

ENIAC used four of the accumulators, controlled by a special Multiplier unit, to perform up to 385 multiplication operations per second. Five of the accumulators were controlled by a special Divider/Square-Rooter unit to perform up to forty division operations per second or three square root operations per second.

The other nine units in ENIAC were the Initiating Unit (which started and stopped the machine), the Cycling Unit (used for synchronizing the other units), the Master Programmer (which controlled "loop" sequencing), the Reader (which controlled an IBM punched card reader), the Printer (which controlled an IBM punched card punch), the Constant Transmitter, and three Function Tables.
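The pulse-counting arithmetic and ten's complement representation described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not a description of ENIAC's circuitry: the class names `DecadeCounter` and `Accumulator` are hypothetical, the sign is folded into the ten-digit value (anything at or above 5,000,000,000 is read as negative) rather than held in a separate sign indicator, and carries ripple immediately rather than on the machine's actual timing.

```python
class DecadeCounter:
    """One decimal digit held as a position (0-9) on a ten-stage ring."""

    def __init__(self):
        self.position = 0

    def pulse(self, count=1):
        """Advance the ring by `count` pulses; return how many times it wrapped."""
        carries = 0
        for _ in range(count):
            self.position = (self.position + 1) % 10
            if self.position == 0:  # stepped from 9 back to 0: emit a carry
                carries += 1
        return carries


class Accumulator:
    """Ten decades, least significant first; values are kept modulo 10**10."""

    DIGITS = 10
    MODULUS = 10 ** DIGITS

    def __init__(self):
        self.decades = [DecadeCounter() for _ in range(self.DIGITS)]

    def add(self, number):
        """Add an integer by pulsing each decade and rippling the carries."""
        # Python's % maps a negative input to its ten's complement pattern.
        pattern = number % self.MODULUS
        carry = 0
        for decade in self.decades:
            pattern, digit = divmod(pattern, 10)
            carry = decade.pulse(digit + carry)
        # A carry out of the most significant decade is simply dropped.

    def subtract(self, number):
        """Subtract by adding the ten's complement."""
        self.add(-number)

    def raw(self):
        """The ten stored digits read as an unsigned integer."""
        return sum(d.position * 10 ** i for i, d in enumerate(self.decades))

    def value(self):
        """Signed reading of the stored digits (assumed convention, see above)."""
        raw = self.raw()
        return raw - self.MODULUS if raw >= self.MODULUS // 2 else raw


acc = Accumulator()
acc.add(7)
acc.subtract(12)
print(acc.raw(), acc.value())  # 9999999995 -5
```

In this model, subtracting 12 from 7 leaves the digits 9999999995, which is the ten's complement pattern for -5; adding 5 afterwards ripples a carry through every decade and returns all ten rings to zero.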

Cpl. Irwin Goldstein (foreground) sets the switches on one of ENIAC's function tables at the Moore School of Electrical Engineering. (U.S. Army photo) This photo has been artificially darkened, obscuring details such as the women who were present and the IBM equipment in use.[13]

The references by Rojas and Hashagen (or Wilkes)[14] give more details about the times for operations, which differ somewhat from those stated above. The basic machine cycle was 200 microseconds (20 cycles of the 100 kHz clock in the cycling unit), or 5,000 cycles per second for operations on the 10-digit numbers. In one of these cycles, ENIAC could write a number to a register, read a number from a register, or add/subtract two numbers. A multiplication of a 10-digit number by a d-digit number (for d up to 10) took d+4 cycles, so a 10- by 10-digit multiplication took 14 cycles, or 2800 microseconds, a rate of 357 per second. If one of the numbers had fewer than 10 digits, the operation was faster. Division and square roots took 13(d+1) cycles, where d is the number of digits in the result (quotient or square root). So a division or square root took up to 143 cycles, or 28,600 microseconds, a rate of 35 per second. (Wilkes 1956:20[14] states that a division with a 10 digit quotient required 6 milliseconds.) If the result had fewer than ten digits, it was obtained faster.
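The cycle counts quoted above translate directly into operation rates. The following Python sketch simply reproduces that arithmetic; the function names are hypothetical and the only inputs are the figures given in the text (a 200-microsecond cycle, d+4 cycles per multiplication, 13(d+1) cycles per division or square root).

```python
CYCLE_US = 200  # basic machine cycle: 20 pulses of the 100 kHz clock


def multiply_cycles(d):
    """Cycles to multiply a 10-digit number by a d-digit number (d up to 10)."""
    return d + 4


def divide_cycles(d):
    """Cycles for a division or square root with a d-digit result."""
    return 13 * (d + 1)


def operations_per_second(cycles):
    """Rate achieved when each operation takes `cycles` machine cycles."""
    return 1_000_000 / (cycles * CYCLE_US)


for name, cycles in [("add/subtract", 1),
                     ("10x10-digit multiply", multiply_cycles(10)),
                     ("10-digit divide", divide_cycles(10))]:
    print(f"{name}: {cycles} cycles, {cycles * CYCLE_US} us, "
          f"{operations_per_second(cycles):.0f} per second")
```

Running it gives 5,000 additions, 357 multiplications and 35 divisions per second, matching the rates stated in the paragraph.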


Reliability
ENIAC used common octal-base radio tubes of the day; the decimal accumulators were made of 6SN7 flip-flops, while 6L7s, 6SJ7s, 6SA7s and 6AC7s were used in logic functions. Numerous 6L6s and 6V6s served as line drivers to drive pulses through cables between rack assemblies. Several tubes burned out almost every day, leaving ENIAC nonfunctional about half the time. Special high-reliability tubes were not available until 1948. Most of these failures, however, occurred during the warm-up and cool-down periods, when the tube heaters and cathodes were under the most thermal stress. Engineers reduced ENIAC's tube failures to the more acceptable rate of one tube every two days. According to a 1989 interview with Eckert, "We had a tube fail about every two days and we could locate the problem within 15 minutes."[15] In 1954, the longest continuous period of operation without a failure was 116 hours, close to five days.
Detail of the back of a section of ENIAC, showing vacuum tubes

Programming

Although the Ballistic Research Laboratory was the sponsor of ENIAC, one year into this three-year project John von Neumann, a mathematician working on the hydrogen bomb at Los Alamos, became aware of this computer.[16] Los Alamos subsequently became so involved with ENIAC that the first test problem run consisted of computations for the hydrogen bomb, not artillery tables.[17] The input/output for this test was one million cards.[18]

ENIAC could be programmed to perform complex sequences of operations, which could include loops, branches, and subroutines. The task of taking a problem and mapping it onto the machine was complex, and usually took weeks. After the program was figured out on paper, the process of getting the program "into" ENIAC by manipulating its switches and cables took additional days. This was followed by a period of verification and debugging, aided by the ability to "single step" the machine.

In 1997, the six women who did most of the programming of ENIAC were inducted into the Women in Technology International Hall of Fame.[19][20] As they were called by each other in 1946, they were Kay McNulty, Betty Jennings, Betty Snyder, Marlyn Wescoff, Fran Bilas and Ruth Lichterman.[21][22] Jennifer S. Light's essay, "When Computers Were Women", documents and describes the role of the women of ENIAC and outlines the historical omission or downplaying of women's roles in computer science history.[23] The role of the ENIAC programmers was also treated in a 2010 documentary film by LeAnn Erickson.[24]

ENIAC was a one-of-a-kind design and was never repeated. The freeze on design in 1943 meant that the computer design would lack some innovations that soon became well-developed, notably the ability to store a program. Eckert and Mauchly started work on a new design, to be later called the EDVAC, which would be both simpler and more powerful. In particular, in 1944 Eckert wrote his description of a memory unit (the mercury delay line) which would hold both the data and the program. John von Neumann, who was consulting for the Moore School on the EDVAC, sat in on the Moore School meetings at which the stored program concept was elaborated. Von Neumann wrote up an incomplete set of notes (First Draft of a Report on the EDVAC) which were intended to be used as an internal memorandum describing, elaborating, and couching in formal logical language the ideas developed in the meetings. ENIAC administrator and security officer Herman Goldstine distributed copies of this First Draft to a number of government and educational institutions, spurring widespread interest in the construction of a new generation of electronic computing machines, including EDSAC at Cambridge, England and SEAC at the U.S. Bureau of Standards.
A number of improvements were also made to ENIAC after 1948, including a primitive read-only stored programming mechanism[25] using the Function Tables as program ROM, an idea included in the ENIAC patent and proposed independently by Dr. Richard Clippinger of the BRL. Clippinger consulted with von Neumann on what instruction set to implement. Clippinger had thought of a 3-address architecture while von Neumann proposed a 1-address architecture because it was simpler to implement. Three digits of one accumulator (6) were used as the program counter, another accumulator (15) was used as the main accumulator, a third accumulator (8) was used as the address pointer for reading data from the function tables, and most of the other accumulators (1-5, 7, 9-14, 17-19) were used for data memory.

The programming of the stored program for ENIAC was done by Betty Jennings, Clippinger and Adele Goldstine. It was first demonstrated as a stored-program computer on September 16, 1948, running a program by Adele Goldstine for John von Neumann. This modification reduced the speed of ENIAC by a factor of six and eliminated the ability of parallel computation, but as it also reduced the reprogramming time to hours instead of days, it was considered well worth the loss of performance. Analysis had also shown that, because of the gap between the electronic speed of computation and the electromechanical speed of input/output, almost any real-world problem was completely I/O bound, even without making use of the original machine's parallelism. Most computations would still be I/O bound, even after the speed reduction imposed by this modification.

Early in 1952, a high-speed shifter was added, which improved the speed for shifting by a factor of five. In July 1953, a 100-word expansion core memory was added to the system, using binary coded decimal, excess-3 number representation. To support this expansion memory, ENIAC was equipped with a new Function Table selector, a memory address selector and pulse-shaping circuits, and three new orders were added to the programming mechanism.
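The excess-3 code mentioned above represents each decimal digit d by the four-bit binary value of d + 3. The Python sketch below illustrates only the code itself; how ENIAC's core memory arranged digits and signs within a word is not described here, and the function names are hypothetical.

```python
def to_excess3(digits):
    """Encode a string of decimal digits as a list of 4-bit excess-3 groups."""
    return [format(int(ch) + 3, "04b") for ch in digits]


def from_excess3(groups):
    """Decode a list of 4-bit excess-3 groups back to a decimal digit string."""
    return "".join(str(int(bits, 2) - 3) for bits in groups)


def invert_bits(groups):
    """Flip every bit in every group."""
    return ["".join("1" if b == "0" else "0" for b in bits) for bits in groups]


encoded = to_excess3("409")
print(encoded)                             # ['0111', '0011', '1100']
print(from_excess3(encoded))               # 409
print(from_excess3(invert_bits(encoded)))  # 590, the nine's complement of 409
```

The last line shows the property that makes excess-3 attractive for decimal arithmetic: inverting the bits of a digit yields its nine's complement, so complementing a number needs no table lookup.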


Comparison with other early computers


Mechanical and electrical computing machines have been around since the 19th century, but the 1930s and 1940s are considered the beginning of the modern computer era. The Bell Telephone Labs Complex Number Calculator - a relay based computer developed by George R. Stibitz in 1939-40 at Bell's New York City laboratory and demonstrated remotely from Hanover, NH at the 1940 Mathematics Conference at Dartmouth College. [26] The German Z3 (shown working in May 1941) was designed by Konrad Zuse. It was the first general-purpose digital computer, but it was electromechanical, rather than electronic, as it used relays for all functions. It computed logically using binary arithmetic. It was programmable by punched tape, but lacked the conditional branch. While not designed for Turing-completeness, it accidentally was, as it was found out in 1998 (but to exploit this Turing-completeness, complex, clever hacks were necessary).[27][28] It was destroyed in a bombing raid on Berlin in December 1943. The American AtanasoffBerry Computer (ABC) (shown working in summer 1941) was the first electronic computing device. It implemented binary computation with vacuum tubes but was not general purpose, being limited to solving systems of linear equations. It also did not exploit electronic computing speeds, being limited by a rotating capacitor drum memory and an input-output system that was intended to write intermediate results to paper cards. It was manually controlled and was not programmable. The ten British Colossus computers (used for cryptanalysis starting in 1943) were designed by Tommy Flowers. The Colossus computers were digital, electronic, and were programmed by plugboard and switches, but they were dedicated to code breaking and not general purpose.[29] Howard Aiken's 1944 Harvard Mark I was programmed by punched tape and used relays. It performed general arithmetic functions, but lacked any branching. 
ENIAC was, like the Z3 and Mark I, able to run an arbitrary sequence of mathematical operations, but did not read them from a tape. Like the Colossus, it was programmed by plugboard and switches. ENIAC combined full, Turing-complete programmability with electronic speed. The ABC, ENIAC and Colossus all used thermionic valves (vacuum tubes). ENIAC's registers performed decimal arithmetic, rather than binary arithmetic like the Z3 or the Atanasoff–Berry Computer. Until 1948, ENIAC required rewiring to reprogram, like the Colossus. The idea of the stored-program computer with combined memory for program and data was conceived during the development of ENIAC, but it was not initially implemented in ENIAC because World War II priorities required the machine to be completed quickly, and because ENIAC's 20 storage locations would have been too small to hold both data and programs.

Public knowledge
The Z3 and Colossus were developed independently of each other and of the ABC and ENIAC during World War II. The Z3 was destroyed by Allied bombing of Berlin in 1943. The Colossus machines were part of the UK's war effort. Their existence only became generally known in the 1970s, though knowledge of their capabilities remained among their UK staff and invited Americans. All but two of the machines, which remained in use at GCHQ until the 1960s, were destroyed in 1945. The ABC was dismantled by Iowa State University after John Atanasoff was called to Washington, D.C. to do physics research for the U.S. Navy. ENIAC, by contrast, was put through its paces for the press in 1946, "and captured the world's imagination".[30] Older histories of computing may therefore not be comprehensive in their coverage and analysis of this period.


Patent
For a variety of reasons (including Mauchly's June 1941 examination of the Atanasoff–Berry Computer, prototyped in 1939 by John Atanasoff and Clifford Berry), US patent 3,120,606 for ENIAC, granted in 1964, was voided by the 1973 decision of the landmark federal court case Honeywell v. Sperry Rand, putting the invention of the electronic digital computer in the public domain and providing legal recognition to Atanasoff as the inventor of the first electronic digital computer.

Parts on display
The School of Engineering and Applied Science at the University of Pennsylvania has four of the original forty panels and one of the three function tables of ENIAC. The Smithsonian has five panels in the National Museum of American History in Washington D.C. The Science Museum in London has a receiver unit on display. The Computer History Museum in Mountain View, California has three panels and a function table on display (on loan from the Smithsonian). The University of Michigan in Ann Arbor has four panels, salvaged by Arthur Burks. The U.S. Army Ordnance Museum at Aberdeen Proving Ground, Maryland, where ENIAC was used, has one of the function tables. There are also seven panels and detailed history and explanation of ENIAC functions using text, graphics, photographs and interactive touch screen on display at the Perot Group in Plano, Texas.

Four ENIAC panels and one of its three function tables, on display at the School of Engineering and Applied Science at the University of Pennsylvania

In 1995, a very small silicon chip measuring 7.44 mm by 5.29 mm was built with the same functionality as ENIAC. Although this 20 MHz chip was many times faster than ENIAC, it was still many times slower than the microprocessors of the late 1990s.[31][32] The US Military Academy at West Point, NY has one of the data entry terminals from ENIAC.

Notes
[1] Goldstine, Herman H. (1972). The Computer: from Pascal to von Neumann. Princeton, New Jersey: Princeton University Press. ISBN 0-691-02367-0.
[2] "The ENIAC Story" (http://ftp.arl.mil/~mike/comphist/eniac-story.html). Ftp.arl.mil. Retrieved 2008-09-22.
[3] Shurkin, Joel, Engines of the Mind: The Evolution of the Computer from Mainframes to Microprocessors, 1996, ISBN 0-393-31471-5.
[4] ENIAC's first use was in calculations for the hydrogen bomb. Moye, William T (January 1996). "ENIAC: The Army-Sponsored Revolution" (http://ftp.arl.mil/~mike/comphist/96summary/index.html). US Army Research Laboratory. Retrieved 2009-07-09.
[5] Goldstine, Herman H. p. 214.
[6] Kennedy, Jr., T. R. (February 15, 1946). "Electronic Computer Flashes Answers" (http://www.fi.edu/learn/case-files/eckertmauchly/design.html). New York Times. Retrieved 2011-01-31.
[7] Honeywell, Inc. v. Sperry Rand Corp., 180 U.S.P.Q. (BNA) 673 (http://www.ushistory.org/more/eniac/public.htm), p. 20, finding 1.1.3 (U.S. District Court for the District of Minnesota, Fourth Division 1973) (The ENIAC machine which embodied 'the invention' claimed by the ENIAC patent was in public use and non-experimental use for the following purposes, and at times prior to the critical date: ... Formal dedication use February 15, 1946 ...).
[8] Wilkes, M. V. (1956). Automatic Digital Computers. New York: John Wiley & Sons. 305 pages. QA76.W5 1956.
[9] "Milestones:Electronic Numerical Integrator and Computer, 1946" (http://www.ieeeghn.org/wiki/index.php/Milestones:Electronic_Numerical_Integrator_and_Computer,_1946). IEEE Global History Network. IEEE. Retrieved 3 August 2011.
[10] http://encyclopedia2.thefreedictionary.com/ENIAC
[11] Weik, Martin H. (December 1955). "Ballistic Research Laboratories Report 971 A Survey of Domestic Electronic Digital Computing Systems page 41" (http://ed-thelen.org/comp-hist/BRL-e-h.html). US Department of Commerce. Retrieved 2009-04-16.
[12] Farrington, Gregory. ENIAC: Birth of the Information Age (http://books.google.com/books?id=-TKv7UHgoTQC&pg=PA74&dq=ENIAC&hl=en&sa=X&ei=tUTqTuDgJeSZiQKZy4GwBA&ved=0CDgQ6AEwAA#v=onepage&q=ENIAC&f=false). Popular Science. Retrieved 15 December 2011.
[13] The original photo can be seen in the article: Rose, Allen (April 1946). "Lightning Strikes Mathematics" (http://books.google.com/books?id=niEDAAAAMBAJ&pg=PA83&dq=eniac+intitle:popular+intitle:science&hl=en&sa=X&ei=MnWLT_OVEuKciALYz5HzCw&ved=0CDsQ6AEwAA#v=onepage&q=eniac intitle:popular intitle:science&f=false). Popular Science: 83-86. Retrieved 15 April 2012.
[14] Wilkes.
[15] Alexander Randall 5th (14 February 2006). "A lost interview with ENIAC co-inventor J. Presper Eckert" (http://www.computerworld.com/printthis/2006/0,4814,108568,00.html). Computer World. Retrieved 2011-04-25.
[16] Goldstine, Herman. p. 182.
[17] Goldstine, Herman. p. 214.
[18] Goldstine, Herman. p. 226.
[19] "WITI Hall of Fame" (http://www.witi.com/center/witimuseum/halloffame/1997/eniac.php). Witi.com. Retrieved 2010-01-27.
[20] "Wired: Women Proto-Programmers Get Their Just Reward" (http://www.wired.com/culture/lifestyle/news/1997/05/3711). 1997-05-08.
[21] "ENIAC Programmers Project" (http://eniacprogrammers.org/). Eniacprogrammers.org. Retrieved 2010-01-27.
[22] "ABC News: First Computer Programmers Inspire Documentary" (http://abcnews.go.com/Technology/story?id=3951187&page=1).
[23] Light, Jennifer S. "When Computers Were Women." Technology and Culture 40.3 (1999) 455-483.
[24] Gumbrecht, Jamie (February 2011). "Rediscovering WWII's female 'computers'" (http://edition.cnn.com/2011/TECH/innovation/02/08/women.rosies.math/#). CNN. Retrieved 2011-02-15.
[25] "A Logical Coding System Applied to the ENIAC" (http://ftp.arl.mil/~mike/comphist/48eniac-coding/). Ftp.arl.mil. 1948-09-29. Retrieved 2010-01-27.
[26] http://history-computer.com/ModernComputer/Relays/Stibitz.html Relay computers of George Stibitz, retrieved 2012 Dec 19.
[27] Rojas, R. (1998). "How to make Zuse's Z3 a universal computer". IEEE Annals of the History of Computing 20 (3): 51–54. doi:10.1109/85.707574.
[28] Rojas, Raúl. "How to Make Zuse's Z3 a Universal Computer" (http://www.zib.de/zuse/Inhalt/Kommentare/Html/0684/universal2.html).
[29] B. Jack Copeland (editor), Colossus: The Secrets of Bletchley Park's Codebreaking Computers, 2006, Oxford University Press, ISBN 0-19-284055-X.
[30] Kleiman, Kathryn A. (1997). "WITI Hall of Fame: The ENIAC Programmers" (http://www.witi.com/center/witimuseum/halloffame/1997/eniac.php). Retrieved 2007-06-12.
[31] Jan Van Der Spiegel (1996-03). "ENIAC-on-a-Chip" (http://www.upenn.edu/computing/printout/archive/v12/4/chip.html). PENNPRINTOUT. Retrieved 2009-09-04.
[32] Jan Van Der Spiegel (1995-05-09). "ENIAC-on-a-Chip" (http://www.seas.upenn.edu/~jan/eniacproj.html). University of Pennsylvania. Retrieved 2009-09-04.





Prisoner's dilemma
The prisoner's dilemma is a canonical example of a game analyzed in game theory that shows why two individuals might not cooperate, even if it appears that it is in their best interests to do so. It was originally framed by Merrill Flood and Melvin Dresher working at RAND in 1950. Albert W. Tucker formalized the game with prison sentence rewards and gave it the name "prisoner's dilemma" (Poundstone, 1992), presenting it as follows:
Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of speaking to or exchanging messages with the other. The police admit they don't have enough evidence to convict the pair on the principal charge. They plan to sentence both to a year in prison on a lesser charge. Simultaneously, the police offer each prisoner a Faustian bargain. If he testifies against his partner, he will go free while the partner will get three years in prison on the main charge. Oh, yes, there is a catch ... If both prisoners testify against each other, both will be sentenced to two years in jail.
In this classic version of the game, cooperation is dominated by betrayal: if the other prisoner stays silent, betraying gives a better outcome (no sentence instead of one year), and if the other prisoner betrays, betraying still gives a better outcome (two years instead of three). Because betrayal always rewards more than cooperation, all purely rational self-interested prisoners would betray the other, and so the only possible outcome for two purely rational prisoners is for both to betray. The interesting part of this result is that pursuing individual reward logically leads both prisoners to betray, even though each would get a better reward if both cooperated.
In reality, humans display a systematic bias towards cooperative behavior in this and similar games, much more so than predicted by a theory based only on rational self-interested action.[1][2][3][4][5]
There is also an extended "iterative" version of the game, where the classic game is played over and over, and consequently both prisoners continuously have an opportunity to penalize the other for previous decisions. If the number of times the game will be played is known, the finite aspect of the game means that (by backward induction) the two prisoners will betray each other repeatedly. In a game of infinite or unknown length there is no fixed optimum strategy, and Prisoner's Dilemma tournaments have been held in which competing algorithms are tested against one another.

In casual usage, the label "prisoner's dilemma" may be applied to situations not strictly matching the formal criteria of the classic or iterative games: for instance, those in which two entities could gain important benefits from cooperating or suffer from the failure to do so, but find it merely difficult or expensive, not necessarily impossible, to coordinate their activities to achieve cooperation.


Strategy for the classic prisoners' dilemma


The normal game is shown below:

                                       Prisoner B stays silent (cooperates)          Prisoner B betrays (defects)
Prisoner A stays silent (cooperates)   Each serves 1 year                            Prisoner A: 3 years; Prisoner B: goes free
Prisoner A betrays (defects)           Prisoner A: goes free; Prisoner B: 3 years    Each serves 2 years

Here, regardless of what the other decides, each prisoner gets a higher pay-off by betraying the other. For example, Prisoner A can reason from the payoffs above that, whatever Prisoner B chooses, Prisoner A is better off 'ratting him out' (defecting) than staying silent (cooperating). As a result, Prisoner A should logically betray Prisoner B. The game is symmetric, so Prisoner B should act the same way. Since both rationally decide to defect, each receives a lower reward than if both were to stay quiet. Traditional game theory thus predicts an outcome in which both players are worse off than if each had chosen to lessen the sentence of his accomplice at the cost of spending more time in jail himself.
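The dominance argument can be checked mechanically. The following is a minimal Python sketch, not part of the original formulation: it encodes the sentences from the table above (in years, so lower is better) and searches for each prisoner's best reply. The names YEARS, best_reply_for_a, best_reply_for_b and nash_equilibria are illustrative assumptions.

```python
# Sentences in years, indexed by (A's move, B's move); lower is better.
# The numbers come from the table above; all names are illustrative.
SILENT, BETRAY = "silent", "betray"
MOVES = (SILENT, BETRAY)

YEARS = {
    (SILENT, SILENT): (1, 1),
    (SILENT, BETRAY): (3, 0),
    (BETRAY, SILENT): (0, 3),
    (BETRAY, BETRAY): (2, 2),
}


def best_reply_for_a(b_move):
    """Move that minimises Prisoner A's sentence when B plays b_move."""
    return min(MOVES, key=lambda a_move: YEARS[(a_move, b_move)][0])


def best_reply_for_b(a_move):
    """Move that minimises Prisoner B's sentence when A plays a_move."""
    return min(MOVES, key=lambda b_move: YEARS[(a_move, b_move)][1])


def nash_equilibria():
    """Move pairs in which each move is a best reply to the other."""
    return [
        (a_move, b_move)
        for a_move in MOVES
        for b_move in MOVES
        if best_reply_for_a(b_move) == a_move
        and best_reply_for_b(a_move) == b_move
    ]


if __name__ == "__main__":
    for move in MOVES:
        print("B plays", move, "-> A's best reply:", best_reply_for_a(move))
    print("Equilibria:", nash_equilibria())
```

Under these payoffs the best reply is to betray whichever move the other prisoner makes, so mutual betrayal is the only pair of mutually best replies, even though mutual silence would give each prisoner a shorter sentence.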

Generalized form
The structure of the traditional prisoner's dilemma can be analyzed by removing its original prisoner setting. Suppose that the two players are represented by colors, red and blue, and that each player chooses to either "Cooperate" or "Defect". If both players play "Cooperate" they both get the payoff A. If Blue plays "Defect" while Red plays "Cooperate" then Blue gets B while Red gets C. Symmetrically, if Blue plays "Cooperate" while Red plays "Defect" then Blue gets payoff C while Red gets payoff B. If both players play "Defect" they both get the payoff D. In terms of general point values:

Canonical PD payoff matrix (each cell lists the row player's payoff first, then the column player's)

             Cooperate    Defect
Cooperate    A, A         C, B
Defect       B, C         D, D

To be a prisoner's dilemma, the following must be true:
B > A > D > C
The fact that A > D implies that the "Both Cooperate" outcome is better than the "Both Defect" outcome, while B > A and D > C imply that "Defect" is the dominant strategy for both agents. It is not necessary for a prisoner's dilemma to be strictly symmetric as in the above example, merely that the choices which are individually optimal result in an equilibrium which is socially inferior.
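As a small illustration, the condition B > A > D > C can be split into the two parts just described and tested on concrete numbers. This Python sketch is only an assumption-laden example: the function name check_prisoners_dilemma is hypothetical, the first payoff set is the classic sentences written as negative years, and the second set is invented purely as a counterexample.

```python
def check_prisoners_dilemma(a, b, c, d):
    """Report which parts of the condition B > A > D > C hold.

    a: payoff to each player when both cooperate
    b: payoff to a defector whose opponent cooperates
    c: payoff to a cooperator whose opponent defects
    d: payoff to each player when both defect
    """
    return {
        # B > A and D > C: defecting pays more whatever the opponent does.
        "defect_is_dominant": b > a and d > c,
        # A > D: mutual cooperation beats mutual defection.
        "cooperation_beats_defection": a > d,
        # Both parts together are the full chain B > A > D > C.
        "is_prisoners_dilemma": b > a > d > c,
    }


# Classic sentences expressed as payoffs (negative years in prison).
classic = check_prisoners_dilemma(a=-1, b=0, c=-3, d=-2)

# Hypothetical payoffs where cooperating against a cooperator pays best.
not_a_dilemma = check_prisoners_dilemma(a=3, b=2, c=0, d=1)

if __name__ == "__main__":
    print("classic:", classic)
    print("counterexample:", not_a_dilemma)
```

In the counterexample mutual cooperation still beats mutual defection, but defection is no longer dominant, so the game is not a prisoner's dilemma.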


The iterated prisoners' dilemma


If two players play the prisoners' dilemma more than once in succession, remember the previous actions of their opponent, and change their strategy accordingly, the game is called the iterated prisoners' dilemma. In addition to the general form above, the iterative version also requires that 2A > B + C, to prevent alternating cooperation and defection giving a greater reward than mutual cooperation.
The iterated prisoners' dilemma game is fundamental to certain theories of human cooperation and trust. On the assumption that the game can model transactions between two people requiring trust, cooperative behaviour in populations may be modeled by a multi-player, iterated version of the game. It has, consequently, fascinated many scholars over the years. In 1975, Grofman and Pool estimated the count of scholarly articles devoted to it at over 2,000. The iterated prisoners' dilemma has also been referred to as the "Peace-War game".[6]
If the game is played exactly N times and both players know this, then it is always game-theoretically optimal to defect in all rounds. The only possible Nash equilibrium is to always defect. The proof is inductive: one might as well defect on the last turn, since the opponent will not have a chance to punish the player. Therefore, both will defect on the last turn. Thus, the player might as well defect on the second-to-last turn, since the opponent will defect on the last no matter what is done, and so on. The same applies if the game length is unknown but has a known upper limit.
Unlike the standard prisoners' dilemma, in the iterated prisoners' dilemma the defection strategy is counter-intuitive and fails badly to predict the behavior of human players. Within standard economic theory, though, this is the only correct answer.
For cooperation to emerge between game-theoretic rational players, the total number of rounds N must be random, or at least unknown to the players. In this case, always defect may no longer be a strictly dominant strategy, only a Nash equilibrium. Among the results shown by Robert Aumann in a 1959 paper is that rational players repeatedly interacting for indefinitely long games can sustain the cooperative outcome.
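The role of the extra condition 2A > B + C can be seen with a little arithmetic: two players who take turns exploiting each other each earn B + C over every two rounds, whereas two cooperators each earn 2A. The Python sketch below compares the two patterns. The payoff values are illustrative assumptions only (A = 3, B = 5, C = 0, D = 1 for the first case, and B raised to 8 for the second), and the function names are hypothetical.

```python
def payoff_table(a, b, c, d):
    """Map (red move, blue move) to (red payoff, blue payoff)."""
    return {
        ("C", "C"): (a, a),
        ("C", "D"): (c, b),  # Red cooperates, Blue defects
        ("D", "C"): (b, c),  # Red defects, Blue cooperates
        ("D", "D"): (d, d),
    }


def total_payoffs(red_moves, blue_moves, table):
    """Sum each player's payoff over a sequence of rounds."""
    red_total = blue_total = 0
    for red_move, blue_move in zip(red_moves, blue_moves):
        red_gain, blue_gain = table[(red_move, blue_move)]
        red_total += red_gain
        blue_total += blue_gain
    return red_total, blue_total


def compare_patterns(a, b, c, d, rounds=4):
    """Return (alternating totals, mutual-cooperation totals).

    `rounds` is assumed to be even so both players take equal turns.
    """
    table = payoff_table(a, b, c, d)
    half = rounds // 2
    alternating = total_payoffs(["C", "D"] * half, ["D", "C"] * half, table)
    cooperating = total_payoffs(["C"] * rounds, ["C"] * rounds, table)
    return alternating, cooperating


if __name__ == "__main__":
    # 2A = 6 > B + C = 5: mutual cooperation pays more than taking turns.
    print(compare_patterns(a=3, b=5, c=0, d=1))
    # 2A = 6 < B + C = 8: taking turns would pay more than cooperating.
    print(compare_patterns(a=3, b=8, c=0, d=1))
```

With the second set of values the one-shot condition B > A > D > C still holds, yet alternation outperforms steady cooperation, which is the situation the additional requirement rules out.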

Strategy for the iterated prisoners' dilemma


Interest in the iterated prisoners' dilemma (IPD) was kindled by Robert Axelrod in his book The Evolution of Cooperation (1984). In it he reports on a tournament he organized of the N step prisoners' dilemma (with N fixed) in which participants have to choose their mutual strategy again and again, and have memory of their previous encounters. Axelrod invited academic colleagues all over the world to devise computer strategies to compete in an IPD tournament. The programs that were entered varied widely in algorithmic complexity, initial hostility, capacity for forgiveness, and so forth.
Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, greedy strategies tended to do very poorly in the long run while more altruistic strategies did better, as judged purely by self-interest. He used this to show a possible mechanism for the evolution of altruistic behaviour from mechanisms that are initially purely selfish, by natural selection.
The winning deterministic strategy was tit for tat, which Anatol Rapoport developed and entered into the tournament. It was the simplest of any program entered, containing only four lines of BASIC. The strategy is simply to cooperate on the first iteration of the game; after that, the player does what his or her opponent did on the previous move. Depending on the situation, a slightly better strategy can be "tit for tat with forgiveness": when the opponent defects, the player sometimes cooperates anyway on the next move, with a small probability (around 15%). This allows for occasional recovery from getting trapped in a cycle of defections. The exact probability depends on the line-up of opponents.
By analysing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to be successful.
Nice
The most important condition is that the strategy must be "nice", that is, it will not defect before its opponent does (this is sometimes referred to as an "optimistic" algorithm). Almost all of the top-scoring strategies were nice; so even a purely selfish strategy, acting only from self-interest, will not be the first to "cheat" on its opponent.
Retaliating
However, Axelrod contended, the successful strategy must not be a blind optimist. It must sometimes retaliate. An example of a non-retaliating strategy is Always Cooperate. This is a very bad choice, as "nasty" strategies will ruthlessly exploit such players.
Forgiving
Successful strategies must also be forgiving. Though players will retaliate, they will once again fall back to cooperating if the opponent does not continue to defect. This stops long runs of revenge and counter-revenge, maximizing points.
Non-envious
The last quality is being non-envious, that is, not striving to score more than the opponent (note that a "nice" strategy can never score more than the opponent).
The optimal (points-maximizing) strategy for the one-time PD game is simply defection; as explained above, this is true whatever the composition of opponents may be. However, in the iterated-PD game the optimal strategy depends upon the strategies of likely opponents, and how they will react to defections and cooperations. For example, consider a population where everyone defects every time, except for a single individual following the tit for tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest being tit for tat players, the optimal strategy for an individual depends on the percentage, and on the length of the game.
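The behaviour of tit for tat described above can be reproduced with a very small simulation. This is a Python sketch for illustration, not Rapoport's original BASIC program: the payoff values (3 for mutual cooperation, 5 and 0 for unilateral defection, 1 for mutual defection) are assumed for the example, and the function names are hypothetical.

```python
# Illustrative payoffs, indexed by (first player's move, second player's move).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(own_history, opponent_history):
    return "D"


def always_cooperate(own_history, opponent_history):
    return "C"


def play(first, second, rounds):
    """Play an iterated match and return both players' total scores."""
    first_history, second_history = [], []
    first_score = second_score = 0
    for _ in range(rounds):
        first_move = first(first_history, second_history)
        second_move = second(second_history, first_history)
        first_gain, second_gain = PAYOFF[(first_move, second_move)]
        first_score += first_gain
        second_score += second_gain
        first_history.append(first_move)
        second_history.append(second_move)
    return first_score, second_score


if __name__ == "__main__":
    print("TFT  vs AllD:", play(tit_for_tat, always_defect, 10))
    print("TFT  vs TFT :", play(tit_for_tat, tit_for_tat, 10))
    print("AllD vs AllD:", play(always_defect, always_defect, 10))
    print("AllC vs AllD:", play(always_cooperate, always_defect, 10))
```

With these assumed payoffs, tit for tat falls behind an always-defector only by the first-round loss, two tit for tat players sustain mutual cooperation throughout, and Always Cooperate is exploited in every round. In none of these matches does tit for tat outscore its opponent, in line with the remark about "nice" strategies.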
A strategy called Pavlov cooperates at the first iteration and whenever the player and co-player did the same thing at the previous iteration; Pavlov defects when the player and co-player did different things at the previous iteration. For a certain range of parameters, Pavlov beats all other strategies by giving preferential treatment to co-players which resemble Pavlov.
Deriving the optimal strategy is generally done in two ways:
1. Bayesian Nash Equilibrium: If the statistical distribution of opposing strategies can be determined (e.g. 50% tit for tat, 50% always cooperate) an optimal counter-strategy can be derived analytically.[7]
2. Monte Carlo simulations of populations have been made, where individuals with low scores die off, and those with high scores reproduce (a genetic algorithm for finding an optimal strategy). The mix of algorithms in the final population generally depends on the mix in the initial population. The introduction of mutation (random variation during reproduction) lessens the dependency on the initial population; empirical experiments with such systems tend to produce tit for tat players (see for instance Chess 1988), but there is no analytic proof that this will always occur.
Although tit for tat is considered to be the most robust basic strategy, a team from Southampton University in England (led by Professor Nicholas Jennings [8] and consisting of Rajdeep Dash, Sarvapali Ramchurn, Alex Rogers, Perukrishnen Vytelingum) introduced a new strategy at the 20th-anniversary iterated prisoners' dilemma competition, which proved to be more successful than tit for tat. This strategy relied on cooperation between programs to achieve the highest number of points for a single program. The University submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start.[9] Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the score of the competing program. As a result,[10] this strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.
This strategy takes advantage of the fact that multiple entries were allowed in this particular competition and that the performance of a team was measured by that of the highest-scoring player (meaning that the use of self-sacrificing players was a form of minmaxing). In a competition where one has control of only a single player, tit for tat is certainly a better strategy. Because of this new rule, this competition also has little theoretical significance when analysing single-agent strategies as compared to Axelrod's seminal tournament. However, it provided the framework for analysing how to achieve cooperative strategies in multi-agent frameworks, especially in the presence of noise. In fact, long before this new-rules tournament was played, Richard Dawkins in his book The Selfish Gene pointed out the possibility of such strategies winning if multiple entries were allowed, but he remarked that most probably Axelrod would not have allowed them if they had been submitted. The Southampton strategy also relies on circumventing the rule of the prisoners' dilemma that no communication is allowed between the two players. When the Southampton programs engage in an opening "ten move dance" to recognize one another, this only reinforces just how valuable communication can be in shifting the balance of the game.
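Among the strategies discussed in this section, the Pavlov rule is simple enough to sketch directly from its description: cooperate on the first move and whenever both players made the same move in the previous round, otherwise defect. The Python sketch below is an illustration under assumed payoffs (3, 5, 0 and 1, chosen only for the example); the function names are hypothetical, and tit for tat is included purely for comparison.

```python
# Illustrative payoffs, indexed by (first player's move, second player's move).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def pavlov(own_history, opponent_history):
    """Cooperate first, and whenever both players matched last round."""
    if not own_history:
        return "C"
    return "C" if own_history[-1] == opponent_history[-1] else "D"


def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(own_history, opponent_history):
    return "D"


def play(first, second, rounds):
    """Return (first's moves, second's moves, first's score, second's score)."""
    first_history, second_history = [], []
    first_score = second_score = 0
    for _ in range(rounds):
        first_move = first(first_history, second_history)
        second_move = second(second_history, first_history)
        first_gain, second_gain = PAYOFF[(first_move, second_move)]
        first_score += first_gain
        second_score += second_gain
        first_history.append(first_move)
        second_history.append(second_move)
    return "".join(first_history), "".join(second_history), first_score, second_score


if __name__ == "__main__":
    print("Pavlov vs Pavlov:", play(pavlov, pavlov, 6))
    print("Pavlov vs AllD  :", play(pavlov, always_defect, 6))
    print("TFT    vs AllD  :", play(tit_for_tat, always_defect, 6))
```

Under this reading of the rule, two Pavlov players cooperate throughout, while against an always-defector Pavlov alternates between cooperating and defecting and so is exploited in every other round, whereas tit for tat is exploited only once.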


Continuous iterated prisoners' dilemma


Most work on the iterated prisoners' dilemma has focused on the discrete case, in which players either cooperate or defect, because this model is relatively simple to analyze. However, some researchers have looked at models of the continuous iterated prisoners' dilemma, in which players are able to make a variable contribution to the other player. Le and Boyd[11] found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoners' dilemma. The basic intuition for this result is straightforward: in a continuous prisoners' dilemma, if a population starts off in a non-cooperative equilibrium, players who are only marginally more cooperative than non-cooperators get little benefit from assorting with one another. By contrast, in a discrete prisoners' dilemma, tit for tat cooperators get a big payoff boost from assorting with one another in a non-cooperative equilibrium, relative to non-cooperators. Since nature arguably offers more opportunities for variable cooperation than for a strict dichotomy of cooperation or defection, the continuous prisoners' dilemma may help explain why real-life examples of tit-for-tat-like cooperation are extremely rare in nature (e.g. Hammerstein[12]) even though tit for tat seems robust in theoretical models.

Real-life examples
These particular examples, involving prisoners and bag switching and so forth, may seem contrived, but there are in fact many examples in human interaction as well as interactions in nature that have the same payoff matrix. The prisoner's dilemma is therefore of interest to the social sciences such as economics, politics and sociology, as well as to the biological sciences such as ethology and evolutionary biology. Many natural processes have been abstracted into models in which living beings are engaged in endless games of prisoner's dilemma. This wide applicability of the PD gives the game its substantial importance.

In environmental studies
In environmental studies, the PD is evident in crises such as global climate change. It is argued that all countries will benefit from a stable climate, but any single country is often hesitant to curb CO2 emissions.[13] An important difference between climate change politics and the prisoner's dilemma is uncertainty. The pace at which pollution will change climate is not known precisely. The dilemma faced by governments is therefore different from the prisoner's dilemma in that the payoffs of cooperation are largely unknown. This difference suggests states will cooperate much less than in a real iterated prisoner's dilemma, so that the probability of avoiding a climate catastrophe is much smaller than that suggested by a game-theoretical analysis of the situation using a real iterated prisoner's dilemma.[14]

In psychology
In addiction research and behavioral economics, George Ainslie points out[15] that addiction can be cast as an intertemporal PD problem between the present and future selves of the addict. In this case, defecting means relapsing. Not defecting both today and in the future is by far the best outcome, and defecting both today and in the future is the worst outcome. The case where one abstains today but relapses in the future is clearly a bad outcome: in some sense the discipline and self-sacrifice involved in abstaining today have been "wasted", because the future relapse means that the addict is right back where he started and will have to start over (which is quite demoralizing, and makes starting over more difficult). The final case, where one engages in the addictive behavior today while abstaining "tomorrow", will be familiar to anyone who has struggled with an addiction. The problem here is that (as in other PDs) there is an obvious benefit to defecting "today", but tomorrow one will face the same PD, and the same obvious benefit will be present then, ultimately leading to an endless string of defections.
John Gottman, in his research described in The Science of Trust, defines good relationships as those where partners know not to enter the (D,D) cell, or at least not to get dynamically stuck there in a loop.

In economics
Advertising is sometimes cited as a real-life example of the prisoner's dilemma. When cigarette advertising was legal in the United States, competing cigarette manufacturers had to decide how much money to spend on advertising. The effectiveness of Firm A's advertising was partially determined by the advertising conducted by Firm B. Likewise, the profit derived from advertising for Firm B was affected by the advertising conducted by Firm A. If both Firm A and Firm B chose to advertise during a given period, the advertising canceled out, receipts remained constant, and expenses increased due to the cost of advertising. Both firms would have benefited from a reduction in advertising. However, should Firm B have chosen not to advertise, Firm A could have benefited greatly by advertising. Nevertheless, the optimal amount of advertising by one firm depends on how much advertising the other undertakes. As the best strategy depends on what the other firm chooses, there is no dominant strategy, which makes the situation slightly different from a prisoner's dilemma. The outcome is similar, though, in that both firms would be better off were they to advertise less than in the equilibrium. Sometimes cooperative behaviors do emerge in business situations. For instance, cigarette manufacturers endorsed the creation of laws banning cigarette advertising, understanding that this would reduce costs and increase profits across the industry.[16] This analysis is likely to be pertinent in many other business situations involving advertising.
Another example of the prisoner's dilemma in economics is competition-oriented objectives.[17] When firms are aware of the activities of their competitors, they tend to pursue policies that are designed to oust their competitors as opposed to maximizing the performance of the firm. This approach prevents the firm from functioning at its maximum capacity because it limits the scope of the strategies the firm employs.
Without enforceable agreements, members of a cartel are also involved in a (multi-player) prisoners' dilemma.[18] 'Cooperating' typically means keeping prices at a pre-agreed minimum level. 'Defecting' means selling under this minimum level, instantly taking business (and profits) from other cartel members. Anti-trust authorities want potential cartel members to mutually defect, ensuring the lowest possible prices for consumers.
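The cartel reasoning can be illustrated with a toy model. The sketch below is not taken from the cited source: the function name `cartel_profit`, the number of firms, the prices, the unit cost and the rule that all customers buy from the cheapest sellers are assumptions chosen only so that the payoffs have a prisoner's dilemma structure.

```python
def cartel_profit(my_choice, others_defecting, n_firms=4, demand=100.0,
                  cartel_price=10.0, undercut_price=8.0, unit_cost=4.0):
    """Profit of one firm in a hypothetical cartel (all numbers are assumptions).

    my_choice        -- "cooperate" (hold the agreed price) or "defect" (undercut)
    others_defecting -- how many of the other n_firms - 1 members undercut
    """
    defectors = others_defecting + (1 if my_choice == "defect" else 0)
    if defectors == 0:
        # Everyone holds the agreed price and the market is split evenly.
        return demand / n_firms * (cartel_price - unit_cost)
    if my_choice == "cooperate":
        # Assumed rule: customers buy only from the cheapest sellers.
        return 0.0
    # Undercutting firms share the whole market at the lower price.
    return demand / defectors * (undercut_price - unit_cost)


if __name__ == "__main__":
    for k in range(4):
        print(k, cartel_profit("cooperate", k), cartel_profit("defect", k))
```

Under these assumed numbers, undercutting pays more than holding the price whatever the other members do, yet every firm earns less when all undercut than when all hold the agreed price.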

In sport
Doping in sport has been cited as an example of a prisoner's dilemma.[19] If two competing athletes have the option to use an illegal and dangerous drug to boost their performance, then they must also consider the likely behaviour of their competitor. If neither athlete takes the drug, then neither gains an advantage. If only one does, then that athlete gains a significant advantage over their competitor (reduced only by the legal or medical dangers of having taken the drug). If both athletes take the drug, however, the benefits cancel out and only the drawbacks remain, putting them both in a worse position than if neither had used doping.[20]
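The doping argument can be checked mechanically on a small payoff table. The following is a minimal sketch under assumed numbers: the values of `advantage` and `drawback`, and all function names, are hypothetical and are not drawn from the cited sources. A lone doper gains `advantage`, the clean rival loses the same amount, and anyone who dopes pays `drawback`.

```python
from itertools import product

CLEAN, DOPE = "clean", "dope"
ACTIONS = (CLEAN, DOPE)


def payoff(mine, theirs, advantage=5.0, drawback=2.0):
    """Payoff to the athlete playing `mine` against `theirs` (assumed numbers)."""
    score = 0.0
    if mine == DOPE and theirs == CLEAN:
        score += advantage   # sole doper gains the edge
    if mine == CLEAN and theirs == DOPE:
        score -= advantage   # clean athlete loses to a doped rival
    if mine == DOPE:
        score -= drawback    # legal and medical risk of doping
    return score


def dominant_action():
    """Return the action that is strictly best against every rival action, if any."""
    for a in ACTIONS:
        if all(payoff(a, t) > payoff(b, t)
               for t in ACTIONS for b in ACTIONS if b != a):
            return a
    return None


def nash_equilibria():
    """Action pairs in which neither athlete gains by deviating alone."""
    return [(a, b) for a, b in product(ACTIONS, repeat=2)
            if payoff(a, b) == max(payoff(x, b) for x in ACTIONS)
            and payoff(b, a) == max(payoff(y, a) for y in ACTIONS)]


if __name__ == "__main__":
    print(dominant_action(), nash_equilibria())
```

With these assumed values, doping is the dominant action and mutual doping is the only equilibrium, even though both athletes score lower there than if neither had doped.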

Multiplayer dilemmas
Many real-life dilemmas involve multiple players. Although metaphorical, Hardin's tragedy of the commons may be viewed as an example of a multi-player generalization of the PD: each villager makes a choice for personal gain or restraint. The collective result of unanimous (or even frequent) defection is very low payoffs (representing the destruction of the "commons"). The commons are not always exploited: William Poundstone, in a book about the prisoner's dilemma (see References below), describes a situation in New Zealand where newspaper boxes are left unlocked. It is possible for people to take a paper without paying (defecting), but very few do, feeling that if they do not pay then neither will others, destroying the system. Subsequent research by Elinor Ostrom, winner of the 2009 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, hypothesized that the tragedy of the commons is oversimplified, with the negative outcome shaped by outside influences. Without complicating pressures, groups communicate and manage the commons among themselves for their mutual benefit, enforcing social norms to preserve the resource and achieve the maximum good for the group, an example of effecting the best-case outcome for the PD.[21]
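The multi-player structure can be made concrete with a simple numerical sketch. It is an illustration under assumed numbers, not Hardin's or Ostrom's model: the function name `villager_payoff`, the parameters `private_gain` and `damage_per_defector`, and the village size of ten are all hypothetical. Each villager who exploits the commons earns a private gain, and every act of exploitation costs each villager a fixed amount of damage.

```python
def villager_payoff(defects, total_defectors,
                    private_gain=3.0, damage_per_defector=1.0):
    """Payoff to one villager in an assumed N-player commons game.

    defects         -- True if this villager exploits the commons
    total_defectors -- number of exploiting villagers, including this one
    """
    gain = private_gain if defects else 0.0
    # Damage to the commons is borne by everyone, defector or not.
    return gain - damage_per_defector * total_defectors


if __name__ == "__main__":
    n_villagers = 10
    for others in range(n_villagers):
        restrain = villager_payoff(False, others)
        exploit = villager_payoff(True, others + 1)
        print(others, restrain, exploit)
```

Because the assumed private gain exceeds the damage a villager inflicts on himself, exploiting always pays more individually; with ten villagers, however, universal exploitation leaves everyone worse off than universal restraint.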

The Cold War


The Cold War and similar arms races can be modelled as a prisoner's dilemma situation.[22] During the Cold War the opposing alliances of NATO and the Warsaw Pact both had the choice to arm or disarm. From each side's point of view: disarming whilst the opponent continued to arm would have led to military inferiority and possible annihilation. If both sides chose to arm, neither could afford to attack the other, but at the high cost of maintaining and developing a nuclear arsenal. If both sides chose to disarm, war would be avoided and there would be no costs. If the opponent disarmed while one's own side continued to arm, one's own side would achieve superiority. Although the 'best' overall outcome is for both sides to disarm, the rational course for both sides is to arm. This is in