# Effective Complexity

Santa Fe Institute, June 26, 2003. From *Nonextensive Entropy: Interdisciplinary Applications*, edited by Murray Gell-Mann and Constantino Tsallis, Oxford University Press.

Murray Gell-Mann and Seth Lloyd

It would take a great many different concepts, or quantities, to capture all of our notions of what is meant by complexity or its opposite, simplicity. However, the notion that corresponds most closely to what we mean by complexity in ordinary conversation and in most scientific discourse is “effective complexity.” In nontechnical language, we can define the effective complexity (EC) of an entity as the length of a highly compressed description of its regularities [6, 7, 8]. For a more technical definition, we need a formal approach both to the notion of minimum description length and to the distinction between regularities and those features that are treated as random or incidental.

We can illustrate with a number of examples how EC corresponds to our intuitive notion of complexity. We may call a novel complex if it has a great many different characters, scenes, subplots, and so on, so that the regularities of the novel require a long description. The United States tax code is complex, since it is very long and each rule in it is a regularity. Neckties may be simple, like those with regimental stripes, or complex, like some of those designed by Jerry Garcia.

From time to time, an author presents a supposedly new measure of complexity, such as the “self-dissimilarity” of Wolpert and Macready [17], without recognizing that, when carefully defined, it is just a special case of effective complexity. Like some other concepts sometimes identified with complexity, the EC of an entity is context-dependent, even subjective to a considerable extent.
It depends on the coarse graining (level of detail) at which the entity is described, the language used to describe it, the previous knowledge and understanding that are assumed, and, of course, the nature of the distinction made between regularity and randomness. Like other proposed “measures of complexity,” EC is most useful when comparing two entities, at least one of which has a large value of the quantity in question.

Now, how do we distinguish regular features of an entity from ones treated as random or incidental? There is, as we shall see, a way to make a nearly absolute distinction between the two kinds of features, but that approach is of limited usefulness because it always assigns very low values of EC, attributing almost all information content to the random category rather than the regular one. In most practical cases, the distinction between regularity and randomness (or between regular and random information content) depends on some judgment of what is important and what is unimportant, even though the judge need not be human or even alive.

Take the case of neckties, as discussed above. We tacitly assumed that effective complexity would refer to the pattern of the tie, while wine stains, coffee stains, and so on, would be relegated to the domain of the random or incidental. But suppose we are dry cleaners. Then the characteristics of the stains might be the relevant regularities, while the pattern is treated as incidental. Often, regularity and randomness are envisaged as corresponding to signal and noise, respectively, for example in the case of music and static on the radio. But, as is well known, an investigation of sources of radio static by Karl Jansky et al. at Bell Telephone Laboratories in the 1930s revealed that one of those sources lies in the direction of the center of our galaxy, thus preparing the way for radio astronomy. Part of what had been treated as random turned into a very important set of regularities.
It is useful to encode the description of the entity into a bit string, even though the choice of coding scheme introduces another element of context dependence. For such strings we can make use of the well-known concept of algorithmic information content (AIC), which is a kind of minimum description length. The AIC of a bit string (and, hence, of the entity it describes) is the length of the shortest program that will cause a given universal computer U to print out the string and then halt [3, 4, 11]. Of course, the choice of U introduces yet another form of context dependence.

For strings of a particular length, the ones with the highest AIC are those with the fewest regularities. Ideally they have no regularities at all except the length. Such strings are sometimes called “random” strings, although the terminology does not agree precisely with the usual meaning of random (stochastic, especially with equal probabilities for all alternatives). Some authors call AIC “algorithmic complexity,” but it is not properly a measure of complexity, since randomness is not what we usually mean when we speak of complexity. Another name for AIC, “algorithmic randomness,” is somewhat more apt.

Now we can begin to construct a technical definition of effective complexity, using AIC (or something very like it) as a minimum description length. We split the AIC of the string representing the entity into two terms, one for regularities and the other for features treated as random or incidental. The first term is then the effective complexity, the minimum description length of the regularities of the entity [8]. It is not enough to define EC as the AIC of the regularities of an entity. We must still examine how the regularities are described and distinguished from features treated as random, using the judgment of what is important.
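As a crude numerical illustration of this point, one can upper-bound minimum description length with an off-the-shelf compressor. This is only a sketch: true AIC is uncomputable, and compressed length merely bounds it from above. Still, a “random” string visibly resists compression, while a maximally regular string of the same length has a tiny description.

```python
import random
import zlib

def aic_proxy(data: bytes) -> int:
    """Compressed length in bytes: a computable upper-bound proxy for AIC
    (the true shortest-program length is uncomputable)."""
    return len(zlib.compress(data, 9))

random.seed(42)
# A "random" string: essentially no regularities except its length.
incompressible = bytes(random.getrandbits(8) for _ in range(4096))
# A maximally regular string of the same length.
regular = bytes(4096)

# Strings with the fewest regularities have the highest AIC proxy.
print(aic_proxy(incompressible), aic_proxy(regular))
```

The proxy also inherits the context dependence noted above: a different compressor plays the role of a different universal computer U.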
One of the best ways to exhibit regularities is the formalism used in statistical mechanics, say, for a classical sample of a pure gas. The detailed description of the positions and momenta of all the molecules is obviously too much information to gather, store, retrieve, or interpret. Instead, certain regularities are picked out. The entity considered (the real sample of gas) is embedded conceptually in a set of comparable samples, where the others are all imagined rather than real. The members of the set are assigned probabilities, so that we have an ensemble. The entity itself must be a typical member of the ensemble, in other words, not one with abnormally low probability. The set and its probability distribution will then reflect the regularities.

For extensive systems, the statistical-mechanical formulations of Boltzmann and Gibbs, when described in modern language, amount to using the principle of maximum ignorance, as emphasized by Jaynes [9]. The ignorance measure, or Shannon information I, is introduced. (With a multiplicative constant, I is the entropy.) Then the probabilities in the ensemble are varied and I is maximized subject to keeping fixed certain average quantities over the ensemble. For example, if the average energy is kept fixed (and nothing else), the Maxwell-Boltzmann distribution of probabilities results. We have, of course,

$$I = -\sum_r P_r \log P_r \,, \tag{1}$$

where log means logarithm to the base 2 and the $P_r$ are the coarse-grained probabilities for the individual members r of the ensemble. The multiplicative constant that yields entropy is k ln 2, where k is Boltzmann’s constant.

In this situation, with one real member of the ensemble and the rest imagined, the fine-grained probabilities are all zero for the members of the ensemble other than e, the entity under consideration (or the bit string describing it). Of course, the fine-grained probability of e is unity.
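The maximum-ignorance recipe can be made concrete for a toy discrete system (the four energy levels and the target average below are made-up inputs, purely for illustration): maximizing the I of eq. (1) with the average energy held fixed yields probabilities of the Boltzmann form P_r ∝ 2^(−βE_r), with β fixed by the constraint.

```python
import math

def shannon_information(probs):
    """Eq. (1): I = -sum_r P_r log2 P_r, the ignorance measure in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def max_ignorance_distribution(energies, mean_energy):
    """Maximize I subject to a fixed ensemble-average energy.
    The maximizer has the Boltzmann form P_r ∝ 2**(-beta * E_r);
    beta is found by bisection on the (decreasing) average energy."""
    def boltzmann(beta):
        weights = [2.0 ** (-beta * e) for e in energies]
        z = sum(weights)
        return [w / z for w in weights]

    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        avg = sum(e * p for e, p in zip(energies, boltzmann(mid)))
        if avg > mean_energy:
            lo = mid  # average still too high: raise beta
        else:
            hi = mid
    return boltzmann((lo + hi) / 2)

# Toy example: four levels, average energy pinned at 1.0.
energies = [0.0, 1.0, 2.0, 3.0]
probs = max_ignorance_distribution(energies, mean_energy=1.0)
```

Dropping the constraint entirely (equivalently, pinning the average at the unweighted mean of the energies) would give the uniform distribution, the global maximum of I.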
The typicality condition previously mentioned is just

$$-\log P_e \sim I \,. \tag{2}$$

Here the symbol “∼” means “less than or equal” to within a few bits.

We can regard the quantities kept fixed while I is maximized as the things judged to be important. In most problems of statistical mechanics, these are, of course, the averages of familiar extensive quantities such as the energy. The choice of quantities controls the regularities expressed by the probability distribution. In some problems, the quantities being averaged have to do with membership in a set. For example, in Gibbs’s microcanonical ensemble, we deal with the set of states having energies in a narrow interval. In such a case, we would make use of the membership function, which is one for members of the set and zero otherwise. When the average of that function over all the members of the ensemble is one, every member with nonzero probability is in the set.

In discussing an ensemble E of bit strings used to represent the regularities of an entity, we shall apply a formalism that incorporates the maximizing of ignorance subject to constraints. We introduce the AIC of the ensemble and call it Y. We then have our technical definition of effective complexity: it is the value of Y for the ensemble that is finally employed. In general, then, Y is a kind of candidate for the role of effective complexity.

Besides Y ≡ K(E), the AIC of the ensemble E for a given universal computer U, we can also consider K(r|E), the contingent AIC of each member r given the ensemble. The weighted average, with probabilities P_r, of this contingent AIC can be related to I in the following way. We note that Rüdiger Schack [15] has discussed converting any universal computer U into a corresponding U′ that incorporates an efficient recoding scheme (Shannon-Fano coding).
Such a scheme associates longer bit strings with less probable members of the ensemble and shorter ones with more probable members. Schack has then shown that if K is defined using U′, then the average contingent AIC of the members lies between I and I + 1. We shall adopt his procedure and thus have

$$\sum_r P_r K(r|E) \approx I \,, \tag{3}$$

where ≈ means equal to within a few bits (here actually one bit).

Let us define the total information Σ as the sum of Y and I. The first term is, of course, the AIC of the ensemble, and we have seen that the second is, to within a bit, the average contingent AIC of the members given the ensemble.

To throw some light on the role of the total information, consider the situation of a theoretical scientist trying to construct a theory to account for a large body of data. Suppose the theory can be represented as a probability distribution over a set of bodies of data, one of which consists of the real data and the rest of which are imagined. Then Y corresponds to the complexity of the theory and I measures the extent to which the predictions of the theory are distributed widely over different possible bodies of data. Ideally, the theorist would like both quantities to be small, the first so as to make the theory simple and the second so as to make it focus narrowly on the real data. However, there may be trade-offs. By adding bells and whistles to the theory, along with a number of arbitrary parameters, one may be able to focus on the real data, but at the expense of complicating the theory. Similarly, by allowing appreciable probabilities for very many possible bodies of data, one may be able to get away with a simple theory. Occasionally, of course, a theorist is fortunate enough to be able to make both Y and I small, as James Clerk Maxwell did in the case of the equations for electromagnetism.
In any case, the first desideratum is to minimize the sum of the two terms, the total information Σ. Then one can deal with the possible trade-offs. We shall show that to within a few bits the smallest possible value of Σ is K ≡ K(e), the AIC of the string representing the entity itself. Here we make use of the typicality condition (2), that minus the log of the coarse-grained probability for the entity is less than or equal to I to within a few bits. We also make use of certain abstract properties of the AIC:

$$K(A) \sim K(A,B) \tag{4}$$

and

$$K(A,B) \sim K(B) + K(A|B) \,, \tag{5}$$

where again the symbol ∼ means “less than or equal to” up to a few bits. A true information measure would, of course, obey the first relation without the caveat “up to a few bits” and would obey the second relation as an equality. Because of efficient recoding, we have

$$K(e|E) \sim -\log P_e \,. \tag{6}$$

We can now prove that K = K(e) is an approximate lower bound for the total information Σ = K(E) + I:

$$K = K(e) \sim K(e,E) \,, \tag{7a}$$
$$K(e,E) \sim K(E) + K(e|E) \,, \tag{7b}$$
$$K(e|E) \sim -\log P_e \,, \tag{7c}$$
$$-\log P_e \sim I \,. \tag{7d}$$

We see, too, that when the approximate lower bound is achieved, all these approximate inequalities become approximate equalities:

$$K \approx K(e,E) \,, \tag{8a}$$
$$K(e,E) \approx Y + K(e|E) \,, \tag{8b}$$
$$K(e|E) \approx -\log P_e \,, \tag{8c}$$
$$-\log P_e \approx I \,. \tag{8d}$$

The treatment of this in Gell-Mann and Lloyd [8] is slightly flawed. The approximate inequality (7b), although given correctly, was accidentally replaced later on by an approximate equality, so that condition (8b) came out as a truism. Thus (8b) was omitted from the list of new conditions that hold when the total information achieves its approximate lower bound. As a result, we gave only three conditions of approximate equality instead of the four quoted here in (8a)–(8d). Also, in the discussion at the end of the paragraph preceding eq. (2) of Gell-Mann and Lloyd [8], we wrote $\log K_U(a)$ by mistake in place of $\log K_U(b) + 2\log\log K_U(b)$, but that does not affect any of our results.
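Chaining the approximate inequalities (7a)–(7d) term by term spells out the claimed bound:

$$K = K(e) \;\sim\; K(e,E) \;\sim\; K(E) + K(e|E) \;\sim\; K(E) - \log P_e \;\sim\; K(E) + I \;=\; Y + I \;=\; \Sigma \,,$$

so the total information can undercut the AIC of the entity itself by at most a few bits.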
Clearly the total information Σ achieves its approximate minimum value K for the singleton distribution, which assigns probability one to the bit string representing our entity and zero probabilities to all other strings. For that distribution, Y is about equal to K and the measure of ignorance I equals zero. There are many other distributions for which Σ ≈ K. If we plot Y against I, the line along which Y + I = K is a straight line with slope minus one, with the singleton at the top of the line.

We are imposing on the ensemble (the one that we actually use to define the effective complexity) the condition that the total information approximately achieve its minimum. In other words, we want to stay on the straight line, or within a few bits of it. All ensembles of which e is a typical member lie, to within a few bits, above and to the right of a boundary. That boundary coincides with our straight line all the way from the top down to a certain point, where we run out of ensembles that have Y + I ≈ K. Below that point the actual boundary for ensembles in the Y-I plane no longer follows the straight line but veers off to the right.

Now, as we discussed, we maximize the measure of ignorance I subject to staying on that straight line. If we do that and impose no other conditions, we end up at the point where the boundary in the I-Y plane departs from the straight line. As described in the paper of Gell-Mann and Lloyd, who are indebted to Charles H. Bennett for many useful discussions