# Fitness requirements for scientific theories containing recursive theoretical terms.

1 Introduction

2 Theoretical Terms

3 Recursive Terms in Mendelian Genetics

4 Eliminability and Falsifiability

5 Identifiability of Recursive Theoretical Terms

6 Conclusion

1 INTRODUCTION

The criterion of FITness was proposed a few years ago as an appropriate boundary between theories that are observationally testable and theories that are not (Simon and Groen [1973]). The basic (Popperian) idea is that a theory, if false, should be disconfirmable by a finite set of observations (i.e. should be finitely testable), and the theory, if disconfirmed by a set of observations, should be disconfirmed by any superset of those observations (i.e. should be irrevocably testable). Later, the definition of FITness was revised slightly to clarify a number of ambiguities and problems in the earlier formulation (Simon [1983]). Still more recently it has been shown (Simon [1985]) that theories with existentially quantified theoretical terms may be falsifiable according to the FITness criterion.

The papers reporting these results provided a number of examples of FIT theories that contain theoretical terms, but they did not explore the varieties of terms that might be so accommodated. In the present paper, we will be concerned with a class of theoretical terms, which we will call recursive, whose values are inferred from observations of the history of the system under observation. None of the examples previously published contains terms of this kind.

It is the purpose of the paper to analyse the nature of recursive theoretical terms, to determine whether they can be Ramsey-eliminable from theories, and whether a theory containing such terms can still be falsifiable. It will turn out that the answers to both questions are affirmative. We will first review examples of theories containing theoretical terms that are not recursive in the above sense, next introduce more formally the idea of recursivity of terms, and then show their implications for FITness.

2 THEORETICAL TERMS

We start by reviewing briefly several examples of theories containing theoretical terms that are already familiar (Simon [1985]).

First, consider the hydrostatic equilibrium of an incompressible liquid in a constant gravitational field in one (vertical) dimension. At each observation i, let x measure the distance downward from the surface of the liquid, and let p(x), the pressure in excess of atmospheric pressure, be the only other observable. Then we postulate:

$$\exists r\,\forall i:\quad p_i = r\,x_i \tag{1}$$

In this theory, r is not observable and is thus a theoretical term. In particular, r is a constant, for its value, the ratio of $p_i$ to $x_i$, does not depend on the observation i. As shown in Simon [1983], theoretical terms like r are eliminable: after we have made a single observation t, we can use the value $p_t/x_t$ to replace r. So the theory can be rewritten in terms of observables as follows:

$$\forall i:\quad p_i = (p_t/x_t)\,x_i \tag{2}$$

Moreover, as shown in Simon [1985], theories like (1) can be characterized as FIT; hence they are falsifiable by finite sets of observations.
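The elimination of r can be illustrated with a minimal sketch, assuming observations are recorded as (x, p) pairs and that a small numerical tolerance stands in for exact equality. The function name `disconfirms_hydrostatic` and the sample data are hypothetical and not part of the original formulation.

```python
from math import isclose

def disconfirms_hydrostatic(observations, rel_tol=1e-9):
    """Return True if the (x, p) observations contradict p_i = r * x_i.

    The theoretical term r is never supplied: it is replaced by the
    ratio p_t / x_t taken from the first observation, as in equation (2).
    """
    if not observations:
        return False  # nothing observed, so nothing can be contradicted
    x_t, p_t = observations[0]
    r = p_t / x_t  # assumes x_t != 0, i.e. a reading below the surface
    return any(not isclose(p, r * x, rel_tol=rel_tol) for x, p in observations)

# Hypothetical readings: depth x and excess pressure p.
consistent = [(1.0, 9.8), (2.0, 19.6), (3.5, 34.3)]
violating = consistent + [(4.0, 50.0)]
superset = violating + [(5.0, 49.0)]
```

Under these assumptions the finite set `violating` disconfirms the theory, and so does every superset of it, which is the irrevocability half of FITness.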

Now we complicate the example by replacing the liquid with a perfect gas, having a pressure at the upper boundary of p(0). The theory now becomes:

$$\forall i:\quad \left[\frac{dp_x}{dx}\right]_i = r_i \quad \text{(the derivative to be evaluated at } x = x_i\text{)} \tag{3}$$

$$\exists k\,\forall i:\quad r_i = k\,p_i \tag{4}$$

Assume that the only observables are x, the depth of the gas, and p(x), the pressure at x, so that both $r_i$ and k in theory (4) are theoretical terms and $p_i$ is the only observable appearing in the formula. Theory (4) is slightly more complicated than theory (1) because the theoretical terms r and k are mutually dependent. In particular, we have to calculate the value of k before we can calculate $r_i$, because the differential equation obtained by replacing $r_i$ in (3) by (4),

$$\frac{dp_x}{dx} = k\,p_x \tag{5}$$

gives the value $k = [\ln(p_x/p_0)]/x$, where $p_0 = p(0)$, and now the $r_i$ can be obtained from (4).
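The order of computation (first k, then each $r_i$) can be sketched as follows. This is only an illustration: the function names, the tolerance, and the sample values of $p_0$ and k are assumptions, and the solution $p_x = p_0 e^{kx}$ of equation (5) is used to test the remaining observations.

```python
from math import exp, isclose, log

def estimate_k(p0, x, p_x):
    """k = ln(p_x / p0) / x, from one observation with x != 0."""
    return log(p_x / p0) / x

def gas_theory_disconfirmed(p0, observations, rel_tol=1e-9):
    """True if some (x, p) pair contradicts p = p0 * exp(k * x)."""
    x_t, p_t = observations[0]
    k = estimate_k(p0, x_t, p_t)  # k must be computed first
    return any(not isclose(p, p0 * exp(k * x), rel_tol=rel_tol)
               for x, p in observations)

# Hypothetical data generated with p0 = 100 and k = 0.1.
p0 = 100.0
obs = [(x, p0 * exp(0.1 * x)) for x in (1.0, 2.0, 5.0)]
k = estimate_k(p0, *obs[0])
r_values = [k * p for _, p in obs]  # equation (4): r_i = k * p_i
```

Any observation could have served to compute k; the first one is used here only for convenience.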

From this example, we can see that a theory can contain, in addition to the observables, theoretical terms that depend on other theoretical terms. However, in the example the values of these latter terms are still independent of which observation is used to calculate k.

3 RECURSIVE TERMS IN MENDELIAN GENETICS

We are now ready to consider a theory that contains recursive theoretical terms, specifically, the theory of gene inheritance originated by Mendel [1865]. We will make use of a formulation of the Mendelian theory that was induced automatically from data by the LIVE computer program (Shen [1989]). The theoretical terms, the genes, in this theory are recursive because their values change from one generation to another as a function of the values in the previous generation. Because the terms are computed from the breeding of plants, their values depend on the measurement (breeding experiment) performed on the plant.

Let us describe some details of Mendel's experiments before we propose the theory. Mendel's experiments consisted of a series of breeding operations and observations on garden peas. The breeding operations are of two kinds: (1) inbreeding, in which both pistil and stamen (hence both genes) are on the same plant; and (2) crossbreeding, in which pollen from one plant is used to fertilize the ovum on another. We associate the flowers on each pea plant with the pea it grew from. The theory assumes that all flowers on a plant have the same genetic constitution. (In the case of peas, since the flowers are bisexual, inbreeding can be accomplished by fertilizing the ovum with pollen from the same flower.) The genetic constitution of the peas produced from a flower's pistils will depend upon the genetic constitutions both of the flower itself and of the flower that produced the pollen that fertilized it.

The peas are classified according to their observable features. For example, peas of one class may be green and have wrinkles, and of another class may be yellow and have long stems. The experiments start with a set of purebred peas that have been inbred for a number of generations until all of them continue to exhibit consistently the same observable features. Mendel hybridized these purebred peas by crossbreeding, and observed the outcomes for several generations. The experiments were very well controlled so that the characteristics of each seed pod were determined by the genes of its known parents, and not by any other factors such as weather, temperature, etc.

In formalizing Mendel's theory for the purposes of this paper, we assume that the color of the peas is the only observable. The color green will be represented as 1 and yellow as 0. We also assume that green color dominates yellow, that is, if a pea has both a green gene and a yellow gene, then its color will be green.

Further, in order to avoid irrelevant statistical and probabilistic issues in our formulation of the theory, we assume that a single breeding, whether inbreeding or crossbreeding, will produce exactly four offspring, one for each possible pairing of genes. By this assumption we replace probability distributions of different genotypes and phenotypes with exact numbers that correspond to the expected values. The specific questions we wish to examine can be answered in this deterministic model as definitively as in a more realistic probabilistic model, and far more simply.

Let $(P_x\ c\ m\ f)$ represent a pea plant, where $P_x$ is the pea's identifier, c is its color, and m and f are its genes that determine color. Each of the variables c, m and f can assume the value 1 (green) or 0 (yellow). Let $\mathrm{AF}[P_x, P_y]$ represent a single inbreeding or crossbreeding of peas in generation i to produce four peas in generation i + 1. Then Mendel's experiments can be summarized as in Table 1. For example, the green pea $(P_7\ 1\ 1\ 1)$ and yellow pea $(P_8\ 0\ 0\ 0)$ produce four offspring $P_{21}$, $P_{22}$, $P_{23}$, and $P_{24}$; they are all green in color (although their genes are different). For each generation, we have illustrated the various types of matings and offspring that are possible.

[TABULAR DATA OMITTED]

More formally, a theory about gene inheritance is postulated as follows. Let the expression (cab) represent a pea with color c, and genes a and b, and with its identifier unspecified. Let the expression (ab) represent a pea with genes a and b, and with its observable color unspecified. Then each breeding of two (identical or non-identical) peas produces four offspring, according to the rule:

$$\mathrm{AF}[(x\,y), (w\,z)] \Rightarrow [(x\,w), (x\,z), (y\,w), (y\,z)] \tag{6}$$

$$\mathrm{Color}(11) = \mathrm{Color}(10) = \mathrm{Color}(01) = 1 \tag{7}$$

$$\mathrm{Color}(00) = 0 \tag{8}$$

This set of equations says that any pair of peas, $P_s$ and $P_t$, possesses four genes x, y, w, z such that the genes of the children are drawn evenly from them, and the colors of the children are determined by the pairs of genes that they receive and by a function called Color.
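Rules (6) to (8) translate almost directly into code. The following is a minimal sketch, assuming a pea's genes are represented as a pair of 0/1 values; the names `color` and `breed` are illustrative choices rather than notation from the theory.

```python
def color(genes):
    """Equations (7) and (8): green (1) dominates yellow (0)."""
    a, b = genes
    return 1 if (a == 1 or b == 1) else 0

def breed(parent1, parent2):
    """Equation (6): AF[(x y), (w z)] yields (x w), (x z), (y w), (y z)."""
    x, y = parent1
    w, z = parent2
    return [(x, w), (x, z), (y, w), (y, z)]

# The cross described in the text: a purebred green pea with a yellow pea.
offspring = breed((1, 1), (0, 0))
offspring_colors = [color(g) for g in offspring]
```

Here all four offspring carry one green and one yellow gene and are therefore green, in agreement with the example of $P_7$ and $P_8$.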

Notice that in this theory the genes are not observable; they are theoretical terms. However, they differ from the theoretical terms r and k in the previous theories because their values depend on the values of their parents' terms and change from one generation to another. For example, a green pea in generation i may have both its genes green, but a green pea in the next generation i + 1 may have one gene green and the other yellow. Because the values of these terms depend on the values in earlier generations, we call them 'recursive theoretical terms'.

4 ELIMINABILITY AND FALSIFIABILITY

Theoretical terms like r in (1) and k in (4) can be eliminated by estimating their values, directly or indirectly, from observations. For theories that contain recursive theoretical terms, the situation may become more complicated. The values of the theoretical terms are generally different for different objects, and inferring these values may require noting the values of the observable terms for other, related, objects. By observing only that a pea is green, it cannot be determined whether it is a purebred or hybrid. And even after we see that breeding two particular green peas produces four green offspring, we cannot conclude that both parents are purebreds.

In spite of these complications, the Mendelian theory given by (6) is FIT. Moreover, the theoretical terms are eliminable, although it is not always possible to distinguish between a pea that is (110) and one that is (101) (i.e. the theoretical terms are not completely identifiable).

By definition (Simon and Groen [1973] and Simon [1983]), a scientific theory is FIT iff:

1. The theory, if false, could be disconfirmed by a finite set of observations (i.e. is finitely testable).

2. The theory, if disconfirmed by a set of observations, would be disconfirmed by any superset of those observations (i.e. is irrevocably testable).

In our Mendelian theory, the simplest possibilities for disconfirmation arise from observing the outcomes of inbreeding experiments. If the plant to be self-fertilized derives from a yellow pea, then the theory implies that all of its genes are yellow, thus all four offspring must be yellow; if not, the theory is disconfirmed. If the plant derives from a green pea, then either all four offspring must be green, or three must be green and one yellow; else the theory is disconfirmed. Further, in the latter case, in the next generation of inbreeding of the three green offspring, one must produce only green peas, the other two must produce three green peas and one yellow pea.

These conclusions, which can be derived from the formal theory, can be written as follows (we modify the notation to emphasize that only the color is observed, not the genes):

$$\mathrm{AF}[(y00), (y00)] \Rightarrow [(y00), (y00), (y00), (y00)]$$

$$\mathrm{AF}[(g11), (g11)] \Rightarrow [(g11), (g11), (g11), (g11)] \quad \text{or}$$

$$\mathrm{AF}[(g10), (g10)] \Rightarrow [(g11), (g10), (g01), (y00)]$$
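The inbreeding conditions above can be read as a test on observed colors alone. A minimal sketch, assuming colors are coded 1 (green) and 0 (yellow) as in the text and that each inbreeding yields exactly four offspring; the function name is hypothetical.

```python
def inbreeding_disconfirms(parent_color, offspring_colors):
    """True if one observed inbreeding contradicts the Mendelian theory."""
    if len(offspring_colors) != 4:
        return True  # the deterministic model requires four offspring
    greens = sum(offspring_colors)
    if parent_color == 0:
        return greens != 0       # a yellow plant must yield four yellow peas
    return greens not in (3, 4)  # a green plant: four green, or three and one

allowed = inbreeding_disconfirms(1, [1, 1, 0, 1])
forbidden = inbreeding_disconfirms(0, [0, 1, 0, 0])
```

A single forbidden outcome is a finite set of observations that disconfirms the theory, and adding further observations cannot undo it.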

Crossbreeding experiments also offer possibilities for disconfirmation. Crossbreeding two yellow peas must always yield yellow offspring; but crossbreeding two green peas or a yellow and a green can produce several possible results. For example, with two green peas, we can observe four green progeny or three green and one yellow. In the latter case, we can infer that both parents were hybrids. In the former case, one or both were purebred, but we must look at the ratio for the third generation to determine which is the case (all progeny purebred versus half purebred and half hybrid). Moreover, we cannot distinguish the hybrid from the purebred parent without information about their parents and siblings.

$$\mathrm{AF}[(y00), (y00)] \Rightarrow [(y00), (y00), (y00), (y00)]$$

$$\mathrm{AF}[(g11), (g11)] \Rightarrow [(g11), (g11), (g11), (g11)] \quad \text{or}$$

$$\mathrm{AF}[(g11), (g10)] \Rightarrow [(g11), (g10), (g11), (g10)] \quad \text{or}$$

$$\mathrm{AF}[(g10), (g10)] \Rightarrow [(g11), (g10), (g01), (y00)]$$

Without pursuing all of the cases exhaustively, we see that there are ample opportunities for disconfirming the Mendelian theory, if false, with a finite set of observations. Moreover, once disconfirmed, additional observations could not resuscitate it. Hence the theory is FIT.

The FITness of the theory implies that its theoretical terms are eliminable (Simon [1983]). How can this elimination be achieved? In the notation we have been using, (cxy) represents a pea that is of (observable) color c, and has genes (theoretical terms) x and y. But the observations from experiments that we have described above to identify the values of x and y can be used in reverse to state the theory without reference to the theoretical terms. Let us sketch out how this can be done.

Our goal is to construct a new theory that has the same predictive power as the theory stated above but uses no theoretical terms (genes). Moreover, since the previous theory is deterministic, the new theory's predictions must be precise. That is, the theory's predictions should not use any logical 'or'. For example, it is not enough for the new theory to state that 'two green peas will, on inbreeding, produce either four green peas or three green peas and one yellow pea'.

There are six possible different input/output cases of the theory (notice that the notation here shows the genes only and we are using symmetry to treat (10) and (01) as identical and indistinguishable):

1. (00)(00) → (00)(00)(00)(00)
2. (00)(10) → (01)(00)(01)(00)
3. (00)(11) → (01)(01)(01)(01)
4. (10)(10) → (11)(10)(01)(00)
5. (10)(11) → (11)(11)(01)(01)
6. (11)(11) → (11)(11)(11)(11)

When the color of the peas is the only observable, these six cases can be classified into the following five observably distinct classes, because cases 5 and 6 (combined in class e) are indistinguishable by colors alone. Notice that the notation now shows the color only.

a. (y)(y) → (y)(y)(y)(y)
b. (y)(g) → (g)(y)(g)(y)
c. (y)(g) → (g)(g)(g)(g)
d. (g)(g) → (g)(g)(g)(y)
e. (g)(g) → (g)(g)(g)(g)
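The collapse from six gene-level cases to five observable classes can be checked by enumeration. The sketch below assumes the breeding rule (6) and the Color function (7) and (8); the helper names and the choice to compare sorted color tuples are illustrative assumptions.

```python
from itertools import combinations_with_replacement

# (0, 1) is treated as identical to (1, 0), as in the text.
GENOTYPES = [(0, 0), (1, 0), (1, 1)]

def color(genes):
    return 1 if 1 in genes else 0

def breed(parent1, parent2):
    x, y = parent1
    w, z = parent2
    return [(x, w), (x, z), (y, w), (y, z)]

def observable_class(parent1, parent2):
    """What an observer sees: parent colors and offspring colors, unordered."""
    parents = tuple(sorted(color(p) for p in (parent1, parent2)))
    children = tuple(sorted(color(c) for c in breed(parent1, parent2)))
    return parents, children

cases = list(combinations_with_replacement(GENOTYPES, 2))
classes = {}
for p1, p2 in cases:
    classes.setdefault(observable_class(p1, p2), []).append((p1, p2))
```

In this enumeration the only observable class with more than one gene-level member is the one with two green parents and four green offspring, which corresponds to class e.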

From the given information we cannot always predict the output (the right-hand side) of each class from the input (the left-hand side) observation, for the inputs of classes b and c, and of classes d and e, appear the same. Notice that if genes were observable these classes would be distinguishable because the input (g) in class b is a hybrid (10), while the input (g) in class c is a purebred (11); the input peas (g)(g) in class d are both hybrid while the input peas in class e are not. Thus our first task is to define 'purebred' and 'hybrid' in terms of observables.

A pea is purebred (11) or (00), if its progeny, under inbreeding, are all a single color: green (11) if it is green, yellow (00) if it is yellow. A pea is hybrid (10), if its progeny, under inbreeding, are three green and one yellow. Now 'hybrid' and 'purebred' are theoretical terms, but 'parent' and 'offspring' are observable by keeping records of which peas were produced by which breedings. Hence, at the expense of some complexity and awkwardness, we may replace 'hybrid' by 'plant that on inbreeding produces three green and one yellow offspring', and 'purebred' by 'plant that on inbreeding produces offspring all of its own color'. These are observable properties, and we can call peas, x, that satisfy the former or the latter criterion h-test(x) and p-test(x), respectively.

However, we must be a little cautious here. Since the reason for having a theory is to predict the outcomes of inbreedings and crossbreedings, we cannot inbreed the input itself to find out whether it is purebred or hybrid. Instead we must derive such conclusions from the history of previous observations and the results of h-test and p-test applied to the siblings of the input.

If a pea is yellow, then it is a purebred. If a pea is green, then in order to be a purebred (11) it must be produced, according to the theory, by two green peas in case 4, 5 or 6. In case 4, the purebred's siblings must be one yellow and two hybrids. In case 5, two of its siblings must be hybrids and the other be purebred. In case 6, three of its siblings must be purebreds. Based on these cases, we can define the predicate 'purebred(x)' as follows:

$$
\begin{aligned}
\mathrm{purebred}(x) \iff{} & \mathrm{yellow}(x) \ \lor\ \{\, y, z \text{ are the parents of } x,\ \ u, v, w \text{ are the siblings of } x: \\
& \quad [\mathrm{green}(y) \land \mathrm{green}(z) \land \mathrm{yellow}(u) \land \text{h-test}(v) \land \text{h-test}(w)] \\
& \lor\ [\mathrm{green}(y) \land \mathrm{green}(z) \land \mathrm{green}(u) \land \text{p-test}(u) \land \text{h-test}(v) \land \text{h-test}(w)] \\
& \lor\ [\mathrm{green}(y) \land \mathrm{green}(z) \land \text{p-test}(u) \land \text{p-test}(v) \land \text{p-test}(w)] \,\}
\end{aligned}
$$

Once we have defined purebred, the predicate 'hybrid(x)' can be defined simply as follows:

$$\mathrm{hybrid}(x) \iff \lnot\,\mathrm{purebred}(x)$$
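The predicate can be exercised on simulated data. The sketch below is an assumption-laden illustration: genes are used only to generate the observations (colors and the results of inbreeding the siblings) and to check the verdict afterwards, while `purebred` itself receives nothing but observables. All function names are hypothetical, and the three disjuncts are expressed as counts over the three siblings.

```python
def color(genes):
    return 1 if 1 in genes else 0

def breed(parent1, parent2):
    x, y = parent1
    w, z = parent2
    return [(x, w), (x, z), (y, w), (y, z)]

def inbred_colors(genes):
    """Colors observed when a plant is self-fertilized."""
    return [color(c) for c in breed(genes, genes)]

def p_test(own_color, inbred):
    return all(c == own_color for c in inbred)

def h_test(own_color, inbred):
    return own_color == 1 and sorted(inbred) == [0, 1, 1, 1]

def purebred(x_color, parent_colors, siblings):
    """siblings: three (color, inbred_offspring_colors) records."""
    if x_color == 0:
        return True
    if tuple(parent_colors) != (1, 1):
        return False
    hybrids = sum(h_test(c, kids) for c, kids in siblings)
    yellows = sum(c == 0 for c, _ in siblings)
    pure_greens = sum(c == 1 and p_test(c, kids) for c, kids in siblings)
    pure_any = sum(p_test(c, kids) for c, kids in siblings)
    return ((yellows == 1 and hybrids == 2)         # first disjunct (case 4)
            or (pure_greens == 1 and hybrids == 2)  # second disjunct (case 5)
            or pure_any == 3)                       # third disjunct (case 6)

# Compare the observable verdict with the hidden genes for every mating.
GENOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]
mismatches = []
for p1 in GENOTYPES:
    for p2 in GENOTYPES:
        kids = breed(p1, p2)
        for i, x in enumerate(kids):
            sibs = [(color(s), inbred_colors(s))
                    for j, s in enumerate(kids) if j != i]
            verdict = purebred(color(x), (color(p1), color(p2)), sibs)
            if verdict != (x[0] == x[1]):
                mismatches.append((p1, p2, x))
```

Under the four-offspring assumption of the deterministic model, the observable verdict agrees with the genes for every offspring of every mating in this enumeration. Peas of the initial purebred stock, which have no recorded parents, are outside the scope of the sketch.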

With these predicates defined in terms of observables and relations among observables such as parent and sibling, plus the five classes that were derived from the theory, it is straightforward to construct a new theory that makes no reference to theoretical terms:

$$\mathrm{AF}[(y), (y)] \Rightarrow (y)(y)(y)(y) \tag{9}$$

$$\mathrm{AF}[(y), (g)] \land \mathrm{hybrid}(g) \Rightarrow (y)(y)(g)(g) \tag{10}$$

$$\mathrm{AF}[(y), (g)] \land \mathrm{purebred}(g) \Rightarrow (g)(g)(g)(g) \tag{11}$$

$$\mathrm{AF}[(g_1), (g_2)] \land \mathrm{hybrid}(g_1) \land \mathrm{hybrid}(g_2) \Rightarrow (g)(g)(g)(y) \tag{12}$$

$$\mathrm{AF}[(g_1), (g_2)] \land \mathrm{purebred}(g_1) \Rightarrow (g)(g)(g)(g) \tag{13}$$
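That rules (9) to (13) predict the same colors as the gene-level theory can be checked by brute force. In the sketch below, `predict_colors` is a hypothetical rendering of the five rules that takes only colors and purebred flags; for the comparison, and only there, the flags are read off the genes, whereas in the eliminated theory they would come from the observable predicate purebred(x).

```python
def color(genes):
    return 1 if 1 in genes else 0

def breed(parent1, parent2):
    x, y = parent1
    w, z = parent2
    return [(x, w), (x, z), (y, w), (y, z)]

def predict_colors(c1, pure1, c2, pure2):
    """Rules (9)-(13): return (greens, yellows) among the four offspring."""
    if c1 == 0 and c2 == 0:
        return (0, 4)                              # rule (9)
    if c1 != c2:
        green_is_pure = pure1 if c1 == 1 else pure2
        return (4, 0) if green_is_pure else (2, 2)  # rules (11) and (10)
    if pure1 or pure2:
        return (4, 0)                              # rule (13), either parent
    return (3, 1)                                  # rule (12)

GENOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]
disagreements = []
for g1 in GENOTYPES:
    for g2 in GENOTYPES:
        kids = [color(k) for k in breed(g1, g2)]
        actual = (kids.count(1), kids.count(0))
        # Flags taken from the genes only to validate the rules.
        predicted = predict_colors(color(g1), g1[0] == g1[1],
                                   color(g2), g2[0] == g2[1])
        if predicted != actual:
            disagreements.append((g1, g2, predicted, actual))
```

Rule (13) is applied symmetrically here, on the assumption that the labels $g_1$ and $g_2$ are interchangeable.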

Notice that in the process of elimination, we have used not only relation predicates such as 'parent' and 'sibling' to specify the relations between observables, but also the action (or measurement) 'inbreeding'. Nothing similar occurred in the previous examples of theories where theoretical terms were non-recursive. Here, if the action, inbreeding, were not allowed in the definition, we would be unable to define the predicates 'purebred' and 'hybrid' without using genes. For in order to know whether a pea is purebred one must know something about its siblings or its parents.

5 IDENTIFIABILITY OF RECURSIVE THEORETICAL TERMS

Recursive theoretical terms also cast some light on the identifiability of theoretical terms in general. By definition, theoretical terms are those terms that are not directly observable but only derivable from observables. For example, the voltage of a battery may be determined from measurements of the current and resistance, and the genes of peas may be determined from observations of the peas' color. However, whether it is possible to compute the values of such terms solely on the basis of given experiments and observables depends on the precise structure of the terms.

Lesniewski has proposed that definitions should satisfy two conditions, which P. Suppes ([1957], p. 153) (quoted in Simon [1970]) has described thus:

Two criteria which make more specific ... intuitive ideas about the character of definitions are that (i) a defined symbol should always be eliminable from any formula of the theory, and (ii) a new definition does not permit the proof of relationships among the old symbols which were previously unprovable; that is, it does not function as a creative axiom.

Following Suppes, we will refer to these two criteria as eliminability and noncreativity. Tarski ([1956], pp. 301--2) has proposed a concept of definition which implies that defined terms are always eliminable. The criteria of eliminability and noncreativity stem from the idea that definitions are mere notational abbreviations, allowing theory to be stated more compactly without changing its content.

Subsequently, Simon [1970] proposed a somewhat weaker condition called general definability. A term is generally definable by means of the other terms in a theory if, upon adding a sufficient number of observations to the axioms, the system including these new propositions defines this term in the sense of Tarski. It turns out that the axioms that define (in this generalized sense) the theoretical terms serve both as definitions and laws, that is to say, they allow the theoretical terms to be eliminated, but at the same time they are creative. We repeat Simon's axiomatization of Ohm's law to illustrate this point.

$\gamma$ is a system of Ohmic observations iff there exist D, r, c such that:

(1) $\gamma = \langle D, r, c \rangle$;

(2) D is a non-empty set;

(3) r and c are functions from D into the real numbers;

(4) for all $x \in D$, $r(x) > 0$ and $c(x) > 0$.

$\gamma'$ is an Ohmic circuit iff there exist D, r, c, b, and v such that:

(5) $\gamma' = \langle D, r, c, b, v \rangle$;

(6) $\gamma = \langle D, r, c \rangle$ is a system of Ohmic observations;

(7) v and b are real numbers;

(8) for all $x \in D$,

$$c(x) = \frac{v}{b + r(x)} \tag{$\alpha$}$$

In this system, r and c are the observables (they stand for the external resistance and the current respectively), and b and v are theoretical terms (they stand for the internal resistance and the voltage of the battery respectively). However, according to the method of Padoa (Tarski [1956], pp. 304--5), v and b are not, as they should be, definable in this system unless the following additional condition is added:

(9) D contains at least two members with distinct values of c and r.

This condition is required for definability, because only after two such observations $(c_1, r_1)$ and $(c_2, r_2)$ can b and v be uniquely defined, as follows:

$$b = \frac{c_2 r_2 - c_1 r_1}{c_1 - c_2} \tag{$\beta$}$$

$$v = \frac{c_2 c_1 (r_2 - r_1)}{c_1 - c_2} \tag{$\gamma$}$$

Furthermore, if we substitute these values for v and b in ($\alpha$), and substitute the third observation $(c_3, r_3)$ for c(x) and r(x) respectively, then we obtain a relation among three pairs of observations. Thus, the 'definition' (8) is now creative because this relation is required to hold among any three pairs of observations.
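The double role of (8), as definition and as law, can be made concrete with a short sketch. The function names, the tolerance, and the numerical readings are assumptions chosen for illustration (a battery with v = 12 and b = 1 in arbitrary units).

```python
from math import isclose

def define_b_v(obs1, obs2):
    """Formulas (beta) and (gamma): b and v from two (c, r) observations."""
    (c1, r1), (c2, r2) = obs1, obs2
    b = (c2 * r2 - c1 * r1) / (c1 - c2)   # requires c1 != c2
    v = c2 * c1 * (r2 - r1) / (c1 - c2)
    return b, v

def third_observation_holds(obs1, obs2, obs3, rel_tol=1e-9):
    """The creative content of (8): c3 must equal v / (b + r3)."""
    b, v = define_b_v(obs1, obs2)
    c3, r3 = obs3
    return isclose(c3, v / (b + r3), rel_tol=rel_tol)

# Hypothetical (current, resistance) readings from one circuit.
first, second, third = (6.0, 1.0), (4.0, 2.0), (2.0, 5.0)
b, v = define_b_v(first, second)
```

With these readings the first two observations fix b and v, and the third observation then either satisfies or violates a relation that mentions only observables.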

This treatment is suitable for environments in which the values of theoretical terms depend only on the values of observables regardless of when and in which order the measurements are made. In the Ohmic circuit, for example, the values of b and v depend only on the values of the c's and r's and do not change from one measurement to another, thus any two observations in the circuit could serve for the definitions of b and v. Examining equations ($\beta$) and ($\gamma$), we can see that no theoretical terms appear on the right-hand sides of the equations, and the equations do not have a sense of time.

In theories that contain recursive theoretical terms, the situation is more complex because the values of theoretical terms can be different for different individuals even though the values of observables for those individuals may appear the same. For example, in Mendel's experiments, the values of m and f are different for ($P_3$) and ($P_{17}$) even though both peas are green. Furthermore, the Mendelian theory contains not only observations (colors) but also observable measurements (inbreedings or crossbreedings).

In general, recursive theoretical terms are not determined just by the current values of observables but by the history of the observations and the actions performed in order to make those observations. For example, a pea is determined to be purebred by observations on its parents' and siblings' colors, and by actions (inbreeding) on its siblings and their results. Therefore, it is not clear whether in the general case there exist any particular sets of observations, as there do for Mendel's theory, that permit the theoretical terms to be defined uniquely.

Formally, in order both to define and use a set of theoretical terms T in a theory, the following three functions need to be specified:

E(): computing T from observables;

U(): using T to determine the values of observables;

I(): showing how T is inherited through actions.

The first function E() may depend on data for more than one generation for identification. In the environment of Mendel's experiments, the constraints on the breeding relations between parents and offspring provide a starting point for the recursive definitions. But there is no guarantee that such starting points will exist for all theories containing recursive theoretical terms. If no E() exists, then some of the theoretical terms may be unidentifiable, although still eliminable.

The second function U() tells us how the theoretical terms are used at a particular point in time. If U() exists, then the theoretical terms are meaningful because they determine, thus predict, the values of observables. For example, in Mendel's experiments, a pea's m and f determine its color.

Finally, the third function I() constitutes the core of the theoretical structure, for it tells us how the theoretical terms are propagated through measurements and actions. Once we know the function I() and the current values of theoretical terms T, then we can use T to predict the observables for all future points in time.
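For the Mendelian theory, U() and I() can be written down directly, and together they predict the observables for later generations. The sketch below is illustrative only: it assumes repeated inbreeding of every plant, and the helper `predict_inbred_colors` is a hypothetical name.

```python
from collections import Counter

def U(genes):
    """Use: the genes of a pea determine its color (1 green, 0 yellow)."""
    return 1 if 1 in genes else 0

def I(parent1, parent2):
    """Inheritance: how genes propagate through one breeding action."""
    x, y = parent1
    w, z = parent2
    return [(x, w), (x, z), (y, w), (y, z)]

def predict_inbred_colors(genes, generations):
    """Color counts per generation when every plant is inbred."""
    population = [genes]
    history = []
    for _ in range(generations):
        population = [child for plant in population
                      for child in I(plant, plant)]
        history.append(Counter(U(g) for g in population))
    return history

# Start from a single hybrid and look two generations ahead.
history = predict_inbred_colors((1, 0), 2)
```

Starting from a hybrid, the first generation is three green and one yellow; in the second, one green plant yields only green peas, two yield three green and one yellow, and the yellow plant yields only yellow, matching the inbreeding pattern described in Section 4.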

From this analysis, we can safely say that theoretical terms can be recursive, that is, defined by how they are inherited through actions and observations. Although the values of theoretical terms are derivable from the observables, within the structure of the theory they are propagated from themselves. The presence of recursive terms in some theories adds an interesting dimension to the topic of the eliminability and identifiability of theoretical terms.

6 CONCLUSION

In this paper, we have introduced recursive theoretical terms as a class of theoretical terms whose values must be inferred from a history of observations. Although such terms may add much complication, we have shown by the example of Mendelian genetics that theories containing them may still be FIT. That is, such theories may be falsifiable and their recursive theoretical terms may be eliminable, although the elimination process can be complex. Since recursive theoretical terms are defined in terms of themselves through measurements (or actions), it is not obvious what classes of systems containing them will be FIT. Finding the necessary and sufficient conditions for the eliminability and identifiability of recursive theoretical terms is an important topic for further study.

REFERENCES

MENDEL, G. [1865]: 'Experiments in Plant-hybridization', in J. A. Peters (ed.): Classic Papers in Genetics. Englewood Cliffs, NJ: Prentice-Hall [1973].

SHEN, W. [1989]: 'Learning from the Environment Based on Percepts and Actions', Ph.D. dissertation, Carnegie Mellon University.

SIMON, H. A. [1970]: 'The Axiomatization of Physical Theories', Philosophy of Science, 37, pp. 16--26.

SIMON, H. A. [1983]: 'Fitness Requirement for Scientific Theories', British Journal for the Philosophy of Science, 34, pp. 355--65.

SIMON, H. A. [1985]: 'Quantification of Theoretical Terms and the Falsifiability of Theories', British Journal for the Philosophy of Science, 36, pp. 291--8.

SIMON, H. A. and GROEN, G. J. [1973]: 'Ramsey Eliminability and the Testability of Scientific Theories', British Journal for the Philosophy of Science, 24, pp. 367--80.

SUPPES, P. [1957]: Introduction to Logic. Princeton, NJ: Van Nostrand.

TARSKI, A. [1956]: 'Some Methodological Investigations on the Definability of Concepts', Ch. 10, in Logic, Semantics, Metamathematics. Oxford: The Clarendon Press.


Authors: Shen, Wei-Min; Simon, Herbert A.

Publication: The British Journal for the Philosophy of Science

Date: Dec 1, 1993
