Death of Paradox: The Killer Logic Beneath the Standards of Proof
I contend that for understanding the standards of proof the modern versions of logic--in particular, fuzzy logic and belief functions--work better than classical logic and probability theory. This modern logic suggests that fact-finders first assess evidence of an imprecisely perceived and described reality to form a fuzzy degree of belief in a fact's existence, and they then apply the standard of proof by comparing their belief in a fact's existence to their belief in its negation.
This understanding nicely explains how the standard of proof actually works in the law world. While conforming more closely to what we know of people's cognition, the new understanding captures better how the law formulates and manipulates the standards and it also gives a superior mental image of the fact-finders' task. One virtue of this conceptualization is that it is not a radical reconception. Another virtue is that it nevertheless manages to resolve some stubborn problems of proof including the infamous conjunction paradox.
TABLE OF CONTENTS
INTRODUCTION
I. ASSESSING EVIDENCE
   A. Theories
      1. Psychology Theories
      2. Probability Theories
      3. Zadeh's Fuzzy Logic
   B. Legal Application: Gradated Likelihood
II. CONJOINING ASSESSMENTS
   A. Fuzzy Operators
      1. Maximum and Minimum
      2. Product Rule Contrasted
      3. Negation Operator
   B. Legal Application: Conjunction Paradox
III. ANALYZING BELIEFS
   A. Shafer's Belief Functions
      1. Basics of Theory
      2. Negation Operator
      3. Lack of Proof
   B. Legal Application: Burden of Production
IV. APPLYING STANDARDS
   A. Comparison of Beliefs
   B. Legal Application: Burden of Persuasion
      1. Traditional View
      2. Reformulated View
      3. Implications of Reformulation
CONCLUSION
The only true voyage, the only bath in the fountain of youth, would be not to travel toward new landscapes, but to have other eyes, to see the universe with the eyes of another, of a hundred others. **
We have made tremendous strides, albeit only recently, toward understanding the process of proof. The wonderful "new evidence" scholarship has made especial progress by shifting the focus of evidence scholarship from rules of admissibility to the nature of proof, while opening the door to interdisciplinary insights, including those from psychology. (1) Yet the new work has tended to remain either too wedded or overly hostile to subjective probabilities for evaluating evidence (2) and to Bayes' theorem for combining evidence, (3) and so has caused the debates to become "unproductive and sterile." (4) In any event, the debates have left unsolved some troubling problems and paradoxes in our law on proof.
The "New Logic"
One specific diagnosis of this shortcoming is that the new evidence scholarship tended to neglect the contemporaneous advances in logic. (5) The new, so-called nonclassical logic looks and sounds much like standard logic but refuses to accept some critical assumptions. (6) Most commonly, the assumption rejected is that every proposition must either be true or be false, an assumption called the principle of bivalence. But if propositions are not bivalent, so that P and not P can each be true to a degree, then one can show that sometimes P equals not P--which is a rather disquieting contradiction. (7) Fashioning the new logic thus faced some challenges in its development.
The first move in the new logic of special interest to lawyers relates to and builds on the branch of modern philosophy, beginning with Bertrand Russell's work, that struggled with the problem of vagueness. (8) Work on vagueness addresses matters such as the famed sorites paradox of ancient Greece ("sorites" comes from the Greek word for heap):
Premise 1: if you start with a billion grains of sand, you have a heap of sand.
Premise 2: if you remove a single grain, you still have a heap.
If you repeat the removal again and again until you have one grain of sand left, then you will by logic still have a heap. But there is no heap. Thus, heap equals nonheap. Two true premises yield an absurd conclusion (or--to picture the paradox in another common way--start with Tom Cruise's full head of hair, and begin plucking hairs, yet Tom will by logic never become bald).
At some point the heap undeniably became a nonheap. Was there a fixed boundary? No, this is not a way out--at least according to most philosophers. A different path taken in the attempt to avoid the paradox leads to the embrace of many-valued logic. (9) This form of logic boldly declines the simplification offered by two-valued, or bivalent, logic built on a foundation of true/false with an excluded middle. It instead recognizes partial truths. Both a statement and its opposite can be true to a degree. In other words, sometimes you have neither a heap nor a nonheap, but something that falls in between, with the statement "this is a heap" being both true and not true. (10)
The second interesting elaboration of the new logic involves developments in the field of imprecise probability. (11) This field of mathematics provides a useful extension of probability theory whenever information is conflicting or scarce. The approach can work with many-valued logic as well as with two-valued logic. The basic idea is to use interval specifications of probability, with a lower and an upper probability. Despite its name, imprecise probability is more complete and accurate than precise probability in the real world where probabilistic imprecision prevails. In fact, traditional bivalent probability (within which I include the doctrine of random chance as well as the much newer subjective probability) appears as a special case in this theory. The rules associated with traditional probability, except those based on assuming an excluded middle, carry over to imprecise probability.
All this logic may be new, but it has an extended history of forerunners. Threads of many-valued logic have troubled thinkers since before Aristotle embraced bivalence, (12) even if their thoughts found more receptive soil in the East than in the West. (13) Imprecise probability goes back to the nineteenth century. (14) Nevertheless, the new logic has enjoyed a recent flowering, inspired by the development of quantum mechanics and instructed by those just-described advances in philosophy and mathematics.
The particular bloom known as fuzzy logic finds its roots in the seminal 1965 article by Berkeley Professor Lotfi Zadeh. (15) His critical contribution was to use degrees of membership in a fuzzy set running from 1 to 0, in place of strict membership in a crisp set classified as yes/no or as either 1 or 0. Yet fuzzy logic is not at all a fuzzy idea. (16) It became a formal system of logic, one that is by now highly developed and hence rather complicated. (17)
I do not mean to suggest that fuzzy logic resolves all the philosophical problems of vagueness (or that it is especially popular with pure philosophers). I am suggesting that fuzzy logic is a very useful tool for some purposes. Of course, it has become so well-known and dominant because of its countless practical applications, especially in the computer business and consumer electronics. (18) But its theory is wonderfully broad, extending easily to degrees of truth. It thereby proves very adaptable in imaging truth just as the law does. Indeed, of the various models for handling uncertainty, fuzzy logic seems to capture best the kinds of uncertainty that most bedevil law. (19) Accordingly, writers have previously voiced suspicions that it might relate to legal standards of proof. (20)
Herein, fuzzy logic will provide a handle on how to represent our legal understandings of likelihood. (21) But it is not an exclusive tool. "In order to treat different aspects of the same problems, we must therefore apply various theories related to the imprecision of knowledge." (22) Another, compatible theory will function herein as a description of how to make decisions based on those likelihood understandings.
Useful for that purpose is Rutgers Professor Glenn Shafer's imposing elaboration of imprecise probability from 1976. (23) His work on belief functions built a bridge between fuzzy logic and traditional probability, and in the process nicely captured our legal decision-making scheme. (24) He used the word "belief" to invoke neither firm knowledge nor some squishy personal feeling, but rather the fact-finders' attempt to express their degree of certainty about the state of the real world as represented by the evidence put before them. (25) By allowing for representation of ignorance and indeterminacy of the evidence, he enabled beliefs to express uncertainty, again on a scale running from 1 to 0. (26) Indeed, his theory of belief functions rests on a highly rigorous mathematical base, managing to get quite close to achieving a unified theory of uncertainty. (27)
Belief function theory does not constitute a system of logic, unlike fuzzy logic. Instead, it is a branch of mathematics, like traditional probability. (28) Just as probability serves two-valued logic by handling a kind of uncertainty that the underlying logic system does not otherwise account for, belief function theory delivers mathematical notions that can extend many-valued logic. While probability treats first-order uncertainty about the existence of a fact, belief function notions supplement fuzzy logic by capturing and expressing the indeterminacy resulting from scarce information or conflictive evidence concerning the fact. Shafer's theory is thus similar to a scheme of second-order probability, (29) which admittedly has hitherto failed both the statistically minded (30) and psychologically minded (31) as a way to explain standards of proof.
Relation to Law
My thesis is that a better explanation of what the law does with proof lies in the new logic than in two-valued logic and its traditional probability. The explanation is indeed so good that one must entertain the notion that the law assumed and embraced the tenets of the new logic long before logicians caught up with the law.
Such an embrace by the law would not be that surprising. Law was one of the first of society's endeavors in which things appeared as neither completely true nor completely untrue. Aristotelian two-valued logic did not work for such partial truths. The common law seemed, early and intuitively, to draw many-valued logic from natural language and daily life. At about the same time, in the late eighteenth century, law and nascent probability theory began interacting. (32) That interaction caused the law to become more open about accepting uncertainty. However, traditional probability's inherent appeal prevented observers, unknowingly indoctrinated in classical bivalent logic, from seeing the law's deep logical underpinning for what it really was. (33)
Now the time has come to excavate law's multivalent foundation. The site chosen for excavation is the highly controversial subject of standards of proof. I refer to the subject in its broad sense, as covering anything that a court subjects to the proof process in order to establish truth. The subject includes many applications of law to fact, and also odd kinds of facts such as a prediction of an event, but for simplicity I shall usually refer to the whole as "facts."
The prevailing but contested view is that fact-finders should determine facts by this line of probabilistic reasoning: although given imperfect evidence, they should ask themselves what they think the chances are that the burdened party would be right if the truth were somehow to become known.
So, how would fuzzy logic and belief functions better explain standards of proof? The initial step in tying the new logic to law is to admit that the process of proof investigates a world that is not a two-valued world where disputed facts are either true or false. Instead, a good portion of the real world--by which, to be technically fine, I mean the world as perceived by humans and described by natural language--is a vague, imprecise, or many-valued world, where partial truths exist. Or, at the very least, we will never know whether a disputed fact is certainly true or false. So, the probability of truth is not the only relevant legal question. A second step is to recognize that the fact-finder's complexly constructed belief is the more relevant question. We are not as concerned with how certain the fact-finder is in a world of random uncertainty, as we are with the degree of truth the fact-finder constructs in a world of vague imprecision. The third step builds on the idea that the fact-finder will believe facts as true to a degree. We can speak of degrees of belief. Indeed, on the basis of incomplete, inconclusive, ambiguous, dissonant, and untrustworthy evidence, some of the fact-finder's belief should remain indeterminate. The output is not a probability, but what a logician would call a non-additive degree of belief in the fact's existence. (34) In the fourth step, the standard of proof will call on the fact-finder to weigh those degrees of belief.
The key distinction between probabilities and degrees of belief is subtle, as attested by the confusion among people discussing proof over the long years. Both systems numerically quantify uncertainty by using numbers in the unit interval [0,1]. But the distinction's consequences are not subtle. Degrees of belief handle imprecision better than traditional probability theory, and they better capture the effect of imperfect evidence. Also, abandoning probabilities opens the door to the logician's powerful tools for handling beliefs: fuzzy logic provides a way to represent imprecise views of the real world, and belief functions give the tools for translating beliefs about facts into legal decisions.
Another justification for so conceptualizing the process of proof is that several significant paradoxes of the law will melt away. Most paradoxes result from the limits on existing frames of reference and tools for analysis. Some paradoxes remain out of reach of today's comprehension. By "death of paradox" in my title, I simply mean that one can get a handle on many seeming paradoxes of the law by utilizing newly available frameworks and tools. As I say, this Article's view of proof works so well in this regard that it appears to have been all along the law's intuitive conceptualization of proof. Thus, after decades of reading, thinking, and writing about standards of proof, I feel as if I am finally beginning to understand them.
This Article will attempt to make good on these grand claims, which I forward tentatively despite my sometimes-assertive tone. I shall sequentially discuss the four steps of fact-finding: (I) assessing evidence related to a single fact, (II) conjoining separate assessments of different facts, (III) analyzing the resultant beliefs, and (IV) applying the standard of proof. At each step, I shall weave in the relevant learning from the new logic and then show how it illuminates one of the key features of the law of proof: (1) the gradated scale of likelihood, (2) the conjunction paradox, (3) the burden of production, and (4) the burden of persuasion.
The primary focus of the Article is descriptive and explanatory, not prescriptive other than by implication. I seek to reveal what the law actually tells its fact-finders to do. Over the centuries the law's charge to fact-finders has evolved, by a process entailing considerations of both ideal fact-finding and also human limitations. I am not championing new ideals or new limitations. My real interest here lies in exposing the proof process that the law has chosen. I believe and shall try to demonstrate that the law has embraced what became the new logic, likely because that version of logic captures the epistemic function of law better than classical logic and probability theory.
A deeper understanding of what the law says about its process of proof could surely lead to more knowledge and to improved law. It could stimulate new research into how fact-finders actually decide. It could lead to legal reforms, such as clearer instructions on the standards of proof. But reform is not the present exploratory project. Indeed, one message of this Article might be that the law currently does a much better job in structuring fact-finding than one would guess by reading its many critical commentators.
To encapsulate, the aim of this Article is to apply the new logic to the law. The conclusion will be that this logic snaps onto the law as a seemingly perfect fit.
I. ASSESSING EVIDENCE
This Part will convey the basics of fuzzy logic, when used as a way to gauge degrees of truth based on assessment of evidence related to a disputed fact. Then, this Part will show how the law has seemingly employed fuzzy logic to construct, for use in its standards of decision, a gradated scale of likelihood stretching across the spectrum from the slightest possibility up to virtual certainty.
A. Theories

1. Psychology Theories
How does the fact-finder reach a decision? No one knows. Psychologists cannot tell us exactly how people evaluate and combine evidence and, for that matter, can tell us almost nothing about how they weigh assessments of evidence against a standard of proof. (35)
Introspection might suggest that there is a knowledge arc, leading up by induction and abduction, and then down by deductive testing. (36) The upward arc rests on observations, which generate new hypotheses explaining the observations. The downward arc involves testing the hypotheses to reach conclusions. Throughout, there is evaluation, combination, and weighing of evidence by some method.
To expose that method, social scientists have tried to model the cognitive black box by experimentally comparing inputs and outputs. One result is the model called information integration theory. (37) It tries to describe how humans naturally evaluate and combine information to produce judgment. Although only one of many contesting theories, and a relatively optimistic one at that, information integration theory has suggestive powers, making description worthwhile. According to the theory, the human decision-maker who has to make a finding on a fact's existence would begin with an initial impression, or predisposition, and then would process additional pieces of information. Each of these, including the initial impression, would receive a scale value, which is a measure of the likelihood of the fact's existence. Each would also receive a weighting factor, which is a measure of evidential importance that takes into account both directness and credibility. The decision-maker would then combine these into a weighted average that determines the fact's existence.
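The weighted-average scheme that information integration theory posits can be sketched in a few lines of code. This is only a minimal illustration: the scale values and weights below are invented numbers, not data from the theory's experiments.

```python
def integrate(items):
    """Combine (scale_value, weight) pairs into a single weighted-average judgment."""
    total_weight = sum(w for _, w in items)
    return sum(v * w for v, w in items) / total_weight

# An initial impression plus three pieces of evidence, each given as
# (scale value = judged likelihood, weight = evidential importance).
evidence = [(0.5, 1.0), (0.8, 2.0), (0.6, 1.5), (0.3, 0.5)]
judgment = integrate(evidence)  # -> 0.63
```

On these hypothetical inputs, the heavily weighted second item pulls the overall judgment above the initial impression of 0.5.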
Even if some such theory reflects reality, it is clear that humans do not naturally use perfectly "rational" techniques, but instead use less accurate "intuitive" techniques. (38) The employed techniques are also subject to all sorts of heuristics and other biases. (39) Any account would further have to incorporate the metaphorically dual processes, automatic and systematic, of cognitive processing. (40)
In actuality, then, the fact-finders' performance of the cognitive process would usually be approximate and non-quantitative. The fact-finders would take a stab at assessing all the related evidence, perhaps by some method similar to information integration. Then, any judged likelihood might find expression in terms of a limited set of broad categories such as more likely than not, high probability, and almost certainty.
2. Probability Theories
A partly separate question from what fact-finders can do, actually do, or think they are doing, on the one hand, is what the law should or does tell the fact-finder to do, on the other hand. Surprisingly, even mathematicians cannot agree on how a fact-finder should perform the task of processing evidence, and so cannot unanimously guide the law on an ideal path to take. (41) The most popular candidate for rigorously evaluating and combining related evidence is the Bayesian approach, utilizing subjective probabilities.
Subjective probability theory allows us to speak of the likelihood of a single event. A subjective probability measures an individual's personal judgment about how likely a particular event is to occur or has occurred. The theory is
based on the notion that it makes sense to ask someone what he would do if offered a reward for guessing correctly whether any proposition, designated X, is true or false. If he guesses that X is true under these circumstances, we say that for him the subjective probability of X, written P(X), exceeds fifty percent. Symbolically, P(X) > .5. If he would be equally satisfied guessing either way, then we say that, for him, P(X) = .5. (42)
Upon expanding the measure into a complete scale of probabilities from 0 to 1 and postulating the usual logical operators, subjective probabilities follow most of the rules of frequentist probabilities. (43)
Bayes' theorem links the perceived probability before and after observing evidence. The starting point is P(A), the prior probability of A. Then the posterior probability of A, after accounting for evidence B, is P(A|B); this is a so-called conditional probability, which may be read as the probability that A will occur if B is known certainly to have occurred. P(A|B) calculates to be P(A) multiplied by the support B provides for A, a support that Thomas Bayes (or really Pierre Simon Laplace) equated to P(B|A) / P(B).
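The arithmetic of the theorem itself is simple enough to sketch; the particular probabilities below are hypothetical numbers chosen only to show the mechanics of updating.

```python
def bayes_posterior(p_a, p_b_given_a, p_b):
    """Posterior probability P(A|B) = P(A) * P(B|A) / P(B)."""
    return p_a * p_b_given_a / p_b

# A prior of 0.2 for fact A, updated on evidence B that would be seen with
# probability 0.9 if A holds but only 0.3 overall, yields a posterior of 0.6.
posterior = bayes_posterior(0.2, 0.9, 0.3)  # -> 0.6
```

The support term P(B|A)/P(B) here equals 3, tripling the prior: evidence much likelier under A than otherwise shifts belief sharply toward A.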
Despite its internal mathematical soundness, and despite the many insights it generates for law, many observers from various disciplines have serious doubts about whether Bayes' theorem should be seen to play a broad role in legal fact-finding. (44) First, its prior probability comes out of thin air, as some sort of subjective guess. The more objective supposition of 50%, on the thought that the fact is either true or false, comports neither with reality nor with where the law tells the fact-finder to begin. (45) Second, Bayes' theorem leaves no place for indeterminacy, thus painting the world as black and white even though most of the world appears in shades of gray. It accordingly does not handle well the situation of conflicting or scarce information, using a fudge factor to account for the state of the evidence. (46) Third, its mathematical approach is not realistic, of course. It does not conform to the way intuitive humans arrive at prior probabilities or the way they combine them with new evidence to produce posterior probabilities. (47)
3. Zadeh's Fuzzy Logic
Fuzzy logic envisages degrees of membership in a so-called fuzzy set, with membership valued anywhere between 0 and 1. That is, x's membership in a fuzzy set H of the universe X may take values throughout the whole interval [0,1], rather than just the two values of 0 and 1. This range allows a more complete and accurate expression of membership, when membership is imprecise.
Take as an example the set A of men from five to seven feet tall. This would be a so-called crisp set. Membership of element x in set A, represented by the characteristic function χ_A(x), is not vague. The values of χ_A can thus be only either 0 or 1, at least if we ignore complications at the nanoscale level.
Contrast the set H of men somewhere near six feet. It is a fuzzy set. Membership of element x in H, represented by μ_H(x), is imprecise. The values of μ_H may start at 0 for a tiny person, but they soon increase by some function to a value of 1 at precisely six feet, and then start decreasing. The membership function could be linear, but it can take on any shape as appropriate.
So Tom might be completely in set A but have a degree of membership in H of .5. The following figure represents these two sets, with (a) representing the crisp set of men from five to seven feet tall and (b) being one representation of the fuzzy set of men somewhere near six feet tall: (48)
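The two sets just described can be written as functions. The linear, triangular shape for H is only one assumed possibility, as the text notes; any suitable curve would do.

```python
def in_crisp_set_a(height_ft):
    """Characteristic function of the crisp set A of men from five to seven feet:
    membership is all-or-nothing, either 1 or 0."""
    return 1 if 5.0 <= height_ft <= 7.0 else 0

def mu_near_six_feet(height_ft):
    """Triangular membership in the fuzzy set H of men somewhere near six feet:
    ramps linearly from 0 at five feet up to 1 at exactly six feet, then back
    down to 0 at seven feet. (The five-to-seven support and the linear shape
    are assumptions made purely for illustration.)"""
    if height_ft <= 5.0 or height_ft >= 7.0:
        return 0.0
    if height_ft <= 6.0:
        return height_ft - 5.0
    return 7.0 - height_ft
```

On these assumptions, a man of five feet six inches is completely in A yet has a membership of only .5 in H, matching Tom's position in the example above.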
Important to note is the role of qualifying language in the prior example as a means of expressing membership. Evaluative linguistic expressions are words like small, medium, and big; another example is about, roughly, or near, when used in contrast to not-at-all or really. These words may not be a large part of our natural language. Yet, they are an important part of that language. They do a lot of work. People use them all the time to evaluate a thing or situation and to communicate their evaluation. People thereafter use them for classification, decision-making, and other tasks.
People employ linguistic hedges to modify their evaluations further. Words such as very or extremely, and fairly or almost, are examples. These words allow people to create a gradated scale for their evaluations. The following figure represents a scale for size, ranging from v_L as the left bound of values that are very small, through medium, and on to v_R as the right bound of values that are very big: (49)
This scale actually consists of three overlapping fuzzy subsets based on the evaluative linguistic expressions of small, medium, and big. Linguistic hedges subdivide the three subsets as suggested by the four smaller triangles at the top, perhaps equating to very small, almost medium, almost big, and very big. The result is seven gradations of meaning.
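A minimal way to code such a seven-step scale is to map an evaluation in [0,1] onto seven bins. The equal bin widths and the exact label wordings are assumptions for illustration; the figure's overlapping subsets could equally be modeled with graded memberships.

```python
LABELS = ["very small", "small", "almost medium", "medium",
          "almost big", "big", "very big"]

def gradate(v):
    """Map an evaluation v in [0, 1] to one of seven linguistic gradations."""
    return LABELS[min(int(v * 7), 6)]  # seven equal bins, clamping v = 1.0
```

So, under these assumptions, an evaluation of .5 comes out as "medium" and anything above roughly .86 as "very big".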
Obviously, these seven categories and labels are imprecise. But they are supposed to be, and they work quite well in real life. Fuzzy logic better comports to our view of the real world than does classical bivalent logic with its excluded middle. In the real world, boundaries of things often appear indistinct, thus making important or even essential the human ability to process fuzzy information.
Some research suggests that such verbal categories work better than numerical translations. (50) Other research indicates that they work best as a matter of natural language if the categories are equally sized and shaped. (51) It might even be feasible to construct and employ an ordinally ranked vocabulary of likelihood that improves effective interpersonal communication. (52)
c. Evaluating and Combining Evidence
Fuzzy logic has formal rules for combining likelihoods, (53) once evaluated by one method or another. (54) Moreover, to jump ahead to belief functions, the theoretical work thereon consists mainly of developing tools for combining pieces of evidence to determine a likelihood. In particular, its very prominent Dempster-Shafer rule governs the task. (55) That rule is very complicated because it abstractly addresses the problem in the most general terms possible (Bayes' theorem turns out to be a special case of that approach). (56) The rule is also quite contested, generating many competitors. (57)
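For a taste of how the Dempster-Shafer rule operates in its simplest setting, here is a sketch of combining two mass functions over a two-hypothesis frame. The evidence masses are invented for illustration, and a real implementation would also need to handle the degenerate case of total conflict.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination: focal elements are frozensets of
    hypotheses; masses on intersecting focal sets multiply, and mass that
    falls on the empty set (conflict) is renormalized away."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    k = 1.0 - conflict
    if k == 0.0:
        raise ValueError("total conflict: the two mass functions cannot be combined")
    return {s: v / k for s, v in combined.items()}

# Toy frame of discernment {"liable", "not_liable"}; the frame itself
# (theta) carries the mass left uncommitted by each item of evidence.
theta = frozenset({"liable", "not_liable"})
m1 = {frozenset({"liable"}): 0.6, theta: 0.4}  # one item of evidence
m2 = {frozenset({"liable"}): 0.3, theta: 0.7}  # a second, weaker item
m = dempster_combine(m1, m2)
# -> mass 0.72 on {"liable"}, 0.28 left uncommitted on theta
```

Because both items point the same way, there is no conflict here; the combined belief in liability (0.72) exceeds what either item supported alone, while some mass stays uncommitted on the whole frame.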
In the end, these formal approaches are clearly not realistic representations of human cognition. (58) Humans do not naturally use rational techniques like the Dempster-Shafer rule any more than they calculate Bayes' theorem; they instead use intuitive techniques in a non-quantitative and approximate fashion. Consequently, the law has generally left the combination of related evidence evaluations to its fact-finders' instinctive treatment.
Fortunately, fuzzy logic is not dogmatic on the method used to evaluate or combine pieces of evidence that reinforce or undermine each other. It is compatible with the fact-finders' combining all the related evidence by any means. Just one possibility would be information integration's weighted average, described above. (59) In fact, a weighted-average approach commonly appears in the decision-making process of today's fuzzy computer programs. (60)
My interest in this Article is not so much the initial eyeing of evidence, but rather the subsequent steps that build to a standard of proof's application. For my purposes, the new logic mainly illuminates a way for people to express their views of the evidence, prior to handling those views according to established rules of reasoning such as conjunction. That is, fuzzy logic in this broad sense gives a new and effective way to explore humans' expression of their assessment. (61) And it then opens the door to subjecting those expressions to so-called approximate reasoning, which despite its name can be quite rigorously performed. (62)
I therefore will not pursue fuzzy logic's detailed rules, which constitute the new logic in a narrow sense, nor will I weigh the disputes over the rules' details. Naturally enough, the law built on the intuition that partial truths exist, but it never adopted all the formal complications that logicians have refined since 1965.
d. Fuzziness versus Probability as a Means of Expressing Assessment
I propose considering the broad version of fuzzy logic as the legal model for human expression of uncertainty, in preference to assuming a probability-based bivalent view. The significance of adopting it for this purpose will become evident upon distinguishing a fuzzy statement from a traditionally probabilistic one. (63)
A few anticipatory words of qualification are in order. I am not an anti-probabilist. I am not arguing against a probabilistic approach if "probabilistic" carries its capacious sense of accepting uncertainty and mathematically accounting for it. I am arguing against traditional probability theory when appended to a bivalent view of the world. What I am proposing is a nontraditional means for expressing uncertainty.
Describing different uncertainties--Both a traditionally probabilistic description and a fuzzy one can be accurate statements, but they describe different states. The key distinction is that probability here depends on the existence of a crisply bivalent world. Fuzzy logic accepts an imprecisely multivalent world.
On the one hand, fuzziness is a way, for example, to describe event imprecision. It measures what has occurred--actually, the degree to which the event occurred--which can be vague. On the other hand, probability is a way to describe event occurrence. It can measure the chance that the event will occur or not.
The probability of whether an event occurs in a bivalent world is normally random, as contrasted with the nonrandom uncertainty of vagueness that fuzziness additionally measures. Probability expresses the chance of whether something will occur, all the while knowing it will occur or not on an all-or-nothing basis. It is a mathematical supplement to bivalent logic, used to account for that one kind of uncertainty. Fuzziness expresses vagueness as a degree of membership. It builds its inclusive measure of uncertainty right into the basics of the multivalent logic system.
Probability conveys what we know, when in possession of only partial knowledge, about whether an event will occur. With more information, the uncertainty in the probability will dissipate, and if the event's occurrence becomes known the probability itself will morph into a value of 1 or 0. By contrast, fuzziness conveys all the information we have about an event, which most often ends up expressed as a partial truth. More information will increase the fuzziness of set membership, because any crisp lines become harder to maintain. That is, as one acquires more information about the world, one sees a greater need for measuring fuzziness in lieu of bivalent categorization.
Probabilism and fuzziness can describe much more than events. Maybe another example would help. If a probabilist says, "There is a 30% chance that Tom is tall," the speaker supposes that Tom is either tall or not tall, and given imperfect evidence he thinks that it is only 30% likely that Tom would end up in the tall category upon accurate measurement. But when a fuzzy logician says, "Tom's degree of membership within the set of tall men is .30," he means that Tom is not very tall at all. The difference is real and considerable. It derives from the fact that the probabilist is assuming bivalence with an excluded middle, so that one is tall or not, while the fuzzy logician is speaking of a world where one can be more or less tall.
Choosing between models--Which model to use, probability or fuzzy logic, depends on what one is trying to describe. If the fact in question is or is assumed to be nonvague, and thus readily distinguishable from its opposite, and its occurrence is subject only to random uncertainty, then probability is appropriate. For a probability example: will I pick a black ball from the urn? However, if the fact is vague, and most facts in the world are vague, fuzzy logic is the way to go. For a fuzzy example: how black is this grayish ball?
The choice between probabilism and fuzziness is important, because the kind of statement one can make depends on the choice made. "You paint one picture of the world if you say there is a 50% chance that an apple sits in the refrigerator. You paint a different picture if you say half an apple sits in the refrigerator." (64) The two models are not fully interchangeable, even though people tend to treat them so: people loosely use probability for any sort of uncertainty, including fuzziness, a purpose for which it is inappropriate.
I am coming to the choice that the law has made. But at this point, it is natural for the reader to jump to the conclusion that the law in its fact-finding usually wants to know if an apple is in the refrigerator, not whether it is half eaten. The court wants to know if Tom was or was not the alleged perpetrator. Just to slow you up, however, I point out that in legal fact-finding no one is ever going to be able to look inside the refrigerator. Also, much more significantly, I can pose another example that makes it much less clear which sort of statement the law seeks. Think of a somewhat sloppily drawn circle: is it more appropriate to say (i) there is a 90% probability that it is a perfect circle or (ii) it has a .90 membership in the set of circles? (65) An analogy to the circle would be the law's trying to determine fault, when degrees of fault are the reality. But also analogous would be causation, consent, coercion, good faith, intent, and a host of other legal issues. After all, remember that Bertrand Russell saw all of natural language as vague. (66) Many, many legal issues are fuzzy concepts, in that they draw indistinct lines, most often unavoidably--and many of these fuzzy concepts are subjects of proof through litigation. (67)
So, the important choice between probabilism and fuzziness is not an easy one. Bearing on that choice, however, consider three advantages of fuzzy logic.
First, it is more accurate than probability whenever one encounters nonrandom uncertainty, such as vagueness. It picks up the extra information about vagueness, extra information expressed in natural language but lost upon classification into a crisp set. Recall that fuzziness includes the imprecision of an event, while probability describes only the chance of the event. The precision of probability thus turns out to be a vice rather than a virtue. Probability has the advantage of bivalent simplicity, but it will often be misleading in the real world of fuzziness:
The question is not whether the glass is half empty or half full. If we had to say all or none, the question is, is the glass full or empty. [Either answer is a half-truth.] ... That is the real state of the world. We don't mean that there is a 50% probability that the glass is full. We mean half a glass. If for some cultural reason we limit what we say to the two bivalent options of all or none, true or false, yes or no, then we pay the price.... (68)
Second, the precision of probability becomes particularly troublesome when trying to gauge a probability resting on imperfect evidence that leaves a lot indeterminate. Fuzzy expression can better handle incomplete, inconclusive, ambiguous, dissonant, and untrustworthy evidence. (69)
Third, fuzzy logic is the more inclusive system. Many-valued logic includes two-valued logic. Being a form of many-valued logic, fuzzy logic neither requires nor forbids that anything be of an on-or-off nature, true or false, completely inside a set or completely outside it. The two-valued logic of probability demands sets with strict membership classifications exclusively, an in-or-out characteristic symbolized by values of either 1 or 0. But crisp sets are simply a special kind of fuzzy set. Bivalence is a special case of multivalent logic. The world of black or white is a special, extreme case of the world shaded in grays.
In many situations, a single fact is subject both to vagueness and to occurrence uncertainty. Although probability theory can say little about how to reason about things that are not completely true or false, fuzzy logic can handle these more mixed and complex situations. For an example of what might be called "normative" uncertainty, (70) it might be that there was a .70 blameworthy act. Or, while the act was completely blameworthy, there was a 70% chance that it occurred, creating "factual" uncertainty. But what about a 70% chance of a .70 degree of fault? Then these two kinds of uncertainty need to be integrated.
Moreover, a decision may rest on a number of facts, some of which demonstrate only occurrence uncertainty, while others show some vagueness as well. The need in decision-making to combine these fact-findings counsels the use of one logic system, because "we cannot coherently countenance two different kinds of degree of belief." (71) To combine the facts, a common currency is necessary.
The inclusiveness of fuzzy logic suggests its use for each fact and for combining evaluations of different facts. Indeed, it can effortlessly express even a traditional probability as membership in a set; that is, probability is the degree to which the imagined universe of all tries would belong to the set of successful tries. (72) This compatibility is essential. It allows easy combination of a randomly uncertain set with an imprecise set, in accordance with fuzzy logical operators. (73)
In sum, fuzzy logic provides the needed common currency. It can handle all kinds of facts, and can do so much better than probability. The legal reader cannot dodge my argument by concluding that fuzziness reaches, at most, only some kinds of facts. Instead, fuzziness handles facts exhibiting random uncertainty as well as those showing vagueness, facts embodying both factual uncertainty and normative uncertainty, and factual events as well as legal constructs. Law could choose fuzziness as its sole mode of measurement.
e. The Law's Choice
My concern in this Article is primarily to unearth what the law actually tells its fact-finders to do. Even if in the end it remains up in the air as to which model, fuzzy logic or probability, would be the superior vehicle for the legal proof process, my real interest here is in which model the law actually chose. This is a descriptive rather than prescriptive question: which model better expresses the law's scale of likelihood of truth, which better explains the legal treatment of the well-known conjunction paradox, and which better effectuates the burdens of production and persuasion? On the descriptive question, I believe and shall try to demonstrate that the law embraces fuzzy logic.
In its instructions to fact-finders, the law does not explicitly distinguish between probabilism and fuzziness. It instead speaks in terms of a common currency, mixing the questions of whether and how much an event occurred. Whether the issue is occurrence or blameworthiness of an act, the law deals only in degrees of truth. Asking for a single measure of uncertainty makes sense only in fuzzy logic, because it accounts for the various kinds of uncertainty. Relying instead on the mathematical supplement of the probability calculus would be so awkward and incomplete as to be nonsensical. I therefore submit that the law treats all measures of truth simply as fuzzy sets.
I would further submit, if pressed, that the reason for the law's choosing to speak in terms of degrees of membership is that they behave more appropriately than probabilities in a world filled with various kinds of uncertainty. (74)
Imagine that the law is trying to determine if Tom was at fault. A number of features of this issue of fault indicate that the actual and better approach for law is fuzzy logic. First, we will never know the answer as a 1 or a 0. Therefore, we should not be worrying too much about specifying the chance of a 1 turning up. Second, ours is not a crisp world, so the law is often not interested in establishing that the truth value of an element is a 1 or a 0. Instead, it wants to ascertain whether the element has a sufficient truth value for the purpose at hand. Third, any conclusion based on evidence is necessarily uncertain for five reasons: "Our evidence is never complete, is usually inconclusive, is frequently ambiguous, is commonly dissonant to some degree and comes to us from sources having imperfect credibility." (75) Fourth, the fact-finder might entertain thoughts of both randomness and vagueness, that is, both a sense that Tom was .70 likely to have been completely at fault and also that Tom was at fault to a .70 degree. Fifth, given that some issues in a case might demand the more inclusive measure of imprecision, logical coherency requires that the same type of measure apply to every issue.
I am not calling for a major shift in conceiving the standards of proof. After all, fuzzy logic is not antithetical to classical logic. All I am saying is that the law appreciates that more kinds of uncertainty than that of event occurrence are at play. The law therefore uses a logic appropriate to the task. Fuzzy logic moves its measure of uncertainties into the basics of the system, rather than leaving their treatment to some sort of afterthought. That non-radical move does not call for an overhaul of legal language or imagery, and it even makes many legal consequences easier to comprehend.
Indeed, I am not even saying that if we recognize the role of multivalence, we need to abandon the probability idiom. It can be quite expressive in the realm of standards of proof. Nonetheless, when plunging to the depths, we need always to remember that the legal foundation is ultimately fuzzy in nature. Here is a last clever image to make the point as to what the law deals in:
Suppose you had been in the desert for a week without drink and you came upon two bottles marked K and M [and marked, respectively, with a .91 membership in the fuzzy set of potable liquids and a .91 probability of being a potable liquid]. Confronted with this pair of bottles, and given that you must drink from the one that you chose, which would you choose to drink from? Most people, when presented with this experiment, immediately see that while K could contain, say, swamp water, it would not (discounting the possibility of a Machiavellian fuzzy modeler) contain liquids such as hydrochloric acid. That is, membership of 0.91 means that the contents of K are fairly similar to perfectly potable liquids, e.g. pure water. On the other hand, the probability that M is potable "equals 0.91" means that over a long run of experiments, the contents of M are expected to be potable in about 91% of the trials. In the other 9% the contents will be deadly--about 1 chance in 10. Thus, most subjects will opt for a chance to drink swamp water. (76)
B. Legal Application: Gradated Likelihood
The way to apply fuzzy logic to fact-finding in the legal system is to envisage the fuzzy set of true facts and ask for x, as a particular element of a claim or defense, what is its degree of membership in that set. Recall that membership represents how much a variable is in the set. The membership here will be partial. It will tell how true the fact-finder finds the element to be. One could express [mu](x) as truth(x). Membership thereby creates degrees of truth. (77)
While this membership will turn on likelihood of truth in a sense, that sense differs from the classical understanding of the fact-finder's subjective probability that the element is true. Such subjective probability crisply deals with the probability of x being in actuality 1, while fuzzy logic vaguely deals with a degree of truth. The degrees of truth range from 0 to 1 in fuzzy theory, but in practice they find expression most often as words, or evaluative linguistic variables that use linguistic hedges to cover all the intervals of partial truth between completely false and completely true.
Fuzzy logic's schema well describes the law's scale of likelihood that I have previously documented. (78) For a significant example, the law today limits its choice to no more than three standards of proof--preponderance, clearly convincing, and beyond a reasonable doubt--from among the infinite range of probabilities stretching from slightly probable to virtual certainty; the law did not always recognize this limitation, but with time the law has acknowledged that the conceivable spectrum of standards coalesced irresistibly into three. (79) For another example, the harmless-error doctrine frequently invokes one of three low possibilities of an error's effect on outcome. (80) More generally, the law's standards of decision invoke a coarsely gradated scale of likelihood stretching across the broader spectrum from the slightest possibility up to virtual certainty.
The reason for this coarse gradation, I argued, lay in the cognitive psychology literature. Cognitive limitations leave humans able only weakly to judge likelihood on any sort of scale. Studies of humans' weak absolute judgment, (81) restricted short-term memory, and use of biased heuristics all supported the limited capability of humankind. Those studies suggested that a step-like scale of intervals accords with how humans naturally process such information: judged likelihood customarily finds expression in terms of a very small set of broad verbal categories. Today, in all the probability and logic theories, there seems to be an emerging sense of the need to confront the limited precision of humans in gradating their beliefs. (82) It might therefore be more psychologically feasible for the law to ask fact-finders for an approximate degree of truth than for their precise view of probability. (83) Perhaps the law has already optimized by intuitively conforming to the coarsely gradated scale of likelihood already in people's customary use:
The law usually does, realistically can, and optimally should recognize only seven categories of uncertainty in its standards of decision: (1) slightest possibility, (2) reasonable possibility, (3) substantial possibility, (4) equipoise, (5) probability, (6) high probability, and (7) almost certainty. First, this essay's description of seemingly diverse legal doctrines demonstrated that standards of decision tend to fall, often in groups of three, into the seven customary categories. Second, a review of cognitive psychology revealed humans to be "boundedly rational." Third, combining the observation with the science suggested that the systematic structure of the standards reflects the law's wise reconciliation with those cognitive limitations. (84)
Now, I espouse expressing that conclusion in the terms of fuzzy logic. I propose viewing the seven gradations as degrees of truth in this way (although I would redraw the separate gradations to be equally sized and shaped):
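By way of illustration only, the seven gradations might be rendered as equally sized intervals of truth. A minimal Python sketch follows; the numeric boundaries are my own illustrative assumptions, not values the law prescribes:

```python
# A hypothetical rendering of the law's seven verbal categories of
# likelihood as equally sized degree-of-truth intervals. The boundary
# values are illustrative assumptions, not figures drawn from doctrine.
GRADATIONS = [
    ("slightest possibility", 0 / 7, 1 / 7),
    ("reasonable possibility", 1 / 7, 2 / 7),
    ("substantial possibility", 2 / 7, 3 / 7),
    ("equipoise", 3 / 7, 4 / 7),
    ("probability", 4 / 7, 5 / 7),
    ("high probability", 5 / 7, 6 / 7),
    ("almost certainty", 6 / 7, 7 / 7),
]

def categorize(truth_degree):
    """Map a degree of truth in [0, 1] to its verbal category."""
    for label, low, high in GRADATIONS:
        if low <= truth_degree < high:
            return label
    return "almost certainty"  # the top endpoint, truth_degree == 1.0

print(categorize(0.5))   # equipoise
print(categorize(0.95))  # almost certainty
```

On this sketch, what the scale elicits from a fact-finder is a coarse verbal judgment, not a precise percentage.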
Thus, fuzzy logic accommodates the cognitive limitations of humans. Fuzzy logic offers a rich approach to elaborating the law's gradated scale of likelihood. Its real value, however, is that it captures the epistemic function in law better than probability. We want to know the belief in truth of a factual element, not the chance that the element will turn out to be 100% true.
II. CONJOINING ASSESSMENTS
This Part will explain that probability's product rule for conjoined events does not apply in fuzzy logic. Then, this Part will show how the law relies on fuzzy logic when it applies the standard of proof to each element of claims or defenses, without worrying about applying the standard to conjoined elements.
A. Fuzzy Operators
The power of the fuzzy conceptualization becomes more obvious when one considers the combination rules of fuzzy logic. These can become quite complicated, but for our purposes those of special interest are the most basic fuzzy operators, or connectives.
One constructs any system of logic by stipulating a small but adequate number of logical operators, such as intersection (or conjunction or ^ or AND), union (or disjunction or v or OR), and negation (or ~ or [logical not] or NOT). They are the premises that generate an internally sound and complete system. (85)
1. Maximum and Minimum
a. Classical Logic's Operators
This bivalent system, which recognizes only the two values of true and false, stipulates the following functions for intersection and union:
truth(x AND y) = 1 if both x and y are true, but 0 otherwise
truth(x OR y) = 1 if either x or y is true, but 0 otherwise
Another way to state these two functions is this:
truth(x AND y) = minimum(truth(x), truth(y))
truth(x OR y) = maximum(truth(x), truth(y))
A different format in which to stipulate an operator is by truth table. The one for negation indicates that the negative of 1 is 0, and vice versa:
All things bivalently logical flow from these three stipulations.
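For the computationally inclined, these stipulations can be checked mechanically. A minimal sketch (in Python) verifying that minimum, maximum, and one-minus-x reproduce the classical connectives on the values 0 and 1:

```python
# Classical (bivalent) logic: truth values are restricted to 0 and 1.
def AND(x, y):
    return min(x, y)

def OR(x, y):
    return max(x, y)

def NOT(x):
    return 1 - x

# Verify the stipulated truth tables over all bivalent combinations.
for x in (0, 1):
    for y in (0, 1):
        assert AND(x, y) == (1 if x == 1 and y == 1 else 0)
        assert OR(x, y) == (1 if x == 1 or y == 1 else 0)
    assert NOT(x) == (0 if x == 1 else 1)

print("classical truth tables verified")
```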
b. Fuzzy Logic's Operators
Those three operators for fuzzy logic are just the same, except that they must extend to give results for values between 0 and 1. (86) Thus, the AND and OR functions work this way for sets in fuzzy logic, when x and y can take any value from 0 to 1: (87)
truth(x AND y) = minimum(truth(x), truth(y))
truth(x OR y) = maximum(truth(x), truth(y))
So, let X be the universe, and let A be one fuzzy set and B be another fuzzy set in the universe. The two sets might be independent, in the sense that the degree of membership in one set has no effect on the degree of membership in the other set, but they need not be. Designate the membership of element x in A as truth(x), and the membership of element y in B as truth(y). Then, the truth of the conjunction of x and y equals the smaller of the truth of x and the truth of y.
For an example involving a common element, let X be the universe of men, and let A be the set of tall men and B be the assumedly independent set of smart men. So, if Tom is a .30 member of A and a .40 member of B, then Tom is a .30 member of the set of tall and smart men. The intersecting set becomes smaller, but Tom's degree of membership in it does not decrease below the lower of his tallness and smartness levels. In other words, the truth value for the intersection would be the minimum value of the two memberships in A and B.
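The Tom example computes directly. A minimal sketch using the text's own membership figures:

```python
def fuzzy_and(*memberships):
    """Fuzzy intersection: the MIN operator."""
    return min(memberships)

def fuzzy_or(*memberships):
    """Fuzzy union: the MAX operator."""
    return max(memberships)

tall, smart = 0.30, 0.40  # Tom's degrees of membership, from the text

print(fuzzy_and(tall, smart))  # 0.3 -- membership in "tall and smart"
print(fuzzy_or(tall, smart))   # 0.4 -- membership in "tall or smart"
```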
The following diagram may help to visualize this so-called MIN operator by indicating the shaded intersection of the two sets, where [mu] gives the degree of membership of an element in the fuzzy set. Along the x-axis, for any z that falls in the intersection, the degree of membership therein will be the degree of membership in A or B, whichever has the lower membership line at that point z: (88)
c. Justifying Logical Operators
Now, one can generate a logic system from any adequate group of operators. It will be internally sound on a formal level, but it will not be useful unless the operators make sense in our world. What makes sense is a philosophical question. But philosophers have to punt on this question, saying that operators make sense if they proceed from "genuine logical truths" and if their formal logical consequences constitute "genuine logical consequences." (89)
There are several signs that fuzzy logic makes sense. To begin, fuzzy logic does not produce nonsensical results. For example:
The simplest and most fundamental qualitative law of probability is the extension rule: If the extension of A includes the extension of B (i.e., A [contains] B), then P(A) [greater than or equal to] P(B). Because the set of possibilities associated with a conjunction A&B is included in the set of possibilities associated with B, the same principle can also be expressed by the conjunction rule P(A&B) [less than or equal to] P(B): A conjunction cannot be more probable than one of its constituents. This rule holds regardless of whether A and B are independent.... Furthermore, it applies not only to the standard probability calculus, but also to nonstandard models.... (90)
The MIN rule in fuzzy logic conforms to the extension rule by setting the conjoined truth value of elements, whether or not independent, equal to the truth value of the least likely element.
More than that, the MIN rule affirmatively makes sense as the way to conjoin multivalent values. Tom really does appear to be a .30 member of the set of tall and smart men. The MIN rule is therefore the way to combine truth degrees more generally. (91)
Furthermore, fuzzy logic is not wildly different from classical logic. It does not require a radical overhaul of worldview. The choice posed is between (1) fuzzy logic and (2) bivalent logic with its probability overlay. In essence, fuzzy logic says only that we should account for the undeniable imprecision of the world by altering the system's operators, rather than by some awkward afterthought squeezed into the probability calculus.
At bottom, though, fuzzy logicians are arguing that their logic is different because it makes more sense than classical logic. "There are many reasons to get interested in nonclassical logic, but one exciting one is the belief that classical logic is wrong--that it provides an inadequate model of (genuine) logical truth and logical consequence." (92) The argument is that classical logic, by assuming the principle of bivalence, assumes one too many logical truths. It assumes a world where everything appears as on the left in this figure: (93)
Like Euclidean geometry and Newtonian physics, classical logic is very useful, but an oversimplification.
2. Product Rule Contrasted
a. Applying Different Rules
The reader should nevertheless be sensing that something odd is afoot. Indeed, this is where most readers will abandon ship. After all, the probability operation for AND is multiplication of the probabilities of independent events. (94) But fuzzy logic tells us to apply the MIN operator even for independent events. (95)
Think of a room with ten men, each five feet six inches tall. We might think of each as .30 tall. What would we term the tallness of the ten men as a group? It would still be .30 by the MIN operator. It would not be .30^10, a very tiny number yielded by the product rule to reflect the remote chance of them all turning out to be truly tall.
Now if the room has ten men, with three "short" men five feet-six inches or below and three "dumb" men, then one man picked at random has a .09 chance of being both short and dumb, assuming independence. So, here the product rule applies.
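Both rooms can be worked out numerically. A minimal sketch (the .30 figures are the text's own):

```python
# Room one: ten men, each a .30 member of the fuzzy set of tall men.
memberships = [0.30] * 10

# The group's tallness under the MIN operator stays at .30 ...
group_tallness = min(memberships)

# ... whereas the product rule would give .30^10, a tiny number answering
# a different question: the chance of all ten proving fully tall.
product = 1.0
for m in memberships:
    product *= m

print(group_tallness)    # 0.3
print(f"{product:.2e}")  # roughly 5.9e-06

# Room two: crisp traits. Three of ten men are short and, independently,
# three of ten are dumb, so a man picked at random is short and dumb with
# probability .3 x .3 = .09 -- here the product rule is the right tool.
p_short, p_dumb = 0.3, 0.3
print(round(p_short * p_dumb, 2))  # 0.09
```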
That is, the probability operator is not inconsistent with the fuzzy operator. The two just contemplate different contexts. Indeed, the two are fundamentally consistent, because traditional probability is a special case of fuzzy logic's more general theory of uncertainty. So, it is not that one of these theories of uncertainty is correct and the other is wrong. It is that one theory can include the other.
For random uncertainty in a bivalent world, the probability operator will give the right answer, but so would the MIN rule. First, if the world were crisp, and x and y were known to be either true=1 or false=0, then their conjunction would be either 1 if both were true or 0 if not. In this narrow setting, the probability and fuzzy operators are equivalent. That is, the product rule would be no different from the MIN operator: truth(x) x truth(y) = minimum(truth(x), truth(y)). Second, the chance of complete conjunction of currently unknown variables--that x and y will both turn out independently to be 1, or completely true--will be the product of their individual probabilities in either logic system. The product will make most sense in connection with frequentist probabilities. Still, the uncertainty could concern unique events, because one can speak of the subjective probability of x and y turning out to be either true=1 or false=0.
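The claimed equivalence in the crisp setting is easy to confirm. A minimal sketch showing that on the values 0 and 1 the product and the minimum never disagree:

```python
# On bivalent truth values, the product rule and the MIN operator coincide.
for x in (0, 1):
    for y in (0, 1):
        assert x * y == min(x, y)

print("product and MIN agree on all bivalent combinations")
```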
Remember that there are multiple kinds of uncertainty, including the indeterminacy resulting from scarce information or conflicting evidence and also the uncertainty characterized as either vagueness or randomness. If one tries to deal with the variedly uncertain real world, the more inclusive approach to conjunction becomes appropriate. In a fuzzy world, the product rule retreats to a specialized role, applying only when the independent values of x and y happen to be randomly uncertain without being vague. The product of probabilities gives the chance of things, which can take only a value of 1 or 0, coming up as 1 in two independent trials under conditions of random uncertainty. The intersection of degrees of truth tells you how much you believe two statements put together. The latter is the more general inquiry.
In sum, the product rule is not a feature only of classical logic. Both under classical logic with a probability overlay and under fuzzy logic, the MIN rule will reduce to the product rule--if one assumes bivalence and then adds an assumption of random independence. But the product rule will prevail under either system only if the elements under consideration are always ascertainable to be completely true or completely false. Thus, the question becomes whether one should so assume bivalence.
b. Choosing Between Rules
Because both the product rule and the MIN operator can give correct, but sometimes different, answers, they must be giving answers to different questions or, rather, questions resting on different assumptions. The product of probabilities is answering a question different from what the intersection of degrees of truth is answering. The nature of the desired answer will determine the correct question to ask and, hence, whether the product rule or the MIN operator is appropriate to apply.
First, as a thought experiment, ponder which is the correct question to ask when one wants to know if Tom is tall and smart. Begin with the two membership statements given above--Tom is a .30 member of A and a .40 member of B. Those numbers mean something like "Tom is not so tall" and "Tom is not so smart."
The fuzzy combination would yield: "Because Tom is not so tall and Tom is not so smart, Tom is not such a tall, smart man." The MIN operator yields a .30 belief in that intersection. The traditionally probabilistic calculation, however, would yield: "Because Tom is not so tall and Tom is not so smart, Tom is likely a short, dumb man." The chance of a tall and smart Tom according to the product rule is .12, not .30, so that the product is lower than either truth(x) or truth(y).
This calculation by the product rule would be appropriate for certain kinds of decisions (and bets), but seems inappropriate for determining one's belief in Tom's membership in the set of tall and smart men. Multiplication of probabilities gives the chance that Tom is both completely tall and completely smart, while what we want to know is the degree to which he is both tall and smart. The inappropriateness becomes much more obvious as one combines more and more elements in the calculation, such as tall, smart, rich, and bald men. The product calculation will approach .00, even if some of the values are very high. The fuzzy combination, however, will go no lower than the minimum truth value. In other words, a fuzzy intersection of very true statements is very true, not almost completely untrue.
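The divergence grows with the number of conjoined elements. A minimal sketch using assumed, hypothetical membership values of .9 for each of ten traits:

```python
# Ten conjoined elements, each with an assumed high truth value of .9.
memberships = [0.9] * 10

# The fuzzy intersection never drops below the weakest link ...
fuzzy_conjunction = min(memberships)

# ... but the product of even very high values decays toward zero as more
# elements are conjoined.
product = 1.0
for m in memberships:
    product *= m

print(fuzzy_conjunction)  # 0.9
print(round(product, 4))  # roughly 0.3487, and still falling
```

A fuzzy intersection of very true statements thus remains very true, while the product steadily approaches .00.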
Second, one might try to classify a thing as a chair and as a red object. If the thing has some of the characteristics of a chair (96) and some pinkish hue, one would give it, perhaps, a .6 membership in the set of chairs and a .5 membership in the red set. Now, if one had to give it a membership in the class of red chairs, one would say .5 for this reddish chair-like thing. One would not apply the product rule to say .3.
When would one apply the product rule for probabilities? One would do so when things are completely chairs or not and red or not, and you cannot see the thing, but you have an idea of the likelihood of chairishness and redness. To compute the chances of what bivalent values one will see when the thing is uncovered, and the thing becomes clearly a chair or not and red or not, one would use the product rule.
Many sorts of legal situations call for the product rule. In manipulating and evaluating statistical evidence, the fact-finder would often use it. (97) In calculating the odds of future events, as in computing expected costs on a motion for a preliminary injunction, the product rule would be appropriate. (98) There is a proper realm for the product rule, just as there is one for the straightforward application of the MIN rule. The question before us is whether a significant share of legal applications of the standard of proof falls into the latter realm.
Third, picture a column of one hundred coins, thirty of them heads randomly placed, and another column of one hundred coins, forty of them heads randomly placed. Then only about twelve paired rows will have two heads. Or picture a column of one hundred people, thirty of them tall people randomly placed, selected from a universe where people are either completely tall or completely short; and picture another column of one hundred more people, forty of them smart people randomly placed, selected from a universe where people are either completely smart or completely dumb. Then only about twelve paired rows will be persons tall and smart, respectively. Now, picture instead a column of varying beliefs in the tallness of one hundred people selected from a universe where people have the tallness trait distributed naturally, aligned from tall down to short, and another column of one hundred beliefs about persons, aligned from smart down to dumb. The beliefs concerning the thirtieth pair from the bottom will be not so tall and not so smart, respectively, while the twelfth pair from the bottom will be a diminutive dim couple.
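The first of these pictures can be simulated. A minimal sketch, where "about twelve" corresponds to the expected count 100 x .30 x .40:

```python
import random

def average_paired_heads(trials=2000, seed=1):
    """Pair a shuffled 30-head column of 100 coins with an independently
    shuffled 40-head column, and average the rows showing two heads."""
    rng = random.Random(seed)
    col_a = [1] * 30 + [0] * 70  # 30 heads among 100 coins
    col_b = [1] * 40 + [0] * 60  # 40 heads among 100 coins
    total = 0
    for _ in range(trials):
        rng.shuffle(col_a)
        rng.shuffle(col_b)
        total += sum(a * b for a, b in zip(col_a, col_b))
    return total / trials

print(average_paired_heads())  # close to 12, as the text says
```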
c. The Law's Choice
Traditional probability and degrees of truth do therefore differ. They behave differently in the conjunction setting. Put simply, the product rule gives the random chance of the simultaneous and independent occurrence of multiple crisp elements, while the MIN operator measures the intersection of sets. Once lawmakers have in mind the difference between the product rule and the MIN operator, they have to decide which the law should apply.
Imagine the law is trying to determine if Tom himself was at fault, that is, whether the perpetrator was Tom and whether the perpetrator was at fault. A number of features of this compound question indicate that the better approach for law is fuzzy logic. First, the two parts of the question are epistemically very different, one being a factual event and the other a legal construct; the law needs commensurable measures to combine them. Second, as already argued, the law should not be worrying too much about the chance of a truth value of 1 turning up; instead it should ascertain whether the element has a sufficient truth value. Third, in establishing past truths, the law should be even less concerned with the chance of a 1 repetitively turning up; applying the product rule to subjective probabilities for legal fact-finding actually seems illogical. (99)
That third point is indeed determinative. One can similarly make the point by distinguishing between ex ante probabilities, used to make predictions of what you will eventually know, and ex post probabilities, used to decide what actually happened even though you will never know for sure. If you are placing a bet predicting whether two randomly uncertain events will independently happen together, then multiply their probabilities. But if you are looking back to the past, then you need a different operator. You are no longer trying to figure the odds of two things being sure, but rather how sure you are that one thing happened while you remain somewhat sure that the other happened. "The ex post probability for complete instantiation of the causal law is equal to the lowest ex post probability for instantiation of any constituent element." (100)
In other words, we want to know if it is a reddish chair, not what the chances are that it is 100% a chair and also 100% red. We want to know if Tom's fault is sufficiently true, not the chances of somehow discovering both the perpetrator certainly to be Tom and the perpetrator to be completely at fault. Here is another way to see this. If Tom is 60% likely the perpetrator, and the perpetrator is 70% at fault, the 60% figure means that it is 60% likely that Tom is surely the person who was 70% at fault. We thus have a 60% chance of Tom's being legally at fault, just as the MIN rule would say.
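The contrast between the two operators can be made concrete. Here is a minimal Python sketch using the figures from the running hypothetical (60% identity, 70% fault); the variable names are mine, chosen for illustration:

```python
# Truth values from the running example, on a 0-to-1 scale.
identity = 0.6  # how well "the perpetrator was Tom" is established
fault = 0.7     # how well "the perpetrator was at fault" is established

# Product rule: the chance that two independent random events both
# turn out completely true (truth value 1).
product = identity * fault          # about 0.42

# MIN operator: degree of membership in the intersection of the two
# fuzzy sets, i.e., how provable the conjoined claim is.
conjunction = min(identity, fault)  # 0.6
```

The product rule thus answers a betting question about full instantiation, while the MIN operator answers how well the weakest link, and hence the conjunction, has been proven.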
While the MIN rule seems the obvious choice if identity is a matter of occurrence uncertainty and fault is a matter of imprecise vagueness, I think it still should apply even if both fact-finding percentages measure only random uncertainty. Having different operators for different kinds of elements, leading to some weird hybrid calculation unknown to current law, would be more than awkward. But I am arguing that the MIN rule is the right approach, not just a convenient one. A 60% chance of the weakest link represents the chance that all the other elements are more likely than not to exist. Because a 70% chance of fault is good enough for liability, we should not further account for that chance of finding complete fault. To multiply the chances, getting 42%, would be double counting, as it represents the chances of fully establishing both identity and fault. The chances of proving completely each of multiple elements simultaneously would be an odd inquiry when the law does not demand complete proof of any. Because establishing every element to 100% is not what the law calls for, the chances of doing so are irrelevant. The relevant inquiry comprises how likely the weakest element is, given that all the other elements would simultaneously be stronger.
Provability versus probability--Reactions of colleagues have convinced me that elaboration of this assertion, even in multiple ways, is necessary. So, up to this point, I have established that the MIN and product rules are both valid operators, but they govern in different realms. Which, then, should govern in applying the standard of proof?
In explaining the law's choice, let me begin with the contrary intuitive yearning to apply the product rule. If element A is 60% likely and element B is 70% likely, if both A and B either occurred or did not, and if the law wants to allow recovery only if A and B both occurred, then it does seem that the plaintiff should lose on this 42% showing.
My initial counterargument is that this result is tough on plaintiffs. Multiple elements would stack the deck against real-world plaintiffs, who must live with the imperfections of available evidence. Imagine some other plaintiff having proven four elements each to 70%. That plaintiff has done a really good job in presenting a strong case. The plaintiff has well established each element before passing to the next one. The plaintiff has done exactly what we should demand. Yet this plaintiff would lose with a miserable 24% showing under the product rule. What happened? How did a strong case become a sure loser? Regardless of how, plaintiffs apparently would lose strong cases they really should win. Moreover, defendants at fault would not be receiving a corrective message. These errors would detrimentally affect economic efficiency.
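The arithmetic of that four-element hypothetical is easy to check. A short Python sketch, using the 70% figure from the text:

```python
import math

# Four elements, each proven to 70% -- the hypothetical plaintiff
# who presented a uniformly strong case.
elements = [0.7, 0.7, 0.7, 0.7]

# Product rule: the chance of all four elements coming up
# completely true, about 24%.
product = math.prod(elements)

# MIN rule: the conjoined claim is only as provable as its
# weakest link, here 70%.
weakest = min(elements)
```

Under the product rule the showing collapses to roughly 24%; under the MIN rule it remains at 70%.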
Perhaps the law should not interest itself in the probability of the elements all turning out to equal 1, if only the veil on perfect knowledge were lifted. If every event and thought were somehow videotaped, then we would be partially living in an ascertainably bivalent world, and the law's approach to standards of proof might have to change. But I think the law should not imagine the real world to be a videotaped one. Adopting a false assumption simply in order to make a familiar math tool available is usually indefensible. The law should not ask the odds of the elements bivalently and conjoinedly existing on the videotape. The law instead should ask how well the burdened party has proven its case. The fact-finder needs to operate on the basis of its resulting internal beliefs about the world, rather than pretending that external knowledge is attainable.
Provability, not probability, is the law's concern. Forming a belief as to what happened, rather than a prediction about a veil-lifting that will never happen, is the aim. The law wants to know whether the state of our knowledge based on proof justifies recovery. Fuzzy logic vaguely deals with the "probably provable," while traditional probability crisply deals with the "provably probable." (101)
Expression of this provability comes in terms of membership, to a degree, in the set of true statements, with the degree measured as a truth value. Provability of one element does not detract from the provability of another. An easily provable A and an easily provable B mean that it will be easy to prove A and B. The intersection of sets represents the interaction of elements, and the MIN rule governs the intersection of sets. Consequently, if the plaintiff proves element A to 60% and B to 70%, then the provability that the case is in the set of (A AND B) is 60%.
The plaintiff has shown that the conjoined claim is 60% provable and the defense 40% provable. That is, the belief in the claim is stronger than its negation (the belief that one or the other element or both elements failed). To minimize errors, the law should decide in conformity with the stronger belief. If the law were to deny liability in these circumstances because of some attraction to bivalent probability theory, more often than not the law would be wrong. Giving the plaintiff a recovery and the defendant a loss thus is economically efficient. Accordingly, the law should and does instruct the use of the mathematically sound way to combine beliefs, here the MIN rule.
This key distinction between probability and provability was at the heart of Oxford philosopher L. Jonathan Cohen's almost impenetrably brilliant book entitled The Probable and the Provable. (102) He argued that the task of the law court is to decide, by use of inductive reasoning, what is provable. Importing traditional probability into the project, such as the product rule, produces a whole series of anomalies. Instead, the conjunction rule for inductive reasoning is this: "The conjunction of two or more propositions ... has the same inductive probability ... as the least" likely conjunct. (103)
Thus, the respective realms of the MIN and product rules do not turn on the nature of the fact issue, but on the question the system wishes to pose. Which image fits the fact-finding endeavor: the betting table or set theory? I think that standards of proof are looking for provability based on set theory. They therefore take the same approach to facts involving occurrence uncertainty as they do to facts involving vagueness.
Monty Hall's contribution--Why do smart people so resist accepting that provability differs from probability? "When the plaintiff proves one element to 60% and another to 70%, their conjunction is 42%--and I am sticking to it!" This reaction irresistibly brings to mind the usual reaction to the celebrated Monty Hall problem. "It is customary for books about probability to try to persuade otherwise intelligent people that they are lousy when it comes to reasoning about uncertainty.... In presenting the Monty Hall problem to students I have found the common reactions to follow the well-known five stages of grief." (104) There is denial, anger, bargaining, depression, and then acceptance.
Consider the related "sibling gender problem." (105) A few years back you saw Tom walking down the street with his son. Your companion said that she remembers he has two children. What are the chances that the other child is a boy? The answer is one-third, because the equally probable possibilities are BB, BG, and GB. But if your companion had said that the elder child was a boy, the answer would be one-half! The additional information, seemingly irrelevant, provides ordering that affects the odds.
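The sibling gender problem yields to brute-force enumeration. A minimal Python sketch, assuming the four equally likely birth-order combinations:

```python
from itertools import product

# All equally likely two-child families, ordered elder-then-younger.
families = list(product("BG", repeat=2))  # BB, BG, GB, GG

# Condition: you know only that at least one child is a boy.
some_boy = [f for f in families if "B" in f]
p_both_boys = sum(f == ("B", "B") for f in some_boy) / len(some_boy)
# 1/3: the remaining possibilities are BB, BG, GB

# Condition: you know the ELDER child is a boy -- the seemingly
# irrelevant ordering information changes the odds.
elder_boy = [f for f in families if f[0] == "B"]
p_both_boys_ordered = sum(f == ("B", "B") for f in elder_boy) / len(elder_boy)
# 1/2: the remaining possibilities are BB, BG
```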
After you have progressed through a couple of the stages of grief toward acceptance of that result, (106) consider that traditional probability is generating all those emotions. When a problem calls for rejecting bivalence, you should expect that sometimes the answer will be similarly nonobvious. For example, reconsider a plaintiff trying to prove that the perpetrator was Tom and also that the perpetrator was at fault. True, if the randomized odds are 60% and 70%, the odds of Tom being at fault are 42%. The product rule gives that result. But if the plaintiff has proved fault to 70%, the odds on the remaining question of Tom being the perpetrator are 60%. The MIN rule sets the likelihood of the conjunction at 60%.
Using the setting of the more familiar "Bertrand box paradox" (107) for elaboration of the shift to multivalence, imagine that there is an identity box and a fault box, each containing a ball that is either black for liability or white for non-liability. The two balls came, respectively, from an urn with 600 of 1000 balls being black and from another urn with 700 of 1000 being black. The odds of the two balls both being black are 42%. But if you uncover or otherwise decide that the fault box has a black ball, the odds of the identity ball being black are 60%.
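The box paradox, too, yields to simple counting. A Python sketch of the urn setup just described (600 and 700 black balls per 1000):

```python
# Urn setup from the text: an identity box and a fault box, each
# holding one ball drawn from its urn (black = liability).
black_identity = 600    # black balls per 1000 in the identity urn
black_fault = 700       # black balls per 1000 in the fault urn
pairings = 1000 * 1000  # equally likely ball pairings

# Randomized odds that BOTH balls are black: the 42% figure.
p_both_black = (black_identity * black_fault) / pairings

# Once the fault box is known (or decided) to hold a black ball,
# only the pairings with a black fault ball remain relevant,
# and the identity ball is black in 60% of them.
p_identity_given_fault = (black_identity * black_fault) / (black_fault * 1000)
```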
What is going on? The adherents of 42% are assuming that the pairings are randomized. But in inductively proving a case--by establishing two truth values or fuzzy provabilities--the plaintiff was ordering the information. (108) The plaintiff thereby removed the randomization feedback loop between the boxes, a randomization of information essential to the 42% calculation.
Under its standard of proof, the law has decided to act on the basis of a partially proved case, not on the basis of the probability of a fully proved case. Fault proven to 70% will never convert to 1 or 0. That means that 30% of the results are not findings of nonfault, but erroneous failures to find fault. Fault having satisfied the standard of proof, the 30% of pairings with nonfault then become errors. To minimize errors, the fact-finder should consider only the 70% of pairings of identity with established fault. Because 60% of those pairings will result in liability and 40% not, deciding in line with the 60% showing will minimize errors and optimize efficiency. (109)
Role of assumptions--There lies the key to the paradox. If one assumes bivalence, then one must convert fault to 1 or 0 before proceeding. If one instead recognizes multivalence, one can proceed with fault standing as a partial truth. The mathematically sound way to conjoin partial truths is the MIN rule. Therefore, recognition that the plaintiff can prove any element only to a degree produces an element-by-element approach.
Instinctive resistance to the MIN rule derives from residual yearning to apply a multiplicative rule of traditional probability to a problem it cannot handle, the problem of fuzzy provability. It can handle only randomly uncertain estimates of independent events in a binary world, because it is built on the logical assumption of bivalence. When an assumption no longer prevails, one cannot apply the rules built on the assumption. We tend to forget that mathematical constructs operate only within their assumed system, and that the probability calculus assumes all events will take a value of either 1 or 0. Multivalence calls for new math. We must move up to MIN.
There is more at work in obscuring the picture. Even if one acknowledges that the multiplicative rule should apply only in an abstract world of bivalence, one will not intuitively sense the subtle shift in a problem's setting from a bivalent assumption to a multivalent reality in which the middle is no longer excluded. The shift can be almost imperceptible. But when the problem is finding facts of which one will never be sure, the picture must be painted in multivalent grays.
The bottom line is this: as an artifact of bivalence, the product rule does not apply to subjective probabilities for fact-finding. I therefore submit that degrees of truth behave more appropriately than classical logic and probability theory for the purposes of the standard of proof. (110) But, again, my central question is which representation the law employs. Here, as I have already said, and as I shall show in the upcoming resolution of the conjunction paradox, I am confident that it is degrees of truth across the board.
3. Negation Operator
Accepting the usefulness of fuzzy logic prompts interest in other fuzzy operators. Another basic one is negation. Here truth(not x) = (1 - truth(x)), just as in classical logic. The negation is the complement. (111)
However, the whole fuzzy set A and its complement do not necessarily add to unity, because fuzzy logic does not obey the law of the excluded middle. The following figure demonstrates this fact, with the left graph representing the fuzzy set A by the solid line and its complement by the dotted line, and the right graph's dark upper line representing the union of A and its complement by operation of the MAX function: (112)

[Figure not reproduced.]
The serrations in the upper line in the right-hand graph show that A and its complement do not add to equal the universe X, reflecting that the law of the excluded middle does not hold. There will be an area where beliefs are neither in the set of belief nor in the set of disbelief, but instead are indeterminate.
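The failure of the excluded middle is easy to illustrate numerically. A minimal Python sketch (the function names are mine, not standard fuzzy-logic library calls):

```python
def f_not(t):
    """Fuzzy negation: truth(not x) = 1 - truth(x)."""
    return 1 - t

def f_or(a, b):
    """Fuzzy union via the MAX operator."""
    return max(a, b)

# With crisp (bivalent) truth values, x OR not-x is always fully true:
assert f_or(1, f_not(1)) == 1
assert f_or(0, f_not(0)) == 1

# With a partial truth value, the union of A and its complement
# falls short of the whole universe -- the serrations in the graph.
t = 0.7
union = f_or(t, f_not(t))  # 0.7, not 1
```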
Title Annotation: Introduction to II. Conjoining Assessments, A. Fuzzy Operators, pp. 1061-1106
Author: Clermont, Kevin M.
Publication: Notre Dame Law Review
Date: Feb. 1, 2013