Death of paradox: the killer logic beneath the standards of proof.
The payoff of the fuzzy logic approach emerges as one realizes how it affects the view of the proof process. Consider the best-known statement of the infamous conjunction paradox:
We purport to decide civil cases according to a more-probable-than-not standard of proof. We would expect this standard to take into account the rule of conjunction, which states that the probability of two independent events occurring together is the product of the probability of each event occurring separately. The rule of conjunction dictates that in a case comprised of two independent elements the plaintiff must prove each element to a much greater degree than 50%: only then will the plaintiff have shown that the probability that the two elements occurred together exceeds 50%. Suppose, for example, that a plaintiff must prove both causation and fault and that these two elements are independent. If the plaintiff shows that causation is 60% probable and fault is 60% probable, then he apparently would have failed to satisfy the civil standard of proof because the probability that the defendant both acted negligently and caused injury is only 36%. In our legal system, however, jurors do not consider whether it is more probable than not that all elements occurred in conjunction. Judges instruct jurors to decide civil cases element by element, with each element decided on a more-probable-than-not basis. Once jurors have decided that an element is probable, they are to consider the element established, repress any remaining doubts about it, and proceed to consider the next element. If the plaintiff proves each element by a preponderance of the evidence, the jury will find in his favor.... Thus, jurors may find a defendant liable even if it is highly unlikely that he acted negligently, that is, the conjoined probability of the elements is much less than 50%. In such cases, the verdict fails to reflect a probable account of what happened and thus fails to minimize the cost of judicial errors....
Although courts direct juries to consider and decide each element seriatim, juries do not consider each item of evidence seriatim when deciding whether a given element is proved. The jury must decide each element by looking at all of the evidence bearing on proof of that element. Thus, although the jury does not assess the conjunction of the elements of a case, it does decide each element by assessing the conjunction of the evidence for it. (113)
The implications are profound but boggling. Allowing recovery on a 36% showing of causation and fault is not only unfair but inefficient. How embarrassing for the law!
For another boggle, ponder the apparent criticality of how exactly the ancients (and moderns) divided our causes of action and defenses into elements: the more subdivisions, the lower the conjunctive probability that would produce victory. (114) And yet:
Anyone who has ever litigated a real case knows the exact opposite of the conjunction paradox is true: the more disputed elements the plaintiff has to prove, the less likely the plaintiff is to prevail .... [A]lthough it is possible that a particular plaintiff could obtain an unjust verdict in a case with several disputed elements, [there is an increased] probability that the jury will find at least one element to be less likely than not. (115)
Admittedly, the conjunction paradox turns out to be not such a serious problem in practice. Only one element might be in dispute, or the disputed elements might not be really independent. The judge might not clearly state, or the jury might not fully understand, the proper element-by-element approach to the standard of proof.
Or, because humans might tend to construct a story for the whole case rather than proceeding element-by-element, the fact-finder might end up applying the standard of proof to the conjoined elements. In fact, many psychologists agree that the fact-finder naturally constructs such stories, although perhaps not in a very systematic manner. (116) The broadly accepted story model of evidence processing holds that the fact-finder, over the trial process's course, constructs from the evidence the story that makes maximal sense; and the fact-finder then chooses, among the available decisions, the one that fits best with the constructed story:
Several authors have recently proposed a model for juror decision-making based on the concept of a story as an organizing and interpreting schema. The story model attempts to explain how jurors organize and interpret the vast amount of information they encounter at trial and apply the appropriate decision criteria.... The jurors construct a story adequately describing what happened. At the conclusion of the trial, they construct the verdict categories based on the instructions given by the judge. The individual juror arrives at his decision by determining the best match between his story and the available verdict categories. The task of the jury in deliberations then becomes one of selecting a story from among those offered by the jurors and fitting it to the available verdict options. (117)
If the jurors construct a story (or stories (118)) for the whole case, or otherwise cognitively process the entirety while the trial progresses, and then the judge instructs on standard of proof, it might be that the jurors actually apply the standard to the whole claim or defense. It might also be that, being human, a judge when acting as fact-finder proceeds in essentially the same manner, testing whether the already conjoined elements are more likely than not.
Indeed, by providing obscure instructions only at the end of oral trials, the law seems determined to encourage overall consideration and to discourage applying the standard of proof element-by-element. Although the judge does instruct literally in element-by-element terms, (119) this may work only to encourage the jurors' detailed evaluation of the evidence and to stress the requirement that any story must contain all of a series of elements--just as many evidence rules may work to brake any undesirable tendency of the fact-finder to rush toward creating a story. (120)
So, the conjunction paradox may not inflict great practical effects. Nonetheless, the big theoretical problem of the conjunction paradox will unavoidably pose at least some practical difficulties. The law sometimes enforces its element-by-element theory and thereby impedes the holistic practice. An obvious example would be when the judge requires a special verdict that asks the jury to find each element by a preponderance. (121) The conjunction paradox therefore remains troubling, and theorists twist themselves into pretzels trying to explain it away.
It would be troubling, however, only if theory really calls for the product rule. But theory does not. Instead, it invokes the MIN rule. The truth of the conjunction equals the minimum of the truths of the elements. If each element is more likely than not, then the truth of the conjunction is more likely than not. To use the above example, if the plaintiff shows that fault is .60 true and that causation is .60 true, then he has shown to .60 that the defendant both acted negligently and caused the injury.
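The arithmetic contrast between the two conjunction rules can be sketched in a few lines of Python. This is purely illustrative (the variable names are mine, not the law's), using the .60/.60 example above:

```python
# The example's two elements, each shown to .60.
fault, causation = 0.60, 0.60

# Traditional probability conjoins independent elements by the product rule.
product_rule = fault * causation        # rounds to 0.36 -- below preponderance

# Fuzzy logic conjoins by the MIN rule: the conjunction is exactly
# as strong as its weakest conjunct.
min_rule = min(fault, causation)        # 0.6 -- still above preponderance
```

Under the product rule the conjoined showing falls below the .5 threshold; under the MIN rule it stays at .60 and passes, which is just what the element-by-element instruction implies.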
Thus, there is no conjunction paradox. It implodes under the force of fuzzy logic. The MIN operator provides that belief in the conjunction will match the least likely element, which has already passed the standard of proof. The MAX operator meanwhile indicates that belief in the negative of the conjunction, that is, in the disjunction of each element's negation, will never reach equipoise. The story of liability will not only be the most believable story, but will be more believable than all the stories of non-liability combined.
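The MAX half of that claim can be checked the same way (again an illustrative sketch with variable names of my own choosing):

```python
# Each element of liability has been shown to .60.
fault, causation = 0.60, 0.60

# Fuzzy negation of each element.
not_fault, not_causation = 1 - fault, 1 - causation   # each 0.4

# The MAX rule scores the disjunction of the negations -- the best
# story of non-liability -- at its strongest disjunct.
non_liability = max(not_fault, not_causation)         # 0.4, short of equipoise
```

Because every element passed .5, every negation sits below .5, so the MAX of the negations can never reach equipoise.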
Comfortingly, under the MIN rule, applying the standard of proof element-by-element works out to be equivalent to applying it to the whole conjoined story. So, if the fact-finder actually does follow the story model, that practice would not directly endanger the standard of proof. The apparent criticality of the number of elements melts away too. Because the MIN rule applies to each set of evidence to be conjoined, it does not matter where the law draws formal lines between elements, or whether the elements are independent or interdependent. Nor does it matter if I sloppily labeled identity as an "element" in my examples above. (122)
Moreover, the proof process within elements is not dissimilar to the proof process between elements. Within elements, the fact-finder uses intuitive techniques in a non-quantitative and approximate fashion. Between elements, and for separate facts within elements, the fact-finder uses the fuzzy operator for conjunction that works in a similar style.
The law does seem to know what it is doing, then. Whenever it phrases its instruction to require applying the standard of proof element-by-element, it is instructing to apply the MIN operator. But do actual fact-finders apply the MIN operator as they should and as the law tells them to do? We do not know. Some experimental evidence arguably suggests that the lay person tends to apply the product rule rather than the MIN operator. (123) Nevertheless, no sign exists that fact-finders in the legal system are using the product rule. After all, a concern that they were ignoring the product rule generated the unfounded fear of the conjunction paradox in the first place.
Theorists also claim there is a converse paradox, involving multiple theories. These observers lament that the law denies relief to a supposedly deserving plaintiff (or to a defendant with multiple defenses almost proved):
Consider a case involving three different legal theories and three different factual foundations. Plaintiffs deserve to win if one of the stories embodying one legal theory is true; defendants deserve to win only if all of their competing stories are true (for if this is false, one of the plaintiff's stories is true). For example, assume the plaintiff has alleged defective design, defective manufacture, and failure to warn theories. If the probability of each is .25, the "probability" of each not being true is .75, but, the probability of at least one being true is 1 − (.75)³ = .58, and perhaps plaintiff should win, even though the individual probabilities of each being false is .75. (124)
However, this paradox implodes under the force of fuzzy logic too. The MAX rule indicates that the plaintiff proved his case to only .25 and so should lose, just as the plaintiff does lose under current law.
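The two competing computations in the quoted example can be laid side by side in a short sketch (the numbers are the quotation's; the variable names are mine):

```python
p = 0.25   # each of the three independent theories

# Traditional probability: chance that at least one theory is true,
# i.e., one minus the chance that all three are false.
at_least_one = 1 - (1 - p) ** 3        # 0.578125, the quoted .58

# Fuzzy MAX rule: the disjunction scores only as high as its
# strongest disjunct, so the plaintiff has proved the case to .25.
max_rule = max(p, p, p)                # 0.25
```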
Let me use this multiple-theory, or aggregation, paradox in trying further to explain what is not intuitive, and what is therefore difficult to explain. Imagine a claim A that is 25% likely to succeed and an independent claim B that is 40% likely to succeed. I am saying that the likelihood of (A OR B) is equal to that of the likeliest of A and B, that is, 40%. It is easy to say that I am being obtuse, because anyone can intuit that the probability of A and B's union must be higher. But the fact that the law says otherwise, and that a widely accepted logic system says otherwise, should give pause to intuition.
First, imagine a claim A for tort, where the defendant did acts that were 25% bad, and an independent claim B for contract, where the defendant did acts that constituted 40% of what would be an unquestionable breach. These are members of fuzzy sets. The breaching quality of the acts has no bearing on the tortiousness of the acts. Then we can say only that the defendant went 40% of the way toward liability. The 25% showing has no effect on the 40% showing. The "likelihood" of their union is 40%.
Second, imagine a ball A drawn from an urn with only 25% black balls among white balls, and a ball B drawn from another urn with 40% black balls. The odds that one of them is black (and let us say that black represents liability), when the drawn balls are both revealed, are 55% by De Morgan's rule. In a sense, the act of revealing the balls to be white or black affects the odds, because the balls must be either white or black and only one has to be black. If one turns up black, this takes the pressure off the other's being black.
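The urn arithmetic, for comparison, is the standard De Morgan computation (an illustrative sketch; names mine):

```python
# Black-ball shares in the two urns.
p_black_a, p_black_b = 0.25, 0.40

# Probability that at least one revealed ball is black:
# one minus the probability that both are white.
at_least_one_black = 1 - (1 - p_black_a) * (1 - p_black_b)   # rounds to 0.55
```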
Third, imagine a claim A for tort, where the defendant was actually 25% likely to have done the bad tortious acts alleged, and an independent claim B for contract, where the defendant was actually 40% likely to have done the bad breaching acts alleged. Assume that all available evidence would not change those numbers, which is what I mean by "actually." Is this situation more like the first or the second situation? All our intuitions, honed by life-long exposure to traditional probability theory, point us to the second. But the real world of uncertainty and imprecision makes the appropriate analogy the first situation. The likelihood that A exists has no effect on whether B exists. More complete evidence will not arrive to change the likelihoods--and we cannot pull off a veil to show what really happened, we will not get to see if the "ball" is truly black, we will never get to reduce the world to bivalence. It was the reduction to bivalence that affected the joint odds in the second situation. The likelihoods in the third situation will never be anything but 25% and 40%, unlike the drawn balls whose probabilities will change upon unveiling. We have nothing more than 25% of a claim and 40% of another claim, when only a provable claim justifies liability. If we can never convert the likelihood of a claim to one or zero, then all we can say is that the defendant is liable to a certain degree. Thus, when we can never know with certainty what happened, a likelihood of occurrence is not different from a degree of misfeasance: now, likelihood of occurrence is not a traditional probability; it is a fuzzy set.
III. ANALYZING BELIEFS
The ultimate focus on fuzzy provability pushed traditional probability farther into the background, setting the stage for a shift of focus onto beliefs as being at the core of the standards of proof. This Part will introduce belief functions into the mix, in order better to represent how imperfect evidence keeps fact-finders from committing all of their belief. Then, this Part will use this theory to explain why the law's initial burden of production starts the fact-finders at point zero. While the key idea introduced heretofore has been multi-valence, the key idea henceforth will be the non-additivity of beliefs.
A. Shafer's Belief Functions
1. Basics of Theory
I have already implicitly advocated that we treat the degree of S's truth, which is a degree of membership in the set of true facts, as a degree of belief in S as a true proposition. The broad version of the theory of belief functions will now give us a handle on how to manipulate such beliefs. (125) It will also provide us with a better mental image for representing indeterminacy. (126)
In fact-finding, I therefore contend, we should not ask how likely S is but rather how much we believe S to be a real-world truth based on the evidence, as well as how much we believe notS--while remaining conscious of indeterminacy and so recognizing that part of our belief will remain uncommitted. Beliefs can range anywhere between 0 and 1. If the belief in S is called Bel(S), then 0 ≤ Bel(S) ≤ 1.
Consider belief function theory's treatment of a single factual hypothesis. Take as an example the issue of whether Katie is dead or alive, with S representing death. Although you have no tangible evidence, three witnesses said she is dead. One seems somewhat credible. But you think that another saw a different woman's body, which discounts the evidence of death but gives no support to her being alive. And you think that the third was lying as part of a cover-up of her escape from captivity, which is compatible with both S and notS and so gives some thin support to her being alive. In sum, this evidence supports your .5 belief that she is dead, or Bel(S). That evidence also supports your weaker belief that she is alive, with Bel (notS) coming in at .2. That is, Bel(notS) is not determined by the value of Bel(S). The remaining .3 is indeterminate, meaning she could be either alive or dead because the evidence is imperfect. The defects in evidence might be probative, affecting Bel(S) and Bel(notS); but the defects might be nonprobative, so that they just leave some belief uncommitted. (This example actually involves a so-called power set of four beliefs: S, notS, neither S nor notS, and either S or notS. The belief in the "null" of neither alive nor dead is set by definition to be 0. The belief in the "catchall" of either alive or dead is 1.0.)
Belief is sometimes called the lower probability. Bel(S) is the extent to which you believe Katie to be dead. The upper probability bound represents "possibility" in Zadeh's terminology or "plausibility" in Shafer's. (127) It is the extent to which you think her being dead is plausible, that is, the sum of the affirmative belief plus the indeterminate belief. The plausibility that she is dead is .8, being .5 + .3. (A traditionally expressed probability of her being dead would fall somewhere within the range from the lower to the upper probability.) The plausibility that she is alive totals .5, being .2 + .3. Plausibility equals one minus the belief in the opposite.
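The Katie example's numbers fit in a short sketch (Python, purely illustrative; the variable names are mine):

```python
# Committed beliefs drawn from the evidence.
bel_dead  = 0.5    # Bel(S): belief that Katie is dead
bel_alive = 0.2    # Bel(notS): belief that she is alive

# The remainder of belief is left indeterminate by the imperfect evidence.
uncommitted = 1 - bel_dead - bel_alive   # .3

# Plausibility: committed belief plus the indeterminate remainder,
# equivalently one minus the belief in the opposite.
pl_dead  = bel_dead + uncommitted        # .5 + .3 = .8
pl_alive = 1 - bel_dead                  # .2 + .3 = .5
```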
Belief functions thus harness the idea of imprecise probability to capture indeterminacy. Although they can be used with ordinary expressions of probability, combining belief functions with fuzzy logic's degrees of truth and its operators makes an even bigger step toward understanding. (128) The resultant beliefs can be expressed, if expression is ever necessary, as coarsely gradated beliefs. In addition to the benefits of utilizing natural language, these terms capture the uncertainty and imprecision in determining the belief. Thus, in lieu of expressing beliefs in terms of decimals, one should use the coarse gradations of (1) slightest possibility, (2) reasonable possibility, (3) substantial possibility, (4) equipoise, (5) probability, (6) high probability, and (7) almost certainty.
In the end, the representation of findings in the form of beliefs captures the effect of imperfect evidence, which was a rallying cry of Baconian theorists. (129) The shift from probability to belief is also a slight nod to the civil-law emphasis on inner belief as captured by its intime conviction standard, (130) and to the frequent cris de coeur of theorists who lament any intrusion of probabilistic mathematics into the very human process of proof. (131) Finally, belief functions can make a contribution to understanding law independently of fuzzy theory, as I shall try to show.
2. Negation Operator
By traditional probability theory, the probability of a hypothesis's negation equals 1 minus the probability of the hypothesis. If Katie is 60% likely dead, she is 40% likely alive.
Under the scheme of belief functions, Bel(S) and Bel(notS) do not necessarily add to 1, because normally some belief remains uncommitted. Thus, for Katie, Bel(S) = .5 and Bel(notS) = .2, so the determinate beliefs sum to .7. We are now squarely in the realm of non-additive beliefs.
The complement of Bel(S) equals (1 - Bel(S)), but it gives the plausibility of notS, not the belief in notS. Indeed, the plausibility of notS equals (Bel(notS) + uncommitted belief). Hence, there is a big difference between the complement and the belief in the negation: the difference is the uncommitted belief. Belief function theory thus utilizes the very useful distinction between a lack of belief and a disbelief. After all, disbelief and lack of belief are entirely different states of mind. (132)
3. Lack of Proof
Traditional probability encounters legendary difficulties with a state of ignorance. (133) The reason is that it cannot distinguish between lack of belief and disbelief. In classical terms, P(S) = 0 means that S is impossible, and hence that notS is certain. No amount of evidence could alter an impossibility or a certainty into a possibility under Bayes' theorem. (134) As a way out, probabilists sometimes assert that the ignorant inquirer should start in the middle where the probabilities of S and notS are both 50%. But this trick does not accord with the actual probabilities, and it produces inconsistencies when there are more than two hypotheses in play. (135)
Meanwhile, one of the great strengths of belief function theory is that it well represents a state of ignorance. (136) An inquirer, if ignorant, starts at zero, not at a 50% belief. When Bel(S)=0, it does not mean that S is so highly unlikely as to be impossible. It means there is no evidence in support. Accordingly, the inquirer starts out with everything indeterminate, because the lack of evidence makes one withhold all of one's belief. Although Bel(S)=0, Bel(notS) equals zero too. The uncommitted belief is the entirety or 1, meaning that S is completely plausible, as is notS. In other words, the inquirer does not believe or disbelieve S. Belief function theory thus utilizes the very useful distinction between disproof and lack of proof.
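Ignorance is then directly representable (a sketch in the same illustrative conventions as above; names mine):

```python
# No evidence at all: both committed beliefs start at zero.
bel_S, bel_notS = 0.0, 0.0

# The entirety of belief is withheld.
uncommitted = 1 - bel_S - bel_notS   # 1.0

# Yet both hypotheses remain completely plausible.
pl_S    = 1 - bel_notS               # 1.0
pl_notS = 1 - bel_S                  # 1.0
```

Bel(S) = 0 here records a lack of proof, not an impossibility: nothing blocks later evidence from raising either belief.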
B. Legal Application: Burden of Production
Let me start with some background on how the law has traditionally viewed the burden of proof, say, in a jury trial. The burden of proof dictates who must produce evidence and ultimately persuade the fact-finder on which elements of the case. Burden of proof thus encompasses two concepts: burden of production and burden of persuasion. The burden of production might require either party at a given time during trial to produce evidence on an element or suffer the judge's adverse determination on that element; one party has the initial burden of production on any particular element, but that burden may shift during the trial if that party produces certain kinds or strengths of evidence. The burden of persuasion requires a certain party ultimately to persuade the fact-finder of the truth of an element or suffer adverse determination on that element.
Imagine a single disputed issue of typical fact on which the plaintiff bears the initial burden of production and the burden of persuasion. Then imagine a grid representing the judge's disagreement with a potential verdict for the plaintiff, or equivalently the judge's view of likelihood of error in such a verdict, with disagreement or likelihood decreasing from one on the left to zero on the right. (137) It is important to realize that this diagram represents the likelihood of jury error in finding that the disputed fact exists, not the judge's view of the evidential likelihood that the disputed fact exists. In other words, this diagram represents the judge's thought process in externally overseeing the jury that acts as fact-finder, not the judge's thought process as if the judge were finding facts. Alternatively stated, this diagram represents the burden of production, not the burden of persuasion.
The plaintiff in the imagined case starts at the left of the diagram. If he presents no evidence, the judge would ordinarily grant a motion for judgment as a matter of law against him. He is consequently bound to go forward with his evidence until he satisfies the judge that a reasonable jury would be warranted in finding for him. That is, he must get to line X in order to make a jury question of the imagined single issue of fact, doing so by presenting evidence. The plaintiff's getting to or beyond line X means that although the judge might still disagree with a verdict for the plaintiff, the judge thinks a reasonable jury could find that the plaintiff sustained his persuasion-burden, and therefore the judge will hold that the plaintiff sustained his production-burden. If the plaintiff does not get to line X, that means that the judge would so vehemently disagree with a verdict for the plaintiff as to consider the jury irrational, and so the judge can grant the motion for judgment as a matter of law. Line X, again, represents the judge's view on the limit of rationality in the jury's finding for the plaintiff, rather than the judge's view of the evidential likelihood that the disputed fact exists. For example, if the judge disbelieved all of the plaintiff's abundant evidence, but still acknowledged that a reasonable jury could believe it, then the judge should rule that the plaintiff has carried his production-burden, because a reasonable jury could conclude that the plaintiff sustained his persuasion-burden.
This diagrammatic scheme works pretty well to represent the law's approach. Moreover, the diagram helps in understanding other concepts and special rules. A permissive inference (and res ipsa loquitur is one in the view of most courts (138)) describes an inference that a jury is authorized but not required to draw from certain evidence; in other words, the inference satisfies the plaintiff's production-burden by getting the case to line X, although not beyond line Y. A true presumption (such as the presumption against suicide as the cause of death) shifts the burden of production to the opponent after the introduction of the evidential premise; in other words, the presumption puts the case to the right of line Y and so requires the jury to find the presumed fact, unless the opponent introduces enough evidence to carry her production-burden and push the case at least back into the jury zone between Y and X. (139)
Among special rules, certain kinds of evidence will not satisfy an initial burden of production. To satisfy that burden, the burdened party cannot rely on the opponent's failure to testify, (140) on mere disbelief of the opposing testimony, (141) or on demeanor evidence drawn from the opponent's testimony. (142) Similarly, naked statistical evidence normally will not satisfy the initial burden of production. (143) However, any of these kinds of evidence is perfectly proper to introduce as a supplement to positive evidence that satisfies the initial burden of production. (144) The idea behind these special rules is that they are necessary to protect the notion of an initial burden of production, which serves to facilitate early termination of weak claims or defenses, to safeguard against irrational error, and to effectuate other process and outcome values. (145) In the absence of these special rules, any burdened party could produce enough evidence to reach the jury, this evidence possibly being merely in the form of silence, disbelief, demeanor, or general statistics (such as that the defendant manufactured 60% of the supply of the injury-causing device of unknown provenance). Perhaps we harbor a special fear of the jury's mishandling of such evidence when undiluted by other admitted evidence and consequently rendering an unreasoned verdict for the proponent based either on prejudice without regard to the evidence or on undue deference to such bewildering evidence. To avoid such an outcome, and to ensure that the burden of production means something, the judge should require sufficient evidence of other kinds. Once the proponent clears that hurdle, the tribunal should allow the feared evidence its probative effect.
At first glance, this whole accepted scheme seems fairly compatible with traditional probability. One diagrammatic qualification coming from the new logic would be that representing the judge's view of jury error as a fuzzy interval rather than a point would better capture reality.
But the biggest difficulty for traditional probability is fixing the starting point. The probabilist might assume that when you know nothing, the rational starting point is 50% (thus, many a Bayesian would make 50% the initial prior probability). Indeed, some experimental evidence indicates that lay people do tend to start at 50%. (146) Then, if the plaintiff offers a feather's weight of evidence, he would thereby carry not only his burden of production but also his burden of persuasion.
The real-life judge, however, hands only defeat to the plaintiff with nothing more than a feather's weight of evidence, and does so by summary means. Why is that? The law says that we should start not at 50% but at the far left, and to get to X requires more than a feather's weight. The proper representation of lack of proof is zero belief in the plaintiff's position, but also zero belief in the defendant's position. The full range of belief is properly uncommitted. That insight makes sense of the notion of the burden of production. It also suggests that, in starting at zero belief, the law is proceeding by belief function theory.
IV. APPLYING STANDARDS
This Part will introduce the idea of comparing belief and disbelief of a fact, which the fact-finder would do after putting any indeterminate belief aside. Then, this Part will demonstrate how the law already conceives of its three standards of proof as different ways of so comparing belief and disbelief.
A. Comparison of Beliefs
My conceptualization has thus far led me to think that the law should not and does not employ the traditional academic view of the proof process resting on a two-valued logical approach. Fact-finders instead determine their beliefs as fuzzy degrees of real-world truth based on the evidence, just as the law expects of them. Eventually they end up with Bel(S) and Bel(notS), falling between 0 and 1, but not necessarily adding to 1. What then do they do?
So, finally, I come to the matter of applying a standard of decision. The law dictates that fact-finders decide by subjecting their fuzzy beliefs to a standard of proof in order to come to an unambiguous output. That is, at this point the law forces fact-finders back into what looks like a two-valued logic, by forcing them to decide for one party or the other. (147) Such disambiguation is not a practice unique to law. All fuzzy computer programs end with a step that produces an unambiguous output, a step called defuzzification. (148)
Application of a standard of proof is a different step from evidential argument; the academic disputes as to standards do not overlap with the disputes over how to assess, conjoin, and analyze evidence. (149) Psychologists have contributed almost nothing here, (150) leaving the dispute to logicians so far.
As to the psychology involved, I assume only that people, if told to do so, can apply a simple standard of proof imposed by law. Similarly, in the absence of studies to the contrary, I believe jurors and others will try to do so. Thus, it matters what the law says about standards.
On the logic front, I contend that speaking in terms of two-valued logic tends to mislead on standards, just as it does elsewhere. Admittedly, the determined theorist could pursue the two-valued image of traditional probability. Then the ultimate task of applying a standard of proof would unavoidably involve placement on a scale of likelihood. (151)
A better understanding of standards of proof would result from thinking in terms of many-valued logic and belief functions, however. Even though decision-making requires converting from a many-valued logic to an output that sounds two-valued, the law does not need to require enough evidence to make the fact more likely than 50% or whatever. The path to decision might involve only comparing Bel(S) and Bel(notS) while ignoring the indeterminate belief. All the fact-finder need do is compare the strengths of belief and disbelief. By requiring only a comparison, belief functions would never require placement on a scale of likelihood. (152)
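A comparative standard of that kind reduces to a one-line test. The following is a hypothetical sketch of the idea, not a statement of legal doctrine; the function name and numbers are mine:

```python
def preponderance(bel_s: float, bel_not_s: float) -> str:
    """Decide by comparing belief and disbelief, ignoring any
    uncommitted (indeterminate) remainder of belief."""
    return "plaintiff" if bel_s > bel_not_s else "defendant"

# Even with half the belief uncommitted, the comparison decides:
verdict = preponderance(0.3, 0.2)    # "plaintiff" -- .3 exceeds .2
```

Here Bel(S) = .3 never crosses a 50% likelihood threshold, yet it exceeds Bel(notS) = .2, so the burdened party prevails under the comparative reading without any placement on a scale of likelihood.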
B. Legal Application: Burden of Persuasion
1. Traditional View
In going from discussing the burden of production to explaining the academic view of the burden of persuasion, I need to use a different diagram, one that represents the internal thought process of the fact-finder in ultimately weighing the evidence. The grid now measures the fact-finder's view of the evidential likelihood that the disputed fact exists, with likelihood increasing from 0% on the left to 100% on the right.
The plaintiff in an imagined civil trial again starts at the left. By presenting evidence on the issue, he must get beyond the midpoint to win. That is, he must show that it is more likely than not that the disputed fact exists. If, after the plaintiff has given his best shot, the fact-finder thinks that he has not passed the 50% line, then the fact-finder should decide for the defendant.
A necessary qualification is that even under this traditional view, this diagram serves mainly as an impetus to thinking about these matters, rather than as a source of definitive statements thereon. For example, the diagram does not mean that a 50% line exists in reality. The psychological truth is that equipoise is more of a zone, or range of probabilities, than a line. A range of evidential states may strike the fact-finder as evenly balanced. (153)
That equipoise is a zone means that the burden of persuasion will affect many more cases than those in which the conflicting evidence results precisely in a dead heat. The fact-finder will rely on the burden of persuasion more often than one might imagine. Also, how the law frames an issue--whether the plaintiff or the defendant bears the brunt of nonpersuasion of a fact, that is, whether the plaintiff or the defendant appears to start from zero--matters. (154) An anchoring heuristic lowers the willingness of the fact-finder to determine that the burdened party has prevailed, because people fail to adjust fully from a given starting point, even if arbitrarily set. (155) In sum, the burden of persuasion is not a mere tiebreaker, which explains why lawyers and judges fight and suffer over it in practice.
Again, this diagrammatic representation of the traditionally viewed burden of persuasion appears fairly compatible with traditional probability. But having to draw a fat 50% line encourages a reconsideration of the proof standards. And that reconsideration leads to reformulating those standards to reflect the role of the new logic. The conclusion will be that this diagram for the burden of persuasion is fundamentally misleading. The diagramed view thus needs redrawing rather than mere refinement. The law does not and should not conform to the traditional academic view.
2. Reformulated View
a. Current Standards
The law has settled on three standards of proof that apply in different circumstances: (1) The standard of preponderance of the evidence translates into more-likely-than-not. It is the usual standard in civil litigation, but it appears throughout law. (2) Next comes the intermediate standard or standards, often grouped under the banner of clear and convincing evidence and roughly translated as much-more-likely-than-not. These variously phrased but equivalently applied standards govern on certain issues in special situations, such as when terminating parental rights. (156) (3) The standard of proof beyond a reasonable doubt means proof to a virtual certainty. It very rarely prevails outside criminal law. (157)
b. Relative-Plausibility Theory
The insightful relative-plausibility theory of Professor Ron Allen shows a nontraditional embrace of relative judgment, in preference to our weaker skills at absolute judgment of likelihood. (158) He builds on the story model of evidence-processing to produce another theoretical brand of holism. The relative-plausibility theory posits that the fact-finder constructs the story (or stories) that the plaintiff is spinning and another story (or stories) that the defendant is spinning. The fact-finder then compares the two stories (or collections of stories) and gives victory to the plaintiff if the plaintiff's version is more plausible than the defendant's. (159) This choice between alternative competing narratives is largely an ordinal process rather than a cardinal one.
Allen's ordinal comparison cannot easily explain standards of proof higher or lower than preponderance of the evidence. (160) Its more obvious, and admitted, (161) difficulty is that it does not track well what the law tells its fact-finders about how to proceed, and it diverges from the law by compelling the non-burdened party to choose and formulate a competing version of the truth. Finally, it comes with baggage, such as requiring acceptance of the story model. (162)
c. Reformulated Standards
Consider what else preponderance of the evidence, or its translation of more-likely-than-not, could mean in a comparative sense.
One could compare the proof to some threshold. Although one could say that the proof must exceed 50%, this formulation does not accord with the import of real cases. The law does not require the completeness of proof that would be necessary to get a belief above 50%. The law is willing to rest decisions on the evidence presented. (163)
The law does not inquire which side has the stronger evidence, however. It looks instead to belief in the burdened party's position. (164) Although one could measure the belief against some absolute measure, say, requiring that Bel(S) exceed 50%, the better approach is to invoke the more powerful human ability of relative judgment by comparing beliefs. One could compare Bel(S) relative to Bel(notS). (165)
In comparing them, Bel(notS) is the belief in the negation of S, not the complement of Bel(S). It represents how much the fact-finder actively disbelieves S, the fact in dispute. The comparison thus should look at actual belief in S and actual disbelief of S.
If you were to work with only those two beliefs, and discard the indeterminate belief, the most obvious course in civil cases would be to say that the burdened party should win if and only if Bel(S) > Bel(notS). You would decide for the plaintiff if Bel(S) exceeds Bel(notS), but decide for the defendant if Bel(S) does not exceed Bel(notS).
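The comparative rule just stated can be sketched as a minimal decision function. This is only an illustration, not the article's method: the function name and the numeric degrees of belief are my assumptions, since the text stresses that fact-finders need only make a relative judgment, not a quantified one.

```python
def verdict(bel_s: float, bel_not_s: float) -> str:
    """Civil preponderance as a comparison of committed beliefs.

    bel_s     -- Bel(S), active belief that the disputed fact is true
    bel_not_s -- Bel(notS), active disbelief; NOT 1 - bel_s, because some
                 belief may remain uncommitted on imperfect evidence
    """
    # The burdened party wins if and only if Bel(S) > Bel(notS);
    # the indeterminate remainder (1 - bel_s - bel_not_s) is ignored.
    return "plaintiff" if bel_s > bel_not_s else "defendant"

# Bel(S) = .50 and Bel(notS) = .20 leave .30 uncommitted,
# yet the plaintiff wins because .50 > .20.
print(verdict(0.50, 0.20))  # plaintiff
print(verdict(0.30, 0.30))  # defendant: a tie goes against the burdened party
```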
This comparative approach to the civil standard of proof does not mean that the non-burdened party needs to formulate a competing version of the truth, other than negation. A belief in the falsity of the burdened party's version of the truth may develop naturally in the course of trial. It could arise even upon hearing only the burdened party's evidence. The non-burdened party's evidence, if any, should contribute to raising Bel(notS).
Relatedly, the non-burdened party need not fight imaginary fights. Some scholars worry that looking at negation puts the burdened party in the impossible situation of disproving every alternative possibility. (166) But that worry comes from confusing lack of belief with disbelief. Disbelieving S entails the degree to which the fact-finder thinks S is false. The mere possibility of other states of the world in which S is not true goes into the uncommitted belief, not into Bel(notS). Recall that the "plausibility" of notS equals Bel(notS) plus the uncommitted belief; again, the degree of believing that Katie is not dead, or actually alive, is quite different from envisaging the chance that she is possibly alive. The proposed comparison involves the belief in notS, not the plausibility of notS.
Now, as to the other two standards of proof, clear and convincing evidence should mean Bel(S) >> Bel(notS). (167) This standard would not be that difficult to apply. We are quite used to such a standard of being clearly convinced, in life and in law. Judges apply it on a motion for a new trial based on the verdict's being against the weight of the evidence. (168) Appellate courts use it in reviewing judge-found facts. (169) Those standards of decision mean that it is not enough to disagree with the jury or the judge; the reviewer must think there was a serious error.
However, the cases do not make very evident what clear and convincing means. Alternatively, or perhaps additionally, the standard may impose a requirement about the completeness of evidence: it may require admission of enough evidence to reduce uncommitted belief to the point that Bel(S) exceeds the plausibility of notS. I am open to those viewpoints, but unconvinced so far.
As to proof beyond a reasonable doubt, it is different in kind. It must mean more than Bel(S) >> Bel(notS). Placing separate demands on Bel(notS) and Bel(S), it should mean that no reasonable doubt persists and that no great uncommitted belief remains. (170)
No reasonable doubt means that no reasonable person could hold Bel(notS) > 0. On the view that anything is possible, zero as a coarsely gradated degree of belief equates to a "slightest possibility." (171) Therefore, Bel(notS) > 0 refers to a step up from the slightest possibility of innocence. No reasonable fact-finder should see a "reasonable possibility" of innocence. In other words, for a conviction the prosecutor must show that no reasonable possibility of innocence exists.
No great uncommitted belief reflects the idea that Bel(S) cannot be weak, measured in an absolute sense. We do not want to convict when, although there is some evidence of guilt, we really do not know what happened. The belief in guilt must outweigh all alternative possibilities, including fanciful ones. The belief in guilt must exceed the plausibility of innocence, so that Bel(S) > .50. Given the usual limits on available evidence, achieving such a high degree of absolute belief represents a demanding standard. (172)
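The two demands just described can be encoded crudely as follows. This is a rough sketch of my own, not a formula the article prescribes: it models "no reasonable doubt" as Bel(notS) staying at zero on the coarse scale, and "no great uncommitted belief" as Bel(S) exceeding the plausibility of innocence, Pl(notS) = 1 - Bel(S).

```python
def beyond_reasonable_doubt(bel_s: float, bel_not_s: float) -> bool:
    """Sketch of the reformulated criminal standard.

    (1) No reasonable doubt: Bel(notS) must not rise above zero on the
        coarsely gradated scale (zero equating to a 'slightest possibility').
    (2) No great uncommitted belief: Bel(S) must exceed the plausibility
        of innocence, 1 - Bel(S), which is equivalent to Bel(S) > 0.5.
    """
    plausibility_not_s = 1 - bel_s
    return bel_not_s <= 0 and bel_s > plausibility_not_s

print(beyond_reasonable_doubt(0.8, 0.0))  # True: strong belief, no disbelief
print(beyond_reasonable_doubt(0.8, 0.1))  # False: a reasonable doubt persists
print(beyond_reasonable_doubt(0.4, 0.0))  # False: too much uncommitted belief
```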
d. Compatibility of Reformulated and Current Standards
A reader always entertains the temptation, upon seeing what looks like a plea for reconceptualization, to dismiss it as a pie-in-the-sky academic musing. When the reconceptualization involves the standards of proof, the specialists have the added temptation of dismissing it as another of the common anti-probabilist rants or pro-probabilist paeans. After all, if my view were a sound one, someone would have come up with it before. So I hasten to undercut my contribution by stressing that my ideas are not that new. I am trying little more than to explain what the law has been doing all along.
The easiest way to grasp the lack of newness is to picture an alternative fashion of converting from fuzzy beliefs back into a two-valued output. Picture a normalization process of disregarding the indeterminate beliefs and scaling Bel(S) and Bel(notS) up proportionately so that they add to one. Call the recalculations b(S) and b(notS). If Bel(S)=.50 and Bel(notS)=.20, then b(S)=.71 and b(notS)=.29. These new numbers represent much less mental distance from the traditional view of standards of proof, because b(S) > b(notS) if and only if b(S) > .50. Thus, preponderance could retain a meaning of likelihood exceeding 50%, while clear and convincing means much more likely than 50% and beyond a reasonable doubt means almost certainty. This alternative renders my conceptualization much less jarring, and it also demonstrates that I did not pull my formulations out of thin air.
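The normalization just described is simple proportional rescaling. A few lines confirm the arithmetic (the .50 and .20 figures are the text's own illustration; the function name is mine):

```python
def normalize(bel_s: float, bel_not_s: float) -> tuple[float, float]:
    """Discard the indeterminate belief and scale Bel(S) and Bel(notS)
    up proportionately so that they sum to one."""
    committed = bel_s + bel_not_s
    return bel_s / committed, bel_not_s / committed

b_s, b_not_s = normalize(0.50, 0.20)
print(round(b_s, 2), round(b_not_s, 2))  # 0.71 0.29
```

Because b(S) and b(notS) sum to one by construction, b(S) > b(notS) holds exactly when b(S) > .50, which is what links the reformulated comparison back to the traditional 50% threshold.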
Yet, I resist taking that normalization route. First, converting to additive beliefs would reintroduce the probabilistic imagery that originally led us astray into all the problems and paradoxes of the traditional view. Second, I contend that directly comparing Bel(S) and Bel(notS) actually conforms better to the actual law than the probabilistic view does. Third, normalization requires measurement of b(S) and b(notS), a step otherwise unnecessary, and a step that is much more difficult for humans than relative judgment.
The evidence at trial will support S to an extent while supporting notS to another extent, and the reformulated standards say that the fact-finder need only compare these two fuzzy beliefs. How does the current law actually state, say, preponderance of the evidence? Consider a couple of classic cases.
In Livanovitch v. Livanovitch, (173) the trial court gave the following charge: "If ... you are more inclined to believe from the evidence that he did so deliver the bonds to the defendant, even though your belief is only the slightest degree greater than that he did not, your verdict should be for the plaintiff." (174) The appellate court said:
The instruction was not erroneous. It was but another way of saying that the slightest preponderance of the evidence in his favor entitled the plaintiff to a verdict.... All that is required in a civil case of one who has the burden of proof is that he establish his claim by a preponderance of the evidence.... When the equilibrium of proof is destroyed, and the beam inclines toward him who has the burden, however slightly, he has satisfied the requirement of the law, and is entitled to the verdict. "A bare preponderance is sufficient, though the scales drop but a feather's weight." This rule accords with the practice in this state as remembered by the justices of this court, and is well supported by the authorities. (175)
In Lampe v. Franklin American Trust Co., (176) one of the defendant's contentions was that the note in suit had been altered after it had been signed by the defendant's decedent. The trial court refused the defendant's request for an instruction that the jury should find that the instrument was not the decedent's note
if you find and believe that it is more probable that such changes or alterations have been made in the instrument after it was signed by the deceased and without his knowledge and consent, than it is that such alterations and changes were made at or about the time that the deceased signed the instrument and under his direction and with his knowledge and consent. (177)
Holding the refusal to have been proper, the appellate court said:
The trouble with this statement is that a verdict must be based upon what the jury finds to be facts rather than what they find to be "more probable." ... This means merely that the party, who has the burden of proof, must produce evidence, tending to show the truth of those facts, "which is more convincing to them as worthy of belief than that which is offered in opposition thereto." (178)
These two cases' formulations sound contradictory. But if one interprets the quotations as speaking in terms of the coarsely gradated belief in the fact compared with the coarsely gradated belief in the fact's negation, based on the evidence presented, the apparent contradiction evaporates. They both seem to be saying that the burdened party should win if and only if Bel(S) > Bel(notS).
Other courts sometimes express more divergent views of the standard of proof. Some writers conclude that courts interpret preponderance in one of three ways: (1) "more convincing," which requires the burdened party to tell a better tale than the opponent tells; (2) "more likely than not," which requires a showing of the fact's existence stronger than the showing of its nonexistence; or (3) "really happened," which requires a showing by evidence of what probably transpired outside in the real world. (179) My approach would conform to the middle option of (2), rather than either (1) relative plausibility or (3) absolute measure.
In the end, I submit that comparison of coarsely gradated beliefs is an accurate representation of what the law tells a fact-finder to do with a standard of proof. In civil cases, the fact-finder has to find S more likely true than not, which means Bel(S) > Bel(notS). Or as the judge tells the jurors, preponderance means that the evidence "produces in your minds belief that what is sought to be proved is more likely true than not true" (180) or "more probably true than false." (181) By literally instructing fact-finders to decide between S and notS, the law effectively urges them to focus on those two fuzzy beliefs and compare them.
3. Implications of Reformulation
My views, then, are not seditious. Overall I merely contend, in accordance with the new logic's teaching, that the law charges fact-finders to form a set of fuzzy beliefs, while leaving some belief uncommitted in the face of imperfect evidence, and then to apply the standard of proof by comparing their resultant belief in the burdened party's version to their belief in its negation. Many observers of the legal system would find that contention, putting its slightly new vocabulary to the side, unobjectionable.
Tracing the implications of my contention reveals its hidden powers, however. It implies that the fact-finders at the end of a case would properly apply the standard to each separate element. It also implies that the fact-finders should start the case, being in a state of ignorance with lack of proof, at a zero belief. Thus, two paradoxes in the nature of legal proof simply vaporize. The four parts of this Article generate related insights.
First, the linguistic evaluations that humans tend to use in their fuzzy logic, as opposed to quantifications, nicely express the law's development of a coarsely gradated scale of possibilities and probabilities: (1) slightest possibility, (2) reasonable possibility, (3) substantial possibility, (4) equipoise, (5) probability, (6) high probability, and (7) almost certainty. And the coarseness of the scale of likelihood means that the fact-finder in comparing beliefs will not have to draw paper-thin distinctions.
Second, when the fact-finders face multiple elements, it has long appeared that they seek the most believable story by applying the standard of proof to each element. But theorists worry that this conjoined story itself may not meet the standard of proof. Rest assured, because the law knows what it is doing. The MIN operator demonstrates that belief in the conjunction will match the belief in the least likely element, which has already passed the standard of proof.
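The contrast between the traditional product rule and the MIN operator can be shown with the conjunction-paradox numbers from the opening hypothetical (the 60% figures come from that hypothetical; the side-by-side comparison is my illustration):

```python
fault, causation = 0.60, 0.60  # each element found probable

# Traditional probability with independence -- the conjunction paradox:
# the conjoined story seemingly falls below the 50% standard.
print(round(fault * causation, 2))  # 0.36

# Fuzzy MIN operator -- belief in the conjunction equals the belief in
# the least likely element, which has already passed the standard.
print(min(fault, causation))  # 0.6
```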
Third, the notion of burden of proof becomes much clearer. The paradoxical difficulties in applying the burden to weak proof dissipate. For an example, a directed verdict motion by a civil defendant meshes the burden of production with the new view of the preponderance standard. The motion requires the judge to ask if no reasonable jury could view Bel(S) > Bel(notS). (182) At the end of the plaintiff's case, if a reasonable Bel(notS) equals 0 (effectively a "slightest possibility"), then the inequality requires a compatibly reasonable Bel(S) to exceed 0 (effectively a "reasonable possibility"). That the plaintiff must have established a reasonable possibility is the embodiment of the burden of production, and it is what keeps the plaintiff from surviving with a mere feather's weight of evidence. An illustrative situation would be where the plaintiff has produced a little evidence, but it is "pure" evidence that gives the defendant no support. (183) If a reasonable jury could find for the plaintiff on such proof, the judge should deny the directed verdict motion. If the defendant then produces no effective evidence during the rest of the trial, but moves again for a directed verdict at the end of all the evidence, the judge should deny the motion and the case should go to the jury. The jury, if it were to take the same view of the evidence as the judge hypothesized, could find for the plaintiff--even on such thin evidence.
Fourth, a new understanding of how to apply the standard of proof to the party with the burden of persuasion follows naturally, even if not inevitably, from the foregoing logical conceptualization of the nature of proof. The standard should concede that upon incomplete, inconclusive, ambiguous, dissonant, and untrustworthy proof, some of our belief will remain indeterminate. The standard should look only to committed belief, comparing belief in the burdened party's version versus disbelief.
Not only does this comparative approach comport with the natural cognitive method that follows from telling the fact-finders they must decide for one side or the other, but also it does nothing to interfere with the current procedural and substantive functioning of the standard of proof. For example, the traditional view of the preponderance standard as a showing of a probability greater than 50% appeared appropriate for civil cases: among competing fixed standards, (184) it minimizes the expected number of erroneous decisions and also the expected sum of wrongful amounts of damages, (185) which is the goal that the law apparently pursues in preference to optimizing incentives for primary conduct. (186) My reformulated standard has the same error-minimizing properties, but achieves them in the real world where the law of the excluded middle does not hold and where some indeterminacy prevails. For an idea of a proof adapted from a probabilist's proof, let b(S) = p be the apparent probability that the defendant is liable (for D dollars) under a two-valued view. If Bel(S) > Bel(notS), then p > 1/2; call p by the name p₁ in that case. If Bel(S) ≤ Bel(notS), call it p₂. On the one hand, under the preponderance standard, the expected sum of false positives and false negatives over the run of cases is Σ[(1 − p₁)D + p₂D]. On the other hand, under a very high standard that eliminates false positives, the analogous sum is Σ[p₁D + p₂D]. Therefore, given that (1 − p₁) is less than p₁, the reformulated preponderance standard lowers the system's expected error costs.
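The error-cost comparison can be checked with a toy calculation. The docket of apparent probabilities below is invented for illustration, and the "very high standard" is modeled simply as a threshold no case can meet, so that false positives vanish:

```python
def expected_error(ps, d=1.0, threshold=0.5):
    """Expected sum of false-positive and false-negative costs when the
    burdened party wins iff the apparent probability p exceeds threshold."""
    cost = 0.0
    for p in ps:
        if p > threshold:
            cost += (1 - p) * d  # risk the finding of liability is wrong
        else:
            cost += p * d        # risk the denial of liability is wrong
    return cost

docket = [0.9, 0.7, 0.55, 0.4, 0.2]  # illustrative apparent probabilities

preponderance = expected_error(docket, threshold=0.5)
never_liable = expected_error(docket, threshold=1.0)  # no false positives

print(round(preponderance, 2))        # 1.45
print(preponderance < never_liable)   # True
```

Because (1 − p₁) < p₁ whenever p₁ > 1/2, the preponderance threshold beats the stricter standard case by case, not just in aggregate.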
To close, a comprehensive example would perhaps be beneficial. Suppose that someone has seriously injured Suzie, in circumstances suggesting fault. She sues Tom, which means that she must prove his identity as the tortfeasor--as well as fault, causation, and injury. She introduces a fair amount of evidence.
First, the fact-finder would assess that evidence and might conclude as follows: (1) The evidence points to Tom being the perpetrator. If the fact-finder were a bettor, he would put the odds at 3:2, or 60%. Using words, he would say that Tom was probably the perpetrator. (2) The question of fault was a tough one. There are uncertainties as to what was done, but there is also vagueness concerning how blameworthy the supposed acts really were. The fact-finder needs commensurable measures, so that he can evaluate a mix of random and nonrandom uncertainty. If forced to assess all the evidence on this issue and put it on a scale of truth running from zero to one, he would say .7. He might feel more comfortable saying fault was probable. (3) The acts, whatever they were, apparently caused the injury. Proximate cause is about as vague and multivalent as a legal concept can get. The fact-finder is pretty convinced nevertheless. He would put causation at .8, or highly probable. (4) Suzie's injuries are not really very vague or uncertain. He would put this element of the tort at .95, or beyond a reasonable doubt. Note that the new conceptualization changes nothing, to this point, regarding the fact-finder's task as traditionally envisaged.
Second, the fact-finder may want to combine these findings. They are a mixture of probabilities and degrees of truth. But viewing them all as degrees of truth invokes the MIN operator, so that he can say that Suzie's story comes in at .6, or probable. Suzie should win, by the use of fuzzy logic.
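The combination step reduces to taking a minimum over the four findings (the figures are those given in the hypothetical; the dictionary layout is mine):

```python
# Suzie's four elements, each expressed as a degree of truth on a 0-to-1 scale:
findings = {
    "identity":  0.60,   # probably the perpetrator
    "fault":     0.70,   # probable
    "causation": 0.80,   # highly probable
    "injury":    0.95,   # beyond a reasonable doubt
}

# The MIN operator gives the degree of truth of the conjoined story:
story = min(findings.values())
print(story)  # 0.6 -- "probable," so Suzie wins
```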
Third, this approach does not do a terribly good job of accounting for the state of the evidence. It still poses an odd question to the fact-finder: given imperfect evidence, what is the degree to which the plaintiff is right? Belief functions work better here to reflect the fact-finder's actual knowledge: belief starts at zero, and some belief will remain uncommitted in the absence of perfect evidence. That is, on a fact to which the standard of proof applies, the belief function route is the one to take, rather than invoking the simplistic scale of likelihood just described. Instead of saying that Tom's fault is probable, the fact-finder should speak and think in terms of degrees of belief.
Fourth, although belief functions do not require placement on a scale, the fact-finder in effect might end in believing Suzie's position on Tom's fault to be only substantially possible. That situation does not mean that Suzie should lose, however. The fact-finder might, if forced to express likelihood, believe the falsity of Tom's fault merely to a reasonable possibility. All the fact-finder must do is to compare belief and disbelief: all that preponderance of the evidence requires is that the strength of the fact-finder's belief that Tom was at fault must exceed his belief that Tom was not at fault. Belief functions thus add the idea that the fact-finder in such a case must have a belief in the case's truth stronger than his belief in its falsity. While some of the fact-finder's belief remains uncommitted, he did find Suzie's position to be a good one: more likely true than false. So, Suzie should still win, by the use of belief functions.
This Article deploys the new logic--in particular, fuzzy logic and belief functions in their broad senses--to conceptualize the standards of proof. This was not a heavily prescriptive endeavor, which would have tried to argue normatively for the best way to apply standards. Instead, it was mainly a descriptive and explanatory endeavor, trying to unearth how standards of proof actually work in the law world. Compared to the traditionally probabilistic account, this conceptualization conforms more closely to what we know of people's cognition, captures better what the law says its standards are and how it manipulates them, and improves our mental image of the fact-finders' task. One virtue of the conceptualization is that it is not radically new, as it principally acts to confirm the law's ancient message that fact-finders should simply compare their non-quantified views of the fact's truth and falsity. The conceptualization leaves the law's standards essentially intact to accomplish their current purposes. Another virtue is that it nevertheless manages to resolve some stubborn problems of proof, including the fabled conjunction paradox. Thus, for understanding the standards of proof, degrees of fuzzy belief work better than traditional probabilities.
In brief, the new logic reveals that the law wants fact-finders to form degrees of belief that would conform to a fuzzy scale, to combine them in a logical fashion while leaving some belief uncommitted in the face of imperfect evidence, and then to apply the standard of proof by comparing their resultant belief in the burdened party's version of fact to their belief in its negation.
(1) See generally WILLIAM TWINING, RETHINKING EVIDENCE 237-48 (2d ed. 2006) ("[I]t is illuminating to view questions about evidence and proof as questions about the processing and uses of information in important decisions in litigation."); Richard Lempert, The New Evidence Scholarship: Analyzing the Process of Proof in PROBABILITY AND INFERENCE IN THE LAW OF EVIDENCE 61, 61 (Peter Tillers & Eric D. Green eds., 1988) ("Evidence is being transformed from a field concerned with the articulation of rules to a field concerned with the process of proof... [D]isciplines outside the law, like mathematics, psychology and philosophy, are being plumbed for the guidance they can give."); Roger C. Park & Michael J. Saks, Evidence Scholarship Reconsidered: Results of the Interdisciplinary Turn, 47 B.C.L. REV. 949, 949 (2006) ("[T]he changing field of evidence scholarship ... has become decidedly interdisciplinary."). Although "new," this work represented the necessary return to abandoned efforts by past greats such as Wigmore. See, e.g., TERENCE ANDERSON ET AL., ANALYSIS OF EVIDENCE (2d ed. 2005) (building on JOHN HENRY WIGMORE, THE SCIENCE OF JUDICIAL PROOF (3d ed. 1937)).
(2) See generally RICHARD EGGLESTON, EVIDENCE, PROOF AND PROBABILITY 3 (2d ed. 1983) ("It is the purpose of this book to discuss the part that probabilities play in the law, and the extent to which existing legal doctrine is compatible with the true role of probabilities in the conduct of human affairs."); Eric D. Green, Foreword: Probability and Inference in the Law of Evidence, 66 B.U.L. REV. 377, 377 (1986) ("[Q]uestions about the nature of proof invariably raise questions about theories of inference and the proper use of mathematical and statistical evidence and probability arguments in courts."); Symposium, Decision and Inference in Litigation, 13 CARDOZO L. REV. 253, 253 (1991) ("One of the more striking features of this new approach to the study of evidence was the use of symbolic notation and formal argument, particularly mathematical notation and mathematical argument."); cf. V.C. Ball, The Moment of Truth: Probability Theory and Standards of Proof 14 VAND. L. REV. 807, 809-12 (1961) (treating frequentist theory); Laurence H. Tribe, Trial by Mathematics: Precision and Ritual in the Legal Process, 84 HARV. L. REV. 1329, 1344-50 (1971) (treating subjective theory).
(3) See generally SHARON BERTSCH McGRAYNE, THE THEORY THAT WOULD NOT DIE (2011) (recounting the centuries of controversy generated by Bayes' theorem).
(4) Peter Tillers, Trial by Mathematics--Reconsidered, 10 LAW PROBABILITY & RISK 167, 169-170 (2011) (providing a nice summary of the major developments since 1970). But see Roger C. Park et al., Bayes Wars Redivivus--An Exchange, 8 INT'L COMMENT. ON EVIDENCE iss. 1, art. 1 (2010) (presenting an electronic exchange amongst evidence scholars debating the major issues in evidence law).
(5) See Lea Brilmayer, Second-Order Evidence and Bayesian Logic, 66 B.U.L. REV. 673, 688-91 (1986) (suggesting that diagnosis); Tillers, supra note 4, at 171 (presenting a similar argument). Some Bayesians, however, were sympathetic to the new logic. See, e.g., David A. Schum, Probability and the Processes of Discovery, Proof and Choice, 66 B.U. L. REV. 825, 847-53, 865-69 (1986).
(6) See THEODORE SIDER, LOGIC FOR PHILOSOPHY 72--73 (2010).
(7) See Peter Suber, Non-Contradiction and Excluded Middle, http://www.earlham. edu/~peters/courses/logsys/pnc-pem.htm (last visited Jan. 15, 2013).
(8) Bertrand Russell, Vagueness, 1 AUSTRALASlAN J. PSYCHOL & PHIL. 84 (1923); see Bertrand Russell, The Philosophy of Logical Atomism, in LOGIC AND KNOWLEDGE 175, 180 (Robert Charles Marsh ed., 1956) ("Everything is vague to a degree you do not realize till you have tried to make it precise, and everything precise is so remote from everything that we normally think, that you cannot for a moment suppose that is what we really mean when we say what we think."); see also, e.g., TIMOTHY WILIAMSON, Preface to VAGUENESS, at xi (1996) ("[V]agueness consists in our ignorance of the sharp boundaries of our concepts, and therefore requires no revision of standard logic."); Hartry Field, No Fact of the Matter, 81 AUSTRALASIAN J. PHIL. 457 (2003) (countering the Williamson view); Hartry Field, Indeterminacy, Degree of Belief, and Excluded Middle, 34 NOUS 1, 20 (2000) (referencing work on belief functions); Hartry Field, Vagueness, Partial Belief, and Logic, in MEANINGS AND OTHER THINGS (G. Ostertag ed., forthcoming 2013), available at http://philosophy.fas.nyu.edu/docs/IO/1158/schiffer2004b.pdf (incorporating ideas similar to fuzzy logic).
(9) See generally J.C. BEALL & BAS C. VAN FRAASSEN, POSSIBILITIES AND PARADOX (2003) (introducing the concept of many-valued logic).
(10) The new logic gives a pretty good answer to the sorites paradox itself, by allowing a response to "heap vel non?" in terms of a degree of heapness rather than a yes-or-no response. See BART KOSKO, FUZZY THINKING 94--97 (1993). But it is not a perfect answer, say the super-sophisticated. See R. M. Sainsbury & Timothy William-son, Sorites, in A COMPANION TO THE PHILOSOPHY OF LANGUAGE 458, 475-77 (Bob Hale & Crispin Wright eds., 1997) ("It does not do justice to higher-order vagueness[.]"); Nicholas J.J. Smith, Fuzzy Logic and Higher-Order Vagueness, in LOGICAL MODELS OF REASONING WITH VAGUE INFORMATION 1, 1 (Petr Cintula et al. eds., forthcoming 2012), available at http://www-personal.usyd.edu.au/~njjsmith/papers/SmithFuzLogHOV ag.pdf ("[T]heories of vagueness based on fuzzy logic ... give rise to a problem of higher-order vagueness or artificial precision."). But see TIMOTHY A. O. ENDICOTT, VAGUENESS IN LAW 77--136 (2000) ("I conclude that higher-order vagueness is truculent., a theory should neither deny it, nor assert a particular number of orders of vagueness, nor even assert that ordinary vague expressions are vague at all orders."). See generally LIARS AND HEARS (JC Beall ed., 2003) (examining soritical paradoxes in detail); Kevin M. Clermont, Foreword: Why Comparative Civil Procedure?, in KUO-CHANG HUANG, INTRODUCING DISCOVERY INTO CML LAW, at xviii n.50 (2003) (discussing self-contradictory statements).
(11) See generally Peter Walley, Statistical Reasoning with Imprecise Probabilities, 42 MONOGRAPHS ON STATISTICS AND APPLIED PROBABILITY (1991) (examining the concept of, and methods of assessing, imprecise probabilities).
(12) See James F. Brule, Fuzzy Systems--A Tutorial, http://www.austinlinks.com/Fuzzy/tutorial.html (last visited Jan. 15, 2013) (recounting the history briefly).
(13) See KOSKO, supra note 10, at 69-78 (giving an impassioned attack on the West's hostility to many-valued logic, but also fleshing out the historical account).
(14) See GEORGE BOOLE, AN INVESTIGATION OF THE LAWS OF THOUGHT (London, Walton and Maberly 1854) (doing the early work); cf. JOHN MAYNARD KEYNES, A TREATISE ON PROBABILITY (1921).
(15) L.A. Zadeh, Fuzzy Sets, 8 INFO. & CONTROL 338 (1965).
(16) Zadeh substituted the term "fuzzy" for "vague" more to be provocative than to be descriptive. See KOSKO, supra note 10, at 19-20, 142, 145, 148.
(17) For an accessible introduction, see Timothy J. Ross & W. Jerry Parkinson, Fuzzy Set Theory, Fuzzy Logic, and Fuzzy Systems, in FUZZY LOGIC AND PROBABILITY APPLICATIONS 29 (Timothy J. Ross et al. eds., 2002).
(18) See KOSKO, supra note 10, at 157-200 (describing fuzzy computer systems).
(19) See Liu Sifeng, Jeffrey Forrest & Yang Yingjie, A Brief Introduction to Grey Systems Theory, in 2011 IEEE INTERNATIONAL CONFERENCE ON GREY SYSTEMS AND INTELLIGENT SERVICES 1, 6 (2011).
(20) See, e.g., Kevin M. Clermont, Procedure's Magical Number Three: Psychological Bases for Standards of Decision, 72 CORNELL L. REV. 1115, 1122 n.36 (1987); Schum, supra note 5, at 865-69.
(21) In statistical terminology, "likelihood" (the chance that the data would be observed, given a hypothesis as true) is not wholly equivalent to "probability" (the chance that a hypothesis is true, given the observed data). See Richard M. Royall, Statistical Evidence, 71 MONOGRAPHS ON STATISTICS AND APPLIED PROBABILITY, at 5-6, 28 (1997). But for most people, likelihood means probability. I use likelihood here in that way, with perhaps the connotation of an intuitive measure of probability and with the benefit of conforming to the common legal, and probabilistic, usage of "more likely than not."
(22) MIRCEA REGHIS & EUGENE ROVENTA, CLASSICAL AND FUZZY CONCEPTS IN MATHEMATICAL LOGIC AND APPLICATIONS 354 (1998) (referencing Shafer's work on imperfect reasoning); see DAVID A. SCHUM, THE EVIDENTIAL FOUNDATIONS OF PROBABILISTIC REASONING 41, 201 (1994) ("[N]o single view of probabilistic reasoning captures all of the behavioral richness evident in such tasks.").
(23) GLENN SHAFER, A MATHEMATICAL THEORY OF EVIDENCE (1976) (using "evidence" in a much broader sense than legal evidence); see also Glenn Shafer, Perspectives on the Theory and Practice of Belief Functions, 4 INT'L J. APPROXIMATE REASONING 323 (1990) (complementing his own earlier work); Lotfi A. Zadeh, Book Review, AI MAG., Fall 1984, at 81, 83 (reviewing SHAFER, supra, and treating Shafer's theory as a version of fuzzy logic's possibility theory).
(24) See Ron A. Shapira, Economic Analysis of the Law of Evidence: A Caveat, 19 CARDOZO L. REV. 1607, 1614 (1998) ("In the legally relevant literature, it was Professor Glenn Shafer who introduced fuzzy measures as appropriate formalizations of epistemic functions.").
(25) See Glenn Shafer, The Construction of Probability Arguments, 66 B.U.L. REV. 799, 801-04 (1986). But cf. DAVID CHRISTENSEN, PUTTING LOGIC IN ITS PLACE 12--13, 69 (2004) (saying that some use "belief" as an unqualified assertion of an all-or-nothing state of belief); L. Jonathan Cohen, Should a Jury Say What It Believes or What It Accepts?, 13 CARDOZO L. REV. 465 (1991) (using "belief," for his purposes, in the sense of a passive feeling). One could view belief as a black box, thereby avoiding my preliminary foray into fuzzy logic by starting analysis at the post-belief stage: an anti-probabilist might say that the fact-finder somehow forms a belief and that the law's concern lies in how the fact-finder should handle that belief. However, I view degrees of belief as resting on degrees of certainty, thus necessitating initial consideration of the various models for handling uncertainty.
(26) For an accessible introduction, see SCHUM, supra note 22, at 222-43.
(27) See Didier Dubois & Henri Prade, A Unified View of Uncertainty Theories (Mar. 7, 2012) (unpublished manuscript).
(28) See IRVING M. COPI ET AL., INTRODUCTION TO LOGIC ch. 14 (14th ed. 2011).
(29) Cf. Schum, supra note 5, at 868 (distinguishing belief functions from second-order probability).
(30) A suggestive elaboration on traditional probability theories appeared in Neil B. Cohen, Confidence in Probability: Burdens of Persuasion in a World of Imperfect Knowledge, 60 N.Y.U.L. REV. 385 (1985) [hereinafter Cohen, Confidence in Probability]. That article advanced a new theory of standards of proof based on the statistical concept of confidence intervals. It described how sure a fact-finder is, employing not only a point estimate of probability but also a level of confidence, with sureness increasing as either component rises. But the attempt admittedly failed. Compare Neil B. Cohen, Commentary, The Costs of Acceptability: Blue Buses, Agent Orange, and Aversion to Statistical Evidence, 66 B.U.L. REV. 563, 569 (1986) (qualifying his own argument), and Neil B. Cohen, Conceptualizing Proof and Calculating Probabilities: A Response to Professor Kaye, 73 CORNELL L. REV. 78, 91-93 (1987) [hereinafter Cohen, Conceptualizing Proof] (conceding that confidence relates solely to the probability of avoiding false positives), with D.H. Kaye, Commentary, Do We Need a Calculus of Weight to Understand Proof Beyond a Reasonable Doubt?, 66 B.U.L. REV. 657, 667 n.22 (1986) (suggestively criticizing Neil Cohen's approach), and D.H. Kaye, Apples and Oranges: Confidence Coefficients and the Burden of Persuasion, 73 CORNELL L. REV. 54, 54, 56-58 (1987) (expanding his criticism).
(31) A similarly suggestive attempt to explain standards of proof in terms of psychological confidence appeared in Christoph Engel, Preponderance of the Evidence Versus Intime Conviction: A Behavioral Perspective on a Conflict Between American and Continental European Law, 33 VT. L. REV. 435 (2009). That article posited that over the proof process's course, the fact-finder generates a level of confidence in the decision by considering the degree of coverage (which means the story accounts for all the evidence), coherence (which means it is internally consistent, plausible with respect to the fact-finder's world knowledge, and complete without striking gaps in expected components), and uniqueness (which means the absence of plausible alternative stories). Id. at 453. The fact-finder achieves this by an automatic or unconscious process. The clearer the view of the case that the chosen story delivers to the fact-finder, the more confident the fact-finder will be. Id. Against this level of confidence, the fact-finder would somehow apply the standard of proof. I tried to demonstrate his theory's prescriptive and descriptive failings in Kevin M. Clermont, Standards of Proof Revisited, 33 VT. L. REV. 469 (2009).
(32) See Kevin M. Clermont & Emily Sherwin, A Comparative View of Standards of Proof, 50 AM. J. COMP. L. 243, 256-57 (2002).
(33) Otherwise, probabilistic theorizing influenced the law only lightly until it began quite recently a major assault. The first effective volleys of that assault on the law arguably were John Kaplan, Decision Theory and the Factfinding Process, 20 STAN. L. REV. 1065 (1968), and Michael O. Finkelstein & William B. Fairley, A Bayesian Approach to Identification Evidence, 83 HARV. L. REV. 489 (1970).
(34) See Shapira, supra note 24, at 1613-16 (distinguishing additive from non-additive). Consequently, a belief and a belief in its negation will most often not add to one. See generally Rolf Haenni, Non-Additive Degrees of Belief, in DEGREES OF BELIEF 121 (Franz Huber & Christoph Schmidt-Petri eds., 2009) (elaborating on this concept).
(35) See Clermont, supra note 31, at 475-79, 485 (recounting the state of psychological knowledge on evidence processing and standards of proof).
(36) See David A. Schum, A Science of Evidence: Contributions from Law and Probability, 8 LAW, PROBABILITY & RISK 197, 203-04 (2009) (crediting DAVID OLDROYD, THE ARCH OF KNOWLEDGE (1986), for this image).
(37) See Martin F. Kaplan, Cognitive Processes in the Individual Juror, in THE PSYCHOLOGY OF THE COURTROOM 197 (Norbert L. Kerr & Robert M. Bray eds., 1982); see also Jennifer Groscup & Jennifer Tallon, Theoretical Models of Jury Decision-Making, in JURY PSYCHOLOGY 41 (Joel D. Lieberman & Daniel A. Krauss eds., 2009) (sketching some contesting models); Nancy Pennington & Reid Hastie, Juror Decision-Making Models: The Generalization Gap, 89 PSYCHOL. BULL. 246 (1981) (evaluating various models). For treatment of the story model, see infra text accompanying note 116.
(38) See generally DANIEL KAHNEMAN, THINKING, FAST AND SLOW (2011). For treatment of humans' bounded rationality, see infra text accompanying note 84.
(39) See J. Alexander Tanford & Sarah Tanford, Better Trials Through Science: A Defense of Psychologist-Lawyer Collaboration, 66 N.C. L. REV. 741, 748-59 (1988).
(40) See Anna Ronkainen, Dual-Process Cognition and Legal Reasoning, in ARGUMENTATION 2011, at 1, 1 (Michal Araszkiewicz et al. eds., 2011), available at http://ssrn.com/abstract=2004336 ("The dual-process framework is a set of theories on human cognition in which cognition is seen as consisting of (at least) two substantially different yet interdependent systems: the older, faster, partly unconscious and automatic System 1 and the newer, slower, fully conscious and considered System 2.").
(41) See Peter Tillers, Introduction: A Personal Perspective on "Artificial Intelligence and Judicial Proof," 22 CARDOZO L. REV. 1365 (2001) (discussing different mathematical approaches to judicial proof).
(42) Tribe, supra note 2, at 1347 (citing the innovative work of LEONARD J. SAVAGE, FOUNDATIONS OF STATISTICS (1954)).
(43) See id. at 1347-48. Compare id. at 1348 n.63 (accepting the product rule because he is assuming bivalence), with infra text accompanying note 110 (rejecting the product rule for subjective probabilities in fact-finding).
(44) See, e.g., Pennington & Hastie, supra note 37, at 262-68; Shafer, supra note 25, at 809-16. Compare Paul Bergman & Al Moore, Mistrial by Likelihood Ratio: Bayesian Analysis Meets the F-Word, 13 CARDOZO L. REV. 589, 590 (1991) (attacking), with D.H. Kaye, Commentary, Credal Probability, 13 CARDOZO L. REV. 647 (1991) (defending).
(45) See Leonard R. Jaffee, Of Probativity and Probability: Statistics, Scientific Evidence, and the Calculus of Chance at Trial, 46 U. PITT. L. REV. 925, 980-85 (1985).
(46) See Lea Brilmayer & Lewis Kornhauser, Review: Quantitative Methods and Legal Decisions, 46 U. CHI. L. REV. 116, 135-48 (1978).
(47) Compare SAMUEL KOTZ & DONNA F. STROUP, EDUCATED GUESSING (1983) (discussing the sophisticated techniques of theory), with Craig R. Callen, Notes on a Grand Illusion: Some Limits on the Use of Bayesian Theory in Evidence Law, 57 IND. L.J. 1 (1982) (discussing the simplified practices of law).
(48) The figure comes from Ross & Parkinson, supra note 17, at 30.
(49) The figure comes from Vilem Novak, Modeling with Words, SCHOLARPEDIA, http://www.scholarpedia.org/article/Modeling_with_words (last visited Jan. 15, 2013).
(50) See Alf C. Zimmer, Verbal vs. Numerical Processing of Subjective Probabilities, in DECISION MAKING UNDER UNCERTAINTY 159, 180 (Roland W. Scholz ed., 1983). On the one hand, considerable empirical work on the legal standards of proof suggests that fact-finders show considerable confusion in translating the standards into numerical probabilities. The results argue for a verbal approach to standards of proof. See Clermont, supra note 20, at 1144-50 (recounting the empirical studies that show factfinders' difficulties in comprehending standards of proof). On the other hand, not only is such translation unnecessary, but also it might be a wrongheaded step if fuzzy logic is really in use. The prior empirical work might therefore need redoing. See Mandeep K. Dhami, On Measuring Quantitative Interpretations of Reasonable Doubt, 14 J. EXPERIMENTAL PSYCHOL.: APPLIED 353, 362 (2008) (developing a new technique for empirical work on standards of proof, called the membership function and based on fuzzy logic, and highlighting "the need to reevaluate the reliability and validity of past research findings on quantifying" standards of proof).
(51) See Zimmer, supra note 50, at 166.
(52) See Mandeep K. Dhami & Thomas S. Wallsten, Interpersonal Comparison of Subjective Probabilities: Toward Translating Linguistic Probabilities, 33 MEMORY & COGNITION 1057 (2005).
(53) See Petr Hajek, Fuzzy Logic, in THE STANFORD ENCYCLOPEDIA OF PHILOSOPHY (Edward N. Zalta ed., 2010), available at http://plato.stanford.edu/archives/fall2010/entries/logic-fuzzy (discussing the varying versions of fuzzy logic).
(54) Cf. Brule, supra note 12 ("The skeptical observer will note that the assignment of values to linguistic meanings (such as 0.90 to 'very') and vice versa, is a most imprecise operation. Fuzzy systems, it should be noted, lay no claim to establishing a formal procedure for assignments at this level; in fact, the only argument for a particular assignment is its intuitive strength. What fuzzy logic does propose is to establish a formal method of operating on these values, once the primitives have been established.").
(55) See SHAFER, supra note 23, at 6, 57-67 (using orthogonal sums); Jeffrey A. Barnett, Computational Methods for A Mathematical Theory of Evidence, in CLASSIC WORKS OF THE DEMPSTER--SHAFER THEORY OF BELIEF FUNCTIONS 197, 198-204 (Ronald R. Yager & Liping Liu eds., 2008). By the Dempster-Shafer rule,
we construct a belief function to represent the new evidence and combine it with our "prior" belief function--i.e., with the belief function that represents our prior opinions. This method deals symmetrically with the new evidence and the old evidence on which our prior opinions are based: both bodies of evidence are represented by belief functions, and the result of the combination does not depend on which evidence is the old and which is the new.
SHAFER, supra note 23, at 25.
(56) For a comparison of Bayesian probability judgments and belief functions, see Glenn Shafer & Amos Tversky, Languages and Designs for Probability Judgment, in CLASSIC WORKS OF THE DEMPSTER--SHAFER THEORY OF BELIEF FUNCTIONS, supra note 55, at 345.
(57) See Karl Sentz & Scott Ferson, Combination of Evidence in Dempster-Shafer Theory 17-27 (Apr. 2002) (unpublished manuscript), available at http://www.sandia.gov/epistemic/Reports/SAND2002-0835.pdf (describing thirteen alternatives).
(58) See Schum, supra note 5, at 852-53 (recounting the lack of empirical support).
(59) See supra text accompanying note 37.
(60) See KOSKO, supra note 10, at 176-80 (describing its use in fuzzy computer systems).
(61) See Novak, supra note 49 ("Mathematical fuzzy logic has two branches: fuzzy logic in narrow sense (FLn) and fuzzy logic in broader sense (FLb). FLn is a formal fuzzy logic which is a special many-valued logic generalizing classical mathematical logic.... FLb is an extension of FLn which aims at developing a formal theory of human reasoning.").
(62) See Ronald R. Yager, New Paradigms for Reasoning with Uncertain Information, 13 CARDOZO L. REV. 1005, 1017-24 (1991) (explaining approximate reasoning).
(63) See generally Mark Colyvan, Is Probability the Only Coherent Approach to Uncertainty?, 28 RISK ANALYSIS 645 (2008) (arguing that probability theory's dealing with uncertainty stands on controversial premises and suggesting examples of non-probabilistic uncertainty); Bart Kosko, Fuzziness vs. Probability, 17 INT'L J. GEN. SYS. 211 (1990) (discussing and contrasting fuzziness and probability).
(64) KOSKO, supra note 10, at 15.
(65) See id. at 44-46 (using this image).
(66) See supra note 8.
(67) See ENDICOTT, supra note 10 (arguing that vagueness plays a significant role in law, a role not owing solely to the vagueness of language); LEO KATZ, WHY THE LAW IS SO PERVERSE 139--56 (2011) (cataloging examples of vague concepts); Andrei Marmor, Varieties of Vagueness in the Law (USC Legal Studies Research Paper No. 12-8, Apr. 2012), available at http://ssrn.com/abstract=2039076 (articulating the different types of vagueness in law, beyond those entailing only a simple sorites sequence); Scott Soames, Vagueness in the Law, in THE ROUTLEDGE COMPANION TO PHILOSOPHY OF LAW 95 (Andrei Marmor ed., 2012) (bridging between philosophical logic and legal philosophy). Consequently, legal scholars are increasingly using fuzzy logic. See Michael T. Nguyen, Note, The Myth of "Lucky" Patent Verdicts: Improving the Quality of Appellate Review by Incorporating Fuzzy Logic in Jury Verdicts, 59 HASTINGS L.J. 1257, 1261 n.28 (2008) (listing examples).
(68) KOSKO, supra note 10, at 25-26; see id. at 33 ("At the midpoint you cannot tell a thing from its opposite, just as you cannot tell a half-empty glass from a half-full glass.").
(69) See generally MACIEJ WYGRALAK, VAGUELY DEFINED OBJECTS (1996). The degree of membership in a fuzzy set can itself be imprecise or otherwise uncertain, making what is called an ultra-fuzzy set or a type-2 fuzzy set, which operates as an initial step toward type-n fuzzy sets and operates in contrast to the type-1 fuzzy sets discussed up to here. See Mark Jablonowski, An Ultra-fuzzy Model of Aggregate Growth in Catastrophic Risk Potentials, 2008 ANN. MEETING N. AM. FUZZY INFO. PROCESSING SOC'Y (May 19-22, 2008) (modeling using ultra-fuzzy sets). Basically, representation of the uncertainty in the degree of membership comes out in a third dimension from each degree of membership represented in two dimensions. See Jerry M. Mendel, Type-2 Fuzzy Sets and Systems: An Overview, IEEE COMPUTATIONAL INTELLIGENCE MAG., Feb. 2007, at 20, available at http://sipi.usc.edu/~mendel/publications/MENDEL%20CI%20Magazine%202007.pdf. The third dimension can be projected back into the two dimensions to create the footprint of uncertainty, or FOU, which I mention just to suggest how cool the terminology in this field gets.
Of course, the logical operators become much more complicated to account for this third dimension. See Nilesh N. Karnik & Jerry M. Mendel, Operations on Type-2 Fuzzy Sets, 122 FUZZY SETS & SYS. 327 (2001); see also Jerry M. Mendel, Type-2 Fuzzy Sets and Systems: How to Learn About Them, IEEE SMC ENEWSL. (Sys., Man & Cybernetics Soc'y, New York, N.Y.), June 2009, available at http://sipi.usc.edu/~mendel/publications/MendelSMCeNewsletter6-09.pdf (discussing existing work on ultra-fuzzy sets). However, here there is no reason to explore the sophistication of logical operators for ultra-fuzzy sets, because the law seems to treat all measures of truth simply as type-1 fuzzy sets and instead roughly accounts for other second-order-like uncertainty through belief functions.
(70) See Ariel Porat & Eric A. Posner, Aggregation and Law, 122 YALE L.J. 2, 2 (2012) (coining the terms "normative aggregation" and "factual aggregation").
(71) Nicholas J.J. Smith, Degree of Belief Is Expected Truth Value, in CUTS AND CLOUDS 491, 491 (Richard Dietz & Sebastiano Moruzzi eds., 2010) (cataloging the difficulties that would come from entertaining both probabilities and degrees of belief, when the two underlying logic systems employ different operators).
(72) See KOSKO, supra note 10, at 55-64 (calling this set "the whole in the part").
(73) See infra Part II.A. (explaining the MIN operator). This approach would also handle the uncertainty of whether an imprecise event occurred at all. See Charles M. Yablon, On the Allocation of Burdens of Proof in Corporate Law: An Essay on Fairness and Fuzzy Sets, 13 CARDOZO L. REV. 497 (1991) (treating a transaction's fairness as a fuzzy set, while trying at the same time to account for subjective probability of occurrence).
(74) See Shapira, supra note 24, at 1614-15 (arguing for the superiority of degrees of belief).
(75) Schum, supra note 36, at 216 (internal quotation marks omitted).
(76) Bogdan R. Kosanovic, Fuzziness and Probability 2-3 (Feb. 8, 1995) (unpublished manuscript) (emphasis omitted), available at http://www.neuronet.pitt.edu/~bogdan/research/fuzzy/fvsp/fvsp.html. In early 2012 some congressional offices received threatening letters containing a suspicious powder. The letter promised additional mailings and said there was a "'10 percent chance that you have just been exposed to a lethal pathogen.'" Andrew Taylor, Congressional Offices Receive Mailed Threats, YAHOO! NEWS (Feb. 23, 2012, 1:47 AM), http://news.yahoo.com/congressional-offices-receive-mailed-threats-220538717.html.
(77) See Richard Bellman & Magnus Giertz, On the Analytic Formalism of the Theory of Fuzzy Sets, 5 INFO. SCI. 149, 151-52 (1973) (showing the equivalence between fuzzy sets and fuzzy statements).
(78) See Clermont, supra note 20, at 1116-34 (drawing examples from such areas as standard of proof, standard of review, harmless error, trial motions, police actions, and administrative law).
(79) See J. P. McBaine, Burden of Proof: Degrees of Belief, 32 CALIF. L. REV. 242 (1944) (arguing that, for standards of proof, only three levels of strength exist; using "degrees of belief" in Bentham's sense of strength of belief); C.M.A. McCauliff, Burdens of Proof: Degrees of Belief, Quanta of Evidence, or Constitutional Guarantees?, 35 VAND. L. REV. 1293 (1982).
(80) See Clermont, supra note 20, at 1121-23.
(81) Absolute judgment involves reference to a remembered scale. Although not entirely distinct, relative judgment concerns the considerably greater capacity of people to distinguish between two or more different stimuli that they can compare directly. See WILLIAM N. DEMBER & JOEL S. WARM, PSYCHOLOGY OF PERCEPTION 113, 116-17 (2d ed. 1979).
(82) See Terrence L. Fine, [The Axioms of Subjective Probability]: Comment, 1 STAT. SCI. 352, 353 (1986).
(83) See Clermont, supra note 20, at 1139-44 (recounting empirical evidence); Yanlong Sun et al., Probabilistic Judgment on a Coarser Scale, 9 COGNITIVE SYS. RES. 161 (2008) (recounting more recent, consistent results); supra text accompanying note 50.
(84) Clermont, supra note 20, at 1156.
(85) See SIDER, supra note 6, at 25, 35-37, 67-80 (showing also that in going beyond two-valued logic, one needs to stipulate the implication operator as well).
(86) See Brian R. Gaines, Fuzzy and Probability Uncertainty Logics, 38 INFO. & CONTROL 154 (1978) (showing that the operators for fuzzy logic and probability theory are the same until one adds the assumption of the excluded middle).
(87) For elaborations of fuzzy intersection and union, see Radim Belohlavek et al., On the Capability of Fuzzy Set Theory to Represent Concepts, 31 INT'L J. GEN. SYS. 569, 575 (2002); Ronald R. Yager, Connectives and Quantifiers in Fuzzy Sets, 40 FUZZY SETS & SYS. 39 (1991).
(88) The figure comes from Ross & Parkinson, supra note 17, at 33; see also id. at 34-36 (extending the operator from an element's membership in multiple fuzzy sets to the relationship of different elements' memberships in different fuzzy sets).
(89) See SIDER, supra note 6, at 1-2, 6-11.
(90) Amos Tversky & Daniel Kahneman, Extensional Versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment, in HEURISTICS AND BIASES 19, 20 (Thomas Gilovich et al. eds., 2002) (emphasis omitted).
(91) The proof would go as follows. Reasoning backward from what is necessary for a system to make sense,
x ∧ x = x (1)
x ∨ x = x (2)
x ∧ y ≤ x (3)
x ∨ y ≥ x (4)
while associativity and distributivity need to prevail as well,
(x ∧ y) ∧ z = x ∧ (y ∧ z) (5)
x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z) (6).
Then, using (2) and (3),
x ∧ (x ∨ y) = (x ∨ x) ∧ (x ∨ y) ≤ x
and, using (4) and (6),
x ∨ (x ∧ y) = (x ∨ x) ∧ (x ∨ y) ≥ x
and, the two left-hand sides having been shown to equal the same expression, which is both ≤ x and ≥ x,
x ∧ (x ∨ y) = x ∨ (x ∧ y) = x (7).
Now, designate y as the lesser or equal of the two truth values x and y. There should be a z such that x ∧ z = y, which allows the final conversions with the use of (7) and of (5) and (1), respectively:
x ∨ y = x ∨ (x ∧ z) = x = MAX(x, y)
x ∧ y = x ∧ (x ∧ z) = (x ∧ x) ∧ z = x ∧ z = y = MIN(x, y).
See Bellman & Giertz, supra note 77, at 151-55 (proving that the MIN and MAX operators "are not only natural, but under quite reasonable assumptions the only ones possible" for fuzzy sets); D. Dubois & H. Prade, A Review of Fuzzy Set Aggregation Connectives, 36 INFO. SCI. 85, 89-92 (1985) (showing that conjoined membership must of course be less than or equal to the minimum membership, but that accepting a value less than that minimum would produce nonsensical results).
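The algebra just derived can also be checked mechanically. The following Python sketch (illustrative only, not part of the article) verifies that the MIN and MAX operators satisfy properties (1) through (7) over a grid of truth values in [0, 1]:

```python
# Check that MIN and MAX satisfy the lattice properties used in the
# derivation of footnote 91, over an 11-point grid of truth values.

def t_and(x, y):  # fuzzy conjunction: the MIN operator
    return min(x, y)

def t_or(x, y):   # fuzzy disjunction: the MAX operator
    return max(x, y)

grid = [i / 10 for i in range(11)]
for x in grid:
    assert t_and(x, x) == x                    # (1) idempotence of AND
    assert t_or(x, x) == x                     # (2) idempotence of OR
    for y in grid:
        assert t_and(x, y) <= x                # (3)
        assert t_or(x, y) >= x                 # (4)
        assert t_and(x, t_or(x, y)) == x       # (7) absorption
        assert t_or(x, t_and(x, y)) == x       # (7) absorption
        for z in grid:
            # (5) associativity and (6) distributivity
            assert t_and(t_and(x, y), z) == t_and(x, t_and(y, z))
            assert t_or(x, t_and(y, z)) == t_and(t_or(x, y), t_or(x, z))
```

A grid check is of course not a proof; Bellman and Giertz's result, cited above, shows the properties force MIN and MAX for all values in [0, 1].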
(92) SIDER, supra note 6, at 72.
(93) The figure comes from 3 AVI SION, LOGICAL AND SPIRITUAL REFLECTIONS ch. 4 (2008), available at http://www.thelogician.net/6_reflect/6_book_3/6c_chapter_04.htm.
(94) For interdependent events, the probability operation for conjunction is P(A) multiplied by P(B|A). Meanwhile, fuzzy logic tells us to apply the MIN operator, which is so much easier to comprehend and apply.
(95) Analogously, De Morgan's rule provides that the product rule works for the OR operator on probabilities too: the disjunction of two independent statements equals the negation of the conjunction of the negations of those statements. Meanwhile, fuzzy logic tells us to apply the MAX operator.
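Notes 94 and 95 contrast the probability operators with fuzzy logic's MIN and MAX. A minimal Python sketch (illustrative, not from the article) runs both regimes on the 60% causation, 60% fault hypothetical from the conjunction-paradox passage quoted at the outset, assuming independence for the product rule:

```python
# Conjunction and disjunction of two 60% findings under the two regimes.
p_causation, p_fault = 0.6, 0.6

# Traditional probability, assuming independent elements:
prob_and = round(p_causation * p_fault, 2)                 # product rule
prob_or = round(1 - (1 - p_causation) * (1 - p_fault), 2)  # De Morgan's rule

# Fuzzy logic:
fuzzy_and = min(p_causation, p_fault)  # MIN operator for conjunction
fuzzy_or = max(p_causation, p_fault)   # MAX operator for disjunction

assert prob_and == 0.36   # conjunction drops below more-probable-than-not
assert fuzzy_and == 0.6   # conjunction stays above it
assert prob_or == 0.84
assert fuzzy_or == 0.6
```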
(96) See H. G. WELLS, A MODERN UTOPIA 381 (1905) ("I would undertake to defeat any definition of chair or chairishness that you gave me.").
(97) A colleague gave this illustration:
Suppose that the Black Death strikes some town in England in 1349. Let's suppose that by the end of the year it kills 500 of the 1000 people then living in the town. A historian today is interested in figuring out whether ten particular people who lived in the town at the beginning of 1349 were killed by the Black Death later that year. The historian searches through the cemeteries, through church records and through other materials but comes up empty. There is simply no specific credible evidence about how any of these ten died. The historian can't even figure out how old each of them was at the time and thus adjust the odds based on different survival rates for different ages. Accordingly, his best guess is that for each of the townspeople, there is 50% probability that he or she died from the Black Death. Now the historian wants to know what are the odds that all ten died from the Black Death. The product rule says it's 1/1024 (unless there's some reason to think these are connected events, like they shared a household, so let's assume no info is known about such things). Fuzzy logic says it's one in two, which seems very obviously wrong. Indeed, assuming again that we know nothing further about any of the inhabitants of the town, fuzzy logic would tell us that the odds that everyone in the town died from the Black Death are one in two, but we know--because we assumed it to begin the inquiry--that the odds that everyone in the town died of the Black Death are zero. Only half of the inhabitants died of the Black Death. This seems to me a proof by contradiction of the applicability of the product rule to this sort of case.
E-mail from Michael Dorf to author (June 3, 2012, 22:04 EST).
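The arithmetic of the quoted hypothetical is easy to reproduce. A short sketch (illustrative only; the figures are Professor Dorf's) conjoins ten 50% findings under each regime:

```python
# Ten townspeople, each independently 50% likely to have died of the
# plague: conjoin the ten propositions under each regime.
n = 10
p_each = 0.5

product_rule = p_each ** n             # chance all ten died, product rule
fuzzy_conjunction = min([p_each] * n)  # MIN operator conjoins to 0.5

assert product_rule == 1 / 1024
assert fuzzy_conjunction == 0.5
```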
(98) See John Leubsdorf, The Standard for Preliminary Injunctions, 91 HARV. L. REV. 525, 542 (1978) ("The court, in theory, should assess the probable irreparable loss of rights an injunction would cause by multiplying the probability that the defendant will prevail by the amount of the irreparable loss that the defendant would suffer if enjoined from exercising what turns out to be his legal right. It should then make a similar calculation of the probable irreparable loss of rights to the plaintiff from denying the injunction. Whichever course promises the smaller probable loss should be adopted." (footnote omitted)).
(99) See Didier Dubois & Henri Prade, A Set-Theoretic View of Belief Functions: Logical Operations and Approximations by Fuzzy Sets, in CLASSIC WORKS OF THE DEMPSTER-SHAFER THEORY OF BELIEF FUNCTIONS, supra note 55, at 375, 403 (rejecting the application of "arguments deriving from the study of statistical experiments").
(100) Richard W. Wright, Proving Facts: Belief Versus Probability, in EUROPEAN TORT LAW 2008, at 79, 93 (Helmut Koziol & Barbara C. Steininger eds., 2009).
(101) Cf. SCHUM, supra note 22, at 243 (discussing belief functions, and crediting Judea Pearl, Bayesian and Belief-Functions Formalisms for Evidential Reasoning: A Conceptual Analysis, in READINGS IN UNCERTAIN REASONING 540, 571 (Glenn Shafer & Judea Pearl eds., 1990), for this phrasing).
(102) L. JONATHAN COHEN, THE PROBABLE AND THE PROVABLE (1977), reviewed by David A. Schum, 77 MICH. L. REV. 446 (1979), and Carl G. Wagner, 1979 DUKE L.J. 1071; see also BERTRAND RUSSELL, HUMAN KNOWLEDGE: ITS SCOPE AND LIMITS 359--61 (1948) (arguing comparably that his "degrees of credibility" do not follow the rules of traditional probability); Susan Haack, The Embedded Epistemologist: Dispatches from the Legal Front, 25 RATIO JURIS 206, 217-18 (2012) (arguing comparably that her "degrees of warrant" do not follow the rules of traditional probability).
(103) COHEN, supra note 102, at 266.
(104) JASON ROSENHOUSE, THE MONTY HALL PROBLEM 5 (2009). The literature here is immense. One nifty entry was a report on how overlooking the additional-information effect on probabilities had invalidated decades of research on cognitive dissonance. See John Tierney, And Behind Door No. 1, a Fatal Flaw, N.Y. TIMES, Apr. 8, 2008, at F1, available at http://www.nytimes.com/2008/04/08/science/08tier.html ("Even some of the smartest mathematicians initially come up with the wrong answer to the Monty Hall Problem. Perhaps the best way to understand it is to play the game yourself."). The web version of the article links to a site, http://www.nytimes.com/2008/04/08/science/08monty.html, that allows you to play the game repeatedly and so build to the right strategy.
(105) See ROSENHOUSE, supra note 104, at 26, 138-41, 147-48.
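Instead of playing the game by hand, one can simulate it. The following Python sketch (illustrative, not from Rosenhouse) shows why switching is the right strategy: it wins exactly when the first pick was wrong, that is, about two-thirds of the time:

```python
import random

def play(switch, rng):
    """One round of the Monty Hall game; returns True if the car is won."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the pick nor the car; which
    # of the eligible doors he opens does not affect the switch outcome.
    opened = next(d for d in doors if d != pick and d != car)
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)  # fixed seed for reproducibility
trials = 100_000
wins = sum(play(switch=True, rng=rng) for _ in range(trials))
print(round(wins / trials, 2))  # close to 2/3
```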
(106) If you resist the result, and persist with two-out-of-three odds that the younger child is a girl, I propose the following gamble to you. You will bet on whether a hidden flipped coin is heads. But before placing the bets, I flip another coin, and it comes up tails. You then should believe there is a two-thirds chance that the hidden coin is heads, and so should offer me better than even money.
The effect of additional information emerges from this sequence: Flip two coins. What are the odds that they will both be heads? One-in-four. If you know one of them was heads, what are the odds they both were heads? One-in-three, because knowing the result of one flip tells us something about the other. If instead you know the first flip was heads, what are the odds they both were heads? One-in-two.
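The three answers in the preceding sequence can be confirmed by brute enumeration. A Python sketch (illustrative, not part of the note) lists the four equally likely outcomes of two flips and conditions on each piece of information:

```python
from itertools import product

# All four equally likely outcomes of flipping two coins.
outcomes = list(product("HT", repeat=2))

both = [o for o in outcomes if o == ("H", "H")]
at_least_one = [o for o in outcomes if "H" in o]
first_heads = [o for o in outcomes if o[0] == "H"]

p_both = len(both) / len(outcomes)           # 1/4, unconditionally
p_given_any = len(both) / len(at_least_one)  # 1/3, given one was heads
p_given_first = len(both) / len(first_heads) # 1/2, given the first was heads

assert p_both == 0.25
assert p_given_first == 0.5
```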
(107) See ROSENHOUSE, supra note 104, at 14-16. Three boxes respectively contain two black balls, two white balls, and one black ball and one white ball. You pick a ball from one box, and it is black. What are the odds that it came from the mixed box? The answer is one-in-three.
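The one-in-three answer to the three-box puzzle also yields to enumeration. This Python sketch (illustrative, not from Rosenhouse) lists the six equally likely (box, ball) draws and conditions on drawing a black ball:

```python
from fractions import Fraction

# Three boxes: two black balls, two white balls, one of each.
boxes = {"BB": ["black", "black"], "WW": ["white", "white"],
         "BW": ["black", "white"]}

# Six equally likely (box, ball) draws.
draws = [(name, ball) for name, balls in boxes.items() for ball in balls]
black_draws = [d for d in draws if d[1] == "black"]
from_mixed = [d for d in black_draws if d[0] == "BW"]

p = Fraction(len(from_mixed), len(black_draws))
assert p == Fraction(1, 3)  # one of the three black balls sits in the mixed box
```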
(108) Recall my illustration of the two ordered columns of tall and smart people, supra text accompanying note 98.
(109) Another colleague, after putting aside problems of market share and statistical evidence, challenges me to
assume the plaintiff's decedent took a drug either from manufacturer D1 or from manufacturer D2 (the drugs are identical). Assume 60% probability of D1 and 40% of D2. Assume further that it is 60% likely that the drug (from whichever manufacturer) actually caused the death. So 4 possibilities: D1's drug caused the death (36%); D2's drug caused the death (24%); neither D1 nor D2 caused the death (40%). Why should P collect against D1?
E-mail from George Alan Hay to author (June 7, 2012, 11:15 EST) (names of parties altered).
Professor Hay is conducting a thought experiment, in which we pull off the veil to reveal a bivalent scheme and then randomly distribute the cause over the identity results. The drug as cause of death sometimes becomes a 1, but in forty out of one hundred cases it will become 0. The zeros fall randomly, instead of dropping out.
The thought experiment has no relevance to what the law or economic theory should do based on the actual proof, however. It changes the problem in ways that affect what law and economics would do with respect to D1's liability. In the case against D1, liability when the cause will randomly be either 1 or 0 is a different question from liability when the plaintiff has proved cause to 60%.
So, I am indeed saying that P has a 60% provable case against D1, and should win just as the law says. P enjoys a truth value of 60% on the proposition that D1 made the drug, and another of 60% that the drug caused the death. This means that the proposition that D1's drug caused the death is 60% a member of the set of true statements. In other words, given the current state of our knowledge, P has 60% of full proof against D1. To decide against P would be to favor a defendant with 40% of a defense.
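The contrast between the traditional product rule and the fuzzy MIN operator in this hypothetical reduces to two lines of arithmetic (a sketch; the variable names are mine):

```python
# The drug hypothetical: P proves to 60% that D1 made the drug and
# to 60% that the drug (whoever made it) caused the death.
maker = 0.60
cause = 0.60

# Traditional probability conjoins by the product rule.
product = maker * cause    # 0.36 -- the conjunction paradox
# Fuzzy logic conjoins by the MIN operator.
fuzzy = min(maker, cause)  # 0.60 -- P has 60% of full proof against D1

assert abs(product - 0.36) < 1e-9
assert fuzzy == 0.60
```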
(110) I rest on the positive arguments for the appropriateness of fuzzy logic. Additionally, there are practical arguments against the product rule. It might be cognitively challenging to apply. See Porat & Posner, supra note 70, at 47-48. Or the effect of an element-by-element approach might offset the inefficiencies of other legal rules. See Alex Stein, Of Two Wrongs That Make a Right: Two Paradoxes of the Evidence Law and Their Combined Economic Justification, 79 TEX. L. REV. 1199 (2001). I do not need to rely on these practical arguments.
Some such arguments, however, are just wrong. For example, some argue that for jury decision-making the necessity of convincing multiple fact-finders means, by virtue of the Condorcet theorem and the supermajority requirement, that the plaintiff's task is way too demanding; accordingly, to ameliorate the difficulty of proof, the system does not impose the additional demand of a product rule and instead proceeds element-by-element. See Saul Levmore, Conjunction and Aggregation, 99 MICH. L. REV. 723, 734-45 (2001). This position rests on several errors. See Ronald J. Allen & Sarah A. Jehl, Burdens of Persuasion in Civil Cases: Algorithms v. Explanations, 2003 MICH. ST. L. REV. 893, 904-19; Paul H. Edelman, On Legal Interpretations of the Condorcet Jury Theorem, 31 J. LEGAL STUD. 327, 343-48 (2002). To me, the most obvious error lies in ignoring that a decision for the defendant also requires the agreement of the multiple fact-finders.
(111) But cf. Bellman & Giertz, supra note 77, at 155-56 (showing that other negation operators are possible).
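The point that negation operators other than 1 - x are available can be illustrated with Sugeno's lambda-complement, a familiar alternative in the fuzzy-logic literature (the sketch and the choice of lambda are mine, not drawn from Bellman and Giertz's article):

```python
def standard_negation(x):
    """The usual fuzzy negation: n(x) = 1 - x."""
    return 1 - x

def sugeno_negation(x, lam=0.5):
    """Sugeno's lambda-complement: n(x) = (1 - x) / (1 + lam * x), lam > -1."""
    return (1 - x) / (1 + lam * x)

# Both operators map 0 to 1 and 1 to 0, but they disagree in between:
x = 0.6
assert standard_negation(x) == 1 - 0.6               # 0.4
assert abs(sugeno_negation(x) - 0.4 / 1.3) < 1e-12   # about 0.308
```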
(112) The figure comes from TIMOTHY J. ROSS, FUZZY LOGIC WITH ENGINEERING APPLICATIONS 37 (3d ed. 2010).
(113) Charles Nesson, The Evidence or the Event? On Judicial Proof and the Acceptability of Verdicts, 98 HARV. L. REV. 1357, 1385-88 (1985) (footnotes omitted). Professor Nesson saw the paradox as illustrating his broad thesis that the law's process of proof aims at generating acceptable statements about past events and thus at projecting behavioral norms to the public, rather than at reaching probable conclusions in a search for truth:
Application of the more-probable-than-not test to each element produces the most acceptable conclusion as to that element. The conjunction of these conclusions constitutes a story that is more probable than any other story about the same elements. Suppose, for example, that the elements of a story are A and B, and A (70%) is more probable than not-A (30%), and B (60%) is more probable than not-B (40%). The conjunction (A & B) (42%) may not be more probable than its negation (not-(A & B)) (58%). But the conjunction (A & B) (42%) is more probable than any other version: (A & (not-B)) (28%), ((not-A) & B) (18%), or ((not-A) & (not-B)) (12%). The application of the more-probable-than-not standard of proof on an element-by-element basis will produce the single most probable story.
Id. at 1389-90 (footnotes omitted). See generally J.S. COVINGTON, JR., THE STRUCTURE OF LEGAL ARGUMENT AND PROOF 347-57 (2d ed. 2006) (discussing the misuses of probability theory at trial).
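The arithmetic in Professor Nesson's block quote can be checked directly (a sketch; A is 70% probable, B is 60% probable, and the two are independent; the story labels are mine):

```python
pA, pB = 0.70, 0.60
stories = {
    "A & B":         pA * pB,              # 42%
    "A & not-B":     pA * (1 - pB),        # 28%
    "not-A & B":     (1 - pA) * pB,        # 18%
    "not-A & not-B": (1 - pA) * (1 - pB),  # 12%
}
# (A & B) is the single most probable story...
assert max(stories, key=stories.get) == "A & B"
# ...yet still less probable than its negation, not-(A & B) (58%).
assert stories["A & B"] < 1 - stories["A & B"]
```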
(114) See James A. Henderson, Jr. et al., Optimal Issue Separation in Modern Products Liability Litigation, 73 TEX. L. REV. 1653, 1655-59, 1667-75 (1995).
(115) David A. Moran, Jury Uncertainty, Elemental Independence and the Conjunction Paradox: A Response to Allen and Jehl, 2003 MICH. ST. L. REV. 945, 946-47, 950.
(116) See generally JEFFREY T. FREDERICK, THE PSYCHOLOGY OF THE AMERICAN JURY 296-99 (1987) (providing a brief overview of the story model of evidence processing); REID HASTIE ET AL., INSIDE THE JURY 22-23 (1983) (providing a brief summary of empirical studies supporting the story model); Paula L. Hannaford et al., The Timing of Opinion Formation by Jurors in Civil Cases: An Empirical Examination, 67 TENN. L. REV. 627, 629-32 (2000) (discussing "three predominant models of jury decision making"); Jill E. Huntley & Mark Costanzo, Sexual Harassment Stories: Testing a Story-Mediated Model of Juror Decision-Making in Civil Litigation, 27 LAW & HUM. BEHAV. 29, 29 (2003) (presenting research that "extends the story model to civil litigation and tests a story-mediated model against an unmediated model of jury decision-making"); Nancy Pennington & Reid Hastie, The Story Model for Juror Decision Making, in INSIDE THE JUROR 192 (Reid Hastie ed., 1993) (detailing the story model and summarizing empirical studies testing it).
(117) FREDERICK, supra note 116, at 296-97 (citations omitted).
(118) Compare Reid Hastie, What's the Story? Explanations and Narratives in Civil Jury Decisions, in CIVIL JURIES AND CIVIL JUSTICE 23, 31-32 (Brian H. Bornstein et al. eds., 2008) (expanding the theory to allow for a party's multiple stories), with Michael S. Pardo, The Nature and Purpose of Evidence Theory, 66 VAND. L. REV. (forthcoming 2013), available at http://ssrn.com/abstract=2060340 (discussing the theory's difficulties in handling multiple stories).
(119) 3 KEVIN F. O'MALLEY ET AL., FEDERAL JURY PRACTICE AND INSTRUCTIONS: CIVIL § 104.01 (6th ed. 2011):
Plaintiff has the burden in a civil action, such as this, to prove every essential element of plaintiff's claim by a preponderance of the evidence. If plaintiff should fail to establish any essential element of plaintiff's claim by a preponderance of the evidence, you should find for defendant as to that claim.
See Allen & Jehl, supra note 110, at 897-904 (criticizing Dale A. Nance, Commentary, A Comment on the Supposed Paradoxes of a Mathematical Interpretation of the Logic of Trials, 66 B.U.L. REV. 947, 949-51 (1986) (finding this instruction ambiguous)).
(120) See Bruce Ching, Narrative Implications of Evidentiary Rules, 29 QUINNIPIAC L. REV. 971 (2011) (discussing a narrative perspective in persuasion, and for evidentiary rules such as hearsay and party admissions); Doron Menashe & Mutal E. Shamash, The Narrative Fallacy, 3 INT'L COMMENT. ON EVIDENCE iss. 1, art. 3 (2006) (using narrative theory to criticize holistic evidence theory).
(121) But cf. Elizabeth G. Thornburg, The Power and the Process: Instructions and the Civil Jury, 66 FORDHAM L. REV. 1837, 1857-63 (1998) (questioning whether a special verdict actually changes the jury's decision-making practice).
(122) See COHEN, supra note 102, at 267 ("So on the inductivist analysis, if the plaintiff gains each of his points on the balance of probability, he can be regarded as gaining his case as a whole on that balance ..., without any constraint's being thereby imposed on the number of independent points in his case or on the level of probability at which each must be won.").
(123) See Gregg C. Oden, Integration of Fuzzy Logical Information, 3 J. EXPERIMENTAL PSYCHOL.: HUM. PERCEPTION & PERFORMANCE 565, 568-72 (1977). His experiment involved having students judge the degree of truthfulness of statements like "a chair is furniture" and "a pelican is a bird," and asking them for the degree to which both statements together were true. The students seemed to use the product rule rather than the MIN rule. But it seems to me that the students could have interpreted these statements as verifiably being either completely true or completely false, thus making the product rule appropriate. Moreover, other experiments indicate that people do use fuzzy operators. See Rami Zwick, David V. Budescu & Thomas S. Wallsten, An Empirical Study of the Interpretation of Linguistic Probabilities, in FUZZY SETS IN PSYCHOLOGY 91, 114-16 (Tamas Zetenyi ed., 1988) (indicating that people do not use the product rule naturally). In any event, in the legal system any human failing to conjoin properly would be offset by the human tendency to construct a story for the whole case instead of proceeding element-by-element. Cf. Tversky & Kahneman, supra note 90, at 19 (discussing biases that tend to ignore conjunction); Amos Tversky & Derek J. Koehler, Support Theory: A Nonextensional Representation of Subjective Probability, in HEURISTICS AND BIASES, supra note 90, at 441 (same).
(124) Allen & Jehl, supra note 110, at 939; see Massachusetts v. U.S. Dep't of Health & Human Servs., 682 F.3d 1 (1st Cir. 2012) (striking down DOMA on both equal protection and federalism grounds), noted in Mike Dorf, Is the First Circuit's Opinion in the DOMA Case Insufficiently "Fuzzy"?, DORF ON LAW (June 4, 2012, 12:30 AM), http://www.dorfonlaw.org.
A recent article attacks the law's general refusal to aggregate the probabilities of independent claims and defenses, even arguing for conviction upon the basis of a number of criminal offenses almost proved. Porat & Posner, supra note 70. Compare Alon Harel & Ariel Porat, Aggregating Probabilities Across Cases: Criminal Responsibility for Unspecified Offenses, 94 MINN. L. REV. 261, 261-62 (2009) ("Should a court convict a defendant for an unspecified offense if there is no reasonable doubt that he committed an offense, even though the prosecution cannot prove his guilt as to a particular offense beyond a reasonable doubt? Stated otherwise, is committing an offense sufficient for a conviction or must a prosecutor establish what this offense is to justify a conviction? This Article contends that, under certain conditions, a prosecutor should not have to establish the particular offense committed by a defendant--proof that the defendant committed an offense should be sufficient."), with Frederick Schauer & Richard Zeckhauser, On the Degree of Confidence for Adverse Decisions, 25 J. LEGAL STUD. 27, 41-47 (1996) (conceding the probability argument, but arguing that the criminal law still should not convict for reasons of abundant caution). My position is that the paradox that motivated their whole article does not exist. The law should not, and does not, aggregate. Their article's cited exceptions, where the law does seem to aggregate (for example, alternative or market share liability), actually constitute changes to the substantive law in a manner comparable to imposing strict liability, rather than resulting from an odd application of traditional probability to the standard of proof. See Kevin M. Clermont, Aggregation of Probabilities and Illogic, 47 GA. L. REV. 165 (2012).
(125) See SHAFER, supra note 23, at 35-37.
(126) See Liping Liu & Ronald R. Yager, Classic Works of the Dempster-Shafer Theory of Belief Functions: An Introduction, in CLASSIC WORKS OF THE DEMPSTER-SHAFER THEORY OF BELIEF FUNCTIONS, supra note 55, at 1, 2-19 (recounting also the history of belief function theory).
(127) See Barnett, supra note 55, at 200-01 (providing a neat mental image for these bounds); A. P. Dempster, Upper and Lower Probabilities Induced by a Multivalued Mapping, 38 ANNALS MATHEMATICAL STAT. 325 (1967); L.A. Zadeh, Fuzzy Sets as a Basis for a Theory of Possibility, 1 FUZZY SETS & SYS. 3 (1978).
(128) See Dubois & Prade, supra note 99, at 375 (arguing for the basic compatibility of the two approaches); John Yen, Generalizing the Dempster-Shafer Theory to Fuzzy Sets, in CLASSIC WORKS OF THE DEMPSTER-SHAFER THEORY OF BELIEF FUNCTIONS, supra note 55, at 529 (showing how to form beliefs about membership in a fuzzy set); cf. SCHUM, supra note 22, at 266-69 (observing that one can fuzzify belief functions).
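The upper and lower probabilities of belief function theory can be sketched on the simplest possible frame (the toy mass numbers are mine; only the definitions of Bel and Pl come from the theory):

```python
# A toy belief function on the frame {S, not-S}: mass 0.5 committed
# to S, 0.2 committed to not-S, and 0.3 left uncommitted.
mass = {
    frozenset({"S"}): 0.5,
    frozenset({"not-S"}): 0.2,
    frozenset({"S", "not-S"}): 0.3,  # uncommitted belief
}

def bel(h):
    """Lower probability: total mass on focal sets contained in h."""
    return sum(m for focal, m in mass.items() if focal <= h)

def pl(h):
    """Upper probability: total mass on focal sets intersecting h."""
    return sum(m for focal, m in mass.items() if focal & h)

S = frozenset({"S"})
# Bel(S) = 0.5 and Pl(S) = 0.8; the gap between the bounds is
# exactly the uncommitted mass of 0.3.
assert abs(pl(S) - bel(S) - 0.3) < 1e-9
```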
(129) See COHEN, supra note 102, at 49-57, 245-64 (developing, as an alternative to Pascalian (or mathematicist) probability, a Baconian (or inductive) theory of probability). Baconian theory tries to look not only at the evidence presented, but also at the evidence not available. It makes evidential completeness a key criterion, and thereby stresses an important concern.
(130) See Clermont & Sherwin, supra note 32; Kevin M. Clermont, Standards of Proof in Japan and the United States, 37 CORNELL INT'L L.J. 263 (2004); Clermont, supra note 31; Wright, supra note 100; Richard W. Wright, Proving Causation: Probability Versus Belief, in PERSPECTIVES ON CAUSATION 195 (Richard Goldberg ed., 2011). With their emphasis on "conviction" in the intime conviction standard, the civil-law countries signal their devotion to belief, albeit a belief seemingly built upon a binary world view (and perhaps a belief compared to an absolute threshold inherited from the criminal model). Such an approach fit better with an inquisitorial model than it did with an adversarial model, allowing it to persist for centuries. But its survival until today may rest instead on the civil-law system's desire to enhance the appearance of legitimacy.
(131) See, e.g., Jaffee, supra note 45, at 934-51 (attacking the use of probability in analyzing proof); Tribe, supra note 2 (writing the classic version of the lament, in which Professor Tribe stressed not only the risk of misuse of mathematical techniques, including inaccurate meshing of numerical proof with soft or unquantifiable variables, but also the undercutting of society's values, including the dehumanization of the legal process); Adrian A.S. Zuckerman, Law, Fact or Justice?, 66 B.U.L. REV. 487, 508 (1986) (arguing that probabilistic assessment diminishes "the hope of seeing justice supervene in individual trials," while seeing fact-finding as an individualized but value-laden process).
(132) See Bellman & Giertz, supra note 77, at 155-56 (showing that negation can have multiple meanings).
(133) See, e.g., Richard Lempert, The New Evidence Scholarship: Analyzing the Process of Proof, 66 B.U.L. REV. 439, 462-67 & n.60 (1986) (noting that employing 1:1 as the appropriate odds for someone who is ignorant of the true facts can cause many problems).
(134) See Brilmayer, supra note 5, at 686-88.
(135) See supra note 45 and accompanying text.
(136) See SHAFER, supra note 23, at 22-24 (exploring the role of the "representation of ignorance" in belief functions).
(137) See 9 JOHN H. WIGMORE, EVIDENCE § 2487 (James H. Chadbourn rev., 1981); cf. John T. McNaughton, Burden of Production of Evidence: A Function of a Burden of Persuasion, 68 HARV. L. REV. 1382 (1955) (offering alternative diagrams).
(138) See John Farley Thorne III, Comment, Mathematics, Fuzzy Negligence, and the Logic of Res Ipsa Loquitur, 75 Nw. U. L. REV. 147 (1980) (justifying the res ipsa loquitur doctrine by use of fuzzy logic).
(139) See FED. R. EVID. 301.
(140) See Stimpson v. Hunter, 125 N.E. 155, 157 (Mass. 1919) ("[T]he failure of the defendant and his son to testify although present in court was not equivalent to affirmative proof of facts necessary to maintain the action.").
(141) See Cruzan v. N.Y. Cent. & Hudson River R.R. Co., 116 N.E. 879, 880 (Mass. 1917) ("Mere disbelief of denials of facts which must be proved is not the equivalent of affirmative evidence in support of those facts.").
(142) See Dyer v. MacDougall, 201 F.2d 265, 269 (2d Cir. 1952) (holding that although demeanor evidence is probative, it does not suffice to escape a directed verdict).
(143) See Guenther v. Armstrong Rubber Co., 406 F.2d 1315, 1318 (3d Cir. 1969) (dictum) (saying, in a case where the plaintiff had been injured by an exploding tire, that a 75 to 80% chance it came from the defendant manufacturer was not enough for the case to go to the jury). For a more complete consideration of statistical evidence and its ultimately non-paradoxical nature, see RICHARD H. FIELD, BENJAMIN KAPLAN & KEVIN M. CLERMONT, MATERIALS FOR A BASIC COURSE IN CIVIL PROCEDURE 1352-56 (10th ed. 2010) (explaining how a fact-finder converts statistical evidence into a belief).
(144) See Baxter v. Palmigiano, 425 U.S. 308, 316-20 (1976) (treating failure to testify as sufficient evidence).
(145) See Robert S. Summers, Evaluating and Improving Legal Processes--A Plea for "Process Values," 60 CORNELL L. REV. 1, 4 (1974) (arguing that a legal process can be both a means to good results and a means to serve process values such as "participatory governance, procedural rationality, and humaneness").
(146) See Anne W. Martin & David A. Schum, Quantifying Burdens of Proof: A Likelihood Ratio Approach, 27 JURIMETRICS J. 383, 390-93 (1987) (surveying a small sample of students for their odds of guilt used as the prior probability, which turned out to be 1:1 or 50%).
(147) A separable question is whether the law should instead deliver partial relief following partial proof. See ENDICOTT, supra note 10, at 72-74. The answer is not obviously affirmative. See David Kaye, The Limits of the Preponderance of the Evidence Standard: Justifiably Naked Statistical Evidence and Multiple Causation, 1982 AM. B. FOUND. RES. J. 487 (showing current law's economic superiority to an expected-value approach that would award damages proportional to probabilistic certainty). After cataloging the prevalence of fuzzy concepts in law, Professor Katz concludes that law is correct to draw lines in the fuzz and so separate all-or-nothing remedies; he argues that law must establish discontinuities in order to retain its rational coherence. See KATZ, supra note 67, at 157-81. Most significantly, I add that partial relief would have to contend with the difficulty that the system does not, and logically could not, charge the fact-finder to find upon imperfect evidence the degree to which the plaintiff is right, as I next develop.
(148) See KOSKO, supra note 10, at 172 (describing the step in fuzzy computer systems).
(149) See Peter Tillers & Jonathan Gottfried, Case Comment--United States v. Copeland, 369 F. Supp. 2d 275 (E.D.N.Y. 2005): A Collateral Attack on the Legal Maxim That Proof Beyond a Reasonable Doubt Is Unquantifiable?, 5 LAW PROBABILITY & RISK 135, 142 (2006) (observing the difference between processing evidence and applying standards of proof); cf. Pardo, supra note 118, at 8 (calling these two stages the micro-level and the macro-level of proof). Two Belgian scholars helpfully elaborated belief functions to distinguish the formulation of beliefs during a "credal" stage (from the Latin for believe) and decision-making during the "pignistic" stage (from the Latin for a bet). Philippe Smets & Robert Kennes, The Transferable Belief Model, in CLASSIC WORKS OF THE DEMPSTER-SHAFER THEORY OF BELIEF FUNCTIONS, supra note 55, at 693.
(150) See Clermont, supra note 31, at 477, 485 (recounting the state of psychological knowledge on standards of proof).
(151) See, e.g., Brown v. Bowen, 847 F.2d 342, 345 (7th Cir. 1988) ("[T]he trier of fact rules for the plaintiff if it thinks the chance greater than 0.5 that the plaintiff is in the right.").
(152) See Smith, supra note 71, at 503 (describing a betting scheme based on such comparison). For the mathematics involved, see Smets & Kennes, supra note 149, at 703-11.
(153) See United States ex rel. Bilyew v. Franzen, 686 F.2d 1238, 1248 (7th Cir. 1982) (stressing importance of the persuasion burden and observing that "a judge or a jury can experience only a small, finite number of degrees of certainty.... Thus cases when the evidence ... seem[s] in balance are not unique among some infinite variety of evidentiary balances, but instead are among a much smaller number of [ranges of] possibilities that may be perceived by the fact-finder."); Clermont, supra note 20, at 1119 n.13, 1122 n.36, 1147-48; Cohen, Conceptualizing Proof, supra note 30, at 90-91; Cohen, Confidence in Probability, supra note 30, at 418-19.
(154) See Eyal Zamir & Ilana Ritov, Loss Aversion, Omission Bias, and the Burden of Proof in Civil Litigation, 41 J. LEGAL STUD. 165, 197 n.23 (2012).
(155) See Amos Tversky & Daniel Kahneman, Judgment Under Uncertainty: Heuristics and Biases, 185 SCIENCE (n.s.) 1124, 1128 (1974). An example of the anchoring heuristic comes from a study involving subjects asked to estimate quickly, without paper and pencil, the product of 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1, while another group faced 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8; the first group's median estimate was 2250, while the other's was 512; the correct answer is 40,320. Id.
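The correct product in the anchoring study is easy to confirm (a short sketch):

```python
import math
from functools import reduce
from operator import mul

# The two presentations of the same product from the anchoring study.
descending = reduce(mul, range(8, 0, -1))  # 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1
ascending = reduce(mul, range(1, 9))       # 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8
assert descending == ascending == math.factorial(8) == 40320
# Both groups anchored far too low: median estimates were 2,250 and 512.
```

The descending sequence starts with larger partial products, so it anchors subjects higher, yet both groups fall an order of magnitude short of 40,320.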
(156) See Santosky v. Kramer, 455 U.S. 745, 757 (1982) ("[T]his Court never has approved a case-by-case determination of the proper standard of proof for a given proceeding.").
(157) See generally Clermont, supra note 20, at 1118-21 (listing the standards of proof and their relative applications to different areas of the law).
(158) See Allen & Jehl, supra note 110, at 929-43 (summarizing Allen's previous work on the theory); see also Edward K. Cheng, Reconceptualizing the Burden of Proof, 122 YALE L.J. (forthcoming 2013) (manuscript at 3-4, 6), available at http://ssrn.com/abstract=2087254 (arguing that statisticians perform hypothesis testing by comparison, so that "evidence scholars need only let go of their love for p > 0.5"; but incorrectly assuming that theory calls for comparing the plaintiff's story to each of the defendant's stories "separately, not simultaneously"); supra note 81 and infra note 166.
(159) The weight of the evidence methodology in science is a similar approach, as is the differential diagnosis approach in medicine that diagnoses by successively eliminating likely causes of a medical condition to reveal the best explanation. See Milward v. Acuity Specialty Prods. Grp., Inc., 639 F.3d 11, 18 (1st Cir. 2011) ("The scientist [when admitting expert evidence based on the weight of the evidence approach] must (1) identify an association between an exposure and a disease, (2) consider a range of plausible explanations for the association, (3) rank the rival explanations according to their plausibility, (4) seek additional evidence to separate the more plausible from the less plausible explanations, (5) consider all of the relevant available evidence, and (6) integrate the evidence using professional judgment to come to a conclusion about the best explanation."); Westberry v. Gislaved Gummi AB, 178 F.3d 257 (4th Cir. 1999) (admitting expert evidence based on differential diagnosis). These methods involve consideration and analysis of alternative explanations to get the one that best explains the evidence, a mode of reasoning called inference to the best explanation. Allen is drifting in his thinking in this direction. See Ronald J. Allen & Michael S. Pardo, Juridical Proof and the Best Explanation, 27 LAW & PHIL. 223, 226 (2008) (providing an account of the "abductive reasoning process of inference to the best explanation"). However, Larry Laudan, Strange Bedfellows: Inference to the Best Explanation and the Criminal Standard of Proof, 11 INT'L J. EVIDENCE & PROOF 292 (2007), powerfully demonstrates that inference to the best explanation holds little additional promise of explaining or illuminating standards of proof.
(160) See Richard D. Friedman, "E" Is for Eclectic: Multiple Perspectives on Evidence, 87 VA. L. REV. 2029, 2046-47 (2001); cf. Clermont, supra note 20, at 1119-20, 1122-26 (discussing not only the standard of clear and convincing evidence, but also procedure's lower standards including slightest, reasonable, and substantial possibilities). But cf. Ronald J. Allen, The Nature of Juridical Proof, 13 CARDOZO L. REV. 373, 413 (1991) (attempting to explain the beyond-a-reasonable-doubt standard as not being satisfied if the fact-finder "concludes that there is a plausible scenario consistent with innocence," while admitting that the clear-and-convincing standard is "troublesome" under his theory because it seems cardinal); Ronald J. Allen & Brian Leiter, Naturalized Epistemology and the Law of Evidence, 87 VA. L. REV. 1491, 1528 (2001) ("[T]he prosecution must provide a plausible account of guilt and show that there is no plausible account of innocence.").
(161) See Ronald J. Allen, Standards of Proof and the Limits of Legal Analysis 14 (May 25-26, 2011) (unpublished conference paper), available at http://ssrn.com/abstract=1830344.
(162) See Craig R. Callen, Commentary, Kicking Rocks with Dr. Johnson: A Comment on Professor Allen's Theory, 13 CARDOZO L. REV. 423 (1991) (arguing that for the purposes of the study of evidence and fact-finding, the insights from cognitive science may be more far-reaching than Professor Allen suggested in The Nature of Juridical Proof).
(163) See Laudan, supra note 159, at 304-05 ("The trier of fact cannot say, 'Although plaintiff's case is stronger than defendant's, I will reach no verdict since neither party has a frightfully good story to tell.' Under current rules, if the plaintiff has a better story than the defendant, he must win the suit, even when his theory of the case fails to satisfy the strictures required to qualify his theory as the best explanation.").
(164) See McBaine, supra note 79, at 248-49.
(165) See COHEN, supra note 102, at 255 ("The cardinal question to be settled by the trier of fact may always be construed as this: on the facts before the court, is the conclusion to be proved by the plaintiff more inductively probable than its negation?").
(166) See, e.g., Michael S. Pardo, Second-Order Proof Rules, 61 FLA. L. REV. 1083 (2009) (speaking of the comparison imposed by more likely than not, but using "negation" in the sense of the complement of Bel(S)). Pardo explains that the comparison
might mean the likelihood of the plaintiff's factual allegations versus the negation of those allegations, or it might mean the likelihood of the plaintiff's allegations versus the likelihood of the defendant's alternative allegations. The first interpretation appears to better fit the instructions, but it fails.... If the plaintiff must prove that some fact, X, is more probable than its negation, not-X, then the plaintiff should have to show not only the probability that the state of the world is such that X is true, but also the probability of every other possible state of the world in which X is not true. This would mean that in order to prevail, plaintiffs would have to disprove (or demonstrate the low likelihood of) each of the virtually limitless number of ways the world could have been at the relevant time. This would be a virtually impossible task, and thus, absent conclusive proof, plaintiffs would lose. This would plainly be inconsistent with the goals of the preponderance rule, and thus some comparison with the defendant's case is necessary. In order to facilitate the goals of the preponderance rule, the plaintiff ought to prevail whenever the likelihood of his allegations exceeds that of the defendant's.
Id. at 1093-94 (footnotes omitted). The difficulty for theorists who compare the plaintiff's story to the defendant's story, rather than to all versions of non-liability, is that plaintiffs will recover more often than normatively desirable. Realization of this difficulty leads some of the theorists to argue that the aim of the system is not truth but, say, acceptability of decision. See Nesson, supra note 113.
(167) See McBaine, supra note 79, at 263 (proposing an instruction to the effect that "the probability that they are true or exist is substantially greater than the probability that they are false or do not exist"); Edmund M. Morgan, Instructing the Jury upon Presumptions and Burden of Proof, 47 HARV. L. REV. 59, 67 (1933) ("[I]f the judge charges that the burden is upon a party to prove a proposition by clear and convincing evidence.... it requires the jury to be convinced ... that its truth is much more probable than its falsity...."); cf. Laudan, supra note 159, at 299-300 (discussing attempts to append such a notion to the approach of inference to the best explanation).
(168) See Clermont, supra note 20, at 1126-28, 1152-56 (describing new-trial practice).
(169) See id. at 1128-30 (describing appellate review).
(170) See Allen & Leiter, supra note 160, at 1528 ("[T]he prosecution must provide a plausible account of guilt and show that there is no plausible account of innocence."); McBaine, supra note 79, at 266 ("A reasonable doubt is a doubt which exists when ... you cannot honestly say that it is almost certain that the defendant did the acts which he is charged to have done."); cf. Laudan, supra note 159, at 300-02 (discussing attempts to append such notions to the approach of inference to the best explanation).
(171) See Michael J. Saks & Robert F. Kidd, Human Information Processing and Adjudication: Trial by Heuristics, 15 LAW & SOC'Y REV. 123, 126 (1981) ("Most legal decision making, like that in many other areas of complex activity, is done under conditions of uncertainty."). One can play with the notion of uncertainty, of course. For example, "to be certain of uncertainty ... is to be certain of at least one thing." Milton Dawes, Multiordinality: A Point of View, ETC, Summer 1986, at 128, 131; cf. Neal Gabler, The Elusive Big Idea, N.Y. TIMES, Aug. 14, 2011 (Sunday Review), at 1, available at http://www.nytimes.com/2011/08/14/opinion/sunday/the-elusive-big-idea.html (announcing as the big idea that we are in a post-idea era). Philosophers can do more than play with the notion. See, e.g., Daniel Greco, Probability and Prodigality, 4 OXFORD STUD. EPISTEMOLOGY (forthcoming 2013), available at http://web.mit.edu/dlgreco/www/ProbAndProd.pdf (objecting to the view that what we know has a probability of one); Gary Lawson, Proving the Law, 86 Nw. U. L. REV. 859, 871-74 (1992) (discussing possible Cartesian arguments regarding uncertainty). For some propositions in closed systems, uncertainty will seem rather thin, except in the metaphysical sense that nothing is absolutely certain. But in almost all circumstances calling for the application of law, out in the real world, uncertainty will be a palpable concern. See TWINING, supra note 1, at 104 (rebutting "the myth of certainty").
(172) On justifying what still may seem to be a low threshold, Bel(S) > .50, see Ronald J. Allen & Larry Laudan, Deadly Dilemmas, 41 TEX. TECH L. REV. 65 (2008); Larry Laudan & Harry D. Saunders, Re-thinking the Criminal Standard of Proof: Seeking Consensus About the Utilities of Trial Outcomes, 7 INT'L COMMENT. ON EVIDENCE iss. 2, art. 1 (2009).
(173) 131 A. 799 (Vt. 1926).
(174) Id. at 800.
(176) 96 S.W.2d 710 (Mo. 1936).
(177) Id. at 723.
(178) Id. (quoting Rouchene v. Gamble Constr. Co., 89 S.W.2d 58, 63 (Mo. 1935)).
(179) See COVINGTON, supra note 113, at 99-100.
(180) 3 O'MALLEY ET AL., supra note 119, § 104.01:
"Establish by a preponderance of the evidence" means evidence, which as a whole, shows that the fact sought to be proved is more probable than not. In other words, a preponderance of the evidence means such evidence as, when considered and compared with the evidence opposed to it, has more convincing force, and produces in your minds belief that what is sought to be proved is more likely true than not true.
(181) Nissho-Iwai Co. v. M/T Stolt Lion, 719 F.2d 34, 38 (2d Cir. 1983) ("The term 'preponderance' means that 'upon all the evidence ... the facts asserted by the plaintiff are more probably true than false.'" (quoting Porter v. Am. Exp. Lines, Inc., 387 F.2d 409, 411 (3d Cir. 1968))); see McBaine, supra note 79, at 261-62; Morgan, supra note 167, at 66-67.
(182) The reference to a "reasonable" jury reflects the fact that on such a motion the judge is reviewing the jury's hypothesized application of the standard of proof. The judge's standard of review turns on whether a jury could not reasonably, or rationally, find for the non-movant. That is, the defendant must show that a verdict for the plaintiff, given the standard of proof, is not reasonably possible. See Clermont, supra note 20, at 1126-27.
We can state this standard of review simply and fuzzily in terms of the law's coarsely gradated scale of possibilities and probabilities, without the complications that belief functions impose on the standard of proof. The reason is that we do not expect the judge to retain uncommitted belief in applying a standard of review. The "evidence" for applying the standard is complete. We want from the judge the likelihood of jury error in finding for the plaintiff, with the complement being the likelihood of jury correctness in finding for the plaintiff.
(183) See Liu & Yager, supra note 126, at 18-19 (discussing Leibniz's notions of pure and mixed evidence).
(184) If we knew more about the base rates for the type of case or the realities of the particular case itself, we might want to adjust the standard of proof. For example, a variable standard of proof, set on a case-by-case basis by the ideal judge, could serve accuracy by offsetting the unavailability or inadmissibility of evidence in the particular case. See Dominique Demougin & Claude Fluet, Deterrence Versus Judicial Error: A Comparative View of Standards of Proof, 161 J. INSTITUTIONAL & THEORETICAL ECON. 193 (2005) (arguing by sophisticated analysis for a variable standard of proof). More generally, in an idealized system, one could argue that the standard of proof should slightly vary issue-by-issue in response to the expected utility of each outcome. See Richard A. Posner, An Economic Approach to Legal Procedure and Judicial Administration, 2 J. LEGAL STUD. 399, 414-16 (1973) (using economic analysis); cf. Richard D. Friedman, Standards of Persuasion and the Distinction Between Fact and Law, 86 NW. U. L. REV. 916, 926 (1992) (extending the arguments to law-determining). But the path of the law has not been toward variable standards of proof but instead toward standards generally applicable for whole categories of cases--while making gross adjustments as to whole categories of issues when substantive considerations, such as the high social cost of criminally convicting the innocent, counsel adjustment.
(185) This argument for the preponderance standard is strong, because it seems demonstrably optimal given two conditions that are plausible. The first condition is that an error in favor of the plaintiff is neither more undesirable nor less undesirable than an error in favor of the defendant, or that a dollar mistakenly paid by the defendant (a false positive) is just as costly to society as a dollar mistakenly uncompensated to the plaintiff (a false negative). The second condition is that the goal is to minimize the sum of expected costs from these two types of error, that is, the system wants to keep the amounts suffered mistakenly to a minimum.
Accepting that these conditions generally prevail outside the criminal law--and discounting more intangible possibilities, such as there being differential perceptions of loss and gain or varying marginal utilities of wealth that are worthy of consideration--the preponderance standard should perform better than any other non-variable standard of proof. The reason is that by so deciding in accordance with apparent probabilities, the legal system in the long run will make fewer errors than, for example, the many false negatives that a virtual-certainty standard would impose. The preponderance standard also minimizes the system's expected error costs. Let p be the apparent probability that the defendant is liable (for D dollars). If p > 1/2, call it p₁; and if p ≤ 1/2, call it p₂. On the one hand, under the preponderance standard, the expected sum of false positives and false negatives over the run of cases is Σ[(1 - p₁)D + p₂D]. On the other hand, under a very high standard of proof that eliminates false positives, the analogous sum is Σ[p₁D + p₂D]. Given that (1 - p₁) is less than p₁, the preponderance standard therefore lowers the system's expected error costs. See D.H. Kaye, The Error of Equal Error Rates, 1 LAW PROBABILITY & RISK 3, 7 (2002) ("The general appeal of the [p > 1/2] rule lies in the fact that it minimizes expected losses." (citation omitted)); David Hamer, Probabilistic Standards of Proof: Their Complements and the Errors That Are Expected to Flow from Them, 1 U. NEW ENG. L.J. 71 (2004); cf. Neil Orloff & Jery Stedinger, A Framework for Evaluating the Preponderance-of-the-Evidence Standard, 131 U. PA. L. REV. 1159 (1983) (considering bias in the distribution of errors).
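The error-cost comparison in the preceding footnote can be verified numerically. Below is a minimal sketch, using a hypothetical docket of cases (the probabilities and damages are illustrative assumptions, not drawn from the article): for each case with apparent probability p of liability for D dollars, a verdict for the plaintiff carries expected error cost (1 - p)D, and a verdict for the defendant carries pD.

```python
def expected_error_cost(cases, threshold):
    """Sum the expected false-positive and false-negative costs over a
    run of cases, where the plaintiff wins iff p exceeds the threshold."""
    total = 0.0
    for p, D in cases:
        if p > threshold:
            total += (1 - p) * D  # plaintiff wins; error cost if defendant not liable
        else:
            total += p * D        # defendant wins; error cost if defendant liable
    return total

# Hypothetical docket: (apparent probability of liability, damages at stake)
cases = [(0.6, 100), (0.8, 100), (0.3, 100), (0.55, 100), (0.95, 100)]

preponderance = expected_error_cost(cases, 0.5)        # p > 1/2 standard
virtual_certainty = expected_error_cost(cases, 0.99)   # eliminates false positives

# For every case with p > 1/2, the preponderance rule incurs (1 - p)D
# rather than pD, and (1 - p) < p, so its total is never higher.
assert preponderance <= virtual_certainty
```

The sketch mirrors the footnote's algebra: the two standards treat cases with p ≤ 1/2 identically, so any difference comes from the p > 1/2 cases, where the preponderance rule always incurs the smaller of the two possible expected costs.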
(186) It could well be that the goal should not focus only on minimizing the sum of expected costs from the two types of erroneous decisions measured from an ex post perspective. Going forward from decision, correct decisions matter too, in that they increase the deterrent effect and reduce the chilling effect of the law's applications. From a social welfare point of view, the law should set the standard of proof only after taking these effects into account. See Louis Kaplow, Burden of Proof, 121 YALE L.J. 738 (2012); see also Fredrick E. Vars, Toward a General Theory of Standards of Proof, 60 CATH. U. L. REV. 1 (2010) (using a sophisticated utility analysis to set the standard for the issue of mental incapacity in will contests).
© 2013 Kevin M. Clermont. Individuals and nonprofit institutions may reproduce and distribute copies of this Article in any format at or below cost, for educational purposes, so long as each copy identifies the author, provides a citation to the Notre Dame Law Review, and includes this provision in the copyright notice.
Kevin M. Clermont, Ziff Professor of Law, Cornell University. I want to thank my philosophical wife, Emily Sherwin, for getting me to think about vagueness! Then came some incredibly helpful comments from Ron Allen, Adrienne Clermont, Sherry Colb, Mike Dorf, Simona Grossi, George Hay, Ori Herstein, Bob Hillman, Robert Hockett, Kuo-Chang Huang, Anne-Claire Jamart, Sheri Johnson, John Leubsdorf, John Palmer, Ariel Porat, Jeffrey Rachlinski, and Ted Sider--even if a few of the commenters (but not my family members) disagreed sharply with my position.
** 12 MARCEL PROUST, A LA RECHERCHE DU TEMPS PERDU 69 (1923). But cf. PIERRE BAYARD, COMMENT PARLER DES LIEUX OÙ L'ON N'A PAS ÉTÉ? (2012) ("contrairement aux idées reçues, il est tout à fait possible d'avoir un échange passionnant à propos d'un endroit où l'on n'a jamais mis les pieds" [contrary to received wisdom, it is entirely possible to have a fascinating exchange about a place where one has never set foot]); PIERRE BAYARD, HOW TO TALK ABOUT BOOKS YOU HAVEN'T READ, at xvii (2007) ("In my experience ... it's totally possible to carry on an engaging conversation about a book you haven't read...."); Deborah Solomon, My Reader, My Double, N.Y. TIMES MAG. (Oct. 28, 2007), http://www.nytimes.com/2007/10/28/magazine/28wwln-Q4-t.html (interviewing Pierre Bayard, a Proust expert, who admits to having only skimmed Proust); The Books We Lie About, THE 6TH FLOOR (May 10, 2012, 5:54 PM), http://6thfloor.blogs.nytimes.com/2012/05/10/the-books-we-lie-about/ (listing How to Talk About Books You Haven't Read among "books we've been less than truthful about having read"); Top Ten Books People Lie About Reading, SUPERGIRL SAVES THE WORLD (Mar. 13, 2010), http://realsupergirl.wordpress.com/2010/03/13/top-ten-books-people-lie-about-reading (listing In Remembrance of Things Past/A la recherche du temps perdu).
Title Annotation: II. Conjoining Assessments B. Legal Application: Conjunction Paradox through Conclusion, with footnotes, p. 1106-1138
Author: Clermont, Kevin M.
Publication: Notre Dame Law Review
Date: Feb 1, 2013