Research strategies for enhancing conceptual development and replicability.
Strategies are described for advancing conceptual development and replicability by testing conceptual hypotheses. Emphasis is given to creating and testing hypotheses to explain specific findings (e.g., that psi believers perform better than nonbelievers). This work often includes testing hypotheses that predict specific interactions as governing the phenomenon. Preferred statistical methods are noted for studying Person x Situation interactions and for examining mediational hypotheses. Cautions are provided about potential interpretational problems with pre-post designs, within-subjects methodology, and multiple psychological testing in a single session. Nonpsi performance-based indices derived from the ESP task itself are discussed as potentially useful predictors of ESP performance.
Some inconsistency of outcomes is not unique to any particular research field. The consistency with which inconsistency is found in many research domains is a major reason that meta-analysis has become something of a cottage industry. Failures to replicate psi research sometimes have led a few individual parapsychologists to suggest that perhaps no psi experiments are, in the long run, repeatable or that "Psi is intrinsically elusive." Before succumbing to nihilism, any parapsychologist or aspiring psi researcher should contemplate the considerable past accomplishments of parapsychology, which are reflected in some degree of replicability in specific domains (not reviewed here), and should consider that the most fundamental key to replication--and, more generally, to prediction and control--in any field is conceptual advancement, which means research-based understanding of the mechanisms underlying the phenomena. It is in regard to conceptual advancement that researchers might profitably consider the several specific strategies described and discussed herein. Some, several, or even all of these strategies may be familiar to many or even to most parapsychologists, but a review of them seems worthwhile because they often would appear to have been substantially underused in parapsychology, not to mention in some nonpsi research domains.
Among, but not nearly exhausting, the potential causes of replication failure are the following: (a) Cross-studies inconsistency in situational variables may differentially affect psi-task performance; (b) sensorially mediated experimenter-expectancy effects may bias the outcome to falsely support the individual experimenter's expectations; (c) psi-mediated experimenter effects may exist and may have consequences similar to those in (b) above; (d) different populations may be sampled, thus affecting the outcomes; (e) a replication study may invalidly operationalize a hypothetical construct that was validly operationalized in the original work; (f) an original finding may be a statistical fluke; and (g) the replication study may have an insufficient sample size to reach statistical significance. The last point suggests the wisdom of performing a power analysis prior to attempting replication of a study and of basing judgment of replication on the comparative effect size rather than on statistical significance.
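As a rough illustration of the power analysis just recommended, the following Python sketch approximates the sample size needed per group to detect a standardized mean difference d in a simple two-group comparison. It is a minimal sketch under stated assumptions: the function names, the two-group design, the normal approximation, and the example effect sizes are all introduced here for illustration and are not taken from any particular psi study.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of
    means with standardized effect size d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

def approx_power(d, n, alpha=0.05):
    """Approximate power of the same test with n participants per group
    (the negligible contribution of the opposite tail is ignored)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * sqrt(n / 2) - z_alpha)

# Hypothetical effect sizes, chosen only to show how quickly n grows
print(n_per_group(0.5))   # 63 per group
print(n_per_group(0.2))   # 393 per group
print(round(approx_power(0.5, 20), 2))
```

Under these assumptions, a replication attempt with 20 participants per group would have well under a 50% chance of reaching significance for d = 0.5, which is why comparing effect sizes across the original and the replication is more informative than comparing significance verdicts.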
AN INTERACTIONIST PERSPECTIVE ON REPLICATION FAILURE
This article presents the case for a somewhat different perspective on replication issues, one based on the assumption that all effects have boundary conditions and that replication failures potentially can provide clues to those boundary conditions. Above and beyond the replication-failures issue, learning about boundary conditions for effects is one of the most important strategies for understanding the underlying causes of those effects. In that sense, replication failures, if thoughtfully examined, can provide major directions for future work that can proactively identify boundary conditions, thereby promoting understanding of replication failures, enhancing replicability, and supporting conceptual advance. The perspective advocated in this article may be best understood after a few terms are defined. The terms likely are familiar to most readers, but what follows will ensure a commonality of understanding, given that the terms sometimes are used in different ways.
In the present article an independent variable means a variable that either is experimentally manipulated or is measured and used in statistical analysis for the purpose of predicting a measured variable, the latter termed the dependent variable (or dependent measure). If the independent variable is experimentally manipulated in a properly controlled study (i.e., one in which potential cross-manipulation confounds have been eliminated), one may deem the predictive value of the manipulation to derive from a causal effect on the dependent variable. (Experimental psychologists, unlike statisticians and mathematicians, sometimes use the term independent variable exclusively to designate independent variables that have been experimentally manipulated. This experimentalist's definition is not the one used in the present article.) In this article, an independent variable that has been experimentally manipulated is termed a manipulated independent variable (MIV). In the present article a predictor variable means an independent variable that is not manipulated by the experimenter but is measured with the intention of examining whether and in what way it predicts performance on a dependent variable.
An effect in the following discussion refers to a demonstrated (e.g., statistically significant) functional relationship between some independent variable (or more than one independent variable, if the variables function interactively--see below) and some dependent measure. (Effect in this sense may or may not involve a direct causal influence of one variable on another, given that some effects, thus defined, are purely correlational in character. In the case of a properly done experimental study, evidence of a statistical effect can be construed as evidence for a causal effect.)
In the present article, particular attention is given to the special class of effect termed an interaction. Two or more variables are said to interact when those variables jointly (i.e., interdependently, as distinct from independently) affect (or predict) the dependent measure. Variables that interact thus can be situational, personalistic (e.g., personality or ability variables), or a combination of situational and personalistic factors. In the latter case, one speaks of a Person x Situation ("person by situation") interaction. The central idea in all interaction effects is that the degree (or even the direction) of the effect of one independent variable on the dependent variable depends on the level of another independent variable. A moderator variable is a variable that interacts with an independent variable in determining its consequences. For example, the consequences of an experimental manipulation of hypnosis (i.e., of an induction condition as contrasted with a control condition) might depend on the hypnotizability of participants. If so, hypnotizability would be a moderator trait. Moderator variables need not, though, be personalistic in character, and experimentally manipulated independent variables often are studied as potential moderators of the consequences of other experimental manipulations, often in line with a prediction from a theory.
Theory development and successful replication can be advanced by identifying critical interactions that govern effects, because interactions provide clues about the boundary conditions (i.e., the necessary conditions) for those effects. The nature of those boundary conditions can reveal something about the underlying nature of the effect itself. Potential boundary conditions may be suggested incidentally (e.g., by failures to replicate) and by systematic cross-studies comparisons (as in meta-analysis), but the demonstration of circumstances as boundary conditions requires the deliberate experimental examination of those conditions. The proliferation of studies intended to replicate results with ostensible psi enhancement paradigms may well have hampered the discovery of particular boundary conditions that might have advanced understanding of performance in those paradigms. In trying to validate claims about enhancement paradigms, investigators have tended to use the same or very similar methods repeatedly, instead of introducing conceptually guided experimental manipulations. This has limited the possibility of discovering boundary conditions and thus has limited learning about what really is happening. In parapsychology, the infrequency of deliberate efforts to identify boundary conditions may reflect, in considerable degree, the inclination of some investigators to seek "effects" rather than to advance understanding. This seems unfortunate, because reliably producing effects surely requires understanding of those effects, and conceptualizing what is happening in a given paradigm suggests what boundary conditions should be investigated.
It may well be Person x Situation interactions that have most commonly been studied--usually inadvertently, but, occasionally, deliberately--in parapsychology. Such interactions may be particularly useful for study, at least at the present stage of this field in which there would seem to be minimal insight into the nonpsychological situational boundary conditions for the effects of interest. Psychological research and theories can provide potentially useful hints about Person x Situation interactions that may be of interest in psi research (see, e.g., later discussion on extraversion and psychological response to the white/pink noise of the ganzfeld situation).
The sheep-goat effect in ESP research is a finding of better ESP-task performance for believers in the possibility of ESP (in, e.g., the ESP task at hand) than for those not holding such a belief. Although this effect was discovered decades ago, it still is not known why or through what means the effect occurs. This may, in considerable part, be due to a lack of testing of clear conceptual hypotheses about this effect's underlying nature. This article suggests that conceptual closure about the sheep-goat effect might be achieved by, first, the creation of clear hypotheses to explain it and, second, by the testing of the implications of those hypotheses for the circumstances under which the effect should and should not appear--in other words, through hypothetico-deductive research. This involves, in essence, the testing of specific, predicted forms of interactions between the belief variable and whatever situational variable(s) the hypothesis predicts should control (i.e., produce or eliminate) the effect. The present article, then, examines in detail the sheep-goat issue as an example of the potential application of the research strategies advocated here. It also examines other potential applications of this approach, including its application to psi-enhancement paradigms and, briefly, to the nonpsi domain of certain correlates of hypnotizability.
ON THE PAUCITY OF CONCEPTUALLY BASED STUDIES OF PERSON x SITUATION INTERACTIONS IN PARAPSYCHOLOGY, INCLUDING IN WORK WITH PSI-ENHANCEMENT PARADIGMS
Investigators often institute some procedure presumed to favor ESP-task performance (e.g., ganzfeld), and if significant results are obtained with that procedure, they then may be inclined to call the procedure "psi conducive." (Ganzfeld, as used in parapsychology, is a sensory isolation procedure that has been widely investigated because of the claim that it is conducive to successful performance in free-response ESP tasks [see, e.g., the review by Bem & Honorton, 1994]. During the ganzfeld session the participant typically is asked to report spontaneous mental activity in the hope that it will extrasensorially encode a target picture [or video] on which another individual is focusing his or her attention.) Nonetheless, the procedure, be it ganzfeld, hypnosis, or some other psi-enhancement paradigm, typically "works" (i.e., produces evidence of psi) for some participants but not for others. Whether it does or does not do so may depend on one or more personal characteristics of the participants, characteristics that make the particular procedure a productive one for a given participant. In other words, person and situation characteristics may interactively influence psi outcomes in such a paradigm. Honorton's (1997) report on characteristics that differentiate successful from unsuccessful first-timers in ganzfeld amply illustrates the role of individual differences, although it stops short of addressing whether person and situational characteristics act interactively to influence ganzfeld success.
An interactive hypothesis of this kind can be definitively tested only by (a) measuring a potentially performance-relevant person trait and (b) experimentally manipulating, in the same study, the presence/absence of a circumstance that theoretically should make the person trait useful, salient, or important under one condition (e.g., in a hypnosis condition) but not under the other (e.g., in a control condition). Only such a design, properly developed to test a conceptual hypothesis that links the trait to performance under the conditions studied, can provide a firm foundation for causal inferences about the role of the trait variable. Otherwise, the results remain in the realm of the correlational, and they cannot support causal inference (see later discussion).
I know of only one effort (Honorton, 1972) to study hypnotizability as a moderator of the consequences of a hypnosis manipulation (in this case, hypnotic induction vs. waking-imagination as a control). Honorton here heeded his own advice (Honorton & Krippner, 1969) to measure hypnotizability, using a standardized test, and to examine whether it moderates the effect of a hypnosis manipulation on ESP-task performance. Although this study had several serious methodological deficits, including but not limited to a failure of random assignment to the conditions of the manipulation (see Stanford, 1993), it was, conceptually and in terms of general design, an innovative, imaginative, and very important advance. Unfortunately, it never received the follow-up that it deserved.
Although parapsychologists rarely have deliberately studied the interaction of variables, including the study of potential Person x Situation interactions, such work would seem critical to understanding psi events and, ultimately, to their reliable production. Numerous studies have looked at the correlation of person variables with success in one or another allegedly psi-conducive setting, but those studies have not met an essential condition of studying Person x Situation interactions: that of demonstrating an interaction by showing that the degree of contrast between performance under an experimental condition and a control condition is determined by (i.e., moderated by) a person variable. Studies that have used a control condition in addition to an experimental condition generally have not examined a person variable as a possible moderator of the effect of that manipulation, and studies that have examined a person variable as a predictor of performance in an allegedly psi-conducive setting rarely have provided a control condition. In the absence of the control condition, one cannot draw inferences about whether there truly is a Person x Situation interaction, that is, about whether a specific situation is important to a particular type of person in generating evidence of psi. In the absence of a demonstrated interaction, there might only be a main effect of person type (e.g., of hypnotizability) on ESP-task performance with, say, high-hypnotizables doing better regardless of the conditions of testing. (Alternatively, there may be a main effect of the manipulation, with hypnosis, say, producing better performance than its absence, regardless of the type of person tested.) In view of these considerations, it should be clear that the study of Person x Situation interactions can be extremely useful to understanding why a particular condition has the capacity to elicit psi performance (or performance on whatever kind of task one wishes to study).
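The distinction drawn above between a main effect of person type, a main effect of the manipulation, and a true Person x Situation interaction can be made concrete with cell means from a 2 x 2 design. The Python sketch below is descriptive only and rests on assumptions introduced for illustration: the function name, the cell labels, and the scores are all hypothetical, and an inferential test of the interaction would require something like a 2 x 2 analysis of variance or a moderated regression.

```python
from statistics import mean

def two_by_two_effects(cells):
    """cells maps (trait_level, condition) -> list of scores, with
    trait_level in {"high", "low"} and condition in
    {"experimental", "control"}. Returns descriptive contrasts."""
    m = {key: mean(scores) for key, scores in cells.items()}
    # Simple effect of the manipulation within each trait level
    effect_high = m[("high", "experimental")] - m[("high", "control")]
    effect_low = m[("low", "experimental")] - m[("low", "control")]
    high_total = m[("high", "experimental")] + m[("high", "control")]
    low_total = m[("low", "experimental")] + m[("low", "control")]
    return {
        "manipulation_main": (effect_high + effect_low) / 2,
        "trait_main": (high_total - low_total) / 2,
        # Difference of differences: nonzero only if the trait moderates
        # the effect of the manipulation
        "interaction": effect_high - effect_low,
    }

# Hypothetical scores: high scorers do better everywhere (trait main effect only)
trait_only = {
    ("high", "experimental"): [8, 8], ("high", "control"): [8, 8],
    ("low", "experimental"): [5, 5], ("low", "control"): [5, 5],
}
# Hypothetical scores: high scorers do better only under the manipulation
moderated = {
    ("high", "experimental"): [8, 8], ("high", "control"): [5, 5],
    ("low", "experimental"): [5, 5], ("low", "control"): [5, 5],
}
print(two_by_two_effects(trait_only))
print(two_by_two_effects(moderated))
```

Note that a study lacking the control condition would observe only the two "experimental" cells, which are identical in both hypothetical data sets; the two patterns can be told apart only when the control cells are present.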
Readers probably are aware that ganzfeld studies have rarely, if ever, included anything that might be called a control condition, so, for this reason, there has not been an opportunity to learn whether the person variables fairly commonly studied in the ganzfeld setting may be involved in a Person x Situation interaction. In the absence of a control condition, any correlation of ganzfeld performance with a trait measure could simply represent a main effect of that trait--those with the trait do well generally, ganzfeld aside--and the absence of a control condition also means that causal inferences about the role of the trait are not possible. The reason for the lack of a control condition seems obvious: It would appear that investigators have not usually specified feature(s) of ganzfeld that are hypothesized to be particularly efficacious for the occurrence of ESP. Given that fact, no control condition is possible because a control condition is supposed to include everything except the feature(s) whose effect is being examined. The concept of a control condition is meaningful only when a guiding hypothesis or theory indicates what element of an experimental condition should be responsible for the effect of that condition. It is at the point of the development of such a guiding hypothesis that Person x Situation research can have real payoffs, because, as will be shown in later examples in this article, such an interaction can help the investigator to close in on a particular causal interpretation of what happens in a particular research setting. This takes the research, parapsychological or not, beyond speculation and mere correlation into the realm of the testing of causal hypotheses.
Despite discussion of the importance of Person x Situation interactions by some parapsychologists, there have been few examples of deliberate, systematic, experimental efforts to study conceptually interesting Person x Situation interactions (but see Stanford, 1969, interaction between a performance measure of the spontaneous tendency to visualize and an experimental manipulation of whether the method of trying to influence the PK task was visualization based or not; also, Stanford, Frank, Kass, & Skoll, 1989, interaction of extraversion and stimulation level [noise/silence] as they jointly affected temporal trend of length of verbal utterances in ganzfeld, in line with earlier theorization of Eysenck, 1967, concerning the need of extraverts for stimulation; but see, also, Stanford & Frank, 1991). In another vein, Palmer (1972) had applied the concept of need for vindication to explain the sheep-goat effect, and Lovitts (1981) tested a prediction from that concept. She showed that the consequences of the belief variable for ESP-task performance depend on whether participants believe that success in the study will provide evidence for or against the reality of ESP.
Among the historical factors that potentially have slowed empirical and theoretical progress in parapsychology are (a) strongly empiricist, nonconceptual approaches to research and (b) relatively few efforts to systematically study empirical puzzles, including replication failures, by creation of hypotheses about their underlying causes and by empirically testing the effects, including the interactions, logically implied by such conceptualization. On a positive note, work based closely on testable theories or models might plausibly be anticipated to increase in this field as it moves away from the empiricism that has sometimes seemed to hold it captive to nonconceptual solutions to the problems of replicability, such as thinking that a single research paradigm (e.g., ganzfeld) will adequately address the replicability issue. Indeed, at least a few parapsychologists presently are advocating the importance of exploring interactions in order to address the replicability issue. Smith (2001), for example, emphasized the potential importance, in trying to understand the so-called psi-conducive experimenter, of examining potential interactions of various psi-relevant variables. He also emphasized that such work might profitably be guided by social psychological theory borrowed from placebo research. Theory-guided studies of interactions are at the heart of the present recommendations.
LOOKING FOR POTENTIAL INTERACTIONS AS CLUES TO REPLICATION FAILURE/SUCCESS: A CAUTIONARY NOTE ON META-ANALYSIS AND THE STUDY OF MODERATORS
Meta-analysis can aid in conceptual development by providing evidence suggestive of potential moderators for the effects parapsychologists (and other scientists) study. It is extremely important to recognize, though, that meta-analysis cannot, in principle, unequivocally demonstrate or even very strongly suggest the existence (or the nonexistence) of moderators on the basis simply of cross-studies covariation between the magnitude of some effect and the presence/absence of some study characteristic(s). This is because such cross-studies analysis of potential moderator variables is correlational in nature, and many things other than the putative moderator variable may covary with it, across studies, and may affect the dependent measure. If that is the case, a confound exists, and there often are many such potential confounds. Among them are experimenter, laboratory milieu, instructional set, and populations sampled. These confounds may spuriously suggest the existence (or exaggerate the strength) of a moderator variable, may spuriously obscure an actual moderator, or may reduce its measured strength. The greatest caution is therefore warranted before drawing causal conclusions on the basis of cross-studies meta-analytic findings related to potential moderator variables, whether those findings would seem to support a particular variable as a moderator or would appear to rule it out. On these grounds, I regard as inadequately justified, for the present, Lawrence's (1993) conclusion, based on his meta-analysis, that the type of sheep-goat question apparently does not moderate the sheep-goat effect. That conclusion appears to have relied, in considerable degree, on cross-study contrasts and, hence, to have been potentially compromised by confounds. That is not, of course, to say that Lawrence's negative conclusion about question type as a moderator must be wrong.
Students of meta-analysis know that if there is significant heterogeneity of the effect sizes in a given database, this can help justify a meta-analytic search for moderators of the effect in question. Regrettably, Lawrence (1993) did not provide an analysis of the heterogeneity of sheep-goat effect sizes. Such an analysis potentially could have placed in some perspective his finding of overall significance for the sheep-goat effect with a very small mean effect size but with a failure to have identified any moderator variables. Specifically, a finding of significant heterogeneity of sheep-goat effect sizes would have suggested the likelihood of moderator(s) "out there some place" and would have placed in a wider perspective his having found no clear statistical significance in his examination of a few specific potential moderators of the sheep-goat effect.
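For readers who want to see what a heterogeneity analysis of this kind involves computationally, the Python sketch below computes the inverse-variance weighted mean effect size, Cochran's Q, and the I-squared index for a set of study effect sizes. It is a minimal sketch under stated assumptions: the function name and the example effect sizes and sampling variances are invented for illustration and are not drawn from Lawrence's (1993) database or any other.

```python
def heterogeneity(effects, variances):
    """Return (weighted mean effect, Cochran's Q, df, I-squared in percent)
    for study effect sizes and their sampling variances, using
    fixed-effect (inverse-variance) weights."""
    weights = [1.0 / v for v in variances]
    mean_es = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q sums the weighted squared deviations of each study from the mean
    q = sum(w * (e - mean_es) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation beyond what sampling error predicts
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return mean_es, q, df, i2

# Hypothetical effect sizes and sampling variances for four studies
effects = [0.02, 0.30, -0.05, 0.25]
variances = [0.01, 0.01, 0.02, 0.02]
mean_es, q, df, i2 = heterogeneity(effects, variances)
print(f"mean = {mean_es:.3f}, Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
```

Q is referred to a chi-square distribution with k - 1 degrees of freedom, where k is the number of studies (for 3 degrees of freedom, the .05 critical value is 7.815). A significant Q indicates more between-study variation than sampling error alone would produce, which is the circumstance that helps justify a search for moderators.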
Even if a meta-analysis integrates truly experimental findings on a potential moderator (rather than examining this matter with cross-studies contrasts) but fails to reject the null hypothesis, this still does not mean that the potential moderator plays no role in outcomes. It may do so by way of a high-order, but perfectly meaningful, interaction. Suppose that a series of investigators each had studied several different sheep-goat questions in a single study and that these questions were the same across the studies. Suppose, further, that meta-analysis had shown no hint of the sheep-goat effect being moderated by question type, even with this within-studies comparison. One then could conclude that question type alone is not a moderator, but one could not conclude that question type is irrelevant to the sheep-goat effect and that, therefore, such questions are essentially interchangeable. This is because such meta-analytic integration would ignore the possibility that which question type predicts performance depends on particular circumstances being present in a given study. A purely hypothetical example would be that in a competitive situation the question, "I believe I can use psi to win games," might be a useful predictor, but the same predictor might be unimportant in a noncompetitive setting. As another hypothetical example, a sheep-goat question such as, "I think ESP happens with me in everyday life," might be useful as a predictor in a laboratory experiment intended to model everyday potential psi uses but might prove nonpredictive in more artificial settings. When a possible interaction of question-type and ESP-task factors is ignored, the meta-analytic result could well be that one finds no overall evidence that sheep-goat questions differ in their ability to predict ESP-task outcomes, whereas they in fact may have such value but their consequences depend on test circumstances. Conceptually, at least, this does not seem a farfetched possibility.
There have been parallel findings in social-psychological (nonparapsychological) work on attitude measures as predictors of behavior. In this work attitude measures predict behavior better when there is a match between the question and the behavior to be predicted, in terms of level of generality of each and in terms of correspondence of these two on factors such as action and the target of the action (Ajzen & Fishbein, 1977). More recent reviews, including a meta-analytic one (Kraus, 1995), have upheld those conclusions (see overview in Eagly & Chaiken, 1998, pp. 296-297). I am not suggesting that the same kind of correspondence is important in sheep-goat research, in which psi performance, not consciously mediated behavior, is studied, but it seems reasonable to suppose that attitude questions might have greater predictive power for ESP-task performance when those particular questions have relevance to psychologically focal elements related to task demands.
The review, just cited, by social psychologists Eagly and Chaiken (1998, pp. 297-303) also discussed how research in the attitude-behavior domain now has moved into creation of causal models for the mental processing that occurs between attitude activation and behavior (or lack of it) toward the attitude-relevant object. The creation of such models may be very useful in parapsychology's particular brand of attitude-behavior work, namely in sheep-goat research. If parapsychologists wish to fulfill the potential inherent in sheep-goat research, they perhaps should develop explicit, testable models about the processes by which an attitude can affect ESP-task performance. Similar, detailed modeling may prove useful in other psi-research domains, such as the correlation of personality measures and psi-task performance. Investigators might, for example, profitably develop and test hypotheses about how extraversion could influence performance on free-response ESP tasks if, as suggested by the report of Honorton, Ferrari, and Bem (1998), that correlation is a real one. In other words, the testing of low-level causal models may be, at this state of investigation, a productive route to follow. Such models provide a map for how, very concretely, specific effects are deemed to be produced.
ADVANCING THEORETICAL DEVELOPMENT THROUGH THE STUDY OF INTERACTIONS: A HYPOTHETICAL ILLUSTRATION
The term effect, as it is used in this particular section, means a causal influence of some variable on another variable. Many, perhaps all, effects in nature have boundary conditions. In other words, varying the presence or the magnitude of variable X will causally influence a measured variable Y only under some specified circumstance or set of circumstances; or X affects Y in relation to the magnitude of another variable, Z, that is present. The nature of those boundary conditions depends on the underlying character of the effect being studied. To understand, predict, and, conceivably, control the effects observed in studies, scientists must very deliberately investigate the boundary conditions for those effects. Replication failures, when contrasted with replication successes, may suggest important boundary conditions, and the study of potential boundary conditions for effects is equivalent to studying interactions that govern those effects. What the present proposal entails is the investigation of potential interactions and the use of such interactions in theory building. A closer look at the relationship between interactions and conceptual hypotheses (i.e., explanatory hypotheses, henceforth, simply hypotheses) may be useful. A hypothesis is a tentative explanation for an observation or for a set of observations.
As was suggested by earlier remarks, hypothetico-deductive research proceeds by testing deductions from a hypothesis, deductions about the boundary conditions for the effects the hypothesis is intended to explain. To be testable, a hypothesis must, therefore, be explicit as to the kind(s) of mechanism(s) involved.
The testing of hypotheses related to interactions may be considered to have two phases, although both phases may occur in the same experiment. One phase is simply testing whether an apparent interaction that has been suggested nonexperimentally (i.e., by cross-studies comparisons, whether meta-analytically adduced or not) can be obtained when there is experimental manipulation of the proposed moderator variable. This first-phase experimental work can demonstrate that the proposed moderator truly is a moderator (i.e., is an interacting variable) and is not merely an expression of a cross-studies confound. This phase of experimental work may demonstrate that a particular interaction occurs in the absence of any potential confound, but it cannot properly be deemed to test any conceptual hypothesis that might have been developed to explain the interaction. It may, though, be deemed to strengthen the empirical basis for knowing what needs to be explained.
Experimental testing of a hypothesis intended to explain either a possible (e.g., meta-analytically inferred) interaction or an experimentally demonstrated one (as in the first phase just discussed) requires experimentation of a somewhat different kind. This work must test one or more predictions logically deduced from the newly created hypothesis.
Consider, now, a purely fictional example of the process just discussed. Suppose that, in reading the literature on the sheep-goat effect (relative to ESP-task performance), an alert researcher is impressed by what, subjectively, seems to be a cross-studies trend in the data. The sheep-goat effect seems evident when the ESP test involves calling geometric symbols as in a standard forced-choice ESP test, but it does not seem evident when participants specifically are asked to call each trial very spontaneously, saying simply what comes to mind at the moment. Suppose that this researcher firms up this impression, first by showing through meta-analysis that there is significant heterogeneity of sheep-goat effect sizes in the meta-analytic database. This is then taken to justify systematic moderator analyses using meta-analytic techniques. Suppose that such a meta-analytic moderator analysis suggests that studies with standard calling do, in fact, produce a larger sheep-goat effect size than studies that encourage more spontaneous calling and that the latter studies do not, by themselves, even show an overall, significant, sheep-goat effect.
What seems called for next is an experimental demonstration that the implied interaction actually exists. The meta-analytic sets of studies with the two kinds of calling instructions suggested but did not prove that such an interaction exists, because potential cross-studies confounds could have covaried with the potential moderator variable. This researcher therefore conducts a study in which there is an attempt to hold constant any potentially relevant extraneous conditions, across the two calling-instructions conditions, and participants are randomly assigned to these two conditions. Suppose that this experiment provides clear statistical evidence for the same Person (belief) x Situation (calling instructions) interaction that had been suggested both by subjective review and by meta-analysis.
How might this interaction be explained? Various possibilities might exist, but let us examine just one of them--that studies with traditional calling instructions in a forced-choice test show the sheep-goat effect because sheep, believing that ESP is possible, call more freely and spontaneously than do those who do not believe in that possibility. The rationale behind this is that when calls are, as traditionally, made without special instructions, participants who do not believe ESP is possible here (i.e., goats) think they must somehow capitalize on chance through being clever. They therefore make their calls very nonindependently of each other (e.g., being careful not to call very many of a given symbol in a row and being careful to balance calls across the several possibilities). This, then, might interfere with their use of ESP to identify the randomly arranged targets, producing typical unimpressive goat performance. In contrast, those who believe ESP is possible here (i.e., the sheep) may rely more strongly on whatever spontaneously comes to mind on each trial, even without special instructions, thus allowing ESP freer rein to guide their responding. On the other hand, with special instructions that emphasize spontaneity of calling, this putatively indirect effect of belief on ESP performance should be reduced or eliminated, assuming that participants, including goats, act as instructed. This is an ad hoc hypothesis to explain an experimentally demonstrated interaction (although please remember that this entire scenario is fictional). How could this hypothesis be tested?
Using once again the same research design that initially demonstrated the interaction (and gave rise to this hypothesis) could only demonstrate the reliability of that outcome, not the validity of any proposed explanation for it. Confirming the initial findings with the initial design logically can provide no support for any hypothesis concocted to account for the initial outcomes with that design. What is needed is to either (a) use the original design to examine previously unexamined implications of the ad hoc hypothesis or (b) test logical implication(s) of the hypothesis by means of one and, ideally, several different designs (or by testing several different implications within the same new design). Whether one or several research designs are used, testing a conceptual hypothesis in a variety of ways and finding support from several such tests is the optimal approach. If a variety of kinds of evidence support the same conclusion, it is unlikely that the evidence is an artifact of a particular element of design. This is what is termed convergent evidence.
To illustrate the process of testing a conceptual hypothesis, consider just a few of the potential ways to test the hypothesis that was invoked ad hoc to explain the (fictional) interaction of belief and calling instructions. (a) Even in the original experiment that established the existence of the interaction of belief and calling instructions, the investigator, having established the reality of the interaction and having provided an ad hoc explanation for it, could have statistically checked on the actual calling patterns under the two conditions to see if they matched that predicted from the hypothesis (i.e., under traditional instructions there should be more statistically demonstrable, nonrandom, calling constraints by the goats than by the sheep, and this difference should not exist or should exist in lesser degree under spontaneity instructions). If this set of predictions is confirmed, some elegant statistical analyses (e.g., with multiple regression or, better, with structural equation modeling--see later discussion) could profitably examine whether the relevant calling patterns likely act as mediators of the observed ESP-task performance differences. (b) A variety of new experiment(s) could be undertaken to examine this hypothesis. For example, one could experimentally test the hypothesis by trying to short-circuit the proposed effect of belief on calling patterns in the traditional-instructions condition. If memory for preceding calls was somehow disrupted, then the traditional-instructions condition would not result in differential calling patterns for sheep and goats (and, thus, not in differential ESP-task performance for sheep and goats). The investigator might, for example, administer a drug known to interfere with short-term memory, to see whether, in the quick-calling condition, this eliminates the differential calling patterns for sheep and goats and, consequently, their differential ESP-task performance. 
Alternatively, as a very different approach to testing this hypothesis, the investigator might, instead of manipulating a spontaneous calling set by instruction, distract the participants during calling, much as rehearsal is prevented in memory studies, by having them engage in a demanding task during the intercall interval (e.g., counting backward by threes from some number). Calls themselves could be paced by a cued-calling procedure in which, after a random number of distracting back-counts have been completed, there is a computerized request for the next call, with limited time for response. In this way participants presumably could neither rehearse nor anticipate when the next call would have to be made. An empirical check could be made of whether this obviates (or reduces) rational-analytic calling strategies and, if it does, whether the sheep-goat effect is, as expected, absent from this condition (but is present in a traditional-instructions condition).
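The "empirical check" on calling patterns presupposes some index of nonrandom calling constraint. The sketch below offers two very simple candidates, assuming a five-symbol forced-choice test: the rate of immediate repetitions (which would be 1/5 on average if calls were independent and equiprobable) and the spread of symbol frequencies within a run. The function names, symbol labels, and sample run are hypothetical, and a real study would likely need more refined measures.

```python
from collections import Counter


def repetition_rate(calls):
    """Proportion of calls that repeat the immediately preceding call."""
    if len(calls) < 2:
        raise ValueError("need at least two calls")
    repeats = sum(1 for prev, cur in zip(calls, calls[1:]) if prev == cur)
    return repeats / (len(calls) - 1)


def balance_spread(calls, symbols):
    """Max minus min symbol frequency; 0 means perfectly balanced calling."""
    counts = Counter(calls)
    freqs = [counts.get(s, 0) for s in symbols]
    return max(freqs) - min(freqs)


SYMBOLS = ["circle", "cross", "waves", "square", "star"]

# Expected repetition rate if calls were independent and equiprobable.
chance_repetition = 1 / len(SYMBOLS)

# Hypothetical 25-call run from a participant who never repeats a symbol
# and balances calls exactly, i.e., a heavily constrained caller.
constrained_run = SYMBOLS * 5

print("repetition rate:", repetition_rate(constrained_run))
print("chance expectation:", chance_repetition)
print("balance spread:", balance_spread(constrained_run, SYMBOLS))
```

Under the fictional hypothesis, goats in the traditional-instructions condition should show repetition rates further below the chance expectation, and smaller balance spreads, than sheep; the difference should shrink under spontaneity instructions or distraction.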
Working in such ways, this fictional investigator would have succeeded in carrying through a systematic program of testing the logical implications of a hypothesis, a program inspired by an observation related to inconsistency in the sheep-goat effect, a "problem" in replicability. In the process, this investigator both explained the observed effect and clarified something about the underlying nature of the sheep-goat effect. Of course, this example is fictional, but it illustrates the kind of programmatic work that could emerge from failure(s) to replicate effects in particular condition(s).
Fictional case or not, pinpointing a Person x Situation interaction potentially can help to clarify a number of problems in replicability. It all begins with careful reading of the initial reports, combined with careful consideration of the differences between studies that might be relevant to replication failure or success. "Differences in studies" can involve situational factors (e.g., instructions, experimenter attributes, test attributes, and social environment) as well as person factors inherent in participants. A major thrust of this article is that situations and persons can affect results interactively. Therefore, if a specific kind of research outcome depends on an interaction of these factors, it is obvious that change in either the situational or the participant attributes responsible for an effect can eliminate that effect.
The demonstrated existence of a Person x Situation interaction suggests that a specific kind of situation makes relevant or important a particular personal characteristic that has consequences for information processing (or for decision making) in the task at hand. The kind of situation that makes a difference for a specific type of individual can reveal much about the processes that are at work. Although the elucidation of such interactions would be potentially very useful to enhancing replicability and advancing conceptual development in parapsychology, the instances of this in parapsychology--as in some areas of psychology--have historically been relatively few. Parapsychology may be ready for such developments, if, as I suspect, substantial numbers of us have become disenchanted with the dry-bones empiricism that too often has tended to dominate the field, right up through the ganzfeld era.
Failures to examine theoretically important boundary conditions for trait-task correlations or to follow up seeming Person x Situation interactions do not belong to parapsychology alone. One example must here suffice. Although gender differences in the correlation of traits and hypnotizability have often been reported--for example, women often show an objectively greater creativity--hypnotizability correlation than do men (reviewed by Shames & Bowers, 1992)--the explanation for such differences has tended to be addressed more by speculation than by systematic investigation. Discovering a reason for gender as a possible moderator of the creativity--hypnotizability correlation likely will require the experimental manipulation of characteristics of one or more elements of the session. It might, for example, require experimental manipulation of session elements that, under one or another theory of gender effects, should have made gender role norms relevant. The key point here is that one cannot unequivocally demonstrate causes of such effects without, under the guidance of some theory (or hypothesis), manipulating a particular characteristic of the session that the theory suggests might have been responsible for the effect in question.
Although we parapsychologists may be somewhat comforted by the realization that we are not alone in having failed to follow up persistent and potentially important trait-related findings, we should realize the potentially adverse consequences of continuing to do so. We should also absorb the take-home lesson that causal inferences about correlational findings can only be developed through experimental manipulations guided by precise theorization. Correlational studies with single conditions generally raise more questions than they answer, regardless of outcomes and however consistently they may be replicated. Even studies with more than one condition are likely to do likewise unless the predictor/person variable and the experimental manipulation are carefully selected (and measured or operationalized) such that they should, if some theory or model is correct, exhibit a particular form of interaction. Then the study stands a chance of contributing meaningful information. We have all too often seen in parapsychology and elsewhere single-condition studies in which a person measure has been used as a predictor, but such a study gives no clue, because of the absence of a conceptually relevant manipulation, about why any observed correlation should have occurred. There is no indication of which situational factors should have made the trait relevant. As another example, one sometimes sees manipulation of, say, target type, without the inclusion of a potentially relevant individual difference measure that theoretically should predict response to that manipulation and that should thereby help to clarify the conceptual significance of any effect of the target manipulation. It is easy to forget that even if a manipulation "makes a difference," it generally accounts for a minority percentage--often a small minority percentage--of the variability in outcomes.
It is also easy to forget that this generally means that the manipulation affects some participants but not others or that it even has had distinctively different consequences for different people. Carefully selected person variables help to interpret the consequences of a manipulation, as assessed by whether they do or do not moderate those consequences. Likewise, a carefully constructed experimental manipulation can help to pin down the psychological meaning of a person-task correlation that was previously observed in work with only one of those conditions. Without a manipulation that varies along the dimension thought to be relevant to the personal characteristic being studied (and that thus enables a search for a specific, expected form of Person x Situation interaction), the study cannot move from the realm of the purely correlational to that of demonstrating causation. An experimental manipulation can be designed to make relevant (or irrelevant, depending on the condition) something about the particular trait being studied. Finding a theory-based, predicted interaction (or lack of it) between a particular trait and a manipulation is extremely helpful in deciding whether that trait is (or is not) relevant to the consequences of the manipulation, as it should (or should not) be under a particular hypothesis or theory. If a Person x Situation interaction is observed and takes the expected form, the evidence points strongly toward that particular trait having played a causative role, not some trait correlated with it that lacks potential conceptual relevance to the experimental manipulation. (Of course, one can also measure the alternative trait and see if it shows the same interaction.)
Even with today's elegant statistical analyses (e.g., structural equation modeling) intended to allow correlational work to provide some support for causal inference, such inference can far more strongly and realistically be supported (or refuted) by examining the consequences of experimental manipulations that should affect the regression of the dependent measure on the person measure if the person-variable-related theory behind the manipulation is correct (and if the person variable is validly measured).
Successful parapsychological work of the kind recommended here, which focuses on theoretically meaningful conclusions, might well garner the interest and even the sympathy of interested nonparapsychologists. The reason is simple: Scientists take data more seriously when they seem to make sense. Mere anomalies, even if replicated, can too easily be ignored as nonsensical.
The preceding discussion has focused largely on Person x Situation interactions, but other kinds of interactions also are important, both to replication failures and to understanding the phenomena. Among them are Situation x Situation interactions, in which the effect of one situational variable depends on the presence of another situation. For example, a hypnosis manipulation (hypnosis vs. a control group) may fail to facilitate ESP-task performance because a hypnotist is not involved whose personal demeanor or interactive skills engage the trust and confidence of the participants. This might be true even if the study used a standardized measure of hypnotic susceptibility, with actual words of induction and testing held constant. Often when a condition plausibly can be deemed "necessary but not sufficient" to produce a particular outcome, an interaction is at hand: The condition being studied facilitates the effect only when combined with the presence of some other situation. Investigators interested in techniques for facilitating psi can profit by devoting some thought to what kinds of circumstances must be assumed in order for those techniques to accomplish their desired objective. This is not necessarily a simple matter, because such analysis would seem, implicitly, to include the development of very clear ideas concerning through what means the technique should bolster psi performance in the first place and concerning how it should bring about those effects. Nonetheless, the price of progress is clear, well-articulated, fine-grained, conceptual analysis of these kinds.
MORE EFFECTIVELY STUDYING INTERACTIONS TO UNDERSTAND WHAT THEY MEAN
Interactions Between Continuous Person Variables and Experimental Manipulations
Readers very likely will know that in experimental situations the analysis of variance (ANOVA) may be used to analyze both main effects (i.e., single-variable effects) and interactions (i.e., effects of variables acting jointly). What may be less well known is how one could proceed if one had a continuously measured predictor variable and wished to examine both it and an experimental independent variable for their separate and joint (i.e., potentially interactive) relationships with the dependent measure. For such purposes investigators all too commonly categorize (e.g., dichotomize or trichotomize) a predictor variable and then apply ANOVA. This analytic approach is extremely problematic, despite its historically common use both inside and outside of parapsychology. It loses much of the information about the predictor variable because, for example, a median split classifies scores immediately below the median as being the same as the lowest score in the distribution. (Also, median splits can mean very different things in different samples.) When a single continuous predictor variable is examined, loss of information through categorization often will result both in a spuriously low estimate of the strength of relationship and in a decreased ability to statistically detect a real effect, including of interactions, but if two such continuous variables are dichotomized, the test of their interaction can show spurious statistical significance, particularly when there is a strong correlation between the two continuous variables (Maxwell & Delaney, 1993).
It should also be noted that dichotomization of a continuous variable cannot disclose any nonlinear trend related to that variable, although a check on this possibility is important to conceptually interpreting statistical outcomes (even with continuous predictor-variable scores), especially given that nonlinear effects can, in a study with two predictor variables, result in a spurious interaction effect (Lubinski & Humphreys, 1990). Parapsychologists who have used the information-wasteful approach of categorizing intrinsically continuous variables should, like the many psychologists who have used it, reform their practice. Many investigators presently are realizing the pitfalls of information loss and are learning appropriate techniques for the study of continuous predictor variables. MacCallum, Zhang, Preacher, and Rucker (2002) have examined all of these issues and more as related to dichotomization of continuous variables, and they join in the call for reform.
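The information loss produced by a median split can be illustrated with a small simulation. The sketch below is only a demonstration under assumed conditions: a normally distributed belief score, a linear relationship with a simulated outcome, and an arbitrary population correlation of .30. All names and parameter values are hypothetical.

```python
import random
from statistics import median


def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, rho=0.30, seed=1):
    """Correlate an outcome with a continuous predictor and with its median split."""
    rng = random.Random(seed)
    belief = [rng.gauss(0, 1) for _ in range(n)]
    # Outcome constructed to correlate about rho with belief in the population.
    outcome = [rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for x in belief]
    cut = median(belief)
    sheep_goat = [1 if x > cut else 0 for x in belief]  # dichotomized predictor
    return pearson_r(belief, outcome), pearson_r(sheep_goat, outcome)


r_continuous, r_split = simulate()
print(f"continuous predictor r = {r_continuous:.3f}")
print(f"median-split predictor r = {r_split:.3f}")
```

With normally distributed scores, the dichotomized correlation is expected to be roughly 80% of the continuous one, which corresponds to a substantial loss of statistical power. The simulation does not address the spurious-interaction problem that arises when two correlated predictors are both dichotomized.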
Hierarchical multiple regression analysis (HMRA) provides a suitable means for analysis of a design involving both experimental and predictor variables, because it allows use of continuous scores for the predictor variable and allows the testing of potential interactions between predictor variables (including continuous ones) and experimentally manipulated independent variables. By entering into the HMRA computation, at the first step, the single variables (e.g., person variable and the dummy-coded experimental variable) and, subsequently, the interaction term (which must be created by multiplying the values for the single variables involved), the analysis can proceed, using, say, ESP-task scores as the dependent measure. This order of entry is justified because the investigator wishes to examine the statistical effect related to the interaction, after effects related to single variables (i.e., main effects) have been considered. (Incidentally, if the interaction to be studied is a higher order interaction [i.e., one between three or more variables], lower order interactions involving the relevant variables must also be entered into the equation prior to entering the desired interaction, in addition to all relevant main effects.) If a significant interaction is observed (i.e., a significant change in variance accounted for due to entering the targeted interaction into the equation after the main effects and any lower order interactions have been entered), then subsequent analyses should be undertaken to reveal the nature of the interaction (just as in the case of ANOVA but, in this case, with different analytical techniques; see, e.g., Pedhazur, 1982, chap. 12).
In the case of Person x Situation interactions, one should report the correlations of the person variable and the dependent measure separately for the levels of the experimental variable (e.g., report the correlation of an attitude measure with ESP-task performance separately for the traditional-calling and spontaneous-calling conditions, in the earlier fictional example). Additional analysis of an interaction often is desirable or even mandatory, depending on how the data from the interaction are to be used or interpreted. It is important to know, for example, whether the regression lines for the (let us say, two) levels of the independent variable cross over or simply tend to converge within the range of the predictor variable that is of interest to the investigator. The former is sometimes termed a disordinal interaction and the latter, an ordinal interaction (see, e.g., Pedhazur, 1982). If there is a disordinal interaction, the manipulated independent variable has different consequences at the extremes of the predictor variable. This would be the case, for example, if a conscious ESP task gets better performance than an unconscious one in the case of strong belief (in the possible occurrence of ESP), but an unconscious ESP task gets better performance than a conscious one in the case of strong disbelief. Also of interest to compute is the point of intersection of the two regression lines (for the two levels of the independent variable). To follow through on the earlier fictional sheep-goat example, this would be the level along the belief continuum for which the experimental manipulation makes no difference to performance on the dependent measure. The level(s) of the predictor variable at which an experimental manipulation makes a significant difference may also be of interest. The reader unfamiliar with regression techniques should consult a multiple regression text to learn how to examine these kinds of interactions, to compute the intercept, and so forth.
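The two-step order of entry can be sketched with ordinary least squares. This is a minimal illustration, not a full HMRA implementation: it assumes NumPy is available, one continuous person variable, and one dummy-coded two-level manipulation, and it uses simulated data with a built-in interaction. Centering the person variable is a common convenience assumed here; it does not change the test of the interaction. All names are hypothetical.

```python
import numpy as np


def r_squared(predictors, y):
    """R^2 from an OLS fit of y on the given predictor columns plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()


def interaction_step(person, condition, y):
    """Step 1 enters the main effects; Step 2 adds the product term.

    Returns (R2_step1, R2_step2, F_change, df_error); F_change has 1 numerator df.
    """
    person_c = person - person.mean()      # centered person variable
    product = person_c * condition         # Person x Situation term
    r2_1 = r_squared([person_c, condition], y)
    r2_2 = r_squared([person_c, condition, product], y)
    df_error = len(y) - 4                  # intercept + 3 predictors
    f_change = (r2_2 - r2_1) / ((1.0 - r2_2) / df_error)
    return r2_1, r2_2, f_change, df_error


# Simulated (not real) data: belief predicts ESP score only when condition == 1.
rng = np.random.default_rng(0)
n = 200
belief = rng.normal(size=n)
condition = np.repeat([0.0, 1.0], n // 2)  # 0 = spontaneous, 1 = traditional
esp = 0.5 * belief * condition + rng.normal(size=n)

r2_step1, r2_step2, f_change, df_error = interaction_step(belief, condition, esp)
print(f"R2 step 1 = {r2_step1:.3f}, R2 step 2 = {r2_step2:.3f}")
print(f"F change (1, {df_error}) = {f_change:.2f}")
```

The F for the change in R-squared would be evaluated against an F distribution with 1 and df_error degrees of freedom. A statistical package would normally report this directly; the point of the sketch is only to make the logic of the order of entry concrete.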
These topics are beyond the scope of the present article. Once again, a check on potential nonlinear relationships is important, but exposition of that topic, too, is outside the domain of this article.
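Although a full treatment lies outside this article, the intersection point of two regression lines and the ordinal/disordinal distinction are simple enough to sketch. The example assumes one regression line fitted separately within each of two conditions; the data are hypothetical and deliberately noise-free so that the crossover is easy to verify by hand, and the function names are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def crossover_point(line_a, line_b):
    """Predictor value at which two (intercept, slope) lines intersect, or None if parallel."""
    (a0, a1), (b0, b1) = line_a, line_b
    if a1 == b1:
        return None
    return (b0 - a0) / (a1 - b1)


def classify_interaction(line_a, line_b, x_min, x_max):
    """Disordinal if the lines cross inside the range of interest, else ordinal."""
    x_star = crossover_point(line_a, line_b)
    if x_star is None:
        return "no interaction (parallel lines)"
    return "disordinal" if x_min <= x_star <= x_max else "ordinal"


# Hypothetical belief scores (1 to 7) and noise-free condition means.
belief = [1, 2, 3, 4, 5, 6, 7]
traditional = [2 + 0.5 * x for x in belief]    # performance rises with belief
spontaneous = [5 - 0.25 * x for x in belief]   # performance falls slightly

line_trad = fit_line(belief, traditional)
line_spon = fit_line(belief, spontaneous)
x_star = crossover_point(line_trad, line_spon)

print("crossover at belief =", round(x_star, 3))
print(classify_interaction(line_trad, line_spon, 1, 7))
```

In terms of the interaction model, with a dummy-coded condition the same point equals minus the condition coefficient divided by the product-term coefficient. Identifying the regions in which the manipulation makes a significant difference requires additional techniques that are not sketched here.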
Studying Mediators of Relationships
Recall the fictional example in which an investigator proposed that the sheep-goat effect occurs in the usual test situation because the situation causes goats to think about how they "rationally" should respond on a given trial instead of responding more spontaneously, in the manner of sheep, in whom this spontaneity is presumed to foster their superior ESP-task performance. This hypothesis assumes that the differential effect of the usual test situation upon the ESP-task performance of sheep and goats is mediated by differential inclinations of these two groups as to how they make their calls of ESP targets. If so, the effect of the belief variable on ESP-task performance is an indirect or mediated one.
If the investigator had studied only a traditional testing situation and had found that sheep performed better than goats, one of the first things to do would be to see if the sheep-goat measure correlates, in this setting, with the use of rational-analytic strategies in test taking, as the hypothesis suggests. (This obviously requires the development of some direct measure of such tendencies.) This would not, of course, demonstrate the truth of the hypothesis that such strategies mediate--that is, are directly responsible for--relatively low ESP-task performance by the participants. It would, though, demonstrate a necessary condition for mediation, namely that this correlation does exist.
Additional statistical analysis would be needed to provide evidence that the sheep-goat (belief) variable achieves its effects on ESP-task performance indirectly, through its consequences on calling strategies. In short, a mediational analysis would be needed. For such an analysis one must, in the present case, have three kinds of measures: (a) the predictor measure, in this case, an attitude (sheep-goat) measure; (b) a measure of the hypothesized mediator (i.e., a measure of the degree of rational-analytic constraint on calling); and (c) a measure of ESP-task performance. What kind(s) of statistical tools are best here? Instruction in mediational analysis is not the purview of the present article, but a few comments are in order.
Baron and Kenny (1986) discussed in detail how proposed mediation can be statistically examined through multiple regression analysis, and they discussed analytic requirements for providing evidence that the effects of independent variable X on dependent measure Y are partially or completely mediated by M, a proposed mediator variable. I say "providing evidence" rather than "demonstrating" because, as noted by Kenny, Kashy, and Bolger (1998), even positive findings with their approach to studying complete mediation do not unequivocally prove that mediation actually has occurred. Still, such analyses are a valuable step forward in providing evidence for mediation, and such techniques should be used by theory-testing parapsychologists. If one has a mediational hypothesis, the general strategy is to show that the effect of X on Y is an indirect one, through M as a mediator, not a direct one. Of course, effects of X on Y may be both direct and indirect, in which case the effect in question is less than completely mediated. Statistical analysis can throw light on both possibilities.
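The regression logic of this strategy can be sketched as three equations: Y on X (the total effect, c), M on X (path a), and Y on both X and M (the direct effect c' and path b). The following is a minimal illustration using NumPy and simulated data; it estimates the paths only and omits the significance tests that a real analysis requires. Variable and function names are hypothetical.

```python
import numpy as np


def ols(y, *predictors):
    """OLS coefficients of y on the predictors, with the intercept first."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mediation_paths(x, m, y):
    """Path estimates for the three regression equations of a simple mediation model."""
    c = ols(y, x)[1]                 # total effect of X on Y
    a = ols(m, x)[1]                 # effect of X on the mediator M
    _, c_prime, b = ols(y, x, m)     # direct effect of X; effect of M controlling X
    return {"c": c, "a": a, "b": b, "c_prime": c_prime, "indirect": a * b}


# Simulated (not real) data for the fictional example: stronger belief lowers
# calling constraint, and calling constraint lowers ESP-task scores.
rng = np.random.default_rng(0)
n = 300
belief = rng.normal(size=n)
constraint = -0.6 * belief + rng.normal(size=n)
esp = -0.5 * constraint + rng.normal(size=n)

paths = mediation_paths(belief, constraint, esp)
for name, value in paths.items():
    print(f"{name:>8}: {value: .3f}")
```

With ordinary least squares, the total effect equals the direct effect plus the product a*b. Evidence for mediation is stronger when c' is much smaller than c; a c' near zero is what complete mediation would predict.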
Statistics, like all disciplines, marches on, and, as noted by Kenny et al. (1998), structural equation modeling is now a preferred approach to mediational analysis, rather than the multiple regression approach described earlier by Baron and Kenny (1986). When investigators posit mediational processes as underlying events, they should examine proposed mediators. Modern statistical techniques are available that go far beyond what ANOVA, analysis of covariance (ANCOVA), or even HMRA can accomplish in this regard. These techniques, which can be extremely useful in process-oriented research, could benefit parapsychological research. This is true even while--whatever impression one may gain in reading the claims in certain contemporary research publications--such techniques cannot obviate all potential sources of ambiguity relative to causal inference. What such techniques depend on heavily for their usefulness is the readiness of investigators to create and test specific models of what is believed to be happening. It is precisely this kind of process-oriented work that the present article is intended to foster and that methods such as structural equation modeling can help to advance.
Consider Within-Group Variances
When individual differences affect performance in a given test setting, they show up in enhanced variation of scores within that condition. Such an effect may be evident if one of the experimental conditions exhibits substantially greater within-group variation as contrasted with that in other cell(s) of the design. (The conceptual analysis suggested here assumes random assignment of participants to the conditions and that there has not been differential attrition across conditions.) By statistically contrasting within-condition variances across conditions, one may get a clue, from substantially greater within-condition variance, that a particular condition is being responded to substantially differently by different persons. The nature of the condition with greater variance--or even a single condition at an earlier stage of research, if participants seem to be reacting very differently to it--may supply a hint of the cause of this differential response among persons, a hint that can become a hypothesis for later testing.
A caveat is in order: When one is contrasting variances across conditions, greater variance in one condition than in another does not prove that the condition with greater variance was one in which some person characteristic was playing an especially large role. Perhaps something about the contrasted condition (e.g., the one with low variance) was, instead, actively reducing the opportunity for between-subjects variations in performance on the dependent measure. Situations that reduce the effects of person differences on the dependent measure are called strong situations (Snyder & Ickes, 1985), and because such situations encourage or force participants to respond in similar ways, they largely or entirely obviate the influence of individual differences on performance. Strong situations usually involve salient and powerful circumstances, but it should not be thought that relatively uniform responding across participants within a condition is necessarily due to deliberate, conscious response to a strong situation. A strong situation might actually gain its influence because it elicits automatic, uncontrolled responding of a given kind from most or all of the participants.
The potential value of comparing variation across conditions in a study and thereby potentially illuminating differential consequences of a condition for different participants (or potentially finding evidence of a strong situation) is one more good reason for always reporting standard deviations, not just means, for all of one's test conditions. Potential explanations for variance differences are best followed up through subsequent research intended to replicate the variance outcome(s) and to test implications of the proposed explanation(s).
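A statistical contrast of within-condition variances can take several forms. The sketch below shows two: a simple variance ratio and a Brown-Forsythe-type statistic (a one-way ANOVA F computed on absolute deviations from each condition's median). The latter is named here only as one robust option and is not a method prescribed in this article. The scores and function names are hypothetical.

```python
from statistics import mean, median, variance


def variance_ratio(group_a, group_b):
    """Larger sample variance divided by the smaller one."""
    va, vb = variance(group_a), variance(group_b)
    return max(va, vb) / min(va, vb)


def brown_forsythe_f(*groups):
    """One-way ANOVA F on absolute deviations from each group's median.

    Returns (F, df_between, df_within).
    """
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = sum(sum(d) for d in devs) / n_total
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in devs)
    ss_within = sum((z - mean(d)) ** 2 for d in devs for z in d)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f_stat, k - 1, n_total - k


# Hypothetical ESP-task scores with equal means but very different spreads.
condition_a = [9.0, 10.0, 11.0, 10.0, 10.0, 9.0, 11.0, 10.0]
condition_b = [4.0, 16.0, 7.0, 13.0, 10.0, 2.0, 18.0, 10.0]

ratio = variance_ratio(condition_a, condition_b)
f_stat, df1, df2 = brown_forsythe_f(condition_a, condition_b)
print(f"variance ratio = {ratio:.1f}")
print(f"Brown-Forsythe F({df1}, {df2}) = {f_stat:.2f}")
```

The F statistic would be referred to an F distribution with the stated degrees of freedom. As the caveat above makes clear, a significant difference does not by itself indicate whether the high-variance condition engaged individual differences or the low-variance condition was a strong situation.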
Replicability and Sampling Considerations
At the predictor-variable level, circumstances related either to unintended subject selection by recruitment or to attrition also may tend to reduce the magnitude of correlational findings and the likelihood of finding Person x Situation interactions. Recruitment can act as a selection factor, thereby restricting the range of a predictor variable. The consequence is the artifactual reduction of the observed correlation relative to the population correlation (and reduction in the likelihood of detecting any Person x Situation interaction). As one possible example, variability in scores on a given sheep-goat attitude measure could be restricted in typical psi experiments, at least if, as sometimes seems plausible, the volunteers tend to be persons who think such happenings are possible or who deem them to be, at least, of potential interest. The result could be a reduced sheep-goat-effect correlation, assuming linearity of regression.
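The attenuating effect of range restriction can be shown with a brief simulation. The sketch assumes a linear relationship, a normally distributed belief score, and an extreme form of self-selection in which only persons above the population mean on belief volunteer; all of these are illustrative assumptions, as are the names and parameter values.

```python
import random
from statistics import stdev


def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_restriction(n=20000, rho=0.30, seed=2):
    """Compare the belief-outcome correlation in a full population and in volunteers."""
    rng = random.Random(seed)
    belief = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for x in belief]
    # Assumed selection rule: only persons above the population mean volunteer.
    volunteers = [(x, y) for x, y in zip(belief, outcome) if x > 0]
    vx = [x for x, _ in volunteers]
    vy = [y for _, y in volunteers]
    return {
        "r_full": pearson_r(belief, outcome),
        "r_restricted": pearson_r(vx, vy),
        "sd_full": stdev(belief),
        "sd_restricted": stdev(vx),
    }


result = simulate_restriction()
for name, value in result.items():
    print(f"{name:>14}: {value:.3f}")
```

The reduced standard deviation among the simulated volunteers is the kind of signal that reporting predictor means and standard deviations would allow a reader to detect.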
Restriction of range on certain person variables may be common in a variety of research situations, given that participation cannot be coerced. Drawing substantive, conceptual conclusions from the absence of a correlation in a particular study that was found in other studies (or the absence of an earlier-reported Person x Situation interaction) may be ill advised if, on the person measure, a sample's variability was reduced or its mean was substantially different from that reported in normative testing (or in the studies investigators have tried to replicate). A methodological ramification of this consideration is that one should always report means and standard deviations for a predictor variable, not simply its correlation with the dependent or criterion measure, and one might profitably compare means and standard deviation with those from normative samples (e.g., as discussed in other reports). This is the more important because circumstances of recruitment and sampling may vary from situation to situation and from laboratory to laboratory.
The following example from nonparapsychological hypnosis work illustrates how personality variables may play a role in volunteering-related self-selection, and, additionally, it illustrates the possibility of different degrees of volunteering-related selection according to gender. Hilgard, Weitzenhoffer, Landes, and Moore (1961) found that women high on Self-Control on the California Psychological Inventory either tended not to volunteer for hypnosis work or, if they did so, tended to produce relatively low hypnotizability scores. These patterns did not reach significance for men. I can think of no reasons that personality- and gender-related factors might not also influence volunteering in parapsychological studies, to the extent that the situations posed by the recruitment phase activate dispositions that are characteristic of the individual, whether those dispositions be approach or avoidance related. Similar self-selection-related possibilities apply in regard to cultural variables, not just to personality and gender.
Relative to gender, the social or personal meaning of certain kinds of situations may differ for men and women--thanks, for example, to gender role norms (i.e., social learning)--so there may be different degrees of sampling selectivity for men and for women and differences in the kind or degree of personal-gender relevance of the situations they encounter in a study if they actually participate. As a consequence of the latter, the pattern of correlations across conditions in a study may differ for men and women, a case of Gender x Situation interaction. It is advisable, in any study, to treat gender as a variable from the outset to see whether it acts as a main effect or interacts with the independent variables being studied. For simplicity's sake, gender might be eliminated from the reported analyses if such effects seem unlikely (as assessed by a liberal alpha error), but the reason for gender's elimination in the analysis should be stated. If there are no reasons for a priori, theory-based, predictions relative to gender, any gender findings must be treated as post hoc (and, thus, taken as the basis for conclusions only when they can be replicated). Nonetheless, correlations of personality with dependent measures seem, at times, to depend in some degree on gender, as in the case of the creativity-hypnotizability correlation discussed earlier and in at least one major parapsychological research paradigm. Palmer (1997) noted that in the work at Charles Honorton's laboratory with "first timers" in the ganzfeld, the extraversion-ESP correlation held up only for female participants and that a similar pattern held in the ganzfeld work at the Rhine Research Center.
Given such possibilities, investigators in both psi and nonpsi work should, at least initially, routinely include gender in their regression equations. (Gender is, after all, cheap and easy to assess and has a history of relevance in a variety of psychological research domains.) If an observed effect (computed without considering gender) actually depends on gender, then replicability might or might not be had in subsequent studies, depending on the gender composition of their samples, and, of course, the effect in question will not be generalizable across gender. One stays in ignorance of such matters in the absence of gender analyses.
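What including gender in a regression equation can reveal may be sketched as follows. This is a minimal illustration, not a full analysis: it assumes a single continuous predictor, gender coded 0/1 (the coding is arbitrary), and a model with a Predictor x Gender product term. Because such a model allows each group its own intercept and slope, its coefficients can be recovered from two within-group regressions. The function names are hypothetical, and no standard errors or significance tests are computed; those would require a statistics package.

```python
from statistics import mean


def slope_and_intercept(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def gender_moderation(predictor, outcome, gender):
    """Coefficients of y = b0 + b1*x + b2*g + b3*(x*g), with g coded 0/1.

    The model has a separate intercept and slope for each group, so its
    coefficients are obtained here from two within-group regressions.
    """
    groups = {}
    for code in (0, 1):
        xs = [x for x, g in zip(predictor, gender) if g == code]
        ys = [y for y, g in zip(outcome, gender) if g == code]
        groups[code] = slope_and_intercept(xs, ys)
    (slope0, icpt0), (slope1, icpt1) = groups[0], groups[1]
    return {
        "b0_intercept": icpt0,
        "b1_predictor": slope0,             # slope in the group coded 0
        "b2_gender": icpt1 - icpt0,         # intercept difference
        "b3_interaction": slope1 - slope0,  # slope difference
    }
```

A nonzero `b3_interaction` indicates that the predictor-outcome relationship differs by gender, which is the circumstance under which an effect computed without gender would vary with the gender composition of the sample.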
When viewed and investigated from a systematic gender-theoretical perspective (e.g., Deaux & LaFrance, 1998; Deaux & Major, 1987; Eagly & Karau, 1991), gender differences may be highly illuminating about the psychology of the test situation (or of many other situations). Investigators should think carefully a priori about how the requirements for the participants in their study may relate to matters such as gender role demands and considerations of gender identity and, potentially, to selection through volunteering. These considerations underscore the importance of investigators' noting and reporting demographic information relative to their samples. If some element related to the population(s) sampled affects results--and one will never know without doing the statistical analyses--then reporting of this fact and specification of the demographic composition of one's sample (e.g., gender, age, cultural identity, and other potentially relevant person factors) is a fundamental step in advancing an understanding of problems in replicability.
In some of the procedures used by parapsychologists, such as ganzfeld and hypnosis, considerable passivity is required on the part of research participants. Because of male gender role norms that favor staying in charge and being in control (sometimes called agency in the technical literature), procedures that require a high level of overt passivity may disfavor psychological engagement on the part of men (or, at least, on the part of men who think of themselves as stereotypically masculine and not easily manipulated). Pancza (1982) found that how well the trait of psychological absorption correlated with suggestibility seemed to depend on the combination of gender and how assertively the suggestions were worded. (An assertive suggestion states directly that the suggested response [e.g., arm levitation] is happening to/with someone. It provides no experiential justification for the suggested action to happen, but just asserts that it is happening. A hypothetical example is, "Your arm is starting to rise up, going up, faster and faster every moment." A nonassertive suggestion may ask the participant to actively imagine something that is compatible with the suggested movement or effect, such as a helium-filled balloon attached to the arm, lifting it up.) The traditional positive correlation of absorption and suggestibility essentially dropped out for males, but not for females, when the suggestions were assertively worded, but not when they were nonassertively worded. Perhaps men, in the face of assertive suggestions, might have disengaged themselves from deep processing of the suggestions, thereby making irrelevant the trait of psychological absorption.
There probably is no reason that degree of compatibility between an experimental task and gender role norms would not play a role in parapsychological experiments that require a high degree of passivity on the part of participants. Investigators should be vigilant, when planning studies, about the possibility of Gender Role x Situation interactions (and, more generally, about potential gender role conflicts with particular experimental tasks), and they might profit by statistically examining for such effects. If an investigator has no interest in such matters, he or she may wish simply to obviate such possibilities by verbally framing task instructions in ways that should not produce gender role conflicts. An example might be making participants, including male participants, feel comfortable in a hypnosis setting by explaining that they can play an active role in making the suggested things occur, and this might be accompanied by specific strategies for so doing.
It is troubling to see some parapsychological reports in recent years (as in earlier times) that make individual research participants statistically invisible by lumping results across participants by using statistics that are not subject based and with no supplementary subject-based analyses provided. Some reports do not describe recruitment practices or even characterize their participants, generally or in terms of variability, much less provide demographic data. Statistical analyses that ignore between-subjects variability cannot sustain generalization across subjects and, thus, usually are statistically inadvisable, just as they were 30 years ago (as noted by Stanford & Palmer, 1972). Nor can failure to specify populations sampled illuminate the potential generalizability of outcomes across populations. Any investigator orientation that tends to ignore the reality and importance of person variables and personality when individuals confront experimental psi tasks may ultimately be at a loss to account for much that happens in parapsychology.
SPECIAL INTERPRETATIONAL ISSUES RELATED TO PRE-POST DESIGNS, WITHIN-SUBJECTS DESIGNS, AND WITHIN-STUDY CONTEXT EFFECTS: CONSIDERING IMPLICIT INTERACTIONS
Both pre-post designs (i.e., premeasure, followed by intervention, followed by postmeasure) and within-subjects designs can, because of demand characteristics and task-juxtaposition effects, introduce serious interpretational problems that can mislead regarding the nature of the underlying effects unless special precautions are taken in reporting and interpreting the data and, ideally, in designing the study. Additionally, context effects, namely giving one psychological test in the same research context as another (e.g., two personality measures or other questionnaires), potentially may introduce confounds into the correlational (and other) outcomes. It can, for example, substantially alter the correlation between the measures in question. These three problem areas are considered in turn.
Potential Problems Related to Pre-Post Designs
As an example of problems that can occur with pre-post designs, consider a series of studies (Honorton, 1970, 1971; McCallam & Honorton, 1973) intended to show that trial-by-trial feedback in ESP tests can enhance insight about the accuracy of the calls, as well as enhance overall performance. Two other sets of investigators followed up on this seemingly very promising work (Jackson, Franzoi, & Schmeidler, 1977; Kreiman & Ivnisky, 1973). All five studies used Honorton's or a very similar paradigm, and the three from Honorton's laboratory were said to have produced a significant increase in the accuracy of confidence calls consequent to the feedback manipulation. (Confidence calls are calls for which the participant indicates special confidence of accuracy, as per instructions from the experimenter.) For three of the five studies there also was reportedly a significant increase in ESP-task performance from before to after feedback (Honorton, 1970; Kreiman & Ivnisky, 1973; McCallam & Honorton, 1973), a situation that, as discussed below, might have contributed, through an artifact of statistical analysis, to the false impression of participants' having learned insight into response accuracy. Jackson et al., through their own attempted replication and retrospective analysis of Honorton's work, would seem to have refuted Honorton's claim of having used feedback to create participant insight into the correctness of ESP-task responses. Jackson et al. reported significantly inaccurate confidence-call accuracy in their own prefeedback condition and disclosed the same in two of Honorton's three studies (based on analyses of data supplied by Honorton at their request).
They felt, therefore, that the basic set of findings in this work was one of significantly inaccurate prefeedback confidence-call accuracy combined with no definitive evidence of postfeedback accuracy on confidence calls, which is very different from Honorton's preferred interpretation of having trained insight.
It seemed as though demand characteristics and a resulting expectancy effect might have been involved in pretest below-chance confidence-call accuracy, in the form of suppression of performance in the pretest condition, a suppression that contributed greatly to the objective increase in confidence-call accuracy from pre- to posttest. Stanford (1993) noted a false statistical assumption in the original analyses of studies of this kind, an assumption that surely contributed to the bogus impression of increased insight consequent to feedback, at least when ESP-task accuracy had increased, for whatever reason, after feedback. The computation of chance accuracy for confidence calls must consider the objective success rate in a given condition, but investigators had not considered this rate and thus had overestimated confidence-call accuracy when, in the postfeedback period, participants had turned in increased calling accuracy. Stanford (1993) reanalyzed this work in light of the objective success rate, and his analyses strongly sustained the most fundamental and important conclusion of Jackson et al., namely that participants tend to be systematically inaccurate in making confidence calls prior to being given the feedback training intended to enhance the accuracy of insight in calling. Although the specific psychological interpretation of these data is unclear in the absence of additional investigation, demand characteristics might well have played a role here, as they presumably often can with pre-post designs and within-subjects designs, both in and outside of parapsychology. The feedback paradigm presumably had made evident to participants that a pre-post shift in confidence-calling accuracy was expected. That expectation might have included the idea that they would/should perform poorly on confidence calls before the feedback, given that the feedback evidently was intended to enhance insight.
In any event, participants might have had little confidence in their ability to make accurate confidence calls prior to getting feedback, or at a minimum, little inclination to take seriously, in making those calls, the possibility of their being accurate prior to feedback. One or more of these factors might have adversely affected their performance on prefeedback confidence calls. More generally, the capacity of pre-post designs to carry demand characteristics means that their outcomes should be carefully scrutinized for alternative interpretations.
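The statistical point about the proper baseline for confidence calls can be made concrete with a small sketch. It is offered only as an illustration of the general logic and is not necessarily the computation used in the reanalysis cited above. The assumption here is that, if marking a call as a confidence call is unrelated to whether the call is correct, the number of hits among the confidence calls follows a hypergeometric distribution determined by the hits actually obtained in that condition. The function names and the numbers in the usage note are hypothetical.

```python
from math import comb


def confidence_call_baseline(n_trials, n_hits, n_confidence_calls):
    """Expected hits among confidence calls if marking a call as confident
    is unrelated to whether that call is correct."""
    return n_confidence_calls * n_hits / n_trials


def prob_at_least(observed, n_trials, n_hits, n_confidence_calls):
    """Exact P(X >= observed) for confidence-call hits under the same null
    (hypergeometric: the confidence calls are a subset of the trials)."""
    total = comb(n_trials, n_confidence_calls)
    upper = min(n_hits, n_confidence_calls)
    tail = sum(
        comb(n_hits, x) * comb(n_trials - n_hits, n_confidence_calls - x)
        for x in range(max(observed, 0), upper + 1)
    )
    return tail / total
```

Suppose, hypothetically, a five-choice task (nominal chance of .20) in which a postfeedback condition yields 30 hits in 100 trials, with 6 hits among 20 confidence calls. Judged against the nominal rate, 6 of 20 (30%) looks like above-chance confidence calling, but `confidence_call_baseline(100, 30, 20)` shows that 6 hits is exactly what the condition's own success rate would produce without any insight.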
Potential Problems Related to Within-Subjects Designs
Although presently there is no way to be sure of the proper psychological interpretation of the work just discussed, Stanford and Stein (1994), in their meta-analysis of ESP studies involving hypnosis and a comparison condition, found something that seems curiously parallel. They were able to conduct an ANOVA that pooled the results of three very similar within-subjects studies by Casler (1962, Main Experiment; 1964; 1967). This ANOVA showed a significant effect of the manipulation (hypnosis vs. comparison condition) but, also, a significant interaction between that manipulation and the order of testing. This interaction seriously qualifies how one can legitimately interpret the significant effect of the experimental manipulation. The effect of the hypnosis-comparison manipulation was not significant or even suggestive when hypnosis occurred prior to the comparison condition, but it was decidedly significant, in the expected direction, when the comparison condition preceded the hypnosis condition. Of special interest is the fact that when participants did the comparison condition while awaiting the hypnosis condition, they turned in the statistically strongest evidence of ESP in the entire study, but in the form of strong psi missing (below-chance performance) in the comparison condition! (On the other hand, performance for the comparison condition was objectively above, but very close to, mean chance expectation when the comparison condition followed the hypnosis condition.) The psychological circumstances here would seem to parallel those in Honorton's feedback work. In both instances participants performed significantly below mean chance expectation--the clearest statistical evidence of ESP in the studies--when, knowing that they would subsequently participate in a condition in which much better performance was expected, they nonetheless had first to go through a relatively uninteresting comparison or control condition.
Three lessons from such developments are clear: (a) As Palmer (1975) pointed out long ago, it can be very misleading to report change or difference scores in psi performance without also reporting and analyzing the performance in the individual conditions thus contrasted; (b) the data just discussed from two different parapsychological problem areas suggest strongly that cautions are needed in interpreting within-subjects designs (including pre-post designs) generally, because participants do not passively respond to the conditions as individual conditions but to their juxtaposition in the same session (see Poulton, 1973, in relation to a wide variety of nonparapsychological work), and, additionally, such designs are almost always fraught with demand characteristics; and (c) when a within-subjects design is used, it is imperative to examine order of testing to learn whether the effects of a manipulation depend on that order--in other words whether order interacts with the manipulation. The psychological meaning of a condition (experimental or control/comparison) can very plausibly depend on the order of testing. As just one possibility, if participants are eager to experience hypnosis, they may resent--or, at the least, find very boring--first having to sit through a control/comparison condition. Here is where experimenters must put on their psychologists' hats and never consider research participants as passive objects of an experimental manipulation, but as thinkers and feelers who compare and contrast those conditions. By the way, counterbalancing in within-subjects designs does not obviate the kind of potential problem discussed here.
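Lessons (a) and (c) can be sketched in code for the simplest case of two conditions given in one of two orders. This is a descriptive sketch under stated assumptions, not a substitute for the ANOVA: it reports the mean for each condition within each order (rather than difference scores alone) and computes the Order x Manipulation contrast as the difference between the two orders' mean difference scores. The function name, the dictionary keys, and the condition labels are hypothetical.

```python
from statistics import mean


def crossover_summary(first_order, second_order):
    """Summarize a two-condition within-subjects study by order of testing.

    Each argument is a list of (comparison_score, hypnosis_score) pairs, one
    pair per participant: first_order for participants tested in the
    comparison condition first, second_order for those tested under
    hypnosis first.
    """
    summary = {}
    for label, pairs in (("comparison_first", first_order),
                         ("hypnosis_first", second_order)):
        comparison = [c for c, _ in pairs]
        hypnosis = [h for _, h in pairs]
        summary[label] = {
            "comparison_mean": mean(comparison),
            "hypnosis_mean": mean(hypnosis),
            "mean_difference": mean(h - c for c, h in pairs),
        }
    # Order x Manipulation contrast: how much the manipulation effect
    # changes from one order of testing to the other.
    summary["order_by_manipulation"] = (
        summary["comparison_first"]["mean_difference"]
        - summary["hypnosis_first"]["mean_difference"]
    )
    return summary
```

Inspecting the four cell means alongside the contrast shows whether an overall difference is carried by one order only and whether it arises from elevated performance in one condition or depressed performance in the other.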
Poulton's (1973) very important nonparapsychological paper, cited above, merits further consideration than it apparently has received either from psychologists generally or from parapsychologists. Poulton addressed at length, citing massive empirical evidence, potential unwanted consequences of exposing participants to a variety of conditions or stimuli. (I can think of no reasons that Poulton's ideas and findings would not generalize to parapsychology, given that psi experiments also involve psychology.) Poulton documented changes in outcomes that can occur from the fact of participants' being subjected to more than one situation, or even a variety of situations, in a single session. There is evidence that within-subjects designs, precisely because of these often unwanted "range effects" (Poulton's terminology), sometimes can produce different outcomes than do independent-groups designs, both in terms of performance in a given condition and in terms of differential performance across conditions. Poulton opined that psychologists should not use within-subjects manipulations unless they also use between-subjects ones so that they can assess the specific consequences of juxtaposing conditions (with the exception of work in which the question being addressed is specifically a within-subjects issue). Poulton's ideas suggest that disparate results between studies ostensibly addressing the same problem sometimes can be resolved when one realizes that they used these two very different kinds of designs. Within-subjects designs can often address a very different issue, psychologically speaking, than do between-subjects designs. Experimental manipulations sometimes interact with the type of design used (within-subjects/between-subjects), presumably because the meaning of the stimuli presented varies depending on the context of their presentation.
Those who are designing new studies should consider that the psychological meaning of an effect from a within-subjects study often can be very different from that of the effect of the same independent variable in a between-subjects design and that, in the former case, careful examination of order effects and, especially, of the interaction of order with manipulation is crucial to interpreting outcomes.
Potential Problems With Context Effects
The discussion of the two preceding sections has been focused on the consequences of having research participants do two or more things in a single session, but such discussion would not be complete without some mention of the consequences of giving two or more questionnaires or personality tests in the same experimental session. These consequences fall under the rubric context effects (CEs) in personality research. As discussed here, this term designates instances in which the measured degree of correlation between two personality measures depends on whether or not those tests have been given in the same testing context. CEs are of great potential importance to parapsychologists, who, like psychologists, often administer more than one trait-related measure in the same session. Council (1993) documented the frequency of this practice in a major journal of personality research, provided an illuminating review of the distortion of outcomes by CEs, and showed the wide generality of their occurrence. Council's review should be read by everyone who uses trait measures, but one of his findings may hold special interest for parapsychologists. When psychological-symptom subscales and a measure of paranormal beliefs were given in separate sessions by different experimenters, the expected positive correlations between the clinical measures and paranormal beliefs were more evident than when these measures were given in the same session by the same experimenter. Thus, when participants easily could have inferred that an investigator expected a positive relationship between paranormal beliefs and clinical symptoms, a bona fide relationship between these variables seemingly was reduced or obscured.
The occurrence, magnitude, and direction of CEs likely depend on attributes of particular studies and, hence, should vary substantially across them. The actual mechanisms underlying CEs have been inadequately studied. Council (1993) did mention several potentially important factors: social desirability, need for consistency, compliance with transparent demand characteristics, and impression management, among others.
The outcomes of psychological testing depend on far more than the nature of the test items themselves, however "valid and reliable" such tests may be said to be on the basis of other studies. Investigators who use psychological tests without considering the meaning in participants' minds that can be created by their contextual juxtaposition do so at considerable risk of obtaining misleading findings and of reaching doubtful conclusions. Council's (1993) very useful review also discusses how hierarchical multiple regression analysis may be used to study CEs, analysis that focuses on the potential interaction between the context manipulation and a predictor variable as they jointly affect the dependent variable. Readers who wish a broader perspective on CEs in psychological research may wish to consult, additionally, the volume edited by Schwarz and Sudman (1992).
Council (1993, p. 33) made the important point that counterbalancing the order of tests does not necessarily control for context effects. The reasons for this are beyond the present discussion, but I would point out that the nature and kind of context effects may be order dependent. Ganzfeld-ESP research may provide a case in point. Honorton et al. (1998) used meta-analysis to examine the extraversion-ESP correlation wherein extraverts perform better. They found evidence that this correlation was, in the case of forced-choice ESP tests, confined to work in which the ESP test preceded the extraversion test. What is more, this correlation differed significantly for the two orders of testing, an effect that appeared to generalize across the variables of individual-group testing, type of extraversion scale, and extrasensory mode (with agent or clairvoyance). (This generality of outcome may help a bit to rule out possible cross-studies confounds in this evidence of an order moderator, but because not all possible confounds were addressed, it does not fully resolve the issue.) This set of findings raised a question--not answered fully or adequately, to date, so far as I am aware--about whether the seeming order dependency of the extraversion-ESP correlation with forced-choice ESP tasks means that the extraversion-ESP correlation was artifactual. Taking the ESP test before the extraversion scale might, if participants learn their ESP scores, influence their extraversion scores--and all of this would have nothing to do with the occurrence of ESP.
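Within a single study that uses both orders of testing, one simple way to ask whether a trait-performance correlation depends on order is the Fisher r-to-z comparison of two correlations from independent groups. The sketch below is only that simple case, using a normal approximation; it is not the meta-analytic procedure used in the work just described, which compared correlations across studies. The function name is hypothetical.

```python
from math import atanh, erfc, sqrt


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns the z statistic and its two-tailed p value (normal approximation).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 participants")
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p_two_tailed = erfc(abs(z) / sqrt(2))
    return z, p_two_tailed
```

Here `r1` and `n1` would be the correlation and sample size for participants who took the ESP test first, and `r2` and `n2` those for participants who took the trait measure first.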
What the preceding discussion suggests is that there is a psychology to the experimental session as a whole that often extends beyond matters of immediate interest to the experimenter and that sometimes can compromise the internal validity of the study. Participants in within-subjects designs may be responding to a configuration of test conditions or to a juxtaposition of them, or even to a particular order of testing. Potential order effects all too often simply are not discussed in reports of psychological and of parapsychological research, and relevant analyses often are not reported. Potential context effects too often are ignored in discussing results involving psychological tests, even when a study is rife with possibilities of that kind. These remarks about a substantial incidence of past failures in these regards are, in my view, applicable both to the psychological and to the parapsychological literatures.
A final suggestion may be useful in relation to assessing the psychological consequences of a within-subjects design. It is always wise to examine and report the correlation of performance across conditions when a within-subjects design has been used. If the participants in general tend to see the two conditions as similar and react to them affectively in much the same way--perhaps a kind of assimilation effect--one might expect a positive correlation of performance between conditions. If, on the other hand, participants in general tend to see the two conditions as very different or even react to them with different affect--a kind of contrast effect--one might expect a negative correlation of performance between those two conditions. With both psi and nonpsi measures, computing such correlations potentially can strongly inform one's understanding of the psychological consequences of an experimental manipulation, yet this potentially very revealing information seems too rarely reported, both in psychology and in parapsychology.
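The cross-condition correlation suggested here takes only a few lines to compute. The sketch below assumes one score per participant in each of two conditions, listed in the same participant order; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between paired scores (e.g., each participant's
    performance in condition A and in condition B)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

A clearly positive value would be in keeping with the assimilation possibility described above, and a clearly negative value with the contrast possibility.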
WORKING CLOSE TO THE PHENOMENA THEMSELVES: THE VALUE OF PERFORMANCE-BASED MEASURES AS PREDICTORS
If an investigator believes that certain psychological events (or even personal dispositions) shape the likelihood of psi occurring and of its occurring in a particular way, those events (or those dispositions) must somehow be measured or manipulated and their potential consequences measured. It is important that an investigator measure as directly and, hence, validly as possible the manifestations of those events (or those dispositions) that should directly affect psi performance. If participants take a forced-choice ESP task and the investigator believes that spontaneity (i.e., freedom from analytical and pseudo-rational response constraints) will allow more ESP to occur, one can envision a broad continuum of ostensible dispositional measures. One investigator may take the broad personality-testing approach and try to develop and validate a self-report (or other-report) questionnaire to assess the participant's inclinations with regard to spontaneity in everyday life, in the hope that this operationalized construct will predict ESP-task performance. Others may want to confine their questions to retrospective ones about what happened during the session as the participant tried to do the ESP task, but it should be remembered that persons often simply lack insight into (or, sometimes, accurate recall of) what they did and why they did it. In short, their self-reports may be invalid, even if they are honest. Both of the above approaches seem somewhat remote from what the investigator actually wants to study. What logically should best predict ESP-task performance would be a measure of the degree of relevant constraint that the individual exhibits while actually taking the ESP test. The participant may have a broad personality trait that predicts general tendencies across a variety of situations, or he or she may not. Or the participant may have a disposition for spontaneity in particular settings but not in others. But what about the current setting?
What does that setting, which is here and now, do to spontaneity for this individual participant? The situation at hand may affect some participants one way and do something very different to others, or it may press all toward a similar level of spontaneity. This could well leave the personality-process-oriented investigator a bit frustrated. On the other hand, an investigator might profitably develop response-based measures, based on calling patterns that should reflect the operation of response constraints (or, conversely, of spontaneity) in the test at hand. The investigator should also experimentally manipulate spontaneity to ascertain the effect of this variable on both the putative measure of spontaneity and ESP-task performance. Mediational analysis would also be in order for the purpose of assessing whether any effect of the spontaneity manipulation on ESP-task performance was mediated by reduction in participant response constraints. The experimental manipulations should help to eliminate the possibility that what originally was studied purely correlationally simply was an extraneous (i.e., unwanted) confounding variable. This general approach to investigation of spontaneity/constraint is one example of what is meant by working close to the phenomena of interest. Those phenomena certainly occur within the ESP task itself.
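The mediational logic just described can be sketched with ordinary least-squares path estimates for a single-mediator model. This is a minimal illustration under assumptions not taken from the text: a two-level manipulation coded 0/1, one mediator, linear relationships, and no test of the indirect effect's significance (which would need, for example, a bootstrap). The function names and variable labels are hypothetical.

```python
from statistics import mean


def _cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def mediation_paths(x, m, y):
    """Least-squares path estimates for a single-mediator model.

    x: manipulation (e.g., 0 = control, 1 = spontaneity instructions)
    m: proposed mediator (e.g., a response-constraint index)
    y: outcome (e.g., ESP-task score)
    """
    sxx, smm = _cov(x, x), _cov(m, m)
    sxm, sxy, smy = _cov(x, m), _cov(x, y), _cov(m, y)
    det = sxx * smm - sxm ** 2
    a = sxm / sxx                             # x -> m
    c_total = sxy / sxx                       # x -> y, ignoring m
    b = (smy * sxx - sxy * sxm) / det         # m -> y, holding x constant
    c_direct = (sxy * smm - smy * sxm) / det  # x -> y, holding m constant
    return {"a": a, "b": b, "total": c_total,
            "direct": c_direct, "indirect": a * b}
```

In this linear case the total effect equals the direct effect plus the indirect effect (a times b), so the share of the manipulation's effect that runs through the measured constraint index can be read directly from the returned values.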
Performance-based measures have the advantage of measuring what participants do, not merely what they say they have done in the past or what they think they will do in the future. Even in the most honest of participants, accuracy of memory and judgment may always be questioned. Performance-based measures often can be had in the same session (and, sometimes, even in the same task setting) in which one wishes to use them to predict performance. Performance-based measures must, of course, be carefully developed and constructed to reflect the construct one wishes to measure and should, ideally, be measured in a setting psychologically close to that in which prediction is desired. (This assumes that the researcher's goal is to predict performance on the basis of a theoretically relevant construct. If the goal is to develop a performance-based measure of a general trait of a certain kind, then measurement across a variety of settings of a demonstrably similar or identical construct is necessary.) Over the years I have successfully applied the performance-measure strategy for a variety of parapsychological purposes.
Here are some examples: (a) In one study (Stanford, 1969) I used free word association to provide a measure of the spontaneous tendency to use sensory imagery and used that measure to predict whether participants did better with instructions to visually imagine a PK-task target than with instructions that did not involve imagery; this demonstrated a conceptually meaningful interaction between a dispositional measure and the character of PK-task instructions. (b) In another study (Stanford, 1970), a performance-based measure of participants' proficiency at incidental memory (of circumstances in a laboratory setting that they had not expected to be asked to remember) was useful in predicting extrasensory effects upon memory for a different set of materials, namely for a story they had heard during the session. (c) Other work (reviewed by Stanford, 1975, with confirmation by another investigator) showed that response constraints on calling in a forced-choice ESP task (i.e., the tendency to balance calls across targets) result in (or are associated with) reduced variation in scores about mean chance expectation. (d) Other work (reviewed by Stanford, 1975, conducted by Stanford and confirmed by others) provided evidence that when ESP participants produce a response contrary to their own biases, it is particularly likely to be correct; I have speculated that this is due to a reduction in false alarms relative to a target against which the participant is statistically, but not necessarily emotionally, biased.
Items (c) and (d) above suggest that "If you want to predict psi performance, you can make the prediction on the basis of the psychology of what is happening right in the test." The most valid and reliable prediction often should be had on the basis of nonpsi measures derived within the psi test itself. One distinct advantage of ESP-test-based nonpsi predictors is that one can, using such measures, potentially establish whether a particular psychological situation exists in the test, with a particular participant (or even with participants generally), at a particular time. One can hope, on those grounds, that the predictor will have some utility. For example, it can be ascertained whether a given participant exhibits (or participants generally exhibit) a degree of distinct bias against a particular target or whether a given participant tends (or participants generally tend) to show evidence in this test setting of pseudo-rational constraints in their calling (e.g., a strong tendency to balance calls across the targets). Thus, such performance-based measures have the special advantage, aside from being potential predictors of ESP-task performance, that they illuminate the psychology of the test situation. Physiological measures of a variety of kinds conceivably also could be among the task-based predictors of ESP-task outcomes, and there is already a considerable literature of that kind related to EEG measures as predictors of ESP-task performance. I will not attempt here a review of that complex literature.
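One simple way such test-derived indices might be computed is sketched below. This is an assumption-laden illustration rather than a published measure: the function names and the choice of a chi-square statistic against equal use of every symbol are introduced here only to show how call balancing and bias against a target could be quantified from the calls themselves.

```python
from collections import Counter


def call_proportions(calls, symbols):
    """Proportion of calls made on each symbol (0.0 for symbols never called)."""
    if not calls:
        raise ValueError("calls must be non-empty")
    counts = Counter(calls)
    return {s: counts[s] / len(calls) for s in symbols}


def call_balance_chi_square(calls, symbols):
    """Chi-square statistic of call counts against equal use of every symbol.

    Values near 0 mean the calls are spread almost evenly (strong balancing);
    large values mean some symbols are heavily favored or avoided.
    """
    if not calls or not symbols:
        raise ValueError("calls and symbols must be non-empty")
    expected = len(calls) / len(symbols)
    counts = Counter(calls)
    return sum((counts[s] - expected) ** 2 / expected for s in symbols)


# Made-up 25-trial runs with five symbols, for demonstration only.
symbols = "ABCDE"
balanced_run = list("ABCDE") * 5                # each symbol called 5 times
biased_run = list("A" * 10 + "B" * 5 + "C" * 5 + "D" * 5)  # "E" never called

print(call_balance_chi_square(balanced_run, symbols))  # 0.0
print(call_balance_chi_square(biased_run, symbols))    # 10.0
print(call_proportions(biased_run, symbols))
```

With random, equiprobable calling the statistic would be expected to average roughly the number of symbols minus one, so values consistently well below that figure would be one possible sign of balancing, and a very low proportion for one symbol would be one possible sign of bias against it.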
Performance-based predictors, including those generated during ESP testing, seem potentially very useful, for reasons already discussed. What is critical here is great care in developing a measure that represents the construct that one really wishes to study.
RATHER THAN SUMMING IT UP
The proposals advanced in this article reflect a fundamental optimism, optimism that parapsychology has the same potential for systematic, incremental, conceptual advance as is found in other sciences. I hope that readers will, through implementing some of the strategies proposed here, find ways to enhance replicability by advancing a research-based understanding of psi events. This article has focused on the development and testing of conceptual hypotheses and has, in the process, detailed some potentially fruitful approaches and exposed some well-documented potential pitfalls. Conceptual advances in parapsychology may not be easy or quick and may not attract the public attention accorded claims of having created psi-demonstration paradigms (e.g., with ganzfeld or hypnosis). Nonetheless, efforts such as those advocated here should, in the long run, more effectively advance the fundamental scientific objectives of understanding, prediction, and control than do efforts to create demonstrations in the absence of understanding. It can be hoped that such advances will reveal, in the events that psi researchers study, the lawfulness and internal coherence found everywhere in nature.
REFERENCES
AJZEN, I., & FISHBEIN, M. (1977). Attitude-behavior relations: A theoretical analysis and review of empirical research. Psychological Bulletin, 84, 888-918.
BARON, R. M., & KENNY, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.
BEM, D. J., & HONORTON, C. (1994). Does psi exist? Replicable evidence for an anomalous process of information transfer. Psychological Bulletin, 115, 4-18.
CASLER, L. (1962). The improvement of clairvoyance scores by means of hypnotic suggestion. Journal of Parapsychology, 26, 77-87.
CASLER, L. (1964). The effects of hypnosis on ESP. Journal of Parapsychology, 28, 126-134.
CASLER, L. (1967). Self-generated hypnotic suggestions and clairvoyance. International Journal of Parapsychology, 9, 125-128.
COUNCIL, J. R. (1993). Context effects in personality research. Current Directions in Psychological Science, 2, 31-34.
DEAUX, K., & LAFRANCE, M. (1998). Gender. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology: Vol. 1 (4th ed., pp. 788-827). New York: Oxford University Press.
DEAUX, K., & MAJOR, B. (1987). Putting gender into context: An interactive model of gender-related behavior. Psychological Review, 94, 369-389.
EAGLY, A. H., & CHAIKEN, S. (1998). Attitude structure and function. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology: Vol. 1 (4th ed., pp. 269-322). New York: McGraw-Hill (distributed exclusively by New York: Oxford University Press).
EAGLY, A. H., & KARAU, S. J. (1991). Gender and the emergence of leaders: A meta-analysis. Journal of Personality and Social Psychology, 60, 685-710.
EYSENCK, H. J. (1967). The biological basis of personality. Springfield, IL: Charles C. Thomas.
HILGARD, E. R., WEITZENHOFFER, A. M., LANDES, J., & MOORE, R. K. (1961). The distribution of susceptibility to hypnosis in a student population: A study using the Stanford Hypnotic Susceptibility Scale. Psychological Monographs: General and Applied, 75(8, Whole No. 512).
HONORTON, C. (1970). Effects of feedback on discrimination between correct and incorrect ESP responses. Journal of the American Society for Psychical Research, 64, 404-410.
HONORTON, C. (1971). Effects of feedback on discrimination between correct and incorrect ESP responses: A replication study. Journal of the American Society for Psychical Research, 65, 155-161.
HONORTON, C. (1972). Significant factors in hypnotically-induced clairvoyant dreams. Journal of the American Society for Psychical Research, 66, 86-102.
HONORTON, C. (1997). The ganzfeld novice: Four predictors of initial ESP performance. Journal of Parapsychology, 61, 143-158.
HONORTON, C., FERRARI, D. C., & BEM, D. J. (1998). Extraversion and ESP performance: A meta-analysis and a new confirmation. Journal of Parapsychology, 62, 255-276.
HONORTON, C., & KRIPPNER, S. (1969). Hypnosis and ESP performance: A review of the experimental literature. Journal of the American Society for Psychical Research, 63, 214-252.
JACKSON, M., FRANZOI, S., & SCHMEIDLER, G. R. (1977). Effects of feedback on ESP: A curious partial replication. Journal of the American Society for Psychical Research, 71, 147-176.
KENNY, D. A., KASHY, D. A., & BOLGER, N. (1998). Data analysis in social psychology. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology: Vol. 1 (4th ed., pp. 233-265). New York: Oxford University Press.
KRAUS, S. J. (1995). Attitudes and the prediction of behavior: A meta-analysis of the empirical literature. Personality and Social Psychology Bulletin, 21, 58-75.
KREIMAN, N., & IVNISKY, D. (1973). Effects of feedback on ESP responses [Abstract]. Journal of Parapsychology, 37, 369.
LAWRENCE, T. (1993). Gathering in the sheep and goats: A meta-analysis of forced-choice sheep/goat ESP studies, 1947-1993. Proceedings of Presented Papers: The Parapsychological Association 36th Annual Convention, 261-272.
LOVITTS, B. E. (1981). The sheep-goat effect turned upside down. Journal of Parapsychology, 45, 293-309.
LUBINSKI, D., & HUMPHREYS, L. G. (1990). Assessing spurious "moderator effects": Illustrated substantively with the hypothesized ("synergistic") relation between spatial and mathematical ability. Psychological Bulletin, 107, 385-393.
MACCALLUM, R. C., ZHANG, S., PREACHER, K. J., & RUCKER, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19-40.
MAXWELL, S. E., & DELANEY, H. D. (1993). Bivariate median splits and spurious statistical significance. Psychological Bulletin, 113, 181-190.
McCALLAM, E., & HONORTON, C. (1973). Effects of feedback on discrimination between correct and incorrect ESP responses: A further replication and extension. Journal of the American Society for Psychical Research, 67, 77-85.
PALMER, J. (1972). Scoring in ESP tests as a function of belief in ESP: Part II. Beyond the sheep-goat effect. Journal of the American Society for Psychical Research, 66, 1-26.
PALMER, J. (1975). Three models of psi test performance. Journal of the American Society for Psychical Research, 69, 333-339.
PALMER, J. (1997). Correlates of ESP magnitude and direction in the PRL and RRC autoganzfeld data bases. Proceedings of Presented Papers: The Parapsychological Association 40th Annual Convention, 283-298.
PANCZA, R. N. (1982). Assertiveness of suggestions and the absorption-suggestibility correlation. Unpublished doctoral dissertation, St. John's University, Jamaica, NY.
PEDHAZUR, E. J. (1982). Multiple regression in behavioral research (2nd ed.). New York: Holt, Rinehart & Winston.
POULTON, E. C. (1973). Unwanted range effects from using within-subject experimental designs. Psychological Bulletin, 80, 113-121.
SCHWARZ, N., & SUDMAN, S. (1992). Context effects in social and psychological research. New York: Springer-Verlag.
SHAMES, V. A., & BOWERS, P. G. (1992). Hypnosis and creativity. In E. Fromm & M. R. Nash (Eds.), Contemporary hypnosis research (pp. 334-363). New York: Guilford Press.
SMITH, M. D. (2001). The problem of replication and the 'psi-conducive' experimenter. Proceedings of Presented Papers: The Parapsychological Association 44th Annual Convention, 320-333.
SNYDER, M., & ICKES, W. (1985). Personality and social behavior. In G. Lindzey & E. Aronson (Eds.), Handbook of social psychology: Vol. II. Special fields and applications (3rd ed., pp. 883-947). New York: Random House.
STANFORD, R. G. (1969). "Associative activation of the unconscious" and "visualization" as methods for influencing the PK target. Journal of the American Society for Psychical Research, 63, 338-351.
STANFORD, R. G. (1970). Extrasensory effects upon "memory." Journal of the American Society for Psychical Research, 64, 161-186.
STANFORD, R. G. (1975). Response factors in extrasensory performance. Journal of Communication, 25, 153-161.
STANFORD, R. G. (1993). Learning to lure the rabbit: Charles Honorton's process-relevant ESP research. Journal of Parapsychology, 57, 129-175.
STANFORD, R. G., & FRANK, S. (1991). The prediction of ganzfeld ESP-task performance from session-based verbal indicators of psychological function: A second study. Journal of Parapsychology, 55, 349-376.
STANFORD, R. G., FRANK, S., KASS, G., & SKOLL, S. (1989). Ganzfeld as an ESP-favorable setting: Part I. Assessment of spontaneity, arousal, and internal attention state through verbal-transcript analysis. Journal of Parapsychology, 53, 1-42.
STANFORD, R. G., & PALMER, J. P. (1972). Some statistical considerations concerning process-oriented research in parapsychology. Journal of the American Society for Psychical Research, 66, 166-179.
STANFORD, R. G., & STEIN, A. (1994). A meta-analysis of ESP studies contrasting hypnosis and a comparison condition. Journal of Parapsychology, 58, 236-269.
SB-15 Marillac Hall
St. John's University
8000 Utopia Parkway
Jamaica, NY 11439, USA
I gratefully acknowledge the help of two anonymous reviewers who provided insightful criticism and suggestions for improvement of the manuscript. I also gratefully acknowledge the help of Michele Morganstern and Naomi Solomon, students in the Ph.D. program in Clinical Psychology at St. John's University, who read and provided helpful editorial suggestions on a draft of the manuscript.
|Author:||Stanford, Rex G.|
|Publication:||The Journal of Parapsychology|
|Date:||Mar 22, 2003|