
An application of structural equation modeling in public opinion research: conceptualizing public and opinions.

For some time theorists and investigators in the field of public opinion have discussed the precise meaning of the terms opinion and public, and the combination of the two into the single term public opinion. Our present contribution to this discussion starts from the point of view of the structural equation analyst. (We will later elaborate on structural equation modeling and analysis.)

With regard to the first term, the difference between attitude and opinion has been a constant topic of discussion. We summarize here the main points of this discussion, paraphrasing the excellent review by Price (1992). Although the terms opinion and attitude often tend to be used interchangeably, they connote concepts that are distinguished by many scientists. Attitude is traditionally conceptualized as a global, enduring orientation toward a general class of stimuli, whereas an opinion is seen more situationally, pertaining to a specific issue. According to Thurstone (1928), opinions are manifest indicators of unobserved attitudes. Converse (1970), however, found that the opinions of most respondents in a survey are extremely unstable. He even concluded that the political opinions measured in most surveys might just as easily reflect mental tosses of a coin, leading to wholesale skepticism about the possibility of interpreting opinion as an empirical indicator of an unobserved attitude. His research suggests that verbalized opinion ought to be taken at face value, as surface-level behaviors that do not necessarily imply any underlying attitude. Fleming (1967), however, pushed Thurstone's manifest-latent distinction an important step further. He claimed that, because pollsters chose to use the term public opinion in reference to their poll results, opinion became the commonly accepted term for an expressed pro or con position on a political issue. Opinions are thus the behavioral phenomena to be explained, whereas the term attitude is reserved for reference to the deeper, underlying motives for those behaviors.

When opinions are conceptualized as dependent variables in a structural equation model, and attitudes, in addition to other relevant variables, as explanatory variables, structural equation analysis may reveal to what extent opinions are determined by attitudes. In this article, results of a structural equation analysis are confronted with those of longitudinal research. In this way it is possible to investigate whether the two views mentioned above can be reconciled, i.e. whether it is possible that respondents' opinions are strongly dependent on underlying attitudes and nevertheless easily vary in time.

With reference to structural equation analysis, one approaches the concept of public here from an unusual point of view, that is to say starting from the question of which respondents dropped from the analysis because of non-response. The public opinion researcher is haunted through his surveys by several kinds of missing observations that can be broadly classified as (1) observations that are missing because certain (categories of) relevant potential respondents could not be reached, and (2) observations that are missing because some respondents were not able or willing to answer the corresponding questions. It is the latter class of non-response that is of concern here.

If the scope of a researcher is confined to a simple description of public opinion on a particular issue, neither the researcher nor the readers of the research report will generally be concerned about the extent to which the second type of non-response may influence the results, provided the researcher has ensured a representative sample from the relevant population and that the degree of non-response is not excessive. The missing scores may or may not be included and reported as a separate response category.

If the researcher wishes to describe the development of public opinion over a certain period, he will repeat his survey with the same variables, and roughly the same numbers of missing scores will probably occur.

Researchers with a more social-psychological interest will also wonder which influences affect the opinions of respondents and the extent to which they do. We can distinguish:

(1) factors, dependent on time or not, that are linked to the respondents (for instance: attitudes) and that cause the opinions to vary within the sampled population at a particular moment;

(2) factors beyond the respondents (for instance: developments in society) that eventually cause a person's opinion to vary in time.

The influences of the first category can be investigated by multiple regression or structural equation analysis. To calibrate influences of the second category requires repeated measurements. Theoretically, a panel design is the most suitable to investigate both types of influence simultaneously. A panel design, however, is expensive and imposes difficult logistic tasks and methodological problems on the researcher, including the question of how to avoid an accumulating non-response.

This report describes the application of a less far-reaching design: a comprehensive survey in which opinions and variables that are supposed to influence these opinions are measured. This survey entails measurements of the opinions within a series of independent samples at successive times, as well as indicators of socioeconomic developments over the same period. The substantive results of the research have already been published (Maassen and De Goede 1989, 1991) and are not of central importance here. In this text the results are merely illustrative; the emphasis is on methodological considerations. In particular, attention is directed to the way that the type of analysis (descriptive analysis of repeated cross-sections, multiple regression or structural equation analysis within one cross-section) in such a design may affect the occurrence of missing values and the composition of the sample involved. Discussion will include how the different types of analysis call for a different elaboration of the operationalization of the dependent variables. Because structural equation analysis is a more refined technique than multiple regression analysis, its extra advantages will also be demonstrated.

In the author's view, the example may be regarded as representative of a category of applications, and the discussion of the problems which can arise may be useful for researchers who are planning similar studies. The field from which the example is taken is research into public opinion on the unemployed in the Netherlands. However, before addressing the central issues of this article, a discussion of some aspects of the technical context in which this text was written and the social context in which the research arose is called for.


Many public opinion researchers will be familiar with multiple regression analysis as a technique for analysing the effects of a number of independent variables on a dependent variable. For instance, multiple regression may be used to calibrate the effects of respondents' characteristics on their opinions. Here, a summary of the main aspects of the technique is presented in order to contrast it clearly with structural equation analysis (for comprehensive texts on multiple regression see Pedhazur 1982, Weisberg 1985).

If all the independent variables are considered equivalent and simultaneously included in the analysis, the analysis is called simultaneous multiple regression. Most of such analyses aim at an optimal prediction of the dependent variable, which is usually tantamount to a maximum proportion of explained variance. In this type of analysis, independent variables are called predictors.

When the researcher finds a hierarchy discernible among the independent variables, of interest is the proportion of the variance explained by an independent variable or a set of independent variables, in addition to the proportion already explained by previously included independent variables. An analysis of this type is called stepwise multiple regression. (Simultaneous inclusion of a set of independent variables at one step within the process of stepwise multiple regression is called blockwise inclusion.)

A researcher often has a conception of the temporal order or the causal direction in which the independent variables are related to each other, in which case the researcher may formulate hypotheses that specify which independent variables are influenced by other independent variables and which variables are not. In fact, the analysis is then composed of a series of multiple regressions, and such a complex of multiple regressions is called path analysis (Pedhazur 1982, ch. 15). Because, within this context, the independent variables are obviously selected because of their explanatory power rather than on the basis of their significance as predictors, they are called explanatory variables. The influence of an explanatory variable on the dependent variable can be expressed as a direct effect (i.e. the path coefficient) or an indirect effect (i.e. via other explanatory variables; Pedhazur 1982, p. 588). The effects are calculated from the path coefficients among explanatory variables, and the path coefficients between explanatory variables and dependent variables. These coefficients, of course, are calculated from the intercorrelations of all variables included in the analysis.
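To make the distinction between direct and indirect effects concrete, the following minimal sketch works through the smallest possible recursive path model: X1 influences X2, and both influence Y. The function name, the variable names and the three correlations are hypothetical and chosen only for illustration; they are not taken from the study. The standardized path coefficients are obtained from the correlations with the usual two-predictor regression formulas.

```python
def path_effects(r12, r1y, r2y):
    """Standardized effects of X1 on Y in the model X1 -> X2 -> Y, X1 -> Y.

    r12, r1y, r2y are the (hypothetical) correlations among X1, X2 and Y.
    """
    denom = 1.0 - r12 ** 2
    p_y1 = (r1y - r2y * r12) / denom   # direct effect of X1 on Y
    p_y2 = (r2y - r1y * r12) / denom   # direct effect of X2 on Y
    p_21 = r12                         # path from X1 to X2 (simple regression)
    indirect = p_21 * p_y2             # effect of X1 on Y that runs via X2
    return {"direct": p_y1, "indirect": indirect, "total": p_y1 + indirect}


# Illustrative correlations only.
effects = path_effects(r12=0.5, r1y=0.4, r2y=0.6)
print(effects)
```

In this just-identified model the total effect of X1 equals its correlation with Y, so the sketch simply shows how that correlation is split into a direct and an indirect part.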

It is well known, particularly in the social sciences, that the measurement of variables is clouded by measurement error, which often causes the correlations and path coefficients to be considerably attenuated. With the traditional multiple regression analysis one obtains only attenuated estimations of the direct and indirect effects.
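The size of this attenuation can be illustrated with the classical correction for attenuation from test theory, in which an observed correlation is divided by the square root of the product of the two reliabilities. This is only a sketch of the principle: it is not the estimation procedure used by structural equation programs, and the observed correlation and reliabilities below are hypothetical.

```python
from math import sqrt


def disattenuate(r_observed, reliability_x, reliability_y):
    """Classical (Spearman) correction of a correlation for unreliability."""
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_observed / sqrt(reliability_x * reliability_y)


# Hypothetical values: an observed correlation of .40 between two scales
# whose reliabilities are .84 and .80.
r_corrected = disattenuate(0.40, 0.84, 0.80)
print(round(r_corrected, 3))
```

With perfectly reliable measures (both reliabilities equal to 1) the function returns the observed correlation unchanged; the lower the reliabilities, the larger the upward correction.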

Since the 1970s, structural equation analysis techniques (LISREL, EQS, AMOS) have been developed that enable the researcher to take into account the fact that most of the relevant concepts are assessed by means of fallible measurements. A structural equation model is a path model in which such concepts can be adopted as latent variables, together with a number of observed variables as indicators of the concept. Measurement models that specify which observed variables indicate a latent concept, and regression equations between latent concepts, constitute a structural equation model. These techniques enable correlations and path coefficients between the latent variables to be calculated, thus yielding estimations of the effects of the explanatory variables on the dependent variables, disattenuated for measurement errors (Joreskog and Sorbom 1989, ch. 5). Structural equation analysis has all the characteristics of path analysis, which thereby entails all the advantages compared with simple multiple regression, plus the advantage of allowing the researcher to avoid cumbersome questions as to how to proceed with the calculation of factor scores or the rotation of factor solutions. (In the following, when the term path analysis is used, structural equation analysis is implied, unless stated otherwise. Because LISREL has the longest history and, therefore, researchers are more familiar with this program, LISREL will be referred to throughout this article.)

Because a technique such as LISREL enables the researcher to adopt batteries of items to indicate latent variables, path analyses contain growing numbers of variables. Unfortunately, a larger number of variables is often accompanied by an increased number of cases with incomplete rows in the data matrix. This, in particular, causes the structural equation analyst to ask: `What is an adequate way of handling missing values?'

In 1976, Rubin specified the conditions under which the mechanism generating missing values can be ignored, introducing the following terminology (Rubin 1976, Allison 1987). When only one variable is considered, missing data are called missing at random (MAR) when the observed units are a random subsample of the sampled units. Considering the regression of a dependent variable Y on at least one independent variable X, again, the missing data are called MAR when the probability of Y response is not dependent on the Y value; it may be dependent on the X value. When the probability of Y response does not depend on the X value, the observed data are called observed at random (OAR). When the missing data are MAR and the observed data are OAR, the missing data are called missing completely at random (MCAR).
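The practical difference between these conditions can be made visible with a small simulation. The sketch below is an illustration under assumed values only: the population model (Y = 0.6 X + error), the response probabilities and the seed are all hypothetical. It compares the mean of the observed Y scores when response is unrelated to anything (MCAR) with the case where response depends on X but not on Y (MAR without OAR).

```python
import random


def simulate(n=20000, seed=1):
    """Return the observed-case mean of Y under MCAR and under MAR-not-OAR.

    The population mean of Y is 0 by construction.
    """
    rng = random.Random(seed)
    y_mcar, y_mar = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.6 * x + rng.gauss(0.0, 0.8)
        # MCAR: every unit responds with the same probability.
        if rng.random() < 0.6:
            y_mcar.append(y)
        # MAR but not OAR: response depends on X, never on Y itself.
        p_respond = 0.9 if x > 0 else 0.3
        if rng.random() < p_respond:
            y_mar.append(y)
    return sum(y_mcar) / len(y_mcar), sum(y_mar) / len(y_mar)


mean_mcar, mean_mar = simulate()
print(round(mean_mcar, 3), round(mean_mar, 3))
```

Under MCAR the observed mean stays close to the population value, whereas under MAR without OAR the respondents over-represent high X values and the unadjusted mean of Y shifts accordingly, even though the response mechanism never looks at Y.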

The researcher who relies on the most popular analysis software in the social sciences, SPSS, and leaves the strategy for handling the missing data to the computer, implicitly chooses the option called listwise deletion, or, in the terms used by Little and Rubin (1987), the complete-case method. We prefer to use the term listwise selection. Then, only those units are analyzed that possess scores on all the variables involved. In many cases the researcher will discover that his sample is considerably reduced. For that reason many researchers seem to prefer the option of pairwise deletion, or, again in the terms of Little and Rubin, the pairwise available-case method (here the term pairwise selection is used). In that case, every correlation or covariance of the matrix on which the path analysis is based is calculated over all the cases for which scores on both variables involved are present. In cases where the data are missing completely at random, both methods will yield roughly the same results. The advantage of the latter approach is that all the available data of the observed variables will be used.
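The difference between the two selection rules can be sketched in a few lines. The data matrix below is hypothetical (six respondents, three variables, None marking a missing score), and the helper names are chosen only for this illustration.

```python
from math import sqrt


def pearson(pairs):
    """Pearson correlation over a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


def listwise(rows):
    """Keep only respondents with scores on all variables."""
    return [r for r in rows if None not in r]


def pairwise(rows, i, j):
    """Keep respondents with scores on both variables i and j."""
    return [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]


# Hypothetical data matrix: one row per respondent.
rows = [
    (1, 2, 1),
    (2, None, 2),
    (3, 4, None),
    (4, 5, 5),
    (5, 7, 4),
    (None, 8, 6),
]

complete = listwise(rows)
r_listwise = pearson([(r[0], r[1]) for r in complete])
r_pairwise = pearson(pairwise(rows, 0, 1))
print(len(complete), len(pairwise(rows, 0, 1)))
```

Here listwise selection retains three respondents for every correlation, whereas pairwise selection retains four for the correlation between the first two variables. Each pairwise correlation may therefore rest on a different subsample, which is what allows the two approaches to diverge when the data are not MCAR.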

In their standard work, under the title `pragmatic methods', Little and Rubin (1987) present several so-called imputation techniques apart from the procedures that have been mentioned briefly above. These methods aim at estimating missing data on the basis of the available observations and substituting these estimates for the missing values. Thus, the number of usable units is expanded and the positive semi-definiteness of the covariance matrix is maintained (Tatsuoka 1988, p. 156). The simplest procedure is substitution by the means calculated over all the available scores, an option provided by, for example, SPSS. Another accepted approach is substitution by multiple regression estimates, an option offered by the package BMDP. All the methods mentioned so far are applicable when the missing data are MCAR.
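Mean substitution, the simplest of these procedures, can be sketched as follows. The scores are hypothetical, and the function is an illustration rather than a reproduction of the SPSS option. The sketch also shows a well-known side effect of the method: because every imputed value sits exactly at the mean, the variance of the completed variable is smaller than that of the observed scores.

```python
from statistics import mean, variance


def impute_mean(values):
    """Replace missing scores (None) by the mean of the observed scores."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


# Hypothetical item scores with two missing values.
scores = [2, 4, None, 6, None, 8]
completed = impute_mean(scores)

var_observed = variance([v for v in scores if v is not None])
var_completed = variance(completed)
print(completed, var_observed, var_completed)
```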

Little and Rubin (1987) paid most attention to more sophisticated methods that are also applicable when the data are not OAR, that is, maximum likelihood techniques to estimate means and covariances. More recent articles are more directly connected with path analysis or structural equation models. Maximum likelihood methods for estimating the population values of the model parameters, on the basis of incomplete data, were introduced by Allison (1987) and Muthen, Kaplan and Hollis (1987). These authors regard their method as particularly useful when the number of different missing value patterns is relatively small (e.g. in multi-wave studies, a type of research that falls beyond the scope of this article).

For the moment we confine ourselves to these general notions on the handling of missing scores. In later sections the applicability of these options in research like the present example is considered.


With the development of a welfare state in the Netherlands after the Second World War, a maintenance state was designed as well. In the 1970s, a welfare system that was among the most generous in the world was finally completed. A minimum wage for the employed was guaranteed by legislation. People who were unable to find work in the field of their education (because no vacancy was available) or in their profession (after losing their job if they had had one) were entitled to social benefit that was not substantially lower than the statutory minimum wage for the employed.

With the onset of the recession in the early 1970s, and the consequent increase in the rate of unemployment, the nation was faced with many jobless people on one hand and many unfilled vacancies on the other. The alleged abuse of the social security system became an issue of public concern, with many citizens holding the view that a number of unemployed people were taking improper advantage of the system: they would be able to find a job if they were more accommodating. In addition, large-scale abuse of the social security system was assumed, in the sense that many people were believed to receive social benefits by concealing that they actually had a job and were already earning their living. The issue of abuse became a constant point of discussion at private parties, in the media and in politics. By the mid-1970s, it was assumed in the literature that the public view of the unemployed was a negative view, although actually this assumption was not yet based on empirical evidence. In order to provide empirical material, in collaboration with Martijn P. M. De Goede, the present author began research in 1975 into public opinion on the unemployed and the functioning of the social security system.

At the beginning of the 1980s, the Dutch economy, like that of many other Western countries, was faced with serious problems. The slump in world trade caused the unemployment rate to rise dramatically and the number of registered vacancies to drop sharply. Since then, the rate of unemployment has never returned to the level of the 1970s. In this new climate, the social security legislation proved too expensive to maintain, and 1987 saw the beginning of a period when it was steadily dismantled. With these successive modifications of the social benefit system it is conceivable that the public view of the unemployed has also changed. The author's research into public opinion on those out of work in the Netherlands developed into a program of repeated cross-sections through the 1970s and 1980s.

Several topics were selected to cover public opinion on the unemployed. One of them was the image projected by the public onto (the stereotyping of) unemployed people. This was operationalized to make it suitable for the application of structural equation modeling (see below under `Instruments'). In this publication, the modeling of the stereotyping of the unemployed will be used as an illustration.



For the cross-sectional analyses in this article, data were used from a random sample (N = 2007) drawn from the Dutch population in 1980 by the Social and Cultural Planning Office within the framework of the longitudinal program Cultural Changes in the Netherlands (CCN). The data set is well known in the Netherlands and has already been used in many previous publications. For the repeated cross-sectional analyses, in addition to the 1980 survey (denoted CCN80), the data from surveys CCN85 (N = 1966) and CCN87 (N = 1990) were used, as well as data from surveys conducted by Maassen and De Goede in 1975 (N = 511) and 1984 (N = 844).


The following variables were employed (among many others) in the questionnaires and played a part in this report.

In the repeated cross-sections

Stereotyping of the unemployed. In each type of analysis, the same items are employed for the measurement of the image projected onto the unemployed. The elaboration of the operationalization, however, is different in each type of analysis. The items were selected as follows. In a pilot study, a list of some characteristics that the public might ascribe to the unemployed was presented to a sample of respondents. Twelve of these characteristics proved to be useful for further research: industrious; vigorous; honest; reliable; responsible; lacking spirit; competent; tendency to be lazy; power of endurance; valuable to society; of weak character; not very active. In the main study, new random samples of respondents were asked to indicate for each characteristic to which category they felt it to be more applicable: to those who work or to the unemployed. We supplied a 5-point scale ranging from (1) applies far more to working people, to (5) applies far more to the unemployed. The neutral category was (3) no difference.(1)

Principal axes factor analysis, followed by varimax rotation of the 1975 survey data, yielded a factor solution with two interpretable factors: (1) an opinion of the integrity of unemployed people, indicated by the items `honest' and `reliable', and (2) an opinion of the vitality of unemployed people, indicated by the remaining 10 items. Factor analyses of the data from surveys in 1980, 1985 and 1987 produced a similar factor structure.

It is common practice in cross-sectional studies to calculate factor scores as linear combinations of the full item scales (Nunnally 1978, p. 604). The use of such factor scores within a longitudinal design is not without its problems, however. The objections raised against interpreting the differences between the means of a variable measured on two different occasions in terms of change (Plewis 1985, ch. 2) also apply in principle to the comparison of means constituting a time series. In particular, the instability of the scale vitiates such a comparison. The scale is affected by such factors as the order of questions within the questionnaire (Maassen and De Goede 1993) and the societal salience of the object of study at the time of measurement (which may cause the respondents to express their opinion in more or less extreme terms).(2) Problems of scale are assumed to be less important, however, for categorical data (Plewis 1985, p. 28). In cohort analysis (see for example Glenn 1977), it is usual to compare percentages on dichotomies longitudinally. The dichotomy corresponds to a clear choice or option (pro or con) given to the respondent. In our case, the most significant incision of the scores on each item is the one that separates respondents who assess the unemployed negatively (respondents with scores 1 and 2 for a positively formulated item or 4 and 5 for a negatively phrased item) and those who do not (respondents with scores 3, 4 and 5 for a positively formulated item or 1, 2 and 3 for a negatively worded item). The latter category consists almost exclusively of persons who do not discriminate between those who work and the jobless (those with score 3). The number of respondents who are more positive about unemployed persons than about workers is negligible. Accordingly, very little information will be lost if we restrict ourselves to this dichotomy.
Subsequently, the scores on the two factors `image of the vitality' and `image of the integrity' are obtained by assigning, per item, the score 0 to respondents who assess the jobless negatively and the score 1 to those who do not; next, these dichotomized item scores were averaged per respondent. Thus, a higher score for these two factors indicates a more favorable image. (Cronbach's alpha for the two factors is, respectively, .84 and .80.)
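The dichotomization and averaging described above can be sketched as follows. Which of the twelve items count as negatively phrased is an assumption read off the wording of the item list, and the function names and the example respondent are hypothetical; the handling of missing item scores is deliberately left out.

```python
INTEGRITY_ITEMS = ("honest", "reliable")

# Assumption: these four items are the negatively phrased ones.
NEGATIVELY_PHRASED = {
    "lacking spirit",
    "tendency to be lazy",
    "of weak character",
    "not very active",
}


def dichotomize(item, score):
    """Return 0 for a negative assessment of the unemployed, 1 otherwise.

    Scores run from 1 (applies far more to working people) to
    5 (applies far more to the unemployed).
    """
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("score must be an integer from 1 to 5")
    if item in NEGATIVELY_PHRASED:
        return 0 if score >= 4 else 1
    return 0 if score <= 2 else 1


def factor_score(answers, items):
    """Mean of the dichotomized item scores of one respondent."""
    return sum(dichotomize(item, answers[item]) for item in items) / len(items)


# Hypothetical respondent: neutral on 'honest', negative on 'reliable'.
answers = {"honest": 3, "reliable": 2}
integrity = factor_score(answers, INTEGRITY_ITEMS)
print(integrity)
```

The resulting score is the proportion of items on which the respondent does not assess the unemployed negatively, so it ranges from 0 (negative on every item) to 1 (negative on none).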

In the structural equation analysis

Dependent variables (Stereotyping of the unemployed). The two scales constructed in the way described earlier could be included in a structural equation model. However, this procedure would fail to exploit an option provided by LISREL: the implementation of a factor analysis with oblique factors, circumventing the cumbersome choice of a suitable rotation. Moreover, with this option, greater justice can be done to the intercorrelations of the image items (see also Maassen 1991). It is therefore preferable to include all the image items in the model as indicators of the two latent variables.

Explanatory variables

Apart from the variable `age', the following observed variables were selected as explanatory variables for the structural model:

* Education, a 7-point variable that indicates the level of education of a respondent. The scores range from 1 (only primary education) to 7 (university education).

* Receiving social benefit or not, a dichotomous variable that indicates whether the respondent him/herself is in receipt of benefit for the non-employed, with the response categories 0 (no) and 1 (yes).

* Authoritarianism, a selection of seven 5-point items from Adorno's F scale (the items 26, 43, 21, 34, 23, 12 and 31, translated into Dutch; Adorno et al. 1950, p. 255). The selection was made on the basis of factor analyses by Middendorp (1979, p. 199). The scale obtained by adding the scores for the seven items is established in Dutch research literature; therefore the items were included as a scale in the analysis rather than separately.

* Socialism, a 5-point scale with which the respondent indicates the extent to which he/she considers him/herself to be a socialist, with scores ranging from 1 (extremely socialist) via 3 (moderate) to 5 (extremely anti-socialist).

* Political self-placement, where the respondent chooses a position on a 5-point scale running from 1 (extreme left) via 3 (centre) to 5 (extreme right).

* Judgement of the respondent of the quality of social security legislation for the unemployed, with scores 1 (too good), 2 (adequate) and 3 (inadequate).

* Judgement of the respondent of the quality of social security legislation for incapacitated people, with the same three response categories.

In the multiple regression and other cross-sectional analyses

Dependent variables (Stereotyping of the unemployed). The objection raised earlier with respect to the longitudinal use of commonly calculated factor scores does not pertain to a cross-sectional analysis. On the basis of reliability studies, the variances and the factor loadings of the 12 image items, the two factor scores to be used in the cross-sectional analyses (image of the vitality and image of the integrity of the unemployed) were defined as the mean scores of the 10 and the two 5-point items, respectively. Because, of course, the negatively phrased items were recoded first, a higher score for these factors indicates a more favorable image of the unemployed. (Cronbach's alpha for the two factors is, respectively, .86 and .84.)

Explanatory variables. In addition to the eight explanatory variables mentioned above, 19 other possibly relevant variables (background variables such as sex or personal characteristics like rigidity) were included in the multiple regression analyses. A detailed description of these variables is not relevant to this report.



In reporting the longitudinal changes in public opinion on the unemployed, we compared the means of the two factors calculated as the mean of dichotomized items (Maassen and De Goede 1991). Because the percentage of item nonresponse never exceeded 10 percent, no attention was paid at that time to the question of whether nonresponse might affect the results. Figure 1 shows the available means for the years 1975-87. Figure 1 also shows the unemployment rate in the Netherlands for every year throughout the same period.



In bivariate regression a number of variables, background and attitudinal variables, had already proved to be related to opinions on the unemployed. The goal was to gain more insight into the multivariate structure of the association between explanatory and dependent variables, for instance, into the extent to which the influence of a variable can be explained via other variables. The testing of a path model, which in fact may be considered as a structure of partial correlations, is a suitable tool. In this case, a structural equation model was tested with the LISREL program (Maassen and De Goede, 1989). It is not the intention here to discuss the building of this model fully (see, therefore, Maassen and De Goede 1988), but rather to make general reference insofar as this enhances the coherence of the article.

The variable `authoritarianism' was included in the model because of the bivariate correlation and for theoretical reasons: it is assumed that people with a more authoritarian attitude will think that traditional norms and values should be upheld, which obviously includes supporting oneself by working. Apart from bivariate correlations, the variable `age' was included (because older people too are more inclined to think that one should comply with traditional norms) and, for obvious reasons in this research, the variable `in receipt of benefit for the non-employed (unemployed or incapacitated) or not'.

The following categories of variables constitute the model: (1) background variables, including `age', `level of education' and `whether or not the respondent was personally in receipt of benefits', (2) personality variables represented by `authoritarianism', (3) variables relating to the respondent's `political attitude', (4) variables relating to `opinions about the functioning of social legislation', and (5) dependent variables that indicate `stereotyping of the unemployed'.

The objection may be raised that the nature of the variables included in the model is purely sociological or socio-psychological. Would it not be more justified to use the term `public opinion research' where variables relating to that field are included (e.g. information level of the respondents)? The answer is that in cross-sectional analyses the effect of such variables proves to be negligible in comparison with the variables that are included. In order to keep the structural model manageable, it was decided to exclude those variables from the analysis. The role of variables typically belonging to the domain of public opinion research is more apparent in the longitudinal analysis (Maassen and De Goede 1991).

With regard to the elements of the model, it was felt that the variables and concepts from the aforementioned categories (3) and (4) are closer to the subject under investigation (and therefore to the dependent variables) and so should be given a place in the model between the variables from categories (1) and (2) on the one hand and the dependent variables on the other hand.

In building a structural equation model, it is possible to make a distinction between latent factors and their corresponding indicators. Age, level of education and `being in receipt or not of social security payments for the non-employed' are included as perfectly measured variables; the following constructs are represented in the model by latent factors: authoritarianism, socialism, opinion about the functioning of social security for the non-employed, and the opinions of the integrity and of the vitality of unemployed people (introduced earlier). The resulting model is shown in Figure 2.


Testing the path model of Figure 2 by means of LISREL yields chi-square = 238.6, df = 141 (p < .001). We consider this to be a satisfactory solution, taking into account the sample size (863), the ratio of chi-square to df (i.e. 1.7), the fitted residuals and the Q-plot. The main results are presented in Table 1, which shows the total and direct effects (i.e. path coefficients) of the latent explanatory concepts on the latent dependent concepts. (The indirect effects can be obtained by subtracting the direct effects from the total effects.)

TABLE 1 Direct effects (path coefficients) and total effects on the latent image factors in Figure 2, yielded by a LISREL analysis of the 1980 data. T-values in parentheses

                  AGE      EDUCAT   NOWORK   AUTHOR     SOCIAL   JUGSOC

Direct effects
IMGVIT             .18     --        .09       -.65     --        .25
                  (4.04)            (2.74)   (-10.79)            (5.36)
IMGINT             .15     --        .09       -.55     --        .27
                  (3.66)            (2.72)    (-9.92)            (5.89)

Total effects
IMGVIT            -.11      .24      .12       -.72      -.06     .25
                 (-3.03)   (7.03)   (3.34)   (-11.39)   (-2.96)  (5.36)
IMGINT            -.09      .19      .12       -.62      -.06     .27
                 (-2.62)   (6.04)   (3.38)   (-10.63)   (-3.11)  (5.89)

(a) Maximum likelihood estimation performed with LISREL8. Standardized solution based on a correlation matrix. The factor matrix yielded by the analysis with zero intercorrelations of the response-set error components (Maassen 1991) is implemented as a target matrix in the final analysis. The standardized solutions of other options, e.g. the use of a covariance matrix or WLS estimation based on an asymptotic matrix, are essentially the same.
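
The subtraction that yields the indirect effects can be illustrated with the figures of Table 1. The following is a minimal Python sketch, not part of the LISREL output: the dictionary layout and the function name `indirect_effects` are illustrative assumptions, and a `None` entry stands for a path that is absent from the model (`--` in the table).

```python
# Standardized effects on the latent image factors, copied from Table 1.
# None marks a path that is not in the model ("--" in the table).
DIRECT = {
    "IMGVIT": {"AGE": 0.18, "EDUCAT": None, "NOWORK": 0.09,
               "AUTHOR": -0.65, "SOCIAL": None, "JUGSOC": 0.25},
    "IMGINT": {"AGE": 0.15, "EDUCAT": None, "NOWORK": 0.09,
               "AUTHOR": -0.55, "SOCIAL": None, "JUGSOC": 0.27},
}
TOTAL = {
    "IMGVIT": {"AGE": -0.11, "EDUCAT": 0.24, "NOWORK": 0.12,
               "AUTHOR": -0.72, "SOCIAL": -0.06, "JUGSOC": 0.25},
    "IMGINT": {"AGE": -0.09, "EDUCAT": 0.19, "NOWORK": 0.12,
               "AUTHOR": -0.62, "SOCIAL": -0.06, "JUGSOC": 0.27},
}


def indirect_effects(total, direct):
    """Return total minus direct effect per path (an absent direct path counts as 0)."""
    result = {}
    for factor, effects in total.items():
        result[factor] = {}
        for variable, total_effect in effects.items():
            direct_effect = direct[factor][variable] or 0.0
            result[factor][variable] = round(total_effect - direct_effect, 2)
    return result


INDIRECT = indirect_effects(TOTAL, DIRECT)

# Ratio of chi-square to degrees of freedom, from the fit statistics in the text.
CHI_SQUARE_RATIO = round(238.6 / 141, 1)

if __name__ == "__main__":
    for factor, effects in INDIRECT.items():
        print(factor, effects)
    print("chi-square / df =", CHI_SQUARE_RATIO)
```

For authoritarianism and IMGVIT, for instance, the subtraction is -.72 - (-.65) = -.07, so most of the effect of authoritarianism on the image of vitality is direct.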


Several multiple regressions were performed, using as explanatory variables all 27 of the variables that were considered eligible for the structural equation model, and as dependent variable the image of the vitality of the unemployed (calculated as the sum of 10 5-point items), the most important dependent scale. The 27 independent variables explain 28 percent of the variance of the image of vitality scale when the correlations are based on pairwise selection of respondents, whereas an analysis based on correlations calculated with listwise selection yields almost double that figure (52 percent). In the latter case, the number of respondents falls to 300.
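
The difference between the two selection rules can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions: the six respondents and their scores are invented toy data, and the helper names (`pearson`, `complete_cases`, `correlation`) are hypothetical. Pairwise selection keeps every respondent who has scores on the two variables being correlated; listwise selection keeps only respondents who have scores on all variables in the analysis.

```python
from math import sqrt

# Toy data, invented for illustration only; None marks a missing score.
ROWS = [
    {"author": 2.0, "imgvit": 2.9, "social": 3.0},
    {"author": 2.5, "imgvit": 2.6, "social": None},
    {"author": 3.0, "imgvit": 2.4, "social": 2.0},
    {"author": 3.5, "imgvit": 2.5, "social": None},
    {"author": 4.0, "imgvit": 1.8, "social": 1.5},
    {"author": None, "imgvit": 2.2, "social": 2.5},
]
VARIABLES = ("author", "imgvit", "social")


def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def complete_cases(rows, variables):
    """Keep only the respondents with a score on every listed variable."""
    return [row for row in rows if all(row[v] is not None for v in variables)]


def correlation(rows, a, b, selection_variables):
    """Correlate a and b over respondents complete on selection_variables."""
    kept = complete_cases(rows, selection_variables)
    r = pearson([row[a] for row in kept], [row[b] for row in kept])
    return r, len(kept)


# Pairwise selection: only the two variables involved decide who is kept.
PAIRWISE = correlation(ROWS, "author", "imgvit", ("author", "imgvit"))
# Listwise selection: all variables in the analysis decide who is kept.
LISTWISE = correlation(ROWS, "author", "imgvit", VARIABLES)

if __name__ == "__main__":
    print("pairwise: r = %.3f, n = %d" % PAIRWISE)
    print("listwise: r = %.3f, n = %d" % LISTWISE)
```

With these toy data the pairwise correlation rests on five respondents and the listwise correlation on three, which mirrors on a very small scale the loss of respondents reported above.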



Public opinion research boasts few studies in which longitudinal outcomes are confronted with the results of structural equation analysis. This study may therefore add interesting information on this matter. Consider Figure 1 and Table 1. Figure 1 shows that the opinions of the public changed rapidly and drastically. The figure reflects net changes, which means that the changes within persons are even greater. It demonstrates that these changes closely followed social and economic developments expressed in the unemployment rates, both in time and extent. Maassen and De Goede (1991) argued that in the Netherlands in times of high unemployment, more people feared that they might lose their jobs, and that this awareness resulted in a better understanding of the situation of the unemployed. In this respect, it is not possible at this point to endorse Converse's view, but rather the conclusion of Smith (1994), who, examining opinion changes, found `neither chaos nor a chimera, but rather order and a map of reality. In fact, most opinion change can be plausibly explained'.

The extent to which the factor scores vary suggests that they should be considered the reflections of opinions. The reader should recall that the factor scores are actually composed of items with which a respondent can express an anti-unemployed opinion or not. The names we used, such as `image factors' or `stereotyping', may suggest a more enduring character and are in this context probably less appropriate than `opinion of the unemployed'.

On the other hand, the structural relations analysis (Table 1) shows that a stable attitudinal variable like authoritarianism is still the most important personality characteristic in the explanation of the opinion on the unemployed. (It should be noted that none of the items of the authoritarianism scale is related to unemployment.) The combination of our longitudinal analysis and structural relations analysis shows that people's opinion may change drastically under the influence of, for example, social developments and, at the same time, be seen as an expression of an underlying attitude. Our results can be regarded as another corroboration of Thurstone's manifest-latent distinction and of Fleming's (1967) view that the term opinion can be used for an expressed position for or against a political issue, a behavioral phenomenon that a researcher may seek to explain, whereas attitude can be reserved for reference to the deeper, underlying motives for those behaviors.


Another question that bothers scholars in the field of public opinion concerns the definition of the public. What subgroup of the population is relevant when public opinion on some subject has to be determined? Price (1992) discussed several kinds of public that may be distinguished: the elite, the general public, the voting public, the attentive public. In this section, the example shows how the application of multiple regression or path analysis, once again, forces us to ask which is the relevant subgroup of the population to analyse; or, in the context of a structural equation model, for which public the model is valid.

It is well known that a sizeable proportion of survey respondents will express views on matters about which they have no information or to which they have given no thought (Bishop et al. 1980). These opinions are volatile and unreliable, in the sense that a second assessment under the same conditions can easily yield a different opinion. When such respondents' opinions appear as outliers in a regression analysis, it is common practice to discard their scores from the analysis, because such respondents could affect the general conclusions to an undesirable extent. Respondents may also express their lack of knowledge in the form of non-response. The more variables included in the analysis, the greater the level of non-response that will be elicited from a respondent. The multiple regression analyses reported earlier show that when the number of independent variables increases, listwise selection causes the number of respondents involved in the analysis to decrease dramatically. On the other hand, the dramatic increase of explained variance of the dependent variables is an important advantage. What should the researcher attach most importance to, a high proportion of explained variance, or a complete research sample?

To gain more insight into this quandary, it was necessary to investigate to what extent a very restrictive listwise selection affects the composition of the sample. Attention is confined to the image of the vitality of the unemployed, the most important image factor, and prominent background and personality variables like `age', `level of education' and `authoritarianism', which undoubtedly will play an equally important role in other research of this kind. Because the other, intermediary variables are largely specific to this study, they will receive no attention here.

Table 2 shows the intercorrelations of the image of the vitality of the unemployed (IMGVIT), age (AGE), level of education (EDUCAT) and authoritarianism (AUTHOR). The numbers of respondents on which these are based are given in brackets. The top line is always the bivariate correlation calculated over the maximum possible number of respondents (with pairwise selection). Below these are the correlations with listwise selection over the eight explanatory variables eventually selected for the model of Figure 2. The third line gives the correlations calculated with listwise selection of the respondents for IMGVIT and all 27 of the explanatory variables that were considered for selection.

Table 2 Intercorrelations of AGE, level of education (EDUCAT), authoritarianism (AUTHOR) and image of the vitality of the unemployed (IMGVIT(a)) using pairwise selection and two variants of listwise selection
         AGE            EDUCAT         AUTHOR

AGE      1.000

EDUCAT   -.300 (1999)   1.000
         -.273 (863)
         -.194 (300)

AUTHOR    .296 (1384)   -.390 (1379)   1.000
          .317 (863)    -.391 (863)
          .327 (300)    -.414 (300)

IMGVIT   -.041 (1760)    .174 (1753)   -.386 (1266)
         -.118 (863)     .237 (863)    -.421 (863)
         -.181 (300)     .333 (300)    -.526 (300)

(a) IMGVIT is represented here as the mean of the 10 5-point items.

It is evident that, although their signs remain unaffected, the correlations are rather different depending upon the group of respondents for which they are calculated. The correlation between age and education becomes weaker with fewer respondents, whereas the other correlations become stronger, especially those correlations in which the dependent variable IMGVIT is involved. The stronger correlations with IMGVIT are obviously the most important cause of the difference in explained variance between the two multiple regressions reported above.

To find out more about the change of the research sample when applying different options for listwise selection, it was necessary to determine the number of missing scores per respondent over the 28 variables named earlier, looking at the scales in terms of their constituent items. The maximum possible number of missing values is 77. The mean number of missing values over the total sample of 2,007 is 7.3. A clear difference in the mean number of missing values emerges when one looks at the seven different levels of education: F(6,1933) = 13.0 (p < .00005). The lowest level of education has the highest mean: 10.0.

With a more restrictive listwise selection over the set of variables under consideration, the respondents with a low level of education have a higher chance of disappearing from the analysis. Because level of education shows a negative relationship with authoritarianism and age, it is to be expected that we should lose a relatively authoritarian and older part of the sample. Data concerning this are given in Table 3. The first line of the table shows the mean and standard deviation for age, authoritarianism and IMGVIT calculated over all the available data. The two lower lines give the means and standard deviations with the variants of listwise selection discussed above.

Table 3 Sample data relating to AGE, authoritarianism (AUTHOR) and image of the vitality of the unemployed (IMGVIT) for all valid scores and two variants of listwise selection
         AGE                  AUTHOR                  IMGVIT
   N      M      SD      N      M      SD      N      M      SD

2007   41.1    16.2   1384  3.017   0.714   1760  2.296   0.521
 863   40.3    15.2    863  2.982   0.729    863  2.283   0.527
 300   38.7    13.9    300  2.876   0.723    300  2.290   0.547

The table shows that the mean age and mean authoritarianism score do indeed show a uniform tendency to fall as the size of the sample is reduced by listwise selection. Comparing the complete sample with the group of 300, the mean age has fallen by 0.15-0.20 standard deviation. For the respondents whose authoritarianism score is known, the fall in the mean of this variable is of approximately the same order. In terms used by Rubin, the authoritarianism data obviously are not OAR (observed at random). Note that the fall in IMGVIT is not uniform and much smaller.
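
The size of these shifts can be checked directly from Table 3. The short Python sketch below expresses the fall of each mean in standard deviation units; the function name `standardized_shift` is an illustrative assumption, and because the text does not say which standard deviation serves as the unit, the age shift is computed both with the standard deviation of the complete sample and with that of the group of 300.

```python
def standardized_shift(mean_full, mean_subsample, sd):
    """Fall of the subsample mean below the full-sample mean, in SD units."""
    return (mean_full - mean_subsample) / sd


# Means and standard deviations from Table 3
# (all valid scores versus the listwise group of 300).
AGE_SHIFT_FULL_SD = standardized_shift(41.1, 38.7, 16.2)  # unit: SD of all 2007
AGE_SHIFT_SUB_SD = standardized_shift(41.1, 38.7, 13.9)   # unit: SD of the 300
AUTHOR_SHIFT = standardized_shift(3.017, 2.876, 0.714)
IMGVIT_SHIFT = standardized_shift(2.296, 2.290, 0.521)

if __name__ == "__main__":
    print("AGE    : %.2f to %.2f SD" % (AGE_SHIFT_FULL_SD, AGE_SHIFT_SUB_SD))
    print("AUTHOR : %.2f SD" % AUTHOR_SHIFT)
    print("IMGVIT : %.2f SD" % IMGVIT_SHIFT)
```

The age shift comes to roughly 0.15 to 0.17 standard deviation and the authoritarianism shift to about 0.20, whereas the shift in IMGVIT is close to 0.01.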

When the sample size is reduced by more stringent listwise selection in the calculation of correlations, the losses in this type of study are to be found among the relatively older respondents with a low level of education and a relatively high score on the authoritarianism scale. Although in our example there is a substantial correlation between authoritarianism and the dependent variable, the mean score for the dependent variable actually falls very little. The lost respondents would thus appear to have, or to express, less coherent opinions in relation to these kinds of variables. The relatively poor data quality of this group has an attenuating effect on, for example, the correlation between authoritarianism and IMGVIT. In this context it is interesting to divide the respondents whose authoritarianism and IMGVIT scores are known according to their level of education. At the lowest educational level (primary education only), the correlation between authoritarianism and IMGVIT is -.20; for the other six levels the value varies between -.32 and -.43.

Other researchers have also reported that a relationship between authoritarianism and other correlates in the socio-cultural area is found for more highly educated respondents, whereas for the less highly educated this relationship is weaker or nonexistent (Schuman et al. 1992). Vollebergh and Raaijmakers (1991, p. 71) reported that the reliability of the authoritarianism scale can be considerably higher with more highly educated people than with a population that is more heterogeneously selected for level of education. Lower reliabilities have an attenuating effect on the intercorrelations of variables. The same authors suggested the following explanation: `More highly educated people have a higher level of political understanding and interest, which seems to be necessary to accommodate these ideologies (which have in common that they uphold intolerance in relation to certain social groups) within a coherent opinion system' (Vollebergh and Raaijmakers 1991, p. 75, translation by the author).
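
The attenuating effect of lower reliability can be illustrated with the classical attenuation formula from test theory, in which the expected observed correlation equals the correlation between the true scores multiplied by the square root of the product of the two reliabilities. The Python sketch below is a minimal illustration of that formula; the function name is an assumption, and the reliabilities and the true correlation used in the example are hypothetical values, not estimates from this study.

```python
from math import sqrt


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation under the classical attenuation formula."""
    return true_r * sqrt(reliability_x * reliability_y)


# Hypothetical values for illustration only: the same true correlation of -.50,
# measured once with a more reliable and once with a less reliable first scale.
MORE_RELIABLE = attenuated_correlation(-0.50, 0.80, 0.80)
LESS_RELIABLE = attenuated_correlation(-0.50, 0.50, 0.80)

if __name__ == "__main__":
    print("more reliable scale: r = %.2f" % MORE_RELIABLE)
    print("less reliable scale: r = %.2f" % LESS_RELIABLE)
```

In this hypothetical case the observed correlation is weaker in the group with the less reliable scale, although the underlying relationship is identical in both groups.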

Here an important advantage of path analysis (and structural equation analysis) has been revealed. Path analysis forces the researcher to select the explanatory variables carefully and to formulate hypotheses concerning the way they are related to each other. The variables to be included aim at explanation, and the resulting number of respondents in the analysis is the consequence of substantive considerations and is therefore less susceptible to chance than with a multiple regression that only aims at a high proportion of explained variance.

The question remains as to which strategy for handling the missing values provides the best insight into the relationship between the variables in the path model. First, we will review some of the options mentioned in the introduction. Here it is relevant to note that the missing data in this investigation should be considered, in the terms of Rubin (1976), as not MCAR. In that case, pairwise selection may have drawbacks that appear to be significant for our structural equation analyses. Pairwise selection may result in a correlation matrix that is not positive semi-definite. This means that some of the eigenvalues of the correlation matrix are negative (Tatsuoka 1988, p. 156). In the multivariate context eigenvalues can be interpreted as variances of optimized linear combinations of the variables; the common variance of an observed variable and an optimized linear combination, which is relevant for path analysis, may be overestimated. An actual drawback in our example is that the correlations are attenuated by the inclusion of respondents with poorer data quality. Another drawback is that different parts of the path model are related to different types of respondents. For instance, in this example, the data of practically all respondents are known for the background variables that are at the beginning of the model; for the variables in the middle of the model the data are missing mainly for those respondents with a lower level of education. It is therefore rather difficult to interpret the results clearly.
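
Whether a matrix of pairwise correlations is positive semi-definite can be checked before it is used as input. The Python sketch below is a minimal illustration for the case of three variables only, where (given that each correlation lies between -1 and 1) the matrix is positive semi-definite exactly when its determinant is not negative; a negative determinant implies one negative eigenvalue. The function names and the second, deliberately inconsistent set of correlations are assumptions for illustration. For larger matrices the eigenvalues themselves would have to be computed with a numerical library.

```python
def corr_determinant(r12, r13, r23):
    """Determinant of the 3x3 correlation matrix with these off-diagonal entries."""
    return 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2


def is_positive_semidefinite(r12, r13, r23, tol=1e-12):
    """For three variables with |r| <= 1: PSD if and only if determinant >= 0."""
    return corr_determinant(r12, r13, r23) >= -tol


# Pairwise correlations of AGE, EDUCAT and AUTHOR (top lines of Table 2):
# r(AGE, EDUCAT) = -.300, r(AGE, AUTHOR) = .296, r(EDUCAT, AUTHOR) = -.390.
TABLE2_OK = is_positive_semidefinite(-0.300, 0.296, -0.390)

# Hypothetical correlations that could not all come from one group of
# respondents: x correlates strongly with y and z, yet y and z correlate
# strongly negatively with each other.
INCONSISTENT_OK = is_positive_semidefinite(0.9, 0.9, -0.9)

if __name__ == "__main__":
    print("Table 2 (pairwise) positive semi-definite:", TABLE2_OK)
    print("Inconsistent example positive semi-definite:", INCONSISTENT_OK)
```

The three pairwise correlations from Table 2 pass this check; the hypothetical set does not, which is the kind of outcome that can occur when each correlation is calculated over a different group of respondents.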

The maximum likelihood estimation methods mentioned in the introduction may be applied in the type of analysis dealt with in this paper. The handling of large numbers of missing value patterns, which can easily occur in structural equation analysis, remains a problem to be solved. Another characteristic of these methods is that, like other imputation and maximum likelihood techniques, they aim at estimating parameter values that are valid for the whole population sampled. This looks natural as a general strategy. Because practical research is the example in this article, it is possible to go more deeply into substantive arguments than the more general literature on the problem of missing values usually does. However, in the category of cases discussed in this article, it is unwise to follow this strategy. Application of those techniques implies attribution of scores to respondents who apparently are not able or do not wish to express such a score. Estimates of intercorrelations of important variables may be considerably lower than the intercorrelations found in a sample of respondents with a coherent attitude and opinion system.

We think that the concept of coherent attitude and opinion system is relevant to path models, in which attitudes and opinions play leading parts. A researcher who builds such a model starts from the idea that the opinion systems of people are structured and that the existing associations between the opinions involved can be modeled in the format of a path model. The researcher will, of course, be aware of all the simplifications inherent in thinking in terms of models. The researcher assumes that a population of people with a structured opinion system exists, for whom the concepts in the model are meaningful, and among whom these concepts can be adequately measured with the variables selected. This population may be circumscribed as having a coherent attitude and opinion system and the researcher has this population in mind when testing the model and estimating its parameter values.

We note that the model of this example is not confined to attitudes and opinions, because background variables are also included. However, these variables hardly affect the delimitation of the research sample, because for nearly all respondents the scores on these variables are available. (For education level the number of missing scores is the highest: only 8.)

There are different ways in which respondents can reveal the lack of a coherent attitude and opinion system. If a question has no meaning to them, they can omit the question or tick the don't-know category, which both result in non-response. They can choose a random response category, choose whatever response category appears first to them, or systematically choose the same response category as before. If the researcher has taken appropriate measures against the occurrence of response set, these choices result in low intercorrelations of the variables in the model. That these symptoms occur to a higher degree among less educated respondents is not surprising. Respondents with higher education are more thoughtful about societal issues, which makes them more able to understand and adequately respond to questions on these issues. Among the higher educated, less non-response and higher correlations between the variables in the model will emerge, which can be taken as evidence for a coherent attitude and opinion system.

A missing score is, of course, no proof of the lack of such a system, and, conversely, a complete response set does not prove its presence. However, to recover the cause of every missing or inconsistent score of any respondent is simply impossible. Nevertheless, a substantial increase of the explained variance in a subsample that is delimited by listwise selection on the basis of the variables in the model should not be ignored. The concept of a coherent attitude and opinion system may be used as an admittedly operational circumscription of the population for which the hypothesis is tested. Thus, it can even be taken as an aspect of the model itself.

An analogy may be discerned with the concepts of ideological coherence and ideological constraint, introduced by Converse (1964) and used by others (Lerner et al. 1991). These terms are defined as the extent to which a person's opinion on a particular issue can be predicted, if his views on other issues are known. Converse demonstrated that the ideological constraint of elites in society is usually higher than that of the general public because elites in general and political elites in particular think more about politics and political matters than does the mass public, presumably because they are more able and interested in doing so. An exception mentioned by Converse himself, which is of importance in this context, is where issues concern `visible social groupings'. Converse found that they can generate among the general public an equal or even higher level of constraint than that of elites. He argued that visible social groupings are sometimes central to the belief systems of respondents, because they can see how they themselves would be directly hurt or helped by the group in question (Lerner et al. 1991).

The subject of our example is opinions on just one social issue, whereas ideological constraint implies a complex of views on various points of social interests. But that is only a difference in degree. A structural equation model may be expanded in order to cover such a complex. This subsample with a coherent attitude and opinion system, however, should not be compared with the elite. On the contrary, the subject of this example is opinions on the unemployed, a more or less visible social group. Certainly, at the end of the 1970s, when a large-scale abuse and improper use of the social security system was assumed, and many employed people complained about the high charges they had to pay for bearing the cost of the rather generous benefits of the unemployed, it is possible that among the general public a higher degree of constraint arose than among the political elite. However, on the basis of the similar argument that led Converse to assume a higher constraint among the elite, it seems logical to suppose that among the general public a subgroup emerges that shows a higher degree of completeness of its scores and a higher degree of consistency of its opinions.

As a strategy for dealing with missing values in path models for attitudes and opinions, we would therefore recommend the use of correlations calculated on the basis of a listwise selection of respondents within a random sample. This is the most natural option, provided that one accepts that the results do not relate to a general population, but are generalized from a random sample of respondents with a coherent opinion system, as operationalised in the chosen path model. The disadvantage, of course, is that this operational definition depends on the nature and number of the variables that are included in the model.

The researcher should be extremely careful when applying this definition in any concrete case. The choice of variables to be included and the correlation matrix to be input should be determined with a number of considerations in mind, including the following:

1. All relevant variables must be included in the model, in order to ensure that unbiased estimations for the parameters are obtained (Pedhazur 1982, ch. 8).

2. All irrelevant variables must be omitted. They do not bias the estimates of the parameters, but they unnecessarily increase the number of missing values.

3. It is preferable to choose, as the observed variables, items that will induce few missing scores. The researcher should not overtax the opinion systems of the respondents and bear in mind that one's own system may be more sophisticated than that of any respondent.

4. If a respondent has answered some, but not all, items of a list that are indicators of a latent variable, it may be possible to estimate the score on the latent variable, using a total score based on the known item responses. Consider authoritarianism as an example. From Tables 2 and 3 it can be seen that authoritarianism itself accounts for many of the gaps in the data. Further analyses reveal that only very few respondents answered no items at all on the scale. In fact, 92.5 percent of all respondents filled in at least four of the seven items. If a total score for authoritarianism is estimated for these people based on the four or more known item scores, the path analysis of Figure 2 can be based on 1,056 (instead of 863) respondents, a considerable improvement.
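
The estimation of a total score from an incomplete set of items, as described in the fourth consideration, can be sketched as follows. The text does not state how the total score was estimated, so this minimal Python sketch assumes simple proration: the mean of the answered items is multiplied by the number of items in the scale, and no score is estimated when fewer than four of the seven items are answered. The function name and the item scores in the example are hypothetical.

```python
def prorated_total(item_scores, min_answered=4):
    """Estimate a scale total from the answered items (None marks a missing item).

    Returns None when fewer than min_answered items are known; otherwise the
    mean of the answered items scaled up to the full number of items.
    """
    answered = [score for score in item_scores if score is not None]
    if len(answered) < min_answered:
        return None
    return sum(answered) / len(answered) * len(item_scores)


# Hypothetical respondents on a seven-item scale.
FOUR_ANSWERED = prorated_total([4, None, 3, 5, None, 2, None])
THREE_ANSWERED = prorated_total([None, None, None, None, 1, 2, 3])

if __name__ == "__main__":
    print("four items answered :", FOUR_ANSWERED)
    print("three items answered:", THREE_ANSWERED)
```

Proration treats the unanswered items as if they would have been answered like the answered ones, so it is only one of several possible choices and rests on the assumption that the items are roughly interchangeable indicators of the same latent variable.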

The argument that led to the strategy advocated in this article leans heavily on the use of the concept of authoritarianism. However, it appears that the reasoning and the chosen approach are valid for many other areas of research where path models to account for opinions or attitudes are employed. The concept of a coherent attitude and opinion system undoubtedly has a broader field of application than has been illustrated here.

Listwise selection based on a coherent opinion system obviously no longer aims at estimating parameter values in the whole population. In fact, the validity of the structural model is confined to a more or less specified substratum of the general public. It may be argued that the application of structural analyses calls for a new item in the listing of `publics' by Price (1992): people with a coherent opinion system on the issue involved. If one is interested in the parameter values that concern the `general public', the maximum likelihood estimation methods can yield valuable results in the type of studies discussed in this article, provided that the problems inherent in these methods are solved (for instance, how to handle large numbers of missing value patterns). In our view, however, where path models for opinions and attitudes are concerned, the researcher will distort his mapping of social reality if he relies on those methods alone.

(*) This article is a revised version of a paper presented on WAPOR Day, September 22, 1995, The Hague, The Netherlands.

(1) Question: `A number of characteristics are presented below. Please indicate for each characteristic whether you think it is more applicable to people who have a job, or to people who receive unemployment benefit (unemployed people). For instance, if you think the characteristic applies much more to employed people, please tick the extreme left column. If you think the characteristic applies much more to the unemployed, please tick the extreme right field.'

(2) In the 1987 data we have seen two contrary tendencies: in line with our hypothesis, in 1987 more people than previously chose response categories indicating an unfavorable image of the unemployed, but fewer respondents than before appeared willing to give the most extreme response in that direction.


Adorno, T. W. E., Frenkel-Brunsik, E., Levinson, D. J. and Nevitt Sanford, R. (1950): The Authoritarian Personality, New York, Harper & Brothers.

Allison, P. D. (1987): `Estimation of linear models with incomplete data'. In C. C. Clogg (ed.) Sociological Methodology 1987, San Francisco, Jossey-Bass Publishers, pp. 73-103.

Bishop, G. F., Oldendick, R. W., Tuchfarber, A. J. and Bennett, S. E. (1980): `Pseudo-opinions on public affairs', Public Opinion Quarterly, 44, 198-209.

Converse, P. E. (1964): `The nature of belief systems in mass publics'. In D. Apter (ed.) Ideology and Discontent, New York, Free Press, pp. 206-61.

Converse, P. E. (1970): `Attitudes and non-attitudes. Continuation of a dialogue'. In E. R. Tufte (ed.) The Quantitative Analysis of Social Problems, Reading, MA, Addison-Wesley, pp. 206-61.

Fleming, D. (1967): `Attitude: The history of a concept', Perspectives in American History, 1, 287-367.

Glenn, N. D. (1977): Cohort Analysis. Sage University Paper series on Quantitative Application in the Social Sciences, 07-005. Beverly Hills and London, Sage.

Joreskog, K. G. and Sorbom, D. (1989): LISREL 7: A Guide to the Program and Applications, Chicago, SPSS Inc.

Lerner, R., Nagai, A. K. and Rothman, S. (1991): `Elite v. mass opinion: another look at a classic relationship', International Journal of Public Opinion Research, 3, 1-31.

Little, R. J. A. and Rubin, D. B. (1987): Statistical Analysis with Missing Data, New York: Wiley.

Maassen, G. H. (1991): `The use of positively and negatively phrased items and the fit of a factor solution', Quality and Quantity, 25, 91-101.

Maassen, G. H. and De Goede, M. P. M. (1988): Publieke opinie over werklozen en arbeidsongeschikten; een trendstudie (Public opinion on the unemployed and the unemployable: a trend study), Utrecht, University of Utrecht (dissertation).

Maassen, G. H. and De Goede, M. P. M. (1989): `Public opinion about unemployed people in the period 1975-1985: The case of the Netherlands', The Netherlands' Journal of Social Sciences, 25, 97-113.

Maassen, G. H. and De Goede, M. P. M. (1991): `Changes in public opinion on the unemployed: The case of the Netherlands', International Journal of Public Opinion Research, 3, 182-94.

Maassen, G. H. and De Goede, M. P. M. (1993): `Stereotype measurement and comparison between categories of people', International Journal of Public Opinion Research, 5, 278-84.

Middendorp, C. P. (1979): Ontzuiling, politisering en restauratie in Nederland; progressiviteit en conservatisme in de jaren 60 en 70 (The breakdown of traditional barriers, politicisation and restoration in the Netherlands: progressiveness and conservatism in the Sixties and Seventies), Meppel, Boom.

Muthen, B., Kaplan, K. and Hollis, M. (1987): `On structural equation modeling with data that are not missing completely at random', Psychometrika, 52, 431-62.

Nunnally, J. C. (1978): Psychometric Theory, New York, McGraw-Hill.

Pedhazur, E. J. (1982): Multiple Regression in Behavioral Research, New York, Holt, Rinehart and Winston.

Plewis, I. (1985): Analysing Change, Chichester, Wiley.

Price, V. (1992): Communication Concepts 4: Public Opinion, Newbury Park, CA, Sage.

Rubin, D. B. (1976): `Inference and missing data', Biometrika, 63, 581-92.

Schuman, H., Bobo, L. and Krysan, M. (1992): `Authoritarianism in the general population: The education interaction hypothesis', Social Psychology Quarterly, 55, 379-87.

Smith, T. W. (1994): `Is there real opinion change?', International Journal of Public Opinion Research, 6, 187-203.

Tatsuoka, M. M. (1988): Multivariate Analysis; Techniques for Educational and Psychological Research, New York, Macmillan.

Thurstone, L. L. (1928): `Attitudes can be measured', American Journal of Sociology, 33, 539-54.

Vollebergh, W. and Raaijmakers, Q. (1991): `Intergenerationele overdracht van autoritarisme' (Intergenerational transfer of authoritarianism). In P. Scheepers and R. Eisinga (eds.) Onderdanig en intolerant: lacunes en controverses in autoritarisme-studies (Submissive and intolerant: lacunae and controversies in authoritarianism research), Nijmegen, Institute for Applied Social Sciences, pp. 61-77.

Weisberg, S. (1985): Applied Linear Regression, New York: Wiley.


Gerard Maassen is affiliated with the Department of Methodology and Statistics of the Faculty of Social Sciences, Utrecht University. He is first author of a dissertation `Public Opinion on the Unemployed and Incapacitated'. Besides substantive papers on this subject, he published several papers on methodological issues within (and outside) this field of research.

Correspondence should be addressed to Dr. G. H. Maassen, Utrecht University, Faculty of Social Sciences, Department of Methodology and Statistics, Post Box 80140, 3508 TC Utrecht, The Netherlands. E-mail:
COPYRIGHT 1997 Oxford University Press

Article Details
Author:Maassen, Gerard H.
Publication:International Journal of Public Opinion Research
Date:Jun 22, 1997