Rating the rating: an analysis of the National Research Council's appraisal of political science Ph.D. programs.
Employing data assembled in the NRC report, we examine whether the political science program ratings reflect two general sets of characteristics - the size and the productivity of faculty.
All other things equal, program quality should vary directly with faculty size, and indeed size is emphasized as a key explanatory factor in the NRC report. The logic is straightforward. Very small departments lack the means to field a range of graduate courses of sufficient breadth to form the basis of a serious program. Larger departments enjoy the increased resources that allow for greater program depth and breadth. In addition, holding quality concerns constant, larger programs should on average receive higher ratings simply because they include more faculty.
By the same token, programs are more likely to have an impact on the discipline as a whole if they have produced a larger number of Ph.D.'s. It is, of course, possible that bigger programs are more likely on average to receive higher ratings because they have generated more alumni with personal attachments to (and perhaps even material interests in) their alma maters, alumni who are themselves participating in the rating of graduate programs. Even so, the number of Ph.D.'s produced is relevant to the ratings simply because it bears on program effectiveness, one of the quantities that the ratings seek to reflect.
The other principal factor that should affect program quality is the scholarly reputation of program faculty. In considering these reputations, we need to distinguish those based on local criteria from those reflecting cosmopolitan criteria, to use the terminology of Lazarsfeld and Thielens (1958). While some faculty members may accrue a degree of prestige at their given campuses for teaching, service, and other locally valuable contributions, the reputations of interest to the NRC ratings hinge on the visibility of program faculty in their discipline as a whole. This external recognition may stem in part from service contributions to the discipline, but it should most fundamentally reflect scholarship. The question, then, is this: to what extent does the rating of a particular graduate program reflect the scholarly standing of the various faculty members who, collectively, constitute that program?
Data that bear on these issues are reported in Appendix Table M-5 of the NRC report, which, along with the overall program quality ratings, lists a number of additional attributes for 98 political science doctoral programs.(1)
Our principal dependent variable from the NRC analysis is based on survey responses from a mail questionnaire sent to more than 100 faculty raters in the spring of 1993. Respondents were asked to rate the "scholarly quality of program faculty" on the following scale:
6. Not sufficient for doctoral education
7. Don't know well enough to evaluate
Thus, program quality is a reputational measure, reflecting the evaluations of faculty informants. Reported program ratings are the "trimmed" mean values obtained by dropping the two highest and two lowest scores on the ratings of each program before computing the mean. Note that scores are reversed in these ratings, so that "Distinguished" is scored 5, and "Not sufficient for doctoral education" is scored 0 (for further details, see chapter 2 of the NRC report).
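The reversal and trimming just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample responses are hypothetical, raw questionnaire codes are assumed to run from 1 (best) to 6 (worst), and "Don't know" responses (code 7) are assumed to be set aside before averaging. The NRC report remains the authority on the exact procedure.

```python
def reverse_score(raw):
    """Map a raw questionnaire code (1 = best ... 6 = worst) onto the 5..0 scale."""
    return 6 - raw


def trimmed_mean(scores, trim=2):
    """Mean after dropping the `trim` highest and `trim` lowest scores."""
    if len(scores) <= 2 * trim:
        raise ValueError("not enough scores to trim")
    kept = sorted(scores)[trim:len(scores) - trim]
    return sum(kept) / len(kept)


# Hypothetical raw responses for one program; code 7 ("Don't know") is dropped.
raw_responses = [1, 2, 2, 3, 2, 7, 1, 6, 2, 3]
rated = [reverse_score(r) for r in raw_responses if r != 7]
program_rating = trimmed_mean(rated)
```

With these made-up responses, the single very low evaluation is among the scores discarded, which is the point of trimming: one or two extreme raters cannot move a program's reported rating very far.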
The first row of Table 1 displays the summary statistics for this measure. Across the 98 political science graduate programs, scores range from .33 to 4.88 (out of a total possible range of 0 to 5), with a mean of 2.66 and a standard deviation of .98. Additionally, various tests indicate that this dependent variable is not heavily skewed: for example, the median of 2.5 is close to the mean value displayed in Table 1.
Measurement of program size here is straightforward, and we concentrate on the two attributes addressed above. First, we consider size as reflected in the total number of faculty in the 1992-93 year. As shown in Table 1, faculty size ranges from 7 to 57, with a mean of 24.13. Second, we include size as revealed in the number of Ph.D.'s produced in the years from 1987 to 1992. This measure ranges from 1 to 111, with a mean of 26.18. While figures are available for the number of students enrolled in the program, focusing on the number of people graduating serves to control for possible differences in attrition rates, which obviously are related to program effectiveness.
We examine two broad elements of faculty productivity. The first of these centers on the total volume of faculty scholarship, while the second reflects the impact of that scholarship as gauged by the rate at which it is cited. These two factors are, of course, commonly taken as indicators, respectively, of the quantity and quality of scholarship (see, e.g., Cole and Cole 1973, pp. 93-107; Allison, Long, and McGinnis 1993). In this setting, we anticipate quantity and quality to be positively correlated: any quality evaluations necessarily presuppose that there is some volume of scholarship to be evaluated. Nonetheless, quantity and quality are analytically and empirically distinct, since published research varies considerably in its impact.
TABLE 1 Summary Statistics for Political Science Graduate Program Characteristics
Variable                       N        Mean     SD       Min.    Max.
Program rating                 98       2.66     .98      .33     4.88
Faculty size                   98       24.13    10.08    7       57
# Ph.D.'s produced             98       26.18    20.62    4       111
Publications/faculty member    98       1.46     .72      .1      3.7
Gini publications              98       16.18    14.70    4.2     100
Citations/faculty member       98       2.50     2.90     0       13.7
Gini citations                 96(*)    32.58    20.79    9       100
% Full professors              98       51.47    12.78    14      82
Program effectiveness          98       2.52     .89      .28     4.31
US News rating                 90       2.87     .86      1.34    .8
* Two cases are undefinable on this variable. These are cases for which there are 0 citations/faculty member, a score from which no meaningful Gini coefficient can be calculated.
Data source: National Research Council (1995, Appendix Table M-5); Webster and Massey (1992).
In the analyses below, we use the NRC data on the volume and impact of research. The data themselves do not refer to individual faculty members, but are instead aggregated to the departmental level in two different ways. First, the NRC provides simple overall averages by department, that is, the publication and citation rates per faculty member. Additionally, the NRC supplies data on the distribution of productivity across individuals within departments. This information comes in the form of Gini coefficients that reflect variations across programs in the degree to which overall productivity for individual programs stems from the activity of a minority of faculty members within them. The higher the Gini coefficients, the more concentrated is productivity within programs (on Gini, see, e.g., Atkinson 1983).
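For readers unfamiliar with the Gini coefficient, the following Python sketch shows one standard way of computing it, based on the mean absolute difference between all pairs of faculty members. It is an assumption that this matches the NRC's calculation, which is not described here; the function name and the sample values are hypothetical. The result is scaled to run from 0 to 100, as in Table 1, and is left undefined when a program has no output at all.

```python
def gini_percent(values):
    """Gini coefficient on a 0-100 scale, via the mean absolute difference.

    Returns None when the total is zero, since the coefficient is
    undefined if no one in the program has any recorded output.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return None
    # Sum of |x_i - x_j| over all ordered pairs of faculty members.
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    # G = sum|x_i - x_j| / (2 * n^2 * mean), and n^2 * mean = n * total.
    return 100.0 * abs_diff_sum / (2 * n * total)


even = gini_percent([3, 3, 3, 3])          # everyone equally productive
concentrated = gini_percent([0, 0, 0, 4])  # one person produces everything
undefined = gini_percent([0, 0])           # no output: coefficient undefined
```

A department in which everyone is equally productive scores 0, while one in which a single member accounts for all output scores close to the maximum.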
We use four measures of faculty productivity (summary statistics for which are also displayed in Table 1): publications per faculty member, citations per faculty member, a Gini measure of the concentration of publications within programs, and a Gini measure of the concentration of citations within programs. We anticipate that the first two of these have a positive effect on program ratings. In contrast, the last two should have a negative impact on those ratings, since higher Gini coefficients indicate that a smaller minority of program faculty are responsible for a disproportionate share of total program productivity. All four of these measures of productivity refer to aggregated data for the years from 1988 to 1992, inclusive.
Finally, we consider the professional age of the faculty. Programs composed disproportionately of younger faculty should be less highly rated, all other things equal, simply because more program members have had less time to accumulate a research record that is externally recognized. In the language of demography, programs with a higher proportion of younger members have been "at risk" of accumulating a visible research record for a shorter time. We therefore control for the proportion of members in each program who were full professors in 1992-93 (the NRC report does not include more direct information on the professional age of faculty, such as years since the Ph.D.). As shown in Table 1, this variable ranges from 14% to 82%, with a mean of 51.47%.
[TABULAR DATA FOR TABLE 2 OMITTED]
Observe that the above measured program characteristics all use information that predates the survey of faculty (conducted in the spring of 1993). Thus, these program characteristics reflect information available to the raters at the time that they made their evaluations. Indeed, the questionnaire itself directly incorporated selected features of each program being rated, including the number of Ph.D.'s awarded and a list of faculty involved in doctoral training, provided by the universities under review (NRC 1995, 21). Similar details about faculty productivity were not made directly available to the raters, but the NRC figures we use refer to the five years preceding the survey. It is therefore reasonable to cast the NRC ratings as a function of the above program characteristics.
Table 2 shows the simple correlations between all the variables in the analysis. From the first column, we see that all seven independent variables are associated with program quality. Further, the signs are in the expected direction: five of the correlations are positive, while those between each of the two Gini measures of productivity and program quality are negative. Also of interest are the correlations between the explanatory variables. First, the correlation between the two indicators of program size is .61, indicating that while faculty size varies with the number of Ph.D.'s produced, these two indicators are far from interchangeable. Second, productivity rates and citation rates are linked: the correlation between mean productivity and mean citation rates is .74, and that between the Gini coefficients for productivity and citation rates is .65. These associations reflect the obvious fact, noted above, that some volume of productivity is a prerequisite for scholarly impact. At the same time, collinearity is not of a magnitude to preclude an evaluation of the effects of each component on program quality.
The first column of Table 3 reports the estimates from an OLS regression of program quality on the seven explanatory variables. First, we see that size does count. The estimates for faculty size and the number of Ph.D.'s produced are both positive, and each has a t-ratio (i.e., the ratio of the parameter estimate to its standard error) greater than 2.0, showing that the coefficient is statistically significant beyond the .05 level (two-tailed test).
Second, it is equally clear from the first column of Table 3 that faculty productivity also matters. The patterns here are noteworthy. Specifically, the coefficient estimate for average faculty publication rates is smaller than its standard error, while that for average citation rates is positive with a t-ratio of over 4.0, indicating statistical significance well beyond the .001 level. If, following convention, we take publication and citation rates to reflect the quantity and quality of scholarship, respectively, this establishes that overall political science program quality rankings are highly sensitive to quality, while they are completely insensitive to sheer quantity. The estimates for the two Gini measures of productivity follow a similar pattern. In particular, they demonstrate that the distribution of sheer research productivity has no effect, but that the distribution of citations within departments does systematically affect overall departmental rankings. With program size and average productivity controlled, then, departmental rankings are enhanced when research impact is distributed more evenly across individual faculty members within programs. Finally, we see that programs with professionally older faculty receive higher ratings, when the other explanatory variables are controlled.
[TABULAR DATA FOR TABLE 3 OMITTED]
In light of the estimates in the first column of Table 3, the second column shows the OLS estimates obtained with the two variables measuring the quantity of publications per faculty member excluded. Note, first, that when compared across the two columns, all of the remaining coefficient estimates and their standard errors are very similar. The most notable change occurs for the citations per faculty member estimate: the estimated coefficient increases slightly (from .115 to .121), while its standard error drops more substantially by about one-third (from .028 to .018). As a result, the t-ratio for this estimate increases from 4.2 to 6.7. This latter pattern, of course, reflects the moderate correlations between the measures of quantity and quality noted in Table 2.
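The arithmetic behind this comparison is easy to reproduce. The short Python sketch below recomputes the t-ratios from the rounded coefficients and standard errors quoted in the text; the function name is ours, and because the inputs are rounded to three decimals, the results can differ slightly from the published t-ratios, which were presumably computed from unrounded values.

```python
def t_ratio(estimate, std_error):
    """Ratio of a coefficient estimate to its standard error."""
    return estimate / std_error


# Citations per faculty member, using the rounded figures quoted in the text.
t_col1 = t_ratio(0.115, 0.028)       # column (1): roughly 4.1
t_col2 = t_ratio(0.121, 0.018)       # column (2): roughly 6.7
se_drop = (0.028 - 0.018) / 0.028    # proportional fall in the standard error
```

The coefficient itself barely moves; nearly all of the gain in the t-ratio comes from the smaller standard error, which is what one would expect when correlated regressors are dropped.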
How robust are these estimates? Are they distorted by the presence of one or two programs with extreme values on some of the explanatory variables? We could, for example, imagine programs with an atypically large number of Ph.D.'s, or others with unusually high values on the Gini-based variables. Indeed, we see from the summary statistics in Table 1 that some of the explanatory variables do have moderately skewed distributions. Are the results just discussed simply an artifact of the inclusion of such potentially influential cases in the analysis? To address this issue we examined the estimates obtained from a robust estimation procedure. This is an iterative method that downweights potentially influential cases that could be driving the estimates (see, e.g., Berk 1990). The estimates obtained were so similar to the OLS figures in columns (1) and (2) that we do not display them in the interests of brevity. Given this close similarity, we conclude that the OLS estimates offer a sensible representation of patterns in the data.(2)
Another way to address the issue of robustness is to recast the dependent variable a little. Assume for the moment that the distinctions among the very top programs are of little consequence. Make the same assumption about differences among the weakest programs. To what extent are such potentially inconsequential differences in the scores for these two sets of programs responsible for the patterns just described? We address this question by censoring the NRC ratings and re-estimating the model. Specifically, we left-censor the NRC scores so that no distinctions are made among the 23 programs in the model with ratings of 2.0 or less, and we right-censor the scores so that no distinctions are made among the top five programs (i.e., those with ratings of 4.5 or above). Tobit estimates of the model employing this censoring are shown in columns (3) and (4) of Table 3.(3)
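To make the censoring concrete, the Python sketch below clamps ratings at the two limits given in the text and writes out the log-likelihood that a two-limit Tobit model maximizes. It is a minimal illustration rather than the estimation routine actually used for Table 3: the function names are hypothetical, the errors are assumed to be normal with constant variance, and no optimizer is included.

```python
import math

LOWER, UPPER = 2.0, 4.5  # censoring limits taken from the text


def censor(rating, lower=LOWER, upper=UPPER):
    """Clamp a rating so that scores beyond the limits are indistinguishable."""
    return min(max(rating, lower), upper)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def tobit_loglik(y, fitted, sigma, lower=LOWER, upper=UPPER):
    """Log-likelihood of a two-limit Tobit model.

    y      : censored ratings
    fitted : linear predictions x'b for the latent (uncensored) rating
    sigma  : standard deviation of the latent error term
    """
    total = 0.0
    for yi, mu in zip(y, fitted):
        if yi <= lower:    # left-censored: P(latent rating <= lower)
            total += math.log(normal_cdf((lower - mu) / sigma))
        elif yi >= upper:  # right-censored: P(latent rating >= upper)
            total += math.log(1.0 - normal_cdf((upper - mu) / sigma))
        else:              # uncensored: normal density of the residual
            total += normal_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total


# The lowest and highest ratings in Table 1, plus two illustrative values.
censored = [censor(r) for r in [0.33, 1.8, 2.66, 4.88]]
```

Censored cases contribute only the probability of falling at or beyond a limit, so differences among the very top or very bottom programs cannot influence the estimates.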
The most remarkable feature of the Tobit estimates is their close correspondence to the OLS estimates. Comparing the third and fourth columns, we see again that the sheer volume of publications does not directly affect the ratings. Comparing the second and fourth columns, it is clear that the Tobit estimates correspond closely to the OLS figures. Thus, the model is not driven excessively by differences in the scores either within the top five programs or within the (approximately) bottom quartile of programs.
Some readers may wish to get a sense of the relative importance of the variables in Table 3. Beyond the proposition that the mean quantity of faculty research has no direct effect on program rating, any such evaluation is always surrounded by a degree of ambiguity. Among other things, it does not allow for an estimate of the indirect effects of variables.(4) Bearing this in mind, the highest t-ratios in Table 3, column (2) are for citations per faculty member and faculty size (6.7 and 5.4, respectively), while the smallest is for the total number of Ph.D.'s (about 2.6). A similar rank ordering of effects is implied by the standardized coefficients reported in brackets in columns (1) and (2). Minimally, then, it would be a mistake to conclude that the productivity effects we have addressed are dwarfed by the effects of size. Given the single-equation format adopted here, the effects of both sets of variables would appear to be of roughly similar magnitude.
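A standardized coefficient rescales an unstandardized estimate by the ratio of the standard deviations of the explanatory and dependent variables. The Python sketch below applies this to citations per faculty member, using the column (2) coefficient quoted earlier and the standard deviations in Table 1. The function name is ours, and the result only approximates the bracketed figure in Table 3, because the Table 1 standard deviations are based on all 98 programs rather than on the estimation sample.

```python
def standardized_coefficient(b, sd_x, sd_y):
    """Convert an unstandardized slope into standard-deviation units."""
    return b * sd_x / sd_y


# b = .121 (Table 3, column 2); SDs of citations/faculty member and of the
# program rating are taken from Table 1.
beta_citations = standardized_coefficient(0.121, 2.90, 0.98)
```

On these figures, a one standard deviation increase in citations per faculty member is associated with an increase of roughly a third of a standard deviation in the program rating.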
We have to this point focused on the principal dependent variable in the NRC study, namely, the measure of the scholarly quality of program faculty. In addition, the NRC reports a measure of program effectiveness in training research scholars, also coded from 0 ("not effective") to 5 ("extremely effective"). Scores for particular programs again reflect means of the panel of political science raters, "trimmed" in the manner described earlier. Does program effectiveness, thus measured, capture additional information not accessible from the ratings of scholarly quality?
This question is most readily examined by re-estimating the model shown in Table 3, substituting the alternative NRC measure for the original dependent variable (summary statistics for the NRC measure of program effectiveness are listed in the penultimate row of Table 1). The first two columns of Table 4 display the relevant OLS estimates. The striking feature of these figures is their similarity to those in Table 3. Both program size and productivity are again important. Further, these estimates serve to reiterate the point that the critical element of productivity centers on its impact or quality: the coefficients for pure quantity are insignificant by any of the conventional criteria.(5) These figures thus imply that overall program quality goes hand in hand with program effectiveness: both outcomes reflect the same underlying characteristics.
This suggests an additional issue: Do the NRC ratings reflect idiosyncratic features that we would not find with other program ratings? We can address this question by comparing them with an alternative set of graduate program rankings generated by U.S. News and World Report in 1992. Like those of the NRC, the US News ratings are reputational, and come from responses to a similar question asking respondents to rate programs from "Distinguished" (5 points) to "Marginal" (1 point). The primary difference is that US News relied on departmental chairs and directors of graduate programs for its ratings (the ratings and scores are described in Webster and Massey 1992). As seen in the bottom row of Table 1, the summary statistics for the US News ratings are quite similar to those of the NRC figures, except that data are unavailable for 8 of the 98 programs that we have examined to this point. Indeed, the simple correlation between the two ratings is quite high (r = .966, N = 90).
The third and fourth columns of Table 4 show the OLS estimates obtained when the US News figures are substituted for the NRC ratings. Again, the pattern is straightforward. The estimates are very similar to those obtained with the NRC ratings. The major difference is that, while it is correctly signed, the estimate for the distribution of visibility within departments is weak, indicating that this variable has no systematic effect. In general, however, these alternative ratings simply underscore the patterns already observed. The rated quality of political science programs reflects both their size and the impact on the discipline of the scholarship of their faculty.
[TABULAR DATA FOR TABLE 4 OMITTED]
Finally, our analyses to this point have been wholly cross-sectional, relying principally on the data reported by the NRC (1995). Additional perspective is, however, available in the form of earlier program ratings published by the NRC (1982, Table 7.1). Those earlier data enable us to gauge whether the factors we have identified are also important in accounting for changes over time in program appraisals. The alternative view is that current ratings are so dominated by prior scores that program size and faculty quality have no appreciable impact on the former with the latter controlled.
Accordingly, Table 5 shows two sets of OLS estimates. Note that the N is reduced to 77, since information on 1982 ratings is unavailable for 19 of the programs examined in Table 3. The first column of Table 5 displays the simple regression of 1993 scores on 1982 scores. The bivariate regression coefficient is .91, and the adjusted R² is .803. In other words, there is a strong correlation between the two sets of ratings.
Despite this strong temporal association, we see that many of the important explanatory factors from the cross-sectional analyses also play a decisive role in accounting for changes in program evaluations. Specifically, faculty size continues to exert a strong, if somewhat attenuated, effect. It is of most interest, perhaps, that the quality of faculty research also systematically affects change in program ratings. While the coefficient for citations per faculty member is smaller than it was in the second column of Table 3 (.064 vs. .121), the effect remains pronounced. Similarly, although a little diminished in size, the estimate for Gini-citations remains negative and significant. The two factors that are absorbed by the control for 1982 ratings are the number of recent Ph.D.'s and the proportion of program members who are full professors.
Moreover, in comparing the two sets of estimates in Table 5, it is clear that those in the second column generate a slightly better fit (the adjusted R² increases from .803 to .900). The more interesting point, however, is that the coefficient for the 1982 ratings drops by almost one-third (from .91 to .65) across the two columns of the Table. Thus it is true that program ratings are somewhat sticky. But it is also the case that a good portion of the observed association between 1982 and 1995 ratings reflects the attention of programs to maintaining the size of their faculty and the quality of faculty research. In other words, the estimates in Table 5 indicate that the simple association observed between 1982 and 1995 ratings would be weaker absent such attention. Program ratings are not set in stone.
TABLE 5 OLS Regressions of 1993 NRC Political Science Graduate Program Quality Scores on 1982 Departmental Characteristics, N = 77 [coefficient estimates (standard errors)]
                                 (1)          (2)
1982 program quality score       .910(*)      .657(*)
                                 (.052)       (.064)
Faculty size                                  .026(*)
                                              (.005)
# Ph.D.'s produced                            -.005
                                              (.003)
Citations/faculty member                      .064(*)
                                              (.015)
Gini citations                                -.006(*)
                                              (.002)
% Full professors                             .002
                                              (.003)
Constant                         .397(*)      .443
                                 (.148)       (.222)
F-ratio                          310.30       115.56
Adjusted R²                      .803         .900
SEE                              .410         .291
* Starred coefficients are at least twice their standard errors.
Data source: National Research Council (1995, Appendix Table M-5); National Research Council (1982, Table 7.1).
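The fit statistics in Table 5 are linked by standard OLS identities, which gives a quick consistency check. The Python sketch below recovers R² from the adjusted R² and then computes the implied overall F-ratio. The function names are ours, the predictor counts (one in the first column, six in the second) are read off the table, and because the inputs are rounded to three decimals the results only approximate the reported F-ratios.

```python
def r2_from_adjusted(adj_r2, n, k):
    """Recover R-squared from adjusted R-squared (n cases, k predictors)."""
    return 1.0 - (1.0 - adj_r2) * (n - k - 1) / (n - 1)


def f_ratio(r2, n, k):
    """Overall F statistic implied by R-squared in an OLS model with intercept."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


N = 77
f_col1 = f_ratio(r2_from_adjusted(0.803, N, 1), N, 1)  # Table 5 reports 310.30
f_col2 = f_ratio(r2_from_adjusted(0.900, N, 6), N, 6)  # Table 5 reports 115.56
```

Both implied values fall within about one unit of the tabled F-ratios, which is as close as three-decimal inputs allow.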
The implications of this analysis are straightforward. In a most general sense, the estimates in Table 3 show that the ratings of graduate programs collected by the NRC (1995) do tap into important aspects of program quality and impact. It is difficult, given these results, to argue that the overall ratings of program quality reflect little more than a beauty pageant or that they hinge simply on broader institutional halo effects.
In particular, the results we have discussed provide clear support for the view that program rankings are a function in part of size, defined in terms both of the number of faculty and of the number of Ph.D.'s produced. As we anticipated at the outset, it is easy to see why this should be the case. Larger programs can more readily concentrate resources on graduate education, and, presumably, economies of scale are at work here. At the same time, the production of a larger number of alumni in itself constitutes an important element of program quality. Size therefore clearly counts.
But size is not everything. The evidence in Table 3 shows that faculty productivity also plays a central role. Two patterns are especially notable in this connection. First, the average volume of research produced in a given program appears less crucial than its average impact.(6) That the quality (or visibility) of research is more important to graduate program rankings than the sheer total quantity of research is encouraging. Second, the distribution of scholarly productivity across faculty within programs also influences program rankings. The less equal this distribution, the greater the discount on overall program quality.
Moreover, the effects just described are additive. Although we have not reported them, explorations of the obvious possibilities that involve more elaborate multiplicative specifications suggest that the additive treatment here does no great violence to the data. In particular, there is no evidence to sustain the conclusion that the effects of program size and faculty productivity are somehow conditional upon each other. This is again an encouraging pattern, since it indicates that efforts to improve the overall quality of graduate programs do not hinge solely on maximizing faculty size.
Finally, the general patterns appear to be robust in two especially interesting ways. First, the Tobit results indicate that the patterns persist even when we modify the ratings to eliminate distinctions among the top five programs and among those in the lowest quartile. Second, we have shown that faculty size and the quality of faculty productivity play an important role even in the context of accounting for changes in the ratings over time. This serves as further evidence that program ratings are not immutable features of the professional landscape.
We have focused exclusively on graduate programs in political science. Whether similar patterns are manifested in other disciplines is, of course, an issue for further analyses. Pending such inquiries, our results suggest that we can have reasonable confidence in the validity of the NRC ranking as a measure of program quality.
For their helpful suggestions, we thank Colin Cameron, Russell Dalton, Mary Jackman, Ross Miller, and Brian Silver.
1. These data are also available at the following Web site: http://www.nas.edu/nap/online/researchdoc.
2. An additional check along these lines involved logarithmic transformations of the independent variables to correct for the skewness apparent in some of them (see Table 1). The resulting estimates are similar to those reported in Table 3 and lead to the same substantive conclusions. We therefore analyze the data in their original form to maximize the accessibility of the results to the widest possible audience.
3. The Tobit model was first used by Tobin (1958), after whom it is named, in an analysis of the purchase of consumer durables. Useful discussions are available in Amemiya (1984) and Kmenta (1986).
4. For example, that the sheer volume of productivity has no direct effects on program rating does not preclude an indirect effect. Some level of research quantity is a necessary condition for research quality, as we have defined the two terms here.
5. While only the OLS estimates are shown in Table 4, we obtain similar results to those reported in all columns with alternative estimators designed to check for the impact of deviant or influential cases.
6. This pattern contrasts with that reported by Allison, Long, and McGinnis (1993), who observe that sheer quantity is the key element in faculty promotion decisions. Note, however, that Allison and his colleagues were concerned with explaining a different outcome at the individual (rather than program) level, and that their analyses centered on biochemists and employed data from an earlier period.
Allison, Paul D., J. Scott Long, and Robert McGinnis. 1993. "Rank Advancement in Academic Careers: Sex Differences and the Effects of Productivity." American Sociological Review 58:703-22.
Amemiya, Takeshi. 1984. "Tobit Models: A Survey." Journal of Econometrics 24:3-61.
Atkinson, A. B. 1983. The Economics of Inequality, 2d ed. Oxford: Clarendon Press.
Berk, Richard A. 1990. "A Primer on Robust Regression." In Modern Methods of Data Analysis, John Fox and J. Scott Long, eds. Newbury Park, CA: Sage.
Cole, Jonathan R., and Stephen Cole. 1973. Social Stratification in Science. Chicago: University of Chicago Press.
Kmenta, Jan. 1986. Elements of Econometrics, 2d ed. New York: Macmillan.
Lazarsfeld, Paul F., and Wagner Thielens Jr. 1958. The Academic Mind: Social Scientists in a Time of Crisis. Glencoe, IL: Free Press.
National Research Council. 1982. An Analysis of Research-Doctorate Programs in the United States: Social and Behavioral Sciences. Washington, DC: National Academy Press.
National Research Council. 1995. Research-Doctorate Programs in the United States: Continuity and Change. Washington, DC: National Academy Press.
Tobin, James. 1958. "Estimation of Relationships for Limited Dependent Variables." Econometrica 26:24-36.
Webster, David S., and Sherri Ward Massey. 1992. "The Complete Rankings from the U.S. News and World Report 1992 Survey of Doctoral Programs in Six Liberal Arts Disciplines." Challenge 35:2245.
Robert W. Jackman and Randolph M. Siverson are professors of political science at the University of California, Davis.
Title Annotation: Departmental Rankings: Much Ado About Something?
Author: Jackman, Robert W.; Siverson, Randolph M.
Publication: PS: Political Science & Politics
Date: Jun 1, 1996