
Validation of the Persian Adaptation of Baddeley's 3-min Grammatical Reasoning Test.

To examine the effects of situational pressures on individuals' performance and to measure fluid reasoning, a valid, reliable, and sensitive test is needed that is also short and easy to administer. The Three-minute Grammatical Reasoning Test, which measures an individual's ability to reason about relations between things, is an efficient measure of fluid intelligence (Gf; Furnham & McClelland, 2010) and is commonly used to examine the effects of situational factors and cognitive abilities on participants' performance. Performance on the grammatical reasoning test reflects how readily an individual can combine grammatical processing with reasoning.

The 64-item Three-minute Grammatical Reasoning Test was first introduced in 1968 by Alan Baddeley to assess the verbal reasoning and mental capabilities of divers breathing compressed air at extreme depths, based on the idea that deduction from grammatical transformations provides a sensitive and rapid intellectual task (Baddeley et al., 1968). The test is characterized mainly by its brevity and straightforward structure. Items were formed from the two letters "A" and "B," the two verbs "precede" and "follow," active and passive voice, and positive and negative forms. The difficulty of each item depends on the structure of the presented sentence: studies have shown that passive or negative statements such as "A does not precede B" are harder to reason about than positive active sentences like "A follows B" (Roberts, 1968; Slobin, 1966; Wason, 1961).
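The deductive logic of such items can be sketched in a few lines of Python. This is an illustrative reconstruction for clarity, not code from any of the studies cited; the function name and argument layout are ours:

```python
# Illustrative sketch: evaluating the truth of a Baddeley-style
# transformed statement against a displayed letter pair such as "AB".
def statement_is_true(subject, verb, other, *, passive=False,
                      negated=False, pair="AB"):
    """Return True if, e.g., 'A precedes B' holds for the displayed pair.

    verb is 'precede' or 'follow'; passive swaps the grammatical roles
    ('A is preceded by B' asserts that B precedes A); negated flips the
    truth value of the claim.
    """
    first, second = subject, other
    if passive:                       # 'A is preceded by B' -> B precedes A
        first, second = other, subject
    if verb == "precede":
        claim = pair.index(first) < pair.index(second)
    else:                             # 'follow'
        claim = pair.index(first) > pair.index(second)
    return claim != negated           # negation flips the truth value
```

For the displayed pair "AB", the statement "B follows A" evaluates as true, while "A is preceded by B" evaluates as false, mirroring the transformations described above.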

Baddeley (1968) described his test as a measure of verbal and linguistic reasoning, because taking the test rests on grammatical deduction. Later, Vernon and Kantor (1986) used the test as a general measure of mental speed, Kyllonen and Christal (1990) of working memory, Chamorro-Premuzic and Furnham (2006, 2008) of cognitive ability, and Preckel, Wermer, and Spinath (2011) of processing speed.

Baddeley (1968) reported a reliability coefficient of .80 and a correlation of .59 (p < .001) with the British Army Verbal Intelligence Test. The consistency of the test over time has also been shown by Carter, Kennedy, and Bittner, Jr. (1981). Moreover, Hartley and Holt (1971) found a high correlation (r = .70) with a group test of general intelligence for children. In addition, the test's sensitivity was examined in three different stress studies by Baddeley et al. (1968).

The test was designed as a practical and quick measure of Gf for native speakers of English. The verbal nature of the test makes a direct translation of the two verbs (precede and follow) into Persian impossible in the passive voice (Baghaei, Khoshdel-Niyat, & Tabatabaee-Yazdi, 2017). Baghaei et al. (2017) adapted Baddeley's (1968) grammatical reasoning test into Persian by using the verbs 'inscribe/circumscribe' and geometrical shapes, and showed that the test is valid and reliable in Persian. Using the five binary conditions of inscribe/circumscribe, passive/active, negative/positive, true/false, and square mentioned first vs. circle mentioned first, Baghaei et al. (2017) constructed 32 statements which were used as items in the test. As shown in Table 1, test takers must mark whether the statements concerning the location of the circle and the square are true or false. The time limit for the test is three minutes.
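The combinatorial design can be verified in a short sketch: crossing the five binary conditions yields 2^5 = 32 statement templates. The condition labels below are our paraphrases of those in Baghaei et al. (2017), not the authors' code:

```python
# Illustrative sketch: enumerating the five binary conditions used in
# the Persian adaptation yields 2**5 = 32 statement templates.
from itertools import product

conditions = list(product(
    ("inscribe", "circumscribe"),      # verb
    ("active", "passive"),             # voice
    ("positive", "negative"),          # polarity
    (True, False),                     # keyed answer: statement true/false
    ("square first", "circle first"),  # which shape is mentioned first
))
assert len(conditions) == 32
```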

Baghaei et al. (2017) reported a high test-retest reliability (.76) and Cronbach's alpha (.91), an excellent fit to a one-factor confirmatory factor model, and acceptable correlations with other measures of fluid intelligence and with participants' grade point average. In the present study the data of Baghaei et al. (2017) are reanalyzed using the Rasch partial credit model.

Baudson and Preckel (2016) also adapted and translated the test into German. They used the shapes of a circle, a triangle, and a square, together with the verbs "reject" and "prefer" instead of "precede" and "follow," which cannot be used in the passive voice in German. They reported evidence of the test's validity and reliability and a significant correlation with other measures of Gf. Furthermore, Karwowski et al. (2016) reported reliabilities of .93 and .73 for a Polish translation of the test.

The Present Study

The present study seeks to validate the 64-item Persian adaptation of Baddeley's Three-minute Grammatical Reasoning Test (Baghaei et al., 2017) using the Rasch partial credit model (PCM; Masters, 1982) for polytomous data, which is widely used for inspecting questionnaires and examining construct validity in the social sciences.



Method

Participants

The participants were 186 undergraduate Iranian students (79 female, 107 male; mean age = 22.71 years, SD = 7.99) from different fields of the social sciences and humanities at several universities in Iran. Their native language was Persian, with English as a foreign language. Participation was voluntary, and students received profiles of their cognitive abilities as well as course credit for their cooperation. The research was approved by the ethics committee of Mashhad Islamic Azad University.


Instruments

Persian adaptation of Baddeley's Three-minute Grammatical Reasoning Test. Baddeley's (1968) test was adapted into Persian by Baghaei et al. (2017) using a different pair of verbs and geometrical shapes instead of letters. The 64 items were classified into four categories: Positive Active, Positive Passive, Negative Active, and Negative Passive. Because of the timed nature of the test and a relatively high proportion of missing data toward the middle and end of the test, the dichotomous Rasch model (Rasch, 1960/1980) was not employed. Instead, the scores in these four categories were aggregated, yielding four polytomous items, or testlets (Eckes & Baghaei, 2015). The Cronbach's alpha reliability of the test, with each category of items treated as a testlet, was .91.
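Testlet-level alpha can be computed directly from an (examinees x testlets) score matrix. The following generic sketch is ours, not the authors' code:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_persons, n_items) score matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of totals),
    where k is the number of items (here, the four testlets).
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return k / (k - 1) * (1 - item_var / total_var)
```

With perfectly consistent testlet scores the coefficient reaches 1.0; real data, like the .91 reported here, fall below that ceiling.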


Data Analysis

To analyze the data and examine the construct validity of the test, the Winsteps Rasch software, version 3.73 (Linacre, 2009a), was used. The Rasch model (Rasch, 1960/1980) is commonly used for analyzing questionnaires and examining construct validity in the social sciences (Baghaei, 2009). A test is considered valid when a construct underlies the covariance among the items and causes the item responses, which means that the data fit a latent trait model such as the Rasch model (Baghaei & Tabatabaee Yazdi, 2016; Borsboom, 2008). Therefore, to estimate the fit of the data to the model, the data were subjected to Masters' (1982) Partial Credit Model (PCM), an extension of the dichotomous Rasch model to polytomous data. The PCM has been widely used for the analysis of rating scale data (Baghaei, 2013; Baghaei, Hohensinn, & Kubinger, 2014).
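Under the PCM, the probability of scoring in category x of a polytomous item depends on the cumulative sum of (theta - delta_j) terms up to x, where theta is the person parameter and the delta_j are the item's step difficulties. A minimal illustrative implementation (our sketch; the threshold values used below are hypothetical, not estimates from this study):

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities under Masters' (1982) partial credit model.

    thresholds[j-1] is step difficulty delta_j; the log-numerator for
    category x is sum_{j<=x} (theta - delta_j), with 0 for category 0.
    """
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    m = max(logits)                         # stabilize the exponentials
    expd = [math.exp(v - m) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]
```

When theta equals a single step difficulty, the two adjacent categories are equally likely, which is the defining property of PCM step parameters.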

Checking the two assumptions of unidimensionality and local independence is critical in item response theory (IRT) and Rasch models (Baghaei, 2009). Linacre (2009b) states that a test meets the unidimensionality assumption when its items share a single dimension that dominates any other dimensions the items may measure. Local independence, which specifies unidimensionality more precisely, requires that a single dominant dimension underlying the responses be the cause of the relationships between the test items; after the effect of this dimension is removed, the correlations between item residuals should be close to zero (Baghaei, 2010; Linacre, 2009b).

Results

Individual Item Characteristics

As the first step of the Rasch analysis, the fit indices were checked following the criteria recommended by Bond and Fox (2007). The results support the unidimensionality of the test: all items fit the Rasch model, with infit and outfit mean square values (MNSQ) within the acceptable range of 0.60 to 1.40 and infit and outfit standardized values (ZSTD) within the acceptable range of -2 to 2 (Table 2).

MNSQ values larger than 1.40 indicate construct-irrelevant variance and aberrant response patterns that distort measurement (Linacre, 2009a); values smaller than 0.60 indicate redundancy of information and are not threatening.
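Both statistics are averages of squared standardized residuals: outfit weights all persons equally, while infit weights by the model variance, making it less sensitive to outlying responses. A generic sketch, assuming that observed scores, model-expected scores, and model variances are available for one item (this is our illustration, not the Winsteps computation verbatim):

```python
import numpy as np

def infit_outfit(observed, expected, variance):
    """Infit and outfit mean-square fit statistics for one item.

    observed, expected, and variance are arrays over persons: observed
    score, model-expected score, and model variance of the score.
    Outfit is the mean squared standardized residual; infit is an
    information-weighted mean, less sensitive to outliers.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    variance = np.asarray(variance, dtype=float)
    sq_resid = (observed - expected) ** 2
    outfit = np.mean(sq_resid / variance)      # unweighted mean of z^2
    infit = sq_resid.sum() / variance.sum()    # information-weighted
    return infit, outfit
```

When the data fit the model, both statistics have an expected value near 1.0, which is why the 0.60-1.40 band is centered on 1.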

Table 2 shows the fit indices for the items, listed from difficult to easy (the column labeled MEASURE). As shown, the easiest item is item 1, "Positive Active," and the most difficult is item 3, "Negative Active." The difficulty of item 3 is estimated at 0.46 logits with a standard error of 0.08, so one can be 95% sure that its true difficulty lies between 0.30 and 0.62 logits, i.e., two SEs below and above the observed measure. Item difficulties ranged from -0.44 to 0.46 logits, with a separation reliability of .94; person estimates ranged from -5.38 to 5.71, also with a separation reliability of .94. Higher separation reliability indicates that the items (or persons) are reliably spread along the latent trait, whereas lower values indicate that their ordering is less dependable.
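The interval reported above is simply the measure plus or minus two standard errors. As a worked check (values taken from Table 2; the function name is ours):

```python
def measure_ci(measure, se, z=2.0):
    """Approximate 95% confidence interval: measure +/- z * SE."""
    return measure - z * se, measure + z * se

lo, hi = measure_ci(0.46, 0.08)   # item 3, "Negative Active"
# interval: (0.30, 0.62) logits
```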

The results indicated that the most difficult sentences to process are the "Negative Active" and "Positive Passive" ones, and that the easiest items belong to the "Positive Active" group.

The highest score in the "Negative Active" category, 11, was obtained by only 1% (2 persons) of the respondents, whereas about 3% of the respondents obtained the highest score of 11 in "Positive Active."

Figure 1 (see p. 424) presents the item-person map of the data, which shows the location of the item category parameters as well as the distribution of the person parameters. Numbers on the right indicate items; the # signs on the left represent persons.

Ideally, items should be located along the whole scale so that the 'ability' of all persons can be measured meaningfully (Bond & Fox, 2007). Items located toward the top of the scale are more difficult, and persons located there are more proficient; items and persons located toward the bottom are easier and less proficient, respectively. Good tests have items that are targeted at the persons taking the test. In this study, the map shows that the test covers a wide range of ability.

Examination of Unidimensionality

To check the unidimensionality of the scale, global fit was studied by inspecting patterns in the residuals; the smaller the residuals, the better the data fit the model. A principal components analysis (PCA) of the residuals examines whether the residuals are uncorrelated, so that no further factor can be extracted from them. Any factor extracted from the residuals is not the intended target dimension, or latent trait, because the PCA is performed not on the original data but on the standardized residuals (Linacre, 2009b).

If the latent trait explains all the information in the data and the residuals describe random noise, the data will fit the Rasch model. On the other hand, if a factor is extracted from the residuals, the test is not unidimensional (Baghaei & Cassady, 2014) and multidimensional models should be employed to explain the data (Baghaei, 2012; Baghaei & Aryadoust, 2015; Baghaei & Ravand, 2016).

To decide whether a factor extracted from the residuals can safely be ignored, the size of its eigenvalue should be considered: the eigenvalue of the first contrast is an index of unidimensionality, or of the overall fit of the data to the Rasch model (Smith, 2002). Unidimensionality can be examined through the row "unexplained variance in 1st contrast" in Table 3. Raiche (2005) suggested that secondary dimensions with the strength of at least two items (eigenvalue = 2) are cause for concern; the eigenvalue of 1.5 for the first contrast in the present study therefore indicates that the test is unidimensional. Moreover, the PCA of the standardized residuals (Table 3) showed that the Rasch dimension has a strength of 17.6 eigenvalue units and explains 81.4% of the variance, of which 18.4% is explained by the item measures and 63.0% by the person measures; 18.6% of the variance remains unexplained.
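The first-contrast eigenvalue can be approximated outside Winsteps as the largest eigenvalue of the correlation matrix of the standardized residuals. This is a simplified sketch of the idea (Winsteps' exact computation may differ in detail):

```python
import numpy as np

def first_contrast_eigenvalue(std_residuals):
    """Largest eigenvalue of the correlation matrix of standardized
    residuals -- the quantity compared against Raiche's (2005) rule of
    thumb (an eigenvalue of at least 2 signals a secondary dimension).

    std_residuals: (n_persons, n_items) array of standardized residuals.
    """
    r = np.corrcoef(np.asarray(std_residuals, dtype=float), rowvar=False)
    return np.linalg.eigvalsh(r).max()
```

For residuals that are pure random noise, the largest eigenvalue hovers only slightly above 1, well below the value of 2 that Raiche (2005) flags as a concern.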


Discussion

The present study aimed to validate the Persian adaptation of Baddeley's Three-minute Grammatical Reasoning Test, known as an effective measure of the combination of cognitive ability and processing speed (Carroll, 1993; Jäger, 1984; Schneider & McGrew, 2012), using the Rasch partial credit model (Masters, 1982).

The results of the analyses, which addressed evidence for a unidimensional structure, indicated that the test measures a single latent trait and is an efficient measure in the Persian language. The study also documents the procedure for adapting the test into Persian, which can serve researchers who aim to adapt the scale to other languages.

In addition, the person-item map (Figure 1) covered a wide range of the trait continuum, and the test had an acceptable person separation reliability of .94 and item separation reliability of .94.

Accordingly, the person-item evaluation showed that "Positive Active" items were the most likely to be answered correctly, while "Negative Active" items were the least likely to be answered correctly and thus demanded the most intellectual effort from respondents.

The overall findings confirm that the 64-item Persian adaptation of Baddeley's Three-minute Grammatical Reasoning Test is an efficient unidimensional measure of fluid intelligence (Gf) in the Persian language.

This study was limited to a sample of university students. Future studies should target participants in primary and secondary schools, outside the university context.


References

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.

Baddeley, A. D. (1968). A three-minute reasoning test based on grammatical transformation. Psychonomic Science, 10, 341-342.

Baddeley, A. D., de Figueredo, J. W., Hawkswell Curtis, J. W., & Williams, A. N. (1968). Nitrogen narcosis and performance under water. Ergonomics, 11, 157-164.

Baghaei, P. (2013). Development and psychometric evaluation of a multidimensional scale of willingness to communicate in a foreign language. European Journal of Psychology of Education, 28, 1087-1103.

Baghaei, P. (2012). The application of multidimensional Rasch models in large scale assessment and validation: An empirical example. Electronic Journal of Research in Educational Psychology, 10, 233-252.

Baghaei, P. (2010). A comparison of three polychotomous Rasch models for super-item analysis. Psychological Test and Assessment Modeling, 52, 313-323.

Baghaei, P. (2009). Understanding the Rasch model. Mashhad: Mashhad Islamic Azad University Press.

Baghaei, P. (2008). The Rasch model as a construct validation tool. Rasch Measurement Transactions, 22, 1145-1146.

Baghaei, P., & Aryadoust, V. (2015). Modeling local item dependence due to common test format with a multidimensional Rasch model. International Journal of Testing, 15, 71-87.

Baghaei, P., & Cassady, J. (2014). Validation of the Persian translation of the Cognitive Test Anxiety Scale. Sage Open, 4, 1-11.

Baghaei, P., Hohensinn, C., & Kubinger, K. D. (2014). The Persian adaptation of the foreign language reading anxiety scale: A psychometric analysis. Psychological Reports, 114, 315-325.

Baghaei, P., Khoshdel-Niyat, F., & Tabatabaee-Yazdi, M. (2017). The Persian adaptation of Baddeley's 3-min grammatical reasoning test. Psicologia: Reflexao e Critica, 30(1), 16.

Baghaei, P., & Ravand, H. (2016). Modeling local item dependence in cloze and reading comprehension test items using testlet response theory. Psicologica, 37, 85-104.

Baghaei, P., & Tabatabaee Yazdi, M. (2016). The logic of latent variable analysis as validity evidence in psychological measurement. The Open Psychology Journal. 9, 168-175.

Baudson, T. G., & Preckel, F. (2016). mini-q: Intelligenzscreening in drei Minuten [mini-q: Intelligence screening in three minutes]. Diagnostica, 62, 182-197.

Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.

Borsboom, D. (2008). Latent variable theory. Measurement, 6, 25-53.

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge: Cambridge University Press.

Carter, R. C., Kennedy, R. S., & Bittner Jr, A. C. (1981). Grammatical reasoning: A stable performance yardstick. Human Factors, 23(5), 587-591.

Chamorro-Premuzic, T., & Furnham, A. (2006). Personality and self-assessed intelligence: Can gender and personality distort self-assessed intelligence? Educational Research and Reviews, 1, 227-233.

Chamorro-Premuzic, T., & Furnham, A. (2008). Personality, intelligence and approaches to learning as predictors of academic performance. Personality and Individual Differences, 44, 1596-1603.

Eckes, T., & Baghaei, P. (2015). Using testlet response theory to examine local dependency in C-tests. Applied Measurement in Education, 28, 85-98.

Furnham, A., & McClelland, A. (2010). Word frequency effects and intelligence testing. Personality and Individual Differences, 48, 544-546. doi:10.1016/j.paid.2009.12.001

Hartley, J., & Holt, J. (1971). The validity of a simplified version of Baddeley's three-minute reasoning test. Educational Research, 14, 70-73.

Jäger, A. O. (1984). Intelligenzstrukturforschung: Konkurrierende Modelle, neue Entwicklungen, Perspektiven [Research on the structure of intelligence: Competing models, new developments, perspectives]. Psychologische Rundschau, 35, 21-35.

Karwowski, M., Dul, J., Gralewski, J., Jauk, E., Jankowska, D. M., Gajda, A., Chruszczewski, M. H., & Benedek, M. (2016). Is creativity without intelligence possible? A necessary condition analysis. Intelligence, 57, 105-117.

Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389-433.

Linacre, J. M. (2009a). A user's guide to WINSTEPS. Chicago, IL: Winsteps.

Linacre, J. M. (2009b). Local independence and residual covariance: A study of Olympic figure skating ratings. Journal of Applied Measurement, 11, 157-169.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.

Preckel, F., Wermer, C., & Spinath, F. (2011). The interrelationship between speeded and unspeeded divergent thinking and reasoning, and the role of mental speed. Intelligence, 39, 378-388.

Raiche, G. (2005). Critical eigenvalue size in standardized residual Principal Component Analysis. Rasch Measurement Transactions, 19(1), 1012.

Rasch, G. (1960/1980). Probabilistic models for some intelligence and attainment tests (Copenhagen: Danish Institute for Educational Research, 1960. ed.). Expanded edition, Chicago: University of Chicago Press, 1980.

Roberts K. H. (1968). Grammatical and associative constraints in sentence retention. Journal of Verbal Learning and Verbal Behavior, 7, 1072-1076.

Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. Flanagan & P. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (3rd ed., pp. 99-144). New York, NY: Guilford.

Slobin, D. I. (1966). Grammatical transformations and sentence comprehension in childhood and adulthood. Journal of Verbal Learning and Verbal Behavior, 5(3), 219-227.

Smith, E. V., Jr. (2002). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3, 205-231.

Vernon, P. A., & Kantor, L. (1986). Reaction time correlations with intelligence test scores obtained under either timed or untimed conditions. Intelligence, 10, 315-330.

Wason, P. C. (1961). Response to affirmative and negative binary statements. British Journal of Psychology, 52(2), 133-142.

Please Note: Illustration(s) are not available due to copyright restrictions.

Mona Tabatabaee-Yazdi

Islamic Azad University, Iran

Author info: Correspondence should be sent to: Dr. Mona Tabatabaee-Yazdi, Department of English, Islamic Azad University, Mashhad Branch, Iran.

Caption: FIGURE 1. Item-Person Map
TABLE 1 Sample Items of the Adapted Grammatical Reasoning Test

Item                                                  True  False

The square inscribes the circle.                 ([])
The circle is inscribed by the square.           (O)
The square does not circumscribe the circle.     (O)
The circle is not circumscribed by the square.   ([])

TABLE 2 Item Measures and Fit Statistics for the "Persian
Adaptation of Baddeley's 3-min Grammatical Reasoning Test"

                       TOTAL   TOTAL   MEASURE  MODEL  INFIT          OUTFIT
                       SCORE   COUNT            S.E.   MNSQ   ZSTD    MNSQ   ZSTD

3. Negative Active       992    186      .46     .08    --      .9    1.05     .6
2. Positive Passive     1060    186      .03     .08    .84   -1.6     .85   -1.5
4. Negative Passive     1071    186     -.04     .08   1.11    1.1    1.11    1.0
1. Positive Active      1135    186     -.44     .08    .82   -1.8     .81   -1.9

TABLE 3 Dimensionality Output

                                      Empirical      --      Modeled

Total raw variance in observations    21.6  100.0 %          100.0 %
Raw variance explained by measures    17.6  81.4 %           80.8 %
Raw variance explained by persons     13.6  63.0 %           62.5 %
Raw Variance explained by items       4.0   18.4 %           18.4 %
Raw unexplained variance (total)      4.0   18.6 %  100.0 %  19.2 %
Unexplained variance in 1st contrast  1.5   6.9 %   37.2 %
COPYRIGHT 2018 North American Journal of Psychology

Article Details
Author:Tabatabaee-Yazdi, Mona
Publication:North American Journal of Psychology
Article Type:Report
Date:Jun 1, 2018
