Examining the Reliability and Validity of the Effective Behavior Support Self-Assessment Survey
The Effective Behavior Support Self-Assessment Survey (SAS; Sugai, Horner, & Todd, 2003) is designed to measure perceived Positive Behavior Interventions and Supports (PBIS) implementation and identify priorities for improvement. Despite its longevity, little published research exists documenting its reliability or validity for these purposes. The current study reports on the SAS's internal consistency, construct validity, and criterion validity for a medium-sized rural/suburban district. It was found that the SAS possesses adequate internal consistency and validity. However, results suggest that further improvements to the survey could be made. Possible modifications and future directions are discussed.
Positive Behavioral Interventions and Supports (PBIS) is a multitiered educational approach grounded in principles of applied behavior analysis and the public health service delivery model (Horner, 1990; Sugai et al., 2010; Walker et al., 1996). PBIS was designed to foster safety, prosocial behavior, and academic readiness by outlining a structure to explicitly teach and reinforce these behaviors in schools. While PBIS has become popular and its elements have a reasonable research base (Horner, Sugai, & Anderson, 2010; Horner et al., 2009), several of the common measurement instruments associated with PBIS lack robust evidence for both their reliability and validity for the purpose of measuring PBIS fidelity. The current study seeks to help resolve this by examining several psychometric properties of one of the most long-standing of these instruments, the Effective Behavior Support Self-Assessment Survey (SAS).
Measuring Treatment Integrity Within a Problem-Solving Model
One feature that sets PBIS apart from some other evidence-based behavioral programs is its explicit focus on implementation science (Detrich, Keyworth, & States, 2008). Specifically, PBIS researchers have concentrated attention on the continual monitoring and problem solving of implementation integrity and the rapid adaptation of the PBIS model to novel behavioral challenges as they arise within schools (Sugai et al., 2010). The PBIS model is intended to be flexible, and data from students and staff are used to continually adjust behavioral prevention strategies and policies.
Facilitating the PBIS problem-solving process is a collection of instruments, some criterion referenced, that schools may complete with little professional assistance. A variety of these tools are available at no cost through the Office of Special Education Technical Assistance Center on Positive Behavioral Interventions and Supports (2013) and the Educational and Community Supports (2013) organization. The SAS and the School-Wide Evaluation Tool (SET; Sugai, Lewis-Palmer, Todd, & Horner, 2005) are two of the more venerable of these measures, although others have become more visible, such as the Benchmarks of Quality (BoQ; Kincaid, Childs, & George, 2010).
The SET and BoQ were designed as broad measures of PBIS implementation fidelity. Both include unique operational definitions for each level of their respective scale; define multiple sources of evidence, including key implementers and artifacts; and can be used in both a summative and formative fashion (Educational and Community Supports, 2013; Kincaid et al., 2010; Sugai et al., 2005). It is recommended that both be completed annually or biannually, with greater frequency earlier in the implementation process (Educational and Community Supports, 2013). In contrast, the SAS is a briefer measure, with each question anchored to the same Likert-type response scales. The SAS is purely a survey--a measure of perceived fidelity that can be quickly administered to all community members and that also highlights faculty and staff priorities for improvement. The SAS is used to develop problem-solving goals and is intended to be used between administrations of the aforementioned measures or as a complement to them. It is recommended the SAS be completed annually (Educational and Community Supports, 2013).
Structure of the SAS
Any staff member of a school may complete the SAS. It is suggested that between 20 and 30 minutes of time be allocated for completion. The SAS is organized into four factors totaling 46 questions: School-Wide Systems (SWS), Nonclassroom Setting Systems (NCSS), Classroom Systems (CS), and Individual Student Systems (ISS). Sugai et al. (2003) operationally defined these factors as:
* School-Wide Systems: "School-wide is defined as involving all students, all staff, & all settings" (p. 2).
* Non-classroom Setting Systems: "Non-classroom settings are defined as particular times or places where supervision is emphasized (e.g., hallways, cafeteria, playground, bus)" (p. 4).
* Classroom Systems: "Classroom settings are defined as instructional settings in which teacher(s) supervise & teach groups of students" (p. 5).
* Individual Student Systems: "Individual student systems are defined as specific supports for students who engage in chronic problem behaviors (1%-7% of enrollment)" (p. 6).
Participants respond to various high-implementation PBIS statements (e.g., "expected student behaviors are rewarded regularly") along two response scales: Current Status and Priority for Improvement. Possible responses for Current Status include "In Place" (3), "Partial in Place" (2), and "Not in Place" (1). Responses for Priority for Improvement include "High" (3), "Med" (2), and "Low" (1). Responses are tallied, and summary counts for each response option, collapsed across factor questions, are calculated. Problem-solving teams may then analyze responses within factors to identify areas of perceived strength and weakness and match these to the areas faculty and staff feel are most deserving of attention (Sugai et al., 2003).
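As an illustration, the tallying described above can be sketched in a few lines. The responses below are hypothetical and the item counts abbreviated; a full administration contains 18 SWS, 9 NCSS, 11 CS, and 8 ISS items (46 total).

```python
from collections import Counter

# Hypothetical Current Status responses on the SAS's three-point scale,
# keyed by factor (3 = In Place, 2 = Partial in Place, 1 = Not in Place).
responses = {
    "SWS":  [3, 3, 2, 1, 3],
    "NCSS": [2, 2, 3],
    "CS":   [3, 1, 2, 3],
    "ISS":  [1, 2, 2],
}

LABELS = {3: "In Place", 2: "Partial in Place", 1: "Not in Place"}

def summarize(factor_responses):
    """Tally response options within a factor and compute the summed factor score."""
    counts = Counter(LABELS[r] for r in factor_responses)
    return {"counts": dict(counts), "total": sum(factor_responses)}

summary = {factor: items and summarize(items) for factor, items in responses.items()}
for factor, stats in summary.items():
    print(factor, stats)
```

Teams would then inspect the per-option counts within each factor alongside the Priority for Improvement tallies to select problem-solving targets.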
Properties of the SAS
Any instrument that contributes information potentially leading to a change in an educational environment should be scrutinized on both theoretical and statistical grounds. If summed factor scores (total SWS, CS, NCSS, or ISS scores) do not adequately represent either the content or the questions contributing to them, then decisions may be made based on scores with unknown validity. Such analyses include statistical reliability and various tests of measure validity. Reliability can take the form of internal consistency, test-retest, parallel forms, or some combination thereof. Hagan-Burke, Burke, Martin, Boone, and Kirkendoll (2005) reported on the internal consistency of the SWS factor based on a sample of 1,219 faculty and administrators from 35 schools in Alabama. The sample was primarily general education teachers (75%). The authors reported strong internal consistency (α = .88) for this individual factor. Safran (2006) offered descriptive statistics for a small sample of teachers (N = 80) and reported low to adequate reliability across factors (α = .60 to α = .85).
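For reference, the internal consistency statistic used throughout these studies, Cronbach's alpha, can be computed directly from an item-score matrix. A minimal sketch with toy data on the SAS's three-point scale:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed factor score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 6 respondents answering 4 items scored 1-3
scores = np.array([
    [3, 3, 2, 3],
    [2, 2, 2, 2],
    [3, 2, 3, 3],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 3, 3],
])
print(round(cronbach_alpha(scores), 2))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is interpreted as evidence that the items of a factor measure a common construct.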
Validity has many facets, each contributing to the confidence one places in an instrument for a specific measurement purpose. The SAS appears to have sound content validity and preliminary evidence for convergent criterion validity (Horner et al., 2004) for the purpose of measuring perceived PBIS fidelity. However, evidence for its construct validity does not exist in the published literature. Construct validity is often defined as whether what is being measured accurately reflects the domains purported to be measured. In the SAS's case, the four factors should be assessing the constructs they are thought to measure and not something unrelated (e.g., teacher burnout, popularity of the leadership team, other listed factors). One way construct validity can be statistically tested is by employing confirmatory factor analysis (CFA). In CFA, the covariance among questions is examined to see whether questions within each factor are in fact measuring similar things and not loading heavily onto unrelated factors. Confirmatory factor analysis is particularly appropriate for well-established instruments with a strong, theoretically grounded framework, such as the SAS.
Only one study found in the extant literature provided evidence for the SAS's validity. Horner et al. (2004) provided a measure of convergent validity by correlating SAS-SWS scores to SET scores for a sample of 31 schools. The authors reported a strong correlation (R = .75), although this estimate is tenuous given the lack of clarity regarding the number of participants and potential issues with normality and heteroscedasticity resulting from the untransformed percentage scores of the SET. Furthermore, the focus of the paper was on the psychometric properties of the SET.
Additional research on the SAS is needed. Extant studies report primarily on internal consistency. No study has yet measured construct validity. The purpose of the current study is to measure SAS reliability and criterion validity and extend current research by measuring construct validity across all factors in a medium-sized district. The guiding research questions were:
* What is the internal consistency of the SAS for our sample and how does it compare to prior estimates?
* What is the construct validity of the SAS, as examined via CFA?
* What is the relationship between the SAS and SET (criterion validity)?
* What potential modifications could be made to the SAS to improve its construct validity?
Participants and Setting
In the spring of 2013, 292 teachers (8% pre-K-K, 43% elementary education, 8% middle school, 29% high school, 10% special education, 2% extracurricular) completed the SAS. The sample came from 10 public schools in a suburban/rural district in the northeast, serving approximately 6,100 students (see Table 1). The student population was primarily Caucasian (66%) and Hispanic (16%). Approximately 4% of students were labeled English-language learners, and 17% of students qualified for special education services. Positive Behavioral Interventions and Supports has been present in the district for more than a decade and has been a district priority for more than 6 years. This priority has included early professional consultation, district-wide funding, and continued support from district administration. Every school has a PBIS leadership team, although schools are in varying stages of implementation, as determined by recent SET scores, which ranged from 61% to 97% (see Table 1).
The SAS was administered online in the spring of 2013, and a university ethics review board approved its administration. Teachers were allowed time during a mandatory professional development day to complete the survey, and participants were situated at different school locations in the district at the time of administration. The SAS has been completed previously by the district and is used for formative decision making, so most participants were familiar with its content. Participants were told that time had been set aside for them to complete the SAS, which would assist PBIS leadership teams and administration plan for the upcoming year; that the survey was anonymous; and that they were to use their office computers to complete the survey. Teachers first reviewed a screen of instructions for survey completion and then moved to the next screen to begin the survey. Teachers were offered the opportunity to win a $100 gift certificate in a random drawing if they completed the survey. Extant SET data from eight participating schools were also available. The PBIS leadership teams at the school and district level complete the SET annually as a matter of regular procedure. The SAS data collection occurred before that of the SET, although the difference was never greater than 8 weeks.
The SET is a multimethod, multi-informant PBIS fidelity measure. One SET is completed per school, and PBIS fidelity is measured along 7 dimensions and 56 points, resulting in a mean fidelity percentage. A criterion of 80% on both the average SET score and the Behavioral Expectations Taught subscale is recommended by the manual as indicating acceptable fidelity. Vincent, Spaulding, and Tobin (2010) reported acceptable levels of internal consistency for the SET (α = .84) and moderate to high convergent validity with the Team Implementation Checklist (Sugai, Horner, Lewis-Palmer, & Rosetto Dickey, 2012). Similarly, Horner et al. (2004) reported the test-retest reliability of the SET to be r = .97 and the interobserver agreement to be r = .99.
Only the Current Status responses of the SAS were analyzed. Minimizing the number of parameters estimated from our single sample reduced the chance for errors. In addition, the Priority for Improvement section is more in line with individual question analysis, not factor scores. Internal consistency (Cronbach's α), analysis of model fit resulting from CFA, and between-groups significance tests were calculated. Below, we explain our specific procedures for the latter two analyses.
Criterion validity. To assess criterion validity, schools were divided into two groups: those above the 80% benchmarks reported by the SET manual as being indicative of high fidelity and those below it, resulting in "high fidelity" and "low fidelity" groups. A one-way MANOVA was used to analyze differences across groups and factors. This analysis was a better fit for this study than the bivariate correlations conducted in Horner et al. (2004), because only eight schools completed the SET.
Confirmatory factor analysis. The response scale of the SAS is ordinal. It can be reasonably assumed that the three-point scale has an underlying continuous distribution of perceived implementation fidelity. Treating such ordinal data as continuous may result in biased parameter estimates and incorrect standard errors when the more common maximum likelihood estimation is used (Flora & Curran, 2004). To avoid such issues, mean- and variance-adjusted weighted least squares (WLSMV) estimation was selected, as offered in Mplus 7.0 (Muthén & Muthén, 2012). Briefly, WLSMV is a limited-information approach that relies on both the polychoric correlation matrix and the diagonal elements of the weight matrix for parameter estimation. The model assumes that data are missing completely at random, and imputation cannot be used. Therefore, missing data were subjected to listwise deletion specifically for the CFA, resulting in a final sample of N = 197. WLSMV has been shown to be more appropriate than maximum likelihood with ordinal scales under five points in length and to result in accurate parameter estimation and high convergence rates at the given sample size (Beauducel & Herzberg, 2006; Rhemtulla, Brosseau-Liard, & Savalei, 2012).
While the robustness of the SAS in its current form is of interest, comparing the current factor structure to alternative hypothesized factor structures better addresses instrument parsimony. For this reason, we conducted an additional CFA on a nested model of the SAS. In this baseline model, questions from the first three factors (SWS, NCSS, CS) were loaded onto a unitary Primary Practices factor, while ISS, left intact, served as its own Intensive Support factor. Our rationale for this measurement model was that the SWS, CS, and NCSS factors represent prevention strategies that affect all students. It may be that core practices are common across locations and do not need to be organized by setting. Conversely, the ISS factor measures school practices and capacities that affect students who need additional support at the secondary or tertiary level, separating it from the Primary Practices latent variable. If this hypothesis were correct, the baseline model would demonstrate significantly greater parsimony than the published SAS measurement model. Such differences can be analyzed statistically by applying a χ² difference test.
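Once an appropriately adjusted difference statistic is in hand (with WLSMV, raw subtraction of model chi-squares is not valid; Mplus provides the DIFFTEST procedure for this purpose), the test itself reduces to a chi-square tail probability. A minimal sketch:

```python
from scipy.stats import chi2

def chi_sq_diff_test(chi_diff: float, df_diff: int) -> float:
    """p-value for a chi-square difference statistic with df_diff degrees
    of freedom (the df difference between the nested and full models).
    With WLSMV estimation, chi_diff must come from an adjusted procedure
    such as Mplus DIFFTEST, not raw subtraction of model chi-squares."""
    return chi2.sf(chi_diff, df_diff)

# Illustrative values: a difference statistic of 56.20 on 5 df is far
# beyond the .01 critical value of chi-square(5), about 15.09.
print(chi_sq_diff_test(56.20, 5))
```

A significant result indicates that the less constrained (more heavily parameterized) model fits meaningfully better than the nested baseline.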
Descriptive statistics are presented in Table 2 for total factor scores. Item-by-item analysis showed that data were often modestly skewed, modestly kurtotic, and multivariate non-normal (Mardia's coefficient = 353.72, p < .01). Item skew and kurtosis are presented in the final columns of Table 3. The ISS factor was the most normally distributed variable. Collinearity among factor questions was reviewed and found to be at acceptably low levels.
Internal consistency (Table 4) across factors was in the acceptable range. The NCSS factor had the lowest internal consistency (α = .82), and the ISS and CS factors both had the highest (α = .88).
Tests for the violation of sample homogeneity were not significant. The omnibus test for differences across SAS factors was significant (Wilks's lambda = .86, F(4, 123) = 5.01, p < .01). Follow-up univariate analysis showed that all factors yielded significant differences across SET groups after a family-wise error adjustment of critical t values (t**; Bird & Hadzi-Pavlovic, 2014) for the two-group MANOVA case, with the exception of NCSS (t = 2.37, p = .13), which was nonetheless in the hypothesized direction (Table 5). Effect sizes are reported as point-biserial correlations (R) and can be categorized as medium in size (R = .30; Cohen, 1988).
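For readers wishing to reproduce effect sizes of this kind, a two-group t statistic converts to a point-biserial correlation via r = sqrt(t² / (t² + df)). A minimal sketch (the df value in the usage line is illustrative, not taken from the study):

```python
import math

def point_biserial_from_t(t: float, df: int) -> float:
    """Convert a two-group t statistic and its degrees of freedom into a
    point-biserial correlation effect size: r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t**2 / (t**2 + df))

# Illustrative conversion: t = 3.0 on 100 df
print(round(point_biserial_from_t(3.0, 100), 2))
```

The resulting r is bounded between 0 and 1 and can be interpreted against Cohen's (1988) benchmarks (e.g., .30 as medium).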
The baseline model yielded a relatively good fit (χ²(988) = 1467.30, p < .01; RMSEA = .050, 90% CI [.045, .055]; CFI = .93) based on suggested benchmarks (e.g., Hair, Black, Babin, & Anderson, 2009; Hu & Bentler, 1999; Kline, 2010). Item loadings ranged from R² = .27 to R² = .79, with generally lower loadings on the Primary Practices factor. The final model fit better still, with all fit indices improving marginally (χ²(983) = 1400.83, p < .01; RMSEA = .047, 90% CI [.041, .052]; CFI = .94). This four-factor model, with all latent factors allowed to freely covary, was a significantly better fit than the more parsimonious baseline model (Δχ²(5) = 56.20, p < .01).
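As a check on the reported indices, RMSEA can be recomputed from the model chi-square, degrees of freedom, and sample size. Software differs on whether N or N - 1 appears in the denominator; the sketch below uses N - 1, which reproduces the values reported above for the CFA sample of N = 197.

```python
import math

def rmsea(chi_sq: float, df: int, n: int) -> float:
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    (Some software uses n rather than n - 1 in the denominator.)"""
    return math.sqrt(max(chi_sq - df, 0.0) / (df * (n - 1)))

print(round(rmsea(1467.30, 988, 197), 3))  # baseline two-factor model: ~.050
print(round(rmsea(1400.83, 983, 197), 3))  # published four-factor model: ~.047
```

Note that when the chi-square falls below its degrees of freedom, the estimate is truncated at zero, reflecting fit at least as good as the model's df would predict by chance.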
Individual factor loadings for the final model are presented in Table 3. Loadings ranged widely, from a low R² of .28 (SWS Q12) to a high of .87 (ISS Q2), with all loadings significant. Item loadings from SWS were notably lower than those of other factors, with all but five questions having less than 50% of their variance explained by their respective factor. Modification indices suggested that the overall model fit could be most improved (Δχ² > 10) by allowing five paths to be freed (SWS3 → NCSS, SWS7 → CS, SWS12 → CS, NCSS8 → ISS and SWS, ISS2 → CS). Three of these modifications that have strong theoretical justification are discussed further on. Structural model correlations are presented in Table 6. Correlations were high, particularly among the first three factors.
The purpose of the current study was to investigate the internal consistency, convergent criterion validity, and construct validity of the SAS. Results demonstrated that the SAS has acceptable internal consistency across all factors. Criterion validity was adequate and converged with prior findings. Goodness-of-fit indices suggested that the current factor structure sufficiently represented the covariance among questions. However, these results require replication in independent samples and improvements to the instrument can be made.
The current study adds to the extant literature by supplying estimates of internal consistency across both a moderately large sample and all factors. Internal consistency was high, comparable to that of the SWS factor reported by Hagan-Burke et al. (2005) and greater than that found by Safran (2006), who used a smaller sample. A tentative conclusion is that the instrument is an internally consistent measure for populations similar to our sample. The reliability of the SWS factor, which now has cross-study support, is the most generalizable. Our reliability evidence is limited to internal consistency; study of the measure's test-retest reliability is a logical next step, as has been done with the SET (Vincent et al., 2010).
The SET and the SAS rely on somewhat different sources of information and employ different measurement scales, although they quantify similar constructs. The SET has been found in a previous study to have convergent validity with the SAS (Horner et al., 2004). Results of the current study found a moderate relationship between the two measures; dividing schools by high or low SET scores explained a significant and moderate amount of variance in SAS scores, with the exception of NCSS. These results suggest the two measures are related, but may not be measuring identical constructs (i.e., implementation fidelity versus perception of fidelity). Similarly, Horner et al. (2004) reported a high correlation (R = .75) between SWS scores and average SET scores. The results of both studies suggest the SAS has convergent validity with the SET and is measuring implementation fidelity; however, a narrower estimate of this relationship that controls for nesting--a limitation in both studies--remains unknown and requires further exploration.
Fit indices for the SAS in its current form were in the acceptable range, suggesting that the instrument has adequate construct validity. The model was also superior to a hypothesized baseline model. However, there are possibilities for improvement. For example, modification indices suggested that NCSS Q8, "Status of student behavior and management practices are evaluated quarterly from data," also loaded onto SWS and ISS. In other words, participants responded similarly to this item and items under the SWS, ISS, and NCSS factors. This item had a moderate loading with NCSS (R² = .48). The repositioning of this item has theoretical justification. The question has no specific bearing on nonclassroom settings and would be applicable to all school settings. Furthermore, it is possible that teachers misconstrue the question as inquiring about individual students, not collapsed data used to develop primary-level solutions. The question also appears similar to SWS Q11, "Data on problem behavior patterns are collected and summarized within an on-going system," and SWS Q12, "Patterns of student problem behavior are reported to teams and faculty for active decision-making on a regular basis (e.g., monthly)."
A second example is ISS Q2, "A simple process exists for teachers to request assistance." Modification indices suggested that this question also loaded onto CS. The repositioning of this question or possible double loading of this question makes theoretical sense; a healthy school system would support access to consultation for faculty members in any setting for problems of different severities. A third suggestion comes from SWS Q7, "Options exist to allow classroom instruction to continue when problem behavior occurs." The question also loaded onto CS, which is unsurprising given its narrow focus, which would not be applicable to other environments.
Authors of the SAS may also want to remove questions or modify questions with low loadings. There is no widely accepted definition of "low"--recommendations vary and must be contextually based--but two frequently cited texts suggest a cutoff between R² = .25 and R² = .30 (Comrey & Lee, 1992; Hair et al., 2009). One example is SWS12, "Patterns of student problem behavior are reported to teams and faculty for active decision-making on a regular basis (e.g., monthly)." Only 28% of item variance was explained by the SWS factor. The question could be removed or made more specific, perhaps worded as, "School-wide data regarding student problem behavior are reported to decision-making teams at regular intervals." Adding the term school-wide reminds the reader that the question is asking about a specific type of setting. A second example is ISS Q5, "Local resources are used to conduct functional assessment-based behavior support planning (~10 hrs/week/student)." The question had the lowest loading for its respective factor (R² = .43), and some may find it difficult to interpret which specific "local resources" the question is addressing.
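A screening step like the one described, flagging items whose factor explains less than a chosen share of their variance, is straightforward to script. The loadings below are a subset taken from the study's Table 3; the cutoff uses the upper end of the commonly cited .25-.30 range.

```python
# Squared loadings (R^2) for a subset of SAS items, drawn from Table 3
loadings = {
    "SWS5": 0.30, "SWS12": 0.28, "SWS18": 0.32,
    "NCSS8": 0.42, "CS1": 0.78, "ISS2": 0.87,
}

CUTOFF = 0.30  # upper bound of the .25-.30 range (Comrey & Lee, 1992; Hair et al., 2009)

# Flag items whose factor explains less than CUTOFF of their variance
flagged = sorted(item for item, r2 in loadings.items() if r2 < CUTOFF)
print(flagged)
```

With these values, only SWS12 falls below the cutoff, matching the example singled out in the text; a lower cutoff of .25 would flag nothing in this subset.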
Authors of the survey may choose to move these questions onto better-fitting factors, reword them, remove them, or allow them to contribute to multiple factors. The effects of such decisions would not be known until the modified survey is reevaluated. While individual items fall under visible factor headings, it may be that faculty and staff do not remember that they are addressing questions specifically asking about school-wide, classroom, nonclassroom, or individual student supports when responding to individual items. The high structural correlations (Table 6) and practically small fit differences between the nested models support this possibility. These correlations, which reflect the degree to which each latent factor is related to the others, were higher than is typically desired (e.g., the correlation between SWS and CS was .88). Latent factors may be related, but they should not share a very high degree of variance (i.e., multicollinearity), because this indicates that they may be measuring identical constructs and the model is misspecified (Kline, 2010).
Limitations and Future Directions
Several limitations of this study are important to expand on. Foremost, in consideration of the complexity of our model and the chosen CFA estimation method, the sample size was viable but small (Beauducel & Herzberg, 2006; Hair et al., 2009; Rhemtulla et al., 2012). Small sample sizes can result in misspecification and inaccurately estimated parameters. Cross-validation is necessary with independent samples, and results of the current CFA are tentative. Similarly, a full range of possible SAS and SET scores was not available for analysis. As a result, our conclusions are least generalizable to schools at the beginning stages of PBIS implementation with low fidelity, as this subsample was underrepresented. It is hoped that the SAS's evolution to an online format provides opportunities for these research extensions. Optimally, a robust model of the SAS would have the power to account for both the ordinal scale and the nested nature of the data (e.g., teachers nested within schools).
The estimate of criterion validity compressed the SET's percentage scale to a binary categorization, albeit a theoretically justified one based on the SET manual. Naturally this ignores much of the useful variance in SET scores. Significant relationships were found, but they may have been underestimated.
Finally, it is worth noting that the current study only explored a small slice of the larger construct of test reliability and validity. The instrument's reliability could be better understood by calculating test-retest reliability. More research on the criterion validity of the SAS is also needed, which could involve comparison both to other PBIS fidelity measures and, more interestingly, to student-level outcomes.
The SAS is a long-standing instrument for PBIS problem solving. However, research thus far on the instrument has been limited. Current results suggest that the SAS is both reliable and valid enough to be used in practice for the purpose of measuring faculty and staff perceptions of PBIS fidelity. Fit indices suggest that there is room for improvement and the authors of the survey may want to consider a future revision, taking into account the reported factor loadings, structural correlations, and modification indices.
While the SAS yielded a significant relationship with the SET in this study and in Horner et al. (2004), the variance explained in either study was not high enough to recommend use of the SAS as a replacement for other fidelity measures. Rather it should serve a supplementary role by quantifying perceptions of the community and highlighting in which areas willingness to change is greatest. It is for this reason that the "Priority" scale is important from a treatment-utility perspective, even though it was not the focus of the current study.
Beauducel, A., & Herzberg, P. Y. (2006). Maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling, 13(2), 186-203. http://dx.doi.org/10.1207/s15328007sem1302_2
Bird, K. D., & Hadzi-Pavlovic, D. (2014). Controlling the maximum familywise Type I error rate in analyses of multivariate experiments. Psychological Methods, 19(2), 265-280. http://dx.doi.org/10.1037/a0033806 Medline:24079933
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Detrich, R., Keyworth, R., & States, J. (2008). A roadmap to evidence-based education: Building an evidence-based culture. In R. Detrich, R. Keyworth, & J. States (Eds.), Advances in evidence-based education (Vol. 1, pp. 3-25). Oakland, CA: The Wing Institute.
Educational and Community Supports. (2013). PBIS assessment surveys. Retrieved from https://www.pbisapps.org/Applications/Pages/PBIS-Assessment-Surveys.aspx
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9(4), 466-491. http://dx.doi.org/10.1037/1082-989X.9.4.466 Medline:15598100
Hagan-Burke, S., Burke, M., Martin, E., Boone, R., & Kirkendoll, D. (2005). The internal consistency of the school-wide subscales of the Effective Behavioral Supports Survey. Education & Treatment of Children, 28(4), 203-213.
Hair, J. R., Black, W. C., Babin, B. J., & Anderson, R. E. (2009). Multivariate data analysis (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Horner, R. H. (1990). Toward a technology of "nonaversive" behavioral support. Journal of the Association for Persons with Severe Handicaps, 15(3), 125-132.
Horner, R. H., Sugai, G., & Anderson, C. M. (2010). Examining the evidence base for school-wide positive behavior support. Focus on Exceptional Children, 42, 1-14.
Horner, R. H., Sugai, G., Smolkowski, K., Eber, L., Nakasato, J., Todd, A. W., & Esperanza, J. (2009). A randomized, wait-list controlled effectiveness trial assessing school-wide positive behavior support in elementary schools. Journal of Positive Behavior Interventions, 11(3), 133-144. http://dx.doi.org/10.1177/1098300709332067
Horner, R. H., Todd, A. W., Lewis-Palmer, T., Irvin, L. K., Sugai, G., & Boland, J. B. (2004). The School-wide Evaluation Tool (SET): A research instrument for assessing school-wide positive behavior support. Journal of Positive Behavior Interventions, 6(1), 3-12. http://dx.doi.org/10.1177/10983007040060010201
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1-55. http:// dx.doi.org/10.1080/10705519909540118
Kincaid, D., Childs, D., & George, H. (2010). School-wide Benchmarks of Quality (Revised). Unpublished instrument. Tampa, FL: University of South Florida.
Kline, R. B. (2010). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford.
Muthén, L. K., & Muthén, B. O. (2012). Mplus user's guide (7th ed.). Los Angeles, CA: Muthén & Muthén.
Office of Special Education Technical Assistance Center on Positive Behavioral Interventions and Supports. (2013). Evaluation tools. Retrieved from http://www.pbis.org/evaluation/evaluation_tools.aspx
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17(3), 354-373. http://dx.doi.org/10.1037/a0029315 Medline:22799625
Safran, S. P. (2006). Using the Effective Behavior Supports Survey to guide development of schoolwide positive behavior support. Journal of Positive Behavior Interventions, 8(1), 3-9. http://dx.doi.org/10.1177/10983007060080010201
Sugai, G., Horner, R., Lewis-Palmer, T., & Rosetto Dickey, C. (2012). Team Implementation Checklist (3.1). Eugene, OR: University of Oregon.
Sugai, G., Horner, R. H., Algozzine, R., Barrett, S., Lewis, T., Anderson, C., & Simonsen, B. (2010). School-wide positive behavior support: Implementers' blueprint and self-assessment. Eugene, OR: University of Oregon.
Sugai, G., Horner, R. H., & Todd, A. W. (2003). EBS Self-Assessment Survey (2.0). Eugene, OR: Educational and Community Supports.
Sugai, G., Lewis-Palmer, T., Todd, A. W., & Horner, R. H. (2005). School-wide Evaluation Tool (2.1). Eugene, OR: Educational and Community Supports.
Vincent, C., Spaulding, S., & Tobin, T. J. (2010). A reexamination of the psychometric properties of the School-wide Evaluation Tool (SET). Journal of Positive Behavior Interventions, 12(3), 161-179. http://dx.doi.org/10.1177/1098300709332345
Walker, H. M., Horner, R. H., Sugai, G., Bullis, M., Sprague, J. R., Bricker, D., & Kaufman, M. J. (1996). Integrated approaches to preventing antisocial behavior patterns among school-age children and youth. Journal of Emotional and Behavioral Disorders, 4(4), 194-209. http://dx.doi.org/10.1177/106342669600400401
Benjamin G. Solomon, Ph.D.
Oklahoma State University
Kevin G. Tobin, Ph.D.
Pittsfield Public Schools
Gregory M. Schutte, M.S.
Oklahoma State University
Address correspondence to Benjamin Solomon, Ph.D., School Psychology Program, School of Applied Health and Educational Psychology, Oklahoma State University, 443 Willard Hall, Stillwater, OK 74078, USA; email: Benjamin.email@example.com
Table 1
General Characteristics of Participating Schools

School   Type   FRL(a)   Implementation Years   Participants(b)   SET
1        K-5    56%      6                      19                61%
2        K-5    33%      6                      25                84%
3        K-5    76%      6                      37                75%
4        K-5    88%      6                      34                87%
5        K-5    50%      6                      18                84%
6        K-5    46%      6                      29                97%
7        6-8    50%      6                      20                83%
8        6-8    65%      13                     10                97%
9        9-12   43%      6                      70                --
10       9-12   49%      6                      29                --

(a) FRL = free and reduced lunch status. (b) Participants does not include unidentified or doubly identified (e.g., employed in more than one school) faculty and staff.
Note: School 8 was the original pilot site for PBIS implementation, whereas all other schools implemented PBIS at a later date.

Table 2
Descriptive Statistics for SAS Sample

Factor   Mean    SD     Range   SEM    Skew   Kurtosis
SWS      43.97   6.52   23-54   2.26   -.48   -.62
NCSS     21.66   3.83   10-27   1.62   -.55   -.30
CS       28.16   3.98   17-33   1.43   -.59   -.75
ISS      17.88   4.16   8-24    1.44   -.27   -.48

SD = standard deviation; SEM = standard error of measurement.

Table 3
Factor Loadings for Final Model

Factor   Item   β      R²    Error   Skew    Kurtosis
SWS      1      1.00   .59   --      -2.24   4.09
SWS      2      .96    .54   .09     -1.42   1.13
SWS      3      .90    .48   .09     -1.00   -.07
SWS      4      .85    .43   .09     -.76    -.43
SWS      5      .71    .30   .09     -.43    -1.02
SWS      6      .91    .49   .10     -.32    -.97
SWS      7      .97    .56   .10     -.68    -.51
SWS      8      .94    .52   .12     -2.20   4.30
SWS      9      .94    .52   .10     -1.72   2.12
SWS      10     .86    .43   .10     -2.21   4.24
SWS      11     .79    .37   .09     -.47    -1.05
SWS      12     .69    .28   .10     .12     -1.32
SWS      13     .89    .46   .09     -.98    -.09
SWS      14     .82    .40   .10     .16     -1.40
SWS      15     .84    .42   .09     .02     -1.38
SWS      16     .88    .46   .10     -.57    -.89
SWS      17     .84    .42   .10     -.19    -1.06
SWS      18     .73    .32   .11     -.87    -.51
NCSS     1      1.00   .47   --      -1.46   .86
NCSS     2      1.24   .73   .12     -1.14   -.05
NCSS     3      1.06   .54   .13     -1.05   -.07
NCSS     4      1.11   .59   .11     -.75    -.98
NCSS     5      .89    .37   .12     -.71    -.79
NCSS     6      .94    .42   .12     -1.07   -.08
NCSS     7      1.17   .65   .13     .15     -1.22
NCSS     8      .94    .42   .13     .33     -1.23
NCSS     9      .93    .41   .11     -.94    -.33
CS       1      1.00   .78   --      -1.70   1.22
CS       2      .72    .40   .07     -1.12   .05
CS       3      .95    .70   .07     -1.59   .95
CS       4      .76    .45   .08     -.91    -.33
CS       5      .84    .55   .06     -.60    -.68
CS       6      .94    .69   .05     -.94    -.30
CS       7      .94    .68   .06     -.81    -.36
CS       8      .72    .40   .08     -1.00   -.21
CS       9      .74    .43   .08     -.35    -.80
CS       10     .82    .52   .07     -.58    -.64
CS       11     .77    .46   .07     -.52    -.81
ISS      1      1.00   .53   --      -.43    -1.06
ISS      2      1.28   .87   .11     -1.16   -.18
ISS      3      1.09   .63   .09     -.34    -1.27
ISS      4      1.05   .58   .09     -.96    -.42
ISS      5      .89    .42   .10     .14     -1.37
ISS      6      1.03   .56   .09     -.76    -.50
ISS      7      .99    .52   .10     .26     -1.13
ISS      8      1.22   .79   .08     -.24    -.98

Table 4
Internal Consistency of SAS Factors

Factor   N    α
SWS      18   .88
NCSS     9    .82
CS       11   .86
ISS      8    .88

Table 5
Between-Groups Analysis of High Versus Low SET Groups

Factor   t(df)        t**    p      r
SWS      8.33(121)    2.17   .01    .25
NCSS     2.37(117)    2.16   .13    --
CS       15.72(99)    2.17   <.01   .34
ISS      9.98(84)     2.17   <.01   .28

Note: t** = adjusted critical values.

Table 6
Correlation Matrix of Latent Factors

        SWS   NCSS   CS    ISS
SWS     --
NCSS    .83   --
CS      .88   .81    --
ISS     .80   .69    .69   --
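The coefficient alpha values in Table 4 follow the standard Cronbach's alpha formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of the summed scale). As a minimal illustrative sketch (not the authors' analysis code; the function name and use of NumPy are our assumptions), the computation for one SAS factor could look like:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) matrix of item scores.

    Illustrative implementation of the standard formula:
    alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total score)).
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items in the factor
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # sample variance of summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical example: three respondents rating a two-item factor
scores = np.array([[1.0, 2.0],
                   [2.0, 1.0],
                   [3.0, 3.0]])
print(round(cronbach_alpha(scores), 4))
```

Applied to the full item-level response matrices for each factor (18 items for SWS, 9 for NCSS, 11 for CS, 8 for ISS), this formula would yield internal-consistency estimates of the kind reported in Table 4.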
Authors: Solomon, Benjamin G.; Tobin, Kevin G.; Schutte, Gregory M.
Publication: Education & Treatment of Children
Date: May 1, 2015