Introduction to regression using NBA statistics.


Abstract

This paper presents an activity that was used to introduce concepts related to the simple linear regression model using data from the National Basketball Association (NBA). Using SPSS to facilitate student understanding and interpretation of statistics concepts, this particular classroom example also illustrates potential problems that can arise when manipulating real-life data. Teaching activities of this sort might help students to begin to make the connection between learning in the classroom and applying the methods out in the real world.

Introduction

The teaching of introductory statistics courses and concepts can be a challenge. Many teachers not only want their students to attain an understanding of basic statistics concepts but they also would like to demonstrate to students the practical applications of the methods in the real world (Gal, Ginsburg, & Schau, 1997; Doerr & English, 2003; Groth & Powell, 2004).

Selecting an appropriate research scenario using real data that illustrates the concepts taught in an introductory class can also be a concern for teachers (Hill & Ball, 2004; Groth & Powell, 2004; Franklin, 2000). Although real data can illustrate practical applications of the methods, they almost never seem to follow the theory that students learn in the classroom. Nevertheless, there are introductory statistics concepts that can help provide an introduction to interesting applications of understanding relationships in the real world. In addition, when deviations from theory arise, they may stimulate other worthwhile discussions and enhance understanding of related statistics concepts.

The purpose of this paper is to present an activity that was used to introduce ideas related to the simple linear regression model as well as to illustrate potential problems that can arise when manipulating 'real-life' data. Using the Statistical Product and Service Solutions (SPSS) software (or Excel) and data from the National Basketball Association (NBA) website, the following classroom activity might be helpful for students learning introductory statistics concepts. This activity is also appropriate and might be used by 6-12 grade teachers as an application of the components of the Data Analysis and Probability Standard (NCTM, 2000). In addition, other important teaching objectives can also be emphasized, such as how to interpret scatterplots and correlation, evaluating the tenability of assumptions, writing null and alternative hypotheses associated with hypothesis tests, and reporting and interpreting confidence intervals and p-values.

Classroom Activity

Most students are familiar with many of the teams affiliated with the NBA--most will even have a favorite. Near the end of a class period is a good time to allow students to log on to the internet to collect data for the next class. At [1], students can access individual player statistics for their favorite team such as average points per game, rebounds, fouls, steals, turnovers, etc.

After a class lecture on simple linear regression, students were asked to collect data on two variables from the NBA website that might be linearly related. During the next class, we used SPSS to analyze the data, interpreted related concepts, and evaluated assumptions before interpreting statistics and making inferences. Below, we offer one example, using data collected on the Atlanta Hawks' 2002-03 season, that illustrates our challenge using 'real data'.

Research Question

'Boo' has been an Atlanta Hawks fan all of her life. Her team finished in 11th place (out of 15) in the Eastern Conference regular season standings. When considering her two variables, Boo offered that new head coach Terry Stotts might also be interested in her 'study' because studying the team statistics from last season might help the team to improve for the upcoming season. She also added that in the past few seasons, the Hawks have struggled with improving on many of the fundamentals of the game, including grabbing rebounds and cutting down on turnovers. Therefore, Boo wanted to know if there was a linear relationship between a player's rebounds per game and the number of points scored [Table 1]. To see all mentioned tables/figures, visit issue website at http://rapidintellect.com/AEQweb/fal2005.htm

After entering the data into SPSS, a user-friendly software program that can generate simple statistics similar to Excel, a scatterplot of the data revealed a positive and moderately strong linear relationship between the two variables [Figure 1]. The Pearson correlation of .78 confirmed this relationship; that is, there is an overall tendency for players who grab more rebounds to score more points [Table 2]. The simple linear regression equation relating these two variables was also generated, but the assumptions for making inferences needed to be checked first. The students, eager to crunch numbers right away and determine if there was statistical significance, presumed that the assumptions would be a simple technicality. However, we found that this was not the case. Below, we use Boo's example to illustrate how we dealt with the violations. We also discussed and reinforced other concepts related to regression analysis, including interpreting and understanding scatterplots, correlation, hypothesis testing, assumptions, confidence intervals and p-values. We concluded the activity by discussing cause and effect, another important limitation of the linear relationship or association between two variables.
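
The correlation step can also be reproduced outside SPSS. The following is a minimal Python sketch, not the SPSS procedure used in class. The rebound and scoring values are hypothetical placeholders (Table 1 is not reproduced here), and the function name pearson_r is an illustrative choice.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-game averages, NOT the Hawks data from Table 1
avg_rebs = [1.0, 2.0, 3.0, 4.5, 6.0, 8.0]
ppg = [2.5, 4.0, 7.5, 9.0, 14.0, 18.5]

r = pearson_r(avg_rebs, ppg)
print(f"r = {r:.2f}")
```

A value of r near +1 corresponds to points that rise closely along a line in the scatterplot, which is why the plot and the coefficient are read together.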

[FIGURE 1 OMITTED]

Assumptions

Using Boo's data, we generated the output and examined the assumptions using SPSS and an overhead projection while the rest of the class watched and asked questions when needed. Although the simple linear regression model relating points per game (PPG) and average rebounds (AVGREBS) revealed the predicted equation of 2.086x + .056 [Table 3], an evaluation of the assumptions indicated a possible violation. First, the assumption of normality for the distribution of the errors appeared to be maintained when examining the standardized residual plot [Figure 2]. Almost all of the standardized residuals were within 2 standard deviations of the mean, despite the small sample (there was only one standardized residual beyond +2 standard deviations of the mean, that of Jason Terry) [Table 4]. Therefore, we concluded that the normality assumption appeared to be met.
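
The residual check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS output: the data are hypothetical, the helper names fit_line and standardized_residuals are invented for this example, and a standardized residual is taken here to be the raw residual divided by the residual standard error, sqrt(SSE / (n - 2)).

```python
import math

def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def standardized_residuals(x, y):
    """Residuals divided by the residual standard error, sqrt(SSE / (n - 2))."""
    slope, intercept = fit_line(x, y)
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))
    return [e / s for e in residuals]

# Hypothetical data, NOT the Hawks values from Table 1
avg_rebs = [1.0, 2.0, 3.0, 4.5, 6.0, 8.0, 2.5, 5.0]
ppg = [2.5, 4.0, 7.5, 9.0, 14.0, 18.5, 11.0, 10.0]

z = standardized_residuals(avg_rebs, ppg)
# Indices of observations more than 2 standard deviations from zero
flagged = [i for i, v in enumerate(z) if abs(v) > 2]
print("standardized residuals:", [round(v, 2) for v in z])
print("observations beyond 2 SD:", flagged)
```

Plotting z against avg_rebs would give the kind of residual plot used in class to judge both normality and constant variance.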

[FIGURE 2 OMITTED]

On the other hand, there was some evidence to indicate that the assumption of constant variance across the values of AVGREBS was not maintained. The plot revealed that the variances increased from left to right [Figure 2]. Thus, we were not justified in making any inferences from our model due to the violation of this assumption. We then discussed how to resolve the violation by transforming the dependent variable to a different scale by considering either the logarithm or inverse of PPG (Anderson, Sweeney, & Williams, 1994). At this point during the activity, we took a few minutes to review how to make these transformations using our calculators. For Boo's data, each student found the appropriate transformation for each function, followed by a confirmation using SPSS (the transformations can be easily done in SPSS or Excel). We ultimately discovered, after discussing the shapes of the two plots for each function, that applying the natural logarithm to PPG satisfied the requirements for making inferences back to the population. In fact, an inspection of the new plot revealed that the LOGEPPG scores not only provided a better normal approximation but also corrected the wedge-shaped pattern of the variances for the residual plot [Figure 3]. It was explained that the residuals now appeared to have more of an overall even scatter about the line from end to end (i.e., rectangular pattern). One student also noted that we lost observation 20 (Paul Shirley) [Table 1] [Table 5] because the natural log is not defined for x less than or equal to 0. In addition, it was also pointed out by the instructor that one observation (Brandon Williams) appeared to be somewhat different from the rest of the group, which reinforced the idea about how one person (i.e., an outlier) might potentially affect results and interpretations.
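
The loss of an observation under the log transformation can be shown with a short Python sketch. The player labels and PPG values below are hypothetical and only illustrate the mechanism: any nonpositive value has no natural logarithm, so it must be set aside before the model is refit.

```python
import math

# Hypothetical (player, PPG) pairs, NOT the values from Table 1
players = [("Player A", 17.2), ("Player B", 8.4), ("Player C", 0.0), ("Player D", 2.1)]

kept, dropped = [], []
for name, ppg in players:
    if ppg > 0:
        kept.append((name, math.log(ppg)))  # natural log, base e
    else:
        dropped.append(name)                # ln is undefined for values <= 0

print("LOGEPPG:", [(n, round(v, 3)) for n, v in kept])
print("dropped:", dropped)
```

Because the sample shrinks by one for each dropped case, the degrees of freedom of the refitted model shrink as well.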

[FIGURE 3 OMITTED]

One student questioned whether it was even important to 'do' (evaluate) these assumptions, as 'we really just want to know whether there is a relationship between points and rebounds or not'. Another student responded by reminding the class that 'a different regression model will change the relationships'. With those thoughts in mind, we proceeded to estimate and interpret the model, which also included emphasizing other important concepts related to regression, such as writing hypotheses, testing for statistical significance, interpreting the confidence interval, and prediction. We finished our activity by considering a follow-up research question, offering our advice to the coaches, and talking about the limitations of our inferences.

Interpretation of the Model

We generated the output for the estimated regression equation for the natural logarithm of PPG, which indicated .343x + .344 [Table 6]. Boo's interpretation of the slope revealed that 'a player's average points per game will increase by 1.40 points for every rebound he obtains', on average. Next, we talked about whether the relationship was statistically significant by considering hypothesis testing. The null and alternative hypotheses about the relationship between rebounds and points scored in the population were written as

Null hypothesis: The population slope Beta equals 0 versus the

Alternative hypothesis: The population slope Beta does not equal 0.

The F-statistic from the SPSS ANOVA summary table revealed that the relationship was in fact statistically significant (F(1,17) equals 20.5, with p equal to .0002) at the .05 level of significance. Alternatively, we also discussed that the t-statistic can be used to test the same relationship (t(17) equals 4.5, with p equal to .0002). This was also an appropriate moment to solidify the interpretation of the p-value, a concept that tends to remain fuzzy even for the most advanced learner.
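
The two tests agree because, in simple linear regression with a single predictor, the F statistic is the square of the t statistic for the slope. A quick arithmetic check in Python, using only the rounded values reported above (so small rounding differences are expected):

```python
import math

F = 20.5  # F(1, 17) as reported in the text
t = 4.5   # t(17) as reported in the text

t_from_F = math.sqrt(F)
print(f"sqrt(F) = {t_from_F:.3f}")  # compare with the reported t
print(f"t**2    = {t ** 2:.2f}")    # compare with the reported F
```

The square root of the reported F rounds to the reported t, which is why both statistics carry the same p-value.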

After receiving no volunteers on how to interpret the p-value (although many students were able to state the decision to reject because the p-value is less than the alpha of .05), the instructor offered this interpretation: the probability that we would have obtained a result at least as extreme as this, if in fact there was no relationship between points per game and rebounds, is very small. Therefore, we all agreed that we would reject the null hypothesis that there is no relationship between rebounds and number of points scored in the population. Instead, there is sufficient evidence that these two variables are linearly related, with a positive and somewhat strong relationship. Furthermore, the 95% confidence interval, which was also included on the output, provided another means to demonstrate how interval estimates can be used with real data: We can be 95% confident that a player's average points per game will increase on average somewhere between 1.20 and 1.65 points for every rebound grabbed. Finally, the students were able to use the model for prediction (only over the range of the x-values) and estimated that a player who averages 7 rebounds per game might average 15.5 points per game.
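
Because the model was fit to the natural logarithm of PPG, a prediction has to be exponentiated to return to points per game. The sketch below uses only the coefficients reported in the text (.343 and .344); the function name predict_ppg is an illustrative choice. It also exponentiates the slope, which comes out close to the 1.40 figure quoted earlier. That closeness suggests, as an inference not stated in the source, that the reported slope and interval are back-transformed values. On the original scale, an exponentiated slope is usually read as a multiplicative factor per additional rebound rather than as a fixed number of points.

```python
import math

slope, intercept = 0.343, 0.344  # fitted on the ln(PPG) scale, from the text

def predict_ppg(rebounds):
    """Predict PPG by exponentiating the fitted ln(PPG) value."""
    return math.exp(slope * rebounds + intercept)

print(f"predicted PPG at 7 rebounds: {predict_ppg(7):.2f}")
print(f"exp(slope): {math.exp(slope):.3f}")
# Logs of the reported interval endpoints (1.20 and 1.65)
print(f"ln(1.20) = {math.log(1.20):.3f}, ln(1.65) = {math.log(1.65):.3f}")
```

The logs of the two interval endpoints bracket the fitted slope of .343, which is consistent with the interval having been computed on the log scale and then exponentiated.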

In reviewing other statistics presented on the website, we also noticed that a player's average rebounds per game were divided into two distinct types: offensive and defensive rebounds. Therefore, we considered another quick interesting question: Does obtaining rebounds in the opponent's court (defensive rebounds) have a stronger relationship with scoring, or is hustling to obtain a rebound after a missed field goal in the Hawks' court (offensive rebounds) perhaps more important? The class was given a short time to work in groups of 4 on this question, generate the output, and make a decision about which variable might be the better predictor. The results of the simple linear regression for each variable with average points per game indicated that both were important and statistically significant (defensive rebounds: t(18) equals 6.566, with p equal to .0001; offensive rebounds: t(18) equals 2.586, with p equal to .019), assuming all assumptions were met. In addition, the correlation matrix of the three variables indicated that the number of defensive rebounds a player snags was a better predictor for scoring points (r equals .84) than offensive rebounds (r equals .52).
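
The reported t statistics and correlations can be cross-checked, since in simple linear regression the slope's t statistic follows directly from r and the sample size: t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below assumes n = 20, which is implied by the reported 18 degrees of freedom; the function name t_from_r is an illustrative choice.

```python
import math

def t_from_r(r, n):
    """t statistic for testing a zero slope, from correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 20  # assumed from the reported degrees of freedom, t(18)

t_def = t_from_r(0.84, n)  # compare with the reported 6.566 (defensive)
t_off = t_from_r(0.52, n)  # compare with the reported 2.586 (offensive)
print(f"defensive rebounds: t = {t_def:.2f}")
print(f"offensive rebounds: t = {t_off:.2f}")
```

Both values agree with the reported statistics to within the rounding of r, with .84 pairing with the defensive-rebound test and .52 with the offensive-rebound test.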

Finally, we discussed why we might have obtained the results we did and offered the following advice to the Hawks' coaches: The fact that defensive rebounds are important is no surprise; however, any athlete knows that obtaining offensive rebounds provides a better (and second) chance for a bucket. Perhaps this is where the Hawks need to improve. Therefore, one might speculate from our results that the Hawks are lacking in this area and should focus more on obtaining offensive and defensive rebounds in order to improve their winning ways!

We concluded our activities by re-emphasizing the limitations of our inferences about two variables. That is, a strong association between two variables is not adequate to make conclusions about cause and effect. Furthermore, other variables almost certainly impact a player's average points per game (i.e., spurious correlations), including the variables we selected (one student reported that one of the Hawks' star players was sidelined because of an injury; therefore, 'playing time' is also a factor). We also discussed the possibilities of interpreting (or not) the intercept; the effects of outliers and missing data were other interesting topics that students experienced while working with their data.

Final Thoughts

Interesting data sets available on the internet can be used in the classroom not only to supplement instruction but also to motivate students to learn how methods are used in practice (students could also be required to collect data from websites outside of class time to allow more time for analyses and discussions in class). A number of other websites can also be considered to illustrate methods in practice that might interest students. Sports data can be collected from websites such as the NFL [2] and NHL [3], or interesting data might even be available in the athletic department from previous seasons at their school, college or university. Another motivating website that lists statistics about music, movies, and artists on a weekly basis is the Billboard website at [4]. As a project, students might be asked to collect data for their favorite team or band and follow the analysis through from posing research questions, evaluating the tenability of assumptions, and hypothesis testing to making decisions and conclusions within the context of the scenario.

The purpose of this paper was to present a real-life example used in the classroom that illustrates introductory ideas about the simple linear regression model. Elementary and secondary teachers could also consider similar learning activities as a hands-on application of the Data Analysis and Probability Standard (NCTM, 2000). Using NBA statistics, concepts related to the regression model, such as scatterplots, correlation, hypothesis testing, assumptions, confidence intervals, and p-values were also discussed and illustrated within the research context. In our case, using real data created challenges (i.e., violation of assumptions) that students might not otherwise have encountered using a textbook.

As a teacher of introductory statistics, I was able to achieve many objectives in my classroom that day, by also incorporating how to apply and interpret other related concepts. At the same time, I also provided a fun activity for the students using data that students find interesting as well as integrating technology and the use of the internet into instruction. Hopefully, teaching activities of this sort can help students to begin to make the connection between learning in the classroom and applying the methods out in the real world.

References

Anderson, D. R., Sweeney, D. J., & Williams, T. A. (1994). Introduction to statistics: Concepts and applications (3rd ed.). West Publishing Company: Minneapolis/St. Paul.

Doerr, H. M., & English, L. D. (2003). A modeling perspective on students' mathematical reasoning about data. Journal for Research in Mathematics Education, 34(2), 110-136.

Franklin, C. (2000, October). Are our teachers prepared to provide instruction in statistics at the k-12 levels? Dialogues, 10. Retrieved October 10, 2002 from [5]

Gal, I., Ginsburg, L., & Schau, C. (1997). Monitoring attitudes and beliefs in statistics education. In I. Gal & J.B. Garfield (Eds.), The assessment challenge in statistics education (pp. 37-51). Netherlands: IOS Press.

Groth, R. E., & Powell, N.N. (2004). Using research projects to help develop high school students' statistical thinking. Mathematics Teacher, 97(2), 106-109.

Hill, H. C., & Ball, D. L. (2004). Learning mathematics for teaching: Results from California's mathematics professional development institutes. Journal for Research in Mathematics Education, 35(5), 330-351.

National Council of Teachers of Mathematics (2000). Principles and standards for school mathematics. NCTM: Reston, VA.

Endnotes

[1] www.nba.com

[2] www.NFL.com

[3] www.NHL.com

[4] www.billboard.com

[5] http://www.nctm.org/dialogues/2000-10/areyour.htm

Jamie D. Mills, University of Alabama

Jamie Mills, Ph.D., is an Assistant Professor who teaches hybrid statistical methods courses in the College of Education.
COPYRIGHT 2005 Rapid Intellect Group, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2005, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation: National Basketball Association
Author: Mills, Jamie D.
Publication: Academic Exchange Quarterly
Geographic Code: 1USA
Date: Sep 22, 2005
Words: 2770