# Interactive tools for visualizing abstract statistical concepts.

INTRODUCTION

Many concepts in statistics are abstract and hard to present in an easily understandable manner. It is even more challenging to teach those concepts to business students, because they must not only learn the theory but also be able to apply the statistical techniques in their field of study or work (Tsai & Wardell, 2006). If one continues to teach business statistics in the traditional manner, students will remain passive participants in the education process, with their activities restricted to listening, taking notes, and solving problems and examples from a textbook. Alternatively, one can follow a teaching method that encourages more active student involvement. There are several ways to encourage students to leave their comfort zones and become more proactive. Back in the mid-1990s, Cobb (1994) noted that instructors had started using computer simulations more frequently to demonstrate important statistical concepts and to allow students to discover those concepts themselves. There are two commonly used environments for developing statistical simulations: interactive technologies such as Java and Flash (Lane & Peres, 2006), and Microsoft Excel (Viali, 2002; Tsai & Wardell, 2006). The latter is becoming increasingly popular because of its familiarity, its availability, its large number of statistical and financial functions, and the ability to extend its functionality with VBA, Visual Basic for Applications (Albright, 2011).

The research question addressed in this study is the following: is it possible to effectively visualize abstract statistical concepts with user-friendly, Excel-based interactive tools? The study was motivated by the fact that most of the tools developed for visualizing statistical concepts are based on Java or Flash. There is nothing wrong with using Java applets or Flash; in fact, they make it possible to create very attractive visual effects and animations. Occasionally, however, users encounter problems viewing them on the Internet: a link does not work, the browser does not support Flash, and so on. The tools presented in this study are created in Excel, which is a very familiar environment for many users, especially business students. The entire functionality of the tools is programmed in VBA and is hidden from the user. The tools can be used successfully by anyone with little or no experience in Excel.

The rest of this study is organized as follows. Section 2 provides a general overview of relevant research on interactive teaching tools. Section 3 presents the tool for visualizing basic types of normal probabilities. Section 4 contains a detailed description of the functionality of the tool for visualizing the Central Limit Theorem (CLT). Section 5 summarizes the basic features of the tools and compares them with similar tools presented in previous studies. The paper concludes with the list of research articles used as references.

LITERATURE REVIEW

Interactive systems have been used in statistics since the early 1970s. Makridakis, Hodgsdon, and Wheelwright (1974) developed the first interactive forecasting system using time-sharing computer configurations. The system allowed the user to conduct preliminary data analysis, to identify the forecasting technique(s) most appropriate for the task at hand, and to build the forecasts. Advances in information technology have since provided better platforms for developing such interactive tools. Lane and Peres (2006) analyzed the promise and pitfalls of using interactive simulations in teaching statistics. They provided a concise review of the various simulations developed in the Rice Virtual Lab in Statistics (RVLS) (http://onlinestatbook.com/rvls.html), such as the sampling distribution simulation and the simulation of the effect of range restrictions. Mills (2002) performed a literature search on simulation methods used to teach statistics in business, economics, mathematics, and other disciplines. Chance and Rossman (2006) provided examples of using Minitab macros and specifically designed applets to enhance coverage of various topics, such as probabilities, random sampling, and the sampling distribution of regression lines. Garfield and Ben-Zvi (2007) provided an excellent overview of research on teaching and learning statistics, summarizing studies conducted by researchers from various disciplines.

In recent years, Microsoft Excel has become a popular environment for developing interactive tools for teaching various subjects, especially statistics (Jones, Hagtvedt, & Jone, 2004; Tsai & Wardell, 2006; Hayrapetyan, 2010, 2013; Balakrishnan & Oh, 2005; Warner & Meehan, 2001). There are several reasons for this phenomenon: Excel is widely available and provides a very familiar environment for many students and instructors; it contains a rich library of built-in statistical functions; and it integrates a programming language called VBA (Visual Basic for Applications) that allows the developer to dramatically enhance the basic functionality of Excel (Mansfield, 2010). Hayrapetyan (2013) developed an interactive Excel-based tool that can be used in four different ways: as a teaching tool for instructors, as a learning instrument for students, as a self-assessment tool for students, and as a real time series forecasting tool for both instructors and students. Balakrishnan and Oh (2005) developed an interactive VBA tool for teaching Statistical Process Control (SPC) and process management issues. Students can experiment with the tool to interactively examine the various issues that affect SPC and gain insight into the important issues in managing a process. The graphical nature of the interface allows students to see the effect of changes in process parameters. Tsai and Wardell (2006) developed a VBA-driven Excel spreadsheet that is built around one simple business scenario and aims to improve the effectiveness of teaching three concepts in business statistics: the Central Limit Theorem, interval estimation, and hypothesis testing. The scenario involves setting the filling speed in a cereal filling plant. By interactively finding the optimal filling speed, students are exposed to these key statistical concepts as well as to random sampling techniques.

Obviously, not every single statistical concept can be effectively visualized and demonstrated in Excel. Nash (2008) specified which activities in teaching statistics may be suitable candidates for the application of Excel, and the Central Limit Theorem is one of the best candidates.

This study introduces Excel-based tools dedicated to visualizing normal probabilities as areas under the curve and the essence of the Central Limit Theorem. The student can specify any interval for the normally distributed random variable, and the system will highlight the respective area under the curve and calculate the probability. The visualization of the Central Limit Theorem is more sophisticated: the user may select a distribution from the provided list (normal, uniform, skewed, or random) and any sample size, and then see how the sampling distribution of the sample means gradually becomes approximately normal.

NORMAL PROBABILITIES

The probability that a normally distributed random variable X has a value between a and b is computed by integrating its probability density function over the interval (a, b):

$$P(a \le X \le b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx$$

where $\mu$ is the mean, $\sigma$ is the standard deviation of the distribution, and $\pi$ is the well-known constant ($\pi = 3.14159\ldots$). The geometric interpretation of $P(a \le X \le b)$ is the area under the normal curve between $a$ and $b$ and above the x-axis.

Any probability calculation related to a normally distributed random variable can be accomplished by using one of the following three types of probabilities, or some combination of them:

* $P(a \le X \le b)$: the probability that the value of the random variable is between two given values

* $P(X \ge a)$: the probability that the value of the random variable is above a given value

* $P(X \le b)$: the probability that the value of the random variable is below a given value.
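The three probability types listed above can each be reduced to the cumulative distribution function of the normal distribution. The following Python sketch illustrates that reduction; it is not the VBA code behind the tool, and the function names and the example parameters (mean 100, standard deviation 15) are assumptions chosen purely for illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_between(a, b, mu, sigma):
    """P(a <= X <= b): area under the curve between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


def prob_above(a, mu, sigma):
    """P(X >= a): area under the curve to the right of a."""
    return 1.0 - normal_cdf(a, mu, sigma)


def prob_below(b, mu, sigma):
    """P(X <= b): area under the curve to the left of b."""
    return normal_cdf(b, mu, sigma)


if __name__ == "__main__":
    mu, sigma = 100.0, 15.0  # hypothetical inputs, not taken from the paper
    print(round(prob_between(85, 115, mu, sigma), 4))  # within one sigma of the mean
    print(round(prob_above(115, mu, sigma), 4))
    print(round(prob_below(85, mu, sigma), 4))
```

Because the interval from 85 to 115 spans one standard deviation on either side of the mean, the first value should be close to the familiar 68% figure, and the two tail probabilities should be equal by symmetry.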

The first tool presented in this study allows the user to find and visualize these three types of probabilities for any normal distribution. The tool itself is an Excel file named "Normal Probabilities". When the file is opened, the interactive screen appears as shown in Figure 2, where the user specifies the mean ($\mu$) and standard deviation ($\sigma$) of the normal distribution, selects the probability type, and enters the value(s) for $a$ and/or $b$. As soon as all these inputs are made, the corresponding probability is calculated and the respective area under the curve is visualized. The user may change any input, and the probability calculation and the graph will be updated automatically. Figure 3 depicts three examples of probability calculations.

SAMPLING DISTRIBUTION OF THE MEAN AND THE CENTRAL LIMIT THEOREM

It is well known that if the population is normally distributed with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of the mean is also normally distributed, with mean equal to $\mu$ (i.e., the same mean) and standard deviation equal to $\sigma/\sqrt{n}$, where $n$ is the sample size. This statement is true for any sample size. What if the population is not normally distributed (e.g., skewed, uniform, random)? In that case we use the powerful Central Limit Theorem, which states that if the sample size is large enough, then the sampling distribution of the mean is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, where $\mu$ and $\sigma$ are the population parameters and $n$ is the sample size.
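A short worked illustration may help to show how the spread of the sampling distribution shrinks with the sample size. The numbers below are hypothetical and are not taken from the tool, and the symbol $\sigma_{\bar{X}}$ for the standard deviation of the sample mean is notation introduced here. For a population with $\sigma = 12$:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}: \qquad \frac{12}{\sqrt{4}} = 6, \qquad \frac{12}{\sqrt{16}} = 3, \qquad \frac{12}{\sqrt{64}} = 1.5$$

Each fourfold increase in the sample size halves the standard deviation of the sample mean, while the mean of the sampling distribution stays at $\mu$.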

The proof of this claim is beyond the scope of a business statistics course. On the other hand, the concept is very abstract and not obvious at all. It is hard to convince students that even if the population's distribution is heavily skewed, the sampling distribution of the mean (with a large enough sample size) will still be approximately normal. The best way to convince them is to give them a tool that allows them to start with any distribution, generate samples, and see how the distribution of the sample means gradually becomes closer and closer to a normal distribution. This is exactly what the "Central Limit Theorem" tool is designed to do.

The tool is a single Excel file with hidden programming modules. When the file is opened, the interactive screen appears with some predefined parameters as indicated in Figure 4. The screen consists of two display windows and the control (right) panel.

* The top window displays the distribution of the entire population (default type is normal distribution), a randomly selected sample (default sample size is 5) and the sample mean. It also specifies the actual mean and standard deviation of the selected population.

* The bottom window displays the current probability distribution of the sample means as a histogram, the location of the next sample mean (which should be the same as the location of the sample mean in the top window), the expected normal distribution according to the CLT (the blue bell), as well as the actual and expected means and standard deviations.

* The control (right) panel contains a list box to select a probability distribution for the population, a spin button for a sample size selection, a list box for selecting multiple samples at once, and two command buttons for animation (a single sample or multiple samples).

In order to use the tool, one should open the Excel file, select the desired probability distribution for the population from the list box which has the following possible options: Normal, Uniform, Skewed, and Random, and select a sample size. Then there are several options to proceed.

* Generate one sample at a time by clicking on the "Draw a Sample (Animated)" button. A random sample will be generated and displayed in the top window; its mean will be calculated and displayed in both windows; the new mean will be added to the current pool of sample means and the histogram in the bottom window will be updated accordingly.

* Generate several samples by clicking on the "Multiple Drawings (Animated)" button and observe their effect on the sampling distribution of the mean in the bottom window.

* Select the desired number of samples (50, 100, 500, 1000, 5000 or 10000) from the provided list box. In this case all calculations will be done behind the scenes and the resulting distribution of the sample means will be displayed in the bottom window.
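The procedure described above can be sketched in a few lines of Python. This is only an illustration and not the VBA implementation behind the tool: the population generators and their parameters, the function names, and the choice to sample with replacement are all assumptions made here. The population size of 40 is the one stated later in the paper, and the tool's "Random" option is omitted because its definition is not given.

```python
import math
import random
import statistics


def make_population(kind, size=40, seed=0):
    """Build a small finite population; shapes and parameters are illustrative."""
    rng = random.Random(seed)
    if kind == "normal":
        return [rng.gauss(20.0, 6.0) for _ in range(size)]
    if kind == "uniform":
        return [rng.uniform(0.0, 40.0) for _ in range(size)]
    if kind == "skewed":
        return [rng.expovariate(1.0 / 10.0) for _ in range(size)]
    raise ValueError(f"unknown population kind: {kind}")


def sample_means(population, n, num_samples, seed=1):
    """Draw num_samples samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)]


def summarize(population, n, num_samples):
    """Compare the expected (CLT) and actual mean and standard deviation."""
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)
    means = sample_means(population, n, num_samples)
    return {
        "mu_expected": mu,
        "sigma_expected": sigma / math.sqrt(n),
        "mu_actual": statistics.fmean(means),
        "sigma_actual": statistics.pstdev(means),
    }


if __name__ == "__main__":
    population = make_population("skewed")
    for num_samples in (50, 5000):
        result = summarize(population, n=5, num_samples=num_samples)
        print(num_samples, {key: round(value, 2) for key, value in result.items()})
```

With only 50 samples the actual mean and standard deviation of the sample means may differ noticeably from the expected values; with 5000 samples they should agree much more closely, which mirrors what the histogram in the bottom window is meant to show.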

Figure 5 depicts two screenshots of the sampling distribution of the mean for the normal distribution, for 30 and 5000 samples of size 5. As one can see, when 5000 samples are used, which is only 0.005% of all possible samples of size 5, the sampling distribution of the mean is very close to the expected normal distribution ($\mu_{\text{expected}} = 19.99$, $\mu_{\text{actual}} = 19.98$, $\sigma_{\text{expected}} = 2.67$, $\sigma_{\text{actual}} = 2.69$).
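The 0.005% figure can be checked with simple arithmetic. The sketch below assumes the population size of 40 stated in the limitations section and assumes that "all possible samples" means ordered samples drawn with replacement; the paper does not state the counting convention, so this is an inference from the numbers rather than a description of the tool.

```python
import math

population_size = 40  # stated in the limitations section of the paper
sample_size = 5
num_drawn = 5000

# Ordered samples drawn with replacement (assumed convention).
possible_with_replacement = population_size ** sample_size
percent_drawn = 100 * num_drawn / possible_with_replacement

# For contrast: unordered samples drawn without replacement.
possible_without_replacement = math.comb(population_size, sample_size)

print(f"{possible_with_replacement:,} ordered samples with replacement")
print(f"{percent_drawn:.4f}% of them are drawn")
print(f"{possible_without_replacement:,} unordered samples without replacement")
```

Under the with-replacement convention the share rounds to 0.005%, which matches the figure quoted in the text; under the without-replacement convention the share would be considerably larger.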

Figures 6, 7, and 8 present screenshots similar to those in Figure 5 for the uniform, skewed, and random distributions, respectively. In all three figures, the sample size is 5, and the number of samples is 50 in the left column and 5000 in the right column. Again, when 5000 samples are used, the sampling distribution of the mean is very close to the expected normal distribution in all three cases.

SUMMARY

The interactive tools presented in this study were developed to give educators an apparatus that can be used effectively to demonstrate abstract statistical concepts. The tools are user-friendly interactive Excel files that can be used easily not only by instructors but also by students. The user can specify any interval for the normally distributed random variable, and the tool will highlight the respective area under the curve and calculate the probability. The visualization of the Central Limit Theorem allows the user to select a distribution (normal, uniform, skewed, or random) and any sample size, and to see how the sampling distribution of the sample means gradually becomes approximately normal.

These tools are used extensively by the author in the undergraduate statistics course and in the decision science course for MBA students. While no formal assessment of their efficiency has yet been conducted, several testimonials confirm that students really like these tools (sometimes they call them "games") and consider them effective and fun instruments for understanding the essence of normal probabilities and the CLT.

LIMITATION OF THIS STUDY

This study has some limitations. One of them is related to the population size used in the visualization of the CLT. The population size is fixed at 40 for all probability distributions used. Although this size is large enough to demonstrate the intended concept effectively, allowing the user to select the population size would add flexibility to the tool. The author is in the process of extending the tool by adding a new spin button for population size selection.

CONCLUSION

Teaching abstract statistical concepts is very challenging, and carefully crafted educational tools can significantly increase the efficiency of teaching and decrease students' anxiety related to statistics. Besides, many students are visual learners, and integrating visual features into the teaching curriculum can reinforce the understanding of many abstract concepts. Furthermore, using a well-known and widely available environment such as Microsoft Excel makes these types of educational tools more accessible and appealing. Visual Basic for Applications, which was used to develop the tools presented in this study, is powerful enough to create more interactive tools for other abstract concepts. The authors are currently developing new teaching and learning tools for confidence interval estimation and hypothesis testing.

REFERENCES

Albright, C.S. (2011). VBA for Modelers: Developing Decision Support Systems Using Microsoft Excel. Duxbury.

Balakrishnan, J., & Oh, S.L. (2005). An Interactive VBA Tool for Teaching Statistical Process Control (SPC) and Process Management Issues. INFORMS Transactions on Education, 5(3), 19-32.

Chance, B., & Rossman, A. (2006). Using Simulation to Teach and Learn Statistics: A Review of Literature. Proceedings of the International Conference on Teaching Statistics (ICOTS7), Salvador, Bahia, Brazil.

Cobb, P. (1994). Where is the Mind? Constructivist and Sociocultural Perspectives on Mathematical Development. Educational Researcher, 23(7), 13-20.

Garfield, J., & Ben-Zvi, D. (2007). How Students Learn Statistics Revisited: A Current Review of Research on Teaching and Learning Statistics. International Statistical Review, 75(3), 372-396.

Hayrapetyan, L.R. (2010). A Collaborative Assessment and Learning Tool. International Journal of Education Research, 5(1), 83-91.

Hayrapetyan, L.R. (2013). Excel-Based Interactive Teaching and Assessment Tool for Forecasting Methods. World Journal of Social Sciences, 3(3), 89-97.

Jones, G.T., Hagtvedt, R., & Jone, K. (2004). A VBA-based Simulation for Teaching Simple Linear Regression. Teaching Statistics, 26(2), 36-41.

Lane, D.M., & Peres, S.C. (2006). Interactive Simulations in the Teaching of Statistics: Promise and Pitfalls. Proceedings of the International Conference on Teaching Statistics (ICOTS7), Salvador, Bahia, Brazil.

Makridakis, S., Hodgsdon, A., & Wheelwright, S.C. (1974). An Interactive Forecasting System. The American Statistician, 28(4), 153-158.

Mansfield, R. (2010). Mastering VBA for Microsoft Office 2010. Wiley.

Mills, J.D. (2002). Using Computer Simulation Methods to Teach Statistics: A Review of Literature. Journal of Statistics Education, 10(14), 1-26.

Nash, C.J. (2008). Teaching Statistics with Excel 2007 and Other Spreadsheets. Computational Statistics & Data Analysis, 52(10), 4602-4606.

Tsai, W., & Wardell, D.G. (2006). An Interactive Excel VBA Example for Teaching Statistics Concepts. INFORMS Transactions on Education, 7(1), 125-135.

Viali, L. (2002). Using Spreadsheets and Simulation to Enhance the Teaching of Probability and Statistics to Engineering Students. Proceedings of the International Conference on Engineering Education, Manchester, UK.

Warner, B.C., & Meehan, A.M. (2001). Microsoft Excel As a Tool for Teaching Basic Statistics. Teaching of Psychology, 28(4), 295-298.

Levon R. Hayrapetyan

Mohan Kuruvilla

Houston Baptist University

Levon R. Hayrapetyan is a professor in business at Houston Baptist University. He earned his BS and MS Degrees in applied mathematics from Yerevan State University, Armenia, and his PhD in theoretical computer science from Kiev State University, Ukraine. His research interests include applied mathematics, computer science, computational neuroscience, decision support systems, and pedagogy. He is an author of more than 46 papers, books, and book chapters.

Mohan Kuruvilla is the Dean of the School of Business at Houston Baptist University. He earned his PhD in accounting and MBA from University of Houston. He is a Certified Public Accountant, Certified Management Accountant and a Chartered Accountant. Earlier, he was a Director of KPMG and was involved in audits of several U.S. listed clients and public offerings. He serves as a Director of the Texas Society of Certified Public Accountants ("TSCPA").

Author: Hayrapetyan, Levon R.; Kuruvilla, Mohan

Publication: International Journal of Education Research (IJER)

Article Type: Report

Date: Mar 22, 2015
