Confidence level and confidence intervals: a visual approach.
Teaching statistics is challenging because many of its concepts are abstract and hard to present in an easily understandable manner. It is even more challenging to teach those concepts to business students, because they must not only learn the theory but also be able to apply the statistical techniques in their field of study or work (Tsai & Wardell, 2006). If educators continue teaching statistics with traditional methods only, students' engagement will remain limited and learning outcomes will suffer. Alternatively, one can adopt teaching methods that encourage more active student involvement, and pedagogical research provides a rich arsenal of active learning methods (Settles, 2009).
There are several ways of encouraging students to leave their comfort zones and become more proactive. In the mid-1990s, Cobb (1994) noted that instructors had started using computer simulations more frequently to demonstrate fundamental statistical concepts and, more importantly, to allow students to discover those concepts themselves. There are two commonly used environments for developing active learning tools and statistical simulations: interactive technologies such as Java and Flash (Lane & Peres, 2006), and Microsoft Excel (Viali, 2002; Tsai & Wardell, 2006). The latter is becoming increasingly popular because of its familiarity, its availability, and its large number of statistical and financial functions. Visual Basic for Applications (VBA) extends its functionality further (Albright, 2011).
This study is a continuation of the research done by Hayrapetyan and Kuruvilla (2015). The research question addressed in this study is how to effectively visualize abstract statistical concepts with user-friendly, interactive Excel-based tools. The study was motivated by the fact that most of the tools developed for visualizing statistical concepts are based on Java or Flash. There is nothing wrong with Java applets or Flash as such; in fact, both allow very attractive visual effects and animations. However, users occasionally encounter problems viewing them on the Internet: a link no longer works, the browser does not support Flash, a new version of Java must be installed, and so on. The tools presented in this study are created in Excel, which is a very familiar environment for business students. The entire functionality of the tools is programmed in VBA and is hidden from the user. The tools can be used successfully by anyone with little or no experience in Excel.
This study introduces Excel-based tools dedicated to visualizing the concept of confidence intervals and the essence of the confidence level used in the construction of confidence intervals for population means.
Interactive systems used in statistics are inseparable parts of the active learning methods widely used in academia. Settles (2009) provided a general introduction to active learning and a comprehensive survey of the literature. He also analyzed the empirical and theoretical evidence for successful active learning.
Interactive systems have been used in statistics since the early 1970s. Makridakis, Hodgsdon, and Wheelwright (1974) developed the first interactive forecasting system using time-sharing computer configurations. The system allowed the user to conduct preliminary data analysis, to identify the forecasting technique(s) most appropriate for the task at hand, and to build the forecasts. Advances in information technologies have since provided better platforms for developing such interactive tools. Lane and Peres (2006) analyzed the promise and pitfalls of using interactive simulations in teaching statistics. They provided a concise review of the various simulations developed in the Rice Virtual Lab in Statistics (RVLS) (http://onlinestatbook.com/rvls.html), such as the sampling distribution simulation and the simulation on the effect of range restrictions.
In 2002, Mills performed a literature search on simulation methods to teach statistics in business, economics, mathematics, and other disciplines. Chance and Rossman (2006) provided examples of using Minitab macros and specifically designed applets to enhance coverage of various topics, such as probabilities, random sampling, and the sampling distribution of regression lines. Garfield and Ben-Zvi (2007) provided an excellent overview of research on teaching and learning statistics, summarizing studies conducted by researchers from various disciplines. Boyle et al. (2014) carried out a narrative literature review as a part of a special project on games to support students in learning about research methods and statistics.
Recently, Microsoft Excel has become a popular environment for developing interactive tools for teaching various subjects, especially statistics (Jones, Hagtvedt, & Jones, 2004; Tsai & Wardell, 2006; Hayrapetyan, 2010, 2013, 2015; Balakrishnan & Oh, 2005; Warner & Meehan, 2001). There are three reasons for this phenomenon: (1) Excel is widely available and provides a very familiar environment for many students and instructors; (2) it contains a rich library of built-in statistical functions; (3) it integrates a programming language, VBA, that allows the developer to dramatically enhance the basic functionality of Excel (Mansfield, 2010). Hayrapetyan (2013) developed an interactive Excel-based tool that can be used in four different ways: (1) as a teaching tool for instructors, (2) as a learning instrument for students, (3) as a self-assessment tool for students, and (4) as a real time series forecasting tool for both instructors and students.
An interactive apparatus developed by Hayrapetyan and Kuruvilla (2015) gives instructors and students a tool to visualize and "feel" the Central Limit Theorem. The tool allows the user to select a distribution (e.g., normal, uniform, skewed, or random) and the sample size, and to see how the sampling distribution of the mean gradually becomes approximately normal. Balakrishnan and Oh (2005) developed an interactive VBA tool for teaching Statistical Process Control (SPC) and process management issues. Students can experiment with the tool to interactively examine the various issues that affect SPC and gain insight into the important issues in managing a process. The graphical nature of the interface allows students to see the effect of changes in process parameters. Tsai and Wardell (2006) developed a VBA-driven Excel spreadsheet that is built around one simple business scenario and aims to improve the effectiveness of teaching three concepts in business statistics: the Central Limit Theorem, interval estimation, and hypothesis testing.
The scenario involves setting the filling speed in a cereal filling plant. Through interactively finding the optimal filling speed, students are exposed to these key statistics concepts as well as to the random sampling techniques. Obviously, not every single statistical concept can be effectively visualized and demonstrated in Excel. Nash (2008) specified which activities in teaching statistics may be suitable candidates for the application of Excel and the Confidence Interval is one of the best candidates.
With the first tool, the student can specify any sample size and any confidence level, and the system will construct and visualize the confidence interval. By varying the sample size and/or the confidence level, the student can see how the margin of error changes. The visualization of the essence of the confidence level is more sophisticated: the user may select a distribution from the provided list (normal, uniform, skewed, or random) and any sample size, and see that the confidence level is indeed the percentage of confidence intervals that contain the population mean.
The confidence interval is probably one of the most commonly used statistical terms. A confidence interval is an interval estimate of a population parameter. In this study, the confidence interval for the mean is used to visualize various aspects of confidence interval estimation and interpretation. In real life, the actual population mean is rarely known. Therefore, the confidence interval for the mean is often the only reliable information one can get about the unknown mean of the population.
To visualize the construction of confidence intervals, a population of 2000 random numbers is generated, its standard deviation is calculated and is used in all confidence interval estimations. Thus, the following formula is used to construct a confidence interval for the mean,
$$\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
where $\bar{X}$ is the sample mean, $Z_{\alpha/2}$ is the value corresponding to an upper-tail probability of $\alpha/2$ from the standardized normal distribution, $\sigma$ is the population standard deviation, and $n$ is the sample size.
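The construction above can be sketched outside Excel as well. The following Python fragment is a minimal illustration, not the VBA code of the tool: the function name `z_confidence_interval`, the normal population with mean 50 and standard deviation 10, the random seed, and the sample size of 30 are all assumptions chosen for the example. Only the population size of 2000 and the formula itself come from the text.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def z_confidence_interval(sample, sigma, confidence):
    """Return (lower, upper) for the population mean when sigma is known."""
    alpha = 1.0 - confidence
    # Z_{alpha/2}: the value with upper-tail probability alpha/2 under N(0, 1).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / math.sqrt(len(sample))
    center = fmean(sample)
    return center - margin, center + margin


# Hypothetical stand-in for the tool's population of 2000 random numbers.
rng = random.Random(42)
population = [rng.gauss(50.0, 10.0) for _ in range(2000)]
sigma = pstdev(population)            # population standard deviation
sample = rng.sample(population, 30)   # one sample of size n = 30 (assumed)

lower, upper = z_confidence_interval(sample, sigma, 0.95)
print(f"95% CI for the mean: [{lower:.2f}, {upper:.2f}]")
```

The interval is always centered on the sample mean; only its half-width depends on the confidence level, the population standard deviation, and the sample size.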
In order to use this tool, one should open the corresponding Excel file. The default worksheet will be displayed as presented in Figure 1.
Using the provided spin button, the user may gradually increase or decrease the sample size and verify that, if the confidence level stays unchanged, increasing the sample size decreases the width of the confidence interval; that is, the margin of error decreases. This visually confirms the intuition that larger samples produce smaller errors, as indicated in Figure 2.
Another factor that affects the width of the confidence interval is the confidence level. The user may change the value of the confidence level with the spin button and see that when the confidence level increases, the confidence interval becomes wider, and vice versa, as presented in Figure 3. This is less intuitive than the relationship between the sample size and the width of the interval. Even strong students sometimes confuse accuracy with confidence. Using this tool, they can clearly see that if the sample size stays the same and the confidence level increases, the accuracy decreases.
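Both relationships follow directly from the margin of error, $Z_{\alpha/2}\,\sigma/\sqrt{n}$. The short Python sketch below tabulates it for a few settings; the function name `margin_of_error` and the value $\sigma = 10$ are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Half-width of the z-interval for the mean with known sigma."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sigma / math.sqrt(n)


sigma = 10.0  # assumed population standard deviation, for illustration only

# Fixed 95% confidence level, growing sample size: the interval narrows.
for n in (25, 100, 400):
    print(f"n={n:>3}, 95%: margin = {margin_of_error(sigma, n, 0.95):.2f}")

# Fixed n = 25, rising confidence level: the interval widens.
for level in (0.90, 0.95, 0.99):
    print(f"n= 25, {level:.0%}: margin = {margin_of_error(sigma, 25, level):.2f}")
```

With these assumed values, quadrupling the sample size from 25 to 100 halves the margin at the 95% level (from about 3.92 to about 1.96), whereas raising the confidence level from 95% to 99% at n = 25 widens it to about 5.15.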
A confidence level for the population mean refers to the percentage of all possible samples of the same size whose confidence intervals can be expected to include the true population mean. For example, a 95% confidence level means that if many samples of the same size are collected and their confidence intervals are computed, then, in the long run, about 95% of these confidence intervals would contain the true mean.
It should be emphasized that a 95% confidence level refers to the percentage of a large number of confidence intervals, computed from random samples of a given size, that can be expected to include the true population mean. In other words, the probability of selecting, from such a large number of intervals, a confidence interval that includes the population mean is 0.95. However, any one confidence interval computed from a selected random sample either contains the true mean or it does not.
The purpose of the next tool is to visualize the true essence of the confidence level. The tool is an Excel file with macros. When the file is opened, the interactive screen is displayed with various components and functionalities, as indicated in Figure 4.
To start, the user can select a desired distribution for the population from the provided list of four distributions: normal, uniform, skewed, and pseudo random (a superposition of a random normal distribution and uniformly distributed random numbers).
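For readers who wish to experiment outside Excel, populations of these four kinds might be generated along the following lines. This is only a sketch under stated assumptions: the function `make_population`, all distribution parameters, and the population size of 2000 are hypothetical, and the "pseudo random" case is read here as the sum of a normal draw and a uniform draw, which is one possible interpretation of "superposition" rather than the tool's documented behavior.

```python
import random
from statistics import fmean, pstdev


def make_population(kind, size=2000, seed=0):
    """Generate an illustrative population; all parameters are assumptions."""
    rng = random.Random(seed)
    if kind == "normal":
        return [rng.gauss(50.0, 10.0) for _ in range(size)]
    if kind == "uniform":
        return [rng.uniform(0.0, 100.0) for _ in range(size)]
    if kind == "skewed":
        # Right-skewed: exponential with mean 20 (an arbitrary choice).
        return [rng.expovariate(1.0 / 20.0) for _ in range(size)]
    if kind == "pseudo_random":
        # Assumed reading of "superposition": normal draw plus uniform draw.
        return [rng.gauss(50.0, 10.0) + rng.uniform(-25.0, 25.0)
                for _ in range(size)]
    raise ValueError(f"unknown distribution: {kind}")


for kind in ("normal", "uniform", "skewed", "pseudo_random"):
    pop = make_population(kind)
    print(f"{kind:>13}: mean = {fmean(pop):6.2f}, sigma = {pstdev(pop):6.2f}")
```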
The user can also customize the sample size, the confidence level, and the number of samples to be collected. When all selections are made and the "Start" button is pressed, a simple form appears with two buttons: "Next Interval" and "All Intervals". When "Next Interval" is clicked, a new sample of the specified size is drawn, and the corresponding confidence interval is constructed and displayed on the chart. At any given time, the chart displays the true population mean (the red vertical bar) and all confidence intervals constructed up to that point (horizontal bars). Intervals that contain the true population mean are displayed in green, and those that do not are displayed in red. There is also a dynamically changing pie chart, which displays the proportion of confidence intervals that contain the true population mean.
When the user presses the "All Intervals" button, the software generates all remaining samples, creates corresponding confidence intervals, and updates the pie chart. Hence, the user will see that the confidence level is, indeed, the approximate percentage of confidence intervals which contain the population mean as presented in Figure 6.
By selecting other distributions from the provided list, the user can verify that, for any distribution, the confidence level represents the percentage of confidence intervals that contain the population mean (of course, if the distribution is not normal, the sample size should be large enough for the Central Limit Theorem to apply). For example, Figure 7 presents the result of 150 confidence intervals with a 97% confidence level from the randomly distributed population. Notice that 96% of those intervals (i.e., approximately 97%) contain the true population mean.
The interactive tools presented in this study were developed to provide educators with an apparatus that can be used effectively to demonstrate abstract statistical concepts. The tools are user-friendly interactive Excel files, which can be used easily not only by instructors but also by students. Using the first tool, the user can specify any sample size and any confidence level, and the system will construct and visualize the confidence interval. By varying the sample size and/or the confidence level, the user can see how the margin of error changes.
The second tool presented in this study allows visualization of the essence of the confidence level as the percentage of confidence intervals that contain the true population mean. It helps users overcome the common misperception that the confidence level is the probability that the population mean lies in one particular computed confidence interval. The entire functionality of this tool is programmed in VBA and is hidden from the user. The tools can be used successfully by anyone with little or no experience in Excel.
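The experiment performed by the second tool can be approximated with a short simulation. The Python sketch below is an illustration under assumptions, not a port of the VBA code: the function `coverage`, the population parameters, the sample size of 40, the 5000 repetitions, and sampling with replacement are all choices made for this example.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def coverage(population, n, confidence, num_samples, rng):
    """Fraction of z-intervals (known sigma) that contain the population mean."""
    mu = fmean(population)
    sigma = pstdev(population)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_samples):
        # Sampling with replacement is an assumption; the source does not say.
        sample = rng.choices(population, k=n)
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1  # corresponds to a "green" interval in the tool's chart
    return hits / num_samples


rng = random.Random(2015)
normal_pop = [rng.gauss(50.0, 10.0) for _ in range(2000)]
skewed_pop = [rng.expovariate(1.0 / 20.0) for _ in range(2000)]

for name, pop in (("normal", normal_pop), ("skewed", skewed_pop)):
    share = coverage(pop, n=40, confidence=0.95, num_samples=5000, rng=rng)
    print(f"{name:>6} population: {share:.1%} of intervals contain the mean")
```

The observed share fluctuates around the nominal level. With only 150 intervals at a 97% level, the standard error of the observed proportion is roughly 1.4 percentage points, so a result such as 96% is well within the expected variation.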
LIMITATIONS OF THE STUDY
This study has some limitations. One of them is that the first tool uses a single fixed population of 2000 random numbers. Although this population is large enough to demonstrate the essence of confidence intervals accurately, allowing the user to dynamically generate populations of various distributions and sizes would make the tool more rigorous. The author is in the process of extending the tool with this functionality.
Teaching abstract statistical concepts is very challenging, and carefully crafted educational tools can significantly increase the efficiency of teaching and decrease students' anxiety related to statistics. Moreover, many students are visual learners, and integrating visual features into the curriculum can reinforce the understanding of many abstract concepts. Furthermore, using a well-known and widely available environment such as Microsoft Excel makes these types of educational tools more accessible and appealing. Visual Basic for Applications, which was used to develop the tools presented in this study, is powerful enough to create further interactive tools for other abstract concepts. The author is currently developing new teaching and learning tools for hypothesis testing.
Albright, C.S. (2011). VBA for Modelers: Developing Decision Support Systems Using Microsoft[R] Excel,
Balakrishnan, J., & Oh, S.L. (2005). An Interactive VBA Tool for Teaching Statistical Process Control (SPC) and Process Management Issues. INFORMS Transactions on Education, 5(3), 19-32.
Boyle, E. A., MacArthur, E.W., Connolly, T.M., Hainey, H., Manea, M., Karki, A., & Van Rosmalen, P. (2014). A narrative literature review of games, animations and simulations to teach research methods and statistics. Computers & Education, 74(5), 1-14.
Chance, B., & Rossman, A. (2006). Using Simulation to Teach and Learn Statistics: A Review of Literature. Proceedings of the International Conference on Teaching Statistics (ICOTS-7), Salvador, Bahia, Brazil.
Cobb, P. (1994). Where is the Mind? Constructivist and Sociocultural Perspectives on Mathematical Development. Educational Researcher, 23(7), 13-20.
Garfield, J., & Ben-Zvi, D. (2007). How Students Learn Statistics Revisited: A Current Review of Research on Teaching and Learning Statistics. International Statistical Review, 75(3), 372-396.
Hayrapetyan, L.R., & Kuruvilla, M. (2015). Interactive Tools for Visualizing Abstract Statistical Concepts. International Journal of Education Research, 10(1), 13-23.
Hayrapetyan, L.R. (2010). A Collaborative Assessment and Learning Tool. International Journal of Education Research, 5(1), 83-91.
Hayrapetyan, L.R. (2013). Excel-Based Interactive Teaching and Assessment Tool for Forecasting Methods. World of Social Sciences, 3(3), 89-97.
Jones, G.T., Hagtvedt, R., & Jones, K. (2004). A VBA-based Simulation for Teaching Simple Linear Regression. Teaching Statistics, 26(2), 36-41.
Lane, D.M., & Peres, S.C. (2006). Interactive Simulations in the Teaching of Statistics: Promise and Pitfalls. Proceedings of the International Conference on Teaching Statistics (ICOTS-7), Salvador, Bahia, Brazil.
Makridakis, S., Hodgsdon, A., & Wheelwright, S.C. (1974). An Interactive Forecasting System. The American Statistician, 28(4), 153-158.
Mansfield, R. (2010). Mastering VBA for Microsoft Office 2010, Wiley.
Mills, J. D. (2002). Using Computer Simulation Methods to Teach Statistics: A Review of the Literature. The Journal of Statistical Education, 10(14), 1-26.
Nash, C. J. (2008). Teaching statistics with Excel 2007 and other spreadsheets. Computational Statistics & Data Analysis, 52(10), 4602-4606.
Settles, B. (2009). Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison.
Tsai, W., & Wardell, D.G. (2006). An Interactive Excel VBA Example for Teaching Statistics Concepts. INFORMS Transactions on Education, 7(1), 125-135.
Viali, L. (2002). Using Spreadsheets and Simulation to Enhance the Teaching of Probability and Statistics to Engineering Students. Proceedings of the International Conference on Engineering Education, Manchester, UK.
Warner, B.C., & Meehan, A.M. (2001). Microsoft Excel as a Tool for Teaching Basic Statistics. Teaching of Psychology, 28(4), 295-298.
Levon R. Hayrapetyan
Houston Baptist University
Levon R. Hayrapetyan is a professor of business administration in the Archie W. Dunham College of Business at Houston Baptist University. He earned his BS and MS Degrees in applied mathematics from Yerevan State University, Armenia, and his PhD in applied mathematics from Kiev State University, Ukraine. The research interests of Dr. Hayrapetyan include applied mathematics, computer science, computational neuroscience, decision support systems, business analytics, and pedagogy. He is an author of more than 45 papers, books, and book chapters.
|Author:||Hayrapetyan, Levon R.|
|Publication:||International Journal of Education Research (IJER)|
|Date:||Sep 22, 2015|