Statistical analyses: making sense of them in the research report. (Research Corner).
In quantitative studies, data analysis may consist of exploratory data techniques, descriptive or summary statistics, and/or inferential statistics. The purpose(s) of a research study determines what is included in the data analysis and what statistical techniques are used. Data analysis usually begins with exploratory techniques. Huck and Cormier (1996) call this part of analysis "picture techniques," or techniques that help visualize the data. Through visualization of the data, the investigator (and the reader) can get a picture of how the data are distributed and whether "outliers," or extreme values widely separated from other scores or measurements, are present. Depending on the purposes of the study and publication limitations, the results of exploratory data analysis techniques may not appear in the report of every study. However, regardless of whether they are included in the written report, the investigator would have had to examine the data at some point to determine which descriptive and/or inferential statistics to include.
Some of the more common exploratory data techniques are frequency distributions, stem-and-leaf displays, histograms, bar graphs, pie graphs, and scatterplots. Information about other techniques not discussed here can be found in such sources as Burns and Grove (2001), Tukey (1977), and Verran and Ferketich (1987).
Among the most common exploratory techniques displayed in research reports, frequency distributions show how many subjects were similar on some variable; that is, they ended up in the same category or had the same score. Frequency distributions can be ungrouped (each score and how often it appears is listed), grouped (scores or measurements are grouped according to some interval), or cumulative (includes information about what percentage of subjects had any given score or measurement as well as what percent scored lower). Table 1 offers examples of two types of frequency distributions.
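The ungrouped and cumulative distributions described above can be tabulated mechanically. Below is a minimal sketch using only Python's standard library; the pain-score data and variable names are hypothetical illustrations, not taken from the article's tables.

```python
from collections import Counter

# Hypothetical pain scores (0-10 scale) for 10 subjects
scores = [3, 5, 5, 7, 3, 5, 8, 3, 5, 7]

# Ungrouped frequency distribution: each score and how often it appears
freq = Counter(scores)
n = len(scores)

# Cumulative distribution: the percentage at each score plus the
# percentage of subjects who scored at or below that score
cumulative = 0.0
table = []
for score in sorted(freq):
    pct = 100 * freq[score] / n
    cumulative += pct
    table.append((score, freq[score], round(pct, 1), round(cumulative, 1)))

for row in table:
    print(row)  # (score, frequency, percentage, cumulative percentage)
```

The last row's cumulative percentage always reaches 100, mirroring the final row of a cumulative frequency table such as Table 1.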
With grouped or cumulative frequency distributions, the reader no longer has the ability to see individual scores or measurements within any interval. Stem-and-leaf displays, on the other hand, not only present a picture of the data showing the grouping but also display the exact scores. From this display of the data, the investigator or the reader can visualize how the data are distributed and whether there are extreme scores or outliers. Table 2 offers an illustration of a stem-and-leaf display.
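The grouping-with-exact-scores property of a stem-and-leaf display can be seen in a short sketch. The exam scores here are hypothetical; each stem is the tens digit and each leaf a units digit, so the original scores remain recoverable from the display.

```python
from collections import defaultdict

# Hypothetical exam scores; stem = tens digit, leaf = units digit
scores = [62, 65, 71, 73, 73, 78, 80, 84, 84, 85, 91]

stems = defaultdict(list)
for s in sorted(scores):
    stems[s // 10].append(s % 10)

# Each row shows a stem followed by its leaves, so the data are
# grouped yet every exact score is still visible.
display = {stem: "".join(str(leaf) for leaf in leaves)
           for stem, leaves in stems.items()}
for stem in sorted(display):
    print(f"{stem} | {display[stem]}")
```

Reading the second row, `7 | 1338`, recovers the exact scores 71, 73, 73, and 78, which a grouped frequency interval of 70-79 would have hidden.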
Other exploratory techniques frequently seen in research reports are graphs such as histograms, bar graphs, pie charts, and scatterplots. Histograms are graphs with the horizontal axis labeled to correspond to the measurement or score values and the vertical axis labeled with frequencies. Taller bars indicate higher frequencies, whereas short bars represent lower frequencies. Bar graphs are similar to histograms but are usually used to represent frequencies of categories or nominal data. Fig 1 offers an example of a histogram and Fig 2 an example of a bar graph.
[FIGURE 1-2 OMITTED]
Pie charts are circular charts that are divided like pieces of a pie representing how the overall group or whole comprises subgroups. Each piece of the pie is illustrative of the size of the subgroup relative to the overall group and can be depicted as a percentage. Fig 3 offers an example of a pie chart.
[FIGURE 3 OMITTED]
The final exploratory technique to be included is the scatterplot, which is used to illustrate the relationship between two continuous variables. One variable is labeled on the horizontal axis and the other on the vertical axis. A dot is placed on the graph for each subject (or case) that was measured: The horizontal and vertical positioning of each dot is dictated by the scores earned by that subject on the two variables. Once all the dots are plotted, a line can be drawn around all the points. The plot is viewed in terms of its tilt and thickness. The closer the plot is to a straight line and to a 45° angle, the stronger the relationship. The thicker the plot and the closer the plot is to a circle, the weaker the relationship. Fig 4 offers an example of a scatterplot.
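The "tilt and thickness" of a scatterplot is quantified by the Pearson product moment correlation mentioned later in this article. As a sketch, the coefficient can be computed directly from deviation scores; the data sets below are hypothetical, chosen only to contrast a tight, rising plot with a scattered one.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product moment correlation, computed from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A tight, rising plot (strong relationship) vs a scattered one (weak)
tight = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 8, 10])
loose = pearson_r([1, 2, 3, 4, 5], [7, 2, 9, 1, 6])
```

Here `tight` comes out close to 1 (points near a straight rising line) while `loose` stays near 0 (points forming a roughly circular cloud), matching the visual interpretation given above.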
[FIGURE 4 OMITTED]
Unfortunately, these picture techniques do not often appear in research reports because they take up a great deal of room. Thus, researchers must tell the reader what the data look like using verbal descriptions and descriptive (or summary) statistics. Summary statistics are also helpful in describing the sample so the reader has an idea of who or what was included in the study. Researchers describe the data by presenting information about the shape of the distribution and including measures of central tendency and dispersion.
In describing the shape of a data distribution, several terms are used, such as normal distribution, symmetry, skewness, modality, and kurtosis. The normal distribution is a theoretical distribution of all possible scores on some variable where most of the scores are clustered around the middle of the continuum of observed scores with a gradual and symmetrical decrease in scores in both directions from the middle. A symmetrical distribution is one in which the left side of the distribution mirrors the right side of the distribution. If the distributions are not symmetrical, they are said to be skewed. In a skewed distribution, most of the scores end up being high (or low), with a small percentage of the scores occurring at various points in the direction away from the majority. If the scores are strung in the direction of the upper end of the continuum, the distribution is said to be "positively" skewed. Conversely, if the scores are strung in the direction of the lower end of the continuum, the distribution is said to be "negatively" skewed. Distributions that are severely skewed affect the choice of appropriate descriptive or summary statistics and the choice of inferential statistical techniques.
In addition to symmetry and skewness, another term used to describe a distribution is modality. Modality refers to the most frequently occurring category, score, or measurement and is represented by the highest point on the graph of a distribution. Some distributions have more than one mode and are said to be multimodal. If there are two modes, the distribution is bimodal; if there are three, trimodal. A normal distribution has only one mode.
A final term used to describe a distribution is kurtosis. Kurtosis refers to the peakedness of a graph of the distribution. A distribution of data may have only one mode and be symmetrical, yet still not be a normal distribution. This is because there may be an unusually large number of scores at the center of the distribution, thus causing the distribution to be overly peaked or leptokurtic. Or the data may spread out more and be thicker in the tails of the distribution, in which case the distribution is termed platykurtic. A distribution that is not overly flat or peaked is said to be mesokurtic. A normal distribution is mesokurtic. The shape of the distribution affects the choice of inferential statistical techniques as well as the use of appropriate descriptive statistics.
A second way to describe or summarize data is to say something about the typical or representative score for any variable. This is done by computing one or more measures of central tendency. There are three measures of central tendency, and each provides a numerical index of the "average" score of the distribution. Determining which measure or measures of central tendency to include initially depends on the level of measurement of the variable being described.
There are four levels of measurement: nominal or categorical, ordinal or rank, interval, and ratio. Nominal data are organized into categories of a defined property and represent differences in quality, not quantity. Gender, religious affiliation, type of academic degree, and political orientation are all examples of nominal data. Ordinal data can be assigned to categories that can be ranked, but the intervals between the ranks are not necessarily equal. The first-, second-, and third-place finishers at a swim meet are an example of ordinal data: The order of finish is known, but not how much faster the first-place finisher was than the second or third. The third level of measurement, interval, meets the requirements of nominal and ordinal data but also has equal numerical distances between intervals of the underlying scale. Because this level of measurement has no absolute zero, it cannot provide absolute values of the variable being measured. Temperature measured on a Fahrenheit scale is one example of interval-level data. Ratio is the highest level of measurement; it meets all the rules of the other levels and also has an absolute zero that indicates an absence of the property being measured. Blood pressure and weight are two examples of ratio-level measurement.
The information about levels of measurement is used to determine which measures of central tendency can appropriately be used to describe a distribution. The mode, or the score or measurement that occurs most frequently, is the only measure appropriate for data at the nominal level of measurement; however, it can also be used to describe data at higher levels of measurement. In research reports, authors will sometimes use "Mo" to represent the mode.
A second measure of central tendency is the median, which is the number or the case that lies at the midpoint of the distribution of scores--with 50% of the cases found above the midpoint and 50% found below. The median is appropriate for data measured at ordinal, interval, or ratio levels but is not considered as precise as the mean in estimating the population "average." Unlike the mean, however, the median is not affected by extreme scores or outliers. Thus, it is the most appropriate measure of central tendency to use if the distribution tends to be skewed or nonnormal. In reports of research, the abbreviation "Mdn" is often used to depict the median.
The final measure of central tendency is the mean, which is the most commonly used and is appropriate for data measured at the interval or ratio level. However, the mean is affected by extreme scores, so it may not represent the average even if the data are interval or ratio level. In journal articles, the mean is often represented by an italicized capital M or by an X with a bar over it (X̄).
In a normal distribution, the values of the mean, median, and mode are identical. In a positively skewed distribution, the mode is a lower value than the median, which is lower than the mean. In a negatively skewed distribution, the mode is the highest value followed by the median and then the mean.
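The ordering of the three measures in a skewed distribution can be checked with Python's standard `statistics` module. The income figures below are hypothetical, constructed so that one high outlier produces positive skew.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed incomes (in $1,000s): most values are
# low, with one high outlier stretching the upper tail
incomes = [20, 22, 22, 25, 27, 30, 95]

mo, mdn, m = mode(incomes), median(incomes), mean(incomes)

# Positive skew pulls the mean toward the upper tail, so the
# ordering is mode < median < mean, as described in the text
print(mo, mdn, m)
```

Note how the single outlier drags the mean well above the median, which is exactly why the median is preferred for skewed distributions.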
The last method used to describe or summarize data is the measure of dispersion. A measure of dispersion indicates how much the scores vary, that is, the extent to which individual scores deviate from one another. If the individual scores are similar, measures of dispersion or variability are small and the sample is relatively homogeneous in terms of those scores. If individual scores vary widely, the sample is said to be heterogeneous with respect to those scores. Heterogeneity is important in some inferential statistical procedures.
Some frequently reported measures of dispersion are the range, difference scores, sums of squares, variance, and standard deviation. Like measures of central tendency, measures of dispersion are also based on levels of measurement. The range, the difference between the lowest and the highest score, is the simplest measure of dispersion. It is very crude, is sensitive to outliers, and is the only measure of dispersion that can be used with ordinal data.
Difference scores, a second measure of dispersion, are found by subtracting the mean from each score or measurement. Difference scores are also called deviation scores because they represent the extent to which each score deviates from the mean. Because the mean is involved, difference or deviation scores can be used only with data measured at the interval or ratio level. Difference scores are positive when the score is above the mean and negative when the score is below the mean.
Related to difference scores is the sum of squares (SS), a third measure of dispersion. The sum of squares is obtained by squaring each deviation or difference score and adding these squares. The larger the value of the sum of squares, the greater the spread of scores or measurements around the mean. The value of the sum of squares is dependent on the scale used to measure the variable. For example, the sum of squares would be larger for blood pressure than for ring finger size. Thus, comparison of the sum of squares between studies is limited to studies using similar data.
A fourth measure of dispersion is the variance, which is related to SS and thus can be used only with interval or ratio level data. The variance is obtained by dividing SS by the number of scores or cases minus one. Like SS, the variance depends on the scale used to measure the variable and has no absolute value; it can be compared only across data obtained using similar scales. In general, the larger the variance, the more widely the scores or measurements are spread around the mean.
A final measure of dispersion to be discussed is the standard deviation, which is the square root of the variance. The standard deviation provides a measure of the average deviation of a score from the mean of a variable in a particular sample. It indicates the degree of error that would be made if the mean alone were used to interpret the data. Like the mean, the standard deviation is affected by extreme scores or values and thus would not be a useful description of a skewed distribution. The standard deviation is the most commonly reported measure of dispersion and is indicated by the italicized letters SD.
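The chain from range through deviation scores, sum of squares, and variance to the standard deviation can be traced in a few lines of stdlib Python. The blood pressure readings are hypothetical.

```python
from math import sqrt

# Hypothetical systolic blood pressure readings (mm Hg)
bp = [110, 118, 122, 126, 134]

n = len(bp)
rng = max(bp) - min(bp)              # range: highest minus lowest score
m = sum(bp) / n                      # mean
deviations = [x - m for x in bp]     # difference (deviation) scores
ss = sum(d ** 2 for d in deviations) # sum of squares
variance = ss / (n - 1)              # sample variance: SS / (n - 1)
sd = sqrt(variance)                  # standard deviation: square root of variance
```

Because deviation scores are positive above the mean and negative below it, they sum to zero; squaring them before summing is what makes SS, and hence the variance and SD, informative.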
For some studies, along with exploratory and descriptive analyses, it is appropriate to use inferential statistical analyses. In those studies that use inferential statistics, the goal of the researcher is to draw conclusions (statistical inferences) about a larger group (population) of which the sample is only a part and which extend beyond the specific data that are collected. Which statistics to use depends on several considerations, including the purpose(s) of the study, number and type(s) of variables, level of measurement, method of selecting the sample, the distribution of the variables of interest, and the number of groups involved.
The first of these considerations, the purpose(s) of the study and, by extension, the hypotheses or research questions, will dictate, to some extent, which statistical techniques can be used. The purpose dictates not only whether inferential statistics are appropriate but also which techniques are most appropriate. For example, if the purpose is to describe the responsibilities of the nurse practitioner, the data analysis will include exploratory and descriptive techniques. If the purpose is to note differences between groups on some variable, inferential statistical techniques will be required. More specific clues about which statistical procedures to use are found in the hypotheses or research questions that follow logically from the purpose. The research question, "What are the relationships among blood pressure, stress, and gender in 6- to 8-year-old children?" would require different inferential analyses than the research question, "What are the differences in blood pressure among males and females who are highly stressed?" For the most part, research questions that require inferential analyses focus on purposes that (a) examine relationships among variables, (b) determine differences between groups, (c) predict group membership, (d) test a model, (e) predict outcomes, and (f) examine changes across time.
A second consideration in determining appropriate statistical procedures is the type and number of each type of variable. Determining the type of variables includes identifying whether variables would be classified as attribute, criterion, predictor, independent, or dependent. If a study has predictor and criterion variables, that would indicate the need for different statistical procedures than if the study had independent or dependent variables. In addition, the number of each type of variable would have to be considered. Of particular importance is the determination of the number of dependent and independent variables. Multiple independent and dependent variables require different inferential statistical procedures than single independent or dependent variables.
In addition to the purpose(s) and the number and type of variables, a third consideration is the scale or level of measurement used to measure the variables. This would have to be identified for all the variables in the study but is particularly important for any dependent, independent, criterion, or predictor variables. In the research report, the author usually addresses the level of measurement of the variables in the instrumentation section. Sometimes the level of measurement can be gleaned from what is being measured (blood pressure, for example), but measurements from paper-and-pencil instruments might not be so readily apparent, and the author would need to specify the level of measurement. If the measurement is considered ordinal, it would require different statistical procedures than if the measurement were considered interval. The level of measurement also determines whether parametric or nonparametric inferential statistical procedures are used. Use of parametric statistical procedures requires that the variables be measured at least at the interval level of measurement. For variables measured at the nominal or ordinal level of measurement, nonparametric techniques must be used.
A fourth consideration in determining which inferential statistical procedures to use is to examine the way the sample was chosen or assigned. This is particularly important when the purpose of the research is to examine differences among groups of subjects. The question is then to determine whether the subjects were independently assigned to the groups or if the assignment was dependent. For example, if the purpose of the study was to examine differences in the menstrual experience of mothers and their biological daughters, the groups would be dependent. Once the daughter is chosen, the mother is automatically determined. If the selection of subjects is totally unrelated, the groups would be independent. Analysis would be different depending on the type of sample selection and the resulting groups of subjects.
Another consideration in determining which inferential statistical procedures are appropriate is information about the shape of the distribution. As discussed above in the descriptive analyses section, it is important to know whether underlying distributions of variables are normal or nonnormal. The answer to this question will determine whether the researcher could use parametric or nonparametric inferential statistical procedures. An underlying assumption for the use of parametric statistics is that the dependent variable, particularly, is normally distributed in the population. If this assumption is not met, it is inappropriate to use parametric statistics and the researcher must consider nonparametric statistics. Nonparametric statistics make no assumptions about the shape of the distribution so that they are often called "distribution-free" statistics. Some statisticians believe that studies with small samples do not meet the assumption of normality of the distribution and thus require the use of nonparametric statistics to analyze the data.
A final consideration in determining which inferential statistical procedures are appropriate is the number of groups involved. For some research studies, there is only one group involved. As an example, if the purpose is to determine the relationship between height and weight in children, there would be one group of children whose height and weight are measured: one group of subjects and two variables measured on each subject. This would be different from comparing the heights and weights of third- and sixth-grade children. In the latter example, there are two groups of subjects and two variables measured on each subject. The number of groups determines to some extent which statistical procedures can be used. The use of t tests, for example, requires that only two groups be examined at a time. Analysis of variance techniques are used when there are more than two groups.
Once the purpose(s), number and types of variables, levels of measurement, sample selection, number of groups, and shape of the distributions are known, the appropriate statistical techniques can be determined. Statisticians and/or researchers use the above information and can consult decision trees for choosing statistical tests (Burns & Grove, 2001). For purposes that address differences and have only one dependent measure, such univariate statistics as t tests, ANOVAs, McNemar's test, chi-square, or the Kruskal-Wallis one-way analysis of variance might be appropriate, depending on the level of measurement. For purposes that address relationships, such techniques as Spearman rank correlations, Pearson product moment correlations, simple linear regression, and multiple regression might be used. If there is more than one dependent measure, other statistical techniques would have to be considered. Multiple independent or predictor variables would also indicate different statistical procedures.
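The logic of such a decision tree can be sketched in code. This is a hypothetical, much-simplified helper, not the Burns and Grove (2001) tree: the function name, the Mann-Whitney U entry, and the exact rules are illustrative assumptions that map a study's purpose, level of measurement, and number of groups to one candidate technique.

```python
# Hypothetical, simplified decision sketch; real test selection also
# weighs sample selection, distribution shape, and number of variables.
def suggest_test(purpose, level, groups=1):
    # Parametric procedures require at least interval-level measurement
    parametric = level in ("interval", "ratio")
    if purpose == "difference":
        if level == "nominal":
            return "chi-square"
        if groups > 2:
            # More than two groups: ANOVA (parametric) or its
            # nonparametric analogue
            return "ANOVA" if parametric else "Kruskal-Wallis one-way analysis of variance"
        # Two independent groups
        return "t test" if parametric else "Mann-Whitney U"
    if purpose == "relationship":
        return "Pearson product moment correlation" if parametric else "Spearman rank correlation"
    return "exploratory and descriptive techniques"

print(suggest_test("difference", "ratio", groups=2))  # t test
```

Even this toy version shows why the considerations above must be settled first: change the level of measurement or the number of groups and a different technique falls out.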
There is "method to the madness" of statistical analyses. Neither researchers nor statisticians choose statistical techniques on a whim. The statistical analyses sections should logically follow from the purposes, the number and types of variables, the level of measurement of the variables, the method of selecting the sample, the distribution of the variables, and the number of groups. And statistical analysis should inform--not confuse--the reader.
Table 1. Frequency Distribution (Grouped and Cumulative) of Income

                    Frequency   Percentage   Valid Percentage   Cumulative Percentage
  <$10,000              4          12.9           12.9                 12.9
  $10,000-20,000        4          12.9           12.9                 25.8
  $20,001-30,000        4          12.9           12.9                 38.7
  $30,001-40,000        5          16.1           16.1                 54.8
  $40,001-50,000        4          12.9           12.9                 67.7
  $50,001-75,000       10          32.3           32.3                100.0
  Total                31         100.0          100.0

Table 2. Stem-and-Leaf Plot of Weight of Subjects

  Frequency    Stem & Leaf
    1.00       Extremes (≤9)
    1.00       0 . 5
    5.00       0 . 77777
    6.00       0 . 888889
   10.00       1 . 0000000111
    4.00       1 . 2233
    2.00       1 . 55
    1.00       1 . 6
    1.00       Extremes (≥188)

  Stem width: 100.00
  Each leaf: 1 case(s)
Burns, N., & Grove, S. (2001). The practice of nursing research. Philadelphia: W.B. Saunders.
Huck, S., & Cormier, W. (1996). Reading statistics and research. New York: HarperCollins College Publishers.
Tukey, J. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Verran, J., & Ferketich, S. (1987). Testing linear model assumptions: Residual analysis. Nursing Research, 43, 369-372.
Questions or comments about this article may be directed to: Marti H. Rice, PhD RN, University of Alabama at Birmingham, School of Nursing, NB 434, 1530 3rd Avenue South, Birmingham, AL 35294-1210.
Author: Rice, Marti H.
Publication: Journal of Neuroscience Nursing
Date: Apr 1, 2002