DATA ANALYSIS: THE ART AND SCIENCE OF CODING AND ENTERING DATA.


Data can be collected from many sources: from individuals, schools, worksites, medical care institutions, and government agencies, as well as from records of all types, such as patient, work or school records. Regardless of what data are collected, from whom and under what circumstances, they usually need to be coded before being processed, analyzed and reported. This article focuses on the coding and entering of data for subsequent processing, analysis and reporting.

WHY CODE DATA?

Why code data? Simply put, computer processing requires a numeric response for each item of interest. For example, if you ask people whether they belong to an HMO (health maintenance organization), and the possible answers are "yes" or "no," you can't just put an "x" next to the response and enter an "x" into the computer. Rather, you need a number so the data can be tabulated. In this case you could code "yes" = 1 and "no" = 2. Then, when you process your data, you can count the number of 1s and 2s.
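To make the tabulation step concrete, here is a minimal Python sketch, assuming the written answers have already been typed in as text. The names `HMO_CODES`, `code_responses`, and `tabulate` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical code table for the HMO item, following the text: "yes" = 1, "no" = 2.
HMO_CODES = {"yes": 1, "no": 2}


def code_responses(answers):
    """Translate each written answer ("yes"/"no") into its numeric code."""
    return [HMO_CODES[answer.strip().lower()] for answer in answers]


def tabulate(codes):
    """Count how often each numeric code occurs."""
    return Counter(codes)


answers = ["yes", "no", "Yes", "yes", "no"]  # made-up responses
codes = code_responses(answers)
print(codes)                  # [1, 2, 1, 1, 2]
counts = tabulate(codes)
print(counts[1], counts[2])   # 3 2
```

An answer that is not in the code table raises a `KeyError` here, which is arguably preferable to silently entering a wrong value.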

However, not all data need to be coded. For example, if age is a variable of interest, then the age itself can be used; no coding is necessary. Similarly, if weight or the number of days out of school or work is of interest, then the actual number of pounds or days would be appropriate. Again, no coding is necessary. When responses are recorded as categories rather than actual values, however, coding is necessary. Using the example above, suppose you had a question about age and had categorized the possible responses as "less than 18 years old," "18 to 35," "36 to 50," "51 to 65," and "over 65 years of age." To code these data you could use the following codes: less than 18 = 1, 18 to 35 = 2, 36 to 50 = 3, 51 to 65 = 4, and over 65 = 5. If you are interested in the state in which a person or institution is located, you might use a code such as Alabama = 1, Alaska = 2, Arizona = 3 and so forth, with the last entry being Wyoming = 50, or 51 if the District of Columbia is included. Note that in this case the code uses two digits, as there are more than nine categories. In most instances a one-digit code is sufficient. For example, a person reviewing records who wants to know the highest level of education for each person might code: less than high school diploma = 1, high school graduate = 2, some college = 3, college graduate = 4, and graduate degree = 5. So for each variable you wish to analyze, you should use either the actual numbers or a coded value. You should decide this before you collect your data; having an idea of how to code your data before collecting it will help in subsequent processing and analysis.
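The age categories above translate directly into a small coding rule. This Python sketch assumes ages are recorded in completed whole years; how to treat a fractional age such as 35.5 is a decision the codebook would have to state (here it falls into code 3). The function name `code_age` is hypothetical.

```python
def code_age(age_years):
    """Return the category code for an age given in completed years.

    Codes follow the example in the text:
    1 = less than 18, 2 = 18 to 35, 3 = 36 to 50, 4 = 51 to 65, 5 = over 65.
    """
    if age_years < 18:
        return 1
    if age_years <= 35:
        return 2
    if age_years <= 50:
        return 3
    if age_years <= 65:
        return 4
    return 5


ages = [16, 18, 35, 36, 51, 65, 66]  # made-up ages, chosen to hit the boundaries
print([code_age(a) for a in ages])   # [1, 2, 2, 3, 4, 4, 5]
```

Testing the boundary values (18, 35, 36, 65, 66) is a quick way to confirm that the written category labels and the coding rule agree.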

In coding data, you will find that sometimes the respondent doesn't know the answer, refuses to respond, or fails to answer an item. For example, if data were being collected about a sensitive topic, such as whether a person had ever been ticketed for driving while intoxicated, a respondent may not wish to answer, whereas asking an employer the same question about one of their employees may result in a legitimate "don't know" response. Similarly, a respondent may simply skip a question inadvertently. By one convention, "don't knows" are coded as "8" and refused or missing data as "9." Coding the don't know/refused/missing data allows them to be analyzed later.
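One way to "analyze them later" is to count the 8s and 9s separately while keeping them out of the percentages for substantive answers. The Python sketch below assumes that convention for a one-digit item; whether to exclude these codes from a given statistic is an analysis decision, and the function name `summarize_item` is hypothetical. Note that the convention only works when 8 and 9 are not themselves substantive codes; an item with that many categories would need a two-digit field with its own special codes.

```python
from collections import Counter

# One convention from the text, for one-digit items: 8 = don't know, 9 = refused/missing.
SPECIAL_CODES = {8: "don't know", 9: "refused/missing"}


def summarize_item(codes):
    """Tabulate one item while keeping don't-know/refused/missing codes separate.

    Returns (valid_counts, special_counts, valid_percent). The percentages are
    computed over substantive answers only, so 8s and 9s do not distort them,
    but the special codes are still counted and can be reported.
    """
    valid = [c for c in codes if c not in SPECIAL_CODES]
    special_counts = Counter(SPECIAL_CODES[c] for c in codes if c in SPECIAL_CODES)
    valid_counts = Counter(valid)
    n_valid = len(valid)
    valid_percent = {
        code: 100 * n / n_valid for code, n in sorted(valid_counts.items())
    }  # empty if there were no substantive answers
    return valid_counts, special_counts, valid_percent


# Made-up answers to an insurance item (1 = private, 2 = public, 3 = uninsured).
codes = [1, 2, 1, 8, 9, 1, 3, 9]
valid_counts, special_counts, valid_percent = summarize_item(codes)
print(valid_percent)                      # {1: 60.0, 2: 20.0, 3: 20.0}
print(special_counts["refused/missing"])  # 2
```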

Sometimes you collect data using open-ended questions, such as "What did you like most about the program?" Respondents then write in their answers. In this case, you would review a number of responses and then develop a code such as the instructor = 1, the content = 2, the methods used = 3, the time of day = 4, the day of the week = 5, etc. To the extent possible, variables of interest should use either actual or precoded values, because postcoding open-ended items is time consuming, and some subjectivity is inevitable in translating written responses into coded numerical values.
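Deciding which category a written answer belongs to remains a human judgment, as noted above; what code can do is record that judgment consistently. This Python sketch assumes a coder has already read each write-in and assigned a category label; the code frame and the `postcode` helper are hypothetical illustrations built from the example categories.

```python
# Hypothetical code frame, drawn up after reading a sample of write-in answers
# to "What did you like most about the program?"
CODE_FRAME = {
    "instructor": 1,
    "content": 2,
    "methods": 3,
    "time of day": 4,
    "day of week": 5,
}


def postcode(labels, frame=CODE_FRAME):
    """Convert category labels assigned by a human coder into numeric codes.

    A label that is not in the frame raises ValueError, so a new theme is
    noticed and the frame can be revised deliberately rather than guessed at.
    """
    codes = []
    for label in labels:
        if label not in frame:
            raise ValueError(f"label not in code frame: {label!r}")
        codes.append(frame[label])
    return codes


# Made-up write-ins, each already read and labeled by a coder.
labeled = [
    ("The instructor explained things clearly", "instructor"),
    ("Learning about nutrition", "content"),
    ("Small-group discussions", "methods"),
    ("It met at lunchtime", "time of day"),
]
print(postcode(label for _, label in labeled))  # [1, 2, 3, 4]
```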

Before entering your data into a computer for processing, you should develop a codebook that indicates how each variable is coded (whether it uses actual or coded values) and in what field it appears (Babbie, 2000). A sample codebook for several variables might look like this:
Columns   Variable           Codes

1-3       Case #             001-150 (for 150 cases)
4         Gender             1 = male, 2 = female
5-6       Month of birth     01 = January, 02 = February, etc.
7         Type of health     1 = Private, 2 = Public, 3 = Uninsured,
          insurance          8 = Don't know, 9 = Refused/Missing
8-9       Height in inches   To the nearest inch, e.g. 64.0 inches = 64,
                             64.5 = 65, 66.25 = 66
10        Marital status     1 = Married, 2 = Widowed,
                             3 = Separated, 4 = Divorced,
                             5 = Never married
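The height rule is worth spelling out, because a common software default does not match it: the codebook sends 64.5 to 65, but Python's built-in `round()` rounds exact halves to the even neighbor, so `round(64.5)` gives 64. This sketch, with a hypothetical `code_height` helper, assumes positive heights and uses floor(x + 0.5) to round halves up.

```python
import math


def code_height(inches):
    """Code a height to whole inches for columns 8-9, rounding halves up.

    Matches the codebook examples: 64.0 -> 64, 64.5 -> 65, 66.25 -> 66.
    Assumes a positive height; the coded value must fit in two columns.
    """
    coded = math.floor(inches + 0.5)
    if not 0 < coded < 100:
        raise ValueError(f"height {inches!r} does not fit in a two-column field")
    return coded


print([code_height(h) for h in (64.0, 64.5, 66.25)])  # [64, 65, 66]
print(round(64.5))  # 64: built-in round() rounds halves to the even neighbor
```

Whichever rounding rule a study chooses, writing it into the codebook, as done here, keeps different data-entry staff from coding the same height differently.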


We will now use this codebook to enter the data for three sample cases. A case is the data for each person or unit of interest. For example, if you were interested in the data from 140 people enrolled in a health promotion program, there would be 140 cases in your data set. If you were interested in looking at the data for the states, there would be 50 cases. There are several reasons why each case (questionnaire, etc.) should be assigned a number. One reason is that if an error in data entry is noticed during analysis, the case number can direct the person back to the original data for correction. A second reason concerns follow-up. By putting an ID number on each case at the beginning of your study, you can quickly identify the cases for which you have and don't have data. Thus, when following up nonrespondents, as in a mail or phone survey, you immediately know whom to follow up with.
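The follow-up benefit of case numbers amounts to a set difference: every assigned ID minus the IDs whose data have come back. This Python sketch assumes IDs were assigned consecutively from 1; `missing_cases` and `case_label` are hypothetical names.

```python
def missing_cases(expected_n, received_ids):
    """List the case numbers 1..expected_n for which no data have come back."""
    received = set(received_ids)
    return [case for case in range(1, expected_n + 1) if case not in received]


def case_label(case_id, width=3):
    """Format a case number as it is entered in columns 1-3, e.g. 7 -> '007'."""
    return f"{case_id:0{width}d}"


received = [1, 2, 3, 5, 8]  # made-up IDs of questionnaires returned so far
follow_up = [case_label(c) for c in missing_cases(10, received)]
print(follow_up)  # ['004', '006', '007', '009', '010']
```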

The width of the variable is called the "field." Preparing the data for the three cases requires an understanding of the concept of "fixed field." Fixed field means that, for each variable, the values for every case are in the same columns. In the previous example the case number would always be in columns 1-3. Gender, being a single-digit variable, would always be in column 4, columns 5-6 would always contain the data on month of birth, and so on. The last variable, marital status, would always be in column 10. Entering your data in a fixed-field format allows you to analyze them using popular software such as SPSS, SAS, or Excel. Note that if a case has missing data for a variable, then those columns should either be left blank or be coded with a special number such as 9 for a single-digit variable, 99 for a two-digit variable (such as month or state), and 999 for a three-digit variable (such as weight). With this in mind, let's look at three cases of data from a study with 150 cases.
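Entering data in fixed fields comes down to zero-padding each code to its field width and writing the fields side by side. This Python sketch follows the sample codebook and, of the two options just described, fills missing values with 9s rather than blanks; the field names and the `encode_record` helper are hypothetical.

```python
# Field order and widths from the sample codebook (columns 1-10).
FIELDS = [
    ("case", 3),         # columns 1-3
    ("gender", 1),       # column 4
    ("birth_month", 2),  # columns 5-6
    ("insurance", 1),    # column 7
    ("height_in", 2),    # columns 8-9
    ("marital", 1),      # column 10
]


def encode_record(case):
    """Build one fixed-field line from a dict of non-negative integer codes.

    Each value is zero-padded to its field width so it always lands in the
    same columns. A missing value (None or absent) is filled with 9s of the
    field width (9, 99, 999), one of the conventions described in the text.
    """
    parts = []
    for name, width in FIELDS:
        value = case.get(name)
        if value is None:
            text = "9" * width
        else:
            text = f"{value:0{width}d}"
            if value < 0 or len(text) != width:
                raise ValueError(f"{name}={value!r} does not fit in {width} column(s)")
        parts.append(text)
    return "".join(parts)


case1 = {"case": 1, "gender": 2, "birth_month": 8, "insurance": 1, "height_in": 62, "marital": 1}
print(encode_record(case1))  # 0012081621

# A hypothetical fourth case whose month of birth was not recorded.
case4 = {"case": 4, "gender": 1, "birth_month": None, "insurance": 3, "height_in": 68, "marital": 5}
print(encode_record(case4))  # 0041993685
```

The width check matters: a value that overflows its field would shift every later variable into the wrong columns.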

0012081621

0021113702

0031072724

Using the codebook developed above, case #1 (001) is a female (2), born in August (08), has private insurance (1), is 5'2" tall (62"), and is married (1). Case #2 is male (1), born in November (11), is uninsured (3), stands 5'10" tall (70"), and is widowed (2). You work through case #3. If there had been 150 cases, then there would be 150 rows of data. For each case there would be data on 5 variables (6 counting the case number), for a total of 10 columns of data per case. Studies can include many cases as well as many variables, ranging from a few to thousands. However, every data set, regardless of size, should have a codebook and use a fixed-field format. Additional information on developing a codebook can be found in Sarvela and McDermott (1993). Now that your data are coded and entered, you're ready for data processing and subsequent analysis. Data processing will be discussed in the next article in this series.
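Reading the records back is the reverse of entering them: slice each line at the codebook's columns, then look up the labels. This Python sketch assumes the sample codebook; blank columns are read as missing, and codes the codebook does not define are passed through unchanged. The names `LAYOUT`, `parse_record`, and `decode` are hypothetical.

```python
# Sample codebook: (variable, first column, last column), 1-based and inclusive.
LAYOUT = [
    ("case", 1, 3),
    ("gender", 4, 4),
    ("birth_month", 5, 6),
    ("insurance", 7, 7),
    ("height_in", 8, 9),
    ("marital", 10, 10),
]
RECORD_WIDTH = 10

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
LABELS = {
    "gender": {1: "male", 2: "female"},
    "birth_month": {i: name for i, name in enumerate(MONTHS, start=1)},
    "insurance": {1: "private", 2: "public", 3: "uninsured",
                  8: "don't know", 9: "refused/missing"},
    "marital": {1: "married", 2: "widowed", 3: "separated",
                4: "divorced", 5: "never married"},
}


def parse_record(line):
    """Slice one fixed-field line into integer codes; blank columns become None."""
    line = line.rstrip("\n")
    if len(line) > RECORD_WIDTH:
        raise ValueError(f"record is longer than {RECORD_WIDTH} columns: {line!r}")
    line = line.ljust(RECORD_WIDTH)  # restore trailing blanks an editor may strip
    codes = {}
    for name, first, last in LAYOUT:
        text = line[first - 1:last]  # 1-based inclusive columns -> Python slice
        codes[name] = int(text) if text.strip() else None
    return codes


def decode(codes):
    """Replace codes with labels where the codebook defines them; others pass through."""
    return {name: LABELS.get(name, {}).get(value, value) for name, value in codes.items()}


for record in ["0012081621", "0021113702", "0031072724"]:
    print(decode(parse_record(record)))
```

The first two lines of output reproduce the readings of cases #1 and #2 given above; the third is a way to check your own reading of case #3.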

REFERENCES

Babbie, E., et al. (2000). Adventures in social research. Thousand Oaks, CA: Pine Forge Press, p. 27.

Sarvela, P., & McDermott, R. (1993). Health education evaluation and measurement: A practitioner's perspective. Dubuque, IA: WCB Brown & Benchmark, pp. 265-271.

Thomas W. O'Rourke is a Professor in the Department of Community Health and College of Medicine, University of Illinois at Urbana-Champaign, IL 61820.

COPYRIGHT 2000 University of Alabama, Department of Health Sciences
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2000, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.


Article Details
Author:O'Rourke, Thomas
Publication:American Journal of Health Studies
Geographic Code:1USA
Date:Jun 22, 2000
Words:1385