Privacy versus database statistics.

Many institutions, from businesses and government agencies to hospitals and colleges, maintain large databases containing confidential information concerning employees, customers, patients or students. The same databases often serve as the sources of statistics that enable regulatory bodies and other groups or individuals to track trends and monitor problems. In such cases, institutions generally release these files only after omitting the names and identifying numbers of the individuals involved.

Nonetheless, the remaining information sometimes contains enough clues to let someone single out a specific individual's record. For example, a worker who knows a few facts about a colleague, such as the college attended, the degree earned and the year of graduation, or who draws on public records, could search the stripped-down personnel files for a record with matching entries and, with high probability, learn that person's salary or medical history. Thus, authorized database users can glean information to which they are not entitled.
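The matching step described above takes only a few lines of Python. This is a minimal illustration, not anything from the researchers: the `Record` fields, the `matching_records` helper and the three sample rows are all hypothetical, and the snooper is assumed to know the target's college, degree and graduation year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    college: str
    degree: str
    grad_year: int
    salary: int  # confidential attribute that survives de-identification


def matching_records(records, college, degree, grad_year):
    """Return every de-identified record consistent with what the snooper knows."""
    return [
        r for r in records
        if (r.college, r.degree, r.grad_year) == (college, degree, grad_year)
    ]


# Hypothetical released file: names and identifying numbers already removed.
released = [
    Record("Kansas State", "B.S.", 1982, 41_000),
    Record("Kansas State", "M.S.", 1985, 47_500),
    Record("Iowa State", "B.S.", 1982, 39_000),
]

hits = matching_records(released, "Kansas State", "M.S.", 1985)
if len(hits) == 1:
    # A single match means the "anonymous" record belongs to the target.
    print("Unique match; salary revealed:", hits[0].salary)
```

The fewer records that share a given combination of background facts, the more likely a match like this is unique.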

"The problem of providing security [in such databases] has attracted much attention in recent years," say Kasinath C. Vemulapalli and Elizabeth A. Unger of Kansas State University in Manhattan, Kan. "This problem is greatly complicated by the possibility that a legitimate user could [make] many different 'legal' queries and infer confidential information from them."
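The danger of combining "legal" queries can be illustrated with a simple differencing attack, sketched here in Python. Each query returns only an aggregate total, yet subtracting two overlapping totals can isolate one person's value. The personnel rows, the `total_salary` helper and the snooper's background knowledge (that only one sales employee has 10 or more years of service) are all hypothetical.

```python
# Hypothetical personnel file: (department, years_of_service, salary)
staff = [
    ("sales", 3, 38_000),
    ("sales", 7, 45_000),
    ("sales", 12, 52_000),
    ("research", 5, 61_000),
]


def total_salary(predicate):
    """A 'legal' statistical query: it reports only an aggregate sum."""
    return sum(salary for dept, years, salary in staff if predicate(dept, years))


# Query 1: total salary of everyone in sales.
q1 = total_salary(lambda dept, years: dept == "sales")
# Query 2: total salary of sales staff with fewer than 10 years of service.
q2 = total_salary(lambda dept, years: dept == "sales" and years < 10)

# If the snooper knows exactly one sales employee has 10+ years,
# the difference between the two totals is that person's salary.
inferred = q1 - q2
print(inferred)  # 52000
```

Neither query on its own names an individual, which is why simply vetting each query in isolation does not close the leak.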

In a paper published in the Proceedings of the 14th National Computer Security Conference, held last month in Washington, D.C., Unger and Vemulapalli propose several schemes that may improve the security of such databases. Their approach involves slightly modifying, or perturbing, the data a user compiling statistics receives in response to requests for information. In effect, such schemes introduce "noise" into the output without changing the stored values or unduly distorting the statistics themselves.

In one technique, for example, the computer responds to a query by calculating the average value of some attribute, such as salary, and multiplying this average by a small fraction. It then goes through all the records available and, at random, adds this product to a record's value, subtracts it or leaves the value unchanged before reporting the query results to the user. Because the answers are computed from these perturbed values, the scheme lowers the probability that a user can infer the exact salary, or other attribute value, of any individual, the researchers say.
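A minimal Python sketch of this kind of output perturbation follows. It is an illustration under stated assumptions, not the authors' implementation: the article gives neither the size of the "small fraction" (the `fraction=0.05` default is arbitrary) nor the odds of adding, subtracting or leaving a value alone (equal odds are assumed), and the helper names and salaries are hypothetical. The query answered is a simple SUM, posed twice to mimic an attempt to isolate one person's salary by subtraction.

```python
import random


def perturbed_values(values, fraction=0.05, rng=None):
    """Return a noisy copy of `values` for answering one query.

    The stored values are never modified. `fraction` is a hypothetical
    stand-in for the unspecified "small fraction" of the average.
    """
    rng = rng if rng is not None else random.Random()
    # Average of the attribute, scaled down by the small fraction.
    delta = fraction * (sum(values) / len(values))
    # For each record: add delta, subtract it, or leave it (assumed equal odds).
    return [v + rng.choice((delta, -delta, 0.0)) for v in values]


def noisy_total(values, fraction=0.05, rng=None):
    """Answer a SUM query from a perturbed copy instead of the true data."""
    return sum(perturbed_values(values, fraction, rng))


salaries = [38_000, 45_000, 52_000]  # hypothetical salaries of one department
rng = random.Random(1991)

q1 = noisy_total(salaries, rng=rng)      # all three employees
q2 = noisy_total(salaries[:2], rng=rng)  # everyone except the third
inferred = q1 - q2  # now only an estimate of the third salary (52,000)
print(round(inferred))
```

Each record moves by at most the scaled average, so every answer stays within a known distance of the true total, which is what keeps the statistics usable. This sketch draws fresh noise for every query; a user able to repeat one query many times and average the answers would gradually cancel that noise, so a practical scheme would need to account for repetition.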
COPYRIGHT 1991 Science Service, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1991, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Publication: Science News
Date: Nov 16, 1991