Knowledge Discovery in Spatial Cartographic Information Retrieval.


ABSTRACT

LIBRARY CATALOGS FOR MAP COLLECTIONS are not well developed in most libraries. The cartographic information source differs from other kinds of information in that it is usually rectangular in shape and defined by the coordinates of the four map corners. This coordinate information proves difficult for many people to use, unless a certain user interface is designed and knowledge discovery algorithms are implemented. A system with such an interface and algorithms can perform powerful queries that an ordinary text-based information retrieval system cannot. This article describes a prototype system--GeoMatch--which allows users to interactively define geographic areas of interest on a background map. It also allows users to define, qualitatively or quantitatively, the relationship between the user-defined area and the map coverage. The knowledge discovery in database (KDD) factor is analyzed in the retrieval process. Three librarians were interviewed to study the feasibility of the new system. The MARC record format is also discussed to argue that conversion of cartographic material records from an existing library online catalog system to GeoMatch can be done automatically.

INTRODUCTION

Knowledge discovery in databases (KDD) has become a hot topic in recent years. The KDD method has been used in various fields, including spatial database analysis (Xu et al., 1997), automatic classification (Bell, 1998), deviation detection (Schmitz, 1990), and clustering (Cheesman, 1996). This article explores the use of KDD in information retrieval by examining the nature and process of geographic information retrieval. It deals with the characteristics of Geographic Information Systems (GIS), Bibliographic Records for Cartographic Information, and a GIS-based cartographic information retrieval system--GeoMatch.

GIS AND FUNCTIONS RELATED TO THE GIS-BASED INFORMATION RETRIEVAL SYSTEM

The Environmental System Research Institute (ESRI) is the largest GIS software producer in the world. ESRI defines GIS in its manual (Environmental System Research Institute, 1991) as: "An organized collection of computer hardware, software, geographic data, and personnel designed to efficiently capture, store, update, manipulate, analyze, and display all forms of geographically referenced information." Most words in this definition can be found in definitions of many other information systems. What makes GIS special is the term geographically referenced data. GIS uses spatial location as the major link to organize and manipulate information.

A typical GIS has two major functional components--a database management system, which stores and manipulates the data, and a spatial engine, which performs special topological operations on geographic features. A common misunderstanding of GIS is to consider it merely a computerized mapmaker. GIS is a powerful analytical tool that is far more sophisticated than a mapmaker. It is true that some GIS products on the market are simplified for naive GIS users to generate, view, and print maps. These "viewer" software packages often support only limited data manipulation functions. They are not considered fully functional GIS systems. A GIS can perform network analysis, overlay, buffering, and many other operations that few other information systems can accomplish. As Burrough (1990) summarized, a GIS can answer such questions as:

* Where is 785 S. Allen Street in Albany, New York?

* In what census tract is the above address located?

* How many supermarkets are within three miles from the above address?

* A delivery truck needs to deliver items to 200 customers. What is the shortest route and sequence to make the delivery? If road traffic information is available, what is the fastest route to finish the task?

* Given the population in a county, what is the population density? (GIS can calculate the area of the county precisely).

* A new shopping mall is going to be built in the city. The mall should be built at least five miles away from the existing shopping malls; next to a major street; surrounded by 5,000 residents within four miles; and no more than ten miles from the downtown area. Where is the best place to build the new mall?

There are many other questions that only a GIS can answer. One of the GIS functions that is highly related to the geographic information retrieval system is overlay. Some concepts need to be defined to understand the overlay process.

In a GIS, a polygon is an enclosed area bounded by lines such as a census tract or a county. Consequently, polygons have areas and perimeters that a GIS can calculate. A layer or a theme is a concept for a single feature map in GIS. For example, a county map of Florida showing the average age of a population is a polygon layer. These single-feature layers can be integrated by GIS for analysis.
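As a rough illustration of how the area and perimeter of a polygon can be computed from its vertices, the following sketch uses the shoelace formula on planar coordinates. The function name and the sample rectangle are assumptions made for this example; a production GIS would also account for the map projection and the curvature of the earth.

```python
import math


def polygon_area_perimeter(vertices):
    """Return (area, perimeter) of a simple polygon.

    `vertices` is a list of (x, y) pairs in order around the boundary,
    in planar units (an assumption of this sketch).
    """
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1  # shoelace cross term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


# A hypothetical 4-by-3 rectangular tract.
area, perimeter = polygon_area_perimeter([(0, 0), (4, 0), (4, 3), (0, 3)])
print(area, perimeter)
```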

GIS has the capability of building geometric topology. It can determine which lines are crossing one another to create a node at the cross point. It can detect what lines are connected to create an enclosed polygon. GIS can then generate a polygon object with features like area and perimeter. The topology in a GIS can be expressed as the relationship of points, lines, and polygons. GIS can do sophisticated spatial analysis after the topology is established.

The process of merging multiple layers is called overlay, a unique function of GIS. For example, assume that there are two maps printed on transparencies--a map of census tracts and a map of a lake, both in the same county. If both maps are in exactly the same scale and the four corners of the two maps represent exactly the same locations, the two transparencies can be put together to make a new map--with both census tract boundaries and the lake shore. The new map is the so-called overlay. GIS is very powerful in performing this operation. It can overlay maps with different kinds of features (point, line, polygon) and develop new topologies for further analysis. Burrough (1990) lists forty-four kinds of overlay analysis capabilities that GIS may have. Figure 1 demonstrates the overlay process. The first map layer shows school district boundaries (District C and District D). The second map layer represents county boundaries (County A and County B). During the overlay process, GIS combines the features from both map layers into a third layer that contains four polygons. In the third map layer, each polygon will have attributes from both the county map layer and the school district map layer. For example, area 1 will have its area, perimeter, county name A, school district name C, and other data previously stored in the two map layers. Obviously, it would be difficult to integrate the school district data and county data like this using only database techniques because the data collected represent different areas.

[Figure 1 ILLUSTRATION OMITTED]
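The overlay of the county layer and the school district layer can be sketched as follows. This is only an illustration: every feature is simplified to an axis-aligned rectangle (xmin, ymin, xmax, ymax), the coordinates are invented rather than taken from Figure 1, and the function names are hypothetical. A real GIS overlays arbitrary polygons and rebuilds topology.

```python
def intersect(a, b):
    """Return the shared rectangle of a and b, or None if they share no area."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None  # disjoint, or touching along an edge only
    return (xmin, ymin, xmax, ymax)


def overlay(layer_a, layer_b):
    """Combine two layers; each result keeps the attributes of both parents."""
    result = []
    for attrs_a, box_a in layer_a:
        for attrs_b, box_b in layer_b:
            box = intersect(box_a, box_b)
            if box is not None:
                width, height = box[2] - box[0], box[3] - box[1]
                attrs = {**attrs_a, **attrs_b}
                attrs["area"] = width * height
                attrs["perimeter"] = 2 * (width + height)
                result.append((attrs, box))
    return result


# Invented geometry: two counties side by side, two districts stacked.
counties = [({"county": "A"}, (0, 0, 5, 10)), ({"county": "B"}, (5, 0, 10, 10))]
districts = [({"district": "C"}, (0, 5, 10, 10)), ({"district": "D"}, (0, 0, 10, 5))]

for attrs, box in overlay(counties, districts):
    print(attrs, box)
```

With this made-up geometry the third layer contains four polygons, each carrying a county name, a district name, an area, and a perimeter.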

KNOWLEDGE DISCOVERY IN DATABASES AND INFORMATION RETRIEVAL

Due to less expensive data storage and increasing computing power, the volume of data collected by various organizations has expanded rapidly. This vast abundance of data, often stored in separate data sets, makes it more difficult to find relevant information. On the other hand, the power of computers also makes it possible to integrate the data sets, compile the facts, and develop the information into "a collection of related inferences" (Trybula, 1997). This is why KDD has received such attention from both the academic and commercial worlds. According to Tuzhilin (1997), the number of papers submitted to the Knowledge Discovery Workshop increased from 40 in 1993 to 215 in 1996.

Fayyad, Piatetsky-Shapiro, and Smyth (1996) define KDD as "the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns of data" (p. 2). As Trybula (1997) summarized, the methods of evaluating data include algorithms, association, change and deviation determination, visualization, and sixteen other analytical techniques. No matter which method is employed, the key point of KDD is to uncover new, useful, and understandable knowledge.

Information retrieval can be simply expressed as a matching process--matching a user's information need with the information source (School of Information Studies, 1998). In this process, a user must express his/her information need accurately so that the system can retrieve the information. On the other hand, information sources need to be organized in such a way that the most important attributes, such as title, author, subject terms, keywords, publication year, and so on, are readily available.

Text information retrieval systems have become more powerful in the last three decades. The retrieval efficiency and effectiveness have been greatly improved through Boolean operators, truncations, proximity, probability search, and many other search mechanisms. However, some attributes in bibliographic records can create difficulty for exact match in a search. Some attributes are even difficult for users to understand. For example, geographic coordinates are attributes in MARC records for cartographic data. Few users would want or be able to enter exact numbers to match those coordinates. Even fewer would know what the numbers mean. Despite these difficulties, however, could the coordinates be useful in information retrieval? Can they be processed to provide understandable and useful knowledge in selecting relevant information?

This article will demonstrate a prototype of a GIS-based cartographic information retrieval system and illustrate how such a system could indeed generate new and useful knowledge during the retrieval process.
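To make the coordinate attributes more concrete, the sketch below converts coordinate strings of the kind found in MARC 21 field 034 (subfields $d, $e, $f, and $g for the western, eastern, northern, and southern limits) into decimal degrees. It assumes the common hemisphere-degrees-minutes-seconds form such as W0843000; the article does not show the records it used, so the sample values and function names here are assumptions for illustration only.

```python
def dms_to_decimal(value):
    """Convert an 'hdddmmss' string (for example 'W0843000') to signed degrees."""
    if len(value) != 8 or value[0].upper() not in "NSEW":
        raise ValueError("expected hemisphere letter plus dddmmss: %r" % value)
    hemisphere = value[0].upper()
    degrees = int(value[1:4])
    minutes = int(value[4:6])
    seconds = int(value[6:8])
    decimal = degrees + minutes / 60 + seconds / 3600
    # West longitudes and south latitudes are negative by convention.
    return -decimal if hemisphere in ("W", "S") else decimal


def bounding_box(west, east, north, south):
    """Return (xmin, ymin, xmax, ymax) in decimal degrees."""
    return (dms_to_decimal(west), dms_to_decimal(south),
            dms_to_decimal(east), dms_to_decimal(north))


# Hypothetical subfield values for a single sheet map.
print(bounding_box("W0843000", "W0840000", "N0303000", "N0300000"))
```

Once the four corners are available as plain numbers, they can be compared with a user-drawn rectangle without the user ever typing a coordinate.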

CARTOGRAPHIC INFORMATION RETRIEVAL

Cartographic Information Retrieval in Libraries

An access point is defined as "a name, term, code, etc., under which a bibliographic record may be searched and identified" (Glossary, 1995). An ordinary information retrieval system usually has common access points such as author, title, keywords, subject headings, classification number, and information from other special fields.

In addition to its spatial coverage, a cartographic information source, such as a single sheet map, shares most of the attributes other information sources have, including title and subject terms. A cartographic information source is different from other formats in that, as an information container, it is usually in the shape of a rectangle and contains the coordinates of the four map corners. Nevertheless, most current retrieval systems do not use geographic coordinates as access points because this does not make sense in a text information retrieval system. Many libraries are still in the process of retrospective conversion from card catalogs to text-based online catalogs for their map collections. To study the feasibility of libraries adopting a GIS-based cartographic information retrieval system, long interviews with three librarians were conducted in two libraries in Tallahassee, Florida.

During each interview, a prototype of a GIS-based cartographic information retrieval system (GeoMatch) was demonstrated. The librarians were asked to answer questions concerning the library's map collection, user needs, retrieval tools, and searching procedures. The librarians were also asked to evaluate the usability of the prototype software and assess the usefulness of the system.

FLORIDA STATE LIBRARY

Most of the map collection in the Florida State Library consists of historical maps. Although the library is currently outsourcing the map cataloging to an organization associated with OCLC, the card catalog is still the major retrieval tool for the map collection. The library has added only 800 maps to its online catalog. The online catalog features keyword searching, which provides more retrieval power than the card catalog. The card catalog allows searching only from author, title, and subject terms. During the interviews, the librarians indicated that they had seen more patrons using the catalog since the online version was implemented.

The library has no plan yet to digitize (scan) the maps. Patrons usually cannot find needed maps using the card catalog. Some patrons can locate their maps using the online catalog with keyword searching. Generally speaking, patrons primarily rely on the map librarians to find and access maps.

Although the online catalog system cannot provide sufficient assistance for accessing cartographic information, every day many map users do search historic maps, railroad maps, and place names. Great reliance must be placed on the knowledge and expertise of the map librarians.

FLORIDA STATE UNIVERSITY LIBRARY

The Florida State University (FSU) library has a collection of 165,000 single sheet maps, including U.S. Geological Survey maps, road maps, city maps, thematic maps, and historical maps. Records for most of the single sheet maps are maintained in the card catalog. The librarians have started the retrospective conversion of map card catalog records to online catalog records using OCLC. According to the map librarian, most of the records can be found in the OCLC database. During the conversion process, the librarian must make minor changes before adding the OCLC records to the library's online catalog.

The librarians serve many map users every day including faculty, students, and users referred by other libraries. The map librarians are very familiar with the map collection and usually can find the maps needed. The situation at the FSU library is similar to the one at the Florida State Library--i.e., the map librarians are the most valuable source of information, given the fact that the catalog system for the cartographic data is not very helpful.

In summary, map librarians in both libraries are the most important sources of information for users seeking cartographic data.

Both libraries are in the process of converting cartographic records in the card catalog to the online catalog. The online catalog with searching capability has led to increased map use.

Although most users can access the map information they need with the help of librarians, this situation needs to be improved, for several reasons. First, the map librarians are not certain whether or not they actually find the maps that best match users' needs. Second, none of the librarians think they can provide a complete list of maps that users might be interested in, especially in a library with more than 100,000 maps. Finally, searching for the right information in such a system relies extensively on human expertise. As one librarian said: "It is at the librarian's mercy whether the user can get a satisfactory answer." If current map librarians leave their positions, it would take new map librarians years to familiarize themselves with the library collection. There exists a great demand for a powerful searching tool for the library map collection.

STUDIES OF GEO-BASED RETRIEVAL TOOLS

A literature review indicates that more advanced cartographic information retrieval systems, designed for searching electronic maps, have been created and are still in the process of refinement. The Alexandria Project is probably the best-known electronic library system dealing with topological relationships.

Smith (1996) described the goal of the Alexandria Project Digital Library (ADL) as "to build a distributed digital library (DL) for geographically-referenced materials. A central function of ADL is to provide users with access to a large range of digital materials, ranging from maps and images to text to multimedia, in terms of geographical reference" (http://www.dlib.org/dlib/march96/briefings/smith/03smith.html).

The Alexandria Atlas Subteam investigates "the design and functionality of an atlas that would support graphical/geographical access to library materials" (http://www.alexandria.ucsb.edu/public-documents/annual-report97/node28.html#SECTION00051300000000000000). As the Alexandria Web site indicates, "spatial searching has not been an available service to library clients and it is not at all clear how ADL clients will react to having actual spatial data available over the Web" (http://www.alexandria.ucsb.edu/public-documents/annual-report97/node28.html#SECTION00051300000000000000). The team is studying such issues as scale, data registration, search result presentation, and fuzzy footprints.

The Alexandria system supports geographical browsing and retrieval using a graphical map interface. An example of the interface can be found at <http://www.dlib.org/dlib/march96/briefings/smith/03smith.html>. Users can zoom in and zoom out on the current view of the map. They can select the map features they wish to see on the background map such as borders and rivers. Users can also select an area of interest and a mode of either OVERLAPS or CONTAINS. An overview of the system is available at <http://www.alexandria.ucsb.edu/adljigi/tutorials/ walkthrough1/walkthrou>.

The prototype of GeoMatch has some new functions in addition to those available in the Alexandria system. The purpose of testing GeoMatch is to answer the following two questions: (1) can a GIS/Graphic-based retrieval tool like the Alexandria project be used for nonelectronic cartographic collections in libraries? and (2) what new functions can be developed to improve the GIS-based retrieval tool?

GEO-MATCH--A RETRIEVAL TOOL THAT SEARCHES

Figure 2 illustrates a query screen of the GeoMatch system. In addition to specifying ordinary information needs such as year, title, publisher, keyword, and so on, this system allows a user to interactively identify the area of interest using a mouse. It also asks the user to specify the topological relationship between the map coverage and the user-selected area. The system accepts containment and overlapping relationships as summarized by Cobb and Petry (1998). There are two possible containment relationships--the user-selected area falls entirely within a map coverage or the coverage of a map falls within the user-selected area. Users can select either one.

[Figure 2 ILLUSTRATION OMITTED]
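The two containment relationships described above can be tested with simple corner comparisons. The following is a minimal sketch, assuming both the user-selected area and the map coverage are axis-aligned rectangles given as (xmin, ymin, xmax, ymax); the function names and return labels are hypothetical and are not taken from GeoMatch itself.

```python
def contains(outer, inner):
    """True if rectangle `inner` lies entirely within rectangle `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def containment(user_area, map_coverage):
    """Report which containment relationship holds, if any."""
    if contains(map_coverage, user_area):
        return "user area within map coverage"
    if contains(user_area, map_coverage):
        return "map coverage within user area"
    return None


# A small user box inside a large map sheet, then the reverse.
print(containment((2, 2, 4, 4), (0, 0, 10, 10)))
print(containment((0, 0, 10, 10), (2, 2, 4, 4)))
```

The first case suits a user who wants one sheet covering the whole area of interest; the second suits a user who wants every detailed sheet that falls inside a larger region.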

If a user decides to select the overlapping relationship, more choices become available to specify quantitatively the degree of overlap. This degree includes the percentage of the overlapping area in maps and the percentage of the overlapping area in the user-selected area. If a user selects 85 percent as the overlapping criterion in the user-selected area, the user will find maps that cover most of the area of interest (Figure 3). If a user selects 85 percent as the overlapping criterion in the map coverage, the user will find maps that concentrate on the selected area (Figure 4). Users can specify how searching results should be ranked based on the degree of overlap.

[Figures 3-4 ILLUSTRATION OMITTED]
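The two overlap percentages can be computed directly from the rectangle corners. The sketch below is an illustration under the assumption that both areas are axis-aligned rectangles (xmin, ymin, xmax, ymax) in the same planar coordinate system; the function name is invented for this example.

```python
def overlap_percentages(user_area, map_coverage):
    """Return (percent of the user area overlapped, percent of the map overlapped)."""
    width = min(user_area[2], map_coverage[2]) - max(user_area[0], map_coverage[0])
    height = min(user_area[3], map_coverage[3]) - max(user_area[1], map_coverage[1])
    shared = width * height if width > 0 and height > 0 else 0.0
    user_size = (user_area[2] - user_area[0]) * (user_area[3] - user_area[1])
    map_size = (map_coverage[2] - map_coverage[0]) * (map_coverage[3] - map_coverage[1])
    return 100.0 * shared / user_size, 100.0 * shared / map_size


# A map four times the size of the user area that covers all of it.
print(overlap_percentages((0, 0, 10, 10), (0, 0, 20, 20)))
```

In this example the map passes an 85 percent criterion measured against the user-selected area but fails the same criterion measured against the map coverage, which is the distinction between Figures 3 and 4.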

The key features of the prototype are its capabilities for the user to interactively identify the area of interest, to quantitatively specify the relationship between the user-defined area and the map coverage, and to rank the search results based on the degree of overlap.

USE OF GRAPHICS TO EXPRESS INFORMATION NEED

Cartographic information is geographically referenced--it represents locations and areas on the earth. Conventional information representation using text and symbols is not very useful in describing the information included in a map; there are too many geographic features included in an area. For example, a railroad map in Florida can be indexed using the keywords railroad and Florida. However, the map also includes all the railroads in each county in Florida. It indicates railroad construction in the Jacksonville area and demonstrates the railroad near Lake xxx. It is practically impossible to index all the place names included in an area. When a user draws a box to specify an area of interest, the information requested would require many words to describe it. A graphic interface can hide the coordinate numbers and present them in scalable graphics, which makes it much easier for users to discover the cartographic information resources of interest.

In addition to the information representation issue discussed earlier, a graphic interface also avoids trouble for users when changes in place names and county boundaries occur or when they simply do not know the exact name to begin the search.

LEVEL 1 IN KD--SPECIFYING TOPOLOGICAL RELATIONSHIPS QUALITATIVELY BETWEEN THE USER-DEFINED AREA AND THE MAP COVERAGE

As discussed earlier, the Alexandria Project can specify topological relationships qualitatively between the user-defined area and the map coverage in its electronic cartographic information retrieval system. This matching process goes beyond the exact matching in a conventional information retrieval system. The computer system will calculate the topological relationship between the user-defined area and the coverage of the maps to determine whether they overlap or one completely contains the other.

Cobb and Petry (1998) presented a model for defining and representing binary topological and directional relationships between two-dimensional objects. Such relationships can be used for fuzzy querying. Cobb and Petry (1998) summarize that there are four kinds of major relationships--disjoint, tangent (next to each other), overlapping, and containment. The assumption for GeoMatch is that users would find overlapping and containment most useful when querying the system.
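The four major relationships can be told apart with a few comparisons when both objects are rectangles. The sketch below is a crisp simplification written for illustration; it is not Cobb and Petry's fuzzy model, and the function name and labels are assumptions. Rectangles are (xmin, ymin, xmax, ymax) with xmin < xmax and ymin < ymax.

```python
def classify(a, b):
    """Classify two rectangles as disjoint, tangent, containment, or overlapping."""
    # A gap on either axis means the rectangles do not meet at all.
    if a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]:
        return "disjoint"
    # They meet, but only along an edge or at a corner: no shared area.
    if a[2] == b[0] or b[2] == a[0] or a[3] == b[1] or b[3] == a[1]:
        return "tangent"
    a_holds_b = a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]
    b_holds_a = b[0] <= a[0] and b[1] <= a[1] and b[2] >= a[2] and b[3] >= a[3]
    if a_holds_b or b_holds_a:
        return "containment"
    return "overlapping"


print(classify((0, 0, 1, 1), (2, 2, 3, 3)))
print(classify((0, 0, 1, 1), (1, 0, 2, 1)))
print(classify((0, 0, 2, 2), (1, 1, 3, 3)))
print(classify((0, 0, 4, 4), (1, 1, 2, 2)))
```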

The operations involved in the above include conversion from screen coordinates to the real world coordinates and comparison of the coordinates of the corners of the user-defined area and map boundaries. The new knowledge--whether two areas overlap--is generated in this process. The knowledge acquired can be utilized to lead users to the relevant information source. GeoMatch provides users with an additional choice beyond the Alexandria system with which to define the containment relationship.
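The conversion from screen coordinates to real world coordinates can be sketched as a linear mapping. This assumes the background map is displayed so that longitude and latitude vary linearly across the window, which may not match the projection the prototype used; the window size, the extent, and the function name are hypothetical.

```python
def screen_to_world(px, py, screen_size, world_extent):
    """Map a pixel position to world coordinates.

    screen_size is (width, height) in pixels; world_extent is
    (xmin, ymin, xmax, ymax) of the background map currently shown.
    """
    width, height = screen_size
    xmin, ymin, xmax, ymax = world_extent
    x = xmin + (px / width) * (xmax - xmin)
    # Screen y grows downward while latitude grows upward, so flip the axis.
    y = ymax - (py / height) * (ymax - ymin)
    return x, y


# Hypothetical 800-by-600 window showing longitudes -88 to -80, latitudes 24 to 31.
extent = (-88.0, 24.0, -80.0, 31.0)
upper_left = screen_to_world(200, 150, (800, 600), extent)
lower_right = screen_to_world(600, 450, (800, 600), extent)
print(upper_left, lower_right)
```

The two converted corners of the user-drawn box can then be compared with the stored corner coordinates of each map.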

LEVEL 2 IN KD--SPECIFYING A TOPOLOGICAL RELATIONSHIP QUANTITATIVELY BETWEEN THE USER-DEFINED AREA (RECTANGLE) AND THE MAP COVERAGE

Specifying a topological relationship quantitatively between the user-defined area and the map coverage is a unique feature of the GeoMatch system. In this process, not only is the topological relationship of the two areas determined, more mathematical calculation is performed to estimate how much the two areas overlap. By combining the information input by users and the data stored in the database, the computer algorithm discovers new knowledge not explicitly represented in the database. Since the user-defined area is rectangular, the calculation involved is not overwhelming and can be realized using a conventional programming language such as C++ or Visual Basic.
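Because both areas are rectangles, the amount of overlap reduces to simple arithmetic on the corner coordinates. The Python sketch below illustrates the idea; it is not the prototype's code. The function names are hypothetical, rectangles are assumed to be (west, south, east, north) tuples in decimal degrees, and area is measured in square degrees on a flat plane, which ignores the narrowing of longitude toward the poles.

```python
def overlap_area(a, b):
    """Area shared by two rectangles given as (west, south, east, north)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0  # disjoint or merely touching
    return width * height


def rect_area(r):
    """Planar area of a (west, south, east, north) rectangle."""
    return (r[2] - r[0]) * (r[3] - r[1])


def coverage_percent(user_area, map_coverage):
    """Percentage of the user-defined area that the map covers."""
    return 100.0 * overlap_area(user_area, map_coverage) / rect_area(user_area)
```

For a user-defined area of (0, 0, 10, 10) and a map covering (5, 0, 20, 10), half of the area of interest lies on the map, so coverage_percent reports 50 percent.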

This feature allows the system to achieve higher recall and precision than systems without this function. Gluck (1995) analyzed relevance and competence in evaluating the performance of information systems. He indicated that "relevance judgments by users most often assess the qualities of retrieved materials item by item at a particular point in time and within a particular user context" (p. 447). Using the qualitative topological matching technique described in Level 1 above, there could be a large gap between relevance from the system's view and relevance from the user's view. For example, users may find that some retrieved maps cover only a small part of the area of interest and in fact are useless, but these maps are relevant from the system's view since they overlap the user-defined area. Users may also find that some retrieved maps cover such a large area that the area of actual interest encompasses only a small portion of the whole map. These maps are relevant too from the system's view but, again, practically useless for users. The reason for such a gap between the user's view and the system's view is that not enough "knowledge" is discovered and provided for users to describe their information need in more detail. The techniques employed in quantitative topological matching can greatly reduce the gap in relevance between the two perspectives. In addition, GeoMatch can calculate the spatial relevance of the maps to the area of interest and rank the results using the quantitative overlapping factor, while many systems fail to "provide useful ordering of retrieved records" (Larson, McDonough, O'Leary, Kuntz, & Moon, 1996, p. 550). This function is particularly helpful for users when hundreds of maps are included in the result set.
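The two failure cases described above, maps that cover only a sliver of the area of interest and maps so large that the area of interest is a small fraction of the sheet, correspond to two different ratios. The Python sketch below ranks maps by the product of those ratios. The scoring formula and the names spatial_scores and rank_maps are assumptions made for illustration; the article does not specify the formula GeoMatch uses.

```python
def _overlap(a, b):
    """Shared planar area of two (west, south, east, north) rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def _area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def spatial_scores(user_area, map_coverage):
    """Return (share of user area on the map, share of map inside user area)."""
    shared = _overlap(user_area, map_coverage)
    return shared / _area(user_area), shared / _area(map_coverage)


def rank_maps(user_area, maps):
    """Rank maps by an assumed score: the product of the two shares.

    `maps` is a dict of map id -> coverage rectangle. Maps that do not
    overlap the user-defined area are dropped from the result.
    """
    scored = []
    for map_id, coverage in maps.items():
        user_share, map_share = spatial_scores(user_area, coverage)
        score = user_share * map_share
        if score > 0:
            scored.append((map_id, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under this scoring, a map whose coverage matches the area of interest exactly scores 1.0, while both a world map and a map that clips one corner of the area fall far down the list.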

LEVEL 3 IN KD--SPECIFYING TOPOLOGICAL RELATIONSHIP QUANTITATIVELY BETWEEN USER-DEFINED AREA (FREE STYLE) AND MAP COVERAGE

Specifying a topological relationship quantitatively between a user-defined area and map coverage differs from level 2 in that users are allowed to use the mouse to define an irregular area of interest rather than a straight rectangle. This feature can help users express their information need more precisely. For example, a user interested in the lake shore area of a lake can draw an irregular circle around the lake and perform a search.

This process involves complicated topological calculations that are difficult to accomplish using conventional programming languages. The GIS overlay function introduced at the beginning of this discussion needs to be used to generate new polygons and calculate the areas involved. Although the GeoMatch prototype currently does not have this feature, it could be implemented using third-party GIS software such as the Spatial Engine from ESRI.
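For the special case in which the map coverage is still a rectangle and only the user-defined area is irregular, the overlay can be approximated without a GIS package. The Python sketch below clips a hand-drawn polygon to a map rectangle with the Sutherland-Hodgman algorithm and measures the result with the shoelace formula. This is a substitute technique chosen for illustration, not the GIS overlay function the article refers to; the function names are hypothetical, and coordinates are treated as planar.

```python
def polygon_area(points):
    """Shoelace formula; `points` lists (x, y) vertices in drawing order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_to_rectangle(points, west, south, east, north):
    """Clip a polygon to a rectangle (Sutherland-Hodgman), one edge at a time."""

    def clip(pts, inside, intersect):
        out = []
        for i, cur in enumerate(pts):
            prev = pts[i - 1]  # wraps around to the last vertex when i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def cross_x(x):
        # Point where segment p-q crosses the vertical line at x.
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def cross_y(y):
        # Point where segment p-q crosses the horizontal line at y.
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    edges = [
        (lambda p: p[0] >= west, cross_x(west)),
        (lambda p: p[0] <= east, cross_x(east)),
        (lambda p: p[1] >= south, cross_y(south)),
        (lambda p: p[1] <= north, cross_y(north)),
    ]
    for inside, intersect in edges:
        if not points:
            break  # nothing left: the polygon lies outside the rectangle
        points = clip(points, inside, intersect)
    return points
```

Dividing polygon_area of the clipped polygon by polygon_area of the original gives the share of the hand-drawn area that a map covers. Overlaying two arbitrary polygons is a harder problem and remains a job for GIS software.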

MARC RECORD FOR CARTOGRAPHIC INFORMATION RESOURCES

Whether an information system can be adopted depends not only on its creativity and usefulness but also on the degree of difficulty in converting the current system to the new system. The MARC record format is examined here to determine what new information needs to be collected to use GeoMatch.

US MARC (Machine Readable Cataloging), developed by the Library of Congress, follows the national standard (ANSI/NISO Z39.2) and the corresponding international standard (ISO 2709). It is the basic format of bibliographic description in the United States. Most online catalogs have a MARC interface for data import and export. OCLC, the bibliographic utility, also provides records in MARC format for members to share.

The current MARC format provides sufficient geographic information to support a more powerful searching tool such as GeoMatch. The most important field is Field 034--Coded Mathematical Data Area Field (Mangan, 1984). If a single set of scales is used, the first indicator is set to "1." The subfield codes include $b (ratio linear horizontal scale); $c (ratio linear vertical scale); $d (coordinates--westernmost longitude); $e (coordinates--easternmost longitude); $f (coordinates--northernmost latitude); and $g (coordinates--southernmost latitude). The following is an example of the MARC record 034 field:
   034 1 a $b 7603200 $d W1640000 $e W0440000 $f N0900000 $g N0400000


The field above illustrates that the map covers an area from West 164°00'00" to West 044°00'00" in longitude and from North 090°00'00" to North 040°00'00" in latitude. This demonstrates that MARC records are capable of defining the scope of a map, and the data are usable in systems like GeoMatch. No additional value-adding operations are necessary unless the bibliographic record of a map is not available from the OCLC database or no matching MARC record is available for the map. If a library already has its map collection in its online catalog, all the records can be imported into GeoMatch automatically.
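An automatic import of this kind requires turning the coded coordinates into numbers that can be compared. The Python sketch below shows one way to do so. It assumes each coordinate is written as a hemisphere letter followed by three digits of degrees, two of minutes, and two of seconds, and that the field is available as a single line of text with "$" subfield delimiters; the function names parse_coordinate and parse_034 are hypothetical.

```python
import re

# Hemisphere letter, then degrees (3 digits), minutes (2), seconds (2).
_COORD = re.compile(r"^([NSEW])(\d{3})(\d{2})(\d{2})$")


def parse_coordinate(value):
    """Convert a coded coordinate such as 'W1640000' to signed decimal degrees."""
    match = _COORD.match(value)
    if match is None:
        raise ValueError(f"not an hdddmmss coordinate: {value!r}")
    hemisphere, deg, minutes, seconds = match.groups()
    degrees = int(deg) + int(minutes) / 60 + int(seconds) / 3600
    # South and west are negative by the usual convention.
    return -degrees if hemisphere in "SW" else degrees


def parse_034(field):
    """Extract the bounding box from the text of an 034 field."""
    subfields = dict(re.findall(r"\$([a-z])\s*(\S+)", field))
    return {
        "west": parse_coordinate(subfields["d"]),
        "east": parse_coordinate(subfields["e"]),
        "north": parse_coordinate(subfields["f"]),
        "south": parse_coordinate(subfields["g"]),
    }
```

For a field whose $d through $g values are W1640000, W0440000, N0900000, and N0400000, parse_034 yields west -164.0, east -44.0, north 90.0, and south 40.0, which is the form the corner comparisons need. A malformed value raises ValueError rather than being silently guessed at.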

FEEDBACK FROM LIBRARIANS

Florida State Library

When librarians at the Florida State Library reviewed the prototype for GeoMatch, they realized that it could give answers to difficult questions. For example, towns may disappear over time, county boundaries may change, and users might not remember an exact place name. In such cases, GeoMatch could be very helpful.

Florida State University Library

The librarian showed interest in the GeoMatch system. She thought the system could be useful but should be integrated with the university library catalog system. When the librarian was asked whether the GeoMatch system could solve some difficult-to-answer questions, she provided the following example:
   Case Study--a man born in 1907 wanted to find a map of his place of birth.
   He knew the name of the town and knew that it was located west of
   Jacksonville. He could not find his town on a current map since it has
   disappeared. He had no idea how to find a map showing the exact position of
   that town using the library catalog.


In summary, librarians in both libraries confirmed the need for a retrieval tool with a graphic user interface facilitating location-based searching. Such a tool is especially important when a user does not know the exact place name but knows approximately the locations of interest or when the name of a place has changed.

Nevertheless, while the librarians judged the system to be creative and potentially useful, they were not eager to implement such a system in their own libraries.

CONCLUSION

New spatial information retrieval tools are needed to improve the efficiency and effectiveness of geographically referenced searching. The GeoMatch prototype demonstrates that a graphic-based interface can mine the geographical data buried in MARC records and other geospatial sources and visualize the new knowledge discovered in these data. Combined with the text retrieval capability, this knowledge discovery tool provides users with greater flexibility in locating the information they need. Discovering knowledge in geospatial data is distinct from text information searching because it uses algorithms to convert coordinate information into user-understandable and useful knowledge.

The main contribution of GeoMatch is the quantitative analysis of the relationship in the retrieval process. Not only can it help users to more precisely define their information need and adjust the searching strategy, but it can also be used to rank the results.

The study of the MARC format shows that it supports the data requirements of GeoMatch, and no additional information is required for converting an existing online catalog to GeoMatch.

Future research in geospatial information retrieval systems will focus on the usability of the system and the theoretical framework of spatial information retrieval, including:

1. usability testing of GeoMatch to study the user friendliness and usefulness of the system;

2. field testing of implementing GeoMatch in a library catalog system;

3. evaluation of the efficiency and effectiveness of the quantitative overlapping function;

4. design of the formula and algorithms to rank the searching result using factors from spatial comparison and factors from text information retrieval such as keywords;

6. application of such a system to information sources other than paper maps, including electronic images and information that can be geographically referenced; and

7. accessibility of such a system over the Web.

Results from these studies could enrich the theories in spatial information retrieval and lead to more powerful and user-friendly information retrieval tools.

REFERENCES

Bell, D. A., & Guan, J. W. (1998). Computational methods for rough classification and discovery. Journal of the American Society for Information Science, 49(5), 403-414.

Burrough, P. A. (1990). Principles of geographical information systems for land resources assessment. Oxford: Clarendon Press.

Cheeseman, P., & Stutz, J. (1996). Bayesian classification (autoclass): Theory and results. In U. M. Fayyad (Ed.), Advances in knowledge discovery and data mining (pp. 153-180). Menlo Park, CA: AAAI Press.

Cobb, M. A., & Petry, F. E. (1998). Modeling spatial relationships within a fuzzy framework. Journal of the American Society for Information Science, 49(3), 253-266.

Environmental Systems Research Institute. (1991). Understanding GIS. Redlands, CA: ESRI.

Fayyad, U. M.; Piatetsky-Shapiro, G.; & Smyth, P. (1996). From data mining to knowledge discovery: An overview. In U. M. Fayyad (Ed.), Advances in knowledge discovery and data mining (pp. 1-34). Menlo Park, CA: AAAI Press.

Glossary. (1995). Retrieved August 18, 1999 from the World Wide Web: http://www.libraries.rutgers.edu/rulib/abtlib/alexlib/glossary-html.

Gluck, M. (1995). Understanding performance in information systems: Blending relevance and competence. Journal of the American Society for Information Science, 46(6), 446-460.

Larson, R. R.; McDonough, J.; O'Leary, P.; Kuntz, L.; & Moon, R. (1996). Cheshire II: Designing a next-generation online catalog. Journal of the American Society for Information Science, 47(7), 555-567.

Mangan, E. U. (1984). MARC conversion manual--maps: Content designation conventions and procedures for AACR2. Washington, DC: Library of Congress.

Schmitz, J. (1990). Coverstory--automated news finding in marketing. Interfaces, 20(6), 29-38.

School of Information Studies, FSU. (1999). Foundations of information studies. Retrieved May 17, 1999 from the World Wide Web: http://slis-one.lis.fsu.edu/courses/5230/.

Smith, T. R. (1996). A brief update on the Alexandria digital library project--constructing a digital library for geographically-referenced materials. Retrieved August 6, 1999 from the World Wide Web: http://alexandria.sdc.ucsb.edu.

Smith, T. R. (1998). Alexandria atlas subteam. Retrieved August 6, 1999 from the World Wide Web: http://alexandria.sdc.ucsb.edu.

Trybula, W. J. (1997). Data mining and knowledge discovery. In M. E. Williams (Ed.), Annual review of information science and technology (pp. 197-229). Medford, NJ: Information Today.

Tuzhilin, A. (1997). Editor's introduction to the special issue on knowledge discovery and its applications to business decision-making. Decision Support Systems, 21(1), 1-2.

Xu, X. W.; Ester, M.; Kriegel, H. P.; & Sander, J. (1997). Clustering and knowledge discovery in spatial databases. Vistas in Astronomy, 41(3), 397-403.

ADDITIONAL REFERENCES

Carter, C. L., & Hamilton, J. (1998). Efficient attribute-oriented generalization for knowledge discovery from large databases. IEEE Transactions on Knowledge and Data Engineering, 10(2), 193-208.

Chen, Z., & Zhu, Q. (1998). Query construction for user-guided knowledge discovery in databases. Journal of Information Sciences, 109(1-4), 49-64.

Connaway, L. S.; Kochtanek, T. R.; & Adams, D. (1994). MARC bibliographic records: Considerations and conversion procedures for microcomputer database programs. Microcomputers for Information Management, 11(2), 69-88.

Deogun, J. S.; Choubey, S. K.; Raghavan, V. V.; & Sever, H. (1998). Feature selection and effective classifiers. Journal of the American Society for Information Science, 49(5), 423-434.

Maddouri, M.; Elloumi, S.; & Jaoua, A. (1998). An incremental learning system for imprecise and uncertain knowledge discovery. Journal of Information Science, 109(1-4), 149-164.

Morik, K., & Brockhausen, P. (1997). A multistrategy approach to relational knowledge discovery in databases. Machine Learning, 27(3), 287-312.

Vickery, B. (1997). Knowledge discovery from databases: An introductory review. Journal of Documentation, 53(2), 107-122.

Lixin Yu, School of Information Studies, Florida State University, Tallahassee, FL 32306-2100

LIXIN YU is an Assistant Professor at the School of Information Studies, Florida State University, where he teaches courses in database management, user interface design, and information system design and development. He worked as a Project Manager at Geosocial Resources, Inc. and has been working on Geographic Information System projects since 1990. He has published articles on GIS including "Geographic Information Systems in Library Reference Services: Development and Challenge" (Reference Librarian, February 1998) and "Assessing the Efficiency and Accuracy of Street Address Geocoding Strategies" (Proceedings of GIS '97, December 1997).
COPYRIGHT 1999 University of Illinois at Urbana-Champaign
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1999, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author: YU, LIXIN
Publication: Library Trends
Geographic Code: 1USA
Date: Jun 22, 1999
Words: 5472