# LEDA: a platform for combinatorial and geometric computing.

Combinatorial and geometric computing is a core area of computer science (CS). In fact, most CS curricula contain a course in data structures and algorithms. The area deals with objects such as graphs, sequences, dictionaries, trees, shortest paths, flows, matchings, points, segments, lines, convex hulls, and Voronoi diagrams and forms the basis for application areas such as discrete optimization, scheduling, traffic control, CAD, and graphics. There is no standard library of the data structures and algorithms of combinatorial and geometric computing. This is in sharp contrast to many other areas of computing. There are, for example, packages in statistics (SPSS), numerical analysis (LINPACK, EISPACK), symbolic computation (MAPLE, MATHEMATICA), and linear programming (CPLEX).The lack of a library severely limits the impact of the area on computer science as a whole. The continuous reimplementation of basic data structures and algorithms slows down progress within research and even more so outside. The latter case is due to the fact that outside of research the investment for implementing an efficient solution is frequently not made, since it is doubtful whether the implementation can be reused, and therefore methods known to be less efficient are used instead. As a consequence, scientific discoveries migrate into practice relatively slowly.

Why is it like this? One of the major differences between combinatorial and geometric computing and other areas of computing such as statistics, numerical analysis, and linear programming is the use of complex data types. While the built-in types, such as integers, reals, vectors, and matrices, usually suffice in the other areas, combinatorial and geometric computing relies heavily on types like stacks, queues, lists, dictionaries, sequences, graphs, points, lines, and convex hulls and therefore asks for a programming language where all these types are available. Only with the advent of object-oriented programming did it become feasible to provide such an extension in a clean way.

In 1989, we started the LEDA project (Library of Efficient Data Types and Algorithms) to build a library of the data types and algorithms of combinatorial and geometric computing. The features of LEDA are:

* LEDA provides a sizable collection of data types and algorithms in a form that allows them to be used by non-experts. This collection includes most of the data types and algorithms described in the textbooks of the area (e.g., [1, 5, 8, 11, 18-23]): stacks, queues, lists, sets, dictionaries, ordered sequences, partitions, and priority queues; directed, undirected, and planar graphs, lines, points, and planes; and many algorithms in graph and network theory and computational geometry.

* LEDA gives a precise and readable specification for each of the data types and algorithms previously mentioned. The specifications are short (typically not more than a page), general (to allow several implementations), and abstract (to hide all details of the implementation).

* Many data types in LEDA are parameterized; e.g., the dictionary type works for arbitrary key and information type. A specific instance D with key type string and information type int, for example, is defined by dictionary (string, int) D;

* LEDA contains the most efficient realization known for its types. For many data types the user may even choose between different implementations, e.g., between ab-trees, BB[[Alpha]]-trees, dynamic perfect hashing, skip lists. . . . for dictionaries. _dictionary (string, int, skip_list) D realizes dictionary D by skip lists.

* For many efficient data structures access by position is crucial. LEDA uses a novel item concept to cast positions into an abstract form.

* LEDA contains a comfortable data type graph. It offers the standard iterations, such as "for all nodes v of a graph G do" (written forall_nodes (v,G)) or "for all neighbors w of v do" (written for-all_adj_nodes (w, v)); it allows the addition and deletion of nodes and edges; and it offers arrays and matrices indexed by nodes and edges. The data type graph allows one to write programs for graph problems in a form close to the typical textbook presentation. We emphasize that all examples given in this article show executable code. The goal is the equation "Algorithm + LEDA = Program."

* LEDA is realized in C++, and all its data types and algorithms are stored in the library as precompiled object-modules. Together with the fact that the high expressive power of LEDA keeps application programs short, this leads to short compile times.

* LEDA offers an interface to the X11 window system to allow graphical output and mouse input.

* Many geometric algorithms use arbitrary precision arithmetic and are therefore free from failures due to rounding errors. Moreover, they can handle all degenerate cases.

* LEDA supports applications in a broad range of areas. It has already been used in such diverse areas as code optimization, VLSI design, robot motion planning, traffic scheduling, machine learning, and computational biology.

In this article we describe the status of the project. We hope that LEDA will narrow the gaps between algorithms research, teaching, and application. Other projects with similar goals are described in [4, 7].

LEDA is available by anonymous ftp (ftp.mpi-sb.mpg.de, directory pub/LEDA) and can be used freely for purposes of research and teaching. The library can be used with any C++ compiler supporting templates (e.g., cfront 3.0 and g++-2.5). Further information can be obtained from leda

Elegance and Ease of Use

We give an overview of LEDA by way of four examples. They exemplify different parts of LEDA: data structures, graphs, geometry, and graphics. All examples demonstrate how easily an algorithm can be translated into a running LEDA program. The examples also demonstrate that LEDA is not only a library of data types and algorithms but rather a platform on which applications can be built.

Word Count. Here our task is to read a sequence of strings from the standard input, to count the number of occurrences of each string in the input, and to print a list of all occurring strings together with their frequencies on the standard output. The appropriate LEDA types to use are strings and dictionary arrays. The parameterized data type dictionary array (d_array(I, E)) realizes arrays with index type I and element type E. We use it with index type string and element type int. The full program is given in Figure 1. It starts with the include statement for dictionary arrays. In the first line of the main program we define a dictionary array N with index type string and element type int and initialize all entries of the array to zero. Conceptually, this creates an infinite array with one entry for each conceivable string and sets all entries to zero. (The implementation of d_arrays stores the nonzero entries in a balanced search tree with key type string.) In the second line we define a string s. The third line does all the work. The expression (cin [is much greater than] s) returns true if the input stream is nonempty and false otherwise. In the former case, the first string is removed from the input stream and assigned to s. Then the entry N[s] of array N is increased by one. The iteration forall_defined (s, N) in the last line successively assigns all strings for which the corresponding entry of N was touched during execution to s. For each such string, the string and its frequency are printed on the standard output.

Shortest Paths in Graphs. A directed graph consists of a set V of nodes and a set E of edges. An edge e = (v, w) is a directed connection from its source node v to its target node w. Assume also that for each edge e we are given the travel time cost[e] in minutes across the edge and that our task is to compute the minimum travel time from a specified node to any other node of the graph.

Dijkstra [6] found a simple algorithm to solve this problem. At a general step of the algorithm there will be some set S [subset or is equal to] V of nodes for which the exact minimum travel time is known, while for all nodes not in S only a feasible travel time is known. For any node v, we use dist[v] to denote the shortest travel time from s to v currently known. Initially dist[s] = 0, dist [v] = [infinity] for any v [is not equal to] s, and S = [Phi]. At every step one selects a vertex u [is an element of] V\S with the smallest value of dist[u] (initially, u = s) and adds u to S. For any edge e = (u, w) emanating from u we decrease dist[v] to dist[u] + cost[e] if this value is smaller than the current value of dist[v]. In our example, the first node added to S is s. When s is added dist[a] is set to 2 and dist[c] is set to 6. Then a is added to S, dist[b] is set to 3, and dist[c] is reduced to 5, . . . . A proof of correctness of Dijkstra's algorithm is beyond the scope of this article. (The idea is to show that for all nodes v the label dist[v] is always the shortest travel time from s to v along a path where all but the last node belong to S.) How can one find the node in V\S with minimal dist-value efficiently? The appropriate data structure is a priority queue. Figure 3 shows the LEDA implementation of Dijkstra's algorithm. Besides graphs, it uses the data types node_array, edge_array and node priority queue (node_pq). Node arrays and edge arrays are arrays indexed by nodes and edges, respectively. We use an edge_array(int) cost to store the travel times across edges, a node_array(int) dist to store the minimum travel times to all nodes, and a node_pq(int) PQ(G). The parameter G in the definition of PQ tells LEDA that the underlying graph is G. The node priority queue PQ always contains the nodes of G in V\S together with their current integer dist-values. Initially, all nodes are inserted into PQ. At every iteration the node with the smallest associated value is deleted from PQ (u = PQ.del_min()). All edges e emanating from u (forall_adj_edges (e, u)) are scanned, and for each such edge the dist-value of the target node is reduced if appropriate (PQ.decrease_inf(v, c)).

We want to stress the strong similarity of the LEDA program and the description of this algorithm. The same similarity holds for most graph algorithms. In this sense, LEDA realizes the equation "Algorithm + LEDA = Program."

We want to stress another fact. An implementer of Dijkstra's algorithm does not need to know about the inner workings of graphs and node priority queues; a knowledge of the relevant manual pages suffices. Figure 4 shows the manual page for node priority queues. In its last section, the manual page lists the execution times for the various operations on priority queues: constant time for insert, empty and decrease_inf, and logarithmic time for del_min. In a graph with n nodes and m edges, Dijkstra's algorithm requires n inserts, empty-tests, and del_mins and at most m decrease_infs. Its running time is therefore proportional to m + nlog n.

Convex Hulls. Imagine a finite set L of points in the plane enclosed by a rubber band. The rubber band relaxes to the so-called convex hull of L. Figure 5 shows an example. The convex hull is one of the basic structures of computational geometry. It is frequently used in image processing and pattern recognition as an approximation to the shape of a set of points. We discuss how to compute the convex hull; for simplicity, we restrict ourselves to the so-called upper hull. If one cuts the convex hull at the leftmost and rightmost points of L, the convex hull of L is split into the upper and lower hull of L, cf. Figure 5; here a point p is left of a point q if either the x-coordinate of p is smaller than the x-coordinate of q or the two x-coordinates are equal and the y-coordinate of p is smaller.

Here is how to compute the upper hull of a set L of points. First sort L according to the left-to-right ordering of points. Let [p.sub.1], [p.sub.2], ..., [p.sub.n] be the sorted sequence of points. We construct the upper hull of the points [p.sub.1], ..., [p.sub.i] incrementally for all i, 1 [is less than or equal to] i [is less than or equal to] n. Initialization is simple: the upper hull of [p.sub.1] is [p.sub.1]. Assume now that we have already constructed the upper hull of [p.sub.1], ..., [p.sub.i] and next want to add [p.sub.i+1]. If [p.sub.i] = [p.sub.i+1], then there is nothing to do. If pi [is not equal to] [p.sub.i+1], then we simply have to delete the last point of the current upper hull, as long as the current upper hull has at least two points and the last two points together with the new point do not form a right-turn, cf. Figure 6. We then add [p.sub.i+1] to the upper hull; this completes the update step. Figure 7 shows the LEDA implementation of this algorithm. Again there is a striking similarity between the algorithm description and the LEDA program and only a few additional words are required to explain the program: L.sort() sorts the list L according to the default-ordering defined on the elements of the list. For points this is the left-to-right ordering. L.pop() deletes the first element from a list and returns it, Uh.append(p) appends the point p to the list Uh, L.empty() returns true if L is empty, Uh. length() returns the length of list Uh, and Uh.Pop() deletes the last element of list Uh. In LEDA a list is viewed as a sequence of so-called items (type list_item), each of which contains an element of the list. Uh.last() returns the last item of the list Uh, and for an item it of Uh the content of the item is given by Uh[it] and the predecessor item is given by Uh.pred(it).

There is another interesting fact about the convex hull program. A point in LEDA may have arbitrary rational coordinates, and all geometric predicates are evaluated exactly, that is, with exact rational arithmetic. Also note that the program correctly handles multiple occurrences of the same point and collinear points, situations frequently referred to as degeneracies. All geometric algorithms in LEDA are suppossed to handle all degeneracies, and in fact many of them already do; we refer the reader to [2], [3], and [14] for more details.

Graphics. The LEDA type window is an interface to the X11 window system. Figure 8 illustrates this data type. The program reads a sequence of points, displays them, computes their upper hull, and displays the upper hull as a polygonal line. Again only a few explanations are needed. The definition window W defines a graphics window and opens it for mouse input. Any click on the left mouse button inputs a point (W [is much greater than] p); a click on the right mouse button lets the statement W [is much greater than] p evaluate to false. The point is appended to list L and displayed in the window W. Then the upper hull is computed and drawn as a polygonal line. Figure 9 shows an example.

Scope

LEDA is a wide collection of data types and algorithms on these data types. In fact, most of the data structures and algorithms described in the textbooks of the area (e.g., [1, 5, 8, 11, 18-23]) are contained in LEDA. The library is organized into six logical units: (1) basic data types; (2) numbers, vectors, and matrices; (3) dictionaries and priority queues; (4) graphs; (5) windows and panels; and (6) geometry. The basic data types are strings, lists, queues, stacks, arrays, partitions, and trees.

The number types are the built-in types int, float, and double as well as the arbitrary precision versions Int and Float. The type Int is the class of integers in the mathematical sense; Float is the class of floating point numbers with arbitrary precision mantissa and exponent. Vectors and matrices are available for all four number types.

In the third unit we have priority queues, dictionaries, dictionary- and hashing-arrays, sorted sequences, and persistent dictionaries.

The graph part offers different kinds of graphs: directed graphs, undirected graphs, and planar graphs. In addition, it offers data structures on graphs: e.g., arrays indexed by nodes and edges, respectively; priority queues on nodes; and node partitions, and many algorithms on graphs and networks: shortest paths, biconnected and strongly connected components, transitive closure, topological sorting, unweighted and weighted bipartite matching, unweighted general matching, network flow, min-cost network flow, planarity testing, planar embedding, minimum spanning tree, etc.

The windows and panels part offers an interface to the X11 windows system. It can be used to display geometric objects and to construct interactive applications with mouse input.

The section on geometry offers points, segments, lines, data structures on these objects (e.g., planar subdivisions and range-, segment-, and interval-trees), and some geometric algorithms (e.g., line segment intersection, Voronoi diagrams and Delaunay triangulations, and convex hulls in arbitrary dimensions).

Run-Time and Compile-Time Efficiency

All data types and algorithms in LEDA are precompiled and stored in libraries. An application program has to include only the header files of all the data types used in the application. These are typically short, since they include only the declarations of all the member functions of the type and only very small sections of actual code (for in-line functions and for type conversion). The application programs are typically short, since LEDA allows them to be formulated on a very high level. These factors together lead to short compilation times. Only the linker has to search through the large (about three MB) LEDA library.

All data types in LEDA are realized by the asymptotically most efficient implementation known. For many data types we have even included several implementations; for example, for dictionaries the user can choose between ab-trees, AVL-trees, BB[a]-trees, red-black trees, skip lists, and randomized search trees. The mechanism for choosing another implementation is quite convenient. Replacing the definition of dictionary array N in the word count program with _d_array<string, int, skip_list> N(0); tells LEDA to use the skiplist.

Abstract data types hide the implementation details of a data structure, but the abstraction, if improperly done, might bring about a loss of efficiency. Here is an example. A dictionary with key type K and information type I is usually viewed as a mapping from K to I. Suppose now that one wants first to access the information associated with some key k, next to modify that information, and finally to associate a new information with key k. In a dictionary as traditionally defined, this involves two searches: the first search locates the key k and the associated information, and the second search locates k again and associates a new information with k. However, a direct use of the data structure can avoid the second search. One can simply keep a pointer to the position of key k in the data structure and replace the second search with direct pointer access. In LEDA we introduced a novel item concept to overcome this inefficiency. A dictionary is viewed as a collection of items (type die_item), each containing a key and an information. The keys associated with different items are distinct. A lookup in a dictionary takes a key and returns an item (not an information). The information can then be accessed through the item. More important, the item can be stored and a later access can be made directly through the item. In this way, items are the abstraction of a position in a data structure. They give the same efficiency and nevertheless allow complete encapsulation of the underlying data structure. We feel that LEDA's item concept is a key factor in its efficiency.

It is clear that some penalty has to be paid for the generality of LEDA. Lauther [9] has carried out an extensive comparison between graph and network algorithms implemented in LEDA and hand-coded C-versions. He reports that the LEDA versions are slower by a factor of two to 10 (typically by a factor of four) and use two to 3.5 times the storage of his versions. We believe that this is quite acceptable given the convenience that LEDA offers. We should mention that we reimplemented some of the basic data structures (most notably graphs) as a reaction to Lauther's report. This reduced the typical slowdown to a factor of about two.

Experiences and Conclusion

Work on LEDA was started in 1989, and a first version was made available for outside use in 1990. Since fall 1992, LEDA version 3.0 is available through anonymous ftp. The version underlying this report, 3.1, will be released during the fall of 1994; it corrects bugs, provides a more efficient graph data type, and contains robust implementations of some geometric algorithms. Most users of LEDA are currently in universities and research institutions. According to a survey conducted in the spring of 1994, sample applications are code optimization, motion planning, logic synthesis, scheduling, VLSI design, term rewriting systems, semantic nets, machine learning, image analysis, computational biology, and many others.

LEDA is not the only library of data structures around. Others are NIHL [7] and Booch components [4]; cf. [10] for a survey. The main feature that distinguishes LEDA from the other libraries is its scope. No other library covers so much of combinatorial and geometric computing.

We conclude with some pointers to further literature, beginning with the LEDA Manual [17]. The implementation of LEDA, in particular the realization of parameterized types and of implementation parameters, is discussed in [12]. The implementation of geometric algorithms is discussed in [2], [3], and [14]. Case studies of algorithms implemented in LEDA can be found in [13], [15], and [16]. All are available through anonymous ftp (ftp.mpi-sb.mpg.de, directory pub/LEDA/articles).

References

1. Aho, A.V., Hopcroft, J.E., and Ullman, J.D. Data Structures and Algorithms. Addison-Wesley, Reading, Mass., 1983.

2. Burnikel, C., Mehlhorn, K., and Schirra, S. How to compute the Voronoi diagram of line segments: Theoretical and experimental results. In LNCS, 1994, Proceedings of ESA '94. Springer-Verlag, Berlin/New York.

3. Burnikel, C., Mehlhorn, K., and Schirra, S. On degeneracy in geometric computations. In Proceedings of SODA 94, (1994), p. 16-23.

4. Booch, G. Software Components with Ada. Benjamin Cummings/Addison-Wesley, Reading, Mass., 1987.

5. Cormen, T.H., Leiserson, C.E., and Rivest, R.L. Introduction to Algorithms. MIT Press/McGraw-Hill, Cambridge/New York 1990.

6. Dijkstra, E.W. A note on two problems in connexion with graphs. Numer. Math. 1 (1959), 269-271.

7. Gorlen, K.E., Orlow, S.M., and Plexico, P.S. Data Abstraction and Object-Oriented Programming in C++. Wiley, New York, 1990.

8. Kingston, J.H. Algorithms and Data Structures. Addison-Wesley, Reading, Mass., 1990.

9. Lauther, U. Untersuchung der library of efficient data types and algorithms (LEDA). Tech. Rep., Siemens AG, ZFE, Munchen, 1992.

10. Locke, N. C++ FTP libraries. C++ Report 6, (1994), 61-65.

11. Mehlhorn, K. Data Structures and Efficient Algorithms. Springer-Verlag, New York, 1984.

12. Mehlhorn, K., and Naher, S. Algorithm design and software libraries: Recent developments in the LEDA project. In Algorithms, Software, Architectures, Information Processing 92. Vol. 1. Elsevier North-Holland, Amsterdam 1992.

13. Mehlhorn, K., and Naher, S. Implementation of a sweep line algorithm for the segment intersection problem. Tech. Rep. MPI-I-94-160, Max-Planck-Institut fur Informatik, Saarbrucken, 1994.

14. Mehlhorn, K., and Naher, S. The implementation of geometric algorithms. Thirteenth World Computer Congress, IFIP94, Vol. 1, Elsevier Science North-Holland, Amsterdam, 1994.

15. Mehlhorn, K., Mutzel, P., and Naher, S. An implementation of the Hopcroft and Tarjan planarity test and embedding algorithm. Tech. Rep. MPI-I-93-151, Max-Planck-Institut fur Informatik, Saarbrucken, 1993.

16. Muller, M., and Ziegler, J. An implementation of a convex hull algorithm. Tech. Rep. MPI-I-94-105, Max-Planck-Institut fur Informatik, Saarbrucken, 1993.

17. Naher, S. LEDA manual. Tech. Rep. M PI-I -93-109, Max-Planck-Institut fur Informatik, 1993.

18. Nievergelt, J., and Hinrichs, K.H. Algorithms and Data Structures. Prentice-Hall, Englewood Cliffs, N.J., 1993.

19. O'Rourke, J. Computational Geometry in C. Cambridge University Press, Cambridge, England, 1994.

20. Sedgewick, R. Algorithms. Addison-Wesley, Reading, Mass., 1991.

21. Tarjan, R.E. Data Structures and Network Algorithms. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 44, SIAM, Philadelphia, 1983.

22. van Wyk, C.J. Data Structures and C Programs. Addison-Wesley, Reading, Mass., 1988.

23. Wood, D. Data Structures, Algorithms, and Performance. Addison-Wesley, Reading, Mass., 1993.

Node priority queues (node_pq)

1. Definition

An instance Q of the parameterized data type node_pq<I> is a set of pairs (v, i), where v is a node of some graph G and i belongs to some linearly ordered type I; i is called the information associated with node v. For any node v of G there can be at most one pair (v,) in Q.

2. Creation

node_pq<I> Q(G); creates an empty instance Q of type node_pq<I> for the nodes of graph G.

3. Operations

TABULAR DATA OMITTED

4. Implementation

Node priority queues are implemented by Fibonacci heaps and node arrays. Operations insert, del_node, del_min take time O(log n), find_min, decrease_inf, empty take time O(1) and clear takes time O(m), where m is the size of Q. The space requirement is O(n), where n is the number of nodes of G.

KURT MEHLHORN is Director at the Max Planck Institute for Computer Science. Current research interests include data structures, graph algorithms, computational geometry, VLSI design, parallel algorithms, and implementation. Author's Present Address: Max Planck Institute for Computer Science, Im Stadtwald, 6612 Saarbrucken, Germany; email: mehlhorn

STEFAN NAHER is a professor of computer science at the Martin Luther University in Halle. Current research interests include data structures, algorithms, computational geometry, and implementation. Author's Present Address: Fachbereich Mathematik und Informatik, Martin Luther Universitaet, Halle, Weinbergweg 17, 06099 Halle, Germany; email: stefan

Printer friendly Cite/link Email Feedback | |

Title Annotation: | Library of Efficient Data Types and Algorithms |
---|---|

Author: | Mehlhorn, Kurt; Naher, Stefan |

Publication: | Communications of the ACM |

Date: | Jan 1, 1995 |

Words: | 4407 |

Previous Article: | Case tools as collaborative support technologies. |

Next Article: | Ultra-structure: a design theory for complex systems and processes. |

Topics: |