# Concurrent Threads and Optimal Parallel Minimum Spanning Trees Algorithm.

Abstract. This paper resolves a long-standing open problem on whether the concurrent write capability of parallel random access machine (PRAM) is essential for solving fundamental graph problems like connected components and minimum spanning trees in O(log n) time. Specifically, we present a new algorithm to solve these problems in O(log n) time using a linear number of processors on the exclusive-read exclusive-write PRAM. The logarithmic time bound is actually optimal since it is well known that even computing the "OR" of n bits requires [Omega] (log n) time on the exclusive-write PRAM. The efficiency achieved by the new algorithm is based on a new schedule which can exploit a high degree of parallelism.Categories and Subject Descriptors: F.1.2 [Computation by Abstract Devices]: Models of Computation -- parallelism and concurrency; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems; G.2.2 [Discrete Mathematics]: Graph Theory -- graph algorithms

General Terms: Algorithms

Additional Key Words and Phrases: Connected components, EREW PRAM, minimum spanning trees, parallel algorithms

1. Introduction

Given a weighted undirected graph G with n vertices and m edges, the minimum spanning tree (MST) problem is to find a spanning tree (or spanning forest) of G with the smallest possible sum of edge weights. This problem has a rich history. Sequential MST algorithms running in O(m log n) time were known a few decades ago (see Tarjan [1983] for a survey). Subsequently, a number of more efficient MST algorithms have been published. In particular, Fredman and Tarjan [1987] gave an algorithm running in O(m[Beta](m, n)) time, where [Beta](m, n) = min{i|[log.sup.(i)] n [is less than or equal to] m/n). This time complexity was improved to O(m log [Beta](m, n)) by Gabow et al. [1986]. Chazelle [1997] presented an even faster MST algorithm with time complexity O(m[Alpha](m, n) log [Alpha](m, n)), where [Alpha](m, n) is the inverse Ackerman function. Recently, Chazelle [2000] improved his algorithm to run in O(m[Alpha](m, n)) time, and later Pettie [1999] independently devised a similar algorithm with the same time complexity. More recently, Pettie and Ramachandran [2000] obtained an algorithm running in optimal time. A simple randomized algorithm running in linear expected time has also been found [Karger et al. 1995].

In the parallel context, the MST problem is closely related to the connected component (CC) problem, which is to find the connected components of an undirected graph. The CC problem actually admits a faster algorithm in the sequential context, yet the two problems can be solved by similar techniques on various models of parallel random access machines (see the surveys by JaJa [1992] and Karp and Ramachandran [1990]). With respect to the model with concurrent write capability (i.e., processors can write into the same shared memory location simultaneously), both problems can be solved in O(log n) time using n + m processors [Awerbuch and Shiloach 1987; Cole and Vishkin 1986]. Using randomization, Gazit's [1986] algorithm can solve the CC problem in O(log n) expected time using (n + m)/log n processors. The work of this algorithm (defined as the time-processor product) is O(n + m) and thus optimal. Later, Cole et al. [1996] obtained the same result for the MST problem.

For the exclusive write models (including both concurrent-read exclusive-write and exclusive-read exclusive-write PRAMs), O([log.sup.2] n) time algorithms for the CC and MST problems were developed two decades ago [Chin et al. 1982; Hirschberg et al. 1979]. For a while, it was believed that exclusive write models could not overcome the O([log.sup.2] n) time bound. The first breakthrough was due to Johnson and Metaxas [1991; 1992]; they devised O([log.sup.1.5] n) time algorithms for the CC problem and the MST problem. These results were improved by Chong and Lam [1993] and Chong [1996] to O(log n loglog n) time. If randomization is allowed, the time or the work can be further improved. In particular, Karger et al. [1992] showed that the CC problem can be solved in O(log n) expected time, and later Halperin and Zwick [1996] improved the work to linear. For the MST problem, Karger [1995] obtained a randomized algorithm using O(log n) expected time (and super-linear work), and Poon and Ramachandran [1997] gave a randomized algorithm using linear expected work and O(log n [multiplied by] loglog n [multiplied by] [2.sup.log* n]) expected time.

Another approach stems from the fact that deterministic space bounds for the st-connectivity problem immediately imply identical time bounds for EREW algorithms for the CC problem. Nisan et al. [1992] have shown that st-connectivity problem can be solved deterministically using O([log.sup.1.5] n) space, and Armoni et al. [1997] further improved the bound to O([log.sup.4/3] n). These results imply EREW algorithm for solving the CC problem in O([log.sup.1.5] n) time and O([log.sup.4/3] n) time, respectively.

Prior to our work, it had been open whether the CC and MST problems could be solved deterministically in O(log n) time on the exclusive write models. Notice that [Theta](log n) is optimal since these graphs problems are at least as hard as computing the OR of n bits. Cook et al. [1986] have proven that the latter requires [Omega](log n) time on the CREW or EREW PRAM no matter how many processors are used.

Existing MST algorithms (and CC algorithms) are difficult to improve because of the locking among the processors. As the processors work on different parts of the graph having different densities, the progress of the processors is not uniform, yet the processors have to coordinate closely in order to take advantage of the results computed by each other. As a result, many processors often wait rather than doing useful computation. This paper presents a new parallel paradigm for solving the MST problem, which requires minimal coordination among the processors so as to fully utilize the parallelism. Based on new insight into the structure of minimum spanning trees, we show that this paradigm can be implemented on the EREW, solving the MST problem in O(log n) time using n + m processors. The algorithm is deterministic in nature and does not require special operations on edge weights (other than comparison).

Finding connected components or minimum spanning trees is often a key step in the parallel algorithms for other graph problems.(1) With our new MST algorithm, some of these parallel algorithms can be immediately improved to run in optimal (i.e., O(log n)) time without using concurrent write (e.g., biconnectivity [Tarjan and Vishkin 1985] and ear decomposition [Miller and Ramachandran 19861).

From a theoretical point of view, our result illustrates that the concurrent write capability is not essential for solving a number fundamental graph problems efficiently. Notice that EREW algorithms are actually more practical in the sense that they can be adapted to other more realistic parallel models like the Queuing Shared Memory (QSM) [Gibbons et al. 1997] and the Bulk Synchronous Parallel (BSP) model [Valiant 1990]. The latter is a distributed memory model of parallel computation. Gibbons et al. [1997] showed that an EREW PRAM algorithm can be simulated on the QSM model with a slow down by a factor of g, where g is the bandwidth parameter of the QSM model. Such a simulation is, however, not known for the CRCW PRAM. Thus, our result implies that the MST problem can be solved efficiently on the QSM model in O(g log n) time using a linear number of processors. Furthermore, Gibbons et al. [1997] derived a randomized work-preserving simulation of a QSM algorithm with a logarithmic slow down on the BSP model.

The rest of the paper is organized as follows: Section 2 reviews several basic concepts and introduces a notion called concurrent threads for finding minimum spanning trees in parallel. Section 3 describes the schedule used by the threads, illustrating a limited form of pipelining (which has a flavor similar to the pipelined merge-sort algorithm by Cole [1988]). Section 4 lays down the detailed requirement for each thread. Section 5 shows the details of the algorithm. To simplify the discussion, we first focus on the CREW PRAM, showing how to solve the MST problem in O(log n) time using (n + m) log n processors. In Section 6, we adapt algorithm to run on the EREW PRAM and reduce the processor bound to linear.

Remark. Very recently, Pettie and Ramachandran [1999] made use of the result in this paper to further improve existing randomized MST algorithms. Precisely, their algorithm is the first one to run, with high probability, in O(log n) time and linear work on the EREW PRAM.

2. Basics of Parallel MST Algorithms: Past and Present

In this section, we review a classical approach to finding an MST. Based on this approach, we can easily contrast our new MST algorithm with existing ones.

We assume that the input graph G is given in the form of adjacency lists. Consider any edge e = (u, v) in G. Note that e appears in the adjacency lists of u and v. We call each copy of e the mate of the other. When we need to distinguish them, we use the notations <u, v> and <v, u> to signify that the edge originates from u and v respectively. The weight of e, which can be any real number, is denoted by w(e) or w(u, v). Without loss of generality, we assume that the edge weights are all distinct. Thus, G has a unique minimum spanning tree, which is denoted by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] throughout this paper. We also assume that G is connected (otherwise, our algorithm finds the minimum spanning forest of G).

Let B be a subset of edges in G that contains no cycle. B induces a set of trees F = { [T.sub.1], [T.sub.2], ... , [T.sub.l] in a natural sense -- two vertices in G are in the same tree if they are connected by edges of B. If B contains no edge incident on a vertex v, then v itself forms a tree.

Definition 1. Consider any edge e = (u, v) in G and any tree T [element of] F. If both u and v belong to T, e is called an internal edge of T; if only one of u and v belongs to T, e is called an external edge. Note that an edge of T is also an internal edge of T, but the converse may not be true.

Definition 2. B is said to be a [Lambda]-forest if each tree T [element of] F has at least [Lambda] vertices.

For example, if B is the empty set, then B is a 1-forest of G; a spanning tree such as [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is an n-forest. Consider a set B of edges chosen from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Assume that B is a [Lambda]-forest. We can augment B to give a 2[Lambda]-forest using a greedy approach: Let F' be an arbitrary subset of F such that F' includes all trees T [element of] F with fewer than 2[Lambda] vertices (F' may contain trees with 2[Lambda] or more vertices). For every tree in F', we pick its minimum external edge. Denote B' as this set of edges.

LEMMA 2.1. [JAJA 1992, LEMMA 5.4]. B' consists of edges in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] only.

LEMMA 2.2. B [union] B' is a 2[Lambda]-forest.

PROOF. Every tree in F - F' already contains at least 2[Lambda] vertices. Consider a tree T in F'. Let <u, v> be the minimum external edge of T, where v belongs to another tree T' [element of] F. With respect to B [union] B', all the vertices in T and T' are connected together. Among the trees induced by B [union] B', there is one including T and T', and it contains at least 2[Lambda] vertices. Therefore, B [union] B' is a 2[Lambda]-forest of G. []

Based on Lemmas 2.1 and 2.2, we can find [T*.sub.G] in ??log n?? stages as follows:

Notation. Let B[p, q] denote [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and the empty set otherwise.

procedure Iterative-MST(G)

(1) for i = 1 to ??log n?? do /* Stage i */

(a) Let F be the set of trees induced by B[1, i - 1] on G. Let F' be an arbitrary subset of F such that F' includes all trees T [element of] F with fewer than [2.sup.i] vertices.

(b) [B.sub.i] [left arrow] {e|e is the minimum external edge of T [element of] F'}

(2) return B[1, ??log n??].

At Stage i, different strategies for choosing the set F' in Step 1(a) may lead to different [B.sub.i]'s. Nevertheless, B[1, i] is always a subset of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and induces a [2.sup.i]-forest. In particular, B[1, ??log n??] induces a [2.sup.??log n??]-forest, in which each tree, by definition, contains at least [2.sup.??log n??] [is greater than] n/2 vertices. In other words, B[1, ??log n??] induces exactly one tree, which is equal to [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Using standard parallel algorithmic techniques, each stage can be implemented in O(log n) time on the EREW PRAM using a linear number of processors (see e.g., JaJa [1992]). Therefore, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] can be found in O([log.sup.2] n) time. In fact, most parallel algorithms for finding MST (including those CRCW PRAM algorithms) are based on a similar approach.(2) These parallel algorithms are "sequential" in the sense that the computation of [B.sub.i] starts only after [B.sub.i-1] is available (see Figure 1(a)).

[ILLUSTRATION OMITTED]

An innovative idea exploited by our MST algorithm is to use concurrent threads to compute the [B.sub.i]'s. Threads are groups of processors working on different tasks, the computation of the threads being independent of each other. In our algorithm, there are ??log n?? concurrent threads, each finding a particular [B.sub.i]. These threads are characterized by the fact that the thread for computing [B.sub.i] starts long before the thread for computing [B.sub.i-1] is completed, and actually outputs [B.sub.i] in O(1) time after [B.sub.i-1] is found (see Figure 1(b)). As a result, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] can be found in O(log n) time.

Our algorithm takes advantage of an interesting property of the sets [B.sub.1], [B.sub.2], ... [B.sub.??log n??]. This property actually holds with respect to most of the deterministic algorithms for finding an MST, though it has not mentioned explicitly in the literature.

LEMMA 2.3. Let T be one of the trees induced by B[1, k], for any 0 [is less than or equal to] k [is less than or equal to] ??log n??. Let [e.sub.T] be the minimum external edge of T. For any subtree (i.e., connected subgraph) S of T, the minimum external edge of S is either [e.sub.T] or an edge of T.

PROOF. See Appendix. []

3. Overview and Schedule

Our algorithm consists of ??log n?? threads running concurrently. For 1 [is less than or equal to] i [is less than or equal to] ??log n??, Thread i aims to find a set [B.sub.i] which is one of the possible sets computed at Stage i of the procedure Iterative-MST. To be precise, let F be the set of trees induced by B[1, i - 1] and let F' be an arbitrary subset of F including all trees with fewer than [2.sup.i] vertices; [B.sub.i] contains the minimum external edges of the trees in F'. Thread i receives the output of Threads 1 to i - 1 (i.e., [B.sub.1], ..., [B.sub.i-1]) incrementally, but never looks at their computation. After [B.sub.i-1] is found, Thread i computes [B.sub.i] in a further of O(1) time.

3.1. EXAMPLES. Before showing the detailed schedule of Thread i, we give two examples illustrating how Thread i can speed up the computation of [B.sub.i]. In Examples 1 and 2, Thread i computes [B.sub.i] in time ci and 1/2 ci, respectively, after Thread (i - 1) reports [B.sub.i-1], where c is some fixed constant. To simplify our discussion, these examples assume that the adjacency lists of a set of vertices can be "merged" into a single list in O(1) time. At the end of this section, we will explain why this is infeasible in our implementation and highlight our novel observations and techniques to evade the problem.

Thread i starts with a set [Q.sub.0] of adjacency lists, where each list contains the [2.sup.i] - 1 smallest edges incident on a vertex in G. The edges kept in [Q.sub.0] are already sufficient for computing [B.sub.i]. The reason is as follows: Consider any tree T induced by B[1, i - 1]. Assume the minimum external edge [e.sub.T] of T is incident on a vertex v of T. If T contains fewer than [2.sup.i] vertices, at most [2.sup.i] -- 2 edges incident on v are internal edges of T. Thus, the [2.sup.i] - 1 smallest edges incident on v must include [e.sub.T].

Example 1. This is a straightforward implementation of Lemma 2.2. Thread i starts only when [B.sub.1], ..., [B.sub.i-1] are all available. Let F be the set of trees induced by B[1, i - 1]. Suppose we can merge the adjacency lists of the vertices in each tree, forming a single combined adjacency list. Notice that if a tree in F has fewer than [2.sup.i] vertices, its combined adjacency list will contain at most [([2.sup.i] - 1).sup.2] edges. For each combined list with at most [([2.sup.i] - 1).sup.2] edges, we can determine the minimum external edge in time ci, where c is some suitable constant. The collection of such minimum external edges is reported as [B.sub.i]. We observe that a combined adjacency list with more than [([2.sup.i] - 1).sup.2] edges represents a tree containing at least [2.sup.i] vertices. By the definition of [B.sub.i], it is not necessary to report the minimum external edge of such a tree.

Example 2. This example is slightly more complex, illustrating how Thread i works in an "incremental" manner. Thread i starts off as soon as [B.sub.i/2] has been computed. At this point, only [B.sub.i] ..., [B.sub.i/2] are available and Thread i is not ready to compute [B.sub.i]. Nevertheless, it performs some preprocessing (called Phase I below) so that when [B.sub.i/2+1], ..., [B.sub.i-1] become available, the computation of [B.sub.i] can be speeded up to run in time 1/2 ci only (Phase II).

Phase I. Let F' be the set of trees induced by B[1, i/2]. Again, suppose we can merge the adjacency lists in [Q.sub.o] for every tree in F', forming another set Q' of adjacency lists. By the definition of B[1, i/2], each tree in F' contains at least [2.sup.i/2] vertices. For each tree in F' with fewer than [2.sup.i] vertices, its combined adjacency list contains at most [([2.sup.i] - 1).sup.2] edges. We extract from the list the [2.sup.i/2] - 1 smallest edges such that each of them connects to a distinct tree in F'. These edges are sufficient for finding [B.sub.i] (the argument is an extension of the argument in Example 1). The computation takes time ci only.

Phase II. When [B.sub.i/2+1], ..., [B.sub.i-1] are available, we compute [B.sub.i] based on Q' as follows: Edges in B[i/2 + 1, i - 1] further connect the trees in F', forming a set F of bigger trees. Suppose we can merge the lists in Q' for every tree in F. Notice that if a tree in F contains fewer than [2.sup.i] vertices, it is composed of at most [2.sup.i/2] - 1 trees in F' and its combined adjacency list contains no more than [([2.sup.i/2] - 1)].sup.2] edges. In this case, we can find the minimum external edge in at most a further of 1/2 ci time. [B.sub.i] is the set of minimum external edges just found. In conclusion, after [B.sub.i-1] is computed, [B.sub.i] is found in time 1/2 ci.

Remark. The set [B.sub.i] found by Examples 1 and 2 may be different. Yet in either case, [B.sub.i] [union] B[1, i - 1] is a subset of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and a [2.sup.i]-forest.

3.2. THE SCHEDULE. Our MST algorithm is based on a generalization of the above ideas. The computation of Thread i is divided into ??log i?? phases. When Thread i - 1 has computed [B.sub.i-1], Thread i is about to enter its last phase, which takes O(1) to report [B.sub.i]. See Figure 1(b).

Globally speaking, our MST algorithm runs in ??log n?? supersteps, where each superstep lasts O(1) time. In particular, Thread i delivers [B.sub.i] at the end of the ith superstep. Let us first consider i a power of two. Phase 1 of Thread i starts at the (i/2 + 1)th superstep, that is, when [B.sub.1], ..., [B.sub.i/2] are available. The computation takes no more than i/4 supersteps, ending at the (i/2 + i/4)th superstep. Phase 2 starts at the (i/2 + i/4 + 1)th superstep (i.e., when [B.sub.i/2+1], ..., [B.sub.i/2+i/4] are available) and uses i/8 supersteps. Each subsequent phase uses half as many supersteps as the preceding phase. The last phase (Phase log i) starts and ends within the ith superstep. See Figure 2.

[ILLUSTRATION OMITTED]

For general i, Thread i runs in ??log i?? phases. To mark the starting time of each phase, we define the sequence

[a.sub.j] = i - ??i/[2.sup.j]??, for j [is greater than or equal to] 0.

(That is, [a.sub.0] = 0, [a.sub.1] = ??i/2??, ..., a??log i?? = i - 1.) Phase j of Thread i, where 1 [is less than or equal to] j [is less than or equal to] ??log i??, starts at the ([a.sub.j] + 1)th superstep and uses [a.sub.j+1] - [a.sub.j] = ??[i/[2.sup.j]?? - ??i/[2.sup.j+1?? = ??1/2??i/[2.sup.j]?? supersteps. Phase j has to handle the edge sets [B.sub.[a.sub.j-1]+1], ..., [B.sub.aj], which are made available by other threads during the execution of Phase (j - 1).

3.3. MERGING. In the above examples, we assume that for every tree in F, we can merge the adjacency lists of its vertices (or subtrees in Phase II of Example 2) into a single list efficiently and the time does not depend on the total length. This can be done via the technique introduced by Tarjan and Vishkin [1985]. Let us look at an example. Suppose a tree T contains an edge e between two vertices u and v. Assume that the adjacency lists of u and v contain e and its mate respectively. The two lists can be combined by having e and its mate exchange their successors (see Figure 3). If every edge of T and its mate exchange their successors in their adjacency lists, we will get a combined adjacency list for T in O(1) time. However, the merging fails if any edge of T or its mate is not included in the corresponding adjacency lists.

[ILLUSTRATION OMITTED]

In our algorithm, we do not keep track of all the edges for each vertex (or subtree) because of efficiency. For example, each adjacency list in [Q.sub.o] involves only [2.sup.i] - 1 edges incident on a vertex. With respect to a tree T, some of its edges and their mates may not be present in the corresponding adjacency lists. Therefore, when applying the O(1)-time merging technique, we may not be able to merge the adjacency lists into one single list for representing T. Failing to form a single combined adjacency list also complicates the extraction of essential edges (in particular, the minimum external edges) for computing the set [B.sub.i]. In particular, we cannot easily determine all the vertices belonging to T and identify the redundant edges, that is, internal and extra multiple external edges, in the adjacency list of T.

Actually our MST algorithm does not insist on merging the adjacency lists into a single list. A key idea here is that our algorithm can maintain all essential edges to be included in just one particular combined adjacency list. Based on some structural properties of minimum spanning trees, we can filter out redundant adjacency lists to obtain a unique adjacency list for T (see Lemmas 5.1 and 5.5 in Section 5).

In the adjacency list representing T, internal edges can all be removed using a technique based on "threshold" [Chong 1996]. The most intriguing part concerns the extra multiple external edges. We find that it is not necessary to remove all of them. Specifically, we show that those extra multiple external edges that cannot be removed "easily" must have a bigger weight and their presence does not affect the correctness of the computation.

In the next section, we will elaborate on the above ideas and formulate the requirements for each phase so as to achieve the schedule.

4. Requirements for a Phase

In this section, we specify formally what Thread i is expected to achieve in each phase. Initially (in Phase 0), Thread i constructs a set [Q.sub.0] of adjacency lists. For each vertex v in G, [Q.sub.0] contains a circular linked list L including the [2.sup.i] - 1 smallest edges incident on v. In addition, L is assigned a threshold, denoted by h(L). If L contains all edges of v, h(L) = + [infinity]; otherwise, h(L) = w([e.sub.0), where [e.sub.0] is the smallest edge truncated from L. In each of the ??log i?? phases, the adjacency lists are further merged based on the newly arrived edge sets, and truncated according to the length requirement. For each combined adjacency list, a new threshold is computed. Intuitively, the threshold records the smallest edge that has been truncated so far.

Consider Phase j, where 1 [is less than or equal to] j [is less than or equal to] ??log i??. It inherits a set [Q.sub.j-1] of adjacency lists from Phase j - 1 and receives the edge sets B[[a.sub.j-1] + 1, [a.sub.j]] (recall that [a.sub.j] = i - ??i/[2.sub.j]??). Let [F.sub.j] denote the set of trees induced by B[1, [a.sub.j]]. Phase j aims at producing a set [Q.sub.j] of adjacency lists capturing the external edges of the trees in [F.sub.j] that are essential for the computation of [B.sub.i]. Basically, we try to merge the adjacency lists in [Q.sub.j-1] with respect to B[[a.sub.j-1] + 1, [a.sub.j]]. As mentioned before, this merging process may produce more than one combined adjacency list for each tree T [element of] [F.sub.j]. Nevertheless, we strive to ensure that only one combined list is retained to represent T. The rest are filtered out. In view of the time constraint imposed by the schedule of Thread i, we also need a tight bound on the length of each remaining adjacency list.

Let L be a list in [Q.sub.j]

R1 (representation): L uniquely corresponds to a tree T [element of] [F.sub.j], storing only the external edges of T. In this case, T is said to be represented by L in [Q.sub.j]. Some trees in [F.sub.j] may not be represented by any lists in [Q.sub.j]; however, all trees with fewer than [2.sup.i] vertices are represented.

R2 (length): L contains at most [2.sup.??i/[2.sup.j]??] - 1 edges.

We will define what edges of a tree T [Epsilon] [F.sub.j] are essential and must be included in L. Consider an external edge e of T that connects to another tree T' [element of] [F.sub.j]. We say that e is primary if, among all edges connecting T and T', e has the smallest weight. Otherwise, e is said to be secondary. Note that if the minimum spanning tree of G contains an edge which is an external edge of both T and T', it must be a primary one. Ideally, only primary external edges should be retained in each list of [Q.sub.j]. Yet this is infeasible since Thread i starts off with truncated adjacency lists and we cannot identify and remove all the secondary external edges in each phase. (Removing all internal edges, though nontrivial, is feasible.)

An important observation is that it is not necessary to remove all secondary external edges. Based on a structural classification of light and heavy edges (defined below), we find that all light secondary external edges can be removed easily. Afterwards, each list contains all the light primary external edges and possibly some heavy secondary external edges. The set of light primary external edges may not cover all primary external edges and its size can be much smaller than [2.sup.??i/[2.sup.j]??] -- l. Yet we will show that the set of light primary external edges suffices for computing [B.sub.i], and the presence of heavy secondary external edges does not affect the correctness.

Below we give the definition of light and heavy edges, which are based on the notion of base.

Definition. Let T be a tree in [F.sub.j].

-- Let [Gamma] be any real number. A tree T' [element of] [F.sub.j] is said to be [Gamma]-accessible to T if T = T', or there is another tree T" [element of] [F.sub.j] such that T" is [Gamma]-accessible to T and connected to T' by an edge with weight smaller than [Gamma].

-- Let e be an external (or internal) edge of T. Define base([F.sub.j], T, e) to be the set {T'|T' [element of] [F.sub.j] and T' is w(e)-accessible to T}. The size of base([F.sub.j], T, e), denoted by [parallel]base([F.sub.j], T, e)[parallel], is the total number of vertices in the trees involved.

-- Let e be an external (or internal) edge of T. We say that e is light if [parallel]base([F.sub.j], T, e)[parallel] [is less than] [2.sup.i]; otherwise, e is heavy.

It follows from the above definition that a light edge of a tree T has a smaller weight than a heavy edge of T. Also, a heavy edge of T will remain a heavy edge in subsequent phases. More specifically, in any Phase k where k [is greater than] j, if T is a subtree of some tree X [element of] [F.sub.k], then for any external (or internal) edge e of T, [parallel]base([F.sub.j], T, e)[parallel] [is less than or equal to] [parallel]base([F.sub.k], X, e)[parallel]. Therefore, if e is heavy with respect to T, then it is also heavy with respect to X.

The following lemma gives an upper bound on the number of light primary external edges of each tree in [F.sub.j], which complies with the length requirement of the lists in [Q.sub.j].

LEMMA 4.1. Any tree T [element of] [F.sub.j] has at most [2.sup.??i/[2.sup.j]??] - 1 light primary external edges.

PROOF. Let x be the number of light primary external edges of T. Among the light primary external edges of T, let e be the one with the biggest weight. The set base([F.sub.j], T, e) includes T and at least x - 1 trees adjacent to T. As B[1, [a.sub.j]] is a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-forest, every tree in [F.sub.j] contains at least [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] vertices. We have [paralle]base([F.sub.j], T, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. By definition of a light edge, [paralle]base([F.sub.j], T, e)[paralle] [is less than] [2.sup.i]. Thus, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

The following requirement specifies the essential edges to be kept in each list of [Q.sub.j] and characterizes those secondary external edges, if any, in each list.

R3 (base) Let T be the tree in [F.sub.j] represented by a list L [element of] [Q.sub.j]. All light primary external edges of T are included in L, and secondary external edges of T, if included in L, must be heavy.

Retaining only the light primary external edges in each list of [Q.sub.j] is already sufficient for the computation of [B.sub.i]. In particular, let us consider the scenario at the end of Phase ??log i??. For any tree [T.sub.s] [element of] [F.sub.??log i??] with fewer than [2.sup.i] vertices, the minimum external edge [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] of [T.sub.s] must be reported in [B.sub.i]. Note that base ([F.sub.??log i??], [T.sub.s], [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) contains [T.sub.s] only. Thus, [parallel]base([F.sub.??log i??] [T.sub.s] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII][parallel] [is less than] [2.sup.i], and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is a light primary external edge of [T.sub.s]. In all previous phases k, [F.sub.k] contains a subtree of [T.sub.s], denoted by W, of which [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is an external edge. Note that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is also a light primary external edge of W (as a heavy edge remains a heavy edge subsequently).

On the other hand, at the end of Phase ??log i??, if a tree [T.sub.x] [element of] [F.sub.??log i??] contains [2.sup.i] or more vertices, all its external edges are heavy and R3 cannot enforce the minimum external edge [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] of [T.sub.x] being kept in the list for [T.sub.x]. Fortunately, it is not necessary for Thread i to report the minimum external edge for such a tree. The following requirements for the threshold help us detect whether the minimum external edge of [T.sub.x] has been removed. If so, we will not report anything for [T.sub.x]. Essentially, we require that if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] or any primary external edge e of [T.sub.x] has been removed from the list [L.sub.x] that represents [T.sub.x], the threshold kept in [L.sub.x] is no bigger than w [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (respectively, w(e)). Then, the smallest edge in [L.sub.x] is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] if and only if its weight is fewer than the threshold.

Let T be a tree in [F.sub.j] represented by a list L [element of] [Q.sub.j]. The threshold of L satisfies the following properties.

R4 (lower bound for the threshold) If h(L) [is not equal to] [infinity] then h(L) is equal to the weight of a heavy internal or external edge of T.

R5 (upper bound for the threshold) Let e be an external edge of T not included in L. If e is primary, then h(L) [is less than or equal to] w(e).

(Our algorithm actually satisfies a stronger requirement that h(L) [is less than or equal to] w(e) if

--e is primary, or

--e is secondary and the mate of e is still included in another list L' in [Q.sub.j].)

In summary, R1 to R5 guarantee that at the end of Phase ??log i??, for any tree T [element of] [F.sub.??log i??], if T has fewer than [2.sup.i] vertices, its minimum external edge [e.sub.T] is the only edge kept in a unique adjacency list representing T; otherwise, T may or may not be represented by any list. If T is represented by a list but [e.sub.T] has already been removed, the threshold kept is at most w([e.sub.T]). Every external edge currently kept in the list must have a weight greater than or equal to the threshold. Thus, we can simply ignore the list for T.

It is easy to check that [Q.sub.0] satisfies the five requirements for Phase 0. In the next section we will give an algorithm that can satisfy these requirements after every phase. Consequently, Thread i can report [B.sub.i] based on the edges in the lists in [Q.sub.??log i??].

5. The Algorithm

In this section, we present the algorithmic details of Thread i, showing how to merge and extract the adjacency lists in each phase. The discussion is inductive in nature--for any j [is greater than or equal to] 1, we assume that Phase j - 1 has produced a set of adjacency lists satisfying the requirements R1-R5, and then show how Phase j computes a new set of adjacency lists satisfying the requirements in O(i/[2.sup.j]) time using a linear number of processors.

Phase j inherits the set of adjacency lists [Q.sub.j-1] from Phase j - 1 and receives the edges B[[a.sub.j-1] + 1, [a.sub.j]]. To ease our discussion, we refer to B[[a.sub.j-1] + 1, [a.sub.j]] as INPUT. Notice that a list in [Q.sub.j-1] represents one of the trees in [F.sub.j-1] (recall that [F.sub.j-1] and [F.sub.j] denote the set of trees induced by B[1, [a.sub.j-1] and B[1, [a.sub.j], respectively). Phase j merges the adjacency lists in [Q.sub.j-1] according to how the trees in [F.sub.j-1] are connected by the edges in INPUT.

Consider an edge e = (u, v) in INPUT. Denote [W.sub.1] and [W.sub.2] as the trees in [F.sub.j-1] containing u and v, respectively. Ideally, if e and its mate appear in the adjacency lists of [W1] and [W.sub.2], respectively, the adjacency lists of [W.sub.1] and [W.sub.2] can be merged easily in O(1) time. However, [W.sub.1] or [W.sub.2] might already be too large and not have a representation in [Q.sub.j-1]. Even if they are represented, the length requirement of the adjacency lists may not allow e to be included. As a result, e may appear in two separate lists in [Q.sub.j-1], or in just one, or even in none; we call e a full, half, and lost edge, respectively. Accordingly, we partition INPUT into three sets, namely Full-INPUT, Half-INPUT, and Lost-INPUT.

Phase j starts off by merging the lists in [Q.sub.j-1] with respect to edges in Full-INPUT. Let T be a tree in [F.sub.j]. Let [W.sub.1], [W.sub.2], ..., [W.sub.k] be the trees in [F.sub.j-1] that, together with the edges in INPUT, constitute T. Note that some [W.sub.i] may not be represented by a list in [Q.sub.j-1]. Since the merging is done with respect to Full-INPUT, the adjacency lists of [W.sub.1], [W.sub.2], ..., [W.sub.k], if present, may be merged into several lists instead of a single one. Let [L.sub.1], [L.sub.2], ..., [L.sub.l] denote these merged lists. Each [L.sub.i] represents a bigger subtree [Z.sub.i] of T, which is called a cluster below (see Figure 4). A cluster may contain one or more [W.sub.i]. We distinguish one cluster, called the core cluster, such that the minimum external edge [e.sub.T] of T is an external edge of that cluster. Note that the minimum external edge of the core cluster may or may not be [e.sub.T]. For a noncore cluster Z, the minimum external edge [e.sub.Z] of Z must be a tree edge of T (by Lemma 2.3) and thus [e.sub.Z] is in INPUT. Moreover, [e.sub.Z] is not a full edge. Otherwise, the merging should have operated on [e.sub.Z], which then becomes an internal edge of a bigger cluster.

[ILLUSTRATION OMITTED]

The merged lists obviously need not satisfy the requirements for [Q.sub.j]. In the following sections, we present the additional processing used to fulfill the requirements. A summary of all the processing is given in Section 5.4. The discussion of the processing of the merged lists is divided according to the sizes of the trees, sketched as follows:

--For each tree T [element of] [F.sub.j] that contains fewer than [2.sup.i] vertices, there is a simple way to ensure that exactly one merged list is retained in [Q.sub.j]. Edges in that list are filtered to contain all the light primary external edges of T, and other secondary external edges of T, if included, must be heavy.

--For a tree T' [element of] [F.sub.j] that contains at least [2.sup.i] vertices, the above processing may retain more than one merged lists. Here, we put in an extra step to ensure that, except possibly one, all merged lists for T' are removed.

--The threshold of each remaining list is updated after retaining the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] smallest edges. We show that the requirements for the threshold are satisfied no matter whether the tree in concern contains fewer than [2.sup.i] vertices or not.

5.1. TREES IN [F.sub.J] WITH FEWER THAN [2.sup.i] VERTICES. In this section, we focus on each tree T [element of] [F.sub.j] that contains fewer than [2.sup.i] vertices. Denote by [L.sub.1], ..., [L.sub.l] the merged lists representing the clusters of T. Observe that each of these lists contains at most [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] edges. Below, we derive an efficient way to find a unique adjacency list for representing T, which contains all light primary external edges of T.

First of all, we realize that every light primary external edge of T is also a light primary external edge of a tree W in [F.sub.j-1] and must be present in the adjacency list that represents W in [Q.sub.j-1] (by R3). Thus, all light primary external edges of T (including the minimum external edge of T) are present in some merged lists.

5.1.1. Unique Representations. Let [L.sub.cc] be the list in {[L.sub.1], [L.sub.2], ..., [L.sub.l]} such that [L.sub.cc] contains the minimum external edge [e.sub.T] of T. That is, [L.sub.cc] represents the core cluster [Z.sub.0] of T. Our concern is how to remove all other lists in {[L.sub.1], [L.sub.2], ..., [L.sub.l]} so that T will be represented uniquely by [L.sub.cc].

To efficiently distinguish [L.sub.cc] from other lists, we make use of the properties stated in the following lemma. Let [L.sub.nc] be any list in {[L.sub.1], [L.sub.2], ..., [L.sub.l]} -- {[L.sub.cc]}. Let Z denote the cluster represented by [L.sub.nc].

LEMMA 5.1

(i) [L.sub.cc] does not contain any edge in Half-INPUT.

(ii) [L.sub.nc] contains at least one edge in Half-INPUT. In particular, the minimum external edge of Z is in Half-INPUT.

PROOF OF LEMMA 5.1(i). Assume to the contrary that [L.sub.cc] includes an edge e = (a, b) in Half-INPUT; more precisely, <a, b> is in [L.sub.cc] and <b, a> is not included in any list in [Q.sub.j-1]. Let W and W' be the trees in [F.sub.j-1] connected by e, where a [element of] W and b [element of] W'. The edge e is a primary external edge of W, as well as of W'. Both W and W' are subtrees of T, and W is also a subtree of [Z.sub.0]. Below we show that [parallel]base([F.sub.j-1], W', e)[parallel] [is greater than or equal to] [2.sup.i] and T contains at least [2.sup.i] vertices. The latter contradicts the assumption about T. Thus, Lemma 5.1(i) follows.

W' is a subtree of T and contains less than [2.sup.i] vertices. By R1, [Q.sub.j-1] contains a list [L.sub.W'] representing W'. By R3, [L.sub.W'] contains all light primary external edges of W'. The edge <b, a> is not included in [L.sub.W'] and must be heavy. Therefore, [parallel]base([F.sub.j-1], W', e)[parallel] [is greater than or equal to] [2.sup.i].

Next, we want to show that all trees in base([F.sub.j-1], W', e) are subtrees of T. Define [T.sub.a] and [T.sub.b] to be the subtrees of T constructed by removing e from T (see Figure 5). Assume that [T.sub.a] contains the vertex a and [T.sub.b] the vertex b. W, as well as [Z.sub.0], is a subtree of [T.sub.a], and W' is a subtree of [T.sub.b]. By Lemma 2.3, the minimum external edge of [T.sub.b] is either [e.sub.T] or e. The former case is impossible because [e.sub.T] is included in [L.sub.cc] and must be an external edge of [Z.sub.0]. Thus, e is the minimum external edge of [T.sub.b]. By definition of base, base([F.sub.j-1], W', e) cannot include any trees in [F.sub.j-1] that are outside [T.sub.b]. In other words, [T.sub.b] includes all subtrees in base([F.sub.j-1], W', e). [T.sub.b] must have at least [2.sup.i] vertices, and so must T. A contradiction occurs. []

[ILLUSTRATION OMITTED]

PROOF OF LEMMA 5.1(ii). Let [e.sub.Z] be the minimum external edge of Z. As [e.sub.Z] is a tree edge of T, it is in INPUT but is not a full edge. In this case, we can further show that [e.sub.Z] is actually a half edge and included in [L.sub.nc], thus completing the proof. Let W be the tree in [F.sub.j-1] such that W is a component of Z and [e.sub.Z] is an external edge of W. Note that [e.sub.Z] is primary external edge of W. Let [L.sub.W] denote the adjacency list in [Q.sub.j-1] representing W. Since [e.sub.Z] is the minimum external edge of Z, base([F.sub.j-1], W, [e.sub.Z]) cannot include trees in [F.sub.j-1] that are outside Z, and thus it has size less than [2.sup.i]. By R3, all light primary external edges of W including [e.sub.Z] are present in [L.sub.W]. Therefore, [e.sub.Z] is in Half-INPUT, and [L.sub.nc] must have inherited [e.sub.Z] from [L.sub.W]. []

Using Lemma 5.1, we can easily retain [L.sub.cc] and remove all other merged lists [L.sub.nc]. One might worry that some [L.sub.nc] might indeed contain some light primary external edge of T and removing [L.sub.nc] is incorrect. This is actually impossible in view of the following fact:

LEMMA 5.2. For any external edge e of T that is included in [L.sub.nc], [parallel]base([F.sub.j], T, e)[parallel] [is greater than or equal to] [2.sup.i].

PROOF. Let [e.sub.Z] be the minimum external edge of Z. For any external edge e of T that is included in [L.sub.nc], e is also an external edge of Z and w(e) [is greater than or equal to] w([e.sub.Z]).

Let W and W' be the trees in [F.sub.j-1] connected by [e.sub.Z] such that W' is not a component of Z. As shown in the previous lemma, [e.sub.Z] is in Half-INPUT. Moreover, [e.sub.z] is a tree edge of T and is present only in the adjacency list of W. That is, W' is a component of T and the adjacency list of W' in [Q.sub.j-1] does not contain [e.sub.z]. Note that [e.sub.z] is a primary external edge of W'. By R1 and R3, we can conclude that [e.sub.z] is a heavy external edge of W' and hence [parallel]base([F.sub.j-1], W', [e.sub.z])[parallel] [is greater than or equal to] [2.sup.i]. Therefore, [parallel]base([F.sub.j], T, e)[parallel] [is greater than or equal to] [parallel]base([F.sub.j], T, [e.sub.z])[parallel] [is greater than or equal to] [parallel]base([F.sub.j-1], W', [e.sub.z])[parallel] [is greater than or equal to] [2.sup.i]. []

By Lemma 5.2, [L.sub.nc] does not contain any light primary external edge of T. In other words, all light primary external edges of T must be in [L.sub.cc], which is the only list retained.

5.1.2. Excluding All Light Internal and Secondary External Edges. [L.sub.cc] contains all the light primary external edges of T and also some other edges. Because of the length requirement (i.e., R2), we retain at most [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] edges of [L.sub.cc]. Note that the light primary external edges may not be the smallest edges in [L.sub.cc]. Based on the following two lemmas, we can remove all other light edges in [L.sub.cc], which include the light internal and secondary external edges of T. Then, the light primary external edges of T will be the smallest edges left in the list and retaining the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] smallest edges will always include all the light primary external edges.

LEMMA 5.3. Suppose [L.sub.cc] contains a light internal edge <u, v> of T. Then its mate, <v, u>, also appears in [L.sub.cc].

PROOF. Recall that [L.sub.cc] is formed by merging the adjacency lists of some trees in [F.sub.j-1]. By R1, each of these lists does not contain any internal edge of the tree it represents. If [L.sub.cc] contains a light internal edge <u, v> of T, the edge (u, v) must be between two trees W and W' in [F.sub.j-1] which are components of [Z.sub.0]. Assume that u [element of] W and v [element of] W'. Then <u, v> and <v, u> are light external edges of W and W' respectively. Let [L.sub.W] and [L.sub.W], be their adjacency lists in [Q.sub.j-1]. As <u, v> appears in [L.sub.cc], <u, v> also appears in [L.sub.W]. By R3, all light edges found in [L.sub.W], including <u, v>, must be primary external edges of W. By symmetry, <v, u> is a primary external edge of W'. By R3 again, <v, u> appears in [L.sub.W']. Since [L.sub.cc] inherits the edges from both [L.sub.W] and [L.sub.W'], we conclude that both <u, v> and <v, u> appear in [L.sub.cc]. []

LEMMA 5.4. Suppose [L.sub.cc] contains a light secondary external edge e of T. Let [e.sub.0] be the corresponding primary external edge. Then e and [e.sub.0] both appear in [L.sub.cc], and their mates also both appear in another merged list [L'.sub.cc], where [L'.sub.cc] represents the core cluster of another tree T' [element of] [F.sub.j].

PROOF. Suppose [L.sub.cc] contains a light secondary external edge e of T. Assume that e connects T to another tree T' [element of] [F.sub.j], and [e.sub.0] is the primary external edge between T and T'. As w([e.sub.0]) [is less than] w(e), [parallel]base([F.sub.j], T, [e.sub.0])[parallel] [is less than or equal to] [parallel]base([F.sub.j], T, e)[parallel] [is less than] [2.sub.i]. Thus, [e.sub.0] is also a light primary external edge of T and must be included in [L.sub.cc]. On the other hand, since e is secondary, base([F.sub.j], T, e) is equal to base([F.sub.j], T', e), and thus base([F.sub.j], T, e) contains T'. Since [parallel]base([F.sub.j], T, e)[parallel] [is less than] [2.sup.i], T' contains less than [2.sup.i] vertices. After merging the lists in [Q.sub.j-1], we obtain a merged list [L'.sub.cc] that includes all light primary external edges of T'. Below we show that [L'.sub.cc] contains the mates of [e.sub.0] and e.

--Observe that [parralel]base([F.sub.j], T', [e.sub.0])[parallel] [is less than or equal to] [parallel]base([F.sub.j], T', e)[parallel] = [parallel]base([F.sub.j], T, e)[parallel] [is less than] [2.sup.i]. Thus, both [e.sub.0] and e are light external edges of T'. As [e.sub.0] is also a primary external edge of T', [e.sub.0] (more precisely, its mate) must be included in [L'.sub.cc].

--Let W and W' be the two trees in [F.sub.j-1] connected by e, where W is a subtree of T and W' of T'. Because e is a light external edge of T and T', it is also a light external edge of W and W'. Note that [L.sub.cc] inherits e from the adjacency list [L.sub.W] [element of] [Q.sub.j-1] that represents W. By R3, [L.sub.W] does not include any light secondary external edge of W, so e is a primary external edge of W. By symmetry, e is also a primary external edge of W'; thus, by R3, e is in the adjacency list [L.sub.W], [element of] [Q.sub.j-1] that represents W'. Note that [L'.sub.cc] must include all the edges of [L.sub.W], as well as other lists in [Q.sub.j-1] that contain light external edges of T (see Lemma 5.2). [L'.sub.cc] contains e, too. []

By Lemma 5.3, we can remove all light internal edges by simply removing edges whose mates are in the same list. Lemma 5.4 implies that if [L.sub.cc] contains a light secondary external edge, the corresponding primary external edge also appears in [L.sub.cc] and their mates exist in another list [L'.sub.cc]. This suggests a simple way to identify and remove all the light secondary external edges as follows. Without loss of generality, we assume that every edge in [L.sub.cc] can determine the identity of [L.sub.cc] (any distinct label given to [L.sub.cc]). If an edge e [element of] [L.sub.cc] has a mate in another list, say, [L'.sub.cc], e can announce the identity of [L.sub.cc] to its mate and vice-versa. By sorting the edges in [L.sub.cc] with respect to the identities received from their mates, multiple light external edges connected to the same tree come together. Then we can easily remove all the light secondary external edges.

Now we know that [L.sub.cc] contains all light primary external edges of T and any other edges it contains must be heavy. Let us summarize the steps required to build a unique adjacency list for representing T.

procedure M&C // M&C means Merge and Clean up//

(1) Edges in INPUT that are full with respect to [I.sub.j-1] are activated to merge the lists in [I.sub.j-1]. Let I be the set of merged adjacency lists.

(2) For each merged adjacency list L [element of] Q,

(a) if L contains an edge in Half-INPUT, remove L from Q;

(b) detect and remove internal and secondary external edges from L according to Lemmas 5.3 and 5.4.

5.2. TREES WITH AT LEAST [2.sup.i] Vertices. Consider a tree T' [element of] [F.sub.j] containing [2.sup.i] or more vertices. Let [L.sub.1], ..., [L.sub.l] be the merged lists, each representing a cluster of T'. Some of these lists may contain more than [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] edges. Unlike the case in Section 5.1, the minimum external edge [e.sub.T'], of T' is heavy, and we cannot guarantee that there is a merged list containing [e.sub.T'], and representing the core clusters of T'. Nevertheless, Thread i can ignore such a tree T', and we may remove all the merged lists. In Lemma 5.5, we show that those lists in {[L.sub.1], ... [L.sub.l]} that represent the non-core clusters of T' can be removed easily. If there is indeed a merged list [L.sub.cc] representing the core cluster, Thread i may not remove [L.sub.cc]. Since T' contains at least [2.sup.i] vertices and has no light primary external edge, we have nothing to enforce on [L.sub.cc] regarding the light primary external edges. The only concern for [L.sub.cc] is the requirements for the threshold, which will be handled in Section 5.3.

As T' contains at least [2.sup.i] vertices, any merged list [L.sub.nc] that represents a non-core cluster of T' may not satisfy the properties stated in Lemma 5.1(ii). We need other ways to detect such an [L.sub.nc]. First, we can detect the length of [L.sub.nc]. If [L.sub.nc] contains more than [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] edges, we can remove [L.sub.nc] immediately. Next, if [L.sub.nc] contains less than [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] edges, we make use of the following lemma to identify it. Denote h(L) as the threshold associated with a list L [element of] [Q.sub.j-1]. Define tmp-h([L.sub.nc]) = min{h(L)|L E [element of] [Q.sub.j-1] is merged into [L.sub.nc]}.

LEMMA 5.5. Any list [L.sub.nc] [element of] {[L.sub.1], ..., [L.sub.l]} - {[L.sub.cc]} that represents a noncore cluster of T' satisfies at least one of the following conditions:

(1) [L.sub.nc] contains an edge in Half-INPUT.

(2) For every edge <u, v> in [L.sub.nc], either <v, u> is also in [L.sub.nc] or w(u, v) [is greater than or equal to] tmp-h ([L.sub.nc]).

PROOF. Assume that [L.sub.nc] does not contain any edge in Half-INPUT, and [L.sub.nc] contains an edge <u, v> but does not contain <v, u>. Below we show that w(u, v) [is greater than or equal to] tmp-h([L.sub.nc]). The edge (u, v) can be an internal or external edge of Z.

Case 1. (u, v) is an internal edge of Z. [L.sub.nc] inherits <u, v> from a list L [element of] [Q.sub.j-1]. Let W [element of] [F.sub.j-1] be the tree represented by L. By R1, <u, v> is an external edge of W. Thus, Z includes another tree W' [element of] [F.sub.j-1] with <v, u> as an external edge, and [L.sub.nc] also inherits the edges in the list [L.sub.W'] [element of] [Q.sub.j-1] that represents W'. Note that <v, u> does not appear in [L.sub.W']. By R5, h([L.sub.W']) [is less than or equal to] w(u, v). Since tmp-h([L.sub.nc]) [is less than or equal to] h([L.sub.W']), we have tmp-h([L.sub.nc]) [is less than or equal to] w(u, v).

Case 2. (u, v) is an external edge of Z. It is obvious that w(u, v) [is greater than or equal to] w([e.sub.Z]) where [e.sub.Z] is the minimum external edge of Z. We further show that w([e.sub.Z]) [is greater than or equal to] tmp-h (L). Let W [element of] [F.sub.j-1] be the tree in Z and with [e.sub.Z] as an external edge. Let [L.sub.W] [element of] [Q.sub.j-1] be the adjacency list representing W. As mentioned before, [e.sub.Z] is in INPUT and is not a full edge. If [e.sub.Z] is a lost edge, then [L.sub.nc] does not contain [e.sub.Z]. If [e.sub.Z] is a half edge, [L.sub.nc] again does not contain [e.sub.Z] because [L.sub.nc] does not contain any edge in Half-INPUT. In conclusion, [e.sub.Z] does not appear in [L.sub.nc] and hence cannot appear in [L.sub.W]. Since [e.sub.Z] is a primary external edge of W, we know that, by R5, h([L.sub.W]) [is less than or equal to] w([e.sub.Z]). By definition, tmp-h([L.sub.nc]) [is less than or equal to] h([L.sub.W]). Therefore, tmp-h([L.sub.nc]) [is less than or equal to] w([e.sub.Z]) [is less than or equal to] w(u, v). []

Using Lemma 5.5, we can extend Procedure M&C to remove every merged list [L.sub.nc] that represents a non-core cluster of any tree T in [F.sub.j] (see Procedure Ext_M&C). Precisely, if T has fewer than [2.sup.i] vertices, [L.sub.nc] is removed in Step 2(a); otherwise, [L.sub.nc] is removed in Step 1(b) or Steps 2(a)-(c).

procedure Ext_M&C

(1) (a) Edges in INPUT that are full with respect to [Q.sub.j-1] are activated to merge the lists in [I.sub.j-1]. Let Q be the set of merged adjacency lists.

(b) For each list L [element of] Q, if L contains more than [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] edges, remove L from Q.

(2) For each merged adjacency list L [element of] Q,

(a) if L contains an edge in Half-INPUT, remove L from Q;

(b) detect and remove internal and secondary external edges from L; (c) if, for all edges <u, v> in L, w(u, v) [is greater than or equal to] tmp-h(L), remove L from Q.

After Procedure Ext_M&C is executed, all the remaining merged lists are representing the core clusters of trees in [F.sub.j]. Moreover, for tree T [element of] [F.sub.j] with fewer than [2.sup.i] vertices, Procedure Ext_M&C, like Procedure M&C, always retains the merged list [L.sub.cc] that represents the core cluster of T. [L.sub.cc] is not removed by Step 1(b) because [L.sub.cc] cannot contain more than [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] edges. In addition, [L.sub.cc] contains all the light primary external edges of T. In Lemma 5.6 below, we show that tmp-h([L.sub.cc]) is the weight of a heavy internal or external edge of T. Thus, [L.sub.cc] contains edges with weight less than tmp-h([L.sub.cc]) and cannot be removed by Step 2(c).

LEMMA 5.6. tmp-h([L.sub.cc]) is equal to the weight of a heavy internal or external edge of T.

PROOF. Among all the lists in [Q.sub.j-1] that are merged into [L.sub.cc], let L be the one with the smallest threshold. That is, tmp-h([L.sub.cc]) = h(L). Let W denote the tree in [F.sub.j-1] represented by L. By R4, h (L) is equal to the weight of a heavy internal or external edge e of W. Thus, e is also a heavy internal or external edge of T. Lemma 5.6 follows. []

5.3. UPDATING THRESHOLD AND RETAINING ONLY EXTERNAL EDGES. After Procedure Ext_M&C is executed, every remaining merged list is representing the core-cluster of a tree in [F.sub.j]. Let [L.sub.cc] be such a list representing a tree T [element of] [F.sub.j]. If T contains less than [2.sup.i] vertices, all light primary external edges of T appear among the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] smallest edges in [L.sub.cc], and all other edges in [L.sub.cc] are heavy edges. If T has at least [2.sup.i] vertices, no external or internal edges of T are light and all edges in [L.sub.cc] must be heavy. By the definition of Ext_M&C, the number of edges in [L.sub.cc] is at most [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], but may exceed the length requirement for Phase j [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. To ensure that [L.sub.cc] satisfies R2 and R3, we retain only the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] smallest edges on [L.sub.cc]. The threshold of [L.sub.cc], denoted by h([L.sub.cc]), is updated to be the minimum of tmp-h([L.sub.cc]) and the weight of the smallest edge truncated.

No matter whether T contains fewer than [2.sup.i] vertices or not, every edge truncated from [L.sub.cc] is heavy. Together with Lemma 5.6, we can conclude that h([L.sub.cc]) is equal to the weight of a heavy internal or external edge of T, satisfying R4.

Next, we give an observation on [L.sub.cc] and in Lemma 5.8, we show that R5 is satisfied. Denote [Z.sub.0] as the core-cluster of T represented by [L.sub.cc].

LEMMA 5.7. Let e be an external edge of [Z.sub.0]. If e is a tree edge of T, then e is not included in [L.sub.cc] and h([L.sub.cc]) [is less than or equal to] w(e).

PROOF. Suppose e is included in [L.sub.cc]. Note that e cannot be a full edge with respect to [Q.sub.j-1] because a full edge and its mate should have been removed in Step 2(b) in Procedure Ext_M&C. Then, e is in Half-INPUT and Procedure Ext_M&C should have removed [L.sub.cc] at Step 2(a). This contradicts that [L.sub.cc] is one of the remaining lists after Procedure Ext_M&C is executed. Therefore, e is not included in [L.sub.cc].

Next, we show that h([L.sub.cc]) [is less than or equal to] w(e). Let W be the subtree of [Z.sub.0] such that e is an external edge of W. Since e is a tree edge, e is a primary external edge of W. As e is not included in [L.sub.cc] and [L.sub.cc] inherits the adjacency list [L.sub.W] [element of] [Q.sub.j-1] representing W, e is also not included in [L.sub.W]. By R5, h([L.sub.W]) [is less than or equal to] w(e). Recall that h([L.sub.cc]) [is less than or equal to] tmp-h([L.sub.cc]) [is less than or equal to] h([L.sub.W]). Therefore, h([L.sub.cc]) [is less than or equal to] w(e). []

LEMMA 5.8. Let e be an external edge of T currently not found in [L.sub.cc]. If (i) e is primary, or (ii) e is secondary and the mate of e is still included in some other list L' in [Q.sub.j], then h([L.sub.cc]) [is less than or equal to] w(e).

PROOF. Let e = <u, v> be an external edge of T currently not found in [L.sub.cc], satisfying the conditions stated in Lemma 5.8. Let W be the tree in [F.sub.j-1] such that W is a subtree of T and e is an external edge of W. With respect to W, either e is primary, or e is secondary and the mate of e is included in another list in [Q.sub.j-1]. We consider whether W is included in the core cluster [Z.sub.0] of T.

Case 1. W is a subtree of [Z.sub.0]. By definition of [Z.sub.0], W must be represented by a list [L.sub.w] [element of] [Q.sub.j-1]. At the end of Phase j - 1, e may or may not appear in [L.sub.W]. If e does not appear in [L.sub.W], then, by R5, h([L.sub.W]) [is less than or equal to] w(e). Since h([L.sub.cc]) [is less than or equal to] tmp-h([L.sub.cc]) [is less than or equal to] h([L.sub.w]), we have h([L.sub.cc]) [is less than or equal to] w(e). Suppose that e is in [L.sub.W]. Then e is passed to [L.sub.cc] when Procedure EXT_M&C starts off. Yet e is currently not in [L.sub.cc]. If e is removed from [L.sub.cc] within Procedure Ext_M&C, this has to take place at Step 2(b), and e is either an internal edge of T or a secondary external edge removed together with its mate. This contradicts the assumption about e. Thus, e is removed after Procedure Ext_M&C, that is, due to truncation. In this case, the way h([L.sub.cc]) is updated guarantees h([L.sub.cc]) [is less than or equal to] w(e).

Case 2. W is a subtree of a noncore cluster Z. We show that [Z.sub.0] has an external edge e' = <a, b> such that h([L.sub.cc]) [is less than or equal to] w(a, b) and w(a, b) [is less than] w(e). Observe that T contains a path connecting [Z.sub.0] and Z, and this path must involve an external edge <a, b> of [Z.sub.0]. By Lemma 5.7, h([L.sub.cc]) [is less than or equal to] w(a, b).

Next we show that w(a, b) [is less than] w(e). Let W' be the tree in [F.sub.j-1] such that W' is a subtree of [Z.sub.0] and <a, b> is an external edge of W'. See Figure 7. Suppose we remove the edge (a, b) from T, T is partitioned into two subtrees [T.sub.a] and [T.sub.b], containing the vertices a and b, respectively. Note that [T.sub.b] contains W, and e is an external edge of [T.sub.b]. On the other hand, [Z.sub.0] is included in [T.sub.a], and [e.sub.T] is not an external edge of [T.sub.b]. By Lemma 2.3, the minimum external edge of [T.sub.b] is <b, a>. Therefore, w(a, b) [is less than] w(e). As a result, h([L.sub.cc]) [is less than or equal to] w(a, b) [is less than] w(e) and the lemma follows. []

[ILLUSTRATION OMITTED]

5.3.1. Removing Remaining Internal Edges. Note that [L.sub.cc] may still contain some internal edges of T. This is because Procedure Ext:_M&C only remove those internal edges whose mates also appear in [L.sub.cc]. The following lemma shows that every remaining internal edges in [L.sub.cc] has a weight greater than h([L.sub.cc]). Thus, by discarding those edges in [L.sub.cc] with weight greater than h([L.sub.cc]), we ensure that only external edges of T are retained. Of course, no light primary external edge can be removed by this step.

LEMMA 5.9. For any internal edge e of T that is currently included in [L.sub.cc], h([L.sub.cc]) [is less than or equal to] w(e).

PROOF. We consider whether e = <u, v> is an internal or external edge of [Z.sub.0].

Case 1. e is an internal edge of [Z.sub.0]. Suppose [L.sub.cc] inherits e from the list [L.sub.w] [element of] [Q.sub.j-1], where [L.sub.w] represents a tree W [element of] [F.sub.j-1] and W is a subtree of [Z.sub.0]. By R1, <u, v> is an external edge of W. Then [Z.sub.0] includes another tree W' [element of] [F.sub.j-1] contains the vertex v. Denote [L.sub.W'], as the list in [Q.sub.j-1] that represents W'. The edge <v, u> is an external edge of W'. But <v, u> does not appear in [L.sub.W'], (otherwise, [L.sub.cc] would have also inherited <v, u> from [L.sub.W'] and Procedure Ext_M&C should have removed both <u, v> and <v, u> from [L.sub.cc] at Step 2(b)). By R5, h([L.sub.W']) [is less than or equal to] w(u, v). As h([L.sub.cc]) [is less than or equal to] tmp-h([L.sub.cc]) [is less than or equal to] h([L.sub.W']), we have h([L.sub.cc]) [is less than or equal to] w(u, v).

Case 2. e is an external edge of [Z.sub.0]. By Lemma 5.7, e is not a tree edge of T. Let P be a path on T connecting u and v. See Figure 8. Since T is a subtree of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], every edge on P has weight smaller than w(u, v). On P, we can find an external edge <a, b> of [Z.sub.0]. By Lemma 5.7 again, h([L.sub.cc]) [is less than or equal to] w(a, b) and hence h([L.sub.cc]) [is less than or equal to] w(u, v). []

[ILLUSTRATION OMITTED]

5.4. THE COMPLETE ALGORITHM. The discussion of Thread i in the previous three sections is summarized in the following procedure. The time and processor requirement will be analyzed in the next section.

Thread i

Input: G; [B.sub.k], where 1 [is less than or equal to] k [is less than or equal to] i - 1, is available at the end of the kth superstep

Output: [B.sub.i]

[diamond] // Phase 0 (Initialization) Construct [Q.sub.0] from G; [a.sub.0] [left arrow] 0

[diamond] For j = 1 to ?? log i ?? do // Phase j // denote INPUT as B[[a.sub.j-1] + 1, [a.sub.j]], where [a.sub.j] = i - ?? i/[2.sup.j] ??.

1. (a) Edges in INPUT that are full with respect to [Q.sub.j-1] merge their lists in [Q.sub.j-1]. Let Q be the set of merged adjacency lists.

(b) For each list L [element of] Q, if L contains at most [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] edges, tmp-h(L) [left arrow] min{h(L') | L' [element of] [Q.sub.j-1] and L' is a part of L}; otherwise, remove L from g.

2. For each list L [element of] Q, // Remove unwanted edges and lists

(a) if L contains an edge in Half-INPUT, remove L from Q;

(b) detect and remove internal and secondary external edges from L;

(c) if, for all edges <u, v> in L, w(u, v) [is greater than or equal to] tmp-h(L), remove L from Q.

3. // Truncate each list if necessary and remove remaining internal edges

(a) For each list L [element of] Q if L contains more than [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] edges, retain the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] smallest ones and update h(L) to the minimum of tmp-h(L) and W([e.sub.0]), where [e.sub.0] is the smallest edge just removed from L.

(b) For each edge <u, v> [element of] L, if w(u, v) [is greater than or equal to] h(L), remove <u, v> from L.

4. [Q.sub.j] [left arrow] Q.

[diamond] [B.sub.i] [left arrow] {(u, v)|<u, v> or <v, u> appears in some list L in [Q.sub.??log i??]; w(u, v) [is less than] h(L)}.

6. Time and Processor Complexity

First, we show that the new MST algorithm runs in O(log n) time using (n + m) log n CREW PRAM processors. Then we illustrate how to modify the algorithm to run on the EREW PRAM and reduce the processor bound to linear.

Before the threads start to run concurrently, they need an initialization step. First, each adjacency list of G is sorted in ascending order with respect to the edge weights. This set of sorted adjacency lists is replicated ?? log n ?? times and each copy is moved to the "local memory" of a thread, which is part of the global shared memory dedicated to the processors performing the "local" computation of a thread. The replication takes O(log n) time using a linear number of processors. Then each thread constructs its own [Q.sub.0] in O(1) time. Afterwards, the threads run concurrently.

As mentioned in Section 3, the computation of a thread is scheduled to run in a number of phases. Each phase starts and ends at predetermined supersteps. We need to show that the computation of each phase can be completed within the allocated time interval. In particular, Phase j of Thread i is scheduled to start at the ([a.sub.j] + 1)th superstep and end at the [a.sub.j+1]th superstep using ??i/[2.sup.j]?? - ??i/[2.sup.j+1]?? = ?? 1/2 ?? i/[2.sup.j] ?? supersteps. The following lemma shows that Phase j of Thread i can be implemented in c(i/[2.sup.j-1]) time, where c is a constant. By setting the length of a superstep to a constant c' such that c(i/[2.sup.j-1])/c' [is less than or equal to] ?? 1/2 ?? i/[2.sup.j] ?? ??, Phase j can complete its computation in at most ?? 1/2 ?? i/[2.sup.j] ?? ?? supersteps. It can be verified that c' [is greater than or equal to] 8c satisfies this condition.

LEMMA 6.1. Phase j of Thread i can be implemented in O(i/[2.sup.j-1]) time using n + m CREW PRAM processors.

PROOF. Consider the computation of Phase j of Thread i. Before the merging of the adjacency list starts, Thread i reads in B[[a.sub.j-1] + 1, [a.sub.j], which may also be read by many other threads, into the local memory of Thread i. The merging of adjacency lists in Step 2(a) takes O(1) time. In Step 2(b), testing the length of a list ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) can be done by performing pointer jumping in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] time. After that, all adjacency lists left have length at most ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]). In the subsequent steps, we make use of standard parallel algorithmic techniques including list ranking, sorting, and pointer jumping to process each remaining list. The time used by these techniques is the logarithmic order of the length of each list (see e.g., JaJa [1992]). Therefore, all the steps of Phase j can be implemented in c(i/[2.sup.j-1]) time, for some constant c, using a linear number of processors. []

COROLLARY 6.1. The minimum spanning tree of a weighted undirected graph can be found in O(log n) time using (n + m) log n CREW PRAM processors.

PROOF. By Lemma 6.1, the computation of Phase j of Thread i satisfies the predetermined schedule. Therefore, [B.sub.i] can be found at the end of the ith superstep and B[1, ?? log n ??] are all ready at the end of the ?? log n ?? th superstep. That means the whole algorithm runs in O(log n) time. As Thread i uses at most n + m processors, (n + m) log n processors suffice for the whole algorithm. []

6.1. ADAPTATION TO EREW PRAM. We illustrate how to modify the algorithm to run on the EREW PRAM model. Consider Phase j of Thread i. As discussed in the proof of Lemma 6.1, concurrent read is used only in accessing the edges of B[[a.sub.j-1] + 1, [a.sub.j]],, which may also be read by many other threads at the same time. If B[[a.sub.j-1] + 1, [a.sub.j]] have already resided in the local memory of Thread i, all steps can be implemented on the EREW PRAM.

To avoid using concurrent read, we require each thread to copy its output to each subsequent thread. By modifying the schedule, each thread can perform this copying process in a sequential manner. Details are as follows: As shown in the proof of Lemma 6.1, Phase j of Thread i can be implemented in c(i/[2.sup.j-1]) time, where c is a constant. The length of a superstep was set to be c' so that Phase j of Thread i can be completed within ?? 1/2 ?? i/[2.sup.j] ?? ?? supersteps. Now the length of each superstep is doubled (i.e., each superstep takes 2c' time instead of c'). Then the computation of Phase j can be deferred to the last half supersteps (i.e., the last ?? 1/4 ?? i/[2.sup.j] ?? ?? supersteps). In the first half supersteps of Phase j (i.e., from the ([a.sub.j+1] + 1)th to ([a.sub.j+1] + ?? 1/4 ?? i/[2.sup.j] ?? ??)th supersteps), no computation is performed. Thread i is waiting for other threads to store the outputs [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] into the local memory.

To complete the schedule, we need to show how each Thread k, where k [is less than] i, perform the copying in time. Recall that Thread k completes its computation at the kth superstep. In the (k + t)th superstep, where t [is greater than or equal to] 0, Thread k copies [B.sub.k] to four threads, namely Thread (k + 4t + 1) to Thread (k + 4t + 4). Each replication takes O(1) time using a linear number of processors.

LEMMA 6.2. Consider any Thread i. At the end of the ([a.sub.j] + ?? 1/4 ?? i/[2.sup.j] ?? ??)th superstep, there is a copy of B[[a.sub.j-1] + 1, [a.sub.j]] residing in the local memory of Thread i.

PROOF. For k [is less than] i, Thread i receives [B.sub.k] at the (k + ?? (i - k)/4 ??)th superstep. For Phase j of Thread i, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the last set of edges to be received and it arrives at the ([a.sub.j] + ?? (i - [a.sub.j])/4 ??) = ([a.sub.j] + ?? 1/4 ?? i/[2.sup.j] ?? ??)th superstep, just before the start of the second half of Phase j. []

6.2. LINEAR PROCESSORS. In this section, we further adapt our MST algorithm to run on a linear number of processors. We first show how to reduce the processor requirement to m + n log n. Then, for a dense graph with at least n log n edges, the processor requirement is dominated by m. Finally, we give a simple extra step to handle sparse graphs.

To reduce the processor requirement to m + n log n, we would like to introduce some preprocessing to each thread so that each thread can work on only n (instead of m) edges to compute the required output using n processors. Yet the preprocessing of each thread still needs to handle m edges and requires m processors. To sidestep this difficulty, we attempt to share the preprocessing among the threads. Precisely, the computation is divided into ??log log n?? + 1 stages. In Stage k, where 1 [is less than or equal to] k [is less than or equal to] ?? log log n ??, we perform one single preprocessing, which then allows up to [2.sup.k-1] threads to compute concurrently the edge sets [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in [2.sup.k-1] supersteps using [2.sup.k-1] [multiplied by] n processors. The preprocessing itself runs in O([2.sup.k]) supersteps using m + n processors. Thus, each stage makes use of at most m + n log n processors, and the total number of supersteps over all stages is still O(log n).

LEMMA 6.3. The minimum spanning tree of a weighted undirected graph can be found in O(log n) time using m + n log n processors on the EREW PRAM.

PROOF. The linear-processor algorithm runs in ?? loglog n ?? + 1 stages. In Stage 0, [B.sub.1] is found by Thread 1. For 1 [is less than or equal to] k [is less than or equal to] ?? loglog n ??, Stage k is given B[1, [2.sup.k-1]] and is to compute B[[2.sup.k-1] + 1, [2.sup.k]]. Specifically, let x = [2.sup.k-1], Stage k involves Thread 2x for the preprocessing and Threads 1, 2, ..., x for the actual computation of [B.sub.x+1], [B.sub.x+2], ..., [B.sub.2x]. Both parts require O(x) supersteps.

The preprocessing is to prepare the initial adjacency lists for each thread. Let F be the set of trees induced by B[1, x], which is, by definition, a [2.sup.x]-forest of G. We invoke Thread 2x to execute Phase 1 only, computing a set [Q.sub.1] of adjacency lists. By definition, each list in [Q.sub.1] has length at most [2.sup.2x/2] - 1 = [2.sup.x] - 1, representing a tree T in F and containing all primary external edges of T with base less than [2.sup.2x] [multiplied by] [Q.sub.1] contains sufficient edges for finding not only [B.sub.2x] but also [B.sub.x+1], ..., [B.sub.2x-1]. As F contains at most n/[2.sup.x] trees, [Q.sub.1] contains a total of at most n edges.

Each list in [Q.sub.1] is sorted with respect to the edge weight using O(x) supersteps and n processors. Then [Q.sub.1] is copied into the local memory of Threads 1 to x one by one in x supersteps using n processors. For 1 [is less than or equal to] i [is less than or equal to] x, Thread i replaces its initial set of adjacency lists [Q.sub.0] with a new set [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], which is constructed by truncating each list in [Q.sub.1] to include the smallest [2.sup.i] - 1 edges.

Threads 1 to x are now ready to run concurrently, computing [B.sub.x+1], ..., [B.sub.2x], respectively. For all 1 [is less than or equal to] i [is less than or equal to] x, Thread i uses its own [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] as the initial set of adjacency lists and follows its original phase-by-phase schedule to execute the algorithm stated in Section 5.4. Note that the algorithm of a thread is more versatile than was stated. When every Thread i starts with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] as input, Thread i will compute the edge set [B.sub.x+i] (instead of [B.sub.i]) in i supersteps. That is, [B.sub.x+1], ..., [B.sub.2x] can be found by Threads 1 to x in x supersteps. Note that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] has at most n edges, the processors requirement of each thread is n only.

In short, Stage k takes O(x) supersteps using m + x [multiplied by] n [is less than or equal to] m + n log n processors. Recall that x = [2.sup.k-1]. The ?? loglog n ?? stages altogether run in O(log n) time using m + n log n processors. []

If the input graph is sparse, that is, m [is less than] n log n, we first construct a contracted graph [G.sub.c] of G as follows: We execute Threads 1 to loglog n concurrently to find B[1, log n], which induces a (log n)-forest B of G. Then, by contracting each tree in the forest, we obtain a contracted graph [G.sub.c] with at most n/log n vertices. The contraction takes O(log n) time using m + n processors. By Lemma 6.3, the minimum spanning tree of [G.sub.c], denoted [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], can be computed in O(log n) time using m + (n/log n) log n = m + n processors. Note that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and B include exactly all the edges in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. We conclude with the following theorem.

THEOREM 6.1. The minimum spanning tree of an undirected graph can be found in O(log n) time using a linear number of processors on the EREW PRAM.

Appendix

We prove the following claim and lemma that capture some interesting properties of the trees induced by the edge sets [B.sub.1], [B.sub.2], ..., [B.sub.??log n??].

CLAIM. Let T be any one of the trees induced by B[1, k], for any 0 [is less than or equal to] k [is less than or equal to] ??log n??. Let (a, p) and (b, q) be two external edges of T, where a, b [element of] T. For each edge on the path connecting a and b in T, its weight is smaller than max{w(a,p), w(b, q)}.

PROOF. We prove the lemma by induction on k. The base case, k = 0, is true as every tree contains only one vertex. Thus, a = b and the path involves no edge.

Inductive case, k [is greater than or equal to] 1. Let P be the path in T connecting a and b. Let X = {[W.sub.1], [W.sub.2], ..., [W.sub.l]} be the subset of trees induced by B[1, k - 1] such that each of them involves at least one vertex of P. Without loss of generality, we can assume that a [element of] [W.sub.1], b [element of] [W.sub.l] and for 1 [is less than or equal to] i [is less than or equal to] l - 1, [W.sub.i] and [W.sub.i+1] are connected by an edge ([v.sub.i], [u.sub.i+1]), where [v.sub.i] [element of] [W.sub.i] and [u.sub.i+1] [element of] [W.sub.i+1]. By the construction of [B.sub.k], ([v.sub.i], [u.sub.i+1]) is the minimum external edge of [W.sub.i] or that of [W.sub.i+1] (or both). Let [P.sub.i] be the "sub-path" of P in [W.sub.i], that is, [P.sub.i] = P [intersection] [W.sub.i]. Let [u.sub.1] = a and [v.sub.l] = b. Then [P.sub.i] is a path in [W.sub.i] connecting [u.sub.i] and [v.sub.i]. We can partition P into l smaller paths and l - 1 edges as follows: <[P.sub.1], ([v.sub.1], [u.sub.2]), [P.sub.2], ([v.sub.2], [u.sub.3]), ..., ([v.sub.l-1], [u.sub.l]), [P.sub.l]>.

We are going to prove that for 1 [is less than or equal to] i [is less than or equal to] l - 1, w([v.sub.i], [u.sub.i+1]) [is less than] max{w(a,p), w(b, q)}. Then, by the induction hypothesis, we can show that every edge on [P.sub.i] also has weight smaller than max{w(a, p), w(b, q)}.

--Edges connecting [W.sub.i] and [W.sub.i+1]. Recall that ([v.sub.i], [u.sub.i+1]) is the minimum external edge of either [W.sub.i] or [W.sub.i+1]. We can find a level t for which a tree [W.sub.t] [element of] X has the minimum external edge with the following property: Either it is not in P or it is also the minimum external edge of [W.sub.t-1]. Then, for 1 [is less than or equal to] i [is less than or equal to] t - 1, the minimum external edge of [W.sub.i] is ([v.sub.i], [u.sub.i+1]); for t + 1 [is less than or equal to] i [is less than or equal to] l, the minimum external edge of [W.sub.i] is ([v.sub.i-1], [u.sub.i]). As a result, w([v.sub.1], [u.sub.2]) [is greater than] w([v.sub.2], [u.sub.3]) [is greater than] ... [is greater than] w([v.sub.t-1], [u.sub.t]) and w([v.sub.t], [u.sub.t+1]) [is less than] w([v.sub.t+1], [u.sub.t+2]) [is less than] ... [is less than] w([v.sub.l-1], [u.sub.l]). Since w(a, p) [is greater than] w([v.sub.1], [u.sub.2]) and w([v.sub.l-1], [u.sub.l]) [is less than] w(b, q), it follows that w([v.sub.i], [u.sub.i+1]) [is less than] max{w(a,p), w(b, q)} for 1 [is less than or equal to] i [is less than or equal to] l - 1.

--Edges on [P.sub.i]. Consider the path [P.sub.i] in [W.sub.i] which connects [u.sub.i] and [v.sub.i]. By the induction hypothesis, every edge on [P.sub.i] has weight smaller than max{w([v.sub.i-1], [u.sub.i]), w([v.sub.i], [u.sub.i+1])} (or max{w(p, a), w([v.sub.1], [u.sub.2])} if i = 1, max{w([v.sub.l-1], [u.sub.l]), w(b, q)} if i = l). As shown above, both w([v.sub.i-1], [u.sub.i]) and w([v.sub.i], [u.sub.i+1]) are smaller than max{w(a, p), w(b, q)}. Therefore, every edge on [P.sub.i] has weight smaller than max{w(a, p), w(b, q)}.

As a result, every edge on P has weight smaller than max{w(a, p), w(b, q)}. []

LEMMA 2.3. Let T be any one of the trees induced by B[1, k], for any 0 [is less than or equal to] k [is less than or equal to] ?? log n ??. Let [e.sub.T] be the minimum external edge of T. For any subtree (i.e., connected subgraph) S of T, the minimum external edge of S is either [e.sub.T] or an edge of T.

PROOF. Let e be the minimum external edge of S. Assume to the contrary that e is not an edge of T and e [is not equal to] [e.sub.T]. That means e is an external edge of T and w(e) [is greater than] w([e.sub.T]). Then [e.sub.T] cannot be an external edge of S.

Let [e.sub.T] = <u, v> and e = <x, y>. Consider the path P in T connecting u and x. Since u does not belong to S, we can find an edge e' on P that is an external edge of S (and of course an edge of T). By the above claim, every edge on P has weight smaller than max{w(e), w([e.sub.T])} = w(e). Thus, e' has weight smaller than w(e). Therefore, e cannot be the minimum external edge of S. We obtain a contradiction. []

[ILLUSTRATION OMITTED]

(1) See, for example, Miller and Ramachandran [1986], Maon et al. [1986], Tarjan and Vishkin [1985], and Vishkin [1985].

(2) See, for example, Awerbuch and Shiloach [1987], Chin et al. [1982], Cole and Vishkin [1986], Chong and Lam [1993], Chong [1996], Johnson and Metaxas [1991; 1992], and Karger et al. [1992].

REFERENCES

ARMONI, R., TA-SHMA, A., WIGDERSON, A., AND ZHOU, S. 1997. SL [subset or equal to] [L.sup.4/3]. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (El Paso, Tex., May 4-6) ACM, New York, pp. 230-239.

AWERBUCH, B., AND SHILOACH, Y. 1987. New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Trans. Comput. 36, 1258-1263.

CHAZELLE, B. 1997. A faster deterministic algorithm for minimum spanning trees. In Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 22-31.

CHAZELLE, B. 2000. A minimum spanning tree algorithm with inverse-ackermann type complexity. J. ACM 47, 6 (Nov.), 1028-1047.

CHIN, F. Y., LAM, J., AND CHEN, I.-N. 1982. Efficient parallel algorithms for some graph problems. Commun. ACM 25, 659-665.

CHONG, K.W. 1996. Finding minimum spanning trees on the EREW PRAM. In Proceedings of the International Computer Symposium (Taiwan, China). pp. 7-14.

CHONG, K. W., AND LAM, T.W. 1993. Finding connected components in O(log n loglog n) time on the EREW PRAM. In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms (Austin, Tex., Jan. 25-27). ACM, New York, pp. 11-20.

COLE, R. 1988. Parallel Merge Sort. SIAM J. Comput. 17, 770-785.

COLE, R., KLEIN, P. N., AND TARJAN, R.E. 1996. Finding minimum spanning forests in logarithmic time and linear work using random sampling. In Proceedings of the 8th Annual ACM Symposium on Parallel Architectures and Algorithms (Padua, Italy, June 24-26). ACM, New York, pp. 243-250.

COLE, R., AND VISHKIN, U. 1986. Approximate and exact parallel scheduling with applications to list, tree, and graph problems. In Proceedings of the 27th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 478-491.

COOK, S. A., DWORK, C., AND REISCHUK, R. 1986. Upper and lower time bounds for parallel random access machines without simultaneous writes. SIAM J. Comput. 15, 1, 87-97.

FREDMAN, M. L., AND TARJAN, R.E. 1987. Fibonacci heaps and their use in improved network optimization algorithms. J. ACM 34, 3 (July), 596-615.

GABOW, H., GALIL, Z., SPENCER, T., AND TARJAN, R. E. 1986. Efficient algorithms for finding minimum spanning trees in undirected and directed graphs. Combinatorica 6, 2, 109-122.

GAZIT, H. 1986. An optimal randomized parallel algorithm for finding connected components in a graph. In Proceedings of the 27th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 492-501.

GIBBONS, P. B., MATIAS, Y., AND RAMACHANDRAN, V. 1997. Can a shared-memory model serve as a bridging model for parallel computation? In Proceedings of the 9th ACM Symposium on Parallel Algorithms and Architectures (Newport, R.I., June 22-25). ACM, New York, pp. 72-83.

HALPERIN, S., AND ZWICK, U. 1996. Optimal randomized EREW PRAM algorithms for finding spanning forests and for other basic graph connectivity problems. In Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 438-447.

HAN, Y., AND WAGNER, R.A. 1990. A fast and efficient parallel-connected component algorithm. J. ACM 37, 3 (July), 626-642.

HIRSCHBERG, D. S., CHANDRA, A. K., AND SARWATE, D.V. 1979. Computing connected components on parallel computers. Commun. ACM 22, 461-464.

JAJA, J. 1992. An Introduction to Parallel Algorithms. Addison-Wesley, Reading, Mass.

JOHNSON, D. B., AND METAXAS, P. 1991. Connected components in O([lg.sup.3/2] |V|) parallel time for the CREW PRAM. In Proceedings of the 32nd Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 688-697.

JOHNSON, D. B., AND METAXAS, P. 1992. A parallel algorithm for computing minimum spanning trees. In Proceedings of the 4th Annual ACM Symposium on Parallel Architectures and Algorithms (San Diego, Calif., June 29-July 1). ACM, New York, pp. 363-372.

KARGER, D. R. 1995. Random sampling in graph optimization problems. Ph.D. dissertation. Department of Computer Science, Stanford University, Stanford, Calif.

KARGER, D. R., KLEIN, P. N., AND TARJAN, R.E. 1995. A randomized linear-time algorithm to find minimum spanning trees. J. ACM 42, 2 (Mar.), 321-328.

KARGER, D., NISAN, N., AND PARNAS, M. 1992. Fast connected components algorithms for the EREW PRAM. In Proceedings of the 4th Annual ACM Symposium on Parallel Architectures and Algorithms (San Diego, Calif., June 29-July 1). ACM, New York, pp. 373-381.

KARP, R. M., AND RAMACHANDRAN, V. 1990. Parallel algorithms for shared-memory machines. In Handbook of Theoretical Computer Science vol A., van Leeuwen, J., Ed. MIT Press, Cambridge, Mass.

MAON, Y., SCHIEBER, B., AND VISHKIN, U. 1986. Parallel ear decomposition search (EDS) and s-t numbering in graphs. Theoret. Comput. Sci. 47, 277-298.

MILLER, G. L., AND RAMACHANDRAN, V. 1986. Efficient parallel ear decomposition with applications. Manuscript. MSRI, Berkeley, Calif.

NISAN, N., SZEMEREDI, E., AND WIGDERSON, A. 1992. Undirected connectivity in O([log.sup.1.5] n) space. In Proceedings of the 33rd Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 24-29.

PETTIE, S. 1999. Finding minimum spanning trees in O(n [Alpha](m, n)) time. UTCS Tech. Rep. TR99-23. Dept. Computer Sciences, The University of Texas at Austin, Austin, Tex.

PETTIE, S., AND RAMACHANDRAN, V. 1999. A randomized time-work optimal parallel algorithm for finding a minimum spanning forest. In Proceedings of the 3rd International Workshop on Randomization and Approximation Techniques in Computer Science. Lecture Notes in Computer Science, vol. 1671. Springer-Verlag, New York, pp. 233-244.

PETTIE, S., AND RAMACHANDRAN, V. 2000. An optimal minimum spanning tree algorithm. In Proceedings of the 27th International Colloquium on Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 1853. Springer-Verlag, New York, pp. 49-60.

POON, C. K., AND RAMACHANDRAN, V. 1997. A randomized linear work EREW PRAM algorithm to find a minimum spanning forest. In Proceedings of the 8th Annual International Symposium on Algorithms and Computation. Lecture Notes in Computer Science, vol. 1350. Springer-Verlag, New York, pp. 212-222.

TARJAN, R.E. 1983. Structures and Network Algorithms. SIAM, Philadelphia, Pa.

TARJAN, R. E., AND VISHKIN, U. 1985. An efficient parallel biconnectivity algorithm. SIAM J. Comput. 14, 862-874.

VALIANT, L. G. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (Aug.), 103-111.

VISHKIN, U. 1985. On efficient parallel strong orientation. Inf. Proc. Lett. 20, 235-240.

RECEIVED SEPTEMBER 1999; REVISED JULY 2000; ACCEPTED JULY 2000

A preliminary version of this paper appeared in the Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, Md.). ACM, New York, 1999, pp. 225-234.

Part of this work was done while K. W. Wong was with Max-Planck-Institut fur Informatik, Saarbrucken, Germany. This work was supported in part by Hong Kong RGC Grant HKU-289/95E.

Authors' addresses: K. W. Chong and T. W. Lam, Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong, China, E-mail: {kwchong;twlam}@csis.hku.hk; Y. Han, Computer Science Telecommunications Program, University of Missouri-Kansas City, 5100 Rockhill Road, Kansas City, MO 64110, E-mail: han@cstp.umkc.edu.

Printer friendly Cite/link Email Feedback | |

Author: | CHONG, KA WONG; HAN, YIJIE; LAM, TAK WAH |
---|---|

Publication: | Journal of the Association for Computing Machinery |

Geographic Code: | 1USA |

Date: | Mar 1, 2001 |

Words: | 17433 |

Previous Article: | Approximation Algorithms for Metric Facility Location and k-Median Problems Using the Primal-Dual Schema and Lagrangian Relaxation. |

Next Article: | A General Approach to Dynamic Packet Routing with Bounded Buffers. |

Topics: |