# Cube structures for multiprocessors.

Cube Structures For Multiprocessors Several architectures have been proposed for closely coupled multiprocessing applications, based on two main types of processor interconnections: multistage networks of the shuffle-exchange type [4, 5] and binary hypercube networks [11, 13]. These two interconnection structures may be termed indirect and direct respectively. In a direct interconnection scheme, processing nodes are interconnected using point-to-point links. In an indirect scheme, the network is a separate entity with inputs and outputs, to which processor and/or memory modules are attached. Historically indirect interconnection schemes have evolved under the shared memory model of parallel computation (with the network providing the processors with access to the shared memory modules), while the direct structures have been used primarily for message passing architectures; however, either type of interconnection scheme can be used for both models of parallel processing. Machines based on these two interconnection architectures are known to have different network-related cost, operation, and performance characteristics [6].Multistage networks of different types have been shown to be topologically equivalent [14] and we shall consider here as a representative of this class, the indirect binary n-cube network [10]. Permutation capabilities of this network have been well characterized, and in particular it has been shown that permutations corresponding to the hypercube connections can be passed by the indirect network [7, 10]. Our interest in this article is more in the operation of the network in a stochastic message transfer environment. A detailed performance analysis of the indirect and direct cube networks in this situation has shown that the hypercube, while using more hardware, performs significantly better than the multistage networks [1].

In this article, we will characterize exactly (a) the structural relationship between the direct and indirect cube networks (of both the unique path and the redundant path types [9, 12]), and (b) the difference in switching power between the two classes of networks (which involves both operational and hardware difference). This will permit us to show that it is possible to view the indirect network also as a static interconnection of nodes and that the essential difference between the two structures is in the internal architecture of their nodes; in particular, the more powerful node structure in the hypercube leads to its better stochastic performance. Finally, and most importantly, our analysis will illustrate that, in fact, there exists a series of structures in between the indirect and direct cube interconnections that provide a variety of choices in terms of cost/performance levels.

MULTISTAGE INDIRECT NETWORKS

AND DIRECT CUBE NETWORKS

The direct binary n-cube or hypercube network [10] consist of [N = 2.sup.n] nodes interconnected using point-to-point links according to the following rule: two nodes whose binary addresses differ in exactly one bit position are connected by a link. Each bit position corresponds to a dimension in the network. An example of a binary 4-cube with 16 nodes is shown in Figure 1a. A node in the structure consists of a processing element plus enough capability to communicate with its neighbors by transmission of messages. In order to be able to talk to all the other nodes in the system, each node must also be able to route incoming messages destined for other nodes. While this routing function can be accomplished (in software) by the processing element itself, we will assume the presence of a switching module that will absorb all of this function (Figure 1b).

The cube routing algorithm is conceptually simple. An incoming message or a message generated at the node has a destination "tag" associated with it, which is compared with the node address. Any one of the bit positions in which the destination address differs from the node address is a valid dimension (i.e., switch output) to route the message to. When multiple differing dimensions are available, a choice can be made at random or in some strict order. The latter leads to the "right-to-left" correction of bits, which we shall consider for the most part in this article. The switching module to implement this routing function requires a connectivity less than that of a full crossbar. In Figure 1b, a message entering along dimension i must have dimensions 0, ..., i corrected and can exit the node only along a dimension j [is greater than] i (assuming that bit positions are numbered from right to left). Each input unit in the switch compares the destination and current node addresses and determines which output the packet is to be sent to. (If the two are the same, the packet is destined for this node.)

Thus we view the architecture of the direct binary n-cube as a modular one, with an N-processor system consisting of N interconnected nodes. Each node consists of a processor-memory-switch combination. In the next section we introduce a similar modular structure of the indirect binary n-cube network, which will aid greatly in determining the essential difference between the direct and indirect networks.

Modular Construction of the

Indirect Binary n-Cube Network

The indirect binary n-cube network [10] consists of n stages of 2 X 2 switching elements (N = [2.sup.n]) with stages i - 1 and i (1 [is less than or equal to] i [is less than or equal to] n - 1) interconnected by the fixed permutation:

[[pi].sub.i.(S.sub.n-1.S.sub.n-2 ... S.sub.i ... S.sub.0.) = S.sub.n-1.S.sub.n-2 ... S.sub.0 ... S.sub.i].

This permutation corresponds to a swap of bits 0 and i in the binary address of each terminal. (Terminals are numbered 0 through N - 1 from top to bottom at the input and output of each stage.) The interconnection scheme is shown in Figure 2 for a 16-processor system. The permutation following the last stage of the network is the "inverse shuffle," defined by

[[pi](S.sub.n-1.S.sub.n-2 ... S.sub.0.) = S.sub.0.S.sub.n-1 ... S.sub.1].

The network's name comes from the fact that it can simulate the connections of the direct cube network in a felicitous manner: for example, by setting the switches in stage i to the cross position and all the remaining switches to the straight position, the connections set up between the inputs and outputs are exactly those corresponding to dimension i of the direct cube network. The main advantage of the indirect network is the limited degree of the switching elements required--they have in- and out-degrees of two, in contrast to the log N-degree nodes in the direct network.

Our interest in this article is more in the distributed communication properties of the network. To get (a packet) to a destination D = [d.sub.n-1.d.sub.n-2 ... d.sub.0], the source uses this address as the destination tag. Bit [d.sub.i] of the tag is used by the switch in stage i to forward the packet: if [d.sub.i] = 0, it is sent to the top output, and if [d.sub.i] = 1, it is sent to the bottom output. It is possible to get rid of the inverse shuffle connection following the last stage by using a slightly modified tag given by inverse shuffle (D) = [d.sub.0.D.sub.N-1]

We now introduce a different way of viewing the indirect n-cube network, that will help identify the key points of difference and similarly with its direct counterpart. [See Figure 3.] Define as a switching module, each row of switching elements in the network. Each module thus has n 2 X 2 switching elements associated with it; and each node consists of a switching module and two processing elements. Its in- and out-degrees are given by n - 1. If the nodes (also modules) are now numbered using the most significant n - 1 bits of the processing element addresses, the connections between them can be seen to be exactly those of the direct binary (n - 1)-cube. However, the internal structure of the switching module is very different in the two cases.

Let us consider the operation of the network in Figure 3 when viewed as a direct connection of the eight nodes. Consider a packet generated by processor S = [[[[s.sub.3]s.sub.2]s.sub.1]s.sub.0] (say 0010) destined for processor D = [[[[d.sub.3]d.sub.2]d.sub.1]d.sub.0] (say 1010). The routing tag is T = [[[[d.sub.0]d.sub.3]d.sub.2]d.sub.1] (0101). In the first cycle the packet switches based on bit [d.sub.1], and the [[pi].sub.1] connection takes it to link [[[[s.sub.3]s.sub.2]d.sub.1]s.sub.1] (0011) in node [[[[s.sub.3]s.sub.2]d.sub.1] (001). In the second cycle the packet switches based on bit [d.sub.2] and the [[pi].sub.2 connection takes it to link [[[[s.sub.3]d.sub.2]d.sub.1]s.sub.0] (0010) in node [[[s.sub.3]d.sub.2]d.sub.1] (001). In the third cycle the packet switches based on [d.sub.3] and the [[pi].sub.3] connection takes it to link [[[[d.sub.3]d.sub.2]d.sub.1]s.sub.3] (1010) in node [[[d.sub.3]d.sub.2]d.sub.1] (101). In the fourth and last cycle the packet switches based on bit [d.sub.0] to get to the proper destination [[[[d.sub.3]d.sub.2]d.sub.1]d.sub.3] (1010) in the module [[[d.sub.3]d.sub.2]d.sub.1] (101). It should be noted that in the first two cycles of this example, the packet did not leave the originating node.

Thus in the interpretation of the indirect binary n-cube network as a direct connection of nodes, the intermediate nodes in a source to destination path need not be distinct--i.e., the packet could stay in the same node for more than one cycle. In particular, the packet does not change modules in cycle i, if [s.sub.i] = [d.sub.i]. Each packet takes exactly log N = 4 cycles to reach its destination (in the absence of conflicts and buffering), despite the fact that distances between the nodes are governed by the Hamming distance (the number of differing bit positions). This disparity between node distances and packet times is due to the internal structure and operation of the switching modules. Let us look at this aspect in more detail.

Internal Architecture of "Nodes"

in an Indirect Network

The characteristics of the switching module associated with each node are as follows:

A. It consists of n 2 x 2 switches. Each of the first n - 1 of these can be thought to stand for one dimension, with the switch deciding whether the packet should be transferred to the neighboring node along that dimension, or to the next switch (within the same node). The last switching element decides which of the two processors attached to the node the packet is sent to.

b. A packet entering the node along dimension i (0 [is less than or equal to] i [is less than or equal to] n - 2) leaves through the first dimension [is greater than]i in which the destination address and the node address differ. However, if the first differing bit is along dimension i + k, the packet can exit the node only after k cycles. By contrast, in a direct cube network, the packet would always exit in one cycle (in the absence of queueing delays).

Thus the nodes in both indirect and direct binary cubes switch between n + 1 inputs and n + 1 outputs. But whereas the direct network switching module employs the partial crossbar illustrated in Figure 1b, the indirect network nodes use the less powerful node structure shown in Figure 4. It also employs a right-to-left correction of bits and so a packet entering a node at Di IN (dimension i) can only go out along a dimension j [is greater than]i. A packet generated by a processor enters the system along P0 IN or P1 IN and can exit the node at any Dj OUT, j [is greater than or equal to] 0. A packet entering the node at D6 IN is on its last hop and only needs to be routed to the appropriate processor in the node. The node architecture is a loop, if we consider the processors as completing the loop.

There are two ways in which we can augment this switching module. In the first (called the "fast finish" algorithm) we add a comparator to each switching element; this will pull out the packet if it is destined for a processor attached to this node. Otherwise the packet will follow its course around the loop. The second addition not only performs a simple comparison, but also determines the first bit position in which the destination and node addresses differ ("lookahead"). The packet is then routed to the appropriate output (without following the loop). It is clear that this second scheme is identical to the cube routing algorithm and the switching module with this augmentation will be functionally identical to that in Figure 1b.

External Connectivity of "Nodes"

in an Indirect Network

We know that the indirect binary n-cube network has a lower performance level than the direct cube network of the same size [1]. The internal architecture of the nodes described in the previous section is one of the two reasons for this. (This factor should manifest itself essentially as the difference in performance between a loop and the partial crossbar of Figure 1b). The other reason is the external connectivity of the nodes. An N-processor indirect cube network implements only an (n-1)-node hypercube, with two processors attached to each node. In terms of node degrees, the n-cube has n bidirectional links per processor, whereas the modular indirect network implements only n - 1 links per pair of processors. This represents a clear loss in connectivity.

To remove this difference between the indirect and direct binary n-cubes, consider the 8 X 8 indirect network shown in Figure 5, in which a switching module is assigned to each processor, instead of to processor pairs. Each switching module is now connected to n other nodes using bidirectional links, just as in the case of the direct network. (An external output from one stage and the external input the next stage at each node constitute a single bidirectional link to a neighboring node.)

Clearly the only difference between this expanded indirect network and the direct network is the internal structure of the switching modules, which are shown in Figure 4 (with P1 IN and P1 OUT removed) and Figure 1b. This can be summarized as follows. In routing packets, the indirect network can process only one address bit in a cycle, whether that address bit requires connection or not. (It does this by "flipping" the tag bits so that the appropriate bit is in the last significant position and can be exchanged by a 2 X 2 switch.) It thus takes logN hops to route a message. The direct network on the other hand processes only those bits that need correction (one per cycle), thereby routing the packet in h cycles, where h = H(S, D) is the Hamming distance between S and D.

Non-binary Direct and Indirect Cube Networks

This view of the indirect cube network as a direct connection of not-so-powerful nodes is not restricted to binary networks. The architecture of the generalized hypercube, in which node degrees are not limited to one per dimension, has been analyzed in [2]. Instead of 2" nodes, a generalized hypercube has [M.sub.n-1] * [m.sub.n-2] * ... * [m.sub.0] nodes, with each node being directly connected to [m.sub.i] - 1 neighbors along dimension i. Thus each set of [M.sub.i] nodes along any one axis are completely connected. Figure 6a shows a 2 * 2 * 4 generalized hypercube. For the same number of nodes, the increased connectivity of this structure results in a smaller diameter of the system. By extending the correspondence established by Pease between direct and indirect networks for the binary case [10], an "unfolded cube" network is defined in [2], in which each [K.sub.mi] (a complete graph of [m.sub.i] nodes) in the generalized hypercube is implemented by an [m.sub.i] X [m.sub.i] crossbar at stage i in the indirect network.

However this could result in indirect structures that are not modular in our sense. We present an alternate correspondence here to derive the indirect network for a given generalized hypercube structure, that we believe is closer to the operation of the switching modules in the direct network (Figure 6b). In the right-to-left correction of bits, a packet coming in along dimension i will go out along dimension i + 1, or i + 2, .... , or n - 1. In the indirect case, because the switching is done sequentially along dimensions, the packet will either leave along dimension i + 1 or wait for the next cycle. Thus the ith stage in the network switches from dimension i - 1 to dimension i and will have to be implemented by an [m.sub.i-1] x [m.sub.i] switch (as opposed to an [m.sub.i] X [m.sub.i] switch). The first stage switches from each processor to dimension 0 and the last stage from dimension n - 1 to the processor. Figure 7 shows the modular indirect network corresponding to the generalized hypercube of Figure 6. It is an "expanding-and-contracting" network [3], because the switching elements are not all rectangular. This comes about because the generalized hypercube under consideration uses a mixed radix system; if all the [m.sub.i]'s are the same, the corresponding modular indirect network is rectangular (using [log.mi.N] stages of [m.sub.i] X [m.sub.i] switches). It should be noted that nonrectangular indirect networks are unique path networks, and they exhibit better performance than rectangular networks [3].

Direct Networks Corresponding to Multipath Indirect

Networks

In this section we will consider multipath indirect networks [9, 12] in which multiple paths exist between each input and output; we will show that by viewing them also as modular interconnections of nodes, it is possible to derive augmented hypercube structures in a straightforward manner. So far the indirect networks we have considered have all been unique path networks with the n = logN bit destination address constituting the routing tag. The direct versions of these networks had mode degrees equal to n in the binary case. With the right-to-left correction of bits in the direct networks, the path is unique between any two nodes. Multipath indirect networks provide fault tolerance as well as performance improvement at the expense of some minimal added hardware, typically in the form of extra stages in the network. Since each external connection in a direct binary network node corresponds to a stage in the indirect network, this implies increasing the degree of the nodes in the direct network. As to what form this added connectivity will take, we will see presently.

In [9], different techniques for generating a redundant path network from a larger unique path network are presented. One of these, called the "folding approach," is particularly suited to our context because it retains the same fixed permutations among the first n stages as a unique path network. The structure is derived by starting with an indirect binary cube network of twice the size (Figure 8a) and "folding" it in half. Such a fold maps terminals of the form D and D on to each other at each stage. A 2-path 16 X 16 network obtained by folding an indirect binary 5-cube is shown in Figure 8b. (Only one processor is connected per switching module, as in Figure 5.) Each switching module now consists of five switching elements, and the connections between the modules is redrawn in Figure 8c. An additional (bidirectional) link is now added to each node, connecting it to its complement (i.e., main diagonally separated element). Now, to get from a source S to destination D, both D and D would be checked to see which has the shorter distance from S. (The minimum of the two distances is [is less than or equal to][n/2].) The diameter of the network drops to [n/2], corresponding to an increase in node degree by one. For large n, this is a very effective tradeoff. Note that with right-to-left correction of address bits there are two disjoint paths from a source to a destination in the direct network, just as in the case of the indirect network.

COMPOSITE DIRECT/INDIRECT NETWORKS

Reviewing the architecture of switching modules in direct and indirect networks, we see that the essence of direct switching is the availability of a look-ahead function across dimensions, which avoids switching along unproductive dimensions. This leads to a better stochastic performance. However, direct switching involves increased complexity at each node in two ways. First a larger crosspoint array is required, to permit the routing of the message from any input port i to any output port j [is greater than] i. (See figure 9.) This requires roughly [n.sup.2]/2 crosspoints as opposed to about 4n for indirect switching. In a VLSI implementation of the switching module, this measure would give some idea of the interconnection complexity. The circles labeled C in Figue 9 refer to control logic needed to set the crosspoints. These are shown in Figure 10 and constitute the second source of increased complexity in direct switching. The indirect switching modules perform what is known as bit-controlled routing--a single bit of the tag decides which of the two outputs the message is to be forwarded to. The direct switching module on the other hand needs additional logic as shown in Figure 10b. In particular, the leading one's detector is likely to substantially increase the area and/or cycle time. Thus even though the total hopcount of messages is reduced by direct switching, each hop could take longer, in addition to the switching modules being more complex.

The question now arises as to whether a cube network could switch directly along some dimensions and indirectly along the rest. Figure 11 shows an 8 X 8 network constructed using this tradeoff. The network in Figure 5 uses a full indirect connection with switching along dimensions 0, 1, and 2 being done sequentially. The network in Figure 11b is at the other extreme (a full direct connection), wherein potentially any of the dimensions 0, 1, or 2 can be switched in a cycle. Figure 11a shows the composite structure where dimension 0 is switched indirectly, dimension 1 is switched directly, and dimension 2 is switched indirectly. The network has three stages now, with 3 X 3 switches in the center stage. Observe that the center stage switches constitute four direct 1-cubes. These switches use the cube routing algorithm on dimension 1 and bit controlled indirect switching on dimension 2. Thus if the node and destination addresses differ along bit position 1, the packet is sent out along the bottom output, else it is sent out along one of the top two outputs based upon the value of address bit 2. This reduces the distance packets have to travel in some cases, as we shall see in the next section.

In general, direct switching can be performed along j dimensions i, i + 1, ..., i + j - 1 (0 [is less than or equal to] i [is less than] n - 1). What this means is that a packet entering a node along dimension i - 1 (at stage i) can be switched not just along dimension i, but directly along dimensions i, ..., i + j - 1, based on the cube routing algorithm and indirectly based on bit i + j. This is effected by using a (j + 2) X (j + 2) switch at this stage in the network with the node structure shown in Figure 12a. (The switch is of size (j + 1) X (j + 2) if i = 0, and (j + 2) X (j + 1) if i + j - 1 = n - 1.) The switches in stage i are interconnected as sets of j-cubes and this is where the main difference of the composite network from the indirect network comes in. Note that in the indirect network, connections are only between stages and never within a stage. When the fast-finish algorithm is incorporated into the architecture, the structure of the switching module is as shown in Figure 12b. Inputs at each switching element now include a comparator which will route the terminal packets to the processor at intermediate stages.

Direct switching can be performed on non-overlapping sequences of dimensions: for instance a [2.sup.11] X [2.sup.11] structure can be synthesized as three stages of switches of size 4 X 5, 5 X 5, and 5 X 4, respectively, with each stage constituting a set of direct 3-cubes. A node in this case will look as shown in Figure 12C. Switching along the dimensions is done as follows: 0, 1, 2--direct, 3--indirect, 4, 5, 6--direct, 7--indirect, and 8, 9, 10--direct. To properly route packets ina composite network, the sequential bit-scanning algorithm (with fast-finish, if desired) has to be employed in the stages doing indirect switching, and the cube routing algorithm (over the appropriate dimensions) has to be performed in each stage doing direct switching.

Both non-binary indirect networks and composite networks make use of switching elements of size [is greater than] 2. However, these are interconnected and operated in very different ways. The non-binary network is the counterpart of the generalized hypercube; the composite network is topologically different from both indirect networks and direct networks, though it incorporates features of both. The non-binary indirect network (with fast finishing) will take fewer than [log.sub.2]N cycles to route packets, but distances are not given by the Hamming distance. However, to get the same number of stages, the non-binary indirect network uses larger switches than the composite network. At the extreme, to get a single stage of switching, the non-binary indirect network would have to use an N X N switch, while the composite network would do the same with (n + 1) X (n + 1) switches.

STATIC PERFORMANCE ANALYSIS

In the last two sections we have seen a variety of ways in which the interconnect a set of processors together in a multiprocessor configuration, while still conforming to the cube structure. We will now consider briefly how these choices affect the performance of the resultant system and the cost/performance tradeoffs involved. Static performance measures are topology related properties of the network that, while not as satisfactory as stochastic analyses or simulations of network performance, do tell us something about their relative merits. We will now investigate two of these measures for the composite direct/indirect cube structures. The reader is referred to [8] for a queueing analysis of the dynamic performance of this class of networks.

Distribution of Inter-node Packet-times

The packet time between two nodes is defined as the number of cycles needed to get from one node to the other, in the absence of queueing delays. In the case of the direct network, the packet time is equal to the inter-node distance. However, in indirect networks each cycle takes the packet to a different switching element which could be in the same node. This is also the case in composite structures. Even though the distance between nodes is the same in all case (the Hamming distance), packet times are different because of the internal architectures of the nodes.

In binary cube networks, whether direct, composite, or indirect, the maximum inter-node packet time is equal to log N. In indirect networks of size N = [2.sup.n] with fast finish, the number of destinations with a packet time m from a given node is given by

[Mathematical Expression Omitted]

(The number of destinations that have bit i as the last differing bit from the source address is [2.sup.i] (i [is greater than or equal to] 0), and such dimensions are reached in exactly i + 1 cycles.)

In a direct network, we have

[Mathematical Expression Omitted]

(All nodes that have m differing bits from the source address can be reached in m cycles.)

In a composite network, with one direct switching stage of j dimensions starting at dimension i (Figure 12b), the packet time distribution is given by:

[Mathematical Expression Omitted]

The expression for 0 [is less than or equal to] m [is less than or equal to] i is identical in form to that for the indirect network. (Dimensions 0, ..., i - 1 are switched indirectly.) For i [is less than] m [is less than or equal to] i + j, i hops have to be made in the indirect dimensions 0, ..., i - 1. Of the remaining m - i cycles, k (0 [is less than or equal to] k [is less than or equal to] min{m - i, n - (i + j)}) can be in the indirect dimensions i + j, ..., n - 1, and m - i - k in the direct dimensions i, ..., i + j - 1 . For the case of i + j [is less than] m [is less than or equal to] n, at least m - (i + j) cycles have to be in the indirect dimensions i + j, ... , n - 1.

Figure 13 presents the results for a network of size 256, built in two different ways. There are two main effects we wish to study, viz., effect of the number of direct-switched dimensions, and effect of the distribution of direct switching over many stages (instead of a single large direct switching stage). Figure 13a presents the distribution of nodes that can be reached in m cycles (0 [is less than or equal to] m [is less than or equal to] 8) for the switching node in Figure 12b with i = 0 and n = 8. (We will consider the placement of the direct switching stage shortly.) The case of j = 0 corresponds to a full indirect network, and j = 8, the full direct network. As the number of dimensions over which direct switching is done increases, packet times get shorter and the peak of the curve shifts to the left. This results in reduced average packet time between nodes as shown in Figure 13c. However, we notice that direct switching over roughly half the dimensions provides much of the reduction in packet times. This would require a lower logic complexity at each node than full direct switching, as discussed in the previous section.

The issue of the placement of the direct switched dimensions (value of i in Figure 12b) is important, but has a simple answer: packet times reduce as the direct switching stages are moved to the earlier part of the network. (So i should be 0 for shortest packet times.) This is because for any particular placement of the direct stages, there will be destinations that can be reached (1) by purely indirect switching, (2) by purely direct switching, and (3) by a combination of direct and indirect switching. Moving the direct stages to the earlier part of the network has the effect of decreasing (1) and increasing (2) and (3). This results in shorter packet times to more nodes in the system.

Finally, the effect of distribution of the direct switching stages is shown in Figure 13b. Here we have six dimensions over which direct switching is done in a 256 X 256 network, and the cases considered are (a) [j.sub.6] = 6 which is equivalent to switching dimensions 0-5 directly and the rest indirectly, (b) [j.sub.1] = 4, [j.sub.2] = 2, in which the dimensions are switched as follows: 0-3--direct, 4--indirect, 5, 6--direct, and 7--indirect, and (c) [j.sub.1] = 2, [j.sub.2] = 2, [j.sub.3] = 2, in which the dimensions are switched as follows: 0, 1--direct, 2--indirect, 3, 4--direct, 5--indirect, and 6, 7--direct. While it clearly helps to do direct switching over as large an ensemble as possible (leading to increased cost), we see from Figures 13b and 1oc that distribution of direct switching in a reasonable manner does not have a severe effect on the packet times.

Message Traffic Density on Links

The average packet time computation ignored contention for output links and resultant queueing delays. One factor that directly affects these delays is the message densities on these links, i.e., the number of messages that are likely to be on each link, if it could transfer all of them simultaneously. Assuming that the nodes generate messages at the rate of one per cycle, the mean message density on the links is given by

(Average inter-node packet time) X (Number of nodes)/(Number of links in the system).

It should be noted that the number of links, in the case of indirect and composite networks, has to include all internal links in the nodes, because the increased message time in these cases results from traversal over these links. Figure 14 shows the average message density on the links for N = 256 and N = 1024. The density increases with the number of direct switched dimensions, and what this indicates is that the decrease in the number of links is more rapid than the decrease in the inter-node packet time. This will have ramifications on the queueing delays in the network. Finally, for the case of N = 256 and six direct switched dimensions, distributing the direct dimensions increases the inter-node packet times without increasing the number of links and this leads to increased densities on the links.

CONCLUSION

We have characterized in this article the exact structural relationship between the hypercube and multistage indirect n-cube networks, two popular interconnection structures for multiprocessors. We have shown that the multistage networks can be viewed as direct connections of nodes (each node being a processor-memory-switch combination) and that all of the performance difference between the two interconnection schemes can be attributed to the architecture of their nodes. By varying the node architecture it has been shown that there in fact exists a series of structures in between the full direct and indirect schemes, with different cost and performance levels. Static performance analyses indicate that an intermediate architecture (in between a full direct and a full indirect cube) could yield better cost/performance ratios than either of the extremal architectures.

REFERENCES

[1] Abraham, S., and Padmanabhan, K. Performance of the direct binary n-cube network for multiprocessors. IEEE Trans. Comput. C-38, 7 (July 1989), 1000-1011.

[2] Bhuyan, L., and Agrawal, D. Generalized hypercube and hyperbus structures for a computer network. IEEE Trans. Comput. C-33, 4 (Apr. 1984), 323-333.

[3] DeGroot, D. Expanding and contracting SW-Banyan networks. In Proceedings of the 1983 International Conference on Parallel Processing. (Aug. 1983), pp. 19-24.

[4] Gottlieb, A., Grishman, R., Kruskal, C., McAuliffe, K., Rudolph, L., and Snir, M. The NYU ultracomputer. IEEE Trans. Comput. C-32, 2 (Feb. 1983), 175-189.

[5] Gajski, D., Kuck, D., Lawrie, D., and Sameh, A. Cedar--A large scale multiprocessor. In Proceedings of the 1983 International Conference on Parallel Procssing. (Aug. 1983), pp. 524-529.

[6] Gottlieb, A., and Schwartz, J. Networks and algorithms for very-large-scale parallel computation. IEEE Comput. 15, 1 (Jan. 1982), 27-36.

[7] Liu, W., and wu, C.-L. A distributed resource management mechanism for a partionable multiprocessors system. IEEE Trans. Comput. C-37, 2 (Feb. 1988), 201-210.

[8] Padmanabhan, K. The composite binary cube--A family of interconnection networks for multiprocessors. In Proceedings of the Third International Conference on Supercomputing. (June 1989), pp. 62-71.

[9] Padmanabhan, K., and Lawrie, D. A class of redundant path multistage interconnection networks. IEEE Trans. Comput. C-32, 12 (Dec. 1983), 1099-1108.

[10] Pease, III, M. The indirect binary n-cube microprocessor array. IEEE Trans. Comput. C-26, 5 (May 1977), 458-473.

[11] Seitz, C. The cosmic cube. Commun. ACM 28, 1 (Jan. 1985), 22-33.

[12] Siegel, H. Interconnection networks for Large-Scale Parallel Processing. Lexington Books, Lexington, Mass., 1985.

[13] Sullivan, H., and Bashkow, T. A large scale, homogeneous, fully distributed parallel machine, I. In Proceedings of the 4th Symposium on Computer Architecture. (March 1977), pp. 105-117.

[14] Wu, C.-L., and Feng, T.-Y. On a class of multistage interconnection networks. IEEE Trans. Comput. C-29, 8 (Aug. 1980), 694-702.

KRISHNAN PADMANABHAN is a member of Technical Staff in the Computing Systems Research Laboratory at AT&T Bell Laboratories, Murray Hill, New Jersey. He holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign and his research interests lie in the theory and architecture of high-performance computer and switching systems. He was a member of the Cedar Supercomputer Project at Illinois, and the principal archited of the Multi-Array Processor, a wafer-based multiprocessor project at Bell Labs. Author's Present Address: AT&T Bell Laboratories, 600 Mountain Avenue (3D-451), Murray Hill, New Jersey 07974.

Printer friendly Cite/link Email Feedback | |

Title Annotation: | relationship between hypercube and multistage interconnection networks |
---|---|

Author: | Padmanabhan, Krishnan |

Publication: | Communications of the ACM |

Article Type: | technical |

Date: | Jan 1, 1990 |

Words: | 6137 |

Previous Article: | Multicast tree construction in bus-based networks. |

Next Article: | An incremental constraint solver. |

Topics: |