# Efficient Descriptor-Vector Multiplications in Stochastic Automata Networks.

Abstract. This paper examines numerical issues in computing solutions to networks of stochastic automata. It is well-known that when the matrices that represent the automata contain only constant values, the cost of-performing the operation basic to all iterative solution methods, that of matrix-vector multiply, is given by[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [n.sub.i] is the number of states in the ith automaton and N is the number of automata in the network. We introduce the concept of a generalized tensor product and prove a number of lemmas concerning this product. The result of these lemmas allows us to show that this relatively small number of operations is sufficient in many practical cases of interest in which the automata contain functional and not simply constant transitions. Furthermore, we show how the automata should be ordered to achieve this.

Categories and Subject Descriptors: C.4 [Performance of Systems]: modeling techniques; G.1.3 [Numerical Analysis]: Numerical Linear Algebra; G.3 [Probability and Statistics]; I.6.5 [Simulation and Modeling]: Model Development

General Terms: Algorithms, performance, theory

Key Words: Generalized tensor algebra, Markov chains, stochastic automata networks, vector-descriptor multiplication

1. Introduction

The use of Stochastic Automata Networks (SANs) is becoming increasingly important in performance modelling issues related to parallel and distributed computer systems. As such models become increasingly complex, so also does the complexity of the modelling process. Although systems analysts have a number of other modelling strategems at their disposal, it is not unusual to discover that these are inadequate. The use of queuing network modelling is limited by the constraints imposed by assumptions needed to keep the model tractable. The results obtained from the myriad of available approximate solutions are frequently too gross to be meaningful. Simulations can be excessively expensive. This leaves models that are based on Markov chains, but here also, the difficulties are well documented. The size of the state space generated is so large that it effectively prohibits the computation of a solution. This is true whether the Markov chain results from a stochastic Petri net formalism, or from a straightforward Markov chain analyzer.

In many instances, the SAN formalism is an appropriate choice. Parallel and distributed systems are often viewed as collections of components that operate more or less independently, requiring only infrequent interaction such as synchronizing their actions, or operating at different rates depending on the state of parts of the overall system. This is exactly the viewpoint adopted by SANs. The components are modelled as individual stochastic automata that interact with each other. Furthermore, the state space explosion problem associated with Markov chain models is mitigated by the fact that the state transition matrix is not stored, nor even generated. Instead, it is represented by a number of much smaller matrices, one for each of the stochastic automata that constitute the system, and from these all relevent information may be determined without explicitly forming the global matrix. The implication is that a considerable saving in memory is effected by storing the matrix in this fashion. We do not wish to give the impression that we regard SANs as a panacea for all modelling problems, just that there is a niche that it fills among the tools that modellers may use. It is fairly obvious that their memory requirements are minimal; it remains to show that this does not come at the cost of a prohibitive amount of computation time.

Stochastic Automata Networks and the related concept of Stochastic Process Algebras (SPA) have become hot topics of research in recent years. This research has focused on areas such as the development of languages for specifying SANs and their ilk [Hermanns and Rettlebach 1994; Hillston 1995] and on the development of suitable solution methods that can operate on the transition matrix given as a compact SAN descriptor. The development of languages for specifying stochastic process algebras is mainly concerned with structural properties of the nets (compositionality, equivalence, etc.) and with the mapping of these specifications onto Markov chains for the computation of performance measures [Hillston 1995; Baccelli et al. 1995; Buchholz 1995]. Although a SAN may be viewed as a stochastic process algebra, its original purpose was to provide an efficient and convenient methodology for the study of quantitive rather than structural properties of complex systems [Plateau and Atif 1991]. Nevertheless, computational results such as those presented in this paper can also be applied in the context of stochastic process algebras. Indeed, one of the examples we use in this paper is presented as a SAN, as a stochastic Petri net, (SPN), and as an SPA, just to illustrate their similarities.

There are two overriding concerns in the application of any Markovian modelling methodology, viz., memory requirements and computation time. Since these are frequently functions of the number of states, a first approach is to develop techniques that minimize the number of states in the model. In SANs, it is possible to make use of symmetries as well as lumping and various superpositioning of the automata to reduce the computational burden [Atif 1992; Buchholz 1992; Siegle 1992]. Furthermore, in Fourneau and Quessette [1995], structural properties of the Markov chain graph (specificially the occurrence of cycles) are used to compute steady state solutions. We point out that similar, and even more extensive, results have previously been developed in the context of Petri nets and stochastic activity networks. For example, equivalence relations and symmetries are used to decrease the computational burden of obtaining performance indices.(1) In Henderson and Lucic [1993], reduction techniques for Petri nets are used in conjunction with insensitivity results to enable all computations to be performed on a reduced set of markings. In Ciardo and Trivedi [1991], nearly independent subnets are exploited in an iterative procedure in which a global solution is obtained from partial solutions. In Boucherie [1994] and Balbo et al. [1994], product forms are found in Petri nets models, using, in the first, the state space structure and in the second, flow properties. We note in passing that SANs can inherit from these results. For example, in Plateau and Stewart [1996], the results of Boucherie are extended and applied to determine sufficient conditions for a SAN to have a product form. This is accomplished by working on the global balance equations and seeking out sufficient conditions on the functional transition rates to obtain a product form solution. The conditions apply only to SANs with no synchronizing events. The first example considered in this paper, an example of resource sharing, has a product form solution [Plateau and Stewart 1996]. Finally, in Donatelli [1993], it is shown that the tensor structure of the transition matrix may be extracted from a stochastic Petri net, and in Kemper [1995], that this can be used efficiently to work with the reachable state space in an iterative procedure.

Once the number of states has effectively been fixed, the problem of memory and computation time still must be addressed, for the number of states left may still be large. With SANs, the use of a compact descriptor goes a long way to satisfying the first of these, although with the need to keep a minimum of two vectors of length equal to the global number of states, (three, if there are synchronizing events) and considerably more than two for more sophisticated procedures such as the GMRES method, we cannot afford to become complacent about memory requirements. As far as computation time is concerned, since the numerical methods used are iterative, it is important to keep both the number of iterations and the amount of computation per iteration to a minimum. The number of iterations needed to compute the solution to a required accuracy depends on the method chosen. In a previous paper [Stewart et al. 1995], it was shown how projection methods such as Arnoldi and GMRES could be used to substantially reduce the number of iterations needed when compared with the basic power method. Additionally, some results were also given concerning the development of preconditioning strategies that may be used to speed the iterative process even further, but much work still remains to be done in this particular domain. In this paper we concentrate on procedures that allow us to keep the amount of computation per iteration to a minimum. Specifically, we wish to study the cost of performing a matrix-vector multiplication when the matrix is stored as a compact SAN descriptor, since this is a step that is fundamental to all iterative methods and is by far, the major cost per iteration.

It is well known that when the matrices that represent the automata contain only constant values, the cost of performing a matrix-vector multiply is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [n.sub.i] is the number of states in the ith automaton and N is the number of automata in the network. When the automata are not independent, which is always the case, (for, otherwise, the model can be completely decomposed and the exact solution computed from the solution of the individual pieces), the number of multiplications can become much larger. We provide a number of lemmas that allow us to show that this relatively small number of operations is sufficient in many cases in which the automata are not independent, and we show how the automata should be arranged to achieve this.

Our paper is set out as follows: In the next section, we provide a number of basic properties of tensor algebra that we will need throughout the paper. Section 3 presents the descriptor of a SAN and briefly reviews the effects of synchronizing events and functional transition rates. The basic descriptor-vector multiplication procedure is given in Section 4 and an algorithm provided for the nonfunctional case. In Section 5, we introduce the generalized tensor product concept and prove a number of lemmas concerning this product. The result of these lemmas allows us to develop an algorithm for efficiently computing a descriptor-vector product and this is shown in Section 6. This is followed by some numerical experiments in Section 7 and our conclusions in Section 8. Although it is likely to be an important area for future research, we do not in this paper consider parallel or distributed versions of our algorithms. We simply point out a recent paper [Buis and Dyksen 1996], which provides vector and parallel algorithms for simple tensor products (when the automata are independent) and the Ph.D dissertation [Abderezak 1992], in which certain aspects of implementing the SAN software package, PEPS [Plateau and Stewart 1996], in parallel are discussed.

2. Basic Properties of Tensor Algebra

Define two matrices A and B as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The tensor product C = A [cross product] B is given by

(1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

In general, to define the tensor product of two matrices, A of dimensions ([[Rho].sub.1] x [[Gamma].sub.1]) and B of dimensions ([[Rho].sub.2] x [[Gamma].sub.2]), it is convenient to observe that the tensor product matrix has dimensions ([[Rho].sub.1][[Rho].sub.2] x [[Gamma].sub.1][[Gamma].sub.2]) and may be considered as consisting of [[Rho].sub.1][[Gamma].sub.1] blocks each having dimensions ([[Rho].sub.2] x [[Gamma].sub.2]), that is, the dimensions of B. To specify a particular element, it suffices to specify the block in which the element occurs and the position within that block of the element under consideration. Thus, in the above example, the element [c.sub.47] (= [a.sub.22][b.sub.13]) is in the (2, 2) block and at position (1, 3) of that block. The tensor product C = A [cross product] B is defined by assigning the element of C that is in the ([i.sub.2], [j.sub.2]) position of block ([i.sub.1], [j.sub.1]), the value [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. We shall write this as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The tensor sum of two square matrices A and B is defined in terms of tensor products as

(2) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [n.sub.1] is the order of A; [n.sub.2] the order of B; [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] the identity matrix of order [n.sub.i] and "+" represents the usual operation of matrix addition. Since both sides of this operation (matrix addition) must have identical dimensions, it follows that tensor addition is defined for square matrices only. Some important properties of tensor products and additions are

--Associativity:

A [cross product] (B [cross product] C) = (A [cross product] B) [cross product] C and A [direct sum] (B [direct sum] C) = (A [direct sum] B) [direct sum] C.

--Distributivity over (ordinary matrix) addition:

(A + B) [cross product] (C + D) = A [cross product] C + B [cross product] C + A [cross product] D + B [cross product] D.

--Compatibility with (ordinary matrix) multiplication:

(A x B) [cross product] (C x D) = (A [cross product] C) x (B [cross product] D).

--Compatibility with (ordinary matrix) inversion:

[(A [cross product] B).sup.-1] = [A.sup.-1] [cross product] [B.sup.-1].

The associativity property implies that the operations [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are well defined. In particular, observe that the tensor sum of N terms may be written as the (usual) matrix sum of N terms, each term consisting of an N-fold tensor product. We have

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [n.sub.k] is the order of the matrix [A.sup.(k)] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the identity matrix of order [n.sub.k].

The operators [cross product] and [direct sum] are not commutative. However, we shall have need of a pseudo-commutativity property that may be formalized as follows. Let [Sigma] be a permutation of the set of integers [1, 2, ..., N]. Then, there exists a permutation matrix, [P.sub.[Sigma]] of order [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], such that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

A proof of this property may be found in Plateau and Fourneau [1991], wherein [P.sub.[Sigma]] is explicitly given. Further information concerning the properties of tensor algebra may be found in Davio [1981].

3. Stochastic Automata Networks

We consider a stochastic automata network that consists of N individual stochastic automata that act more or less independently of each other. The number of states in the kth automaton shall be denoted by [n.sub.k] for k = 1, 2, ..., N. Most often, one's goal is the computation of the stationary probability distribution, [Pi], or transient distributions at arbitrary times t, [Pi](t), of the overall N-dimensional system.

Throughout this paper, we present results on the basis of continuous-time SANs, although the results are equally valid in the context of discrete-time SANs. Given N independent stochastic automata, [A.sup.(1)], [A.sup.(2)], ..., [A.sup.(N)], with associated infinitesimal generators, [Q.sup.(1)], [Q.sup.(2)], ..., [Q.sup.(N)], and probability distributions [[Pi].sup.(1)](t), [[Pi].sup.(2)](t), ..., [[Pi].sup.(N)](t) at time t, the infinitesimal generator of the N-dimensional system, which we shall refer to as the global generator, is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The probability distribution of the N-dimensional system at time t is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Although such systems may exist, the more usual case occurs when the transitions of one automaton may depend on the state of a second. There are basically two ways in which stochastic automata interact:

(1) The rate at which a transition may occur in one automaton may be a function of the state of other automata. Such transitions are called functional transitions.

(2) A transition in one automaton may force a transition to occur in one or more other automata. We allow for both the possibility of a master/slave relationship, in which an action in one automaton (the master) actually occasions a transition in one or more other automata (the slaves), and for the case of a rendez-vous in which the presence (or absence) of two or more automata in designated states causes (or prevents) transitions to occur. We refer to such transitions collectively under the name of synchronizing transitions. Synchronizing transitions may also be functional.

The elements in the matrix representation of any single stochastic automaton are either constants, that is, nonnegative real numbers, or functions from the global state space to the nonnegative reals. Transition rates that depend only on the state of the automaton itself, and not on the state of any other automaton, are to all intents and purposes, constant transition rates. A synchronizing transition may be either functional or constant. In any given automaton, transitions that are not synchronizing transitions are said to be local transitions.

3.1. THE EFFECT OF SYNCHRONIZING EVENTS. Let us denote by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], k = 1, 2, ..., N, the matrix consisting only of the transitions that are local to [A.sup.(k)]. Then, the part of the global infinitesimal generator that consists uniquely of local transitions may be obtained by forming the tensor sum of the matrices [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. As a general rule, it is shown in Plateau [1985] that stochastic automata networks may always be treated by separating out the local transitions, handling these in the usual fashion by means of a tensor sum and then incorporating the sum of two additional tensor products per synchronizing event. Furthermore, since tensor sums are defined in terms of the (usual) matrix sum (of N terms) of tensor products, the infinitesimal generator of a system containing N stochastic automata with E synchronizing events (and no functional transition rates) may be written as

(3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

This formula is referred to as the descriptor of the stochastic automata network. It is important to note that, although Q is an infinitesimal generator, the individual matrices [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] need not be. For example, such matrices may consist of a single positive element. Since we shall treat the matrices [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] as purely algebraic quantities, their relationship as stochastic entities is not needed. Notice also that the state space represented in the matrix Q may contain unreachable states. For more information on the structure and properties of these matrices, the interested reader is invited to consult Stewart et al. [1995].

The computational burden imposed by synchronizing events is two-fold. First, the number of terms in the descriptor is increased--two for each synchronizing event. A second and even greater burden is that the simple form of the solution, as a tensor product of independent solutions, no longer holds. Although we have been successful in writing the descriptor in a compact form as the sum of tensor products, the solution is not simply the sum of the vectors computed as the tensor product of the solutions of the individual [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This is a direct result of the fact that the automata are not independent. Thus, this second computational burden is not unique to synchronizing transitions but also arises with functional transitions.

3.2. THE EFFECT OF FUNCTIONAL TRANSITION RATES. A moment's reflection should convince the reader that the introduction of functional transition rates has no effect on the structure of the global transition rate matrix other than when functions evaluate to zero in which case a degenerate form of the original structure is obtained. So, although the effect of the dependent interactions among the individual automata prevents us from writing the solution in the simple form as a tensor product of individual solutions, it may still be possible to profit from the fact that the nonzero structure is unchanged. This is the concept behind the extended tensor algebraic approach [Plateau and Fourneau 1991]. The descriptor is still written as in Eq. (3), but now the elements of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] may be functions. This means that it is necessary to track elements that are functions and to substitute (or recompute) the appropriate numerical value each time the functional rate is needed. In this paper we shall show how to efficiently compute the product of a vector with a SAN descriptor using the extended (generalized) tensor algebra concepts.

3.3. EXAMPLES. We now introduce two models that we will use for purposes of illustration. The first is a model of resource sharing that includes functional transitions. The second is a finite queuing network model with both functional transitions and synchronizing events.

3.3.1. A Model of Resource Sharing. In this model, N distinguishable processes share a certain resource. Each of these processes alternates between a sleeping state and a resource using state. However, the number of processes that may concurrently use the resource is limited to P where 1 [is less than or equal to] P [is less than or equal to] N so that when a process wishing to move from the sleeping state to the resource using state finds P processes already using the resource, that process fails to access the resource and returns to the sleeping state. Notice that when P = 1 this model reduces to the usual mutual exclusion problem. When P = N all of the processes are independent. Let [[Lambda].sup.(i)] be the rate at which process i awakes from the sleeping state wishing to access the resource, and let [[Mu].sup.(i)] be the rate at which this same process releases the resource when it has possession of it.

In our SAN representation, each process is modelled by a two state automaton [A.sup.(i)], the two states being sleeping and using. We shall let s[A.sup.(i)] denote the current state of automaton [A.sup.(i)]. Also, we introduce the function

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [Delta](b) is an integer function that has the value 1 if the Boolean b is true, and the value 0 otherwise. Thus, the function f has the value 1 when access is permitted to the resource and has the value 0 otherwise. Figure 1 provides a graphical illustration of this model.

[Figure 1 ILLUSTRATION OMITTED]

The local transition matrix for automaton [A.sup.(i)] is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and the overall descriptor for the model is

where [[cross product].sub.g] denotes the generalized tensor operator, a precise definition of which is given in Section 5.

The SAN product state space for this model is of size [2.sup.N]. Notice that when P = 1, the reachable state space is of size N + 1, which is considerably smaller than the product state space, while when P = N the reachable state space is the entire product state space. Other values of P give rise to intermediate cases.

To illustrate the connections between SANs, Stochastic Petri Nets and Stochastic Process Algebras, we now show how this model is represented in the two latter approaches. Using a simple, colorless SPN, our resource sharing model may be represented graphically as illustrated in Figure 2.

[Figure 2 ILLUSTRATION OMITTED]

This same system may be modelled as an SPA as follows: The processes continuously perform the actions Acquire (the resource) then Release the resource. The Acquire action is possible if and only if there is at least one of the P resources left.

Shared Resource Model:=([Process.sub.1][parallel][Process.sub.2][parallel] ... [[parallel][Process.sub.N].sub.S] ([R.sub.1][parallel][R.sub.2][parallel] ... [parallel][R.sub.P]); with S : = {Acquire, Release}

[Process.sub.j] := (Acquire, [[Lambda].sup.(j)]). (Release, [[Mu].sup.(j)).[Process.sub.j]; for all j = 1, 2, ..., N

[R.sub.p] := (Acquire, -).[R.sub.p-1]

[R.sub.j] :=(Acquire, -).[R.sub.j-1] + (Release, -).[R.sub.j+1]; for all j = 1, 2, ..., P - 1

[R.sub.0 := (Release, -).[R.sub.1];

3.3.2. A Queuing Network with Blocking and Priority Service. The second model we shall use is an open queuing network of three finite capacity queues and two customer classes. Class 1 customers arrive from the exterior to queue 1 according to a Poisson process with rate [[Lambda].sub.1]. Arriving customers are lost if they arrive and find the buffer full. Similarly, class 2 customers arrive from outside the network to queue 2, also according to a Poisson process, but this time at rate [[Lambda].sub.2] and they also are lost if the buffer at queue 2 is full. The servers at queues 1 and 2 provide exponential service at rates [[Mu].sub.1] and [[Mu].sub.2], respectively. Customers that have been served at either of these queues try to join queue 3. If queue 3 is full, class 1 customers are blocked (blocking after service) and the server at queue 1 must halt. This server cannot begin to serve another customer until a slot becomes available in the buffer of queue 3 and the blocked customer is transferred. On the other hand, when a (class 2) customer has been served at queue 2 and finds the buffer at queue 3 full, that customer is lost. Queue 3 provides exponential service at rate [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] to class 1 customers and at rate [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] to class 2 customers. It is the only queue to serve both classes. In this queue, class 1 customers have preemptive priority over class 2 customers. Customers departing after service at queue 3 leave the network. We shall let [C.sub.k] - 1, k = 1, 2, 3 denote the finite buffer capacity at queue k.

Queues 1 and 2 can each be represented by a single automaton [A.sup.(1)] and [A.sup.(2)], respectively) with a one-to-one correspondance between the number of customers in the queue and the state of the associated automaton. Queue 3 requires two automata for its representation; the first, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], provides the number of class 1 customers and the second, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], the number of class 2 customers present in queue 3. Figure 3 illustrates this model.

[Figure 3 ILLUSTRATION OMITTED]

This SAN has two synchronizing events: the first corresponds to the transfer of a class 1 customer from queue 1 to queue 3 and the second, the transfer of a class 2 customer from queue 2 to queue 3. These are synchronizing events since a change of state in automaton [A.sup.(1)] or [A.sup.(2)] occasioned by the departure of a customer, must be synchronized with a corresponding change in automaton [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] or [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], representing the arrival of that customer to queue 3. We shall denote these synchronizing events as [s.sub.1] and [s.sub.2], respectively. In addition to these synchronizing events, this SAN required two functions. They are:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The function f has the value 0 when queue 3 is full and the value 1 otherwise, while the function g has the value 0 when a class 1 customer is present in queue 3, thereby preventing a class 2 customer in this queue from receiving service. It has the value 1, otherwise.

Since there are two synchronizing events, each automaton will give rise to five separate matrices in our representation. For each automaton k, we will have a matrix of local transitions, denoted by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; a matrix corresponding to each of the two synchronizing events, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and a diagonal corrector matrix for each synchronizing event, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. In these last two matrices, nonzero elements can appear only along the diagonal; they are defined in such a way as to make [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) generator matrices (row sums equal to zero). The five matrices for each of the four automata in this SAN are as follows (where we use [I.sub.m], to denote the identity matrix of order m).

For [A.sup.(1)]:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

For [A.sup.(2)]:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

For [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

For [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The overall descriptor for this model is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where the generalized tensor sum and the four generalized tensor products are taken over the index set {1, 2, [3.sub.1], and [3.sub.2]}. The reachable state space of the SAN is of size [C.sub.1] x [C.sub.2] x [C.sub.3]([C.sub.3] + 1)/2 whereas the complete SAN product state space has size [C.sub.1] x [C.sub.2] x [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Finally, we would like to draw our readers' attention to the sparsity of the matrices presented above.

3.4. MODELLING WITH SANs. Before proceeding to a discussion of the mathematical properties of tensor products and their benefits for solving large SANs, there are a number of points concerning the modelling process itself that should be mentioned. When modelling with SANs, one must be aware of: (a) the possibility of introducing many unreachable states and (b) the fact that the use of functions is costly. Our goal in this paper is to present theoretical results on tensor products that allow us to reduce the cost of using functions. The whole approach is, in fact, sensitive to the choice of automata; there are many possibilities and our experience suggests the following approach. When modelling a system, the easiest way is to build conceptually simple automata that coincide with the vision the modeler has of the actual system. The use of functions may simplify the generation of complex state spaces and permit the efficient implementation of predicates that authorize the firing of transitions. Then, in the interests of numerical efficiency, it is possible, as shown in Fernandes et al. [1996], to automatically group automata in order to obtain the same model with fewer but larger automata. Collapsing the SAN into one group (the extreme case) is equivalent to building the complete transition matrix of the underlying Markov process. This grouping process has potentially three positive effects: it eliminates (some) unreachable states, it eliminates (some) functions, and it eliminates (some) synchronizing events from the model. In the extreme case, it completely eliminates all three. It has a negative effect: the memory needed to store the grouped SAN descriptor is increased. There is clearly a memory/time trade-off, as shown in Fernandes et al. [1996], when using a grouped tensor product approach. This trade-off can only be resolved, in a practical sense, by testing each option in terms of the cost of the appropriately sized vector-matrix products. Whether automata are grouped or not, the basic structure of the vector descriptor product, that is, the product of a vector and a sum of tensor products, is unchanged and hence the results presented in this paper remain valid and useful.

4. Computing Probability Distributions

When the global infinitesimal generator of a SAN is available only in the form of a SAN descriptor, Eq. (3), the numerical methods most suitable to obtaining probability distributions are iterative [Stewart et al. 1995]. Thus, the underlying operation, whether we wish to compute the stationary distribution, or the transient solution at any time t, is the product of a vector with a matrix. Since

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

the basic operation is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where, for notational convenience, we have removed the subscripts on the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

4.1. MULTIPLICATION WITH A VECTOR: THE NONFUNCTIONAL CASE. The following theorem holds when the transition rate matrices corresponding to the stochastic automata contain only constant terms.

THEOREM 4.1.1. The product

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [Q.sup.(i)], of order [n.sub.i], contains only constant terms and x is a real vector of length [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], may be computed in [[Rho].sub.N] multiplications, where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

To compare this with the number needed when advantage is not taken of the special structure resulting from the tensor product, observe that to naively expand [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] requires [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] multiplications and that this same number is required each time the product [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is formed. A proof of the theorem is given in Plateau and Fourneau [1991] and is based on the following tensor product decomposition.

(4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The individual terms within the product, (i.e., the [I.sub.1:i - 1] [cross product] [Q.sup.(i)] [cross product] [I.sub.i + 1:N]) are referred to as the normal factors of the decomposition. It can be shown (see Eq. (9)), that these normal factors commute. Notice that we have used [I.sub.1:i - 1] to represent an identity matrix of size [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [I.sub.i + 1:N] to represent one of size [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. In general, we shall let [I.sub.i:j] [equivalent] [I.sub.j:i] denote an identity matrix of size [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. The following algorithm implements this multiplication:

Algorithm: Multiplication with a Tensor Product, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

1. Initialize: nleft = [n.sub.1] [n.sub.2] ... [n.sub.N - 1]; nright = 1.

2. For i = N, ..., 2, 1 do

* base = 0; jump = [n.sub.i] x nright

* For k = 1, 2, ..., nleft do

* For j = 1, 2, ..., nright do

* index = base + j

* For l = 1, 2, ..., [n.sub.i] do

* [z.sub.l] = [x.sub.index]; index = index + nright

* Multiply: z' = z x [Q.sup.(i)]

* index = base + j

* For l = 1, 2, ..., [n.sub.i] do

* [x.sub.index] = [z'.sub.l]; index = index + nright

* base = base + jump

* nleft = nleft/[n.sub.i-1]

* nright = nright x [n.sub.i]

During the execution of this algorithm, the variables nleft and nright assume respectively, the size of all left and right identity matrices in the normal factors of Eq. (4). As the normal factors commute, we have chosen, for reasons that will become clear later, to perform the multiplications in the order from N to 1. Furthermore, if we let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], denote the identity matrix of size [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], then it may be observed that this algorithm operates by implicitly incorporating permutations so that each term in the product becomes

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

that is, a block diagonal matrix whose [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] diagonal blocks are identically equal to [Q.sup.(i)]. In fact, it is actually the permuted elements of the vector x that are used, (the value of index in the innermost loops (the l-loops) is index = (k - 1) x jump + j + (l - 1) x nright). The manner in which the elements of the [Q.sup.(i)] are stored is not explicitly defined so that any storage scheme, including a compact format suitable for sparse matrices, may be used.

As the transition matrix of each automata is often sparse, it is interesting to see how the previous result changes when the [Q.sup.(i)] are stored in a sparse format. Let [z.sub.i] denote the number of nonzero entries in [Q.sup.(i)]. Then the multiply operation of the previous algorithm, that is, the operation inherent in the statement

* Multiply: z' = z x [Q.sup.(i)],

has complexity of order [z.sub.i], so, taking the nested loops into account, the overall operation has a complexity of the order of

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

To compare this complexity with a global sparse format, the number of nonzero entries of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. It is hard in general to compare the two numbers [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Note however that if all [z.sub.i] = [N.sup.1/(N - 1)][n.sub.i], both orders of complexity are equal. If all matrices are sparse (e.g., below this bound), the sparse method is probably better in terms of computation time. This remark is valid for a single tensor product. For a descriptor which is a sum of tensor products and where functions in the matrices [Q.sup.(i)] may evaluate to zero, it is hard to compute, a priori, the order of complexity of each operation. In this case, insight can only be obtained from numerical experiments.

We are now ready to examine the effect of introducing functional rates. The savings made in the computation of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are due to the fact that once a product is formed, it may be used in several places without having to re-do the multiplication. With functional rates, the elements in the matrices may change according to their context so that this same savings is not always possible. It was previously observed in Stewart et al. [1995] that in the case of two automata A and B with matrix representations A and B[A] respectively, where to anticipate the introduction of some new notation, B[A] indicates that the matrix B contains elements that may be a function of the state of A, the number of multiplications needed to premultiply A [cross product] B[A] with a vector x remains identical to the nonfunctional case and that exactly the same algorithm could be used. Furthermore, it was also observed that when A contains functional transition rates, but not B, the computation could be rearranged to compute x(A[B] [cross product] B) in the same small number of multiplications as the nonfunctional case. When both contained functional transition rates, no computational savings appeared to be possible. Furthermore, it was speculated that in SANs with many functional automata, there exists optimal permutations of the automata for minimizing the computational task of forming the product of a vector and a SAN descriptor. However, this presupposes that it is, in fact, possible to permute the terms in generalized tensor products.

5. Generalized Tensor Products

We now turn our attention to some properties relating to Generalized Tensor Products (GTPs) as opposed to Ordinary Tensor Products (OTP). We assume throughout that all matrices are square. As indicated in the previous section, B[A] will indicate that the matrix B may contain transitions that are a function of the state of the automaton A, or more generally, [A.sup.(m)], [[A.sup.(1)], [A.sup.(2)], ..., [A.sup.(m - 1)]] will indicate that the matrix [A.sup.(m)] may contain elements that are a function of one or more of the states of the automata [A.sup.(1)], [A.sup.(2)], ..., [A.sup.(m - 1)] We shall use the notation [[cross product].sub.g] to denote a generalized tensor product. Thus, A [[cross product].sub.g] B[A] denotes the generalized tensor product of the matrix A with the functional matrix B[A] and we have

(5) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where B([a.sub.k]) represents the matrix B when its functional entries are evaluated with the argument [a.sub.k], k = 1, 2, ..., [n.sub.a], the [a.sub.i] being the states of automaton A. Also,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and [a.sub.ij]([b.sub.k]) is the value of the ijth element of the matrix A when its functional entries are evaluated with the argument [b.sub.k], k = 1, 2, ..., [n.sub.b]. Finally, when both automata are functional we have

(6) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

For A[B] [[cross product].sub.g] B[A], the generic entry (l, k) within block (i, j) is [a.sub.ij]([b.sub.l]) x [b.sub.lk]([a.sub.i]).

We are now in a position to determine whether the basic properties of tensor products and additions described in Section 2 also hold in the generalized case.

LEMMA 5.1 (GTP: ASSOCIATIVITY)

(A[B, C] [[cross product].sub.g] B[A, C]) [[cross product].sub.g] C[A, B] = A[B, C] [[cross product].sub.g] (B[A, C] [[cross product].sub.g] C[A, B]).

PROOF. We shall evaluate the left-hand side first. Referring back to Eq. (6), it is evident that D[C] [equivalent] A[B, C] [[cross product].sub.g] B[A, C] is a block matrix whose ijth block is equal to

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Each element on the kth row of B([a.sub.i], C) is thus multiplied by [a.sub.ij]([b.sub.k], C) so that the klth element of [D.sub.ij][C] is

(7) [a.sub.ij]([b.sub.k], C)[b.sub.kl]([a.sub.i], C).

Now consider the effect of incorporating the third term into the generalized tensor product. In computing D[C] [[cross product].sub.g] C[A, B], each of the elements in each of the blocks [D.sub.ij][C] multiplies the matrix C[A, B]; in particular the klth element (given in Eq. (7)) multiplies the matrix C([a.sub.i], [b.sub.k]) giving the ([n.sub.c] x [n.sub.c]) block [a.sub.ij]([b.sub.k], C)[b.sub.kl]([a.sub.i], C)C([a.sub.i], [b.sub.k]). The mnth element of this final block is

[a.sub.ij]([b.sub.k], [c.sub.m])[b.sub.kl]([a.sub.i], [c.sub.m])[c.sub.mn]([a.sub.i], [b.sub.k])

To show that we get the same result from the right-hand side, we proceed in a similar fashion. Let Z[A] [equivalent] B[A, C] [cross product] C[A, B]. Then Z[A] is block matrix whose klth block is equal to

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The mnth element of [Z.sub.kl][A] is

[b.sub.kl](A, [c.sub.m])[c.sub.mn](A, [b.sub.k]).

When forming A[B, C] [[cross product].sub.g] Z[A], the product of each of the elements of A[B, C] and the matrix Z[A] is formed. In particular, the ijth element is multiplied with Z[A] to give the ([n.sub.b][n.sub.c] x [n.sub.b][n.sub.c]) block

[a.sub.ij](B, C) Z[A]

When the mnth element of the klth block of Z[A] is considered, this gives

[a.sub.ij]([b.sub.k], [c.sub.m])[b.sub.kl]([a.sub.i], [c.sub.m])[c.sub.mn]([a.sub.i], [b.sub.k])

as before. []

LEMMA 5.2 (GTP: DISTRIBUTIVITY OVER ADDITION)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF. As for the previous lemma, the proof of Lemma 5.2 is by simple algebraic identification. []

As was the case in Section 2 for ordinary tensor products, compatibility with multiplication generally does not hold for the generalized tensor product either. However, there exist three degenerate compatibility forms when some of the factors are identity matrices and not all of the factors have functional entries. These are given next.

LEMMA 5.3 (GTP: COMPATIBILITY OVER MULTIPLICATION: I--TWO FACTORS)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF. Since we assume that A[C] and B[C] are square matrices, their dimensions must be identical for the product A [C] x B[C] to exist. Thus, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], all have the same (square) dimensions. Observe that all three of these matrices have block elements that are in fact diagonal matrices of order [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. The ijth block element of the matrix on the left-hand side is given by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. The corresponding block on the right-hand side is formed from the product of the ith block-row of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] with the jth block-column of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This is given by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] as required. []

Remark 5.4. The same property holds analogously for the case

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

LEMMA 5.5 (GTP: COMPATIBILITY OVER MULTIPLICATION: II--TWO FACTORS)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF. The proof follows immediately on observing that the right-hand side

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

may be written as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

It is worth noticing that factors containing A and B[A] are reversed when comparing the right-hand side and left-hand side.

LEMMA 5.6 (GTP: COMPATIBILITY OVER MULTIPLICATION: III--TWO FACTORS)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF. The proof of this lemma follows immediately from the fact that

(8) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Notice from Eq. (8) that the right-hand matrix is a block diagonal matrix whose diagonal blocks are all equal to B and the left-hand matrix is formed from blocks that are diagonal and the same size as B. It therefore follows that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

We wish to point out that this lemma still holds if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is replaced by any constant matrix.

Remark 5.7. A direct consequence of these two lemmas is that for matrix with constant entries, the following product is commutative (see also Davio [1981]). This property does not hold for matrices with functional entries.

(9) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The next lemma is a generalization of Lemma 5.3 and is particularly useful in treating SANs.

LEMMA 5.8 (GTP: COMPATIBILITY OVER MULTIPLICATION--MANY FACTORS)

(10) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF. We shall prove this lemma by induction.

Define [H.sup.(1)] = [A.sup.(1)] and [H.sup.(2)] = [H.sup.(1)] [[cross product].sub.g] [A.sup.(2)][[A.sup.(1)]]. Then, from Lemma 5.5,

(11) [H.sup.(2)] = ([I.sub.1:1] [[cross product].sub.g] [A.sup.(2)][[A.sup.(1)]]) x ([H.sup.(1)] [[cross product].sub.g] [I.sub.2:2]) = ([I.sub.1:1] [[cross product].sub.g] [A.sup.(2)][[A.sup.(1)]]) x ([A.sup.(1)] [[cross product].sub.g] [I.sub.2:2]),

where [I.sub.1:1] is the identity matrix of size [n.sub.1] and [I.sub.2:2], the identity matrix of size [n.sub.2] with [n.sub.1] and [n.sub.2] denoting the number of states in automata [A.sup.(1)] and [A.sup.(2)], respectively. [H.sup.(2)] is a matrix of size [n.sub.1][n.sub.2]. Before proceeding to the general induction step, let us consider [H.sup.(3)]. We have

(12) [H.sup.(3)] = [H.sup.(2)] [[cross product].sub.g] [A.sup.(3)][[A.sup.(1)], [A.sup.(2)]].

Applying Lemma 5.5 to the right-hand side of Eq. (12) gives

[H.sup.(3)] = ([I.sub.1:2] [[cross product].sub.g] [A.sup.(3)][[A.sup.(1)], [A.sup.(2)]]) x ([H.sup.(2)] [[cross product].sub.g] [I.sub.3:3]).

We would like to substitute for [H.sup.(2)] from Eq. (11), but we must be careful, for [I.sub.2:2] in Eq. (11) is the identity matrix of order [n.sub.2]. Substituting we find

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Now applying Lemmas 5.1 and 5.3 we obtain

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

which we can write in the form of the lemma statement as

[H.sup.(3)] = ([I.sub.1:2] [[cross product].sub.g] [A.sup.(3)][[A.sup.(1)], [A.sup.(2)]]) x ([I.sub.1:1] [[cross product].sub.g] [A.sup.(2)][[A.sup.(1)]] [[cross product].sub.g] [I.sub.3:3] x ([A.sup.(1)] [[cross product].sub.g] [I.sub.2:3].

Our inductive approach to proving this lemma should now be clear. The basic step has already been established. For our inductive assumption, we define

[H.sup.(m)] [equivalent] [A.sup.(1)] [[cross product].sub.g] [A.sup.(2)][A.sup.(1)]] [[cross product].sub.g] [A.sup.(3)][[A.sup.(1)], [A.sup.(2)]] [[cross product].sub.g] ... [[cross product].sub.g] [A.sup.(m)][[A.sup.(1)], ..., [A.sup.(m-1)]]

and assume that the relation given in the statement of the lemma is true. We now prove that the relation is true for [H.sup.(m+1)]. We have

[H.sup.(m+1)] = [H.sup.(m)] [[cross product].sub.g] [A.sup.(m+1)][[A.sup.(1)], [A.sup.(2)], ..., [A.sup.(m)]].

Applying Lemma 5.5 gives

[H.sup.(m+1)] = ([I.sub.1:m] [[cross product].sub.g] [A.sup.(m+1)][[A.sup.(1)], [A.sup.(2)], ..., [A.sup.(m)]]) x ([H.sup.(m)] [[cross product].sub.g] [I.sub.m+1:m+1)].

Substituting for [H.sup.(m)] from Eq. (10) and repeatedly applying Lemmas 5.3 and 5.1 gives

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The proof of the lemma follows immediately given that the terms of the form [I.sub.i+1:m] for i = 1, 2, ..., m - 1 are identity matrices of size [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

Remark 5.9. Due to the existence of Lemma 5.6, the same property holds for

(13) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In Lemma 5.8, only one automaton can depend on all the (m - 1) other automata, only one can depend on at most (m - 2) other automata and so on. One automaton must be independent of all the others. This provides a means by which the individual factors on the right-hand side of Eq. (10) must be ranked; that is, according to the automata on which they may depend. A given automaton may actually depend on a subset of the automata in its parameter list.

Now consider an arbitrary permutation of the factors on the left-hand side of Eq. (10). We have on the left-hand side,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Indeed Eq. (13) is just such a permutation. No matter which permutation is chosen, the right-hand side remains essentially identical in the sense that the terms are always ordered from largest to smallest set of possible dependencies. The only change results from the manner in which each normal factor is written. Each must have the form

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Consider, as an example, the following three automata, [A.sup.(1)] which is completely independent, [A.sup.(2)] which depends on [A.sup.(3)], and [A.sup.(3)] which depends on [A.sup.(1)]. Notice that the possible dependency set of the automaton [A.sup.(2)] is not only [A.sup.(3)], but [A.sup.(3)] and [A.sup.(1)]. We have, for example, the following left-hand side arrangement:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

while a different rearrangement gives:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

On the right-hand side, the normal factor containing [A.sup.(2)][[A.sup.(3)]] [equivalent] [A.sup.(2)] [[A.sup.(1)], [A.sup.(3)]] is always first while that containing [A.sup.(1)], the independent term, is always last.

LEMMA 5.10 (GTP: PSEUDO-COMMUTATIVITY). Let [Sigma] be a permutation of the integers [1, 2, ..., N], then there exists a permutation matrix, [P.sub.[Sigma]] of order [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], such that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF. To prove this lemma, we need to address the entries of a matrix X of order [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in different ways. X may be considered as a matrix formed from [n.sub.1] square blocks each of size [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; each block may in turn be viewed as formed from [n.sub.2] square blocks each of size [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and so on. To refer to an individual element of X, we use the notation

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [i.sub.m], [j.sub.m] [element of] [1, ..., [n.sub.m]] and ([i.sub.m], [j.sub.m]) indicates the position of the element in the mth embedded block. Let [Sigma] be a permutation of the integer [1, 2, ..., N]. Then, the individual elements of X may also be referenced with respect to this permutation and its corresponding embedded block structure. We have

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

where [i.sub.m], [j.sub.m] [element of] [1, ..., [n.sub.[Sigma](m)]].

Nothing prevents us from mixing these two possibilities. Thus, we may refer to the element that belongs to row ([i.sub.1], ..., [i.sub.N]) and to column [([j.sub.1], ..., [j.sub.N]).sub.[Sigma]], or vice versa. Let [P.sub.[Sigma]] be the permutation matrix whose elements are given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The elements of its transpose, [P.sup.T], are given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

It follows that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] permutes the columns of X and P [sub.[Sigma]]X permutes the rows of X. Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Then the elements of Y are given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

which leads directly to

(14) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Notice that, from the definition of a generalized tensor product, the elements of the matrix

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

are given by

(15) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Similarly, we may write the elements of

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Since [Sigma](k) is a bijection of [1, 2, ..., N], a simple change of variable allows us to write the elements of X as

(16) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The proof of the lemma follows directly from Eqs. (14), (15), and (16). []

THEOREM 5.11 (GTP: ALGORITHM). The multiplication

x x ([A.sup.(1)] [[cross product].sub.g] [A.sup.(2)][[A.sup.(1)]] [[cross product].sub.g] [A.sup.(3)][[A.sup.(1)], [A.sup.(2)]] [[cross product].sub.g] ... [[cross product].sub.g] [A.sup.(N)][[A.sup.(1)], ..., [A.sup.(N-1)]])

where x is a real vector of length [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] may be computed in O([Rho.sub.N]) multiplications, where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

by the algorithm described on the next page.

Algorithm: Vector Multiplication with a Generalized Tensor Product

x ([A.sup.(1)] [[cross product].sub.g] [A.sup.(2)][[A.sup.(1)]] [[cross product].sub.g] [A.sup.(3)][[A.sup.(1)], [A.sup.(2)]] [[cross product].sub.g] ... [[cross product].sub.g] [A.sup.(N)][[A.sup.(1)], ..., [A.sup.(N-1)]])

1. Initialize: nleft = [n.sub.1][n.sub.2] ... [n.sub.N-1]; nright = 1.

2. For i = N, ..., 2, 1 do

* base = 0; jump = [n.sub.i] x nright

* For k = 1,2, ..., nleft do

?? For j= 1, 2, ..., i - 1 do

* [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

?? For j = 1, 2, ..., nright do

* index = base + j

* For l = 1, 2, ..., [n.sub.i] do [multiplied by] [z.sub.l] = [x.sub.index]; index = index + nright

* [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

* index = base + j

* For l = 1, 2, ..., [n.sub.i] do [multiplied by] [x.sub.index] = [z'.sub.l]; index = index + nright

?? base = base + jump

* nleft = nleft/[n.sub.i-1]

* nright = nright x [n.sub.i]

PROOF. Observe that the innermost loop (which is, in fact, implicitly rather than explicitly defined in the algorithm) involves computing the product of a vector z, of length [n.sub.i], with a matrix of size [n.sub.i] x [n.sub.i]. This requires [([n.sub.i]).sup.2] multiplications, assuming that the matrices are full. This operation lies within consecutively nested i, k and j loops. Since the k and j loops together involve [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] iterations, and the outermost i loop is executed N times, the total operation count is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

This complexity result improves upon previous results given in the paper by Fourneau et al. [1991]. For example, given a tensor product with two terms, we have [A.sup.(1)] [[cross product].sub.g] [A.sup.(2)][[A.sup.(1)]] with a complexity of [n.sub.1][n.sub.2]([n.sub.1] + [n.sub.2]) given by Theorem 5.11, whereas previously using the procedures given in Fourneau et al. [1991] the complexity is [n.sub.1][n.sub.2]([n.sub.1] + [n.sub.1][n.sub.2]). Naturally, this generalizes to the case of m terms. Furthermore, it is immediately apparent that when the matrices are not full, but possess a special structure such as tridiagonal, contain only one nonzero row or column, etc., or are sparse, then this may be taken into account and the complexity reduced even further.

This algorithm differs from the previous (nonfunctional) algorithm (Theorem 4.1.1) in the computation of the state indices at the beginning of the k-loop and in the evaluation of functions. However, this will not affect the order of the complexity of the algorithm. Indeed, once incorporated into an iterative solution method, it is possible that most of the computation involved in generating the state indices could be performed only once during an initialization phase.

What is not immediately apparent from the algorithm as described is the fact that in OTP the normal factors commute and the order in which the innermost multiplication is carried out may be arbitrary. In GTP, however, this freedom of choice does not exist; the order is prescribed. As indicated by the right-hand side of Eq. (10), each automata [A.sup.(i)] is represexlted by a factor of the form

[I.sub.1:i-1] [[cross product].sub.g] [A.sup.(i)] [[A.sup.(1)], ..., [A.sup.(i-1)]] [[cross product].sub.g] [I.sub.i + 1:m]

which must be processed before any factor of its arguments [[A.sup.(1)], ..., [A.sup.(i-1)]] is processed.

We would like to point out that although Theorem 5.11 allows us to reorganize the terms of the left-hand side of Eq. (10), in any way we wish, the advantage of leaving them in the form given by Eq. (10) is precisely that the computation of the state indices can be moved outside the innermost j-loop of the algorithm. A different rearrangement would require more extensive index computation within the innermost j-loop. Note that the order given by the left-hand side of Eq. (10) is precisely the opposite of that given by its right-hand side.

We now introduce one final lemma that allows us to prove a theorem (Theorem 5.13) concerning the reduction in the cost of a vector-descriptor multiplication in the case when the functional dependencies among the automata do not satisfy the constraints given above and in particular, when the dependencies are cyclic.

LEMMA 5.12 (GTP: DECOMPOSABILITY INTO OTP). Let [l.sub.k](A) denote the matrix obtained by setting all elements of A to zero except those that lie on the kth row which are left unchanged. Then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Thus, we may write a generalized tensor product as a sum of ordinary tensor products.

PROOF. From Eq. (5)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

from which it follows that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Then, algebraically,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] []

A term [[cross product].sub.g] [A.sub.(i)] [[A.sup.(1)], ..., [A.sup.(N)]] involved in the descriptor of a SAN is said to contain a functional dependency cycle if it contains a subset of automata [A.sup.(P)], [A.sup.(p+1)], ..., [A.sup.(p+c)], c [is greater than or equal to] 1, with the property that the matrix representation of [A.sup.(p+i)] contains transitions that are a function of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], for 0 [is less than or equal to] i [is less than or equal to] c. For example, a SAN with two automata A and B contains a term with a functional dependency cycle if and only if the matrix representations are such that A is a function of B, (A[B]), B is a function of A, (B[A]), and A[B] [cross product] B[A] occurs in the descriptor. Let G denote a graph whose nodes are the individual automata of a SAN and whose arcs represent dependencies among the automata within a term of the descriptor. Thus G contains an arc from node i to node j if and only if automaton [A.sup.(i)] depends on automaton [A.sup.(j)]. Let T be a cutset of the cycles of G [Berge 1970]. Then T is a set of nodes of G with the property that G - T does not contain a cycle where G - T is the graph of G with all arcs that lead into the nodes of T removed.

THEOREM 5.13 (GTP WITH CYCLES: COMPLEXITY OF VECTOR-DESCRIPTOR PRODUCT). Given a SAN descriptor containing a term with a functional dependency graph G and cutset T of size t, the cost of performing the vector-descriptor product

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF. Using Lemma 5.10, let us assume, without loss of generality, that the automata are ordered so that the cutset T contains the automata [A.sup.(1)], [A.sup.(2)], ..., [A.sup.(t)]. We have

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Applying Lemma 5.2, followed by Lemma 5.12, yields

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

As the cycles have been removed, applying Theorem 5.11 to each term leads us to conclude that the number of multiplications involved is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

However, because of the row matrices in the set T, this complexity reduces to

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

as indicated in the statement of the theorem. []

6. Using the Lemmas

We now return to the context of SANs proper. We have seen that the descriptor of a SAN is a sum of tensor products and we now wish to examine each of the terms of these tensor products in detail to see whether they fall into the categories of Theorem 5.11 or 5.13. In the case of SANs in continuous-time, and with no synchronizing transitions, the descriptor is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

and we can apply Theorem 5.11 directly to each term of the summation. Notice that all but one of the terms in each tensor product is an identity matrix, and as we pointed out in the proof of Theorem 5.11, advantage can be taken of this to reduce the number of multiplications even further.

Consider now what happens when we add synchronizing transitions. The part of the SAN that corresponds to local transitions has the same minimal cost as above. As far as the remainder of the SAN is concerned, recall that in Section 3.1, it was mentioned that each synchronizing event results in an additional two terms in the SAN descriptor. The first of these may be thought of as representing the actual transitions and their rates, and the second corresponds to an updating of the diagonal elements in the infinitesimal generator to reflect these transitions. More detailed information may be found in Plateau and Fourneau [1991]. The first of the additional terms may be written as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

For the matrices that represent the different automata in this tensor product, there are three possibilities depending upon whether they are unaffected by the transition, are the designated automaton with which the transition rate of the synchronizing event is associated, or are affected in some other manner by the event. We have

-- The matrices [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] corresponding to automata that do not participate in the synchronizing event are represented by identity matrices of appropriate dimension.

-- We shall use E to denote the matrix belonging to the automaton selected as the one with which the synchronizing event is associated. In a certain sense, this automaton may be considered to be the owner of the synchronizing event. E is therefore a matrix of transition rates. For example, if the synchronizing event e arises as a result of the automaton changing from state i to state j, then the matrix E consists entirely of zeros with a single nonzero elemem in position ij. This nonzero element gives the rate at which the transition occurs. If several of the elements are nonzero, this indicates that the automaton may cause this same synchronizing event by transitions from and to several states.

-- The remaining matrices correspond to automata that are otherwise affected by the synchronizing event. We shall refer to these as [Lambda] matrices. Each of these [Lambda] matrices consists of rows whose nonzero elements are positive and sum to 1 (essentially, they correspond to routing probabilities, not transition rates), or rows whose elements are all equal to zero. The first case corresponds to the automaton being synchronized into different states according to certain fixed probabilities and the state currently occupied by the automaton. In the second case, a row of zeros indicates that this automaton disables the synchronizing transition while it is in the state corresponding to the row of zeros. In many cases, these matrices will have a very distinctive structure, such as a column of ones, the case when the automaton is forced into a particular state, independent of its current state.

When only the rates of the synchronizing transitions are functional, the elements of the matrix E are functional; the other matrices consist of identity matrices or constant [Lambda] matrices. Thus, once again, this case falls into the scope of Theorem 5.1. This only leaves the case in which the [Lambda] matrices contain transition probabilities that are functional. If there is no cycle in the functional dependency graph then we can apply Theorem 5.11; otherwise we must resort to Theorem 5.13. Note that if the functional dependency graph is fully connected, Theorem 5.13 offers no savings compared with ordinary multiplication. This shows that Theorem 5.13 is only needed when the routing probabilities associated with a synchronizing event are functional and result in cycles within the functional dependency graph, (which we suspect to be rather rare). For SANs in discrete-time [Plateau and Atif 1991], it seems that we may not be so fortunate since cycles in the functional dependency graph of the tensor product tend to occur rather more often.

7. Numerical Experiments

Our purpose in this section is to provide a measure of the cost of using the SAN methodology. The complexity results presented in Sections 4 and 5 implicitly assume that the matrices are dense. Their purpose is to assist in developing an optimal multiplication procedure for SANs. When the matrices are sparse, the actual operation count will be reduced. In this section, we present numerical experiments in which advantage is taken of sparsity, both in the SAN approach and in a competing numerical approach in which the global matrix is generated once and stored in a compact format. We refer to this latter as the sparse matrix approach and use a procedure in C++ inspired from the software package MARCA [Stewart 1976]. As we have mentioned before, no one approach will work best in all modelling situations, but each has its niche, and we wish to use these experiments to identify and to a certain extent quantify, the advantages of the SAN approach.

The software package PEPS [Plateau et al. 1988], was used to generate the results concerning both the sparse matrix approach and those concerning SANs. All experiments were run on an IBM RS6000-370 under AIX version 4.2 with 96 Mbytes of RAM and 230 Mbytes of virtual memory. All times are given in seconds and consist of compute time only (i.e., they do not include times to perform I/O or paging). Memory usage is in kilobytes. Using the two models described in Section 3, four different sets of experiments were conducted. In all four sets, the following model characteristics were kept constant:

-- In the Mutex resource sharing example, the numerical values used were [[Lambda].sup.(i)] = 1.0 and [[Mu].sup.(i)] = 0.4 for all i. Values of N and P were varied and are reported in the tables in the columns entitled "Models"

-- In the queuing network example, the numerical values used were [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. The capacity values, [C.sub.1], [C.sub.2], and [C.sub.3] were varied and they too are reported in the columns entitled "Models".

In all of the examples, the descriptor is a sum of tensor products rather than a single tensor product, the case that is covered by our theoretical results. Thus, the results show the impact of our algorithm on actual SAN models. Tables I through IV show the results obtained for the four different sets of experiments respectively.

TABLE I. NUMERICAL RESULTS FROM FIRST SET OF EXPERIMENTS PEPS Old 2D Compact Models Version storage storage N = 14 P = 14 146.29 s 0.55 s 0.25 s N = 14 P = 13 146.17 s 4.62 s 3.87 s N = 14 P = 8 115.27 s 4.62 s 3.87 s N = 14 P = 1 0.25 s 4.62 s 3.87 s [C.sub.1] = 13.40 s 0.86 s 0.14 s [C.sub.2] = [C.sub.3] = 10 [C.sub.1] = 14.83 s 0.91 s 0.13 s [C.sub.2] = 8, [C.sub.3] = 12

7.1. EXPERIMENT SET #1. The first set of experiments has two objectives:

-- To examine the benefits of using a sparse format as compared to a full storage format.

-- To compare the method presented in this paper for handling functions with the decomposition approach discussed in Plateau and Atif [1991].

For these experiments, the value of N in the Mutex example was taken to be 14 and the value of P was assigned the values P = 14, 13, 8, 1. In the queuing network example, all queue lengths were taken to be equal to 10, that is, [C.sub.1] = [C.sub.2] = [C.sub.3] = 10 in the first case while in the second case, the values [C.sub.1] = [C.sub.2] = 8, [C.sub.3] = 12 were used. The results obtained are shown in Table I.

The times presented are the times, in CPU seconds, needed to perform a single vector descriptor multiplication. The column "PEPS/Compact storage" shows the results of the current version of PEPS in which the algorithms described in this paper have been incorporated. Note that for the Mutex example, the first case P = 14, which has no functions (the number of processes being equal to the number of resources), the time is smaller than that required for other values of P. Furthermore, the time needed for any of the other values of P is the same. This is because the product state space and the number of function evaluations remain unchanged.

The column "PEPS/2D storage" shows the results when a regular, two-dimensional storage format is used to store the transition matrices of the automata. The CPU times are slightly higher than the preceding ones. For the Mutex example, the local transition matrices are full. The difference is due to the fact that in both methods, the diagonal is extracted and treated apart. So the gain is visible, the sparse version takes advantage of the zeros in the diagonal.

In the column "PEPS/Old version", the function handling procedure described in Plateau and Atif [1991] is used to handle the functions. The results are significantly worse except in the case when P = 1. In this case, the reachable state space is very small, and so the decompositional method of handling functions results in a small number of functionless terms. However, this is not a case in which one would expect to make use of the SANs methodology. In the other Mutex examples, the number of terms in the descriptor is of the order of the size of the reachable state space!

7.2. EXPERIMENT SET #2. The second set of experiments was designed to investigate:

--The proportion of time taken by function evaluations.

--The impact of the reachable state space size (versus the product state space size) in comparison with the sparse method.

The suite of examples chosen for experiment set #1 was also used for this set. The results obtained are presented in Table II.

TABLE II. NUMERICAL RESULTS FROM SECOND SET OF EXPERIMENTS PEPS Models CPU Function Memory N = 14 P = 14 0.25 s 0.00 s 1 Kb N = 14 P = 13 3.87 s 3.62 s 1 Kb N = 14 P = 8 3.87 s 3.62 s 1 Kb N = 14 P = 1 3.87 s 3.62 s 1 Kb [C.sub.1] = 0.14 s 0.02 s 3 Kb [C.sub.2] = [C.sub.3] = 10 [C.sub.1] = 0.13 s 0.02 s 3 Kb [C.sub.2] = 8, [C.sub.3] = 10 Sparse Models Size Nonzeros CPU Memory N = 14 P = 14 16 384 245 760 0.13 s 2 944 Kb N = 14 P = 13 16 383 245 731 0.13 s 2 943 Kb N = 14 P = 8 12 911 175 647 0.07 s 2 108 Kb N = 14 P = 1 15 43 0.00 s 0 Kb [C.sub.1] = 5 500 29 800 0.03 s 370 Kb [C.sub.2] = [C.sub.3] = 10 [C.sub.1] = 4 992 26 720 0.01 s 332 Kb [C.sub.2] = 8, [C.sub.3] = 10

The PEPS column "CPU" lists the total computation time needed while the column entitled "Function" indicates the amount of time spend in function evaluation. These results clearly show that most of the computing time is spent in function evaluation. The time required by the sparse method is less that that required by PEPS. It is about half as long when the model contains no functions and considerably less in the presence of functions.(2) Notice also that the time required by the sparse method diminishes with P. This is due to the fact that the reachable state space decreases with P, but not the product state space.

For the case P = 14, it is interesting to note that, although the number of arithmetic operations is exactly the same in PEPS and in the sparse approach, the time needed is greater in PEPS. The use of PEPS does not generate any savings (this is always the case in a SAN without synchronizing events); the additional cost is the overhead due to loop management. For other values of P, the difference between function evaluation and the rest is constant (=0.25).

Finally, observe that the memory needed to store the descriptor in PEPS is very small compared to that required by the sparse approach.

7.3. EXPERIMENT SET #3. As could be seen from the previous set of experiments, the time required for function evaluation is very important and therefore it behooves us to try to reduce this time. Two possibilities come to mind.

(a) Find a better way than decomposition to handle functions. This has been the primary focus of our paper.

(b) Group automata to reduce the number of function evaluations required, as indicated in Section 3.4.

Our third set of experiments, presented in Table III, shows the impact of grouping and the resulting memory/time trade-offs.

TABLE III. NUMERICAL RESULTS FROM THIRD SET OF EXPERIMENTS B = 14 B = 7 Models Iters CPU Mem. CPU Mem. N = 14 P = 13 172 it. 3.87 s 1 Kb 2.04 s 2 Kb N = 14 P = 8 70 it. 3.87 s 1 Kb 3.87 s 2 Kb B = 4 B = 2* [C.sub.1] = 456 it. 0.14 s 3 Kb 0.03 s 19 Kb [C.sub.2] = [C.sub.3] = 10 [C.sub.1] = 430 it. 0.13 s 3 Kb 0.02 s 15 Kb [C.sub.2] = 8, [C.sub.3] = 12 B = 4 B = 2 Models CPU Mem. CPU Mem. N = 14 P = 13 0.80 s 5 Kb 0.19 s 48 Kb N = 14 P = 8 3.87 s 5 Kb 3.57 s 48 Kb B = 2* B = 1 [C.sub.1] = 0.80 s 18 Kb 0.03 s 370 Kb [C.sub.2] = [C.sub.3] = 10 [C.sub.1] = 0.66 s 17 Kb 0.01 s 332 Kb [C.sub.2] = 8, [C.sub.3] = 12

In this table, B indicates the number of groups formed. In the Mutex example, B = 14 implies no grouping; that is, the 14 automata are kept separate. Experiments were conducted with both 13 and 8 resources. In the queuing network example, B = 4 implies that no grouping was performed; B = 1 indicates that all automata were combined into a single group. Furthermore, in the queuing network example, there is a good grouping, denoted by B = 2*, which decreases the number of function evaluations and a bad grouping, denoted by B = 2*, which increases the number of function evaluations. Further details may be found in Fernandes et al. [1996].

The general tendency is that as automata are pulled together into fewer, but larger groups, CPU time decreases and memory (for the descriptor) increases. In the first case, P = 13, the CPU time decreases right from the beginning because grouping removes function evaluations. For example, when the groups are formed by combining pairs of automata, if one automata of the group does not hold a unit of the resource, the other automaton in the group is sure to get one (as there are 13 resources). When the number of resources is 8, this cannot be guaranteed, and the number of function evaluations (and hence the CPU time) does not decrease until B = 2.

Notice also that in the case B = 2, P = 13, the CPU time needed by PEPS is almost equal to that required by the sparse method, (0.19 seconds compared to 0.13), with a substantial reduction in memory.(3)

Finally we wish to point out that the number of iterations to converge to the same prespecified tolerance criteria, is identical for both PEPS and the sparse matrix approach, since exactly the same multiplication is performed by each method.

7.4. EXPERIMENT SET #4. The fourth and final set of experiments is to demonstrate the SAN approach on examples that were too large to be solved by the sparse method on our machine. Table IV shows the parameters used and the number of iterations required to achieve a precision of

[parallel][x.sup.(k+1)] - [x.sup.(k)][[parallel].sub.[infinity]]/ [parallel][x.sup.(k)][[parallel].sub.[infinity]] [is less than] [10.sup.-8].

TABLE IV. NUMERICAL RESULTS FROM FOURTH SET OF EXPERIMENTS PEPS Total Iter. Models Iterations Time Time Memory N = 20 P = 12 B = 4 106 it. 46 354 s 437 s 18 Kb N = 20 P = 12 B = 2 106 it. 46 725 s 441 s 528 Kb N = 20 P = 19 B = 4 254 it. 10 810 s 43 s 18 Kb N = 20 P = 19 B = 2 254 it. 4 063 s 16 s 528 Kb [C.sub.1] = 20 988 it. 5 438 s 5.50 s 13 Kb [C.sub.2] = 20 [C.sub.3] = 50 [C.sub.1] = 30 1076 it. 13 329 s 12.39 s 239 Kb [C.sub.2] = 30 [C.sub.3] = 50 [C.sub.1] = 20 1100 it. 12 863 s 11.69 s 232 Kb [C.sub.2] = 30 [C.sub.3] = 60 Sparse Models Size Nonzeros N = 20 P = 12 B = 4 910 596 18 114 756 N = 20 P = 12 B = 2 910 596 18 114 756 N = 20 P = 19 B = 4 1 048 575 22 020 055 N = 20 P = 19 B = 2 1 048 575 22 020 055 [C.sub.1] = 20 510 000 2 938 600 [C.sub.2] = 20 [C.sub.3] = 50 [C.sub.1] = 30 1 147 500 6 687 600 [C.sub.2] = 30 [C.sub.3] = 50 [C.sub.1] = 20 1 098 000 6 370 200 [C.sub.2] = 30 [C.sub.3] = 60

8. Conclusion

Our purpose in this paper has been to provide a series of theoretical results that permit a reduction in the time required to compute the product of a vector and a sparse SAN descriptor. The SAN methodology requires a minimum of memory for representing a Markovian model of a complex system, but loses somewhat on the computational side of the scale. The results in this paper help to redress this imbalance. However, more work needs to be done. We have already mentioned that it is likely that elements that compute to zero can be detected early so that unnecessary computation may be avoided. In addition, many small automata may be grouped together to form larger automata. This will increase the memory requirements of the SAN, but it will still be insignificant compared to the requirements of the sparse matrix approach. As indicated in Section 7, this can lead to substantial reductions in computation time. Furthermore, if it is possible to group automata in such a way as to minimize functional transitions and synchronizing events, this helps even further. As our theorems show, it is especially important to compress the set of automata for which the functional dependency graph is cyclic. Properties like symmetry and lumpability could also be used for this purpose. Our final comment must be that SANs are a viable modelling concept, that they offer great promise for the future and that current difficulties with computation time are being overcome. Our results help in this effort.

(1) See, for example, Buchholz [1992, 1993]; Chiola et al. [1993]; Franceschinis and Muntz [1993]; Sanders and Meyer [1991]; and Simone and Marsan [1991].

(2) At this point in our set of experiments, we did not attempt to minimize the effect of functions.

(3) Both methods also require two vectors.

REFERENCES

ABDEREZAK, T. 1992. Resolution des Modeles Markoviens sur Machines a Memoires Distribuees. These de Docteur de l'Institut National Polytechnique de Grenoble, 21 September 1992, Grenoble, France.

ATIF, K. 1992. Modelisation du Parallelisme et de la Synchonisation. These de Docteur de l'Institut National Polytechnique de Grenoble, 24 September 1992. Grenoble, France.

BACCELLI, F., JEAN-MARIE, A., AND MITRANI, I., EDS. 1995. Quantitative Methods in Parallel Systems. Part I: Stochastic Process Algebras. Basic Research Series. Springer-Verlag, New York.

BALBO, G., BRUELL, S., AND SERENO, M. 1994. Arrival theorems for product-form stochastic Petri nets. In Proceedings of the ACM SIGMETRICS Conference 1994 (Nashville, Tenn., May). ACM, New York, pp. 87-97.

BERGE, C. 1970. Graphs et Hypergraphes. Dunod, Paris.

BOUCHERIE, R. 1994. A characterization of independence for competing Markov chains with applications to stochastic Petri nets. IEEE Trans. Softw. Engin. 20, 536-544.

BUCHHOLZ, P. 1992. Hierachical Markovian models--Symmetries and aggregation. In Modelling Techniques and Tools for Computer Performance Evaluation, R. Pooley, ed. J. Hillston, Edinburgh, Scotland, pp. 234-246.

BUCHHOLZ, P. 1993. Aggregation and reduction techniques for hierachical GCSPNs. In Proceedings of the 5th International Workshop on Petri Nets and Performance Models (Toulouse, France, Oct.). IEEE Press, Los Alamitos, Calif., pp. 216-225.

BUCHHOLZ, P. 1995. Equivalence relations for stochastic automata networks. In Computations with Markov Chains; Proceedings of the 2nd International Meeting on the Numerical Solution of Markov Chains. W. J. Stewart, Ed. Kluwer International Publishers, Boston, Mass.

BUIS, P., AND DYKSEN, W. 1996. Efficient vector and parallel manipulation of tensor products. ACM Trans. Math Softw. 22, 18-23.

CHIOLA, G., DUTHEILLET, C., FRANCESCHINIS, G., AND HADDAD, S. 1993. Stochastic well-formed colored nets and symmetric modeling applications. IEEE Trans. Comput. 42, 11, 1343-1360.

CIARDO, G., AND TRIVEDI, K. 1991. Solution of large GSPN models. In Numerical Solution of Markov Chains. W. J. Stewart, ed. Marcel Dekker Publisher, New York, pp. 565-595.

DAVIO, M. 1981. Kronecker products and shuffle algebra. IEEE Trans. Comput. C-30, 2, 1099-1109.

DONATELLI, S. 1993. Superposed stochastic automata: A class of stochastic Petri nets with parallel solution and distributed state space. Perf. Eval. 18, 21-36.

FERNANDES, P., PLATEAU, B., AND STEWART, W. J. 1996. Numerical issues for stochastic automata networks. INRIA Res. Rep. No. 2938. July. Available in postscript form by anonymous ftp from ftp.inria.fr.

FOURNEAU, J.-M., AND QUESSETTE, F. 1995. Graphs and stochastic automata networks. In Computations with Markov chains; Proceedings of the 2nd International Meeting on the Numerical Solution of Markov Chains. W. J. Stewart, ed. Kluwer International Publishers, Boston, New York.

FRANCESCHINIS, G., AND MUNTZ, R. 1993. Computing bounds for the performance indices of quasi-lumpable stochastic well-formed nets. In Proceedings of the 5th International Workshop on Petri Nets and Performance Models (Toulouse, France, Oct.). IEEE Press, Los Alamitos, Calif., pp. 148 -157.

HENDERSON, W., AND LUCIC, D. 1993. Aggregation and disgreggation through insensitivity in stochastic Petri nets. Perf. Eval. 17, 91-114.

HERMANNS, H., AND RETTLEBACH, M. 1994. Syntax, semantics, equivalences, and axioms for MTIPP. In Proceedings of the 2nd Workshop on Process Algebras and Performance Modelling. U. Herzog, M. Rettlebach, eds. Arbeitsberichte, Band 27, No. 4, Erlangen, Nov.

HILLSTON, J. 1995. Computational Markovian modelling using a process algebra. In Computations with Markov Chains; Proceedings of the 2nd International Meeting on the Numerical Solution of Markov Chains. W. J. Stewart, ed. Kluwer International Publishers, Boston, Mass.

KEMPER, P. 1995. Closing the gap between classical and tensor-based iteration techniques. In Computations with Markov Chains; Proceedings of the 2nd International Meeting on the Numerical Solution of Markov Chains. W. J. Stewart, ed. Kluwer International Publishers, Boston, Mass.

PLATEAU, B. 1985. On the stochastic structure of parallelism and synchonization models for distributed algorithms. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modelling of Computer Systems (Austin, Tex., Aug.). ACM, New York, pp. 147-153.

PLATEAU, B., AND ATIF, K. 1991. Stochastic automata network for modelling parallel systems. IEEE Trans. Softw. Engin. 17, 10, 1093-1108.

PLATEAU, B., AND FOURNEAU, J. M. 1991. A methodology for solving Markov models of parallel systems. J. Paral. Dist. Comput. 12, 370-387.

PLATEAU, B., FOURNEAU, J. M., AND LEE, K. H. 1988. PEPS: A package for solving complex Markov models of parallel systems. In Modelling Techniques and Tools for Computer Performance Evaluation. R. Puigjaner and D. Potier, eds. (Spain, Sept.).

PLATEAU, B., AND STEWART, W. J. 1996. Stochastic automata networks: Product forms and iterative solutions. INRIA Res. Rep. No. 2939, July. Available in postscript form by anonymous ftp from: ftp.inria.fr.

SANDERS, W. H., AND MEYER, J. F. 1991. Reduced base model construction methods for stochastic activity networks. IEEE J. Sel. Areas Commun. 9, 1, 25-36.

SIEGLE, M. 1992. On efficient Markov modelling. In Proceedings of the QMIPS Workshop on Stochastic Petri Nets (Sophia-Antipolis, France, Nov.). pp. 213-225.

SIMONE, C., AND MARSAN, M. A. 1991. The application of the EB-equivalence rules to the structural reduction of GSPN models. J. Paral. Dist. Comput. 15, 3, 296-302.

STEWART, W. J. 1976. MARCA: Markov chain analyzer. IEEE Computer Repository No. R76 232. Also IRISA Publication Interne No. 45. Universite de Rennes, Rennes, France.

STEWART, W. J. 1994. An Introduction to the Numerical Solution of Markov Chains. Princeton University Press, Princeton, N.J.

STEWART, W. J., ATIF, K., AND PLATEAU, B. 1995. The numerical solution of stochastic automata networks. Europ. J. Operat. Res. 86, 3, 503-525.

RECEIVED JANUARY 1995; REVISED FEBRUARY 1996; ACCEPTED JANUARY 1998

The research of P. Fernandes and B. Plateau was supported by the (CNRS-INRIA-INPG-UJF) joint project Apache, and CAPES-COFECUB Agreement (Project 140/93).

The research of W. J. Stewart was supported in part by National Science Foundation (NSF) grants DDM 89-06248 and CCR 94-13309.

Authors' present addresses: P. Fernandes, Informatics Institute, PUC/RS, Brazil; B. Plateau, IMAG-LMC, 100 rue des Mathematiques, 38041 Grenoble cedex, France; W. J. Stewart, Department of Computer Science, North Carolina State University, Raleigh, NC 27695-8206.

Printer friendly Cite/link Email Feedback | |

Author: | FERNANDES, PAULO; PLATEAU, BRIGITTE; STEWART, WILLIAM J. |
---|---|

Publication: | Journal of the Association for Computing Machinery |

Geographic Code: | 1USA |

Date: | May 1, 1998 |

Words: | 14706 |

Previous Article: | Time to Publication: A Progress Report. |

Next Article: | Lower Bounds for Distributed Coin-Flipping and Randomized Consensus. |

Topics: |