Similarly, groups can be used in distributed computing systems to help master the complexity of large applications or to help provide non-functional properties, such as availability or security. Computation groups could be limited to the convenience of collectively designating a set of processes using a common name or address. Such facilities are already offered in many local-area networks (LANs), and we are all accustomed to using Internet news groups or mailing lists. The full benefits of the group concept, however, can be reaped only if we know how to set up and coordinate groups of processes that work together to fulfill a common purpose, like sharing a computational load, increasing performance, or providing a fault-tolerant service. This special section presents some of the current ideas on how such groups can be created and managed.
What facilities or semantics a group management or group communication service should provide is still the subject of much debate within the research community. The ease with which a given group's service semantics can be provided or, indeed, whether or not such a service can be implemented at all, depends heavily on what can be said or assumed about the computation environment in which the service is to be provided. The most important assumptions concern how processes can fail and how well they communicate with each other.
The strongest and most common assumption about process failures is the crash failure model: a process acts in full accordance with its specification until it suddenly ceases all activity. A crash failure model is appropriate if the probability of less well-behaved failures can be neglected. A crash failure model might be appropriate, for example, for a general-purpose computing network in which the most common problem is the unavailability of certain sites. In such an environment, any service that can tolerate process crash failures provides a useful improvement in dependability over one that cannot tolerate any failures at all. A crash failure model can also be appropriate for ultra-dependable distributed systems if nodes have extensive built-in self-checking.
At the other end of the failure spectrum is the arbitrary failure model in which no restrictive assumptions are made about the way processes can fail. For example, they could fail by sending erroneous messages, by saturating the network, or even by colluding with other faulty processes to bring down the system. This is a "worst-case" failure model that frees the system designer of any obligation to justify the realism of a more restrictive assumption. It is particularly appropriate for building ultra-dependable systems or for dealing with processes under the control of a malicious intruder. Unfortunately, protocols that can tolerate such arbitrary failures require more redundancy and more messages than if they were designed to tolerate only crash failures; they are also much more difficult to design and validate.
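The extra redundancy this requires can be made concrete with a minimal sketch, not any particular protocol from the literature. Assuming replies to the same request have been gathered from 2f + 1 replicas (collecting such replies is itself the hard part: full Byzantine agreement needs at least 3f + 1 processes), majority voting masks up to f arbitrarily faulty replies:

```python
from collections import Counter

def vote(replies, f):
    """Mask up to f arbitrarily faulty replies by majority vote.

    Assumes at least 2*f + 1 replies were collected, so the f faulty
    replicas can never outvote the f + 1 (or more) correct ones.
    """
    if len(replies) < 2 * f + 1:
        raise ValueError("need at least 2f+1 replies to outvote f faults")
    value, count = Counter(replies).most_common(1)[0]
    if count < f + 1:
        raise ValueError("no value is backed by at least f+1 replies")
    return value

# Five replicas, at most f = 2 faulty: the three matching correct
# replies outvote any two arbitrary (even colluding) faulty ones.
print(vote(["ok", "ok", "ok", "bad", "worse"], f=2))  # -> ok
```

By contrast, a crash-tolerant service needs no voting at all: a crashed replica sends nothing, so any single reply can be trusted.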
The strongest assumption that can be made about inter-process communication is that any message sent by a correct process to another correct process is always received within a given delay, the so-called synchronous communication assumption. The nice thing about this assumption is that one process can reliably detect whether another process is alive just by sending it a query and waiting a known bounded time for a response. Unfortunately, in a system where processes must communicate over a shared network, such perfection is guaranteed only with a certain probability, by using multiple communication paths and/or message retransmissions. Often, however, it is impossible to give even a probabilistic guarantee, since the actual load on the network may be totally unpredictable.
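Under the synchronous assumption, liveness detection reduces to a single bounded wait. A minimal sketch, with a hypothetical `send_query` standing in for the actual network round trip and an assumed worst-case delay:

```python
import time

# Assumed worst-case round-trip bound (query + processing + reply), in
# seconds. Under the synchronous assumption this bound is *known*.
DELTA = 0.05

def is_alive(send_query, delta=DELTA):
    """Return True iff the queried process replies within delta seconds.

    Because a correct process is guaranteed to answer within delta,
    silence past the deadline is a *proof* of failure, not a suspicion.
    """
    start = time.monotonic()
    reply = send_query()  # stand-in for a real network round trip
    return reply is not None and time.monotonic() - start <= delta

# A correct process answers promptly...
assert is_alive(lambda: "pong") is True
# ...while a crashed one never answers (modelled here as returning None).
assert is_alive(lambda: None) is False
```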
The opposite approach is to consider that there is no known limit on the time it takes for a message to reach its destination. Protocols designed without knowledge of time limits could easily be ported from one environment to another, since they would operate correctly whatever the performance of the network. Unfortunately, with such totally asynchronous communication, a process cannot decide whether another process has crashed or whether its query or the expected response is still on its way across the network. In practice, it is essential to introduce some notion of time so that processes know how long to wait for an expected response before suspecting that the originator of the response might have failed.
Note, however, that suspicion of a crash is not the same as detection of a crash; the suspected process might still be perfectly healthy. It is easy to see, therefore, that it is impossible to achieve any sort of deterministic agreement between correct processes. The best that can be done in such an environment is to ensure that certain safety properties are guaranteed whatever the communication delays or erroneous suspicions, and that useful progress is made whenever the network performs well enough for processes to communicate with each other in a timely manner.
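This suspect-but-never-know discipline can be sketched as a heartbeat-based detector. The class below is a minimal illustration, assuming periodic heartbeats and an arbitrarily chosen timeout; a process that misses its deadline is suspected, and a late heartbeat rescinds the suspicion (it was only slow):

```python
import time

class SuspectList:
    """An unreliable failure detector: after `timeout` seconds without a
    heartbeat, a process is *suspected*, never *known*, to have crashed."""

    def __init__(self, processes, timeout):
        self.timeout = timeout
        now = time.monotonic()
        self.last_heard = {p: now for p in processes}

    def heartbeat(self, p):
        """Record that process p was heard from (rescinds any suspicion)."""
        self.last_heard[p] = time.monotonic()

    def suspects(self):
        """Processes silent for longer than the timeout."""
        now = time.monotonic()
        return {p for p, t in self.last_heard.items()
                if now - t > self.timeout}

fd = SuspectList({"a", "b"}, timeout=0.02)
time.sleep(0.03)      # neither process has been heard from in time...
fd.heartbeat("a")     # ...but "a" turns out to be merely slow
print(fd.suspects())  # -> {'b'}: suspected, not proven, crashed
```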
It might be instructive at this point to draw a few analogies between groups of processes and groups of people. Imagine that a committee meeting is convened and attendees gather round a table in a meeting room. It's a long meeting, so the attendees get very tired, and some of them doze off now and again. However, when they are awake they can easily see who else is awake, since all are sitting round the same table. With a little organization (a protocol), those that stay awake can (for example) take turns to address the meeting, and they should all know who else is awake, what they heard, and what they should have learned.
This analogy illustrates several points:
* There are at least three sorts of groups to be considered: the people eligible to attend the meeting (the committee); those who attend the meeting (the attendees); and those who participate in the committee's work at a given instant because they were awake (the participants).
* The meeting room setting is analogous to the synchronous communication model discussed earlier, in which communication is reliable and timely, and it is easy for people (processes) to detect whether some of them have fallen asleep (crashed). Thus, they can make strong statements about who is awake (the current membership of the group of participants), what they have heard (the messages delivered), and what they have all learned (changes to internal state resulting from the order of message delivery).
* It shows that we must also worry about how new attendees (people who join the meeting after it has started) and recovered attendees (participants who fall asleep and later awaken) are brought up to date with what has been decided while they were absent or asleep.
Now let us consider another setting for the committee meeting. Let us suppose that, instead of meeting round a table in a quiet room, the committee tries to conduct its business in a large and very busy hotel lobby. Because of the hustle and bustle, the attendees cannot always see or talk directly to one another. Even when one attendee can see another attendee, it's not certain that the latter is looking at the former. Some of the attendees could fall asleep or go home without the others ever noticing. It's quite plain that in this setting the committee has a much harder job to process its agenda in some consistent way. We can suppose that the attendees try to gather together to get some work done, but to do so they also have to reach some sort of agreement about who they all think are in their particular gathering (for example, to decide who will act as chair).
As people drift in and out of one another's sight, they have to successively reach new decisions about who is in their gathering. Furthermore, there could be many such gatherings in different parts of the lobby at the same time. In this case, different gatherings of attendees could end up making conflicting decisions. The only way to avoid such conflicts is to impose a rule stating, for example, that only gatherings with a majority of committee members are allowed to make any decisions. Alternatively, the committee could attempt to reconcile conflicting decisions whenever two or more gatherings are able to merge and form a new gathering.
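The majority rule works because two disjoint majorities of the same committee cannot exist. A small sketch of the check, using a hypothetical five-member committee:

```python
# Hypothetical committee of eligible members (illustration only).
COMMITTEE = {"alice", "bob", "carol", "dave", "eve"}

def may_decide(gathering, committee=COMMITTEE):
    """A gathering may make decisions only if it contains a strict
    majority of the committee. Since two disjoint gatherings cannot
    both hold a majority, no two can decide conflicting things."""
    present = gathering & committee
    return len(present) > len(committee) // 2

assert may_decide({"alice", "bob", "carol"})   # 3 of 5: may decide
assert not may_decide({"dave", "eve"})         # minority: must wait
```

The alternative mentioned above, reconciling conflicting decisions at merge time, trades this blocking behavior for the extra complexity of a merge protocol.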
This meeting-in-a-crowded-lobby scenario is analogous to the asynchronous communication model discussed earlier. It illustrates several points:
* Since people (processes) cannot reliably detect whether some of them are absent or, equivalently, have fallen asleep (crashed), they cannot decide exactly who is attending the meeting.
* It is impossible to prevent the attendees from splitting into separate sub-meetings (gatherings). Such gatherings or groups of participants are sometimes called "views" of the meeting, since the participants consider that their gathering is the meeting. This logical partitioning of "meetings" is unavoidable in an asynchronous setting, so asynchronous group protocols must be able to deal with it.
* When an attendee joins an existing gathering (and thus forms a new, larger gathering), the participants have to work out what this new participant already knows about the committee's work and bring him or her up to date. This reporting has to be done either if the new participant just woke up from a nap (recovered from a crash) or if he or she lost touch with an earlier gathering (became disconnected). In an asynchronous setting, it's difficult to tell the difference. If the participants are not going to have to constantly tell "new" participants everything they have done since the beginning of time, they have to remember (keep in stable storage) what they did in earlier gatherings, and the protocol has to ensure that all work done in successive gatherings is done in some consistent fashion.
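The last point can be sketched as a simple state-transfer step. The log layout below is purely illustrative: decisions are assumed to be kept in stable storage, indexed by a monotonically increasing view number, so a joiner need only report the last view it took part in:

```python
# Hypothetical "stable storage": view number -> decisions taken in that
# view. In a real system this would survive crashes on disk.
stable_log = {
    1: ["elect chair: alice"],
    2: ["approve budget"],
    3: ["schedule next meeting"],
}

def catch_up(joiner_last_view, log=stable_log):
    """Return, in view order, only the decisions the joiner missed,
    instead of replaying everything since the beginning of time."""
    return [decision
            for view in sorted(log) if view > joiner_last_view
            for decision in log[view]]

# A participant who dozed off after view 1 gets views 2 and 3 replayed:
print(catch_up(1))  # -> ['approve budget', 'schedule next meeting']
```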
Over about the last decade, there has been considerable research into the management of process groups and protocols for communication within and between such groups. The articles in this special section present some of the prototype systems and current research activities typical of the prevalent ideas in this fascinating area.
The article by Louise Moser, Michael Melliar-Smith, and their team at the University of California, Santa Barbara, describes the Totem system, which provides a totally ordered multicast service to application process groups. It is particularly suitable for supporting fault-tolerant soft real-time applications. Totem is a scalable system built using a hierarchy of group communication protocols for groups of processors on a LAN, for groups of interconnected LANs, and for groups of application processes.
In their article on the Transis system, Danny Dolev and Dalia Malki from the Hebrew University of Jerusalem consider some of the difficult problems that arise in diverse network settings. The authors discuss how different components of a partitioned network can operate autonomously and then merge operations when they become reconnected (such partitioned operation is of special interest in mobile applications). They also consider the need for different protocols for fast local communication and for the unavoidably slower communication between local clusters.
Most articles in this special section assume that processes fail only by crashing or by just being slow. The exception is the short article by Mike Reiter from AT&T Bell Laboratories, sketching some of the novel group communication ideas embodied in the Rampart system to provide tolerance for malicious intrusion. Here, faulty processes can collude and fail in quite arbitrary fashion, and groups are used to mask the malicious actions of a minority of the group members.
There are many different ways of defining and using process groups. Features that may be useful in one setting may be impediments in others. The Horus system, described in the article by Robbert van Renesse, Ken Birman, and Silvano Maffeis of Cornell University, aims to provide a very flexible environment for system programmers to configure group protocols specifically adapted to the problem at hand.
The last two articles do not describe specific systems but address instead the fundamentals of group communication. Andre Schiper (EPFL, Switzerland) and Michel Raynal (IRISA, France) trace some interesting directions for future research into various sorts of process groups. In particular, they consider the differences between groups for replicated, fault-tolerant objects and groups for implementing atomic transactions. They propose a multicast primitive for carrying out a specific class of transactions on a set of replicated, fault-tolerant objects.
Flaviu Cristian of the University of California, San Diego, compares the properties of group communication protocols for the synchronous and asynchronous communication models. This comparison underlines the advantages and drawbacks of both models and should be considered essential reading for anyone interested in group communication.
DAVID POWELL is Directeur de Recherche CNRS at the Laboratoire d'Analyse et d'Architecture des Systemes, where he works in the Dependable Computing and Fault Tolerance Research Group. Current Address: LAAS-CNRS, 7 Avenue du Colonel Roche, 31077 Toulouse, France; email: firstname.lastname@example.org
Publication: Communications of the ACM, April 1, 1996.