
Continuous Functioning of a Soft System Bus Based Centralized Persistent Computing System

1 INTRODUCTION

Modern society is more and more dependent on various computing systems that are required to provide their services continuously. Therefore, such systems should function in a continuously available, reliable, and secure way. Modern computing paradigms such as ubiquitous computing, autonomic computing, evolutionary computing, anticipatory reasoning-reacting systems [9], and many middleware-based software systems are required to provide services on an anytime, anywhere basis. A traditional computing system often has to stop running and suspend its services when it has some trouble, when it is attacked, or when it needs to be maintained, upgraded, or reconfigured. As a result, no traditional computing system can run continuously without stopping its services [5, 6, 8]. In recent years, several research areas, such as fault-tolerant systems, pervasive middleware, and self-adaptive systems, have drawn growing attention to developing such computing systems [2, 4, 15]. However, those research areas concentrate on large-scale distributed applications only; they pay no attention to small-scale centralized applications. How a centralized computing system can provide continuous computing services has never been taken into account.

On the other hand, Cheng [7] proposed Persistent Computing Systems (PCSs) as a new generation of reactive systems that provide reliable and secure computing services continuously and persistently. PCSs are computing systems that can provide services continuously without stopping their reactions, even while being maintained, upgraded, or reconfigured, and even while under attack. PCSs have two fundamental characteristics: (1) persistently continuous functioning, i.e., the systems can function continuously without stopping their reactions, and (2) dynamically adaptive functioning, i.e., the systems can be dynamically maintained, upgraded, or reconfigured during their continuous functioning. Here "function" means "provide correct computing service to end-users" and "reaction" means "react to the outside environment". The second characteristic, dynamically adaptive functioning, requires that a PCS provide continuous functioning during maintenance or upgrade of the system. These two characteristics imply that continuous functioning is the main requirement of PCSs, while dynamically adaptive functioning is a sub-requirement of continuous functioning. From these two characteristics we conclude that continuous functioning is the base of a PCS, since it covers both fundamental characteristics. The term "continuous functioning" means that the functional components of target applications can perform their tasks continuously and persistently. Since various target applications will use the SSB-based CPCS (defined below) as a platform, their functional components can provide their services continuously and persistently, even while the SSB-based CPCS is being upgraded or maintained.

PCSs can be classified into two types: distributed PCSs (DPCSs) and centralized PCSs (CPCSs). A distributed system consists of multiple computers that communicate through a computer network, possibly spread over a large geographical area. A centralized system is a system in which all major computing tasks are performed and controlled at a central computer.

The Soft System Bus based system (SSB-based system) [5] was proposed to implement persistent computing systems. An SSB-based system consists of a group of control components (CCs) that perform specific tasks of the system, a group of functional components (FCs) that carry out tasks of the target application, and a Soft System Bus (SSB) that transfers data among components in a component-based system. Conceptually, an SSB is simply a communication channel with facilities for data/instruction transmission and preservation that connects the components of a component-based system. An SSB consists of data/instruction stations (DISs), which are buffers that preserve data/instructions temporarily. The most important requirement of an SSB is that it must provide the facility of data/instruction preservation: when a component in a system temporarily cannot work well, all data/instructions sent to that component should be preserved in some DISs and redelivered when that component reconnects to the SSB-based system. In this way, the other components of the system can keep working without interruption.

An SSB-based system should be applicable to both distributed systems and centralized systems. Selim et al. [10, 11, 12, 13] proposed and simulated an SSB-based system for distributed persistent computing systems. They used peer-to-peer network based middleware as the SSB. This implementation method cannot be applied directly to SSB-based systems for centralized persistent computing systems (SSB-based CPCSs) because of its data replication and DIS availability techniques, which are described briefly in Section 2.3.

As we mentioned earlier, continuous functioning is the base, or main feature, of PCSs. The purpose of this paper is therefore to propose a primitive SSB-based CPCS that satisfies the feature of continuous functioning, i.e., continuous and persistent functioning without stopping its reactions, including during maintenance or upgrade of the system, because without this feature a computing system cannot be regarded as a PCS. Our main objectives are to analyze the requirements of an SSB-based CPCS satisfying this feature, propose a design for it, and evaluate the design.

The rest of the paper is organized as follows. Section 2 gives an overview of persistent computing systems, SSB-based systems, and earlier work on them. Section 3 analyzes the primitive requirements of an SSB-based CPCS that satisfies the main feature of PCSs. Section 4 presents a design for the primitive SSB-based CPCS. Section 5 evaluates our proposed design, and concluding remarks are given in Section 6.

2 PERSISTENT COMPUTING SYSTEM AND SSB-BASED SYSTEM

2.1 Persistent Computing System

A reactive system is a computing system that maintains an ongoing interaction with its environment. A persistent computing system (PCS) is a new generation of reactive system with two key characteristics or fundamental features:

(1) persistently continuous functioning, i.e., the system can function continuously and persistently without stopping its reactions, and

(2) dynamically adaptive functioning, i.e., the system can be dynamically maintained, upgraded, or reconfigured during its continuous functioning and reacting.

Here "function" means "provide correct computing service to end-users" and "reaction" means "react to the outside environment". If a computing system cannot react at all then it will be called a dead state. The dead state is undesirable in a PCS. Therefore, PCSs can be defined as a reactive system which will never be in a dead state such that it can evolve into a new functional state in autonomous or controlled way, or can be recovered into a functional state by some way.

2.2 SSB-based System

Figure 1 shows the architecture of SSB-based systems. The most important requirement of an SSB is that it must provide the facility of data/instruction preservation and redelivery. Data/instruction preservation means that when a component in a system temporarily cannot work well, all data/instructions sent to that component are preserved in some DISs until the component resumes and can receive the data/instructions designated for it; the other components of the system can therefore keep working without interruption. Data/instruction redelivery means that, after preserving data/instructions for a temporarily unavailable component, the DISs redeliver the preserved data to that component when it reconnects to the SSB-based system.
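
To make the preservation/redelivery requirement concrete, the following is a minimal sketch in Python (our own illustration; the class and method names are hypothetical, and the paper does not prescribe this code): messages addressed to an unavailable component are buffered in its DIS and replayed once the component reconnects.

    from collections import deque

    class DIS:
        """Toy DIS: a buffer that preserves data for an absent component."""
        def __init__(self, dis_id):
            self.dis_id = dis_id
            self.buffer = deque()         # preserved data/instructions
            self.component_online = True  # status of the parent component

        def deliver(self, message, send):
            """Try to deliver; preserve the message if the component is away."""
            if self.component_online:
                send(message)
            else:
                self.buffer.append(message)   # preservation

        def on_reconnect(self, send):
            """Redeliver everything preserved while the component was away."""
            while self.buffer:
                send(self.buffer.popleft())   # redelivery in arrival order

    # usage: the parent component goes away, two messages are preserved,
    # then both are redelivered on reconnect
    dis = DIS("DIS0001")
    dis.component_online = False
    dis.deliver("task A", print)
    dis.deliver("task B", print)
    dis.component_online = True
    dis.on_reconnect(print)   # prints "task A" then "task B"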

The SSB and the CCs are the most general and basic elements of SSB-based systems, such that any target software application system can be designed by connecting its FCs and the CCs with the SSB [5]. That means that if the SSB and CCs can be implemented as a package, then any target software application can provide continuous and persistent services by connecting its FCs to the package.

2.3 Earlier Work

Selim et al. [10, 11, 12, 13] implemented a prototype of SSBs for large-scale distributed systems. To evaluate their design, they built a PCS in a simulation environment. Their PCS used peer-to-peer network based middleware as the SSB. Using the simulator, they showed that the PCS can ensure availability and continuous functioning even in the case of broker or application failures. They also partially solved runtime upgrade and maintenance, i.e., the dynamic adaptability feature. They compared their SSB with Microsoft Message Queuing (MSMQ) and showed that their middleware is especially appropriate for a network of a large number of brokers deployed over a reasonably large geographic area. They also compared their proposed SSB with other messaging middlewares and showed that it is more suitable for use as an SSB for large-scale PCSs than the others.

[Figure 1. Architecture of SSB-based systems (omitted)]

In that work, the components had the ability to connect to any DIS. Thus, if a DIS fails, or if it is disconnected for upgrade or maintenance, the components connected to it can always connect to another DIS. If a DIS failed, its neighbor DIS took over its responsibility. Hence, the components could apparently get services all the time. An SSB designed in this way also ensured preservation of a message if it could not be delivered immediately to the destination component; the preservation process included replicating the message in the neighboring DISs.

Although an SSB-based system should be applicable to both distributed PCSs and centralized PCSs, Selim's implementation method cannot be applied directly to centralized systems because of its replication technique and DIS availability technique. According to their design, each DIS had some slave DISs. Data were replicated to those slave DISs, which were located in different computers and even in different locations. Therefore, they used a main-memory replication technique: data were replicated from the main memory of one computer to the main memory of another. This replication scheme is useful for distributed systems. In a centralized system, only one computer is used, so the DISs are necessarily located in the memory of that computer; main-memory replication is therefore not meaningful for centralized systems. As for the availability of DISs in the earlier work, since each DIS had two slave DISs, if a DIS failed, a slave DIS took over its responsibility. Moreover, since the master DIS and slave DISs were in different geographic locations, there was little possibility that the master DIS and the slave DISs would fail at the same time. In a centralized system, since all DISs have to be located in the same computer, there is a high possibility of the master and slave DISs failing at the same time. Therefore, continuous functioning of a CPCS has to be approached in a different way, which is what we solve in this paper.

3 REQUIREMENT ANALYSIS OF SSB-BASED CENTRALIZED PERSISTENT COMPUTING SYSTEM

Our requirement analysis of SSB-based CPCSs was done under the following assumptions:

1. The hardware and operating system of a CPCS do not fail.

2. The computer has enough computing resources, e.g., main memory, CPU speed, and storage.

Our ultimate goal is to realize SSB-based CPCSs. However, as the first step, this paper proposes primitive SSB-based CPCSs that provide only the main feature of PCSs, i.e., "continuous functioning". In this section, our target is to find out the requirements for the primitive SSB-based CPCSs.

To define the requirements of the primitive SSB-based CPCS, we analyzed events that prevent continuous functioning during communication among components. To conduct this analysis we considered communication among components under two kinds of communication topologies, one-to-one and many-to-one, and four kinds of component communication as the scenarios: FC-to-FC, CC-to-CC, CC-to-FC, and FC-to-CC. From this analysis we found four kinds of events that interrupt the continuous functioning of a system, and to prevent those events we identified six requirements for the primitive SSB-based CPCS. Table 1 shows the list of events we found during our analysis.

* Source/Destination DIS fails: In an SSB-based system, all components must communicate with other components through DISs, where each component has one or more DISs. The sender component sends data to its DIS, i.e., the source DIS. The source DIS then sends the data to the DIS of the receiver component, i.e., the destination DIS, which in turn sends the data to the receiver component. If the sender or receiver component has no DIS, or their DISs fail at runtime, the communication cannot occur. The event of DIS failure leads to two problems. First, if the DIS of a component fails before communication starts, that component cannot communicate or send/receive data to/from any other component in the SSB-based system. Suppose that FC1 wants to send some data to FC2, but there is no DIS that FC1 can use; then it cannot send data to FC2. Similarly, if FC1 sends some data for FC2 but there is no DIS that delivers the data to FC2, then FC2 cannot receive the data. Second, a DIS may receive data to deliver to other components/DISs and fail before delivering them, so data loss will also occur. For example, consider that FC1 sends some data to its DIS, say DIS1, and DIS1 has to send these data to DIS2, the DIS of FC2. DIS1 may fail before it sends the data to DIS2; similarly, DIS1 may send the data to DIS2, but DIS2 may fail before it can send the data to FC2. In these cases the communication will not succeed and the data that were in the DIS will also be lost. In short, due to DIS failure components cannot communicate, and if a DIS fails with undelivered data the communication fails and the data are lost.

* Destination FC temporarily leaves or fails: If, in the middle of communication, a destination FC becomes temporarily unavailable or fails, the communication cannot be completed. In the event of a destination component's temporary leave, the destination component may get connected again and resume receiving services from the SSB-based CPCS; in this case it is disconnected from the SSB-based CPCS for only a short time. In the event of a destination component's failure, if the destination component is recoverable it can get connected later, but it will be disconnected from the system for a longer period of time. In the event of the leave or failure of FCs, the SSB-based CPCS has to wait until the FCs are recovered by their developers.

* Destination CC being maintained/upgraded/failed: CCs are the core part of SSB-based systems. When a CC is the destination component for incoming data, if it leaves the system for upgrade/maintenance at runtime, or if it fails, the communication will not succeed. Moreover, CCs perform very important tasks to maintain the continuous functioning of an SSB-based CPCS, so due to their maintenance/upgrade or failure, their tasks may also be lost from the SSB-based CPCS. As a result, the continuous functioning of the SSB-based CPCS may fail.

To prevent the events explained above, we defined six requirements of the primitive SSB-based CPCS, as follows:

R1: DISs should be available to all components so that any component can communicate in the SSB-based CPCS at any time. As we explained earlier, components are connected to the SSB-based CPCS by their DISs. When a component joins the SSB-based CPCS, it first needs a DIS so that it can communicate with the other components of the system. If a component has no DIS, or its DIS fails, it becomes disconnected from the SSB-based CPCS and cannot communicate in the system; it will also not be able to receive any data/instructions coming to it from other components. To ensure that components of the SSB-based CPCS can provide continuous functioning, the SSB-based CPCS must ensure that a DIS is always available for each component.

R2: Data should be protected against loss, so that if any DIS fails with undelivered data, those data can be restored and redelivered to the correct destination. Communication among components in the SSB-based system involves transmitting data/instructions among them. Since all data/instructions among components are transmitted through their DISs, data loss may occur due to failure of DISs, as explained in the first event. Therefore, there must be some way to protect against data loss even if a DIS fails while holding data.

R3: DISs should have the ability to store data if a destination component is temporarily unavailable, and to redeliver the data when the component becomes active. As we explained in the third event, a destination component may become disconnected from the SSB-based CPCS for a short time due to a temporary leave. The DIS of that destination component should therefore store all data incoming for it while it is disconnected and redeliver those data after the destination component reconnects. A component may also become disconnected from the SSB-based system for a longer period of time due to recovery. After recovery, when the destination component reconnects to the SSB-based CPCS, the SSB-based CPCS should provide it with all incoming data that were stored during its absence.

R4: The SSB-based CPCS should not stop functioning when any control component (CC) leaves for upgrade/maintenance. One feature of SSB-based systems is that the components can work independently in the system. If any CC leaves for maintenance/upgrade, or if any CC fails, the SSB-based CPCS must not stop functioning as a whole.

R5: The tasks running on each CC should not be lost during upgrade/maintenance of CCs or due to failure of CCs. As CCs are the core part of the SSB-based CPCS, their tasks and data should not be lost due to their failure or their upgrade/maintenance. When a CC becomes connected again, the SSB-based CPCS should provide the information of its unfinished tasks so that the CC can complete them.

R6: If any CC fails, the SSB-based CPCS should have the ability to recover the failed CC. CCs perform very important tasks in the SSB-based CPCS. If any CC fails, there must be some way to recover it such that the recovered CC provides the same features as the failed one.

4 DESIGN OF SSB-BASED CENTRALIZED PERSISTENT COMPUTING SYSTEM

4.1 Architecture

Figure 2 represents the architecture of the SSB-based CPCS. There are three control components (CCs) in our SSB-based CPCS: the self-allocator component (Al), the self-backup-restore component (Br), and the self-monitor component (Mo). Inside the SSB there is an Interface-DIS (I-DIS) that provides an interface for CCs and FCs and allows them to join the SSB-based CPCS. The self-allocator component is responsible for allocating DISs for FCs and CCs. Components send a joining request to the I-DIS, and the I-DIS then asks the self-allocator to allocate a DIS for that component. Each component has one DIS allocated to it, through which it communicates with the system. The self-allocator has a table that includes the list of authentic FCs and CCs with their DIS IDs, DIS addresses, and DIS statuses.

The self-monitor component monitors the status of the DISs, CCs, and FCs. Since FCs are developed by the developer of the target application, maintaining the FCs is the responsibility of those developers; if any FC temporarily leaves or fails, its recovery is their responsibility. If the DIS of an FC or CC fails, the self-monitor notices this and asks the self-allocator to allocate a new DIS for the FC/CC. In this way the availability of DISs to FCs/CCs can be ensured.

The self-backup-restore component performs the backup and restore operations. Each CC replicates its running task to the self-backup-restore, and when the task completes, the self-backup-restore stores the task information in the "Backup of CCs" (Backup CCs). The Backup CCs contains the timestamp, task name/ID, related data/information, and status of every task of every CC, in a file in storage. If any CC leaves the system for upgrade/maintenance with a task left incomplete, that task can be completed later: when the CC reconnects, the self-backup-restore fetches the information of the incomplete task from the Backup CCs and restores it to the CC so that the CC can continue and complete all of its tasks. The Backup CCs also contains the code of each CC, so that if any CC fails the code can be restored and run to recover the failed CC. In the present implementation, the SSB-based CPCS needs an administrator to run the code of a failed CC. To protect against data loss due to DIS failures, there is a DIS Data Table (DISDT) that keeps a log of the data of each DIS in a file in storage. When a DIS receives any data, it replicates those data to the DISDT. If a DIS fails with undelivered data, those data can be recovered from the DISDT by the self-backup-restore component. The FCs, CCs, and SSB are located in the main memory of a computer; since our assumption is that the hardware and operating system of a CPCS do not fail, the DISDT and Backup CCs are located in storage.

[Figure 2. Architecture of the SSB-based CPCS (omitted)]

4.2 Design of Control Components

The self-allocator component is responsible for allocating DISs for FCs and CCs. Components send a joining request to the I-DIS, and the I-DIS then asks the self-allocator to allocate a DIS for that component. The self-allocator also reallocates DISs for components whose DIS fails. The self-allocator has a data table named "Lookup Table" that contains a list of authentic FCs and CCs, their corresponding DIS IDs, and the DIS statuses. When the self-allocator is asked by the I-DIS to allocate a new DIS for a specific component, it generates a random number and uses it to create IDs for that component and for its DIS. The self-allocator generates IDs for FCs as FCXXXX, for CCs as CCXXXX, and for DISs as DISXXXX, where XXXX is the random number it generated. The self-allocator then sends these IDs both to the component and to the DIS, so that the component can use its allocated DIS for communicating data and the DIS can identify the component as its parent component. The self-allocator also makes a new entry in the Lookup Table for each new allocation and initially sets the DIS status to "Running". If any DIS fails, the self-monitor detects the failure and informs the self-allocator, which then allocates a new DIS as a replacement for the failed one and updates the Lookup Table. The representation of the Lookup Table is presented in Table 2.
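
A minimal sketch of this allocation scheme in Python (our own illustration; names such as SelfAllocator.allocate are hypothetical, and the "-1" suffix for replacement DISs follows the DIS7523-1 example given below):

    import random

    class SelfAllocator:
        """Sketch: generate FCXXXX/CCXXXX and DISXXXX IDs from one random
        number and record each allocation in the Lookup Table (cf. Table 2)."""
        def __init__(self):
            self.lookup_table = []   # rows: [component name, component ID, DIS ID, DIS status]

        def allocate(self, component_name, kind):
            n = random.randint(1000, 9999)                   # the shared XXXX part
            comp_id = ("FC" if kind == "FC" else "CC") + str(n)
            dis_id = "DIS%d" % n
            self.lookup_table.append([component_name, comp_id, dis_id, "Running"])
            return comp_id, dis_id    # sent to both the component and its DIS

        def reallocate(self, failed_dis_id):
            """Replace a failed DIS (e.g. DIS7523 -> DIS7523-1) and update the table."""
            for row in self.lookup_table:
                if row[2] == failed_dis_id:
                    row[2], row[3] = failed_dis_id + "-1", "Running"
                    return row[2]

    al = SelfAllocator()
    print(al.allocate("Functional Component A", "FC"))   # e.g. ('FC7523', 'DIS7523')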

The self-backup-restore component keeps information about the running tasks of each CC. This component has a data table named "Task Table" that contains the timestamp, task name, and related data for the running tasks of each CC. Each time a CC starts one of its tasks, it replicates the task information to the self-backup-restore component, and when the task finishes, the CC sends this information to the self-backup-restore component as well. This component holds only the information and statuses of the running tasks of the CCs, and it creates a backup of the task information in the Backup CCs. Table 3 shows the representation of the Task Table in the self-backup-restore component.

This component also has the status of each CC and the address of the backup location for each CC. If any CC temporarily leaves for maintenance/upgrade without finishing one of its tasks, the self-backup-restore component restores the task information to the CC when that CC becomes available again. If any CC fails, that CC can be recovered by running its code from the Backup CCs; in that case the self-backup-restore can restore the task information from the Backup CCs to the new CC so that the new CC can complete the unfinished tasks left by the old CC.
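
The following Python sketch (our own illustration with hypothetical names; persistence to the Backup CCs file is elided) shows the Task Table bookkeeping this paragraph describes:

    class SelfBackupRestore:
        """Sketch of the Task Table: CCs replicate running tasks here, report
        completion, and a reconnecting CC gets its unfinished tasks back."""
        def __init__(self):
            self.task_table = []   # rows: [cc_name, timestamp, task, data, status]

        def replicate(self, cc_name, timestamp, task, data):
            self.task_table.append([cc_name, timestamp, task, data, "Running"])

        def finished(self, task):
            for row in self.task_table:
                if row[2] == task:
                    row[4] = "Finished"   # also written out to the Backup CCs file

        def restore_for(self, cc_name):
            """Unfinished tasks to hand back to a CC that reconnects."""
            return [(r[2], r[3]) for r in self.task_table
                    if r[0] == cc_name and r[4] != "Finished"]

    br = SelfBackupRestore()
    br.replicate("self-allocator", "2012-04-01-10:00:00", "Task:T9", "DEF")
    print(br.restore_for("self-allocator"))   # [('Task:T9', 'DEF')]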

The self-monitor component monitors the DISs, CCs, and FCs. If any DIS or CC fails, the self-monitor reports the failure to the responsible CCs and instructs them to recover from it. The self-monitor can also identify when an FC leaves the system; the SSB-based CPCS then stores all incoming data for that FC so that when the FC becomes active again, it can receive all of its incoming data. The self-monitor monitors the DISs by sending a simple "Hello" message to them in every 30-millisecond time slot, for example, and each DIS replies with a message like "Hello from DISXXXX". The self-monitor sends these updates of the DIS statuses to the self-allocator component. If a DIS does not reply to the "Hello" message, the self-monitor sets the DIS status to "Not Responding" and sends this information to the self-allocator. If that DIS does not reply within the next two time slots, the self-monitor assumes that the DIS has failed and reports the DIS status as "Failed" to the self-allocator.

Suppose that one of the DISs, say DIS7523, fails at runtime. Then it cannot reply to the "Hello" message of the self-monitor component. After 30 milliseconds, the self-monitor will send the status of that DIS as "Not Responding" to the self-allocator, and the self-allocator updates this information in the Lookup Table.

After three time slots (30+30+30 = 90 milliseconds), the self-monitor reports the status of DIS7523 as "Failed" to the self-allocator component. The self-allocator then reallocates a new DIS for FC7523, generates the new DIS ID as DIS7523-1, and updates the Lookup Table.

Similarly, the self-monitor monitors the CCs by continuously sending "Hello" messages and receiving reply messages like "Hello from CCXXXX", and it updates the statuses of the CCs to the self-backup-restore component. If any CC leaves for upgrade/maintenance, or fails, that CC cannot reply to the "Hello" message, and if a CC does not respond for a certain period of time, the self-monitor updates the CC status to "Not active". To monitor the self-monitor component itself, the self-monitor updates its own status to the self-backup-restore component as long as it is active; if the self-monitor does not update its status for a certain period of time, the self-backup-restore component assumes that it is not active and changes the status of the self-monitor to "Not active".
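
A minimal sketch of this heartbeat rule in Python (our own illustration; one missed slot yields "Not Responding" and three consecutive missed slots yield "Failed", matching the DIS7523 example above):

    class SelfMonitor:
        """Sketch of 'Hello' polling over 30 ms time slots (cf. Section 4.2)."""
        def __init__(self, dis_ids):
            self.missed = {d: 0 for d in dis_ids}   # consecutive missed slots

        def poll(self, replies):
            """replies: set of DIS IDs that answered 'Hello from DISXXXX' this slot.
            Returns the statuses to forward to the self-allocator."""
            report = {}
            for dis in self.missed:
                if dis in replies:
                    self.missed[dis] = 0
                    report[dis] = "Running"
                else:
                    self.missed[dis] += 1
                    report[dis] = "Failed" if self.missed[dis] >= 3 else "Not Responding"
            return report

    mo = SelfMonitor(["DIS7523"])
    print(mo.poll(set()))   # after 30 ms: {'DIS7523': 'Not Responding'}
    mo.poll(set())
    print(mo.poll(set()))   # after 90 ms: {'DIS7523': 'Failed'}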

To run the SSB-based CPCS from the beginning, users first have to run the I-DIS, the self-allocator, and the self-backup-restore component. The DISDT and Backup CCs start automatically with the self-backup-restore component. In the current implementation, all of these components run on port 4444. If new components (FCs or CCs) need to join the SSB-based CPCS, they first have to make a socket connection on port 4444. The I-DIS then asks the components to provide the authentication key. After completion of the authentication process, the I-DIS asks the self-allocator to allocate DISs for the components. Once a component joins the SSB-based CPCS, the processes of allocation, backup, and replication are done automatically by the three control components of the SSB-based CPCS. Before allocating a DIS, the self-allocator also generates a unique ID for the component and its DIS. If a component becomes disconnected from the SSB-based CPCS, it has to provide its ID to the I-DIS when it comes back. When components send a joining request to the I-DIS, the I-DIS gives them two choices: a new component has to provide the authentication key, while an old component has to give the command "Reconnect FC-ID" to the I-DIS, where ID is the old ID of the FC.

4.3 Design of SSB

The SSB consists of some DISs and one Interface DIS (I-DIS). The role of the I-DIS is to provide an interface for components to join the SSB-based CPCS. We propose a simple interface for components to join the SSB-based CPCS. When a component wants to join, the I-DIS asks it to provide its name and the authentication key in the "Auth:XXX" format, where "XXX" is the secret authentication number. The administrator of the whole CPCS provides the authentication key to new FCs beforehand. If the component provides the correct authentication number, the I-DIS allows it to join the SSB-based CPCS by sending the message "Auth:OK"; otherwise, the I-DIS sends the result "Auth:NG" to the component. The I-DIS then asks the self-allocator to allocate a DIS for the component. The self-allocator allocates a new DIS, generates IDs for the DIS and the component, and sends this information to the I-DIS, which conveys it to the newly joined component. That component can then communicate in the SSB-based CPCS using its DIS.
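
A minimal sketch of this handshake in Python (our own illustration; the paper's prototype is not necessarily structured this way, and AUTH_KEY is a hypothetical stand-in for the key the administrator distributes):

    import socket

    AUTH_KEY = "123"   # hypothetical; distributed by the CPCS administrator

    def i_dis_accept_one(port=4444):
        """Accept one joining component: expect 'Auth:XXX', reply Auth:OK/Auth:NG."""
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("localhost", port))
        srv.listen(1)
        conn, _ = srv.accept()
        line = conn.recv(1024).decode().strip()        # e.g. "Auth:123"
        if line == "Auth:" + AUTH_KEY:
            conn.sendall(b"Auth:OK\n")   # next: ask the self-allocator for a DIS
        else:
            conn.sendall(b"Auth:NG\n")
        conn.close()
        srv.close()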

The features of the DISs are the same as defined in Section 2.2 (e.g., routing, replication, data delivery). To transfer data among components, we studied various IPC methods [1, 3, 14] and found that no IPC method can satisfy the requirements of an SSB by itself [16]. In our SSB-based CPCS, we used IPC sockets for communication. Sockets provide point-to-point, two-way communication between two processes; they are very versatile and are a basic component of interprocess communication. A socket is an endpoint of communication between processes. Since our SSB-based CPCS is a very simple system, at present we are using only one socket for the communication. However, as sockets cannot provide the full functionality of an SSB, we adopted an alternative way to store data temporarily and to route data for our purposes, which is explained below.

To store data temporarily in DISs, we propose writing the data to temporary files. The self-allocator allocates a file for each component to store data temporarily, and the components use those files as their DISs. When they want to send data to other components, they simply write those data to their allocated files. According to our communication protocols, the data then reach the correct destination and are removed from the files. Whenever data are written to the DIS files, our communication protocol replicates those data to the DISDT. The DISs learn their parent component and the destination DISs from the Lookup Table in the self-allocator component.
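
The sketch below (our own Python illustration; the file names and record format are hypothetical) shows the essence of this file-based DIS: every record is replicated to the DISDT log before it is written to the DIS file, and delivered records are removed:

    import os

    def write_to_dis(dis_file, header, data, disdt_file="DISDT.log"):
        """A component 'sends' by appending to its DIS file; the record is
        replicated to the DISDT log first, so a DIS failure cannot lose it."""
        record = header + "|" + data + "\n"
        with open(disdt_file, "a") as log:
            log.write(record)          # replication to storage (R2)
        with open(dis_file, "a") as dis:
            dis.write(record)          # temporary preservation in the DIS

    def drain_dis(dis_file):
        """Deliver and remove everything currently preserved in the DIS file."""
        with open(dis_file) as dis:
            records = dis.readlines()
        open(dis_file, "w").close()    # delivered data are removed from the DIS
        return records

    write_to_dis("DIS0001.tmp", "<FC0001,FC0002>", "hello")
    print(drain_dis("DIS0001.tmp"))    # ['<FC0001,FC0002>|hello\n']
    os.remove("DIS0001.tmp"); os.remove("DISDT.log")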

As we mentioned above, when a component joins the SSB-based CPCS, the self-allocator component generates IDs for that component and for its DIS and informs the component of its ID. That means each component knows the ID generated for it by the self-allocator component. If a new component has to replace an old component due to the old component's failure, or if a component becomes active again after temporarily leaving the SSB-based CPCS, that component has to send a request to the I-DIS again to join the SSB-based CPCS. In that case, the component has to give the ID of the old component to the I-DIS to prove that it is not a fake component. If it provides the correct ID, it is considered a valid component, and the undelivered data for the old component are transferred to the new component.

4.4 Communication Protocols

In our SSB-based CPCS, two communicating components transfer data/messages between them along the following route:

sender component → sender DIS → receiver DIS → receiver component.

The sender component first sends the data to its own DIS. The IDs of the sender component and receiver component are included in each message. Since the I-DIS informs each DIS of the ID of the component for which it is allocated, each DIS knows its respective component as its "parent component". By reading the ID of the sender component, a DIS can identify whether a message comes from its parent component. If it does, the DIS reads the ID of the destination component and finds the DIS ID of the destination component in the Lookup Table. The data are then sent from the sender DIS to the receiver DIS. The receiver DIS reads the sender ID and receiver ID, finds that the receiver ID matches its parent component, and sends the data to the receiver component. The DISs find and confirm the receiver DIS/component from the Lookup Table in the self-allocator component; since the Lookup Table maintains the list of component IDs and their respective DISs, the DISs can easily extract the correct sender and receiver IDs and send data/messages to the correct destination.

The message format consists of two parts, a header and data. The header contains the IDs of the sender and receiver components, like <sender ID, receiver ID>; the data part contains the actual message to be sent. For communication between any two components, the message format looks as follows:

header = <sender ID, receiver ID>
data = <message/data/instructions>
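
A minimal encode/parse sketch for this format in Python (our own illustration; the on-the-wire encoding of the prototype is not specified in the paper):

    def encode(sender_id, receiver_id, payload):
        """Build a message with the <sender ID, receiver ID> header plus data."""
        return "<%s,%s>%s" % (sender_id, receiver_id, payload)

    def decode(message):
        """Split a message back into (sender ID, receiver ID, payload)."""
        header, payload = message[1:].split(">", 1)
        sender_id, receiver_id = header.split(",")
        return sender_id, receiver_id, payload

    msg = encode("FC0001", "FC0002", "P=20,Q=44,R=23")
    print(decode(msg))   # ('FC0001', 'FC0002', 'P=20,Q=44,R=23')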

According to this communication protocol, acknowledgment is done in two steps. In the first step, when the receiver DIS receives the data, it sends an acknowledgment toward the sender component, because once the receiver DIS has received the data, delivering them to the receiver component is the responsibility of the receiver DIS. The sender DIS then forwards this acknowledgment to the sender component.

ACK1: receiver DIS → sender DIS → sender component

In the second step, when the receiver component receives the data, it sends an acknowledgment message to the receiver DIS.

ACK2: receiver component → receiver DIS

The DISs redeliver data to the destination every 15 milliseconds, for example, until they receive an acknowledgment from the destination DIS/component, and each DIS deletes the data after receiving the acknowledgment. In both acknowledgment cases, if a DIS does not receive the acknowledgment within 15 milliseconds, it redelivers the data, trying at most three times per acknowledgment (at most six times in total). If it receives the acknowledgment in the meantime, it stops the redelivery. If no acknowledgment is received within the redelivery period, the DIS assumes that the receiver DIS/component is temporarily unavailable or has failed and stops trying to redeliver the data.
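
A minimal sketch of this resend-until-acknowledged rule in Python (our own illustration; the timeout and retry bound are parameters, and the paper's prototype is not necessarily structured this way):

    import time

    def send_with_retry(send, ack_received, timeout_s=0.015, max_tries=3):
        """Resend every 15 ms (for example) until an acknowledgment arrives,
        giving up after a bounded number of tries."""
        for _ in range(max_tries):
            send()
            deadline = time.time() + timeout_s
            while time.time() < deadline:
                if ack_received():
                    return True     # the DIS may now delete the data
                time.sleep(0.001)
        return False  # receiver assumed temporarily unavailable or failed

    # usage with a fake acknowledgment that arrives on the second attempt
    state = {"tries": 0}
    ok = send_with_retry(lambda: state.update(tries=state["tries"] + 1),
                         lambda: state["tries"] >= 2)
    print(ok)   # True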

4.5 Backup and Replication

In our SSB-based CPCS, when a DIS receives data, it replicates those data to the DIS Data Table (DISDT) in storage. Inside the DISDT there is a table for each DIS that contains the timestamps, data/instructions, and status of the data ("Delivered", "Undelivered") for that DIS. The DISDT has a linked list to keep track of the table of each DIS, and a field for the new DIS ID, so that if a DIS fails and a new DIS is allocated as its replacement, this information can be tracked easily. The replication process is not interrupted if several DISs receive data at the same time; it can replicate the data of all DISs to their corresponding data tables.

Table 4 shows an example of the DIS table for a DIS, where the timestamp is represented in YYYY:MM:DD-HH:MM:SS format, ABC represents the data, and the status is shown as undelivered. When a DIS fails, the self-allocator immediately allocates a new DIS to replace the old one and updates its Lookup Table. The self-allocator then sends a message about this replacement to the self-backup-restore component, which adds the information about the new DIS to the linked list of the DISDT. The self-backup-restore also checks the data table of the old DIS for any data that remained undelivered. If it finds undelivered data, it collects those data and sends them to the new DIS, so that the new DIS can deliver to the destination the data that would otherwise have been lost due to the failure of the old DIS. Each CC replicates the information of its running task to the self-backup-restore component, which then creates an entry for this task information in the Backup CCs in storage. The Backup CCs is a list of the backed-up tasks of all CCs and the statuses of those tasks; it is a log of tasks for each CC that contains timestamps, task information, related data, and the status of the tasks ("Completed", "Incomplete"). When the task of any CC finishes, the self-backup-restore component updates the information of that task in the Backup CCs. If any CC leaves for upgrade/maintenance, then when the CC becomes active again, the self-backup-restore component restores to it the information of incomplete tasks that was backed up in the Backup CCs. The Backup CCs has a separate backup table for each CC; Table 5 shows an example of the backup table of the self-allocator component.

The Backup CCs also has the code of each CC, so that if any CC fails at runtime, that CC can be recovered by running the code and restoring the information of the last task the CC was performing.
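
The DISDT-based recovery of undelivered data described above can be sketched as follows in Python (our own illustration; the DISDT is modeled as an in-memory dictionary rather than the on-disk structure of the prototype):

    def recover_undelivered(disdt, failed_dis, new_dis_queue):
        """Hand the undelivered records of a failed DIS to its replacement DIS.
        disdt: {dis_id: [(timestamp, data, status), ...]}"""
        for ts, data, status in disdt.get(failed_dis, []):
            if status == "Undelivered":
                new_dis_queue.append(data)    # to be redelivered by the new DIS

    disdt = {"DIS7523": [("2012:04:01-10:00:00", "ABC", "Undelivered"),
                         ("2012:04:01-10:00:01", "DEF", "Delivered")]}
    queue = []
    recover_undelivered(disdt, "DIS7523", queue)
    print(queue)   # ['ABC']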

[Figure 3. The four functional components used in the experiment (omitted)]

5 EVALUATION

5.1 Experiment

In this section we show the evaluation of our proposed SSB-based CPCS. As we mentioned in Section 1, continuous functioning is the base of a PCS; through the evaluation we show the continuous functioning of the SSB-based CPCS, i.e., continuous functioning without stopping the reactions and continuous functioning during maintenance or upgrade of the system.

Since by continuous functioning we mean that the functional components (FCs) of the SSB-based CPCS can perform their tasks persistently and continuously, for the evaluation we used four FCs and showed that they can provide continuous functioning without stopping their reactions and during maintenance or upgrade of the system.

As shown in Figure 3, the four functional components (FCs) are FC A, FC B, FC C, and FC D. Three constant values x (10), y (20), and z (30) exist in FC B, FC C, and FC D, respectively. First, three random numbers P, Q, and R are given to FC A as input. FC A then sends these numbers to FC B and sends the instruction to calculate the value of Px*Qy+Rz together with FC C and FC D. Finally, FC D produces the output as instructed by FC A. While the four FCs were performing this task, we tried to interrupt it by producing the events we identified in Section 3 and observed whether the FCs could work persistently and continuously. Here, "task" means any specific job of a component: each component has some specific job to produce some result, and such a job is referred to as a "task". In our evaluation, each functional component performs some calculation; for example, after receiving the values of P, Q, and R, FC B calculates the value of Px and then sends Px, Q, and R to FC C, so calculating Px is one task of FC B. Through this experiment we measured the following quantities for each event:

Continuous Functioning = (Number of Outputs / Number of Inputs) x 100% ... (1)

This means that if input is given to FC A five times and FC D can produce output for each of the five inputs, our design satisfies the feature of continuous functioning during all events. Secondly, we measured the correctness ratio of the data to observe whether the FCs give correct output during their continuous functioning.

Correctness Ratio = (Number of Correct Outputs / Total Number of Outputs) x 100% ... (2)

Last of all, we measured the time delay, that is, the total time required to produce the output from the time the input was given.

Time Delay = Time of Output - Time of Input ... (3)
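
To make the task and these metrics concrete, the following Python sketch (our own illustration; the real calculation is spread across FC A-D rather than computed in one place) reproduces the outputs of Table 6 and the metric calculations:

    def task_output(P, Q, R, x=10, y=20, z=30):
        """The FCs' calculation Px*Qy+Rz, i.e., (P*x)*(Q*y) + R*z."""
        return (P * x) * (Q * y) + (R * z)

    print([task_output(*i) for i in [(20, 44, 23), (42, 19, 17), (13, 5, 26)]])
    # [176690, 160110, 13780] -- matches the outputs in Table 6

    inputs_given, outputs_produced, correct_outputs = 5, 5, 5
    print(outputs_produced / inputs_given * 100)     # Continuous Functioning: 100.0
    print(correct_outputs / outputs_produced * 100)  # Correctness Ratio: 100.0
    print(80 / 5)                                    # Average Delay (Table 6): 16.0 ms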

The experiment was done for the following events: DIS failure, leave of FCs, and leave of CCs. The inputs were given five times for each event, and we used the same input data for each event. First, we measured the output and delay for the general situation, that is, without any interruption; this is summarized in Table 6. From this table, we observed that the delay time is 16 ms for each input. This delay time means how long it takes to produce the final result after the three numbers P, Q, and R are given as input. We then measured the output, its correctness, and the time delay for each event with the same set of inputs. For each event, we calculated the total delay so that we could find the average delay time for each situation.

Average Delay = 80/5 = 16.0 ms

From the above calculation we see that 16.0 ms is the ideal task execution time when there is no interruption.

Table 7 represents the measured data for the event of DIS failure at runtime. After providing input to FC A, we intentionally deleted the DISs of the four FCs in turn, and in the fifth run we deleted all DISs to observe the output.

From Table 7, we measured the continuous functioning, correctness ratio, and average delay. The input was given five times and the FCs produced output for each input, so the continuous functioning is 100%. We also observed that the FCs produced correct output, so the correctness ratio is also 100%. Finally, we measured the average delay, which is 36.43 seconds.

Continuous Functioning = 5/5 x 100% = 100%

Correctness Ratio = 5/5 x 100% = 100%

Average Delay = 182172/5 = 36434.4 ms = 36.43 sec

Table 8 represents the measured data for the events where FCs are not available. First we disconnected FC A at runtime and observed the result; then we disconnected FC B, FC C, and FC D in turn. Lastly, we disconnected FC B, FC C, and FC D together at runtime. During all of these events the FCs could provide 100% continuous functioning with 100% correctness, and the average delay is 18.6 ms.

Continuous Functioning = 5/5 x 100% = 100%

Correctness Ratio = 5/5 x 100% = 100%

Average Delay = 93/5 = 18.6 ms

Our next target was to observe the outputs during the leave of CCs. First, we disconnected the self-monitor component at runtime. Since each component works independently, the leave of the self-monitor did not affect the tasks of the FCs. The results of the five calculations are given in Table 9.

Continuous Functioning = 5/5 x 100% = 100%

Correctness Ratio = 5/5 x 100% = 100%

Average Delay = 155/5 = 31.0 ms

Then we observed the outputs for the event of the leave of the self-allocator component. As in the previous event, we disconnected the self-allocator component at runtime and each time we got 100% continuous functioning; the output is represented in Table 10.

Continuous Functioning = 5/5 x 100% = 100%

Correctness Ratio = 5/5 x 100% = 100%

Average Delay = 110/5 = 22.0 ms

5.2 Result

The overall results for each of the events are summarized in Table 11. From this table, it can be seen that during each type of event (DIS failure, leave of destination FCs, leave of CCs) the SSB-based CPCS provides 100% continuous functioning with 100% correctness, where the delays vary per event. This means that our proposed design can satisfy continuous functioning during events in which, in general, a system cannot provide continuous functioning.

Through the evaluation we showed that, according to our proposed design, the FCs can provide their services persistently during the events of their leave or their DIS failure. By removing CCs at runtime while the FCs were performing their tasks, we showed that if any CC leaves the system at runtime, it does not affect continuous functioning. That means that if any CC leaves the SSB-based CPCS for maintenance or upgrade at runtime, the FCs can still provide their services continuously. Therefore, our proposed design can also satisfy the characteristic of dynamically adaptive functioning. From the above result it is clear that our proposed design is a primitive SSB-based CPCS that has the fundamental features of a PCS.

5.3 Discussion

In the work of Selim et al. [17], one of the features was "continuous availability" of the SSB. By the term "continuous availability" they meant that "a component must not wait for unlimited/long time to get services from SSB". In our work, "continuous functioning" means that, by using the SSB-based CPCS, the functional components of target applications can perform their tasks continuously and persistently.

They measured continuous availability from the viewpoint of a component in two cases: DISs fail/arrive but components do not, and components fail/arrive but DISs do not. For the first case, as the DIS arrival/failure rate increases, availability decreases from 99.97% to 99.96% and data delivery efficiency is approximately 100%. For the second case, the continuous availability rate is 99.97% and the delivery efficiency is not affected. In our experimental results, the rate of continuous functioning is 100% in various scenarios. They also measured scalability and message overhead of the SSB, which were not considered in our experiment, and they measured performance via the propagation delay between components/DISs, with an average propagation delay of 35 ms. In our work, the delay in the general situation is 16 ms, and in various events the average delay varies from 18.6 ms to at most 36.43 sec.

6 CONCLUDING REMARKS

This paper proposed a primitive Soft System Bus based Centralized Persistent Computing System (SSB-based CPCS) and evaluated its design. The primitive SSB-based CPCS is proposed to fulfill the fundamental characteristics of Persistent Computing Systems (PCSs), i.e., persistently continuous functioning and dynamically adaptive functioning. This paper concentrated on these two fundamental characteristics because without them a computing system cannot be regarded as a PCS.

We have implemented a prototype of the proposed SSB-based CPCS and evaluated the implementation to show the useful features of our design. We conducted an experiment to measure the continuous functioning of our proposed SSB-based CPCS during various events. Our experimental results showed that the proposed SSB-based CPCS can provide continuous functioning during all of the events.

This paper showed that our proposed SSB-based CPCS can satisfy the fundamental feature during the occurrence of most of the events. The delay during those events is higher in comparison with the normal execution time, and it is highest in the event of DIS failure. In future work, this limitation should be addressed for better results.

A fully featured SSB-based CPCS has to satisfy some additional features such as scalability, access control and security, a unified interface, and so on. A requirement analysis and design to realize such a fully featured SSB-based CPCS have been proposed [16]; implementation of that proposed design is left for future work.

REFERENCES

[1.] Silberschatz, A., Galvin, P. B., Gagne, G.: Operating System Concepts. 7th edition, John Wiley and Sons (2005).

[2.] Chen, C., Jia, W., Zhou, W.: A reactive system architecture for building fault-tolerant distributed applications. In: Journal of Systems and Software (JSS), vol. 72, no. 3, pp. 401-415, Elsevier, Netherlands (2004).

[3.] Bovet, D. P., Cesati, M.: Understanding the Linux Kernel, 3rd edition, O'Reilly Media (2005).

[4.] Cao, F., Singh, J. P.: Efficient Event Routing in Content-Based Publish-Subscribe Service Networks. In: INFOCOM 2004, 23rd Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2, pp. 929-940, IEEE Communications Society, Hong Kong (2004).

[5.] Cheng, J.: Connecting Components with Soft System Busses: A New Methodology for Design, Development, and Maintenance of Reconfigurable, Ubiquitous and Persistent Reactive Systems. In: Proc. 19th IEEE-CS International Conference on Advanced Information Networking and Applications, vol. 1, pp. 667-672, IEEE CS Press, Taiwan (2005).

[6.] Cheng, J.: Comparing Persistent Computing with Autonomic Computing. In: Proc. 11th IEEE-CS International Conference on Parallel and Distributed Systems, vol. 2, pp. 428-432, IEEE CS Press, Japan (2005).

[7.] Cheng, J.: Persistent Computing System as Continuously Available, Reliable and Secure Systems. In: Proc. 1st International Conference on Availability, Reliability and Security, pp. 631-638, IEEE CS Press, Austria (2006).

[8.] Cheng, J.: Persistent Computing Systems Based on Soft System Buses as an Infrastructure of Ubiquitous Computing and Intelligence. In: Journal of Ubiquitous Computing and Intelligence, vol. 1, no. 1, pp. 35-41, American Scientific Publishers (2007).

[9.] Cheng, J.: Temporal Relevant Logic as the Logical Basis of Anticipatory Reasoning-Reacting Systems. In: D. M. Dubois (Ed.), Computing Anticipatory Systems, CASYS 2003, 6th International Conference, AIP Conference Proceedings, vol. 718, pp. 362-375, The American Institute of Physics, Belgium (2004).

[10.] Selim, M. R., Endo, T., Goto, Y., Cheng, J.: A Comparative Study between Soft System Bus and Traditional Middlewares. In: Proc. On the Move to Meaningful Internet Systems and Ubiquitous Computing, vol. 4278, pp. 1264-1273, Springer-Verlag, France (2006).

[11.] Selim, M. R., Endo, T., Goto, Y., Cheng, J.: Distributed Hash Table Based Design of Soft System Buses. In: Proc. 2nd International Conference on Scalable Information Systems, ACM International Conference Series, vol. 304, article no. 78, ACM Press, China (2007).

[12.] Selim, M. R., Goto, Y., Cheng, J.: A Replication Oriented Approach to Event Based Middleware Over Structured Peer-to-Peer Networks. In: Proc. 5th International Workshop on Middleware for Pervasive and Ad-Hoc Computing, pp. 61-66, ACM Press, USA (2007).

[13.] Selim, M. R., Goto, Y., Cheng, J.: Ensuring Reliability and Availability of Soft System Bus. In: Proc. 2nd IEEE International Conference on Secure System Integration and Reliability Improvement, pp. 52-59, The IEEE Reliability Society and The IEEE Systems, Man, and Cybernetics Society, Japan (2008).

[14.] Matthew, N., Stones, R.: Beginning Linux Programming. 4th edition, John Wiley and Sons (2007).

[15.] Eugster, P. Th., Felber, P. A., Guerraoui, R., Kermarrec, A.-M.: The Many Faces of Publish/Subscribe. In: ACM Computing Surveys, vol. 35, no. 2, pp. 114-131, ACM, USA (2003).

[16.] Jabeen, Q. M., Azim, M. A., Goto, Y., Cheng, J.: A Centralized Persistent Computing System. In: A. Abd Manaf, et al. (Eds.), Informatics Engineering and Information Science, International Conference, Communications in Computer and Information Science, vol. 251, pp. 604-617, Springer-Verlag (2011).

[17.] Selim, M. R.: A Peer-to-Peer Network Based Middleware for Large-Scale Persistent Computing Systems. Ph.D. thesis, Saitama University Graduate School of Science and Engineering, Japan (2008).

Quazi Mahera Jabeen, Muhammad Anwarul Azim, Yuichi Goto and Jingde Cheng

Department of Information and Computer Sciences

Saitama University, Saitama, 338-8570, Japan

{mahera, azim, gotoh, cheng}@aise.ics.saitama-u.ac.jp
Table 1. Event Analysis Table

Communication   Topology
Scenario

                one-to-one                  many-to-one

FC-FC           1. Source DIS fails         1. Any Source DIS fails
                2. Destination DIS fails    2. Destination DIS fails
                3. Destination FC           3. Destination FC
                  temporary leaves or         temporary leaves or
                  fails                       fails
CC-CC           1. Source DIS fails         1. Any Source DIS fails
                2. Destination DIS fails    2. Destination DIS fails
                3. Destination CC being     3. Destination CC being
                  upgraded, maintained or     upgraded, maintained
                  failed                      or failed
CC-FC           1. Source DIS fails         1. Any Source DIS fails
                2. Destination DIS fails    2. Destination DIS fails
                3. Destination FC           3. Destination FC
                  temporary leaves or         temporary leaves or
                  fails                       fails
FC-CC           1. Source DIS fails         1. Any Source DIS fails
                2. Destination DIS fails    2. Destination DIS fails
                3. Destination CC being     3. Destination CC being
                  upgraded, maintained or     upgraded, maintained
                  failed                      or failed

Table 2: Lookup Table in self-allocator component

Component Name           Component ID   DIS ID    DIS Status

Functional Component A   FCXXXX         DISXXXX   Running
Functional Component B   FCXXXX         DISXXXX   Not Responding
self-monitor             CCXXXX         DISXXXX   Running

self-backup-restore      CCXXXX         DISXXXX   Failed

Table 3: Task Table in self-backup-restore component

CC Name               Timestamps            Task      Task Status

self-allocator        YYYY-MM-DD-HH:MM:SS   Task:T2   Finished
self-monitor          YYYY-MM-DD-HH:MM:SS   Task:T3   Finished
self-monitor          YYYY-MM-DD-HH:MM:SS   Task:T7   Running

self-allocator        YYYY-MM-DD-HH:MM:SS   Task:T9   Not Finished
self-backup-restore   YYYY-MM-DD-HH:MM:SS   Task:T5   Running

Table 4. DISDT: DIS Table for a DIS

Timestamps            Data/Instructions   Status

YYYY-MM-DD-HH:MM:SS   ABC                 Undelivered

Table 5. Backup CC: Backup Table of self-allocator

Timestamps            Task   Related Data   Task Status

YYYY-MM-DD-HH:MM:SS   T6     DEF            Incomplete
YYYY-MM-DD-HH:MM:SS   T3     ABC            Completed

Table 6. General Output

Input                   Output   Delay

Input 1: (20, 44, 23)   176690   16 ms
Input 2: (42, 19, 17)   160110   16 ms
Input 3: (13, 5, 26)    13780    16 ms
Input 4: (5, 37, 9)     37270    16 ms
Input 5: (6, 26, 21)    31830    16 ms

Total =                          80 ms

Table 7. Output for the Event of DIS Failure

Event                  Input      Output     Delay

DIS of FC A Failed   20, 44, 23   176690     16 ms
DIS of FC B Failed   42, 19, 17   160110    781 ms
DIS of FC C Failed   13, 5, 26    13780    64734 ms
DIS of FC D Failed    5, 37, 9    37270    45844 ms
All DIS Failed       6, 26, 21    31830    70797 ms

Total =                                    182172 ms

Table 8. Output for the Event of Leave of FCs

Event                                Input      Output   Delay

FC A left at runtime               20, 44, 23   176690   32 ms
FC B left at runtime               42, 19, 17   160110   15 ms
FC C left at runtime               13, 5, 26    13780    15 ms
FC D left at runtime                5, 37, 9    37270    16 ms
FC B, FC C, FC D left at runtime   6, 26, 21    31830    15 ms

Total =                                                  93 ms

Table 9. Output for the Event of Leave of Self-Monitor Component

Input          Output   Delay

(20, 44, 23)   176690   47 ms
(42, 19, 17)   160110   31 ms
(13, 5, 26)    13780    31 ms
(5, 37, 9)     37270    31 ms
(6, 26, 21)    31830    15 ms

Total                   155 ms

Table 10. Output for the Event of Leave of Self-Allocator Component

Input          Output   Delay

(20, 44, 23)   176690   16 ms
(42, 19, 17)   160110   16 ms
(13, 5, 26)     13780   31 ms
(5, 37, 9)      37270   16 ms
(6, 26, 21)     31830   31 ms

Total =                110 ms

Table 11. The Overall Experiment Result

Events                            Continuous    Correctness    Average
                                  Functioning      Ratio        Delay

DIS Failed at runtime                100%          100%       36.43 sec
FCs left at runtime                  100%          100%        18.6 ms
Self-Monitor left at runtime         100%          100%        31.0 ms
Self-Allocator left at runtime       100%          100%        22.0 ms