Optimising Rete for low-memory, multi-agent systems.
Optimisations to the standard Rete algorithm are presented as a means of bringing advanced, rule-based artificial intelligence techniques to low-memory systems--in particular, current and future generations of games consoles. A brief introduction to Rete and the issues involved in implementing it on low-memory hardware is followed by a discussion of the solutions developed. By sharing resources it is possible to enable the algorithm to run efficiently on low-memory machines. A brief overview of the implementation of a rule interpreter which makes use of these optimisations is included.
Game AI, Rule-Based Languages, Rete, Optimisation, Agents.
There are several reasons why a rule based system can be beneficial to the design of complex agent behaviour, including a declarative programming style allowing for easy maintenance and growth of the rule base, and facilities for rapid application development (RAD) offered by many rule programming environments[Jackson, 1999]. The flexibility of a rule based approach allows for complex characters to be created in computer games. However, rule based systems are sometimes inefficient and offer poor performance. The Rete[Forgy, 1982] algorithm forms the basis of a number of efficient rule based programming languages, such as OPS5[Forgy, 1981] and CLIPS[Giarratano, 2002]. The algorithm improves the performance of such systems, but at a price in terms of memory efficiency.
We have developed a rule interpreter based on the Rete algorithm, but with a number of implementation enhancements which aim to reduce the memory usage of the algorithm. The changes are based upon multiple agents sharing the resources of a single Rete and sharing representations of information about the environment. Traditionally, when several agents are developed using a rule based programming language, each is given its own Rete, and maintains a separate working memory. However, in many applications, agents share elements of behaviour to some degree (there may be a subset of rules which is common to a group of agents). By allowing these agents to share the part of the Rete which corresponds to this common behaviour, some memory savings should be possible. In a similar way, agents may observe the same facts about the world, so it makes sense to try and share working memory elements representing these facts. The main theme of this paper is to investigate the extent to which sharing may be realised in an implementation of the Rete algorithm, and to evaluate the associated trade-offs and drawbacks.
In the next section, we present a brief background to the advantages and disadvantages of rule based programming for game artificial intelligence (AI), along with the discussion of the problems that Rete was designed to overcome, and some of the drawbacks that Rete suffers from. We then present the optimisations that we have developed, some ideas for building on these optimisations, and an overview of the system we have developed.
In a rule based system, patterns in a rule are matched against facts representing the current state, in order to determine actions to be taken. A typical use of such systems is in implementing agents which inhabit an environment. When a rule matches against the currently known facts, a series of actions associated with that rule is executed. Usually there are mechanisms to deal with more than one rule matching the environment (conflict resolution), and, often, some way to prioritise rules. Actions can also cause changes to be made to the database of known facts, which in turn will cause different rules to be executed. An example of a rule and some facts which would cause the rule to fire are given below (in a simplified CLIPS syntax):
(defrule killed-enemy
  (enemy (name ?X) (health 0))
  (target (name ?X) (is-dead no))
  =>
  (assert (target (name ?X) (is-dead yes))))

(assert (target (name "Jon") (is-dead no)))
(assert (enemy (name "Jon") (health 0)))
The patterns come before the => symbol while the actions come after it. When the enemy is killed (health reaches zero), a fact indicating this is added to the working memory of the agent. The contents of working memory are then matched against the patterns in the rules to decide which action to perform next. The patterns show the layout of the facts which they match, using variables (denoted by a ? prefix) to extract information from the fields of the facts.
The heart of a rule based system is the matching engine which compares the currently known facts against the patterns contained in the rules. As there may be a very large number of rule patterns, and an even larger number of rapidly changing facts, it is crucial that this matching engine works as efficiently as possible, so that the agent may respond quickly to changing situations.
THE RETE ALGORITHM
The Rete algorithm was developed to solve the problem of quickly matching facts against rule patterns. It works by maintaining a cache of intermediate results generated during the matching process. For instance, in our example, a fact may be asserted which states that a certain enemy has become the current target. However, at this time, the enemy is not dead and so our rule will not be activated. In a naive implementation of a rule system, we would keep comparing this fact against the rule pattern on every cycle. In the Rete algorithm, however, the fact is matched once, and the value for the variable X is stored away. Later, if the other fact is asserted, then we simply compare the values for X from each match and, if they are equal, the rule is activated.
The Rete algorithm dramatically reduces the number of matches which have to be performed on each cycle by maintaining a directed graph data structure containing the cached information. The first level nodes in the graph are called alpha-nodes. Each alpha node represents a single pattern. Where there is more than one pattern in a rule, the alpha nodes are joined by beta-nodes which carry out unification between two nodes. Beta nodes are then joined by further beta nodes until a single node is reached. This final (terminal) node represents the complete rule.
When a fact is asserted, a token is created and passed into the alpha nodes of the Rete which deal with this type of fact. The token is a wrapper around the fact and contains extra information such as whether the fact is being asserted or retracted. If the token matches the pattern in an alpha node, then it is passed on to the next node, while the value of any variables in the pattern are stored in a table, called a node-memory. The beta-nodes take rows from the tables of the alpha nodes and try to join them together, making sure that variables with the same name are assigned the same value. For instance, if a row in one table contains the values A=1, B=2, and a row in another table contains the values A=1, C=3, then a new row will be created in the table of the beta-node containing the values A=1, B=2, C=3, and the token is passed on. If the values for A had been different then this row would not have been created, and the token would not have been passed on. When a token has passed all the way through the Rete it emerges at a terminal node which represents a particular rule. For a token to reach a terminal node, all the patterns in that rule must have matched, and so the rule can be activated. See figure 1 for an example. The key advantage of Rete is that rule conditions are only re-evaluated when a fact is asserted or deleted. In this way, asserting a new fact is simply a case of passing a token through the network, and a smaller number of matching operations are performed. In a naive implementation, each new fact would be compared against every single pattern of every rule, which means a greater time complexity. Retracting a fact is identical to assertion, but items are removed from node memories.
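The matching process described above can be sketched in a few lines. This is an illustrative, much-simplified model, not the actual implementation: facts are dictionaries, patterns map field names to constants or "?"-prefixed variables, and the function names (alpha_match, beta_join) are our own.

```python
def alpha_match(pattern, fact):
    """Match one fact against one pattern, returning variable bindings or None."""
    bindings = {}
    for field, value in pattern.items():
        if isinstance(value, str) and value.startswith("?"):
            bindings[value] = fact.get(field)   # bind variable to field value
        elif fact.get(field) != value:
            return None                         # constant test failed
    return bindings

def beta_join(left_rows, right_rows):
    """Join two node memories, keeping rows whose shared variables agree."""
    joined = []
    for l in left_rows:
        for r in right_rows:
            if all(l[v] == r[v] for v in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined

# Node memories for the two patterns of the killed-enemy rule.
enemy_pattern = {"type": "enemy", "name": "?X", "health": 0}
target_pattern = {"type": "target", "name": "?X", "is-dead": "no"}
enemy_memory, target_memory = [], []

# Asserting a fact only touches the memories whose pattern it matches;
# previously cached rows are reused rather than re-matched on every cycle.
for fact in [{"type": "target", "name": "Jon", "is-dead": "no"},
             {"type": "enemy", "name": "Jon", "health": 0}]:
    for pattern, memory in [(enemy_pattern, enemy_memory),
                            (target_pattern, target_memory)]:
        row = alpha_match(pattern, fact)
        if row is not None:
            memory.append(row)

# A row surviving the final join corresponds to a token reaching the
# terminal node, i.e. an activation of the rule.
activations = beta_join(enemy_memory, target_memory)
```

Here both facts bind ?X to "Jon", so the join succeeds and the rule activates; had the names differed, no row would have been created in the beta memory.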
However, the Rete algorithm has been criticised, as it uses a large amount of memory in order to maintain this cache of intermediate results. The algorithm does not scale up well to large rule databases, as the memory usage increases dramatically. The space complexity of Rete is of the order of O(RFP), where R is the number of rules, F is the number of asserted facts, and P is the average number of patterns per rule, compared to just O(F) for a naive implementation of a rule-based system without Rete. This also inhibits its use in low-memory systems, such as computer games consoles. In addition, the Rete algorithm is designed primarily for single agent systems--traditionally, each agent would have a separate Rete and maintain a separate working memory, and yet many agents may share similar rules. In order to improve the memory efficiency of Rete, a few other algorithms have been developed. These include TREAT[Miranker, 1987], which removes beta-nodes and recalculates caches as needed, and the more recent Rete*[Wright and Marshall, 2003] which has TREAT as a special case, and allows for a degree of control over the trade-off between memory consumption and matching speed. These algorithms have had varying degrees of success--for a comparison of Rete and TREAT and a discussion of the issues involved see [Nayak et al., 1988]. However, there are a number of implementation optimisations which can be used to reduce the memory overhead of the original Rete algorithm. This paper briefly documents a few such optimisations developed as part of a project to develop a low-memory, multi-agent rule interpreter for use in console games.
The techniques discussed are based on the following observations:
** Rete networks are expensive in terms of memory usage.
** Agents often have identical rules.
** Different facts may contain the same data.
** Several agents may observe the same fact.
** A fact which was true once may well become true again in the future.
A number of different implementation optimisations can be developed based on these observations. The key principle in all of the techniques is to try and maximise sharing of resources in the Rete, without too much of a speed penalty being introduced. We have developed solutions based on three main areas:
** Sharing the Rete between several agents--i.e. sharing the rule structures.
** Sharing working memory items (facts) between agents.
** Sharing values of fields between facts, and between elements of node memories in the Rete.
Bit-strings are used within our implementation of Rete to flag the validity of facts or rules for particular agents--a setting of 1 in a particular bit position shows the validity of this item for the agent which owns this bit position.
By using bit-strings we can use a single Rete to represent the rules of multiple types of agent. This works by assigning a unique bit-mask to each agent, which has a single bit set in it, and assigning bit values to nodes in the Rete as rules are compiled. With this method we can restrict the path that a token can take through the network, depending on which agent has asserted the fact. When the token is passed through the network, an additional check is performed at each node to see if the agent which is asserting the fact has this rule as part of its behaviour. The check is a simple bit-wise AND operation of the bit-string of the node and the bit-string of the agent. If the result is non-zero then the node is valid for this agent. If a node (or even a whole rule) is shared between two or more agents, then we simply perform a bit-wise OR operation of the strings of each agent, and the resulting bit-string is stored in the node. See Figure 2 for an example.

The main benefit of having one Rete network (with n agent masks controlling access to parts of the structure) rather than having n separate Rete networks (and so, no masks needed) is that in this way many individual rules can be shared among agents. Typically, the size of the structures which represent a rule in the Rete (p alpha nodes, and p-1 beta nodes, where p is the number of patterns in the left-hand side of the rule) would be much greater than the combined size of the bit-strings needed (one for each node, so 2p-1 bit-strings, typically 4 bytes each; a full cost analysis of Rete is given in [Wright and Marshall, 2003]). It thus makes sense to use the bit-strings, rather than duplicate an entire rule. It is possible to reduce the overhead of bit-strings further by only marking alpha-nodes in this way, and assuming that once a token has passed an alpha node it is contained within the structure of a particular rule, and so cannot be passed to an invalid node.
This assumption only holds, however, if no sharing of nodes between rules is performed. However, some systems, such as Jess[Friedman-Hill, 2000] do share nodes between rules (see http://herzberg.ca.sandia.gov/jess/docs/52/rete.html for details), and so it would be necessary to mark all beta nodes as well.
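The bit-string validity check can be sketched as follows. This is a minimal illustration of the scheme, assuming each agent owns one bit position and each node stores the OR of the masks of every agent whose behaviour includes it; the node names are invented for the example.

```python
# Each agent is assigned a unique single-bit mask.
AGENT_A = 1 << 0
AGENT_B = 1 << 1
AGENT_C = 1 << 2

# Each node stores the bit-wise OR of the masks of the agents that use it.
node_masks = {
    "alpha-killed-enemy": AGENT_A | AGENT_B,  # rule shared by agents A and B
    "alpha-flee":         AGENT_C,            # rule unique to agent C
}

def node_valid_for(node, agent_mask):
    """A node is valid for an agent iff the bit-wise AND is non-zero."""
    return (node_masks[node] & agent_mask) != 0
```

A token asserted by agent A would be passed into "alpha-killed-enemy" (the AND is non-zero) but never into "alpha-flee", so the traversal is confined to the sub-network representing that agent's own rules.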
The above analysis demonstrates why it is useful to share rules. However, not all rules will be shared (it is quite likely that some agents will have unique behaviour). As there is no way to discriminate between shared and unshared rules, all must have bit-strings applied to the associated nodes in the Rete. It is clear that if very little or no sharing is actually occurring, then the bit-strings are redundant and their cost becomes an issue (the goal is to reduce memory consumption, not add to it). The point at which sharing rules becomes a problem is the point at which the memory cost of the bit-strings (in the entire Rete) is greater than the memory saved by sharing rules, i.e. when:
rs(2p - 1) >= n(ap + b(p - 1))
r = number of rules compiled into Rete,
s = size of a single bit-string,
p = average number of patterns in a rule,
n = number of shared rules (see below),
a = typical size of an alpha node,
b = typical size of a beta node.
A shared rule, as used above, is one which has no individual representation in the Rete, but uses an existing representation of an identical rule (from a different agent). In this way, we measure only the memory saved--each shared rule must rely on one compiled rule, which does take up memory. In other words, if we add three identical rules to the Rete (and no identical rule already exists), then we have increased the value of r by 1 and the value of n by 2.
By changing the relation to equality, we can calculate the ratio of shared rules to compiled rules (n/r) that marks the lower bound for rule sharing to be cost effective:
n/r = s(2p-1)/[ap + b(p-1)]
As an example, if we assume an average of 5 patterns per rule, take the size of a bit-string to be 4 bytes (an unsigned int), and the size of alpha and beta nodes to be 100 bytes each (a conservative estimate), then the ratio needed is:
n/r = [4 x (2 x 5 - 1)]/[100 x 5 + 100 x (5 - 1)] = 36/900 = 1/25
So, in order for sharing to be effective in this example, there would have to be at least one shared rule for every 25 compiled into the Rete. It seems quite likely that this could be achieved in a real application; for example, opponents in first-person shooter (FPS) games such as Unreal Tournament often exhibit identical, or very similar, behaviour in a given situation.
It is easy to adapt the relation to show the total amount of memory that would be saved by sharing in a particular situation. That is given by the amount of memory saved, minus the overhead of using bitstrings:
M_saved = n(ap + b(p - 1)) - rs(2p - 1)
So, using the values above, with 50 compiled rules, and 5 un-compiled (shared) rules, the amount of memory saved would be:
5 x (100 x 5 + 100 x 4) - 50 x 4 x (2 x 5 - 1) = 4500 - 1800 = 2700 bytes
while the cost of the Rete in this example would be around 45000 bytes (50 x (100 x 5 + 100 x 4)), giving approximately a 6% saving, from a system with 10% sharing (5/50).
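The two formulas above are simple enough to check mechanically. The following sketch (function names are ours) evaluates the break-even ratio n/r and the memory saved, using the paper's symbols: s = bit-string size, p = patterns per rule, a and b = alpha and beta node sizes, r = compiled rules, n = shared rules.

```python
from fractions import Fraction

def breakeven_ratio(s, p, a, b):
    """Lower bound on n/r for rule sharing to be cost effective."""
    return Fraction(s * (2 * p - 1), a * p + b * (p - 1))

def memory_saved(n, r, s, p, a, b):
    """M_saved = n(ap + b(p-1)) - rs(2p-1)."""
    return n * (a * p + b * (p - 1)) - r * s * (2 * p - 1)

# Values from the worked example in the text.
ratio = breakeven_ratio(s=4, p=5, a=100, b=100)      # Fraction(1, 25)
saved = memory_saved(n=5, r=50, s=4, p=5, a=100, b=100)  # 2700 bytes
```

Note how sensitive the ratio is to the node-size estimate: halving a and b to 50 bytes doubles the required sharing ratio to 2/25.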
In evaluating the utility of sharing rules, we must also consider the added time required to check the bit-strings when processing a token in the Rete. The checks involve a single simple bitwise operation for each node visited. Compared to the pattern matching and join operations which take place in each node, this extra (very fast) operation should be insignificant. The combined Rete which represents the rules of every agent is likely to be significantly more complex than a Rete which represents just one agent (as there are more rules compiled into it). However, the bit-strings effectively limit the nodes that a token can visit, and so the traversal is not significantly more complex.
In a similar way, it is possible to share working memory items amongst several agents. When an agent asserts a new fact, its bit-string is stored with the fact. If another agent asserts the same fact, then its bit-string is combined with the current fact bit-string, using an OR operation, and the resulting string becomes the new fact bit-string. When unification takes place in beta-nodes, the bit-strings associated with each constituent token are combined using an AND operation, and this result is stored with the node memory. In this way, the new bit-string represents only those agents which knew every fact that was involved in the unification. If this process results in a zero-valued bit-string, then the unification is not valid for any agent, and so can be discarded. (Alternatively, the result can be kept in the anticipation that the conditions may become true for some agent in the near future. A separate garbage-collection sweep can be scheduled to collect zero-valued memory elements when memory is tight.)
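The OR-on-assert, AND-on-join discipline above can be illustrated with the same mask conventions. This is a toy sketch under the assumption that facts are hashable tuples; the names are invented for the example.

```python
AGENT_A, AGENT_B = 1 << 0, 1 << 1

fact_masks = {}  # fact -> bit-string of agents that know it

def assert_fact(fact, agent_mask):
    """Asserting a known fact ORs the agent's mask into the fact bit-string."""
    fact_masks[fact] = fact_masks.get(fact, 0) | agent_mask

assert_fact(("enemy", "Jon", 0), AGENT_A)
assert_fact(("enemy", "Jon", 0), AGENT_B)      # shared fact: masks are OR-ed
assert_fact(("target", "Jon", "no"), AGENT_A)  # only agent A knows this

# Unification in a beta node ANDs the constituent masks: the result is
# valid only for agents that knew every fact involved in the join.
joined_mask = fact_masks[("enemy", "Jon", 0)] & fact_masks[("target", "Jon", "no")]
```

Here the join is valid only for agent A; a zero-valued result could be discarded immediately, or kept for later garbage collection as described above.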
We can calculate the expected space savings of using bit-strings with facts in the same way as we did before for rules. In this case, the total amount of memory that facts added to the Rete take up is given by:
rf(αp + β(p - 1))
r = number of rules compiled into Rete,
f = number of asserted facts,
p = average number of patterns in a rule,
α = typical size of an alpha memory element,
β = typical size of a beta memory element.
This assumes the worst case situation, where each fact causes an item to be added to every node memory (which is unlikely). The space taken up by adding bit-strings to facts is given simply by the space of assigning a bit-string to each fact, and the space required for storing bit-strings in alpha and beta nodes.
sf + rsf(2p - 1)
where s is the size of the bit-string, as before. So, the point at which sharing facts becomes cost effective, memory-wise, is the point at which:
rg(αp + β(p - 1)) >= sf + rsf(2p - 1)
where g is the number of shared facts. The memory saved in a particular situation is then given by:
M_saved = rg(αp + β(p - 1)) - (sf + rsf(2p - 1))
As a worked example, assume p = 5 and s = 4 bytes as before, a Rete with 50 rules, 20 facts, and 5 shared facts, and take α and β to be 40 bytes each (note that the memory elements must be appreciably larger than the bit-strings themselves for sharing to pay off). Then the memory saved would be:
50 x 5 x (40 x 5 + 40 x 4) - (4 x 20 + 50 x 4 x 20 x (2 x 5 - 1)) = 90000 - 36080 = 53920 bytes
However, there are some drawbacks to this approach. The first (and most obvious) is how to tell whether two facts are identical and so can be shared. With complex facts it is costly to perform comparisons between facts to check for equality, and using sophisticated matching techniques such as hashing or tree structures would probably use up more memory than is saved. One solution to this problem would be to only share facts where you can ensure equality of reference. For instance, if the same fact is to be asserted for a group of agents, then the fact structure could be created and stored and the same structure passed in for each agent. Another situation might be where agents can share knowledge with each other (through primitive communication). In this situation the agent which has the knowledge could just update the bit-string on the fact to represent the agents it has just "told" this information to.
At first glance, it may seem that some speed increases could also be achieved through the use of this technique. If an agent asserts a fact which is already known to some other agent, then we could simply synchronise the state of the Rete node memories for both agents. The problem with this approach is that it is the combination of facts which determines the conflict set (the set of rules which could fire for an agent), rather than a single fact, so we still need to perform a full Rete traversal for each agent that asserts a fact, to perform unification with the other facts that the agent knows. We are currently working on ways to avoid this traversal, based on using a shared conflict set for all agents, and storing back-pointers to fact bit-strings in node memories instead of the bit-strings themselves. In this way, we can scan the conflict set for a particular agent and recompute the unified bit-strings for a particular rule activation to determine if the rule is valid for this agent.
A technique used in some modern scripting language interpreter designs is to share values in the system by means of a reference-counting system. For instance, Tcl[Ousterhout, 1990] uses such a system, described in [Lewis, 1996]. A value stored in the field of one fact (such as a name field) may also be stored in another field in another fact (such as a target field which identifies the name of an agent's current target in a shoot-'em-up style game). Rather than duplicating all of this data, it is possible to share it by having it stored in one central location, and have facts store a reference to it. A simple reference counting garbage collection scheme can be used to ensure that resources are freed at the correct time. This technique is only useful when data types are complex (e.g. strings, objects). With simple data types (integers, symbols) the amount of memory required to store them (an integer is 4 bytes) would be equal to, or even smaller than, the memory taken up by pointers in the reference counting scheme (a pointer is typically also 4 bytes). So, value sharing is only useful where a majority of data types are complex.
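The value-sharing scheme can be sketched as an intern table with reference counts. This is a minimal illustration under our own naming (ValueTable, acquire, release); in a C implementation acquire would return a pointer into the table rather than the value itself.

```python
class ValueTable:
    """Central store of shared values with reference counting."""

    def __init__(self):
        self._refs = {}  # value -> reference count

    def acquire(self, value):
        """Store (or re-use) a value, bumping its reference count."""
        self._refs[value] = self._refs.get(value, 0) + 1
        return value

    def release(self, value):
        """Drop a reference; free the entry when the count reaches zero."""
        self._refs[value] -= 1
        if self._refs[value] == 0:
            del self._refs[value]

    def count(self, value):
        return self._refs.get(value, 0)

table = ValueTable()
name = table.acquire("Jon")    # stored in an enemy fact's name field
target = table.acquire("Jon")  # re-used by a target fact: one copy, count 2
table.release(name)
table.release(target)          # count hits zero: entry is freed
```

Only one copy of the string "Jon" is ever stored, however many fact fields refer to it, and the entry disappears as soon as the last referring fact is retracted.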
We have implemented a prototype rule interpreter based around the optimisations presented. The system (known as GORE--the Game Oriented Rule Engine), is a language neutral library which can be linked into an existing language interpreter (such as a scripting language) to provide pattern matching facilities suitable for the creation of rule based artificial intelligence programs. The library provides an API which can be used to create rules, assert and retract facts and to run a pattern matching cycle to select a rule to run. Callbacks are provided to allow a script associated with a rule to be run when the rule fires. This API approach is very flexible, as it allows the developer to choose the language in which the actions of the rules can be implemented, and allows any action in that language to be performed. We have implemented a prototype of the library, along with a binding to the Tcl scripting language, as a proof of concept which can run toy examples with all the optimisations enabled.
We have presented a number of optimisations which can be used to enable rule based interpreters built upon the Rete algorithm to be brought to low-memory systems, in particular to computer games consoles. The techniques are designed to minimise the memory usage of the algorithm by ensuring a high degree of sharing of resources, while limiting the performance penalty of the changes. Each technique can be implemented separately and is useful independently of the others.
The techniques we described were developed for the Rete algorithm. However, other algorithms (such as TREAT and Rete*) which have much in common with Rete from an implementation point of view, may be able to have the same techniques applied. We are investigating the possibility of applying these methods to TREAT and particularly Rete*.
We are currently improving the performance of the prototype system, and making comparisons to existing Rete implementations in order to test the effectiveness of the optimisations.
[Forgy, 1982] Forgy, C. (1982). RETE: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19:17-37.
[Forgy, 1981] Forgy, C. L. (1981). OPS5 user's manual. Department of Computer Science Technical Report, Carnegie-Mellon University.
[Friedman-Hill, 2000] Friedman-Hill, E. J. (2000). Jess, the Java expert system shell. Available from http://herzberg.ca.sandia.gov/jess/.
[Giarratano, 2002] Giarratano, J. C. (2002). CLIPS User's Guide, v6.20. Technical Report, available from http://www.ghg.net/clips/download/documentation/usrguide.pdf.
[Jackson, 1999] Jackson, P. (1999). Introduction to Expert Systems. Addison-Wesley.
[Lewis, 1996] Lewis, B.T. (1996). An on-the-fly bytecode compiler for Tcl. In Proceedings of 4th annual Tcl/Tk Workshop, pages 103-114.
[Miranker, 1987] Miranker, D. P. (1987). TREAT: A better match algorithm for AI production systems. In AAAI-87 Proceedings.
[Nayak et al., 1988] Nayak, P., Gupta, A., and Rosenbloom, P. (1988). Comparison of the RETE and TREAT production matchers for SOAR. In Proceedings of AAAI-88.
[Ousterhout, 1990] Ousterhout, J. K. (1990). Tcl: An embeddable command language. http://www.tcl.tk.
[Wright and Marshall, 2003] Wright, I. and Marshall, J. (2003). The execution kernel of RC++: RETE*: a faster RETE with TREAT as a special case. International Journal of Intelligent Games and Simulation, 2(1):36-48.
Neil Madden, School of Computer Science and IT, University of Nottingham, UK
Title annotation: Artificial Intelligence.
Date: July 1, 2006.