# Distributed representations based on geometric algebra: the continuous model.

Authors revise the concept of a distributed representation of data as well as two previously developed models: Holographic Reduced Representation (HRR) and Binary Spatter Codes (BSC). A Geometric Analogue ([GA.sub.c]--"c" stands for continuous as opposed to its discrete version) of HRR is introduced--it employs role-filler binding based on geometric products. Atomic objects are real-valued vectors in n-dimensional Euclidean space while complex data structures belong to a hierarchy of multivectors. The paper reports on a test aimed at comparison of [GA.sub.c] with HRR and BSC. The test is analogous to the one proposed by Tony Plate in the mid 90s. We repeat Plate's test on [GA.sub.c] and compare the results with the original HRR and BSC--we concentrate on comparison of recognition percentage for the three models for comparable data size, rather than on the time taken to achieve high percentage. Results show that the best models for storing and recognizing multiple similar structures are [GA.sub.c] and BSC with recognition percentage highly above 90. The paper ends with remarks on perspective applications of geometric algebra to quantum algorithms.Keywords: distributed representation of data, geometric algebra, HRR, BSC, scaling

Povzetek: Clanek se ukvarja s porazdeljeno predstavitvijo podatkov, ki uporablja geometrijsko algrebro.

1 Introduction

Distributed representations of data are very different from traditional structures (e.g. trees, lists) and complex structures bare little resemblance to their components, therefore great care must be taken when composing or decomposing a complex structure. The most widely used definition of a distributed representation is due to Hinton et al. [13]. In a distributed representation of data each concept is represented over a number of units and each unit participates in the representation of some number of concepts. The size of a distributed representation is usually fixed and the units have either binary or continuous-space values. In most distributed representations only the overall pattern of activated units has a meaning.

Let us consider an example of storing the following in formation: "Fido bit Pat". The action in this statement is bite and the features (i.e. roles) of this action are an agent and an object, denoted [bite.sub.agt] and [bite.sub.obj], while their fillers are Fido and Pat respectively. If we consider storing the way that the action is performed, we can add a third feature (role), e.g. [bite.sub.way]. If we store Fido, Pat, [bite.sub.agt] and biteobj as vectors, we are able to encode "Fido bit Pat" as

[bite.sub.agt] * Fido + [bite.sub.obj] * Pat.

The operation of binding, denoted by "*", takes two vectors and produces another vector, often called a chunk of a sentence. It would be ideal for the resulting vector not to be similar to the original vectors but to have the same dimensions as the original vectors. Superposition, denoted by "+", is an operation that takes any number of vectors and creates another one that is similar to the original vectors. Usually, the superimposed vectors are already the result of the binding operation.

Irrespectively of the mathematical model, the above operations are defined in a way that allows to build complex statements, such as "John saw Fido bit Pat":

John * [see.sub.agt] + ([bite.sub.agt] * Fido + [bite.sub.obj] * Pat) * [see.sub.obj].

In order to decode informatiom we have to use the operation of unbinding--it is the inverse (an exact inverse or a pseudo-inverse) of binding enabling us to extract an information from a complex statement, provided that we have one of the bound vectors or a very similar vector as a cue. Marking the unbinding operation by "#" we obtain the following answer to "Who bit Pat?":

([bite.sub.agt] * Fido + [bite.sub.obj] * Pat) # [bite.sub.agt] = Fido'.

We cannot definitely say that the resulting vector Fido' will be an exact copy of Fido, as even an optimal scheme will generate a considerable amount of noise. Since we cannot expect that a noisy decoded information will be identical to what was encoded, we have to rely heavily on various similarity measures--they vary mostly by time taken by computation and the accuracy.

Clean-up memory is an auto-associative collection of all atomic objects and complex statements produced by the system. Given a noisy extracted vector such structure must he able to recall the most similar item stored or indicate, that no matching object had been found.

Independently of the scheme considered, any representation should possess the following qualities

--composition and decomposition--rules of composilion and decomposition must be applicable to all elements of the domain, irrespectively of the degree of complication of a given element. Further, decomposition should support structure-sensitive processing.

--fixed size--structures of different degree of complication should take up the same amount of space in order to facilitate generalization. In the [GA.sub.c] model this feature has been given up. Still, structures of different complexity will be of the same type.

--similarity--the representation scheme should provide a quick way to compute similarity between analogous structures (e.g. Fido bit Pat Smith and Fido bit John).

--noise reduction--decomposed statements should resemble their original counterpart.

--productivity--the model should be able to construct complex nested structures using a set of only few rules.

As far its previously developed models are concerned, Holographic Reduced Representations (HRR), Binary Spatter Codes (BSC), and Associative-Projective Neural Networks (APNN) are distributed representations of cognitive structures where binding of role-filler codevectors maintains predetermined data size. In HRR [23] binding is performed by means of circular convolution

(x [??] y) j = [n-1.summation over k=0] [x.sub.k][y.sub.j]-k mod n.

of real n-tuples or, in 'frequency domain', by component-wise multiplication of (complex) n-tuples,

([x.sub.1],..., [x.sub.1]) [??] ([y.sub.1],..., [y.sub.n] = ([x.sub.1] [y.sub.1],..., [x.sub.n] [y.sub.n].

Bound n-tuples are superposed by addition, and unbinding is performed by an approximate inverse. A dual formalism, where real data are bound by componentwise multiplication, was discussed by Gayler [9]. In BSC [14, 15] one works with binary n-tuples, bound by componentwise addition rood 2,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (1)

and superposed by pointwise majority-rule addition; unbinding is performed by the same operation as binding. APNN, introduced and further developed by Kussul [16] and his collaborators [17], employ binding and superposition realized by a context-dependent thinning and bitwise disjunction, respectively. As opposed to HRR and BSC, APNN do not require an unbinding procedure to retrieve component codevectors from their bindings. A detailed comparison of HRR, BSC and APNN can be found in [24].

2 Geometric Algebra

One often reads that the above models represent data by vectors, which is not exactly true. Given two vectors one does not know how to perform, say, their convolution or componentwise multiplication since the result depends on basis that defines the components. Basis must be fixed in advance since otherwise all the above operations become ambiguous. It follows that neither of the above reduced representations can be given a true and meaningful geometric interpretation. Geometric analogues of HRR [5] can be constructed if one defines binding by the geometric product, a notion introduced in 19th century works of Grassmann [11] and Clifford [8].

The fact that a geometric analogue of HRR is intrinsically geometric may be important for various conceptual reasons--for example, the rules of geometric algebra may be regarded its a mathematical formalization of the process of understanding geometry. The use of geometric algebra distributed representations has been inspired by a wellknown fact, that most people think in pictures, i.e. two-and three-dimensional shapes, not by using sequences of ones and zeroes. Mere strings of bits are not meaningful to (most) humans, no matter how technically advanced they are.

In order to grasp the main ideas behind a geometric analogue of HRR let us consider an orthonormal basis [b.sub.1],...,[b.sub.n] in some n-dimensional Euclidean space. Now consider two vectors x = [[summation].sup.n.sub.k=1] [x.sub.k][b.sub.k] and y = [[summation].sup.n.sub.k=1] [y.sub.k][b.sub.k]. The scalar

x x y = y x x

is known as the inner product. The bivector

x [conjunction] y = -y [conjunction] x

is the outer product and may be regarded as an oriented plane segment (alternative interpretations are also possible, cf. [7]). 1 is the identity of the algebra. The geometric product of x and y then reads

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Grassmann and Clifford introduced geometric product by means of the basis-independent formula involving the multivector

xy = x x y + x [conjunction] y (2)

which implies the so-called Clifford algebra

[b.sub.k][b.sub.l] + [b.sub.l][b.sub.k] = 2[[delta].sub.kl]1.

when restricted to an orthonormal basis. Inner and outer product can be defined directly from xy:

x x y = 1/2(xy + yx),

x [conjunction] y = 1/2(xy-yx).

The most ingenious element of (2) is that it adds two apparently different objects, a scalar and a plane element, an operation analogous to addition of real and imaginary parts of a complex number. Geometric product for vectors x, y, z can be axiomatically defined by the following rules:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [absolute value of x] is a positive scalar called the magnitude of x. The rules imply that x x y must be a scalar since

xy + yx = [[absolute value of x + y].sup.2] - [[absolute value of x].sup.2] - [[absolute value of y].sup.2]

Geometric algebra allows us to speak of inverses of vectors: [x.sup.-1] = x/[absolute value of x].sup.2]. x is invertible (i.e. possesses an inverse) if its magnitude is nonzero. Geometric product of an arbitrary number of invertible vectors is also invertible. The possibility of inverting all nonzero-magnitude vectors is perhaps the most important difference between geometric and convolution algebras.

Geometric products of different basis vectors

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

[k.sub.1] < ... < [k.sub.j], are called basis blades (or just blades). In n-dimensional Euclidean space there are [2.sup.n] different blades. This can be seen as follows. Let {[x.sub.1],..., [x.sub.n]} be a sequence of bits. Blades in an n-dimensional space can be written as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [b.sup.0.sub.k] = 1, which shows that blades are in a one-to-one relation with n-bit numbers. A general multivector is a linear combination of blades,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3)

with real or complex coefficients [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Clifford algebra implies that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4)

where [direct sum] is given by (1). Multiplication of two basis blades is thus, up to a sign, in a one-to-one relation with exclusive alternative of two binary n-tuples. Accordingly, (4) is a projective representation of the group of binary n-tuples with addition modulo 2.

The [GA.sub.c] model is based on binding defined by geometric product (4) of blades while superposition is just addition of blades (3). The discrete [GA.sub.d] [19] is a version of the [GA.sub.c] model obtained if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in (3) equal [+ or -] 1. The first recognition tests of [GA.sub.d], as compared to HRR and BSC, were described in [19]. In the present paper we go further and compare HRR and BSC with [GA.sub.c], a version employing "projected products" [5] and arbitrary real [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. We also repeat Plate's scaling test ([22], [23]--Appendix I) and compare test results for [GA.sub.c], HRR and BSC models.

Throughout this paper we shall use the following notation: "*" denotes binding roles and fillers by means of the geometric product and "+" denotes the superposition of sentence chunks. Additionally, "[??]" will denote binding performed by circular convolution used in the HRR model and [a.sup.*] denotes the involution of a HRR vector a. A "+"' in the superscript of [x.sup.+] denotes the operation of reversing a blade or a multivector [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Asking a question (i.e. decoding) will be denoted with "#". The size of a (multi)vector means the number of memory cells it occupies in computer's memory, while the magnitude of a (multi)vector V = {[v.sub.1],..., [v.sub.n]} is its Euclidean norm [square root of ([summation over.sup.n.sub.i-1 [v.sup.2.sub.i]].

For our purposes it is important that geometric calculus allows us to define in a very systematic fashion a hierarchy of associative, non-commutative, and invertible operations that can be performed on [2.sup.n]-tuples. The resulting superpositions are less noisy than the ones based on convolutions, say. Such operations arc in general unknown to a wider audience, which explains popularity of tensor and convolution algebras. Geometric product preserves dimensionality at the level [2.sup.n]-dimensional multivectors, where n is the number of bits indexing basis vectors. Moreover, all nonzero vectors are invertible with respect to geometric product, a property absent for convolutions and important for unbinding and recognition. A detailed analysis of links between [GA.sub.c], HRR and BSC can be found in [5]. In particular, it is shown that both [GA.sub.c] model and BSC are based on two different representations (in group theoretical sense) of the additive group of binary n-tuples with addition modulo 2. Actually, the latter observation was the starting point for studying geometric algebra forms of reduced representations [6].

3 The [GA.sub.c] Model

Multivector (3) associated with n-dimensional Euclidean space can be represented by the [2.sup.n]-tuple ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]). Geometric product of two such [2.sup.n]-tuples is again a [2.sup.n]-tuple. In this sense geometric product is analogous to bindings employed in HRR or BSC, but we can still proceed in several inequivalent ways. For example, since a product of two basis blades is again a basis blade multiplied by [+ or -]1, we can require that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Such a discrete version of GA HRR was tested vs. HRR and BSC in [19], and will be denoted here by [GA.sub.d] (discrete GA HRR).

The continuous [GA.sub.c] model differs greatly from [GA.sub.d]. First of all, we do not begin with a general [2.sup.n]-dimensional multivector. Atomic objects are real-valued vectors in n-dimensional Euclidean space, in practice represented by n-tuples of components taken in some basis. A hierarchy of multivectors is reserved for complex statements, formed by binding and superposition of atomic objects. An n-dimensional vector, when seen from the multivector perspective, is a highly sparse [2.sup.n]-tuple: Only n out of [2.sup.n] components can be nonzero.

The procedure we employ was suggested in [5]. The space of [2.sup.n]-tuples is split into subspaces corresponding to scalars (0-vectors), vectors (1-vectors), bivectors (2-vectors), and so on. At the bottom of the hierarchy lay vectors V [member of] [R.sup.n], having rank 1 and being denoted as [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. An object of rank 2 is created by multiplying two elements of rank 1 with the help of the geometric product. Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] be vectors in [R.sup.3]. A multivector [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] of rank 2 in [R.sup.3] comprises the following elements (cf. [18])

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

the first entry in the array on the right being a scalar and the remaining three entries being 2-blades. For arbitrary vectors in [R.sup.n] we would have obtained one scalar (or, more conviniently: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] scalars and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 2-blades.

Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] be two multivectors in [R.sup.3]. A multivector [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] of rank 3 in [R.sup.3] may be created in two ways: as a result of multiplying either [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] or [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Let us concentrate on the first case

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Here, the first three entries in the resulting matrix are 1-blades, while the last entry is a 3-blade. For arbitrary multivectors of rank 1 and 2 in [R.sup.n] we would have obtained [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] vectors and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] trivectors. We cannot generate multivectors of rank higher than 3 in [R.sup.3], but it is easy to check that in spaces [R.sup.n>3] a multivector of rank 4 would have [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] scalars, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] bivectors and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 4-blades. The number of k-blades in a multivector of rank r is described by Table 1. It becomes clear that a multivector of rank r over [R.sup.n] is actually a vector over a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-dimensional space.

As an example let us consider the following roles and fillers being normalized vectors drawn randomly from [R.sup.n] with Gaussian distribution N(0, 1/n)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

PSmith, who is a 66 year old male named Pat, is created by first multiplying roles and fillers with the help of the geometric product

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [e.sub.1],..., [e.sub.n] are orthonormal basis blades. In order to be decoded as much correctly as possible, PSmith should have the same magnitude as vectors representing atomic objects, therefore it needs to be normalized. Finally, PSmith takes the form of

PSmith = [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

[TABLE 1 OMITTED]

PSmith is now a multivector of rank 2. The decoding operation

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

will produce a multivector of rank 3 consisting of vectors and trivectors. However, the original Pat did not contain any trivector components--they all belong to the noise part and the only interesting blades in [name.sup.+] PSmith are vectors. The expected answer is a vector, therefore there is no point in calculating the whole multivector [name.sup.+] PSmith and only then comparing it with items stored in the clean-up memory. To be efficient, one should generate only the vector-part while computing name+PSmith and skip the noisy trivectors.

Let [<x>.sub.k] denote the projection of a multivector on k-blades. To decode PSmith's name we need to compute

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The resulting Pat' will still be noisy, but to a lesser degree than it would have been if the trivectors were present.

Formally, we are using a map [*.sup.1.sub.1,2] that transforms a multivector of rank 1 (i.e. an n-tuple) and a multivector of rank 2 (i.e. a (1 + (n - 1)n/2)-tuple) into a multivector of rank 1 without computing the unnecessary blades. Let X be a multivector of rank 2

X = [<X>.sub.0] + [<X>.sub.2] = [x.sub.0] + [summation over l<m] [x.sub.lm][e.sub.l][e.sub.m],

where [x.sub.lm] = -[x.sub.ml]. If A = ([A.sub.1],..., [A.sub.n]) is a decoding vector (actually, an inverse of a role vector), then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

with Y = ([Y.sub.1],..., [Y.sub.n]) being an n-tuple, i.e. a multivector of rank 1. More explicitly,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The map [*.sup.1.sub.1,2] is an example of a projected product, introduced in [5], reconstructing the vector part of AX without computing the unnecessary parts. The projected product is basis independent, as opposed to circular convolutions. In general, [*.sup.m.sub.l,k] transforms the geometric product of two multivectors [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] into a multivector [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

We now need to compare Pat' with other items stored in the clean-up memory using the dot product, and since Pat' is a vector, we need to compare only the vector part. That means, if the clean-up memory contained a multivector [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] of an odd rank, we would also need to compute Pat'. [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] while searching for the right answer.

This method of decoding suggests that items stored in the clean-up memory should hold information about their ranks, which is dangerously close to employing fixed data slots present in localist architectures. However, a rank of a clean-up memory item can be "guessed" from its size. In a distributed model we also should not "know" for sure how many parts the projected product should reject, but it can certainly reject parts spanned by blades of highest grades. Unfortunately, since the geometric product is non-commutative, questions concerning roles and fillers need to be asked on different sides of a sentence, forcing atomic objects to hold information on whether they are roles or fillers and thus, forcing them to be partly hand-generated. We can either ask question always on the same side of a sentence and be satisfied with less precise answers or always ask about only the roles or only the fillers. It becomes clear, that recognition based on the hierarchy of multivectors and the projected product is best applicable to tasks in which questions need to be asked only on one side of the sentence or in which sentences have predetermined structure.

Before providing formulas for encoding and decoding a complex statement we need to introduce additional notation for the projected product and the projection. We have already introduced the projected product [*.sup.m.sub.l,k] transforming the geometric product of two multivectors of ranks l and k into a multivector of rank m. This will not always be the case for complex statements, since we can produce a multivector that will not be of any given rank. Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] denote the projected product transforming the geometric product of a multivector A and a multivector B containing [[alpha].sub.1]-blades, [alpha].sub.2]-blades, ... and [[alpha].sub.k]-blades into a multivector [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. In this way, the projected product [*.sup.1.sub.1,2] may be written down as [*.sup.1.sub.1,{0,2}]. By analogy, let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] denote the projection of a multivector on components spanned by [alpha]1-blades, [alpha]2-blades, ... and [[alpha].sub.m]- blades.

Let [PSI] denote the normalized multivector encoding the sentence "Fido bit PSmith", i.e.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Multivector [PSI] will contain scalars, vectors, bivectors and trivectors and can be written down as the following vector of dimension [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

4 More Examples of Encoding and Decoding Sentences

The following examples illustrate various ways of asking questions in the [GA.sub.c] architecture.

"Who was bitten?"

The answer to that question will be a multivector of rank 2

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Let [bite.sub.obj] = {y.sub.1], ... [y.sub.n]} PSmith' will then have the form

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

with [[delta].sub.ijt] = [[delta].sub.ijt] = [[delta].sub.ijt]. AS previously, PSmith' should be compared with appropriate items from the cleanup memory to produce the most probable answer.

"What happened to PSmith?"

Asking about roles poses a problem of inverting a (multi)vector. Since not all multivectors are invertible, we have to be satisfied with reverses [5] of multivectors. We will need another type of a projected product: let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] denote the projected product transforming the geometric product of a multivector B containing [alpha]1k blades, [[alpha].sub.2]-blades, ... and [alpha]l-blades and a multivector A into a multivector . The answer to our question will be a vector

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

If we denote PSmith as

PSmith = [z.sub.0] + [z.sub.12][e.sub.12] + ... + [Z.sub.(n-1)][n.sup.e](n-1)n

then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

"What did Fido do?"

The last question in this example will produce an answer having the form of a vector

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

If Fido = {[v.sub.1], ..., [v.sub.n]}, then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Those tedious calculations imply that the [GA.sub.c] model is best applicable to sentences having a similar or identical complexity structure, otherwise it may be hard to automatize the process of asking questions and retrieving answers. Because of this limitation, this construction seems to be a promising candidate for a holographic database.

5 Overview of Plate's Scaling Test

Plate [23] describes a simulation in which approximately 5000 HRR 512-dimensional vectors were stored in the clean-up memory. The purpose of his simulation was to study efficiency of the HRR model but also to provide a counterexample to the claim that role-filler representations do not permit one component of a relation to be retrieved given the others. We will repeat Plate's test on several models and compare the results.

Let us consider the following atomic objects

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

At the beginning of the scaling test, relations concerning multiplication and addition are constructed. For example, "2 . 3 = 6" is constructed as

[times.sub.2,3] = times + operand * ([num.sub.2] + [num.sub.3]) +result * [num.sub.6].

Generally, relations are constructed in the following way

[times.sub.x,y] = times + operand * ([num.sub.x], + [num.sub.y]) + result * [num.sub.x,y],

[plus.sub.x,y] = plus + operand * ([num.sub.x] + [num.sub.y]) + result * [num.sub.x+y].

x and y range from 0 to 50 with y [less than or equal to] x making a total of 2501 number vectors and 2652 instances of each [times.sub.x,y] and [plus.sub.x,y]. As one can notice, the same operand role is used for both x and y to preserve commutativity of multiplication and addition.

Plate writes, that a relation can be "looked up" by supplying enough information to distinguish a specific relation from others. For example, to look up "2 x 3 = 6" one needs to find the most similar relation R to any of the following fragmentary statements

(case 1) times + operand * [num.sub.2] +operand * [num.sub.3],

(case 2) times + operand * [num.sub.2] + result * [num.sub.6],

(case 3) times + operand * [num.sub.3] + result * [num.sub.6],

(case 4) operand * [num.sub.2] + operand * [num.sub.3] + result * [num.sub.6].

Retrieving the missing piece of information in the first three cases can be done by asking any of the subquestions

(case 1) R # result,

(case 2) R # operand,

(case 3) R # operand.

Case 4 is somewhat more problematic--to look up a missing relation name (times or plus) one needs to have a separate clean-up memory containing only relation names or to use an alternative encoding in which there is a role for relation names. We will alter Plate's test by using the latter method.

Plate states that he had tried one run of the system making a query for each component missing in every relation --this amounted to 10608 queries. A further 7956 queries had been made to decode the missing component except for the relation name. Plate goes on to claim, that the system made no errors.

There appear to be two misstatements in Plate's claims. Firstly, we cannot treat subquestions regarding cases 2 and 3 separately, as there are two equally probable answers to each of these subquestions, provided that relations [R.sub.2] and [R.sub.3] point correctly to [times.sub.x,y]. Secondly, consider a fragmentary piece of information

times + operand * [num.sub.0] + result * [num.sub.0].

In this situation, the missing component can be any of the numbers [num.sub.x] where x [member of] {0, ..., 50} and thus, there are 51 atomic objects that are equally probable to be the right answer. This suggests that Plate regards several answers as valid ones, as long as the similarity of these answers exceeds some threshold. To work out the missing component, one then needs to check which of those potential answers is not in the original set used for retrieval.

Such a method of investigating scaling properties has more than a few advantages:

--Inaccuracies mentioned above act as a test if all atomic objects are created and treated equally. Ideally, every atomic object of the [num.sub.x] form should be recognized as a correct answer to the "zero problem" for number of trials/51 x 100% of the time.

--Prime numbers greater than 100 do not appear in any of [times.sub.x,y] and [plus.sub.x,y] relations, therefore they test if the model is immune to garbage data.

--Numbers ranging from [num.sub.0] to [num.sub.100] may be constructed in a multitude of ways by addition ([num.sub.0] by multiplication) and any given sentence chunk result * [num.sub.z] will appear quite often in the [plus.sub.x,y] relation. Hence, this is a great way of checking if the model deals with excessive similarity of a number of complex statements.

--Atomic objects bound with operand and result range in variety. On the other hand, there are just two atomic objects acting as an operation--does it affect in any way the recognition of operation filler? Indeed, it will be shown in Section 7 that recognition of the operation chunk turns out to be quite interesting depending on the choice of the architecture.

6 Notation

For the purpose of explaining test results, let us introduce the following notation. Let and [S.sup.+.sub.x,y] denote relations

[S.sup.*.sub.x,y] = operation * times + operand * [num.sub.x] + operand * [num.sub.y] + result * [num..sub.x,y].

[S.sup.+.sub.x,y] = operation * plus + operand * [num.sub.x] + operand * [num.sub.y] + result * [num..sub.x,y].

for y [less than or equal to] x. We chose to use a separate role for a relation name to enable encoding the information given only operands and the result. Let [F.sup.op.sub.1,x,y] denote fragmentary statements for i [member of] {1, 2, 3, 4} and op [member of] {*, +}

[F.sup.op.sub.1,x,y] = [S.sup.op.sub.x,y] - result * [num.sub.x op y],

[F.sup.op.sub.2,x,y] = [S.sup.op.sub.x,y] - operand * [num.sub.x],

[F.sup.op.sub.3,x,y] = [S.sup.op.sub.x,y] - operand * [num.sub.y],

[F.sup.op.sub.4,x,y] = [S.sup.op.sub.x,y] - operation * op.

If v is an element of the clean-up memory, then let N(v) denote the closest neighbor of v, i.e. an element of the clean-up memory that is most similar to v. If v has more than one neighbor, then all subquestions during the test are asked to all of v's neighbors. In HRR, [GA.sub.d] (with the Hamming measure of similarity) and [GA.sub.c] it is extremely unlikely for an element of the clean-up memory to have more than one neighbor due to the continuous nature of data in these architectures. Let [Q.sup.op.sub.i,x,y] = N([F.sup.op.sub.i,x,y]) for i [member of] {1, 2, 3, 4} and op [member of] {*, +}. During the test we asked subquestions concerning components missing in [F.sup.op.sub.i,x,y] and obtained the following (sets of) answers

[q.sup.op.sub.1,x,y] = N ([Q.sup.op.sub.1,x,y] # result).

[q.sup.op.sub.2,x,y] = N ([Q.sup.op.sub.2,x,y] # operand).

[q.sup.op.sub.3,x,y] = N ([Q.sup.op.sub.3,x,y] # operand).

[q.sup.op.sub.4,x,y] = N ([Q.sup.op.sub.4,x,y] # operation).

We assume that a missing component is identified correctly if it is the only neighbor to appropriate answer [q.sup.op.sub.*,x,y] or it belongs to the set of neighbors of [q.sup.op.sub.*,x,y]

7 Test Results

The software for all tests was developed by A. Patyk-Lonska in Java language. All tests were performed on an ordinary PC with dualcore AMD processor with 2 GB RAM.

Tables 2 through 4 compare scaling test results for

--[GA.sub.c] and HRR, both using dot-product as a similarity measure.

--BSC using Hamming distance as a similarity measure.

Although BSC and HRR models need only n-dimensional vectors, this is not quite the case for and [GA.sub.c], which needs 1 + n(n-1)/2 numbers to represent multivectors of rank 2 over [R.sup.n]. We present recognitions test results close to 100% and comment on vector length required for each model to achieve such percentage. The real number of memory cells used up by each model is given in brackets in the table headings.

The answers to subquestions [Q.sup.op.sub.2,x,y] # operand and [Q.sup.op.sub.3,x,y] # operand were considered to be correct if any of the two possible operands came up as the item most similar to those subquestions. In case of other questions and subquestions only exact answers were taken into consideration.

50 runs of the test were performed on each model. Unlike in Plate's test, x and y ranged from 0 to only 20. Hence, there are 401 number vectors and 462 relation vectors.

The "zero problem" is clearly visible in each tested model, as the recognition percentage of [Q.sup.*.sub.3,x,y] barely exceeds 90%. Nevertheless, [Q.sup.*.sub.3,x,y] almost always contains at least one of the operands from the original sentence [S.sup.*.sub.x,y] since the recognition percentage of [q.sup.*.sub.3,x,y] reaches 100% for sufficiently large data size. On the whole, the recognition percentage of [q.sup.*.sub.3,x,y] and [q.sup.*.sub.3,x,y] does not differ greatly from the recognition percentage of [q.sup.+.sub.2,x,y] and [q.sup.+.sub.3,x,y] in any model.

Table entries marked with a "[DELTA]" indicate that despite the wrong recognition of a fragmentary sentence, the missing component has been identified correctly. In all tested models such situations arise for sentences with one of the operands missing. For HRR, however the missing item has been "accidentally" correctly identified also in cases of missing operation * times and result * [times.sub.x,y] components. Such recognition did not occur in cases of missing operation * plus and result * [plus.sub.x,y] components, which is distressingly asymmetric.

HRR turned out to be the worst model during this experiment. The recognition percentage of [Q.sup.*.sub.1,x,y] and [Q.sup.+.sub.1,x,y] is dangerously low when compared to other Q's. Both [Q.sup.*.sub.1,x,y] and [Q.sup.+.sub.1,x,y] are retrieved from the clean-up memory given only two operands and the operation type. Since we have only two operation types, [Q.sup.*.sub.1,x,y] and [Q.sup.+.sub.1,x,y] will not differ greatly from each other. This phenomenon is also observable in BSC (but not in [GA.sub.c]), where the recognition percentage of [Q.sub.1]'s is only slightly lower than that of the other Q's. Apart from that weakness, BSC performs as well as [GA.sub.c] for adequate data size.

8 Conclusion

Authors developed a new model of distributed representations of data based on geometric algebra. Although the data representations of sentences encoded in this model may have varying lengths (as opposed to HRR and BSC), it can be justified by the fact that it is quite logical for sentences that hold more information to have larger "volume".

Tedious calculations presented in Section 3 imply that the [GA.sub.c] model is best applicable to sentences having a similar or identical complexity structure, otherwise it may be hard to make the process of asking questions and retrieving answers automatic. Because of this limitation, this construction seems to be a promising candidate for a holographic database.

Although research in distributed representations has been thriving in the past decades, no one has yet developed a software tool that would employ distributed representations to implement databases with real-life contents. Of course, some attempts at scaling has been made so far, but they were rather narrowly aimed at specific tasks. Authors hope to develop such a tool in the (near) future.

9 Further Perspectives Quantum--like Computation Based on Geometric Algebra

Quantum algorithms [17] employ tensor product binding and thus are analogous to Smolensky's tensor product representations [25]. The peculiarity of quantum computation is in its putative implementation: hardware based on the rules of micro-world automatically guarantees parallelism of processing the entire superposition of bound objects. The same property, however, makes quantum processors extremely sensitive to noise so it is by no means evident that working devices will be practically constructed.

The question is if we really have to look for micro-world implementations of quantum computation. Replacing tensor products by geometric products one obtains a one-to-one map between quantum mechanical superpositions and multivectors [2, 4], and all elementary quantum gates have geometric analogues [3]. This proves that quantum algorithms can be, in principle, implemented in systems described by geometric algebra.

Acknowledgement

This work was supported by Grant G.0405.08 of the Fund for Scientific Research Flanders.

References

[1] D. Aerts and M. Czachor (2004), "Quantum aspects of semantic analysis and symbolic artificial intelligence", J. Phys. A, vol. 37, pp. L123-L13.

[2] D. Aerts and M. Czachor (2007), "Cartoon computation: Quantum-like algorithms without quantum mechanics", J. Phys. A, vol. 40, pp. F259-F266.

[3] M. Czachor (2007), "Elementary gates for cartoon computation", J. Phys. A, vol. 40, pp. F753-F759.

[4] D. Aerts and M. Czachor (2008), "Tensor-product versus geometric-product coding", Physical Review A, vol. 77, id. 012316.

[5] D. Aerts, M. Czachor, and B. De Moor (2009), "Geometric Analogue of Holographic Reduced Representation", J. Math. Psychology, vol. 53, pp. 389-398.

[6] D. Aerts, M. Czachor, and B. De Moor (2006), "On geometric-algebra representation of binary spatter codes", preprint arXiv:cs/0610075 [cs.AI].

[7] D. Aerts, M. Czachor, and L. Orlowski (2009), "Teleportation of geometric structures in 3D", J. Phys. A vol. 42, 135307.

[8] W.K. Clifford (1878), "Applications of Grassmann's extensive algebra", American Journal of Mathematics Pure and Applied, vol. 1, 350-358.

[9] R. W. Gayler (1998), "Multiplicative binding, representation operators, and analogy", Advances in Analogy Research: Integration of Theory and Dam from the Cognitive, Computational, and Neural Sciences, K. Holoyak, D. Gentner, and B. Kokinov, eds., Sofia, Bulgaria: New Bulgarian University, p. 405.

[10] H. Grassmann (1877), "Der Oft der Hamilton'schen Quaternionen in der Ausdehnungslehre", Mathematische Annalen, vol. 3, 375-386.

[11] G. E. Hinton, J. L. McClelland and D. E. Rumelhart (1986), ""Parallel distributed processing: Explorations in the microstructure of cognition", vol. 1, 77U109, "Distributed representations", The MIT Press, Cambridge, MA.

[12] P. Kanerva (1996), "Binary spatter codes of ordered k-tuples". In C. yon der Malsburg et al. (Eds.), Artificial Neural Networks ICANN Proceedings, Lecture Notes in Computer Science vol. 1112, pp. 869-873.

[13] P. Kanerva (1997), "Fully distributed representation". Proc. 1997 Real World Computing Symposium (RWCS97, Tokyo), pp. 358-365.

[14] E. M. Kussul (1992), Associative Neuron-Like Structures. Kiev: Naukova Dumka (in Russian).

[15] E.M. Kussul and T.N. Baidyk (1990), "Design of Neural-Like Network Architecture for Recognition of Object Shapes in Images", Soviet J. Automation and Information Sciences, vol. 23, no. 5, pp. 53-58.

[16] N.G. Marchuk, and D.S. Shirokov (2008), "Unitary spaces on Clifford algebras", Advances in Applied Clifford Algebras, vol 18, pp. 237-254.

[17] M.A. Nielsen and I.L. Chuang (2000), Quantum Computation and Quantum Information. Cambridge: Cambridge University Press.

[18] A. Patyk (2010), "Geometric Algebra Model of Distributed Representations", in Geometric Algebra Computing in Engineering and Computer Science, E. Bayro-Corrochano and G. Scheuermann, eds. Berlin: Springer. Preprint arXiv: 1003.5899v1 [cs.AI].

[19] T. Plate (1995), "Holographic Reduced Representations", IEEE Trans. Neural Networks, vol. 6, no. 3, pp. 623-641.

[20] T. Plate (2003), Holographic Reduced Representation: Distributed Representation for Cognitive Structures. CSLI Publications, Stanford.

[21] D.A. Rachkovskij (2001), "Representation and Processing of Structures with Binary Sparse Distributed Codes", IEEE Trans. Knowledge Data Engineering, vol. 13, no. 2, pp. 261-276.

[22] P. Smolensky (1990), "Tensor product variable binding and the representation of symbolic structures in connectionist systems". Artificial Intelligence, vol. 46, pp. 159-216.

Agnieszka Patyk-Lonska, Marek Czachor Gdansk University of Technology ul. Narutowicza 11/12, Gdansk 80-233, Poland E-mail: {patyk, mczachor} @pg.gda.pl, http://www.pg.gda.pl/patyk http://www.mif.pg.gda.pl/kft/czachor.html

Diederik Aerts Centrum Leo Apostel (CLEA), Vrije Universiteit Brussel Krijgskundestraat 33, 1160 Brussels, Belgium E-mail: diraerts@vub.ac.be http://www.vub.ac.be/CLEA/aerts/

Received: October 23, 2011

This paper is based on A. Patyk-Lonska, M. Czachor and D. Aerts Some tests on geometric analogues of Holographic Reduced Representations and Binary Spatter Codes published in the proceedings of the Ist International Workshop on Advances in Semantic Information Retrieval (part of the FedCSIS'2011 conference).

Table 2: Recognition percentage for [GA.sub.c]. Questions [R.sup.10] [R.sup.20] (46) (191) [Q.sup.*.sub.1,x,y] 89.76% 99.98% [q.sup.*.sub.1,x,y] 39.44% 95.28% [Q.sup.*.sub.2,x,y] 91.12% 99.73% [q.sup.*.sub.2,x,y] 36.24% 83.86% [Q.sup.*.sub.3,x,y] 83.97% 91.15% [q.sup.*.sub.3,x,y] 41.27% 84.92% [Q.sup.*.sub.4,x,y] 98.90% 99.60% [q.sup.*.sub.4,x,y] 42.01% 95.56% [Q.sup.+.sub.1,x,y] 89.39% 99.99% [q.sup.+.sub.1,x,y] 39.09% 95.99% [Q.sup.+.sub.2,x,y] 86.96% 99.59% [q.sup.+.sub.2,x,y] 35.32% 83.84% [Q.sup.+.sub.3,x,y] 87.00% 99.63% [q.sup.+.sub.3,x,y] 35.12% 83.84% [Q.sup.+.sub.4,x,y] 99.05% 99.53% [q.sup.+.sub.4,x,y] 45.84% 94.73% Questions [R.sup.30] [R.sup.40] (436) (781) [Q.sup.*.sub.1,x,y] 99.99% 100.0% [q.sup.*.sub.1,x,y] 99.58% 99.88% [Q.sup.*.sub.2,x,y] 99.98% 100.0% [q.sup.*.sub.2,x,y] 97.92% 99.81% [Q.sup.*.sub.3,x,y] 91.33% 91.34% [q.sup.*.sub.3,x,y] [98.05%. [99.82%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.4,x,y] 99.63% 99.59% [q.sup.*.sub.4,x,y] 99.24% 99.52% [Q.sup.+.sub.1,x,y] 100.0% 100.0% [q.sup.+.sub.1,x,y] 99.76% 99.95% [Q.sup.+.sub.2,x,y] 99.96% 100.0% [q.sup.+.sub.2,x,y] 97.97% 99.79% [Q.sup.+.sub.3,x,y] 99.96% 100.0% [q.sup.+.sub.3,x,y] 97.98% 99.79% [Q.sup.+.sub.4,x,y] 99.51% 99.54% [q.sup.+.sub.4,x,y] 99.14% 99.49% Table 3: Recognition percentage for HRR. Questions N = 200 N = 300 [Q.sup.*.sub.1,x,y] 29.1% 27.06% [q.sup.*.sub.1,x,y] [31.08%. [30.03%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.2,x,y] 54.72% 52.06% [q.sup.*.sub.2,x,y] [98.99%. [99.92%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.3,x,y] 50.53% 47.93% [q.sup.*.sub.3,x,y] [98.92%. [99.90%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.4,x,y] 89.23% 90.56% [q.sup.*.sub.4,x,y] [90.28%. [92.69%. sup. sup. [DELTA]] [DELTA]] [Q.sup.+.sub.1,x,y] 28.26% 29.46% [q.sup.+.sub.1,x,y] 27.32% 29.37% [Q.sup.+.sub.2,x,y] 53.91% 54.48% [q.sup.+.sub.2,x,y] [98.72%. [99.90%. sup. sup. [DELTA]] [DELTA]] [Q.sup.+.sub.3,x,y] 53.73% 55.23% [q.sup.+.sub.3,x,y] [98.67%. [99.91%. sup. sup. [DELTA]] [DELTA]] [Q.sup.+.sub.4,x,y] 98.70% 98.75% [q.sup.+.sub.4,x,y] 97.16% 98.55% Questions N = 400 N = 500 [Q.sup.*.sub.1,x,y] 26.28% 28.51% [q.sup.*.sub.1,x,y] [30.30%. [32.23%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.2,x,y] 53.10% 53.32% [q.sup.*.sub.2,x,y] [99.98%. [100.0%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.3,x,y] 49.80% 51.21% [q.sup.*.sub.3,x,y] [99.97%. [100.0%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.4,x,y] 90.51% 90.29% [q.sup.*.sub.4,x,y] [92.42%. [92.31%. sup. sup. [DELTA]] [DELTA]] [Q.sup.+.sub.1,x,y] 28.03% 28.81% [q.sup.+.sub.1,x,y] 28.02% 28.80% [Q.sup.+.sub.2,x,y] 55.26% 54.68% [q.sup.+.sub.2,x,y] [99.99%. [99.99%. sup. sup. [DELTA]] [DELTA]] [Q.sup.+.sub.3,x,y] 55.34% 54.62% [q.sup.+.sub.3,x,y] [99.98%. [100.0%. sup. sup. [DELTA]] [DELTA]] [Q.sup.+.sub.4,x,y] 98.66% 98.75% [q.sup.+.sub.4,x,y] 98.64% 98.74% Table 4: Recognition percentage for BSC. Questions N = 200 N = 300 [Q.sup.*.sub.1,x,y] 86.71% 91.65% [q.sup.*.sub.1,x,y] 82.82% 90.62% [Q.sup.*.sub.2,x,y] 94.42% 97.60% [q.sup.*.sub.2,x,y] [99.68%. [99.97%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.3,x,y] 86.87% 89.43% [q.sup.*.sub.3,x,y] [99.15%. [99.47%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.4,x,y] 94.39% 95.58% [q.sup.*.sub.4,x,y] 90.78% 94.89% [Q.sup.+.sub.1,x,y] 86.38% 91.59% [q.sup.+.sub.1,x,y] 81.71% 90.28% [Q.sup.+.sub.2,x,y] 94.23% 97.77% [q.sup.+.sub.2,x,y] [99.36%. [99.94%. sup. sup. [DELTA]] [DELTA]] [Q.sup.+.sub.3,x,y] 94.54% 97.39% [q.sup.+.sub.3,x,y] [99.41%. [99.94%. sup. sup. [DELTA]] [DELTA]] [Q.sup.+.sub.4,x,y] 95.40% 95.38% [q.sup.+.sub.4,x,y] 91.81% 94.27% Questions N = 400 N = 500 [Q.sup.*.sub.1,x,y] 93.78% 94.74% [q.sup.*.sub.1,x,y] [93.87%. [94.95%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.2,x,y] 99.03% 99.44% [q.sup.*.sub.2,x,y] [99.98%. [100.0%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.3,x,y] 90.50% 90.97% [q.sup.*.sub.3,x,y] [99.65%. [100.0%. sup. sup. [DELTA]] [DELTA]] [Q.sup.*.sub.4,x,y] 95.39% 95.50% [q.sup.*.sub.4,x,y] 95.22% 95.44% [Q.sup.+.sub.1,x,y] 93.65% 94.71% [q.sup.+.sub.1,x,y] 93.27% 94.57% [Q.sup.+.sub.2,x,y] 99.19% 99.52% [q.sup.+.sub.2,x,y] [100.0%. [100.0%. sup. sup. [DELTA]] [DELTA]] [Q.sup.+.sub.3,x,y] 98.77% 99.48% [q.sup.+.sub.3,x,y] [100.0%. [100.0%. sup. sup. [DELTA]] [DELTA]] [Q.sup.+.sub.4,x,y] 95.65% 95.66% [q.sup.+.sub.4,x,y] 95.02% 95.27%

Printer friendly Cite/link Email Feedback | |

Author: | Patyk-Lonska, Agnieszka; Czachor, Marek; Aerts, Diederik |
---|---|

Publication: | Informatica |

Article Type: | Report |

Geographic Code: | 4EUBL |

Date: | Dec 1, 2011 |

Words: | 8057 |

Previous Article: | A query expansion technique using the EWC semantic relatedness measure. |

Next Article: | Experiments on preserving pieces of information in a given order in holographic reduced representations and the continuous geometric algebra model. |

Topics: |