# Novel complex valued neural networks.

Introduction

Starting in the 1950s, researchers tried to arrive at models of neuronal circuitry, and the research field of artificial neural networks was born. The so-called perceptron was shown to be able to classify linearly separable patterns. Since the Exclusive OR gate cannot be synthesized by any single perceptron (its outputs are not linearly separable), interest in artificial neural networks faded. In the 1970s, it was shown that a multi-layer feed forward neural network, such as a multi-layer perceptron, is able to classify non-linearly separable patterns.

Living systems such as Homo sapiens, lions, tigers etc. have the ability to associate externally presented one/two/three dimensional information (audio signals, images, three dimensional scenes) with the information stored in the brain. This highly accurate association of information is achieved through the bio-chemical circuitry in the brain. In the 1980s, Hopfield revived interest in artificial neural networks through a model of associative memory. The main contribution is a convergence theorem which shows that the artificial neural network reaches a memory/stable state starting from any arbitrary initial input (in a certain important mode of operation). He also demonstrated several interesting variations of associative memory. A continuous-time version of associative memory has also been described, and the celebrated convergence theorem in discrete time generalizes to the continuous-time associative memory. The model of associative memory in one dimension (Hopfield associative memory) has further been generalized to multi/infinite dimensions, and the associated convergence theorem proven.

Researchers realized that the basic model of a neuron must be modified to account for complex valued inputs, complex valued synaptic weights and thresholds. In many real world applications, complex valued input signals need to be processed by neural networks with complex synaptic weights. Thus there is a real need for the study, design and analysis of such networks. Also, in (Rama3) the results on real valued associative memories are extended to complex valued neural networks. The celebrated back propagation algorithm has been generalized to complex valued neural networks, and complex valued neural networks have been designed based on a novel model of neuron. Thus, based on the results in Section 2 and Section 3, it is reasoned that transforming real valued signals into the complex domain and processing them there could have many advantages.

This research paper is organized as follows. In Section 2, Discrete Fourier Transform (DFT) is utilized to transform a set of real/complex valued sequences into the complex valued (frequency) domain. It is reasoned that, in a well defined sense, processing the signals using complex valued neural networks is equivalent to processing them in real domain. In Section 3, a novel model of continuous time neuron is discussed. The associated neural networks (based on the novel model of neuron) are briefly outlined. In Section 4, some important generalizations are discussed. In Section 5, some open questions are outlined. The research paper concludes in Section 6.

Discrete Fourier Transform: Some Complex Valued Neural Networks:

In the field of Digital Signal Processing (DSP), discrete sequences are processed by discrete time circuits such as digital filters. The transform which converts time domain information into the frequency domain is called the Discrete Fourier Transform (DFT). One of the main reasons for utilizing the DFT in many applications is the existence of a fast algorithm to compute it, called the Fast Fourier Transform (FFT). In the following, we provide the mathematical expressions for the DFT as well as the Inverse Discrete Fourier Transform (IDFT) of a discrete sequence $\{x_n\}_{n=0}^{M-1}$, i.e. $\{x_0, x_1, x_2, \ldots, x_{M-1}\}$.

DFT: $X(k) = \sum_{n=0}^{M-1} x(n)\, W_M^{kn}$ for $0 \le k \le M-1$ (1)

IDFT: $x(n) = \frac{1}{M} \sum_{k=0}^{M-1} X(k)\, W_M^{-kn}$ for $0 \le n \le M-1$ (2)

where $W_M = e^{-j 2\pi / M}$ (3)
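As a minimal sketch, equations (1) to (3) can be evaluated directly and compared against a library FFT. The function names `dft` and `idft` and the sample sequence are illustrative assumptions, not part of the paper; NumPy's `numpy.fft.fft` uses the same sign convention as equation (3).

```python
import numpy as np

def dft(x):
    """Equation (1): X(k) = sum_n x(n) * W_M^(k*n), W_M = exp(-2j*pi/M)."""
    x = np.asarray(x, dtype=complex)
    M = x.size
    n = np.arange(M)
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * n / M))
                     for k in range(M)])

def idft(X):
    """Equation (2): x(n) = (1/M) * sum_k X(k) * W_M^(-k*n)."""
    X = np.asarray(X, dtype=complex)
    M = X.size
    k = np.arange(M)
    return np.array([np.sum(X * np.exp(2j * np.pi * k * n / M))
                     for n in range(M)]) / M

# Hypothetical example sequence of length M = 4.
x = np.array([1.0, 2.0, 0.0, -1.0])
X = dft(x)          # frequency-domain samples
x_back = idft(X)    # round trip back to the time domain
```

The direct sums cost on the order of M^2 operations; the FFT mentioned above computes the same values faster.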

The results in this section are motivated by the question:

A. MAIN QUESTION: Consider a set of samples which are linearly separable in the M-dimensional Euclidean space. Transform the points using an invertible (bijective) linear transformation. Are the resulting samples linearly separable in the transformed domain? In answering this question, we are led to the following Lemma.

Lemma 1: Under a bijective linear transformation, linearly separable patterns in Euclidean space are mapped to linearly separable patterns in the transform space.

Proof: For the sake of notational convenience, we consider the patterns in a 2-dimensional Euclidean space. Let the bijective/invertible linear transformation be $T: \mathbb{R}^2 \to \mathbb{R}^2$.

Let the original separating line (more generally hyperplane) be given by

$W_1 X + W_2 Y = C$ (4)

The two regions in $\mathbb{R}^2$ determined by the separating line/hyperplane are:

$S_1 = \{(x, y) \mid W_1 x + W_2 y \ge C\}$

$S_2 = \{(x, y) \mid W_1 x + W_2 y < C\}$

Now let us consider the linear transformation $T: \mathbb{R}^2 \to \mathbb{R}^2$,

$(x, y) \mapsto (p x + q y,\; r x + s y)$ (6)

Let the linear transformation be represented by the following matrix:

$\begin{pmatrix} p & q \\ r & s \end{pmatrix}$ (7)

Under this transformation, the separating line coordinates become:

$\begin{pmatrix} X' \\ Y' \end{pmatrix} = \begin{pmatrix} p & q \\ r & s \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$ (8)

Thus we readily have

$X' = p X + q Y, \quad Y' = r X + s Y$ (9)

On inverting the linear transformation, we have

$\begin{pmatrix} X \\ Y \end{pmatrix} = \frac{1}{d} \begin{pmatrix} s & -q \\ -r & p \end{pmatrix} \begin{pmatrix} X' \\ Y' \end{pmatrix}$ (10)

Where d is the determinant of the matrix and is given by d = p s - q r. We thus have

$X = \frac{s}{d} X' - \frac{q}{d} Y', \quad Y = -\frac{r}{d} X' + \frac{p}{d} Y'$ (11)

Thus, substituting for X, Y in the original separating line/hyperplane $W_1 X + W_2 Y = C$, we readily have

$W_1 \left(\frac{s}{d} X' - \frac{q}{d} Y'\right) + W_2 \left(-\frac{r}{d} X' + \frac{p}{d} Y'\right) = C$ (12)

i.e. $(W_1 s - W_2 r) X' + (-W_1 q + W_2 p) Y' = C d$

From the above equations, it is clear that a point in two dimensional Euclidean space belonging to the set $S_1$ gets transformed to the point $T(x, y) = (x', y')$, i.e.

$(x, y) \in S_1 \Rightarrow T(x, y) = (x', y') \in S'_1$

where, for $d > 0$, the set $S'_1$ is given by

$S'_1 = \{(x', y') : (W_1 s - W_2 r) x' + (-W_1 q + W_2 p) y' \ge C d\}$ (13)

For $d < 0$, multiplying by $d$ reverses the inequality, so $\ge$ is replaced by $\le$ in (13). In either case $S'_1$ is a closed half-plane bounded by the transformed line, and $S_2$ is mapped to the complementary open half-plane.

* Thus we have shown that the patterns which are linearly separable in two dimensional Euclidean space will remain linearly separable after applying a bijective linear transformation to the samples.

* The above proof is easily generalized to samples in n-dimensional Euclidean space (where 'n' is arbitrary).
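The n-dimensional statement can be checked numerically with a short sketch. It assumes the transformation is given by an invertible matrix `A` and uses the transformed normal vector $A^{-T} w$ with the same threshold, which sidesteps the determinant factor in (13); all names and the random data are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                   # dimension (arbitrary choice)
w = rng.normal(size=n)                  # normal of the original hyperplane
c = 0.3                                 # threshold: w . x >= c defines S_1

points = rng.normal(size=(200, n))
# Drop points numerically on the hyperplane so comparisons are unambiguous.
points = points[np.abs(points @ w - c) > 1e-6]
labels = points @ w >= c                # True for S_1, False for S_2

A = rng.normal(size=(n, n))             # assumed bijective linear map
assert abs(np.linalg.det(A)) > 1e-6     # invertibility check

transformed = points @ A.T              # each row is A x
w_new = np.linalg.solve(A.T, w)         # w_new = A^{-T} w
# (A^{-T} w) . (A x) = w . x, so the same threshold c separates the images.
labels_new = transformed @ w_new >= c
```

Here `labels_new` reproduces `labels`, i.e. the images of the two classes lie on opposite sides of the hyperplane with normal `w_new`.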

Consider equation (1) for computing the Discrete Fourier Transform of a discrete sequence of samples $\{x(n): 0 \le n \le M-1\}$. Let the column vector containing these samples be given by Y. Also, let the column vector containing the transformed samples, i.e. $\{X(k): 0 \le k \le M-1\}$, be given by Z. It is clear that equation (1) is equivalent to the following:

Z = F Y, (14)

Where F is the Discrete Fourier Transform matrix. This matrix is invertible. Hence the transformation between the discrete sequence vectors Y, Z is bijective. Thus the above Lemma applies.
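A small sketch of equation (14), assuming the entries of F are $W_M^{kn}$ from equations (1) and (3). The helper name `dft_matrix` and the test vector are illustrative; the inverse is written as the conjugate transpose divided by M, which follows from equation (2).

```python
import numpy as np

def dft_matrix(M):
    """Return the M x M matrix F with entries F[k, n] = exp(-2j*pi*k*n/M)."""
    k = np.arange(M).reshape(-1, 1)
    n = np.arange(M).reshape(1, -1)
    return np.exp(-2j * np.pi * k * n / M)

M = 8
F = dft_matrix(M)
Y = np.random.default_rng(1).normal(size=M)   # hypothetical real samples
Z = F @ Y                                     # equation (14)

F_inv = F.conj().T / M                        # inverse DFT matrix, per equation (2)
Y_back = F_inv @ Z                            # recovers Y up to rounding
```

Since `F_inv @ F` is the identity, the map between Y and Z is a bijection, which is the property the Lemma requires.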

B. Complex Valued Perceptron:

Consider a single layer of conventional perceptrons. Let the sequence of input vectors be $\{Y_1, Y_2, \ldots, Y_L\}$. The following supervised learning procedure is utilized to classify the patterns:

* Apply the DFT to the successive input training sample vectors, resulting in the vectors $\{Z_1, Z_2, \ldots, Z_L\}$.

* Train a single layer of Complex Valued Perceptrons using the transformed sample vectors (Complex valued version of Perceptron learning law provided in [AAV] is used)

* Apply the IDFT to arrive at the proper class of training samples.

* Utilize the trained complex valued neural network to classify the test patterns.

In view of Lemma 1, the above procedure converges when the training samples are linearly separable. Thus the linearly separable test patterns are properly classified.
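The training steps of the procedure above can be sketched as follows. This is an illustration under stated assumptions, not the learning law of [AAV]: the decision rule $\mathrm{sign}(\mathrm{Re}(w^H z) + b)$ and the update $w \leftarrow w + \eta (t - y) z$ are a swapped-in rule, equivalent to a conventional perceptron on the real and imaginary parts. The data, names and parameters are hypothetical.

```python
import numpy as np

def predict(w, b, Z):
    """Assumed decision rule: sign(Re(w^H z) + b), with sign(0) taken as +1."""
    return np.where((Z @ w.conj()).real + b >= 0, 1, -1)

def train(Z, labels, eta=1.0, max_epochs=1000):
    """Perceptron-style training on complex vectors (assumed update rule)."""
    w = np.zeros(Z.shape[1], dtype=complex)
    b = 0.0
    for _ in range(max_epochs):
        if np.array_equal(predict(w, b, Z), labels):
            break                                  # all training samples correct
        for z, t in zip(Z, labels):
            y = predict(w, b, z[None, :])[0]
            if y != t:
                w += eta * (t - y) * z             # raises/lowers Re(w^H z)
                b += eta * (t - y)
    return w, b

# Hypothetical linearly separable training set in R^M with margin 0.5.
rng = np.random.default_rng(0)
M, L = 8, 40
w_true = rng.normal(size=M)
w_true /= np.linalg.norm(w_true)
Y = rng.normal(size=(L, M))
labels = np.where(Y @ w_true >= 0, 1, -1)
Y += 0.5 * labels[:, None] * w_true                # push samples off the boundary

Z = np.fft.fft(Y, axis=1)                          # step 1: DFT of each sample
w, b = train(Z, labels)                            # step 2: train in complex domain
train_pred = predict(w, b, Z)
```

Test patterns would be passed through the same DFT and `predict`; the sketch makes no claim about accuracy on patterns outside the training set.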

The above procedure is also applied to non-linearly separable patterns using a complex valued Multi-Layer Perceptron. The back propagation algorithm discussed in [Nit1, Nit2] is utilized. A detailed discussion is provided elsewhere. It is argued that the complex valued version of the back propagation algorithm converges faster than the real one. Thus, from a computational viewpoint, the above procedure is attractive.

Novel Model of a Neuron: Associated Neural Networks:

In the conventional model of a neuron, a weighted contribution (the weights being the synaptic weights) of the current input values is taken and a suitable activation function is applied. A biologically more plausible model takes the following facts into account:

* The output of a neuron depends not only on the current input value, but all the input values over a finite horizon. Thus inputs to neurons are defined over a finite horizon (rather than a single time point).

* Synapses are treated as distributed elements rather than lumped elements. Thus synaptic weights are functions defined on a finite support.

For the sake of convenience, let the input as well as synaptic weight functions be defined on the support [0, T].

A. Mathematical Model of Neuron:

Let the synaptic weights be $w_i(t)$, $1 \le i \le M$, i.e. time functions defined on the support $[0, T]$. Also, let the inputs be given by $a_i(t)$, $1 \le i \le M$. Thus, the output of the neuron is given by

$y(t) = \mathrm{Sign}\left(\sum_{j=1}^{M} a_j(t)\, w_j(t)\right)$ (15)

More general activation functions (sigmoid, hyperbolic tangent etc) could be used. The successive input functions are defined over the interval [0,T]. They are fed as inputs to the continuous time neurons at successive SLOTS. For the sake of notational convenience, we call such a neuron, a continuous time perceptron.

[FIGURE 1 OMITTED]

B. Continuous Time Perceptron Learning Law:

As in the case of "conventional perceptron", a continuous time perceptron learning law is given by:

$w_i^{(n+1)}(t) = w_i^{(n)}(t) + \eta\, (S(t) - g(t))\, a_i(t)$ (16)

where $S(t)$ is the target output for the current training example, $g(t)$ is the output generated by the continuous time perceptron and $\eta$ is a positive constant called the learning rate. For each fixed $t$, equation (16) is the conventional perceptron learning law, so the proof of convergence of the conventional law also guarantees pointwise convergence (not necessarily uniform convergence) of the synaptic weight functions.
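Equations (15) and (16) can be illustrated on a discretized time grid. The sketch below assumes that functions on [0, T] are represented by K samples, that Sign(0) is taken as +1, and that the training set is separable at every grid point; the data, grid size and names are hypothetical.

```python
import numpy as np

def sign(x):
    """Sign activation, with sign(0) taken as +1 (an assumption)."""
    return np.where(x >= 0, 1.0, -1.0)

def neuron_output(a, w):
    """Equation (15) on a grid: a and w have shape (M, K); result has shape (K,)."""
    return sign(np.sum(a * w, axis=0))

def train(inputs, targets, eta=0.5, max_epochs=500):
    """Equation (16) applied independently at each of the K grid points."""
    N, M, K = inputs.shape
    w = np.zeros((M, K))
    for _ in range(max_epochs):
        errors = 0
        for a, S in zip(inputs, targets):
            g = neuron_output(a, w)
            w += eta * (S - g) * a          # (K,) broadcasts over the M weights
            errors += int(np.count_nonzero(S != g))
        if errors == 0:
            break                           # no weight function changed this epoch
    return w

# Hypothetical data: N input slots, M inputs, K time samples on [0, T].
rng = np.random.default_rng(2)
N, M, K = 20, 3, 50
w_true = rng.normal(size=(M, K))
w_true /= np.linalg.norm(w_true, axis=0)    # unit norm at each time point
inputs = rng.normal(size=(N, M, K))
targets = sign(np.einsum('nmk,mk->nk', inputs, w_true))
inputs += 0.5 * targets[:, None, :] * w_true[None, :, :]   # margin 0.5 at each t

w = train(inputs, targets)
outputs = np.array([neuron_output(a, w) for a in inputs])
```

Because the update at one grid point never touches another, the learned weight functions converge point by point, which mirrors the pointwise convergence stated above.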

Using sigmoid function as the activation function and the continuous perceptron as the model of neuron, it is straightforward to arrive at a continuous time Multi-Layer Perceptron. The conventional back propagation algorithm is generalized to such a feed forward network.

C. Modulation Theory: Feed Forward Neural Networks:

Suppose the synaptic weight functions are chosen as sinusoids, i.e. $w_i(t) = \cos v_i t$ or $\sin v_i t$ (where $v_i = 2\pi f_i$ and the $f_i$ are the frequencies of the sinusoids). The weighted contribution at each neuron then corresponds to Amplitude Modulation (where the synaptic weight functions are the carriers and the inputs are the base band signals).
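A brief numerical sketch of this correspondence: multiplying a base band input by a sinusoidal weight moves the input's spectrum to the carrier frequency. The sampling rate, the frequencies and the single-tone input are illustrative assumptions.

```python
import numpy as np

fs = 1000                      # sampling rate in Hz (assumed)
N = 1000                       # one second of samples
t = np.arange(N) / fs

f_m, f_c = 5, 100              # base band and carrier frequencies in Hz (assumed)
a = np.cos(2 * np.pi * f_m * t)    # input a_i(t): base band signal
w = np.cos(2 * np.pi * f_c * t)    # synaptic weight w_i(t): carrier

product = a * w                # one term a_i(t) * w_i(t) of the sum in (15)

spectrum = np.abs(np.fft.rfft(product))
freqs = np.fft.rfftfreq(N, d=1 / fs)
# Frequencies of the two largest spectral components.
peaks = np.sort(freqs[np.argsort(spectrum)[-2:]])
```

By the product-to-sum identity, the energy of `product` sits at f_c - f_m and f_c + f_m, which is the double sideband picture of amplitude modulation.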

We expect that the well known results in Modulation Theory (of communication systems) could be effectively utilized in supervised learning using a single/multiple layer perceptron.

Some Important Generalizations:

* Unlike the perceptron model discussed previously (where the inputs constitute a vector), it is possible to consider the case where the inputs constitute a three/multidimensional array (for instance, in biological systems the neurons are indexed by three dimensional variables). Utilizing tensor products, the outputs of continuous time neurons are obtained. Also, using the above model of neuron, multi-layer, multi-dimensional neural networks (such as a Multi-dimensional Multi-Layer Perceptron) are designed and studied.

* Based on the above model of neuron, it is possible to consider complex valued neural networks in which the input functions, synaptic weight functions, thresholds are complex valued. It is possible to generalize the perceptron learning law, complex valued back propagation algorithm to such complex valued neural networks.

* It should be possible to design and study complex valued associative memories based on the above model of neuron.

Conclusions

In this paper, transforming real valued signals into complex domain (using DFT) and processing them using complex valued neural network is discussed. A novel model of neuron is proposed. Based on such a model real as well as complex valued neural networks are proposed. Some open research questions are provided.

References

[1] I. N. Aizenberg, N. N. Aizenberg and J. Vandewalle, "Multi-Valued and Universal Binary Neurons," Kluwer Academic Publishers, 2000.

[2] A. Hirose, "Complex Valued Neural Networks: Theories and Applications," World Scientific Publishing Company, November 2003.

[3] H. Kusamichi, T. Isokawa, N. Matsui et al., "A new scheme for colour night vision by quaternion neural network," 2nd International Conference on Autonomous Robots & Agents, Dec. 13-15, 2004, Palmerston North, New Zealand.

[4] T. Nitta and T. Furuya, "A Complex Back-propagation learning," Transactions of Information Processing Society of Japan, Vol. 32, No. 10, pp. 1319-1329 (1991) (in Japanese).

[5] T. Nitta, "An Extension of the Back-Propagation Algorithm to Complex Numbers," Neural Networks, Vol. 10, No. 8, pp. 1391-1415 (1997).

[6] G. Rama Murthy, "Multi-Dimensional Neural Networks: Unified Theory of Control, Communication and Computation," Research Monograph considered by Pearson Education Publishers, New York.

[7] G. Rama Murthy, "Multi/Infinite Dimensional Neural Networks, Multi/Infinite Dimensional Logic Theory," International Journal of Neural Systems, Vol. 15, No. 3 (2005), 1-13, June 2005.

[8] G. Rama Murthy and D. Praveen, "Complex-Valued Neural Associative Memory on the Complex Hypercube," Proceedings of 2004 IEEE International Conference on Cybernetics and Intelligent Systems (CIS-2004), Singapore.

[9] G. Rama Murthy, "Linear filter model of synapses: Associated novel real/complex valued neural networks,

A. Ganesh (1) and G. Balasubramanian (2)

(1) Asst.Professor, Department of Mathematics, The Oxford College of Engineering, Bommanahalli, Hosur Road, Bangalore-560 068, Karnataka, India (2) Associate Professor, Department of Mathematics, Govt. Arts College (Men), Krishnagiri, Tamil Nadu, India

E-mail: gane_speed@yahoo.co.in, gbs_geetha@yahoo.com

Author: Ganesh, A.; Balasubramanian, G.

Publication: International Journal of Computational and Applied Mathematics

Date: Aug 1, 2009