# An alternative approach to two phase encryption using back-propagation feed forward neural technique.

IntroductionWith the increase in the electronics eaves dropping and electronics frauds, the security measure should be efficient enough to deal with such large prevailing issues. They became of varied concerned when the data is exchanges via a heterogeneous network system.

Cryptography is probably the most important aspects of communication security and its' becoming increasing important as a basic building for a computer society. The idea is to study the protection techniques like DES algorithm, AES cipher(Symmetric Algorithm) and RSA algorithm (Asymmetric Algorithm) to protect the data. Now a days, the focus is on using the various Artificial Intelligence techniques like Artificial Neural Network(ANN) to explore there usage in cryptography.

The current research for new model of computing the ANN is motivated by searching to solve the task by exploring the developments in Computer Technology [10-12]. ANN is an attempt to mimic some or all the characteristics of Biological Neural Network. The Similarities in the Biological Neural Network and the Information Processing System can be seen as:

(1) The Keys applied to the Input data can be consider as the Hidden Neurons in the Neural Network.

(2) The Back propagation can be considered analogy to the Asymmetric Algorithms like RSA, Knapsack problem.

The main aim of the paper is to explore the suitability of the usability of a Back propagation neural network in cryptography to increase the security aspects. The constant weight being propagated back will not only save the time but also produce the almost uniform outputs for any type of the input signal received. The 2 section provide the details of the encryption Techniques. The 3 section provide the details of two-phase ANN encryption algorithm. The mathematical function is discussed in section 4 followed by the conclusion in section 5.

Encrytion Techniques

As Discussed above the encryption can be done symmetrically and asymmetrically, thus the ingredients for the encryption and decryption are more or less common. They can be define as:

1. Plain Text: The data or the information fed to the system.

2. Encryption algorithm: The algorithms contains the permutation and the substitution of data by the another variable

3. The Secret Keys: This also acts as an input to the algorithm. In case of Asymmetric cryptosystem, there will be two keys available. One kept by the sender to be named as the secret key and one published for the information to the entire user also known as the public key.

4. Cipher Text: This is the scrambled message produced as output. It depends on the plain text and the secret key.

5. Decryption algorithm: The inverse procedure of the encryption algorithm applied.

In case of the asymmetric architecture there will be two keys being used, one for the encryption named as public key and once for the decryption named as private or secret key. The model can be diagrammatically can be shown as

[FIGURE 1 OMITTED]

In this paper, we present a public key Cryptosystem, which is based on biological ideas including the network architecture, biological operations and the learning process. The complexity of the generation of the secure channel is linear with the Size of the network.

Two- Phase Ann Encryption Algorithm

The present architecture of the asymmetric encryption network when combined with ANN yields the following design as shone in figure [2].

[FIGURE 2 OMITTED]

Thus we have eight components which can be defined as:

(1) Plain Text: The readable Message or data which is given as input to algorithm

(2) Preprocessing Plaintext: This Algorithm performs transformation on plaintext so as to suitable to give as ANN input

(3) Encryption Network: Trains the ANN using the back propagation network adjusting the weights between input, hidden and output layers

(4) Public Key and private Key: as defined for the asymmetric these are used for encryption and decryption.

(5) Cipher Text: The Scrambled Message Produced as output of the encryption.

(6) Decryption Networks: The Algorithm that accepts the cipher text and the matching private key network and produces the outputs

(7) Conversion Algorithm: It takes the output of decryption ANN as input and applies transformation techniques to generate the original plain text.

The general architecture says that for every plain text X intended for destination are encrypted with two keys' namely a Public key--[KU.sub.c] & [KU.sub.N] and Private Key [KR.sub.c] & [KR.sub.N]. The private key are only known to the receiver, while the public keys' are known to the sender which can be used for sending the data. Firstly the data is converted in ANN acceptable input NX, where

NX = [T.sub.KUc] (X) (1)

This trained ANN of encrypted is further encrypted using the global public key to produce the final encrypted text 'Y' which can be specified as

Y = [E.sub.KUN] (NX). (2)

The intended receiver can decrypt the message using his/her private key as generally done in RSA algorithms.

The idea differs from the normal encryption as the degree of randomness is increased due to the hidden weights introduced but due to this the statistical nature of the plain text language is completely collapsed.

ANN Encryption key with Back Propagation Feed Forward Model

The feed forward back propagation model is very popular model in neural network, it does not have feed back connection, but the errors are back propagated during training. Back propagating learning consists of two passes through the different layers of the network, a forward pass and the backward pass. During the forward pass the synaptic weights of the network are all fixed.

For the backward pass, the synaptic weights are all adjusted in accordance with the error correction rule. The actual response of the network is subtracted from a desired (target) response to produce an error signal. We proposed that synaptic weights adjusted can be made constant so that error correction time can be reduced and also some constant output can be achieved, against the direction of synaptic connection [12].

The construction of the neural network involves:

* Determine the network properties (the network topology, the type of connections, the order of connection and the weight range).

* Determine the node properties (the activation range and the transfer function)

* Determine the system Dynamics (the weight initialization scheme, the activation calculating formula and learning rule).

The major modification is done in the gradient involves when some training data are very different from the majority of data. Convergence is some times faster if a momentum term is added to the weight updated formulas'

The major contributor to the ANN model is the activation function for a back propagation function. The function should be continuous, differentiable and monotonically non-decreasing function. The activation function used in this paper is Bipolar Sigmoid Function,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

This function has similar properties with the sigmoid function. It works well for applications that yield output values in the range of [-1,1].

[FIGURE 3 OMITTED]

Activation functions for the hidden units are needed to introduce non-linearity into the networks. The reason is that a composition of linear functions is again a linear function. However, it is the non-linearity (i.e., the capability to represent nonlinear functions) that makes multi-layer networks so powerful. Almost any nonlinear function does the job, although for back-propagation learning it must be differentiable and it helps if the function is bounded The sigmoid functions are the most common choices [5].

For the output units, activation functions should be chosen to be suited to the distribution of the target values. We have already seen that for binary [0,1] outputs, the sigmoid function is an excellent choice. For continuous-valued targets with a bounded range, the sigmoid functions are again useful, provided that either the outputs or the targets to be scaled to the range of the output activation function. But if the target values have no known bounded range, it is better to use an unbounded activation function, most often the identity function (which amounts to no activation function). If the target values are positive but have no known upper bound, an exponential output activation function can be used [5].

Back-Propagation

Back-propagation is the most commonly used method for training multi-layer feed-forward networks. It can be applied to any feed-forward network with differentiable activation functions. This technique was popularized by Rumelhart, Hinton and Williams [3].

For most networks, the learning process is based on a suitable error function which is then minimized with respect to the weights and bias. If a network has differential activation functions, then the activations of the output units become differentiable functions of input variables, the weights and bias. If we also define a differentiable error function of the network outputs such as the sum-of-square error function, then the error function itself is a differentiable function of the weights. Therefore, we can evaluate the derivative of the error with respect to weights, and these derivatives can then be used to find the weights that minimize the error function, by either using the popular gradient descent or other optimization methods. The algorithm for evaluating the derivative of the error function is known as back-propagation, because it because it propagates the errors backward through the network.

Error Function Derivative Calculation

We considers a general feed-forward network with arbitrary differentiable non-linear activation functions and a differential error function.

We know that in ANN, each unit j is obtained by first forming a weighted sum of its inputs of the form,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.1)

where [z.sub.i] is the activation of an unit, or input. We then apply the activation function

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.2)

Note that one or more of the variables [z.sub.j] in (5.1) could be an input, in which case we will denote it by [x.sub.i]. Similarly, the unit j in (5.2) could be an output unit, which we will denote by [y.sub.k].

The error function will be written as a sum, over all patterns in the training set, of an error defined for each pattern separately,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.3)

where p indexes the patterns, Y is the vector of outputs, and W is the vector of all weights. [E.sub.p] can be expressed as a differentiable function of the output variable [y.sub.k].

The goal is to find a way to evaluate the derivatives of the error functions E with respect to the weights and bias. Using (5.3) we can express these derivatives as sums over the training set patterns of the derivatives for each pattern separately. Now we will consider one pattern at a time.

For each pattern, with all the inputs, we can get the activations of all hidden and output units in the network by successive application of (5.1) and (5.2). This process is called forward propagation or forward pass. Once we have the activations of all the outputs, together with the target values, we can get the full expression of the error function [E.sub.p].

Now consider the evaluation of the derivative of [E.sub.p] with respect to some weight [w.sub.ji]. Applying the chain rule can get the partial derivatives

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.4)

where we define

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.5)

From equation (5.4) it is easy to see that the derivative can be obtained by multiplying the value of [partial derivative] for the unit at the output end of the weight by the value of z for the unit at the input end. Thus the task becomes to find the [[partial derivative].sub.j] for each hidden and output unit in the network.

For the output unit, [[partial derivative].sub.k] is very straightforward,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.6)

For a hidden unit, [[partial derivative].sub.k] is obtained indirectly. Hidden units can influence the error only through their effects on the unit k to which they send output connections so

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.7)

The first factor is just the [[partial derivative].sub.k] of unit k so

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.8)

For the second factor we know that if unit j connects directly to unit k then [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], otherwise it is zero. So we can get the following back-propagation formula,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.9)

which means that the values of [delta] for a particular hidden unit can be obtained by propagating the [delta]'s backwards from units later in the network, as shown Figure 4. Recursively applying the equation gets the S's for all of the hidden units in a feed-forward network, no matter how many layers it has.

[FIGURE 4 OMITTED]

Weight Adjustment with the Gradient Descent Method

Once we get the derivatives of the error function with respect to weights, we can use them to update the weights so as to decrease the error. There are many varieties of gradient-based optimization algorithms based on these derivatives. One of the simplest of such algorithms is called gradient descent or steepest descent. With this algorithm, the weights are updated in the direction in which the error E decreases most rapidly, i.e., along negative gradient. The weight updating process begins with an initial guess for weights (which might be chosen randomly), and then generates a sequence of weights using the following formula,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.10)

where [eta] is a small positive number called the learning rate, which is the step size we need to take for the next step. Gradient descent only tells us the direction we will move to, but the step size or learning rate needs to be decided as well. Too low a learning rate makes the network learn very slowly, while too high a learning rate will lead to oscillation. One way to avoid oscillation for large [eta] is to make the weight change dependent on the past weight change by adding a momentum term,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5.11)

That is, the weight change is a combination of a step down the negative gradient, plus a fraction [alpha] of the previous weight change, where 0 [less than or equal to] [alpha] < 1 and typically 0 [less than or equal to] [alpha] < 0.9[6].

The role of the learning rate and the momentum term are shown in Figure [3]. When no momentum term is used, it typically takes a long time before the minimum is reached with a low learning rate (a), whereas for large learning rates the minimum may be never reached because of oscillation (b). When adding a momentum term, the minimum will be reached faster (c).

[FIGURE 5 OMITTED]

There are two basic weight-update variations: batch learning and incremental learning. With batch learning, the weights are updated over all the training data. It repeats the following loop: a) Process all the training data; b) Update the weights. Each such loop through the training set is called an epoch. While for incremental learning, the weights are updated for each sample separately. It repeats the following loop: a) Process one sample from the training data; b) Update the weights.

The performance of an algorithm is very sensitive to the proper settings of the learning rates. If the rate is too high, the behavior tends to changes a lot.

Conclusion

The major purpose of this paper is to explore ANN Enciphering techniques to improve the security of the existing cryptosystem. We proposed two phase models as an enhanced cipher. But the major problem that these techniques suffered from is

1) The convergence tends to be extremely slow

2) Convergence to the global minimum is not guaranteed

The solution lies as a future course of work which can be suggested over here

1. replace the gradient descent is conjugate gradient descent

2. stochastic optimization methods such as simulated annealing

3. stochastic optimization methods such as Genetic Algorithms

This solution can overcome the local minima; have also been used successfully in a number of problems. Thus use of the Artificial Neural Network in various network Topologies can laid a new foundation which can help in generating the new security measure in the nearby future.

References

[1] Kuang-Hi Chiu, "Two Phase Encryption and its Application," Journal of Discrete Mathematical Sciences & Cyrptography,5(3),(2002),303-328

[2] A.S. Tanenbaum, "Computer Networks", Prentice-Hall, New-York,(2001).

[3] William Satllings, "Cryptography and Network Security, Principle and Practice", Pearson Education, Third Edition,(2003).

[4] William Satllings, "Cryptography", Pearson Education,(2000).

[5] G. Brassard, "Relative Cryptography", IEEE Trans. on Inf. Theory, Vol. IT 29, No.6(1983), pp.877-894.

[6] R. Rivest, "the RC5 Encryption Algorithm", Proceedings of Second International Workshop on Fast Software Encryption, Dec., 1984

[7] A. Rubin, "An Experience Teaching a Graduate Course of Cryptography," Crptologia,(1997).

[8] Wolf Gang Kinzel and Ido Kanter, "Interacting Neural Network and Cryptography", Internet paper(2003).

[9] Michal Rosen-Zvi et al., "Cryptography based on Neural Network-Analytical Results", Journal of Physics A: Mathematical and General,35,(2002), L707-L713.

[10] K. Anil Jain et al., "Artificial Neural Network: A Tutotrial," IEEE Computer (1996), 31-44

[11] George Cybenko, "Neural Network in Computational Science and Engineering," IEEE Computational Science and Engineering, (1996),36-42.

[12] Laurence Fausett, "Fundamental of Neural Networks: Architecture, Algorithms and Applications, " Prentice Hall, NJ

[13] Simon Haykins, "Neural Networks-A Comprehensive Foundation," Pearson Education,(2001).

[14] J. Singaraju, K. S. Rani, K. Usha Rani, "Two Phase Encryption using Artificial Neural Network Techniques," International Journal of Computing and Applications, Vol 4. No. 1, June 2009, pp.47-55

(1) Sumit Srivastava and (2) Ashish Sharma

(1) Department of Information Technology, Poornima College of Engineering, ISI-6, RIICO Institutional Area, Siatpura, Jaipur-302022(Raj), India

(2) Department of Computer Engineering, Jagannath Institute of Engineering and Technology, ISI-6, RIICO Institutional Area, Siatpura, Jaipur-302022 (Raj), India