# Estimators in Cryptographic Models

1. INTRODUCTION

The cryptographic measures considered are the Shannon entropy (for the key generator module) and the Renyi entropy of order α (for the key agreement protocol). It is known that the Shannon entropy is a limit case (α → 1) of the Renyi entropy. For this reason we focus on a method of estimating the Renyi entropy. This method is the Bayesian one (Preda, 1991; Guiasu, 1997), which is based on mixing the a priori information regarding θ ∈ Θ, given by the distribution π(θ), with the selection information given by X after the experimentation process. The density of X is given by

f(x) = ∫_Θ f(x|θ) π(θ) dθ   (1)

In general this density is hard to compute analytically; for this reason we must estimate it using computational techniques such as Monte Carlo methods. The a posteriori distribution of θ is

π(θ|x) = f(x|θ) π(θ) / f(x)   (2)

This paper computes Bayesian estimators for the Renyi entropy of order α, and we prove that the Bayesian estimator of the Shannon entropy is obtained as the limiting case α → 1. As the distribution on the frequency space we use the multinomial distribution, and as the a priori distribution the Dirichlet distribution, because this distribution is conjugate for the multinomial family.

2. ESTIMATORS

The fundamental problem in cryptography is the generation of a shared secret key by two parties, A and B, who initially share no secret, over an insecure channel which is under the control of an adversary E. The general mathematical model proposed by Bennett (Bennett, 1992) is one in which A and B are connected only by a public channel and E can eavesdrop on the communication.

2.1 Cryptographic Preliminaries

The general protocol takes place in a scenario where A, B and E know the correlated random variables X, Y, Z, respectively, distributed according to some joint probability distribution that may be under partial control of E (as in the case of quantum cryptography). Protocols of this kind are very often used in each part of the information transfer phase. The problem can be solved in the following phases: 1. A and B must detect any modification or insertion of messages. 2. A and B establish a secret communication key.

The first phase is called the authentication step. It can be carried out with classical statistical tests (Blahut, 1987; Maurer, 1994).

2.2 Mathematical Overview

We assume that the reader is familiar with the notion of entropy and the basic concepts of information theory (Blahut, 1987). In privacy amplification, a different, non-standard entropy measure, collision entropy, is of central importance. Collision entropy is also known as the Renyi entropy of order 2 (Cachin, 1997).

Remark. Collision entropy is the Renyi entropy of order α = 2, which is defined as H_α(X) = (1/(1 − α)) log Σ_{x∈X} P_X(x)^α. In the limit case α → 1 we get the Shannon entropy H(X) = −Σ_{x∈X} P_X(x) log P_X(x), and when α → ∞ we obtain the min-entropy H_∞(X) = −log max_{x∈X} P_X(x). We also have the inequalities log|X| ≥ H(X) ≥ H₂(X) ≥ H_∞(X), and 0 < α < β ⇒ H_α(X) ≥ H_β(X), with equality iff X is uniformly distributed over X or a subset of X.
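The chain of inequalities in the remark is easy to check numerically. The sketch below (in Python, with natural logarithms; all function names are ours) computes the Shannon, collision and min-entropies of a small distribution and verifies log|X| ≥ H(X) ≥ H₂(X) ≥ H_∞(X):

```python
import math

def shannon(p):
    """Shannon entropy H(X) = -sum p log p (natural log)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def renyi(p, alpha):
    """Renyi entropy H_alpha(X) = (1/(1-alpha)) * log(sum p^alpha), alpha != 1."""
    return math.log(sum(q ** alpha for q in p)) / (1.0 - alpha)

def min_entropy(p):
    """Min-entropy H_inf(X) = -log(max p), the alpha -> infinity limit."""
    return -math.log(max(p))

p = [0.5, 0.25, 0.125, 0.125]
H, H2, Hinf = shannon(p), renyi(p, 2), min_entropy(p)
# The chain of inequalities from the remark: log|X| >= H >= H_2 >= H_inf.
assert math.log(len(p)) >= H >= H2 >= Hinf
```

For the uniform distribution all three measures collapse to log|X|, illustrating the equality case of the remark.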

In (Pardo, 2003) Pardo generalized the concept of entropy and introduced statistical tests for (h, φ)-entropies. (Maurer, 1994) gives a universal statistical test of randomness and an estimation of the Shannon entropy of a binary source; this test also measures the effective key size for a cipher system that uses the source as key generator. Morales (Morales, 1997) developed new statistical tests for the differential entropy. The aim of our paper is to give a Bayesian estimation of a general class of entropies, which includes the Shannon entropy and the Renyi entropy of order 2, using a priori truncated distributions. If we denote α_i = απ_i, with Σ_{i=1}^{s} π_i = 1, then by straightforward calculations we have for s ≥ 2:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3)

Remarks.

a) For the estimation in the limiting case γ → 1, applying L'Hospital's rule we obtain the results of Yuan (Yuan, 1997).

b) For [gamma] = 2 we obtain the Bayesian estimation of collision entropy which is a measure of protocol security in key agreement protocols over an insecure channel.

This mathematical approach has started to be used in applications such as experimental quantum cryptography (Bennett, 1992), cryptanalysis (Simion & Constantinescu, 2000) and reliability estimation (Shalaby, 1993).

3. DEVELOPMENTS

In general f(x) and π(θ|x) are hard to compute. The prior distributions for which π(θ|x) is easy to compute are called conjugate prior distributions.

The parameter α is a measure of our confidence about the guess: a larger α implies more concentration of the prior around (π_1, ..., π_s). Another interpretation of α is that it determines the average discrepancy between (p_1, ..., p_s) and (π_1, ..., π_s). If we use the squared discrepancy Σ_{i=1}^{s} (p_i − π_i)²/π_i, π_i > 0, then the mean discrepancy with respect to the prior is (s − 1)/(α + 1), a decreasing function of α. When we do not have any prior knowledge, the uniform prior is the best choice; it is a special case of the Dirichlet distribution, namely D(1, ..., 1). The Dirichlet distribution is conjugate for the likelihood given above, that is, if the prior is Dirichlet, the posterior is also Dirichlet. More specifically, if the prior is D(α_1, ..., α_s), then, after the frequency data (n_1, ..., n_s) is incorporated, the posterior is D(α_1 + n_1, ..., α_s + n_s).
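Both the conjugate update and the mean-discrepancy value (s − 1)/(α + 1) can be checked numerically. A minimal sketch (sampling the Dirichlet by normalizing Gamma variates; the function names and the example values π = (0.5, 0.3, 0.2), α = 5 are ours):

```python
import random

def dirichlet_sample(alphas, rng):
    """Sample from Dirichlet(alphas) via normalized Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

def posterior(alphas, counts):
    """Conjugate update: D(a_1,...,a_s) prior + counts -> D(a_1+n_1,...,a_s+n_s)."""
    return [a + n for a, n in zip(alphas, counts)]

# Prior guess pi with concentration alpha, i.e. the prior D(alpha*pi_1, ..., alpha*pi_s).
pi = [0.5, 0.3, 0.2]
alpha = 5.0
prior = [alpha * q for q in pi]

# Monte Carlo check of the mean squared discrepancy (s-1)/(alpha+1) = 2/6.
rng = random.Random(0)
n_draws = 50000
acc = 0.0
for _ in range(n_draws):
    p = dirichlet_sample(prior, rng)
    acc += sum((pj - qj) ** 2 / qj for pj, qj in zip(p, pi))
mean_disc = acc / n_draws
```

With these values `mean_disc` lands close to (s − 1)/(α + 1) = 1/3, and `posterior(prior, counts)` is the exact conjugate update stated above.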

Let us denote P_γ(X) = Σ_{x∈X} P_X(x)^γ, the γ-probability. If we denote α_i = απ_i, with Σ_{i=1}^{s} π_i = 1, then by straightforward calculations we have for s ≥ 2:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4)


4. NUMERICAL SIMULATION AND CONCLUSIONS

We generated 5000 random bits, for which we recorded n_1 = 510 and n_2 = 490. The Shannon entropy, which is the limit case γ → 1 of the Renyi entropy, is H = ln 2 ≈ 0.69314718. We obtained the empirical estimator H_n = 0.69294716 and the Bayes estimator H^B_1 = 0.69229474. The latter is computed by the formula

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5)

The value of H is computed by summing all intermediate values from the tests; in this way the final value is more accurate.
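As a cross-check of the numbers above, one standard closed form for the posterior mean of the Shannon entropy under the plain (non-truncated) uniform prior D(1, ..., 1) can be sketched as follows; all function names are ours, and the derivation uses the digamma identity ψ(m + 1) = −γ + H_m, which for the integer posterior parameters a_i = n_i + 1 reduces everything to harmonic numbers:

```python
import math

def harmonic(m):
    """H_m = sum_{k=1}^{m} 1/k."""
    return sum(1.0 / k for k in range(1, m + 1))

def empirical_entropy(counts):
    """Plug-in (empirical) Shannon entropy in nats."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def bayes_entropy_uniform_prior(counts):
    """Posterior mean of the Shannon entropy under the uniform prior D(1,...,1).

    With integer posterior parameters a_i = n_i + 1 and A = sum(a_i), the
    identity psi(m+1) = -gamma + H_m gives E[H | n] = H_A - sum (a_i/A) H_{a_i}.
    """
    a = [c + 1 for c in counts]
    A = sum(a)
    return harmonic(A) - sum((ai / A) * harmonic(ai) for ai in a)

counts = [510, 490]                            # the recorded frequencies n_1, n_2
h_emp = empirical_entropy(counts)              # ~0.69295, as reported
h_bayes = bayes_entropy_uniform_prior(counts)  # ~0.6924
```

Both estimates fall slightly below the true value ln 2 ≈ 0.69315, matching the underestimation noted in the conclusions; the small gap between this value and H^B_1 reflects the truncated prior used in (5).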

[FIGURE 1 OMITTED]

In figure 1 we present the results obtained by changing the codification of a binary symmetric random source (a source which emits 0 and 1 with the same probability 0.5). We compute the Bayes estimator in the case of the truncated binomial prior, H^{B,(0)}_{n,1} = 0.69244564, and H^{B,0.9}_{n,1} = 0.678229474 for the a priori beta distribution truncated at t = 0.9.

The latter is computed by the formula

H^{B,t}_{n,1} = 1/B(α + n_1, β + n_2, t)   (6)

where B(a, b, t) = ∫_0^t x^{a−1}(1 − x)^{b−1} dx is the incomplete beta function, the normalizer of the beta prior truncated at t.
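The normalizer appearing in (6) is straightforward to evaluate numerically. A minimal sketch, assuming B(a, b, t) denotes the incomplete beta integral ∫_0^t x^{a−1}(1 − x)^{b−1} dx (the function name and midpoint-rule step count are ours):

```python
import math

def incomplete_beta(a, b, t, n=20000):
    """B(a, b, t) = integral_0^t x^(a-1) (1-x)^(b-1) dx via the midpoint rule."""
    h = t / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += x ** (a - 1) * (1.0 - x) ** (b - 1)
    return total * h

# Sanity check: at t = 1 this is the complete beta function Gamma(a)Gamma(b)/Gamma(a+b).
full = incomplete_beta(2, 3, 1.0)   # = 1/12
trunc = incomplete_beta(2, 3, 0.9)  # normalizer of a prior truncated at t = 0.9
```

Truncating at t < 1 shrinks the normalizer, which is what separates the two estimates H^{B,(0)}_{n,1} and H^{B,0.9}_{n,1} reported above.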

Obviously, these estimators are consistent. But in the case of a small or moderate sample size, the Bayesian estimate does increase the accuracy, provided that a plausible prior can be obtained. On average both the Bayesian and the empirical methods tend to underestimate the true value, but the Bayesian estimate is more stable. We can see that the Bayesian estimation of the Shannon entropy is obtained with an error of 5·10^−3 if we use binary codification (s = 2), 2.4·10^−3 if we use hex codification (s = 4), 13·10^−4 if we use byte codification (s = 8), 8·10^−4 if we use 15-bit codification (s = 15), and so on.

Thus the Bayesian estimation can be used to derive the effective key size of a cipher system which uses the investigated random source as key generator. Future research will build on the present results and will develop an improved way to test the cryptographic strength of a certain class of algorithms.

5. REFERENCES

Bennett, C.H. & Bessette, F. (1992). Experimental quantum cryptography. J. Cryptology, (5)1, pp. 3-28

Blahut, R.E. (1987). Principles and Practice of Information Theory. Addison-Wesley, Reading, MA

Cachin, C. (1997). Smooth Entropy and Renyi Entropy. Proceedings of EuroCrypt '97, Lecture Notes in Computer Science, Springer

Guiasu, S. (1997). Information Theory with Applications. Springer-Verlag

Maurer, U. & Cachin, C. (1994). Linking Information Reconciliation and Privacy Amplification. Proceedings of EuroCrypt '94, Perugia, Italy

Morales, D.; Pardo, L. & Vajda, I. (1997). Some New Statistics for Testing Hypotheses in Parametric Models. J. Multivariate Anal., pp. 137-168

Pardo, L. & Menendez, M.L. (2003). Asymptotic Behavior of (h, φ)-Entropies. Commun. Statist. - Theory Meth., (22)7, pp. 2015-2031

Preda, V. (1991). Theory of Statistics Decisions. Romanian Academy Press