Printer Friendly
The Free Library
19,588,385 articles and books
Member login
User name  
Password 
 
Join us Forgot password?

Analysis of an algorithm catching elephants on the Internet.


The paper deals with the problem of catching the elephants in the Internet traffic Internet traffic is the flow of data around the Internet. It includes web traffic, which is the amount of that data that is related to the World Wide Web, along with the traffic from other major uses of the Internet, such as electronic mail and peer-to-peer networks. . The aim is to investigate an algorithm proposed by Azzana based on a multistage Bloom filter The Bloom filter, conceived by Burton H. Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. , with a refreshment mechanism (called shy in the present paper), able to treat on-line a huge amount of flows with high traffic variations. An analysis of a simplified model estimates the number of false positives. Limit theorems This is a list of theorems, by Wikipedia page. See also
  • list of fundamental theorems
  • list of lemmas
  • list of conjectures
  • list of inequalities
  • list of mathematical proofs
  • list of misnamed theorems
  • Existence theorem
 for the Markov chain (probability) Markov chain - (Named after Andrei Markov) A model of sequences of events where the probability of an event occurring depends upon the fact that a preceding event occurred.

A Markov process is governed by a Markov chain.
 that describes the algorithm for large filters are rigorously obtained. The asymptotic behavior of the stochastic By guesswork; by chance; using or containing random values.

stochastic - probabilistic
 model is here deterministic 1. (probability) deterministic - Describes a system whose time evolution can be predicted exactly.

Contrast probabilistic.
2. (algorithm) deterministic - Describes an algorithm in which the correct next step depends only on the current state.
. The limit has a nice formulation in terms of a M/G/1/C queue, which is analytically tractable tractable

easy to manage; tolerable.
 and which allows to tune the algorithm optimally.

Keywords: attack detection; Bloom filter; coupon collector; elephants and mice; network mining

Introduction

One traditionally distinguishes two kinds of flows in the Internet traffic: long flows, called elephants, which are the less numerous (typically 5-10%), and short flows, called mice, which are the most numerous. The convention is to fix a threshold C and to call elephant any flow having more than C packets, and mouse any flow having strictly less than C packets. For various reasons, detections of attacks, pricing, statistics, it is an important task to be able to "catch" the elephants, that means to be able to get the list of all elephants, with their IP addresses, flowing through a given router. We emphasize the fact that this problem is distinct from the one consisting only in providing statistical estimates for the traffic. Even a simple on-line counting of the number of distinct flows reveals to be difficult due to the high throughput of the traffic. There is a wide literature on algorithms for fast estimations of cardinality A quantity relationship between elements. For example, one-to-one, one-to-many and many-to-one express cardinality. See cardinal number.

(mathematics) cardinality - The number of elements in a set. If two sets have the same number of elements (i.e.
 (i.e. of the number of distinct elements in a set with repeated elements) of huge data sets (see [6], [9] and [11]). A similar question consists in finding the k most frequent flows--the so-called "icebergs" (see [4] and [12]). If one asks for the proportion of elephants or the size distribution of elephants, it is possible to use the Adaptive Sampling algorithm proposed by Wegman and analyzed by Flajolet [8], which provides a sample of the flows independently from their size. This sample can then be used to compute statistics for the elephants (and actually for all the flows). For counting the elephants, Gandouet and Jean-Marie have proposed in [10] an algorithm based on sampling, thus requiring a knowledge on the flow size distribution, which reduces its application. For the target applications, a unique elephant hidden in a huge traffic of mice--which does not exist from a statistical point of view--has to be detected.

For this purpose, an algorithm based on Bloom filters has been already presented by Estan and Varghese [7] in 2002. As explained here, a Bloom filter allows elephants to accumulate but due to huge traffic, collisions occur and mice can be detected as elephants. Collisions will be controlled by taking several stages and cleverly flushing the filter. More precisely, the principle of this latter algorithm is the following. The IP header of each packet is hashed with k independent hashing Creating hash totals or hash tables. See hash total and hash table.

hashing - hash coding
 functions in a k-multi-stage filter. Counters are incremented and when a counter reaches C, the corresponding flow is declared as an elephant and its packets are counted to give eventually its size. The problem is that in fact, under a heavy Internet traffic, the multistage filter quickly gets totally filled. To avoid this problem, Estan and Varghese propose to periodically (every 5 seconds) reinitialize the filter to zero. But, without any a priori a priori

In epistemology, knowledge that is independent of all particular experiences, as opposed to a posteriori (or empirical) knowledge, which derives from experience.
 knowledge about the traffic (intensity, flows arrival rate ...), 5 seconds can either be too long (in which case the filter can be saturated) or too short (a lot of long flows can be missed). Therefore the accuracy of the algorithm closely depends on the characteristics of the traffic trace.

In order to settle this problem, Azzana [2] introduces an adaptative refreshment mechanism, that we will call shift, in the multi-stage filter algorithm. It is an efficient method to adapt the algorithm to traffic variations: The filter is refreshed re·fresh  
v. re·freshed, re·fresh·ing, re·fresh·es

v.tr.
1. To revive with or as if with rest, food, or drink; give new vigor or spirit to.

2.
 with a frequency depending on the current traffic intensity. Moreover the filter is not reinitialized to zero, but a softer technique is used to avoid missing some elephants. The main difference with the Estan and Varghese algorithm is its ability to deal with traffic variations. Azzana shows in [2] that the refreshment mechanism improves notably the efficiency and accuracy of the algorithm (see Section 3 for practical results). Parameters, as the filter size or related to the refreshment mechanism, are experimentally optimized. Azzana proposes some elements of analysis for this algorithm. Our purpose is to go further and to provide analytical results when the filter is large.

[FIGURE 1 OMITTED]

Description of the algorithm

The algorithm designed by Azzana uses a Bloom filter with counters (defined below) and involves four parameters in input: the number k of stages in the Bloom filter, the number m of counters in each stage, the maximum value C of each counter, i.e. the size threshold C to be declared as an elephant and the filling rate r.

A Bloom filter is a set of k stages, each of these stages being a set of m counters, initially at 0 and taking values in {0, ..., C}. Together with the k stages [F.sub.1], ..., [F.sub.k], one supposes that k hashing functions [h.sub.1], ..., [h.sub.k] are given, one for each stage. We make the (strong) assumption that these hashing functions are independent, which implies that k is small (k = 10 is probably the upper limit). Each hashing function hi maps the part of the IP header of a packet indicating the flow to which it belongs, to one of the counters of stage [F.sub.i].

The algorithm works on-line on the stream, processing the packets one after the other. Flows identified as elephants are stored in a list E. When a packet is processed, it is first checked if it belongs to a flow already identified as an elephant (that is a flow already in E). Indeed, in this case, there is no interest in mapping it to the k counters, and the algorithm simply forgets this packet. If not, it is mapped by the hashing functions on one counter per stage and it increases these counters by one, except for those that have already reached C, in which case they remain at C. When, after processing some packet, all the k counters are at C, the flow is declared to be an elephant and stored in the dedicated memory E. When the proportion of nonzero non·ze·ro  
adj.
Not equal to zero.



nonzero  

Not equal to zero.
 counters reaches r in the whole set of km counters, one decreases all nonzero counters by one. This last operation is called the shift. See Figure 1 for an illustration.

Motivation of the algorithm

Packets of a same flow hit the same k counters, but two distinct flows may also increase the same counter in one or several stages. The idea of using several stages where flows are mapped independently and uniformly, intends to reduce the probability of collisions between flows. The shift is crucial in the sense that it prevents the filters to be completely saturated, that is, to have many counters with high values. Without the shift operation, mice would be very quickly mapped to counters equal to C and declared as elephants. The algorithm would have a finite lifetime because when the filter is saturated, nothing can be detected.

False positive and false negative

A false positive is a mouse detected as an elephant by the algorithm. A false negative is an elephant not declared as such (hence considered as a mouse) by the algorithm. Generally, a false negative is worse than a false positive. Think of an attack: One does not want to miss it, and a false alarm has less serious consequences than a successful attack.

In our context, a false positive is a mouse one packet of which is mapped onto counters all [greater than or equal to] C - 1. A false negative is due to the shift, and if it happens, it means that there were at least f - C + 1 shifts during the transmission time of some elephant of size f. If shifts do not occur too often, a false negative is then an elephant whose packets are broadcast at a slow rate.

Intuitively, and it will be confirmed by the forthcoming analysis, if the parameters (actually r) are chosen so as to maintain counters at low values, then shifts occur often, and if one tries to decrease the shift frequency, then the counters tend to have high values. Therefore, a compromise has to be found between these two properties (frequency of the shifts, height of the counters), which translates into a compromise between false positives and false negatives. This last compromise depends on the applications.

A Markovian representation

In this paper, we will mainly focus our analysis on the one-stage filter case when the traffic is made up only of mice of size 1. The aim is to estimate the proportion of false positives. From this analysis, we will then derive results for the general case. Let us now introduce our main notations.

We thus assume that k = 1 and that all flows have size 1. Throughout the paper, [W.sup.m.sub.n] (i) denotes the proportion of counters having value i just before the nth shift in a filter with m counters. According to according to
prep.
1. As stated or indicated by; on the authority of: according to historians.

2. In keeping with: according to instructions.

3.
 this notation, [W.sup.m.sub.n] (0) is close to 1 - r and [[summation summation n. the final argument of an attorney at the close of a trial in which he/she attempts to convince the judge and/or jury of the virtues of the client's case. (See: closing argument) ].sup.C.sub.i=0] [W.sup.m.sub.n](i) = 1. Notice that the nth shift exactly decreases the number of nonzero counters by m[W.sup.m.sub.n](1). An important part of our analysis will consist in estimating [W.sup.m.sub.n](C - 1) + [W.sup.m.sub.n](C). Indeed, it gives an upper bound on the probability that a flow is declared as an elephant (that is a false positive) between the (n - 1)th and the nth shifts, since, according to our assumptions, there is no elephant at all.

In this framework (k = 1 and flows of size one), the algorithm has a simple description in terms of urns and balls. Each flow is a ball thrown at random into one of m urns (each urn being one of the m counters). When a ball falls into an urn with C balls, it is immediately removed, in order to have at most C balls in each urn. When the proportion of non empty urns reaches r, one ball is removed in every non empty urn.

For m fixed, [([W.sup.m.sub.n]).sub.n[member of]N] = [([([W.sup.m.sub.n](i)).sub.i=0], ..., c).sub.n[member of]N] is an ergodic Adj. 1. ergodic - positive recurrent aperiodic state of stochastic systems; tending in probability to a limiting form that is independent of the initial conditions  Markov chain on some finite state space. Its invariant (programming) invariant - A rule, such as the ordering of an ordered list or heap, that applies throughout the life of a data structure or procedure. Each change to the data structure must maintain the correctness of the invariant.  probability measure [[pi].sub.m] is the distribution of some variable [W.sup.m.sub.[infinity]]. For C = 2, the first non-trivial case, even the expression of the transition matrix [P.sub.m] of the Markov chain is combinatorially quite complicated and an expression for [[pi].sub.m], seems out of reach. In practice, the number m of counters per stage is large. This suggests to look at the limiting behavior of the algorithm when m tends to [infinity]. We use as far as possible the Markovian structure of the algorithm in order to derive rigorous limit theorems and analytical expressions for the limiting regime. This is the longest and most technical part of the paper, which also contains the main result, from a mathematical point of view.

Main results

The model considered in the paper describes the collisions between mice in order to evaluate the number of false positives due to these collisions. In a one-stage filter where all flows are mice of size 1, the Markov chain [([W.sup.m.sub.n]).sub.n[member of]N] describes the evolution of the counters observed just before shift times. The main result is that, when m is large, the random vector [W.sup.m.sub.[infinity]] converges in distribution to some deterministic value [bar.w].

This result is not quite completely proved. The way to proceed is classical for large Markovian models (see for example [5] and [1]). The idea is to study the convergence of the process over finite times. It is shown that the Markov chain given by the empirical distributions [([W.sup.m.sub.n]).sub.n[member of]N] converges to a deterministic dynamical system [w.sub.n] + l = F([w.sub.n]), which has a unique fixed point [bar.w]. The situation is analogous in discrete time Discrete time is non-continuous time. Sampling at non-continuous times results in discrete-time samples. For example, a newspaper may report the price of crude oil once every 24 hours.  to the study by Antunes and al. [1]. A Lyapunov function Lyapunov functions are of interest in mathematics, especially in stability theory. Lyapunov functions, named after Aleksandr Mikhailovich Lyapunov, are by definition functions which prove the stability of a certain fixed point in a dynamical system or autonomous differential  for F would allow to prove the convergence in distribution of [W.sup.m.sub.[infinity]]. Such a Lyapunov function is exhibited in the particular case C = 2. The dynamical system provides a limiting description of the original chain which stationary behavior is then described by [bar.w]. The fixed point [bar.w] has the following interpretation.

The fixed point [bar.w] is identified as the invariant probability measure [[mu].sub.[bar.[lambda]]] of the number of customers in an M/G/1/C queue where service times are 1 and arrival rate is some [bar.[lambda]] satisfying the fixed point equation

[[mu].sub.[bar.[lambda]]](0) = 1 - r

or equivalently

[bar.[lambda]] = log (1 + [[mu].sub.[bar.[lambda]]] (1)/1 - r).

As a byproduct by·prod·uct or by-prod·uct  
n.
1. Something produced in the making of something else.

2. A secondary result; a side effect.

Noun 1.
, the stationary time between two shifts divided by m converges in distribution to the constant [bar.[lambda]]. Thus the inter-shift time (closely related to the number of false negatives) and the probability of false positives are respectively approximated by [bar.[lambda]]]m and bounded by [[mu].sub.[bar.[lambda]]] (C - 1) + [[mu].sub.[bar.[lambda]]](C) when m is large.

When mice have general size distribution, the previous model is extended to an approximated model where packets of a given mouse arrive simultaneously. The involved quantity is the invariant measure In mathematics, an invariant measure is a measure that is preserved by some function. Invariant measures are of great interest in the study of dynamical systems. The Krylov-Bogolyubov theorem proves the existence of invariant measures under certain conditions on the function and  of an M/G/1/C queue with arrivals by batches with distribution the mouse size distribution. In the case of size 1 mice, the multi-stage filter case is investigated.

Even if [[mu].sub.[bar.[lambda]]] is not explicit, which complicates the exhibition of a Lyapunov function, the quantities [bar.[lambda]] and [[mu].sub.[bar.[lambda]]](C - 1) + [[mu].sub.[bar.[lambda]]](C) can be numerically computed. It appears that the latter quantity is an increasing function (Math.) a function whose value increases when that of the variable increases, and decreases when the latter is diminished; also called a monotonically increasing function ltname>.

See also: Increase
 of r (as r varies from 0 to 1). Hence, given the mouse size distribution, one can numerically determine the values of r for which the algorithm performs well.

Section 1 is the most technical part of the paper. It investigates the one-stage filter in case of size 1 flows. In Section 2, this analysis is generalized to a general mouse size distribution in a simplified model and to a multi-stage filter. Then Section 3 is devoted to discussing the performance of the algorithm, to experimental results and improvements (validated through an implementation).

1 The Markovian urn and ball model

In this section, C is fixed and we consider the sequence [([W.sup.m.sub.n]).sub.n[member of]N], where [W.sup.m.sub.n] n denotes the vector of the proportions of urns with 0, ..., C balls just before the nth shift time. For m [greater than or equal to] 1, [([W.sup.m.sub.n]).sub.n[member of]N] is an ergodic Markov chain on the finite state space

[p.sup.(r).sub.m] = {w = (w(0), ..., w(C)) [member of] [(N/m).sup.C+1], [C.summation over (i=0)] w(i) = 1 and [C.summation over (i=1)] w(i) = [rm]/m},

(where [rm] denotes the smallest integer integer: see number; number theory  larger or equal to rm) with transition matrix [P.sub.m] defined as follows: If [W.sup.m.sub.n] = w [member of] [P.sup.(r).sub.m], then [W.sup.m.sub.n+1], distributed according to [P.sub.m](w, .), is the empirical distribution of m urns when, starting with distribution w, one ball is removed from every non empty urn and then balls are thrown at random until [rm] urns are non empty again, balls overflowing the capacity C being rejected. The required number of thrown balls is

[[tau].sup.m.sub.n] = [[rm]-1.summation over (l=[rm]-[W.sup.m.sub.n](1)m] [Y.sub.l], (1)

where [Y.sub.l], l [member of] N are independent random variables with geometrical distributions on [N.sup.*] with respective parameters l/m, i.e. P([Y.sub.l] = k) = [(l/m).sup.k-1](1 - l/m), k [greater than or equal to] 1.

Let F be defined on P = {w [member of] [R.sup.C+1.sub.+], [[summation].sup.C.sub.i=0] w(i) = 1} by

F(w) = [T.sub.C] (s(w) * [P.sub.[lambda](w)]) (2)

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII ASCII or American Standard Code for Information Interchange, a set of codes used to represent letters, numbers, a few symbols, and control characters. Originally designed for teletype operations, it has found wide application in computers. ]

and [P.sub.[lambda]] is the Poisson distribution A statistical method developed by the 18th century French mathematician S. D. Poisson, which is used for predicting the probable distribution of a series of events. For example, when the average transaction volume in a communications system can be estimated, Poisson distribution is used  with parameter [lambda]. Notice that F maps P to itself and, also by definition of [lambda], [P.sup.(r)] [??] {w [member of] [R.sup.C+1.sub.+], [[summation].sup.C.sub.i=0] w(i) = 1 and [[summation].sup.C.sub.i=1] w(i) = r} to itself.

1.1 Convergence to a dynamical system

We prove the convergence of [([W.sup.m.sub.n]).sub.n[member of]N] to the dynamical system given by F as m tends to +[infinity]. The following lemma lemma (lĕm`ə): see theorem.

(logic) lemma - A result already proved, which is needed in the proof of some further result.
 is the key argument. The uniform convergence stated below appears as the convenient way to express the convergence of [P.sub.m](w, .) to [[delta].sub.F(w)] in order to prove both the convergence of [([W.sup.m.sub.n]).sub.n[member of]N], and, later on, the convergence of the stationary distributions.

Define [parallel]x[paralle] = [sup.sup.C.sub.i=0] [absolute value of [x.sub.i]] for x [member of] [R.sup.C+1].

Lemma 1 For [epsilon] > 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof: The first step is to prove that, for [epsilon] > 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3)

where [lambda](w) = log(1 + w(1)/1 - r) and [P.sub.w](x) denotes P(x|[W.sup.m.sub.0] = w). By Bienayme-Chebyshev's inequality, it is enough to prove that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4)

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5)

By equation (1), as E([Y.sub.l]) = 1/(1 - l/m), using a change of index,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (6)

A comparison with integrals leads to the following inequalities:

log 1 - [rm]/m + w(1) + 1/m / 1 - [rm]/m + 1/m [less than or equal to] [E.sub.w] ([[tau].sup.m.sub.1]/m) [less than or equal to] log 1 - [rm]/ m + w(1)/1 - [rm]/m.

It is then easy to show that the two extreme terms tend to [lambda](w) = log(1 + w(1)=(1 - r)), uniformly in w(1) [member of] [0, 1]. This gives (4). For (5), as Var ([Y.sub.l]) = [(l/m)=(1 - l/m).sup.2], by the same change of index,

[Var.sub.w] ([[tau].sup.m.sub.1]/m]) = 1/m [m-[rm]+w(1)m.summation over (j=m-[rm]+1)] m - j/[j.sup.2] = [m-[rm]+w(1)m.summation over (j=m-[rm]+1)] 1/[j.sup.2] - 1/m [E.sub.w]([[tau].sup.m.sub.1]/m). (7)

The first term of the right-hand side is bounded independently of w by [[summation].sup.+[infinity].sub.j=m-[rm]+1]1/[j.sup.2], which tends to 0 as m tends to +[infinity]. The second term tends to 0 uniformly in w using (4) together with the uniform bound [lambda](w) [less than or equal to] log(1 + 1/(1 - r)).

To obtain the lemma, it is then sufficient to prove that, for each [epsilon] > 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (8)

Since [W.sup.m.sub.1] and F(w) are probability measures on {0, ..., C}, to get (8), it is sufficient to prove that for j [member of] {0, ..., C - 1},

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (9)

Let w [member of] [P.sup.(r).sub.m]. Define the following random variables: For 1 [less than or equal to] i [less than or equal to] m, [N.sup.m.sub.i] (respectively [[??].sup.m.sub.i] (w)) is the number of additional balls in urn i when [[tau].sup.m.sub.1] (respectively m[lambda](w)) new balls are thrown in the m urns. One can construct these variables from the same sequence of balls (i.e. of i.i.d. uniform on {1, ..., m} random variables), meaning that balls are thrown in the same locations for both operations until stopping. This provides a natural coupling for the [N.sub.i]'s and [[??].sub.i]'s. Let j [member of] {0, ..., C - 1} be fixed. Given [W.sup.m.sub.0] = w, as j [less than or equal to] C - 1, the capacity constraint does not interfere and [W.sup.m.sub.1] (j) can be represented as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (10)

where [I.sup.m.sub.w,k] is the set of urns with k balls in some configuration of m urns with distribution s(w), so that card [I.sup.m.sub.w,k] = ms(w)(k). The sum over i is exactly the number of urns that contains k balls after the removing of one ball per urn, and having j balls after new balls have been thrown. By coupling, on the event {[W.sup.m.sub.0] = w, [absolute value of [[tau].sup.m.sub.1]/m - [lambda](w)] [less than or equal to] [epsilon]/2}, the following is true:

card {i, [N.sup.m.sub.i] [not equal to] [[??].sup.m.sub.i] (w)} [less than or equal to] [epsilon]/2 m (11)

thus, denoting [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], on the same event,

[absolute value of [W.sup.m.sub.1](j)] - [[??].sup.m.sub.1] (j)] [less than or equal to] [epsilon]/2.

To prove equation (9), it is then sufficient to show that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

This will result from

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (12)

which is quite standard to prove. The key argument, with classical proof, is the following: If [L.sup.m.sub.i] is the number of balls in urn i when throwing m[lambda] balls at random in m urns, if 0 < a < b, then, for all ([i.sub.1], [i.sub.2]) [member of] [N.sup.2],

(i) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(ii) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

It is applied since [lambda](w) [member of] [0, log(1 + 1/(1 - r)]. It ends the proof.

Proposition 1 If [W.sup.m.sub.0] converges in distribution to [w.sub.0] [member of] [P.sup.(r).sub.m] then [([W.sup.m.sub.n]).sub.n[member of]N] converges in distribution to the dynamical system [(w.sub.n]).sub.n[member of]N] given by the recursion In programming, the ability of a subroutine or program module to call itself. It is helpful for writing routines that solve problems by repeatedly processing the output of the same process. See recurse subdirectories.  [w.sub.n+1] = F([w.sub.n]), n [member of] N.

Proof: Assume that [W.sup.m.sub.0] converges in distribution to [w.sub.0] [member of] [P.sup.(r).sub.m]. Convergence of ([W.sup.m.sub.0], ..., [W.sup.m.sub.n]) can be proved by induction on n [member of] N. By assumption it is true for n = 0. Let us just prove it for n = 1, the same arguments holding for general n, from the assumed property for n - 1. Let g be continuous on the (compact) set [P.sup.(r)2.sub.m]. Since the distribution [[mu].sub.m] of [W.sup.m.sub.0] has support in [P.sup.(r).sub.m],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Since g(.; F(x)) is continuous on [P.sup.(r).sub.m] (F being continuous as can be easily checked), the last integral converges to g([w.sub.0], [w.sub.1]) by assumption (or case n = 0). The first term is bounded in modulus See modulo. , for each [eta] > 0, by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [epsilon] is associated to [eta] by the uniform continuity of g on [P.sup.(r)2.sub.m]. By Lemma 1, this is less than 2[eta] for m sufficiently large In mathematics, the phrase sufficiently large is used in contexts such as:
is true for sufficiently large
. Thus, as m tends to +[infinity],

E(g([W.sup.m.sub.0], [W.sup.m.sub.1])) [right arrow] g([w.sub.0], [w.sub.1]).

1.2 Convergence of invariant measures

Let, for m [member of] N, [[pi].sub.m] be the stationary distribution of [([W.sup.m.sub.n]).sub.n[member of]N]. Define P as the transition on [P.sup.(r)] given by P(w, x) = [[delta].sub.F(w)].

Proposition 2 Any limiting point [pi] of [([[pi].sub.m]).sub.m[member of] N] is a probability measure on [P.sup.(r)] which is invariant for P i.e. that satisfies F([pi]) = [pi].

Proof: A classical result states that, if P and [P.sub.m], m [member of] N, are transition kernels on some metric space metric space

In mathematics, a set of objects equipped with a concept of distance. The objects can be thought of as points in space, with the distance between points given by a distance formula, such that: (1) the distance from point A to point B is zero if and only if A and
 E such that, for any bounded continuous f on E, Pf is continuous and [P.sub.m]f converges to Pf uniformly on E then, for any sequence ([[tau].sub.m]) of probability measures such that [[pi].sub.m] is invariant under [P.sub.m], any limiting point of [[pi].sub.m] is invariant under P. Indeed, for any m and any bounded continuous f, [[pi] .sub.m] [P.sub.m]f = [[pi].sub.m]f. If a subsequence sub·se·quence  
n.
1. Something that is subsequent; a sequel.

2. The fact or quality of being subsequent.

3. Mathematics A sequence that is contained in another sequence.

Noun 1.
 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] converges weakly to [pi], then [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] converges to [pi]f. Writing [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], since Pf continuous (and bounded since f is), the first term [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] converges to [pi]Pf and the second term tends to 0 by uniform convergence of [P.sub.m]f to Pf. Equation [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] thus gives, in the limit, [pi]Pf = [pi]f for any bounded continuous f.

Here the difficulty is that the [P.sub.m]'s and P are transitions on [P.sup.(r).sub.m] and [P.sup.(r)], which are in general disjoint dis·joint
v.
To put out of joint; dislocate.
. To solve this difficulty, extend artificially [P.sub.m] and P to P by setting:

[P.sub.m](w, x) = _[[delta].sub.F(w)] for w [member of] P [P.sup.(r).sub.m]

P(w, x) = [[delta].sub.F(w)] for w [member of] P \ [P.sup.(r)].

The proposition is then deduced from the classical result if we prove that, for each f continuous on P (notice that then Pf = f o F is continuous),

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

which is straightforward from Lemma 1. The fact that the support of [pi] is in [P.sup.(r)] is deduced from the portmanteau See portmanteau word.  theorem theorem, in mathematics and logic, statement in words or symbols that can be established by means of deductive logic; it differs from an axiom in that a proof is required for its acceptance.  (see Billingsley [3] p.16) using the sequence of closed sets

[P.sup.(r),n] = {w [member of] P, r [less than or equal to] [C.summation over (i=1)] w(i) [less than or equal to] r + 1/n}.

The fixed points of the dynamical system are the probability measures w on [P.sup.(r)] such that

w = F(w) = [T.sub.C](s(w) * [P.sub.[lambda]](w))

where [lambda](w) = log(1 + w(1)=(1 - r)). This is exactly the invariant measure equation for the number of customers just after completion times in an M/G/1/C queue with arrival rate [lambda](w) and service times 1, so that it is equivalent to

w = [[mu].sub.[lambda]](w) (13)

where [[mu].sub.[lambda]] (respectively [v.sub.[lambda]]) is the limiting distribution of the process of the number of customers in an M/G/1/C (respectively M/G/1/[infinity]) queue with arrival rate [lambda] and service times 1.

Indeed, it is well-known that this queue has a limiting distribution for [lambda] [member of] [R.sup.+] (respectively 0 [less than or equal to] [lambda] < 1) which is the invariant probability measure of the embedded Inserted into. See embedded system.  Markov chain of the number of customers just after completion times. The balance equations here reduce to a recursion system, so that, even when [lambda] [greater than or equal to] 1, [v.sub.[lambda]] is well defined up to a multiplicative constant (which can not be normalized into a probability measure in this case). Moreover, [v.sub.[lambda]] is given by the Pollaczek-Khintchine formula for its generating function:

[summation over (n [member of] N] [v.sub.[lambda]] (n)[u.sup.n] = [v.sub.[lambda]](0) [g.sub.[lambda]](u)(u - 1) /u - [g.sub.[lambda]](u), for [absolute value of u] < 1 (14)

where [g.sub.[lambda]](u) = [e.sup.-[lambda](1-u)] and for [lambda] < 1, [v.sub.[lambda]](0) = 1 - [lambda] (see for example Robert [13] p176-177). Notice that [v.sub.[lambda]] (n) (n [member of] N) has no closed form. For example, the expressions of the first terms are

[v.sub.[lambda]] (1) = [v.sub.[lambda]] (0)([e.sup.[lambda]] - 1),

[v.sub.[lambda]] (2) = [v.sub.[lambda]] (0) [e.sup.[lambda]] ([e.sup.[lambda]] - 1 - [lambda]),

[v.sub.[lambda]] (3) = [v.sub.[lambda]] (0) [e.sup.[lambda]] ([lambda]([lambda] + 2)/2 - (1 + 2[lambda])[e.sup.[lambda]] + [e.sup.2[lambda]] (15)

where [v.sub.[lambda]] (0) = 1 - [lambda] if [lambda] < 1. For the M/G/1/C queue,

[[mu].sub.[lambda]] (i) = [v.sub.[lambda]] (i)/[[summation].sup.C.sub.l=0] [v.sub.[lambda]] (l), i [member of] {0, ..., C} (16)

The following proposition characterizes the fixed points of F.

Proposition 3 F defined by (2) has one unique fixed point, denoted by [bar.w], in [P.sup.(r)] given by the limiting distribution [[mu].sub.[bar.[lambda]]] of the number of customers in an M/G/1/C queue with arrival rate [bar.[lambda]] and service times 1, where [bar.[lambda]] is determined by the implicit equation [[mu].sub.[bar.[lambda]]] (0) = 1 - r which is equivalent to

[bar.[lambda]] = log (1 + [[mu].sub.[bar.[lambda]]](1)/1 - r), (17)

where [[mu].sub.[lambda]] is given by (16) and [v.sub.[lambda]] by the Pollaczek-Kintchine formula (14).

Notice that, moreover,

r [less than or equal to] [bar.[lambda]] [less than or equal to] -log(1 - r).

The upper bound on [bar.[lambda]], obtained from equation (17) using [[mu].sub.[bar.[lambda]]] (1) [less than or equal to] r just says that the stationary mean number of balls between two shifts is less than the mean number of balls thrown until the first shift (starting with empty urns). Moreover [bar.[lambda]] [greater than or equal to] r, which is clear if [bar.[lambda]] [greater than or equal to] 1, and obtained if [bar.[lambda]] < 1 writing (16) for i = 0 and using [[summation].sup.C.sub.l=0] [v.sub.[bar.[lambda]]] (l) [less than or equal to] 1 in this equation. This is exactly the fact that the asymptotic stationary mean number of balls [lambda]m arriving between two shift times is greater than the number of removed balls at each shift, which is [rm]. It is due to the losses under the capacity limit C.

Proof: Only the existence and uniqueness result remains to prove. According to (13), w is some fixed point if and only if it is a fixed point of the function

[P.sup.(r)] [right arrow] [P.sup.(r)]

w [??][[mu].sub.[lambda](w)]

with [lambda](w) = log(1 + w(1)/(1 - r)). This function being continuous on the convex Convex

Curved, as in the shape of the outside of a circle. Usually referring to the price/required yield relationship for option-free bonds.
 compact set [P.sup.(r)], by Brouwer's theorem, it has a fixed point. To prove uniqueness, let w and w' be two fixed points of F in [P.sup.(r)]. By definition of [P.sup.(r)],

[[mu].sub.[lambda](w)](0) = [[mu].sub.[lambda](w')](0) = 1 - r. (18)

A coupling argument shows that, if [lambda] [less than or equal to] [lambda]' then [[mu].sub.[lambda]] is stochastically sto·chas·tic  
adj.
1. Of, relating to, or characterized by conjecture; conjectural.

2. Statistics
a. Involving or containing a random variable or variables: stochastic calculus.
 dominated by [[mu].sub.[lambda]'] , and in particular,

[[mu].sub.[lambda]] (0) + [[mu].sub.[lambda]] (1) [greater than or equal to] [[mu].sub.[lambda]'] (0) + [[mu].sub.[lambda]'] (1). (19)

It can then be deduced that [lambda](w) = [lambda](w'). Indeed, if for example [lambda](w) < [lambda](w'), by equations (18) and (19),

[[mu].sub.[lambda](w)] (1) [greater than or equal to] [[mu].sub.[lambda](w')](1).

thus, using (15) together with (18),

[lambda](w) = log(1 + [[mu].sub.[lambda](w)](1)/1 - r) [greater than or equal to] [lambda](w') = log(1 + [[mu].sub.[lambda](w')](1)/1 - r)

which contradicts [lambda](w) < [lambda](w'). One finally gets [lambda](w) = [lambda](w'), and then by equation (13), w = w'.

A Lyapunov function for the dynamical system given by F on [P.sup.(r)] is a function g [greater than or equal to] 0 on [P.sup.(r)] such that, for each w [member of] [P.sup.(r)], g(F(w)) [less than or equal to] g(w) with equality if and only if w is the fixed point of F. In the particular case C = 2, a Lyapunov function can be exhibited, resulting from a contracting property of F in this case.

Indeed, restricted to [P.sup.(r)], F is here given by:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[P.sup.(r)] is some one dimensional subvariety of [R.sup.3], so that any w [member of] [P.sup.(r)] can be identified with its second coordinate w(1) [member of] [0, r], or equivalently with [lambda](w) = log(1 + w(1)/(1 - r)) [member of] [0, log(1=(1 - r))].

Using this last parametrization of [P.sup.(r)], it is easy to show that F rewrites as G, mapping the interval I = [0, log(1/(1 - r))] to itself and defined, for [lambda] [member of] I, by G([lambda]) = log ([lambda] + [e.sup.-[lambda]]/1 - r).

An elementary computation shows that G has derivative on I taking values in the interval ] - 1, 0], which gives the already known existence and uniqueness of a fixed point [bar.[lambda]] for G (or F, both assertions being equivalent, and [bar.[lambda]] being equal to [lambda]([bar.w])). Moreover, the following inequality holds for [lambda] [member of] I:

[absolute value of G([lambda]) - [bar.[lambda]]] [less than or equal to] [absolute value of [lambda] - [bar.[lambda]],

equality occurring only at [lambda] = [bar.[lambda]]. As a result, g defined on [P.sup.(r)] by

g(w) = [absolute value of [lambda]](w) - [bar.[lambda]]] = [absolute value of [lambda](w) - [lambda]([bar.w])] = [absolute value of log 1 - r + w(1)/1 - r + w(1)],

is a Lyapunov function for the dynamical system defined by F.

For C > 2, we conjecture CONJECTURE. Conjectures are ideas or notions founded on probabilities without any demonstration of their truth. Mascardus has defined conjecture: "rationable vestigium latentis veritatis, unde nascitur opinio sapientis;" or a slight degree of credence arising from evidence too weak or too  the existence of such a g.

Theorem 1 Assume that a Lyapunov function exists for the dynamical system given by F on [P.sup.(r)] then, as m tends to +[infinity], the invariant measure of [([W.sup.m.sub.n].sub.n [member of] N] converges to [[delta].sub,[bar.w]] w where [bar.w] is the unique fixed point of F. Thus the following diagram commutes,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof: We prove that [[delta].sub.[bar.w]] is the unique invariant measure [pi] of P with support in [P.sup.(r)]. Let g be the Lyapunov function for F on [P.sup.(r)]. [pi] is P-invariant, thus [pi]P = [pi] and [pi]Pg = [pi]g which can be rewritten [integral](g [omicron om·i·cron
n.
Symbol The 15th letter of the Greek alphabet.
] F - g)d[pi] = 0. This implies that g = g o F holds [pi] almost surely because g - g o F [greater than or equal to] 0. Equality being only true at [bar.w], [pi] has support in {[bar.w]}.

2 A more general model

2.1 Mice with general size distribution

Let [([W.sup.m.sub.n]).sup.n[member of]N] be the sequence of vectors giving the proportions of urns at 0, ..., C just before the nth shift time in a model where balls are thrown by batches. The balls in a batch are thrown together in a unique urn chosen at random among the m urns. The ith batch is composed with [S.sub.i] balls and [([S.sub.i]).sub.i[member of]N] is a sequence of i.i.d. random variables distributed as a random variable S on [N.sup.*] with support containing 1. Let [phi] be the generating function of S. The quantity [S.sub.i] is called the size of batch i. The dynamics is the same: If, before the nth shift time, the state is w [member of] [P.sup.(r).sub.m], it first becomes s(w) and then a number [[tau].sup.m.sub.n] defined by (1) of successive batches are thrown in urns until [rm] urns are non empty. The model generalizes the previous one obtained for S = 1.

Let F be defined on P by

F(w) = [T.sub.c](s(w) * [C.sub.[lambda](w)],S]) (20)

where [T.sub.C], [lambda] and S are already defined and [C.sub.[lambda](w),S] is a compound Poisson distribution In probability theory, a compound Poisson distribution is the probability distribution of a "Poisson-distributed number" of independent identically-distributed random variables. More precisely, suppose



i.e.
 i.e. the distribution of the random variable

Y = [X.summation over (i=1)] [S.sub.i] (21)

where X is independent of [([S.sub.i]).sub.i[member of]N] with Poisson distribution of parameter [lambda](w).

We mimic the arguments in Section 1 to obtain the convergence of the stationary distribution of the ergodic Markov chain [([W.sup.m.sub.n]).sub.n[member of]N] as m tends to +[infinity] to a Dirac measure In mathematics, a Dirac measure is a measure δx on a set X (with any σ-algebra of subsets of X) that gives the singleton set the measure 1, for a chosen element x ∈ X:
 at the unique fixed point of F. Propositions 1 and 2 hold. The fixed points of F are described in the following proposition.

Proposition 4 F defined for w [member of] P by

F(w) = [T.sub.c](s(w) * [C.sub.[lambda](w),S)]

has a unique fixed point on [P.sup.(r)] which is exactly the invariant measure [[mu].sub.[bar.[lambda]] of the number of customers in a M/G/1/C queue with batches of customers arriving according to a Poisson process A Poisson process, named after the French mathematician Siméon-Denis Poisson (1781 - 1840), is a stochastic process which is used for modeling random events in time that occur to a large extent independently of one another (the word event  with intensity [bar.[lambda]], batch sizes being i.i.d. distributed as S with generating function [phi] and service times 1, where [bar.[lambda]] is determined by the implicit equation

[[mu].sub.[bar.[lambda]](0) = 1 - r

which is equivalent to

[bar.[lambda]] = log (1 + [[mu].sub.[bar.[lambda]] (1)/1 - r)

where for i [member of] {0, ..., C},

[[mu].sub.[lambda]](i) = [v.sub.[lambda]](i)/[[summation].sup.C.sub.l=0] [v.sub.[lambda]] (1)

and [v.sub.[lambda]] is given by

[+[infinity].summation over (n=0)] [v.sub.[lambda]] (n)[u.sup.n] = [v.sub.[lambda]](0) g [omicron] [phi](u)(u - 1)/ u - g o [phi](u), [absolute value of u] < 1

where [g.sub.[lambda]](u) = [e.sup.-[lambda](1-u)] and [v.sub.[lambda]] (0) = 1 - [lambda]E(S) when [lambda] < 1.

Recall that the first terms of [v.sub.[lambda]] are given by

[v.sub.[lambda]] (1) = [v.sub.[lambda]] (0)([e.sup.[lambda]] - 1)

and [v.sub.[lambda]] (2) = [v.sub.[lambda]] (0)[e.sup.[lambda]] ([e.sup.[lambda]] - 1 - [lambda]P(S = 1))

where [v.sub.[lambda]] (0) = 1 - [lambda]E(S) when [lambda] < 1, which generalizes the previous expressions. For C = 2, the Lyapunov function defined when S = 1 still works. Furthermore, for C > 2, we assume the existence of a Lyapunov function for F. Theorem 1 still holds.

2.2 A multi-stage filter

The filter is previously supposed to have only one stage. Let now assume that the filter has k stages of m counters each. The natural model then consists of k sets of m urns where, when a ball is thrown, k copies of this ball are sent simultaneously and independently into the k stages, each falling at random in one of the m urns of its set. If some ball hits an urn with C balls, then it is rejected. Moreover, when the proportion of non-empty urns in the whole filter reaches r, then one ball is removed from each non-empty urn: This is called a shift.

The previous analysis extends with one main difference: When the system is initialized at some state ([w.sub.j](i), 1 [less than or equal to] j [less than or equal to] k, 0 [less than or equal to] i [less than or equal to] C), where [w.sub.j](i) is the proportion of urns with i balls in stage j, the number [[tau].sup.m.sub.1] of balls thrown in each stage before the next shift is now asymptotically equivalent to [lambda](w)m, where w = (w(i), 0 [less than or equal to] i [less than or equal to] C) here gives the global proportions of urns in each possible state in the whole filter, that is w(i) = 1/k [k.summation over (j=1)] [w.sub.j](i) for 0 [less than or equal to] i [less than or equal to] C. Thus [lambda] is the same function as for the one-filter case, here evaluated at the global proportions:

[lambda](w) = log (1 + [[summation].sup.k.sub.j=1] [w.sub.j](1)/k(1 - r)).

The one-stage proof of (3) is however not reproducible here, due to the lack of a representation of [[tau].sup.m.sub.1] analogous to (1) of Section 1.

Another proof can be written. The alternative argument is provided by noticing that when [alpha]m balls per stage are thrown, with [alpha] [not member of] [[lambda](w)-[epsilon], [lambda](w)+[epsilon]] (using the same arguments (i) and (ii) as in Section 1), the empirical distributions of the urns at each stage are precisely known (for large m) and do not correspond to the global proportion r of non-empty urns.

Once the (uniform) convergence of [[tau].sup.m.sub.1]/m is established, the proof then proceeds along the same lines as for k = 1 (the same reasoning holding for each stage).

Notice however that the Markov property In probability theory, a stochastic process has the Markov property if the conditional probability distribution of future states of the process, given the present state and all past states, depends only upon the present state and not on any past states, i.e.  does not hold for the process of global proportions at shift times, so that convergence in distribution is proved for the process of proportions detailed by stage, then inducing convergence.

3 Discussions

3.1 Synthesis: false positives and false negatives

From a practical point of view, the main results are Propositions 3 and 4. Given some size distribution for the flows (the generating function [phi] of Section 2), these propositions show how the values of the counters can be computed from the different parameters of the algorithm, since these values are encoded by the fixed point [bar.w] of F: according to Theorem 1, [bar.w] is the state reached in the stationary regime when there is one stage and also when there are several stages (see Subsection subsection
Noun

any of the smaller parts into which a section may be divided

Noun 1. subsection - a section of a section; a part of a part; i.e.
 2.2: one has [[summation].sup.k.sub.j=1] [[bar.w].sub.j] (C) = k[bar.w](C)). Moreover, the convergence is experimentally really fast (see the remark below), which ensures that in practice the algorithm lives in the stationary phase. The component [bar.w](i) of [bar.w] gives the approximate proportion of counters having value i in the whole Bloom filter. [bar.[lambda]] is the number of packets that arrive between two shifts. [bar.w] and [bar.[lambda]] are respectively related to the number of false positives and to the number of false negatives:

The probability that a packet is a false positive is less than ([bar.w](C - 1) + [bar.w][(C)).sup.k] since, in stage j, the probability to hit a counter at height C is at most [[bar.w].sub.j](C - 1) + [[bar.w].sub.j](C) and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The quantity m[bar.[lambda]] is the time (number of packets) between two shifts, which is connected to the number of false negatives according to the discussion "False positive and false negative" of the Introduction.

3.2 Implementation and tests

The algorithm has been implemented with an improvement called the min-rule already proposed in [7]. Instead of increasing the k counters, an arriving packet is incrementing only the counters among k having the minimum value. Analytically more difficult to study, the algorithm should perform better: Heuristically heu·ris·tic  
adj.
1. Of or relating to a usually speculative formulation serving as a guide in the investigation or solution of a problem:
, more flows are needed to reach high values of the counters inducing fewer false positives; moreover, the time between two shifts is longer and hence the number of false negatives is decreased. It has been tested against on two ADSL See DSL.

ADSL - Asymmetric Digital Subscriber Line
 traffic traces from France Telecom, involving millions of flows. The performance of the algorithm is evaluated comparing the real number of elephants with the value estimated by the algorithm. Even under the min-rule, the algorithm performs well only if r stays under a critical value [r.sub.c], closely dependent on the mice distribution.

Simulations have been processed with a one-stage filter with flows of size 1 to evaluate the transient phase duration. It appears that the number of shifts to reach the stationary phase is not much greater than C. Such a result on the speed of convergence seems however theoretically out of reach.

References

[1] Nelson Antunes, Christine Fricker, Philippe Robert, and Danielle Tibi. Stochastic networks with multiple stable points. Annals an·nals  
pl.n.
1. A chronological record of the events of successive years.

2. A descriptive account or record; a history: "the short and simple annals of the poor" 
 of Probability, 36(1):255-278, 2008.

[2] Youssef Azzana. Mesures de la topologie et du trafic Internet. PhD thesis, Universite Pierre et Marie Curie Curie (kürē`), family of French scientists.

Pierre Curie, 1859–1906, scientist, and his wife,

Marie Sklodowska Curie, 1867–1934, chemist and physicist, b.
, july 2006.

[3] Patrick Billingsley. Convergence of probability measures. Wiley Series in Probability and Statistics See the separate articles on probability or the article on statistics. Statistical analysis depends on the characteristics of particular probability distributions, and the two topics are normally studied together. : Probability and Statistics. John Wiley John Wiley may refer to:
  • John Wiley & Sons, publishing company
  • John C. Wiley, American ambassador
  • John D. Wiley, Chancellor of the University of Wisconsin-Madison
  • John M. Wiley (1846–1912), U.S.
 & Sons Inc., New York New York, state, United States
New York, Middle Atlantic state of the United States. It is bordered by Vermont, Massachusetts, Connecticut, and the Atlantic Ocean (E), New Jersey and Pennsylvania (S), Lakes Erie and Ontario and the Canadian province of
, second edition, 1999. A Wiley-Interscience Publication.

[4] E. D. Demaine, A. Lopez-Ortiz, and J. I. Munro. Frequency estimation
This article is about the technique in signal processing. The term "frequency estimation" can also refer to probability estimation.


Frequency estimation
, of internet packet streams with limited space. In Proceedings of the 10th Annual European Symposium on Algorithms (ESA 2002), pages 348-360, Rome, Italy, September 2002.

[5] Vincent Dumas, Fabrice Guillemin, and Philippe Robert. A Markovian analysis markovian analysis Decision-making The use of the Markov process to project future events in a system with multiple hypothetical components. See Receiver operator characteristic.  of additive-increase multiplicative-decrease algorithms. Adv. in Appl. Probab., 34(1):85-111, 2002.

[6] M. Durand and P. Flajolet. Loglog counting of large cardinalities. In G. Di Battista and U. Zwick, editors, Proceedings of the annual european symposium on algorithms, (ESA03), pages 605-617, 2003.

[7] C. Estan and G. Varghese. New directions in traffic measurement and accounting. In John Wroclawski, editor, Proceedings of the ACM (Association for Computing Machinery, New York, www.acm.org) A membership organization founded in 1947 dedicated to advancing the arts and sciences of information processing. In addition to awards and publications, ACM also maintains special interest groups (SIGs) in the computer field.  SIGCOMM SIGCOMM Special Interest Group on Data Communication (ACM)  2002 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM-02), volume 32, 4 of Computer Communication Review, pages 323-338, New York, August 19-23 2002. ACM Press.

[8] P. Flajolet. On adaptative sampling. Computing, pages 391-400, 1990.

[9] P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Proceedings of the 13th conference on analysis of algorithm (AofA 07), pages 127-146, Juan-les-Pins, France, 2007.

[10] O. Gandouet and A. Jean-Marie. Loglog counting for the estimation of ip traffic. In Proceedings of the 4th Colloquium col·lo·qui·um  
n. pl. col·lo·qui·ums or col·lo·qui·a
1. An informal meeting for the exchange of views.

2. An academic seminar on a broad field of study, usually led by a different lecturer at each meeting.
 on Mathematics and Computer Science Algorithms, Trees, Combinatorics combinatorics (kŏm'bənətôr`ĭks) or combinatorial analysis (kŏm'bĭnətôr`ēəl)  and Probabilities, Nancy, France, 2006.

[11] F. Giroire. Order statistics and estimating cardinalities of massive data sets. Discrete Applied Mathematics, to appear.

[12] G. S. Manku and R. Motwani. Approximate frequency counts aver data streams. In Proceedings of the 28th VLDB (Very Large DataBase) An extremely large database. What constitutes a VLDB is debatable, but databases of 100GB or more are generally considered the starting point.  Conference, pages 346-357, Hong Kong Hong Kong (hŏng kŏng), Mandarin Xianggang, special administrative region of China, formerly a British crown colony (2005 est. pop. 6,899,000), land area 422 sq mi (1,092 sq km), adjacent to Guangdong prov. , China, 2002.

[13] Philippe Robert. Stochastic networks and queues, volume 52 of Applications of Mathematics. Springer-Verlag, Berlin, 2003. Stochastic Modelling This page is concerned with the stochastic modelling as applied to the insurance industry. For other stochastic modelling applications, please see Monte Carlo method. Stochastic model
"Stochastic" means being or having a random variable.
 and Applied Probability Much research involving probability is done under the auspices of applied probability, the application of probability theory to other scientific and engineering domains. However, while such research is motivated (to some degree) by applied problems, it is usually the mathematical .

Yousra Chabchoub (1), Christine Fricker (1), Frederic Meunier (2) and Danielle Tibi (3)

(1) INRIA INRIA - Institut National de Recherche en Informatique et Automatique , Domaine de Voluceau, 78 153 Le Chesnay Le Chesnay is a commune in the western suburbs of Paris, France. It is located 16.7 km. (10.4 miles) from the center of Paris. History
On 1 July 1815, Napoleon's Grande Armée fought its last battle in Rocquencourt and Le Chesnay.
 CX, France

(2) Universite Paris Est, LVMT Ecole des Ponts Ponts is a municipality in the comarca of the Noguera in Catalonia, Spain. It is situated on the left bank of the Segre river near its confluence with the Llobregós river and at the point where the routes from Calaf (currently the C-1412 road) and Cervera (currently the  et Chaussees, 6 et 8 avenue Blaise Pascal, Cite Descartes, Champs CHAMPS Capitol Hill Association of Merchants and Professionals
CHAMPS CHemometrics Applied to Metabonomics Proteomics and SAR
CHAMPS Cleanliness-Hospitality-Accuracy-Maintenance-Product-Service
CHAMPS Characteristics of Hardware Assemblies & Mission-build Planning System
 sur Marne 77455 Marne-la-Vallee

(3) Universite Paris 7, UMR UMR Unite Mixte de Recherche (French: Mixed Unit of Research )
UMR University of Missouri - Rolla
UMR Upper Mississippi River
UMR Uniform Methods and Rules (US Department of Agriculture)
UMR Unit Manning Report
 7599, 2 Place Jussieu, 75251 Paris Cedex 05
COPYRIGHT 2008 DMTCS
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2008 Gale, Cengage Learning. All rights reserved.

 Reader Opinion

Title:

Comment:



 

Article Details
Printer friendly Cite/link Email Feedback
Author:Chabchoub, Yousra; Fricker, Christine; Meunier, Frederic; Tibi, Danielle
Publication:DMTCS Proceedings
Article Type:Report
Geographic Code:4EUFR
Date:Jan 1, 2008
Words:7993
Previous Article:A variant of the Recoil Growth algorithm to generate multi-polymer systems.
Next Article:Convergence to the coalescent and its relation to the time back to the most recent common ancestor.
Topics:

Terms of use | Copyright © 2012 Farlex, Inc. | Feedback | For webmasters | Submit articles