# Concentration of measure and mixing for Markov chains.

We consider Markovian models on graphs with local dynamics. We show that, under suitable conditions, such Markov chains exhibit both rapid convergence to equilibrium and strong concentration of measure in the stationary distribution. We illustrate our results with applications to some known chains from computer science and statistical mechanics.

Keywords: Markov chains, concentration of measure, rapid mixing

1 Introduction

Recent years have witnessed a surge of activity in the mathematics of real-world networks, especially the study of combinatorial and stochastic models. Such networks include, for instance, the Internet, social networks, and biological networks. The techniques used to analyse them draw from a range of mathematical disciplines, such as graph theory, probability, statistical physics, analysis. Strikingly, random processes with rather similar characteristics can occur as models of very different real-world settings.

Random networks can often be regarded as interacting systems of individuals or particles. Under certain conditions, there is a law of large numbers, that is, a large system is close to a deterministic process solving a differential equation derived from the average 'drift', with much simpler dynamics. Further, one may frequently observe chaoticity, i.e. asymptotic approximate independence of particles. Unfortunately, it is often difficult to prove the validity of such approximations, especially when the random process has an unbounded number of components in the limit (e.g. the number of vertices or components of size k in a graph of size n, for k = 1, 2, ..., as n [right arrow] [infinity]).

In other instances, it may be difficult to establish good rates of convergence for mean-field approximations, or determine whether the long-term and equilibrium behaviour of the random process also follows that of the deterministic system. Furthermore, some recent attempts at a more accurate representation of real networks still await any kind of mathematically rigorous analysis. We would hope that over the coming years, the intense interest will produce a coherent and widely applicable theory. However, at present, it often appears that each new problem defies the existing theory in an interesting way.

In many complex systems, laws of large numbers and high concentration of measure in equilibrium have been found to co-exist with so-called rapid mixing [2; 12; 31], that is mixing in time O(n log n), where n is a measure of the system size. (Traditionally, such a system was considered to be rapidly mixing if it converged to equilibrium in a time polynomial in n, but currently the term is more and more restricted to the 'optimal' mixing time O(n log n), see for example [31; 7].) There are some very notable examples of such behaviour, for instance, the subcritical Ising model, see [4; 15] and references therein, as well as the discussion in Section 3.1 of this paper.

The purpose of this article is to propose a new method to establish concentration of measure in complex systems modelled by Markov chains. We illustrate the technique with an application to a balls-and-bins model analysed in some earlier works by this author and McDiarmid, the supermarket model [18; 19]. Strong concentration of measure for this model, over long time intervals starting from a given state, as well as in equilibrium, was established in [18; 19] using the underlying structure of the model that enabled certain functions to be considered as functions of independent random variables so that the bounded differences method could be used.

In Section 4 of the present article we show that such concentration of measure inequalities hold more generally, with fewer assumptions on the structure of the Markov process involved. Our result is somewhat related, in spirit, to results (and arguments) in [16], which establishes transportation cost inequalities for the measure at time t and the stationary measure of a contracting Markov chain, assuming transportation cost inequalities for the kernel. However, the technical approach adopted here is rather different from [16] --discrete and coupling-based rather than functional analytic, and, we think, more 'hands on' and easier to use in practice (though our setting is less general than in [16]). It is striking that our approach, considerably more general than the one taken in [18], enables us to improve on the concentration of measure results proved in [18]. (Accordingly, we could also prove improved versions of results in [19], but we choose not to pursue this here.) The results in Section 4 also significantly extend Lemma 2.6 in [15], which bounds the variance of a real-valued, discrete-time, contracting Markov chain at time t and in equilibrium. We hope many more applications for the ideas presented here will be found in the future.

2 Notation and definitions

Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] be a discrete-time Markov chain with a discrete state space S and transition probabilities P(x, y) for x, y [member of] S, where [summation over (y)] [member of]S] P(x, y) = 1 for each x [member of] S. We assume that, for every pair of states x, y [member of] S, P(x, y) > 0 if and only if P(y, x) > 0. Then we can form an undirected graph with vertex set S where {x, y} is an edge if and only if P(x, y) > 0 and x 4 [not equal to] y. In general, our chains may be lazy, that is we can have P(x, x) > 0 for some x [member of] S. We assume that the graph is locally finite, that is, each vertex is adjacent to only finitely many other vertices. We now endow S with a graph metric d given by d(x, y) = 1 if P(x, y) > 0 and x [not equal to] y, and for all other x, y d(x, y) the length of the shortest path between x and y in the graph, which is assumed to be connected.

This kind of setting is natural and many models in applied probability and combinatorics fit into this framework, including those discussed in Section 3.

For each t [member of] [Z.sup.+], [X.sub.t] may be viewed as a random variable on a measurable space ({OMEGA], F), where

[OMEGA] = {[omega] = ([[omega].sub.0],[[omega.sub.1], ...) : [[omega].sub.i] [member of] S [for all]i}

and F = [sigma]([[union].sup.[infinity].sub.t=0][F.sub.t]), with [F.sub.t] = [sigma]([X.sub.i] : i [less than or equal to] t). Then each [X.sub.i] is the i-co-ordinate projection, that is [X.sub.i]([omega]) = [[omega].sub.i] for i [member of] [Z.sup.+]. Then the a-fields [F.sub.t] form the natural filtration for the process.

Let P(S) be the power set of S. The law of the Markov chain is a probability measure P on ([OMEGA], F), and is determined uniquely by the transition matrix P together with a probability measure [mu] on (S, P(S)) that gives the law of the initial state [X.sub.0], according to

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

for each [x.sub.0], ..., [x.sub.i] [member of] S, for each i [member of] [Z.sup.+]. This gives the law of ([X.sub.t]) conditional on L([X.sub.0]) = [mu], and will be denoted by [P.sub.[mu]] in what follows. Let [P.sup.t] (x, y) be the t-step transition probability from x to y, given inductively by

[P.sup.t] (x,y) = [summation over z[member of]S] [P.sup.t-1](x,z)P(z,y).

Then [P.sub.[mu]]([X.sub.t] [member of] A) = ([mu][P.sub.t]) (A) for A [subset or equal to] S.

Let [E.sub.[mu]] denote the expectation operator corresponding to [P.sub.[mu]]. For t [member of] [Z.sup.+] and f : S [right arrow] R, define the function [P.sup.t] f by

([P.sup.t]f)(x) = [summation over y] [P.sup.t-1](x,y)f(y), x [member of] S.

In other words, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], the expected value of f ([X.sub.t]) at time t conditional on the Markov process starting at x, i.e. the expectation of the function f with respect to measure [[delta].sub.x][P.sup.t]. In general, we write [E.sub.[mu]][f([X.sub.t])] = ([mu][P.sup.t])(f).

A real-valued function f on S is said to be Lipschitz (or 1-Lipschitz) if

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Here, equivalently, we only need to consider vertices at distance 1, so f is Lipschitz if and only if [sup.sub.x,y:d(x,y)=1] [absolute value of f(x) - f (y)] [less than or equal to] 1.

Given a probability measure [mu] on (S, P(S)) and an S-valued random variable X with law L(X) = [mu], we say that [mu] or X has normal concentration if there exist constants C, c > 0 such that, for every u > 0, uniformly over 1-Lipschitz functions f : S [right arrow] R,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.1)

We say that [mu] or X has exponential concentration if there exist constants C, c > 0 such that, for every u > 0, uniformly over 1-Lipschitz functions f : S [right arrow] R,

[mu]([absolute value of f(X) - [mu](f)] [greater than or equal to] u) [less than or equal to] C[e.sup.-cu] (2.2)

These definitions are closely related to the notions used by Ledoux [14].

In Section 4 we shall give conditions under which a discrete-time Markov chain ([X.sub.t]) exhibits normal concentration of measure over long time intervals and in equilibrium.

For probability measures [[mu].sub.1], [[mu].sub.2] on (S, P(S)), the total variation distance between [[mu].sub.1], and [[mu].sub.2] is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

It is well known that the total variation distance satisfies

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the infimum is over all couplings [pi] = L(X, Y) of S'-valued random variables X, Y such that the marginals are L(X) = [[mu].sub.1] and L(Y) = [[mu].sub.2].

The Wasserstein distance between probability measures [[mu].sub.1] and [[mu].sub.2] is defined as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the supremum is over all measurable 1-Lipschitz functions f : S [right arrow] R. By the Kantorovich--Rubinstein theorem (see [5], Section 11.8),

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the infimum is taken over all couplings [pi] on S x S with marginals [[mu].sub.1] and [[mu].sub.2], and we write [pi][d(X, Y)] for the expectation of d(X, Y) under the coupling [pi]. It is well known that the Wasserstein distance metrises weak convergence in spaces of bounded diameter. Also, since the discrete space (S, P(S)) is necessarily complete and separable, so is the space of probability measures on (S, P(S)) metrised by the Wasserstein distance. See [28] for detailed discussions of various metrics on probability measures and relationships between them.

3 Examples of rapid mixing and concentration

In this section we give some examples of known Markov chains exhibiting both concentration of measure in equilibrium and rapid mixing.

3.1 Mean-field Ising model

Let G = (V, E) be a finite graph. Elements of the state space S : = [{-1, 1}.sup.V] will be called configurations, and for a [member of] S, the value [sigma](v) will be called the spin at v. The nearest-neighbour energy H ([sigma]) of a configurations [sigma] [member of] [{-1, 1}}.sup.V] is defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.1)

where w ~ v means that {w, v} [member of] E. The parameters J(v, w) measure the interaction strength between vertices; we will always take J(v, w) [equivalent to] J, where J is a positive constant.

For [beta] [greater than or equal to] 0, the Ising model on the graph G with parameter [beta] is the probability measure [pi] on S given by

[pi]([sigma]) = [e.sup.-[beta]H([sigma])]/Z([beta]), (3.2)

where Z([beta]) = [[summation].sub.[sigma][member of][OMEGA]] [e.sup.-[beta]H([sigma])] is a normalising constant.

The parameter [beta] is interpreted physically as the inverse of temperature, and measures the influence of the energy function H on the probability distribution. At infinite temperature, corresponding to [beta] = 0, the measure 7 is uniform over S and the random variables [{[sigma](v)}.sub.v[member of]V] are independent.

The (single-site) Glaubey[degrees] dynamics for [pi] is the Markov chain ([X.sub.t]) on S with transitions as follows. When at [sigma], a vertex v is chosen uniformly at random from V, and a new configuration is generated from 7 conditioned on the set

{n [member of] [OMEGA] : [eta](w) = [sigma](w), w [not equal to] v}.

In other words, if vertex v is selected, the new configuration will agree with a everywhere except possibly at v, and at v the spin is +1 with probability

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.3)

where [M.sup.v]([sigma]):= J[[summation].sub.w:w~v] [sigma](w). Evidently, the distribution of the new spin at v depends only on the current spins at the neighbours of v. It is easily seen that ([X.sub.t]) is reversible with respect to the measure [pi] in (3.2), which is thus its stationary measure.

Given a sequence [G.sub.n] = ([V.sub.n], [E.sub.n]) of graphs, write [[pi].sub.n] for the Ising measure and ([X.sup.(n).sub.c])) for the Glauber dynamics on [G.sub.n]. For a given configuration or [sigma] [member of] [S.sub.n], let L[X.sup.(n).sub.t], [sigma]) denote the law of [X.sup.(n).sub.t] starting from [sigma]. The worst-case distance to stationarity of the Glauber dynamics chain after t steps is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.4)

The mixing time [t.sub.mix] (n) is defined as

[t.sub.mix](n) := min{t : [d.sub.n](t) [less than or equal to] 1/4}. (3.5)

Note that [t.sub.mix](n) is finite for each fixed n since, by the convergence theorem for ergodic Markov chains, [d.sub.n](t) [right arrow] 0 as t [right arrow] [infinity]. Nevertheless, [t.sub.mix](n) will in general tend to infinity with n. It is natural to ask about the growth rate of the sequence [t.sub.mix](n).

Definition 1 The Glauber dynamics is said to exhibit a cut-off at {[t.sub.n]} with window size {[w.sub.n] if [w.sub.n] = o([t.sub.n]) and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Informally, a cut-off is a sharp threshold for mixing. For background on mixing times and cut-off, see [21].

Here we consider the mean-field case, taking [G.sub.n] to be [K.sub.n], the complete graph on n vertices. That is, the vertex set is [V.sub.n] = {1, 2, ..., n}, and the edge set [E.sub.n] contains all [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] pairs {i, j} for 1 [less than or equal to] i < j [less than or equal to] n. We take the interaction parameter J to be 1/n; in this case, the Ising measure [pi] on [{-1,1}.sup.n] is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.6)

In the physics literature, this is usually referred to as the Curie- Weiss model. To put this into the framework introduced in Section 2, the state space S consists of all n-vectors with components taking values in {-1,1}, and two vectors are adjacent if they differ in exactly one co-ordinate.

It is a consequence of the Dobrushin- Shlosman uniqueness criterion that [t.sub.mix](n) = O(n log n) when [beta] < 1; see [1]. (See also [2; 31]). We shall see in Section 4 that, in the same regime, the stationary measure [pi] (the Gibbs measure) exhibits normal concentration of measure for Lipschitz functions in the following sense. Let [X.sup.(n)] be a stationary version of [X.sup.(n).sub.t]. Then, for some constants c, C > 0, for all u > 0

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.7)

uniformly over all 1-Lipschitz functions on S and over all n. Thinking about (3.7) simply as a statement about the measure 7 without any mention of the process [X.sup.(n).sub.t), we can also rewrite it in the form

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Inequality (3.7) will follow from Theorem 4.1 (i), and is an improvement on Proposition 2.7 in [15].

More precise results about the speed of mixing for [beta] < 1 can be found in [15], where the occurrence of a cut-off is established. The following is Theorem 1 from [15]:

Theorem 3.1 Suppose that [beta] < 1. The Glauber dynamics for the Ising model on [K.sub.n] has a cut-off at [t.sub.n] = [[2(1 - [beta])].sup.-1]n log n with window size n.

It is also easy to show, using the concentration of the Gibbs measure and the method used to prove Theorem 1.4 in [19], that asymptotically the spin values in a bounded set of vertices become almost independent. (In the language of [31]--see also references therein--this corresponds to the decay of correlations or spatial mixing.)

On the other hand, in the case [beta] [greater than or equal to] 1, there is no rapid mixing, and no cut-off (see [15; 4] and references therein): [t.sub.mix](n) is of the order [n.sup.3/2] when [beta] = 1 and is exponential in n when [beta] > 1. For the same range of [beta], the Gibbs measure fails to exhibit normal concentration.

In particular, consider the function m : S [right arrow] R given by m([sigma]) = [[summation].sup.n.sub.i=1] [sigma](i), the magnetisation; it is easy to see that 1/2m is 1-Lipschitz, and [E.sub.[pi]], (m(X)) = [pi](m) = 0. However, when [beta] > 1, then there is a constant c > 0 such that

[pi]({[sigma] : m [sigma]) [greater than or equal to] cn}) = [pi]({[sigma] : m[sigma]) [less than or equal to] -cn})[greater than or equal to] 1/4

i.e. m(X) is bi-modal for [beta] > 1. While there is no bi-modality in the case [beta] = 1, it is easy to calculate directly that m(X) is not concentrated in the sense of (3.7). Further, for [beta] [greater than or equal to] 1, the spins of vertices are no longer approximately independent for large n.

3.2 Supermarket model

Consider the following well-known queueing model with n separate queues, each with a single server. Customers arrive into the system in a Poisson process at rate [lambda]n, where 0 < [lambda] < 1 is a constant. Upon arrival each customer chooses d queues uniformly at random with replacement, and joins a shortest queue amongst those chosen (where she breaks ties by choosing the first of the shortest queues in the list of d). Here d is a fixed positive integer. Customers are served according to the first-come first-served discipline. Service times are independent exponentially distributed random variables with mean 1.

A number of authors have studied this model, as well as its extension to a Jackson network setting [10; 11; 18; 19; 20; 22; 23; 25; 30].

For instance, it is shown by Graham in [10] that the system is chaotic, provided that it starts close to a suitable deterministic initial state, or is in equilibrium. This means that the paths of members of any fixed finite subset of queues are asymptotically independent of one another, uniformly on bounded time intervals. This result implies a law of large numbers for the time evolution of the proportion of queues of different lengths, that is, for the empirical measure on path space [10]. In particular, for each fixed positive integer k o, as n tends to infinity the proportion of queues with length at least [k.sub.0] converges weakly (when the infinite-dimensional state space is endowed with the product topology) to a function [v.sub.t]([k.sub.0]), where [v.sub.t] (0) = 1 for all t [greater than or equal to] 0 and ([v.sub.t] (k) : k [member of] N) is the unique solution to the system of differential equations

d[v.sub.t](k)/dt = [lambda][([v.sub.t](k-1).sup.d] - [v.sub.t][(k).sup.d]) - ([v.sub.t](k) - [v.sub.t](k + 1)) (3.8)

for k [member of] N. Here one needs to assume appropriate initial conditions ([v.sub.0](k) : 1 [member of] N) such that 1 [greater than or equal to] [v.sub.0] (1) [greater than or equal to] [v.sub.0] (2) [greater than or equal to] ... [greater than or equal to] 0. Further, again for a fixed positive integer [k.sub.0], as n tends to infinity, in the equilibrium distribution this proportion converges in probability to [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and thus the probability that a given queue has length at least [k.sub.0] also converges to [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Although the above results refer only to fixed queue length [k.sub.0] and bounded time intervals, they suggest that when d [greater than or equal to] 2, in equilibrium the maximum queue length may usually be O (log log n). Indeed, one of the contributions of [18] is to show that this is indeed the case, and to give precise results on the behaviour of the maximum queue length. In particular, it turns out that when d [greater than or equal to] 2, with probability tending to 1 as n [right arrow] [infinity], in the equilibrium distribution the maximum queue length takes at most two values; and these values are log log n/ log d + O(1). Along the way, it is also shown in [18] that the system is rapidly mixing, that is the distribution settles down quickly to the equilibrium distribution. In this context, 'quickly' will mean 'in time O (log n), as this is a continuous time process with events happening at rate n, and so O (log n) corresponds to O(n log n) steps of the discrete-time jump chain. It is further established in [18] that the equilibrium measure is strongly concentrated.

Another natural question concerns fluctuations when in the equilibrium distribution: how long does it take to see large deviations of the maximum queue length from its stationary median? An answer is provided in [18] by establishing strong concentration estimates (for Lipschitz functions of the queue lengths vector) over time intervals of length polynomial in rt. The techniques in [18] are partly combinatorial, and are used also in [17] and [19]. In particular, in [19], the concentration estimates obtained in [18] are used to establish quantitative results on the convergence of the distribution of a queue length and on 'propagation of chaos'.

Let us start by discussing the rapid mixing results known for the supermarket model. In [18] two rapid mixing results are established, one in terms of the Wasserstein distance and one in terms of the total variation distance. Unlike for the Ising model in Section 3.1, it turns out to be inappropriate to be looking at the worst-case mixing time, that is the supremum of the mixing times over all possible starting states. In the present case, this quantity is unbounded: the state space is unbounded, and the time to equilibrium from states x with the total number of customers [[parallel] x [parallel].sub.1] = k [much greater than] n is of the order at least k. Then the best one can do is to obtain good upper bounds on the mixing time for copies of the Markov chain starting from nice states--that is, states where the queues are not too 'over-loaded'. This is made more precise below.

Let [X.sup.(n).sub.t] or [X.sub.t] be the queue-lengths vector ([X.sup.(n).sub.t](1), ..., [X.sup.(n).sub.t](n)) in the supermarket model with n servers. For a positive integer n, ([X.sup.(n).sub.t]) is an ergodic continuous-time Markov chain, with a unique distribution [[pi].sup.(n)] or [pi].

For any given state x write L([X.sup.(n).sub.t], x) to denote the law of [X.sup.(n).sub.t] given [X.sub.0](n) = x. Also, for [epsilon] > 0, the mixing time [[tau].sup.(n)] ([member of], x) starting from x us defined by

[[tau].sup.(n)]([epsilon], x) = inf{t [greater than or equal to] 0 : [d.sub.TV]([X.sup.(n).sub.t],x),[[pi].sup.(n))[less than or equal to] [epsilon]}.

The result below, Theorem 1.1 in [18], shows that starting from an initial state in which the queues are not too long, the mixing time is small. In particular, if [epsilon] > 0 is fixed and 0 denotes the all-zero n-vector, then [[tau].sup.(n)] ([epsilon], 0) is O (log n).

Theorem 3.2 Let 0 < [lambda] < 1 and let d be a fixed positive integer: For each constant c > 0 there exists a constant [eta] > 0 such that the following holds for each positive integer n. Consider any distribution of the initial queue-lengths vector [X.sup.(n).sub.0], and for each time t [greater than or equal to] 0 let

[[delta].sub.n,t] = P([absolute value of [X.sup.(n).sub.0]] > cn) + P ([M.sup.(n).sub.0] > [eta]t).

Then

[d.sub.TV] (L([X.sup.(n).sub.t]), [[pi].sup.(n)]) [less than or equal to] n[e.sup.-[eta]t] + 2[e.sup.-[eta]n] + [[delta].sub.n,t].

The O(log n) upper bound on the mixing time [tau] is of the right order. Indeed, it is also proven in [18] that, for a suitable constant [theta] > 0, if t [less than or equal to] [theta] log n then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.9)

Thus [[tau].sup.(n)] ([epsilon], 0) is [THETA](log n) as long as both [[epsilon].sup.-1] and [(1 - [epsilon]).sup.-1] are bounded polynomially in n.

It would be interesting to consider the mixing times more precisely, to establish whether the supermarket model exhibits a cut-off. Again, here we should not be considering the worst-case mixing time, but rather the worst case over a subset of 'good' initial states, which are states where the total number of customers is not too large and the maximum queue not too long. Also, to bring the supermarket model into the discrete framework of Section 2, let us consider the jump chain of the supermarket model. We shall denote the jump chain by [[??].sup.(n).sub.t]) or [[??].sub.t] in what follows, and its stationary measure by [[??].sup.(n) or [??].

The transition probabilities of the jump chain are as follows. Given the state at time t is x, the next event is an arrival with probability [lambda]/([lambda] + 1) and is a potential departure with probability 1/([lambda] + 1). Here 'potential' means that it may be a departure or no change of state at all. Given that the next event is an arrival, the queue to which the new customer is sent is determined by selecting a uniformly random d-tuple of queues and directing the customer to a shortest queue among those chosen, in the same way as for the continuous-time process. Given that the next event is a potential departure, the departure queue is chosen uniformly at random from among all n queues. Then a customer will depart if the selected queue is non-empty; otherwise, nothing happens. It is easy to adapt the proofs in [18] (where the arguments are, in fact, based on analysing the jump chain) to show that Theorem 3.2 implies mixing in time of the order O(n log n) from initial states x such that [[parallel] x [parallel].sub.1] = O(n) and [[parallel] x [parallel].sub.[infinity]] = O(log n).

Accordingly, we make the following conjecture:

Conjecture 3.3 Let c be a positive constant, and let [S.sup.(n).sub.0] be the set of all queue lengths vectors x in the ra server supermarket model such that [[parallel] x [parallel].sub.1] [less than or equal to] and [[parallel] x [parallel].sub.[infinity] [less than or equal to] c log n. Let [epsilon] > 0, and let

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Then [d.sub.n]([epsilon], t) has a cut-off in the sense of Definition 1, with window size n.

Our conjecture appears supported by some simulation results. Also it is supported by Conjecture 1 from [15], which states that the Glauber dynamics for the Ising model on transitive graphs [G.sub.n] has a cutoff if the mixing time is O(n log n). The jump chain of the supermarket process is of a similar type to Glauber dynamics in that it makes only local transitions, and has mixing time of the order O(n log n), starting from good initial states. Also, it has a lot of symmetry--its stationary distribution is exchangeable. Thus the supermarket chain appears a good candidate for cut-off, though proving it may not be easy.

More generally, perhaps cut-off can be proven to be a phenomenon that also co-occurs with rapid mixing and concentration of measure in equilibrium much more widely, in the context of Markov chains whose jumps are suitably local.

In [18], the authors upper bound mixing in terms of the total variation distance by first upper bounding the Wasserstein distance between the distribution of the process at time t and the stationary distribution. The following result is Lemma 2.1 in [18].

Theorem 3.4 Let 0 < [lambda] < 1 and let d be a fixed positive integer For each constant c > [lambda]/1 - [lambda] there exists a constant [eta] > 0 such that the following holds for each positive integer n. Let M denote the stationary maximum queue length. Consider any distribution of the initial queue-lengths vector [X.sub.0] such that [absolute value of [X.sub.0]] has finite mean. For each time t [greater than or equal to] 0 let

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Then

dw(L([X.sub.t]), [pi]) [less than or equal to] n[e.sup.-[eta]t] + 2cn [P.sub.[pi](M > [eta]t) + 2[e.sup.2[eta]n] + [[delta].sub.n,t].

The upper bounds on the Wasserstein and total variation distance, and thus on the mixing time, are proven in [18] by means of a monotone coupling. The coupling takes two copies of the queueing process starting in adjacent states (that is, states differing in one customer in one queue) and couples their paths together in such a way that the [l.sub.1]-distance between them is non-increasing (and so always stays equal to 1 until the processes coalesce). Furthermore, the coupling is such that with high probability the [l.sub.-1] distance rapidly becomes 0. The coupling is then extended to all pairs of starting states with not too many customers in queues using the fact that the Wasserstein distance is a metric on the space of probability measures, or a path-coupling argument [2].

The property that the [l.sun.1]-distance is non-increasing in the coupling in [18] is very strong and not commonly encountered in path-coupling scenarios. This property is exploited in [18] to prove strong concentration of measure for the supermarket process, starting from a fixed (or highly concentrated state) for a long time interval. The following is Lemma 4.3 in [18].

Lemma 3.5 There is a constant c > 0 such that the following holds. Let n [greater than or equal to] 2 be an integer and let f be a 1-Lipschitz function on the state space (set of all queue lengths vectors) S. Let also [x.sub.0] [member of] S and assume that the queue-lengths process ([X.sub.t]) satisfies [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Then for all times t > 0 and all u [greater than or equal to] 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.10)

Lemma 4.3 in [18] is proven by observing that the supermarket process can be 'simulated' by two independent Poisson processes, the arrivals process (with rate [lambda]n) and the (potential) departure process (with rate n), together with corresponding independent choices of queues (d independent uniformly random choices for each event in the arrivals process, and one uniformly random choice in the departures process). One then conditions on the number of events in the interval [0, t], and then the state at time t is conditionally determined by a finite family of independent random variables. In other words, the argument is, just like most of the other arguments in [18], based on studying the jump chain ([[??].sub.t]), although this is not made explicit therein.

The non-increasing distance coupling property is used to show that a Lipschitz function of the queue lengths vector must satisfy a bounded differences condition, so that the discrete bounded differences inequality can be applied to show concentration of measure for Lipschitz functions in the conditional space. The proof is then completed by deconditioning.

The rapid mixing result can be combined with the long-term concentration of measure result to prove concentration of measure in equilibrium for Lipschitz functions of the queue-lengths vector. The following is Lemma 4.1 in [18].

Lemma 3.6 There is a constant c > 0 such that the following holds. Let n [greater than or equal to] 2 be an integer and consider the n-queue system. Let the queue-lengths vector Y have the equilibrium distribution. Let f be a 1-Lipschitz function on S. Then for each u [greater than or equal to] 0

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.11)

Lemmas 3.5 and 3.6 prove strong concentration of measure--normal concentration for small deviations and exponential concentration for larger deviations in the case of starting from a fixed state, and exponential concentration in equilibrium. The factor n in the bound on the right-hand sides of both (3.10) and (3.11) is a limitation of the technique and not the right answer. It is natural to expect the truth to be a lot better--that it can be replaced by a constant. In Section 4 we develop concentration inequalities that achieve that. Although we work with the discrete-time jump chain, it is easy to see that our results apply also to the continuous time chain. One further advantage of our inequalities is that they apply to other settings--for instance where rapid mixing is established by a coupling, but the coupling does not have additional useful properties such as the non-increasing Wasserstein distance.

Even so Lemmas 3.5 and 3.6 are quite powerful. We now explore, briefly, some results concerning the queue lengths in the supermarket model in equilibrium that can be obtained using Lemma 3.6. The following is Lemma 4.2 in [18]. (We drop the subscript [pi] to lighten up the notation.)

Lemma 3.7 Consider the n-queue system, and let the queue-lengths vector Y have the equilibrium distribution. For each non-negative integer k, let l(k, y) denote the number of queues of length at least 1 in state y. Also, for each non-negative integer k, let l(k) = E[l(k, Y)]. Then for any constant c > 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Also, there exists a constant c > 0 such that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Furthermore, for each integer r [greater than or equal to] 2

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Lemma 5.1 in [18], stated below, yields further precise information about the equilibrium behaviour, over long time intervals.

Lemma 3.8 Let K > 0 be an arbitrary constant and let [tau] = [n.sup.K]. Let ([Y.sub.t]) be in equilibrium and let c > 0 be a constant. Let [B.sub.[tau] be the event that for all times t with 0 [less than or equal to] t [less than or equal to] [tau]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Then [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

In [18], Lemma 5.1 is used to prove two-point concentration for the stationary maximum queue length and its concentration on only a constant number of values over long time intervals. This is Theorem 1.3 in [18]:

Theorem 3.9 Let 0 < [lambda] < 1 and let d [greater than or equal to] 2 be an integer Then there exists an integer-valued function [m.sub.d] = [m.sub.d](n) = log log n/log d + O(1) such that the following holds. For each positive integer n, suppose that the queue-lengths vector [Y.sup.(n).sub.t] is in the stationary distribution (and thus so is the maximum queue length [M.sup.(n).sub.t]). Then for each time t [greater than or equal to] 0, [M.sup.(n).sub.t]) is [m.sub.d](n) or [m.sub.d](n) - 1 with probability tending to 1 as n [right arrow] [infinity]; and further; for any constant K > 0 there exists c = c(K) such that, with probability tending to 1 as n [right arrow] [infinity],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.12)

The functions [m.sub.2](n), [m.sub.3](n), ... may be defined as follows. For d = 2, 3, ... let [i.sub.d](n) be the least integer i such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Then we let [m.sub.2](n) = [i.sub.2] (n) + 1, and for d [greater than or equal to] 3 let [m.sub.d](n) = [i.sub.d](n). (As we have seen, with high probability the proportion of queues of length at least i is close to[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].)

Also, equation (37) in [18] shows that, for r = O(log n),

P(M [greater than or equal to] [m.sub.d](n) + r) [less than or equal to] [e.sup.-cr log n] (3.13)

for a constant c > 0.

In [19], strong concentration of measure results from [18] are used to show that in equilibrium the distribution of a typical queue length converges to an explicit limiting distribution and provide explicit convergence rates. Let [Y.sup.(n)](1) denote the equilibrium length of of queue 1. (Note that the equilibrium distribution is exchangeable.) The following is Theorem 1.1 in [19]. Let [L.sup.[lambda],d] denote the law of a random variable Y such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for each k = 0, 1, .....

Theorem 3.10 For each positive integer n let [Y.sup.(n) be a queue-lengths n-vector in equilibrium, and consider the length [Y.sup.(n)](1) of queue 1. Then

[d.sub.TV](L([Y.sup.(n)](1)),[L.sub.[lambda],d])

is of order [n.sup.-1] up to logarithmic factors.

In fact, it is proven in [19] that the above total variation distance is o([n.sup.-1] [log.sup.3] n) and is [OMEGA]([n.sup.-1]). Also, the following holds (Corollary 1.2 in [19]).

Corollary 3.11 For each positive integer l, the difference between the kth moment E[[Y.sup.(n)][(1).sup.k]] and the kth moment of [L.sub.[lambda],d] is of order [n.sup.-1] up to logarithmic factors.

The above results concern the distribution of a single queue length. One may also consider collections of queues and chaoticity. The terms 'chaoticity' and 'propagation of chaos' come from statistical physics [13], and the original motivation was the evolution of particles in physical systems. The subject has since then received considerable attention, especially following the ground-breaking work of Sznitman [29].

The result below (Theorem 1.4 in [19]) establishes chaoticity for the supermarket model in equilibrium. We see that for fixed r the total variation distance between the joint law of r queue lengths and the product law is at most 0([n.sup.-1]), up to logarithmic factors. More precisely and more generally we have:

Theorem 3.12 For each positive integer n, let [Y.sup.(n)] be a queue-lengths n-vector in equilibrium. Then, uniformly over all positive integers r [less than or equal to] n, the total variation distance between the joint law of [Y.sup.(n)](1), ..., [Y.sup.(n)](r) and the product law L([(Y.sup.(n)][(1).sup.[cross product]r] is at most O([n.sup.-1] [log.sub.2] n[(2log log n).sup.r]); and the total variation distance between the joint law of [Y.sup.(n)](1), ..., [Y.sup.(n)] (r) and the limiting product law [L.sup.[cross product]r.sub.[lambda],d] is at most 0([n.sup.-1] [log.sup.2] n[(2 log log n).sup.r+1]).

Analogous time-dependent results (away from equilibrium) are also given in [19]--proven using Lemma 3.5 above (Lemma 4.3 in [18]) but we omit them here for the sake of brevity. Let us mention that the arguments used in [19] to prove Theorems 1.1 and 1.4 (Theorems 3.10 and 3.12 above) are quite generic and would apply in many other settings. The main property needed is concentration of measure for Lipschitz functions of the state vector, the polynomial form of the generator of the Markov process, and, in the case of Theorem 1.1, also the exchangeability of the stationary distribution. The chaoticity result Theorem 3.12 above is a quantitative version of some of the results in [29].

To conclude this section, we mention that analogues of results in [18; 19] are proved in [17] for a related balls-and-bins model, where, instead of queueing up to receive service on a first-come first-served basis, customers (balls) have independent exponentially distributed 'lifetimes' and each departs its queue (bin) as soon as its lifetime has expired.

Current work in progress [9] includes extensions of the results in [18; 19] to the supermarket model where the number of choices d = d(n) and the arrival rate [lambda] = [lambda](n) are n-dependent, including the interesting case where d [right arrow] [infinity] and [lambda] [right arrow] 1 with various functional dependencies between [lambda] and d.

4 Coupling and bounded differences method generalised

This section contains our main results and applications. We use the notation introduced in Section 2.

Let us state our first theorem, which gives concentration of measure for Lipschitz functions of a discrete-time Markov chain on state space S and with transition matrix P at time t, under assumptions on the Wasserstein distance between its i step transition measures for i [less than or equal to] t.

Theorem 4.1 Let P be the transition matrix of a discrete-time Markov chain with discrete state space S.

(i) Let ([[alpha].sub.i] : i [member of] N) be a sequence of positive constants such that, for all i,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.1)

Let f be a 1-Lipschitz function. Then for all u > 0, [x.sub.0] [member of] S, and t > 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.2)

(ii) More generally, let [S.sub.0] be a non-empty subset of S, and let ([[alpha].sub.1] : i [member of] N) be a sequence of positive constants such that, for all i,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.3)

Let

[S.sup.0.sub.0] = {x [member of] [S.sub.0] : y [member of] [S.sub.0] whenever d(x,y) = 1}.

Let f be a 1-Lipschitz function. Then for all [x.sub.0] [member of] [S.sup.0.sub.0], u > 0 and t > 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.4)

If the Markov chain becomes contractive after a finite number of steps, then one can deduce from Theorem 4.1 concentration results for the stationary measure of the Markov chain, as in the following corollary.

Corollary 4.2 (i) Suppose that there exists x [member of] S and a sequence [[alpha].sub.i] : S [right arrow] [R.sup.+] of functions such that, for all y [member of] S,

[d.sub.W]([[delta].sub.x][P.sup.i], [[delta].sub.y][P.sup.i]) [less than or equal to] [[alpha].i](y), (4.5)

where [[alpha].sub.i](y) [right arrow] 0 as i [right arrow] [infinity] for each y, and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Then ([X.sub.t]) has a unique stationary measure [pi], and [[delta].sub.y][P.sup.t] [right arrow] [pi] as t [right arrow] [infinity] for each y.

(ii) Suppose that (4.1) holds, and the constants [[alpha].sub.i] in Theorem 4.1 satisfy [[summation].sub.i][[alpha].sup.2.sub.i] < [infinity]. Suppose further there exists x [member of] S such that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where g(y) = d(x,y). Then ([X.sub.t]) has a unique stationary measure [pi], [[delta].sub.x][P.sup.t] [right arrow] [pi] as t [right arrow] [infinity] for each x.

Furthermore, let X be a stationary copy of [X.sub.t]. Then, for all u > 0, and uniformly over all 1-Lipschitz functions f,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.7)

(iii) Suppose that ([X.sub.t]) has a unique stationary measure 7 and condition (4.3) holds, where [[summation].sub. i][[alpha].sup.2.sub.i] < [infinity]. Let x E [S.sup.0.sub.0], and suppose [delta] > 0 and [t.sub.0] > 0 are such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.8)

Let X be a stationary copy of [X.sub.t]. Then, for all u [greater than or equal to] [delta], uniformly over all 1-Lipschitz functions f,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof:

(i) Consider the sequence [P.sub.i] of measures on (S, P(S)) given by [P.sub.i] = [[delta].sub.x][P.sup.i]; we have, using the coupling characterisation of the Wasserstein distance,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

as i [right arrow] [infinity], by assumption. Thus the sequence ([P.sub.i]) is a Cauchy sequence and so, since the space of probability measures on (S, P(S)) is complete with respect to the Wasserstein distance, it must converge to a probability measure [pi] on (S, P(S)). It is obvious that this measure must be stationary for P.

Now, take y [member of] S, and let [Q.sub.i] = [[delta].sub.y][P.sup.i]. Then

[d.sup.W]([P.sub.i], [Q.sub.i]) = [d.sub.W]([[delta].sub.x][P.sub.i], [[delta].sub.y][P.sup.i]) [less than or equal to] [[alpha].sub.i](y) [right arrow] 0 as i [right arrow] [infinity].

It follows that [Q.sub.i] [right arrow] [pi] as i [right arrow] [infinity], and so [pi] must be the unique stationary measure.

(ii) The assumption that [[summation].sub.i][[alpha].sup.2.sub.i] < [infinity] implies that [[alpha].sub.i] [right arrow] 0 as i [right arrow] [infinity]. Then it is easily seen (using the fact that the distance d(y, z) between each pair y, z of states in finite) that conditions (4.5) and (4.6) of part (i) hold for x, with [[alpha].sub.i](y) [less than or equal to] [[alpha].sub.i]d(x, y), and so, as in (i) one can prove that there exists a (necessarily unique) stationary measure [pi], and that [[delta].sub.x][P.sup.t] [right arrow] [pi] as t [right arrow] [infinity] for each x [member of] S.

Let us now prove the concentration of measure result, inequality (4.7). Take some x [member of] S. Given [epsilon] > 0, for t large enough the Wasserstein distance, and hence the total variation distance, between [[delta].sub.x][P.sup.t] and [pi] is at most [epsilon]. Then, for u [greater than or equal to] [epsilon] and all such t, by Theorem 4.1 part (i),

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Here we have used the fact that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Since [epsilon] is arbitrary, the result follows.

(iii) Let

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Arguing as in (ii), and using Theorem 4.1 part (ii), we can write, for u [greater than or equal to] [delta],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

as required.

To prove Theorem 4.1, we shall make use of a concentration inequality from [26]. Let ([??], [??], [??]) be a probability space, with [??] finite. Let [??] [subset or equal to] [??] be a [sigma]-field. Given a bounded random variable Z on ([??],[??],[??]), the supremum of Z in [??] is the [??]-measurable function given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.9)

Thus sup (Z) takes the value at [omega] equal to the maximum value of Z over the 'smallest' event in [??] containing [omega]. Since [??] is finite, we are assured that the smallest event containing [omega] does exist; the arguments used here would work also in many cases where [??] is countably infinite.

The conditional range of Z in [??], denoted by ran(Z), is the [??]-measurable function

ran(Z | [??]) = sup(Z|[??]) + sup(-Z|[??]). (4.10)

Let {0, [??]} = [[??].sub.0] [subset or equal to] [[??].sub.1] [subset or equal to] ... be a filtration in [??], and let [Z.sub.0], ..., be the martingale obtained by setting [Z.sub.t] = E(Z|[[??].sub.t]) for each t. For each t let [ran.sub.t] denote ran ([Z.sub.t]|[F.sub.t-1]); by definition, [ran.sub.t] is an [[??].sub.t-1]-measurable function. For each t, let the sum of squared conditional ranges [R.sup.2.sub.t] be the random variable [summation].sup.t.sub.i=1] [ran.sup.2.sub.i], and let the maximum sum of squared conditional ranges [[??].sup.2.sub.t] be the supremum of the random variable [R.sup.2.sub.t], that is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The following result is Theorem 3.14 in [26].

Lemma 4.3 Let Z be a bounded random variable on a probability space ([??],[??],[??]) with [??] (Z) = m. Let {0, [??]} = [[??].sub.0] [subset or equal to] [[??].sub.1] [subset or equal to] ... [subset or equal to] [[??].sub.t] be a filtration in [??]. Then for any u [greater than or equal to] 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

More generally, for any u [greater than or equal to] 0 and any value [r.sup.2.sub.t],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof of Theorem 4.1. Let f : S [right arrow] R be 1-Lipschitz. Fix a time t [member of] N, [x.sub.0] [member of] S and consider the evolution of [X.sub.t] conditional on [X.sub.0] = [x.sub.0] for t steps, that is until time t. Since we have assumed that there are only a finite number of possible transitions from any given x [member of] S, we can build this conditional process until time t on a finite probability space [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]: we can take [??] to be the finite set of all possible paths of the process starting at time 0 in state [x.sub.0] until time t, and [??] to be the power set of [??].

In the conditional space, for each time j = 0, ..., t, let [[??].sub.j] = [sigma]([X.sub.0], ..., [X.sub.j]), the [sigma]-field generated by [X.sub.0], ..., [X.sub.0]; so [[??].sub.0] = {0, [??]} and We write E instead of E in what follows to lighten the notation.

Consider the random variable Z = f ([X.sub.t]) : [??] [right arrow] R. Also, for j = 0, ..., t let [Z.sub.j] be given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where we have used the Markov property in the last equality.

Fix 1 [less than or equal to] j [less than or equal to] t; we want to upper bound [ran.sub.j] = ran([Z.sub.j] | [[??].sub.j-1]. Fix also [x.sub.1], ..., [x.sub.j-1] [member of] S, and for x [member of] S consider

g (x) = E[f([X.sub.t])[absolute value of [X.sub.j] = x] = E[f([X.sub.t-j])]|[X.sub.0] = x] = ([P.sup.t-j] f) (x).

Note that [Z.sub.j] ([??]) [member of] {g(x) : d(x, [x.sub.j-1]) [less than or equal to 1} for [??] such that [X.sub.j -1]([??]) = [x.sub.j -1]. It follows that, for such [??],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Let us prove part (i) of the theorem. As f is 1-Lipschitz,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

by assumption. We deduce that [ran.sub.j] ([??]) [less than or equal to] 2[[alpha].sub.t-j] for all [??] [member of] [??]. It follows that

[[??].sup.2.sub.t]([??]) [less than or equal to] 4 [t-1.summation over (r=0)][[alpha.sup.2.sub.t-r],

uniformly over [??] [member of] [??]. Part (i) of Theorem 4.1 now follows from Lemma 4.3.

To prove (ii), observe that the bound

[ran.sub.j](w) = ran([Z.sub.j][[??].sub.j-1])([omega]) [less than or equal to] 2[[alpha.sub.t-j]

still holds on the event [A.sub.t] = {[omega] : [X.sub.j]([omega]) [member of] [S.sup.0.sub.0] for j = 0, ..., t}.

The following special case of model satisfying the hypotheses of Theorem 4.1 is of particular interest and has received considerable attention in computer science literature; see for instance [2; 8; 12]. Suppose (4.1) is satisfied with [a.sub.i] = [a.sup.i], where 0 < [alpha] < 1 is a constant. In the language of [2] this corresponds to the following situation. Consider different copies ([X.sub.t]), ([X'.sub.t]) of the process with initial states x, x' respectively, that is [X.sub.0] = x and [X'.sub.0] = x' almost surely. Suppose that we can couple ([X.sub.t]), ([X'.sub.t]) so that, uniformly over all pairs of states x, x' [member of] S with d(x, x') = 1,

E[d([X.sub.1], [X'.sub.1])|[X.sub.0] = x, [X'.sub.0] = x'] [less than or equal to] [alpha],

for a constant 0 < [alpha] < 1. Thus, under the coupling, ([X.sub.t]), ([X'.sub.t]) will be getting closer and closer together on average as t gets larger, which implies strong mixing properties [2; 12]. Then, uniformly over x, x' [member of] S with d(x, x') = 1, [d.sub.W]([[delta].sub.x]P, [[delta].sub.x'], P) [less than or equal to] [alpha]. By 'path coupling' [2; 12]

E[d([X.sub.1], [X'.sub.1])|[X.sub.0] = x, [X'.sub.0] = x'] [less than or equal to] [alpha]d(x, x'),

and hence [d.sub.W]([[delta].sub.x]P, [[delta].sub.x']P) [less than or equal to] [alpha][d.sub.W]([[delta].sub.x, [[delta].sub.x') for all pairs x, x' [member of] S. By induction on t,

[d.sub.W]([[delta].sub.x][P.sup.t], [[delta].sub.x'][P.sup.t]) [less than or equal to] [[alpha].sup.t]d(x, x')

for all x, x' [member of] S and all t [member of] N. Then, in the same notation as earlier, we can upper bound

[[??].sup.2] [less than or equal to] 4 [t.summation over (r=1)] [[alpha].sup.2r] [less than or equal to 4[[alpha].sup.2][(1-[[alpha].sup.2].sup.-1]

for all t. Hence we obtain the following corollary.

Corollary 4.4 Suppose that there is a constant 0 < a < 1 such that

[d.sup.W]([[delta].sub.x]P, [[delta].sub.x']P) [less than or equal to] [alpha] (4.11)

for all x, x' [member of] S such that d(x, x') = 1. Then for all t > 0

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.12)

for all u > 0, all [x.sub.0] [member of] S, and for every 1-Lipschitz function on S.

Hence, if X has the equilibrium distribution [pi] then, for all u > 0 and every 1-Lipschitz function f,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.13)

The particular choice of [alpha] = 1 - [c.sub.1]/n for a constant [c.sub.1] > 0 corresponds to the 'optimal' mixing time O(n log n) for a Markov chain in a system with size measure n, and gives concentration of measure in equilibrium of the form

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.14)

where [c.sub.2] > 0 is a constant. This is the case, for example, for the subcritical ([beta] < 1) mean-field Ising model discussed in Section 3--see for example [21] or [15] for a description of the coupling that implies fast decay of the Wasserstein distance. The same also applies to the Glauber dynamics for colourings on bounded-degree graphs analysed in [7] (see also [8] and [27]). The application is straightforward when the number of colours I is greater than 2D, where D is the maximum degree of the graph. It is only a little more involved in the case (2 -[eta])D [less than or equal to] k [less than or equal to] 2D, where the proof in [7] relies on delayedpath-coupling [3], whereby a new Markov chain is used with one step corresponding to cn steps of the original one, n being the size of the graph to colour.

On the other hand [alpha] = 1 - 6/([n.sup.3] - n) for the Glauber dynamics on linear extensions of a partial order of size n [2; 12] gives an upper bound O([n.sup.3] log n) on mixing. The corresponding bound on deviations of a 1-Lipschitz function from its mean of size u is of the form [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], which is useless. However, one cannot do much better in general. To see this, consider the partial order on n points consisting of a chainof length n - 1 and a single incomparable element. It is not hard to check that in this case the mixing time is of the order [n.sup.3]--see [2] for details. It is also easy to see that there is no normal concentration of measure in the sense of (4.14).

We shall now apply Theorem 4.1 and Corollary 4.2 to the supermarket process described in Section 3.2, or rather to the corresponding discrete-time jump chain [[??].sub.t]. Recall that, when in state x, the next event is an arrival with probability [lambda]/ (1 + [lambda]), and is a potential departure with probability 1/ (1 + [lambda]). Given that the next event is an arrival, the queue to which the arrival will go is determined by selecting a uniformly random d-tuple of queues and sending the customer to a shortest one among those chosen, ties being split by always going to the first best queue in the list. Given that the next event is a potential departure, the departure queue is chosen uniformly at random among the n possible queues, and departures from empty queues are ignored. In the Markov chain graph, two states are connected by an edge if and only if they differ exactly in one customer in one queue. Then a function f is 1-Lipschitz if and only if it is 1-Lipschitz with respect to the [l.sub.1] distance on the state space S.

We focus on the case d [greater than or equal to] 2. For d = 1, in equilibrium the queue lengths are independent geometric random variables, so normal concentration of measure can be obtained using the standard bounded differences inequality [26].

By Lemma 2.3 in [18], for all x, y [member of] S such that d(x,y) = 1, and all t [greater than or equal to] 0,

[d.sub.W]([[delta].sub.x][P.sup.t], [[delta].sub.y][P.sup.t]) [less than or equal to] 1.

Let c be a positive constant, and let [S.sub.0] be given by

[S.sub.0] = {x [member of] S :[[parallel] x [parallel].sub.1] [less than or equal to] cn [[parallel] x [parallel].sub.[infinity]] c log n}.

It is very easy to modify the proof of Lemma 2.6 in [18] to show that, if x, y [member of] [S.sub.0] and d(x, y) = 1, then for some constants a, [beta] > 0,

[d.sub.W]([[delta].sub.x][P.sup.t], [[delta].sub.y][P.sup.t]) [less than or equal to] [e.sup.-[beta]t/n] + [2e.sup. -[beta]n] (4.15)

fort [greater than or equal to] [alpha]n log n.

Take a constant K > 2 and let [tau] = [n.sup.K]. Then we can put [[alpha].sub.i] = 1 for t [less than or equal to] [alpha]n log n, and [[alpha].sub.i] = [e.sup.-[beta]t/n] + 2[e.sup.-[beta]n] for [alpha]n log n < t [less than or equal to] [tau]. Then for t [less than or equal to] [tau], we can upper bound

[t.summation over (i=1)] [[alpha].sup.2.sub.i] [less than or equal to] min {t,[alpha]n log n + [n.sup.1-[beta]/[alpha]] [[beta].sup.-1] + 2[e.sup.-[beta]n/2]} [less than or equal to] min{t, 2[alpha]n log n]}.

Consider the all-empty state, 0 [member of] [S.sup.0.sub.0]. Then by choosing the constant c in the definition of [S.sub.0] sufficiently large, we can ensure that, for d [greater than or equal to] 2,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

This follows from Lemma 2.3 (monotone coupling for given n and d), Lemma 2.4 (a) and the monotone coupling for given n and different d, d' (see the proof of Lemma 2.4 in [18]) and equation (37) in [18]. (See also the statements of these results in Section 3.2.)

By Theorem 4.1 (i), we can choose c sufficiently large so that, for all t > 0, all u > 0, and every Lipschitz function f,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.16)

By Theorem 4.1 (ii), for [alpha]n log n [less than or equal to] t [less than or equal to] [tau], and all u > 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.17)

In particular, for [alpha]n log n [less than or equal to] t [less than or equal to] [tau], and u [less than or equal to] [c.sub.0] [square root of n] log n,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.18)

provided that c is large enough. Inequalities (4.16)--(4.18) improve on what one could obtain for the jump chain from Lemma 3.5 above, for an interesting range of u and t--and it is easy to use them to derive improved concentration of measure inequalities for the continuous chain also. (It is possible to optimise inequality (4.17) by playing with the definition of [S.sub.0] to obtain normal concentration for larger u.)

We now want to relate this to concentration of measure in equilibrium, via Corollary 4.2. It is easy to see from earlier work (see [18] and references therein) that the supermarket jump chain has a unique stationary measure. (This could also be proven showing that the hypotheses of Corollary 4.2 (i) are satisfied, via (4.15) above.)

By Lemma 2.1 in [18] and straightforward calculations for the Poisson process, there is a constant [eta] > 0 such that

[d.sup.W](L([[??].sub.t], 0), [??]) [less than or equal to] n[e.sup.-[eta]t/n] + 2cn[P.sub.[??]](M > [eta]n/n) + [2e.sup.-[eta]n], (4.19)

where M denotes the maximum queue length in equilibrium, and we may take c the same as in the definition of [S.sub.0], assuming that c is sufficiently large. Thus, by (4.19),

[d.sup.W](L([[??].sub.[tau]], 0), [??]) [less than or equal to] (n + 2cn + 2)[e.sup.-[eta]n],

Let [??] denote the queue lengths vector in equilibrium. It then follows by Corollary 4.2 (iii), uniformly for all 1-Lipschitz functions f, for u [greater than or equal to] 1 and n sufficiently large

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.20)

So, choosing c to be sufficiently large, for all u > 0 and n sufficiently large,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.21)

This improves on Lemma 3.6 above, and gives normal concentration for u = O([n.sup.1/2] [(log n).sup.3/2]) (again, it is possible to obtain normal concentration for larger u), but is not the optimal result we are after. In particular, we still cannot show that deviations of size [n.sup.l/2][omega](n) have probability tending to 0 for [omega](n) tending to infinity arbitrarily slowly. We will now derive another inequality that will enable us to achieve our aim.

Theorem 4.5 Assume that there exists a set [S.sub.0] and numbers [alpha.sub.i] (x,y) (x,y [member of] [S.sub.0], i [member of] N) such that, for all i, and all x, y [member of] [S.sub.0] with d(x, y) = 1,

[d.sub.W]([[delta].sub.x][P.sup.i][[delta].sub.y][P.sup.i]) [less than or equal to] [[alpha].sub.i(x,y). (4.22)

Let

[S.sup.0.sub.0] = {x [member of] [S.sub.0] : y [member of] [S.sub.0] whenever d(x, y) = 1}.

For x [member of] S, let [g.sub.x](y) = [d.sub.W]([[delta].sub.y][P.sup.i][[delta].sub.x][P.sup.i]. Assume that, for some sequence ([[alpha].sub.i] : i [member of] N) of positive constants,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.23)

Let t > 0, let v = [summation.sup.t.sub.i=1] [[alpha].sup.2.sub.i], and let

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.24)

Let also [A.sub.t] = {w [member of] [OMEGA] : [X.sub.s] ([omega]) [member of] [S.sup.0.sub.0] [for all]0 [less than or equal to] s [less than or equal to] t}

Then, for all u > 0, and uniformly over all 1-Lipschitz functions f,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.25)

To prove Theorem 4.5, we use another result from [26]. With notation as before, for j = 1, ..., t, let

[var.sub.j] = var([Z.sub.j]|[[??].sub.j-1]) = E ([([Z.sub.j] - E ([Z.sub.j]|[[??].sub.j-1])).sup.2]|[[??].sub.j-1];

let V = [summation.sup.t.sub.j=1] [var.sub.j]. Also, for j = 1, ..., t, let [dev.sub.j] = sup([absolute value of [Z.sub.j] - [Z.sub.j-1]] | [[??].sub.j-1]), and let dev = [sup.sub.j] [dev.sub.j]. The following result is essentially Theorem 3.15 in [26].

Lemma 4.6 Let Z be a random variable on a probability space ([??],[??],[??]) with E(Z) = m. Let {0, [??]} [[??].sub.0] [subset or equal to] [[??].sub.1] [subset or equal to] ... [subset or equal to] [[??].sub.t] be a filtration in [??]. Let [??] = max dev, the maximum conditional deviation (and assume that b is finite). Then for any u [greater than or equal to] 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [??] is the maximum sum of conditional variances (which is assumed to be finite).

More generally, for any u [greater than or equal to] 0 and any values b, v [greater than or equal to] 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof of Theorem 4.5. The proof is similar to the proof of Theorem 4.1. Let f : S [right arrow] R be 1-Lipschitz. Fix a time t [member of] N, an [x.sub.0] [member of] S and consider the evolution of [X.sub.t] conditional on [X.sub.0] = [x.sub.0] for t steps, that is until time t. Again this conditional process can be supported by a finite probability space [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

As before, in the conditional space, for each time j = 0, ..., t let [[??].sub.j] = [sigma]([X.sub.0], ..., [X.sub.j]), the [sigma]-field generated by [X.sub.0], ..., [X.sub.j]; so [[??].sub.0] = {0, [??]} and [[??].sub.t = [??]. Again, we consider the random variable Z = f([X.sub.t]) : [??] [right arrow] R. And, for j = 0, ..., t, [Z.sub.j] is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Suppose first for simplicity that [S.sub.0] = S. We want to apply Lemma 4.6 and for this we need to calculate the conditional variances [var.sub.j]. To do this, we use the fact that the variance of a random variable Y is equal to 1/2 E[(Y - [??]).sup.2], where Y is another random variable with the same distribution as Y and independent of Y.

Fix j and [x.sub.1], ..., [x.sub.j-1] [member of] S, and for x [member of] S consider

g (x) = E[f([X.sub.t][absolute value of [X.sub.j = x] = E[f([X.sub.t-j]]) [X.sub.0 = x] = ([P.sup.t-j]f)(x).

Then, for [??] such that [X.sub.j-1]([??]) = [x.sub.j-1], [Z.sub.j]([??]) [member of] {g(x) : d(x, [x.sub.j-1]) [less than or equal to] 1}, so that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

by assumption (4.23).

Then we can upper bound the sum

[??] [less than or equal to] 2 [t.summation over (j=1)] [[alpha].sup.2.sub.j].

It remains to bound dev = [sup.sub.j] [dev.sub.j]. We have, for [??] such that [X.sub.j-1]([??]) = [x.sub.j-1],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

It follows that, for each j = 1, ..., t,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

by (4.24) and using the coupling characterisation of the Wasserstein distance. Theorem 4.5 now follows from the first statement in Lemma 4.6 in the case where [S.sub.0] = S. In general, the above bounds on [??] and dev hold on the event [A.sub.t] = {[omega] : [X.sub.j]([omega]) [member of] [S.sup.0.sub.0] for j = 0, ..., t}, and so Theorem 4.5 also follows from the second statement of Lemma 4.6.

Let us now apply Theorem 4.5 to the supermarket model from [18] discussed above. Again, we focus on the case d [greater than or equal to] 2.

Let c be a positive constant, and let [S.sub.0] be given by

{x [member of] S : l(k,x) = [n.summation over (r=1)] [1.sub.x(r)[greater than or equal to]k] [less than or equal to] n[e.sup.-k/c] for k = 1, ...}.

Consider the all-empty state, 0 [member of] [S.sup.0.sub.0]. Let K > 2 be a constant. We claim that we can choose c sufficiently large that, if [tau] = [n.sup.K], then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

This follows easily from Lemma 3.8 in the present paper, together with equation (3.13).

We now want to calculate the quantity in (4.23). For a state [x.sub.0] [member of] [S.sup.0.sub.0] and a state x chosen with probability P([x.sub.0], x), these states will only differ in a queue of length greater than k if P([[x.sub.0], x) is a probability of an event involving a queue of length at least k - a departure from a queue of length at least 1 or an arrival into a queue of length at least k. For [x.sub.0] [member of] [S.sup.0.sub.0] such a transition happens with probability at most c[e.sup.-k/c] (choosing c large enough again).

The proof of Lemma 2.6 in [18] shows that, if x, y [member of] [S.sub.0] are adjacent and differ in a queue of length k, then for some constants [alpha], [beta] > 0 we can upper bound

[d.sub.W]([[delta].sub.x][P.sup.t], [[delta].sub.y][P.sup.t]) [less than or equal to] [e.sup.-[beta]t/n] + 2[e.sup. -[beta]n]

for t [greater than or equal to] [alpha]kn. Also, by Lemma 2.3 in [18],

[d.sub.W]([[delta].sub.x][P.sup.t], [[delta].sub.y][P.sup.t]) [less than or equal to] 1

for all t and hence for t < [alpha]kn.

Combining the above observations and choosing [alpha] > 1 large enough, we find that for t [greater than or equal to] [[alpha].sup.2]n

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Hence, by choosing c large enough, we can upper bound

[tau.summation over (i=1)] [[alpha].sup.2.sub.i] [less than or equal to] cn

Further, once again using Lemma 2.3 in [18], we can upper bound [??] [less than or equal to] 2.

By Theorem 4.5, there is a constant c > 0 such that, uniformly for all 1-Lipschitz functions f, all t [less than or equal to] [tau], and all u > 0,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.26)

In particular, we can choose c large enough so that, for u [less than or equal to] [c.sub.0][square root or n] log n,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.27)

Now, as before, by (4.19),

[d.sub.W]([[delta].sub.0][P.sup.[tau]], [??]) [less than or equal to] (n + 2cn + 2)[e.sup.-[eta]n]

provided c is large enough. It follows that for n large enough, uniformly for all 1-Lipschitz functions f, and all u [greater than or equal to] 1,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.28)

It follows that, for 0 < u [less than or equal to] [c.sub.0][n.sup.1/2] log n, we obtain

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.29)

provided that the constant c is chosen sufficiently large. Choosing u = [square root of n][omega](n), where [omega](n) is a function tending to infinity with n arbitrarily slowly, we obtain

[P.sub.[??]]([absolutely value of f([??]) - [E.sub.[??]][f([??])]] [greater than or equal to u) = o(1)

as n [right arrow] [infinity].

Inequalities (4.26) and (4.28) could be optimised (by optimising the choice of set [S.sub.0]) to obtain normal concentration for larger u.

For a positive integer k, let l(k, [??]) be the number of queues of length at least k in the stationary jump chain, and let [??](k) be its expectation. Then for any positive integer s, and any u > 0, we can write

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Note that the maximum value that [[absolute value of l(k,[??]) - [??](k)].sup.s] can take is [n.sup.s]. Then, taking u = [n.sup.l/2], and applying inequality (4.28), we obtain

[E.sub.[??]] [[[absolute value of l(k,[??]) - [??](k)].sup.s]] [less than or equal to] c[n.sup.s/2].

assuming the constant c is chosen big enough. Hence, arguing as in Section 4 of [18], it is easy to show that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

And hence, arguing as in Section 5 of [18], we obtain that, for some constant co,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (4.30)

which improves on equation (27) in [18], implying that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

5 Conclusions

We have derived concentration inequalities for Lipschitz functions of a Markov chain long-term and in equilibrium, depending on contractivity properties of the chain in question. Our results apply to many natural Markov chains in computer science and statistical mechanics.

One open problem is to show that, in a discrete-time Markov chain with 'local' transitions, under suitable conditions, rapid mixing occurs essentially if and only if there is normal concentration of measure long-term and in equilibrium (with non-trivial bounds). Another open question is to explore how these properties relate to the cut-off phenomenon. Is it the case that, again under suitable assumptions, they are necessary and sufficient conditions for a cut-off to occur?

6 Acknowledgement

References

[1] M. Aizenman and R. Holley (1987). Rapid convergence to equilibrium of stochastic Ising models in the Dobrushin- Shlosman regime. In Percolation Theory and Ergodic Theory of Infinite Particle Systems (H. Kesten, ed.). IMA Vol. Math. Applic. Springer-Verlag, Berlin.

[2] R. Bubley and M. Dyer (1997). Path coupling: a technique for proving rapid mixing in Markov chains. Proc. 38th Ann. Symp. Found. Comp. Sci. 223-231.

[3] A. Czumaj, P. Kanarek, M. Kutylowski and K. Lorys (1999). Delayed path coupling and generating random permutations via distributed stochastic processes. In 10 Ann. Symp. Disc. Alg, ACM-SIAM, 355-363.

[4] J. Ding, E. Lubetzky and Y. Peres. (2008) The mixing time evolution of Glauber dynamics for the mean-field Ising model. Preprint.

[5] R.M. Dudley (1989). Real Analysis and Probability. Wadsworth & Brooks/Cole, Pacific Grove, CA.

[6] M. Dyer, L.A. Goldberg, C. Greenhill, M. Jerrum and M. Mitzenmacher (2000). An extension of path coupling and its application to the Glauber dynamics for graph colourings. In 11 Ann. Symp. Disc. Alg, ACM-SIAM, 616-624.

[7] M. Dyer, C. Greenhill and M. Molloy (2002). Very rapid mixing of the Glauber dynamics for proper colourings on bounded-degree graphs. Rand. Struct Alg 20 98-114.

[8] M. Dyer, A. Goldberg, C. Greenhill, M. Jerrum, and M. Mitzenmacher (2001). An extension of path coupling and its application to the Glauber dynamics for graph colourings. SIAM Journal in Computing 30 1962-1975.

[9] M. Fairthorne (2009+). PhD Thesis, London School of Economics.

[10] C. Graham (2000). Chaoticity on path space for a queueing network with selection of the shortest queue among several. J. Appl. Probab. 37 198-210.

[11] C. Graham (2004). Functional central limit theorems for a large network in which customers join the shortest of several queues. Probab. Theor. Relat. Fields 131 97-120.

[12] M. Jerrum (1998). Mathematical foundations of MCMC. In Probabilistic Methods for Algorithmic Discrete Mathematics. (M. Habib, C. McDiarmid, J. Ramirez and B. Reed, eds.) 116-165. Springer --Verlag, Berlin.

[13] M. Kac (1956). Foundation of kinetic theory. Proc. 3rd Berkeley Symp. on Math. Stat. and Probab. 3 171-197. Univ. of Calif. Press.

[14] M. Ledoux (2001). The Concentration of measure phenomenon. AMS.

[15] D. Levin, M.J. Luczak and Y Peres. Glauber dynamics for the mean field Ising model: cut-off, critical power law, and metastability. To appear in PTRF.

[16] M.J. Luczak and K. Marton (2008). Transportation cost inequalities for stationary distributions of Markov kernels. Preprint.

[17] M.J. Luczak and C. McDiarmid (2005). On the power of two choices: balls and bins in continuous time. Ann. Appl. Probab. 15 1733-1764.

[18] M.J. Luczak and C. McDiarmid (2006). On the maximum queue length in the supermarket model. Ann. Probab. 34 493-527.

[19] M.J. Luczak and C. McDiarmid (2007). Asymptotic distributions and chaos for the supermarket model. Elect. J. Probab. 12 75-99.

[20] M.J. Luczak and J.R. Norris (2005). Strong approximation for the supermarket model. Ann. Appl. Probab. 15 2038-2061.

[21] D. Levin, Y. Peres and E. Wilmer (2009+). Markov chains and mixing times. In preparation.

[22] J.B. Martin and Y.M. Suhov (1999). Fast Jackson networks. Ann. Appl. Probab. 9 854-870.

[23] M. Mitzenmacher (1996). The power of two choices in randomized load-balancing. PhD. Thesis, Berkeley.

[24] M. Mitzenmacher, B. Prabhakar, and D. Shah (2002). Load-balancing with memory. Proc. 43rd Ann. IEEE Symp. Found. Comp. Sci.

[25] M. Mitzenmacher, A. Richa, and R. Sitaraman (2001). The power of two random choices: a survey of techniques and results. In Handbook of Randomized Computing (S. Rajasekaran, P.M. Pardalos, J.H. Reif and J.D.P. Rolim, eds.) 1255-312. Kluwer, Dordrecht.

[26] C. McDiarmid (1998). Concentration. In Probabilistic Methods forAlgorithmic Discrete Mathematics. (M. Habib, C. McDiarmid, J. Ramirez and B. Reed, eds.) 195 - 248. Springer--Verlag, Berlin.

[27] M.Molloy (2001). Very rapidly mixing Markov chains for 2[DELTA]-colourings and for independent sets in a 4-regular graph. Rand. Struct. Alg 18 101-115.

[28] S.T. Rachev (1991). Probability Metrics and the Stability of Stochastic Models. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons, Chichester.

[29] A. Sznitman (1989). Topics in propagation of chaos. In Ecole d'Ete de Probabilites de Saint-Flour XIX -1989 165 - 251. Springer-Verlag, Berlin.

[30] N.D. Vvedenskaya, R.L. Dobrushin, and F.I. Karpelevich (1996). Queueing system with selection of the shortest of two queues: an asymptotic approach. Problems Inform. Transmission 32 15-27.

[31] D. Weitz (2004). Mixing in time and space for discrete spin systems. PhD Thesis, Berkeley.

Malwina J. Luczak

Department of Mathematics, London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom

m.j.luczak@lse.ac.uk