# PROBABILITY, MINIMAX APPROXIMATION, AND NASH-EQUILIBRIUM. ESTIMATING THE PARAMETER OF A BIASED COIN.

1. Introduction. The following problem is well known in the area of probability: a biased coin is given, but the probability of heads is unknown. We flip it n times and get k heads. The problem is to estimate the probability of heads. The most typical approach to solve this problem is the maximum likelihood method; see, e.g., [13, 21]. Letp = P(heads). Then

[mathematical expression not reproducible].

Since we have absolutely no information about p, we choose an estimator p [member of] [0,1] for which this expression is maximal, that is, p = k/n. This approach has several shortcomings:

1) For small n we get unrealistic estimations. For example, if n = 1 and we get a head, then the method gives the estimation p = 1, and if we get a tail, then the method gives p = 0.

2) the method yields the most likely value of p but does not take into account the error in the estimation. This can be seen in the following example: suppose we flip the coin n = 4 times and get k = 2 heads; of course, p = 0.5. However, while in the literature a coin is often considered "reasonably fair" when 0.45 [less than or equal to] p [less than or equal to] 0.55, in this case it is easy to verify that P ((|p - 0.5| > 0.05) | (n = 4, k = 2)) = 0.8137..., assuming p is chosen following a uniform distribution, and thus, the probability of dealing with a biased coin is high despite the maximum likelihood estimator.

Another approach for estimating the probability p of a biased coin deals with Bayesian or minimax estimation. While in the former the goal is to minimize the average risk, in the latter the aim is to minimize the maximum risk. Nevertheless, there are other well-known methods such as the so-called Uniform Minimum Variance Unbiased (UMVU) Estimator; see [13]. These methods require the use of a loss function, which measures the difference between the parameter and its estimator. In the current paper, the loss function |p - [a.sub.i] | will be considered, where ai is the estimation of p if i heads are obtained after n tosses, for i = 0,..., n. Then, the expected value of this loss function, commonly called risk or penalty function, is given by

(1.1) [mathematical expression not reproducible],

and our goal will be to choose the [a.sub.i]'s that minimize the sup-norm of D. Figure 1.1 below, corresponding to the case of n = 5 tosses, illustrates the motivation for our analysis. It displays the penalty functions for the maximum likelihood choice (i.e., [a.sub.k] = k/n, k = 0,..., n) and for the choice of the [a.sub.i]'s that we will prove to be optimal; see Section 2. Indeed, the sup-norm of the penalty function D for the optimal choice is clearly smaller than the one corresponding to the maximum likelihood approach. Moreover, Figure 1.1 shows that the optimal choice satisfies what we call hereafter the "equimax" property. This is very similar to the characterization of the set of interpolation nodes to reach the optimal Lebesgue function, as conjectured by Bernstein and Erdos and proved forty years later by Kilgore [12] and de Boor-Pinkus [3]. This similarity served as an important motivation to apply approximation theory techniques for investigating this problem, and it will be discussed in more detail in Section 2.

In the statistics literature, one often prefers the use of squares instead of absolute values in (1.1), that is, the minimization of

(1.2) [mathematical expression not reproducible],

is considered because of its analytical tractability and easier computations. Indeed, for the penalty function (1.2), the optimal (minimax) strategy {[a.sub.0],..., [a.sub.n]} is explicitly computed; see [13]:

(1.3) [mathematical expression not reproducible].

Of course, this is the optimal strategy when measuring the loss using the least-squares norm but not in our "uniform" setting. In Figure 1.2 below, we augment Figure 1.1 with the plot of the penalty function D for the strategy (1.3). Figure 1.2 shows that the behavior of the Squared Error Minimax Estimator (hereafter, SEME) is similar to the optimal choice, or even a bit better, towards the center of the interval, but it is clearly worse close to the endpoints of the interval. Thus, for n = 5, the sup-norm of the penalty function D for the SEME is 0.1545, while for our Absolute Error Minimax Estimator (AEME) it is 0.131. More generally, as mentioned above, along with the minimax estimators, the so-called Bayes estimators are also often employed; see [13, Ch. 4]. In this setting, given a loss function R([theta], [delta]), some "prior" distribution [LAMBDA] for the parameter [theta] to be determined (in our case [theta] = p) is selected, and the estimator [[delta].sub.[LAMBDA]] is chosen in order to minimize the weighted average risk

(1.4) r([lambda], [delta])= [Florin] R([theta], [delta]) d[LAMBDA]([theta]),

also called Bayes risk. As established in [13, Theorem 5.1.4] and some corollaries, in certain situations a Bayes estimator is also a minimax estimator. This connection will be used below (Section 4) when discussing the interpretation of our method within the framework of game theory. Thus, roughly speaking, we can say that in this paper we are dealing with a "multi-faceted" topic, which we investigate from the points of view of point estimation theory, approximation theory, and game theory.

On the other hand, the problem of estimating the probability of a coin (more generally, the parameter of a Bernoulli distribution) from a few tosses has been often considered as a toy-model for randomization processes and for verifying the effectiveness of different methods in statistical inference. In this sense, it is noteworthy that some recent papers have dealt with the problem of simulating a coin with a certain prescribed probability f(p) by using a coin whose probability is actually p; see [9, 16]. Actually, this idea comes from a seminal paper by von Neumann [19]. In [9, 16] the authors also show, just as we do in the current paper, the utility of techniques from approximation theory for solving problems from probability.

The paper is structured as follows. In Section 2, our minimax estimation is thoroughly studied, and the optimal choice is established by Theorem 2.2, which represents the main result of this paper. Some computational results are included. The asymptotic distribution of the set of nodes corresponding to such optimal strategies, when the number of tosses approaches infinity, is established in Section 3. In Section 4 we discuss the problem from the game theory standpoint, as a non-cooperative two-player game, which is described in detail. Also, the existence of a Nash-equilibrium is established. Furthermore, in Section 5, this Nash-equilibrium is explicitly solved for the case of n = 2 tosses. In both Sections 4 and 5, the solution of this Nash-equilibrium problem is related to the connection between minimax and Bayes estimators. Finally, the last section contains some further remarks and conclusions.

2. The minimax estimation. As it was discussed above, the optimal strategy we consider is to choose [a.sub.0],...,[a.sub.n] [member of] [0,1] in order to minimize the sup-norm of the penalty function D([a.sub.0],..., [a.sub.n];p). Thus, our minimax problem has a striking resemblance with the well-known problem of the optimization of the Lebesgue function in polynomial interpolation. Indeed, let us consider the polynomial interpolation of a function f over a set of nodes [x.sub.0],..., [x.sub.n] [member of] [0,1]. By the classical Lagrange formula, we know the expression for such an interpolating polynomial:

[mathematical expression not reproducible],

where [l.sub.k]([x.sub.0],..., [x.sub.n]; x), k = 0,..., n, are the well-known Lagrange interpolation polynomials, and they form a basis for [P.sub.n], the space of polynomials of degree less than or equal to n. Since the norm of the projection operator from C[0,1], the space of all continuous functions on [0,1], onto [P.sub.n] is given by the sup-norm of the Lebesgue function

(2.1) [mathematical expression not reproducible],

the problem of finding optimal choices of nodes [x.sub.0],..., [x.sub.n] [member of] [0,1] minimizing the sup-norm of (2.1) arises in a natural way. It is well known that if the endpoints of the interval belong to the set of nodes, then the solution is unique. As for the characterization of the solution, the famous Bernstein-Erdos conjecture asserted that for an optimal choice, the corresponding Lebesgue function (2.1) must exhibit the following "equimax" property: if the absolute maximum of [LAMBDA]([x.sub.0],...,[x.sub.n];x) on each subinterval [[x.sub.i-1],[x.sub.i]] is denoted by [[lambda].sub.i] = [[lambda].sub.i]([x.sub.0],..., [x.sub.n]), i = 1,..., n, then we have

[[lambda].sub.1] =... = [[lambda].sub.n].

This conjecture was finally proved by Kilgore [12] (see also [3]). An algorithm to compute the corresponding optimal nodes is given in [14, pp. 68-73].

Now, the following result shows that the above mentioned resemblance between both mini-max problems can be extended to the characterization of optimal solutions. Let f(p) := D([a.sub.0],..., [a.sub.n];p) be an optimal penalty function in the sense of minimizing the sup-norm of (1.1). Then, the following result, which gathers some necessary conditions to be satisfied for an optimal choice {[a.sub.0],..., [a.sub.n]}, will be useful. It will be stated without assuming that the points are "well-ordered", i.e., that [a.sub.0] < [a.sub.1] <... < [a.sub.n]. Although this fact may seem obvious, it does require a proof; see Theorem 2.2 below.

LEMMA 2.1. Let

M(f) := {x [member of] [0,1] : f(x) = [||f||.sub.[infinity]]}

be the set of absolute maxima of an optimal penalty function f. Then

(i) M(f) [intersection] {[a.sub.0],..., [a.sub.n]} = [??],

(ii) M(f) [intersection] [0, min{[a.sub.i]}) = [??], M(f) [intersection] (max{[a.sub.i]}, 1] = [??],

(iii) [a.sub.0] [less than or equal to] 1/2 [less than or equal to] [a.sub.n] and M(f) [intersection] ([a.sub.0], [a.sub.n]) = [??].

Proof. The proof of part (i) easily follows from the fact that, from (1.1), the derivative f'(p) has a positive jump as p passes through [a.sub.i], and thus, f(p) cannot be increasing/decreasing as we pass through [a.sub.i].

As for part (ii), suppose that M(f) [intersection] [0, min{[a.sub.i]}) = [??]. Then,

[mathematical expression not reproducible].

Since by (i), min{[a.sub.i]} [??] M(f), we have that there is a [delta] > 0 such that

[mathematical expression not reproducible].

But then, ||D([a.sub.0],[a.sub.1],..., min{[a.sub.i]} + [delta],..., [a.sub.n];p)[||.sub.[infinity]] < [||f||.sub.[infinity]], which contradicts the optimality of f(p). The argument that M(f) [intersection] [max{[a.sub.i]}, 1) [not equal to] [??] is similar.

To prove (iii) we first notice that [a.sub.0] [less than or equal to] 1/2 [less than or equal to] [a.sub.n]. Indeed, it is easily seen that ||D(1/2,..., 1/2;p)[||.sub.[infinity]] [less than or equal to] 1/2, which implies that [||f||.sub.[infinity]] [less than or equal to] 1/2. Therefore, we obtain that [a.sub.0] = f(0) [less than or equal to] [||f||.sub.[infinity]] [less than or equal to] 1/2. Similarly, 1 - [a.sub.n] = f(1) [less than or equal to] 1/2, and thus a0 [less than or equal to] 1/2 [less than or equal to] [a.sub.n]. We note that, as a consequence, we also have that min{[a.sub.i]} [less than or equal to] [a.sub.0] [less than or equal to] 1/2 [less than or equal to] [a.sub.n] < max{[a.sub.i]}.

Suppose now that M(f) [intersection] [[a.sub.0], [a.sub.n]] = [??]. Since [a.sub.i] [??] M(f), there is [delta] > 0 such that [[a.sub.0] - [delta], [a.sub.n] + [delta]] [intersection] M(f) = [??]. Using a similar argument as in the proof of (ii), we have that for [epsilon] > 0 small enough and 0 [less than or equal to] p [less than or equal to] [a.sub.0]-[delta]< 1/2,

[mathematical expression not reproducible]

We can also prove the same inequality for [a.sub.n] + [delta] < p [less than or equal to] 1 from which we conclude that

||D([a.sub.0] - [intersection], [a.sub.1],..., [a.sub.n-1], [a.sub.n] + [??];p)[||.sub.[infinity]] < ||D([a.sub.0],... , [a.sub.n];p)[||.sub.[infinity]] = [||f||.sub.[infinity]],

which is a contradiction to the optimality of f.

The following theorem establishes the equimax property for our minimax estimation and represents one of the two main results of this paper. The other main result is Theorem 3.4. In addition, the "well-ordering" of such optimal choice is also proved.

THEOREM 2.2. Suppose that

f(p) := D([a.sub.0],..., [a.sub.n];p) = D(T;p),

is an optimal penalty function. Then the node set T satisfies [a.sub.0] < [a.sub.1] < * * * < [a.sub.n], and the equimax property holds, that is, M(f) [intersection] ([a.sub.i], [a.sub.i+1]) [not equal to] [??], i = 0,..., n - 1.

Proof. First, we prove that for an optimal penalty function D(T;p), the node set T is well-ordered, i.e., [a.sub.i] [less than or equal to] [a.sub.i]+1 for all i = 0,..., n - 1. Indeed, suppose it is not. Then there is an index i < n such that [a.sub.i] > [a.sub.i+1]. We will perturb the node set T to obtain a penalty function with smaller norm. Select [??] > 0 small enough so that ||f|| > f(p) for p [member of] ([a.sub.i] - [psi], [a.sub.i] + [psi]) [union] ([a.sub.i+1] - [delta], [a.sub.i+1] + [delta]), where

(2.2) [psi] := [??] ([a.sub.i] + [a.sub.i+1])/(2 - [a.sub.i] - [a.sub.i+1]), [delta] := [??] (i + 1)/(n - i),

and max([psi], [delta]) < ([a.sub.i] - [a.sub.i+1])/2. Denote by [T.sub.[??]] the node set obtained by perturbing the nodes [a.sub.i] and [a.sub.i+1] to [a.sub.i] - [psi] and [a.sub.i+1] + [delta], respectively. Then, from (1.1) we have

[mathematical expression not reproducible],

where

[mathematical expression not reproducible]

and using the fact that x/(1 - x) is an increasing function, it is easy to see that g(p) < 0 for all p [member of] [0,1] \ {([a.sub.i] - [psi], [a.sub.i]) [union] ([a.sub.i+1],[a.sub.i+1] + [delta])}. Additionally, as f(p) < ||f|| on ([a.sub.i] -[psi], [a.sub.i] + [psi]) [union] ([a.sub.i]+1 -[delta], [a.sub.i+1] + [delta]), by selecting [??] > 0 smaller if needed, we can guarantee that ||D([T.sub.[??]];p)|| < ||f||, which is a contradiction to the optimality of f. This implies that for optimal penalty functions the node set T is well-ordered, i.e., [a.sub.0] < [a.sub.1] < *** < [a.sub.n].

Next, we prove the equimax property. Denote the global maxima on the consecutive subintervals by

[mathematical expression not reproducible].

We want to show that [[mu].sub.0] = * * * = [[mu].sub.n-1] = [||f||.sub.[infinity]] (we already know from Lemma 2.1 that [[mu].sub.-1] =[[mu].sub.n] = [||f||.sub.[infinity]]).

By contradiction, assume that for some i [member of] {0,..., n - 1} we have [[mu].sub.i] < [||f||.sub.[infinity]]. Then, as in the first part of the proof, we will construct a perturbation [T.sub.[??]] of the initial set of nodes, for which the corresponding penalty function has a smaller norm.

Fix [epsilon] > 0 small enough so that [[mu].sub.i]([T.sub.e]) < [||f||.sub.[infinity]], where

[T.sub.[??]] := {[a.sub.0],...,[a.sub.i] - [psi], [a.sub.i+1] + [delta],..., [a.sub.n]},

with [delta] and [psi] given in (2.2) (notice that now we are enlarging the original interval, while above it was shortened). We show that [[mu].sub.k]([T.sub.[??]]) < [[mu].sub.k](T) for all k, from which we will obtain || D ([T.sub.[??]]; p || < [||f||.sub.[infinity]], which is a contradiction to the optimality of f.

Indeed, let q [member of] [[a.sub.k], [a.sub.k+1]] be such that D([T.sub.[??]]; q) = [[mu].sub.k]([T.sub.[??]]). Then

[mathematical expression not reproducible],

where

[mathematical expression not reproducible]

and, thus, the fact that x/(1 - x) is an increasing function implies that [[mu].sub.k]([T.sub.[??]]) < [[mu].sub.k] < [||f||.sub.[infinity]] for all k = i. Since the choice of e implies that [[mu].sub.i]([T.sub.[??]]) < [||f||.sub.[infinity]], we obtain the desired contradiction.

EXAMPLE 2.3 (The optimal penalty function for some values of n). For illustration, in Figures 2.1-2.4 we present the computational results we obtained for the cases of n = 3,4,6, and 7 tosses (the plot for n = 5 was already given in Figure 1.1). They all confirm the conclusions of Theorem 2.2. In all of these figures, we plot the optimal choice for the penalty function D([a.sub.0],..., [a.sub.n];p), that is, AEME (red), the maximum likelihood function (green), and the maximum value of the optimal penalty function (black).

REMARK 2.4. When using the penalty function D corresponding to the squared error loss function (see (1.2)), it is shown in [13, Ch. 5] that the related minimax estimation (SEME) given by (1.3) has a constant risk function as illustrated in Figure 2.5, where the risk function for SEME (blue) is compared with those for AEME (red) and for the Maximum Likelihood Estimator (hereafter, MLE).

Since a constant function obviously satisfies the equimax property, this supports our conjecture that this equimax characterization should hold for minimax estimators corresponding to any convex loss function.

3. The asymptotic behavior of the minimax estimations. In this section we consider the limiting distribution of the minimax estimations as the number of tosses n approaches infinity.

DEFINITION 3.1. For every positive integer n let [A.sub.n] := {[a.sub.0n], [a.sub.1n],..., [a.sub.nn]} denote a minimax estimations set or simply a minimax node set, namely

[mathematical expression not reproducible].

Observe that, by Theorem 2.2, the node set [A.sub.n] is ordered. Moreover, if [B.sub.n] := [{k/n}.sup.n.sub.k=0] denotes the uniform node set corresponding to the maximum likelihood estimation, then

(3.1) [mathematical expression not reproducible].

Indeed, the O(1/[square root of (n)]) estimate follows as an application of Jensen's inequality to the convex function f(x) = [x.sup.2]

[mathematical expression not reproducible]

where we used the fact that the mean of the binomial distribution is [mu] = np and its variance is [[sigma].sup.2] = np(1 - p).

We will prove that the ordering of [A.sub.n] established in Theorem 2.2 and equation (3.1) yields that the limiting distribution is uniform; see Theorem 3.4 below. For this purpose let us remind the reader the definition of weak* convergence of a sequence of measures.

DEFINITION 3.2. Let {[[mu].sub.n]} be a sequence of measures supported on [0,1]. We say that it converges weakly (or weak*) to a measure [mu] if

[mathematical expression not reproducible],

where C[0,1] denotes all continuous functions on [0,1], or equivalently,

[mathematical expression not reproducible] for all [a,b] [subset] [0,1].

We denote this as

[[mu].sub.n] [??] [mu], as n [right arrow] [infinity].

DEFINITION 3.3. Consider a finite set [K.sub.n] := {[[alpha].sub.0n], [[alpha].sub.1n],... , [[alpha].sub.nn]}. We call the measure

[mathematical expression not reproducible]

a normalized counting measure of [K.sub.n]. Here [[delta].sub.x] denotes the Dirac-delta measure at the point x.

THEOREM 3.4. Suppose that the node sets [K.sub.n] := {[[alpha].sub.0n], [[alpha].sub.1n],... , [[alpha].sub.nn]} [subset] [0,1], n = 1,..., [infinity], are well-ordered and that

(3.2) [mathematical expression not reproducible].

Then the asymptotic distribution of [K.sub.n], as n tends to infinity, is uniform, namely

(3.3) [mathematical expression not reproducible],

where dx denotes the Lebesgue measure on the interval [0,1].

Proof. To prove (3.3) it is sufficient to establish that for all 0 < p < 1 we have

[mathematical expression not reproducible].

We shall prove the third inequality, the first being similar and the second being obvious. Suppose that there is a 0 < p < 1 for which it fails. Then there is an [??] > 0 such that

[mathematical expression not reproducible].

This implies that there is a subsequence {[n.sub.i]} and a number M such that

(3.4) [mathematical expression not reproducible] for all i [greater than or equal to] M.

For every i, denote [mathematical expression not reproducible]. Then the ordering of [mathematical expression not reproducible] implies that [mathematical expression not reproducible] for all l [less than or equal to] [k.sub.i], and hence,

[mathematical expression not reproducible]

where in the last inequality we apply [10, Theorem 1] using the fact that (3.4) yields that [k.sub.i] - q[n.sub.i] > q. This contradicts (3.2), which proves the theorem.

Since the minimax node sets [A.sub.n] are ordered, (3.1) allows us to establish the following corollary:

COROLLARY 3.5. The limiting distribution of the minimax node sets [A.sub.n] is the uniform distribution dx on [0,1].

REMARK 3.6. Observe that Theorem 3.4 establishes that the limiting distribution is the uniform one not just for the sequence of optimal choices, but for every sequence of "acceptable" strategies in the sense that they are well ordered and the sup-norm of their corresponding penalty functions approaches zero as n [right arrow] [infinity]. Thus, we see that all these acceptable estimators are asymptotically unbiased in the sense that their limit distribution as n [right arrow] [infinity] is the uniform distribution-the same as for the maximum likelihood estimators, which are the unique unbiased estimators for the parameter p.

It is also remarkable that this conclusion also holds for the sequence of Squared Error Minimax Estimators given by (1.3), which is easy to verify.

4. A problem in game theory. Now we are going to consider our estimation problem from the viewpoint of game theory; see, e.g., [1, 7, 20]. In particular, a non-cooperative, two-player, zero-sum, and mixed-strategy game will be posed, as we explain below.

Indeed, we are dealing with a simple two-player game, where Player I selects a probability p [member of] [0,1] and creates a coin such that P(heads) = p. He tosses the coin n times and provides the number i [member of] {0,1,..., n} of heads observed to Player II. Then, based on this value, Player II makes a guess [a.sub.i] [member of] [0,1] for the value of p and he will pay a loss of |p - [a.sub.i]| to Player I. Obviously, the goal of Player I is to maximize this loss, while Player II wants to minimize it.

More generally, let us assume that both players are allowed to follow what is commonly referred to as a "mixed strategy" in the game theory framework. That is, the choices of the players are not deterministic ("pure strategy"), but the available actions are selected according to certain probability distributions. Thus, let [OMEGA] denote the set of all probability distributions on the interval [0,1]. Suppose that when making their decisions, Player I is allowed to choose [mu] [member of] [OMEGA], and he picks p to follow the distribution d[mu], and Player II picks [x.sub.i] to follow his choice of d[[sigma].sub.i] distributions, where [[sigma].sub.i] [member of] [OMEGA], i = 0,1,..., n. Therefore, the expected penalty of Player II is

E([[sigma].sub.0],... , [[sigma].sub.n]; [mu]) := [Florin] *** [Florin] D([x.sub.0],...,[x.sub.n];p)d[[sigma].sub.0]([x.sub.0])...d[[sigma].sub.n]([x.sub.n])d[mu](p),

where D([x.sub.0],..., [x.sub.n];p) is the penalty function given in (1.1). In these terms, the goal of Player I will be to find

[mathematical expression not reproducible],

while the second player will try to get

[mathematical expression not reproducible].

Using again the terminology from game theory, this is a "zero-sum" game (that is, the total gains of players minus the total losses add up to zero). Now, we are looking for the so-called mixed-strategy Nash-equilibrium. The basic notion of equilibrium in game theory finds its roots in the work by Cournot (1838) but was formalized in the celebrated papers by Nash [17, 18], where the (now called) Nash-equilibrium was established for finite games by using Kakutani's fixed point theorem [11]. The problem for continuous games, where the players may choose their strategies in continuous sets (as in the present case), is more involved.

The following result, establishing the mixed-strategy Nash-equilibrium for our problem, is a direct application of Glicksberg's theorem [8], which makes use of an extension of Kakutani's theorem to convex linear topological spaces. An alternate method of proof consists in taking finer and finer discrete approximations of our continuous game, for which the existence of a Nash-equilibrium was established, and then using the continuity of (1.1) and standard arguments of weak convergence. For more information about the successive extensions of the Nash-equilibrium problem, see for example the monographs [7, 20]. There are also many papers about such extensions from the point of view of the applications to business; see, e.g., [15, 5], to only cite a few.

THEOREM 4.1.

(4.1) [mathematical expression not reproducible].

For our analysis it is important to make use of the following discretization of the probability distribution setting in (4.1) related to the second player's strategy.

THEOREM 4.2. The minimax problem for distributions (4.1) admits the following discretization

(4.2) [mathematical expression not reproducible]

Proof. Let [a.sub.i] := [[Florin].sub.0] [theta] d[[sigma].sub.i] ([theta]), i = 0, 1,..., n. Since [[Florin].sup.1.sub.0] |p - [theta]|d[[sigma].sub.i]([theta]) > |p - [a.sub.i]|, we have that

E([[sigma].sub.0],... , [[sigma].sub.n]; [mu]) > E([a.sub.0],..., [a.sub.n]; [mu]).

Then, by the continuity of D([a.sub.0],..., [a.sub.n];p), we get

[mathematical expression not reproducible].

Hence,

(4.3) [mathematical expression not reproducible].

On the other hand, if for fixed points [a.sub.0],...,[a.sub.n] [member of] [0,1], we take [[sigma].sub.i] = [[delta].sub.ai], it is clear that

E([[sigma].sub.0],... , [[sigma].sub.n]; [mu]) = E([a.sub.0],..., [a.sub.n]; [mu]),

and thus,

(4.4) [mathematical expression not reproducible].

However, for some [mu]* [member of] [OMEGA],

(4.5) [mathematical expression not reproducible]

and this settles the proof of (4.2).

REMARK 4.3. The above discretization (4.2) shows that the optimal strategy for Player II described in the previous section (Theorem 2.2) agrees with the set of Absolute Error Minimax Estimators (AEME), using the language of point estimation theory.

Therefore, our main concern now is the optimal strategy for Player I. But in this sense, the arguments used in the proof of Theorem 4.2 also have an important consequence for Player I's strategy. Indeed, if we denote by [mu]* an optimal distribution for the first player and by supp [mu]* [subset] [0,1] its support, then we have the following result:

LEMMA 4.4.

(4.6) supp[mu]* [subset] M(f).

Proof. It is enough to realize that in the proof of Theorem 4.2, equations (4.3)-(4.4) show that for an extremal measure [mu]* (for which the Nash-equilibrium (4.1) is attained), (4.5) is actually an equality, and therefore, supp [mu]* must be contained in the set where the equality f(p) = [||f||.sub.[infinity]] = ||D([a.sub.0],... , [a.sub.n];.)[||.sub.[infinity]] is attained (with ([a.sub.0],..., [a.sub.n]) being an optimal choice for Player II). This establishes (4.6).

Theorem 4.2 and Lemma 4.4 suggest that, from this point on, we may assume both players follow strategies based on discrete distributions. Indeed, while a Player II's optimal (pure) strategy will be a choice {[a.sub.0],..., [a.sub.n]} [subset] [0,1], an optimal (mixed) strategy for Player I will be based on an atomic measure

[mathematical expression not reproducible],

where [{[p.sub.j]}.sup.k.sub.j=1] [subset] M(f) and [[summation].sup.k]j=1 [m.sub.j] = 1.

EXAMPLE 4.5 (The case of n = 1 toss). Because of the simplicity and the symmetry of the problem, the method to find the strategies satisfying the Nash-equilibrium (4.1) can be carried out easily in this simple case by using Lemmas 2.1 and 4.4 above. Therefore, we skip the details.

The optimal discrete strategy [mu] of Player I is the following: choose the atomic measure

[mathematical expression not reproducible],

with [p.sub.0] = 0,[p.sub.1] = 0.5, [p.sub.2] = 1 and the corresponding weights given by [m.sub.0] = 0.25, [m.sub.1] = 0.5, and [m.sub.2] = 0.25. Further, for [a.sub.0] [member of] [0,0.5], [a.sub.1] [member of] [0.5,1], we have E([a.sub.0], [a.sub.1]; [mu]) = 0.25 and for any [a.sub.0], [a.sub.1] [member of] [0,1] we have 0.25 [less than or equal to] E([a.sub.0], [a.sub.1];[mu]). Thus, for any [[sigma].sub.0], [[sigma].sub.1] [member of] [OMEGA] we have 0.25 [less than or equal to] E([[sigma].sub.0], [[sigma].sub.1]; [mu]).

The optimal discrete strategy of Player II is the following: [[sigma].sub.0] is the unit point mass at [a.sub.0] = 0.25, and [[sigma].sub.1] is the unit point mass at [a.sub.1] = 0.75.

The graph of f(p) = D(0.25,0.75;p) = (1 - p)|p - 0.25| + p|p - 0.75| is displayed in Figure 4.1. Since for all p [member of] [0,1] we have f(p) [less than or equal to] 0.25, we conclude that for any [mu] [member of] [OMEGA], E([[sigma].sub.0], [[sigma].sub.1];[mu]) [less than or equal to] 0.25. So the Nash-equilibrium is established, and if both players follow the outlined strategies, then Player II pays E([a.sub.0], [a.sub.1]; [mu]) = 0.25 dollars to Player I.

REMARK 4.6. Going back to the point estimation theory approach, the results in the previous Theorem 4.2 and Lemma 4.4 admit an interpretation in terms of the connection between Bayes and minimax estimators, as mentioned in the introduction. Indeed, Theorem 5.1.4 and especially Corollary 5.1.5 in [13] establish sufficient conditions to ensure that a Bayes estimator is also a minimax one, namely, that the average risk (or penalty) of the Bayes estimator [[delta].sub.[LAMBDA]] for a certain prior distribution [LAMBDA] agrees with the value of the maximum of that risk; see (1.4). Hence, for such distribution [LAMBDA], the risk function must be constant. If this is the case, the prior distribution [LAMBDA] is said to be a least favorable one. In this sense, our results are a sort of converse to the ones in [13].

When the squared error loss function is used, one could see in Figure 2.5 that the risk function for the corresponding minimax estimator is constant throughout the interval [0,1]. In this case, it is proven in [13, Ex. 5.1.7] that this minimax estimator is indeed a Bayes estimator with respect to a continuous distribution supported in [0,1], namely the Beta distribution

[mathematical expression not reproducible].

Therefore, this last distribution plays the role of a least favorable one, and its corresponding Bayes estimator is proven to be unique. Hence by [13, Theorem 5.1.4], it is also unique as minimax estimator. Taking into account our previous results, this least favorable distribution would represent the optimal strategy for Player I in this case.

The above connection for our absolute error loss function is more involved, and it will be discussed in the next section, where the Nash-equilibrium for the case of n = 2 tosses is thoroughly analyzed.

5. A constructive proof of the Nash-equilibrium for the case of n = 2 tosses. Now, we are concerned with the existence and uniqueness of a strategy pair solving the Nash-equilibrium (4.1) in the case of n = 2 tosses. Our method will be based on the previous Theorem 2.2 and the Lemmas 2.1, 4.4.

Recall that the strategy of Player II is to minimize the maximum of the penalty function, namely to determine optimal outcomes {[a*.sub.0], [a*.sub.1], [a*.sub.2]} defining an optimal penalty function f(p) := D([a*.sub.0], [a*.sub.1], [a*.sub.2];p) such that

(5.1) [mathematical expression not reproducible]

That such an f exists follows easily by a compactness argument.

The strategy of Player I is to find a probability measure d[mu]* (p) supported on [0,1] that maximizes the expected penalty no matter what the choice of Player II is, i.e., to determine

[mathematical expression not reproducible]

Clearly, for any [mu] [member of] [OMEGA],

[[Florin].sup.1.sub.0] D([a.sub.0],[a.sub.1],[a.sub.2];p)d[mu](p) [less than or equal to] ||D([a.sub.0],[a.sub.1],[a.sub.2];p)||[infinity],

so,

[mathematical expression not reproducible],

which implies, after taking the max over all [mu], that F [less than or equal to] [||f||.sub.[infinity]]. Then, our goal in this section is to find an optimal strategy pair {[a*.sub.0], [a*.sub.1], [a*.sub.2]} [subset] [0,1], [mu]* [member of] [OMEGA], for which the Nash-equilibrium F = [||f||.sub.[infinity]], is uniquely reached. The main result in this section is stated as follows:

THEOREM 5.1. The Nash-equilibrium is reached if the players use the following strategies:

Player I: Choose p according to the distribution [mathematical expression not reproducible], where

(5.2) [mathematical expression not reproducible]

is the unique real root of the polynomial [x.sup.3] - [x.sup.2] + 3x - 1 and

[p.sub.2] = 1 - [p.sub.1] [approximately equal to] 0.6389, [p.sub.0] = 0, [p.sub.3] = 1.

The weights [m.sub.i] are given by:

(5.3) [mathematical expression not reproducible]

Player II: Choose the following values ai

[mathematical expression not reproducible].

Furthermore, the above pair of strategies is unique in the following sense: if the choice of distributions {[[sigma].sub.0], [[sigma].sub.1], [[sigma].sub.2], [mu]} satisfies the Nash-equilibrium (4.1), then E([[sigma].sub.i]) = [a.sub.i], i = 0,1, 2, and [mathematical expression not reproducible], where [a.sub.i], [m.sub.i], and [p.sub.i] are described as above.

The graph of the optimal penalty function given in Theorem 5.1, f(p) = D([a.sub.0], [a.sub.1],[a.sub.2];p), is displayed in Figure 5.1. Observe that the interlacing property also holds

[p.sub.0] < [a.sub.0] < [p.sub.1] < [a.sub.1] < [p.sub.2] < [a.sub.2] < [p.sub.3].

For the proof of Theorem 5.1 two technical lemmas are needed.

LEMMA 5.2. The optimal choice {[a.sub.0], [a.sub.1], [a.sub.2]} satisfies

0 < [a.sub.0] < 0.2 < 0.4 < [a.sub.1] < 0.6 < 0.8 < [a.sub.2] < 1.

In addition, [a.sub.1] - [a.sub.0] < 0.4 and [a.sub.2] - [a.sub.1] < 0.4.

Proof. It is easy to verify that for the symmetric choice {[a.sub.0], [a.sub.1], [a.sub.2]}, where [a.sub.0] = 0.195, [a.sub.1] = 0.5, [a.sub.2] = 1 - [a.sub.0] = 0.805, we have that

||D([a.sub.0], [a.sub.1], [a.sub.2];p)[||.sub.[infinity]] = 0.195 < 0.2.

Therefore, [||f||.sub.[infinity]] < 0.2. Since f(0) = [a.sub.0] and f(1) = 1 - [a.sub.2], we immediately obtain that [a.sub.0] < 0.2 and 0.8 < [a.sub.2]. On the other hand, if there exists a p [member of] [0,1] such that |p - [a.sub.i]| [greater than or equal to] 0.2, i = 0,1,2, then [||f||.sub.[infinity]] > mini |p - [a.sub.i]| [greater than or equal to] 0.2, which is a contradiction. Thus,

[mathematical expression not reproducible] holds for all p [member of] [0,1].

This implies that 0 < [a.sub.0] and [a.sub.2] < 1 (otherwise, the Dirichlet Pigeonhole Principle implies there exists a p such that mi[n.sub.i] |p - [a.sub.i]| > 0.2). The fact that mi[n.sub.i] |p - [a.sub.i]| < 0.2,p [member of] [0,1], also implies that [a.sub.1] - [a.sub.0] < 0.4 and [a.sub.2] - [a.sub.1] < 0.4.

To derive that 0.4 < [a.sub.1] < 0.6, we only need to use the values p = 0.4 and p = 0.6. From mi[n.sub.i] |0.4 - [a.sub.i]| < 0.2, we must have |0.4 - [a.sub.1]| < 0.2 or [a.sub.1] < 0.6, and from mi[n.sub.i] |0.6 - [a.sub.i]| < 0.2, we must have |0.6 - [a.sub.1]| < 0.2 or 0.4 < [a.sub.1].

REMARK 5.3. While the "test" values used in the above proof might seem, at first glance, quite arbitrary, their usefulness can be easily shown by a simple numerical experimentation. In particular, for such a symmetric choice with 0 < [a.sub.0], [a.sub.1] = 0.5, and [a.sub.2] = 1 - [a.sub.0] < 1, we see that the restrictions of the penalty function f to the "end" subintervals [0, [a.sub.0]] and [1 - [a.sub.0], 1] are straight lines whose respective maximum values (attained at the endpoints 0 and 1) are given by [a.sub.0], and the restrictions to the "central" intervals [[a.sub.0], 0.5] and [0.5,1 - [a.sub.0]], are concave functions. This fact will be used again in the proof of Theorem 5.1 below.

Now, as a consequence of Lemma 5.2 we have the following result, which completes the previous Lemma 2.1.

LEMMA 5.4. For an optimal choice {[a.sub.0], [a.sub.1], [a.sub.2]}, we have that {0,1} [subset] M(f).

Proof. Indeed, consider the restriction of f(p) to [0, [a.sub.0]]. Then,

f(p) = ([a.sub.0] - 2[a.sub.1] + [a.sub.2])[p.sup.2] - (1 + 2[a.sub.0] - 2[a.sub.1])p + [a.sub.0].

Since the global maximum is attained on [0, [a.sub.0]], and it is not at [a.sub.0], the only possibility of it not being at p = 0 is when [a.sub.0] - 2[a.sub.1] + [a.sub.2] < 0 and the x-coordinate of the parabola's vertex satisfies (1/2 + [a.sub.0] - [a.sub.1])/([a.sub.0] - 2[a.sub.1] + [a.sub.2]) > 0 or 1/2 + [a.sub.0] - [a.sub.1] < 0. This implies that [a.sub.1] - [a.sub.0] > 1/2, which, as shown in Lemma 5.2, is impossible if f is the optimal solution of (5.1). Therefore, we derive that 0 [member of] M(f). In a similar fashion, one gets 1 [member of] M(f).

Proof of Theorem 5.1. From Lemma 2.1, Theorem 2.2, and Lemma 4.4, we already know that

d[mu](p) = [[beta].sub.0][[delta].sub.0] + [[beta].sub.1][[delta].sub.p] + [[beta].sub.2][[delta].sub.q] + [[beta].sub.3][[delta].sub.1],

for some [[beta].sub.i] > 0, [[beta].sub.0] + [[beta].sub.1] + [[beta].sub.2] + [[beta].sub.3] = 1, and p [member of] ([a.sub.0], [a.sub.1]), q [member of] ([a.sub.1], [a.sub.2]). Since we are in the case of a Nash-equilibrium, the quantity

E([a.sub.0],[a.sub.1],[a.sub.2];[mu]) = [epdilon]([a.sub.0],[a.sub.1],[a.sub.2];p,q) := [Florin][1.sub.0] f(r)d[mu](r) = [[beta].sub.0]f(0) + [[beta].sub.1]f(p) + [[beta].sub.2]f(q) + [[beta].sub.3]f(1) = [a.sub.0] [[beta].sub.0.sup.-] [[beta].sub.1] [(1 - p).sup.2] - [[beta].sub.2] [([1.sup.-] q).sup.2]] + [a.sub.1] [2[[beta].sub.1]p(1 - p) - 2[[beta].sub.2]q(1 - q)] +[a.sup.2] [[[beta].sub.1][p.sup.2] + [[beta].sub.2][q.sup.2] - [[beta].sub.3]] + [[[beta].sub.3].sup.-] [[beta].sub.1]p + [[beta].sub.2]q + 2[[beta].sub.1]p[(1 - p).sup.2] - 2[[beta].sub.2][q.sup.3]

is a global minimum (with respect to {[a.sub.0], [a.sub.1], [a.sub.2]} [member of] (0,1)), which implies that

[mathematical expression not reproducible].

Therefore, the coefficients in front of [a.sub.0], [a.sub.1], and [a.sub.2] vanish for the given choice of p, q, and [[beta].sub.i], i = 0,1,2, 3. Hence, the optimization problem (5.1) becomes a constrained minimization problem

Maximize [[beta].sub.3] - [[beta].sub.1]p + [[beta].sub.2]q + 2[[beta].sub.1]p[(1 - p).sup.2] - 2[[beta].sub.2][q.sup.3]

(5.4) [mathematical expression not reproducible]

Eliminating [[beta].sub.0] and [[beta].sub.3] we reduce (5.4) to

Maximize [[beta].sub.1]p(1 - p)(1 - 2p) + [[beta].sub.2]q(1 - q)(1 + 2q)

(5.5) [mathematical expression not reproducible]

Further, eliminating [[beta].sub.2] from (5.5) we derive

Maximize 2[[beta].sub.1]p(1-p)(1 - p+q)

Subject to [mathematical expression not reproducible] p, q, [[beta].sub.1] [member of] [0,1].

Substituting 2[[beta].sub.1]p(1 -p) and denoting x = 1 - p, y = q, we obtain the minimization problem

(5.6) [mathematical expression not reproducible].

Since 1/[x(1 - x)] is a convex function, it is easy to see that if x + y is kept constant, then the minimum in (5.6) is attained when x = y, which implies that q = 1 - p (and subsequently [[beta].sub.3] = [[beta].sub.0] and [[beta].sub.2] = [[beta].sub.1]) is a necessary condition for a Nash-equilibrium selection. Moreover, assuming x = y, we obtain that x must minimize the function

[mathematical expression not reproducible]

Differentiating, we see that g'(x) = 0 has only one solution in [0,1], which satisfies [x.sup.3] = 2[(1 - x).sup.2] or [(1 - p).sup.3] = 2[p.sup.2], so p = [p.sub.1], where [p.sub.1] is given in (5.2). It now follows easily that [[beta].sub.0] = [m.sub.0] and [[beta].sub.1] = [m.sub.1], where [m.sub.0] and [m.sub.1] are given in (5.3). Therefore, we have proven that if p = [p.sub.1] and q = 1 - [p.sub.1], with

0 < [a.sub.0] < [p.sub.1] < [a.sub.1] < q = 1 - [p.sub.1] < [a.sub.2] < 1,

then E([a.sub.0], [a.sub.1],[a.sub.2]; [mu]) does not depend on [a.sub.0], [a.sub.1], [a.sub.2], or,

[mathematical expression not reproducible]

Keeping in mind the results of Theorem 2.2, it follows that the optimal strategy for Player II is given by: [a.sub.0] = E([p.sub.1]) [approximately equal to] 0.1916..., [a.sub.1] = 0.5,[a.sub.2] = 1 - [a.sub.0] [approximately equal to] 0.8084. The graph of f(p) = D([a.sub.0],[a.sub.1],[a.sub.2];p) is displayed in Figure 5.1.

Indeed,f(0) = f(1) = a0, and direct but cumbersome calculations show that f([p.sub.1]) = [a.sub.0] and f'([p.sub.1]) = 0, and by symmetry f(1 - [p.sub.1]) = [a.sub.0], f'(1 - [p.sub.1]) = 0. One has f'(0) = - f'(1) [approximately equal to] -0.38 and f"([p.sub.1]) = f"(1 - [p.sub.1]) [approximately equal to] -4.43. Since f is a continuous piecewise-polynomial function whose restrictions to [0, [a.sub.0]] and to [[a.sub.n], 1] are straight lines while the restrictions to [[a.sub.0],[a.sub.1]] and [[a.sub.1],[a.sub.2]] are concave functions, it follows that the set of the absolute maxima of f, M(f), cannot contain more than 4 points, which are precisely 0,[p.sub.1], 1 - [p.sub.1], 1. Thus, [||f||.sub.[infinity]] = [a.sub.0] = E([p.sub.1]) = 0.1916... Therefore, the Nash-equilibrium (4.1) is established.

REMARK 5.5. The proof of Theorem 5.1 as well as Example 4.5 (the one-toss case) show that the Nash strategy pair for Players II and I may be seen as the pair consisting of the set of absolute error minimax estimators and the least favorable prior distribution for which the former become Bayes estimators. However, unlike the case when the squared error loss function is used (see Remark 4.6), in the above proof of Theorem 5.1 as well as in the discussion of Example 4.5, it was shown that for the current absolute error loss function, given the least favorable distribution [mu], there is no unique corresponding Bayes estimator. Indeed, we have seen that for n = 1 or 2 tosses and for [mathematical expression not reproducible] given in Example 4.5 and Theorem 5.1, respectively, every configuration {[a.sub.0],..., [a.sub.n]} satisfying the interlacing property 0 = [p.sub.0] [less than or equal to] [a.sub.0] [less than or equal to] [p.sub.1] [less than or equal to]... [less than or equal to] [a.sub.n] [less than or equal to] [p.sub.n]+1 = 1 is a Bayes estimator for [mu]. Does this hold for n > 2? And, furthermore, in spite of the lack of uniqueness of the Bayes estimator, is our minimax estimator unique? These issues will be taken up again in the next section.

6. Conclusions and further remarks. In this paper, minimax techniques from approximation theory have been applied to a classical problem in probability: estimating the probability of a biased coin after a few tosses. We used minimax estimation with absolute error loss function to solve the problem, characterizing the optimal solution, and studying the asymptotics of the optimal estimators as the number of tosses tend to infinity. In addition, the method employed has been described within the framework of a non-cooperative (in particular, zero-sum) two-player game, where both players are allowed to make use of mixed strategies, which in turn is closely related to the connection between minimax and Bayes estimators in point estimation theory. Our main results are Theorem 2.2, where the optimal strategy choice for the second player is characterized by means of a property with a striking resemblance to a well-known problem in polynomial interpolation, and Theorem 3.4 with its Corollary 3.5, where the uniform limiting distribution of the optimal choices is established. Likewise, the result of Theorem 5.1, where the Nash-equilibrium for the case of n = 2 tosses is uniquely solved, is also remarkable.

In view of the results obtained, some further remarks and questions are noteworthy and will be subject to further research.

* We have shown that the sup-norm of the penalty function for the optimal choice is clearly smaller than the one corresponding to the Maximum Likelihood Estimator (MLE) choice (i.e., [a.sub.k] = k/n, k = 0,..., n; see Figure 1.1), especially for small values of n. But it is interesting to observe that for a slight modification of the MLE, namely taking [a.sub.k] = (k + 1)/(n + 2), k = 0,..., n, the results are much closer to those corresponding to the optimal choice (see Figure 6.1, where the initial Figure 1.1 has been augmented by adding the graph of the penalty function for the modified MLE). Indeed, this modified MLE (MMLE) is much easier to compute than the optimal choice (especially for big values of n) and seems to provide near optimal results. In other words, if we replace the optimal selection by this MMLE, a near Nash-equilibrium arises (also commonly referred to as an [epsilon]-Nash-equilibrium). This is a well-known problem in game theory: since the optimal strategies are often difficult to compute, it is customary to look for easily computable approximations for which the deviation from equality in the "pure" Nash-equilibrium (4.1) is small enough. From a game theory standpoint, the difference between pure and near (or [epsilon]-) Nash-equilibria consists in the fact that while in the "pure" setting, no player has a motivation to modify his strategy (corresponding to the optimal strategy pair), in the near equilibrium setting there exists a small incentive to do it. Of course, our version of the near Nash-equilibrium using the MMLE only deals with the second player's strategy. For more information about near Nash-equilibrium, see [6].

This difference between "pure" and near Nash-equilibrium also has a counterpart regarding the optimality of the nodes in the sense of the Lebesgue constant in the context of polynomial interpolation. Indeed, if the interval [-1,1] is considered, it is well known that the so-called extended Chebyshev nodes (that is, the zeros of the Chebyshev polynomial Tn+1 adjusted in such a way that the first and last zero fall on the endpoints of the interval) provide a near optimal choice of interpolating nodes; see [2, 3, 4]. Also, for a deeper discussion on near optimal choices of nodes, see [22].

* However, Theorem 3.4 shows that, as n increases, the optimal choice as well as the MLE and MMLE ones or any other "acceptable" choice (such as the SEME given by (1.3)) approach the same limiting distribution in light of Remark 3.6. In other words, the advantage of using the optimal strategy over the MLE is worthwhile only for a small, or not very large, number of tosses.

* In the solution of the Nash-equilibrium for the case of n = 2 tosses, we found that the interlacing property 0 = [p.sub.0] <[a.sub.0] <[p.sub.1] < [a.sub.1] < [p.sub.2] < [a.sub.2] < [p.sub.3] = 1 was satisfied. But what about the asymptotic distribution of the atomic measures

[mathematical expression not reproducible],

which give the optimal strategy of the first player for each n (provided they exist)?

Of course, the same question may be posed using the language of point estimation theory, regarding the pair formed by the minimax estimator and its corresponding least favorable prior distribution.

* As for the equimax property for the optimal choice of Player II (minimax estimation) given in Theorem 2.2, it is necessary to point out a couple of pending questions about the uniqueness of the solution. First, is there a unique optimal configuration {[a.sub.0],..., [a.sub.n]} for each n? And secondly, Theorem 2.2 gives that the intersection of the set of maxima M(f) with each subinterval ([a.sub.i], [a.sub.i+1]) corresponding to an optimal choice is nonempty. But, is there just a single absolute maximum on each subinterval? It was only established for the case of n = 2 tosses and numerical results seem to confirm that it also holds for larger values of n.

Acknowledgement. The proof of Theorem 3.4 is inspired by an argument presented by Vilmos Totik to the third author, for which the authors are very grateful. In addition, the authors wish to thank Prof. Jose Luis Fernandez Perez (Universidad Autonoma, Madrid) for helping us to suitably place the present work within the point estimation theory literature.

The research of PD was supported, in part, by a Simons Foundation grant no. 282207. The research of RO was partially supported by Ministerio de Ciencia e Innovacion of Spain under grant MTM2015-71352-P and was conducted while visiting PFW as a Scholar-in-Residence.

REFERENCES

[1] D. A. BLACKWELL AND M. A. GIRSHICK, Theory of Games and Statistical Decisions, Dover, New York, 2012.

[2] C. DE BOOR, Polynomial Interpolation, in Proc. of the International Conference of Mathematicians 1978, O. Lehto, ed., Acad. Sci. Fennica, Helsinki, 1980, pp. 917-922.

[3] C. DE BOOR AND A. PINKUS, Proof of the conjectures of Bernstein and Erdos concerning the optimal nodes for polynomial interpolation, J. Approx. Theory, 24 (1978), pp. 289-303.

[4] L. BRUTMAN, On the Lebesgue function for polynomial interpolation, SIAM J. Numer. Anal., 15 (1978), pp. 699-704.

[5] O. CANDOGAN, A. OZDAGLAR, AND P. A. PARRILLO, Dynamics in near-potential games, Games Econom. Behav., 82 (2013), pp. 66-90.

[6] C. DASKALAKIS, P. W. GOLDBERG, AND C. H. PAPADIMITRIOU, The complexity of computing a Nash equilibrium, SIAM J. Comput., 39 (2009), pp. 195-259.

[7] D. FUDENBERG AND J. TIROLE, Game Theory, MIT Press, Cambridge, 1991.

[8] I. L. GLICKSBERG, A further generalization of the Kakutani fixed point theorem with application to Nash-equilibrium, Proc. Amer. Math. Soc., 3 (1952), pp. 170-174.

[9] O. HOLTZ, F. NAZAROV, AND Y. PERES, New coins from old, smoothly, Constr. Approx., 33 (2011), pp. 331-363.

[10] R. KAAS AND J. M. BUHRMAN, Mean, median and mode in binomial distributions, Statist. Neerlandica, 34 (1980), pp. 13-18.

[11] S. KAKUTANI, A generalization of Brouwer's fixed point theorem, Duke Math. J., 8 (1941), pp. 457-159.

[12] T. A. KILGORE, A characterization of the Lagrange interpolating projection with minimal Tchebycheff norm, J. Approx. Theory, 24, (1978), pp. 273-288.

[13] E. L. LEHMANN AND G. CASELLA, Theory of Point Estimation, 2nd ed., Springer, New York, 1998.

[14] G. MASTROIANNI AND G. MILOVANOVIC, Interpolation Processes. Basic Theory and Applications, Springer, Berlin, 2008.

[15] R. D. MCKELVEY AND T. R. PALFREY, A statitistical theory of equilibrium games, Japan Econ. Rev., 47 (1996), pp. 186-209.

[16] S. NACU, Y. PERES, Fast simulation of new coins from old, Ann. Appl. Probab., 15 (2005), pp. 93-115.

[17] J. F. NASH, Equilibrium points in N-person games, Proc. Nat. Acad. Sci. U. S. A., 36 (1950), pp. 48-49.

[18] __, Non-cooperative games, Ann. of Math. (2), 54, (1951), pp. 286-295.

[19] J. VON NEUMANN, Various techniques used in connection with random digits, J. Res. Nat. Bur. Stand. Appl. Math. Series., 3 (1951), pp. 36-38.

[20] M. J. OSBORNE AND A. RUBINSTEIN, A Course in Game Theory, MIT Press, Cambridge, 1994.

[21] J. PFANZAGL, Parametric Statistical Theory, de Gruyter, Berlin, 1994.

[22] S. J. SMITH, Lebesgue constants in polynomial interpolation, Ann. Math. Inform., 33 (2006), pp. 109-123.

DAVID BENKO ([dagger]), DAN COROIAN ([double dagger]), PETER DRAGNEV ([double dagger]), AND RAMON ORIVE ([section])

Dedicated to Walter Gautschi on the occasion of his 90th birthday

(*) Received February 5, 2018. Accepted June 5, 2018. Published online on January 14, 2019. Recommended by G. Milovanovic.

([dagger]) Department of Mathematics and Statistics, University of South Alabama, Mobile, AL 36688 (dbenko@southalabama.edu).

([double dagger]) Department of Mathematical Sciences, Purdue University, Fort Wayne, IN 46805 ({coroiand, dragnevp}@pfw.edu).

([section]) Departmento de Analisis Matematico, Universidad de La Laguna, 38200, The Canary Islands (rorive@ull.es).

DOI: 10.1553/etna_vol50s109
COPYRIGHT 2018 Institute of Computational Mathematics
No portion of this article can be reproduced without the express written permission from the copyright holder.