# On the existence theorems of Kantorovich, Miranda and Borsuk.

Abstract. The theorems of Kantorovich, Miranda and Borsuk all give
conditions on the existence of a zero of a nonlinear mapping. In this
paper we are concerned with relations between these theorems in terms of
generality in the case that the mapping is finite-dimensional. To this
purpose we formulate a generalization of Miranda's theorem, holding
for arbitrary norms instead of just the [l.sub.[infinity]]-norm. As our
main results we then prove that the Kantorovich theorem reduces to a
special case of this generalized Miranda theorem as well as to a special
case of Borsuk's theorem. Moreover, it turns out that, essentially,
the Miranda theorems are themselves special cases of Borsuk's
theorem.

Key words. nonlinear equations, existence theorems, fixed points, Newton-Kantorovich theorem, Miranda theorem, Borsuk theorem.

AMS subject classifications. 47H10, 47J05, 65H10.

1. Introduction. In this paper we are concerned with two well-known classical theorems, both of which guarantee the existence of a zero of a nonlinear mapping f from a norm ball in [R.sup.n] to [R.sup.n]. These theorems are Kantorovich's theorem and Borsuk's theorem. Miranda's theorem, which we also consider, is essentially a special version of Borsuk's theorem in the case that the norm ball is a box, i.e. the norm is the maximum norm. Kantorovich's theorem and Borsuk's theorem apparently are very different in nature: Kantorovich's theorem is motivated by the analysis of Newton's iteration to approximate a zero of f and it gives an a priori criterion for the convergence of this iteration, in this manner proving that there is a zero of f within a certain ball centered at the initial guess for Newton's method. The major ingredients in its hypotheses are the Lipschitz-continuity of the derivative of f in a sufficiently large neighbourhood of the starting point and the assumption that the function value at the starting point is sufficiently small. On the other hand, Borsuk's theorem only requires the mapping f to be continuous on the ball and to fulfill a non-colinearity condition for the function values at all pairs of antipodal points on the boundary of the ball.

The purpose of this paper is to prove the remarkable fact that the Kantorovich theorem is (essentially) a special case of Borsuk's theorem in the sense that the hypotheses of the Kantorovich theorem imply those of Borsuk. For the case that the norm is the [l.sub.[infinity]]-norm this result was essentially already obtained in [1], where it was shown that the hypotheses of Kantorovich's theorem imply those of Miranda's theorem. In the present paper we formulate a version of Miranda's theorem holding for arbitrary norms, which is then proven to be 'in between' Kantorovich and Borsuk, i.e. more general than Kantorovich's but (essentially) more special than Borsuk's theorem.

Let us remark here that our results heavily rely on the finite dimension of the underlying vector space. While the Kantorovich theorem immediately extends to general Banach spaces, Borsuk's theorem does so only if one substantially restricts the class of mappings to be considered, for example to compact modifications of the identity or generalizations thereof. The rest of this paper is organized as follows: In the next section we give precise formulations of Kantorovich's theorem, of Miranda's theorem and of Borsuk's theorem. In section 3 we formulate and prove our generalization of Miranda's theorem for arbitrary norms. In section 4 we then come to the major result of this paper establishing the hierarchy of the theorems with respect to generality. Some conclusions are formulated in section 5.

2. The theorems of Kantorovich, Miranda and Borsuk. We start with Kantorovich's theorem. It can be stated in its 'standard' form and in an 'affine invariant' form. Although the latter is the more general one, the standard form is the one that can usually be found in textbooks. We therefore give both versions.

Here, as in the sequel, [parallel]x[parallel] denotes some arbitrary norm in [R.sup.n] and its corresponding operator norm. The closed ball with radius [rho] [greater than or equal to] [x.sup.0], centered at [x.sup.0], is

[bar.B]([x.sup.0], [rho]) = {x [member of] [R.sup.n] : [parallel]x - [x.sup.0][parallel] [less than or equal to] [rho]}.

B([x.sup.0], [rho]) and [partial derivative]B([x.sup.0], [rho]) denote the topological interior and boundary of [bar.B]([x.sup.0], [rho]), respectively.

THEOREM 2.1. (Kantorovich, standard form [8]) Let f : D [subset or equal to] [R.sup.n] [right arrow] [R.sup.n] be differentiable in the open convex set D. Assume that for some point [x.sup.0] [member of] D the Jacobian f'([x.sup.0]) is invertible with

[parallel]f'[([x.sup.0]).sup.-1][parallel] [less than or equal to] [beta], [parallel]f'[([x.sup.0]).sup.-1] f([x.sup.0])[parallel] [less than or equal to] eta.

Let there be a Lipschitz constant [kappa] > 0 for f' such that

[parallel]f'(u) - f'(v)[parallel] [less than or equal to] [kappa] x [parallel]u - v[parallel] for all u, v [member of] D.

If h = [eta][beta][kappa] [less than or equal to] 1/2 and [bar.B]([x.sup.0], [[rho].sub.-]) [subset or equal to] D, where

[[rho].sub.-] = 1 - [square root of (1 - 2h)]/[beta][kappa]

then f has a zero x* in [bar.B]([x.sup.0], [[rho].sub.-]). Moreover, this zero is the unique zero of f in ([bar.B]([x.sup.0], [[rho].sub.-]) [union] B([x.sup.0], [[rho].sub.+])) [intersection] D where [[rho].sub.+] = 1 + [square root of (1 - 2h)]/[beta][kappa] and the Newton iterates [x.sup.k] with

[x.sup.k+1] = [x.sup.k] - f'[([x.sup.k]).sup.-1] f([x.sup.k])

are well-defined, remain in [bar.B]([x.sup.0], [[rho].sub.-]) and converge to x*.

The following affine invariant form of the Kantorovich theorem is a generalization of the standard form as can be seen immediately by setting [omega] = [beta][kappa].

THEOREM 2.2. (Kantorovich, affine invariant form [4, 5]) Let f : D [subset or equal to] [R.sup.n] [right arrow] [R.sup.n] be differentiable in the open convex set D. Assume that for some point [x.sup.0] [member of] D the Jacobian f'([x.sup.0]) is invertible with

[parallel]f'[([x.sup.0]).sup.-1] f([x.sup.0])[parallel] [less than or equal to] [eta].

Let there be a Lipschitz constant [omega] > 0 for f'[([x.sup.0]).sup.-1] f' such that

[parallel]f'[([x.sup.0]).sup.-1](f'(u) - f'(v))[parallel] [less than or equal to] [omega] x [parallel]u - v[parallel] for all u, v [member of] D.

If h = [eta][omega] [less than or equal to] 1/2 and [bar.B]([x.sup.0], [[rho].sub.-]) [subset or equal to] D, where

[[rho].sub.-] = 1 - [square root of (1 - 2h)]/[omega]

then [iota] has a zero [x.sup.*] in [bar.B]([x.sup.0], [[rho].sub.-]). Moreover, this zero is the unique zero of f in ([bar.B]([x.sup.0], [[rho].sub.-]) [union] B([x.sup.0], [[rho].sub.+])) [intersection] D where [[rho].sub.+] = 1 + [square root of (1 - 2h)]/[omega] and the Newton iterates [x.sup.k] with

[x.sup.k+1] = [x.sup.k] - f'[([x.sup.k]).sup.-1] f([x.sup.k])

are well-defined, remain in [bar.B]([x.sup.0], [[rho].sub.-]) and converge to [x.sup.*].

Note that in this theorem one may leave the values of [eta] and [omega] unchanged after transformations f [right arrow] A x f for any non-singular matrix A [member of] [R.sup.nxn]. Therefore, the theorem holds irrespective of linear transformations whence the name 'affine invariant form'.

Note also that [omega] will often be much smaller than [beta][kappa]. It is therefore not difficult to construct examples where for a given differentiable mapping f and a given point [x.sup.0], the main assumption [eta][beta][kappa] [less than or equal to] 1/2 of the standard theorem is not fulfilled, whereas the assumption [eta][omega] [less than or equal to] 1/2 in the affine invariant theorem is met. In this sense, Theorem 2.2 is more general than Theorem 2.1. We now turn to formulate Borsuk's theorem. Let us say that a set B [subset or equal to] [R.sup.n] is symmetric with respect to [x.sup.0] [member of] [R.sup.n], if for all y [member of] [R.sup.n] we have [x.sup.0] + y [member of] B [??] [x.sup.0] - y [member of] B.

Then Borsuk's theorem can be stated as follows.

THEOREM 2.3. (Borsuk [2, 3]) Let B [subset or equal to] [R.sup.n] be open, bounded, convex and symmetric with respect to [x.sup.0] [member of] B. Let f : B [R.sup.n] be a continuous mapping, assume that f(x) [not equal to] 0 on [partial derivative]B and that

f([x.sup.0] + y) [not equal to] [lambda]f([x.sup.0] - y) for all [lambda] > 0 and all [x.sup.0] + y [member of] [partial derivative]B. (2.1)

Then f has a zero in B.

Often, this theorem is stated in terms of the mapping h defined by h(y) = f([x.sup.0] + y) on B' = B - [x.sup.0] = {y [member of] [R.sup.n] : y = x - [x.sup.0], x [member of] B}. Condition (2.1) then reads

h(y) [not equal to] [lambda]h(-y) for all [lambda] > 0 and all y [member of] [partial derivative]B',

where B' is open, bounded, convex and symmetric with respect to the origin 0 [member of] B'.

Very interestingly, Borsuk's theorem is 'naturally' affine invariant: If f satisfies (2.1), then A x f satisfies (2.1), too, for any non-singular matrix A [member of] [R.sup.nxn].

In our comparisons to the Kantorovich theorem, it will be useful to consider Borsuk's theorem on balls B = B([x.sup.0], [rho]), [rho] > 0, with respect to a given norm [parallel]x[parallel]. This is just an apparent restriction, since in our finite-dimensional setting, any set B satisfying the assumptions of Borsuk's theorem, Theorem 2.3, is in fact a norm ball with respect to the Minkowski functional corresponding to B (and its center [x.sup.0]).

If, in addition, we do not want to exclude a zero on the boundary of B, we arrive at the following immediate corollary to Theorem 2.3.

COROLLARY 2.4. Let f : [bar.B]([x.sup.0], [rho]) [right arrow] [R.sup.n] be a continuous mapping. Assume that

f([x.sup.0] + y) [not equal to] [lambda]f([x.sup.0] - y) for all [lambda] > 0 and all [x.sup.0] + y [member of] [partial derivative]B([x.sup.0], [rho]). (2.2)

Then f has a zero in [bar.B]([x.sup.0], [rho]).

We finally formulate Miranda's theorem. This theorem works with the [l.sub.[infinity]]-norm and looks at components of f on the faces of an [l.sub.[infinity]]-ball which is a hypercube. We write [B.sub.[infinity]], ([x.sup.0], [rho]) to denote such a ball centered at [x.sup.0] with its faces given as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Then Miranda's theorem can be stated as follows.

THEOREM 2.5. (Miranda [7]) Let f : [[bar.B].sub.[infinity]]([x.sup.0], [rho]) [subset or equal to] [R.sup.n] [right arrow] [R.sup.n] be a continuous mapping.

Assume that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.3)

Then f has at least one zero [x.sup.*] in [[bar.B].sub.[infinity]] ([x.sup.0], [rho]).

In [12] it is shown that Miranda's theorem is equivalent to Brouwer's fixed point theorem (for [l.sub.[infinity]]-balls).

3. Generalization of Miranda's Theorem. Miranda's original theorem has been generalized in several different directions before, see e.g. [11], [14] and [6]. For any of these generalizations, however, it has not been shown that it contains Kantorovich's theorem as a special case, i.e. that whenever Kantorovich's theorem guarantees the existence of a zero then the respective generalization would guarantee the existence of such a zero, too. Indeed, there are examples which show that this is not always the case.

In Theorem 3.2 we present a new generalization of Miranda's theorem which does contain Kantorovich's theorem: As we will show in Theorem 4.1, whenever Kantorovich's theorem guarantees the existence of a zero, our generalization of Miranda's theorem does so, too.

Observe that (2.3) can be interpreted as saying that at each point x on the boundary of [B.sub.[inifinity]]([x.sup.0], [rho]) the image f (x) points in an 'outside' direction. This interpretation is the basis for our generalization of Miranda's theorem to balls with respect to an arbitrary norm formulated as Theorem 3.2 below.

This generalization uses the concept of normal vectors. Let <x,x> denote the usual inner product on [R.sup.n]. We say that the vector a [member of] [R.sup.n] is normal to the open convex set C [subset of equal to] [R.sup.n] at x [member of] [partial derivative]C iff <a, x - y> > 0 for all y [member of] C, i.e. if a is a nonzero vector normal to [bar.C] at x in the sense of [10]. By the Hahn-Banach-Theorem, there exists at least one vector normal to C at each x [member of] [partial derivative]C. If C is a ball B([x.sup.0], [rho]), [rho] > 0, with respect to some norm and [parallel]x[parallel] and [[parallel]x[parallel].sub.d] its dual norm

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.1)

then the vectors normal to C at x can be characterized as follows:

LEMMA 3.1. For any [rho] > 0, the vector a [member of] [R.sup.n] is normal to B([x.sup.0], [rho]) at x [member of] [partial derivative]B([x.sup.0], [rho]) iff a is a positive multiple of some a' [member of] [R.sup.n] for which

[[parallel]a'[parallel].sub.d] = 1 and (a', x - [x.sup.0]) = [rho].

Proof. a is normal to B([x.sup.0], [rho]) at x [member of] [partial derivative]B([x.sup.0], [rho]) iff there is a [lambda] > 0 such that a [member of] [lambda] x [partial derivative][psi](x), where [partial derivative][psi](x) denotes the subdifferential of the convex function

[psi] : [R.sup.n] [right arrow] R, y [??] 1/[rho] [parallel]y - [x.sup.0][parallel],

see e.g. [10, Cor. 23.7.1]. We therefore show

[partial derivative][psi](x) = 1/[rho] {a' [member of] [R.sup.n] : [[parallel]a'[parallel].sub.d] = 1 and <a', x - [x.sup.0]> = [rho]}.

To this purpose we first observe that if a [member of] [partial derivative][psi](x), i.e. [psi](y) [greater than or equal to] [psi](x) + <a, y - x> for all y [member of] [R.sup.n], then

<a, h> [less than or equal to] 1/[rho][parallel]x - [x.sup.0] + h[parallel] - 1 [less than or equal to] 1/[rho][parallel]h[parallel]

holds for all h [member of] [R.sup.n]. Hence [[parallel]a[parallel].sub.d] [less than or equal to] (1/[rho]) and <a, [x.sup.0] - x> [less than or equal to] -1. But this implies [[parallel]a[parallel].sub.d] = (1/[rho]) and <a, x - [x.sup.0]> = 1 since 1 [less than or equal to] <a, x - [x.sup.0]> [less than or equal to] [[parallel]a[parallel].sub.d][parallel]x - [x.sup.0][parallel] [less than or equal to] 1. Consequently, for a' = [rho]a we have [[parallel]a'[parallel].sub.d] = 1 and <a', x - [x.sup.0]> = [rho].

Conversely, if [[parallel]a'[parallel].sub.d] = 1, <a', x - [x.sup.0]> = [rho] and a = (1/[rho])a', then for any y [member of] [R.sup.n] we have

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

showing a [member of] [partial derivative][psi](x).

THEOREM 3.2. Let B([x.sup.0], [rho]), [rho] > 0, be an open ball with respect to an arbitrary norm [parallel]x[parallel]. Let f : B([x.sup.0], [rho]) [right arrow] [R.sup.n] be continuous and assume that for all x [member of] [partial derivative]B([x.sup.0], [rho]) there exists a vector a normal to B([x.sup.0], [rho]) at x such that

<f(x), a> [greater than or equal to] 0. (3.2)

Then

a) the relation (3.2) actually holds for all vectors a normal to B([x.sup.0], [rho]) at x,

b) f has a zero in [bar.B]([x.sup.0], [rho]).

Proof. To prove part a), we first note that [10, Cor. 23.7.1], which we already used in the proof of Lemma 3.1, implies that it is sufficient to show that for the function [psi] : y [??] (1/[rho]) [parallel]y - [x.sup.0][parallel] we have

<f(x) , a> [greater than or equal to] 0 for all x [member of] [partial derivative]B([x.sup.0], [rho]) and all a [member of] [partial derivative][psi](x).

Secondly, we remark that it is known (see e.g. [10, Th. 25.6]) that

[partial derivative][psi](x) = [bar.conv S (x)]

where a [member of] S(x) iff there is a sequence [x.sup.i] in [R.sup.n] such that for all i the convex function [psi] is differentiable in [x.sup.i], i.e. [partial derivative][psi]([x.sup.i]) = {[psi]'([x.sup.i])}, and such that [lim.sub.i[right arrow][infinity]] [x.sup.i] = x and [lim.sub.i[right arrow][infinity]] [psi]'([x.sup.i]) = a.

For x [member of] [partial derivative]B([x.sup.0], [rho]) let [x.sup.i] denote such a sequence. W.l.o.g. we can assume [x.sup.i] [not equal to] [x.sup.0] for all i. Due to the homogeneity of [parallel]x[parallel], the function [psi] is differentiable not only in [x.sup.i] but also in [??] = [x.sup.0]+(1/[psi]([x.sup.i]))([x.sup.i]-[x.sup.0]) [member of] [partial derivative]B([x.sup.0], [rho]) with [psi]'([??]) = [psi]'([x.sup.i]). Hence [lim.sub.i[right arrow][infinity]] [psi]'([??]) = a and obviously also [lim.sub.i[right arrow][infinity]] [??] = x. At [??] every vector normal to B([x.sup.0], [rho]) is a positive multiple of [psi]'([??]), so by assumption we have <f([??]), [psi]'([??])> [greater than or equal to] 0 and since f is continuous in x we can conclude <f(x), a) [greater than or equal to] 0. We therefore have <f(x), a) [greater than or equal to] 0 for all a [member of] S(x), and since <x,x> is linear and continuous in its second argument we even have <f(x), a) [greater than or equal to] 0 for all a [member of] [bar.conv S(x)].

Part b) is now easily proved via Borsuk's theorem. Let [epsilon] > 0 and take

[f.sub.[epsilon]](x) = f(x) + [epsilon](x - [x.sup.0]).

By a) we know that <f(x), a> [greater than or equal to] 0 for all vectors a normal to B([x.sup.0], [rho]) at x, and thus <[f.sub.[epsilon]](x), a> > 0 for all such a. By Corollary 2.4 and Lemma 3.3 below the mapping [f.sub.[epsilon]] has a zero in [bar.B]([x.sup.0], [rho]). So any limit point of the sequence of these zeros for [epsilon] = 1/n, n [member of] N is a zero of f.

The proof above makes use of the following auxiliary result which we will need again in section 4.

LEMMA 3.3. Let B([x.sup.0], [rho]), [rho] > 0, be an open ball with respect to an arbitrary norm [parallel]x[parallel]. Let f : [bar.B]([x.sup.0], [rho]) [right arrow] [R.sup.n] be continuous and assume that for all x [member of] [partial derivative]B([x.sup.0], [rho]) and for all vectors a normal to B([x.sup.0], [rho]) at x we have

<f(x), a> > 0.

Then for all x = [x.sup.0] + y [member of] [partial derivative]B([x.sup.0], [rho]) and for all [lambda] > 0 we have

f([x.sup.0] + y) [not equal to] [lambda]f([x.sup.0] - y).

Proof. If a is normal to B ([x.sup.0], [rho]) at x = [x.sup.0] + y, its negative -a is normal to B ([x.sup.0], [rho]) at [x.sup.0] - y. So, if we had f([x.sup.0] + y) = [lambda]f([x.sup.0] - y) for some [lambda] > 0, we would arrive at

0 < <f([x.sup.0] + y), a> = <[lambda]f([x.sup.0] - y), a> = -[lambda]<f([x.sup.0] - y), -a>,

which is impossible since <f([x.sup.0] - y), -a> > 0.

Just in passing, let us note that Theorem 3.2 remains valid if we replace B([x.sup.0], [rho]) by an arbitrary non-empty open bounded convex set. Part a) can then be proved by replacing [psi](x) by p(x - [x.sup.0]) where [x.sup.0] is an arbitrary point of C and [rho] is the Minkowski functional of C - [x.sup.0]. A proof for b) can easily be derived from the following proposition, which is just the application of the Leray-Schauder-Theorem [8, 6.3.3] to the mapping g with g(x) = x - f([x.sup.0] + x), x [member of] C - [x.sup.0].

PROPOSITION 3.4. Let C be a non-empty open bounded subset of [R.sup.n] and f : [bar.C] [right arrow] [R.sup.n] a continuous mapping. Assume that there is an [x.sup.0] [member of] C such that for all x [member of] [partial derivative]C

f(x) [not member of] {[lambda](x - [x.sup.0]) : [lambda] < 0}.

Then f has a zero in [bar.C].

4. The hierarchy with respect to generality. In this section we prove our central results. We show that the Kantorovich theorem for an arbitrary norm is a special case of the generalized Miranda theorem, Theorem 3.2. In addition we show that the Kantorovich theorem for an arbitrary norm is also a special case of Borsuk's theorem. This hierarchy is illustrated in Figure 1.

We will use the numbers [[rho].sub.-], [[rho].sub.+] which have been defined in Theorem 2.2.

THEOREM 4.1. Let f satisfy all assumptions of the affine invariant form of the Kantorovich theorem (Theorem 2.2). Put

g : D [right arrow] [R.sup.n], x [??] f'[([x.sup.0]).sup.-1] f(x)

and consider any positive [rho] [meber of] [[[rho].sub.-], [[rho].sub.+]] such that [bar.B]([x.sup.0], [rho]) [subset or equal to] D. Then for all x [member of] [partial derivative]B([x.sup.0], [rho]) and for all normals a to B([x.sup.0], [rho]) at x we have

<g(x), a> [greater than or equal to] 0, (4.1)

which is the hypothesis of the generalized Miranda theorem (Theorem 3.2) for the mapping g and the ball B([x.sup.0], [rho]).

Proof. Let x [member of] [partial derivative]B([x.sup.0], [rho]) and a a normal to B([x.sup.0], [rho]) at x. Because of Lemma 3.1, we can assume [[parallel]a[parallel].sub.d] = 1 and <a, x - [x.sup.0]> = [rho]. We first prove

[absolute value of <g(x) - g([x.sup.0]), a> - [rho]] [less than or equal to] [omega]/2 [[rho].sup.2]. (4.2)

In order to do so, we define the continuously differentiable function [psi] : [0,1] [right arrow] R as

[psi](t) = <g([x.sup.0] + t(x - [x.sup.0])) - tg'([x.sup.0])(x - [x.sup.0]), a>.

[FIGURE 4.1 OMITTED]

Then

[psi]'(t) = <(g' ([x.sup.0] + t(x - [x.sup.0])) - g' ([x.sup.0]))(x - [x.sup.0]), a>

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

On the other hand

[psi](1) - [psi](0) = <g(x) - g'([x.sup.0])(x - [x.sup.0]), a> - <g([x.sup.0]), a> and g'([x.sup.0]) = id as well as <x - [x.sup.0] , a> = [rho], which shows (4.2). ?From (4.2) we conclude

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In the proof of Theorem 4.1 we have actually shown that (4.1) holds with strict inequality as soon as [rho] [member of] ([[rho].sub.-], [[rho].sub.+]). Using Lemma 3.3 we thus arrive at the following result.

THEOREM 4.2. Let f satisfy all assumptions of the affine invariant form of the Kantorovich theorem. Then f satisfies (2.2) for all balls [bar.B]([x.sup.0], [rho]) [subset or equal to] D with [rho] [member of] ([[rho].sub.-], [[rho].sub.+]).

Proof. The discussion above already showed that g = f'[([x.sup.0]).sup.-1] f satisfies (2.2). But (2.2) remains invariant under affine transformations, so f satisfies (2.2), too.

Of course, it would be more satisfying if in the above theorem one could take the closed interval [[[rho].sub.-], [[rho].sub.+]] for [rho] instead of just the open interval. As we will show now, this is indeed so, as soon as [eta] > 0, i.e. if f ([x.sup.0]) [not equal to] 0. This result is proved in our final theorem, where we have to go a direct way without using the generalized Miranda Theorem.

THEOREM 4.3. Let f satisfy all assumptions of the affine invariant form of the Kantorovich theorem (Theorem 2.2) and exclude the case [eta] = 0. Then f satisfies (2.2) for all [rho] [member of] [[[rho].sub.-], [[rho].sub.+]] such that [bar.B]([x.sub.0], [rho]) [subset or equal to] D.

Proof. We will show that (2.2) holds for g = f'[([x.sup.0]).sup.-1] f. It then also holds for f. To start, let [x.sup.0] + y [member of] [partial derivative]B([x.sup.0], [rho]) and assume that (2.2) does not hold, i.e. we have [lambda] [member of] (0, [infinity]) such that

g([x.sup.0] + y) = [lambda]g([x.sup.0] - y). (4.3)

Replacing, if necessary, y by -y, we can even assume [lambda] [greater than or equal to] 1. Let a be an arbitrary vector to be specified later, and set

[psi](t) = <g([x.sup.0] + ty) - tg'([x.sup.0])y, a>.

Then

[psi](t) = <(g'([x.sup.0] + ty) - g'([x.sup.0])) y, a>

and, since g'([x.sup.0]) = id, we have

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

which gives

<g([x.sup.0] + y), a> = <y, a> + <g([x.sup.0]), a> + [[integral].sup.1.sub.0] <(g'([x.sup.0] + ty) - g'([x.sup.0])) y, a> dt, (4.4)

and, similarly for -y instead of y

<g([x.sup.0] - y) , a> = -<y, a> + <g([x.sup.0]), a> - [[integral].sup.1.sub.0] <(g'([x.sup.0] - ty) - g'([x.sup.0])) y, a> dt. (4.5)

By (4.3) we have

<g([x.sup.0] + y), a> = [lambda]<g([x.sup.0] - y), a>,

which, using (4.4) and (4.5) and after rearranging terms gives

(1 + [lambda])<y, a> = ([lambda] - 1)<g([x.sup.0]), a> - [[integral].sup.1.sub.0] <(g'([x.sup.0] + ty) - g'([x.sup.0]))y, a> dt

- [lambda] [[integral].sup.1.sub.0 <(g'([x.sup.0] - ty) - g'([x.sup.0]))y, a> dt. (4.6)

Now take a such that [[parallel]a[parallel].sub.d] = 1 and <y, a> = [parallel]y[parallel]. Such a exists, since ([R.sup.n], [parallel] x [parallel]) is identical to its bidual, i.e.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In exactly the same manner as in the proof of Theorem 4.1, bounding the terms on the right hand side of (4.6), we get

(1 + [lambda]) [parallel]y[parallel] [less than or equal to] ([lambda] - 1)[parallel]g([x.sup.0])[parallel] + [omega]/2 x [[parallel]y[parallel].sup.2] + [lambda] [omega]/2[[parallel]y[parallel].sup.2]

where [parallel]y[parallel] = [rho] and [parallel]g([x.sup.0][parallel] [less than or equal to] [eta]. We therefore have

(1 + [lambda])[rho] [less than or equal to] ([lambda] - 1)[eta] + (1 + [lambda])[omega]/2 [[rho].sup.2]

or

(1 + [lambda]) ([omega]/2 [[rho].sup.2] - [rho] + [eta]) - 2[eta] [greater than or equal to] 0.

But this is impossible, because for [rho] [member of] [[[rho].sub.-], [[rho].sub.+]] the first summand is non-positive and [eta] > 0. Therefore, (4.3) does not hold, and since [x.sup.0] + y was chosen to be an arbitrary point from [partial derivative]B([x.sup.0], [rho]) we have shown that g satisfies (2.2).

As was suggested by an anonymous referee, there is another well-known theorem on the existence of zeros which adds an additional stage to the hierarchy presented so far. This theorem is Smale's theorem [13], which in the finite-dimensional case may be stated as follows:

THEOREM 4.4. Let f : [R.sup.n] [right arrow] [R.sup.n] be analytic in [R.sup.n] and for [x.sup.0] [member of] [R.sup.n] let

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Then, if [beta]([x.sup.0], f)[gamma]([x.sup.0], f) < [[alpha].sub.0], where [[alpha].sub.o] is an invariant, approximately equal to 0.130707, the iterates of Newton's method starting with [x.sup.0] converge to a zero x* of f.

This result is interesting because it proves convergence of Newton's method from information at just one point [x.sup.0].

As was already mentioned in [13], this result is in fact a special case of the affine invariant version of the Newton-Kantorovich theorem (Theorem 2.2), meaning that the hypotheses of Theorem 4.4 imply those of Theorem 2.2. As was shown by Rheinboldt [9], Theorem 4.4 may be generalized to a local version where f is assumed to be analytic only on some open subset of [R.sup.n]. The proof of that result shows that, again, we are in the presence of a special case of the Newton-Kantorovich theorem.

5. Conclusions. In this paper we have established a hierarchy with respect to generality between Kantorovich's theorem (which contains Smale's theorem), Miranda's theorem and its generalization and Borsuk's theorem. This hierarchy is meant only with respect to the existence of a zero. While the Kantorovich theorem also guarantees the uniqueness of the zero (and the convergence of Newton's method), the other theorems only partly address this aspect: they actually guarantee that the topological degree of the mapping is odd, see [3]. We have proven two major results. The first (Theorem 4.1) shows that if Kantorovich's theorem (Theorem 2.2) guarantees the existence of a zero of f in a ball B([x.sup.0], [rho]), [rho] > 0, [[rho].sub.-] [less than or equal to] [rho] [less than or equal to] [[rho].sub.+] with B([x.sup.0], [rho]) [subset or equal to] D, then the generalized Miranda theorem (Theorem 3.2), applied to f'[([x.sup.0]).sup.-1] f also guarantees the existence of a zero in the same ball. In this sense, the generalized Miranda theorem is the more general theorem. Our second major result, Theorem 4.3, establishes a similar relationship between Kantorovich's theorem and Borsuk's theorem. Here, we can even use the same mapping f in both theorems, i.e. the transition to f'[([x.sup.0]).sup.-1] f is not necessary. On the other hand, we have to restrict ourselves to the case [eta] > 0, i.e. f ([x.sup.0]) [not equal to] 0. If f ([x.sup.0]) = 0, Theorem 4.2 shows that f still satisfies the hypothesis of Borsuk's theorem for all [rho] [member of] (0, [[rho].sub.+]) (note that [[rho].sub.-] = 0 for [eta] = 0). Hence, the only situation where we did not establish the connection between Kantorovich and Borsuk is when we simultaneously have [eta] = 0 and [rho] = [[rho].sub.+]. The following example shows that then there indeed need not be such a connection.

EXAMPLE 1. Let f : R [right arrow] R, f(x) = x(1 - [absolute value of x]). Then f'(x) = 1 - 2[absolute value of x]. Take [x.sup.0] = 0 so that f([x.sup.0]) = 0, f'([x.sup.0]) = 1, and take [eta] = 0, [omega] = 2 in Kantorovich's theorem which thus gives [[rho].sub.-] = 0, [[rho].sub.+] = 1. Therefore, the Kantorovich theorem guarantees that [x.sup.0] is the unique zero of f in (-1, 1). But Borsuk's theorem cannot be applied to the ball [-1, 1], since f(-1) = f(1) = 0, i.e. we have f(x) = [lambda]f(-x) for all [lambda] > 0 and all x [member of] [partial derivative](-1, 1) = {-1, 1}.

REFERENCES

[1] G. Alefeld, F.A. Potra, and Z. Shen, On the existence theorems of Kuntorovich, Moore and Miranda, Comput. Suppl., 15 (2001), pp. 21-28.

[2] Karol Borsuk, Drei Satze uber die n-dimensionale Sphare, Fund. Math., 20 (1933), pp. 177-190.

[3] Klaus Deimling, Nonlinear Functional Analysis, Springer, Berlin, Heidelberg, New York, 1985.

[4] Peter Deuflhard and Gerhard Heindl, A, fine invariant convergence theorems for Newton's method and extensions to related methods, SIAM J. Numer. Anal., 16 (1980), pp. 1-10.

[5] L. Kantorovich, On Newton's method for functional equations (Russian), Dokl. Akad. Nauk SSSR, 59 (1948), pp.1237-1240.

[6] J. Mayer, A generalized theorem of Miranda and the theorem of Newton-Kantorovich, Numer. Funct. Anal. Optim., 23 (2002), nos. 3&4, pp. 333-357.

[7] C. Miranda, Un'osservatione su un teorema di Brouwer, Boll. Un. Mat. Ital., 3 (1940), no. 2, pp. 5 7.

[8] James Ortega and Werner Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.

[9] Werner C. Rheinboldt, On a theorem of S. Smale about Newton's method for analytic mappings, Appl. Math. Lett., 1 (1988), pp. 69-72.

[10] T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.

[11] J. Rohn, An existence theorem for systems of nonlinear equations, Z. Angew. Math. Mech., 60 (1980), no. 8, pp. 345.

[12] G. Sansone and R. Conti, Nonlinear Differential Equations, Pergamon Press, Oxford, 1964.

[13] Steve Smale, Algorithms for solving equations, in Proceedings of the International Congress of Mathematicians, Vol. 1, 2 (Berkeley, 1986), A. Gleason, ed., Amer. Math. Soc., Providence, RI, 1987, pp. 172-195.

[14] M. Vrahatis, A short proof and a generalization of Miranda's existence theorem, Proc. Amer. Math. Soc., 107 (1989), no. 3, pp. 701-703.

* Received April 1, 2003. Accepted for publication December 19, 2003. Recommended by Bill Gragg.

GOTZ ALEFELD ([dagger]), ANDREAS FROMMER ([double dagger]), GERHARD HEINDL ([double dagger]), AND JAN MAYER ([dagger])

([dagger]) Institut fur Angewandte Mathematik Universitat Karlsruhe, D-76128 Karlsruhe, Germany. E-mail: {goetz.alefeld,jan.mayer}@math.uni-karlsruhe.de.

([double dagger]) Fachbereich Mathematik and Naturwissenschaften, Universitat Wuppertal, D-42097 Wuppertal, Germany. E-mail: {frommer,heindl}@math.uni-wuppertal.de.

Key words. nonlinear equations, existence theorems, fixed points, Newton-Kantorovich theorem, Miranda theorem, Borsuk theorem.

AMS subject classifications. 47H10, 47J05, 65H10.

1. Introduction. In this paper we are concerned with two well-known classical theorems, both of which guarantee the existence of a zero of a nonlinear mapping f from a norm ball in [R.sup.n] to [R.sup.n]. These theorems are Kantorovich's theorem and Borsuk's theorem. Miranda's theorem, which we also consider, is essentially a special version of Borsuk's theorem in the case that the norm ball is a box, i.e. the norm is the maximum norm. Kantorovich's theorem and Borsuk's theorem apparently are very different in nature: Kantorovich's theorem is motivated by the analysis of Newton's iteration to approximate a zero of f and it gives an a priori criterion for the convergence of this iteration, in this manner proving that there is a zero of f within a certain ball centered at the initial guess for Newton's method. The major ingredients in its hypotheses are the Lipschitz-continuity of the derivative of f in a sufficiently large neighbourhood of the starting point and the assumption that the function value at the starting point is sufficiently small. On the other hand, Borsuk's theorem only requires the mapping f to be continuous on the ball and to fulfill a non-colinearity condition for the function values at all pairs of antipodal points on the boundary of the ball.

The purpose of this paper is to prove the remarkable fact that the Kantorovich theorem is (essentially) a special case of Borsuk's theorem in the sense that the hypotheses of the Kantorovich theorem imply those of Borsuk. For the case that the norm is the [l.sub.[infinity]]-norm this result was essentially already obtained in [1], where it was shown that the hypotheses of Kantorovich's theorem imply those of Miranda's theorem. In the present paper we formulate a version of Miranda's theorem holding for arbitrary norms, which is then proven to be 'in between' Kantorovich and Borsuk, i.e. more general than Kantorovich's but (essentially) more special than Borsuk's theorem.

Let us remark here that our results heavily rely on the finite dimension of the underlying vector space. While the Kantorovich theorem immediately extends to general Banach spaces, Borsuk's theorem does so only if one substantially restricts the class of mappings to be considered, for example to compact modifications of the identity or generalizations thereof. The rest of this paper is organized as follows: In the next section we give precise formulations of Kantorovich's theorem, of Miranda's theorem and of Borsuk's theorem. In section 3 we formulate and prove our generalization of Miranda's theorem for arbitrary norms. In section 4 we then come to the major result of this paper establishing the hierarchy of the theorems with respect to generality. Some conclusions are formulated in section 5.

2. The theorems of Kantorovich, Miranda and Borsuk. We start with Kantorovich's theorem. It can be stated in its 'standard' form and in an 'affine invariant' form. Although the latter is the more general one, the standard form is the one that can usually be found in textbooks. We therefore give both versions.

Here, as in the sequel, [parallel]x[parallel] denotes some arbitrary norm in [R.sup.n] and its corresponding operator norm. The closed ball with radius [rho] [greater than or equal to] [x.sup.0], centered at [x.sup.0], is

[bar.B]([x.sup.0], [rho]) = {x [member of] [R.sup.n] : [parallel]x - [x.sup.0][parallel] [less than or equal to] [rho]}.

B([x.sup.0], [rho]) and [partial derivative]B([x.sup.0], [rho]) denote the topological interior and boundary of [bar.B]([x.sup.0], [rho]), respectively.

THEOREM 2.1. (Kantorovich, standard form [8]) Let f : D [subset or equal to] [R.sup.n] [right arrow] [R.sup.n] be differentiable in the open convex set D. Assume that for some point [x.sup.0] [member of] D the Jacobian f'([x.sup.0]) is invertible with

[parallel]f'[([x.sup.0]).sup.-1][parallel] [less than or equal to] [beta], [parallel]f'[([x.sup.0]).sup.-1] f([x.sup.0])[parallel] [less than or equal to] eta.

Let there be a Lipschitz constant [kappa] > 0 for f' such that

[parallel]f'(u) - f'(v)[parallel] [less than or equal to] [kappa] x [parallel]u - v[parallel] for all u, v [member of] D.

If h = [eta][beta][kappa] [less than or equal to] 1/2 and [bar.B]([x.sup.0], [[rho].sub.-]) [subset or equal to] D, where

[[rho].sub.-] = 1 - [square root of (1 - 2h)]/[beta][kappa]

then f has a zero x* in [bar.B]([x.sup.0], [[rho].sub.-]). Moreover, this zero is the unique zero of f in ([bar.B]([x.sup.0], [[rho].sub.-]) [union] B([x.sup.0], [[rho].sub.+])) [intersection] D where [[rho].sub.+] = 1 + [square root of (1 - 2h)]/[beta][kappa] and the Newton iterates [x.sup.k] with

[x.sup.k+1] = [x.sup.k] - f'[([x.sup.k]).sup.-1] f([x.sup.k])

are well-defined, remain in [bar.B]([x.sup.0], [[rho].sub.-]) and converge to x*.

The following affine invariant form of the Kantorovich theorem is a generalization of the standard form as can be seen immediately by setting [omega] = [beta][kappa].

THEOREM 2.2. (Kantorovich, affine invariant form [4, 5]) Let f : D [subset or equal to] [R.sup.n] [right arrow] [R.sup.n] be differentiable in the open convex set D. Assume that for some point [x.sup.0] [member of] D the Jacobian f'([x.sup.0]) is invertible with

[parallel]f'[([x.sup.0]).sup.-1] f([x.sup.0])[parallel] [less than or equal to] [eta].

Let there be a Lipschitz constant [omega] > 0 for f'[([x.sup.0]).sup.-1] f' such that

[parallel]f'[([x.sup.0]).sup.-1](f'(u) - f'(v))[parallel] [less than or equal to] [omega] x [parallel]u - v[parallel] for all u, v [member of] D.

If h = [eta][omega] [less than or equal to] 1/2 and [bar.B]([x.sup.0], [[rho].sub.-]) [subset or equal to] D, where

[[rho].sub.-] = 1 - [square root of (1 - 2h)]/[omega]

then [iota] has a zero [x.sup.*] in [bar.B]([x.sup.0], [[rho].sub.-]). Moreover, this zero is the unique zero of f in ([bar.B]([x.sup.0], [[rho].sub.-]) [union] B([x.sup.0], [[rho].sub.+])) [intersection] D where [[rho].sub.+] = 1 + [square root of (1 - 2h)]/[omega] and the Newton iterates [x.sup.k] with

[x.sup.k+1] = [x.sup.k] - f'[([x.sup.k]).sup.-1] f([x.sup.k])

are well-defined, remain in [bar.B]([x.sup.0], [[rho].sub.-]) and converge to [x.sup.*].

Note that in this theorem one may leave the values of [eta] and [omega] unchanged after transformations f [right arrow] A x f for any non-singular matrix A [member of] [R.sup.nxn]. Therefore, the theorem holds irrespective of linear transformations whence the name 'affine invariant form'.

Note also that [omega] will often be much smaller than [beta][kappa]. It is therefore not difficult to construct examples where for a given differentiable mapping f and a given point [x.sup.0], the main assumption [eta][beta][kappa] [less than or equal to] 1/2 of the standard theorem is not fulfilled, whereas the assumption [eta][omega] [less than or equal to] 1/2 in the affine invariant theorem is met. In this sense, Theorem 2.2 is more general than Theorem 2.1. We now turn to formulate Borsuk's theorem. Let us say that a set B [subset or equal to] [R.sup.n] is symmetric with respect to [x.sup.0] [member of] [R.sup.n], if for all y [member of] [R.sup.n] we have [x.sup.0] + y [member of] B [??] [x.sup.0] - y [member of] B.

Then Borsuk's theorem can be stated as follows.

THEOREM 2.3. (Borsuk [2, 3]) Let B [subset or equal to] [R.sup.n] be open, bounded, convex and symmetric with respect to [x.sup.0] [member of] B. Let f : B [R.sup.n] be a continuous mapping, assume that f(x) [not equal to] 0 on [partial derivative]B and that

f([x.sup.0] + y) [not equal to] [lambda]f([x.sup.0] - y) for all [lambda] > 0 and all [x.sup.0] + y [member of] [partial derivative]B. (2.1)

Then f has a zero in B.

Often, this theorem is stated in terms of the mapping h defined by h(y) = f([x.sup.0] + y) on B' = B - [x.sup.0] = {y [member of] [R.sup.n] : y = x - [x.sup.0], x [member of] B}. Condition (2.1) then reads

h(y) [not equal to] [lambda]h(-y) for all [lambda] > 0 and all y [member of] [partial derivative]B',

where B' is open, bounded, convex and symmetric with respect to the origin 0 [member of] B'.

Very interestingly, Borsuk's theorem is 'naturally' affine invariant: If f satisfies (2.1), then A x f satisfies (2.1), too, for any non-singular matrix A [member of] [R.sup.nxn].

In our comparisons to the Kantorovich theorem, it will be useful to consider Borsuk's theorem on balls B = B([x.sup.0], [rho]), [rho] > 0, with respect to a given norm [parallel]x[parallel]. This is just an apparent restriction, since in our finite-dimensional setting, any set B satisfying the assumptions of Borsuk's theorem, Theorem 2.3, is in fact a norm ball with respect to the Minkowski functional corresponding to B (and its center [x.sup.0]).

If, in addition, we do not want to exclude a zero on the boundary of B, we arrive at the following immediate corollary to Theorem 2.3.

COROLLARY 2.4. Let f : [bar.B]([x.sup.0], [rho]) [right arrow] [R.sup.n] be a continuous mapping. Assume that

f([x.sup.0] + y) [not equal to] [lambda]f([x.sup.0] - y) for all [lambda] > 0 and all [x.sup.0] + y [member of] [partial derivative]B([x.sup.0], [rho]). (2.2)

Then f has a zero in [bar.B]([x.sup.0], [rho]).

We finally formulate Miranda's theorem. This theorem works with the [l.sub.[infinity]]-norm and looks at components of f on the faces of an [l.sub.[infinity]]-ball which is a hypercube. We write [B.sub.[infinity]], ([x.sup.0], [rho]) to denote such a ball centered at [x.sup.0] with its faces given as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Then Miranda's theorem can be stated as follows.

THEOREM 2.5. (Miranda [7]) Let f : [[bar.B].sub.[infinity]]([x.sup.0], [rho]) [subset or equal to] [R.sup.n] [right arrow] [R.sup.n] be a continuous mapping.

Assume that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2.3)

Then f has at least one zero [x.sup.*] in [[bar.B].sub.[infinity]] ([x.sup.0], [rho]).

In [12] it is shown that Miranda's theorem is equivalent to Brouwer's fixed point theorem (for [l.sub.[infinity]]-balls).

3. Generalization of Miranda's Theorem. Miranda's original theorem has been generalized in several different directions before, see e.g. [11], [14] and [6]. For any of these generalizations, however, it has not been shown that it contains Kantorovich's theorem as a special case, i.e. that whenever Kantorovich's theorem guarantees the existence of a zero then the respective generalization would guarantee the existence of such a zero, too. Indeed, there are examples which show that this is not always the case.

In Theorem 3.2 we present a new generalization of Miranda's theorem which does contain Kantorovich's theorem: As we will show in Theorem 4.1, whenever Kantorovich's theorem guarantees the existence of a zero, our generalization of Miranda's theorem does so, too.

Observe that (2.3) can be interpreted as saying that at each point x on the boundary of [B.sub.[inifinity]]([x.sup.0], [rho]) the image f (x) points in an 'outside' direction. This interpretation is the basis for our generalization of Miranda's theorem to balls with respect to an arbitrary norm formulated as Theorem 3.2 below.

This generalization uses the concept of normal vectors. Let <x,x> denote the usual inner product on [R.sup.n]. We say that the vector a [member of] [R.sup.n] is normal to the open convex set C [subset of equal to] [R.sup.n] at x [member of] [partial derivative]C iff <a, x - y> > 0 for all y [member of] C, i.e. if a is a nonzero vector normal to [bar.C] at x in the sense of [10]. By the Hahn-Banach-Theorem, there exists at least one vector normal to C at each x [member of] [partial derivative]C. If C is a ball B([x.sup.0], [rho]), [rho] > 0, with respect to some norm and [parallel]x[parallel] and [[parallel]x[parallel].sub.d] its dual norm

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3.1)

then the vectors normal to C at x can be characterized as follows:

LEMMA 3.1. For any [rho] > 0, the vector a [member of] [R.sup.n] is normal to B([x.sup.0], [rho]) at x [member of] [partial derivative]B([x.sup.0], [rho]) iff a is a positive multiple of some a' [member of] [R.sup.n] for which

[[parallel]a'[parallel].sub.d] = 1 and (a', x - [x.sup.0]) = [rho].

Proof. a is normal to B([x.sup.0], [rho]) at x [member of] [partial derivative]B([x.sup.0], [rho]) iff there is a [lambda] > 0 such that a [member of] [lambda] x [partial derivative][psi](x), where [partial derivative][psi](x) denotes the subdifferential of the convex function

[psi] : [R.sup.n] [right arrow] R, y [??] 1/[rho] [parallel]y - [x.sup.0][parallel],

see e.g. [10, Cor. 23.7.1]. We therefore show

[partial derivative][psi](x) = 1/[rho] {a' [member of] [R.sup.n] : [[parallel]a'[parallel].sub.d] = 1 and <a', x - [x.sup.0]> = [rho]}.

To this purpose we first observe that if a [member of] [partial derivative][psi](x), i.e. [psi](y) [greater than or equal to] [psi](x) + <a, y - x> for all y [member of] [R.sup.n], then

<a, h> [less than or equal to] 1/[rho][parallel]x - [x.sup.0] + h[parallel] - 1 [less than or equal to] 1/[rho][parallel]h[parallel]

holds for all h [member of] [R.sup.n]. Hence [[parallel]a[parallel].sub.d] [less than or equal to] (1/[rho]) and <a, [x.sup.0] - x> [less than or equal to] -1. But this implies [[parallel]a[parallel].sub.d] = (1/[rho]) and <a, x - [x.sup.0]> = 1 since 1 [less than or equal to] <a, x - [x.sup.0]> [less than or equal to] [[parallel]a[parallel].sub.d][parallel]x - [x.sup.0][parallel] [less than or equal to] 1. Consequently, for a' = [rho]a we have [[parallel]a'[parallel].sub.d] = 1 and <a', x - [x.sup.0]> = [rho].

Conversely, if [[parallel]a'[parallel].sub.d] = 1, <a', x - [x.sup.0]> = [rho] and a = (1/[rho])a', then for any y [member of] [R.sup.n] we have

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

showing a [member of] [partial derivative][psi](x).

THEOREM 3.2. Let B([x.sup.0], [rho]), [rho] > 0, be an open ball with respect to an arbitrary norm [parallel]x[parallel]. Let f : B([x.sup.0], [rho]) [right arrow] [R.sup.n] be continuous and assume that for all x [member of] [partial derivative]B([x.sup.0], [rho]) there exists a vector a normal to B([x.sup.0], [rho]) at x such that

<f(x), a> [greater than or equal to] 0. (3.2)

Then

a) the relation (3.2) actually holds for all vectors a normal to B([x.sup.0], [rho]) at x,

b) f has a zero in [bar.B]([x.sup.0], [rho]).

Proof. To prove part a), we first note that [10, Cor. 23.7.1], which we already used in the proof of Lemma 3.1, implies that it is sufficient to show that for the function [psi] : y [??] (1/[rho]) [parallel]y - [x.sup.0][parallel] we have

<f(x) , a> [greater than or equal to] 0 for all x [member of] [partial derivative]B([x.sup.0], [rho]) and all a [member of] [partial derivative][psi](x).

Secondly, we remark that it is known (see e.g. [10, Th. 25.6]) that

[partial derivative][psi](x) = [bar.conv S (x)]

where a [member of] S(x) iff there is a sequence [x.sup.i] in [R.sup.n] such that for all i the convex function [psi] is differentiable in [x.sup.i], i.e. [partial derivative][psi]([x.sup.i]) = {[psi]'([x.sup.i])}, and such that [lim.sub.i[right arrow][infinity]] [x.sup.i] = x and [lim.sub.i[right arrow][infinity]] [psi]'([x.sup.i]) = a.

For x [member of] [partial derivative]B([x.sup.0], [rho]) let [x.sup.i] denote such a sequence. W.l.o.g. we can assume [x.sup.i] [not equal to] [x.sup.0] for all i. Due to the homogeneity of [parallel]x[parallel], the function [psi] is differentiable not only in [x.sup.i] but also in [??] = [x.sup.0]+(1/[psi]([x.sup.i]))([x.sup.i]-[x.sup.0]) [member of] [partial derivative]B([x.sup.0], [rho]) with [psi]'([??]) = [psi]'([x.sup.i]). Hence [lim.sub.i[right arrow][infinity]] [psi]'([??]) = a and obviously also [lim.sub.i[right arrow][infinity]] [??] = x. At [??] every vector normal to B([x.sup.0], [rho]) is a positive multiple of [psi]'([??]), so by assumption we have <f([??]), [psi]'([??])> [greater than or equal to] 0 and since f is continuous in x we can conclude <f(x), a) [greater than or equal to] 0. We therefore have <f(x), a) [greater than or equal to] 0 for all a [member of] S(x), and since <x,x> is linear and continuous in its second argument we even have <f(x), a) [greater than or equal to] 0 for all a [member of] [bar.conv S(x)].

Part b) is now easily proved via Borsuk's theorem. Let [epsilon] > 0 and take

[f.sub.[epsilon]](x) = f(x) + [epsilon](x - [x.sup.0]).

By a) we know that <f(x), a> [greater than or equal to] 0 for all vectors a normal to B([x.sup.0], [rho]) at x, and thus <[f.sub.[epsilon]](x), a> > 0 for all such a. By Corollary 2.4 and Lemma 3.3 below the mapping [f.sub.[epsilon]] has a zero in [bar.B]([x.sup.0], [rho]). So any limit point of the sequence of these zeros for [epsilon] = 1/n, n [member of] N is a zero of f.

The proof above makes use of the following auxiliary result which we will need again in section 4.

LEMMA 3.3. Let B([x.sup.0], [rho]), [rho] > 0, be an open ball with respect to an arbitrary norm [parallel]x[parallel]. Let f : [bar.B]([x.sup.0], [rho]) [right arrow] [R.sup.n] be continuous and assume that for all x [member of] [partial derivative]B([x.sup.0], [rho]) and for all vectors a normal to B([x.sup.0], [rho]) at x we have

<f(x), a> > 0.

Then for all x = [x.sup.0] + y [member of] [partial derivative]B([x.sup.0], [rho]) and for all [lambda] > 0 we have

f([x.sup.0] + y) [not equal to] [lambda]f([x.sup.0] - y).

Proof. If a is normal to B ([x.sup.0], [rho]) at x = [x.sup.0] + y, its negative -a is normal to B ([x.sup.0], [rho]) at [x.sup.0] - y. So, if we had f([x.sup.0] + y) = [lambda]f([x.sup.0] - y) for some [lambda] > 0, we would arrive at

0 < <f([x.sup.0] + y), a> = <[lambda]f([x.sup.0] - y), a> = -[lambda]<f([x.sup.0] - y), -a>,

which is impossible since <f([x.sup.0] - y), -a> > 0.

Just in passing, let us note that Theorem 3.2 remains valid if we replace B([x.sup.0], [rho]) by an arbitrary non-empty open bounded convex set. Part a) can then be proved by replacing [psi](x) by p(x - [x.sup.0]) where [x.sup.0] is an arbitrary point of C and [rho] is the Minkowski functional of C - [x.sup.0]. A proof for b) can easily be derived from the following proposition, which is just the application of the Leray-Schauder-Theorem [8, 6.3.3] to the mapping g with g(x) = x - f([x.sup.0] + x), x [member of] C - [x.sup.0].

PROPOSITION 3.4. Let C be a non-empty open bounded subset of [R.sup.n] and f : [bar.C] [right arrow] [R.sup.n] a continuous mapping. Assume that there is an [x.sup.0] [member of] C such that for all x [member of] [partial derivative]C

f(x) [not member of] {[lambda](x - [x.sup.0]) : [lambda] < 0}.

Then f has a zero in [bar.C].

4. The hierarchy with respect to generality. In this section we prove our central results. We show that the Kantorovich theorem for an arbitrary norm is a special case of the generalized Miranda theorem, Theorem 3.2. In addition we show that the Kantorovich theorem for an arbitrary norm is also a special case of Borsuk's theorem. This hierarchy is illustrated in Figure 1.

We will use the numbers [[rho].sub.-], [[rho].sub.+] which have been defined in Theorem 2.2.

THEOREM 4.1. Let f satisfy all assumptions of the affine invariant form of the Kantorovich theorem (Theorem 2.2). Put

g : D [right arrow] [R.sup.n], x [??] f'[([x.sup.0]).sup.-1] f(x)

and consider any positive [rho] [meber of] [[[rho].sub.-], [[rho].sub.+]] such that [bar.B]([x.sup.0], [rho]) [subset or equal to] D. Then for all x [member of] [partial derivative]B([x.sup.0], [rho]) and for all normals a to B([x.sup.0], [rho]) at x we have

<g(x), a> [greater than or equal to] 0, (4.1)

which is the hypothesis of the generalized Miranda theorem (Theorem 3.2) for the mapping g and the ball B([x.sup.0], [rho]).

Proof. Let x [member of] [partial derivative]B([x.sup.0], [rho]) and a a normal to B([x.sup.0], [rho]) at x. Because of Lemma 3.1, we can assume [[parallel]a[parallel].sub.d] = 1 and <a, x - [x.sup.0]> = [rho]. We first prove

[absolute value of <g(x) - g([x.sup.0]), a> - [rho]] [less than or equal to] [omega]/2 [[rho].sup.2]. (4.2)

In order to do so, we define the continuously differentiable function [psi] : [0,1] [right arrow] R as

[psi](t) = <g([x.sup.0] + t(x - [x.sup.0])) - tg'([x.sup.0])(x - [x.sup.0]), a>.

[FIGURE 4.1 OMITTED]

Then

[psi]'(t) = <(g' ([x.sup.0] + t(x - [x.sup.0])) - g' ([x.sup.0]))(x - [x.sup.0]), a>

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

On the other hand

[psi](1) - [psi](0) = <g(x) - g'([x.sup.0])(x - [x.sup.0]), a> - <g([x.sup.0]), a> and g'([x.sup.0]) = id as well as <x - [x.sup.0] , a> = [rho], which shows (4.2). ?From (4.2) we conclude

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In the proof of Theorem 4.1 we have actually shown that (4.1) holds with strict inequality as soon as [rho] [member of] ([[rho].sub.-], [[rho].sub.+]). Using Lemma 3.3 we thus arrive at the following result.

THEOREM 4.2. Let f satisfy all assumptions of the affine invariant form of the Kantorovich theorem. Then f satisfies (2.2) for all balls [bar.B]([x.sup.0], [rho]) [subset or equal to] D with [rho] [member of] ([[rho].sub.-], [[rho].sub.+]).

Proof. The discussion above already showed that g = f'[([x.sup.0]).sup.-1] f satisfies (2.2). But (2.2) remains invariant under affine transformations, so f satisfies (2.2), too.

Of course, it would be more satisfying if in the above theorem one could take the closed interval [[[rho].sub.-], [[rho].sub.+]] for [rho] instead of just the open interval. As we will show now, this is indeed so, as soon as [eta] > 0, i.e. if f ([x.sup.0]) [not equal to] 0. This result is proved in our final theorem, where we have to go a direct way without using the generalized Miranda Theorem.

THEOREM 4.3. Let f satisfy all assumptions of the affine invariant form of the Kantorovich theorem (Theorem 2.2) and exclude the case [eta] = 0. Then f satisfies (2.2) for all [rho] [member of] [[[rho].sub.-], [[rho].sub.+]] such that [bar.B]([x.sub.0], [rho]) [subset or equal to] D.

Proof. We will show that (2.2) holds for g = f'[([x.sup.0]).sup.-1] f. It then also holds for f. To start, let [x.sup.0] + y [member of] [partial derivative]B([x.sup.0], [rho]) and assume that (2.2) does not hold, i.e. we have [lambda] [member of] (0, [infinity]) such that

g([x.sup.0] + y) = [lambda]g([x.sup.0] - y). (4.3)

Replacing, if necessary, y by -y, we can even assume [lambda] [greater than or equal to] 1. Let a be an arbitrary vector to be specified later, and set

[psi](t) = <g([x.sup.0] + ty) - tg'([x.sup.0])y, a>.

Then

[psi](t) = <(g'([x.sup.0] + ty) - g'([x.sup.0])) y, a>

and, since g'([x.sup.0]) = id, we have

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

which gives

<g([x.sup.0] + y), a> = <y, a> + <g([x.sup.0]), a> + [[integral].sup.1.sub.0] <(g'([x.sup.0] + ty) - g'([x.sup.0])) y, a> dt, (4.4)

and, similarly for -y instead of y

<g([x.sup.0] - y) , a> = -<y, a> + <g([x.sup.0]), a> - [[integral].sup.1.sub.0] <(g'([x.sup.0] - ty) - g'([x.sup.0])) y, a> dt. (4.5)

By (4.3) we have

<g([x.sup.0] + y), a> = [lambda]<g([x.sup.0] - y), a>,

which, using (4.4) and (4.5) and after rearranging terms gives

(1 + [lambda])<y, a> = ([lambda] - 1)<g([x.sup.0]), a> - [[integral].sup.1.sub.0] <(g'([x.sup.0] + ty) - g'([x.sup.0]))y, a> dt

- [lambda] [[integral].sup.1.sub.0 <(g'([x.sup.0] - ty) - g'([x.sup.0]))y, a> dt. (4.6)

Now take a such that [[parallel]a[parallel].sub.d] = 1 and <y, a> = [parallel]y[parallel]. Such a exists, since ([R.sup.n], [parallel] x [parallel]) is identical to its bidual, i.e.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In exactly the same manner as in the proof of Theorem 4.1, bounding the terms on the right hand side of (4.6), we get

(1 + [lambda]) [parallel]y[parallel] [less than or equal to] ([lambda] - 1)[parallel]g([x.sup.0])[parallel] + [omega]/2 x [[parallel]y[parallel].sup.2] + [lambda] [omega]/2[[parallel]y[parallel].sup.2]

where [parallel]y[parallel] = [rho] and [parallel]g([x.sup.0][parallel] [less than or equal to] [eta]. We therefore have

(1 + [lambda])[rho] [less than or equal to] ([lambda] - 1)[eta] + (1 + [lambda])[omega]/2 [[rho].sup.2]

or

(1 + [lambda]) ([omega]/2 [[rho].sup.2] - [rho] + [eta]) - 2[eta] [greater than or equal to] 0.

But this is impossible, because for [rho] [member of] [[[rho].sub.-], [[rho].sub.+]] the first summand is non-positive and [eta] > 0. Therefore, (4.3) does not hold, and since [x.sup.0] + y was chosen to be an arbitrary point from [partial derivative]B([x.sup.0], [rho]) we have shown that g satisfies (2.2).

As was suggested by an anonymous referee, there is another well-known theorem on the existence of zeros which adds an additional stage to the hierarchy presented so far. This theorem is Smale's theorem [13], which in the finite-dimensional case may be stated as follows:

THEOREM 4.4. Let f : [R.sup.n] [right arrow] [R.sup.n] be analytic in [R.sup.n] and for [x.sup.0] [member of] [R.sup.n] let

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Then, if [beta]([x.sup.0], f)[gamma]([x.sup.0], f) < [[alpha].sub.0], where [[alpha].sub.o] is an invariant, approximately equal to 0.130707, the iterates of Newton's method starting with [x.sup.0] converge to a zero x* of f.

This result is interesting because it proves convergence of Newton's method from information at just one point [x.sup.0].

As was already mentioned in [13], this result is in fact a special case of the affine invariant version of the Newton-Kantorovich theorem (Theorem 2.2), meaning that the hypotheses of Theorem 4.4 imply those of Theorem 2.2. As was shown by Rheinboldt [9], Theorem 4.4 may be generalized to a local version where f is assumed to be analytic only on some open subset of [R.sup.n]. The proof of that result shows that, again, we are in the presence of a special case of the Newton-Kantorovich theorem.

5. Conclusions. In this paper we have established a hierarchy with respect to generality between Kantorovich's theorem (which contains Smale's theorem), Miranda's theorem and its generalization and Borsuk's theorem. This hierarchy is meant only with respect to the existence of a zero. While the Kantorovich theorem also guarantees the uniqueness of the zero (and the convergence of Newton's method), the other theorems only partly address this aspect: they actually guarantee that the topological degree of the mapping is odd, see [3]. We have proven two major results. The first (Theorem 4.1) shows that if Kantorovich's theorem (Theorem 2.2) guarantees the existence of a zero of f in a ball B([x.sup.0], [rho]), [rho] > 0, [[rho].sub.-] [less than or equal to] [rho] [less than or equal to] [[rho].sub.+] with B([x.sup.0], [rho]) [subset or equal to] D, then the generalized Miranda theorem (Theorem 3.2), applied to f'[([x.sup.0]).sup.-1] f also guarantees the existence of a zero in the same ball. In this sense, the generalized Miranda theorem is the more general theorem. Our second major result, Theorem 4.3, establishes a similar relationship between Kantorovich's theorem and Borsuk's theorem. Here, we can even use the same mapping f in both theorems, i.e. the transition to f'[([x.sup.0]).sup.-1] f is not necessary. On the other hand, we have to restrict ourselves to the case [eta] > 0, i.e. f ([x.sup.0]) [not equal to] 0. If f ([x.sup.0]) = 0, Theorem 4.2 shows that f still satisfies the hypothesis of Borsuk's theorem for all [rho] [member of] (0, [[rho].sub.+]) (note that [[rho].sub.-] = 0 for [eta] = 0). Hence, the only situation where we did not establish the connection between Kantorovich and Borsuk is when we simultaneously have [eta] = 0 and [rho] = [[rho].sub.+]. The following example shows that then there indeed need not be such a connection.

EXAMPLE 1. Let f : R [right arrow] R, f(x) = x(1 - [absolute value of x]). Then f'(x) = 1 - 2[absolute value of x]. Take [x.sup.0] = 0 so that f([x.sup.0]) = 0, f'([x.sup.0]) = 1, and take [eta] = 0, [omega] = 2 in Kantorovich's theorem which thus gives [[rho].sub.-] = 0, [[rho].sub.+] = 1. Therefore, the Kantorovich theorem guarantees that [x.sup.0] is the unique zero of f in (-1, 1). But Borsuk's theorem cannot be applied to the ball [-1, 1], since f(-1) = f(1) = 0, i.e. we have f(x) = [lambda]f(-x) for all [lambda] > 0 and all x [member of] [partial derivative](-1, 1) = {-1, 1}.

REFERENCES

[1] G. Alefeld, F.A. Potra, and Z. Shen, On the existence theorems of Kuntorovich, Moore and Miranda, Comput. Suppl., 15 (2001), pp. 21-28.

[2] Karol Borsuk, Drei Satze uber die n-dimensionale Sphare, Fund. Math., 20 (1933), pp. 177-190.

[3] Klaus Deimling, Nonlinear Functional Analysis, Springer, Berlin, Heidelberg, New York, 1985.

[4] Peter Deuflhard and Gerhard Heindl, A, fine invariant convergence theorems for Newton's method and extensions to related methods, SIAM J. Numer. Anal., 16 (1980), pp. 1-10.

[5] L. Kantorovich, On Newton's method for functional equations (Russian), Dokl. Akad. Nauk SSSR, 59 (1948), pp.1237-1240.

[6] J. Mayer, A generalized theorem of Miranda and the theorem of Newton-Kantorovich, Numer. Funct. Anal. Optim., 23 (2002), nos. 3&4, pp. 333-357.

[7] C. Miranda, Un'osservatione su un teorema di Brouwer, Boll. Un. Mat. Ital., 3 (1940), no. 2, pp. 5 7.

[8] James Ortega and Werner Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.

[9] Werner C. Rheinboldt, On a theorem of S. Smale about Newton's method for analytic mappings, Appl. Math. Lett., 1 (1988), pp. 69-72.

[10] T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.

[11] J. Rohn, An existence theorem for systems of nonlinear equations, Z. Angew. Math. Mech., 60 (1980), no. 8, pp. 345.

[12] G. Sansone and R. Conti, Nonlinear Differential Equations, Pergamon Press, Oxford, 1964.

[13] Steve Smale, Algorithms for solving equations, in Proceedings of the International Congress of Mathematicians, Vol. 1, 2 (Berkeley, 1986), A. Gleason, ed., Amer. Math. Soc., Providence, RI, 1987, pp. 172-195.

[14] M. Vrahatis, A short proof and a generalization of Miranda's existence theorem, Proc. Amer. Math. Soc., 107 (1989), no. 3, pp. 701-703.

* Received April 1, 2003. Accepted for publication December 19, 2003. Recommended by Bill Gragg.

GOTZ ALEFELD ([dagger]), ANDREAS FROMMER ([double dagger]), GERHARD HEINDL ([double dagger]), AND JAN MAYER ([dagger])

([dagger]) Institut fur Angewandte Mathematik Universitat Karlsruhe, D-76128 Karlsruhe, Germany. E-mail: {goetz.alefeld,jan.mayer}@math.uni-karlsruhe.de.

([double dagger]) Fachbereich Mathematik and Naturwissenschaften, Universitat Wuppertal, D-42097 Wuppertal, Germany. E-mail: {frommer,heindl}@math.uni-wuppertal.de.

Printer friendly Cite/link Email Feedback | |

Author: | Alefeld, Gotz; Frommer, Andreas; Heindl, Gerhard; Mayer, Jan |
---|---|

Publication: | Electronic Transactions on Numerical Analysis |

Date: | Jan 1, 2004 |

Words: | 5873 |

Previous Article: | On the estimation of the q-numerical range of monic matrix polynomials. |

Next Article: | Collocation methods for Cauchy singular integral equations on the interval. |

Kantorovich-type generalized sampling series in the setting of Orlicz spaces. |

Fixed point theory and applications; v.6. |