
On-Line Analysis of the TCP Acknowledgment Delay Problem.

Abstract. We study an on-line problem that is motivated by the networking problem of dynamically adjusting delays of acknowledgments in the Transmission Control Protocol (TCP). We provide a theoretical model for this problem in which the goal is to send acks at times that minimize a linear combination of the cost for the number of acknowledgments sent and the cost for the additional latency introduced by delaying acknowledgments. To study the usefulness of applying packet arrival time prediction to this problem, we assume there is an oracle that provides the algorithm with the times of the next L arrivals, for some L [is greater than or equal to] 0.

We give two different objective functions for measuring the cost of a solution, each with its own measure of latency cost. For each objective function we first give an O([n.sup.2])-time dynamic programming algorithm for optimally solving the off-line problem. Then we describe an on-line algorithm that greedily acknowledges exactly when the cost for an acknowledgment is less than the latency cost incurred by not acknowledging. We show that there is a sequence of n packet arrivals for which this algorithm is [Omega]([square root of n])-competitive under the first objective function, and that it is 2-competitive under the second objective function for L = 0 and 1-competitive under the second objective function for L = 1. Next we present a second on-line algorithm, a slight modification of the first, and we prove that it is 2-competitive under both objective functions for all L. We also give lower bounds on the competitive ratio for any deterministic on-line algorithm. These results show that for each objective function, at least one of our algorithms is optimal.

Finally, we give some initial empirical results using arrival sequences from real network traffic where we compare the two methods used in TCP for acknowledgment delay with our two on-line algorithms. In all cases we examine performance with L = 0 and L = 1.

Categories and Subject Descriptors: F.1.2 [Computation by Abstract Devices]: Modes of Computation--online computation; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems--sequencing and scheduling; C.2.2 [Computer-Communication Networks]: Network Protocols--applications

General Terms: Algorithms

Additional Key Words and Phrases: Acknowledgment delay problem, competitive analysis, internet traffic simulations, lookahead, Transmission Control Protocol (TCP)

1. Introduction

In this paper, we study an on-line problem that is motivated by the networking problem of dynamically adjusting delays of acknowledgments in the Transmission Control Protocol (TCP) [Stevens 1994], a very popular protocol in use in the Internet. The specific problem we study is the following. There is a sequence of n packet arrival times A = <[a.sub.1], ..., [a.sub.n]>. The goal is to partition A into k subsequences [[Sigma].sub.1], [[Sigma].sub.2], ..., [[Sigma].sub.k], where the end of each subsequence is defined by an acknowledgment. We use [[Sigma].sub.i] to denote the set of arrivals in the ith partition and [t.sub.i] to denote the time when the acknowledgment for [[Sigma].sub.i] is sent. To capture the fact that all arrivals must be acknowledged, we must have k [is greater than or equal to] 1 and [t.sub.k] [is greater than or equal to] [a.sub.n], the time of the last arrival.

By delaying the acknowledgments for some packets, we reduce the number of acknowledgments sent (and the associated costs) from n to k. However, delaying an acknowledgment adds to the latency costs(1), which we want to avoid since it can result in bursty TCP traffic, with potentially disastrous consequences [Paxson 1997]. Thus, one wants to optimally balance the costs of an acknowledgment with the costs associated with the increased latency. We model the tradeoff between the cost of an acknowledgment and the cost for the latency by using the objective function [Eta]k + (1 - [Eta]) [summation over (i = 1 to k)] lat([[Sigma].sub.i], [t.sub.i]), where lat([[Sigma].sub.i], [t.sub.i]) measures the extra latency for the arrivals in [[Sigma].sub.i] when the acknowledgment is sent at time [t.sub.i]. Section 3 describes our two measures of lat, which are appropriate for different circumstances. In an optimal solution, [t.sub.i] will always be the time when the last packet of [[Sigma].sub.i] arrives. The factor [Eta] [element of] [0, 1] weights the relative importance of minimizing acknowledgments and minimizing latency, and would likely be set by a network administrator.

For the real networking problem, past traffic characteristics might be useful in estimating the time of the next arrival(s), which in turn could be used to decide when to send an acknowledgment, an idea endorsed by Clark [1982]. In order to understand the value of designing learning algorithms for this problem, in this paper we study the performance of our acknowledgment delay algorithm when provided with an oracle that gives the exact time for the next L arrivals (we call L a lookahead coefficient). Namely, at the time of arrival [a.sub.i], the on-line algorithm is provided with the values for [a.sub.i+1], ..., [a.sub.i+L]. For all j [is greater than] n, we assume the oracle returns [a.sub.j] = [infinity]. We are primarily interested in studying small values of L, particularly L = 1. Clearly this work is just a first step towards our eventual goal of replacing the oracle by a learning algorithm. For example, if the lookahead oracle is replaced by a learning algorithm then it would be important to consider the case of a noisy oracle. There are also other approaches such as combining the control algorithm with prediction, that is, use reinforcement learning (e.g., Sutton and Barto [1998]) to learn a control policy that decides when to send acknowledgments based on the current state of the world. Fundamental to this is modeling the network as a fully or partially observable Markov decision process (MDP).

This paper is organized as follows. The real problem of delaying TCP acknowledgments is more complex than our above description, so in Section 2 we discuss in more detail some of the other constraints of this problem. For each of the on-line algorithms that we present, we will first consider the simplified setting described above. However, we will then describe how each of our algorithms and their proofs can be extended to cover the other constraints required in a real TCP implementation. Additionally, each of our on-line algorithms runs in constant time per arrival, which is important given the large number of arrivals a TCP implementation must contend with in a short period of time.

Section 3 describes the two objective functions that we consider and the circumstances under which they are appropriate measures of performance. In Section 4, we describe the key differences between the on-line problem considered here and some classic on-line problems such as the rent-to-buy (or spin-block) problem [Krishnan et al. 1999; Karlin et al. 1994] and the snoopy caching problem [Karlin et al. 1994]. Section 5 summarizes the results contained in this paper. All our results for our two objective functions appear in Sections 6 and 7. In each section, we present a dynamic programming algorithm for optimally solving the off-line version of the acknowledgment delay problem. We then describe algorithms [greedy.sub.tot] and [greedy.sub.new] and give their competitive bounds for the cases of no lookahead and non-zero lookahead. We then present lower bounds on the competitive ratio. Finally, we close each section with initial empirical results using arrival sequences from real network traffic where we compare the two methods currently used in TCP for acknowledgment delay with our two on-line algorithms. In all cases, we examine performance with L = 0 and L = 1. We conclude this paper in Section 8.

2. Motivation

Most TCP implementations used today employ some sort of acknowledgment delay mechanism. Delaying acknowledgments allows TCP to acknowledge multiple incoming data segments with a single acknowledgment and sometimes to piggy-back the acknowledgment on an outgoing data segment [Stevens 1994; Paxson 1997; Clark 1982]. By reducing the number of acknowledgments sent and received, we reduce bandwidth consumption and the overhead required to send and receive each acknowledgment (due to generating checksums, handling interrupts, etc.). So we want to minimize the number of acknowledgments sent, but not at the expense of adding excessive latency to the TCP connection.

Note that there is always space in a TCP packet header to indicate whether the current packet is also an acknowledgment and which packet(s) it acknowledges. Thus piggy-backing an acknowledgment on a packet costs no extra bandwidth, since that space in the header would have been otherwise unused [Stevens 1994]. If an acknowledgment is not piggy-backed, then a new packet, to be used only as an acknowledgment (otherwise known as a "pure ack" [Wright and Stevens 1995; Paxson 1997]), must be created. However, after creating the packet for the pure ack, there is no extra cost for acknowledging multiple packets with it.

TCP implementations, if they delay acknowledgments at all, use one of two approaches [Stevens 1994; Paxson 1997]. Solaris TCP uses an interval timer of 50 ms. When a packet arrives, a timer is started that expires in 50 ms, at which point an acknowledgment is sent for all currently unacknowledged packets. The BSD-derived TCPs use a 200 ms "heartbeat" timer. Here the timer fires every 200 ms independent of the arrivals of the packets. When the timer fires, all outstanding packets are acknowledged.
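These two schemes can be contrasted in a toy simulation. The sketch below is our own illustration, not code from any TCP implementation: the function names and sample arrival times are ours, times are in seconds, and we assume the simplest reading of each scheme (the interval timer is anchored at the first unacknowledged arrival; the heartbeat fires at fixed multiples of its period, regardless of arrivals).

```python
import math

def interval_timer_acks(arrivals, delay=0.050):
    """Solaris-style: a 50 ms timer starts at the first unacknowledged
    arrival; when it expires, all outstanding packets are acknowledged."""
    acks, timer = [], None
    for a in arrivals:
        if timer is not None and a > timer:
            acks.append(timer)      # timer fired before this arrival
            timer = None
        if timer is None:
            timer = a + delay       # start a fresh timer at this arrival
    if timer is not None:
        acks.append(timer)
    return acks

def heartbeat_acks(arrivals, period=0.200):
    """BSD-style: a free-running 200 ms timer; each firing acknowledges all
    packets outstanding at that moment. A packet arriving exactly at a
    firing instant is acknowledged immediately (zero extra delay)."""
    return sorted({math.ceil(a / period) * period for a in arrivals})
```

For instance, for arrivals at 0 s, 0.02 s, and 0.3 s, the interval timer sends acknowledgments at 0.05 s and 0.35 s, while the heartbeat timer for arrivals at 0.01 s, 0.05 s, and 0.31 s sends acknowledgments at 0.2 s and 0.4 s.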

One of our research goals for the acknowledgment delay problem is to use prior traffic patterns to learn a rule for predicting when the next arrival will be. Clark [1982] speculated that such a predictor could be very useful in dynamically adjusting the acknowledgment delay timer. However, having a very good prediction for the next arrival is only valuable if that information could help in deciding whether or not to delay the acknowledgment. Thus it is important to understand how having perfect knowledge of the next arrival time (i.e., a lookahead of L = 1) affects the performance that can be achieved. Similar arguments can be made for studying larger lookahead values.

To model the fact that different applications vary in the cost of an acknowledgment versus the cost of increased latency, we introduce an objective function, f(k, lat) = [Eta]k + (1 - [Eta])lat, to measure the performance where k is the total number of acknowledgments, lat is the total extra latency(2) summed over all [[Sigma].sub.i], and [Eta] is a factor from [0, 1] that weights the relative importance of minimizing acknowledgments and minimizing latency. An algorithm for this problem receives a sequence of arrival times A = <[a.sub.1], ..., [a.sub.n]> interspersed with a sequence of departure times D = <[d.sub.1] ..., [d.sub.m]> and schedules a sequence of acknowledgments that minimizes f(k, lat) for a given [Eta]. The following restrictions apply to the solution. (1) The arrival of a rush packet requires an immediate acknowledgment and a rush departure requires an immediate transmission. A packet is rush if it falls into one of several categories such as packets that are used in the maintenance of a connection like a SYN or FIN [Paxson 1997; Braden 1989; Clark 1982]. (2) The maximum allowed delay of any acknowledgment and any departure is [Tau] [is less than or equal to] 500 ms [Braden 1989]. (3) Each departure can only be combined with acknowledgments (i.e., two departures cannot be combined), and if a pending departure is a non-rush departure then we may choose to delay the departure if we believe that another packet is about to arrive (and hence we can piggyback the acknowledgment on the departing packet). However, we are not allowed to delay any departure [d.sub.i] past the point when a later departure ([d.sub.j], j [is greater than] i) is transmitted, that is, the departures must be transmitted in the order of their ready times given in D. We assume that the latency from delaying a non-rush departure is equivalent to the latency from delaying the acknowledgment of an arrival.

Modifying an algorithm to handle the first restriction (immediate acknowledgment of rush packets and immediate transmission of rush departures) is trivial. Because there is no choice as to whether or not to immediately acknowledge or transmit a rush packet (and hence end the subsequence), the rush packets simply partition the original problem into a set of subproblems for which our algorithms can be applied. Handling the other two restrictions depends on the particular algorithm. In presenting our algorithms, we initially ignore the maximum delay restriction. We also assume that all departures are rush (requiring immediate transmission), so we can visualize A as being partitioned into subsequences, where each subsequence is delimited by a departure. Hence we can use these departure-free subsequences as inputs to our algorithms, so we introduce our algorithms assuming there is no maximum delay [Tau] and there are only rush departures. We later argue that our algorithms can be easily modified to accommodate the general setting and that our on-line algorithms' decisions as to when to send an acknowledgment (or departure) can be made using constant time per arrival.

A possible impediment to implementing our on-line algorithms is the need to perform more maintenance on timers in the operating system than we would in the schemes currently used in TCP implementations. This would normally be very expensive, but is mitigated by the work of Costello and Varghese [1998] who reimplemented the BSD kernel to improve the overhead of maintaining timers from linear in the number of timers to constant time. This of course requires the use of the new kernel, but it shows that the requirement of extra timer maintenance need not be a severe problem.

3. Our Objective Functions

As discussed in the introduction, to model the networking problem of dynamically adjusting delays of acknowledgments in TCP, we selected an objective function (under two different ways of measuring latency) to trade off the cost per acknowledgment with the latency cost incurred due to the delay in acknowledging a packet. Clearly, the choice of the objective function affects the decision made by an optimal algorithm. In this section, we explore two definitions of latency cost for use in our generic objective function, which is defined as follows:

Definition 1

(1) [f.sub.lat](A) = [Eta]k + (1 - [Eta]) [summation over (i = 1 to k)] lat([[Sigma].sub.i], [t.sub.i]),

where k is the number of acknowledgments sent, [Eta] [Epsilon] [0, 1] is the factor that weights the cost of acknowledging versus the cost of waiting, and lat([[Sigma].sub.i], [t.sub.i]) is a measure of the total extra latency for subsequence [[Sigma].sub.i] when its acknowledgment is sent at time [t.sub.i]. It can be any function that increases as an acknowledgment is further delayed.

We assume that lat is invertible, which our algorithms require in order to operate. It could be the case that lat is a series of functions <[lat.sub.1], [lat.sub.2], ..., [lat.sub.m]> where [lat.sub.j]([[Sigma].sub.i], t) is the total latency accumulated from the first arrival of subsequence [[Sigma].sub.i] to time t when arrival [a.sub.j] is the most recent arrival added to [[Sigma].sub.i]. If this is the case then we require [lat.sub.j]([[Sigma].sub.i], t) [is less than or equal to] [lat.sub.l]([[Sigma].sub.i], t) for all l [is greater than] j and [a.sub.j], [a.sub.l] [element of] [[Sigma].sub.i]. This way latency for a given subsequence increases as more time passes and as more packets arrive. So [f.sub.lat] not only includes the two functions we study in this paper, but any other reasonable(3) objective function of the form of Eq. (1).

Suppose that each arrival [a.sub.j] [Epsilon] [[Sigma].sub.i] was associated with a different computation being performed. Then in this case, the cost introduced to the entire system due to the extra latency would be the sum of the latency costs introduced for each of the computations. This motivates our first definition of lat, which is the sum of all the extra latencies of the packets in [[Sigma].sub.i]. This yields our first objective function, [f.sub.sum].

Definition 2

[f.sub.sum](A) = [Eta]k + (1 - [Eta]) [summation over (i = 1 to k)] [summation over ([a.sub.j] [element of] [[Sigma].sub.i])] ([t.sub.i] - [a.sub.j]),

where [t.sub.i] is the time of acknowledgment i.

Note that [f.sub.sum] is related to the average latency per packet over the entire sequence: simply divide the second term by n.

Now suppose instead that each arrival [a.sub.j] [Epsilon] [[Sigma].sub.i] was associated with the same computation. In this case, the extra latency caused by delaying acknowledgments is best modeled by setting lat to be the maximum of all the extra latencies of the packets in [[Sigma].sub.i]. In other words, the cost of the latency is the difference between the time at which the computation is completed and the minimum time that it could be completed. This leads us to our second objective function, [f.sub.max].

Definition 3

[f.sub.max](A) = [Eta]k + (1 - [Eta]) [summation over (i = 1 to k)] [max over ([a.sub.j] [element of] [[Sigma].sub.i])] ([t.sub.i] - [a.sub.j]),

where [t.sub.i] is the time of acknowledgment i.

Of course, [max over ([a.sub.j] [element of] [[Sigma].sub.i])] ([t.sub.i] - [a.sub.j]) is always [t.sub.i] - [a.sub.first,i], where [a.sub.first,i] is the first arrival of [[Sigma].sub.i].

Note that the only difference between the two objective functions is the cost per unit time of the extra latency for each [[Sigma].sub.i]. For [f.sub.max], this cost is linear in time with slope (1 - [Eta]). For [f.sub.sum], this cost is piecewise-linear in time with slope (1 - [Eta]) |[[Sigma].sub.i]|, where |[[Sigma].sub.i]| increases by 1 with each new arrival until an acknowledgment is sent.
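Both objectives are straightforward to evaluate for a fixed partition, which makes the slope difference concrete. The sketch below is our own illustration (the helper names and the encoding of a partition as (subsequence, ack-time) pairs are assumptions, not the paper's notation):

```python
def f_sum(partition, eta):
    """partition: list of (arrival_times_in_subsequence, ack_time) pairs.
    Latency is the SUM of each packet's extra delay."""
    k = len(partition)
    lat = sum(t - a for sigma, t in partition for a in sigma)
    return eta * k + (1 - eta) * lat

def f_max(partition, eta):
    """Latency per subsequence is the MAX extra delay, i.e., the ack time
    minus the subsequence's first arrival."""
    k = len(partition)
    lat = sum(t - min(sigma) for sigma, t in partition)
    return eta * k + (1 - eta) * lat
```

For example, for the partition {<0, 1, 2> acked at time 2, <5> acked at time 5} with [Eta] = 0.5, [f.sub.sum] gives 0.5 * 2 + 0.5 * (2 + 1 + 0 + 0) = 2.5, while [f.sub.max] gives 0.5 * 2 + 0.5 * (2 + 0) = 2.0.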

4. Related Work

We use standard competitive analysis [Sleator and Tarjan 1985; Karlin et al. 1988; Borodin et al. 1992] to evaluate our algorithms. Specifically, let [C.sub.opt] be the cost (for acknowledgments and latency) of an optimal solution and let [C.sub.A] be the cost of the solution produced by on-line algorithm A. Then we say that A is [Alpha]-competitive if for all sequences of arrivals, [C.sub.A] [is less than or equal to] [Alpha][C.sub.opt]. We are not including an additive factor as is often done.(4)

Our problem is a generalization of the rent-to-buy problem, which has also been called the spin-block problem [Krishnan et al. 1999; Karlin et al. 1994]. In the rent-to-buy problem, the on-line algorithm is told that a given item can be rented at 1 dollar per day or bought for a cost of K dollars. The item will be needed for d days where d is not known to the on-line algorithm. At the beginning of each day the on-line decision is whether to buy the item or to rent it for another day. Clearly, if d [is greater than or equal to] K the optimal solution is to buy the item at the start for a cost of K. Otherwise (d [is less than] K), the optimal solution is to rent for a cost of d. The rent-to-buy problem can be reduced to the acknowledgment delay problem (for either [f.sub.sum] or [f.sub.max]) as follows: Suppose that there are going to be only two arrivals, the first at time 0 and the second at time a = d/(1 - K). Let the cost for an acknowledgment [Eta] = K, the cost to buy. Furthermore, suppose that the cost of the solution does not include the final acknowledgment required at time a. Then the cost to buy at time t is equivalent to the cost for the acknowledgment delay solution in which an acknowledgment is sent at time t/(1 - K). Similarly, the cost to rent the entire d days is equivalent to the acknowledgment delay solution in which there is no acknowledgment before the second arrival.
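A quick numeric sanity check of this correspondence (the values below are illustrative choices of ours; the buy cost K doubles as the acknowledgment cost [Eta], and a single packet's delayed acknowledgment accrues latency at rate 1 - K):

```python
K = 0.6          # buy cost, equal to the acknowledgment cost eta
d = 1.2          # number of days the item is needed

def buy_cost(t):
    return t + K                  # rent for t days, then buy

def ack_cost(s):
    return (1 - K) * s + K        # delay the first packet's ack until time s

t = 0.9
# buying at time t corresponds to acknowledging at time t / (1 - K)
assert abs(buy_cost(t) - ack_cost(t / (1 - K))) < 1e-9
# renting all d days corresponds to never acknowledging before the second
# arrival at a = d / (1 - K): the accumulated latency cost is exactly d
assert abs((1 - K) * (d / (1 - K)) - d) < 1e-9
```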

The acknowledgment delay problem we study extends the rent-to-buy problem in three ways. First, we consider an arbitrary sequence of arrivals versus just two arrivals. Second, we consider when the on-line algorithm can have any constant lookahead factor L. Third, we consider the on-line problem in the presence of a maximum delay (i.e., maximum rent time) constraint. Additionally, while for the rent-to-buy problem the off-line problem is trivially solved, for the acknowledgment delay problem with objective function [f.sub.sum], finding an optimal solution in the off-line case is non-trivial, even when ignoring the maximum delay constraint.

Another well-studied on-line problem is the snoopy caching problem [Karlin et al. 1994]. This problem is also a generalization of the rent-to-buy problem. However, the generalization there is orthogonal to what we consider here. Finally, there are some problems in which the on-line problem with lookahead has been studied. For example, Yeh et al. [1998] study the on-line disk scheduling problem where one can look ahead at the next k variables that are to be read and from that knowledge one wants to pick the order in which to read the variables from the disk to minimize the seek start-up time. Also, Grove [1995] presents on-line algorithms for the bin packing problem when the algorithm can look into the future up to some total cost, rather than a number of items into the future. While these papers allow the on-line algorithms to have some knowledge about the future, the problems themselves are very different from the acknowledgment delay problem.

Finally, Ben-David and Borodin [1994] studied the problem of lookahead for the k-server problem. They showed that no finite lookahead L can improve the competitive ratio, by simply repeating each request in the request sequence L times. While we obtain the same type of result for our problem, a more involved lower bound is required. In their paper, Ben-David and Borodin also propose a different competitive measure to study issues like lookahead and memory usage. In particular, they propose what they call a Max/Max ratio, which is the supremum over all sequence lengths l of the amortized cost of the on-line algorithm divided by the amortized cost of the optimal algorithm.

5. Summary of Results

Section 6 gives our results for [f.sub.sum] and Section 7 gives our results for [f.sub.max]. In each section, we first give a dynamic programming algorithm for the off-line problem of finding an optimal partition when the algorithm knows <[a.sub.1], ..., [a.sub.n]> at the time of the first arrival. The off-line algorithms work even with non-rush departures and a maximum delay constraint. We then consider the on-line problem.

In general, our on-line algorithms work as follows: After the jth arrival (at time [a.sub.j]) an alarm is set for time [a.sub.j] + t, where we vary the method for selecting t. For L = 0 (i.e., no lookahead), we wait until the next arrival or the alarm, whichever comes first. If [a.sub.j+1] [is less than or equal to] [a.sub.j] + t (i.e., the next arrival comes first), then we reset the alarm, possibly using a new value of t. Otherwise, at time [a.sub.j] + t (i.e., the alarm time), we acknowledge all outstanding arrivals, ending the current subsequence. For L [is greater than or equal to] 1, we can use knowledge of [a.sub.j+1] to potentially reduce the latency. Specifically, if [a.sub.j+1] [is greater than] [a.sub.j] + t, we immediately send an acknowledgment at time [a.sub.j] (versus unnecessarily waiting until [a.sub.j] + t). Otherwise, we wait until arrival [a.sub.j+1] and repeat. By changing the method for selecting t, we get algorithms with very different behaviors.
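The generic loop just described can be sketched as follows. This rendering is our own, assuming a `pick_alarm` callback that returns the absolute alarm time given the currently outstanding arrivals; it is not the paper's notation.

```python
import math

def run_online(arrivals, pick_alarm, lookahead=0):
    """Generic alarm-based on-line scheme: after each arrival set an alarm;
    if the next arrival beats the alarm, keep waiting, otherwise end the
    subsequence.  With lookahead 1 we acknowledge immediately at the current
    arrival instead of idling until the alarm fires."""
    acks, pending = [], []
    for j, a in enumerate(arrivals):
        pending.append(a)
        alarm = pick_alarm(pending)                       # time a_j + t
        nxt = arrivals[j + 1] if j + 1 < len(arrivals) else math.inf
        if nxt > alarm:
            # the alarm would fire before the next arrival: ack everything
            acks.append(a if lookahead >= 1 else alarm)
            pending = []
    return acks
```

For example, with a fixed alarm offset of 10 time units, run_online([0, 5, 50], lambda p: p[-1] + 10) returns acknowledgment times [15, 60], while the same call with lookahead=1 returns [5, 50], saving the idle waits.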

Our two on-line algorithms, [greedy.sub.tot] and [greedy.sub.new], use different methods to select t. Algorithm [greedy.sub.tot] sets its alarm such that the cost for sending an acknowledgment is equal to the (total) latency cost of waiting time t until the alarm. By picking t in this way, one would perform optimally (for lookahead L = 1) if the next arrival were the last one. We give an arrival sequence A of n arrivals for which [greedy.sub.tot] is [Omega]([square root of n])-competitive (even for L = 1) under [f.sub.sum]. Under [f.sub.max], we prove that [greedy.sub.tot] has a competitive ratio of 2 for L = 0 and a competitive ratio of 1 for L = 1 if [Tau] is ignored (i.e., there is no maximum delay constraint). If [Tau] is considered, then [greedy.sub.tot]'s competitive ratio varies between 1 and 2 for L = 1 and between 2 and 3 for L = 0.

On-line algorithm [greedy.sub.new] selects t so that the cost of the (new) latency incurred from the last acknowledgment until the alarm is equal to the cost of one acknowledgment. This method of selecting t is similar to the algorithm of Karlin et al. [1994, 1988] for the rent-to-buy problem in which their on-line algorithm waits exactly long enough such that the cost of waiting ("renting") equals the cost of committing ("buying"). We prove that even when L = 0, [greedy.sub.new] is 2-competitive under [f.sub.lat], implying that it is 2-competitive under both [f.sub.sum] and [f.sub.max]. When L = 1, [greedy.sub.new]'s competitive ratio does not change under [f.sub.sum] and [f.sub.max], so in the worst case [greedy.sub.tot] is superior to [greedy.sub.new] under [f.sub.max] and [greedy.sub.new] is superior to [greedy.sub.tot] under [f.sub.sum].
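For concreteness, here is one plausible reading of [greedy.sub.new]'s alarm selection under [f.sub.sum]; this is a sketch under our own interpretation, not the paper's exact rule. We choose the alarm time T at which the latency accumulated by the current subsequence, (1 - [Eta]) [multiplied by] [summation] (T - [a.sub.j]), reaches the acknowledgment cost [Eta]:

```python
def greedy_new_alarm(pending, eta):
    """Solve (1 - eta) * sum_j (T - a_j) = eta for the alarm time T, where
    the sum runs over the as-yet-unacknowledged arrivals `pending`.
    Interpretation ours: 'new' latency is that of the current subsequence."""
    T = (eta / (1 - eta) + sum(pending)) / len(pending)
    return max(T, pending[-1])    # never schedule an alarm in the past
```

With [Eta] = 0.5 the latency budget [Eta]/(1 - [Eta]) is 1.0, so a lone packet arriving at time 0 gets an alarm at time 1.0, while two packets at times 0 and 0.5 share the budget and get an alarm at time 0.75. If the budget is already exhausted, the alarm is "now" and the acknowledgment is sent immediately.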

While it may seem that having L [is greater than or equal to] 1 would enable an algorithm to obtain a better competitive ratio, we show that this is not the case under [f.sub.sum]. Let [C.sub.opt] be the cost of an optimal solution and let [C.sub.A] be the cost of the solution produced by any deterministic on-line algorithm A. We prove that for A with any constant lookahead L, [C.sub.A] [is greater than or equal to] 2[C.sub.opt] - c, where c is a factor that can be made arbitrarily small with respect to [C.sub.opt]. Thus, under [f.sub.sum], [greedy.sub.new] with L = 0 is the best possible in the worst case even for on-line algorithms that can use a constant lookahead. We also show that for L = 0, there is a worst-case lower bound of 2 on the competitive ratio under [f.sub.max]. Our results are summarized in Table I.
TABLE I. SUMMARY OF THE RESULTS OF THIS PAPER

                            [f.sub.sum]                      [f.sub.max]

                      L = 0             L = 1            L = 0         L = 1

[greedy.sub.tot]   [Omega]([square   [Omega]([square   2([double     1([double
                   root of n])       root of n])       dagger])      dagger])
                   ([dagger])        ([dagger])
[greedy.sub.new]        2                 2                 2             2
Lower Bound        2([dagger])       2([dagger])       2([dagger])   1([double
                                                                     dagger])

([dagger]) If [Tau] is ignored.

([double dagger]) If [Tau] is ignored or [Tau] < [Eta]/(1 - [Eta]). For
[Tau] [is greater than or equal to] [Eta]/(1 - [Eta]), under [f.sub.max],
[greedy.sub.tot] with L = 1 is (1 + [Eta]/([Tau](1 - [Eta])))-competitive and
[greedy.sub.tot] with L = 0 is (1 + 2[Eta]/([Tau](1 - [Eta])))-competitive.


Although, in the worst case, having a lookahead of 1 does not help for [f.sub.sum], in practice it is likely to reduce the latency cost. Hence, for each objective function, we give some initial empirical results using arrival sequences from real network traffic where we compare the two methods used in TCP for acknowledgment delay with [greedy.sub.tot] and [greedy.sub.new] under both objective functions. In all cases, we examine performance with L = 0 and L = 1.

6. Objective Function [f.sub.sum]

Recall Definition 2 of Section 3, which defined [f.sub.sum]:

[f.sub.sum](A) = [Eta]k + (1 - [Eta]) [summation over (i = 1 to k)] [summation over ([a.sub.j] [element of] [[Sigma].sub.i])] ([t.sub.i] - [a.sub.j]),

where A = <[a.sub.1], ..., [a.sub.n]> is the sequence of packet arrivals, k is the number of acknowledgments, the acknowledgment for the packets in [[Sigma].sub.i] comes at time [t.sub.i], and [Eta] [element of] [0, 1] weights the importance of minimizing acknowledgments with that of minimizing extra latency. We now present our algorithms for the acknowledgment delay problem under [f.sub.sum].

6.1. AN OFF-LINE ALGORITHM UNDER [f.sub.sum]. In this section, we consider the off-line problem in which A = <[a.sub.1], ..., [a.sub.n]> is known. The goal is to minimize [f.sub.sum]. Note that [t.sub.i], the time of the acknowledgment ending [[Sigma].sub.i], is exactly the time of the last arrival in [[Sigma].sub.i]. For such a partition the objective function is

[Eta]k + (1 - [Eta]) [summation over (i = 1 to k)] |[[Sigma].sub.i]| [multiplied by] [t.sub.i] - (1 - [Eta]) [summation over (j = 1 to n)] [a.sub.j].

Since the last term is independent of the algorithm's actions, it suffices to optimize [f'.sub.sum] = [Eta]k + (1 - [Eta]) [summation over (i = 1 to k)] |[[Sigma].sub.i]| [multiplied by] [t.sub.i]. We use dynamic programming to obtain an O([n.sup.2])-time, optimal off-line algorithm. Let M[i, j] for i, j [element of] [1, n] be the minimum cost solution (using [f'.sub.sum]) for the subsequence <[a.sub.1], ..., [a.sub.i]> (i.e., there is an acknowledgment just after [a.sub.i]) when the second-to-last acknowledgment of this subsequence is just after [a.sub.i - j]. If i = j, then the acknowledgment after [a.sub.i] is the only one of this subsequence, and the acknowledgment (indicated by a |) is placed as follows: <[a.sub.1], ..., [a.sub.i]|>, with no acknowledgments between [a.sub.1] and [a.sub.i]. Otherwise (if j [is less than] i), the acknowledgments are placed as follows: <[a.sub.1], ..., [a.sub.i - j]|, [a.sub.i - j + 1], ..., [a.sub.i]|>, possibly with acknowledgments between [a.sub.1] and [a.sub.i - j]. To improve the algorithm's efficiency, we maintain each row's minimum. Let [M.sub.min][i] be the minimum objective value of row i and let [M.sub.pt][i] be a value of j such that M[i, j] = [M.sub.min][i]. The complete algorithm is shown in Figure 1.
FIG. 1. An optimal off-line algorithm that runs under [f.sub.sum].
Note that we can implement this algorithm without using the array
M, but we retain it for clarity.

Off-Line-1

1    Initialize [M.sub.min][0] [left arrow] 0
2    Initialize M[1, 1] [left arrow] 1 [multiplied by] [Eta] + (1 - [Eta]) [multiplied by] [a.sub.1]
3    Initialize [M.sub.min][1] [left arrow] M[1, 1]
4    Initialize [M.sub.pt][1] [left arrow] 1
5    For i [element of] [2, n]
6        [M.sub.min][i] [left arrow] [infinity]
7        For j [element of] [1, i]
8            M[i, j] [left arrow] 1 [multiplied by] [Eta] + (1 - [Eta]) [multiplied by] j [multiplied by] [a.sub.i] + [M.sub.min][i - j]
9            If M[i, j] [is less than] [M.sub.min][i] then
10               [M.sub.min][i] [left arrow] M[i, j]
11               [M.sub.pt][i] [left arrow] j


The final solution consists of tracing back through the [M.sub.pt] values and placing acknowledgments. The cost of this final (optimal) solution, measured by [f'.sub.sum], is [M.sub.min][n] = [min over (l [element of] [1, n])] M[n, l]. We can then subtract (1 - [Eta]) [summation over (j = 1 to n)] [a.sub.j] from this quantity to get the true cost as measured by [f.sub.sum].
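The following is a direct transcription of Off-Line-1 (Figure 1) into Python, with the traceback added; the variable names are ours, and like Figure 1 it assumes rush-only departures and no maximum delay constraint.

```python
import math

def off_line_1(arrivals, eta):
    """O(n^2) dynamic program for f_sum.  Returns (optimal f_sum cost,
    1-based indices of the arrivals after which acknowledgments are sent)."""
    n = len(arrivals)
    M_min = [0.0] * (n + 1)            # M_min[i]: best f'_sum for a_1..a_i
    M_pt = [0] * (n + 1)
    for i in range(1, n + 1):
        M_min[i] = math.inf
        for j in range(1, i + 1):
            # last subsequence is a_{i-j+1}..a_i, acked at a_i; under f'_sum
            # it contributes eta + (1 - eta) * j * a_i  (Line 8 of Figure 1)
            cost = eta + (1 - eta) * j * arrivals[i - 1] + M_min[i - j]
            if cost < M_min[i]:
                M_min[i], M_pt[i] = cost, j
    # convert f'_sum back to f_sum by subtracting (1 - eta) * sum of arrivals
    total = M_min[n] - (1 - eta) * sum(arrivals)
    acks, i = [], n
    while i > 0:                       # trace back through the M_pt values
        acks.append(i)
        i -= M_pt[i]
    return total, acks[::-1]
```

For [Eta] = 0.5, two close arrivals at times 0 and 0.1 are best covered by a single acknowledgment (cost 0.5 + 0.5 * 0.1 = 0.55), whereas two distant arrivals at times 0 and 10 are best acknowledged separately (cost 2 * 0.5 = 1.0), and the algorithm recovers exactly these solutions.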

THEOREM 4. Algorithm Off-Line-1 produces an optimal solution to the acknowledgment delay problem under [f.sub.sum] when all departures are rush and there is no maximum delay constraint.

PROOF. It is well known that if a problem possesses the optimal substructure property (i.e., Bellman optimality), then any dynamic programming algorithm that explores all subproblems is an optimal algorithm for that problem [Cormen et al. 1990; Bellman 1957]. A problem exhibits the optimal substructure property if an optimal solution to the problem contains within it optimal solutions to subproblems. To see that the acknowledgment delay problem possesses the optimal substructure property, note that if an oracle tells us that the subsequence <[a.sub.1], ..., [a.sub.i]|> has its second-to-last acknowledgment after arrival [a.sub.i - j] in an optimal solution, then we merely need to optimize the subsequence <[a.sub.1], ..., [a.sub.i - j]|> to obtain an optimal solution. It is also obvious from Figure 1 that our algorithm explores all possible subproblems. []

In the case when not all departures are rush (i.e., departures can be delayed), a simple extension to the above algorithm yields an optimal off-line algorithm. First we merge A and D into a single supersequence with n + m entries. Now i and j are indices into an (n + m) x (n + m) table. Since latency for departures counts the same as for arrivals, the only change to the algorithm is how many transmissions can occur in a subsequence. So simply change the update step of Line 8 from

M[i, j] [left arrow] 1 [multiplied by] [Eta] + (1 - [Eta]) [multiplied by] j [multiplied by] [a.sub.i] + [M.sub.min] [i - j]

to

M[i, j] [left arrow] numdepartures(i,i - j) [multiplied by] [Eta] + (1 - [Eta]) [multiplied by] j [multiplied by] [a.sub.i] + [M.sub.min][i - j],

where numdepartures(i, i - j) counts the number of rows between rows i and i - j + 1 (inclusive) that correspond to departures in the supersequence, including the acknowledgment sent for row i if that row corresponds to an arrival and not a departure. Note that if all departures are rush, then we have no departure sequence and numdepartures is always 1, as indicated in Figure 1. We now make an observation about the departure sequence that we will use later. Note that this observation applies generically to [f.sub.lat] (Definition 1), so it holds for both [f.sub.sum] and [f.sub.max].

OBSERVATION 5. No more than one departure will ever lie in any subsequence in an optimal solution under [f.sub.lat].

PROOF. By contradiction. Assume there exists an optimal solution in which multiple departures lie in the same subsequence. Then we could split that subsequence into multiple subsequences, decreasing latency without increasing transmission count. []

By Observation 5, in our off-line algorithm, we could assign [infinity] to all M[i, j] for which numdepartures(i, i - j) [is greater than] 1. We now further modify our algorithm to handle the maximum delay restriction. Namely, neither an arrival's acknowledgment nor a departure may be delayed by more than [Tau]. We enforce this constraint by altering the update step to assign [infinity] to all M[i, j] for which [x.sub.i] - [y.sub.i - j + 1] [is greater than] [Tau] for x, y [element of] {a, d}. Finally, we note that by evaluating the function lat on Lines 2 and 8 of Figure 1, we get an algorithm that works under [f.sub.lat].

COROLLARY 6. There is an O([n.sup.2]) dynamic programming algorithm that produces an optimal solution to the acknowledgment delay problem under [f.sub.lat] with non-rush departures and a maximum delay constraint.

6.2. ALGORITHM [greedy.sub.tot] UNDER [f.sub.sum]. Our first deterministic on-line algorithm [greedy.sub.tot] will balance the cost of acknowledging immediately with the cost of delaying the acknowledgment for some time [t.sub.tot,sum]. First, assume that all departures are rush and there is no limit on individual delays. Thus, [greedy.sub.tot] need only greedily partition A. Let [a.sub.j] be the time of the packet that just arrived. Let [Sigma] be the set of unacknowledged arrivals just after time [a.sub.j] (this includes the packet corresponding to [a.sub.j], so |[Sigma]| [is greater than or equal to] 1). If the algorithm waits some time [t.sub.tot,sum] (i.e., until time [a.sub.j] + [t.sub.tot,sum]) before sending an acknowledgment, then it will incur extra cost due to latency of (1 - [Eta])|[Sigma]|[t.sub.tot,sum] since each packet in [Sigma] will incur [t.sub.tot,sum] units of latency. Also, it has already incurred a cost of (1 - [Eta]) [summation over (a [element of] [Sigma])] ([a.sub.j] - a) due to past latency. Acknowledging immediately incurs a cost of [Eta] for the acknowledgment plus the same cost due to past latency. Thus, on each new arrival [a.sub.j], [greedy.sub.tot] sets an alarm at [a.sub.j] + [t.sub.tot,sum] for [t.sub.tot,sum] such that the cost for [Sigma] if an acknowledgment is not sent at [a.sub.j] (namely, (1 - [Eta])|[Sigma]|[t.sub.tot,sum] + (1 - [Eta]) [summation over (a [element of] [Sigma])] ([a.sub.j] - a)) is equal to the cost for [Sigma] if an acknowledgment is sent at [a.sub.j] (namely, [Eta] + (1 - [Eta]) [summation over (a [element of] [Sigma])] ([a.sub.j] - a)). Solving for [t.sub.tot,sum] yields

[t.sub.tot,sum] = [Eta]/(|[Sigma]|(1 - [Eta])).

If no new arrival occurs before the alarm, then an acknowledgment is sent at the alarm. Otherwise, the old alarm is deleted and a new one is set at [a.sub.j + 1] + [Eta]/(|[Sigma]'|(1 - [Eta])) for [Sigma]' = [Sigma] [union] {[a.sub.j + 1]}. If [greedy.sub.tot] has access to an oracle(5) that provides [a.sub.j + 1] at time [a.sub.j] (i.e., L = 1), then [greedy.sub.tot] will send an acknowledgment at [a.sub.j] if and only if ([a.sub.j + 1] - [a.sub.j]) [is greater than] [t.sub.tot,sum].
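This alarm rule is easy to simulate. The following minimal Python sketch (function name and harness are illustrative, not from the paper) runs [greedy.sub.tot] with L = 0 under [f.sub.sum], assuming rush departures and no maximum delay constraint:

```python
def greedy_tot_sum(arrivals, eta):
    """Simulate greedy_tot with L = 0 under f_sum.

    arrivals: sorted packet arrival times; eta in (0, 1).
    Returns the times at which acknowledgments are sent.
    """
    acks = []
    sigma = 0  # |Sigma|: unacknowledged arrivals in the current subsequence
    for j, t in enumerate(arrivals):
        sigma += 1
        alarm = t + eta / (sigma * (1 - eta))  # t_tot,sum past this arrival
        nxt = arrivals[j + 1] if j + 1 < len(arrivals) else float("inf")
        if nxt > alarm:  # the alarm sounds before the next arrival: acknowledge
            acks.append(alarm)
            sigma = 0    # a new subsequence starts at the next arrival
    return acks
```

For instance, with [Eta] = 1/2 and arrivals at 0 and 0.1, the second arrival preempts the first alarm (set for time 1.0), and a single acknowledgment is sent at time 0.6.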

It is straightforward to extend this algorithm to accommodate non-rush departures and a maximum delay constraint of [Tau]. On the first arrival [a.sub.j] of a new subsequence, the algorithm sets a second alarm at [a.sub.j] + [Tau]. This alarm is not updated at new arrivals within the same subsequence. So an acknowledgment is sent whenever either alarm sounds, while making sure that there is only one departure per subsequence (Observation 5) and that all departures are sent in order of their ready times. Also, the time complexity per arrival needed to update [t.sub.tot,sum] is constant.

Unfortunately, contrary to intuition, [greedy.sub.tot] can perform quite poorly on [f.sub.sum] if [Tau] is ignored (i.e., there is no maximum delay constraint).

LEMMA 7. If we ignore [Tau], then under [f.sub.sum] there exists a sequence of arrivals that forces [greedy.sub.tot]'s competitive ratio to be [Omega]([square root of n]) for L = 0 and L = 1.

PROOF. We first state the proof for L = 0 and then argue why it holds for L = 1. It will be convenient to focus on the latency costs between the arrivals. Let [a.sub.j] and [a.sub.j + 1] be the times of two consecutive arrivals and consider a packet that had arrived by time [a.sub.j] and had not yet been acknowledged just prior to time [a.sub.j + 1]. Let [l.sub.j] denote the latency cost for such a packet for the time period from [a.sub.j] to [a.sub.j + 1]. In other words, [a.sub.j + 1] = [a.sub.j] + [l.sub.j]/(1 - [Eta]).

The arrival sequence for the lower bound is given by [l.sub.j] = [Eta]/j for 1 [is less than or equal to] j [is less than or equal to] n - 1. So [a.sub.j + 1] = ([Eta]/(1 - [Eta]))[H.sub.j], where [H.sub.j] is the jth harmonic number (we let [a.sub.1] = 0). Note that [greedy.sub.tot] will keep pushing the alarm out to the point of the next arrival. Thus, it will never be the case that

[l.sub.j]/1 - [Eta] = [a.sub.j + 1] - [a.sub.j] [is greater than] [t.sub.tot,sum] = [Eta]/j(1 - [Eta]) = [l.sub.j]/1 - [Eta],

so the algorithm will not send an acknowledgment until [a.sub.n]. Thus, there will be j arrivals that incur the latency cost of [l.sub.j]. By adding the acknowledgment cost to the latency cost we get that

cost of [greedy.sub.tot]'s solution = [Eta] + [summation over (j = 1, ..., n - 1)] j [multiplied by] [l.sub.j] = [Eta] + (n - 1)[Eta] = n[Eta].

An optimal algorithm must do at least as well as the following where n = k(k + 1)/2. Let there be k - 1 partitions of the form [[Sigma].sub.i], where [[Sigma].sub.i] includes [a.sub.i(i + 1)/2] through [a.sub.((i + 1)(i + 2)/2)-1] for 1 [is less than or equal to] i [is less than or equal to] k - 1. Finally, there is a final subsequence with [a.sub.n]. Thus there are k acknowledgments. We now compute the latency cost for [[Sigma].sub.i]. Since there are i + 1 arrivals in [[Sigma].sub.i], there are i latency costs associated with the subsequence. Further, the jth latency cost in the subsequence affects j arrivals since that is the number of arrivals that have not been acknowledged. Thus,

latency cost for [[Sigma].sub.i] = [summation over (j = 1, ..., i)] j [multiplied by] [l.sub.i(i + 1)/2 + j - 1] = [summation over (j = 1, ..., i)] j[Eta]/(i(i + 1)/2 + j - 1) [is less than or equal to] [Eta].

Therefore, [C.sub.opt] [is less than or equal to] k[Eta] + (k - 1)[Eta] [is less than] 2k[Eta], so the competitive ratio is at least n[Eta]/(2k[Eta]) = (k + 1)/4 = [Omega]([square root of n]).

For L = 1, recall that [greedy.sub.tot] acknowledges arrival [a.sub.j] immediately if and only if ([a.sub.j + 1] - [a.sub.j]) [is greater than] [t.sub.tot,sum]. Since the adversary places [a.sub.j + 1] at exactly [a.sub.j] + [t.sub.tot,sum], [greedy.sub.tot] still will not send any acknowledgment until the end of the sequence (arrival [a.sub.n]),(6) so its behavior for L = 1 on this arrival sequence is identical to that for L = 0. []
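The gap in Lemma 7 is easy to reproduce numerically. The following Python sketch (parameters [Eta] = 1/2 and k = 40 are illustrative; the cost function and variable names are ours, not from the paper) builds the harmonic arrival sequence, evaluates [f.sub.sum] for the single acknowledgment that [greedy.sub.tot] sends and for the k-acknowledgment partition from the proof, and exhibits a ratio well above [square root of n]/4:

```python
def f_sum_cost(a, acks, eta):
    """f_sum cost of acknowledging at the (1-based) arrival indices in acks."""
    cost, start = len(acks) * eta, 0
    for e in acks:
        cost += (1 - eta) * sum(a[e - 1] - a[l] for l in range(start, e))
        start = e
    return cost

eta, k = 0.5, 40
n = k * (k + 1) // 2
a = [0.0]
for j in range(1, n):            # l_j = eta/j, so a_{j+1} = a_j + l_j/(1-eta)
    a.append(a[-1] + (eta / j) / (1 - eta))

# greedy_tot never acknowledges early on this sequence (Lemma 7),
# so it sends a single acknowledgment at a_n, for cost n*eta.
greedy_cost = f_sum_cost(a, [n], eta)

# The k-acknowledgment partition from the proof: sigma_i ends at
# arrival (i+1)(i+2)/2 - 1, plus a final subsequence holding a_n.
part = [(i + 1) * (i + 2) // 2 - 1 for i in range(1, k)] + [n]
ratio = greedy_cost / f_sum_cost(a, part, eta)
```

Each [[Sigma].sub.i] contributes at most [Eta] in latency, so the partition costs at most (2k - 1)[Eta] while the greedy solution costs n[Eta].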

This bound also holds if not all departures are rush since the adversary need not place any departures at all into the sequence. However, introducing [Tau] into the problem causes difficulties in the above proof since both [greedy.sub.tot] and an optimal solution must send an acknowledgment before [a.sub.j] + [Tau] for all [a.sub.j] [element of] A. But we conjecture that the ratio can still grow with some function of n for sufficiently large [Tau].

Notice that the crux of the proof of Lemma 7 is that [greedy.sub.tot] always sets its alarm at a point where the new extra latency of the current subsequence would be exactly [Eta]/(1 - [Eta]), as indicated in the following observation.

OBSERVATION 8. Under [f.sub.lat], [greedy.sub.tot] always sets its alarm at a point where the new extra latency of the current subsequence would be exactly [Eta]/(1 - [Eta]).

PROOF. Let [lat.sub.ij](t) be the amount of latency accumulated from the start of subsequence [[Sigma].sub.i] to time t when arrival [a.sub.j] is the last arrival to be placed in [[Sigma].sub.i]. So for example, for [f.sub.sum] we use [lat.sub.ij](t) = [summation over ([a.sub.l] [element of] [[Sigma].sub.i], l [is less than or equal to] j)] (t - [a.sub.l]). Then, at arrival [a.sub.j], [greedy.sub.tot] sets its alarm at [a.sub.j] + [t.sub.tot,lat] for [t.sub.tot,lat] such that

(1 - [Eta])([lat.sub.ij]([a.sub.j] + [t.sub.tot,lat]) - [lat.sub.ij]([a.sub.j]) + [lat.sub.ij]([a.sub.j])) = [Eta] + (1 - [Eta])[lat.sub.ij]([a.sub.j]).

Solving for [t.sub.tot,lat] yields

[lat.sub.ij]([a.sub.j] + [t.sub.tot,lat]) = [lat.sub.ij]([a.sub.j]) + [Eta]/(1 - [Eta]).

So the new alarm is set at the time t for which [lat.sub.ij](t) = [lat.sub.ij]([a.sub.j]) + [Eta]/(1 - [Eta]). In other words, the new alarm is set to where the total latency of the subsequence is the sum of the current accumulated latency and [Eta]/(1 - [Eta]). []

Finally, note that it is unlikely that an application in a real network will exhibit the harmonic behavior of Lemma 7. Thus we do not expect this algorithm to really perform this poorly. The simulation results of Section 6.5 lend more insight into this algorithm's true performance.

6.3. ALGORITHM [greedy.sub.new] UNDER [f.sub.sum]. We now consider when the on-line algorithm waits exactly long enough that the cost of the latency incurred equals the cost of one acknowledgment, but we do not add the cost of the accumulated latency to the acknowledgment cost. Specifically, on each new arrival [a.sub.j], our algorithm [greedy.sub.new] sets an alarm at [a.sub.j] + [t.sub.new,sum] for [t.sub.new,sum] such that (1 - [Eta]) [summation over (a [element of] [Sigma])] ([a.sub.j] + [t.sub.new,sum] - a) = [Eta]. Solving for [t.sub.new,sum] yields

(2) [t.sub.new,sum] = [Eta]/(|[Sigma]|(1 - [Eta])) - (1/|[Sigma]|) [summation over (a [element of] [Sigma])] ([a.sub.j] - a).

Thus, the difference between [t.sub.new,sum] and [t.sub.tot,sum] is that in Equation 2 we subtract the average latency of the packets in [Sigma]. We now argue that for a given subsequence, the alarm will always remain fixed or move back in time, but never move forward. (In fact, what we show is that this happens for [greedy.sub.new] for any function [f.sub.lat] as in Definition 1.) This prevents the adversary from blowing up the total latency of a subsequence, which gave us the [Omega]([square root of n]) lower bound of the first algorithm (Lemma 7).
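The subtracted average can be maintained with a running sum, as in the following minimal Python sketch of [greedy.sub.new] with L = 0 under [f.sub.sum] (function name and harness are illustrative; rush departures and no maximum delay constraint assumed):

```python
def greedy_new_sum(arrivals, eta):
    """Simulate greedy_new with L = 0 under f_sum, using Equation 2."""
    acks = []
    sigma, last, lat_sum = 0, 0.0, 0.0  # |Sigma|, last arrival, sum of (a_j - a)
    for j, t in enumerate(arrivals):
        lat_sum += sigma * (t - last)   # age the unacknowledged packets up to t
        sigma += 1
        last = t
        # Equation 2: wait until the subsequence's latency cost reaches eta
        alarm = t + eta / (sigma * (1 - eta)) - lat_sum / sigma
        nxt = arrivals[j + 1] if j + 1 < len(arrivals) else float("inf")
        if nxt > alarm:  # alarm sounds before the next arrival: acknowledge
            acks.append(alarm)
            sigma, lat_sum = 0, 0.0
    return acks
```

For [Eta] = 1/2 and arrivals at 0 and 0.2, the acknowledgment goes out at time 0.6, the point where the subsequence's total latency reaches [Eta]/(1 - [Eta]) = 1.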

OBSERVATION 9. Under [f.sub.lat], once [greedy.sub.new] sets its alarm for a given subsequence, that alarm will not move into the future for the duration of that subsequence. That is, [greedy.sub.new] does not move its alarm into the future until an acknowledgment is sent. Moreover, the alarm is set such that the final total latency of the subsequence is exactly [Eta]/(1 - [Eta]).

PROOF. As in the proof of Observation 8, let [lat.sub.ij](t) be the amount of latency accumulated from the start of subsequence [[Sigma].sub.i] to time t when arrival [a.sub.j] is the last arrival added to [[Sigma].sub.i]. So for example, for [f.sub.sum] we use [lat.sub.ij](t) = [summation over ([a.sub.l] [element of] [[Sigma].sub.i], l [is less than or equal to] j)] (t - [a.sub.l]). Then, at arrival [a.sub.j], [greedy.sub.new] sets its alarm at [a.sub.j] + [t.sub.new,lat] for [t.sub.new,lat] such that

(1 - [Eta])([lat.sub.ij]([a.sub.j] + [t.sub.new,lat]) - [lat.sub.ij]([a.sub.j]) + [lat.sub.ij]([a.sub.j])) = [Eta].

Solving for [t.sub.new,lat] yields

[lat.sub.ij]([a.sub.j] + [t.sub.new,lat]) = [Eta]/(1 - [Eta]).

So the new alarm is at the time t for which [lat.sub.ij](t) = [Eta]/(1 - [Eta]). In other words, the alarm is set at a point at which the amount of latency accumulated from the first arrival of [[Sigma].sub.i] to the alarm (given the current value of [[Sigma].sub.i]) is exactly [Eta]/(1 - [Eta]). Since [lat.sub.ij](t) [is less than or equal to] [lat.sub.il](t) for all l [is greater than or equal to] j and [a.sub.j], [a.sub.l] [element of] [[Sigma].sub.i] (see Section 3), the time at which [lat.sub.il] reaches [Eta]/(1 - [Eta]) is no later than the time at which [lat.sub.ij] does. Therefore, the alarm will not move into the future until an acknowledgment is sent. []
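This behavior can be checked numerically for [f.sub.sum]. In the Python sketch below (the arrival times and [Eta] are arbitrary illustrative values chosen so every arrival lands before the current alarm), the sequence of alarms within one subsequence is nonincreasing:

```python
# Numerical check of Observation 9 under f_sum: within a single subsequence,
# the alarm computed from Equation 2 never moves into the future.
eta = 0.5
arrivals = [0.0, 0.1, 0.25, 0.3]    # each arrival precedes the current alarm

sigma, last, lat_sum = 0, 0.0, 0.0  # |Sigma|, last arrival, sum of (a_j - a)
alarms = []
for t in arrivals:
    lat_sum += sigma * (t - last)   # age the unacknowledged packets up to t
    sigma += 1
    last = t
    # alarm = time at which the subsequence's total latency hits eta/(1-eta)
    alarms.append(t + eta / (sigma * (1 - eta)) - lat_sum / sigma)

# each new alarm is no later than the previous one
assert all(b <= a for a, b in zip(alarms, alarms[1:]))
```

Here the alarms work out to 1.0, 0.55, 0.45, and 0.4125: each arrival pulls the alarm earlier, never later.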

We now prove that [greedy.sub.new] is 2-competitive under [f.sub.lat].

THEOREM 10. Under [f.sub.lat] and with no lookahead, the cost of [greedy.sub.new]'s solution is at most 2[C.sub.opt].

PROOF. Assume [greedy.sub.new] sends k acknowledgments in A, partitioning A into k subsequences, which we will think of as closed intervals.(7) Each interval i starts with the first packet of [[Sigma].sub.i] and ends with [[Sigma].sub.i]'s acknowledgment at time [t.sub.i] (see Figure 2). Note that there could be some time between [t.sub.i] and the first arrival of [[Sigma].sub.i + 1]. We ignore these time periods because they do not add to the cost of [greedy.sub.new]'s solution. Observation 9 implies that the extra latency in each interval is exactly [Eta]/(1 - [Eta]), so the cost due to extra latency for each interval is exactly [Eta]. Since each acknowledgment also costs [Eta], the total cost of [greedy.sub.new]'s solution is 2k[Eta].

[ILLUSTRATION OMITTED]

Let k* be the number of acknowledgments in an optimal partition of A. First we consider the case when k [is less than or equal to] k*. In this case, it immediately follows that [C.sub.opt] [is greater than or equal to] k*[Eta] [is greater than or equal to] k[Eta]. Thus, the ratio of [greedy.sub.new]'s cost to the optimal cost is at most 2k[Eta]/(k[Eta]) = 2.

We now consider the case when k [is greater than] k*. Since all partitions of A must send an acknowledgment at the last arrival, at most k* - 1 of the optimal acknowledgments are distributed over the first k - 1 intervals from [greedy.sub.new]'s partition. Thus at least (k - 1) - (k* - 1) = k - k* intervals have no acknowledgments from the optimal solution lying in them. Each of these k - k* intervals contributes to the optimal solution [Eta] cost due to latency since the optimal solution must place an acknowledgment in an interval to not be charged for its latency. So the optimal solution has a cost of k*[Eta] for its acknowledgments and a cost of at least (k - k*)[Eta] for its latency. So [C.sub.opt] [is greater than or equal to] (k - k*)[Eta] + k*[Eta] = k[Eta], which is at least half of [greedy.sub.new]'s cost. []

The proof of Theorem 10 assumed L = 0, so clearly if L [is greater than or equal to] 1, then the competitive ratio is no worse since [greedy.sub.new] sets its alarms independent of L. Algorithm [greedy.sub.new] handles [Tau] and non-rush departures the same way that [greedy.sub.tot] does (Section 6.2) and remains 2-competitive even with L = 0.

COROLLARY 11. Under [f.sub.lat], [greedy.sub.new] with no lookahead remains 2-competitive in the presence of non-rush departures and a maximum delay constraint.

PROOF. Note that each of the first k - 1 intervals created by [greedy.sub.new] was created because (1) the cost due to total latency reached [Eta], (2) the maximum delay [Tau] for an individual arrival or departure was reached, or (3) more than one departure appeared in the interval. We will call these intervals [Eta]-intervals, [Tau]-intervals and d-intervals, respectively, appearing [k.sub.[Eta]], [k.sub.[Tau]] and [k.sub.d] times. Again, interval i begins with the first arrival of [[Sigma].sub.i] and ends with an acknowledgment at time [t.sub.i]. Note that each [Tau]-interval must have an optimal acknowledgment lying in it since the optimal algorithm cannot allow any single delay to exceed [Tau]. Also note that based on Observation 5, every d-interval contains an optimal acknowledgment. Finally, the kth interval (which can be of any type) must contain an optimal acknowledgment since all arrivals must be acknowledged. So the number of [Eta]-intervals among the first k - 1 intervals that contain optimal acknowledgments is at most k* - [k.sub.[Tau]] - [k.sub.d] - 1, where again k* is the number of acknowledgments in the optimal solution. Thus, the optimal solution's cost is at least ([k.sub.[Eta]] - k* + [k.sub.[Tau]] + [k.sub.d] + 1)[Eta] + k*[Eta] = ([k.sub.[Eta]] + [k.sub.[Tau]] + [k.sub.d] + 1)[Eta] = k[Eta]. But [greedy.sub.new]'s cost is at most 2k[Eta] since each interval still has at most [Eta] cost due to latency. []

Finally, we note that [greedy.sub.new]'s time complexity per arrival is constant under [f.sub.sum]. To see this, recall the definition of [t.sub.new,sum] in Eq. 2:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Maintaining a running sum of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] yields the constant time cost required at each arrival.

6.4. A LOWER BOUND UNDER [f.sub.sum]. We now prove that under [f.sub.sum], any deterministic on-line algorithm A with any constant lookahead L has cost [C.sub.A] [is greater than or equal to] 2[C.sub.opt] - c, where c can be made arbitrarily small with respect to [C.sub.opt]. In other words, [C.sub.A] can be made arbitrarily close to twice that of optimal. Thus not only does [greedy.sub.new] perform as well as possible under [f.sub.sum], but one could not even obtain improvements (in the worst case) by increasing the lookahead by any constant amount.

THEOREM 12. Let A be any deterministic on-line algorithm for the acknowledgment delay problem with constant lookahead L. Then if we ignore the maximum delay constraint, under [f.sub.sum] there exists an arrival sequence such that [C.sub.A] [is greater than or equal to] 2[C.sub.opt] - c, where c can be made arbitrarily small with respect to [C.sub.opt].

PROOF. The basic idea is that the adversary would like to have an arrival immediately after each acknowledgment by the on-line algorithm A (which knows the times for the next L arrivals).

We use a burst to denote a set of B arrivals arbitrarily close together, where B [much greater than] L. (We can make B arbitrarily larger than L.) Similarly, we use a blip to denote a set of L arrivals that occur arbitrarily close together. For ease of exposition we assume that all the arrivals in a burst or blip arrive at the same time. Let [Epsilon] be a small constant, and let [t.sub.[Epsilon]] = [Epsilon]/((1 - [Eta])B). The adversary begins with a burst at time 0 and a blip at time [t.sub.[Epsilon]]. There will be two possible scenarios at time [t.sub.[Epsilon]]. Either there was really just a blip (i.e., L arrivals) at time [t.sub.[Epsilon]], or there was really a burst (i.e., B arrivals) at time [t.sub.[Epsilon]] and A has just seen the first L arrivals of the burst. If A chooses to send an acknowledgment at time 0, then the adversary places a burst at time [t.sub.[Epsilon]]. If A does not choose to send an acknowledgment at time 0, then the adversary leaves just a blip at time [t.sub.[Epsilon]] and places a new blip at time 2[t.sub.[Epsilon]]. In general, at time t, the on-line algorithm A sees a blip at time t + [t.sub.[Epsilon]]. If A acknowledges at time t, then the adversary places a burst at time t + [t.sub.[Epsilon]]. If A does not acknowledge at time t, then the adversary leaves just a blip at time t + [t.sub.[Epsilon]] and places the next blip at time t + 2[t.sub.[Epsilon]] (see Figure 3). This continues until either A sends an acknowledgment or the latency cost for the current subsequence is [Eta]. In the latter case, we can force A to place an acknowledgment (or be worse than 2-competitive) by having the next blip far enough away that the latency cost would be prohibitive. Thus, without loss of generality, we assume that the latency cost of a subsequence is never larger than [Eta]. That is, the time between bursts is at most [Eta]/(B(1 - [Eta])).

[ILLUSTRATION OMITTED]

We now compute a lower bound for the cost of the on-line algorithm's solution. Let [[Sigma].sub.1], ..., [[Sigma].sub.s] be the subsequences defined by A's solution. Let [t.sub.i] be the time from the burst to the acknowledgment in [[Sigma].sub.i]. The cost of A's solution is at least the cost of the acknowledgments and the latency due to the bursts. Thus

[C.sub.A] [is greater than or equal to] [summation over (i = 1, ..., s)] ([Eta] + (1 - [Eta])B[t.sub.i]) = s[Eta] + (1 - [Eta])B [summation over (i = 1, ..., s)] [t.sub.i].

We now upper bound the cost of the optimal solution by averaging the costs of the solution [S.sub.1] that acknowledges after the odd-numbered bursts, and the solution [S.sub.2] that acknowledges after the even-numbered bursts. Both [S.sub.1] and [S.sub.2] must acknowledge after the final arrival. Thus, there are s + 2 acknowledgments among both [S.sub.1] and [S.sub.2]. Also, notice that each odd-numbered burst is immediately acknowledged in [S.sub.1] and each even-numbered burst is immediately acknowledged in [S.sub.2]. If [S.sub.1] (respectively, [S.sub.2]) acknowledges immediately after a burst, then [S.sub.1] (respectively, [S.sub.2]) incurs no latency cost due to that burst. Thus, for each subsequence [[Sigma].sub.i], the burst latency incurred (by exactly the one of [S.sub.1] and [S.sub.2] that does not acknowledge immediately) is [t.sub.i] + [t.sub.[Epsilon]] since there is an extra delay of [t.sub.[Epsilon]] between A's acknowledgment and the acknowledgment of [S.sub.1] (or [S.sub.2]). Letting [l.sub.blips] be the latency costs incurred for all the blips, we get that

[C.sub.opt] [is less than or equal to] (C([S.sub.1]) + C([S.sub.2]))/2 [is less than or equal to] ((s + 2)[Eta] + (1 - [Eta])B [summation over (i = 1, ..., s)] ([t.sub.i] + [t.sub.[Epsilon]]) + [l.sub.blips])/2.

We now argue that c = [Eta]/2 + (s[Epsilon] + [l.sub.blips])/2 can be made arbitrarily small with respect to [C.sub.opt]. First, since the number of acknowledgments in the solution that led to our upper bound for [C.sub.opt] was only one larger than that from [C.sub.A], by increasing s we can make [Eta]/2 arbitrarily small with respect to [C.sub.opt]. Suppose that such an s has been selected and thus its value is now fixed.

We now consider the extra latency cost from c. These come from two sources. There is a cost of [c.sub.1] = s[Epsilon]/2 from the extra latency for the burst at each subsequence, and there is a cost of [c.sub.2] = [l.sub.blips]/2 from the latency costs for the blips. We now argue that both of these quantities are independent of B and thus by making B sufficiently large, we can make [c.sub.1] + [c.sub.2] arbitrarily small with respect to [C.sub.opt]. Clearly [c.sub.1] is independent of B and thus can be made arbitrarily small with respect to [C.sub.opt] by making B large. Since the latency cost for each subsequence is at most [Eta] and thus the duration is at most [Eta]/(B(1 - [Eta])), the number of blips in [[Sigma].sub.i] is at most ([Eta]/(B(1 - [Eta])))/[t.sub.[Epsilon]] = [Eta]/[Epsilon], which is a constant and thus independent of B. So the stated result follows. []

6.5. SIMULATION RESULTS UNDER [f.sub.sum]. We ran preliminary simulations to empirically evaluate the benefits of lookahead and to contrast our algorithms' performances with those of the heartbeat (HB) and interval (Int) timer algorithms (see Section 2 for a brief description of these algorithms). The data for our simulations came from a trace of two hours of all wide-area TCP traffic between the Lawrence Berkeley Laboratory and the rest of the world [Paxson and Floyd 1995]. This trace (LBL-TCP-3) is available at http://ita.ee.lbl.gov/. While the simulations are based on real data, for our initial simulations we simplified the traces. Specifically, we ignored [Tau] and all departures, so our algorithms only processed one long arrival sequence with no limits on individual acknowledgment delays. We also filtered out connections that did not fit our definition of a "normal" TCP connection, for example, connections that were reset before shutting down normally. Finally, we note that changing the delays of acknowledgments of packets can change the times of subsequent arrivals, so the arrival sequence that each algorithm processed may not be consistent with the sequence it would have seen if it were the algorithm actually used to delay the acknowledgments (see Section 8 for more on this). Our plots are shown in Figures 4 and 5.

[Graphs omitted]

Figure 4 shows each algorithm's average performance on 185 telnet connections from the telnet client to the telnet server (results were similar for the connections from the server to the client). Notice that the timer-based algorithms without lookahead were superior to ours with L = 0 for a particular value of [Eta] (the cost of an ACK), but as we moved away from this value, our algorithms without lookahead were superior to the timer-based algorithms even with lookahead. Our algorithms with lookahead were always the best. We saw similar results for the following other types of connections: NNTP (Usenet), SMTP (e-mail), HTTP (WWW) from the server to the client, gopher from the server to the client, FTP, and FTP-data (connections opened in an FTP session to transfer files). All the connections of these types tended to involve transmission of a large number of packets (typically at least 10-20, frequently [is greater than] 100).

Figure 5 shows each algorithm's average performance on 237 finger connections from the finger client to the finger server (results were similar for the connections from the server to the client). Both [greedy.sub.tot] and [greedy.sub.new] for L = 1 are optimal. Notice that the timer-based algorithms without lookahead were superior to ours for L = 0 for all values of [Eta] [is greater than or equal to] 0.23, and this trend does not seem inclined to change. But our algorithms with lookahead were still always the best. We saw similar results for the following other types of connections: HTTP from the client to the server and gopher from the client to the server. All the connections of these types tended to involve transmission of a small number of packets (typically [approximately equals] 2).

When [Eta] is large and input sequences are long, our algorithms typically work better than the timer-based ones because our algorithms are allowed to set alarms farther into the future while the timer-based algorithms always use constants in setting their alarms, so they cannot take advantage of cheap latency like ours can. Rather, they must send more expensive acknowledgments. This explanation is consistent with the observation that the two curves for an algorithm tend to converge as [Eta] increases, since increasing [Eta] makes latency cheaper, diminishing the gain from looking ahead. When [Eta] is very small, then latency is very expensive and the constants used by the timer-based algorithms cause them to send acknowledgments much too late, even if L = 1. This gives an inherent advantage to our algorithms since the amount they wait is related to the cost of waiting. This phenomenon is independent of the length of the input sequence, as evidenced in Figures 4 and 5.

When sequences consist of only a few arrivals and [Eta] is not extremely small, then very few acknowledgments are required. Thus, all the algorithms will probably send the same number of acknowledgments and the difference in performance between the algorithms will be the cost of the extra latency incurred. As latency becomes cheaper, [greedy.sub.tot] and [greedy.sub.new] set their alarms farther into the future while the timer-based algorithms set theirs the same distance into the future regardless of [Eta]. Thus, [greedy.sub.tot] and [greedy.sub.new] set their alarms much farther into the future than the timer-based algorithms do. Since the only appreciable difference between the algorithms' performances on short sequences is latency cost, it is not surprising that [greedy.sub.tot] and [greedy.sub.new] with L = 0 lose their advantage over the timer-based algorithms on short sequences for large [Eta]. Thus, it seems reasonable that short input sequences favor the timer-based algorithms unless [greedy.sub.tot] and [greedy.sub.new] have lookahead, in which case they are optimal or nearly optimal. Unfortunately, short input sequences do not facilitate inference of upcoming arrival times, so any learning algorithm used to predict arrival times might not do well. A possible exception is a learner that remembers the inter-arrival times for the last connection of the same type, for example, if the current connection is a finger connection, then the learner can look back to the inter-arrival times it learned from the previous finger connection.

Finally, we note two observations concerning our algorithms. First, the costs of our algorithms' solutions were always at most twice optimal (even with L = 0), which is encouraging since we know by Lemma 7 that under [f.sub.sum] there exists an input sequence forcing [greedy.sub.tot] to output a solution that is [Omega]([square root of n]) times optimal. Second, our algorithms with L = 1 were always at least as good as (and typically much better than) any other algorithm we tested. This implies that a good learning system to predict packet inter-arrival times would be very useful to any algorithm that delays acknowledgments, including the timer-based algorithms currently in use by many TCP implementations.

7. Objective Function [f.sub.max]

Recall Definition 3 of Section 3, which defined [f.sub.max]:

[f.sub.max](k, [t.sub.1], ..., [t.sub.k]) = k[Eta] + (1 - [Eta]) [summation over (i = 1, ..., k)] [max over ([a.sub.j] [element of] [[Sigma].sub.i])] ([t.sub.i] - [a.sub.j]),

where A = <[a.sub.1], ..., [a.sub.n]> is the sequence of packet arrivals, k is the number of acknowledgments, the acknowledgment for the packets in [[Sigma].sub.i] comes at [t.sub.i], and [Eta] [element of] [0, 1] weights the importance of minimizing acknowledgments against that of minimizing extra latency. We now present our algorithms for the acknowledgment delay problem under [f.sub.max].

7.1. AN OFF-LINE ALGORITHM UNDER [f.sub.max]. To optimally solve the acknowledgment delay problem under [f.sub.max], we can run [greedy.sub.tot] with lookahead 1, which will yield an optimal solution if there is no maximum delay constraint or if [Tau] [is less than] [Eta]/(1 - [Eta]) (see Section 7.2). This would run in O(n) time and also work in the presence of non-rush departures. But if there exists a maximum delay constraint of [Tau] [is greater than or equal to] [Eta]/(1 - [Eta]), then [greedy.sub.tot] with L = 1 might not be optimal. Thus, we make a simple adaptation to the dynamic programming solution of Section 6.1. Specifically, we change Lines 2 and 8 of Figure 1 to evaluate the function lat that corresponds to [f.sub.max]. The complete algorithm is shown in Figure 6. We can also extend this algorithm to handle non-rush departures and the maximum delay constraint in the same way we extended the algorithm of Section 6.1. So by Corollary 6 we have an O([n.sup.2]) time algorithm for the acknowledgment delay problem under [f.sub.max] in the presence of non-rush departures and a maximum delay constraint.
Figure 6: An optimal off-line algorithm that runs under [f.sub.max].

Off-Line-2

1   Initialize [M.sub.min] [0] [left arrow] 0
2   Initialize M [1,1] [left arrow] 1 [multiplied by] [Eta]
3   Initialize [M.sub.min] [1] [left arrow] M [1,1]
4   Initialize [M.sub.pt] [1] [left arrow] 1
5   For i [element of] [2,n]
6        [M.sub.min] [i] [left arrow] [infinity]
7        For j [element of] [1,i]
8             M [i,j] [left arrow] 1 [multiplied by] [Eta] +
               (1 - [Eta]) [multiplied by] ([a.sub.i] - [a.sub.i-j+1])
               + [M.sub.min] [i - j]
9             If M [i,j] [is less than] [M.sub.min] [i] then
10               [M.sub.min] [i] [left arrow] M [i,j]
11               [M.sub.pt] [i] [left arrow] j
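The recurrence of Figure 6 translates directly into code. Below is a minimal Python sketch (the function name is illustrative, not from the paper), assuming rush departures and no maximum delay constraint; the only change from the [f.sub.sum] dynamic program is the latency term, which is now the span of the subsequence rather than the sum of delays:

```python
def offline_ack_max(arrivals, eta):
    """O(n^2) dynamic program (Off-Line-2 sketch) under f_max.

    Returns (optimal cost, sorted 1-based indices of arrivals at which
    acknowledgments are sent). Assumes rush departures, no max delay.
    """
    n = len(arrivals)
    M_min = [0.0] + [float("inf")] * n
    M_pt = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            # subsequence a_{i-j+1..i}: max latency is a_i - a_{i-j+1}
            cost = (eta + (1 - eta) * (arrivals[i - 1] - arrivals[i - j])
                    + M_min[i - j])
            if cost < M_min[i]:
                M_min[i], M_pt[i] = cost, j
    acks, i = [], n  # trace back through M_pt
    while i > 0:
        acks.append(i)
        i -= M_pt[i]
    return M_min[n], sorted(acks)
```

For arrivals at 0, 1, 2 with [Eta] = 0.8, for instance, a single acknowledgment at the final arrival is optimal.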


7.2. ALGORITHM [greedy.sub.tot] UNDER [f.sub.max]. As in Section 6.2, first assume that all departures are rush and there is no limit on individual delays. Let [a.sub.j] be the time of the packet that just arrived. Let [Sigma] be the set of unacknowledged arrivals just after time [a.sub.j] (this includes the packet corresponding to [a.sub.j], so |[Sigma]| [is greater than or equal to] 1). If the algorithm waits some time [t.sub.tot,max] (i.e., until time [a.sub.j] + [t.sub.tot,max]) before sending an acknowledgment, then it will incur extra cost due to latency of (1 - [Eta])[t.sub.tot,max]. Also, it has already incurred a cost of (1 - [Eta])([a.sub.j] - [a.sub.first,i]) due to past latency, where [a.sub.first,i] is the first arrival of [[Sigma].sub.i]. Acknowledging immediately incurs a cost of [Eta] for the acknowledgment plus the same cost due to past latency. Thus, on each new arrival [a.sub.j], [greedy.sub.tot] sets an alarm at [a.sub.j] + [t.sub.tot,max] for [t.sub.tot,max] such that the cost for [Sigma] if an acknowledgment is not sent at [a.sub.j] (namely, (1 - [Eta])[t.sub.tot,max] + (1 - [Eta])([a.sub.j] - [a.sub.first,i])) is equal to the cost for [Sigma] if an acknowledgment is sent at [a.sub.j] (namely, [Eta] + (1 - [Eta])([a.sub.j] - [a.sub.first,i])). Solving for [t.sub.tot,max] yields

[t.sub.tot,max] = [Eta]/(1 - [Eta]).

If no new arrival occurs before the alarm sounds, then an acknowledgment is sent when the alarm sounds. Otherwise, the old alarm is deleted and a new one is set at [a.sub.j+1] + [Eta]/(1 - [Eta]). If L = 1, then [greedy.sub.tot] will send an acknowledgment at exactly [a.sub.j] if and only if ([a.sub.j+1] - [a.sub.j]) [is greater than] [t.sub.tot,max].
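
Under these assumptions (rush departures, no [Tau] constraint, L = 0), the alarm logic is only a few lines of code. The following Python sketch is ours, not the paper's implementation, and the names are hypothetical; it exploits the fact that resetting the alarm to [a.sub.j] + [Eta]/(1 - [Eta]) on each arrival means an acknowledgment fires exactly when the gap to the next arrival exceeds [Eta]/(1 - [Eta]).

```python
def greedy_tot_fmax(arrivals, eta):
    """Sketch of greedy_tot under f_max with no lookahead (L = 0), rush
    departures, and no maximum-delay constraint tau.

    arrivals: sorted list of arrival times; eta in (0, 1).
    Returns the acknowledgment (transmission) times.
    """
    wait = eta / (1 - eta)          # t_tot,max: alarm offset after each arrival
    acks = []
    # Pair each arrival with the next one; the sentinel +inf forces a final ack.
    for t, nxt in zip(arrivals, arrivals[1:] + [float("inf")]):
        if nxt > t + wait:          # the alarm sounds before the next arrival
            acks.append(t + wait)   # the ack is sent at the alarm time
    return acks
```

For example, with eta = 0.5 (so the alarm offset is 1 time unit) and arrivals at 0, 0.5, and 3, the first two arrivals share an acknowledgment at time 1.5 and the third is acknowledged at time 4.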

As with [greedy.sub.tot] under [f.sub.sum], we can extend this algorithm to accommodate nonrush departures and a maximum delay constraint of [Tau]. On the first arrival [a.sub.j] of a new subsequence, the algorithm sets a second alarm at [a.sub.j] + [Tau]. This alarm is not updated at new arrivals within the same subsequence. So an acknowledgment is sent whenever either alarm sounds, again making sure that there is only one departure per subsequence (Observation 5) and that all departures are sent in order of their ready times. In addition to the alarms provoking an acknowledgment transmission, this algorithm also immediately sends a departure if it encounters a second departure within the same subsequence (i.e., the transmission occurs at the second departure's ready time), which is a condition in addition to those for acknowledging under [f.sub.sum]. This is motivated by the following observation.

OBSERVATION 13. In an optimal solution under [f.sub.max], no departure will be scheduled past the next departure's ready time.

PROOF. By contradiction. Assume there exists an optimal solution S in which departure [d.sub.j] is scheduled past [d.sub.j+1]'s ready time. Since S is optimal, [d.sub.j]'s transmission time must coincide with some arrival [a.sub.i] (otherwise, it would have been sent at its ready time). We can slide [d.sub.j]'s departure time back to the last arrival between the ready times of [d.sub.j] and [d.sub.j+1] (or back to [d.sub.j]'s ready time if no such arrivals exist), and let [d.sub.j+1] acknowledge [a.sub.i] in addition to the other arrivals it acknowledges. This decreases total latency under [f.sub.max] since latency cost increases linearly with time independent of the number of unacknowledged arrivals and unsent departures. Also, since both [d.sub.j] and [d.sub.j+1] must be transmitted anyway, the number of transmissions does not increase. Thus, the total cost decreases, contradicting the optimality of S. []

Finally, we note that the time complexity per arrival needed to update [t.sub.tot,max] is constant.

Despite [greedy.sub.tot]'s poor performance under [f.sub.sum], under [f.sub.max] it does quite well. Before proving upper bounds on [greedy.sub.tot]'s competitive ratio, we start with a lemma about the structure of an optimal solution.

LEMMA 14. Under [f.sub.max], there exists an optimal solution S that places an acknowledgment at [a.sub.j] if and only if ([a.sub.j+1] - [a.sub.j])(1 - [Eta]) [is greater than] [Eta].

PROOF. Consider a subsequence [[Sigma].sub.l] = <[a.sub.i], [a.sub.i+1], ..., [a.sub.m]> in S. The total cost for [[Sigma].sub.l] is exactly ([a.sub.m] - [a.sub.i])(1 - [Eta]) + [Eta] (the first term is for the latency, the second for the acknowledgment). We can divide this cost among the arrivals in [[Sigma].sub.l] by associating with each [a.sub.j] [element of] {[a.sub.i], ..., [a.sub.m-1]} a latency cost of ([a.sub.j+1] - [a.sub.j])(1 - [Eta]) and charging [a.sub.m] the cost [Eta] for the acknowledgment. This accounts for all of the cost for [[Sigma].sub.l], so no additional cost is associated with any arrivals in [[Sigma].sub.l].

Now consider the case when ([a.sub.j+1] - [a.sub.j])(1 - [Eta]) [is greater than] [Eta]. If S did not send an acknowledgment at [a.sub.j], then [a.sub.j] was charged [is greater than] [Eta] for the latency between [a.sub.j] and [a.sub.j+1]. But if S did send an acknowledgment at [a.sub.j], then [a.sub.j] was charged exactly [Eta]. Thus, it is better to send an acknowledgment at [a.sub.j] than it is to wait. For the case when ([a.sub.j+1] - [a.sub.j])(1 - [Eta]) [is less than or equal to] [Eta], the cost of sending an acknowledgment at [a.sub.j] is at least as much as the cost of waiting. []
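
Lemma 14 turns the off-line problem (still assuming rush departures and no [Tau]) into a single left-to-right scan: acknowledge at [a.sub.j] exactly when the gap to [a.sub.j+1] makes waiting more expensive than an acknowledgment. A Python sketch of this rule follows; the function name and structure are our own, not the paper's.

```python
def optimal_acks_fmax(arrivals, eta):
    """Optimal off-line ack positions under f_max via the rule of Lemma 14:
    acknowledge at a_j exactly when (a_{j+1} - a_j)(1 - eta) > eta.
    The final arrival is always acknowledged.
    """
    acks = []
    for j, t in enumerate(arrivals):
        last = (j == len(arrivals) - 1)
        if last or (arrivals[j + 1] - t) * (1 - eta) > eta:
            acks.append(t)
    return acks
```

With eta = 0.9 the acknowledgment threshold gap is 9 time units, so arrivals at 0, 1, 2 share a single acknowledgment, matching what the O(n^2) dynamic program computes.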

Now we prove that [greedy.sub.tot] is 2-competitive under [f.sub.max]. Combined with Theorem 23 (Section 7.4), this means that [greedy.sub.tot] is an optimal on-line algorithm under [f.sub.max] when L = 0.

THEOREM 15. Under [f.sub.max], [greedy.sub.tot] is 2-competitive with lookahead L = 0.

PROOF. In this proof we use S and the same cost accounting strategy used in the proof of Lemma 14. First note that from Observation 8, [greedy.sub.tot] avoids incurring more than [Eta]/(1 - [Eta]) units of extra latency (for a latency cost of [Eta]) between any two consecutive arrivals [a.sub.j] and [a.sub.j+1]. We now consider the two possible cases for this pair of arrivals.

If [a.sub.j+1] - [a.sub.j] [is less than or equal to] [Eta]/(1 - [Eta]), then by Observation 8 and Lemma 14, [greedy.sub.tot] does exactly the same thing that S does, namely not send an acknowledgment at [a.sub.j]. If instead [a.sub.j+1] - [a.sub.j] [is greater than] [Eta]/(1 - [Eta]), then S sends an acknowledgment at [a.sub.j] (by Lemma 14), charging [Eta] to [a.sub.j]. But [greedy.sub.tot] waits until [a.sub.j] + [Eta]/(1 - [Eta]) and then sends an acknowledgment, charging 2[Eta] to [a.sub.j] ([Eta] for the latency and [Eta] for the acknowledgment). Note that no matter which case applies, [greedy.sub.tot] is synchronized with S, that is, at any arrival [a.sub.j], [greedy.sub.tot]'s solution always has the same set of unacknowledged arrivals as S. Thus the cost attributed to any arrival in [greedy.sub.tot]'s solution is at most twice that of the same arrival in S. Therefore, [C.sub.[greedy.sub.tot]] [is less than or equal to] 2[C.sub.opt]. []

Obviously [greedy.sub.tot] is better suited for [f.sub.max] than [f.sub.sum]. But in fact, Theorem 15 is only part of the story. In the proof of Theorem 15, the only difference between the costs incurred by [greedy.sub.tot] and S came from their behaviors when [a.sub.j+1] - [a.sub.j] [is greater than] [Eta]/(1 - [Eta]). Namely, [greedy.sub.tot] waited until [a.sub.j] + [Eta]/(1 - [Eta]) before sending an acknowledgment while S sent its acknowledgment at [a.sub.j]. But if [greedy.sub.tot] knew at time [a.sub.j] that the next arrival was too far into the future, it would have sent an acknowledgment at [a.sub.j] just as S did. Thus, we get the following corollary.

COROLLARY 16. Under [f.sub.max], [greedy.sub.tot] is 1-competitive with lookahead L = 1.

Note that in Theorem 15 and Corollary 16, we ignore [Tau] and assume that all departures are rush. By Observations 13 and 5, nonrush departures do not alter [greedy.sub.tot]'s competitive ratios for L = 0 and L = 1 since [greedy.sub.tot] is still synchronized with S. However, adding [Tau] can change the situation if it is sufficiently large. We start by exploring when adding [Tau] does not change the bounds on the competitive ratios. If [Tau] [is less than] [Eta]/(1 - [Eta]), then (since [greedy.sub.tot] and optimal solution S are synchronized without [Tau]) if [greedy.sub.tot] must send an acknowledgment due to [Tau], then so must S. So the algorithms remain synchronized. Thus, [greedy.sub.tot] with L = 1 remains optimal and [greedy.sub.tot] with L = 0 still only pays for extra latency, but this latency still does not exceed [Eta]/(1 - [Eta]), just as in the proof of Theorem 15. This yields the following corollary.

COROLLARY 17. Under [f.sub.max], when including [Tau] [is less than] [Eta]/(1 - [Eta]) and nonrush departures, [greedy.sub.tot] is 2-competitive for L = 0 and 1-competitive for L = 1.

When [Tau] [is greater than or equal to] [Eta]/(1 - [Eta]), the proof of Theorem 15 does not apply. But we can still show a bound on the competitive ratio for [greedy.sub.tot].

THEOREM 18. Under [f.sub.max], with lookahead L = 1 and for any finite [Tau] [is greater than or equal to] [Eta]/(1 - [Eta]), [greedy.sub.tot](8) has a competitive ratio of [is less than or equal to] 1 + [Eta]/([Tau](1 - [Eta])) [is less than or equal to] 2. For L = 0, the competitive ratio is [is less than or equal to] 1 + 2[Eta]/([Tau](1 - [Eta])) [is less than or equal to] 3.

PROOF. We prove the theorem for L = 1 and then state how to adapt it for L = 0.

We divide the acknowledgments into those sent because of the [Tau] requirement ([Tau]-acks) and the rest (regular acks).(9) We partition the solution created by [greedy.sub.tot] into groups where each group consists of any number of subsequences ended by [Tau]-acks followed by a subsequence ended by a regular ack. (So each regular ack subsequence is the last subsequence in the group.) Let l be the latency cost (for L = 1) for the subsequence that ends with the regular ack.

We show in Lemma 19 below that there exists an optimal solution S that places an acknowledgment corresponding to each regular ack of [greedy.sub.tot]. Thus, we can treat each group as the first one. Let a [Tau]-interval be any subsequence that ends with a [Tau]-ack. We show in Lemma 20 that the latency cost for each [Tau]-interval in S is at least ([Tau] - [Eta]/(1 - [Eta]))(1 - [Eta]). Since [Tau] [is greater than or equal to] [Eta]/(1 - [Eta]), this quantity is nonnegative. Also, we note that S must place an acknowledgment in each [Tau]-interval (otherwise it would violate the maximum delay constraint). Thus, if there are q [Tau]-acks,(10) then S pays q[Eta] for the [Tau]-acks, [Eta] for the regular ack, at least q([Tau] - [Eta]/(1 - [Eta]))(1 - [Eta]) for the [Tau]-interval latency, and l for the regular ack latency (see Lemma 21). So the cost for S is at least

(q + 1)[Eta] + q(1 - [Eta])([Tau] - [Eta]/(1 - [Eta])) + l = l + [Eta] + q[Tau](1 - [Eta]).

Also, [greedy.sub.tot]'s cost (for L = 1) is at most

(3) (q + 1)[Eta] + q[Tau](1 - [Eta]) + l = q[Eta] + l + [Eta] + q[Tau](1 - [Eta]).

Hence, the competitive ratio of [greedy.sub.tot] is at most

1 + q[Eta]/(l + [Eta] + q[Tau](1 - [Eta])) [is less than or equal to] 1 + [Eta]/([Tau](1 - [Eta]))

since l and [Eta] are nonnegative.

The only difference between L = 1 and L = 0 is that for L = 0, [greedy.sub.tot] incurs more latency. Specifically, it pays [Tau](1 - [Eta]) for each [Tau] subsequence and at most l + (1 - [Eta])[Eta]/(1 - [Eta]) = l + [Eta] for the subsequence ending with a regular ack. Equation 3 has already accounted for the latency cost for the [Tau] subsequences, so we merely add [Eta] to the latency cost for L = 1 to correct for L = 0. So the competitive ratio for L = 0 is at most

1 + (q + 1)[Eta]/(l + [Eta] + q[Tau](1 - [Eta])) [is less than or equal to] 1 + (q[Eta] + [Eta])/(q[Tau](1 - [Eta])) [is less than or equal to] 1 + 2[Eta]/([Tau](1 - [Eta])). []

We now prove the technical lemmas required to complete the proof of Theorem 18.

LEMMA 19. Let everything be as in the proof of Theorem 18. Then there exists an optimal solution S that places an acknowledgment corresponding to each regular ack of [greedy.sub.tot].

PROOF. Let [a.sub.r] be the arrival that [greedy.sub.tot] sent its regular ack at, so [a.sub.r] is at the end of the regular ack subsequence at the end of a group. Since [greedy.sub.tot] sent a regular ack at [a.sub.r], it must be the case that [a.sub.r+1] - [a.sub.r] [is greater than] [Eta]/(1 - [Eta]). So by Lemma 14, S must also place an acknowledgment at [a.sub.r]. []

LEMMA 20. Let everything be as in the proof of Theorem 18. Then the latency cost for each [Tau]-interval in S is at least ([Tau] - [Eta]/(1 - [Eta]))(1 - [Eta]).

PROOF. By the definition of [greedy.sub.tot], the time between two consecutive arrivals in a [Tau]-interval is [is less than] [Eta]/(1 - [Eta]), and the time between the last arrival of the [Tau]-interval and the next arrival is [is less than] [Eta]/(1 - [Eta]) (otherwise, a regular ack would have been sent). See Figure 7 for an example. Thus, the optimal solution cannot benefit by having more than one acknowledgment in each [Tau]-interval (by Lemma 14), but it must place at least one acknowledgment in the [Tau]-interval so it does not violate the maximum delay constraint. Since S places exactly one acknowledgment in the [Tau]-interval, it only saves [is less than or equal to] [Eta]/(1 - [Eta]) in latency for the entire interval and must pay for the remaining latency. The remaining latency is at least [Tau] - [Eta]/(1 - [Eta]), so the latency cost for each [Tau]-interval in S is at least ([Tau] - [Eta]/(1 - [Eta]))(1 - [Eta]). []

[ILLUSTRATION OMITTED]

LEMMA 21. Let everything be as in the proof of Theorem 18. Then the latency cost for S for the regular ack subsequence is at least l.

PROOF. By Lemma 19, S places an acknowledgment at the last arrival of the regular ack subsequence. Since each two consecutive arrivals in the regular ack subsequence are separated by [is less than] [Eta]/(1 - [Eta]), by Lemma 14 S will not place an acknowledgment elsewhere in that subsequence. Thus, S incurs as much latency as [greedy.sub.tot] does for L = 1, so S's latency cost for the regular ack subsequence is l. []

7.3. ALGORITHM [greedy.sub.new] UNDER [f.sub.max]. We now describe how [greedy.sub.new] behaves under [f.sub.max]. Specifically, on each new arrival [a.sub.j] [element of] [[Sigma].sub.i], [greedy.sub.new] sets an alarm at [a.sub.j] + [t.sub.new,max] for [t.sub.new,max] such that

(1 - [Eta])[t.sub.new,max] + (1 - [Eta])([a.sub.j] - [a.sub.first,i]) = [Eta].

Solving for [t.sub.new,max] yields

(4) [t.sub.new,max] = [Eta]/(1 - [Eta]) - ([a.sub.j] - [a.sub.first,i])

which can be computed in constant time per arrival.
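
Equation (4) says that [greedy.sub.new]'s alarm always expires [Eta]/(1 - [Eta]) time units after the first arrival of the current subsequence, regardless of how many arrivals follow it. A Python sketch under the same simplifying assumptions as before (rush departures, no [Tau], L = 0); the function and variable names are ours.

```python
def greedy_new_fmax(arrivals, eta):
    """Sketch of greedy_new under f_max with L = 0: each subsequence has a
    fixed deadline a_first + eta/(1 - eta), so the alarm offset on arrival
    a_j is t_new,max = eta/(1 - eta) - (a_j - a_first), as in Equation (4).
    """
    budget = eta / (1 - eta)
    acks, first = [], None
    for t, nxt in zip(arrivals, arrivals[1:] + [float("inf")]):
        if first is None:
            first = t                  # a_first,i of the current subsequence
        deadline = first + budget      # equivalently, the alarm a_j + t_new,max
        if nxt > deadline:             # the alarm sounds before the next arrival
            acks.append(deadline)
            first = None               # the next arrival opens a new subsequence
    return acks
```

With eta = 0.5 (budget 1 time unit) and arrivals at 0, 0.6, 0.8, and 3, the first three arrivals are acknowledged together at time 1.0 (when the subsequence's budget runs out) and the last at time 4.0.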

Since [f.sub.max] is a special case of [f.sub.lat], Observation 9 holds for it as well. We also get from Corollary 11 that [greedy.sub.new] with L = 0 is 2-competitive under [f.sub.max] in the presence of nonrush departures and a maximum delay constraint. According to Theorem 23, for L = 0 this is the best any on-line algorithm can do in the worst case.

While [greedy.sub.new] is optimal under the general-purpose objective function [f.sub.lat] for no lookahead, [greedy.sub.tot] has an advantage over it under [f.sub.max] for L = 1. Corollary 16 and Theorem 18 state that [greedy.sub.tot] is better than 2-competitive with L = 1, but as we show below, [greedy.sub.new] with L = 1 still cannot beat a competitive ratio of 2.

OBSERVATION 22. Under [f.sub.max], even with L = 1, there exists an arrival sequence such that [C.sub.[greedy.sub.new]] [is greater than or equal to] 2[C.sub.opt] - c where c can be made arbitrarily small with respect to [C.sub.opt].

PROOF. First, recall Observation 9, which says that under [f.sub.lat], [greedy.sub.new] acknowledges if and only if the total latency of a subsequence reaches [Eta]/(1 - [Eta]). The adversary exploits this property by defining an input arrival sequence consisting of m groups, where each group contains x + 1 arrivals, each separated by time [Eta]/(x(1 - [Eta])). (See Figure 8 for an example.) Since [greedy.sub.new] sends an acknowledgment at the end of each group, its total cost is

[C.sub.[greedy.sub.new]] = m([Eta] + (1 - [Eta]) [multiplied by] [Eta]/(1 - [Eta])) = 2m[Eta].

[ILLUSTRATION OMITTED]

An optimal solution is at least as good as the strategy to send just one acknowledgment at [a.sub.n] for n = m(x + 1). This strategy pays [Eta] for the single acknowledgment plus the latency costs within the groups plus the latency cost between the groups. This yields a cost of

m[Eta] + [Eta] + (m - 1)[Eta]/x.

By making m arbitrarily large, the adversary can make the second term arbitrarily small with respect to [C.sub.opt]. After fixing m, the adversary can make the last term arbitrarily small with respect to [C.sub.opt] by making x arbitrarily large. []
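
As a back-of-the-envelope check of Observation 22, the following hypothetical helper (ours, not the paper's) computes the ratio of [greedy.sub.new]'s cost to the single-acknowledgment upper bound on the optimum, assuming consecutive groups are separated by the same spacing [Eta]/(x(1 - [Eta])) used within groups.

```python
def adversary_ratio(m, x, eta):
    """Cost ratio C_greedy_new / (upper bound on C_opt) for the adversary
    sequence of Observation 22: m groups of x + 1 arrivals, with consecutive
    arrivals (within and, by assumption, between groups) delta apart.
    """
    delta = eta / (x * (1 - eta))
    n = m * (x + 1)                              # total number of arrivals
    # greedy_new: per group, eta for the ack plus the group's latency cost
    # (1 - eta) * (x * delta) = eta, i.e., 2 * eta per group.
    greedy = m * (eta + (1 - eta) * x * delta)
    # Upper bound on the optimum: a single ack at a_n pays eta plus the
    # f_max latency of the whole span.
    opt_upper = eta + (1 - eta) * (n - 1) * delta
    return greedy / opt_upper
```

As m and x grow, the ratio approaches 2 from below, matching the observation.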

7.4. A LOWER BOUND UNDER [f.sub.max]. We now give some lower bounds for the acknowledgment delay problem under [f.sub.max]. First recall that for L = 1, Corollary 16 says that [greedy.sub.tot] is 1-competitive if we ignore [Tau] or if [Tau] [is less than] [Eta]/(1 - [Eta]), and Theorem 18 says that [greedy.sub.tot] is (1 + [Eta]/[Tau](1 - [Eta])))-competitive for finite [Tau] [is greater than or equal to] [Eta]/(1 - [Eta]). We do not know if the latter bound is tight (i.e., can we be 1-competitive in the presence of finite [Tau] [is greater than or equal to] [Eta]/(1 - [Eta])?), but obviously no on-line algorithm can be better than 1-competitive. So here we focus on the more interesting problem of lowerbounding the case for L = 0 and study the general objective function [f.sub.lat].

First, recall Definition 1 that defines [f.sub.lat] as any objective function for which the latency cost increases with the amount of time that an acknowledgment is delayed. By using the standard adversary strategy that places the next arrival in the sequence immediately after the on-line algorithm acknowledges, we can prove that any deterministic on-line algorithm without lookahead is no better than 2-competitive under [f.sub.lat]. This means that for L = 0 both [greedy.sub.tot] and [greedy.sub.new] are optimal under [f.sub.max], but it also means that [greedy.sub.new] is optimal for L = 0 under [f.sub.lat]. Of course, [greedy.sub.tot] is not optimal under [f.sub.lat] because Lemma 7 tells us that [greedy.sub.tot] can be as bad as [Omega]([square root of n])-competitive under [f.sub.sum], which is a special case of [f.sub.lat].

THEOREM 23. Let A be any deterministic on-line algorithm for the acknowledgment delay problem with L = 0. Then under [f.sub.lat] there exists an arrival sequence such that [C.sub.A] [is greater than or equal to] 2[C.sub.opt] - c where c can be made arbitrarily small with respect to [C.sub.opt].

PROOF. The proof is a simplified version of the proof of Theorem 12. In this case, the adversary places arrival [a.sub.j+1] immediately(11) after A acknowledges [a.sub.j]. Thus A sends an acknowledgment for every packet. Define [l.sub.j] to be the latency accumulated between arrival [a.sub.j] and [a.sub.j]'s acknowledgment. Then for a sequence of n arrivals, the cost of A's solution is

[C.sub.A] = n[Eta] + [l.sub.1] + [l.sub.2] + ... + [l.sub.n].

We now upperbound the cost of the optimal solution by averaging the costs of solution [S.sub.1] that acknowledges immediately after the odd-numbered arrivals and solution [S.sub.2] that acknowledges immediately after the even-numbered arrivals. Since both solutions must acknowledge all packets, there are a total of n + 1 acknowledgments between the two solutions. Also, between the two solutions each interval of latency [l.sub.j] is incurred exactly once, except [l.sub.n], which neither of them incurs. Thus, the average of [S.sub.1]'s cost and [S.sub.2]'s cost is

([C.sub.[S.sub.1]] + [C.sub.[S.sub.2]])/2 = (n + 1)[Eta]/2 + ([l.sub.1] + [l.sub.2] + ... + [l.sub.n-1])/2 [is less than or equal to] ([C.sub.A] + [Eta])/2.

By making n arbitrarily large, the adversary can make [Eta]/2 arbitrary small with respect to [C.sub.A]. Since the cost of [S.sub.1] or [S.sub.2] is at least as good as their average, [C.sub.A] can be made arbitrarily close to 2[C.sub.opt]. []

7.5. SIMULATION RESULTS UNDER [f.sub.max]. We ran simulation results under [f.sub.max] similar to those we ran under [f.sub.sum] (Section 6.5), using the same data and applying the same methodology. Our results are given in Figures 9 and 10. Overall, we found the results under [f.sub.max] similar to those under [f.sub.sum] in general form, except that under [f.sub.max], the fixed timer-based algorithms fared much worse for long connections. Also, empirically we saw that [greedy.sub.tot] performed at least as well as (and often better than) [greedy.sub.new], where under [f.sub.sum] the opposite was true.

[Graphs omitted]

This is not surprising since in terms of loss bounds, [greedy.sub.tot] is at least as good as [greedy.sub.new], and in particular, [greedy.sub.tot] is 1-competitive for L = 1.

Figure 9 shows each algorithm's average performance on 185 telnet connections from the telnet client to the telnet server (results were similar for the connections from the server to the client). Notice that the timer-based algorithms without lookahead were superior to our algorithms with L = 0 for a particular value of [Eta], but as we moved away from this value, our algorithms without lookahead were superior to the timer-based algorithms even with lookahead (note that under [f.sub.max], this advantage is more pronounced than under [f.sub.sum]). Our algorithms with lookahead were always the best. We saw similar results for the following other types of connections: NNTP (Usenet), SMTP (e-mail), HTTP (WWW) from the server to the client, gopher from the server to the client, FTP, and FTP-data. All the connections of these types tended to involve transmission of a large number of packets (typically at least 10-20, frequently [is greater than] 100).

Figure 10 shows each algorithm's average performance on 237 finger connections from the finger client to the finger server (results were similar for the connections from the server to the client). Both [greedy.sub.tot] and [greedy.sub.new] for L = 1 are optimal. Notice that the timer-based algorithms without lookahead were superior to ours for L = 0 for sufficiently large [Eta], and this trend does not seem inclined to change. But our algorithms with lookahead were still always the best. We saw similar results for the following other types of connections: HTTP from the client to the server and gopher from the client to the server. All the connections of these types tended to involve transmission of a small number of packets (typically [approximately equals] 2).

In summary, our general observations concerning how [greedy.sub.tot] and [greedy.sub.new] compare to the timer-based algorithms are the same as for [f.sub.sum] in Section 6.5. Our explanations for these observations are also the same. However, how [greedy.sub.tot] performs relative to [greedy.sub.new] under [f.sub.max] is a different story. For L = 0, [greedy.sub.tot] was noticeably better than [greedy.sub.new] for telnet, FTP-data, and NNTP. [greedy.sub.new] outperformed [greedy.sub.tot] for gopher from the server to the client, on SMTP for [Eta] [is greater than or equal to] 0.5, and for all the short connection types. For all other connections, [greedy.sub.tot] and [greedy.sub.new] performed comparably. For L = 1, naturally [greedy.sub.tot] produces an optimal solution, so the only question is how well [greedy.sub.new] fares with L = 1. For all short connections, [greedy.sub.new]'s performance was similar to that under [f.sub.sum]: it did so well that its curve is very near 1. For long connections, its performance was mixed. Its performance for telnet, NNTP, FTP-data from client to server, and FTP from client to server was similar to that in Figure 9. For all other long connections it fared much better: its performance curve was very near 1. Again, our algorithms' outstanding performance for L = 1 makes a good learning system desirable.

8. Concluding Remarks

In this paper, we presented two on-line algorithms for the acknowledgment delay problem, and proved several bounds on their performance when measured by two objective functions that we defined. We then gave initial simulation results that indicate that our algorithms perform well empirically under our objective functions for sufficiently long TCP connections. They also show that the ability to predict the future could be very valuable in enhancing the performance of our algorithms.

Avenues of future work include exploring randomized algorithms for this problem and developing a model that takes packet sizes into account. Also, it would be nice to extend our work under a model in which the lookahead oracle did not provide the exact times for the next L arrivals but rather gave good estimates for these arrival times. Another research direction is to examine other objective functions such as the average latency per packet per subsequence (which is a special case of [f.sub.lat]) or some completely different way of measuring network performance such as throughput or how our algorithms affect burstiness of transmissions. (We chose our objective functions in part due to the fact that our on-line algorithms can directly control how their actions influence the functions' values, unlike, e.g., throughput.) We also plan on performing more extensive simulations, running our algorithms on more traces and including departures and the [Tau] constraint. After developing good learners to act as lookahead oracles, we plan to simulate our algorithms in a network simulator such as ns [UCB/LBNL 2000]. This is necessary to assess how our algorithms affect other aspects of TCP, including round-trip time estimates and packet retransmissions [Stevens 1994].

Another interesting problem to study is to empirically and theoretically analyze our algorithms accounting for their decisions' influence on the system (recall that in our simulations, we ignored these effects). A theoretical analysis might leverage off of fully- or partially-observable Markov decision processes (MDPs). While working with MDP models, a natural next step would be to combine the control algorithm with prediction, that is, use reinforcement learning (e.g., Sutton and Barto [1998]) to learn a control policy that decides when to send acknowledgments based on the current state of the world.

Finally, we can explore the applicability of our model and algorithms to delaying acknowledgments in other contexts, such as in the data link layer over noisy channels [Tanenbaum 1996].

ACKNOWLEDGMENTS. The authors thank George Varghese, Girish P. Chandranmenon, Sally Floyd, Jonathan Turner, and Ellen Zegura for their helpful discussions. We also thank Dan Blandford for his careful reading of a draft of this paper, and the reviewers for STOC and JACM and Mike Kearns for their helpful comments. We also appreciate Thomas Erlebach for identifying a bug in an earlier draft of this paper. Stephen Scott performed this work at Washington University.

(1) In this paper, we use latency to refer to the delay introduced by postponing the transmission of the acknowledgment message. There is additional latency introduced by the transmission of messages, but we do not affect this by delaying acknowledgments.

(2) The time units used for measuring latency would nominally be the same as those used in the system under consideration (for time stamping, etc.).

(3) It seems unlikely that any reasonable objective function would remain constant or decrease when an acknowledgment is further delayed. This would be tantamount to rewarding an algorithm for indefinitely delaying an acknowledgment.

(4) Frequently an algorithm A is said to be [Alpha]-competitive if [C.sub.A] [is less than or equal to] [Alpha][C.sub.opt] + c where c is a constant that is independent of [C.sub.opt].

(5) Recall that the oracle returns [a.sub.j] = [infinity] for all j [is greater than] n.

(6) If instead [greedy.sub.tot] chooses to acknowledge [a.sub.j] immediately if [a.sub.j + 1] - [a.sub.j] = [t.sub.tot,sum], then the adversary can place arrival [a.sub.j + 1] at [a.sub.j] + [t.sub.tot,sum] - O(1/n). This will reduce [greedy.sub.tot]'s competitive ratio by a constant amount, but not affect its asymptotic growth.

(7) An acknowledgment sent at time [t.sub.i] acknowledges all arrivals that come at that time. So the start of interval i + 1 is at some time [is greater than] [t.sub.i]. Thus, the intervals are disjoint.

(8) For this result we require that [greedy.sub.tot] is modified so that it sends an acknowledgment if the next arrival is at the alarm time.

(9) If the acknowledgment qualifies as both a [Tau]- and a regular ack, we call it a regular ack.

(10) For this proof we assume that q [is greater than] 0 for some group. Otherwise, [Tau] is irrelevant and [greedy.sub.tot] remains synchronized with S, implying that the bounds of Corollary 17 hold.

(11) While [a.sub.j+1] will not coincide with [a.sub.j]'s acknowledgment, they can be arbitrarily close. So for ease of exposition we will assume that they are coincident.

REFERENCES

BELLMAN, R. 1957. Dynamic Programming. Princeton University Press, Princeton, N.J.

BEN-DAVID, S., AND BORODIN, A. 1994. A new measure for the study of on-line algorithms. Algorithmica 11, 73-91.

BORODIN, A., LINIAL, N., AND SAKS, M. 1992. An optimal on-line algorithm for metrical task system. J. ACM 39, 4 (Apr.), 745-763.

BRADEN, R. 1989. Requirements for Internet hosts--communication layers. Internet Request for Comments 1122. http://www.cis.ohio-state.edu/htbin/rfc/rfc1122.html.

CLARK, D. 1982. Window and acknowledgment strategy in TCP. Internet Request for Comments 813. http://www.cis.ohio-state.edu/htbin/rfc/rfc813.html.

CORMEN, T. H., LEISERSON, C. E., AND RIVEST, R.L. 1990. Introduction to Algorithms. MIT Press, Cambridge, Mass.

COSTELLO, A., AND VARGHESE, G. 1998. Redesigning the BSD callout and timer facilities. Softw. Prac. Exp. 28, 8, 883-896.

GROVE, E. F. 1995. Online bin packing with lookahead. In Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms (San Francisco, Calif., Jan. 22-24). ACM, New York, pp. 430-436.

KARLIN, A. R., MANASSE, M. S., MCGEOCH, L. A., AND OWICKI, S. 1994. Competitive randomized algorithms for non-uniform problems. Algorithmica 11, 542-571.

KARLIN, A. R., MANASSE, M. S., RUDOLPH, L., AND SLEATOR, D. D. 1988. Competitive snoopy caching. Algorithmica 3, 1, 79-119.

KRISHNAN, P., LONG, P. M., AND VITTER, J. S. 1999. Adaptive disk spindown via optimal rent-to-buy in probabilistic environments. Algorithmica 23, 1, 31-56.

PAXSON, V. 1997. Measurements and analysis of end-to-end internet dynamics. Ph.D. dissertation. University of California, Berkeley, Calif.

PAXSON, V., AND FLOYD, S. 1995. Wide-area traffic: The failure of Poisson modeling. IEEE/ACM Trans. Netw. 3, 3 (June), 226-244.

SLEATOR, D., AND TARJAN, R. 1985. Amortized efficiency of list update and paging rules. Commun. ACM 28, 2 (Feb.), 202-208.

STEVENS, W. R. 1994. TCP/IP Illustrated, Vol. 1: The Protocols. Addison-Wesley, Reading, Mass.

SUTTON, R. S., AND BARTO, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Mass.

TANENBAUM, A. S. 1996. Computer Networks (Third ed.). Prentice-Hall, Englewood Cliffs, N.J.

UCB/LBNL. 2000. Network simulator ns. http://www-mash.cs.berkeley.edu/ns/.

WRIGHT, G. R., AND STEVENS, W. R. 1995. TCP/IP Illustrated, Vol. 2: The Implementation. Addison-Wesley, Reading, Mass.

YEH, T.-H., KUO, C.-M., LEI, C.-L., AND YEN, H.-C. 1998. Competitive analysis of on-line disk scheduling. Theory Comput. Syst. 31, 5, 491-506.

RECEIVED DECEMBER 1998; REVISED AUGUST 2000; ACCEPTED AUGUST 2000

An earlier version of this paper appeared in Proceedings of the 30th Annual ACM Symposium on Theory of Computing (Dallas, Tex., May 23-26). ACM, New York, 1998, pp. 389-398.

This work was supported in part by National Science Foundation (NSF) grant CCR 97-34940 and NSF NYI grant CCR 93-57707 with matching funds provided by Xerox PARC and WUTA.

Authors' addresses: D. R. Dooly and S. A. Goldman, Department of Computer Science, Washington University in St. Louis, Campus Box 1045, One Brookings Drive, St. Louis, MO 63130-4899, e-mail: {drd1,sg}@cs.wustl.edu; S. D. Scott, Department of Computer Science & Engineering, University of Nebraska, Ferguson 115, Lincoln, NE 68588-0115, e-mail: sscott@cse.unl.edu.

[C] 2001 ACM 0004-5411/01/0300-0243 $05.00
COPYRIGHT 2001 Association for Computing Machinery, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2001 Gale, Cengage Learning. All rights reserved.

Article Details

Author: DOOLY, DANIEL R.; GOLDMAN, SALLY A.; SCOTT, STEPHEN D.
Publication: Journal of the Association for Computing Machinery
Date: Mar 1, 2001
