A genetic-based fuzzy Q-learning flow controller for high-speed networks

1. Introduction

The growing interest in congestion problems in high-speed networks stems from the need to control the sending rates of traffic sources. Congestion results from a mismatch between the offered load and the available link bandwidth between network nodes. It can cause a high packet loss ratio (PLR) and long delays, and can even bring down the entire network through congestion collapse. High-speed networks therefore need an applicable flow control scheme, not only to guarantee the quality of service (QoS) of existing links but also to achieve high system utilization. Flow control in high-speed networks is difficult because different traffic patterns are uncertain and highly time-varying. Flow control mainly checks whether the bandwidth and buffer space needed to guarantee the requested QoS are available. A major problem here is the lack of information about the characteristics of the source flows. Devising a mathematical model
In this case, reinforcement learning (RL) shows particular strengths, because it needs only very simple evaluative information, such as "right" or "wrong" [3]. RL does not depend on a mathematical model of, or prior knowledge about, the system. It acquires knowledge through trial and error and through interaction with the environment, and uses that knowledge to improve its behavior policy; in this sense it is self-learning. Because of these advantages, RL has played an important role in flow control for high-speed networks [4-7]. The Q-learning algorithm of RL is easy to apply and has a firm theoretical foundation. In [8], a Metropolis-criterion-based Q-learning controller is proposed to solve the flow control problem in high-speed networks. In Q-learning-based control, the learning agent should visit each state within a reasonable time. In high-speed networks, however, the state space is large, so the usual approach of storing the Q-values in a look-up table is impractical. In [8], a state space partitioning method is introduced to reduce the number of state variables, but it does not fully solve this problem. In this paper, we adopt fuzzy Q-learning (FQL), an adaptation of Q-learning to the fuzzy inference system (FIS), to generalize over the state space. In FQL, both the actions and the Q-values are inferred from fuzzy rules, so a state-action pair can be mapped to a Q-value in a continuous state space. Furthermore, we use the changes of the q values as fitness values and apply genetic operators to obtain the consequent parts of the fuzzy rules. In this paper, a genetic-based fuzzy Q-learning flow controller (GFQC) for high-speed networks is proposed. Without explicit knowledge of the network environment, the proposed controller learns to behave optimally by relying only on interaction with the unknown environment, and it provides the best action for a given state. Through the learning process, the controller adjusts the source sending rate toward the optimal value to reduce the average queue length in the buffer. Simulation results show that the proposed controller effectively avoids congestion while achieving high throughput, low PLR, low end-to-end delay, and high utilization.

2. Theoretical Framework

2.1. Architecture of the Proposed Flow Controller

The architecture of the proposed GFQC is shown in Figure 1. In a high-speed network, the GFQC located at the bottleneck node acts as the flow control agent.
The inputs of the GFQC are the network state variables $S$: the current queue length $q_L$, the current change rate of the queue length $\dot{q}_L$, and the current change rate of the source sending rate $u$. The output of the GFQC is the feedback signal $a$ sent to the traffic sources, which is a ratio applied to the sending rate; it determines the sending rate of the traffic sources. The learning agent and the network environment interact continually during learning. At the beginning of each learning time step, the controller senses the network state and receives the reward signal. It then selects an action, that is, it decides which ratio the sources should use to set their sending rate. The chosen sending rate is intended to reduce the PLR and increase link utilization. After the sources send traffic at this rate, the network changes state and returns a new reward to the controller, and the next learning step begins.

2.2. Fuzzy Q-Learning Flow Controller

Q-learning learns utility values (Q-values) of state-action pairs. During learning, the agent improves its estimates by blending new information into its prior experience.
[FIGURE 1 OMITTED]
In its general form, the Q-learning algorithm is defined by a tuple $\langle S, A, r, p \rangle$, where $S$ is the discrete state space of the high-speed network; $A$ is the discrete action space, i.e., the set of feedback signals to the traffic sources; $r: S \times A \to \mathbb{R}$ is the reward of the agent; and $p: S \times A \to \Delta(S)$ is the transition probability map, where $\Delta(S)$ is the set of probability distributions over $S$.
Q-learning provides a simple updating procedure: the learning agent starts with arbitrary initial values of $Q(s, a)$ for all $s \in S$, $a \in A$, and updates the Q-values as

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \beta \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{1}$$

where $\alpha$ is the learning rate and $\beta \in [0, 1)$ is the discount rate [9]. Choosing an appropriate $r$ is vital in Q-learning [10]. In this paper, based on the requirements of the buffer and on experience, $r$ is defined as

$$r = \begin{cases} 0, & q_L < 0.9\,q_{LT} \ \text{or}\ q_L > 1.1\,q_{LT} \\ 1, & q_L = q_{LT} \\ \in (0, 1), & \text{otherwise} \end{cases} \tag{2}$$

where $q_{LT}$ is the set (target) value of the queue length in the buffer. According to (2), if $q_L$ is less than $0.9\,q_{LT}$ or greater than $1.1\,q_{LT}$, then $r = 0$ and the control result is considered bad. If $q_L$ equals $q_{LT}$, then $r = 1$ and the control result is considered good. Otherwise $r$ lies in $(0, 1)$, and the larger $r$ is, the better the control effect. In Q-learning-based control, storing the Q-values in a look-up table is impractical when the state space is large, as it is in high-speed networks; moreover, the agent is unlikely to visit each state within a reasonable time. Fuzzy Q-learning is an adaptation of Q-learning to the fuzzy inference system, in which both the actions and the Q-values are inferred from fuzzy rules [11]. In high-speed networks, the FIS relies on three inputs, $S(q_L, \dot{q}_L, u)$, to generate a selected action $a$. For an input state $s = \{q_L, \dot{q}_L, u\}$, we compute the activation value $w_i(s)$ of each rule $R^i$.
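Equations (1) and (2) can be sketched in Python as follows. The text states only the piecewise conditions of (2), so the linear ramp inside the band $0.9\,q_{LT} < q_L < 1.1\,q_{LT}$ is an assumption, as are the default values of `alpha` and `beta` and the hypothetical helper names `reward` and `q_update`. The tabular update shown here is the baseline that fuzzy Q-learning replaces.

```python
from collections import defaultdict


def reward(q_len: float, q_target: float) -> float:
    """Reward shaped like (2); the linear ramp inside the band is an ASSUMPTION."""
    band = 0.1 * q_target
    deviation = abs(q_len - q_target)
    if deviation >= band:  # outside (0.9*q_LT, 1.1*q_LT); the ramp also reaches 0 at the edges
        return 0.0
    return 1.0 - deviation / band  # 1 at q_LT, falling linearly toward the band edges


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, beta=0.9):
    """One tabular Q-learning step, equation (1); returns the TD error."""
    best_next = max(Q[(s_next, b)] for b in actions)
    td_error = r + beta * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error
```

With a hypothetical target of 10 packets, a queue of 10.5 packets earns r = 0.5 under the assumed ramp, and any queue outside [9, 11] earns 0.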
Each rule has $m$ possible discrete control actions $A = \{a_1, a_2, \dots, a_m\}$, and a parameter called the q value is associated with each control action. The q value assigns to each action in $R^i$ a quality with respect to the task. In FQL, one builds an FIS with competing actions for each rule $i \in \{1, \dots, N\}$, designated as

$$R^i:\ \text{If } q_L \text{ is } L^i_1 \text{ and } \dot{q}_L \text{ is } L^i_2 \text{ and } u \text{ is } L^i_3, \text{ then } a \text{ is } a^i_j \text{ with } q^i_j \tag{3}$$

where $q^i_j$ is the $j$th q value in rule $i$, and $L^i_s$ is the linguistic term (fuzzy label) of input variable $S_s$ in rule $R^i$, whose membership function is denoted by $\mu_{L^i_s}$. The q values in (3) are calculated from the total accumulated rewards and the rules' activation values. The functional blocks of the FIS are a fuzzifier, a defuzzifier, and an inference engine containing a fuzzy rule base [12]. The fuzzifier performs fuzzification, translating the value of each input linguistic variable into fuzzy linguistic terms. These terms are defined in a term set $F(S)$ and characterized by a set of membership functions $\mu(S)$. The defuzzifier describes the output linguistic variable, the selected action $a$, by a term set $F(a)$ characterized by a set of membership functions $\mu(a)$, and applies a defuzzification strategy to convert the linguistic terms of $F(a)$ into a nonfuzzy value representing the selected action $a$. The term sets should be chosen at an appropriate level of granularity to describe the values of the linguistic variables. The term set for $q_L$ is $F(q_L) = \{\text{Low (L)}, \text{Medium (M)}, \text{High (H)}\}$, which describes the queue length as "Low", "Medium", or "High". The term set for $\dot{q}_L$ is $F(\dot{q}_L) = \{\text{Decrease (D)}, \text{Increase (I)}\}$, which describes the change rate of the queue length as "Decrease" or "Increase". The term set for $u$ is $F(u) = \{\text{Negative (N)}, \text{Positive (P)}\}$, which describes the change rate of the source sending rate as "Negative" or "Positive".
In addition, to provide a precise, graded feedback signal in various states, the term set for the feedback signal is defined as $F(a) = \{\text{Higher (HE)}, \text{High (H)}, \text{Normal (N)}, \text{Low (L)}, \text{Lower (LE)}\}$. The membership functions (MFs) are shown in Figure 2. In each rule $R^i$, the learning agent (controller) can choose one action $a^i_j$ from the action set $A = \{a_1, a_2, \dots, a_m\}$. The inferred global continuous action $a_t$ at state $s_t$ is calculated as

$$a_t(s_t) = \frac{\sum_{i=1}^{N} w_i(s_t)\, a^i_j}{\sum_{i=1}^{N} w_i(s_t)} \tag{4}$$

where $a^i_j$ is the action selected in rule $R^i$ using the Metropolis-criterion-based exploration/exploitation policy of [8].
[FIGURE 2 OMITTED]
Following fuzzy inference, the Q-value of the inferred action $a_t$ is calculated as

$$Q(s_t, a_t) = \frac{\sum_{i=1}^{N} w_i(s_t)\, q^i_j}{\sum_{i=1}^{N} w_i(s_t)} \tag{5}$$

Under action $a(s_t)$, the system undergoes the transition $s_t \xrightarrow{r} s_{t+1}$, where $r$ is the reward received by the controller. This information is used to calculate the temporal difference (TD) approximation error:
$$\Delta Q = r + \beta \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \tag{6}$$

The change of each q value is then

$$\Delta q^i_j = \Delta Q \, \frac{w_i(s_t)}{\sum_{k=1}^{N} w_k(s_t)} \tag{7}$$

and the learning rule (1) for the q parameter values can be rewritten as

$$q^i_j \leftarrow q^i_j + \alpha \, \Delta q^i_j \tag{8}$$

2.3. The Genetic Operator Based Flow Controller

In this section, we extend the fuzzy Q-learning controller with genetic operators. The consequent parts of the fuzzy rules must compete for survival within a niche. In this setting, each rule in the FIS still maintains a q value, but it is no longer an estimate of accumulated rewards. The max operator of standard fuzzy Q-learning is therefore not used, since the rules with the maximum q value no longer represent the rules with the best rewards. Because the q values are not suitable as fitness values during learning, we use their changes $\Delta q$ as the fitness measure instead. The fuzzy rule (3) is accordingly rewritten as

$$R^i:\ \text{If } q_L \text{ is } L^i_1 \text{ and } \dot{q}_L \text{ is } L^i_2 \text{ and } u \text{ is } L^i_3, \text{ then } a \text{ is } a^i_j \text{ with } q^i_j \text{ and } \Delta q^i_j \tag{9}$$

The fitness value of a rule is an inverse measure of $\Delta q$. Using the fitness calculation of [13], the predicted rule accuracy $\kappa_t$ at time step $t$ is defined as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (10)

The accuracy falls off exponentially for $\Delta q_t > \Delta q_0$, where $\Delta q_0$ is an initial value.
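One fuzzy Q-learning step, equations (4) through (8), can be sketched in plain Python as below. The sketch assumes that (4) and (5) are activation-weighted averages over the rules, takes the per-rule selected indices `chosen` as an input (the Metropolis-criterion policy of [8] is not reproduced), and evaluates $\max_a Q(s_{t+1}, a)$ as the activation-weighted sum of each rule's largest q value, which is the usual fuzzy Q-learning convention but is not spelled out in the text. Function names and default rates are hypothetical.

```python
def normalise(w):
    """Turn rule activations w_i(s) into weights that sum to 1."""
    total = sum(w)
    return [wi / total for wi in w]


def infer(w, actions, q, chosen):
    """Global action (4) and its Q-value (5) for the current state.

    w       -- activation w_i(s_t) of each rule
    actions -- candidate action values a_1..a_m shared by all rules
    q       -- q[i][j], the q value of action j in rule i
    chosen  -- index j selected in each rule by the exploration policy
    """
    phi = normalise(w)
    a_t = sum(p * actions[j] for p, j in zip(phi, chosen))               # (4)
    Q_t = sum(p * q[i][j] for i, (p, j) in enumerate(zip(phi, chosen)))  # (5)
    return a_t, Q_t


def update(q, w, chosen, Q_t, r, w_next, alpha=0.1, beta=0.9):
    """TD error (6), per-rule share (7) and in-place q update (8)."""
    # max_a Q(s_{t+1}, a): each rule contributes its best q value (assumed convention).
    max_next = sum(p * max(row) for p, row in zip(normalise(w_next), q))
    dQ = r + beta * max_next - Q_t                     # (6)
    for i, (p, j) in enumerate(zip(normalise(w), chosen)):
        q[i][j] += alpha * dQ * p                      # (7) and (8)
    return dQ
```

Because each rule receives a share of the TD error proportional to its normalised activation, rules that barely fire in a state are barely changed by the reward observed there.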
The predicted accuracy in (10) is used to adjust the rule's fitness value $f_t$ with the standard Widrow-Hoff delta rule:
$$f_t \leftarrow f_t + \chi\,(\kappa_t - f_t) \tag{11}$$

where $\chi$ is the adjustment rate of the fitness values. Niche genetic operators can prevent premature convergence of the population, or genetic drift, caused by the selection operator. They maintain population diversity and promote the formation of sub-populations in the neighbourhood of locally optimal solutions. In fuzzy Q-learning, fitness sharing is implemented implicitly by assigning fitness values to the activated rules according to their contributions. Each fuzzy rule antecedent constitutes an evolving niche, or sub-population, in which the fuzzy rules with the same antecedent share similar environment states. The rule consequents, or actions, compete for survival within a niche, while rules from different niches cooperate to generate the output. Following the definition of a fuzzy rule in (9), each fuzzy rule defines a sub-population, and the rule actions are encoded as the individuals of that sub-population. If there are N rules in fuzzy Q-learning, there are N sub-populations.
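Because equation (10) is not reproduced, the sketch below assumes an exponential form that equals 1 when $|\Delta q_t| \le \Delta q_0$ and decays as $\exp[\ln(\alpha_\kappa)(|\Delta q_t| - \Delta q_0)/\Delta q_0]$ beyond it, where $\alpha_\kappa$ (`alpha_k`) is a hypothetical constant. This matches the stated exponential fall-off but is not necessarily the authors' expression. The fitness update is the Widrow-Hoff rule (11); the defaults for `alpha_k` and `chi` are illustrative only.

```python
import math


def accuracy(dq: float, dq0: float, alpha_k: float = 0.1) -> float:
    """ASSUMED form of (10): 1 within the tolerance dq0, exponential decay beyond it."""
    err = abs(dq)
    if err <= dq0:
        return 1.0
    return math.exp(math.log(alpha_k) * (err - dq0) / dq0)


def update_fitness(f: float, kappa: float, chi: float = 0.2) -> float:
    """Widrow-Hoff delta rule (11): move fitness a fraction chi toward kappa."""
    return f + chi * (kappa - f)
```

Repeated calls with a constant `kappa` move `f` geometrically toward it, so `chi` trades responsiveness against sensitivity to noisy rewards.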
As shown in Figure 3, at each learning step the reward from the environment is apportioned to the rules that were activated in the previous step, and the rules' fitness values are updated according to (11). Each sub-population has a winner action, and the winner actions of all sub-populations form the consequent parts of the fuzzy rules. The winners are selected by the niche genetic operator, which uses two operators to select the actions:
Reproduce operator: the individuals in each sub-population are selected as winners according to their fitness values, using roulette-wheel selection.
Mutation operator: a mutation is applied to each sub-population with a given mutation probability. The operator randomly chooses an individual from the sub-population to replace the winner of that sub-population.

In the learning process, the network environment provides the current states and rewards to the learning agent, and the learning agent produces actions to perform in the network. The learning agent includes a performance component, a reinforcement component, and a discovery component. The performance component reads states from the network environment, calculates the activation degrees of the fuzzy rules, and generates an action. The action is then executed by the traffic sources. The network moves into the next state and returns an evaluative reward for that action.
[FIGURE 3 OMITTED]
The discovery component performs action selection: the two genetic operators implement the selection, and a set of rule actions is finally selected for the performance component. The reinforcement component assigns the reward to the individual rules activated by the current state.

3. Simulation and Comparison

The simulation model of the high-speed network, shown in Figure 4, consists of two cascaded switches: Sw1, which has a control agent, and Sw2, which has no controller. The output link L has a constant rate of 80 Mbps. The sending rates of the sources are regulated individually by the flow controllers. In the simulation, all packets are assumed to have a fixed length of 1000 bytes, and the node has a finite buffer of 20 packets. The offered load varies between 0.6 and 1.2 to reflect the system dynamics; a higher load results in heavier traffic and vice versa. For the 80 Mbps link, the theoretical throughput is 62.5K packets.
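The two niche genetic operators of Section 2.3, roulette-wheel reproduction and random mutation applied independently within each rule's sub-population, can be sketched as follows. Each sub-population is represented as a list of non-negative fitness values, one per candidate action. The mutation probability `p_mut` and the function names are hypothetical, since the text gives neither.

```python
import random


def roulette_select(fitness, rng):
    """Reproduce operator: pick an index with probability proportional to fitness."""
    total = sum(fitness)
    if total <= 0:
        return rng.randrange(len(fitness))  # no fitness information yet: pick uniformly
    pick = rng.random() * total             # uniform point in [0, total)
    running = 0.0
    for idx, f in enumerate(fitness):
        running += f
        if pick < running:
            return idx
    return len(fitness) - 1                 # guard against floating-point round-off


def select_winners(subpops, p_mut=0.05, rng=None):
    """Choose one winner action index per rule (sub-population).

    subpops -- list of per-rule fitness lists, one entry per candidate action
    p_mut   -- hypothetical mutation probability
    """
    if rng is None:
        rng = random.Random()
    winners = []
    for fitness in subpops:
        w = roulette_select(fitness, rng)
        if rng.random() < p_mut:             # mutation operator:
            w = rng.randrange(len(fitness))  # replace the winner by a random individual
        winners.append(w)
    return winners
```

Passing a seeded `random.Random` makes runs reproducible; with `p_mut=0.0`, an action whose fitness is zero is never selected while other actions in its sub-population have positive fitness.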
Based on knowledge of system performance evaluation, the parameters of the membership functions for the input linguistic variables of the FIS are selected as follows. For $\mu_L(q_L)$, $\mu_M(q_L)$, and $\mu_H(q_L)$: $L_a = 0$, $L_b = 6$, $L_{b1} = 10$, $M_{a1} = 2$, $M_a = 8$, $M_b = 12$, $M_{b1} = 20$, $H_{a1} = 9$, $H_a = 14$, $H_b = 20$, and $H_{b1} = 20$; for $\mu_D(\dot{q}_L)$ and $\mu_I(\dot{q}_L)$: $D_a = 4$, $D_b = D_{b1} = 2$, $I_{a1} = I_a = 2$, and $I_b = 4$; for $\mu_N(u)$ and $\mu_P(u)$: $N_a = 0.8$, $N_b = 0.4$, $N_{b1} = 0.2$, $P_{a1} = 0.2$, $P_a = 0.4$, and $P_b = 0.8$. The parameters of the membership functions for the output linguistic variable are $LE_0 = 0.2$, $L_0 = 0.4$, $N_0 = 0.6$, $H_0 = 0.8$, and $HE_0 = 1$. The fuzzy rule base is an action knowledge base, characterized by a set of linguistic "if-then" statements that describe the fuzzy relationship between the input variables and the selected action. After the learning process, the inference rules of the fuzzy rule base for the various system states are as listed in Table 1. According to fuzzy set theory, the fuzzy rule base forms a fuzzy set of dimension $3 \times 2 \times 2 = 12$, one rule for each combination of input terms.
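Figure 2, which shows the MF shapes, is not reproduced. The parameter names ($a_1$, $a$, $b$, $b_1$) are consistent with trapezoidal MFs that rise from $a_1$ to $a$, stay at 1 up to $b$, and fall to 0 at $b_1$; the sketch below adopts that reading as an assumption and applies it only to $q_L$, whose parameters fit it cleanly. Since no $L_{a1}$ is listed, $L_{a1} = L_a = 0$ is also assumed. The names `trapezoid`, `Q_L_TERMS`, and `fuzzify_queue` are hypothetical.

```python
def trapezoid(x, a1, a, b, b1):
    """Membership 1 on [a, b], linear ramps on [a1, a) and (b, b1], 0 elsewhere.
    Setting a1 == a or b == b1 gives a shoulder with no ramp on that side."""
    if a <= x <= b:
        return 1.0
    if a1 <= x < a:
        return (x - a1) / (a - a1)
    if b < x <= b1:
        return (b1 - x) / (b1 - b)
    return 0.0


# q_L terms as (a1, a, b, b1), using the parameters listed above (assumed trapezoids).
Q_L_TERMS = {
    "L": (0, 0, 6, 10),
    "M": (2, 8, 12, 20),
    "H": (9, 14, 20, 20),
}


def fuzzify_queue(q_len):
    """Membership degree of a queue length (in packets) in each q_L term."""
    return {term: trapezoid(q_len, *p) for term, p in Q_L_TERMS.items()}
```

For example, a queue of 10 packets is fully Medium, 0.2 High, and not Low. The $\dot{q}_L$ and $u$ terms are left out because their listed parameters do not determine the shapes unambiguously.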
For example, rule 1 can be stated linguistically as "if the queue length is low, the queue length is decreasing, and the change rate of the sending rate is negative, then the feedback signal is Higher."
[FIGURE 4 OMITTED]
In the simulation, four flow control agents are implemented individually in the high-speed network: additive-increase/multiplicative-decrease (AIMD), a standard reinforcement-learning-based neural flow controller (RLNC), the Metropolis-criterion-based Q-learning flow controller (MQLC), and the proposed GFQC. AIMD increases the sending rate by a fixed increment (0.11) if the queue length is below a predefined threshold; otherwise it multiplies the previous sending rate by 0.8 to avoid congestion [14]. For the other three schemes, the sending rate is controlled periodically by the feedback signal $a$:

$$u_t = a_t F L \tag{12}$$

where $a_t \in [0.2, 1.0]$ is the feedback signal from the flow controller, $F$ is the ratio of the source offered load to the available output bit rate, $L$ is the outgoing link rate, and $u_t \in [0.2FL, FL]$ is the controlled sending rate at sample time $t$. Four measures are used as performance indices: throughput, PLR, buffer utilization, and mean packet delay. The throughput is the number of packets received at the specified nodes (switches) without retransmission.
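The AIMD baseline and the rate law (12) can be sketched as follows. The text gives neither the unit of the 0.11 increment nor the value of the queue threshold, so the rate unit and `threshold` are hypothetical here; `controlled_rate` clips $a_t$ to the stated range $[0.2, 1.0]$.

```python
def aimd_rate(rate, queue_len, threshold, increment=0.11, factor=0.8):
    """AIMD as described: add a fixed increment while the queue is below the
    threshold, otherwise scale the previous rate by 0.8."""
    if queue_len < threshold:
        return rate + increment
    return rate * factor


def controlled_rate(a_t, F, L):
    """Equation (12): u_t = a_t * F * L, with a_t clipped to [0.2, 1.0]."""
    a_t = min(max(a_t, 0.2), 1.0)
    return a_t * F * L
```

For instance, with $L = 80$ Mbps, $F = 1.2$, and $a_t = 0.5$, (12) gives $u_t = 0.5 \times 1.2 \times 80 = 48$ Mbps. The contrast between the two functions is the point of the comparison: AIMD reacts only to whether the queue has crossed a threshold, whereas (12) scales the rate by a learned, state-dependent ratio.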
The performance comparisons of throughput, PLR, buffer utilization, and mean delay under the four agents are shown in Figures 5-8. The throughput of the AIMD method drops sharply at a load of 0.9. In contrast, the proposed GFQC maintains a higher throughput even when the offered load exceeds 1.0, and it greatly reduces the PLR while keeping throughput high and mean delay low. The GFQC also outperforms RLNC and MQLC in PLR, buffer utilization, and mean delay. These results indicate that the GFQC is able to anticipate network behavior.
[FIGURE 5 OMITTED] [FIGURE 6 OMITTED] [FIGURE 7 OMITTED] [FIGURE 8 OMITTED]

4. Conclusions

In the flow control of high-speed networks, the reactive AIMD scheme cannot respond accurately to a time-varying environment because it lacks prediction capability. The fuzzy Q-learning flow controller performs well when the state space of the high-speed network is large and continuous, and a genetic operator is introduced to obtain the consequent parts of the fuzzy rules. Through a proper training process, the proposed GFQC responds to the network dynamics and learns empirically without prior information about the environment, so the sending rate of the traffic sources can be determined by the well-trained flow control agent. Simulation results show that the proposed controller increases buffer utilization and decreases the PLR simultaneously. Therefore, the proposed GFQC not only guarantees a low PLR for the existing links but also achieves high system utilization.

Received August 15, 2008; revised December 3, 2008; accepted December 30, 2008

5. References

[1] R. G. Cheng, C. J. Chang, and L. F.
Lin, "A QoS-provisioning neural fuzzy connection admission controller for multimedia high-speed networks," IEEE/ACM Transactions on Networking, Vol. 7, No. 1, pp. 111-121, 1999.
[2] M. Lestas, A. Pitsillides, P. Ioannou, and G. Hadjipollas, "Adaptive congestion protocol: A congestion control
protocol with learning capability," Computer Networks: The International Journal of Computer and Telecommunications Networking, Vol. 51, No. 13, pp. 3773-3798, September 2007.
[3] R. S. Sutton and A. G. Barto, "Reinforcement learning: An introduction," Cambridge, MA: MIT Press, 1998.
[4] A. Chatovich, S. Oktug, and G. Dundar, "Hierarchical neuro-fuzzy call admission controller for ATM networks," Computer Communications, Vol. 24, No. 11, pp. 1031-1044, June 2001.
[5] M. C. Hsiao, S. W. Tan, K. S. Hwang, and C. S. Wu, "A reinforcement learning approach to congestion control of high-speed multimedia networks," Cybernetics and Systems, Vol. 36, No. 2, pp. 181-202, January 2005.
[6] K. S. Hwang, S. W. Tan, M. C. Hsiao, and C. S. Wu, "Cooperative multiagent congestion control for high-speed networks," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 35, No. 2, pp. 255-268, April 2005.
[7] X. Li, X. J. Shen, Y. W. Jing, and S. Y.
Zhang, "Simulated annealing-reinforcement learning algorithm for ABR traffic control of ATM networks," in Proceedings of the 46th IEEE Conference on Decision and Control, New Orleans, LA, USA, pp. 5716-5721, December 2007.
[8] X. Li, Y. W. Jing, G. M. Dimirovski, and S. Y. Zhang, "Metropolis criterion based Q-learning flow control for high-speed networks," in the 17th International Federation of Automatic Control (IFAC) World Congress, Seoul, Korea, pp. 11995-12000, July 2008.
[9] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, Vol. 8, No. 3, pp. 279-292, May 1992.
[10] M. L. Littman, "Value-function reinforcement learning in Markov games," Cognitive Systems Research, Vol. 2, No. 1, pp. 55-66, 2001.
[11] D. B. Gu and E. F. Yang, "A policy gradient reinforcement learning algorithm with fuzzy function approximation," in Proceedings of the 2004 IEEE International Conference on Robotics and Biomimetics
, Shenyang, China, pp. 934-940, August 2004.
[12] Y. Zhou, M. J. Er, and Y. Wen, "A hybrid approach for automatic generation of fuzzy inference systems without supervised learning," in Proceedings of the 2007 American Control Conference, New York City, USA, pp. 3371-3376, July 2007.
[13] S. W. Wilson, "Classifier fitness based on accuracy," Evolutionary Computation, Vol. 3, No. 2, pp. 145-179, 1994.
[14] P. Gevros, J. Crowcroft, P. Kirstein, and S. Bhatti, "Congestion control mechanisms and the best effort service model," IEEE Network, Vol. 15, No. 3, pp. 16-26, May-June 2001.

Xin LI, Yuanwei JING, Nan JIANG, Siying ZHANG
College of Information Science and Engineering, Northeastern University, Shenyang, China
Email: lixin820106@126.com

Table 1. Rule table of FIS.

Rule   $q_L$   $\dot{q}_L$   $u$   $a$
1      L       D             N     HE
2      L       D             P     H
3      L       I             N     N
4      L       I             P     N
5      M       D             N     H
6      M       D             P     L
7      M       I             N     N
8      M       I             P     LE
9      H       D             N     L
10     H       D             P     LE
11     H       I             N     L
12     H       I             P     LE
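Table 1 can be applied as a lookup when the learned controller runs. The minimal sketch below assumes the product t-norm for rule activation (the text does not name the t-norm), treats the output terms as singletons at $LE_0, \dots, HE_0$, and defuzzifies by an activation-weighted average; these choices, and the names `RULES`, `OUTPUT`, and `infer_feedback`, are illustrative rather than the authors' implementation.

```python
from itertools import product

# Table 1: (q_L term, dq_L term, u term) -> consequent term of a.
RULES = {
    ("L", "D", "N"): "HE", ("L", "D", "P"): "H",
    ("L", "I", "N"): "N",  ("L", "I", "P"): "N",
    ("M", "D", "N"): "H",  ("M", "D", "P"): "L",
    ("M", "I", "N"): "N",  ("M", "I", "P"): "LE",
    ("H", "D", "N"): "L",  ("H", "D", "P"): "LE",
    ("H", "I", "N"): "L",  ("H", "I", "P"): "LE",
}

# Output singletons from the values LE_0 ... HE_0 (singleton shape is an assumption).
OUTPUT = {"LE": 0.2, "L": 0.4, "N": 0.6, "H": 0.8, "HE": 1.0}


def infer_feedback(mu_q, mu_dq, mu_u):
    """Weighted-average feedback signal a over the rules in Table 1.

    mu_q, mu_dq, mu_u -- membership degrees of each input term, e.g.
    {"L": 0.0, "M": 1.0, "H": 0.2}; rule activation uses the product t-norm.
    Returns None if no rule fires.
    """
    num = den = 0.0
    for tq, tdq, tu in product(mu_q, mu_dq, mu_u):
        w = mu_q[tq] * mu_dq[tdq] * mu_u[tu]
        num += w * OUTPUT[RULES[(tq, tdq, tu)]]
        den += w
    return num / den if den > 0 else None
```

For example, a Medium queue that is equally "Decrease" and "Increase", with a positive rate change, fires rules 6 and 8 equally and yields $a = (0.4 + 0.2)/2 = 0.3$.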