# FPGA Implementation of Reconfigurable Finite State Machine with Input Multiplexing Architecture Using Hungarian Method.

1. IntroductionDesigning a complex digital system requires an efficient method that includes modeling a control unit (i.e., a controller). The operational speed of such systems depends on the speed of their controllers. The mathematical model for designing a controller for applications such as microprocessor control units, circuit testing, and digital signal processing (DSP) is a finite state machine (FSM). Consequently, designing such systems requires an efficient synthesis technique for high-speed FSM [1, 2]. Applications such as DSP [3, 4] and built-in self-test (BIST) [5] require specific operations to be performed only in the particular instances. Different control units are required to complete each operation. Hence, to optimally perform these operations, a single control unit is defined which can configure itself depending upon the applied mode of operation; it is also known as reconfigurable FSM [1]. The mode of operation for such FSM is controlled-by a counter, timer, or any user-defined control signals based on the application requirements. An example of a reconfigurable FSM is given in [1] as a test chip for wireless sensor network. In this example, Transition-Based Reconfigurable FSM (TR-FSM) [1] is configured into one of the MCNC FSM benchmark circuits (i.e., dk15, s386, or cse) at different instances. Moreover, any application which requires sequential processing can be broken down into a series of instances (i.e., multistage reconfigurable signal processing) where at each instance only a particular operation is performed [3]. Hence, for such applications, efficient architectures can be created using reconfigurable FSM. These emerging trends in the research necessitate a framework for optimal synthesis of high-speed reconfigurable FSM.

Conventional LUT-based architectures have been used for FSM implementation on a FPGA platform [6]. Similarly, ROM-based architectures are investigated for FSM implementations. Due to the area and speed advantages, they act as an excellent alternative to their conventional LUT-based counterparts [7]. In such implementations, a considerable reduction in power consumption is obtained by disabling embedded memory blocks (EMBs) during the idle states [8, 9]. The fundamental framework for FSM with input multiplexing (FSMIM) is made in [7] whose prime objective is to shorten the depth of ROM memory. In their approach, an input selector (which consists of a multiplexer bank) is used. The basic idea that has been implemented is to select only a specific set of inputs for a particular state. FSMIM with state-based input selection (FSMIM-S) is proposed in [10], which further reduces the ROM memory size.

Another approach for implementation of reconfigurable FSM is RAM-based architectures. In literature, there are two underlying RAM-based architectures, that is, variation-based reconfigurable multiplexer bank (VRMUX) and combination-based reconfigurable multiplexer bank (CRMUX) [11]. The RAM-based architectures do not serve as a novel tool for implementation of complicated FSM structures such as parallel hierarchical finite state machines (PHFSM) [12] or reversible FSM [13]. Due to significant advantages of FSMIM-S architecture over other architectures, it is used to create a framework for the high-speed Reconfigurable FSMIM-S architecture.

The Reconfigurable FSMIM-S architecture is constructed by combining the conventional FSMIM-S architecture [10] and an optimized multiplexer bank (which defines the mode of operation). For this, the descriptions of a set of FSMs are taken for a particular application. Hence, the problem is to obtain the optimized multiplexer bank for the given set of FSMs. It can be solved by mapping all the FSMs into one large FSM (called base_ckt) in that set. The objective of this process is to perform optimal matching between base_ckt and the other FSMs in the set so that a minimum number of bits are changed by changing the mode of operation. This situation (i.e., performing one-to-one mapping) transforms the problem into a weighted bipartite graph matching problem where the objective is to match the description of FSMs in the set to base_ckt with minimal cost [14]. As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed. In this algorithm, the weights are assigned based on the input combinations, state code, and the output combinations to form a cost matrix. A cost matrix reduction based technique, that is, Hungarian algorithm [15, 16], is used for matching. A greedy based heuristic (GBH) search technique [17] is combined with the Hungarian algorithm to optimize the augmenting path search. At every iteration, descriptions of two FSMs (i.e., base_ckt and one of the FSMs in the set) are taken as inputs. It produces the modified descriptions of the FSMs of the same dimension as outputs. At the end of the algorithm, a mutual XOR operation is performed among the modified descriptions, which provides the required optimized multiplexer bank.

The experimental results from MCNC FSM benchmarks illustrate the advantages of the proposed architecture as compared with VRMUX [11], as operating speed is enhanced at an average of 30.43% and LUT consumption is reduced by an average of 5.16% in FPGA implementation. It also shows that the operating speed is improved at an average of 9.14% in comparison with CRMUX [11] during FPGA implementation. The limitation of the proposed technique is the requirement of higher LUTs, as it requires an average of 88.65% more LUTs in comparison with CRMUX [11] during FPGA implementation.

The rest of the paper is outlined as follows. Section 2 consists of the Reconfigurable FSMIM-S architecture and the proposed iterative greedy heuristic based Hungarian algorithm. The experimental evaluation of the proposed algorithm, implementation of the Reconfigurable FSMIM-S architecture, and comparison with other proposals from the literature are presented in Section 3. The concluding remarks are devised in Section 4.

2. Proposed Method

As most of the FPGA platforms use synchronous EMBs, Mealy machines with synchronous outputs are used in this paper. Let a Mealy FSM be described by the following columns: [a.sub.m] is a code of current state ([a.sub.m] [member of] A, where A = {[a.sub.1], ..., [a.sub.M]} is a set of states); K([a.sub.m]) is a code of state [a.sub.m] [member of] A; h is the number of transitions per state (h [member of] H, where H = {[t.sub.1], ..., [t.sub.M]} is a set of number of transitions per state corresponding to A); as is a state of transition (the next state); K([a.sub.s]) is a code of state [a.sub.s] [member of] A; X = {[x.sub.1], ..., [x.sub.L]} is the set of input variables, Y = {[y.sub.1], ..., [y.sub.N]} is the set of output variables; and D = {[d.sub.1], ..., [d.sub.R]} is defined as excitation functions for the flip-flops, where R is the number of flip-flops (i.e., the number of bits in internal state codes), R [member of] {[??][log.sub.2]M[??], M}.

The descriptions ofaset of FSMs are taken for a particular application. The fundamental idea is to obtain the description of a single FSM by mapping all the FSMs into one large FSM (called base_ckt) in that set. The inputs, states, and outputs of an FSM in the set are mapped into base_ckt in their respective order. The mode bits are applied through a2 x 1 multiplexer in those positions where the polarity of bit differs (i.e., 1 in place 0 and vice versa) to perform such mapping. Hence, the resultant FSM operates in two modes, where base_ckt mode is the default mode of operation. Similarly, all other FSMs in the set are mapped into base_ckt. In this way, a single FSM (i.e., base_ckt) combined with a multiplexer bank (which defines the mode of operation) acts as reconfigurable FSM. It can be configured into a particular FSM in the set by applying the specific mode bits. Due to numerous advantages mentioned in the literature, FSMIM-S architecture [10] is chosen to implement the FSM (i.e., base_ckt) part. Therefore, the Reconfigurable FSMIM-S architecture is constructed by combining the conventional FSMIM-S architecture [10] and multiplexer bank for mode based reconfiguration as shown in Figure 1.

It encounters the following two major difficulties:

(i) The complexity of the resultant multiplexer bank is very high.

(ii) It becomes difficult to define the dummy states and dummy transitions. Dummy states and dummy transitions are such states and transitions which are not present in base_ckt but exist in the other FSMs in the set and vice versa. These states and transitions lead the system to failure.

As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed. In this algorithm, the descriptions of a set of FSMs (i.e., [h, X, [a.sub.m], [a.sub.s], Y]) are taken as inputs. It provides the optimized multiplexer bank for mode based reconfiguration as output. It also provides the updated description (i.e., description without dummy states and dummy transitions) of base_ckt, which is used to construct the conventional FSMIM-S part of the proposed architecture. Let (B + 1) be the set of FSMs for a particular application. Based on the complexity of the description of FSM, the largest FSM is selected from the set. It is called base_ckt. The rest of the FSMs are called recon_ckt_1, recon_ckt_2, ..., recon_ckt_B, respectively.

Each input, state, or output of a recon_ckt_b [member of] {recon_ckt_ 1, recon_ckt_2, ..., recon_ckt_B} can be mapped into any one of the inputs, states, or outputs, respectively, of base_ckt; that is, there exists a one-to-one mapping. These mappings cannot be performed independently because inputs, states, and outputs of an FSM are interdependent. Consequently, mapping an input or state of recon_ckt_b into base_ckt is transformed into a weighted bipartite graph matching problem or linear assignment problem (LAP) [14] as shown in Figure 2. In this LAP, the weights are assigned based on the input combinations, state code, and the output combinations to form a cost matrix. The objective of this process is to perform matching with a minimal cost so that a minimum number of bits are changed by changing the mode of operation. Therefore, the complexity of the multiplexer bank is reduced.

In the literature, the following approaches are proposed to solve a LAP:

(i) Modified Hungarian algorithm [16]

(ii) Simple greedy heuristic based algorithm [17]

(iii) Evolutionary heuristic algorithm [18].

The maximum number of inputs or states does not exceed 100 in MCNC FSM benchmarks or FSMs used in real-world applications. So, the number of vertices used in the resultant weighted bipartite graph is always low which results in small LAP. But, the number of LAPs formed in this process is enormous because input matching and state matching are performed together as shown in Figure 2. Hence, the primary requirement of the algorithm to solve LAP becomes the fast convergence. Therefore, a cost matrix reduction based technique, that is, Hungarian algorithm [15, 16], is used for matching. A greedy based heuristic (GBH) search technique [17] is combined with the Hungarian algorithm to optimize the augmenting path search. The pseudocode of this technique is summarized in Algorithm 1. (Note: subscripts "base" and "recon" denote the parameters of base_ckt and recon_ckt, respectively, throughout the paper.)

At every iteration [member of] {1, ..., B}, descriptions of two FSMs, that is, base_ckt and recon_ckt_b, are taken as inputs. The major contributing factors for power consumption and LUT requirement in FSM are the number of inputs and the internal states [8, 19]. In any FSM, input variable and states are interdependent. Thus, input and state matching are performed together between base_ckt and recon_ckt.

If L_base [greater than or equal to] L_recon, then E = [sup.L-base] [P.sub.L_recon] combinations of input lines for base_ckt are generated to match with input lines of recon_ckt_b. (L_base - L_recon) input lines act as don't cares while the system operates in recon_ckt_b mode. Otherwise, E = [sup.L-recon][P.sub.L_base] combinations of input lines for recon_ckt_b are generated to match with input lines of base_ckt. In this case, (L_recon - L_base) input lines act as don't cares while the system operates in base_ckt mode.

Now, for each combination of input lines, state matching is performed (Algorithm 2). This situation can be seen as a LAP where the objective is to match the states of recon_ckt_b to the states of base_ckt with minimal cost [14,17]. For this, the number of states in both the FSMs is equalized. Thus, if M_base [greater than or equal to] M_recon, then ([alpha] - M_recon, where [alpha] = M_base) dummy states are added in recon_ckt_b. Otherwise ([alpha] - M_base, where [alpha] = M_recon) dummy states are added in base_ckt.

All LAP solving algorithms require a cost matrix as an input to perform an optimal assignment. So, to form a cost matrix for this problem, a procedure named weight_assignment is proposed.

In this procedure, the combinations of input lines, [a.sub.m] and h, for base_ckt and recon_ckt_b are taken as inputs. It provides the cost matrix to map recon_ckt_b states into base.ckt states. An array is created at each transition in both base_ckt and recon_ckt_b by combining [input_combination [member of] {[x.sub.1], [x.sub.2], ..., [x.sub.L]}, [a.sub.m]].

The basic idea that has been implemented is as follows: (i) replace the recon_ckt_b state with the base_ckt state sequentially in the recon_ckt_array; (ii) evaluate the weight by performing Bitwise-XOR operation (i.e., transition matching) for that particular replacement; (iii) then, construct the cost matrix.

For each transition in recon_ckt_array (i.e., [h.sub.recon] [member of] {1, 2, ..., [t.sub.m_recon]})> transition matching is performed. This situation can be seen as a LAP where the objective is to match the transition of recon_ckt_b to the transition ofbase_ckt with minimal cost [14,17]. For this, the number of transitions for the particular state is equalized in both the FSMs. Therefore, if [t.sub.m-base] [greater than or equal to] [t.sub.m_recon], then ([beta] - [t.sub.m_recon], where [beta] = [t.sub.m-base]) dummy transitions are added in the recon_ckt_array. Otherwise ([beta] - [t.sub.m_base], where [beta] = [t.sub.m_base]) dummy transitions are added in the base_ckt_array. Thus, for each transition in base_ckt_array (i.e., [h.sub.base] [member of] {1, 2, ..., [t.sub.m_base]}), a Bitwise-XOR operation is performed between the arrays for that particular transition. The total number of Is in the BitwiseXOR operations is counted to create a cost matrix for transition matching. Then, optimal assignment of transitions is performed by greedy based heuristic Hungarian algorithm (GBH_hungarian_algorithm) between base_ckt_array and recon_ckt_array. Let match_count be a variable defined as

match_count = [[beta].summation over (i=1)] [[beta].summation over (j=1)] [C.sub.ij] [[lambda].sub.ij]

where, [C.sub.ij] [left arrow] cost matrix, [[lambda].sub.ij] [left arrow] decision variable. (1)

Input. The descriptions of the FSMs (i.e. [h, X, [a.sub.m], [a.sub.s], Y]) Output. The optimized multiplexer bank for mode based reconfiguration begin select the largest FSM from the set based on the description; base_ckt [left arrow] largest FSM; recon_ckt_1, recon_ckt_2, ..., recon_ckt_B [left arrow] rest of the FSMs in the set; for each recon_ckt_b [member of] {recon_ckt_ 1, recon_ckt_2, ..., recon_ckt_B} do if (L_base [greater than or equal to] L_recon) then /* performing the input matching */ generate, E [left arrow] [sup.L/base] [P.sub.L_recon] combinations_of_input_lines for base.ckt to match with input lines of recon_ckt_b; go to state_matching; /* calling the function/"state_matching" */ else if (L_base < L_recon) then generate, [left arrow] [sup.L-base] [P.sub.L_recon] combinations_of_inputjines for recon_ckt_b to match with input lines of base_ckt; go to state_matching; /* calling the function/ "state_matching" */ end select combinations_of_input_lines with min{assignment_[cost.sub.1], ..., assignment_[cost.sub.E]} &min{total_[cost.sub.1], total_[cost.sub.2], ..., total_[cost.sub.E]}; perform binary state assignment in base.ckt & recon_ckt_b i.e. apply K([a.sub.m]) & K([a.sub.s]); weight_assignment(); /*creating arrays by [selected_input_combination, K([a.sub.s]), K([a.sub.m])] */go to dummy .replacement; /* calling the function/ "dummy .replacement" */ if (N_base [greater than or equal to] N_recon) then /* performing the output matching */ generate, G [left arrow] [sup.N_base][P.sub.N_recon] combinations_of_output_lines for base.ckt to match with output lines of recon_ckt_b; go to output_matching; /* calling the function/ "output_matching" */ else if (N_base < N_recon) then generate, G [left arrow] [sup.N_base][P.sub.N_recon] combinations_of_output_lines for recon_ckt_b to match with output lines of base.ckt; go to output_matching; /* calling the function/ "output_matching" */ end select combinations_of_outputjines with min{XOR_[count.sub.1], ..., XOR_[count.sub.G]}; update the description of base_ckt; end for each recon_ckt_b [member of] {recon_ckt_ 1, recon_ckt_2, ..., recon_ckt_(B--1)} do go to dummy .replacement; /* calling the function/ "dummy .replacement" */update the description of recon_ckt_b; end perform a mutual (i.e. [sup.B][C.sub.2]) Bitwise-XOR operations between the updated descriptions of FSMs; obtain the optimized multiplexer bank for mode based reconfiguration; end Algorithm 1: Iterative greedy heuristic based Hungarian algorithm.

In this way, by using match_count (from (1)), the cost matrix formation to map recon_ckt_fr states into base_ckt states is completed. The pseudocode of the procedure, weight_assignment, is summarized in Algorithm 5.

Let V and U represent the set of vertices (i.e., transitions or states) for recon_ckt and base_ckt, respectively. [mu] = (V U U, [xi]) is defined as a balanced weighted bipartite graph, where [absolute value of V] = [absolute value of U] = [eta]. C is the cost matrix. A number [C.sub.ij] [greater than or equal to] 0 for each edge [i, j] [member of] [xi] is called the cost (or weight) of the edge [i, j].

In GBH_hungarian_algorithm, the cost matrix C is taken as input. It provides an optimal assignment between V and U as output. GBH in [17] is an iterative cost matrix reduction based approach to solve the LAP. At each iteration, a single vertex is eliminated from either V or U until the advent of some stopping conditions. Let k be the last iteration (whereas k is a positive integer). Therefore, either k or (k - 1) vertices are eliminated from [mu] at the last iteration.

Let [v.sub.k] [subset not equal to] V and [u.sub.k] [subset not equal to] U be the subsets of the remaining vertices in V and U, respectively, at iteration k. At the first iteration, that is, k = 1, [v.sub.1] = V, and [u.sub.1] = U, respectively, the objective of the LAP is to assign [eta] resources to [eta] tasks in such a way that optimal total cost should be obtained for the assignment. The LAP can be mathematically formulated as follows:

[mathematical expression not reproducible]; (2)

s.t. [[eta].summation over (j=1)] [[lambda].sub.ij], = i, (3)

where [for all]i = 1, ..., [eta];

[[eta].summation over (j=1)] [[lambda].sub.ij], = 1, (4)

where [for all]j = 1, ..., [eta];

[[lambda].sub.ij] [member of] {0, 1}, (5)

where [for all]i, j = 1, ..., [eta].

Equation (2) represents the objective function for LAP. If Resource i is allocated to task j then the decision variable [[lambda].sub.ij] = 1 and 0 otherwise as depicted in (5). One-to-one mapping should be practiced between resources and tasks. Equations (3) and (4) ensure these criteria.

Input. Combinations_of_input_lines, the descriptions of base.ckt & recon_ckt_b (i.e. [h, X, [a.sub.m], [a.sub.s], Y]) Output. Assignment_[cost.sub.e], total_[cost.sub.e], modified description of recon_ckt_b begin for all (combinations_of_input_lines) do if (M_base [greater than or equal to] M_recon) then /* equating the number of states in both the FSMs */ add (M_base--M_recon) dummy states in recon_ckt_b; [alpha] [left arrow] M_base; else if (M_base < M_recon) then add (M_recon--M_base) dummy states in base_ckt; [alpha] [left arrow] M_recon; end weight_assignment(); /* calling the procedure/ "weight_assignment" */ GBH_hungarian_algorithm(); /* performing present state matching */ assignment_[cost.sub.e] [left arrow] [[summation].sup.[alpha].sub.i=1] [[summation].sup.[alpha].sub.j=1] [C.sub.ij] x [[lambda].sub.ij]; where e [member of] {1, 2, ..., E}; total_[cost.sub.e] [left arrow] [[summation].sup.[alpha].sub.i=1] [[summation].sup.[alpha].sub.j=1] [C.sub.ij]; where e [member of] {1, 2, ..., E}; replace all states of recon_ckt_b by their corresponding states of base_ckt obtained from GBH_hungarian_algorithm; corresponding to [a.sub.m_base] order, arrange all the complete arrays of recon_ckt_b; end end Algorithm 2: State_matching.

At each iteration, there are two options to eliminate a vertex, that is, from either V or U. For each i [member of] [v.sub.k] and j [member of] [u.sub.k], the following parameters are defined to select one of the above options:

[mathematical expression not reproducible], (6)

[mathematical expression not reproducible]; (7)

[mathematical expression not reproducible]. (8)

In (6), [C.sup.V.sub.k] and [C.sup.U.sub.k] can act as "potential cost contribution" [17] of vertices i [member of] [v.sub.k] and j [member of] [v.sub.k] to [f.sub.k] in (2). Thus, the potential cost contribution is evaluated for the vertices, and if it exceeds the corresponding removal cost, then such vertices are eliminated.

If [C.sup.V.sub.k] [less than or equal to] [C.sup.U.sub.k], then an attempt is made to remove one of the vertices from [v.sub.k] [subset not equal to] V. From (7), if [C.sup.V.sub.k] [less than or equal to] [C.sup.U.sub.k], that is, the objective function value is improved by eliminating [i.sub.k], then [v.sub.k+1] is set to [v.sub.k] and the next iteration is executed.

Otherwise, one of the vertices from [v.sub.k] [subset not equal to] U is eliminated. From (8), if [f.sup.U.sub.k] [less than or equal to] [f.sub.k-1], that is, the objective function value is improved by eliminating [j.sub.k], then [u.sub.k+1] is set to [u.sub.k] and the next iteration is executed.

In this case, when the objective function value is not improved by eliminating either [i.sub.k] or [j.sub.k], then algorithm halts and the obtained solution is [f.sub.k-1]. Furthermore, if [C.sup.V.sub.k] > [C.sup.U.sub.k], then the above steps are repeated in the opposite order. The pseudocode of this approach is devised in Algorithm 6.

Therefore, after obtaining the cost matrix from weight_assignment for state matching, GBH_hungarian_ algorithm is applied to obtain the following parameters:

[mathematical expression not reproducible]. (9)

Input. The descriptions of base.ckt & recon_ckt_b (i.e. [h, X, [a.sub.m], [a.sub.s], Y]) Output. The descriptions base_ckt & recon_ckt_b without dummy states and dummy transitions. begin if (M_base [greater than or equal to] M_recon) then /* replacing the dummy states */ replace dummy states in recon_ckt_b by Proposition 1; /* by considering states in place of transitions in the modified cost matrix */ else if (M_base < M_recon) then replace dummy states in base.ckt by by Proposition 2; end for each (matched state, am recon e recon_ckt_b) do /* replacing the dummy transitions */ for each (transition in recon_ckt_b, hrecon [member of] {1, 2, ..., [t.sub.m_recon]}) do if ([t.sub.m_base] > [t.sub.m_recon]) then replace dummy transitions in recon_ckt_b by Proposition 1; else if ([t.sub.m_base] < [t.sub.m_recon]) then replace dummy transitions in base.ckt by Proposition 1; end end end end Algorithm 3: Dummy .replacement.

Thus, all the recon_ckt_b states are replaced by their assigned base_ckt states, and all the complete arrays of recon_ckt_b are arranged corresponding to [a.sub.m_base] order. Hence, from (9), the combinations of input lines are selected with min{assignment_[cost.sub.1], ..., assignment_[cost.sub.E]} & min{totaLcost1, total_[cost.sub.2], ..., total_[cost.sub.g]}.

Now, binary state codes K([a.sub.m]) and K([a.sub.s]) are applied in base_ckt and recon_ckt_b. As it changes the weights of cost matrix, weight_assignment is again applied to construct a modified cost matrix. In this case, arrays are created by combining [selected_input_combination, K([a.sub.s]), K([a.sub.m])]. Dummy states are replaced in matched states of base_ckt and recon_ckt_b by using Propositions 1 and 2. Then, dummy transitions are replaced by using Proposition 1. The dummy replacement algorithm is shown in Algorithm 3.

Proposition 1. Dummy transitions in a matched state of base.ckt or recon_ckt_b should be replaced with one of the existing transitions in that particular state with a minimum cost.

Proof. For each matched state (or assigned state after matching) [member of] recon_ckt_b if ([t.sub.m_base] [greater than or equal to] [t.sub.m_recon]) then ([t.sub.m_base] - [t.sub.m_recon]) dummy transitions are present in recon_ckt_b state.

Hence, there are ([t.sub.m_base] - [t.sub.m_recon]) transitions, present in the corresponding state of base_ckt which are unassigned. These unassigned transitions in base_ckt will lead the system to failure while operating in recon_ckt_b mode. As a solution, these unassigned transitions of base_ckt are assigned to the existing transitions of recon_ckt_b with the least cost by looking at the particular column of the modified cost matrix.

Similarly, for each matched state (or assigned state after matching) [member of] recon_ckt_b, if ([t.sub.m_base] < [t.sub.m_recon]) then ([t.sub.m_recon] - [t.sub.m_base]) dummy transitions are present in base_ckt state. Hence, there are ([t.sub.m_recon] - [t.sub.m_base]) transitions, present in the corresponding state of recon_ckt_b which are unassigned. These unassigned transitions in recon_ckt_b will lead the system to failure while operating in base_ckt mode. As a solution, these unassigned transitions of recon_ckt_b are assigned to the existing transitions of base_ckt with the least cost by looking at the particular row of the modified cost matrix.

Let M.[C.sub.ixj] represent the modified cost matrix for a matched state, where rows ([U.sub.i]) and columns ([V.sub.j]) denote the base_ckt and recon_ckt_b transitions, respectively. Thus, the unassigned transitions in base_ckt state can be assigned by (10) as follows:

Unassigned_[U.sub.i] [right arrow] [V.sub.j]: min (M_[C.sub.1j], M_[C.sub.2j], M_[C.sub.3j], ..., M_[C.sub.ij]). (10)

Similarly, the unassigned transitions in recon_ckt_b state can be assigned by (11) as follows:

Unassigned_[V.sub.j] [right arrow] [U.sub.j]}: min (M_[C.sub.i1], M_[C.sub.i2], M_[C.sub.i3], ..., M_[C.sub.ij]). (11)

Proposition 2. If M_base < M_recon, then dummy states are replaced by splitting the matched state in base-ckt.

Proof. In FSM, splitting a state with high transitions results in low power consumption [8,19]. It also improves the operating speed [2,20]. IfM_base > M_recon, then there are (M_base - M_recon) states, present in base_ckt which are unassigned. These unassigned states in base_ckt will lead to failure in the system while operating in recon_ckt_b mode. As base_ckt is the largest FSM in the collection and its transitions per state are greater than recon_ckt_b, splitting recon_ckt_b states are insignificant for the system performance. So, these unassigned states of base_ckt are assigned using Proposition 1. ?

If M_base < M_recon, then (M_recon - M_base) dummy states are replaced by splitting the matched state in base_ckt. Let [PSI]([a.sub.m_base]) = Q([t.sub.m_base] - [a.sub.m_recon]), where Q is a positive integer. Only the states for which [absolute value of [PSI]([a.suub.m_base]) > 1] can be split [19]. Each state can be split into nonoverlapping subsets of ([t.sub.m_base] - [t.sub.m_recon]) transitions. Algorithm 7 is proposed to split abase_ckt state.

At this stage, the states and the input lines of both the FSMs are completely matched and fixed. Hence, the output matching is performed by performing a Bitwise-XOR operation and selecting the combination with the least count of 1's. If N-base [greater than or equal to] N_recon, then G = [sup.N_recon] [P.sub.N_base] combinations of output lines for base_ckt are generated to match with output lines of recon_ckt_b. Otherwise, G = [sup.N_recon] [P.sub.N_base] base combinations of output lines for recon_ckt_b are generated to match with output lines of base_ckt. Then, for each combination of output lines, Bitwise-XOR operation is performed between corresponding output lines of base_ckt and recon_ckt_b. Let XOR_[count.sub.g], where g [member of] {1, 2, ..., G} represents the total number of 1's in the Bitwise-XOR operation for a particular combination of output lines. Therefore, the combinations of output lines with min{XOR_[count.sub.1], ..., XOR_[count.sub.G]} are selected.

At the end of every iteration, the description of base_ckt is updated to operate on the next iteration. At the end of Bth iteration, for each recon_ckt_k [member of] {recon_ckt_1, recon_ckt_2, ..., recon_ckt_(B - 1)}, replacement of dummy transitions and states is performed and updated descriptions of recon_ckt_1, recon_ckt_2, ..., recon_ckt_(B - 1) are obtained. In this way, descriptions of all FSMs are optimally matched, having the same dimension. Therefore, a mutual (i.e., [sup.B][C.sub.2]) Bitwise-XOR operation between the updated descriptions of FSMs is conducted which provides the optimized multiplexer bank for mode based reconfiguration.

3. Experimental Evaluation

Experiments have been conducted to illustrate the advantages of the proposed architecture using the FSM benchmark circuits from MCNC/LGSynth [21] as shown in Table 1.

The proposed iterative greedy heuristic based Hungarian algorithm has been implemented in MATLAB (2016b) environment. MATLAB HDL Coder tool is used to generate the Verilog HDL code for multiplexer bank for mode based reconfiguration. The Reconfigurable FSMIM-S architecture is described in Verilog HDL and implemented on a Xilinx xc6vlx75t Speed Grade-3 device (Virtex-6) by using Xilinx ISE 14.6 [15]. All computations are performed using a computer with an Intel(R)Core(TM)i5,8 GB RAM, and 2.67 GHz CPU.

Let [x.sub.1], [x.sub.2], [x.sub.3], ... be the input lines, [y.sub.1], [y.sub.2], [y.sub.1], ... be the output lines, and S1,S2, S3, ... be the states of an FSM. In the proposed algorithm, at the first stage, input matching is performed along with the state matching; after that, dummy states and transitions are replaced. Then, output matching is performed (Algorithm 4).

As the number of inputs or outputs exceeds 8, it requires the generation of more than [sup.8][P.sub.8] = 40320 combinations for matching, which exhausts the simulation resources. Hence, the excess input lines are discarded from input matching, which contains the maximum number of don't cares out of the total number of transitions. Similarly, the excess output lines are discarded from output matching, which contains the minimum number of 1's out of the total number of transitions. Therefore, the complexities of input selector bank and group encoder are reduced because the information content of these lines is minimum.

The FSM "s1494" has been considered as base_ckt, because it consists of 48 states, 8 inputs, 19 outputs, and 250 transitions which are of higher values as compared with any of the FSMs in the collection. Hence, "s1494" is considered as an FSM included in the design at the 0th iteration. In this case, state splitting is never used for dummy state replacement, because base_ckt contains the highest number of states. All dummy states and transitions are replaced by using Proposition 1. For output match [y.sub.1], [y.sub.2], [y.sub.3], [y.sub.4], [y.sub.5], [y.sub.6], [y.sub.7], [y.sub.9], [y.sub.10], [y.sub.16] and [y.sub.18] are discarded because they contain 62,12,13, 8, 6,6, 38,4,16,46, and 78 instances of 1's, respectively, out of a total of 250 transitions.

In the 1st iteration, an FSM, "sand," is included in the design. For input matching, [x.sub.6], [x.sub.7], and [x.sub.9] are discarded because they contain 178,150, and 182 don't cares, respectively, out of a total of 184 transitions. All states are matched with the states of base_ckt in respective order. For output matching, y8 is discarded because it contains 3 instances of 1's, out of a total of 253 transitions.

Input. Combinations_of_output_lines Output. XOR_[count.sub.g] begin for all (combinations_of_output_lines) do perform Bitwise-XOR operation between corresponding output lines of base.ckt & recon_ckt_ b; XOR_[count.sub.g] [less than or equal to] the total number of 1's in the Bitwise-XOR operation; where, g [member of] {1, 2, ..., G}; end end Algorithm 4: Output_matching.

In the 2nd iteration, an FSM, "styr," is included in the design. For input matching, [x.sub.9] is discarded because it contains 160 don't cares, out of a total of 166 transitions. States S3, S4, S15, and S16 are matched with S4, S3, S16, and S15, respectively, of base_ckt. The rest of the states are matched with the states of base_ckt in respective order. For output matching, [y.sub.4] and [y.sub.9] are discarded because they contain 5 and 6 instances of 1's, respectively, out of a total of 254 transitions.

In the 3rd iteration, an FSM, "planet," is included in the design. All states are matched with the states of base_ckt in respective order. For output matching, [y.sub.4], [y.sub.10], [y.sub.11], [y.sub.12], [y.sub.13], [y.sub.14], [y.sub.15], [y.sub.16], [y.sub.17], [y.sub.18] and [y.sub.19] are discarded because they contain 19, 5, 2, 26, 13, 3, 4, 2, 4, 4, and 23 instances of 1's, respectively, out of a total of 255 transitions.

In the 4th iteration, an FSM, "s832," is included in the design. For input matching, [x.sub.1], [x.sub.2], [x.sub.3], [x.sub.4], [x.sub.6], [x.sub.6], [x.sub.11], [x.sub.12], [x.sub.13], [x.sub.14], and [x.sub.15] are discarded because they contain 240, 228, 239, 234, 241, 224, 230, 235, 241, and 241 don't cares, respectively, out of a total of 245 transitions. All states are matched with the states of base_ckt in respective order. For output matching, [y.sub.2], [y.sub.4], [y.sub.6], [y.sub.9], [y.sub.10], [y.sub.12], [y.sub.13], [y.sub.14], [y.sub.16], [y.sub.17] and [y.sub.18] are discarded because they contain 3, 5, 2, 2, 2, 6, 2, 4, 2, 4, and 6 instances of 1's, respectively, out of a total of 259 transitions.

In the 5th iteration, an FSM, "cse," is included in the design. All states of "cse" are matched with the states of base_ckt in respective order. In the 6th iteration, an FSM, "s386," is included in the design. All states of "s386" are matched with the states of base_ckt in respective order. In the 7th iteration, an FSM, "ex6," is included in the design. All states of "ex6" are matched with the states of base_ckt in respective order. In the 8th iteration, an FSM, "mc," is included in the design. All states of "mc" are matched with the states of base_ckt in respective order.

In the 9th iteration, an FSM, "planet1," is included in the design. All states are matched with the states of base_ckt in respective order. For output matching, [y.sub.4], [y.sub.10], [y.sub.11], [y.sub.12], [y.sub.13], [y.sub.14], [y.sub.15], [y.sub.16], [y.sub.17], [y.sub.18] and [y.sub.19] are discarded because they contain 19, 5, 2, 26, 13, 3, 4, 2, 4, 4, and 23 instances of 1's, respectively, out of a total of 279 transitions.

In the 10th iteration, an FSM, "s1488," is included in the design. All states are matched with the states of base_ckt in respective order. For output matching, [y.sub.1], [y.sub.2], [y.sub.3], [y.sub.4], [y.sub.5], [y.sub.6], [y.sub.8], [y.sub.10], [y.sub.11], [y.sub.14] and [y.sub.17] are discarded because they contain 6, 7, 4, 13, 6, 38, 10, 16, 42, 61, and 64 instances of 1's, respectively, out of a total of 281 transitions.

In the 11th iteration, an FSM, "s208," is included in the design. For input matching, [x.sub.3], [x.sub.4], and [x.sub.5] are discarded because they contain 153,153, and 153 don't cares, respectively, out of a total of 153 transitions. All states are matched with the states of base_ckt in respective order. The input matching and output matching among FSMs are shown in Tables 2 and 3, respectively, along with the minimum assignment_cost, totaLcost, and XOR_count (as defined in Algorithm 1).

At the end of the proposed algorithm, optimized multiplexer bank for mode based reconfiguration is formed by performing a mutual Bitwise-XOR operation between the updated descriptions of FSMs. Hence, to evaluate the individual hardware contribution of FSMs in the Reconfigurable FSMIM-S, architecture is evaluated as follows: (i) the Bitwise-XOR operation is performed iteratively between the updated description of base_ckt and the FSM included in that particular iteration. (ii) The Verilog HDL code for the required multiplexer bank is generated at every iteration to implement in Xilinx xc6vlx75t-3 device. (iii) Thus, the number of LUTs occupied by the particular FSM is measured by the difference between the numbers of LUTs in the current iteration and previous iteration of the system. The iterative implementation of the Reconfigurable FSMIM-S architecture on Virtex-6 is shown in Table 4. Therefore, at the last iteration, the total number of LUTs required and the average speed of operation are obtained for the proposed architecture.

The experimental results from MCNC FSM benchmarks illustrate the advantages of the proposed architecture as compared with VRMUX [11]. As a result, operating speed is enhanced at an average of 30.43%, and LUT consumption is reduced by an average of 5.16% in FPGA implementation. It also shows that the operating speed is improved at an average of 9.14% in comparison with CRMUX [11] during FPGA implementation. The limitation of the proposed technique is the requirement of higher LUTs, as it requires an average of 88.65% more LUTs in comparison with CRMUX [11] during FPGA implementation. The comparisons of hardware requirement and operating speed are presented in Figures 3 and 4, respectively.

Input. Combination_of_input Jines for base.ckt, [a.sub.m_base], [h.sub.base], combination_of_input Jines for recon.ckt b, [a.sub.m_recon], [h.sub.recon]; Output. Cost matrix C /* cost matrix formation to map recon.ckt Jj states into base_ckt states */ begin base_ckt_array [left arrow] create an array at each transition in base_ckt by combining [input.combination [member of] {[x.sub.1_base], ..., [x.sub.L_base]}, [a.sub.m_base]]; recon_ckt_array [left arrow] create an array at each transition in recon_ckt_fo by combining [input.combination [member of] {[x.sub.1_recon], ..., [x.sub.L_recon]}, [a.sub.m_recon]]; for each (state, [a.sub.m_recon] [member of] recon_ckt_array) do replace [a.sub.1_base] [left arrrow] [a.sub.1_recon]; 1 go to transition matching; /*calling the function/ "transition matching"*/ C[l, 1] [left arrow] match_count; replace [a.sub.2_base] [left arrrow] [a.sub.1_recon]; go to transition matching; /*calling the function/"transition matching"*/ C[l, 2] [left arrow] match count; replace [a.sub.M_base] [left arrrow] [a.sub.1_recon]; go to transition matching; /* calling the function/"transition matching" * / C[l, M_base] [left arrow] match count; *** transition matching *** for each (transition in recon ckt array, [h.sub.recon] [member of] [1, 2, ..., [t.sub.m_recon]}) do if ([t.sub.m_base] [greater than or equal to] [t.sub.m_recon]) then add ([t.sub.m_base] - [t.sub.m_recon]) dummy transitions in the recon_ckt_array; [beta] [left arrow] [t.sub.m_base]; else if ([t.sub.m_base] < [t.sub.m_recon]) then add ([t.sub.m_recon] - [t.sub.m_base]) dummy transitions in the recon_ckt_array; [beta] [left arrow] [t.sub.m_recon]; end for each (transition in base ckt array, [h.sub.base] [member of] {1, 2, ..., [t.sub.m_base}) do perform Bitwise-XOR operation between the arrays for that particular transition; C[[h.sub.recon], [h.sub.base]] [left arrow] the total number of l's in the Bitwise-XOR operation; end end GBH hungarian algorithm(); /*calling the procedure/"GBH hungarian algorithm"*/ arrange the reconcktarrays based on assignment obtained from GBHhungarianalgorithm; matchcount [left arrow] [[summation].sup.[beta].sub.i=1] [[summation].sup.[beta].sub.j=1] [C.sub.ij] [[lambda].sub.ij]; end end Algorithm 5: Weight-assignment.

The operating speed of the proposed system is maximum (i.e., 810.17 MHz) and its LUT requirement is minimum (i.e., 42 LUTs) in the 0th iteration. The operating speed is reduced, and the LUT requirement is increased successively by adding an FSM at each iteration as shown in Table 4. Therefore, the proposed architecture acts as an ideal candidate for such applications where the similarity between the sets of FSMs is high (i.e., fewer differences in their descriptions). Many FPGA families such as Altera stratix-IV, stratix-V, or MAXII do not contain RAM blocks, and hence CRMUX cannot be used. The proposed architecture is preferred in such cases.

Input. Cost matrix C Output. Optimal assignment between V and U begin [v.sub.1] [left arrow] V, [u.sub.1] [left arrow] U, [f.sub.0] [left arrow] [phi]([v.sub.1], [u.sub.1]) and k [left arrow] 1 /* Initialization */ while [mathematical expression not reproducible]; if [C.sup.V.sub.k] [less than or equal to] [C.sup.U.sub.k] then /* performing vertex elimination from set V */ if [f.sup.V.sub.k] [less than or equal to] [f.sub.k/1] then /* identify whether vertex elimination from V is "profitable" */ [f.sub.k] [less than or equal to] [f.sup.V.sub.k], [v.sub.k+1] [left arrow] [u.sub.k+1] [left arrow] [u.sub.k]; else if [f.sup.U.sub.k] [less than or equal to] [f.sub.k/1] then /* identify whether vertex elimination from U is "profitable" */ [f.sub.k] [left arrow] [f.sup.U.sub.k], [v.sub.k+1], [u.sub.k+1] [left arrow] [u.sub.k]; end else if [C.sup.V.sub.k] > [C.sup.U.sub.k] then /* performing vertex elimination from set U */ if [f.sup.U.sub.k] [less than or equal to] [f.sub.k/1] then /* identify whether vertex elimination from U is "profitable" */ [f.sub.k] [left arrow] [f.sup.U.sub.k], [v.sub.k+1] [left arrow] [v.sub.k], [u.sub.k+1] [left arrow] [u.sub.k]; else if [f.sup.V.sub.k] [less than or equal to] [f.sub.k/1] then /* identify whether vertex elimination from V is "profitable" */ [f.sub.k] [left arrow] [f.sup.V.sub.k], [v.sub.k+1] [left arrow] [v.sub.k], [u.sub.k+1] [left arrow] [u.sub.k] end end k [left arrow] k + 1; end end Algorithm 6: GBH_hungarian_algorithm. Input. [a.sub.m_base], M-base, M_recon, h Output. Resultant states to replace dummy states in base_ckt begin while ((matched state, [a.sub.m_base] 6 base.ckt) && (M_base [greater than or equal to] M_recon)) do for each (transition in base_ckt, [h.sub.base] 6 {1, 2, ..., [t.sub.m_base]}) do if ([t.sub.m_base] -[t.sub.m_base]) [greater than or equal to] 1 then split the state, |[PSI]([a.sub.m_base]) = Q > 1|; end end m_base [left arrow] m_base + 1; end if (M_base > M_recon) then replace dummy states in recon_ckt_fo by Proposition 1; end end Algorithm 7: Base.ckt state splitting.

Moreover, in the proposed algorithm, the next state function is partially included in matching, and binary state encoding is used. The experimental results from [22-24] show that the evolutionary state encoding algorithms such as [23] or [24] outperform the binary or random state encoding techniques by an average of 59.72% and 64.06%, respectively.

Therefore, the LUT requirement for the proposed architecture can be further reduced by 20 to 30% by using the evolutionary state encoding techniques.

4. Concluding Remarks

This paper presents a high-speed reconfigurable FSM with input multiplexing and state-based input selection (Reconfigurable FSMIM-S) architecture. The creation of such architecture leads to a problem of defining the optimized multiplexer bank for mode based reconfiguration for the set of FSMs in a particular application. This situation transforms the problem into a weighted bipartite graph matching problem where the objective is to match the description of FSMs in the set with minimal cost. As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed, which provides the required optimized multiplexer bank. By using the proposed architecture, operating speed is enhanced at an average of 30.43% and LUT consumption is reduced by an average of 5.16% in FPGA implementation in comparison with VRMUX [11]. It has also been shown that the operating speed is improved at an average of 9.14% as compared with CRMUX [11]. The only trade-off of the proposed technique is that it requires 88.65% more LUTs compared with CRMUX [11] during FPGA implementation.

Further, the improvement on this work is focused on reducing the LUT requirement to implement the proposed architecture. In this study, a binary state encoding is used, and next state function is partially included in matching. However, evolutionary state encoding algorithms such as [23] or [24] can be used to reduce the increased LUT requirement.

https://doi.org/10.1155/2018/6831901

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] J. Glaser, M. Damm, J. Haase, and C. Grimm, "TR-FSM: Transition-based Reconfigurable finite state machine," ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 4, no. 3, article no. 23, 2011.

[2] A. Karatkevich, Design of Reconfigurable Logic Controllers, vol. 45, springer, Berlin, Germany, 2016.

[3] J. Wu, D. Yang, and Z. Chen, "Design and application of multi-stage reconfigurable signal processing flow on FPGA," Computers and Electrical Engineering, vol. 42, pp. 1-11, 2015.

[4] E. De Lucas, M. Sanchez-Elez, and I. Pardines, "DSPONE48: A methodology for automatically synthesize HDL focus on the reuse of DSP slices," Journal of Parallel and Distributed Computing, vol. 106, pp. 132-142, 2017.

[5] M. Nandini Priya and R. Brindha, "An enhanced architecture for high performance BIST TPG," in Proceedings of the 2nd IEEE International Conference on Innovations in Information, Embedded and Communication Systems, ICIIECS 2015, pp. 1-6, Coimbatore, India, March 2015.

[6] K. Mielcarek, A. Barkalov, and L. Titarenko, "Designing LUT-based mealy FSM with transformation of collections of output functions," in Proceedings of the 5th International Conference on Modern Circuits and Systems Technologies, MOCAST 2016, pp. 1-4, Thessaloniki, Greece, May 2016.

[7] R. Senhadji-Navarro, I. Garcia-Vargas, G. Jimenez-Moreno, and A. Civit-Ballcels, "ROM-based FSM implementation using input multiplexing in FPGA devices," IEEE Electronics Letters, vol. 40, no. 20, pp. 1249-1251, 2004.

[8] A. Klimowicz, V. Solov'Ev, and T. Grzes, "Minimization method of finite state machines for low power design," in Proceedings of the 18th Euromicro Conference on Digital System Design, DSD 2015, pp. 259-262, Funchal, Portugal, August 2015.

[9] S. N. Pradhan and P. Choudhury, "Low power and high testable Finite State Machine synthesis," in Proceedings of the International Conference and Workshop on Computing and Communication, IEMCON2015,pp. 1-5, Canada, 0ctober2015.

[10] I. Garcia-Vargas and R. Senhadji-Navarro, "Finite state machines with input multiplexing: A performance Study," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 5, pp. 867-871, 2015.

[11] R. Senhadji-Navaro and I. Garcia-Vargas, "high-speed and area-efficient reconfigurable multiplexer bank for RAM-based finite state machine implementations," Journal of Circuits, Systems and Computers, vol. 24, no. 7, Article ID 1550101, 2015.

[12] V. Sklyarov and I. Skliarova, "Synthesis of parallel hierarchical finite state machines," in Proceedings of the 2013 21st Iranian Conference on Electrical Engineering, ICEE2013, Iran, May2013.

[13] S. Gupta, V. Pareek, S. C. Jain, and D. Jain, "Realization of sequential reversible circuit from finite state machine," in Proceedings of the International Computer Science and Engineering Conference, ICSEC 2014, pp. 458-463, Khon Kaen, Thailand, August 2014.

[14] M. Barketau, E. Pesch, and Y. Shafransky, "Minimizing maximum weight of subsets of a maximum matching in a bipartite graph," Discrete Applied Mathematics, vol. 196, pp. 4-19, 2015.

[15] K. Date and R. Nagi, "GPU-accelerated Hungarian algorithms for the linear assignment problem," Parallel Computing, vol. 57, pp. 52-72, 2016.

[16] J. Dutta and S. C. Pal, "A note on Hungarian method for solving assignment problem," Journal of Information and Optimization Sciences, vol. 36, no. 5, pp. 451-459, 2015.

[17] V. Stozhkov, V. Boginski, O. . Prokopyev, and E. L. Pasiliao, "A simple greedy heuristic for linear assignment interdiction," Annals of Operations Research, vol. 249, no. 1-2, pp. 39-53, 2017.

[18] S. K. Ramadoss, A. P. Singh, and I. K. G. Mohiddin, "An evolutionary heuristic algorithm for the assignment problem," OPSEARCH, vol. 51, no. 4, pp. 589-602, 2014.

[19] T. N. Grzes and V. V. Solov'ev, "Minimization of power consumption of finite state machines by splitting their internal states," Journal of Computer and Systems Sciences International, vol. 54, no. 3, pp. 367-374, 2015.

[20] V. Salauyou, "Synthesis of high-speed finite state machines in FPGAs by state splitting," in Computer Information Systems and Industrial Management: 15th IFIP TC8 International Conference, CISIM 2016, K. Saeed and W. Homenda, Eds., vol. 9842 of Lecture Notes in Computer Science, pp. 741-751, Springer International Publishing, Vilnius, Lithuania, 2016.

[21] https://people.engr.ncsu.edu/brglez/CBL/benchmarks/ LGSynth89/fsmexamples/.

[22] T. Villa and A. Sangiovanni-Vincentelli, "NOVA: State assignment of finite state machines for optimal two-level logic implementations," in Proceedings of the 26th ACM/IEEE Design Automation Conference, pp. 327-332, June 1989.

[23] A. H. El-Maleh, "Majority-based evolution state assignment algorithm for area and power optimisation of sequential circuits," IET Computers & Digital Techniques, vol. 10, no. 1, pp. 30-36, 2016.

[24] A. H. El-Maleh, "A probabilistic pairwise swap search state assignment algorithm for sequential circuit optimization," Integration, the VLSI Journal, vol. 56, pp. 32-43, 2017.

NitishDas (iD) and P. Aruna Priya (iD)

Department of ECE, SRM University, Kattankulathur, Chennai 603203, India

Correspondence should be addressed to P. Aruna Priya; arunapriya.p@ktr.srmuniv.ac.in

Received 25 July 2017; Revised 27 October 2017; Accepted 16 November 2017; Published 10 January 2018

Academic Editor: Michael Hubner

Caption: Figure 1: Reconfigurable FSMIM-S architecture.

Caption: Figure 2: Flow chart for iterative greedy heuristic based Hungarian algorithm.

Caption: Figure 3: Comparison of hardware requirements during FPGA implementation.

Caption: Figure 4: Comparison of operating speeds during FPGA implementation.

Table 1: Benchmark circuits from MCNC/LGSynth [21]. Benchmark Number of Number of Number of circuits states inputs outputs s1494 48 8 19 sand 32 11 9 styr 30 9 10 planet 48 7 19 s832 25 18 19 cse 16 7 7 s386 13 7 7 ex6 8 5 8 mc 4 3 5 planet1 48 7 19 s1488 48 8 19 s208 18 11 2 Table 2: Input matching among FSMs. Circuits sl494 sand styr planet s832 base_ckt recon_ckt_1 recon_ckt_2 recon_ckt_3 recon_ckt_4 [x.sub.1] [x.sub.4] [x.sub.7] [x.sub.6] [x.sub.9] [x.sub.2] [x.sub.2] [x.sub.4] [x.sub.3] [x.sub.16] [x.sub.3] [x.sub.1] [x.sub.5] [x.sub.4] [x.sub.18] [x.sub.4] [x.sub.10] [x.sub.1] [x.sub.2] [x.sub.17] [x.sub.5] [x.sub.5] [x.sub.8] [x.sub.5] [x.sub.8] [x.sub.6] [x.sub.3] [x.sub.6] [x.sub.7] [x.sub.7] [x.sub.7] [x.sub.0] [x.sub.1] [x.sub.1] [x.sub.5] [x.sub.8] [x.sub.11] [x.sub.2] -- [x.sub.10] Minimum 0 0 0 0 assignment-cost Minimum 11 50 14 22 total_cost Circuits s1494 cse s386 ex6 mc base_ckt recon_ckt_5 recon_ckt_6 recon_ckt_7 recon_ckt_8 [x.sub.1] [x.sub.5] [x.sub.4] [x.sub.3] -- [x.sub.2] [x.sub.2] -- [x.sub.4] [x.sub.3] [x.sub.3] -- [x.sub.2] -- [x.sub.1] [x.sub.4] [x.sub.3] [x.sub.7] [x.sub.1] -- [x.sub.5] [x.sub.4] [x.sub.1] [x.sub.2] [x.sub.2 [x.sub.6] [x.sub.1] [x.sub.5] -- -- [x.sub.7] [x.sub.6] [x.sub.6] [x.sub.5] -- [x.sub.8] [x.sub.7] [x.sub.3] -- -- Minimum 0 0 0 0 assignment-cost Minimum 19 31 6 1 total_cost Circuits s1494 planetl sl488 s208 base_ckt recon_ckt_9 recon_ckt_10 recon_ckt_11 [x.sub.1] [x.sub.5] [x.sub.1] [x.sub.1] [x.sub.2] [x.sub.1] [x.sub.5] [x.sub.8] [x.sub.3] [x.sub.6] [x.sub.3] [x.sub.6] [x.sub.4] [x.sub.3] [x.sub.4] [x.sub.9] [x.sub.5] [x.sub.4] [x.sub.8] [x.sub.11] [x.sub.6] [x.sub.2] [x.sub.7] [x.sub.2] [x.sub.7] [x.sub.7] [x.sub.2] [x.sub.10] [x.sub.8] -- [x.sub.6] [x.sub.7] Minimum 0 2 0 assignment-cost Minimum 17 42 33 total_cost Table 3: Output matching among FSMs. Circuits s1494 sand styr planet s832 base.ckt recon_ckt_1 recon_ckt_2 recon_ckt_3 recon_ckt_4 [y.sub.8] [y.sub.4] [y.sub.5] [y.sub.2] [y.sub.19] [y.sub.11] [y.sub.6] [y.sub.6] [y.sub.6] [y.sub.15] [y.sub.12] [y.sub.2] [y.sub.7] [y.sub.8] [y.sub.2] [y.sub.13] [y.sub.9] [y.sub.8] [y.sub.9] [y.sub.8] [y.sub.14] [y.sub.7] [y.sub.3] [y.sub.5] [y.sub.7] [y.sub.15] [y.sub.3] [y.sub.2] [y.sub.3] [y.sub.1] [y.sub.17] [y.sub.5] [y.sub.1] [y.sub.1] [y.sub.11] [y.sub.19] [y.sub.1] [y.sub.10] [y.sub.7] [y.sub.5] XOR_count 95 163 106 76 Circuits s1494 cse s386 ex6 mc base.ckt recon_ckt_5 recon_ckt_6 recon_ckt_7 recon_ckt_8 [y.sub.8] -- [y.sub.6] [y.sub.5] [y.sub.4] [y.sub.11] [y.sub.7] [y.sub.1] [y.sub.8] [y.sub.2] [y.sub.12] [y.sub.3] [y.sub.3] [y.sub.4] [y.sub.5] [y.sub.13] [y.sub.5] [y.sub.4] [y.sub.1] [y.sub.1] [y.sub.14] [y.sub.1] -- [y.sub.6] [y.sub.3] [y.sub.15] [y.sub.4] [y.sub.7] [y.sub.7] -- [y.sub.17] [y.sub.6] [y.sub.5] [y.sub.2] -- [y.sub.19] [y.sub.2] [y.sub.2] [y.sub.3] -- XOR_count 134 154 86 72 Circuits s1494 planet1 s1488 s208 base.ckt recon_ckt_9 recon_ckt_10 recon_ckt_11 [y.sub.8] [y.sub.6] [y.sub.18] -- [y.sub.11] [y.sub.7] [y.sub.7] [y.sub.1] [y.sub.12] [y.sub.1] [y.sub.13] -- [y.sub.13] [y.sub.8] [y.sub.19] -- [y.sub.14] [y.sub.2] [y.sub.9] [y.sub.2] [y.sub.15] [y.sub.5] [y.sub.15] -- [y.sub.17] [y.sub.3] [y.sub.12] -- [y.sub.19] [y.sub.9] [y.sub.16] -- XOR_count 142 103 136 Table 4: Iterative implementation of the Reconfigurable FSMIM-S architecture on Virtex-6. Iteration FSM included in #LUTs occupied Maximum the particular in the operating iteration particular frequency iteration 0th s1494 42 810.17 MHz 1st sand 79 779.271 MHz 2nd styr 105 775.164MHz 3rd planet 137 728.704MHz 4th s832 199 725.005MHz 5th cse 230 722.335MHz 6th s386 249 720.643 MHz 7th ex6 255 715.231 MHz 8th mc 269 706.889 MHz 9th planet1 303 676.338 MHz 10th s1488 330 671.760 MHz 11th s208 349 665.181 MHz Iteration Maximum path #LUTs occupied by delay the FSM (#LUTs in the current iteration--#LUTs in the previous iteration) 0th 4.571 ns 42 1st 4.778 ns 37 2nd 4.455 ns 26 3rd 4.609 ns 32 4th 5.381ns 62 5th 5.140 ns 31 6th 5.662 ns 19 7th 4.967 ns 6 8th 4.526 ns 14 9th 5.098 ns 34 10th 4.486 ns 27 11th 3.014 ns 19 Note. #LUTs denotes the number of LUTs in ISE.

Printer friendly Cite/link Email Feedback | |

Title Annotation: | Research Article |
---|---|

Author: | Das, Nitish; Priya, P. Aruna |

Publication: | International Journal of Reconfigurable Computing |

Date: | Jan 1, 2018 |

Words: | 9807 |

Previous Article: | RP-Ring: A Heterogeneous Multi-FPGA Accelerator. |

Next Article: | Reconfigurable Network Stream Processing on Virtualized FPGA Resources. |

Topics: |