Printer Friendly

FPGA Implementation of Reconfigurable Finite State Machine with Input Multiplexing Architecture Using Hungarian Method.

1. Introduction

Designing a complex digital system requires an efficient method that includes modeling a control unit (i.e., a controller). The operational speed of such systems depends on the speed of their controllers. The mathematical model for designing a controller for applications such as microprocessor control units, circuit testing, and digital signal processing (DSP) is a finite state machine (FSM). Consequently, designing such systems requires an efficient synthesis technique for high-speed FSM [1, 2]. Applications such as DSP [3, 4] and built-in self-test (BIST) [5] require specific operations to be performed only in the particular instances. Different control units are required to complete each operation. Hence, to optimally perform these operations, a single control unit is defined which can configure itself depending upon the applied mode of operation; it is also known as reconfigurable FSM [1]. The mode of operation for such FSM is controlled-by a counter, timer, or any user-defined control signals based on the application requirements. An example of a reconfigurable FSM is given in [1] as a test chip for wireless sensor network. In this example, Transition-Based Reconfigurable FSM (TR-FSM) [1] is configured into one of the MCNC FSM benchmark circuits (i.e., dk15, s386, or cse) at different instances. Moreover, any application which requires sequential processing can be broken down into a series of instances (i.e., multistage reconfigurable signal processing) where at each instance only a particular operation is performed [3]. Hence, for such applications, efficient architectures can be created using reconfigurable FSM. These emerging trends in the research necessitate a framework for optimal synthesis of high-speed reconfigurable FSM.

Conventional LUT-based architectures have been used for FSM implementation on a FPGA platform [6]. Similarly, ROM-based architectures are investigated for FSM implementations. Due to the area and speed advantages, they act as an excellent alternative to their conventional LUT-based counterparts [7]. In such implementations, a considerable reduction in power consumption is obtained by disabling embedded memory blocks (EMBs) during the idle states [8, 9]. The fundamental framework for FSM with input multiplexing (FSMIM) is made in [7] whose prime objective is to shorten the depth of ROM memory. In their approach, an input selector (which consists of a multiplexer bank) is used. The basic idea that has been implemented is to select only a specific set of inputs for a particular state. FSMIM with state-based input selection (FSMIM-S) is proposed in [10], which further reduces the ROM memory size.

Another approach for implementation of reconfigurable FSM is RAM-based architectures. In literature, there are two underlying RAM-based architectures, that is, variation-based reconfigurable multiplexer bank (VRMUX) and combination-based reconfigurable multiplexer bank (CRMUX) [11]. The RAM-based architectures do not serve as a novel tool for implementation of complicated FSM structures such as parallel hierarchical finite state machines (PHFSM) [12] or reversible FSM [13]. Due to significant advantages of FSMIM-S architecture over other architectures, it is used to create a framework for the high-speed Reconfigurable FSMIM-S architecture.

The Reconfigurable FSMIM-S architecture is constructed by combining the conventional FSMIM-S architecture [10] and an optimized multiplexer bank (which defines the mode of operation). For this, the descriptions of a set of FSMs are taken for a particular application. Hence, the problem is to obtain the optimized multiplexer bank for the given set of FSMs. It can be solved by mapping all the FSMs into one large FSM (called base_ckt) in that set. The objective of this process is to perform optimal matching between base_ckt and the other FSMs in the set so that a minimum number of bits are changed by changing the mode of operation. This situation (i.e., performing one-to-one mapping) transforms the problem into a weighted bipartite graph matching problem where the objective is to match the description of FSMs in the set to base_ckt with minimal cost [14]. As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed. In this algorithm, the weights are assigned based on the input combinations, state code, and the output combinations to form a cost matrix. A cost matrix reduction based technique, that is, Hungarian algorithm [15, 16], is used for matching. A greedy based heuristic (GBH) search technique [17] is combined with the Hungarian algorithm to optimize the augmenting path search. At every iteration, descriptions of two FSMs (i.e., base_ckt and one of the FSMs in the set) are taken as inputs. It produces the modified descriptions of the FSMs of the same dimension as outputs. At the end of the algorithm, a mutual XOR operation is performed among the modified descriptions, which provides the required optimized multiplexer bank.

The experimental results from MCNC FSM benchmarks illustrate the advantages of the proposed architecture as compared with VRMUX [11], as operating speed is enhanced at an average of 30.43% and LUT consumption is reduced by an average of 5.16% in FPGA implementation. It also shows that the operating speed is improved at an average of 9.14% in comparison with CRMUX [11] during FPGA implementation. The limitation of the proposed technique is the requirement of higher LUTs, as it requires an average of 88.65% more LUTs in comparison with CRMUX [11] during FPGA implementation.

The rest of the paper is outlined as follows. Section 2 consists of the Reconfigurable FSMIM-S architecture and the proposed iterative greedy heuristic based Hungarian algorithm. The experimental evaluation of the proposed algorithm, implementation of the Reconfigurable FSMIM-S architecture, and comparison with other proposals from the literature are presented in Section 3. The concluding remarks are devised in Section 4.

2. Proposed Method

As most of the FPGA platforms use synchronous EMBs, Mealy machines with synchronous outputs are used in this paper. Let a Mealy FSM be described by the following columns: [a.sub.m] is a code of current state ([a.sub.m] [member of] A, where A = {[a.sub.1], ..., [a.sub.M]} is a set of states); K([a.sub.m]) is a code of state [a.sub.m] [member of] A; h is the number of transitions per state (h [member of] H, where H = {[t.sub.1], ..., [t.sub.M]} is a set of number of transitions per state corresponding to A); as is a state of transition (the next state); K([a.sub.s]) is a code of state [a.sub.s] [member of] A; X = {[x.sub.1], ..., [x.sub.L]} is the set of input variables, Y = {[y.sub.1], ..., [y.sub.N]} is the set of output variables; and D = {[d.sub.1], ..., [d.sub.R]} is defined as excitation functions for the flip-flops, where R is the number of flip-flops (i.e., the number of bits in internal state codes), R [member of] {[??][log.sub.2]M[??], M}.

The descriptions ofaset of FSMs are taken for a particular application. The fundamental idea is to obtain the description of a single FSM by mapping all the FSMs into one large FSM (called base_ckt) in that set. The inputs, states, and outputs of an FSM in the set are mapped into base_ckt in their respective order. The mode bits are applied through a2 x 1 multiplexer in those positions where the polarity of bit differs (i.e., 1 in place 0 and vice versa) to perform such mapping. Hence, the resultant FSM operates in two modes, where base_ckt mode is the default mode of operation. Similarly, all other FSMs in the set are mapped into base_ckt. In this way, a single FSM (i.e., base_ckt) combined with a multiplexer bank (which defines the mode of operation) acts as reconfigurable FSM. It can be configured into a particular FSM in the set by applying the specific mode bits. Due to numerous advantages mentioned in the literature, FSMIM-S architecture [10] is chosen to implement the FSM (i.e., base_ckt) part. Therefore, the Reconfigurable FSMIM-S architecture is constructed by combining the conventional FSMIM-S architecture [10] and multiplexer bank for mode based reconfiguration as shown in Figure 1.

It encounters the following two major difficulties:

(i) The complexity of the resultant multiplexer bank is very high.

(ii) It becomes difficult to define the dummy states and dummy transitions. Dummy states and dummy transitions are such states and transitions which are not present in base_ckt but exist in the other FSMs in the set and vice versa. These states and transitions lead the system to failure.

As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed. In this algorithm, the descriptions of a set of FSMs (i.e., [h, X, [a.sub.m], [a.sub.s], Y]) are taken as inputs. It provides the optimized multiplexer bank for mode based reconfiguration as output. It also provides the updated description (i.e., description without dummy states and dummy transitions) of base_ckt, which is used to construct the conventional FSMIM-S part of the proposed architecture. Let (B + 1) be the set of FSMs for a particular application. Based on the complexity of the description of FSM, the largest FSM is selected from the set. It is called base_ckt. The rest of the FSMs are called recon_ckt_1, recon_ckt_2, ..., recon_ckt_B, respectively.

Each input, state, or output of a recon_ckt_b [member of] {recon_ckt_ 1, recon_ckt_2, ..., recon_ckt_B} can be mapped into any one of the inputs, states, or outputs, respectively, of base_ckt; that is, there exists a one-to-one mapping. These mappings cannot be performed independently because inputs, states, and outputs of an FSM are interdependent. Consequently, mapping an input or state of recon_ckt_b into base_ckt is transformed into a weighted bipartite graph matching problem or linear assignment problem (LAP) [14] as shown in Figure 2. In this LAP, the weights are assigned based on the input combinations, state code, and the output combinations to form a cost matrix. The objective of this process is to perform matching with a minimal cost so that a minimum number of bits are changed by changing the mode of operation. Therefore, the complexity of the multiplexer bank is reduced.

In the literature, the following approaches are proposed to solve a LAP:

(i) Modified Hungarian algorithm [16]

(ii) Simple greedy heuristic based algorithm [17]

(iii) Evolutionary heuristic algorithm [18].

The maximum number of inputs or states does not exceed 100 in MCNC FSM benchmarks or FSMs used in real-world applications. So, the number of vertices used in the resultant weighted bipartite graph is always low which results in small LAP. But, the number of LAPs formed in this process is enormous because input matching and state matching are performed together as shown in Figure 2. Hence, the primary requirement of the algorithm to solve LAP becomes the fast convergence. Therefore, a cost matrix reduction based technique, that is, Hungarian algorithm [15, 16], is used for matching. A greedy based heuristic (GBH) search technique [17] is combined with the Hungarian algorithm to optimize the augmenting path search. The pseudocode of this technique is summarized in Algorithm 1. (Note: subscripts "base" and "recon" denote the parameters of base_ckt and recon_ckt, respectively, throughout the paper.)

At every iteration [member of] {1, ..., B}, descriptions of two FSMs, that is, base_ckt and recon_ckt_b, are taken as inputs. The major contributing factors for power consumption and LUT requirement in FSM are the number of inputs and the internal states [8, 19]. In any FSM, input variable and states are interdependent. Thus, input and state matching are performed together between base_ckt and recon_ckt.

If L_base [greater than or equal to] L_recon, then E = [sup.L-base] [P.sub.L_recon] combinations of input lines for base_ckt are generated to match with input lines of recon_ckt_b. (L_base - L_recon) input lines act as don't cares while the system operates in recon_ckt_b mode. Otherwise, E = [sup.L-recon][P.sub.L_base] combinations of input lines for recon_ckt_b are generated to match with input lines of base_ckt. In this case, (L_recon - L_base) input lines act as don't cares while the system operates in base_ckt mode.

Now, for each combination of input lines, state matching is performed (Algorithm 2). This situation can be seen as a LAP where the objective is to match the states of recon_ckt_b to the states of base_ckt with minimal cost [14,17]. For this, the number of states in both the FSMs is equalized. Thus, if M_base [greater than or equal to] M_recon, then ([alpha] - M_recon, where [alpha] = M_base) dummy states are added in recon_ckt_b. Otherwise ([alpha] - M_base, where [alpha] = M_recon) dummy states are added in base_ckt.

All LAP solving algorithms require a cost matrix as an input to perform an optimal assignment. So, to form a cost matrix for this problem, a procedure named weight_assignment is proposed.

In this procedure, the combinations of input lines, [a.sub.m] and h, for base_ckt and recon_ckt_b are taken as inputs. It provides the cost matrix to map recon_ckt_b states into base.ckt states. An array is created at each transition in both base_ckt and recon_ckt_b by combining [input_combination [member of] {[x.sub.1], [x.sub.2], ..., [x.sub.L]}, [a.sub.m]].

The basic idea that has been implemented is as follows: (i) replace the recon_ckt_b state with the base_ckt state sequentially in the recon_ckt_array; (ii) evaluate the weight by performing Bitwise-XOR operation (i.e., transition matching) for that particular replacement; (iii) then, construct the cost matrix.

For each transition in recon_ckt_array (i.e., [h.sub.recon] [member of] {1, 2, ..., [t.sub.m_recon]})> transition matching is performed. This situation can be seen as a LAP where the objective is to match the transition of recon_ckt_b to the transition ofbase_ckt with minimal cost [14,17]. For this, the number of transitions for the particular state is equalized in both the FSMs. Therefore, if [t.sub.m-base] [greater than or equal to] [t.sub.m_recon], then ([beta] - [t.sub.m_recon], where [beta] = [t.sub.m-base]) dummy transitions are added in the recon_ckt_array. Otherwise ([beta] - [t.sub.m_base], where [beta] = [t.sub.m_base]) dummy transitions are added in the base_ckt_array. Thus, for each transition in base_ckt_array (i.e., [h.sub.base] [member of] {1, 2, ..., [t.sub.m_base]}), a Bitwise-XOR operation is performed between the arrays for that particular transition. The total number of Is in the BitwiseXOR operations is counted to create a cost matrix for transition matching. Then, optimal assignment of transitions is performed by greedy based heuristic Hungarian algorithm (GBH_hungarian_algorithm) between base_ckt_array and recon_ckt_array. Let match_count be a variable defined as

match_count = [[beta].summation over (i=1)] [[beta].summation over (j=1)] [C.sub.ij] [[lambda].sub.ij]

where, [C.sub.ij] [left arrow] cost matrix, [[lambda].sub.ij] [left arrow] decision variable. (1)
Input. The descriptions of the FSMs (i.e. [h, X, [a.sub.m], [a.sub.s],
Y])
Output. The optimized multiplexer bank for mode based reconfiguration
begin
select the largest FSM from the set based on the description;
base_ckt [left arrow] largest FSM;
recon_ckt_1, recon_ckt_2, ..., recon_ckt_B [left arrow] rest of the
FSMs in the set; for each recon_ckt_b [member of] {recon_ckt_ 1,
recon_ckt_2, ..., recon_ckt_B} do
       if (L_base [greater than or equal to] L_recon) then /*
       performing the input matching */
              generate, E [left arrow] [sup.L/base] [P.sub.L_recon]
              combinations_of_input_lines for base.ckt to match with
              input lines of recon_ckt_b; go to state_matching; /*
              calling the function/"state_matching" */
       else if (L_base < L_recon) then
              generate, [left arrow] [sup.L-base] [P.sub.L_recon]
              combinations_of_inputjines for recon_ckt_b to match with
              input lines of base_ckt;
              go to state_matching; /* calling the function/
              "state_matching" */
       end
       select combinations_of_input_lines with
       min{assignment_[cost.sub.1], ...,
       assignment_[cost.sub.E]}
       &min{total_[cost.sub.1], total_[cost.sub.2], ...,
       total_[cost.sub.E]};
       perform binary state assignment in base.ckt &
       recon_ckt_b i.e. apply K([a.sub.m]) & K([a.sub.s]);
       weight_assignment(); /*creating arrays by
       [selected_input_combination, K([a.sub.s]), K([a.sub.m])]
       */go to dummy .replacement; /* calling the function/
       "dummy .replacement" */
       if (N_base [greater than or equal to] N_recon) then /*
       performing the output matching */
              generate, G [left arrow] [sup.N_base][P.sub.N_recon]
              combinations_of_output_lines for base.ckt to match with
              output lines of recon_ckt_b;
              go to output_matching; /* calling the function/
              "output_matching" */
       else if (N_base < N_recon) then
              generate, G [left arrow] [sup.N_base][P.sub.N_recon]
              combinations_of_output_lines for recon_ckt_b to match
              with output lines of base.ckt;
              go to output_matching; /* calling the function/
              "output_matching" */
       end
       select combinations_of_outputjines with
       min{XOR_[count.sub.1], ..., XOR_[count.sub.G]}; update
       the description of base_ckt;
end
for each recon_ckt_b [member of] {recon_ckt_ 1, recon_ckt_2, ...,
recon_ckt_(B--1)} do
       go to dummy .replacement; /* calling the function/
       "dummy .replacement" */update the description of
       recon_ckt_b;
end
perform a mutual (i.e. [sup.B][C.sub.2]) Bitwise-XOR operations
between the updated descriptions of FSMs; obtain the optimized
multiplexer bank for mode based reconfiguration;
end

Algorithm 1: Iterative greedy heuristic based Hungarian algorithm.


In this way, by using match_count (from (1)), the cost matrix formation to map recon_ckt_fr states into base_ckt states is completed. The pseudocode of the procedure, weight_assignment, is summarized in Algorithm 5.

Let V and U represent the set of vertices (i.e., transitions or states) for recon_ckt and base_ckt, respectively. [mu] = (V U U, [xi]) is defined as a balanced weighted bipartite graph, where [absolute value of V] = [absolute value of U] = [eta]. C is the cost matrix. A number [C.sub.ij] [greater than or equal to] 0 for each edge [i, j] [member of] [xi] is called the cost (or weight) of the edge [i, j].

In GBH_hungarian_algorithm, the cost matrix C is taken as input. It provides an optimal assignment between V and U as output. GBH in [17] is an iterative cost matrix reduction based approach to solve the LAP. At each iteration, a single vertex is eliminated from either V or U until the advent of some stopping conditions. Let k be the last iteration (whereas k is a positive integer). Therefore, either k or (k - 1) vertices are eliminated from [mu] at the last iteration.

Let [v.sub.k] [subset not equal to] V and [u.sub.k] [subset not equal to] U be the subsets of the remaining vertices in V and U, respectively, at iteration k. At the first iteration, that is, k = 1, [v.sub.1] = V, and [u.sub.1] = U, respectively, the objective of the LAP is to assign [eta] resources to [eta] tasks in such a way that optimal total cost should be obtained for the assignment. The LAP can be mathematically formulated as follows:

[mathematical expression not reproducible]; (2)

s.t. [[eta].summation over (j=1)] [[lambda].sub.ij], = i, (3)

where [for all]i = 1, ..., [eta];

[[eta].summation over (j=1)] [[lambda].sub.ij], = 1, (4)

where [for all]j = 1, ..., [eta];

[[lambda].sub.ij] [member of] {0, 1}, (5)

where [for all]i, j = 1, ..., [eta].

Equation (2) represents the objective function for LAP. If Resource i is allocated to task j then the decision variable [[lambda].sub.ij] = 1 and 0 otherwise as depicted in (5). One-to-one mapping should be practiced between resources and tasks. Equations (3) and (4) ensure these criteria.
Input. Combinations_of_input_lines, the descriptions of base.ckt &
recon_ckt_b (i.e. [h, X, [a.sub.m], [a.sub.s], Y])
Output. Assignment_[cost.sub.e], total_[cost.sub.e], modified
description of recon_ckt_b
begin
for all (combinations_of_input_lines) do
       if (M_base [greater than or equal to] M_recon) then /*
       equating the number of states in both the FSMs */
              add (M_base--M_recon) dummy states in recon_ckt_b;
              [alpha] [left arrow] M_base;
       else if (M_base < M_recon) then
              add (M_recon--M_base) dummy states in base_ckt;
              [alpha] [left arrow] M_recon;
       end
       weight_assignment(); /* calling the procedure/
       "weight_assignment" */
       GBH_hungarian_algorithm(); /* performing present state
       matching */
              assignment_[cost.sub.e] [left arrow]
              [[summation].sup.[alpha].sub.i=1]
              [[summation].sup.[alpha].sub.j=1] [C.sub.ij] x
              [[lambda].sub.ij]; where e [member of] {1, 2, ..., E};
              total_[cost.sub.e] [left arrow]
              [[summation].sup.[alpha].sub.i=1]
              [[summation].sup.[alpha].sub.j=1] [C.sub.ij]; where e
              [member of] {1, 2, ..., E};
       replace all states of recon_ckt_b by their corresponding
       states of base_ckt obtained from
       GBH_hungarian_algorithm;
       corresponding to [a.sub.m_base] order, arrange all the
       complete arrays of recon_ckt_b;
end
end

Algorithm 2: State_matching.


At each iteration, there are two options to eliminate a vertex, that is, from either V or U. For each i [member of] [v.sub.k] and j [member of] [u.sub.k], the following parameters are defined to select one of the above options:

[mathematical expression not reproducible], (6)

[mathematical expression not reproducible]; (7)

[mathematical expression not reproducible]. (8)

In (6), [C.sup.V.sub.k] and [C.sup.U.sub.k] can act as "potential cost contribution" [17] of vertices i [member of] [v.sub.k] and j [member of] [v.sub.k] to [f.sub.k] in (2). Thus, the potential cost contribution is evaluated for the vertices, and if it exceeds the corresponding removal cost, then such vertices are eliminated.

If [C.sup.V.sub.k] [less than or equal to] [C.sup.U.sub.k], then an attempt is made to remove one of the vertices from [v.sub.k] [subset not equal to] V. From (7), if [C.sup.V.sub.k] [less than or equal to] [C.sup.U.sub.k], that is, the objective function value is improved by eliminating [i.sub.k], then [v.sub.k+1] is set to [v.sub.k] and the next iteration is executed.

Otherwise, one of the vertices from [v.sub.k] [subset not equal to] U is eliminated. From (8), if [f.sup.U.sub.k] [less than or equal to] [f.sub.k-1], that is, the objective function value is improved by eliminating [j.sub.k], then [u.sub.k+1] is set to [u.sub.k] and the next iteration is executed.

In this case, when the objective function value is not improved by eliminating either [i.sub.k] or [j.sub.k], then algorithm halts and the obtained solution is [f.sub.k-1]. Furthermore, if [C.sup.V.sub.k] > [C.sup.U.sub.k], then the above steps are repeated in the opposite order. The pseudocode of this approach is devised in Algorithm 6.

Therefore, after obtaining the cost matrix from weight_assignment for state matching, GBH_hungarian_ algorithm is applied to obtain the following parameters:

[mathematical expression not reproducible]. (9)
Input. The descriptions of base.ckt & recon_ckt_b (i.e. [h, X,
[a.sub.m], [a.sub.s], Y])
Output. The descriptions base_ckt & recon_ckt_b without dummy states
and dummy transitions.
begin
if (M_base [greater than or equal to] M_recon) then /* replacing the
dummy states */
       replace dummy states in recon_ckt_b by Proposition 1; /* by
       considering states in place of transitions in the modified cost
       matrix */
else if (M_base < M_recon) then
       replace dummy states in base.ckt by by Proposition 2;
end
for each (matched state, am recon e recon_ckt_b) do /* replacing the
dummy transitions */
       for each (transition in recon_ckt_b, hrecon [member of] {1, 2,
        ..., [t.sub.m_recon]}) do
              if ([t.sub.m_base] > [t.sub.m_recon]) then
                     replace dummy transitions in recon_ckt_b by
                     Proposition 1;
              else if ([t.sub.m_base] < [t.sub.m_recon]) then
                     replace dummy transitions in base.ckt by
                     Proposition 1;
              end
       end
end
end

Algorithm 3: Dummy .replacement.


Thus, all the recon_ckt_b states are replaced by their assigned base_ckt states, and all the complete arrays of recon_ckt_b are arranged corresponding to [a.sub.m_base] order. Hence, from (9), the combinations of input lines are selected with min{assignment_[cost.sub.1], ..., assignment_[cost.sub.E]} & min{totaLcost1, total_[cost.sub.2], ..., total_[cost.sub.g]}.

Now, binary state codes K([a.sub.m]) and K([a.sub.s]) are applied in base_ckt and recon_ckt_b. As it changes the weights of cost matrix, weight_assignment is again applied to construct a modified cost matrix. In this case, arrays are created by combining [selected_input_combination, K([a.sub.s]), K([a.sub.m])]. Dummy states are replaced in matched states of base_ckt and recon_ckt_b by using Propositions 1 and 2. Then, dummy transitions are replaced by using Proposition 1. The dummy replacement algorithm is shown in Algorithm 3.

Proposition 1. Dummy transitions in a matched state of base.ckt or recon_ckt_b should be replaced with one of the existing transitions in that particular state with a minimum cost.

Proof. For each matched state (or assigned state after matching) [member of] recon_ckt_b if ([t.sub.m_base] [greater than or equal to] [t.sub.m_recon]) then ([t.sub.m_base] - [t.sub.m_recon]) dummy transitions are present in recon_ckt_b state.

Hence, there are ([t.sub.m_base] - [t.sub.m_recon]) transitions, present in the corresponding state of base_ckt which are unassigned. These unassigned transitions in base_ckt will lead the system to failure while operating in recon_ckt_b mode. As a solution, these unassigned transitions of base_ckt are assigned to the existing transitions of recon_ckt_b with the least cost by looking at the particular column of the modified cost matrix.

Similarly, for each matched state (or assigned state after matching) [member of] recon_ckt_b, if ([t.sub.m_base] < [t.sub.m_recon]) then ([t.sub.m_recon] - [t.sub.m_base]) dummy transitions are present in base_ckt state. Hence, there are ([t.sub.m_recon] - [t.sub.m_base]) transitions, present in the corresponding state of recon_ckt_b which are unassigned. These unassigned transitions in recon_ckt_b will lead the system to failure while operating in base_ckt mode. As a solution, these unassigned transitions of recon_ckt_b are assigned to the existing transitions of base_ckt with the least cost by looking at the particular row of the modified cost matrix.

Let M.[C.sub.ixj] represent the modified cost matrix for a matched state, where rows ([U.sub.i]) and columns ([V.sub.j]) denote the base_ckt and recon_ckt_b transitions, respectively. Thus, the unassigned transitions in base_ckt state can be assigned by (10) as follows:

Unassigned_[U.sub.i] [right arrow] [V.sub.j]: min (M_[C.sub.1j], M_[C.sub.2j], M_[C.sub.3j], ..., M_[C.sub.ij]). (10)

Similarly, the unassigned transitions in recon_ckt_b state can be assigned by (11) as follows:

Unassigned_[V.sub.j] [right arrow] [U.sub.j]}: min (M_[C.sub.i1], M_[C.sub.i2], M_[C.sub.i3], ..., M_[C.sub.ij]). (11)

Proposition 2. If M_base < M_recon, then dummy states are replaced by splitting the matched state in base-ckt.

Proof. In FSM, splitting a state with high transitions results in low power consumption [8,19]. It also improves the operating speed [2,20]. IfM_base > M_recon, then there are (M_base - M_recon) states, present in base_ckt which are unassigned. These unassigned states in base_ckt will lead to failure in the system while operating in recon_ckt_b mode. As base_ckt is the largest FSM in the collection and its transitions per state are greater than recon_ckt_b, splitting recon_ckt_b states are insignificant for the system performance. So, these unassigned states of base_ckt are assigned using Proposition 1. ?

If M_base < M_recon, then (M_recon - M_base) dummy states are replaced by splitting the matched state in base_ckt. Let [PSI]([a.sub.m_base]) = Q([t.sub.m_base] - [a.sub.m_recon]), where Q is a positive integer. Only the states for which [absolute value of [PSI]([a.suub.m_base]) > 1] can be split [19]. Each state can be split into nonoverlapping subsets of ([t.sub.m_base] - [t.sub.m_recon]) transitions. Algorithm 7 is proposed to split abase_ckt state.

At this stage, the states and the input lines of both the FSMs are completely matched and fixed. Hence, the output matching is performed by performing a Bitwise-XOR operation and selecting the combination with the least count of 1's. If N-base [greater than or equal to] N_recon, then G = [sup.N_recon] [P.sub.N_base] combinations of output lines for base_ckt are generated to match with output lines of recon_ckt_b. Otherwise, G = [sup.N_recon] [P.sub.N_base] base combinations of output lines for recon_ckt_b are generated to match with output lines of base_ckt. Then, for each combination of output lines, Bitwise-XOR operation is performed between corresponding output lines of base_ckt and recon_ckt_b. Let XOR_[count.sub.g], where g [member of] {1, 2, ..., G} represents the total number of 1's in the Bitwise-XOR operation for a particular combination of output lines. Therefore, the combinations of output lines with min{XOR_[count.sub.1], ..., XOR_[count.sub.G]} are selected.

At the end of every iteration, the description of base_ckt is updated to operate on the next iteration. At the end of Bth iteration, for each recon_ckt_k [member of] {recon_ckt_1, recon_ckt_2, ..., recon_ckt_(B - 1)}, replacement of dummy transitions and states is performed and updated descriptions of recon_ckt_1, recon_ckt_2, ..., recon_ckt_(B - 1) are obtained. In this way, descriptions of all FSMs are optimally matched, having the same dimension. Therefore, a mutual (i.e., [sup.B][C.sub.2]) Bitwise-XOR operation between the updated descriptions of FSMs is conducted which provides the optimized multiplexer bank for mode based reconfiguration.

3. Experimental Evaluation

Experiments have been conducted to illustrate the advantages of the proposed architecture using the FSM benchmark circuits from MCNC/LGSynth [21] as shown in Table 1.

The proposed iterative greedy heuristic based Hungarian algorithm has been implemented in MATLAB (2016b) environment. MATLAB HDL Coder tool is used to generate the Verilog HDL code for multiplexer bank for mode based reconfiguration. The Reconfigurable FSMIM-S architecture is described in Verilog HDL and implemented on a Xilinx xc6vlx75t Speed Grade-3 device (Virtex-6) by using Xilinx ISE 14.6 [15]. All computations are performed using a computer with an Intel(R)Core(TM)i5,8 GB RAM, and 2.67 GHz CPU.

Let [x.sub.1], [x.sub.2], [x.sub.3], ... be the input lines, [y.sub.1], [y.sub.2], [y.sub.1], ... be the output lines, and S1,S2, S3, ... be the states of an FSM. In the proposed algorithm, at the first stage, input matching is performed along with the state matching; after that, dummy states and transitions are replaced. Then, output matching is performed (Algorithm 4).

As the number of inputs or outputs exceeds 8, it requires the generation of more than [sup.8][P.sub.8] = 40320 combinations for matching, which exhausts the simulation resources. Hence, the excess input lines are discarded from input matching, which contains the maximum number of don't cares out of the total number of transitions. Similarly, the excess output lines are discarded from output matching, which contains the minimum number of 1's out of the total number of transitions. Therefore, the complexities of input selector bank and group encoder are reduced because the information content of these lines is minimum.

The FSM "s1494" has been considered as base_ckt, because it consists of 48 states, 8 inputs, 19 outputs, and 250 transitions which are of higher values as compared with any of the FSMs in the collection. Hence, "s1494" is considered as an FSM included in the design at the 0th iteration. In this case, state splitting is never used for dummy state replacement, because base_ckt contains the highest number of states. All dummy states and transitions are replaced by using Proposition 1. For output match [y.sub.1], [y.sub.2], [y.sub.3], [y.sub.4], [y.sub.5], [y.sub.6], [y.sub.7], [y.sub.9], [y.sub.10], [y.sub.16] and [y.sub.18] are discarded because they contain 62,12,13, 8, 6,6, 38,4,16,46, and 78 instances of 1's, respectively, out of a total of 250 transitions.

In the 1st iteration, an FSM, "sand," is included in the design. For input matching, [x.sub.6], [x.sub.7], and [x.sub.9] are discarded because they contain 178,150, and 182 don't cares, respectively, out of a total of 184 transitions. All states are matched with the states of base_ckt in respective order. For output matching, y8 is discarded because it contains 3 instances of 1's, out of a total of 253 transitions.
Input. Combinations_of_output_lines
Output. XOR_[count.sub.g]
begin
       for all (combinations_of_output_lines) do
       perform Bitwise-XOR operation between corresponding
       output lines of base.ckt & recon_ckt_ b;
       XOR_[count.sub.g] [less than or equal to] the total
       number of 1's in the Bitwise-XOR operation;
              where, g [member of] {1, 2, ..., G};
end
end

Algorithm 4: Output_matching.


In the 2nd iteration, an FSM, "styr," is included in the design. For input matching, [x.sub.9] is discarded because it contains 160 don't cares, out of a total of 166 transitions. States S3, S4, S15, and S16 are matched with S4, S3, S16, and S15, respectively, of base_ckt. The rest of the states are matched with the states of base_ckt in respective order. For output matching, [y.sub.4] and [y.sub.9] are discarded because they contain 5 and 6 instances of 1's, respectively, out of a total of 254 transitions.

In the 3rd iteration, an FSM, "planet," is included in the design. All states are matched with the states of base_ckt in respective order. For output matching, [y.sub.4], [y.sub.10], [y.sub.11], [y.sub.12], [y.sub.13], [y.sub.14], [y.sub.15], [y.sub.16], [y.sub.17], [y.sub.18] and [y.sub.19] are discarded because they contain 19, 5, 2, 26, 13, 3, 4, 2, 4, 4, and 23 instances of 1's, respectively, out of a total of 255 transitions.

In the 4th iteration, an FSM, "s832," is included in the design. For input matching, [x.sub.1], [x.sub.2], [x.sub.3], [x.sub.4], [x.sub.6], [x.sub.6], [x.sub.11], [x.sub.12], [x.sub.13], [x.sub.14], and [x.sub.15] are discarded because they contain 240, 228, 239, 234, 241, 224, 230, 235, 241, and 241 don't cares, respectively, out of a total of 245 transitions. All states are matched with the states of base_ckt in respective order. For output matching, [y.sub.2], [y.sub.4], [y.sub.6], [y.sub.9], [y.sub.10], [y.sub.12], [y.sub.13], [y.sub.14], [y.sub.16], [y.sub.17] and [y.sub.18] are discarded because they contain 3, 5, 2, 2, 2, 6, 2, 4, 2, 4, and 6 instances of 1's, respectively, out of a total of 259 transitions.

In the 5th iteration, an FSM, "cse," is included in the design. All states of "cse" are matched with the states of base_ckt in respective order. In the 6th iteration, an FSM, "s386," is included in the design. All states of "s386" are matched with the states of base_ckt in respective order. In the 7th iteration, an FSM, "ex6," is included in the design. All states of "ex6" are matched with the states of base_ckt in respective order. In the 8th iteration, an FSM, "mc," is included in the design. All states of "mc" are matched with the states of base_ckt in respective order.

In the 9th iteration, an FSM, "planet1," is included in the design. All states are matched with the states of base_ckt in respective order. For output matching, [y.sub.4], [y.sub.10], [y.sub.11], [y.sub.12], [y.sub.13], [y.sub.14], [y.sub.15], [y.sub.16], [y.sub.17], [y.sub.18] and [y.sub.19] are discarded because they contain 19, 5, 2, 26, 13, 3, 4, 2, 4, 4, and 23 instances of 1's, respectively, out of a total of 279 transitions.

In the 10th iteration, an FSM, "s1488," is included in the design. All states are matched with the states of base_ckt in respective order. For output matching, [y.sub.1], [y.sub.2], [y.sub.3], [y.sub.4], [y.sub.5], [y.sub.6], [y.sub.8], [y.sub.10], [y.sub.11], [y.sub.14] and [y.sub.17] are discarded because they contain 6, 7, 4, 13, 6, 38, 10, 16, 42, 61, and 64 instances of 1's, respectively, out of a total of 281 transitions.

In the 11th iteration, an FSM, "s208," is included in the design. For input matching, [x.sub.3], [x.sub.4], and [x.sub.5] are discarded because they contain 153,153, and 153 don't cares, respectively, out of a total of 153 transitions. All states are matched with the states of base_ckt in respective order. The input matching and output matching among FSMs are shown in Tables 2 and 3, respectively, along with the minimum assignment_cost, totaLcost, and XOR_count (as defined in Algorithm 1).

At the end of the proposed algorithm, optimized multiplexer bank for mode based reconfiguration is formed by performing a mutual Bitwise-XOR operation between the updated descriptions of FSMs. Hence, to evaluate the individual hardware contribution of FSMs in the Reconfigurable FSMIM-S, architecture is evaluated as follows: (i) the Bitwise-XOR operation is performed iteratively between the updated description of base_ckt and the FSM included in that particular iteration. (ii) The Verilog HDL code for the required multiplexer bank is generated at every iteration to implement in Xilinx xc6vlx75t-3 device. (iii) Thus, the number of LUTs occupied by the particular FSM is measured by the difference between the numbers of LUTs in the current iteration and previous iteration of the system. The iterative implementation of the Reconfigurable FSMIM-S architecture on Virtex-6 is shown in Table 4. Therefore, at the last iteration, the total number of LUTs required and the average speed of operation are obtained for the proposed architecture.

The experimental results from MCNC FSM benchmarks illustrate the advantages of the proposed architecture as compared with VRMUX [11]. As a result, operating speed is enhanced at an average of 30.43%, and LUT consumption is reduced by an average of 5.16% in FPGA implementation. It also shows that the operating speed is improved at an average of 9.14% in comparison with CRMUX [11] during FPGA implementation. The limitation of the proposed technique is the requirement of higher LUTs, as it requires an average of 88.65% more LUTs in comparison with CRMUX [11] during FPGA implementation. The comparisons of hardware requirement and operating speed are presented in Figures 3 and 4, respectively.
Input. Combination_of_input Jines for base.ckt, [a.sub.m_base],
       [h.sub.base], combination_of_input Jines for recon.ckt b,
       [a.sub.m_recon], [h.sub.recon];
Output. Cost matrix C /* cost matrix formation to map recon.ckt Jj
states into base_ckt states */
begin

base_ckt_array [left arrow] create an array at each transition in
base_ckt by combining
              [input.combination [member of] {[x.sub.1_base],
               ..., [x.sub.L_base]}, [a.sub.m_base]];
recon_ckt_array [left arrow] create an array at each transition in
recon_ckt_fo by combining
              [input.combination [member of] {[x.sub.1_recon],
               ..., [x.sub.L_recon]}, [a.sub.m_recon]];
for each (state, [a.sub.m_recon] [member of] recon_ckt_array) do

       replace [a.sub.1_base] [left arrrow] [a.sub.1_recon];
       1 go to transition matching; /*calling the function/
       "transition matching"*/
       C[l, 1] [left arrow] match_count;

       replace [a.sub.2_base] [left arrrow] [a.sub.1_recon];
       go to transition matching; /*calling the function/"transition
       matching"*/
       C[l, 2] [left arrow] match count;

       replace [a.sub.M_base] [left arrrow] [a.sub.1_recon];
       go to transition matching; /* calling the function/"transition
       matching" * /
       C[l, M_base] [left arrow] match count;

       *** transition matching ***

       for each (transition in recon ckt array, [h.sub.recon] [member
       of] [1, 2, ..., [t.sub.m_recon]}) do
              if ([t.sub.m_base] [greater than or equal to]
              [t.sub.m_recon]) then
                            add ([t.sub.m_base] - [t.sub.m_recon])
                            dummy transitions in the recon_ckt_array;
                            [beta] [left arrow] [t.sub.m_base];
              else if ([t.sub.m_base] < [t.sub.m_recon]) then
                            add ([t.sub.m_recon] - [t.sub.m_base])
                            dummy transitions in the recon_ckt_array;
                            [beta] [left arrow] [t.sub.m_recon];
              end
              for each (transition in base ckt array, [h.sub.base]
              [member of] {1, 2, ..., [t.sub.m_base}) do
                     perform Bitwise-XOR operation between the arrays
                     for that particular transition;
                     C[[h.sub.recon], [h.sub.base]] [left arrow] the
                     total number of l's in the Bitwise-XOR operation;
              end
       end

       GBH hungarian algorithm(); /*calling the procedure/"GBH
       hungarian algorithm"*/

       arrange the reconcktarrays based on assignment obtained from

       GBHhungarianalgorithm;

       matchcount [left arrow] [[summation].sup.[beta].sub.i=1]
       [[summation].sup.[beta].sub.j=1] [C.sub.ij] [[lambda].sub.ij];

end
end

Algorithm 5: Weight-assignment.


The operating speed of the proposed system is maximum (i.e., 810.17 MHz) and its LUT requirement is minimum (i.e., 42 LUTs) in the 0th iteration. The operating speed is reduced, and the LUT requirement is increased successively by adding an FSM at each iteration as shown in Table 4. Therefore, the proposed architecture acts as an ideal candidate for such applications where the similarity between the sets of FSMs is high (i.e., fewer differences in their descriptions). Many FPGA families such as Altera stratix-IV, stratix-V, or MAXII do not contain RAM blocks, and hence CRMUX cannot be used. The proposed architecture is preferred in such cases.
Input. Cost matrix C
Output. Optimal assignment between V and U
begin
[v.sub.1] [left arrow] V, [u.sub.1] [left arrow] U, [f.sub.0] [left
arrow] [phi]([v.sub.1], [u.sub.1]) and k [left
arrow] 1 /* Initialization */
while [mathematical expression not reproducible];
       if [C.sup.V.sub.k] [less than or equal to] [C.sup.U.sub.k] then
       /* performing vertex elimination from set V */
              if [f.sup.V.sub.k] [less than or equal to] [f.sub.k/1]
              then /* identify whether vertex elimination from V is
              "profitable" */
                     [f.sub.k] [less than or equal to]
                     [f.sup.V.sub.k], [v.sub.k+1] [left arrow]
                     [u.sub.k+1] [left arrow] [u.sub.k];
              else if [f.sup.U.sub.k] [less than or equal to]
              [f.sub.k/1] then /* identify whether vertex elimination
              from U is "profitable" */
                     [f.sub.k] [left arrow] [f.sup.U.sub.k],
                     [v.sub.k+1], [u.sub.k+1] [left arrow] [u.sub.k];
              end
       else if [C.sup.V.sub.k] > [C.sup.U.sub.k] then /* performing
       vertex elimination from set U */
              if [f.sup.U.sub.k] [less than or equal to] [f.sub.k/1]
              then /* identify whether vertex elimination from U is
              "profitable" */
                     [f.sub.k] [left arrow] [f.sup.U.sub.k],
                     [v.sub.k+1] [left arrow] [v.sub.k], [u.sub.k+1]
                     [left arrow] [u.sub.k];
       else if [f.sup.V.sub.k] [less than or equal to] [f.sub.k/1]
       then /* identify whether vertex elimination from V is
       "profitable" */
                     [f.sub.k] [left arrow] [f.sup.V.sub.k],
                     [v.sub.k+1] [left arrow] [v.sub.k], [u.sub.k+1]
                     [left arrow] [u.sub.k]

              end
       end
k [left arrow] k + 1;
end
end

Algorithm 6: GBH_hungarian_algorithm.

Input. [a.sub.m_base], M-base, M_recon, h
Output. Resultant states to replace dummy states in base_ckt
begin
while ((matched state, [a.sub.m_base] 6 base.ckt) && (M_base [greater
than or equal to] M_recon)) do
       for each (transition in base_ckt, [h.sub.base] 6 {1, 2, ...,
       [t.sub.m_base]}) do
              if ([t.sub.m_base] -[t.sub.m_base]) [greater than or
              equal to] 1 then
                     split the state, |[PSI]([a.sub.m_base]) = Q > 1|;
              end
       end
       m_base [left arrow] m_base + 1;
end
if (M_base > M_recon) then
       replace dummy states in recon_ckt_fo by Proposition 1;
end
end

Algorithm 7: Base.ckt state splitting.


Moreover, in the proposed algorithm, the next state function is partially included in matching, and binary state encoding is used. The experimental results from [22-24] show that the evolutionary state encoding algorithms such as [23] or [24] outperform the binary or random state encoding techniques by an average of 59.72% and 64.06%, respectively.

Therefore, the LUT requirement for the proposed architecture can be further reduced by 20 to 30% by using the evolutionary state encoding techniques.

4. Concluding Remarks

This paper presents a high-speed reconfigurable FSM with input multiplexing and state-based input selection (Reconfigurable FSMIM-S) architecture. The creation of such architecture leads to a problem of defining the optimized multiplexer bank for mode based reconfiguration for the set of FSMs in a particular application. This situation transforms the problem into a weighted bipartite graph matching problem where the objective is to match the description of FSMs in the set with minimal cost. As a solution, an iterative greedy heuristic based Hungarian algorithm is proposed, which provides the required optimized multiplexer bank. By using the proposed architecture, operating speed is enhanced at an average of 30.43% and LUT consumption is reduced by an average of 5.16% in FPGA implementation in comparison with VRMUX [11]. It has also been shown that the operating speed is improved at an average of 9.14% as compared with CRMUX [11]. The only trade-off of the proposed technique is that it requires 88.65% more LUTs compared with CRMUX [11] during FPGA implementation.

Further, the improvement on this work is focused on reducing the LUT requirement to implement the proposed architecture. In this study, a binary state encoding is used, and next state function is partially included in matching. However, evolutionary state encoding algorithms such as [23] or [24] can be used to reduce the increased LUT requirement.

https://doi.org/10.1155/2018/6831901

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] J. Glaser, M. Damm, J. Haase, and C. Grimm, "TR-FSM: Transition-based Reconfigurable finite state machine," ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 4, no. 3, article no. 23, 2011.

[2] A. Karatkevich, Design of Reconfigurable Logic Controllers, vol. 45, springer, Berlin, Germany, 2016.

[3] J. Wu, D. Yang, and Z. Chen, "Design and application of multi-stage reconfigurable signal processing flow on FPGA," Computers and Electrical Engineering, vol. 42, pp. 1-11, 2015.

[4] E. De Lucas, M. Sanchez-Elez, and I. Pardines, "DSPONE48: A methodology for automatically synthesize HDL focus on the reuse of DSP slices," Journal of Parallel and Distributed Computing, vol. 106, pp. 132-142, 2017.

[5] M. Nandini Priya and R. Brindha, "An enhanced architecture for high performance BIST TPG," in Proceedings of the 2nd IEEE International Conference on Innovations in Information, Embedded and Communication Systems, ICIIECS 2015, pp. 1-6, Coimbatore, India, March 2015.

[6] K. Mielcarek, A. Barkalov, and L. Titarenko, "Designing LUT-based mealy FSM with transformation of collections of output functions," in Proceedings of the 5th International Conference on Modern Circuits and Systems Technologies, MOCAST 2016, pp. 1-4, Thessaloniki, Greece, May 2016.

[7] R. Senhadji-Navarro, I. Garcia-Vargas, G. Jimenez-Moreno, and A. Civit-Ballcels, "ROM-based FSM implementation using input multiplexing in FPGA devices," IEEE Electronics Letters, vol. 40, no. 20, pp. 1249-1251, 2004.

[8] A. Klimowicz, V. Solov'Ev, and T. Grzes, "Minimization method of finite state machines for low power design," in Proceedings of the 18th Euromicro Conference on Digital System Design, DSD 2015, pp. 259-262, Funchal, Portugal, August 2015.

[9] S. N. Pradhan and P. Choudhury, "Low power and high testable Finite State Machine synthesis," in Proceedings of the International Conference and Workshop on Computing and Communication, IEMCON2015,pp. 1-5, Canada, 0ctober2015.

[10] I. Garcia-Vargas and R. Senhadji-Navarro, "Finite state machines with input multiplexing: A performance Study," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 5, pp. 867-871, 2015.

[11] R. Senhadji-Navaro and I. Garcia-Vargas, "high-speed and area-efficient reconfigurable multiplexer bank for RAM-based finite state machine implementations," Journal of Circuits, Systems and Computers, vol. 24, no. 7, Article ID 1550101, 2015.

[12] V. Sklyarov and I. Skliarova, "Synthesis of parallel hierarchical finite state machines," in Proceedings of the 2013 21st Iranian Conference on Electrical Engineering, ICEE2013, Iran, May2013.

[13] S. Gupta, V. Pareek, S. C. Jain, and D. Jain, "Realization of sequential reversible circuit from finite state machine," in Proceedings of the International Computer Science and Engineering Conference, ICSEC 2014, pp. 458-463, Khon Kaen, Thailand, August 2014.

[14] M. Barketau, E. Pesch, and Y. Shafransky, "Minimizing maximum weight of subsets of a maximum matching in a bipartite graph," Discrete Applied Mathematics, vol. 196, pp. 4-19, 2015.

[15] K. Date and R. Nagi, "GPU-accelerated Hungarian algorithms for the linear assignment problem," Parallel Computing, vol. 57, pp. 52-72, 2016.

[16] J. Dutta and S. C. Pal, "A note on Hungarian method for solving assignment problem," Journal of Information and Optimization Sciences, vol. 36, no. 5, pp. 451-459, 2015.

[17] V. Stozhkov, V. Boginski, O. . Prokopyev, and E. L. Pasiliao, "A simple greedy heuristic for linear assignment interdiction," Annals of Operations Research, vol. 249, no. 1-2, pp. 39-53, 2017.

[18] S. K. Ramadoss, A. P. Singh, and I. K. G. Mohiddin, "An evolutionary heuristic algorithm for the assignment problem," OPSEARCH, vol. 51, no. 4, pp. 589-602, 2014.

[19] T. N. Grzes and V. V. Solov'ev, "Minimization of power consumption of finite state machines by splitting their internal states," Journal of Computer and Systems Sciences International, vol. 54, no. 3, pp. 367-374, 2015.

[20] V. Salauyou, "Synthesis of high-speed finite state machines in FPGAs by state splitting," in Computer Information Systems and Industrial Management: 15th IFIP TC8 International Conference, CISIM 2016, K. Saeed and W. Homenda, Eds., vol. 9842 of Lecture Notes in Computer Science, pp. 741-751, Springer International Publishing, Vilnius, Lithuania, 2016.

[21] https://people.engr.ncsu.edu/brglez/CBL/benchmarks/ LGSynth89/fsmexamples/.

[22] T. Villa and A. Sangiovanni-Vincentelli, "NOVA: State assignment of finite state machines for optimal two-level logic implementations," in Proceedings of the 26th ACM/IEEE Design Automation Conference, pp. 327-332, June 1989.

[23] A. H. El-Maleh, "Majority-based evolution state assignment algorithm for area and power optimisation of sequential circuits," IET Computers & Digital Techniques, vol. 10, no. 1, pp. 30-36, 2016.

[24] A. H. El-Maleh, "A probabilistic pairwise swap search state assignment algorithm for sequential circuit optimization," Integration, the VLSI Journal, vol. 56, pp. 32-43, 2017.

NitishDas (iD) and P. Aruna Priya (iD)

Department of ECE, SRM University, Kattankulathur, Chennai 603203, India

Correspondence should be addressed to P. Aruna Priya; arunapriya.p@ktr.srmuniv.ac.in

Received 25 July 2017; Revised 27 October 2017; Accepted 16 November 2017; Published 10 January 2018

Academic Editor: Michael Hubner

Caption: Figure 1: Reconfigurable FSMIM-S architecture.

Caption: Figure 2: Flow chart for iterative greedy heuristic based Hungarian algorithm.

Caption: Figure 3: Comparison of hardware requirements during FPGA implementation.

Caption: Figure 4: Comparison of operating speeds during FPGA implementation.
Table 1: Benchmark circuits from MCNC/LGSynth [21].

Benchmark     Number of    Number of    Number of
circuits        states       inputs      outputs

s1494             48           8            19
sand              32           11           9
styr              30           9            10
planet            48           7            19
s832              25           18           19
cse               16           7            7
s386              13           7            7
ex6               8            5            8
mc                4            3            5
planet1           48           7            19
s1488             48           8            19
s208              18           11           2

Table 2: Input matching among FSMs.

                                         Circuits

sl494                sand         styr        planet        s832
base_ckt          recon_ckt_1  recon_ckt_2  recon_ckt_3  recon_ckt_4

[x.sub.1]          [x.sub.4]    [x.sub.7]    [x.sub.6]    [x.sub.9]
[x.sub.2]          [x.sub.2]    [x.sub.4]    [x.sub.3]   [x.sub.16]
[x.sub.3]          [x.sub.1]    [x.sub.5]    [x.sub.4]   [x.sub.18]
[x.sub.4]         [x.sub.10]    [x.sub.1]    [x.sub.2]   [x.sub.17]
[x.sub.5]          [x.sub.5]    [x.sub.8]    [x.sub.5]    [x.sub.8]
[x.sub.6]          [x.sub.3]    [x.sub.6]    [x.sub.7]    [x.sub.7]
[x.sub.7]          [x.sub.0]    [x.sub.1]    [x.sub.1]    [x.sub.5]
[x.sub.8]         [x.sub.11]    [x.sub.2]     --     [x.sub.10]
Minimum                0            0            0            0
assignment-cost
Minimum               11           50           14           22
total_cost

                                         Circuits

s1494                 cse         s386          ex6          mc
base_ckt          recon_ckt_5  recon_ckt_6  recon_ckt_7  recon_ckt_8

[x.sub.1]          [x.sub.5]    [x.sub.4]    [x.sub.3]       --
[x.sub.2]          [x.sub.2]     --      [x.sub.4]    [x.sub.3]
[x.sub.3]           --      [x.sub.2]     --      [x.sub.1]
[x.sub.4]          [x.sub.3]    [x.sub.7]    [x.sub.1]       --
[x.sub.5]          [x.sub.4]    [x.sub.1]    [x.sub.2]    [x.sub.2
[x.sub.6]          [x.sub.1]    [x.sub.5]     --         --
[x.sub.7]          [x.sub.6]    [x.sub.6]    [x.sub.5]       --
[x.sub.8]          [x.sub.7]    [x.sub.3]     --         --
Minimum                0            0            0            0
assignment-cost
Minimum               19           31            6            1
total_cost

                                 Circuits

s1494               planetl       sl488          s208
base_ckt          recon_ckt_9  recon_ckt_10  recon_ckt_11

[x.sub.1]          [x.sub.5]    [x.sub.1]     [x.sub.1]
[x.sub.2]          [x.sub.1]    [x.sub.5]     [x.sub.8]
[x.sub.3]          [x.sub.6]    [x.sub.3]     [x.sub.6]
[x.sub.4]          [x.sub.3]    [x.sub.4]     [x.sub.9]
[x.sub.5]          [x.sub.4]    [x.sub.8]     [x.sub.11]
[x.sub.6]          [x.sub.2]    [x.sub.7]     [x.sub.2]
[x.sub.7]          [x.sub.7]    [x.sub.2]     [x.sub.10]
[x.sub.8]           --      [x.sub.6]     [x.sub.7]
Minimum                0            2             0
assignment-cost
Minimum               17            42            33
total_cost

Table 3: Output matching among FSMs.

                                      Circuits

s1494            sand           styr          planet          s832
base.ckt     recon_ckt_1    recon_ckt_2    recon_ckt_3    recon_ckt_4

[y.sub.8]     [y.sub.4]      [y.sub.5]      [y.sub.2]      [y.sub.19]
[y.sub.11]    [y.sub.6]      [y.sub.6]      [y.sub.6]      [y.sub.15]
[y.sub.12]    [y.sub.2]      [y.sub.7]      [y.sub.8]      [y.sub.2]
[y.sub.13]    [y.sub.9]      [y.sub.8]      [y.sub.9]      [y.sub.8]
[y.sub.14]    [y.sub.7]      [y.sub.3]      [y.sub.5]      [y.sub.7]
[y.sub.15]    [y.sub.3]      [y.sub.2]      [y.sub.3]      [y.sub.1]
[y.sub.17]    [y.sub.5]      [y.sub.1]      [y.sub.1]      [y.sub.11]
[y.sub.19]    [y.sub.1]      [y.sub.10]     [y.sub.7]      [y.sub.5]

XOR_count         95            163            106             76

                                      Circuits

s1494            cse            s386           ex6             mc
base.ckt     recon_ckt_5    recon_ckt_6    recon_ckt_7    recon_ckt_8

[y.sub.8]       --       [y.sub.6]      [y.sub.5]      [y.sub.4]
[y.sub.11]    [y.sub.7]      [y.sub.1]      [y.sub.8]      [y.sub.2]
[y.sub.12]    [y.sub.3]      [y.sub.3]      [y.sub.4]      [y.sub.5]
[y.sub.13]    [y.sub.5]      [y.sub.4]      [y.sub.1]      [y.sub.1]
[y.sub.14]    [y.sub.1]        --       [y.sub.6]      [y.sub.3]
[y.sub.15]    [y.sub.4]      [y.sub.7]      [y.sub.7]          --
[y.sub.17]    [y.sub.6]      [y.sub.5]      [y.sub.2]          --
[y.sub.19]    [y.sub.2]      [y.sub.2]      [y.sub.3]          --

XOR_count        134            154             86             72

                              Circuits

s1494          planet1         s1488           s208
base.ckt     recon_ckt_9    recon_ckt_10   recon_ckt_11

[y.sub.8]     [y.sub.6]      [y.sub.18]         --
[y.sub.11]    [y.sub.7]      [y.sub.7]      [y.sub.1]
[y.sub.12]    [y.sub.1]      [y.sub.13]         --
[y.sub.13]    [y.sub.8]      [y.sub.19]         --
[y.sub.14]    [y.sub.2]      [y.sub.9]      [y.sub.2]
[y.sub.15]    [y.sub.5]      [y.sub.15]         --
[y.sub.17]    [y.sub.3]      [y.sub.12]         --
[y.sub.19]    [y.sub.9]      [y.sub.16]         --

XOR_count        142            103            136

Table 4: Iterative implementation of the Reconfigurable FSMIM-S
architecture on Virtex-6.

Iteration    FSM included in   #LUTs occupied        Maximum
             the particular        in the           operating
                iteration        particular         frequency
                                  iteration

0th               s1494              42            810.17 MHz
1st               sand               79            779.271 MHz
2nd               styr               105           775.164MHz
3rd              planet              137           728.704MHz
4th               s832               199           725.005MHz
5th                cse               230           722.335MHz
6th               s386               249           720.643 MHz
7th                ex6               255           715.231 MHz
8th                mc                269           706.889 MHz
9th              planet1             303           676.338 MHz
10th              s1488              330           671.760 MHz
11th              s208               349           665.181 MHz

Iteration     Maximum path      #LUTs occupied by
                  delay         the FSM (#LUTs in
                                   the current
                               iteration--#LUTs in
                                   the previous
                                    iteration)

0th             4.571 ns                42
1st             4.778 ns                37
2nd             4.455 ns                26
3rd             4.609 ns                32
4th              5.381ns                62
5th             5.140 ns                31
6th             5.662 ns                19
7th             4.967 ns                6
8th             4.526 ns                14
9th             5.098 ns                34
10th            4.486 ns                27
11th            3.014 ns                19

Note. #LUTs denotes the number of LUTs in ISE.
COPYRIGHT 2018 Hindawi Limited
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2018 Gale, Cengage Learning. All rights reserved.

Article Details
Printer friendly Cite/link Email Feedback
Title Annotation:Research Article
Author:Das, Nitish; Priya, P. Aruna
Publication:International Journal of Reconfigurable Computing
Date:Jan 1, 2018
Words:9807
Previous Article:RP-Ring: A Heterogeneous Multi-FPGA Accelerator.
Next Article:Reconfigurable Network Stream Processing on Virtualized FPGA Resources.
Topics:

Terms of use | Privacy policy | Copyright © 2019 Farlex, Inc. | Feedback | For webmasters