State Machine Design
Steve Golson Trilobyte Systems, 33 Sunset Road, Carlisle MA 01741 Phone: 508/369-9669 Email: [email protected]
Abstract: Designing a synchronous finite state machine (FSM) is a common task for a digital logic engineer. This paper discusses a variety of issues regarding FSM design using Synopsys Design Compiler. Verilog and VHDL coding styles are presented. Different methodologies are compared using real-world examples.
1.0 Introduction
A finite state machine has the general structure shown in Figure 1.
[Figure 1: State machine structure. Blocks: inputs, NEXT STATE LOGIC, STATE MEMORY, OUTPUT LOGIC, outputs. The path from the inputs to the output logic exists for Mealy machines only.]
The current state of the machine is stored in the state memory, a set of n flip-flops clocked by a single clock signal (hence "synchronous" state machine). The state vector (also current state, or just state) is the value currently stored by the state memory. The next state of the machine is a function of the state vector and the inputs. Mealy outputs [7] are a function of the state vector and the inputs, while Moore outputs [8] are a function of the state vector only.
Another way of organizing a state machine uses only one logic block as shown in Figure 2.
[Figure 2: blocks: inputs, a single LOGIC block, STATE MEMORY, outputs.]
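The distinction between Mealy and Moore outputs can be sketched in a few lines of Verilog. This is a minimal illustration only, not one of the paper's listings: the module name, port names, and the two states IDLE and BUSY are all hypothetical.

```verilog
// Hypothetical two-state machine; every name here is illustrative.
module mealy_moore_sketch (clk, rst, in, moore_out, mealy_out) ;
  input  clk, rst, in ;
  output moore_out, mealy_out ;

  parameter IDLE = 1'b0, BUSY = 1'b1 ;

  reg state, next_state ;

  // next state logic: a function of the state vector and the inputs
  always @ (in or state) begin
    case (state)
      IDLE:    next_state = in ? BUSY : IDLE ;
      BUSY:    next_state = in ? BUSY : IDLE ;
      default: next_state = IDLE ;
    endcase
  end

  // state memory: flip-flops clocked by a single clock
  always @ (posedge clk or negedge rst) begin
    if (!rst) state <= IDLE ;
    else      state <= next_state ;
  end

  // Moore output: depends on the state vector only
  assign moore_out = (state == BUSY) ;

  // Mealy output: depends on the state vector and the inputs
  assign mealy_out = (state == IDLE) && in ;
endmodule
```

In this sketch mealy_out can change in the middle of a clock cycle whenever in changes, while moore_out changes only after a clock edge.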
A highly-encoded state assignment will use fewer flops for the state vector, however additional logic
next_state = 8'b0 ;
case (1'b1) // synopsys parallel_case full_case
  state[START]:
    if (in == 8'h3c) next_state[SA] = 1'b1 ;
    else next_state[START] = 1'b1 ;
  state[SB]:
    if (in == 8'haa) next_state[SE] = 1'b1 ;
    else next_state[SF] = 1'b1 ;
  state[SC]:
    next_state[SD] = 1'b1 ;
See Listing 1 and Listing 3 for more examples. Using parameter and the full_case directive in Verilog we can specify arbitrary state encodings and still have efficient logic. In VHDL the state encodings are declared as an enumerated type (see Listing 5). The actual numeric value of the enumerated elements is predefined by the VHDL language: the first element is 0, then 1, 2, etc. It is difficult to define arbitrary encodings in the VHDL language (see footnote 3). To remedy this problem Synopsys has provided the attribute enum_encoding, which allows you to specify numeric code values for the enumerated types. Unfortunately not all VHDL simulators will implement this vendor-specific extension, which means your behavioral and gate simulations will use different encodings.
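As an illustration of an arbitrary encoding in Verilog, the following sketch assigns Gray-code values to a three-state machine through parameter and leaves the fourth code unused. It follows the directive style of Listing 1, but the module, its ports, and the state names S_IDLE, S_RUN, and S_DONE are hypothetical, and the Gray code is chosen only as an example.

```verilog
// Hypothetical example; names and the Gray-code assignment are
// assumptions made for illustration.
module gray_fsm_sketch (clk, rst, go, done) ;
  input  clk, rst, go ;
  output done ;

  parameter [1:0] // synopsys enum code
    S_IDLE = 2'b00 ,
    S_RUN  = 2'b01 ,
    S_DONE = 2'b11 ;   // code 2'b10 is deliberately unused

  // synopsys state_vector state
  reg [1:0] // synopsys enum code
    state, next_state ;

  always @ (go or state) begin
    // default is unknown; it is overridden in every listed state
    next_state = 2'bx ;
    case (state) // synopsys parallel_case full_case
      S_IDLE: if (go) next_state = S_RUN ;
              else    next_state = S_IDLE ;
      S_RUN:  next_state = S_DONE ;
      S_DONE: next_state = S_IDLE ;
    endcase
  end

  // build the state flops
  always @ (posedge clk or negedge rst) begin
    if (!rst) state <= #1 S_IDLE ;
    else      state <= #1 next_state ;
  end

  assign done = (state == S_DONE) ;
endmodule
```

Changing the encoding means editing only the three parameter values; the case statement is untouched.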
The case statement looks at each state bit in turn until it finds the one that is set. Then one bit of next_state is set corresponding to the appropriate state transition. The remaining bits of next_state are all set to zero by the default statement
next_state = 8'b0 ;
Note the use of parallel_case and full_case directives for maximum efficiency. The default statement should not be used during synthesis. However default can be useful during behavioral simulation, so use compiler directives to prevent Design Compiler from seeing it:
// synopsys translate_off
default: $display("He's dead, Jim.") ;
// synopsys translate_on
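Putting the pieces together, a complete one-hot machine with a simulation-only default might look like the sketch below. It is a hypothetical three-state example, not one of the paper's benchmarks: the module, its ports, the states IDLE, RUN, and DONE, and the message text are all assumptions.

```verilog
// Hypothetical one-hot machine; all names are illustrative.
module onehot_sketch (clk, rst, go, busy) ;
  input  clk, rst, go ;
  output busy ;

  // These are bit indices into the state vector, not encodings.
  parameter IDLE = 0, RUN = 1, DONE = 2 ;

  reg [2:0] state, next_state ;

  always @ (go or state) begin
    next_state = 3'b0 ;   // default: clear every bit
    case (1'b1) // synopsys parallel_case full_case
      state[IDLE]: if (go) next_state[RUN]  = 1'b1 ;
                   else    next_state[IDLE] = 1'b1 ;
      state[RUN]:  next_state[DONE] = 1'b1 ;
      state[DONE]: next_state[IDLE] = 1'b1 ;
      // synopsys translate_off
      default: $display("%m: no state bit is set at time %0t", $time) ;
      // synopsys translate_on
    endcase
  end

  // build the state flops; reset sets only the IDLE bit
  always @ (posedge clk or negedge rst) begin
    if (!rst) state <= #1 3'b001 ;
    else      state <= #1 next_state ;
  end

  // a one-hot state bit can drive an output directly
  assign busy = state[RUN] ;
endmodule
```

The default branch is seen only by the simulator, where it reports the case in which no state bit is set (for example before reset has been applied).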
As before, all the bits of next_state are set to zero by the default assignment, and then one bit is set to 1 indicating the state transition.
3. This still isn't fixed in VHDL '93 [2].
For both the Verilog and VHDL one-hot machines, the behavioral simulation will exactly agree with the post-synthesis gate-level simulation.
5.0 Outputs
Outputs are coded in a manner similar to the next state value. A case statement (or the equivalent) is used, and the output is assigned the appropriate value depending on the particular state transition or state value. If the output is a don't care for some conditions then it should be driven unknown (x). Design Compiler will use this don't care information when optimizing the logic.
Assigning the output to a default value prior to the case statement will ensure that the output is specified for all possible state and input combinations. This will avoid unexpected latch inference on the output. Also the code is simplified by specifying a default value which may be overridden only when necessary. The default value may be 1, 0 or x. It is best to have a default of 0 and occasionally set it to 1 rather than the reverse (even if this requires an external inverter). Consider an output that is 1 in a single state, and 0 otherwise. Design Compiler will make the output equal to the one-hot state bit for that state. Now consider an output that is 0 in only one state, and 1 otherwise. The output will be driven by an OR of all the other state bits! Using set_flatten -phase true will not help.
For a one-hot machine you can use the state bits directly to create outputs which are active in those states:
myout = state[IDLE] || state[FOO] ;
Sometimes it is easier to specify an output value as a function of the next state rather than of the current state.
With no further assignment the value will hold, or we can set, clear, and toggle:
next_myout = 1'b1 ; /* set */
next_myout = 1'b0 ; /* clear */
This JK type output is especially useful for pseudostate flag bits (see Section 3.3).
6.0 Inputs
6.1 Asynchronous inputs
Sometimes a state machine will have an input which may change asynchronously with respect to the clock. Such an input must be synchronized, and there must be one and only one synchronizer flop. The easiest way to accomplish this is to have the sync flop external to the state machine module, and place a large (see footnote 4) set_input_delay on that input to allow time for the sync flop to settle. If the sync flop is included in the same module as the FSM then you can place an input delay on the internal flop output pin. Unfortunately this requires the flop to be mapped prior to compiling the rest of the machine. Rather than hand-instantiating the flop we can use register inference as usual and simply map that one flop before compiling. The following script will map the flop:
4. "Large" means a large fraction of your clock period. Extra credit: ask your ASIC vendor about the metastability characteristics of their flops. Try not to laugh.
/* get the name of the unmapped flop */
theflop = signal_to_be_synced + "_reg"
/* group it into a design by itself */
group -cell flop -design temp \
  find(cell,theflop)
/* remember where you are */
top = current_design
/* push into the new design */
current_design = temp
/* set_register_type if necessary */
/* map the flop */
compile -map_effort low -no_design_rule
/* pop back up */
current_design = top
/* ungroup the flop */
ungroup -simple_names find(cell,flop)
/* clean up */
remove_design temp
remove_variable top
/* now set the internal delay */
set_input_delay 10 -clock clk \
  find(pin,theflop/Q*)
/* now you can compile the fsm */
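For reference, the HDL side of this arrangement, with the single sync flop inferred in the same module as the FSM, might look like the following sketch. The module, ports, states, and the register name in_sync are hypothetical; the sketch assumes the script variable signal_to_be_synced would be set to "in_sync", so that the script looks for a cell named in_sync_reg.

```verilog
// Hypothetical example; all names are illustrative assumptions.
module fsm_with_sync (clk, rst, async_in, out) ;
  input  clk, rst, async_in ;
  output out ;

  parameter IDLE = 1'b0, ACTIVE = 1'b1 ;

  reg in_sync ;            // the one and only synchronizer flop
  reg state, next_state ;

  // Synchronizer flop, inferred as usual rather than instantiated.
  always @ (posedge clk)
    in_sync <= #1 async_in ;

  // The state machine reads only the synchronized copy, never async_in.
  always @ (in_sync or state) begin
    case (state)
      IDLE:    next_state = in_sync ? ACTIVE : IDLE ;
      ACTIVE:  next_state = in_sync ? ACTIVE : IDLE ;
      default: next_state = IDLE ;
    endcase
  end

  // build the state flops
  always @ (posedge clk or negedge rst) begin
    if (!rst) state <= #1 IDLE ;
    else      state <= #1 next_state ;
  end

  assign out = (state == ACTIVE) ;
endmodule
```

Because async_in fans out to exactly one flop, every consumer of the signal sees the same synchronized value.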
The set_input_delay will put an implicit dont_touch on the sync flop. If your ASIC vendor has a metastable-resistant flop then use set_register_type to specify it.
Rather than instantiating a specific gate from your vendor library, the gate can be selected from the Synopsys GTECH library. This keeps your HDL code vendor-independent. In Verilog this is done
GTECH_AND2 myand ( .Z(signal_qualified), .A(signal_in), .B(enable)) ;
which prevents logic-level optimization during the compile. Design Compiler will attempt to map the gate exactly in the target library. Sometimes this technique will create redundant logic in your module. This can cause a problem when generating test vectors because some nodes may not be testable.
Verilog users may be tempted to use gate primitives:
and myand (signal_qualified, signal_in, enable) ;
making the reasonable assumption that this will initially map to a GTECH_AND2 as the Verilog is read in. Then set_map_only could be used as above. Unfortunately this does not work; gate primitives do not always map to a GTECH cell. Perhaps a future Synopsys enhancement will allow this.
In order to support behavioral simulation of your HDL, a behavioral description of the GTECH gates must be provided. Synopsys supplies such a library only for VHDL users. One hopes that a similar Verilog library will be provided in a future release.
5. An example would be where the signal is used as a mux select. If the data inputs to the mux are equal then Design Compiler assumes the mux output will have the same value regardless of the select value. Unfortunately, glitches on the select may nevertheless cause glitches on the mux output.
synthetic comments in Verilog. (See Listing 1, Listing 3, and Listing 5 for examples.)
you can no longer flatten the design. This precludes any further use of extract. Compile scripts are more verbose and complicated.
extract puts your design into a two-level PLA format before doing any optimizations and transformations. Thus if your design cannot be flattened then you cannot use extract.
Synopsys provides the group -fsm command to isolate your state machine from any other logic in its module. Unfortunately the newly-created ports have Synopsys internal names like n1234. The resulting state table is difficult to understand. Therefore to efficiently use extract you should avoid group -fsm. This means you can have no extraneous logic in your module.
Your design must be mapped to gates before you can use extract. Synopsys suggests that you run compile on your design after reading in the HDL and before applying any constraints, i.e.:
compile -map_effort low -no_design_rule
This isn't really necessary since most of your design will already be implemented in generic logic after reading the HDL, and extract can handle that fine. What you really must do is
replace_synthetic
to get rid of any hierarchy. This will be considerably faster than using compile. After using extract always do check_design to get a report on any bad state transitions.
This allows a clock skew to be applied to the state flops without affecting the input and output timing (which may be relative to an off-chip clock, for example). If you have any Mealy outputs you generally need to specify them as a multicycle path using
set_multicycle_path -setup 2 \
  -from all_inputs() \
  -to all_outputs()
set_multicycle_path -hold 1 \
  -from all_inputs() \
  -to all_outputs()
7.1 Advantages
You can get very fast results using extract with set_fsm_encoding_style one_hot. FSM design errors can be uncovered by inspecting the extracted state table.
7.2 Disadvantages
The world isn't a PLA, but extract treats your design like one. Unless you are truly area constrained, the only interesting coding style that extract supports is one-hot. You might as well code for one-hot to begin with (cf. Section 4.2). You can be happily using extract, but one day modify your HDL source and then discover that
Sometimes it is useful to group paths into four categories: input to state flop, state flop to output, input to output (Mealy path), and state flop to state flop. With the paths in different groups they can be given different cost function weights during compile. If we have used separate clocks as suggested above then we might try
/* put all paths in default group */
group_path -default -to \
  { find(clock) find(port) find(pin) } \
  > /dev/null
/* now arrange them */
group_path -name theins \
  -from find(clock,inclk) \
  -to find(clock,clk)
group_path -name theouts \
  -from find(clock,clk) \
  -to find(clock,outclk)
group_path -name thru \
  -from find(clock,inclk) \
  -to find(clock,outclk)
group_path -name flop \
  -from find(clock,clk) \
  -to find(clock,clk)
Unfortunately this doesn't work! It seems that whenever you specify a clock as a startpoint or endpoint of a path, all paths with that clock are affected. You end up with the same path in more than one group (see footnote 6). So instead of using clocks we can specify pins:
group_path -name theins \
  -from all_inputs() \
  -to all_registers(-data)
group_path -name theouts \
  -from all_registers(-clock_pins) \
  -to all_outputs()
group_path -name thru \
  -from all_inputs() \
  -to all_outputs()
group_path -name flop \
  -from all_registers(-clock_pins) \
  -to all_registers(-data)
If this is a highly-encoded machine then it is very difficult to determine which state transition this path corresponds to. Worse, this may actually be a false path. In contrast, if this is a one-hot machine then we see this transition must start in state[0] because flop state_reg[0] is set (pin state_reg[0]/QN falling), and must end in state[1] because flop state_reg[1] is being set (state_reg[1]/D is rising). Now that the particular transition has been identified it may be recoded to speed up the path. When using extract the state flops for one-hot machines are given the names of the corresponding states. This makes path analysis particularly straightforward.
This works fine. You do get the paths where you want them (see footnote 7). Regardless of the path groupings we can specify timing reports that give us the information we want:
report_timing \
  -from all_inputs() \
  -to all_registers(-data)
report_timing \
  -from all_registers(-clock_pins) \
  -to all_outputs()
report_timing \
  -from all_inputs() \
  -to all_outputs()
report_timing \
  -from all_registers(-clock_pins) \
  -to all_registers(-data)
6. Hopefully this will be fixed in a future version.
7. This works even if you specify these groups on unmapped logic. As the flops are mapped during the compile, Design Compiler automatically changes the flop pin names used in the path groupings.
Table 1 -- Compile results for example state machines

                             compile for max speed          compile for min area
                            slack    area   run time       slack    area   run time
                             (ns)           (minutes)       (ns)           (minutes)

prep3: 8 states, 12 transitions, 8 inputs, 8 outputs
  coded for extract
    binary                  -5.41     228      <2          -8.84     166      <2
    one_hot                 -5.19     227      <2         -11.59     196      <2
    auto_3                  -5.95     214      <2          -7.29     164      <2
    auto_4                  -5.01     234      <2          -8.55     159      <2
    auto_5                  -5.43     229      <2          -8.55     159      <2
    no extract (binary)     -4.56     216      <2         -12.22     169      <2
  coded for one_hot
    structure               -3.86     221      <2         -12.23     194      <2
    flatten                 -4.35     341      <2          -9.84     258      <2
    flatten & structure     -4.38     239      <2         -11.93     193      <2

prep4: 16 states, 40 transitions, 8 inputs, 8 outputs
  coded for extract
    binary                  -7.33     298      <7         -15.50     195      <7
    one_hot                 -4.34     348      <7         -10.96     255      <7
    auto_4                  -6.42     283      <7         -13.16     190      <7
    auto_5                  -6.58     285      <7         -14.63     184      <7
    auto_6                  -8.30     279      <7         -12.81     191      <7
    no extract (binary)     -8.87     299      <7         -17.03     204      <7
  coded for one_hot
    structure               -5.27     335      <5         -10.30     259      <5
    flatten                 -5.90     475      <5         -13.79     370      <5
    flatten & structure     -5.04     342      <5         -10.94     260      <5

sm40: 40 states, 80 transitions, 63 inputs, 61 outputs
  coded for extract
    binary                  -5.19     931     100         -21.04     661      81
    one_hot                 -2.82     912      27         -16.48     737      21
    auto_6                  -5.39     885      87         -18.60     668      61
    auto_7                  -5.71     979      76         -29.54     683      51
    auto_8                  -5.63     933      69         -18.37     682      51
    no extract (binary)     -7.72     889      35         -31.73     604       6
  coded for one_hot
    structure               -4.78     882      12         -16.59     761       7
    flatten                 -7.38    3026     202         -49.68    2141      73
    flatten & structure     -4.41     905      28         -16.86     753      22

sm70: 69 states, 116 transitions, 27 inputs, 16 outputs
  coded for extract
    binary                  -7.98    1030      17         -17.66     857       5
    one_hot                 -3.12    1200      10          -8.28    1121       5
    auto_7                  -5.51     996      15         -14.67     849       6
    auto_8                  -5.69     975      13         -11.33     817       6
    auto_9                  -4.55    1018      19         -11.84     827       5
    no extract (binary)    -20.70    1249      49         -60.43     843       8
  coded for one_hot
    structure               -7.92    1339      20         -35.77    1096      12
    flatten                 -6.51    1852      36         -26.20    1548      23
    flatten & structure     -7.39    1326      29         -33.96    1104      18
datapaths, but it does work with a single bank of flops as in a state machine. The drawbacks are:
The flops cannot have asynchronous resets
Results may be affected by
compile_preserve_sync_resets = "true"
or
reg [2:0] /*synopsys enum code*/ state;
A set_false_path from the asynchronous reset port of an extracted FSM will prevent the state flops from being mapped during compile. Apparently an implicit dont_touch is placed on the flops. This is no doubt a bug.
When using auto state encoding, only the unencoded states are given new values. If you want to replace all current encodings then do
set_fsm_encoding {}
set_fsm_encoding_style auto
When using extract with auto encoding only the minimum number of state flops are used. If you have specified a larger number then usually, but not always, you will get a warning about truncating the state vector. Do a report_fsm to be sure.
The encoding picked by extract does not depend on the applied constraints.
Coding the same machine in Verilog and VHDL and using extract gives identical state tables, but the compile results are slightly different.
If your HDL source specifies an output as don't care, this will not be reflected in the state table, because prior to the extract you have to map into gates first and that collapses the don't care.
Always do an ungroup before extract.
set_fsm_encoding can't handle more than 31 bits if you are using the ^H format. Instead use ^B, which works fine.
Remove any unused inputs from your module before doing an extract. They will be included in the state table and will slow down the compile.
Verilog users should infer flops using non-blocking assignments with non-zero intra-assignment delays:
always @ (posedge clk) myout <= #1 next_myout ;
This is not necessary for Synopsys but should make your flops simulate correctly in all Verilog simulators (e.g. Verilog-XL and VCS).
Avoid using synchronous resets; they will probably add many additional transitions to your machine. For example the sm40 machine adds 26 transitions for a total of 106, and the sm70 machine adds 60 for a total of 176.
If you must use synchronous resets then they should be implemented as part of the flop inference and not in the state machine description itself. Here is a Verilog example modified from Listing 3:
// build the state flops
always @ (posedge clk)
  begin
    if (!rst) state <= #1 S0 ;
    else state <= #1 next_state ;
  end
When using extract with auto encoding only the minimum number of state flops are used. Nevertheless, specifying more than the minimum will affect the state assignment and thus the compile results.
Why is extract better at flattening a design than compile using set_flatten?
12.0 Acknowledgments
Thanks to my clients for providing access to design tools and for allowing the use of examples sm40 and sm70. Thanks to John F. Wakerly for finding the Huffman references. An earlier version of this paper was presented at the Fourth Annual Synopsys Users Group Conference (SNUG '94) in March 1994. An updated version appeared in The Journal of High-Level Design (Synopsys, September 1994). It was very slightly revised into its present form.
13.0 References
[1] Pranav Ashar, Srinivas Devadas, A. Richard Newton, Sequential Logic Synthesis, Kluwer Academic Publishers, 1992.
[2] Jean-Michel Bergé, Alain Fonkoua, Serge Maginot, Jacques Rouillard, VHDL '92, Kluwer Academic Publishers, 1993.
[3] Giovanni De Micheli, Robert K. Brayton, Alberto Sangiovanni-Vincentelli, "Optimal State Assignment for Finite State Machines," IEEE Trans. Computer-Aided Design, vol. CAD-4, no. 3, pp. 269-285, July 1985.
[4] Steve Golson, "One-hot state machine design for FPGAs," Proc. 3rd Annual PLD Design Conference & Exhibit, p. 1.1.3.B, March 1993.
[5] D. A. Huffman, "The Synthesis of Sequential Switching Circuits," J. Franklin Institute, vol. 257, no. 3, pp. 161-190, March 1954.
[6] D. A. Huffman, "The Synthesis of Sequential Switching Circuits," J. Franklin Institute, vol. 257, no. 4, pp. 275-303, April 1954.
[7] George H. Mealy, "A Method for Synthesizing Sequential Circuits," Bell System Technical J., vol. 34, no. 5, pp. 1045-1079, Sept. 1955.
[8] Edward F. Moore, "Gedanken-Experiments on Sequential Machines," Automata Studies, Annals of Mathematics Studies, no. 34, pp. 129-153, Princeton Univ. Press, 1956.
[9] Programmable Electronics Performance Corporation, Benchmark Suite #1, Version 1.2, March 28, 1993.
[10] James R. Story, Harold J. Harrison, Erwin A. Reinhard, "Optimum State Assignment for Synchronous Sequential Circuits," IEEE Trans. Computers, vol. C-21, no. 12, pp. 1365-1373, December 1972.
[11] Synopsys, Finite State Machines Application Note, Version 3.0, February 1993.
[12] Synopsys, Flattening and Structuring: A Look at Optimization Strategies Application Note, Version 3.0, February 1993.
[13] Alan M. Turing, "On Computable Numbers, with an Application to the Entscheidungsproblem," Proc. London Math. Soc., vol. 42, pp. 230-265, 1936.
[14] John F. Wakerly, Digital Design: Principles and Practices, Prentice-Hall, 1990.
A.0 Appendix
The following example state machines are taken from the PREP benchmark suite [9].
A.1 prep3
prep3 is a Mealy machine with eight states and 12 transitions. It has eight inputs and eight registered outputs. Here is the state diagram:
Listing 1 is a Verilog implementation for use with Synopsys FSM extract. Listing 2 is a Verilog implementation that is one-hot coded.
module prep3 (clk, rst, in, out) ;
input clk, rst ;
input [7:0] in ;
output [7:0] out ;

parameter [2:0] // synopsys enum code
  START = 3'd0 ,
  SA    = 3'd1 ,
  SB    = 3'd2 ,
  SC    = 3'd3 ,
  SD    = 3'd4 ,
  SE    = 3'd5 ,
  SF    = 3'd6 ,
  SG    = 3'd7 ;

// synopsys state_vector state
reg [2:0] // synopsys enum code
  state, next_state ;
reg [7:0] out, next_out ;

always @ (in or state) begin
  // default values
  next_state = START ;
  next_out = 8'bx ;

  // state machine
  case (state) // synopsys parallel_case full_case
    START:
      if (in == 8'h3c) begin
        next_state = SA ; next_out = 8'h82 ;
      end
      else begin
        next_state = START ; next_out = 8'h00 ;
      end
    SA:
      case (in) // synopsys parallel_case full_case
        8'h2a: begin next_state = SC ; next_out = 8'h40 ; end
        8'h1f: begin next_state = SB ; next_out = 8'h20 ; end
        default: begin next_state = SA ; next_out = 8'h04 ; end
      endcase
    SB:
      if (in == 8'haa) begin
        next_state = SE ; next_out = 8'h11 ;
      end
      else begin
        next_state = SF ; next_out = 8'h30 ;
      end
    SC: begin next_state = SD ; next_out = 8'h08 ; end
    SD: begin next_state = SG ; next_out = 8'h80 ; end
    SE: begin next_state = START ; next_out = 8'h40 ; end
    SF: begin next_state = SG ; next_out = 8'h02 ; end
    SG: begin next_state = START ; next_out = 8'h01 ; end
  endcase
end

// build the state flops
always @ (posedge clk or negedge rst)
  begin
    if (!rst) state <= #1 START ;
    else state <= #1 next_state ;
  end

// build the output flops
always @ (posedge clk or negedge rst)
  begin
    if (!rst) out <= #1 8'b0 ;
    else out <= #1 next_out ;
  end

endmodule
Listing 2 -- prep3_onehot.v
/*
** prep3_onehot.v
**
** prep benchmark 3 -- small state machine
** benchmark suite #1 -- version 1.2 -- March 28, 1993
** Programmable Electronics Performance Corporation
**
** one-hot state assignment
*/

module prep3 (clk, rst, in, out) ;
input clk, rst ;
input [7:0] in ;
output [7:0] out ;

parameter [2:0]
  START = 0 ,
  SA    = 1 ,
  SB    = 2 ,
  SC    = 3 ,
  SD    = 4 ,
  SE    = 5 ,
  SF    = 6 ,
  SG    = 7 ;

reg [7:0] state, next_state ;
reg [7:0] out, next_out ;

always @ (in or state) begin
  // default values
  next_state = 8'b0 ;
  next_out = 8'bx ;

  case (1'b1) // synopsys parallel_case full_case
    state[START]:
      if (in == 8'h3c) begin
        next_state[SA] = 1'b1 ; next_out = 8'h82 ;
      end
      else begin
        next_state[START] = 1'b1 ; next_out = 8'h00 ;
      end
    state[SA]:
      case (in) // synopsys parallel_case full_case
        8'h2a: begin next_state[SC] = 1'b1 ; next_out = 8'h40 ; end
        8'h1f: begin next_state[SB] = 1'b1 ; next_out = 8'h20 ; end
        default: begin next_state[SA] = 1'b1 ; next_out = 8'h04 ; end
      endcase
    state[SB]:
      if (in == 8'haa) begin
        next_state[SE] = 1'b1 ; next_out = 8'h11 ;
      end
      else begin
        next_state[SF] = 1'b1 ; next_out = 8'h30 ;
      end
    state[SC]: begin next_state[SD] = 1'b1 ; next_out = 8'h08 ; end
    state[SD]: begin next_state[SG] = 1'b1 ; next_out = 8'h80 ; end
    state[SE]: begin next_state[START] = 1'b1 ; next_out = 8'h40 ; end
    state[SF]: begin next_state[SG] = 1'b1 ; next_out = 8'h02 ; end
    state[SG]: begin next_state[START] = 1'b1 ; next_out = 8'h01 ; end
  endcase
end

// build the state flops
always @ (posedge clk or negedge rst)
  begin
    if (!rst) state <= #1 1'b1 << START ;
    else state <= #1 next_state ;
  end

// build the output flops
always @ (posedge clk or negedge rst)
  begin
    if (!rst) out <= #1 8'b0 ;
    else out <= #1 next_out ;
  end

endmodule
A.2 prep4
prep4 is a Moore machine with sixteen states and 40 transitions. It has eight inputs and eight unregistered outputs. Here is the state diagram:
Listing 3 is a Verilog implementation for use with Synopsys FSM extract. Listing 4 is a Verilog implementation that is one-hot coded. Listing 5 is a VHDL implementation for use with Synopsys FSM extract. Listing 6 is a VHDL implementation that is one-hot coded.
Listing 3 -- prep4.v
/*
** prep4.v
**
** prep benchmark 4 -- large state machine
** benchmark suite #1 -- version 1.2 -- March 28, 1993
** Programmable Electronics Performance Corporation
**
** binary state assignment -- highly encoded
*/

module prep4 (clk, rst, in, out) ;
input clk, rst ;
input [7:0] in ;
output [7:0] out ;

parameter [3:0] // synopsys enum code
  S0  = 4'd0 ,  S1  = 4'd1 ,  S2  = 4'd2 ,  S3  = 4'd3 ,
  S4  = 4'd4 ,  S5  = 4'd5 ,  S6  = 4'd6 ,  S7  = 4'd7 ,
  S8  = 4'd8 ,  S9  = 4'd9 ,  S10 = 4'd10 , S11 = 4'd11 ,
  S12 = 4'd12 , S13 = 4'd13 , S14 = 4'd14 , S15 = 4'd15 ;

// synopsys state_vector state
reg [3:0] /* synopsys enum code */ state, next_state ;
reg [7:0] out ;

// state machine
always @ (in or state) begin
  // default value
  next_state = S0 ; // always overridden
  case (state) // synopsys parallel_case full_case
    S0: case(1'b1) // synopsys parallel_case full_case
          (in == 8'd0): next_state = S0 ;
          (8'd0 < in && in < 8'd4): next_state = S1 ;
          (8'd3 < in && in < 8'd32): next_state = S2 ;
          (8'd31 < in && in < 8'd64): next_state = S3 ;
          (in > 8'd63): next_state = S4 ;
        endcase
    S1: if (in[0] && in[1]) next_state = S0 ;
        else next_state = S3 ;
    S2: next_state = S3 ;
    S3: next_state = S5 ;
    S4: if (in[0] || in[2] || in[4]) next_state = S5 ;
        else next_state = S6 ;
    S5: if (in[0] == 1'b0) next_state = S5 ;
        else next_state = S7 ;
    S6: case(in[7:6]) // synopsys parallel_case full_case
          2'b11: next_state = S1 ;
          2'b00: next_state = S6 ;
          2'b01: next_state = S8 ;
          2'b10: next_state = S9 ;
        endcase
    S7: case(in[7:6]) // synopsys parallel_case full_case
          2'b00: next_state = S3 ;
          2'b11: next_state = S4 ;
          2'b10,
          2'b01: next_state = S7 ;
        endcase
    S8: if(in[4] ^ in[5]) next_state = S11 ;
        else if (in[7]) next_state = S1 ;
        else next_state = S8 ;
    S9: if (in[0] == 1'b0) next_state = S9 ;
        else next_state = S11 ;
    S10: next_state = S1 ;
    S11: if (in == 8'd64) next_state = S15 ;
         else next_state = S8 ;
    S12: if (in == 8'd255) next_state = S0 ;
         else next_state = S12 ;
    S13: if (in[1] ^ in[3] ^ in[5]) next_state = S12 ;
         else next_state = S14 ;
    S14: case(1'b1) // synopsys parallel_case full_case
           (in == 8'd0): next_state = S14 ;
           (8'd0 < in && in < 8'd64): next_state = S12 ;
           (in > 8'd63): next_state = S10 ;
         endcase
    S15: if (in[7] == 1'b0) next_state = S15 ;
         else
           case (in[1:0]) // synopsys parallel_case full_case
             2'b11: next_state = S0 ;
             2'b01: next_state = S10 ;
             2'b10: next_state = S13 ;
             2'b00: next_state = S14 ;
           endcase
  endcase
end

// outputs
always @ (state) begin
  // default value
  out = 8'bx ;
  case (state) // synopsys parallel_case full_case
    S0: out = 8'b00000000 ;
    S1: out = 8'b00000110 ;
    S2: out = 8'b00011000 ;
    S3: out = 8'b01100000 ;
    S4: begin out[7] = 1'b1 ; out[0] = 1'b0 ; end
    S5: begin out[6] = 1'b1 ; out[1] = 1'b0 ; end
    S6: out = 8'b00011111 ;
    S7: out = 8'b00111111 ;
    S8: out = 8'b01111111 ;
    S9: out = 8'b11111111 ;
    S10: begin
      out[6] = 1'b1 ; out[4] = 1'b1 ; out[2] = 1'b1 ; out[0] = 1'b1 ;
    end
    S11: begin
      out[7] = 1'b1 ; out[5] = 1'b1 ; out[3] = 1'b1 ; out[1] = 1'b1 ;
    end
    S12: out = 8'b11111101 ;
    S13: out = 8'b11110111 ;
    S14: out = 8'b11011111 ;
    S15: out = 8'b01111111 ;
  endcase
end

// build the state flops
always @ (posedge clk or negedge rst)
  begin
    if (!rst) state <= #1 S0 ;
    else state <= #1 next_state ;
  end

endmodule
Listing 4 -- prep4_onehot.v
/*
** prep4_onehot.v
**
** prep benchmark 4 -- large state machine
** benchmark suite #1 -- version 1.2 -- March 28, 1993
** Programmable Electronics Performance Corporation
**
** one-hot state assignment
*/

module prep4 (clk, rst, in, out) ;
input clk, rst ;
input [7:0] in ;
output [7:0] out ;

parameter [3:0]
  S0  = 4'd0 ,  S1  = 4'd1 ,  S2  = 4'd2 ,  S3  = 4'd3 ,
  S4  = 4'd4 ,  S5  = 4'd5 ,  S6  = 4'd6 ,  S7  = 4'd7 ,
  S8  = 4'd8 ,  S9  = 4'd9 ,  S10 = 4'd10 , S11 = 4'd11 ,
  S12 = 4'd12 , S13 = 4'd13 , S14 = 4'd14 , S15 = 4'd15 ;

reg [15:0] state, next_state ;
reg [7:0] out ;

// state machine
always @ (in or state) begin
  // default value
  next_state = 16'b0 ;
  case (1'b1) // synopsys parallel_case full_case
    state[S0]: case(1'b1) // synopsys parallel_case full_case
                 (in == 8'd0): next_state[S0] = 1'b1 ;
                 (8'd0 < in && in < 8'd4): next_state[S1] = 1'b1 ;
                 (8'd3 < in && in < 8'd32): next_state[S2] = 1'b1 ;
                 (8'd31 < in && in < 8'd64): next_state[S3] = 1'b1 ;
                 (in > 8'd63): next_state[S4] = 1'b1 ;
               endcase
    state[S1]: if (in[0] && in[1]) next_state[S0] = 1'b1 ;
               else next_state[S3] = 1'b1 ;
    state[S2]: next_state[S3] = 1'b1 ;
    state[S3]: next_state[S5] = 1'b1 ;
    state[S4]: if (in[0] || in[2] || in[4]) next_state[S5] = 1'b1 ;
               else next_state[S6] = 1'b1 ;
    state[S5]: if (in[0] == 1'b0) next_state[S5] = 1'b1 ;
               else next_state[S7] = 1'b1 ;
    state[S6]: case(in[7:6]) // synopsys parallel_case full_case
                 2'b11: next_state[S1] = 1'b1 ;
                 2'b00: next_state[S6] = 1'b1 ;
                 2'b01: next_state[S8] = 1'b1 ;
                 2'b10: next_state[S9] = 1'b1 ;
               endcase
    state[S7]: case(in[7:6]) // synopsys parallel_case full_case
                 2'b00: next_state[S3] = 1'b1 ;
                 2'b11: next_state[S4] = 1'b1 ;
                 2'b10,
                 2'b01: next_state[S7] = 1'b1 ;
               endcase
    state[S8]: if(in[4] ^ in[5]) next_state[S11] = 1'b1 ;
               else if (in[7]) next_state[S1] = 1'b1 ;
               else next_state[S8] = 1'b1 ;
    state[S9]: if (in[0] == 1'b0) next_state[S9] = 1'b1 ;
               else next_state[S11] = 1'b1 ;
    state[S10]: next_state[S1] = 1'b1 ;
    state[S11]: if (in == 8'd64) next_state[S15] = 1'b1 ;
                else next_state[S8] = 1'b1 ;
    state[S12]: if (in == 8'd255) next_state[S0] = 1'b1 ;
                else next_state[S12] = 1'b1 ;
    state[S13]: if (in[1] ^ in[3] ^ in[5]) next_state[S12] = 1'b1 ;
                else next_state[S14] = 1'b1 ;
    state[S14]: case(1'b1) // synopsys parallel_case full_case
                  (in == 8'd0): next_state[S14] = 1'b1 ;
                  (8'd0 < in && in < 8'd64): next_state[S12] = 1'b1 ;
                  (in > 8'd63): next_state[S10] = 1'b1 ;
                endcase
    state[S15]: if (in[7] == 1'b0) next_state[S15] = 1'b1 ;
                else
                  case (in[1:0]) // synopsys parallel_case full_case
                    2'b11: next_state[S0] = 1'b1 ;
                    2'b01: next_state[S10] = 1'b1 ;
                    2'b10: next_state[S13] = 1'b1 ;
                    2'b00: next_state[S14] = 1'b1 ;
                  endcase
  endcase
end

// outputs
always @ (state) begin
  // default value
  out = 8'bx ;
  case (1'b1) // synopsys parallel_case full_case
    state[S0]: out = 8'b00000000 ;
    state[S1]: out = 8'b00000110 ;
    state[S2]: out = 8'b00011000 ;
    state[S3]: out = 8'b01100000 ;
    state[S4]: begin out[7] = 1'b1 ; out[0] = 1'b0 ; end
    state[S5]: begin out[6] = 1'b1 ; out[1] = 1'b0 ; end
    state[S6]: out = 8'b00011111 ;
    state[S7]: out = 8'b00111111 ;
    state[S8]: out = 8'b01111111 ;
    state[S9]: out = 8'b11111111 ;
    state[S10]: begin
      out[6] = 1'b1 ; out[4] = 1'b1 ; out[2] = 1'b1 ; out[0] = 1'b1 ;
    end
    state[S11]: begin
      out[7] = 1'b1 ; out[5] = 1'b1 ; out[3] = 1'b1 ; out[1] = 1'b1 ;
    end
    state[S12]: out = 8'b11111101 ;
    state[S13]: out = 8'b11110111 ;
    state[S14]: out = 8'b11011111 ;
    state[S15]: out = 8'b01111111 ;
  endcase
end

// build the state flops
always @ (posedge clk or negedge rst)
  begin
    if (!rst) state <= #1 1'b1 << S0 ;
    else state <= #1 next_state ;
  end

endmodule
Listing 5 -- prep4.vhd
--
-- prep4.vhd
--
-- prep benchmark 4 -- large state machine
-- benchmark suite #1 -- version 1.2 -- March 28, 1993
-- Programmable Electronics Performance Corporation
--
-- binary state assignment, highly encoded

library IEEE ;
use IEEE.std_logic_1164.all ;
use IEEE.std_logic_arith.all ;

package typedef is
  subtype byte is std_logic_vector (7 downto 0) ;
  subtype bytein is bit_vector (7 downto 0) ;
end typedef ;

library IEEE ;
use IEEE.std_logic_1164.all ;
use IEEE.std_logic_arith.all ;
use work.typedef.all ;

entity prep4 is
  port ( clk,rst : in std_logic ;
         I : in byte ;
         O : out byte) ;
end prep4 ;

architecture behavior of prep4 is
  type state_type is (S0, S1, S2, S3, S4, S5, S6, S7,
                      S8, S9, S10, S11, S12, S13, S14, S15) ;
  signal state, next_state : state_type ;
  attribute state_vector : string ;
  attribute state_vector of behavior : architecture is "state" ;
  signal Iin : bytein ;
begin
  process (I)
  begin
    Iin <= to_bitvector(I);
  end process ;

  -- state machine
  process (Iin, state)
  begin
    -- default value
    next_state <= S0 ;
    case state is
      when S0 =>
        if (Iin = X"00") then next_state <= S0; end if ;
        if (x"00" < Iin) and (Iin < x"04") then next_state <= S1; end if;
        if (x"03" < Iin) and (Iin < x"20") then next_state <= S2; end if;
        if (x"1f" < Iin) and (Iin < x"40") then next_state <= S3; end if;
        if (x"3f" < Iin) then next_state <= S4; end if;
      when S1 =>
        if (Iin(1) and Iin(0)) = '1' then next_state <= S0;
        else next_state <= S3;
        end if ;
      when S2 => next_state <= S3 ;
      when S3 => next_state <= S5 ;
      when S4 =>
        if (Iin(0) or Iin(2) or Iin(4)) = '1' then next_state <= S5 ;
        else next_state <= S6 ;
        end if ;
      when S5 =>
        if (Iin(0) = '0') then next_state <= S5 ;
        else next_state <= S7 ;
        end if ;
      when S6 =>
        case Iin(7 downto 6) is
          when b"11" => next_state <= S1 ;
          when b"00" => next_state <= S6 ;
          when b"01" => next_state <= S8 ;
          when b"10" => next_state <= S9 ;
        end case ;
      when S7 =>
        case Iin(7 downto 6) is
          when b"00" => next_state <= S3 ;
          when b"11" => next_state <= S4 ;
          when b"01" => next_state <= S7 ;
          when b"10" => next_state <= S7 ;
        end case ;
      when S8 =>
        if (Iin(4) xor Iin(5)) = '1' then next_state <= S11 ;
        elsif Iin(7) = '1' then next_state <= S1 ;
        else next_state <= S8 ;
        end if;
      when S9 =>
        if (Iin(0) = '1') then next_state <= S11 ;
        else next_state <= S9 ;
        end if;
      when S10 => next_state <= S1 ;
      when S11 =>
        if Iin = x"40" then next_state <= S15 ;
        else next_state <= S8 ;
        end if ;
      when S12 =>
        if Iin = x"ff" then next_state <= S0 ;
        else next_state <= S12 ;
        end if ;
      when S13 =>
        if (Iin(1) xor Iin(3) xor Iin(5)) = '1' then next_state <= S12 ;
        else next_state <= S14 ;
        end if ;
      when S14 =>
        if (Iin > x"3f") then next_state <= S10 ;
        elsif (Iin = x"00") then next_state <= S14 ;
        else next_state <= S12 ;
        end if ;
      when S15 =>
        if Iin(7) = '0' then next_state <= S15 ;
        else
          case Iin(1 downto 0) is
            when b"11" => next_state <= S0 ;
            when b"01" => next_state <= S10 ;
            when b"10" => next_state <= S13 ;
            when b"00" => next_state <= S14 ;
          end case ;
        end if ;
    end case ;
  end process;

  -- outputs
  process (state)
  begin
    -- default value is don't care
    O <= byte'(others => 'X') ;
    case state is
      when S0 => O <= "00000000" ;
      when S1 => O <= "00000110" ;
      when S2 => O <= "00011000" ;
      when S3 => O <= "01100000" ;
      when S4 => O(7) <= '1' ; O(0) <= '0' ;
      when S5 => O(6) <= '1' ; O(1) <= '0' ;
      when S6 => O <= "00011111" ;
      when S7 => O <= "00111111" ;
      when S8 => O <= "01111111" ;
      when S9 => O <= "11111111" ;
      when S10 => O(6) <= '1' ; O(4) <= '1' ; O(2) <= '1' ; O(0) <= '1' ;
      when S11 => O(7) <= '1' ; O(5) <= '1' ; O(3) <= '1' ; O(1) <= '1' ;
      when S12 => O <= "11111101" ;
      when S13 => O <= "11110111" ;
      when S14 => O <= "11011111" ;
      when S15 => O <= "01111111" ;
    end case ;
  end process;
  -- build the state flops
  process (clk, rst)
  begin
    if rst='0' then
      state <= S0 ;
    elsif clk='1' and clk'event then
      state <= next_state ;
    end if ;
  end process ;

end behavior ;

Listing 6 -- prep4_onehot.vhd

-- prep4_onehot.vhd
--
-- prep benchmark 4 -- large state machine
-- benchmark suite #1 -- version 1.2 -- March 28, 1993
-- Programmable Electronics Performance Corporation
--
-- one-hot state assignment

library IEEE ;
use IEEE.std_logic_1164.all ;
use IEEE.std_logic_arith.all ;

package typedef is
  subtype state_vec is std_logic_vector (0 to 15) ;
  subtype byte is std_logic_vector (7 downto 0) ;
  subtype bytein is bit_vector (7 downto 0) ;
end typedef ;

library IEEE ;
use IEEE.std_logic_1164.all ;
use IEEE.std_logic_arith.all ;
use work.typedef.all ;

entity prep4 is
  port ( clk,rst : in std_logic ;
         I : in byte ;
         O : out byte) ;
end prep4 ;

architecture behavior of prep4 is
  signal state, next_state : state_vec ;
  signal Iin : bytein ;
begin

  process (I)
  begin
    Iin <= to_bitvector(I);
  end process ;

  -- state machine
  process (Iin, state)
  begin
    -- default value
    next_state <= state_vec'(others => '0') ;

    if state(0) = '1' then
      if (Iin = X"00") then next_state(0) <= '1'; end if ;
      if (x"00" < Iin) and (Iin < x"04") then next_state(1) <= '1'; end if;
      if (x"03" < Iin) and (Iin < x"20") then next_state(2) <= '1'; end if;
      if (x"1f" < Iin) and (Iin < x"40") then next_state(3) <= '1'; end if;
      if (x"3f" < Iin) then next_state(4) <= '1'; end if;
    end if;

    if state(1) = '1' then
      if (Iin(1) and Iin(0)) = '1' then next_state(0) <= '1';
      else next_state(3) <= '1';
      end if ;
    end if ;

    if state(2) = '1' then next_state(3) <= '1' ; end if;

    if state(3) = '1' then next_state(5) <= '1' ; end if;

    if state(4) = '1' then
      if (Iin(0) or Iin(2) or Iin(4)) = '1' then next_state(5) <= '1' ;
      else next_state(6) <= '1' ;
      end if ;
    end if;

    if state(5) = '1' then
      if (Iin(0) = '0') then next_state(5) <= '1' ;
      else next_state(7) <= '1' ;
      end if;
    end if;

    if state(6) = '1' then
      case Iin(7 downto 6) is
        when b"11" => next_state(1) <= '1' ;
        when b"00" => next_state(6) <= '1' ;
        when b"01" => next_state(8) <= '1' ;
        when b"10" => next_state(9) <= '1' ;
      end case ;
    end if;

    if state(7) = '1' then
      case Iin(7 downto 6) is
        when b"00" => next_state(3) <= '1' ;
        when b"11" => next_state(4) <= '1' ;
        when b"01" => next_state(7) <= '1' ;
        when b"10" => next_state(7) <= '1' ;
      end case ;
    end if;

    if state(8) = '1' then
      if (Iin(4) xor Iin(5)) = '1' then next_state(11) <= '1' ;
      elsif Iin(7) = '1' then next_state(1) <= '1' ;
      else next_state(8) <= '1' ;
      end if;
    end if;

    if state(9) = '1' then
      if (Iin(0) = '1') then next_state(11) <= '1' ;
      else next_state(9) <= '1' ;
      end if;
    end if;

    if state(10) = '1' then next_state(1) <= '1' ; end if ;

    if state(11) = '1' then
      if Iin = x"40" then next_state(15) <= '1' ;
      else next_state(8) <= '1' ;
      end if ;
    end if ;

    if state(12) = '1' then
      if Iin = x"ff" then next_state(0) <= '1' ;
      else next_state(12) <= '1' ;
      end if ;
    end if ;

    if state(13) = '1' then
      if (Iin(1) xor Iin(3) xor Iin(5)) = '1' then next_state(12) <= '1' ;
      else next_state(14) <= '1' ;
      end if ;
    end if ;

    if state(14) = '1' then
      if (Iin > x"3f") then next_state(10) <= '1' ;
      elsif (Iin = x"00") then next_state(14) <= '1' ;
      else next_state(12) <= '1' ;
      end if ;
    end if ;

    if state(15) = '1' then
      if Iin(7) = '0' then next_state(15) <= '1' ;
      else
        case Iin(1 downto 0) is
          when b"11" => next_state(0) <= '1' ;
          when b"01" => next_state(10) <= '1' ;
          when b"10" => next_state(13) <= '1' ;
          when b"00" => next_state(14) <= '1' ;
        end case ;
      end if ;
    end if ;
  end process;

  -- outputs
  process (state)
  begin
    -- default value is don't care
    O <= byte'(others => 'X') ;
    if state(0) = '1' then O <= "00000000" ; end if ;
    if state(1) = '1' then O <= "00000110" ; end if ;
    if state(2) = '1' then O <= "00011000" ; end if ;
    if state(3) = '1' then O <= "01100000" ; end if ;
    if state(4) = '1' then O(7) <= '1' ; O(0) <= '0' ; end if ;
    if state(5) = '1' then O(6) <= '1' ; O(1) <= '0' ; end if ;
    if state(6) = '1' then O <= "00011111" ; end if ;
    if state(7) = '1' then O <= "00111111" ; end if ;
    if state(8) = '1' then O <= "01111111" ; end if ;
    if state(9) = '1' then O <= "11111111" ; end if ;
    if state(10) = '1' then
      O(6) <='1' ; O(4) <='1' ; O(2) <='1' ; O(0) <='1' ;
    end if ;
    if state(11) = '1' then
      O(7) <='1' ; O(5) <='1' ; O(3) <='1' ; O(1) <='1' ;
    end if ;
    if state(12) = '1' then O <= "11111101" ; end if ;
    if state(13) = '1' then O <= "11110111" ; end if ;
    if state(14) = '1' then O <= "11011111" ; end if ;
    if state(15) = '1' then O <= "01111111" ; end if ;
  end process;

  -- build the state flops
  process (clk, rst)
  begin
    if rst='0' then
      state <= state_vec'(others => '0') ;
      state(0) <= '1' ;
    elsif clk='1' and clk'event then
      state <= next_state ;
    end if ;
  end process ;

end behavior ;