The TU Delft Sudoku Solver On FPGA
Kees van der Bok, Mottaqiallah Taouil, Panagiotis Afratis, Ioannis Sourdis
Computer Engineering
Delft University of Technology
The Netherlands
[Figure: diagram labels recoverable from the original: states Idle, Next_empty_cell, Guess, Back_track and Solve; transitions start, next cell found, error, restored last valid fill and All cells filled; blocks Symbol, Bitmap, Row, Column, Stack, priority encoder and control logic.]
Fig. 1. Top-level View of the Design

as well as the related bits in the bitmaps. The write mode is used to write symbols to the storage module; it is used when the initial puzzle is stored and when cells are filled in. The destructive read is applied when the solved puzzle is read out. In addition, the destructive read clears the bitmaps related to each cell, because the bitmaps need to be cleared before the next puzzle is read in. Clearing the bitmaps while reading out the solution saves valuable time. Whenever the global reset signal is asserted, the communication interface enters a state in which the bitmaps are cleared as well. In this state every memory location is read once in destructive mode.

TABLE III
STORAGE OF THE ORIGINAL SUDOKU

Mode              Description
Neutral Read      Reads symbol and bitmaps related to the addressed cell
Destructive Read  Same as neutral read but clears the bitmaps
Write             Writes symbol and updates bitmaps
Clear             Clears symbol and updates bitmaps

The storage module contains five memories, one for the sudoku and four for the bitmaps (rows, columns, blocks, and occupied cells). The bitmaps are stored in four separate memories, allowing them to be read in parallel. Updating a bitmap requires three steps, namely reading the bitmap, modifying it and writing it back.

Two other modules worth mentioning are the checksum calculator and the block calculator. The former computes the checksum of the puzzle, while the latter determines, based on the row, column and order of the puzzle, which block is addressed. The block calculator is used to address the proper block bitmap.

D. Processing

Figure 2 is a simplified representation of the processing unit. Although the figure omits the low-level details, it clearly shows the steps involved in the solving process and the flow between them. After being enabled, the processing unit goes to the next-empty-cell state. In this state, the next empty cell is determined by checking the bitmaps representing the occupied cells. Each bit of the bitmap represents a cell; only the bits of occupied cells are set. Finding the first unset bit in a bitmap is done using a priority encoder. After having selected the cell, the state machine proceeds to the guess state, in which a valid symbol for the selected cell is determined.

For the puzzle depicted in Table I, a valid symbol is determined as follows. The first empty cell in this sudoku is (1,2) (i.e. first row, second column). Table IV shows the bitmaps of the row, column and block of the corresponding cell. A bitwise OR of these three bitmaps yields the candidate symbols that could go in the cell (i.e. the symbols represented by a '0' in the result vector). From the result vector we conclude that 2, 4 and 9 are valid symbols for cell (1,2). We select the first option, which is 2. Choosing the first candidate, instead of randomly selecting one, saves logic and memory since we do not have to keep track of which symbols have been tried. Furthermore, choosing the first option requires only a priority encoder. When filling in a 2 in cell (1,2) the bitmaps need to be updated; this, however, is done by the storage module and is of no concern to the processing module.

Whenever a cell is filled, the address of the cell is pushed onto the stack to record the backtracking path. The process of finding an empty cell and filling it repeats until we solve the entire puzzle or until we reach an empty cell that cannot be filled due to a conflict. In the latter case, the partially filled sudoku is not valid, forcing a backtrack operation. In the backtrack state the last visited cell is popped from the stack, and the symbol in that cell is read and cleared simultaneously. The symbol that was read is stored, so that the guess process refills the cell only with symbols greater than the one causing the conflict. From the backtrack state the processing unit returns to the guess state from which
it will backtrack one more cell (in case there is still a conflict) or start filling in empty cells again. Eventually the algorithm fills in the last empty cell, after which the puzzle is solved.

TABLE IV
CANDIDATE SELECTION

          Symbol
          1 2 3 4 5 6 7 8 9
Row       0 0 1 0 0 1 0 0 0
Column    1 0 0 0 1 1 1 0 0
Block     0 0 0 0 1 1 0 1 0
Result    1 0 1 0 1 1 1 1 0

TABLE V
BENCHMARK RESULTS

Benchmark puzzles (puzzle dimension - type of benchmark)

Run        3-a         3-b         4-a         6-a         7-a         8-a
0          0.021153 s  0.012237 s  0.221498 s  0.114990 s  0.211481 s  0.096214 s
1          0.020691 s  0.012235 s  0.220870 s  0.115048 s  0.211223 s  0.096670 s
2          0.020642 s  0.012228 s  0.221676 s  0.115143 s  0.211264 s  0.096429 s
3          0.020732 s  0.012932 s  0.221710 s  0.115157 s  0.211662 s  0.096686 s
4          0.020885 s  0.012348 s  0.221437 s  0.115206 s  0.211501 s  0.096408 s
5          0.020728 s  0.012930 s  0.221729 s  0.115650 s  0.210956 s  0.096666 s
6          0.020788 s  0.012243 s  0.221397 s  0.115777 s  0.210929 s  0.096475 s
7          0.020715 s  0.012249 s  0.221544 s  0.115817 s  0.211370 s  0.096178 s
8          0.020952 s  0.012255 s  0.220798 s  0.115617 s  0.211575 s  0.096357 s
9          0.020926 s  0.012943 s  0.221133 s  0.115062 s  0.211468 s  0.096156 s
Average    0.020821 s  0.012460 s  0.221379 s  0.115347 s  0.211343 s  0.096424 s
Std. dev.  0.000156 s  0.000330 s  0.000337 s  0.000328 s  0.000249 s  0.000203 s
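The guess and backtrack steps of the processing unit, including the candidate selection of Table IV, can be sketched in software. The Python sketch below is an illustration only and not the hardware design itself: the names `bitmap`, `first_free` and `solve` are hypothetical, Python integers stand in for the bitmap memories, a loop stands in for the priority encoder, and bit s-1 of a bitmap is assumed to mark symbol s as used.

```python
def bitmap(symbols):
    """Pack used symbols into an int; bit s-1 set means symbol s is taken."""
    bm = 0
    for s in symbols:
        bm |= 1 << (s - 1)
    return bm


def first_free(used, n, above=0):
    """Loop standing in for the priority encoder: smallest symbol greater
    than `above` whose bit in `used` is 0, or None if there is none."""
    for s in range(above + 1, n + 1):
        if not (used >> (s - 1)) & 1:
            return s
    return None


def solve(grid, order=3):
    """Brute-force solve `grid` in place (0 marks an empty cell).
    Assumes the given symbols do not conflict with each other.
    Returns True if a solution was found, False otherwise."""
    n = order * order
    rows, cols, blks = [0] * n, [0] * n, [0] * n
    occupied = 0                      # one bit per cell, index r * n + c

    def block(r, c):                  # block addressed by (row, column, order)
        return (r // order) * order + c // order

    def toggle(r, c, s):              # flip symbol s in all four bitmaps
        nonlocal occupied
        bit = 1 << (s - 1)
        rows[r] ^= bit
        cols[c] ^= bit
        blks[block(r, c)] ^= bit
        occupied ^= 1 << (r * n + c)

    def next_empty():                 # lowest unset bit of `occupied`
        free = ~occupied & ((1 << (n * n)) - 1)
        if free == 0:
            return None
        return divmod((free & -free).bit_length() - 1, n)

    for r in range(n):                # load the initial puzzle
        for c in range(n):
            if grid[r][c]:
                toggle(r, c, grid[r][c])

    stack = []                        # addresses of the cells filled so far
    above = 0                         # symbol removed by the last backtrack
    cell = next_empty()
    while cell is not None:
        r, c = cell
        used = rows[r] | cols[c] | blks[block(r, c)]
        s = first_free(used, n, above)            # guess state
        if s is not None:
            grid[r][c] = s
            toggle(r, c, s)
            stack.append(cell)
            above, cell = 0, next_empty()
        elif stack:                               # backtrack state
            cell = stack.pop()
            r, c = cell
            above = grid[r][c]                    # retry with greater symbols
            toggle(r, c, above)
            grid[r][c] = 0
        else:
            return False                          # search space exhausted
    return True


# Table IV: the row holds 3 and 6, the column 1, 5, 6, 7, the block 5, 6, 8.
used = bitmap([3, 6]) | bitmap([1, 5, 6, 7]) | bitmap([5, 6, 8])
print(first_free(used, 9))   # 2, the first candidate for cell (1,2)
```

Calling `solve(grid)` on a list of lists fills the grid in place. As in the text, only the address of each filled cell is kept on the stack, because the symbol read back during a backtrack is enough to resume the search with greater symbols.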
V. RESULTS

We synthesized and prototyped the design on a Xilinx Virtex2P-30 FPGA. The design occupied 110 BlockRAMs (80% of the available ones) and 2,436 Slices (17%). The operating frequency was 50 MHz, limited by the long wires required to interconnect our logic with the distributed BlockRAMs. We used the benchmarks provided in [3] to evaluate the efficiency of our design. Our sudoku solver seems to work well for order N = 3 sudokus. However, the solver requires significantly more effort to solve hard puzzles of order N = 4 and higher. The solver is able to solve order N = 3 puzzles that are classified as hard. For higher-order puzzles the solver can only solve easy instances. Harder instances take an excessive amount of time (they did not complete within an hour). Therefore we have not been able to measure the execution time for most of the benchmark puzzles. We tried to solve benchmark puzzle 10a, which the solver was not able to solve within 24 hours. A ten-run benchmark of our solver is shown in Table V. This table covers only the puzzles our solver can solve within reasonable time.

VI. PROPOSED IMPROVEMENTS

We have thought of various optimizations to accelerate the exhaustive search that our solver performs. Initially, we planned to use a hybrid algorithm composed of an elimination algorithm and the brute-force algorithm. Starting with the elimination algorithm, some blank cells might be found. Whenever the elimination algorithm gets stuck, the brute-force algorithm could be used to advance. The elimination algorithm can be used after every guess by the brute-force algorithm, reducing the search space considerably. However, strong elimination algorithms require a significant amount of information to be kept available. The memory usage of the elimination algorithms exceeds the memory offered by the target FPGA by far. We experimented with elimination techniques that require only the bitmaps we have available. We concluded that such an algorithm can only deduce the value of a cell in very trivial situations and would therefore be of no use.

Another improvement we considered is having multiple brute-force processing units operate on the puzzle in parallel. This would, however, have a negative impact on performance because the processing units would interfere with each other, which in most cases would not lead to a solution, since branches of the search tree that might contain the solution could remain unexplored.

An optimization we have actually implemented is to traverse the rows based on the number of filled cells they contain. By visiting the rows in this order the probability of choosing the right path increases. Although the technique showed promising results for order N = 3 puzzles (i.e. speed-ups of up to 30 times), it did not help us solve the hard instances within the time limit. We have run this technique in simulation only; we failed to get it working on the FPGA before the Sudoku design competition deadline.

VII. CONCLUSION

The brute-force technique seems to be a feasible method for solving sudoku puzzles. However, the technique is not applicable to hard instances or high-order sudokus. In order to improve the brute-force algorithm the search needs to be directed. A hybrid solver using both the brute-force and an elimination algorithm could lead to a significant decrease in the possibilities that need to be explored. However, we could not find an elimination method fitting the available resources. The brute-force algorithm we have implemented can find the next empty cell and determine a valid symbol in constant time. The former is the prime improvement over a software version of this algorithm. However, it does not remove the exhaustive nature of the algorithm.

ACKNOWLEDGMENT

We would like to acknowledge the hosts of the design competition since we enjoyed this challenging exercise. Furthermore, we would like to thank all those who inspired us and gave us advice.

REFERENCES

[1] I. Lynce and J. Ouaknine, “Sudoku as a SAT problem.”
[2] I. Skliarova and A. de Brito Ferrari, “Reconfigurable hardware SAT solvers: A survey of systems,” IEEE Trans. Comput., vol. 53, no. 11, pp. 1449–1461, 2004.
[3] Sudoku Benchmarks, “http://fpt09.cse.unsw.edu.au/comp/benchmarks.html.”