The TU Delft Sudoku Solver On FPGA

This document describes a hardware implementation of a Sudoku solver on an FPGA. It discusses various Sudoku solving algorithms and why most were not suitable for implementation on an FPGA due to high memory or logic requirements. The implemented algorithm uses a brute-force approach that tries all possible combinations of numbers in each cell until it finds a valid solution or reaches a conflict. The performance of this solver is analyzed for Sudoku puzzles of order 3 to 15.

The TU Delft Sudoku Solver on FPGA

Kees van der Bok #1, Mottaqiallah Taouil #2, Panagiotis Afratis #3, Ioannis Sourdis #4
# Computer Engineering, Delft University of Technology, The Netherlands
1 [email protected], 2 [email protected], 3 [email protected], 4 [email protected]

Abstract: Solving Sudoku puzzles is a mind-bending activity that many people enjoy during their spare time. As such, for those acquainted with computers, it becomes an irresistible challenge to build a computing engine for sudoku solving. Many sudoku solvers have been developed recently, using advanced techniques and algorithms to speed up the computation. In this paper, we describe a hardware design for an FPGA implementation of a sudoku solver. Furthermore, we show the performance of this design for solving puzzles of order N = 3 to 15.

I. INTRODUCTION

Using an FPGA to solve a sudoku puzzle is an interesting challenge and a valuable test case for general-purpose algorithm execution on FPGA. Recently, executing general-purpose applications on FPGA has grown in popularity and has proven to be more efficient in many cases. Although FPGAs are much slower than GPPs in terms of operating frequency, better performance can be achieved by exploiting a high degree of parallelism and customization, as well as the ability to keep data local. Before designing our sudoku solver we considered the available algorithms and how they could be mapped to an FPGA. Most algorithms turned out to have excessive resource requirements, either in memory size or in logic. Our design choice is a brute-force algorithm, the only design possibility that fitted in the target FPGA device (Virtex2P-30). We expected the brute-force algorithm to be faster than a software version, because of the more efficient way in which the valid symbols for a cell can be determined. Although we have improved the basic step of the algorithm, the symbol selection, we have not solved the exhaustive nature of the brute-force solver (i.e. the solver may have to go through all the valid symbol assignments). This exhaustive nature causes the solver to become intractable for hard or large sudoku problems. Therefore, we conclude that the brute-force solver needs to be enriched with techniques that prune the search space. We explored the benefits of filling empty cells in a particular order. Although this ordering reduces the solving time, it does not make the hard and large problems tractable.
The following sections describe our brute-force sudoku solver implemented on an FPGA. The algorithm and design are explained and the performance is analyzed.

II. SUDOKU SOLVING

Automatic solving of sudoku puzzles can be done in various ways. There are a few well-known problems, for which algorithms exist, that show similarity to the sudoku problem. Solving a sudoku is, foremost, a constraint satisfaction problem, but could also be regarded as an exact-cover problem, a graph-coloring problem or a binary-satisfiability problem.
Besides converting the sudoku problem to a known problem, sudoku solvers that mimic the human solving scheme have been developed. This solving method is usually referred to as the elimination or solving-by-logic method. In essence the method is very closely related to the graph-coloring problem. The elimination method uses logic reasoning based on the constraints of the sudoku puzzle to exclude the symbols that cannot be placed in a certain cell. The crux of the method is that if one candidate remains, when all others have been proven infeasible, that symbol can be filled in. Usually these solvers implement rules derived from the pen-and-paper methods. The algorithm requires the possible candidates to be kept in memory. This can be done by assigning to each cell of the puzzle a bitmap in which each bit represents a certain symbol (e.g. bit 0 represents symbol 1 etc.). A set bit indicates that the symbol represented by that bit is a candidate for the cell the bitmap relates to. Elimination rules must be applied until one candidate remains (i.e. one and only one bit remains set in the bitmap). The cell can then be filled with the symbol represented by this bit. An advantage of this algorithm is that it is suitable for parallel execution. Unfortunately the elimination algorithm requires more memory than is available on the FPGA. Storing the bitmaps requires $N^4 \times 225$ bits, i.e. a 225-bit bitmap for each of the $N^4$ cells. For N = 15 we need $15^4 \times 225 = 11390625 \approx 12$ Mb. This exceeds the available memory of the V2P30 FPGA, which offers only 1.4 Mb of Block RAM.
There are other, less memory-demanding, methods of storing the sudoku information. However, these methods decrease the amount of information available and therefore weaken the strength of the elimination method considerably. Another issue we discovered when analyzing this method is that the algorithm is not complete. Therefore, the algorithm is not
capable of solving hard sudoku instances.
Another algorithm we considered is the dancing-links algorithm, which is an elegant algorithm for solving the exact-cover problem. Despite the elegance of the algorithm, it turned out to be infeasible to implement this approach on the FPGA, because the algorithm's memory requirements exceed the memory available on the FPGA.
A method for converting the sudoku problem to a binary-satisfiability problem has been proposed in [1]. Using such a description the sudoku could be solved using an FPGA-based binary-satisfiability solver as, for example, described in [2]. An FPGA-based binary-satisfiability solver requires an excessive amount of logic and is therefore only practical for small problems. The solver required to solve a sudoku puzzle following the above approach requires more than the available FPGA logic resources.
The most straightforward method of solving a sudoku is by brute force. Brute-force sudoku solvers fill in cells speculatively, taking into account the constraints that apply to a sudoku. Cells are filled in until a conflict is discovered. On conflict the solver clears the filled-in cells until it has returned to a cell that has untried candidates. The brute-force method performs an exhaustive search which will always find the solution, if one exists. However, finding a solution might require unacceptable time for large puzzles. Heuristics can be used to reduce the search space. The brute-force method could be efficient when combined with an elimination method.

III. THE ALGORITHM

The algorithm used in our design is similar to the brute-force method described in the previous section. In this section, we describe the algorithm and its FPGA implementation in more detail.
We use bitmaps to check valid symbols for a certain cell. These bitmaps represent symbols present in a unit. A unit is a row, column or block. Bitmaps are maintained per row, column and block. Table II shows the row bitmaps of the sudoku depicted in Table I. Using the bitmaps we determine a valid symbol in constant time. The algorithm chooses the symbol with the lowest numerical value among the candidates. The puzzle is traversed in row-major order until a cell is encountered that has no valid candidate symbols. When such a cell is encountered during the solving process (i.e. every possible symbol for that cell conflicts with the sudoku constraints), the solver needs to backtrack to a cell that has at least one possible alternative assignment. The initial bitmaps are constructed when the sudoku puzzle is received from the RS-232 unit and stored in memory. While solving the puzzle the bitmaps are continuously updated. This is achieved by using bitwise operations, which set the appropriate bits when writing and clear them when backtracking.

TABLE I
AN EXAMPLE SUDOKU

6 0 0 0 0 0 0 0 3
8 0 0 4 5 6 1 0 0
0 5 0 0 0 0 0 0 0
0 1 5 9 0 0 3 0 0
0 0 0 0 1 0 0 0 0
0 6 0 0 8 0 5 0 7
0 0 2 0 0 0 0 0 0
9 0 0 0 0 1 7 4 0
4 7 0 0 9 0 0 0 6

TABLE II
ROW BITMAPS

row   Symbol
      1 2 3 4 5 6 7 8 9
0     0 0 1 0 0 1 0 0 0
1     1 0 0 1 1 1 0 1 0
2     0 0 0 0 1 0 0 0 0
3     1 0 1 0 1 0 0 0 1
4     1 0 0 0 0 0 0 0 0
5     0 0 0 0 1 1 1 1 0
6     0 1 0 0 0 0 0 0 0
7     1 0 0 1 0 0 1 0 1
8     0 0 0 1 0 1 1 0 1

IV. DESIGN

The design is composed of four main parts (Figure 1): control, communication, storage and processing. Each of these modules is described in the following subsections.

A. Main Controller
The overall orchestration of the sudoku solver is done by the main controller, which is a simple state machine. This state machine reflects the overall state of the sudoku solver. Based on this state, access to the storage module is granted either to the communication interface module or to the processing module.

B. Communication
Communication between the board and the host PC uses the well-known RS-232 protocol. RS-232 sends the data bit by bit and is asynchronous in nature. Each serially transmitted byte starts with a start bit and ends with a stop bit. The receiver uses the start and stop bits to synchronize. Two simple state machines perform the communication, one for transmitting and one for receiving. A third state machine is used to orchestrate the higher-level procedure of receiving the puzzle and transmitting the solution after the puzzle has been solved.

C. Storage
All the information regarding the sudoku is maintained within the storage module. Besides storing the sudoku and the bitmaps, the storage module calculates the checksum and updates the bitmaps. Writing to or reading from the storage module is kept simple. All necessary processing and calculations are hidden from other modules. Two control signals are used to select one of the four operating modes in which the storage module can operate. The operating modes are described in Table III. Neutral reads are used by the processing module to read the symbol and bitmaps that are related to the addressed cell. The clear mode is used for backtracking. In this mode the symbol in the addressed cell will be cleared
as well as the related bits in the bitmaps. The write mode is used to write symbols to the storage module and is used when the initial puzzle is stored or when cells are filled in. The destructive read is applied when the solved puzzle is read out. In addition, the destructive read clears the bitmaps related to each cell, because the bitmaps need to be cleared before the next puzzle is read in. Performing the bitmap clearing while reading out the solution saves valuable time. Whenever the global reset signal is asserted the communication interface enters a state in which the bitmaps are cleared as well. In this state every memory location is read once in destructive mode.

TABLE III
STORAGE OF THE ORIGINAL SUDOKU

Mode              Description
Neutral Read      Reads symbol and bitmaps related to the addressed cell
Destructive Read  Same as neutral read but clears the bitmaps
Write             Writes symbol and updates bitmaps
Clear             Clears symbol and updates bitmaps

The storage module contains five memories, one for the sudoku and four for the bitmaps (rows, columns, blocks, and occupied cells). The bitmaps are stored in four separate memories, allowing them to be read in parallel. Updating a bitmap requires three steps, namely reading the bitmap, modifying it and writing it back.
Two other modules worth mentioning are the checksum calculator and the block calculator. The former computes the checksum of the puzzle while the latter determines, based on the row, column and order of the puzzle, which block is addressed. The block calculator is used to address the proper block bitmap.

Fig. 1. Top-level View of the Design

D. Processing

[Figure 2: Sudoku_Processing block diagram. State machine with states Idle, Next_empty_cell, Guess, Back_track and Solve (transitions: start, next cell found, all cells filled, error, restored last valid fill); datapath with symbol check, bitmap check, priority encoder, control logic, and row and column stacks.]
Fig. 2. The Sudoku Processing Unit

Figure 2 is a simplified representation of the processing unit. Although the low-level details are left out of this figure, it clearly shows which steps are involved in the solving process as well as its flow. After being enabled, the processing unit goes to the next-empty-cell state. In this state, the next empty cell is determined by checking the bitmaps representing the occupied cells. Each bit of the bitmap represents a cell; only the bits representing occupied cells are set. Finding the first not-set bit in a bitmap is done using a priority encoder. After having selected the cell, the state machine proceeds to the guess state in which a valid symbol for the selected cell is determined. Based on the puzzle depicted in Table I a valid symbol is determined as follows. The first empty cell in this sudoku is (1,2) (i.e. first row, second column). Table IV shows the bitmaps of the row, column and block of the corresponding cell. Performing a bitwise OR of these three bitmaps gives the candidate symbols that could go in the cell (i.e. these symbols are represented by a '0' in the result vector). From the result vector we conclude that 2, 4 and 9 are valid symbols for cell (1,2). We select the first option, which is 2. Choosing the first candidate, instead of randomly selecting one, saves logic and memory since we do not have to keep track of which symbols have been tried. Furthermore, for choosing the first option we only need a priority encoder. When filling in a 2 in cell (1,2) we need to update the bitmaps; this, however, is done by the storage module and is of no concern to the processing module. Whenever a cell is filled, the address of the cell is pushed on the stack to record the backtracking path. The process of finding an empty cell and filling it repeats until we solve the entire puzzle or until we reach an empty cell that cannot be filled due to a conflict. In the latter case, the partially filled sudoku is not valid, forcing a backtrack operation. In the backtrack state the last visited cell is popped from the stack, and the symbol in that cell is read and cleared simultaneously. The symbol that was read is stored, so that the guess process will re-fill the cell only with symbols greater than the one causing the conflict. From the backtrack state the processing unit returns to the guess state from which
it will backtrack one more cell (in case there is still a conflict) or start filling in empty cells again. Eventually the algorithm fills in the last empty cell, after which the puzzle is solved.

TABLE IV
CANDIDATE SELECTION

         Symbol
         1 2 3 4 5 6 7 8 9
Row      0 0 1 0 0 1 0 0 0
Column   1 0 0 0 1 1 1 0 0
Block    0 0 0 0 1 1 0 1 0
Result   1 0 1 0 1 1 1 1 0

TABLE V
BENCHMARK RESULTS

Benchmark Puzzles (puzzle dimension - type of benchmark)
run        3-a         3-b         4-a         6-a         7-a         8-a
0          0.021153 s  0.012237 s  0.221498 s  0.114990 s  0.211481 s  0.096214 s
1          0.020691 s  0.012235 s  0.220870 s  0.115048 s  0.211223 s  0.096670 s
2          0.020642 s  0.012228 s  0.221676 s  0.115143 s  0.211264 s  0.096429 s
3          0.020732 s  0.012932 s  0.221710 s  0.115157 s  0.211662 s  0.096686 s
4          0.020885 s  0.012348 s  0.221437 s  0.115206 s  0.211501 s  0.096408 s
5          0.020728 s  0.012930 s  0.221729 s  0.115650 s  0.210956 s  0.096666 s
6          0.020788 s  0.012243 s  0.221397 s  0.115777 s  0.210929 s  0.096475 s
7          0.020715 s  0.012249 s  0.221544 s  0.115817 s  0.211370 s  0.096178 s
8          0.020952 s  0.012255 s  0.220798 s  0.115617 s  0.211575 s  0.096357 s
9          0.020926 s  0.012943 s  0.221133 s  0.115062 s  0.211468 s  0.096156 s
Average    0.020821 s  0.012460 s  0.221379 s  0.115347 s  0.211343 s  0.096424 s
std. dev.  0.000156    0.000330    0.000337    0.000328    0.000249    0.000203

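The candidate selection of Table IV and the stack-based backtracking of Section IV-D can be mirrored in software. The following is a minimal Python sketch under stated assumptions, not the hardware design itself: the function names (build_bitmaps, candidates, first_candidate, solve) are hypothetical, bit s-1 of a bitmap stands for symbol s, cells are indexed from 0 (so cell (1,2) of the text is grid[0][1]), and a simple loop stands in for the priority encoder.

```python
N = 3                      # puzzle order; the grid is SIZE x SIZE
SIZE = N * N
FULL = (1 << SIZE) - 1     # bitmap with every symbol present

TABLE_I = [                # the example sudoku of Table I (0 = empty cell)
    [6, 0, 0, 0, 0, 0, 0, 0, 3],
    [8, 0, 0, 4, 5, 6, 1, 0, 0],
    [0, 5, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 5, 9, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 6, 0, 0, 8, 0, 5, 0, 7],
    [0, 0, 2, 0, 0, 0, 0, 0, 0],
    [9, 0, 0, 0, 0, 1, 7, 4, 0],
    [4, 7, 0, 0, 9, 0, 0, 0, 6],
]

def block_of(r, c):
    """Index of the block containing cell (r, c) (the 'block calculator')."""
    return (r // N) * N + c // N

def build_bitmaps(grid):
    """Build one bitmap per row, column and block from the given symbols."""
    rows, cols, blocks = [0] * SIZE, [0] * SIZE, [0] * SIZE
    for r in range(SIZE):
        for c in range(SIZE):
            if grid[r][c]:
                bit = 1 << (grid[r][c] - 1)
                rows[r] |= bit
                cols[c] |= bit
                blocks[block_of(r, c)] |= bit
    return rows, cols, blocks

def first_candidate(used, greater_than=0):
    """Lowest symbol above `greater_than` whose bit is clear in `used`; 0 if none."""
    for s in range(greater_than + 1, SIZE + 1):
        if not (used >> (s - 1)) & 1:
            return s
    return 0

def candidates(rows, cols, blocks, r, c):
    """All symbols whose bit is 0 in the OR of the three unit bitmaps."""
    used = rows[r] | cols[c] | blocks[block_of(r, c)]
    return [s for s in range(1, SIZE + 1) if not (used >> (s - 1)) & 1]

def solve(grid):
    """Brute-force solve; returns a filled copy of grid, or None if impossible."""
    grid = [row[:] for row in grid]
    rows, cols, blocks = build_bitmaps(grid)
    empty = [(r, c) for r in range(SIZE) for c in range(SIZE) if not grid[r][c]]
    stack = []             # addresses of filled cells (the backtracking path)
    last = 0               # symbol removed by the most recent backtrack
    while len(stack) < len(empty):
        r, c = empty[len(stack)]           # next empty cell in row-major order
        b = block_of(r, c)
        s = first_candidate(rows[r] | cols[c] | blocks[b], last)
        if s:                              # guess: write symbol, set the bits
            bit = 1 << (s - 1)
            grid[r][c] = s
            rows[r] |= bit
            cols[c] |= bit
            blocks[b] |= bit
            stack.append((r, c))
            last = 0
        else:                              # conflict: backtrack one cell
            if not stack:
                return None
            r, c = stack.pop()
            last = grid[r][c]              # retry only with greater symbols
            bit = 1 << (last - 1)
            grid[r][c] = 0
            rows[r] &= ~bit
            cols[c] &= ~bit
            blocks[block_of(r, c)] &= ~bit
    return grid
```

With these definitions, candidates(*build_bitmaps(TABLE_I), 0, 1) yields the symbols 2, 4 and 9 that Table IV derives for cell (1,2). Storing only the last removed symbol, rather than a per-cell list of tried symbols, is what the lowest-candidate-first rule makes possible.
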
V. RESULTS

We synthesized and prototyped the design on a Xilinx Virtex2P-30 FPGA. The design occupied 110 BlockRAMs (80% of the available ones) and 2,436 Slices (17%), while the operating frequency was 50 MHz, limited by the long wires required to interconnect our logic with the distributed BlockRAMs. We used the benchmarks provided in [3] to evaluate the efficiency of our design. Our sudoku solver seems to work well for order N = 3 sudokus. However, the solver requires significantly more effort solving hard puzzles of order N = 4 and higher. The solver is able to solve order N = 3 puzzles which are classified as hard. For higher-order puzzles the solver can only solve easy instances. Harder instances take an excessive amount of time to be solved (they did not complete within an hour). Therefore we have not been able to measure the execution time for most of the benchmark puzzles. We tried to solve benchmark puzzle 10a, which the solver was not able to solve within 24 hours. A ten-run benchmark of our solver is shown in Table V. This table shows the results only for the puzzles our solver can solve within reasonable time.

VI. PROPOSED IMPROVEMENTS

We have thought of various optimizations to accelerate the exhaustive search that our solver is performing. Initially, we planned to use a hybrid algorithm composed of an elimination algorithm and the brute-force algorithm. Starting with the elimination algorithm, the values of some blank cells might be found. Whenever the elimination algorithm gets stuck, the brute-force algorithm could be used to advance. The elimination algorithm can be used after every guess by the brute-force algorithm, reducing the search space considerably. However, strong elimination algorithms require a significant amount of information to be kept available. The memory usage of the elimination algorithms exceeds the memory offered by the target FPGA by far. We have been experimenting with elimination techniques that only require the bitmaps we have available. We concluded that such an algorithm can only deduce the value of a cell in very trivial situations and would therefore be of no use. Another improvement we considered is having multiple brute-force processing units operate on the puzzle in parallel. This would, however, have a negative impact on the performance because the processing units would interfere with each other, which would in most cases not lead to a solution since branches of the search tree that might contain the solution could remain unexplored. An optimization we have actually implemented is to traverse the rows based on the number of filled cells they contain. By visiting the rows in this order the probability of choosing the right path increases. Although the technique showed promising results (i.e. speed-ups of up to 30 times) for order N = 3 puzzles, it did not help us in solving the hard instances within the time limit. We have run this technique in simulation only; we failed to have it working on the FPGA before the Sudoku design competition deadline.

VII. CONCLUSION

The brute-force technique seems to be a feasible method for solving sudoku puzzles. However, the technique is not applicable to hard instances or high-order sudokus. In order to improve the brute-force algorithm the search needs to be directed. A hybrid solver using both the brute-force and an elimination algorithm could lead to a significant decrease in the possibilities that need to be explored. However, we could not find an elimination method fitting the available resources. The brute-force algorithm we have implemented can find the next empty cell and determine a valid symbol in constant time. This is the prime improvement over a software version of this algorithm. However, it does not solve the exhaustive nature of the algorithm.

ACKNOWLEDGMENT

We would like to acknowledge the hosts of the design competition since we enjoyed this challenging exercise. Furthermore, we would like to thank all those who inspired us and gave us advice.

REFERENCES

[1] I. Lynce and J. Ouaknine, "Sudoku as a SAT problem."
[2] I. Skliarova and A. de Brito Ferrari, "Reconfigurable hardware SAT solvers: A survey of systems," IEEE Trans. Comput., vol. 53, no. 11, pp. 1449–1461, 2004.
[3] Sudoku Benchmarks, "http://fpt09.cse.unsw.edu.au/comp/benchmarks.html."
