Solving Nonograms (Picross)

Hey, it's Friday afternoon, let's have a fun puzzle/algorithm problem to solve.

One of my favorite Nintendo DS games is Picross DS. The game is quite simple, it involves solving puzzles called Nonograms. You can try a simple online Picross clone here: TylerK's Picross.

Nonograms are a grid, with sequences of numbers defined for every row and column of the grid. The numbers define blocks of "filled in" squares for that row/column, but the unfilled areas on either sides of the blocks are not defined. For example, if you have a row that looks like this:

Possible solutions for that row would include:

etc.

The "4 5" simply tells you that, somewhere in the row, there are 4 sequential blocks filled in, followed by 5 sequential blocks filled in. Those will be the only blocks filled in, and the amount of space before/after them are not defined.

A puzzle is complete when all rows and columns meet their definitions, without any contradictions.

Extremely simple game in concept, but it can take quite a while to solve some of them manually. Picross DS's puzzles gradually increase in size up to 25x20 grids, which would generally take me about half an hour to solve.

However, it always occurs to me that it'd be a pretty easy game to write a program to solve. I've never gotten around to it, but perhaps some people here will enjoy the challenge. If a decent number of solutions are posted, I'll benchmark them on a large puzzle against each other, similar to what Paolo Bergantino did here with his Boggle question. There's quite a bit of information on the Nonogram Wikipedia page about methods of attacking a puzzle, if you want to reference that.

Here's an easy-to-parse definition of Puzzle #1 from TylerK's Picross, so you have something to feed a program. Line 1 is puzzle dimensions (probably unnecessary), line 2 is row definitions, line 3 is column definitions. This is just the first thing that came to mind, so if anyone can think of a better input format, feel free to comment or edit this post to include it.

15 15
15,4 5,2 4,1 3,2,2,2 4 3,2 6 2,2 1 6 2,2 1 1 4 2,1 1,1 3 2 1,2 2 1 2 1,3 3 2 1,9
4 4,3 1 2 3,2 1 2 2,2 1 1,1 4 2,1 3,1 8,1 3 1 1,1 4 2 1,1 4,2 4 3,3 3 3,4 1,10 3,10


ANSWERS:


Yes, the problem is NP-complete, but that only means that you are (pretty much) stuck with exponentially growing run-times as the size of the puzzle grows. But in real life puzzle sizes don't grow. Hardly anyone bothers to build puzzles that are bigger than 100x100 and the vast majority are smaller than 50x50. Building a solver that will solve 95% of the puzzles published in books and magazines in a fraction of second is actually not particularly challenging. A fairly straight-forward depth-first search system will do it.

What's less obvious is that there is a small fraction of puzzles that are quite nasty and will cause run-times to explode for most simple solvers. Most of these are badly designed puzzles that are also difficult or impossible for humans to solve, but some are particularly clever puzzles that experienced human solvers can readily solve, using deeper insights than most AI programs can manage.

I've written a solver of my own that will solve most puzzles very quickly, and I've done a survey of many other solvers with benchmark results comparing their performance. I also have a discussion of an interesting class of hard puzzles (the Domino puzzles) that demonstrates how some puzzles that are not hard for a clever human prove very hard for most programs. Links to my solver and the Domino Puzzle will be found in the survey.

I think there is still a lot of room for improvement and would encourage people with fresh ideas to take a shot at it. But it's worth noting that the obvious things have been done and there isn't much use in doing them again.


Determining whether a Nonogram solution exists/is unique is NP-hard. See


Rather than trying to place the "first" row, it will cut down on your search substantially if you try to get information from the "most constrained" row or column, which may have several forced values. A quick indication is to add up all the values in a row/column and add #_of_values - 1, then look for the row/column with the biggest such value (or smallest difference between this value and the # of rows or columns, if the rows and columns differ). Thus if you have a 25x25 puzzle, and one of the rows is "5 1 1 6 1 1 3", that row has a value of 24, which means it is very constrained - the relative position of all but one of the blank squares in that row is known, and the last blank square can be in any of 8 different relative positions. So for each group of filled squares, there are only two possibilities: extra blank square is to the left of this group, or extra blank square is to the right of this group. So, for example, five of the filled squares in that group of 6 are already known.

Once you have some forced values from one direction, that further constrains any of the groups from the other direction that intersect with the known information. So the best approach is likely to be working back and forth between columns and rows as more information becomes known, which is pretty much the way you need to solve it by hand.


This seems like a pretty obvious case for depth first search with backtracking, as with the "n queens" problem. The sort of cute part is that as well as placing rows/columns, you can shift the blocks.

Okay, here's an outline.

  1. Start with an empty board, place the first row

  2. Now, with that board, place a second row, check it against the column constraints. If it passes, recursively try the next row against that state; if it doesn't pass, then try the next possible placement of that row.

  3. If at any time you run out of possible placements of a row that satisfy the constraints, then the puzzle has no solution. Otherwise, when you run out of rowns, you've solved the problem.

It's convenient that these rows can be treated as binary numbers, so there is a natural ordering to the possibilities.


The real problem is whether anyone can come up with an algorithm that solves the said puzzle faster than a human? This is easy for relatively easy puzzles such as the reference one but if the puzzle grows most of the algorithms here will quickly slow down. Here is the puzzle I tried to solve. The problem is that the 4th row for example has 2220075 possible combinations I believe. If Charlie's algorithm temporarily accepts a wrong row for row 3, it will go through all these combinations for row 4. And this is not to mention the case where the algorithm contradicts itself on row 35 over a mistake it made on row 2.

My algorithm was similar to John's. It failed to run in x86 mode on my 64bit desktop. When I switched it to 64bit mode and let it run overnight, my computer was completely unusable in the morning. The process was reserving 8 Gigs of memory (8 Gigs physical on the desktop) and the computer wouldn't respond due to the frantic swapping. And it hadn't even solved all the possible rows. Not to mention it hadn't even touched the possible column combinations.

List<List<int>> rows =
                new List<List<int>>()
                {
                    new List<int> { 8,29,4 },
                    new List<int> { 6,4,25,4,3 },
                    new List<int> { 5,3,2,3,9,4,2,1,3 },
                    new List<int> { 4,2,2,2,2,1,2,2 },
                    new List<int> { 4,1,1,9,10,2,2,1 },
                    new List<int> { 3,2,6,5,5,1,1 },
                    new List<int> { 3,1,5,5,1,1 },
                    new List<int> { 3,1,4,4,1,1 },
                    new List<int> { 3,1,4,4,1,1 },
                    new List<int> { 3,1,3,3,1,1 },
                    new List<int> { 3,1,3,6,2 },
                    new List<int> { 3,1,2,3,2,4,2 },
                    new List<int> { 4,3,1,8,7,1,2,3 },
                    new List<int> { 4,2,1,12,11,1,2,4 },
                    new List<int> { 5,1,2,7,2,2,6,1,1,4 },
                    new List<int> { 4,1,1,1,6,2,2,6,1,2,1,3 },
                    new List<int> { 4,1,1,2,4,3,4,3,1,1,1,1,3 },
                    new List<int> { 4,1,1,2,1,4,1,2,3,2,1,2,2 },
                    new List<int> { 3,1,1,1,2,5,6,1,1,1,3,2 },
                    new List<int> { 3,2,1,1,2,1,5,4,4,2,1,2,1,2 },
                    new List<int> { 3,2,2,1,1,4,2,2,3,1,1,2,1,1,2 },
                    new List<int> { 3,1,3,2,1,1,4,1,5,3,2,1,3,1,2 },
                    new List<int> { 3,1,2,1,2,1,3,7,4,1,4,2,2 },
                    new List<int> { 2,1,4,1,1,1,2,6,2,2,2,3,2,1 },
                    new List<int> { 2,2,4,1,2,1,2,5,2,1,1,3,2,1 },
                    new List<int> { 2,2,1,4,1,1,3,3,2,1,4,4,1 },
                    new List<int> { 2,3,3,2,1,3,3,7,4,1 },
                    new List<int> { 2,3,2,4,5,8,1,2,1 },
                    new List<int> { 1,1,3,11,6,7,1,3,1 },
                    new List<int> { 1,1,2,2,13,10,2,3,2 },
                    new List<int> { 1,2,3,1,6,1,1,7,1,5,2 },
                    new List<int> { 1,1,3,2,6,1,1,1,1,4,1,4,2 },
                    new List<int> { 1,1,6,7,2,4,2,5,6,1 },
                    new List<int> { 1,1,2,3,1,4,2,2,11,2,1 },
                    new List<int> { 1,1,1,1,2,1,5,10,1,1,1 },
                    new List<int> { 1,1,1,1,4,7,4,10,1,1,1 },
                    new List<int> { 1,2,1,1,28,1,1,3 },
                    new List<int> { 1,2,1,2,27,2,1,3 },
                    new List<int> { 1,1,1,1,26,1,1,1,1 },
                    new List<int> { 2,3,1,28,2,1,2,1 }
                };
            List<List<int>> cols =
                new List<List<int>>()
                {
                    new List<int> { 40 },
                    new List<int> { 28,1 },
                    new List<int> { 23,8 },
                    new List<int> { 5,6,7,4 },
                    new List<int> { 3,6,1,9,3,1 },
                    new List<int> { 2,3,2,5,4,2,2 },
                    new List<int> { 1,2,4,1,2,5,2 },
                    new List<int> { 1,1,4,9,2,3,2 },
                    new List<int> { 2,4,2,6,1,4,3 },
                    new List<int> { 1,4,1,3,4,1,6 },
                    new List<int> { 1,4,3,2,3,5,5 },
                    new List<int> { 2,4,1,2,3,4,1,3 },
                    new List<int> { 1,2,3,4,2,2,4,4,1 },
                    new List<int> { 1,1,2,3,2,1,4,2,4 },
                    new List<int> { 2,3,5,3,3,5,4 },
                    new List<int> { 3,1,6,1,2,5,5 },
                    new List<int> { 3,2,6,2,15 },
                    new List<int> { 3,1,8,2,13 },
                    new List<int> { 2,2,4,5,15 },
                    new List<int> { 2,2,2,2,22 },
                    new List<int> { 2,1,1,1,12,6 },
                    new List<int> { 2,1,10,4,5 },
                    new List<int> { 3,1,3,1,2,4 },
                    new List<int> { 3,1,1,4,3,1,4 },
                    new List<int> { 3,2,2,3,2,2,5 },
                    new List<int> { 3,1,1,5,1,1,5 },
                    new List<int> { 3,1,1,5,1,1,5 },
                    new List<int> { 3,1,1,5,1,1,5 },
                    new List<int> { 3,2,5,2,1,1,4 },
                    new List<int> { 3,1,1,3,2,2,4 },
                    new List<int> { 3,1,6,4,5 },
                    new List<int> { 2,2,12,2,6 },
                    new List<int> { 2,2,1,1,22 },
                    new List<int> { 2,1,2,2,5,15 },
                    new List<int> { 3,1,4,3,2,14 },
                    new List<int> { 3,1,7,2,1,13 },
                    new List<int> { 3,2,6,1,1,6,8 },
                    new List<int> { 3,2,5,2,2,4,7 },
                    new List<int> { 2,1,2,4,1,1,1,4,1,4,2 },
                    new List<int> { 1,1,4,4,3,1,4,5,1 },
                    new List<int> { 1,1,5,1,1,2,1,2,2,3,2 },
                    new List<int> { 1,5,2,2,1,5,5,3 },
                    new List<int> { 1,6,2,1,4,2,6,1 },
                    new List<int> { 1,6,2,6,5,2 },
                    new List<int> { 1,5,3,1,9,2 },
                    new List<int> { 2,2,4,2,6,3 },
                    new List<int> { 1,2,2,2,9,2,1 },
                    new List<int> { 3,5,5,8,4 },
                    new List<int> { 4,13,9 },
                    new List<int> { 27,2 }
                };

Copyright goes to Tampere Guild of Information Technology / Kaisapais / A finnish brewery.


I don't have enough time right now to flesh out a solution, but this is how I would tackle it.

"function1" determine the possible combinations for a row or column given the constraints of the numbers on the top or side, and the squares that are already filled out and those that have been emptied.

"function2" take the output from function1 and logically and all the results together - the spots with ones could be filled in.

"function3" take the output from function1 and logically or all the results together - the spots with zeros could be emptied.

keep applying function2 and function3 to all the rows and collumns until no more boxes are emptied or filled in. If the puzzle is solved then, you are done.

If the puzzle is not solved, then take a row or column that has the fewest possibilities and apply the first possibility. Then apply function2 and function3 to the new board. If it leads to a contradiction (0 possibilities for a row or column) then un-apply the possibility and try a different one. Keep recursing like this until a solution is found.


Some months ago I wrote a solver for nonograms on C++. It just look for all permissible positions for shaded and blank cells. And on every solution step it looks if every position is ok. So here is the result for Chad Birch's nonogram with timing: .

And I tried my solver for Mikko Rantanen's nongram too: .


Steven Simpson has written a nonogram solver that's freely available in different versions, including a JavaScript script. He discusses details of the algorithms on that website (such as here- basically, he's using a series of linesolvers that trade off speed vs. completeness, coupled with divide and conquer by guessing when all the linesolvers reach a dead end. He also has links to other approaches, which cover more ground than we have here.


Let me point out 2 interesting twists on the classic nonogram puzzles :

  • When the puzzle does more than just list lengths of occupied cells. This public challenge constrained some cells in advance as being occupied :

  • When the nonogram contains more than just empty/occupied cells, but uses patches of different colors to occupy the cells. For example see ; from solving these by hand, I get the feeling that these are actually easier to solve than the regular nonograms.

A particular challenge for algorithm designers is that the colored nonograms benefit greatly from considering horizontal/vertical constraints together. The usual line-per-line solvers are at a clear disadvantage here.


Here is what I found:

All NP Complete problems have the same feel. It's always easy to solve the next 80% of remaining cases. For example, nanograms decompose into a single lines. One can write a routine, solve_one_line(old_state, line_definition, new_state) to update what is known about a single line, then keep iterating over rows and columns. Then it fails on a few cases, so you write something to solve 80% of those cases. Then you add random numbers and solve every case you have ever found, but it is possible to construct an optimally bad one.

Other games following this pattern are MineSweeper and Soduku.

Thinking in Parallel is hard. For example, you might figure out that the solve_one_line routine above will not affect another row if running on a row or another column if running on a column.

Now you get:

  all_until_completion(rows):
      solve_one_line(...)
  all_until_completion(cols):
      solve_one_line(...)

This gets you up to running 20 cores (on a 20x20) without data locking or anything. Then you think about running on a graphics card with each cell a processor. Then you notice how much time has passed.

One time I felt old was looking at modern computer science textbook where O(n) notation was replaced with O(n, p) notation because no one would care to evaluate an algorithm for a single processor. The 8 queens solution is a great, fast recursive algorithm with fast-fail, efficient memory use, and only runs on one processor.

Problems are good excuses to play. Grinding out more of the same gets boring. Look through a list of technologies you with which might want more experience: behavior driven testing; dependency injection; pure functional programming; neural nets; genetic algorithms; fast, sloppy and out of control; GPGPU; OCR; example driven learning; crowdsourcing; etc. Pick one and try to use it somehow to solve the problem. Sometimes the goal is not to solve the problem but to wander through unknown territory.

Contribute something.* This can be a simple as a write-up. Better might be a repository of hundreds of nanograms so others can play. [Let me know if a repository exists, otherwise I will make one]. Start contributing as soon as you have something you find kind of neat. Heed the Klingon words: Perhaps today is a good day to die; I say we ship it.

So write bizarre, parallel solutions to NP problems and share them. And have a great day!



 MORE:


 ? How can I find words in matrix of letters
 ? java string permutations and combinations lookup
 ? 4x4 Word Grid - Java
 ? Optimize boggle algorithm
 ? Algorithm to find word on Boggle board
 ? keep going out of bounds in my recursive method to navigate and find all combinations of letters in a 4X4 matrix
 ? where is mistake ? php letter matrix boggle
 ? How to create a Boggle Board from a list of Words? (reverse Boggle solver!)
 ? How to find all possible words that can be created from a list of letters
 ? What is the fastest way to match string prefixes for the building of a Radix search tree?