How can I find words in matrix of letters

This is another question I was asked in the telephonic interview:

Given a dictionary and a crossword(2d matrix of characters) find all the dictionary words which can be found in the crossword.

All I could think of was hash the dictionary, find every possible word in the crossword and search the hash. I could not optimize it all.

Must admit Microsoft interview questions are tough :(

Please give me the lines to think on.


ANSWERS:


How about:

  • build a search tree for the dictionary (one level per letter)
  • for each position in the grid, start searching through the index, one character at a time, and proceed in each allowed direction until there are no entries left in the index, or you hit a grid boundary

I don't think a hash would be a very useful optimization here.


The most appropriate solution depends a lot on the constraints you expect to be dealing with. How large is your dictionary? How large is your crossword.

I'd suggest taking a look at Suffix trees. You can insert all the dictionary words into one. Then search the suffix tree for the rows, columns and diagonals. For the rows, start a search from the root of the tree for the first letter in each row and iterate through the tree as you pass through the row. Do the same from right to left if necessary. Similar story for columns and diagonals.

Tree construction is O(N) and consumes O(N) space, where N is the size of your dictionary in characters. Searching will then take O(PQ) time, where your crossword is of size PxQ. Giving an overall runtime of O(N+PQ) and space of O(N).

The thing is, though, suffix trees are a pain to implement. They really are. So you might prefer settling for a simple Trie, which will give you a total runtime of O(N+PQ(max(P,Q)).


This question is exactly how one would play Boggle.

This past question in SO more than sufficiently ANSWERS THIS QUESTION.

Have fun...


What was wrong with your answer?

  • A dictionary is sorted so I think I'd arrange the dictionary words into a prefix trie. This would help since there are probably lots of words where the prefix is also a word. The sorting helps (minimally) with the build time.

  • Then walk the crossword looking at all possible words. As you extract the characters of a potential word, you are walking down the trie - so you'll find the first word starting with a certain set of characters, but also be in the right place to continue to find other words starting with the same characters


Suppose the dictionary contains n words of average length k, and the matrix contains m² characters.

  1. Preprocess the dictionary into a prefix tree (aka trie). — O(kn)
  2. For each position in the matrix, look up the strings across and down in the trie. — O(m³)

Total time: O(max(kn, m³))

In realistic word-searches, the average length of words found in the matrix is more like k than like m, so the time taken would be O(k max(n, m²)).


I would compile the dictionary into a DFA that recognizes words in the dictionary, then run it over the rows and columns and diagonals of the letter matrix. Should be O(m+n) where m is the length of the dictionary in characters and n is the area (w*h) of the matrix.



 MORE:


 ? java string permutations and combinations lookup
 ? 4x4 Word Grid - Java
 ? Optimize boggle algorithm
 ? Algorithm to find word on Boggle board
 ? keep going out of bounds in my recursive method to navigate and find all combinations of letters in a 4X4 matrix
 ? where is mistake ? php letter matrix boggle
 ? How to create a Boggle Board from a list of Words? (reverse Boggle solver!)
 ? How to find all possible words that can be created from a list of letters
 ? What is the fastest way to match string prefixes for the building of a Radix search tree?
 ? Depth first search or backtrack recursion for finding all possible combination of letters in a crossword puzzle/boggle board?