This is another question I was asked in the telephonic interview:
Given a dictionary and a crossword(2d matrix of characters) find all the dictionary words which can be found in the crossword.
All I could think of was hash the dictionary, find every possible word in the crossword and search the hash. I could not optimize it all.
Must admit Microsoft interview questions are tough :(
Please give me the lines to think on.
- build a search tree for the dictionary (one level per letter)
- for each position in the grid, start searching through the index, one character at a time, and proceed in each allowed direction until there are no entries left in the index, or you hit a grid boundary
I don't think a hash would be a very useful optimization here.
The most appropriate solution depends a lot on the constraints you expect to be dealing with. How large is your dictionary? How large is your crossword.
I'd suggest taking a look at Suffix trees. You can insert all the dictionary words into one. Then search the suffix tree for the rows, columns and diagonals. For the rows, start a search from the root of the tree for the first letter in each row and iterate through the tree as you pass through the row. Do the same from right to left if necessary. Similar story for columns and diagonals.
Tree construction is O(N) and consumes O(N) space, where N is the size of your dictionary in characters. Searching will then take O(PQ) time, where your crossword is of size PxQ. Giving an overall runtime of O(N+PQ) and space of O(N).
The thing is, though, suffix trees are a pain to implement. They really are. So you might prefer settling for a simple Trie, which will give you a total runtime of O(N+PQ(max(P,Q)).
This question is exactly how one would play Boggle.
This past question in SO more than sufficiently ANSWERS THIS QUESTION.
What was wrong with your answer?
A dictionary is sorted so I think I'd arrange the dictionary words into a prefix trie. This would help since there are probably lots of words where the prefix is also a word. The sorting helps (minimally) with the build time.
Then walk the crossword looking at all possible words. As you extract the characters of a potential word, you are walking down the trie - so you'll find the first word starting with a certain set of characters, but also be in the right place to continue to find other words starting with the same characters
Suppose the dictionary contains n words of average length k, and the matrix contains m² characters.
- Preprocess the dictionary into a prefix tree (aka trie). — O(kn)
- For each position in the matrix, look up the strings across and down in the trie. — O(m³)
Total time: O(max(kn, m³))
In realistic word-searches, the average length of words found in the matrix is more like k than like m, so the time taken would be O(k max(n, m²)).
I would compile the dictionary into a DFA that recognizes words in the dictionary, then run it over the rows and columns and diagonals of the letter matrix. Should be
m is the length of the dictionary in characters and
n is the area (w*h) of the matrix.