Search for multiple words by prefix (trie data structure)

How can I use a trie (or another data structure or algorithm) to efficiently search for multiple words by prefix?

For example: suppose this is my data set:

  • Alice Jones
  • Bob Smith
  • Bobby Walker
  • John Doe
  • (10000 names in total)

A trie data structure allows me to efficiently retrieve all names starting with "Bo" (thus without iterating over all the names). But I also want to search on the last name by prefix, thus searching for "Wa" should find "Bobby Walker". And to complicate things: when the user searches for "Bo Wa" this should also find the same name. How can I implement this? Should I use a separate trie structure for each part of the name? (And how to combine the results)?

Background: I'm writing the search functionality for a big address book (10000+ names). I want to have a really fast autocomplete function that shows results while people are typing the first few letters of the first & last name. I already have a solution that uses a regex, but it needs to iterate over all names which is to slow.


ANSWERS:


A very good data structure would be a BurstTrie. See:

There's a scala implementation here:


You could try a second trie with the reversed string and a wildcard search:


I think that a sorted array will also fit for your requirements, an array containing Person objects (they have a firstName and a lastName field). Let's say that you have a prefix and want to find all the values that fit your prefix. Simply run a binary search to find the first position (let's say is firstIndex) where your prefix appears on firstName and one more to find the last position (lastIndex). Now you can retrieve your values in O(lastIndex - firstIndex). The same goes when you want to find them by lastName. When you have a prefixFirstName and a prefixLastName you can search for the interval where values match for prefixFirstName and then, on this interval, you can check for the values matching the prefixLastName. For a conclusion, when you have one or two prefixes you run 4 binary searches (around 17 iterations per search for 100k names) which is fast enough and you can retrieve them in linear time. Even if it isn't the fastest solution, I suggested it since it's easy to understand and easy to code.



 MORE:


 ? How do I count words and lines that were input as a cstring array
 ? How do I count words and lines that were input as a cstring array
 ? How do I count words and lines that were input as a cstring array
 ? locating matched words in a string
 ? How to dynamically allocate a struct and it's TCHAR member array with calloc?
 ? How to make it so the user can't delete a dynamic array?
 ? how to use a dynamically created one-dimensional array by a two-dimension array reference only with standard library?
 ? How do I create an array with dynamic dimension sizes in C++?
 ? How to dynamically allocate array of objects using parameterized constructor?
 ? How to remove an element from a dynamically allocated array of objects with operator -= without using std::vectors?