NexusDB delivers a modular, flexible generic framework for indexing "tokens" that can be extended in any direction you can imagine. A token is defined as a sequence of Unicode characters.
In the first step, so-called "token extractors" are responsible for turning a field into a stream of tokens. So far there is one token extractor class, which returns the contents of string and memo fields as a single token. You will be able to write your own token extractors and register them with the server engine. One of our customers, for example, has written a token extractor that uses XPath expressions to extract token lists from XML documents.
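The extraction step can be illustrated with a minimal sketch in Python. This is not the NexusDB API: the class names WholeFieldExtractor and XPathExtractor and the extract method are hypothetical, and the XML variant relies on the limited XPath subset of Python's standard library rather than on whatever the customer implementation uses.

```python
import xml.etree.ElementTree as ET
from typing import Iterator


class WholeFieldExtractor:
    """Hypothetical extractor: returns the whole field content as one token."""

    def extract(self, field_value: str) -> Iterator[str]:
        if field_value:
            yield field_value


class XPathExtractor:
    """Hypothetical extractor: yields the text of each element matched by a path.

    Uses the limited XPath subset supported by xml.etree.ElementTree.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def extract(self, field_value: str) -> Iterator[str]:
        root = ET.fromstring(field_value)
        for element in root.findall(self.path):
            if element.text:
                yield element.text


doc = "<book><title>Indexing tokens</title><tag>search</tag><tag>index</tag></book>"

whole = list(WholeFieldExtractor().extract("Hello token world"))
tags = list(XPathExtractor("./tag").extract(doc))
```

Here whole contains the field as a single token, while tags contains one token per matching tag element.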
In the next step the tokens are sent through a chain of token filters, e.g. splitting at specific characters or Unicode character categories, upper/lower case conversion, stop words, aliases, and so on. Each filter in the chain can transform an input token into any number of output tokens.
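A filter chain of this kind might look roughly like the following Python sketch. All function names are assumptions made for illustration, not NexusDB identifiers. Each filter maps one input token to zero, one or several output tokens, and the chain applies the filters in order.

```python
import re
from typing import Callable, Dict, Iterable, List, Set

# A filter takes one token and returns any number of output tokens.
TokenFilter = Callable[[str], Iterable[str]]


def split_on_separators(token: str) -> List[str]:
    # Split at runs of non-word characters and drop empty pieces.
    return [part for part in re.split(r"\W+", token) if part]


def lower_case(token: str) -> List[str]:
    return [token.lower()]


def make_stop_word_filter(stop_words: Set[str]) -> TokenFilter:
    # Stop words produce no output token at all.
    def apply(token: str) -> List[str]:
        return [] if token in stop_words else [token]
    return apply


def make_alias_filter(aliases: Dict[str, List[str]]) -> TokenFilter:
    # A token with aliases produces itself plus each alias.
    def apply(token: str) -> List[str]:
        return [token] + aliases.get(token, [])
    return apply


def run_chain(tokens: Iterable[str], filters: List[TokenFilter]) -> List[str]:
    stream = list(tokens)
    for token_filter in filters:
        stream = [out for token in stream for out in token_filter(token)]
    return stream


chain = [
    split_on_separators,
    lower_case,
    make_stop_word_filter({"the", "a"}),
    make_alias_filter({"db": ["database"]}),
]
result = run_chain(["The fast DB, a token index"], chain)
# result == ["fast", "db", "database", "token", "index"]
```

The order of the filters matters: the stop word and alias filters in this sketch expect tokens that have already been split and lower-cased.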
In the last step this token stream is fed into token indices. There are currently two implementations of a token index. The first feeds the tokens into a normal index, resulting in a number of keys per record in that index (one key per token contained), which makes it possible to use FindKey/SetRange directly on that index to find records containing specific tokens. This engine is included in the current Developer Edition (version 2).
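The idea behind the first token index can be sketched in Python as an ordered list of (token, record) keys. This is an illustration under stated assumptions: the class KeyTokenIndex and its methods are hypothetical, and find_records and range_records only mimic the kind of lookup that FindKey and SetRange perform on a real index.

```python
import bisect
from typing import Iterable, List, Tuple


class KeyTokenIndex:
    """Hypothetical ordered index holding one key per distinct token per record."""

    def __init__(self) -> None:
        self._keys: List[Tuple[str, int]] = []  # kept sorted by (token, record_id)

    def add_record(self, record_id: int, tokens: Iterable[str]) -> None:
        for token in set(tokens):
            key = (token, record_id)
            pos = bisect.bisect_left(self._keys, key)
            if pos == len(self._keys) or self._keys[pos] != key:
                self._keys.insert(pos, key)

    def range_records(self, low: str, high: str) -> List[int]:
        # Range scan: position at the first key with token >= low,
        # then walk forward while token <= high.
        pos = bisect.bisect_left(self._keys, (low,))
        found = set()
        while pos < len(self._keys) and self._keys[pos][0] <= high:
            found.add(self._keys[pos][1])
            pos += 1
        return sorted(found)

    def find_records(self, token: str) -> List[int]:
        # Exact match is a range whose bounds are the same token.
        return self.range_records(token, token)


index = KeyTokenIndex()
index.add_record(1, ["token", "index"])
index.add_record(2, ["token", "filter"])
index.add_record(3, ["bitmap"])

exact = index.find_records("token")        # records 1 and 2
ranged = index.range_records("a", "g")     # "bitmap" and "filter": records 2 and 3
```

Because every token is an ordinary key, no special search machinery is needed; the cost is that a record with many tokens produces many keys.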
The second token index engine will use one bit array per token to record which records contain that token. This makes it possible to evaluate complex search expressions very quickly and efficiently, returning the list of records that match the expression. If you want to display that information in a grid, a result set will have to be built. This engine will be included in version 3.
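Why bit arrays make expression searches cheap can be seen in a small Python sketch. It assumes record numbers can serve as bit positions and uses a Python integer as the bit array; the class BitmapTokenIndex and its methods are hypothetical and say nothing about how the version 3 engine is actually implemented.

```python
from typing import Dict, Iterable, List


class BitmapTokenIndex:
    """Hypothetical index: one bit array per token, bit n set if record n has it."""

    def __init__(self) -> None:
        self._bits: Dict[str, int] = {}
        self._all_records = 0  # bit array of every record seen so far

    def add_record(self, record_no: int, tokens: Iterable[str]) -> None:
        mask = 1 << record_no
        self._all_records |= mask
        for token in tokens:
            self._bits[token] = self._bits.get(token, 0) | mask

    def has(self, token: str) -> int:
        return self._bits.get(token, 0)

    def negate(self, bits: int) -> int:
        # NOT is restricted to records that actually exist.
        return self._all_records & ~bits

    @staticmethod
    def records(bits: int) -> List[int]:
        # Turn a bit array back into a list of record numbers.
        result = []
        record_no = 0
        while bits:
            if bits & 1:
                result.append(record_no)
            bits >>= 1
            record_no += 1
        return result


idx = BitmapTokenIndex()
idx.add_record(0, ["nexus", "index"])
idx.add_record(1, ["nexus", "token"])
idx.add_record(2, ["token", "filter"])
idx.add_record(3, ["index", "token"])

# token AND (nexus OR index) AND NOT filter
bits = idx.has("token") & (idx.has("nexus") | idx.has("index")) & idx.negate(idx.has("filter"))
matches = BitmapTokenIndex.records(bits)
```

Each AND, OR and NOT in the expression becomes a single bitwise operation over whole bit arrays, and the list of matching records is only materialised at the end, which corresponds to building the result set for a grid.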
The design is modular enough that any other kind of token index you can imagine can be implemented easily.
Every element in this generic indexing framework can be extended or replaced by deriving your own classes and registering them. Because the indexing takes place directly in the engine core, it is always updated in real time, does not increase network traffic, and participates correctly in transactions and nested transactions.