RavenDB is a second generation document database. This means that you can throw typeless documents into a data store, but the only way to query them is through indexes that are built with Lucene.Net. RavenDB is a wonderful product whose primary strengths are its simplicity and ease of use. In keeping with that theme, even when you need to customize RavenDB, it makes that relatively easy to do.
So, let's talk about customizing your Lucene.Net analyzer in RavenDB!
RavenDB comes equipped with all of the analyzers that are built into Lucene.Net. For the vast majority of use cases, these will do the job! Here is how a few of them handle the following sample text:
"The fox jumped over the lazy dogs, Bob@hotmail.com 123432."
StandardAnalyzer, which is Lucene's default, will produce the following tokens:
[fox] [jumped] [over] [lazy] [dogs] [bob@hotmail.com] [123432]
SimpleAnalyzer will tokenize on all non-alpha characters, and will make all the tokens lowercase:
[the] [fox] [jumped] [over] [the] [lazy] [dogs] [bob] [hotmail] [com]
WhitespaceAnalyzer will just tokenize on white spaces:
[The] [fox] [jumped] [over] [the] [lazy] [dogs,] [Bob@hotmail.com] [123432.]
In order to resolve an issue with indexing file names (details below), I found myself in need of an Alphanumeric analyzer. This analyzer would be similar to the SimpleAnalyzer, but would still respect numeric values.
AlphanumericAnalyzer will tokenize on any character that fails the .NET Framework's Char.IsLetterOrDigit, then lowercase the tokens and drop stop words:
[fox] [jumped] [over] [lazy] [dogs] [bob] [hotmail] [com] [123432]
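To make the difference from SimpleAnalyzer concrete, here is a small sketch that does not use Lucene.Net at all. It only shows how the choice of character test changes where a string is split. The TokenCharDemo class, its Split helper, and the file name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TokenCharDemo
{
    // Splits the input into runs of characters accepted by isTokenChar.
    public static List<string> Split(string input, Func<char, bool> isTokenChar)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in input)
        {
            if (isTokenChar(c))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                // A non-token character ends the current token.
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }

    public static void Main()
    {
        const string input = "report2013_final.pdf"; // hypothetical file name

        // Letters only: the digits act as separators and are lost.
        Console.WriteLine(string.Join(" ", Split(input, char.IsLetter)));
        // prints: report final pdf

        // Letters or digits: the number stays attached to its token.
        Console.WriteLine(string.Join(" ", Split(input, char.IsLetterOrDigit)));
        // prints: report2013 final pdf
    }
}
```

The second call is the behavior the Alphanumeric analyzer is after: digits survive tokenization instead of being treated as separators.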
Lucene.Net's base classes made this pretty easy to build...
How to Implement a Custom Analyzer
Grab all the code and more from GitHub:
A Lucene analyzer is made of two basic parts: 1) a tokenizer, and 2) a series of filters. The tokenizer does the lion's share of the work and splits the input apart, then the filters run in succession, making additional tweaks to the tokenized output.
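The tokenizer-then-filters flow can be sketched in plain C# without Lucene.Net. This is only a model of the idea, not how Lucene.Net implements token streams. The AnalyzerPipelineSketch class, the TokenFilter alias, and the three-word stop list are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using TokenFilter = System.Func<
    System.Collections.Generic.IEnumerable<string>,
    System.Collections.Generic.IEnumerable<string>>;

public static class AnalyzerPipelineSketch
{
    // The "tokenizer": split the input on runs of non-alphanumeric characters.
    public static IEnumerable<string> Tokenize(string input)
    {
        return Regex.Split(input, @"[^\p{L}\p{Nd}]+").Where(t => t.Length > 0);
    }

    // A "filter" takes a token stream and returns an adjusted token stream.
    public static IEnumerable<string> LowerCase(IEnumerable<string> tokens)
    {
        return tokens.Select(t => t.ToLowerInvariant());
    }

    public static TokenFilter RemoveStopWords(ISet<string> stopWords)
    {
        return tokens => tokens.Where(t => !stopWords.Contains(t));
    }

    // Run the tokenizer once, then apply each filter in succession.
    public static List<string> Analyze(string input, IEnumerable<TokenFilter> filters)
    {
        return filters.Aggregate(Tokenize(input), (stream, filter) => filter(stream)).ToList();
    }

    public static void Main()
    {
        // Hypothetical stop word subset, for illustration only.
        var stopWords = new HashSet<string> { "the", "a", "an" };
        var filters = new TokenFilter[] { LowerCase, RemoveStopWords(stopWords) };

        var tokens = Analyze("The fox jumped over the lazy dogs, Bob@hotmail.com 123432.", filters);
        Console.WriteLine(string.Join(" ", tokens.Select(t => "[" + t + "]")));
        // prints: [fox] [jumped] [over] [lazy] [dogs] [bob] [hotmail] [com] [123432]
    }
}
```

The order of the filters matters: the stop list is lowercase, so the lowercase filter has to run before the stop word filter or "The" would slip through.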
To create the Alphanumeric Analyzer we need only create two classes: an analyzer and a tokenizer. After that, the analyzer can reuse the existing LowerCaseFilter and StopFilter classes.
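    // Lucene.Net 3.0.x API; Version here is Lucene.Net.Util.Version.
    public sealed class AlphanumericAnalyzer : Analyzer
    {
        private readonly bool _enableStopPositionIncrements;
        private readonly ISet<string> _stopSet;

        public AlphanumericAnalyzer(Version matchVersion, ISet<string> stopWords)
        {
            _enableStopPositionIncrements = StopFilter.GetEnablePositionIncrementsVersionDefault(matchVersion);
            _stopSet = stopWords;
        }

        public override TokenStream TokenStream(String fieldName, TextReader reader)
        {
            // Tokenize first, then lowercase, then drop stop words.
            TokenStream tokenStream = new AlphanumericTokenizer(reader);
            tokenStream = new LowerCaseFilter(tokenStream);
            return new StopFilter(_enableStopPositionIncrements, tokenStream, _stopSet);
        }
    }

    public class AlphanumericTokenizer : CharTokenizer
    {
        public AlphanumericTokenizer(TextReader input) : base(input) { }

        // Letters and digits belong to a token; everything else separates tokens.
        protected override bool IsTokenChar(char c) { return Char.IsLetterOrDigit(c); }
    }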
How to Install Plugins in RavenDB
Installing a custom plugin in RavenDB is unbelievably easy. Just compile your assembly, and then drop it into the Plugins folder at the root of your RavenDB server. You may then reference the analyzers in your indexes by their assembly-qualified names.
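As a rough sketch of that last step, an index definition might point a field at the custom analyzer like this. It assumes the RavenDB 2.x/3.x-era .NET client (AbstractIndexCreationTask with its Indexes and Analyzers dictionaries). The FileRecord class, the index name, and the MyCompany.Plugins namespace and assembly name are placeholders, so substitute whatever your own assembly actually uses.

```csharp
using System.Linq;
using Raven.Abstractions.Indexing;
using Raven.Client.Indexes;

// Hypothetical document type, for illustration only.
public class FileRecord
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class FileRecords_ByName : AbstractIndexCreationTask<FileRecord>
{
    public FileRecords_ByName()
    {
        Map = files => from file in files
                       select new { file.Name };

        // Run the Name field through an analyzer instead of indexing it verbatim.
        Indexes.Add(x => x.Name, FieldIndexing.Analyzed);

        // Assembly-qualified name: "Namespace.TypeName, AssemblyName".
        // Both parts are placeholders for your plugin's real names.
        Analyzers.Add(x => x.Name, "MyCompany.Plugins.AlphanumericAnalyzer, MyCompany.Plugins");
    }
}
```

The server resolves that string to a type at indexing time, so the assembly has to be sitting in the server's Plugins folder before the index is created.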
Again, you can grab all of the code and more over on GitHub: