Minimal Lucene.Net example

Tuesday, April 15th, 2008

About a year ago we had to incorporate a Lucene.Net index into an application because regular SQL queries against the database could not reach acceptable speeds. The database was a 4 GB monster, poorly normalised and full of NULLs, which would eat our queries for breakfast and spit out time-outs in return. We had to build a Google-like search spanning many columns, including full-text search. Indexing the searchable columns with Lucene.Net enabled us to search the data within seconds and, in some cases, to work without the actual database at all, because all the data shown in the search results could be extracted from the index.

Lucene.Net has gained rapid exposure, and several articles and tutorials on how to implement it can be found. Quite some time ago I made a minimal example of how to create an index and search it. I thought I’d share it; maybe it’ll help someone with their first steps in the Lucene.Net world.

The minimal Lucene.Net example creates an index of all the postal codes of Belgium (the data is read from a .csv file). The postal codes can then be searched, and the results are shown in a list. Nothing more, nothing less.

Some extra information:

Analyzer analyzer = new StandardAnalyzer();

An analyzer is used when indexing raw text to transform it into searchable terms. The StandardAnalyzer splits the text into tokens, lowercases them and removes frequently used words (stop words) like “the”, “in”, “a”, “and”, “of”.
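To see what the analyzer does to a piece of text, its token stream can be read directly. The following is a minimal sketch, assuming the Lucene.Net 2.0 API that ships with the example (`TokenStream.Next()` and `Token.TermText()`); the class name, the helper `Analyze` and the sample sentence are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

public class AnalyzerDemo
{
    // Hypothetical helper: returns the terms the StandardAnalyzer produces for a text.
    public static List<string> Analyze(string text)
    {
        Analyzer analyzer = new StandardAnalyzer();

        // The field name ("text") is arbitrary here; it only matters for
        // analyzers that treat fields differently.
        TokenStream stream = analyzer.TokenStream("text", new StringReader(text));

        List<string> terms = new List<string>();
        Token token;
        // Next() returns null once the stream is exhausted (Lucene.Net 2.0 API).
        while ((token = stream.Next()) != null)
        {
            terms.Add(token.TermText());
        }
        stream.Close();
        return terms;
    }

    public static void Main()
    {
        // Sample sentence chosen for illustration only.
        foreach (string term in Analyze("The city of Gent"))
        {
            Console.WriteLine(term);
        }
    }
}
```

The stop words should be missing from the printed terms and the remaining terms should be lowercased. Keep in mind that the analyzer is only applied to tokenized fields; a field indexed as UN_TOKENIZED is stored as a single term, exactly as given.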

IndexWriter writer = new IndexWriter(indexFolder, analyzer, true);

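An IndexWriter is used for creating the index and adding documents to it. The analyzer passed in is used when adding data to the index, and the last argument (true) creates a new index, replacing any index that already exists in the folder. In Lucene.Net 2.0, removing documents is done through an IndexReader rather than through the IndexWriter.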
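When an index should be extended rather than rebuilt on every run, the create flag can be chosen at run time. This is only a sketch, assuming the Lucene.Net 2.0 API (`IndexReader.IndexExists` and the three-argument `IndexWriter` constructor); the class `WriterFactory` and the method `OpenWriter` are hypothetical names and not part of the example solution.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;

public class WriterFactory
{
    // Hypothetical helper: opens a writer on indexFolder and creates
    // the index only when the folder does not contain one yet.
    public static IndexWriter OpenWriter(string indexFolder)
    {
        Analyzer analyzer = new StandardAnalyzer();

        // create == true  : start a new index, replacing an existing one.
        // create == false : open the existing index and append to it.
        bool create = !IndexReader.IndexExists(indexFolder);

        return new IndexWriter(indexFolder, analyzer, create);
    }
}
```

The caller is responsible for calling `writer.Close()` when it is done, so that the changes are flushed and the index lock is released.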

Document document = new Document();
 
document.Add(new Field(POSTALCODECOLUMN, parts[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
document.Add(new Field(CITYCOLUMN, parts[1], Field.Store.YES, Field.Index.UN_TOKENIZED));
 
writer.AddDocument(document);

A document is like a virtual record that contains the searchable fields. A field can be added more than once: if a city has multiple postal codes for some reason, each one can be added under the same field name with a different value.
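Adding the same field several times could look like the sketch below. It assumes the Lucene.Net 2.0 API (`Document.GetValues` returns all values stored under a field name); the constant values, the helper `CreateCityDocument` and the sample data are assumptions for illustration and do not come from the example solution.

```csharp
using System;
using Lucene.Net.Documents;

public class MultiValueDemo
{
    // Hypothetical field names; the values of the original constants are not shown.
    public const string POSTALCODECOLUMN = "postalcode";
    public const string CITYCOLUMN = "city";

    // Builds one document for a city and adds one postal code field per code.
    public static Document CreateCityDocument(string city, string[] postalCodes)
    {
        Document document = new Document();
        document.Add(new Field(CITYCOLUMN, city, Field.Store.YES, Field.Index.UN_TOKENIZED));

        foreach (string postalCode in postalCodes)
        {
            // Same field name every time, different value.
            document.Add(new Field(POSTALCODECOLUMN, postalCode, Field.Store.YES, Field.Index.UN_TOKENIZED));
        }
        return document;
    }

    public static void Main()
    {
        // Sample values for illustration only.
        Document document = CreateCityDocument("Gent", new string[] { "9000", "9030" });

        // GetValues returns every stored value of the field.
        foreach (string postalCode in document.GetValues(POSTALCODECOLUMN))
        {
            Console.WriteLine(postalCode);
        }
    }
}
```

A search on either postal code then finds the same document.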

writer.Optimize();

After the documents have been added, the writer is optimized. This rewrites the entire index by merging all segment files into one, which greatly reduces the physical size of the index and improves search speed.
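The snippets above only cover indexing. Searching the finished index might look roughly like the sketch below, assuming the Lucene.Net 2.0 API (`IndexSearcher`, `TermQuery`, `Hits`) and assuming the writer has been closed with `writer.Close()` before the searcher is opened. The class name, the helper `FindCities`, the constant values and the folder name are hypothetical; the search code in the example solution may well differ.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public class PostalCodeSearch
{
    // Hypothetical field names; they must match the names used at indexing time.
    public const string POSTALCODECOLUMN = "postalcode";
    public const string CITYCOLUMN = "city";

    // Returns the stored city names of all documents with exactly this postal code.
    public static List<string> FindCities(string indexFolder, string postalCode)
    {
        List<string> cities = new List<string>();
        IndexSearcher searcher = new IndexSearcher(indexFolder);
        try
        {
            // UN_TOKENIZED fields are indexed as a single term,
            // so the query has to match the whole value.
            Query query = new TermQuery(new Term(POSTALCODECOLUMN, postalCode));
            Hits hits = searcher.Search(query);

            for (int i = 0; i < hits.Length(); i++)
            {
                Document document = hits.Doc(i);
                // Get works because the field was added with Field.Store.YES.
                cities.Add(document.Get(CITYCOLUMN));
            }
        }
        finally
        {
            searcher.Close();
        }
        return cities;
    }

    public static void Main()
    {
        // "index" and "9000" are placeholder values.
        foreach (string city in FindCities("index", "9000"))
        {
            Console.WriteLine(city);
        }
    }
}
```

Because the city is stored in the index, the result list can be filled without going back to the original data, which is the same trick that made it possible to skip the database in the application described at the start.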

The example is a VS2005 solution and uses Lucene.Net version 2.0.0.4 (included in the zip-file).

Minimal Lucene Example