Archive for April, 2008

Minimal Lucene.Net example

Tuesday, April 15th, 2008

About a year ago we had to incorporate a Lucene.Net index into an application because regular SQL queries against the database could not reach acceptable speeds. The database was a 4 GB monster, poorly normalised and full of NULLs, which would eat our queries for breakfast and spit out time-outs in return. We had to build a Google-like search over the data, spanning a lot of columns and including full-text search. Using Lucene.Net to index the searchable columns of this database enabled us to search the data within seconds and, in some cases, even to work without the actual database, because all the data that had to be shown in the search results could be extracted from the index.

Lucene.Net has gained exposure rapidly, and several articles and tutorials can be found on how to implement it. Quite some time ago I made a minimal example of how to create an index and search it. I thought I’d share it; maybe it’ll help someone with their first steps in the Lucene.Net world.

The minimal Lucene.Net example creates an index of all the postal codes of Belgium (the data is read from a .csv file). The postal codes can be searched and the results are shown in a list. Nothing more, nothing less.

Some extra information:

Analyzer analyzer = new StandardAnalyzer();

An analyzer is used when indexing raw text to transform it into searchable terms. The StandardAnalyzer splits the text into tokens, lowercases them and removes frequently used words like “the”, “in”, “a”, “and”, “of”.

IndexWriter writer = new IndexWriter(indexFolder, analyzer, true);

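An IndexWriter is used for creating the index and adding/removing items to/from it. The analyzer passed to it is used when adding data to the index, and the last argument (true) tells the writer to create a new index, overwriting any existing one in that folder.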

Document document = new Document();
 
document.Add(new Field(POSTALCODECOLUMN, parts[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
document.Add(new Field(CITYCOLUMN, parts[1], Field.Store.YES, Field.Index.UN_TOKENIZED));
 
writer.AddDocument(document);

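A document is like a virtual record that contains the searchable fields. A field can be specified more than once: if a city for some reason has multiple postal codes, they can be added under the same field name, each time with a different postal code. Because the fields here are indexed UN_TOKENIZED, each value is indexed as a single term exactly as it was read, without passing through the analyzer.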
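As a rough illustration of the multi-valued case, the sketch below adds several postal codes under the same field name to one document. The class name, the method name and the constant values are made up for the illustration; only the Document and Field calls mirror the snippet above.

```csharp
using Lucene.Net.Documents;

public static class MultiValueFieldSketch
{
    // Hypothetical field names; the example project defines its own constants.
    private const string POSTALCODECOLUMN = "postalcode";
    private const string CITYCOLUMN = "city";

    // Builds one document for a city that has several postal codes.
    public static Document BuildCityDocument(string city, string[] postalCodes)
    {
        Document document = new Document();
        document.Add(new Field(CITYCOLUMN, city, Field.Store.YES, Field.Index.UN_TOKENIZED));

        // Adding the same field name repeatedly gives the document
        // several values for that field.
        foreach (string postalCode in postalCodes)
        {
            document.Add(new Field(POSTALCODECOLUMN, postalCode, Field.Store.YES, Field.Index.UN_TOKENIZED));
        }

        return document;
    }
}
```

Calling BuildCityDocument with a city name and, say, two made-up postal codes yields a single document that can be found through either code; document.GetValues(POSTALCODECOLUMN) should then return both stored values.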

writer.Optimize();

After adding the documents the writer is optimized, which rewrites the entire index by merging all segment files into one, greatly reducing the physical size of the index and improving search speed.
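The snippets above only cover the indexing side. Searching the finished index could look roughly like the sketch below. This is not the code from the zip-file: the class name, the method name and the constant values are assumptions, and it simply uses an exact TermQuery because the fields were indexed UN_TOKENIZED.

```csharp
using System;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class PostalCodeSearchSketch
{
    // Hypothetical field names; they must match the names used while indexing.
    private const string POSTALCODECOLUMN = "postalcode";
    private const string CITYCOLUMN = "city";

    // Prints every indexed entry whose postal code equals the given value.
    public static void PrintCities(string indexFolder, string postalCode)
    {
        IndexSearcher searcher = new IndexSearcher(indexFolder);
        try
        {
            // UN_TOKENIZED fields are indexed as one exact term,
            // so an exact TermQuery is enough to match them.
            Query query = new TermQuery(new Term(POSTALCODECOLUMN, postalCode));
            Hits hits = searcher.Search(query);

            for (int i = 0; i < hits.Length(); i++)
            {
                // Both fields were stored (Field.Store.YES), so their values
                // can be read back from the index without the original data.
                Document document = hits.Doc(i);
                Console.WriteLine("{0} {1}", document.Get(POSTALCODECOLUMN), document.Get(CITYCOLUMN));
            }
        }
        finally
        {
            searcher.Close();
        }
    }
}
```

Note that an exact term search only finds complete values. A search on part of a city name would need a different query type or a tokenized field.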

The example is a VS2005 solution and uses Lucene.Net version 2.0.0.4 (included in the zip-file).

Minimal Lucene Example

Mystery tab #13119

Monday, April 14th, 2008

For a couple of weeks I had this #13119 tab in the toolbox of Visual Studio. The only fix I found was to go into the C:\Documents and Settings\[UserName]\Local Settings\Application Data\Microsoft\VisualStudio\9.0 folder (replace [UserName] with your username) and delete the four files which have Toolbox in their name (toolbox.tbd, toolboxIndex.tbd, toolbox_reset.tbd, toolboxIndex_reset.tbd). Visual Studio regenerates these when you start it up. The full thread on MSDN can be found here.

Buildserver upgrade

Monday, April 7th, 2008

We recently started the move from Visual Studio 2005 to 2008, and since then our buildserver has had problems with the upgraded projects.

The first error I ran into was: File format version is not recognized. MSBuild can only read solution files between versions 7.0 and 9.0, inclusive. Our server runs CruiseControl v1.3.0.2958, which by default targets the 2.0 framework, so you need to point your MSBuild task at the new 3.5 version. You do this with the executable node, as illustrated below:

<msbuild>
<executable>C:\WINDOWS\Microsoft.NET\Framework\v3.5\MSBuild.exe</executable>
...
</msbuild>

Second error: The imported project “C:\Program Files\MSBuild\Microsoft\VisualStudio\v9.0\WebApplications\Microsoft.WebApplication.targets” was not found. To fix this one, look for that file on your development machine; with a regular install it is in the same location as stated in the error. Then recreate the same folder structure on your buildserver and copy the file there.

Third error: C:\WINDOWS\Microsoft.NET\Framework\v3.5\Microsoft.Common.targets (1734,9): error MSB3091: Task failed because “LC.exe” was not found, or the correct Microsoft Windows SDK is not installed. The actual error message is longer than what I’ve put here, since it lists 4 ways to solve it. The easiest and probably fastest way is to download the Windows SDK for Windows Server 2008 and install it on your buildserver; you can find it here. It is an ISO file of 1.3 GB, so make sure you have a fast connection.

Now your server has been successfully upgraded ;) .

Cleanup Folder with SDC Tasks Library

Monday, April 7th, 2008

I had to automate the deletion of the contents of a folder for one of our projects on the buildserver. One Google search later I ran into the SDC Tasks Library (2.1.3009.0), which had just that. The included pdf file describes how to make the library available in your build script. I followed the instructions but kept running into build errors: the Cleanup task was not found by MSBuild. It turned out I needed to add a trailing backslash to the path in the TasksPath property. The final configuration is:

<!-- SDC MSBuild Tools -->
<PropertyGroup>
<TasksPath>$(MSBuildExtensionsPath)\Sdc\</TasksPath>
</PropertyGroup>
<Import Project="$(TasksPath)\Microsoft.Sdc.Common.tasks"/>
<!-- SDC MSBuild Tools -->

But even with this corrected, the Cleanup folder task was still not found. I opened their tasks file in Notepad and searched for the Cleanup task, but couldn’t find it. Hmm, maybe I’m going blind, so let’s use ctrl+f. Again no results. So I edited the file and added this entry to the list of tasks:

<UsingTask AssemblyFile="$(TasksPath)Microsoft.Sdc.Tasks.dll" TaskName="Microsoft.Sdc.Tasks.Folder.CleanFolder"/>

I ran MSBuild again, and this time it was successful. For anyone who doesn’t want to edit the file, I’ve attached it.

Microsoft.Sdc.Common.tasks (40,21 kb)