Adapting Lucene scoring for an n-gram index

October 24, 2012

At Ginger we use a large index of n-grams: each entry is basically a sequence of words together with its frequency in our corpus. We wanted to make this index searchable, so naturally we defaulted to Lucene, which is the most popular open-source IR (information retrieval) library. This is how we started adding documents to the index:

    Document document = new Document();
    // Store the n-gram text and run it through the analyzer so it can be searched.
    document.add(new Field("ngram", ngram, Field.Store.YES, Field.Index.ANALYZED));
    // Store the frequency as a numeric field; the third argument (true) also indexes it.
    NumericField frequencyField = new NumericField("frequency", Field.Store.YES, true);
    ...
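For readers who want a concrete picture of the (n-gram, frequency) pairs being indexed here, the following is a minimal sketch of how such pairs could be produced from raw text. It is not Ginger's actual pipeline: the class name `NgramCounter`, the method `count`, and the naive whitespace tokenization are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not from the original post: counts word n-grams in a text.
final class NgramCounter {

    // Maps each run of n consecutive words to the number of times it occurs in the text.
    static Map<String, Integer> count(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> frequencies = new LinkedHashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return frequencies;
        }
        // Assumption for this sketch: lowercase the text and split on whitespace.
        String[] words = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        // Slide a window of n words over the text and count each window.
        for (int i = 0; i + n <= words.length; i++) {
            String ngram = String.join(" ", Arrays.copyOfRange(words, i, i + n));
            frequencies.merge(ngram, 1, Integer::sum);
        }
        return frequencies;
    }
}
```

For example, `NgramCounter.count("the cat sat on the cat mat", 2)` maps "the cat" to 2 and each of the other four bigrams to 1. Each resulting entry corresponds to one document of the kind added to the index, with the key going into the "ngram" field and the count into the "frequency" field.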
