Analysis of Lucene Index Word Frequency

Paul Wheatley
May 16, 2012

The initial results were disappointing: Lucene indexed every word, so the most frequent terms were simply those that occur commonly in plain English.
The General Service List (GSL) is a list of commonly occurring words deemed most useful to people learning English, together with their frequencies. Andrew Jackson used this list to determine how much more frequently words were used in the Lucene index than in "common English", as defined by the GSL.
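The comparison against the GSL can be illustrated with a minimal sketch. It is not the code used in the analysis: the function names, the choice to skip words missing from the baseline, and the toy counts are all assumptions made for illustration. The idea is to turn raw counts into relative frequencies in each source, then divide the index frequency by the baseline frequency, so a ratio above 1 means the word is over-represented in the index.

```python
def relative_frequencies(counts):
    """Convert raw word counts into fractions of the total count."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def overuse_ratios(index_counts, baseline_counts):
    """Ratio of a word's relative frequency in the index to its relative
    frequency in the baseline list (e.g. the GSL).

    Assumption: words absent from the baseline are skipped, because no
    ratio can be computed for them.
    """
    index_rel = relative_frequencies(index_counts)
    baseline_rel = relative_frequencies(baseline_counts)
    return {
        word: freq / baseline_rel[word]
        for word, freq in index_rel.items()
        if word in baseline_rel
    }


if __name__ == "__main__":
    # Toy counts invented for illustration; not taken from the analysis.
    index_counts = {"the": 500, "archive": 300, "file": 200}
    baseline_counts = {"the": 900, "file": 100}
    ratios = overuse_ratios(index_counts, baseline_counts)
    # Print the most over-represented words first.
    for word, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{ratio:.2f}")
```

Sorting by the ratio rather than by raw count pushes ordinary English words down the list and brings terms that are characteristic of the indexed collection to the top.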
Solution champion: Andrew Jackson
Git link: The analysis results (spreadsheets, CSV files, Lucene index, etc.) have been checked into Git here: [|]
| *Evaluation* | |