Analysis of Lucene Index Word Frequency

Current version by Paul Wheatley on May 16, 2012 16:05.

The initial results were disappointing: Lucene indexed every word, so the most frequent terms were simply those that occur commonly in plain English. \\
The General Service List (GSL) [http://jbauman.com/aboutgsl.html|http://jbauman.com/aboutgsl.html] is a list of commonly occurring words deemed most useful to people learning English, together with their frequencies. Andrew Jackson used this list to determine how much more frequently words occurred in the Lucene index than in "common English", as defined by the GSL. |
| *Solution champion* | Andrew Jackson ([~anjackson]) |
| *Git link* | The analysis results (spreadsheets, CSV files, Lucene index, etc.) have been checked into Git here: [https://github.com/openplanets/AQuA/tree/master/word-freq-compare|https://github.com/openplanets/AQuA/tree/master/word-freq-compare] |
| *Evaluation* | |
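
The comparison against the GSL described above can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the function name {{relative_frequency_ratios}}, the {{floor}} parameter for words missing from the GSL, and the toy counts are all assumptions made for the example. It divides each word's share of the index by its share of the GSL, so that ordinary English words score near or below 1 and collection-specific words score much higher.

```python
def relative_frequency_ratios(index_counts, gsl_counts, floor=1.0):
    """Return {word: ratio}, where ratio = (share of index) / (share of GSL).

    index_counts and gsl_counts map words to raw occurrence counts.
    floor is a hypothetical stand-in count for words absent from the GSL,
    used only to avoid dividing by zero.
    """
    index_total = sum(index_counts.values())
    gsl_total = sum(gsl_counts.values())
    ratios = {}
    for word, count in index_counts.items():
        index_share = count / index_total
        gsl_share = gsl_counts.get(word, floor) / gsl_total
        ratios[word] = index_share / gsl_share
    return ratios


# Toy counts, invented purely for illustration (not real results).
index_counts = {"the": 50, "archive": 50}
gsl_counts = {"the": 90, "of": 10}

ratios = relative_frequency_ratios(index_counts, gsl_counts)

# List words from most to least over-represented relative to the GSL.
for word, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(word, round(ratio, 2))
```

Sorting by the ratio rather than by raw frequency pushes common English words such as "the" down the list and brings terms that are characteristic of the indexed collection to the top.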