freqy - word clouds for directories
A recurring issue at SPRUCE mashups has been: when presented with a load of unknown files, how does anyone go about cataloging them? freqy is one way to help. It started life at the Bodleian Library as a word cloud library (based simply on word frequencies in documents) and was developed further at the last London event - see:
and the problem arose again this time around. Discussions with the practitioner, together with a general desire to simplify tools, led to freqy. It is a simple tool that, given a directory, uses Apache Tika to extract text from every file in that directory and its sub-directories (so the supported formats are those Tika understands). It then counts n-grams (1-, 2- or 3-grams) and produces a report of the 30 most frequently occurring words, pairs or triplets.
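As a rough illustration of the counting step, here is a minimal Python sketch. It is not freqy's own code: `extract_text` is a hypothetical stand-in that simply reads each file as UTF-8 text where freqy uses Tika, and the tokenizer (lowercased runs of letters, digits and apostrophes) and the names `ngrams` and `count_ngrams` are assumptions made for illustration.

```python
import os
import re
from collections import Counter


def extract_text(path):
    # Hypothetical stand-in for Tika: read the file as UTF-8, dropping
    # undecodable bytes. freqy itself uses Tika, which handles many more formats.
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read()


def ngrams(tokens, n):
    # Consecutive windows of n tokens, joined with spaces ("digital preservation").
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(root, n=1, top=30):
    # Count n-grams over every file under root; return the `top` most common.
    if n not in (1, 2, 3):
        raise ValueError("n must be 1, 2 or 3")
    counts = Counter()
    # os.walk visits root and all of its sub-directories.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            text = extract_text(os.path.join(dirpath, name))
            # Assumed tokenizer: lowercase, keep letters, digits and apostrophes.
            tokens = re.findall(r"[a-z0-9']+", text.lower())
            # n-grams are counted per file, so none spans two documents.
            counts.update(ngrams(tokens, n))
    return counts.most_common(top)
```

For example, `for gram, freq in count_ngrams("/path/to/files", n=2): print(freq, gram)` prints the 30 most common word pairs. Note that this sketch does no stop-word filtering, so very common words such as "the" will tend to dominate a 1-gram report unless they are filtered out.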
Subsequent development has also added an easy-to-use GUI.