Title | Extraction of keywords (and images) from large collections of text-based files |
Detailed description | To facilitate rapid initial categorisation of large heterogeneous collections of primarily text-based digital files/documents, a tool is needed that parses the documents and presents a summary of, e.g., the top 5 keywords (by word count) from each document, along with thumbnails of any images embedded in the document. |
Issue champion | |
Other interested parties | |
Possible Solution approaches | Solution from previous mashup event: Analysis of Lucene Index Word Frequency |
Context | |
Lessons Learned | |
Datasets | |
Solutions |
Labels:
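
The keyword part of the detailed description (top 5 keywords by word count per document) could be sketched roughly as follows. This is only a minimal illustration, not the solution from the mashup event: it assumes the text has already been extracted from each file, and the names `top_keywords`, `summarise` and `STOPWORDS` are hypothetical. Thumbnail extraction is format-specific and is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real tool would use a fuller, language-appropriate list.
STOPWORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with"}
)


def top_keywords(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lower-case the text and split it into runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Ignore stop words and very short tokens, then count what remains.
    counts = Counter(w for w in words if w not in stopwords and len(w) > 2)
    return counts.most_common(n)


def summarise(documents, n=5):
    """Map each document name to its top-n keywords.

    documents is assumed to be a dict of {name: already-extracted text}.
    """
    return {name: top_keywords(text, n) for name, text in documents.items()}
```

Calling `summarise({"report.txt": extracted_text})` would return one keyword list per document, which could then be shown next to any thumbnails. Counting raw word frequency is the simplest possible ranking; an index-based approach such as the Lucene one mentioned under possible solution approaches would scale better to large collections.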