Matthias Hahn <[email protected]>
Ingesting a huge amount of data into a repository can be a difficult task. The growing volume of data that has to be ingested within a limited time demands a repository that scales accordingly.
We used an alpha release of Fedora 4, which is still in development, to measure ingest performance for data provided by the ONB and for randomly generated data.
- I need a repository with an ingest throughput of 5000 Google Book Scans per month.
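To make the requirement above easier to compare against measured numbers, the monthly target can be converted into a sustained rate. The following is a minimal sketch only: the 30-day month, the assumption of continuous round-the-clock ingest, and the names `IngestTarget` and `meets_target` are illustrative assumptions, not part of the original story or of any Fedora 4 API.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class IngestTarget:
    """A monthly ingest requirement (hypothetical helper for illustration)."""

    items_per_month: int
    days_per_month: int = 30  # assumption: a 30-day month of continuous ingest

    def per_day(self) -> float:
        # Average number of items that must be ingested per day.
        return self.items_per_month / self.days_per_month

    def per_hour(self) -> float:
        # Average number of items that must be ingested per hour.
        return self.per_day() / 24

    def seconds_per_item(self) -> float:
        # Maximum average time budget for a single item, in seconds.
        return self.days_per_month * SECONDS_PER_DAY / self.items_per_month


def meets_target(measured_items_per_hour: float, target: IngestTarget) -> bool:
    # True if a measured sustained rate is at least the required hourly rate.
    return measured_items_per_hour >= target.per_hour()


target = IngestTarget(items_per_month=5000)
print(f"required per day:  {target.per_day():.1f}")
print(f"required per hour: {target.per_hour():.2f}")
print(f"time budget per item: {target.seconds_per_item():.1f} s")
```

Under these assumptions, 5000 scans per month corresponds to roughly 167 scans per day, or an average budget of about 518 seconds per scan. A real deployment would also have to account for downtime and for the size of each scan, which this sketch ignores.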
We measured ingest performance in three configurations: with Modeshape alone, the underlying JCR repository implementation of Fedora 4; with Fedora 4 without the SCAPE Connector API; and with Fedora 4 with the SCAPE Connector API. All numbers have been shared with Duraspace and discussed with the developers (including the Modeshape developers). As a result, work on the scalability and performance of a Fedora 4 cluster has been postponed to the Fedora 4.1 release and will not be part of Fedora 4.0, whose release date is still not known. All experiments are based on an alpha release of Fedora 4.0.