To implement a horizontally scalable repository, the HighLevelStorage effort of the fedora-commons developers seems a viable approach for having Fedora use HBase/HDFS as its underlying binary storage system.
Aaron Birkland developed a PoC, which can be accessed at:
https://github.com/birkland/fcrepo (branch hlstore_hbase_poc)
Building on Aaron's PoC, FIZ Ka is developing the idea further and will conduct feasibility tests in the near future to obtain figures on performance and on the requirements for the hardware to be used.
The work done by FIZ is also available on GitHub at:
https://github.com/smeg4brains/fcrepo (branch hlstore_hbase_poc)
This is still under development, so you may check out commits that contain bugs. It has not yet been tested in a distributed environment, and there are no semaphores or concurrency guarantees whatsoever.
Build and Run:
To build the PoC and have Fedora work with the new components, go through the following steps:
- Check out the hlstore_hbase_poc branch from https://github.com/smeg4brains/fcrepo
- Build the project using Maven: mvn -DSkipITs -Dmaven.test.skip=true install
- Run the installer found in fedora-install/target/
- Remove the following modules from $FEDORA_HOME/server/config/fedora.fcfg
- Remove $FEDORA_HOME/server/config/spring/akubra-llstore.xml
- Copy the folder src/main/resources/config/spring/highlevel_hbase/ to $FEDORA_HOME/server/config/spring/highlevel_hbase
- Start a local HBase server on your machine, see http://hbase.apache.org/quickstart.html
- Run the application server with the fedora-webapp (tested on Tomcat 6)
- Go to e.g. http://localhost:8080/fedora/admin to get to the Fedora admin console and test search and ingestion of objects into HBase
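The checkout, build, and Spring configuration steps above could be scripted roughly as follows. This is only a sketch under assumptions: the function names build_poc and switch_spring_config are made up for illustration, git and mvn are assumed to be on the PATH, and the location of the highlevel_hbase folder inside the checkout is passed in by the caller because the steps above do not say which module contains it. Running the installer, editing fedora.fcfg, and starting HBase and the application server are left as manual steps.

```shell
#!/bin/sh
# Sketch only: defines two helper functions, nothing runs when the file is sourced.

# Clone the PoC branch and build it with tests skipped.
# $1: directory to clone into (hypothetical, chosen by the caller)
build_poc() {
    checkout_dir="$1"
    git clone --branch hlstore_hbase_poc https://github.com/smeg4brains/fcrepo "$checkout_dir" &&
        (cd "$checkout_dir" && mvn -DSkipITs -Dmaven.test.skip=true install)
}

# Remove the Akubra low-level store config and copy in the HBase high-level config.
# $1: FEDORA_HOME of the installed server
# $2: path to the highlevel_hbase folder inside the checkout (assumed to be supplied by the caller)
switch_spring_config() {
    fedora_home="$1"
    hbase_config_src="$2"
    spring_dir="$fedora_home/server/config/spring"

    [ -d "$spring_dir" ] || { echo "missing $spring_dir" >&2; return 1; }
    [ -d "$hbase_config_src" ] || { echo "missing $hbase_config_src" >&2; return 1; }

    rm -f "$spring_dir/akubra-llstore.xml"
    mkdir -p "$spring_dir/highlevel_hbase"
    # The trailing "/." copies the folder contents, including hidden files.
    cp -R "$hbase_config_src/." "$spring_dir/highlevel_hbase/"
}
```

After sourcing the file, one would call build_poc with a target directory, run the installer by hand, and then call switch_spring_config "$FEDORA_HOME" with the path to the highlevel_hbase folder in the checkout.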
Current Status (M9)
Since the Fedora community is currently discussing a high-level storage feature for the upcoming releases, we have paused active development of the Fedora HDFS integration in order to concentrate on supporting the community in planning and, hopefully, implementing this feature. It would allow every Fedora-based repository system to operate on an arbitrary application stack (e.g. HDFS for storage and HBase for metadata storage).