4.2. File system BLOB store

This section describes the file system BLOB store. It acts as an overlay to the database in order to store BLOBs more efficiently than the underlying database.

Also see the wiki at https://github.com/ctron/package-drone/wiki/File-system-BLOB-store

One initial idea for Package Drone was to store the BLOBs in the database, together with the entity information. Most modern databases can store BLOBs efficiently, and the JDBC API provides streaming access to BLOBs, so a BLOB does not have to fit into memory as a whole. This would not only simplify the implementation, but would also simplify maintenance tasks like backup and restore.

So much for the theory [2]. The real world looks a bit different: neither MySQL nor PostgreSQL implements JDBC streaming correctly, so that in fact the BLOBs do get stored in main memory, at least temporarily.

While PostgreSQL can store a BLOB with streaming and only fails when reading the BLOB again, MySQL already fails when writing the BLOB and totally explodes when reading the BLOB back. A BLOB gets encoded and parsed when the server sends it to the client without any flow control, so the BLOB data is held in encoded and decoded form at the same time, which makes a 400MB BLOB grow to about 1.2GB.

In order not to make the file system storage mandatory, and to support migration to the BLOB store from older versions, the store acts as an overlay to the already existing database. If future JDBC driver versions fix these issues, it will still be possible to use the database-only mode. It could also be possible to use other databases such as Oracle or DB/2, which are reported to work much better in this area.

In order to understand the implications of using the file system store, here are a few details.

Right now there are three BLOB operations: create, read and delete. There is no way to update a BLOB. This is on purpose: updating the artifact's main data is so close to a delete followed by a create that the decision was made not to support updating the artifact data, but to require deletion and re-creation of the entity.

All BLOB operations inside Package Drone are routed through a BLOB manager. This instance decides which storage layer will be used for performing the operation. The default is to simply pass the request to the database.
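
The routing decision can be pictured with a small sketch. All names here (BlobLayer, MemoryLayer, RoutingBlobManager) are hypothetical and are not Package Drone's actual API. The layers are in-memory stand-ins and take byte arrays only for brevity; a real layer would work with streams.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage layer abstraction; not Package Drone's actual API.
interface BlobLayer {
    void create(String artifactId, byte[] data);
}

// In-memory stand-in for a real layer, only here to make the sketch runnable.
final class MemoryLayer implements BlobLayer {
    final Map<String, byte[]> blobs = new HashMap<>();

    @Override
    public void create(String artifactId, byte[] data) {
        blobs.put(artifactId, data.clone());
    }
}

final class RoutingBlobManager {
    private final BlobLayer database;
    private BlobLayer fileSystem; // null while the BLOB store is not activated

    RoutingBlobManager(BlobLayer database) {
        this.database = database;
    }

    void activateFileSystem(BlobLayer layer) {
        this.fileSystem = layer;
    }

    // Default: pass the request to the database. Once the file system
    // layer is active, new BLOBs go there instead.
    void create(String artifactId, byte[] data) {
        BlobLayer target = (fileSystem != null) ? fileSystem : database;
        target.create(artifactId, data);
    }
}
```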

When the BLOB store is activated, a new storage is created, and a unique ID is generated and stored in both the database and the file system. The location of the store is also stored in the file system. So on system startup it is possible to load the store location and check whether it matches the expected store (using the ID).
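
A minimal sketch of such an ID check follows. The marker file name "store.id" and the class StoreIdentity are assumptions made for illustration; the actual store may keep its ID in a different place and format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

final class StoreIdentity {
    // Hypothetical marker file name; the real store may use a different layout.
    static final String ID_FILE = "store.id";

    // Creates the store directory and writes a fresh ID into it.
    // The caller is expected to persist the returned ID in the database as well.
    static String initialize(Path storeDir) {
        String id = UUID.randomUUID().toString();
        try {
            Files.createDirectories(storeDir);
            Files.write(storeDir.resolve(ID_FILE), id.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return id;
    }

    // Startup check: does the directory hold the store the database expects?
    static boolean matches(Path storeDir, String expectedId) {
        Path idFile = storeDir.resolve(ID_FILE);
        if (!Files.isRegularFile(idFile)) {
            return false;
        }
        try {
            String actual = new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
            return actual.equals(expectedId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```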

The create operation will then always go to the file system layer. BLOBs are stored in a hierarchy (currently 3 levels) of prefix-named directories in order to reduce the number of files in a single directory. The file name is the artifact ID, assigned by the database storage layer.
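
To illustrate the directory layout, here is a sketch of how a BLOB path could be derived from an artifact ID. The three levels come from the description above; the segment length of two characters per level and the class name BlobPaths are assumptions, and the real store may split the ID differently.

```java
import java.nio.file.Path;

final class BlobPaths {
    static final int LEVELS = 3;         // from the text: currently 3
    static final int SEGMENT_LENGTH = 2; // assumption, for illustration only

    // Builds <base>/<prefix 1>/<prefix 2>/<prefix 3>/<artifactId>.
    static Path resolve(Path base, String artifactId) {
        if (artifactId.length() < LEVELS * SEGMENT_LENGTH) {
            throw new IllegalArgumentException("Artifact ID too short: " + artifactId);
        }
        Path dir = base;
        for (int i = 0; i < LEVELS; i++) {
            int start = i * SEGMENT_LENGTH;
            dir = dir.resolve(artifactId.substring(start, start + SEGMENT_LENGTH));
        }
        return dir.resolve(artifactId);
    }
}
```

Under these assumptions the ID 0a1b2c3d-0000-0000-0000-000000000000 ends up in the directory 0a/1b/2c below the store location, with the full ID as file name.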

If a BLOB is requested to be deleted, the file will simply be deleted. Empty directories will be kept. If the file does not exist, nothing will be done. No error will be reported either, since it is possible that the BLOB was stored in the database. In that case there is nothing to do, since deleting the artifact deletes the database row that holds the binary data.
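
In Java this behavior maps quite directly onto Files.deleteIfExists. The following is only a sketch with a hypothetical class name (BlobDeleter), not the actual implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class BlobDeleter {
    // Deletes the BLOB file if it is present. A missing file is not an error,
    // since the BLOB may live in the database instead. Parent directories are
    // left in place, even if they become empty.
    // Returns true if a file was removed, false if there was none.
    static boolean delete(Path blobFile) {
        try {
            return Files.deleteIfExists(blobFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```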

When a BLOB is read, the manager will first check the file system. If there is a match, it will be read from the file system. Otherwise the database will be used. The first step covers all artifacts that were created when the file system layer was already activated; the second step is only required for legacy entries, which are still stored in the database.
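
The two step read can be sketched as follows. The class FallbackBlobReader is hypothetical, the database layer is reduced to a plain function from artifact ID to stream, and the file lookup uses a flat directory for brevity instead of the prefix directories described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

final class FallbackBlobReader {
    private final Path storeDir;
    // Hypothetical stand-in for the database layer: returns a stream for
    // the artifact ID, or null if the database has no such BLOB.
    private final Function<String, InputStream> database;

    FallbackBlobReader(Path storeDir, Function<String, InputStream> database) {
        this.storeDir = storeDir;
        this.database = database;
    }

    // Returns a stream the caller must close, or null if neither layer
    // knows the artifact.
    InputStream open(String artifactId) {
        Path file = storeDir.resolve(artifactId); // flat layout, for brevity
        if (Files.isRegularFile(file)) {
            try {
                return Files.newInputStream(file); // step 1: file system
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return database.apply(artifactId); // step 2: legacy entry in the database
    }
}
```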

Since BLOBs cannot be updated, concurrency is not that difficult an issue. Multiple concurrent readers are not a problem either. And since all modifying operations are safeguarded by the storage server at a higher level, the BLOB manager requires no additional locking beyond that of its internal structures.

When it comes to database transactions the situation is a bit different. Package Drone [3] makes use of database transactions in order to allow atomic operations on channels and artifacts. So if a channel extractor fails to extract metadata, the whole transaction is rolled back and the original state is still present.

This conflicts with the file system storage in two scenarios. First, an artifact could have been created, but then the transaction is rolled back. An already stored artifact must then be deleted. This could be handled as a trivial situation: since artifact IDs are based on random UUIDs, it would be possible to just leave the unreferenced artifact data and clean it up later. However, the second case, deleting an artifact, is a bit more complicated. If the deletion of an artifact fails, then the artifact's BLOB must stay intact.

One idea would be to use a transaction manager and implement the file system operations using some sort of JTA operation. However, adding a full-blown JTA transaction manager to the existing OSGi-based setup seemed to cause more trouble than it would solve.

So instead an artifact delete queue was added as a database structure. When an artifact gets deleted, a deletion marker is stored in this table in the same database transaction in which the artifact gets deleted. If the transaction goes through, the artifact was successfully marked as deleted; directly after the commit, the queue is processed and the artifacts get deleted from the file system.

If a transaction gets rolled back during the creation phase of an artifact, then this artifact ID (which is now not in the database) gets added to the queue in another database transaction, and the BLOB gets deleted afterwards the same way.
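
The interplay of the delete queue with commit and rollback can be simulated in a few lines. This is a toy model, not the real implementation: the class DeleteQueueSketch is hypothetical, the database tables and the file system are plain in-memory collections, and a boolean flag stands in for a failure (for example a failing channel extractor) that forces a rollback.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

final class DeleteQueueSketch {
    // In-memory stand-ins for the two database tables and the file system store.
    final Set<String> artifactTable = new HashSet<>();
    final Queue<String> deleteQueueTable = new ArrayDeque<>();
    final Set<String> blobFiles = new HashSet<>();

    void createArtifact(String id, boolean failBeforeCommit) {
        blobFiles.add(id); // the BLOB hits the file system before the commit
        if (failBeforeCommit) {
            // Rollback: the artifact row never becomes visible. The orphaned
            // BLOB is queued in a separate transaction and cleaned up.
            deleteQueueTable.add(id);
            processQueue();
            return;
        }
        artifactTable.add(id); // commit
    }

    void deleteArtifact(String id, boolean failBeforeCommit) {
        if (failBeforeCommit) {
            return; // rollback: no row removed, no marker stored, BLOB intact
        }
        // Commit: row removal and deletion marker become visible together.
        artifactTable.remove(id);
        deleteQueueTable.add(id);
        processQueue(); // directly after the commit
    }

    void processQueue() {
        while (!deleteQueueTable.isEmpty()) {
            // Choice of this sketch: remove the file first, then the marker,
            // so an interrupted run leaves the marker for a later retry.
            blobFiles.remove(deleteQueueTable.peek());
            deleteQueueTable.remove();
        }
    }
}
```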

[2] And please do not say I told you so, my colleague already did that.

[3] At least the current, default, SQL based storage service