The issue with the DerbyDataStore has been patched.
However, the Jackrabbit data store garbage collection (whose purpose is to free up disk space), when used with the DerbyDataStore, does not return unused space to the operating system. This is standard, expected Derby behaviour.
There are several ways to reclaim unused space; see http://db.apache.org/derby/docs/10.5/adminguide/cadminspace21579.html
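Before reclaiming anything, it can be useful to check how much space Derby itself considers recoverable. The sketch below queries the SYSCS_DIAG.SPACE_TABLE diagnostic table over JDBC and sums its ESTIMSPACESAVING column. The class name, the method names and the schema and table names used in the usage note are assumptions made for illustration, not taken from the patch.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.regex.Pattern;

// Hypothetical helper: estimates reclaimable space for one Derby table.
final class DerbySpaceReport {
    // Only plain upper-case identifiers are accepted, because the names
    // are inlined into the SQL text as string literals.
    private static final Pattern IDENT = Pattern.compile("[A-Z_][A-Z0-9_]*");

    // Builds the diagnostic query for the given schema and table.
    static String spaceQuery(String schema, String table) {
        if (!IDENT.matcher(schema).matches() || !IDENT.matcher(table).matches()) {
            throw new IllegalArgumentException("unexpected identifier");
        }
        return "SELECT CONGLOMERATENAME, NUMALLOCATEDPAGES, NUMFREEPAGES, PAGESIZE, ESTIMSPACESAVING"
             + " FROM TABLE(SYSCS_DIAG.SPACE_TABLE('" + schema + "', '" + table + "')) AS T";
    }

    // Sums Derby's own estimate (in bytes) over the table and its indexes.
    static long estimatedSavingBytes(Connection conn, String schema, String table)
            throws SQLException {
        long total = 0;
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(spaceQuery(schema, table))) {
            while (rs.next()) {
                total += rs.getLong("ESTIMSPACESAVING");
            }
        }
        return total;
    }
}
```

For example, `DerbySpaceReport.estimatedSavingBytes(conn, "APP", "DATASTORE")` would report the estimate for a table named DATASTORE in the APP schema, assuming those are the names in use. The figure is only an estimate made by Derby.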
I ran some tests using the following two procedures with the arguments shown (referred to below as method #1 and method #2, in that order):
- SYSCS_UTIL.SYSCS_COMPRESS_TABLE(CURRENT SCHEMA, $table_name, 1)
- SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(CURRENT SCHEMA, $table_name, 1, 1, 1)
Method #1 (SYSCS_COMPRESS_TABLE) is guaranteed to recover the maximum amount of free space, but it can be memory-intensive and uses a lot of temporary disk space (approximately twice the used space plus the unused, allocated space).
In contrast, method #2 (SYSCS_INPLACE_COMPRESS_TABLE) uses no temporary files and moves rows around within the same conglomerate, but it cannot guarantee that all available space will be recovered.
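As a rough illustration, the two procedures can be called over JDBC as sketched below, with the same argument values as in the tests (1 for every SMALLINT flag). The class and method names are hypothetical, and the schema and table names are passed in by the caller rather than taken from the actual maintenance task.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper wrapping the two Derby compression procedures.
final class DerbyCompress {
    static final String COMPRESS_SQL =
        "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)";
    static final String INPLACE_SQL =
        "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)";

    // Method #1: rebuilds the table and its indexes, using temporary disk space.
    static void compress(Connection conn, String schema, String table)
            throws SQLException {
        try (CallableStatement cs = conn.prepareCall(COMPRESS_SQL)) {
            cs.setString(1, schema);
            cs.setString(2, table);
            cs.setShort(3, (short) 1); // SEQUENTIAL: non-zero rebuilds indexes one at a time
            cs.execute();
        }
    }

    // Method #2: in-place compression, no temporary files.
    static void compressInPlace(Connection conn, String schema, String table)
            throws SQLException {
        try (CallableStatement cs = conn.prepareCall(INPLACE_SQL)) {
            cs.setString(1, schema);
            cs.setString(2, table);
            cs.setShort(3, (short) 1); // PURGE_ROWS
            cs.setShort(4, (short) 1); // DEFRAGMENT_ROWS
            cs.setShort(5, (short) 1); // TRUNCATE_END
            cs.execute();
        }
    }
}
```

Schema and table names are passed as strings here, so unquoted identifiers should be given in upper case. Both procedures lock the table while they run, so they are probably best scheduled while the repository is not in use.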
The tests were run against three repositories:
- A repository with 4 MB of binary data stored in the data store, of which exactly 2 MB had previously been deleted
- A 6 GB repository (taken from a production environment)
- One of my local repositories, 237 MB, from which a lot of content had been purged
| Measure | Repository #1 | Repository #2 | Repository #3 |
|---|---|---|---|
| Sum of all node lengths | 4 MB -> 2 MB | 591543 KB -> 591278 KB | 130 MB -> 102 MB |
| Difference | 2 MB | 265 KB | 27 MB |
| Data store item count | 1000 -> 500 | 29394 -> 29383 | 806 -> 557 |
| Deleted item count | 500 | 11 | 250 |
| Data store size (method #1) | 6.35 MB -> 4.07 MB | 4.79 GB -> 4.79 GB | 181 MB -> 113 MB |
| Gained space (method #1) | 2.28 MB | irrelevant | 68 MB |
| Data store size (method #2) | 6.35 MB -> 4.17 MB | 4.79 GB -> 4.79 GB | 181 MB -> 177 MB |
| Gained space (method #2) | 2.18 MB | irrelevant | 4 MB |
For repository #3, the difference in reclaimed space between the two methods is significant (68 with method #1 versus 4 with method #2, in the units used above).
Working maintenance task patch. One major issue remains: we are unable to reconnect the AmetysRepository once it has been disconnected to launch a task.
Missing binary resources (images).