    • Type: New Feature
    • Resolution: Fixed
    • Priority: Major
    • 4.0M5
    • 2.LATER
    • None
    • None

      Implements Jackrabbit maintenance tasks:

      1. Data store garbage collector
      2. Re-indexing
      3. Consistency check
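Jackrabbit's data store garbage collection runs in two phases. The mark phase reads every binary value still referenced by the repository, which refreshes the last-modified time of the matching data store record; the sweep phase then deletes every record that was not touched since the collection started. The toy sketch below only models that idea in memory: `ToyDataStore` and its methods are hypothetical and are not the Jackrabbit API (in Jackrabbit 2.x the corresponding entry point is `DataStoreGarbageCollector`, with `mark()` and `sweep()` methods).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Toy, in-memory model of a mark-and-sweep data store garbage collection.
 * Hypothetical class: it is NOT the Jackrabbit API, it only illustrates the idea.
 */
final class ToyDataStore {

    /** Record identifier (for example a content hash) -> last-modified timestamp. */
    private final Map<String, Long> records = new HashMap<>();

    void add(String id, long lastModified) {
        records.put(id, lastModified);
    }

    int size() {
        return records.size();
    }

    /** Mark phase: every record still referenced by the repository gets a fresh timestamp. */
    void mark(Set<String> referencedIds, long gcStart) {
        for (String id : referencedIds) {
            records.computeIfPresent(id, (key, oldTimestamp) -> gcStart);
        }
    }

    /** Sweep phase: deletes every record not touched since gcStart; returns how many were deleted. */
    int sweep(long gcStart) {
        int before = records.size();
        records.values().removeIf(lastModified -> lastModified < gcStart);
        return before - records.size();
    }
}
```

In this model, comparing timestamps instead of keeping a list of live records also protects binaries written while the collection is running: they are newer than `gcStart` and therefore survive the sweep.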

          [REPOSITORY-204] JackRabbit maintenance tasks

          Thibaut Rizzi (Inactive) added a comment:

          Working maintenance task patch. One major issue remains: we are unable to reconnect the AmetysRepository once it has been disconnected so that a task can be launched.
          Binary resources (images) are still missing.

          Thibaut Rizzi (Inactive) added a comment:

          The issue with the DerbyDataStore has been patched.

          However, when Jackrabbit data store garbage collection (whose purpose is to free up disk space) is used with the DerbyDataStore, the unused space is not returned to the operating system. This is standard, expected Derby behaviour: space freed by deleted rows stays allocated to the table (and can be reused by it) until the table is compressed or dropped.
          There are several ways to reclaim unused space; see http://db.apache.org/derby/docs/10.5/adminguide/cadminspace21579.html

          I ran some tests using the following two procedures, with these arguments:

          1. CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(CURRENT SCHEMA, $table_name, 1)
          2. CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(CURRENT SCHEMA, $table_name, 1, 1, 1)

          In #1, the last argument is SEQUENTIAL; in #2, the three flags are PURGE_ROWS, DEFRAGMENT_ROWS and TRUNCATE_END.

          Method #1 is guaranteed to recover the maximum amount of free space, but it can be memory-intensive and can use a lot of temporary disk space (an amount equal to approximately two times the used space plus the unused, allocated space).
          Method #2, by contrast, uses no temporary files and moves rows around within the same conglomerate, but it cannot guarantee that all available space will be recovered.
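Both procedures can also be called from Java through JDBC once the garbage collection has finished. A minimal sketch, assuming a `java.sql.Connection` to the Derby database behind the DerbyDataStore whose current schema contains the data store table; `DerbySpaceReclaimer` is a hypothetical name, and the table name must come from the actual data store configuration (Derby stores unquoted identifiers in upper case, so the name is usually passed in upper case). The argument values match the ones used in the tests above.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

/**
 * Hypothetical helper: reclaims the space left in a Derby table after a
 * Jackrabbit data store garbage collection. The table name is an assumption
 * supplied by the caller.
 */
final class DerbySpaceReclaimer {

    /** Method #1: rebuilds the table; recovers the most space but needs temporary disk space. */
    static final String COMPRESS_SQL =
            "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(CURRENT SCHEMA, ?, ?)";

    /** Method #2: works in place; no temporary files, but may not recover all space. */
    static final String INPLACE_COMPRESS_SQL =
            "CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(CURRENT SCHEMA, ?, ?, ?, ?)";

    private DerbySpaceReclaimer() {
    }

    /** Method #1 with SEQUENTIAL = 1 (indexes rebuilt one at a time: slower, less memory). */
    static void compress(Connection conn, String table) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(COMPRESS_SQL)) {
            cs.setString(1, table);       // case-sensitive table name, usually upper case
            cs.setShort(2, (short) 1);    // SEQUENTIAL
            cs.execute();
        }
    }

    /** Method #2 with PURGE_ROWS = DEFRAGMENT_ROWS = TRUNCATE_END = 1. */
    static void compressInPlace(Connection conn, String table) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(INPLACE_COMPRESS_SQL)) {
            cs.setString(1, table);
            cs.setShort(2, (short) 1);    // PURGE_ROWS
            cs.setShort(3, (short) 1);    // DEFRAGMENT_ROWS
            cs.setShort(4, (short) 1);    // TRUNCATE_END
            cs.execute();
        }
    }
}
```

For example, `DerbySpaceReclaimer.compress(connection, "DATASTORE")`, where `DATASTORE` is only an assumed table name.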

          The tests were run against three repositories:

          1. A repository with 4 MB of binary data in the data store, exactly 2 MB of which had previously been deleted
          2. A 6 GB repository (taken from a production environment)
          3. One of my local repositories (237 MB), from which a lot of content had been purged


          Test results (before -> after garbage collection)

          Metric                       Repository #1       Repository #2           Repository #3
          Sum of all node lengths      4 MB -> 2 MB        591543 kB -> 591278 kB  130 MB -> 102 MB
          Difference (node lengths)    2 MB                265 kB                  27 MB
          Data store item count        1000 -> 500         29394 -> 29383          806 -> 557
          Deleted item count           500                 11                      250
          Data store size (method #1)  6.35 MB -> 4.07 MB  4.79 GB -> 4.79 GB      181 MB -> 113 MB
          Gained space (method #1)     2.28 MB             negligible              68 MB
          Data store size (method #2)  6.35 MB -> 4.17 MB  4.79 GB -> 4.79 GB      181 MB -> 177 MB
          Gained space (method #2)     2.18 MB             negligible              4 MB

          For repository #3, the difference in freed-up space between the two methods is significant (68 MB with method #1 versus 4 MB with method #2).
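The data store sizes in the table are on-disk sizes. One way to take such before/after measurements is sketched below, assuming that the size of interest is the total size of all files under the Derby database directory; `DiskUsage` is a hypothetical helper, and the directory passed to it is whatever the DerbyDataStore configuration points to. It counts 1 MB as 1024 * 1024 bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.stream.Stream;

/** Hypothetical helper measuring the on-disk size of a directory tree (e.g. a Derby database folder). */
final class DiskUsage {

    private DiskUsage() {
    }

    /** Total size in bytes of all regular files under root; directory entries are not counted. */
    static long sizeOf(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .mapToLong(DiskUsage::fileSize)
                    .sum();
        }
    }

    /** Formats a byte count as megabytes, with 1 MB = 1024 * 1024 bytes. */
    static String toMegabytes(long bytes) {
        return String.format(Locale.ROOT, "%.2f MB", bytes / (1024.0 * 1024.0));
    }

    private static long fileSize(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // stream lambdas cannot throw checked exceptions
        }
    }
}
```

Calling `sizeOf` before and after a compression and subtracting the two values gives a "gained space" figure.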


          Thibaut Rizzi (Inactive) added a comment (edited):

          The attached file jackrabbit-maintenance_DRAFT_20121219.zip is the new version of this sketch project, which implements three tasks:

          • Data store garbage collection
          • Re-indexing
          • Consistency check
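Re-indexing can be done by stopping the repository, deleting its search index directories and starting it again: Jackrabbit rebuilds a missing index at startup, which may take a long time on a large repository. A minimal sketch of the file-system part, assuming Jackrabbit's default layout (`repository/index` under the repository home, plus `workspaces/<name>/index` for each workspace); the real locations are given by the `path` parameter of each `SearchIndex` element in repository.xml and workspace.xml, and `IndexLocator` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper for a "re-index" task. ASSUMES Jackrabbit's default layout:
 * {home}/repository/index and {home}/workspaces/{name}/index.
 */
final class IndexLocator {

    private IndexLocator() {
    }

    /** Lists the existing search index directories under the repository home. */
    static List<Path> indexDirectories(Path repositoryHome) throws IOException {
        List<Path> result = new ArrayList<>();
        Path systemIndex = repositoryHome.resolve("repository").resolve("index");
        if (Files.isDirectory(systemIndex)) {
            result.add(systemIndex);
        }
        Path workspaces = repositoryHome.resolve("workspaces");
        if (Files.isDirectory(workspaces)) {
            try (DirectoryStream<Path> dirs = Files.newDirectoryStream(workspaces, p -> Files.isDirectory(p))) {
                for (Path workspace : dirs) {
                    Path index = workspace.resolve("index");
                    if (Files.isDirectory(index)) {
                        result.add(index);
                    }
                }
            }
        }
        return result;
    }

    /** Deletes a directory tree, deepest entries first. Only call this while the repository is stopped. */
    static void deleteRecursively(Path root) throws IOException {
        List<Path> entries;
        try (Stream<Path> paths = Files.walk(root)) {
            entries = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
    }
}
```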


          Thibaut Rizzi (Inactive) added a comment:

          This ZIP archive contains a small Java project that runs a simple test scenario in which the Jackrabbit garbage collector is called.

          To run the project, set the REPOSITORY_HOME and REPOSITORY_CONF strings in the Config class.
          REPOSITORY_HOME gives the location of the repository folder (it is created if it does not exist).
          REPOSITORY_CONF gives the location of the repository.xml file.
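In Jackrabbit 2.x, these two settings correspond to the parameters read by `org.apache.jackrabbit.core.RepositoryFactoryImpl`, whose `REPOSITORY_HOME` and `REPOSITORY_CONF` constants hold the keys `org.apache.jackrabbit.repository.home` and `org.apache.jackrabbit.repository.conf`. A minimal sketch of such a `Config` class, assuming the project hands these parameters to that factory; the paths are placeholders and the validation step is an addition for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical shape of the project's Config class; adjust both paths before running. */
final class Config {

    /** Where to find (or create) the repository folder. Placeholder path. */
    static final String REPOSITORY_HOME = "/path/to/repository";

    /** Where to find the repository.xml file. Placeholder path. */
    static final String REPOSITORY_CONF = "/path/to/repository.xml";

    private Config() {
    }

    /** Builds factory parameters, using the keys of Jackrabbit's RepositoryFactoryImpl (assumed 2.x). */
    static Properties repositoryParameters(String home, String conf) {
        // The home folder may be created by Jackrabbit, but repository.xml must already exist.
        if (!Files.isRegularFile(Paths.get(conf))) {
            throw new IllegalArgumentException("repository.xml not found: " + conf);
        }
        Properties parameters = new Properties();
        parameters.setProperty("org.apache.jackrabbit.repository.home", home);
        parameters.setProperty("org.apache.jackrabbit.repository.conf", conf);
        return parameters;
    }
}
```

`Config.repositoryParameters(Config.REPOSITORY_HOME, Config.REPOSITORY_CONF)` can then be passed to `javax.jcr.RepositoryFactory.getRepository(Map)`.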

          This is currently not compatible with the DerbyDataStore.
          It has been tested with the FileDataStore and with the DbDataStore (MySQL).

          Some sample repository.xml files are enclosed (repository.file.xml is recommended).


            Assignee: Thibaut Rizzi (Inactive)
            Reporter: Thibaut Rizzi (Inactive)
            Votes: 0
            Watchers: 1
