GridPane Duplicacy leaving 38GB of Fossils Behind

Sep 21, 2022 4:23 PM

Initial Investigation

Server storage was as follows, with the root filesystem 90% full:

Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           395M  1.2M  394M   1% /run
/dev/vda1       121G  103G   12G  90% /

One folder, /opt/gridpane/backups/duplications, was holding over 58GB of data:

--- /opt/gridpane/backups/duplications ---
   58.1 GiB [##########] /chunks
  724.0 KiB [          ] /snapshots
    4.0 KiB [          ]  config

Looking deeper, I ran the following command, and the numbers didn’t add up:

duplicacy check -tabular | less
Storage set to /opt/gridpane/backups/duplications
Listing all chunks
21 snapshots and 159 revisions
Total chunk size is 19,519M in 7201 chunks

Duplicacy reports only about 20GB of backup data (19,519M in 7201 chunks), yet the folder holds 58GB.

Digging deeper, counting the regular chunk files and the .fsl files separately revealed the problem:

find /opt/gridpane/backups/duplications -type f ! -name "*.fsl" | wc -l
find /opt/gridpane/backups/duplications -type f -name "*.fsl" | wc -l

There are 7201 chunks but 68844 .fsl files, which are fossil files. Let’s see how big they are:

❯ find /opt/gridpane/backups/duplications -type f ! -name "*.fsl" -print0 | du --files0-from=- -hc | tail -n1
20G     total
❯ find /opt/gridpane/backups/duplications -type f -name "*.fsl" -print0 | du --files0-from=- -hc | tail -n1
38G     total

One-liner to check your fossil file sizes

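Run this command on your server from any directory. It reports the count and size of fossil and regular chunk files, then finds a .duplicacy folder under /var/www/ and runs duplicacy check from that site’s directory so you can compare against what Duplicacy reports. Note that it leaves your shell in that site’s directory.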

printf '\n** Checking duplicacy storage **\n'; \
echo -n "Total size of the chunks directory: "; du --max-depth=0 -h "/opt/gridpane/backups/duplications/chunks"; \
echo "----"; \
echo -n "Total .fsl files: "; find /opt/gridpane/backups/duplications/chunks -type f -name "*.fsl" | wc -l; \
echo -n "Total .fsl file size: "; find /opt/gridpane/backups/duplications/chunks -type f -name "*.fsl" -print0 | du --files0-from=- -hc | tail -n1; \
echo "----"; \
echo -n "Total normal chunk files: "; find /opt/gridpane/backups/duplications/chunks -type f ! -name "*.fsl" | wc -l; \
echo -n "Total normal chunk file size: "; find /opt/gridpane/backups/duplications/chunks -type f ! -name "*.fsl" -print0 | du --files0-from=- -hc | tail -n1; \
echo "----"; \
echo -n "Duplicacy reporting totals: "; \cd "$(dirname "$(find /var/www/ -name ".duplicacy" | tail -n 1)")" > /dev/null && duplicacy check -tabular | grep Total

What are fossil files?

The prune command implements the two-step fossil collection algorithm. It will first find fossil collection files from previous runs and check if contained fossils are eligible for permanent deletion (the fossil deletion step). Then it will search for snapshots to be deleted, mark unreferenced chunks as fossils (by renaming) and save them in a new fossil collection file stored locally (the fossil collection step).
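To make the two steps concrete, here is a toy bash model of the idea. It is not Duplicacy’s code: the "storage" is a temporary directory, referenced.txt and fossil-collection.txt are hypothetical stand-ins for the remaining snapshots’ chunk references and the local fossil collection file, and the eligibility check is reduced to a yes/no argument.

```shell
#!/usr/bin/env bash
# Toy model of two-step fossil collection. NOT Duplicacy's implementation:
# the layout, referenced.txt and fossil-collection.txt are hypothetical.
set -euo pipefail

storage=$(mktemp -d)
trap 'rm -rf "$storage"' EXIT
mkdir -p "$storage/chunks"
touch "$storage/chunks/aaa" "$storage/chunks/bbb" "$storage/chunks/ccc"
# Chunks still referenced by the remaining snapshots; ccc belonged only to a deleted one.
printf '%s\n' aaa bbb > "$storage/referenced.txt"
collection="$storage/fossil-collection.txt"   # stands in for the local fossil collection file
: > "$collection"

# Fossil collection step: rename unreferenced chunks to *.fsl and record them.
collect_fossils() {
  local chunk name
  for chunk in "$storage"/chunks/*; do
    name=$(basename "$chunk")
    [[ $name == *.fsl ]] && continue          # already a fossil
    if ! grep -qxF "$name" "$storage/referenced.txt"; then
      mv "$chunk" "$chunk.fsl"
      echo "$name" >> "$collection"
    fi
  done
}

# Fossil deletion step (happens at the start of the NEXT prune run): delete the
# recorded fossils only if they are eligible; otherwise leave them in place.
delete_fossils() {
  local eligible=$1 name
  if [[ $eligible != yes ]]; then
    echo "not eligible yet; keeping fossils"
    return 0
  fi
  while read -r name; do
    rm -f "$storage/chunks/$name.fsl"
  done < "$collection"
  : > "$collection"                           # collection fully processed
}

collect_fossils
ls "$storage/chunks"      # lists aaa, bbb, ccc.fsl
delete_fossils no         # not every snapshot id has backed up since the collection
delete_fossils yes        # every snapshot id has backed up since the collection
ls "$storage/chunks"      # lists aaa, bbb
```

The point the model makes is that a fossil sits on disk, renamed but not deleted, until a later run decides it is safe to remove; if that decision keeps coming back "no", the .fsl files pile up.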

For fossils collected in the fossil collection step to be eligible for safe deletion in the fossil deletion step, at least one new snapshot from each snapshot id must be created between two runs of the prune command. However, some repositories may not be set up to back up on a regular schedule, which would block every other repository from ever deleting fossils. By default, Duplicacy ignores repositories that have had no new backup in the past 7 days, and you can also use the -ignore option to skip specific repositories when deciding whether fossils can be deleted.
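To see which repositories are not backing up regularly, a rough check is to look at when each snapshot id last got a new revision. This sketch assumes the storage keeps revisions as snapshots/<snapshot id>/<revision number> files and that a revision file’s modification time is close to when that backup ran; stale_snapshot_ids is a hypothetical helper, not a Duplicacy command.

```shell
#!/usr/bin/env bash
# Rough check for snapshot ids (repositories) with no recent backup.
# Assumptions: storage layout snapshots/<snapshot id>/<revision>, and a
# revision file's mtime approximates when that backup ran.

# Print every snapshot id that has no revision file modified in the last N days.
stale_snapshot_ids() {
  local snapdir=$1 days=$2 iddir
  for iddir in "$snapdir"/*/; do
    [[ -d $iddir ]] || continue
    # -mtime -N matches files modified less than N days ago; -quit stops at the first match
    if [[ -z $(find "$iddir" -type f -mtime "-$days" -print -quit) ]]; then
      basename "$iddir"
    fi
  done
}

snapshots=/opt/gridpane/backups/duplications/snapshots
if [[ -d $snapshots ]]; then
  stale_snapshot_ids "$snapshots" 7
fi
```

Any id it prints is worth a look before deciding whether to pass it to -ignore.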

Pruning Fossil Files

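First, let’s do a dry run, because we always do a dry run before anything destructive. Like duplicacy check, these commands need to be run from a repository directory (the one containing .duplicacy). The -exhaustive flag makes prune look at every unreferenced chunk in the storage rather than only those from deleted snapshots, and -d makes it a dry run.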

duplicacy prune -exhaustive -exclusive -d

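Once the dry run confirms there’s a ton of fossil files we want to prune, remove the -d. Because -exclusive tells Duplicacy to assume it has exclusive access to the storage and disables the two-step fossil collection, make sure no backups are running while it works: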

duplicacy prune -exhaustive -exclusive
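If you need to do this on more than one server, a small wrapper can enforce the dry-run-first habit. This is a hypothetical sketch: prune_fossils, the DUPLICACY and CHUNKS variables and the confirmation prompt are assumptions of this example, and it only calls the two prune commands above. Like those commands, it assumes nothing else is backing up to the storage while it runs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: dry run first, ask for confirmation, then prune for real.
# Assumes no other backups are writing to this storage while it runs (-exclusive).
set -euo pipefail

DUPLICACY=${DUPLICACY:-duplicacy}   # command to run; can be overridden
CHUNKS=${CHUNKS:-/opt/gridpane/backups/duplications/chunks}

count_fossils() {
  find "$CHUNKS" -type f -name '*.fsl' | wc -l | tr -d ' '
}

# Body runs in a subshell so the cd into the repository does not leak out.
prune_fossils() (
  cd "${1:-.}"                                   # repository dir containing .duplicacy
  echo "Fossils before: $(count_fossils)"
  "$DUPLICACY" prune -exhaustive -exclusive -d   # dry run: only reports
  read -r -p "Prune for real? [y/N] " answer || answer=""
  if [[ $answer == [yY] ]]; then
    "$DUPLICACY" prune -exhaustive -exclusive
    echo "Fossils after: $(count_fossils)"
  else
    echo "Aborted; nothing pruned."
  fi
)

# Example (placeholder path): prune_fossils /var/www/example.com
```

Anything other than y or Y at the prompt stops after the dry run, so the destructive step always requires an explicit answer.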