Copied over from my earlier blog post What do you do when deleting files will not free space on a full file system? jordan smith suggested that I copy the post here to foster some wider discussion. I have not marked it as a question as it is, in part, a rhetorical one.
I was looking at a system a few months ago that had very high storage utilisation. Writes were clearly struggling because the storage was so full, and delete operations were also taking a long time. I could see the deletes were queued, though, and thought that would at least mean things should get better in a few hours. It was then I noticed deduplication was running, and it got me thinking about the problems that might cause.
The filesystem had about a 30% dedupe saving, which was considerable and probably helping a lot. 20TB less 30% is a nice saving on space... at least it is until you only have 500GB free. Watching the deletes, I asked myself: just how much disk space will actually get freed, and how much of what is being deleted is just pointers to duplicated blocks?
In the simple case the 30% figure suggests that if 1TB of data were deleted, only about 700GB would be freed: on average 30% of those blocks are shared with other files, so only the remaining 70% of the space actually comes back. When you are trying to free space, deduplication is suddenly making you pay extra for freeing it. Is that 30% saving a good indicator of what deleting is likely to do?
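As a back-of-envelope model: if the blocks you delete are shared at the same rate as the filesystem average, the expected physical space freed is just the logical size scaled by the fraction that isn't deduplicated. A minimal sketch (the function name and the uniform-sharing assumption are mine; real data is far lumpier than this):

```python
def expected_freed(logical_bytes: float, dedupe_saving: float) -> float:
    """Estimate physical bytes freed by deleting `logical_bytes` of data,
    given a filesystem-wide dedupe saving ratio (e.g. 0.30 for 30%).
    Assumes deleted blocks are shared at the filesystem-average rate."""
    return logical_bytes * (1.0 - dedupe_saving)

TB = 1000 ** 4
print(expected_freed(1 * TB, 0.30) / TB)  # roughly 0.7 TB freed, not 1 TB
```

The whole problem, of course, is that the uniform assumption fails exactly when you go hunting for duplicates to delete.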
Next I thought about what might be being deleted. When I need space on a system I own, I look at removing old backups and duplicate copies of files that are no longer needed. Why would that habit change here? Aim for the low-hanging fruit. Only now those fruit might not be taking up much, or any, space. They are hollow fruit that deduplication has already freed the space from. So there are deletes queued up that will free very little or perhaps no space at all. That 1TB I find and delete could easily be 1TB of the 6TB that deduplication has already freed, and deleting it does nothing for me.
Now what should I do? I need space, but deleting duplicates and backups won't give me much. I need to know what deleting a given set of files would actually do to the disk space. I need an rm --dry-run that could tell me that removing this 1TB of files will only free 100GB of used space.
Do we have the tools to do that?
In the blog post jordan smith asked if I was thinking about HNAS but in truth this is about all systems that run deduplication. How can you know what will free real space on any of them?