Hu Yoshida

Free Deduplication With No Compromise

Blog Post created by Hu Yoshida Employee on Nov 7, 2014

Feb 11, 2014


In my last post I talked about the explosion of copies, which IDC says now account for 65% of our external storage capacity, and I outlined ways to reduce the number of copies, reduce the capacity they consume, and quantify the resulting cost savings.


Copies are a necessary part of storage management, but we can do a lot to manage them better and make them more efficient, in terms of capacity utilization and operational efficiency. One of the ways to reduce capacity is deduplication.

Last month, Jon Toigo published an article in SearchStorage entitled “Deduplication and compression are short-term capacity fixes”.

In this article Jon points out that: “Deduplication merely processes the incremental backup in a different way (by comparing the full backup data to the previous full backup and “reducing” or eliminating the bits that are the same in the new full backup).” He notes that vendors use deduplication to charge more for a rack of comparatively cheap disks, usually consumer-grade SATA drives, and gives the example of an early VTL vendor that charged $410,000 for approximately $3,000 worth of disks and shelf hardware! He also observes that the deduplication functionality is isolated to a single array, so once that array fills up, you have to buy another array with another expensive deduplication engine.

That is not the case with the deduplication function in our Hitachi Unified Storage (HUS) and HNAS products for primary storage. HUS is an array with a unified block and file engine. The HNAS file-serving engine can support our largest enterprise configurations, consisting of many arrays, as a common virtualized storage pool. Both HUS and HNAS file modules can scale up to 8 nodes with 32PB of capacity and 16 million file objects per directory.

With version 11 of the HNAS object-based file system software, any current file module or Hitachi Unified Storage customer can upgrade to get this deduplication functionality. HNAS uses its FPGA architecture to offload performance-intensive file and storage services, and it has added automated, auto-throttling deduplication that does not interfere with file sharing workloads. As a result, there is little to no impact on write performance. Hitachi’s patented deduplication technology applies an advanced cryptographic hashing algorithm to fixed-size blocks (4KB or 32KB) to ensure data integrity.
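To make the idea of fixed-block hashing concrete, here is a minimal sketch in Python. It is not Hitachi's implementation, which runs in FPGA-assisted hardware; the choice of SHA-256, the function name dedupe, and the in-memory dictionary are all assumptions made purely for illustration.

```python
import hashlib

BLOCK_SIZE = 4 * 1024  # 4KB, one of the two block sizes mentioned above


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    SHA-256 is an assumed stand-in for the unnamed cryptographic hash.
    """
    store = {}      # digest -> block contents (unique blocks only)
    pointers = []   # ordered digests that describe the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        # Keep the first copy of a block; a duplicate only adds a pointer.
        store.setdefault(digest, block)
        pointers.append(digest)
    return store, pointers


# Three identical blocks plus one different block: four pointers, two stored blocks.
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store, pointers = dedupe(data)
print(len(pointers), "blocks referenced,", len(store), "blocks stored")
```

In this toy example four blocks are written but only two are stored, which is the space reclamation that deduplication delivers.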

As for read performance, you may recall that HNAS implements an object-based file system (SiliconFS) as an integral part of its underlying storage architecture. That means it already processes lists of pointers to blocks of data, so reconstructing a data block for reading (rehydration) has no impact on performance.
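A small sketch shows why rehydration adds no extra step when a file is already a list of block pointers. The block_store dictionary, the integer block ids, and the read_file function are hypothetical names for illustration only and say nothing about how SiliconFS is laid out internally.

```python
# Hypothetical block store: block id -> block contents.
block_store = {0: b"header--", 1: b"payload-", 2: b"trailer-"}

# A file is an ordered list of block ids. A deduplicated file simply
# repeats an id instead of storing the same block several times.
plain_file = [0, 1, 2]
deduped_file = [0, 1, 1, 1, 2]


def read_file(pointers, store):
    """Follow each pointer in order and concatenate the blocks."""
    return b"".join(store[p] for p in pointers)


# The same read path serves both files; no separate rehydration pass is needed.
print(read_file(plain_file, block_store))
print(read_file(deduped_file, block_store))
```

The read path is identical for the plain and the deduplicated file, which is the point being made about a pointer-based file system.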

The major function of deduplication on primary storage is space reclamation, where redundant data is eliminated and wasted capacity is freed up. With HNAS, space reclamation is done in the background, and the system throttles this activity down whenever it is running at more than 50% of its IOPS capability, so that file serving performance is not affected.
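The throttling rule can be sketched as a simple load check. This is only an illustration of the 50% threshold described above; the function name should_throttle, its parameters, and the sample IOPS figures are assumptions, and the real system presumably scales background work more gradually than an on/off test.

```python
def should_throttle(current_iops: float, max_iops: float,
                    threshold: float = 0.5) -> bool:
    """Return True when background deduplication should be throttled down.

    The 0.5 default mirrors the 50% figure in the text; everything else
    here is a hypothetical simplification.
    """
    if max_iops <= 0:
        raise ValueError("max_iops must be positive")
    return current_iops > threshold * max_iops


# With an assumed ceiling of 10,000 IOPS:
print(should_throttle(3000, 10000))  # light load, dedupe runs freely
print(should_throttle(7000, 10000))  # heavy load, dedupe backs off
```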

Deduplication pairs well with the growing use of flash drives, which is another trend that I identified for 2014. Reclaiming space on premium-priced flash drives by factors of 9:1 makes for very compelling economics. Deduplication also consolidates a number of writes, which helps to extend the durability of flash drives. Hitachi supports flash in all of our file serving systems. These systems also include Hitachi’s Intelligent File Tiering capability, which provides the best storage economics available. Files that no longer need the premium performance of flash can be tiered to lower cost disk drives based on policies that are triggered by time, alerts, or file descriptors such as file type (for example, MPEG).
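The arithmetic behind a 9:1 ratio is worth spelling out. The sketch below assumes that 9:1 means nine units of logical data stored in one unit of physical capacity; the $9.00 per GB flash price is a made-up number chosen only to keep the math easy, and the function names are hypothetical.

```python
def space_saved(ratio: float) -> float:
    """Fraction of physical capacity reclaimed at a given dedupe ratio."""
    return 1.0 - 1.0 / ratio


def effective_cost_per_gb(raw_cost_per_gb: float, ratio: float) -> float:
    """Cost per GB of logical data once deduplication is applied."""
    return raw_cost_per_gb / ratio


ratio = 9.0             # the 9:1 factor mentioned above
raw_flash_cost = 9.00   # hypothetical price in dollars per GB

print(f"Capacity reclaimed: {space_saved(ratio):.1%}")
print(f"Effective cost: ${effective_cost_per_gb(raw_flash_cost, ratio):.2f} per GB")
```

At 9:1, roughly 89% of the capacity that would otherwise be consumed is reclaimed, and the effective price of each logical GB drops to one ninth of the raw price.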

The base HNAS Deduplication license, which enables one of four deduplication engines, is provided at no cost, and HDS encourages all current customers to try it out. If you need more deduplication engines for larger amounts of data, the Premium Dedupe package can be licensed at a small additional cost. Primary deduplication can be enabled on a per file system basis. An assessment tool is available to help users estimate the amount of dedupe savings prior to implementation.

Please take three minutes to view this video on Hitachi Data Systems’ “free” deduplication capability, or see my previous blog on Deduplication Without Compromise.