Product evolution perspective:
In my last blog post, I wrote about how Hitachi introduced a new level of commonality across its storage products with the introduction of an expanded VSP family – all of which leverage similar functionality from its Storage Virtualization Operating System – as part of its recent “Software-defined infrastructure” launch and Hitachi Connect 2015 event.
If you haven't heard, the new VSP models are HDS' first storage products NOT to offer a custom ASIC to offload I/O operations. For those of you that have worked exclusively with HDS storage this might raise some red flags, since HDS storage has always depended on ASICs in some way, but it shouldn't. From a technology perspective, the new VSP models are still engineered, purpose-built systems, which means the deterministic behavior fundamental to HDS storage is maintained. In addition, advancements in processor technology (Intel multi-core) now allow general purpose processors to provide 'enough' (I'll touch on this in my next post) processing power and functionality to handle midrange-size workloads without needing to offload low-level requirements into silicon (there is a caveat here that I'll cover in more detail later). I really don't want to get too down and dirty in the detail (since it's not publicly available) of why Intel multi-core makes a lot of sense technically; you can go read other great blogs which cover this topic. I do, however, want to reiterate from the previous blog post that for HDS this enables commonality within the storage layer, which in turn better enables social innovation.
The use of Intel processors is by no means new in storage, and other vendors have proven that using newer Intel processors effectively can have a dramatic effect on system performance. In recent years we've seen EMC drastically improve the performance of their industry-leading midrange offering (the VNX) by moving to an MCx (multi-core) architecture, NetApp continuously improve FAS performance by simply beefing up processor specs, and many start-ups build frontend controller architectures that pair Intel multi-core with flash on the backend.
At this point, I'd imagine there are a lot of people out there adopting the 'I told you so' attitude, citing that HDS has been very slow to adopt an Intel-only design. This attitude, however, is somewhat misplaced for two reasons:
- Everyone knew that at some point it (removing ASICs) was going to happen. HDS, like any other storage vendor, is simply adapting and making use of the technology around them. HDS was biding its time to ensure doing so wouldn't require any compromises.
- HDS hasn't completely moved to Intel; it has just moved the core functions where doing so makes sense. I'll expand on this point below.
Running all internal processes on Intel doesn't always make sense. There is always a caveat, always an exception to every rule and workload, and in this instance it is when performing tasks in software on commodity processors is detrimental to the overall behavior of the system. HDS has identified two such exceptions and factored offloading capabilities into the system.
In recent years, a general trend in the storage industry has focused on deriving value by means of tiering. The underlying technology concept that delivers this is known as sub-LUN tiering. It enables blocks of storage to be moved up and down between different storage tiers depending on their access requirements. HDS calls this Hitachi Dynamic Tiering (HDT), EMC calls it FAST (Fully Automated Storage Tiering) VP (Virtual Provisioning), HP calls it Adaptive Optimization, IBM calls it EasyTier, and the list goes on. A key design objective of all sub-LUN technology is to keep the hottest pages/blocks/chunks/etc. on the fastest storage tier (normally flash) and the coldest on the slowest tiers of storage (normally SAS/NL-SAS). Most of these implementations also send write I/Os from cache directly to flash and/or perform backend flushing activities (i.e. destaging) to improve overall system performance.
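To make the concept concrete, here is a minimal sketch of the placement idea behind sub-LUN tiering. This is a hypothetical illustration, not HDT or any vendor's actual algorithm: rank fixed-size chunks by recent access count and fill the fastest tier with the hottest chunks first.

```python
def place_chunks(access_counts, tier_capacities):
    """Assign chunks to tiers, hottest-first.

    access_counts: {chunk_id: recent I/O count}
    tier_capacities: list of (tier_name, max_chunks), fastest tier first.
    Returns {chunk_id: tier_name}.
    """
    placement = {}
    # Sort chunk IDs from hottest to coldest
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    start = 0
    for tier, capacity in tier_capacities:
        # Fill each tier with the next-hottest chunks until it is full
        for chunk in ranked[start:start + capacity]:
            placement[chunk] = tier
        start += capacity
    return placement

counts = {"c1": 900, "c2": 40, "c3": 700, "c4": 5}
tiers = [("flash", 2), ("sas", 2)]
print(place_chunks(counts, tiers))
# c1 and c3 (the hot chunks) land on flash; c2 and c4 fall to SAS
```

Real implementations work on much finer-grained heat statistics gathered over monitoring cycles, but the hot-up/cold-down principle is the same.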
These approaches combine to create a performance utilisation profile that is 'top heavy', sending the most activity to the highest-performance media. When flash is used in the top tier this really changes things, since flash requires a large degree of housekeeping (in the form of overhead). The effect can dramatically amplify top-end utilisation because, despite the lower response time of flash devices, the system spends more time managing storage resources housed on flash than those on traditional spinning disks. Modern workloads don't help here; if anything, they further amplify utilisation. This largely has to do with the fact that we now have increased density, through virtualization and consolidation, which results in fairly large working sets that are highly randomized (thus reducing cache efficiency). While you might not see this immediately, the problem has been large enough to create issues with sub-LUN implementations that could only be solved by throttling backend processes or limiting the amount of flash relative to the processing power. Depending on who you talk to, this problem can be orders of magnitude larger than the traditional issue of optimizing parity calculations around small cache sizes, which drove storage performance problems 20 years ago.
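The 'top heavy' effect can be sketched with a toy model (illustrative numbers only, not a vendor's model): even when flash serves each I/O cheaply, the per-I/O housekeeping (garbage collection, wear levelling, metadata management) lands on the controller, and at flash-level I/O rates that housekeeping can dominate controller time.

```python
def controller_busy_cycles(io_count, media_cost, housekeeping_cost):
    """Total controller cycles: per-I/O service cost plus per-I/O housekeeping."""
    return io_count * (media_cost + housekeeping_cost)

# Assumed per-I/O costs (arbitrary units, chosen for illustration):
# flash is cheap to service but carries heavy housekeeping; disk is the reverse.
flash_tier = controller_busy_cycles(10_000, media_cost=1, housekeeping_cost=4)
disk_tier = controller_busy_cycles(10_000, media_cost=3, housekeeping_cost=0)

print(flash_tier, disk_tier)
# The flash tier consumes more controller time than the disk tier for the
# same I/O count, and 80% of its cycles go to housekeeping, not I/O service.
```

This is the processing burden that, as described above, forced some sub-LUN implementations to throttle backend processes or cap flash capacity relative to controller power.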
HDS identified this issue some time ago and solved the problem on the VSP (and then the HUS VM, VSP G1000 and HUS 150) by offloading the flash housekeeping tasks to a custom flash controller located in their HAF modules (read more here). While such offloading capabilities have a profound impact on effectively utilising flash on their own, when combined with the architecture of the VSP family they allow the system to offload housekeeping tasks when the array is at its busiest in the top tier, thus freeing up processor cycles. This is pretty evident in the recent SPC-1 benchmark on the VSP G1000 (a result that could not have been achieved with regular SSDs), which can be seen here.
The second process that is still offloaded on the new VSP arrays is encryption. From a processing perspective encryption is not cheap, particularly when you're trying to achieve high operation rates such as those found in a storage array. Many storage vendors have felt the pinch when enabling encryption on loaded systems without first providing dedicated processing resources (a particular vendor really felt this in their midrange offering; you know who you are). The problem was large enough for Intel to introduce instruction sets for dealing with AES-based encryption. The use of CPU instructions, however, still requires processor cycles, and on a system constantly managing encryption/decryption tasks that is an ineffective use of resources. The alternative approach of self-encrypting drives (SEDs) alleviates this problem but creates performance problems of its own with spinning disk, which worsen as drive capacities increase. To address this challenge HDS is supporting encrypted backend directors for the VSP G400, G600 and G800 (sorry, nothing concrete about the G200 yet, but keep watching this space), which are currently available on the VSP G1000, to provide line-speed encryption at scale without compromising processor cycles.
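Some back-of-envelope arithmetic shows why software encryption is expensive at array-level throughput, even with Intel's AES-NI instructions. The cycles-per-byte figures below are illustrative assumptions (AES-NI is commonly cited in the low single digits of cycles per byte, versus tens of cycles per byte for pure-software AES), not measured values for any specific system.

```python
def cores_for_encryption(throughput_gbps, cycles_per_byte, cpu_ghz):
    """CPU cores needed to encrypt a sustained data stream in software.

    throughput_gbps: data rate to encrypt, in GB/s
    cycles_per_byte: encryption cost per byte
    cpu_ghz: clock speed of one core
    """
    bytes_per_sec = throughput_gbps * 1e9
    cycles_needed = bytes_per_sec * cycles_per_byte
    return cycles_needed / (cpu_ghz * 1e9)

# Assumed figures: 10 GB/s of array traffic on 2.5 GHz cores.
print(cores_for_encryption(10, 1.3, 2.5))  # with AES-NI: ~5.2 cores
print(cores_for_encryption(10, 25, 2.5))   # software-only: 100 cores
```

Even in the favorable AES-NI case, several whole cores are permanently consumed just moving bytes through the cipher, which is exactly the kind of steady, low-level work that line-speed hardware on a backend director takes off the table.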
So while the new VSP family members at heart use an Intel multi-core architecture for their core processing, they have been designed to support ASIC offload for the specific workload tasks where it matters most. This not only frees up processor cycles to manage enterprise software functionality rather than housekeeping tasks (which differentiates the platform from architectures that just pair Intel processors with flash in the backend), but also allows the platform to be extensible and scalable, which is required to meet diverse requirements such as those of social innovation solutions!
On the topic of architecture, this is where I will pick up in my next post (again hopefully tomorrow).