While flash arrays are a better fit than spinning disk for the highly random access demands of consolidated infrastructure and high performance analytics, flash technology requires more management in order to deliver consistent high performance and durability. There are clear distinctions between the vendors in how they leverage the advantages of flash (random access and low latency) and compensate for its weaknesses (durability and write amplification). Flash arrays have two basic components: the flash array controller and the flash media. Both the controller and the media must be optimized for the characteristics of flash in order to get the best performance and durability out of the technology. All flash array vendors can differentiate their technology in the array controller, but most must depend on the general market for flash device technology.
Enterprise disk array controllers have focused on increasing functionality for high availability, scalability, and performance through intelligence in the controller and caching, since there was very little that could be done with the disk media. Disks are mechanical devices that have had a phenomenal run in decreasing cost per unit of capacity, but very little could be done to increase their performance. In the almost 60 year history of the disk drive, mechanical performance parameters such as rotation and seek times have improved only about 3 times, while capacities have increased over 8 orders of magnitude.
Flash media is a programmable device, which is ideal for random access since it has neither rotational latency nor mechanical seek time. However, flash media is much more complex, since there are many more parameters to manage for performance and endurance than rotation and seek times. Unlike disks, data must be written to pages of cells that have been formatted with ones. The formatting is not done on an individual page basis but is done to blocks of pages. When a page in a block gets updated, it must be written to a new formatted page, and the old page is marked invalid and cannot be used. Eventually the block gets fragmented with invalid pages, and garbage collection is required to recover capacity. The remaining valid pages are read and rewritten to a new block before the old block is reformatted to ones.
Unlike disk, flash media degrades with each write, each read, and over time. Flash technology requires electrons to be captured in a floating gate, which is insulated by thin oxide layers. While flash is considered to be non-volatile, electrons are leaking all the time, and reading data applies a voltage, which drives out more electrons. This requires the contents of a block to be refreshed periodically onto other preformatted pages. This rewriting of flash pages, in addition to the rewriting of pages for garbage collection, is called housekeeping and causes “write amplification”, which shortens the life of the flash media and impacts performance.
Other housekeeping functions that need to be processed in the flash device include page-to-block mapping, wear leveling, compression, extended error correction codes, the management of spare pages, and monitoring of performance and durability. With the single core processors that are found in most SSDs, housekeeping has the greatest impact on performance, since it blocks I/O access while it moves pages around in order to reformat blocks. IOPS may be degraded by 75% while housekeeping is in process.
When executing a performance test on flash arrays, it is important to prime the array first by writing to all the flash cells several times, to simulate how the array would operate under normal conditions with housekeeping.
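The garbage collection and priming behavior described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model of any real SSD or of Hitachi's device: the class `FlashSim`, the helper `run_demo`, the block and page counts, the number of spare blocks, and the greedy "fewest valid pages" victim policy are all hypothetical choices made for illustration.

```python
import random


class FlashSim:
    """Toy flash device: out-of-place page writes with greedy garbage collection."""

    def __init__(self, num_blocks=64, pages_per_block=32, spare_blocks=8):
        self.num_blocks = num_blocks
        self.ppb = pages_per_block
        # Host-visible capacity excludes the over-provisioned (spare) blocks.
        self.logical_pages = (num_blocks - spare_blocks) * pages_per_block
        self.valid = [set() for _ in range(num_blocks)]  # valid logical pages per block
        self.used = [0] * num_blocks             # pages programmed since the last erase
        self.where = {}                          # logical page -> block with its valid copy
        self.active = 0                          # block currently being filled
        self.free = list(range(1, num_blocks))   # erased blocks ready for use
        self.host_writes = 0
        self.flash_writes = 0

    def _program(self, lpn):
        """Write one page into the active block, opening a new block if it is full."""
        if self.used[self.active] == self.ppb:
            self.active = self.free.pop()
        self.used[self.active] += 1
        self.valid[self.active].add(lpn)
        self.where[lpn] = self.active
        self.flash_writes += 1

    def _collect(self):
        """Relocate valid pages out of the emptiest full block, then erase it."""
        full = [b for b in range(self.num_blocks)
                if b != self.active and self.used[b] == self.ppb]
        victim = min(full, key=lambda b: len(self.valid[b]))
        for lpn in list(self.valid[victim]):
            self.valid[victim].discard(lpn)
            self._program(lpn)                   # housekeeping write, not a host write
        self.used[victim] = 0                    # erase ("reformat") the block
        self.free.append(victim)

    def write(self, lpn):
        """Host write: invalidate the old copy, program a new page, collect if needed."""
        self.host_writes += 1
        old = self.where.get(lpn)
        if old is not None:
            self.valid[old].discard(lpn)
        self._program(lpn)
        while len(self.free) < 2:                # keep an erased block in reserve
            self._collect()

    def write_amplification(self):
        return self.flash_writes / self.host_writes


def run_demo(overwrites=20000, seed=1):
    """Return (write amplification on first fill, write amplification once primed)."""
    sim = FlashSim()
    for lpn in range(sim.logical_pages):         # first pass over a fresh device
        sim.write(lpn)
    fresh = sim.write_amplification()
    sim.host_writes = sim.flash_writes = 0       # measure the primed phase separately
    rng = random.Random(seed)
    for _ in range(overwrites):                  # random overwrites force garbage collection
        sim.write(rng.randrange(sim.logical_pages))
    return fresh, sim.write_amplification()


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the first pass over a fresh device needs no garbage collection, so its write amplification is exactly 1.0. Once every logical page has been written and random overwrites begin, relocation writes push the ratio above 1.0, which is why an unprimed device can look better in a benchmark than it would in steady state. The exact steady-state figure depends on the assumed parameters.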
Therefore the performance of a flash array depends not only on the control path and caching in the array controller, but even more on the processing in the flash media. When flash was initially introduced for enterprise storage, many vendors used the same SSDs that were designed for commodity PCs and servers and attached them to storage controllers that had been optimized over the years for spinning disk. This did not leverage the full capabilities of flash technology in either the controller or the media.
In the past few years, many new startups have brought out all flash arrays (AFAs) in which the controllers are optimized for SSDs. Some have also designed the controllers to scale out over InfiniBand or Ethernet in order to increase capacity and performance. Currently the scale out bandwidth is on the order of 40 Gb/s, over which global access is provided for metadata and data. Some try to compensate for the impact of housekeeping by buffering in the controllers and writing full blocks of pages to the device, essentially doing the housekeeping in the controller. Doing housekeeping in the controller has the disadvantage that the controller is not aware of changes initiated in the device for refresh or error correction purposes. New AFA developers also do not have years of experience in developing enterprise controllers; as a result, many functions that are expected in enterprise arrays are lacking, or features like replication require additional appliances.
Hitachi Data Systems supports SSDs with code in our enterprise storage controllers that is optimized for significantly improved I/O processing, higher internal thread count, and faster internal data movement, without compromising the many enterprise storage services and features that were already developed. Our G1000 storage array scales out internally over a high performance internal switch, which supports 786 GB/s of user data bandwidth and 128 GB/s of control or metadata bandwidth (note that these figures are in gigabytes per second, GB/s, and not gigabits per second, Gb/s).
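To make the unit distinction concrete, the short sketch below converts between gigabits and gigabytes per second using the figures quoted in the text. It assumes decimal units and 8 bits per byte, and it ignores any link encoding or protocol overhead; the function and variable names are illustrative only.

```python
def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbit_per_s / 8.0


# Figures quoted in the text.
scale_out_link_gbit = 40.0      # scale out interconnect, in Gb/s
internal_data_gbyte = 786.0     # internal switch, user data, in GB/s
internal_control_gbyte = 128.0  # internal switch, control and metadata, in GB/s

scale_out_link_gbyte = gbit_to_gbyte_per_s(scale_out_link_gbit)
internal_data_gbit = internal_data_gbyte * 8.0

print(f"{scale_out_link_gbit:.0f} Gb/s is {scale_out_link_gbyte:.0f} GB/s")
print(f"{internal_data_gbyte:.0f} GB/s is {internal_data_gbit:.0f} Gb/s")
```

Expressed in the same unit, a 40 Gb/s link carries 5 GB/s. The two numbers describe different parts of different architectures, so the conversion is only a guide to scale.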
In addition to these enhancements to the controller, Hitachi also built its own flash device to optimize the performance and durability of flash technology.
Recognizing the increased processing demands of flash media, Hitachi developed a Flash Media Device (FMD) with a quad core processor, 8 embedded lanes of PCIe v2.0 that interface to a SAS target mode controller, and 32 parallel paths to the flash DIMMs. This approach enables the parallel processing of multiple tasks. Most important is the removal of housekeeping tasks (garbage collection, wear leveling, ECC, and so forth) from the I/O path, which eliminates the performance degradation caused by blocked host I/O.
The FMD also eliminates many of the writes, which increases endurance and performance. The flash controller chip supports a feature called “block write avoidance”. The algorithm recognizes, in real time, any data stream of all "0s" or all "1s" and remaps the data with a pointer instead of writing it to flash. This is especially helpful in the case of parity group formatting, where most of the writes are eliminated and the space savings are used to increase the over provisioned area for other writes.
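The idea of remapping a uniform data stream with a pointer can be sketched as follows. The text does not describe the FMD's actual algorithm or data structures, so everything here is an assumption made for illustration: the class `PatternAwareStore`, its mapping table layout, and the counters are hypothetical.

```python
class PatternAwareStore:
    """Toy mapping layer that records all-0 or all-1 writes as a marker, not as data."""

    def __init__(self):
        self.table = {}              # lba -> ("fill", byte_value, length) or ("data", bytes)
        self.flash_page_writes = 0   # writes that would actually reach the flash media
        self.avoided_writes = 0      # writes satisfied by a table entry alone

    def write(self, lba: int, data: bytes) -> None:
        if not data:
            raise ValueError("empty write")
        if data == bytes(len(data)):                # every bit is 0
            self.table[lba] = ("fill", 0x00, len(data))
            self.avoided_writes += 1
        elif data == b"\xff" * len(data):           # every bit is 1
            self.table[lba] = ("fill", 0xFF, len(data))
            self.avoided_writes += 1
        else:
            self.table[lba] = ("data", data)        # stands in for a real flash write
            self.flash_page_writes += 1

    def read(self, lba: int) -> bytes:
        entry = self.table[lba]
        if entry[0] == "fill":
            _, value, length = entry
            return bytes([value]) * length          # regenerate the pattern on read
        return entry[1]


store = PatternAwareStore()
store.write(0, bytes(4096))            # formatting-style write of all zeros
store.write(1, b"\xff" * 4096)         # all ones
store.write(2, b"user data")           # ordinary data
```

In this sketch only the third write counts as a flash write. The first two are recorded as table entries and regenerated on read, which mirrors how a format pass that writes uniform data would consume almost no flash pages.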
By extending the ECC (Error Correction Code) to 42 bits of ECC for every 1KB of data written, the FMD is able to correct up to 42 bits per 1.4KB, which exceeds the standard MLC spec of 24. Although the extended ECC requires more processing cycles, it extends the readability of pages and delays the need to rewrite the page, reducing write amplification. The data diagnostic and read retry functions of the FMD are partnered with the periodic data diagnostic and recovery functions resident within Hitachi storage controllers. If the bit errors exceed the ECC correction capability of the FMD, then the data is read out by the read retry function. This function adjusts the parameters of the flash memory and reads the data again. The area is then refreshed, meaning that the data is read and copied to a different area before it becomes unreadable.
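The read, retry, and refresh sequence described above can be sketched as a simple decision flow. This is a hedged illustration only: the function `read_with_retry`, the `ReadResult` type, and the idea of passing in one simulated bit error count per read attempt are assumptions, and the only value taken from the text is the 42 bit correction limit.

```python
from dataclasses import dataclass
from typing import Sequence

ECC_CORRECTABLE_BITS = 42  # correction limit quoted in the text


@dataclass
class ReadResult:
    ok: bool              # True if some attempt was within the ECC correction limit
    attempts: int         # number of reads performed, including retries
    refresh_needed: bool  # True if the data should be copied to a different area


def read_with_retry(error_bits_per_attempt: Sequence[int],
                    limit: int = ECC_CORRECTABLE_BITS) -> ReadResult:
    """Simulate a read where each retry uses adjusted flash read parameters.

    error_bits_per_attempt[0] is the bit error count seen with default parameters;
    later entries are the counts seen after each parameter adjustment.
    """
    for attempt, bits in enumerate(error_bits_per_attempt, start=1):
        if bits <= limit:
            # A success that needed a retry marks the area for refresh.
            return ReadResult(ok=True, attempts=attempt, refresh_needed=attempt > 1)
    # Nothing worked at the device level; recovery is left to the storage controller.
    return ReadResult(ok=False, attempts=len(error_bits_per_attempt),
                      refresh_needed=False)


clean = read_with_retry([10])        # within the ECC limit on the first read
aged = read_with_retry([60, 30])     # first read fails, retry succeeds
failed = read_with_retry([60, 70])   # every attempt exceeds the ECC limit
```

Here `aged` succeeds on the second attempt and is flagged for refresh, while `failed` represents the case that would fall through to the controller level recovery functions mentioned in the text.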
Hitachi was also able to support the first version of the FMDs with 1.6 TB of usable capacity, while SSDs at that time were limited to 400 and 600 GB due to the lack of processing power in the media device. Currently Hitachi’s FMDs are shipping in 1.6 and 3.2 TB capacities. With the combination of Hitachi Accelerated Flash code in our storage controllers and FMD devices, we were able to achieve the highest SPC-1 performance benchmark results in February of this year.
The flat performance curve shown above is the result of improvements in both the controller software and the FMD. No other vendor can claim 2,004,941.89 SPC-1 IOPS with a response time of 0.96 ms under 100% load from an SPC-1 benchmark. This was accomplished with only 64 FMDs, and the list price for the full enterprise class configuration came in at USD $1.00 per SPC-1 IOPS. Other flash array vendors are limited to improvements in the controller and depend on off-the-shelf flash media, which limits their ability to provide the optimum performance for flash technology. While this may be all right for application specific purposes, it does not support the requirements for enterprise arrays.
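The benchmark figures above lend themselves to a quick back-of-the-envelope calculation. The sketch below uses only the numbers quoted in the text. The per-FMD value is a simple average across the configuration and not a measured per-device result, and the implied list price is approximate because the quoted price per IOPS is a rounded figure.

```python
# Figures quoted in the text.
spc1_iops = 2_004_941.89       # SPC-1 IOPS at 100% load
fmd_count = 64                 # FMDs in the tested configuration
price_per_iops_usd = 1.00      # rounded list price per SPC-1 IOPS

# Average contribution per FMD (the controllers also shape this result).
iops_per_fmd = spc1_iops / fmd_count

# Approximate list price implied by the rounded price per IOPS.
implied_list_price_usd = spc1_iops * price_per_iops_usd

print(f"Average SPC-1 IOPS per FMD: {iops_per_fmd:,.0f}")
print(f"Implied list price: about USD {implied_list_price_usd:,.0f}")
```

This works out to roughly 31,000 SPC-1 IOPS per FMD on average and an implied list price of about USD 2 million for the configuration.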
Using custom flash modules, Hitachi is able to provide greater integration, performance, and density than off-the-shelf SSDs. AFA vendors will argue that using custom modules is effectively betting against the entire SSD industry, where companies with decades of experience building enterprise logic and media chips, as well as startups, attract some of the best engineers in their fields. Hitachi has been in the business of building enterprise logic and media chips as long as anyone else and continues to attract many of the best engineers. In order to leverage their costs, SSD vendors have targeted the commercial markets with the higher volumes and have been slow to invest in the processing capabilities required for enterprise performance, endurance, and capacity. Hitachi will continue to invest in developing leading edge non-volatile memory devices.
In addition to FMDs, Hitachi also supports smaller capacity SSDs as well as SAS disks, to enable customers to optimize their storage for different business requirements. Hitachi recognizes that there is an opportunity for lower cost, application specific AFAs that can deliver value using commercial grade SSDs. Although AFAs are limited in scalability and in enterprise functions like tiering and replication, they can be effective for tactical deployments of a single application or a limited number of applications of moderate size.