Hu Yoshida

What’s Required For 142 Year Data Retention?

Blog Post created by Hu Yoshida on May 21, 2015

I became a grandfather for the second time just as this issue of Time Magazine was published. I now have two grandsons and hope to see them grow up to carry on the family name.



When I thought about this article, I was staggered by the records retention requirements for their lifetimes. My youngest grandson has had more pictures taken of him in his first few months than I probably had in my whole life, and there must be hundreds of copies of those pictures and videos in the cloud. If he is one of the lucky ones who live to 142 years, all his medical history will be retained for his entire life plus 20 years, or whatever the retention requirements are at that time. Only the cloud would be able to retain that much data over such a long period of time.


In 2006, SNIA set up a task force to address the challenges of a 100-year archive. We have already blown through the 100-year timeline. In 2006, long-term generally meant greater than 10 to 15 years, a period beyond which multiple physical media and logical format migrations must take place. Overall, those surveyed felt that current practices were too manual, error-prone, and costly, and that they lacked adequate coordination across the organization. In addition, information classification and collaboration between those who own information and the groups that administer it were both recognized as very important practices that needed to be addressed.


Today there is a solution for this massive data retention challenge. The solution comes in the form of the next-generation cloud and object storage.


Hitachi Data Systems commissioned The Economist to assess the state of cloud today, research the lessons learned, and recommend best practices to prepare for the next-generation cloud. They found that “Cloud computing has matured considerably since the early days when it was a largely unproven form of IT service delivery, provided by companies that were relatively new to IT.” Now the cloud offers a wider range of services, with improved security practices and improved customer service. The next generation of cloud will be a core component of our corporate and personal landscape. Even banks are moving to public cloud so that they can invest more in innovation than in infrastructure. The commercial incentives for companies to move to the cloud are being driven by improved business performance and the ability to migrate data across generations of infrastructure.


So that answers the aging infrastructure challenge, but how do you handle the information classification and organization of data so that you can find that one picture or medical report in an ocean of data 100 or more years from now?


Greg Knieriemen blogs about his view on cloud and comes to the conclusion that “...control of data is one of the most significant features of a cloud solution.”

How do you know that the data hasn’t changed, that it has been kept secure, and that you can show a chain of custody during all those years of retention? That is where an object store like Hitachi’s Content Platform (HCP) comes in. HCP virtualizes the data from the application so that it can be used by any application using open Ethernet interfaces. HCP indexes the data during ingestion and appends metadata that describes the content and the policies that govern the data. The data is hashed on ingestion so that immutability can be proven by comparing the hash on access. Additional metadata can be appended when needed, and data is encrypted, compressed, single-instanced, and finally scrubbed when retention is no longer required 160 years from now. All this activity is logged so you always know what has happened to the data and metadata.
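To make the hash-on-ingest idea concrete, here is a minimal sketch in Python. It is not HCP's API or implementation: the function names, the in-memory dictionary standing in for the object store, and the choice of SHA-256 are all assumptions made for illustration, since this post does not say which hash HCP uses.

```python
import hashlib
from datetime import datetime, timezone


def ingest(store: dict, audit_log: list, object_id: str,
           data: bytes, metadata: dict) -> str:
    """Store an object with its digest and metadata (hypothetical helper)."""
    # SHA-256 is an assumption; any strong cryptographic hash would do.
    digest = hashlib.sha256(data).hexdigest()
    store[object_id] = {"data": data, "sha256": digest,
                        "metadata": dict(metadata)}
    audit_log.append({"event": "ingest", "object_id": object_id,
                      "at": datetime.now(timezone.utc).isoformat()})
    return digest


def read_verified(store: dict, audit_log: list, object_id: str) -> bytes:
    """Return the object only if it still matches the digest taken at ingest."""
    record = store[object_id]
    unchanged = hashlib.sha256(record["data"]).hexdigest() == record["sha256"]
    # Every access is logged, whether or not verification succeeds.
    audit_log.append({"event": "read", "object_id": object_id,
                      "verified": unchanged,
                      "at": datetime.now(timezone.utc).isoformat()})
    if not unchanged:
        raise ValueError(f"{object_id}: content no longer matches ingest hash")
    return record["data"]
```

If a single byte of the stored object changes, the recomputed digest no longer matches the one recorded at ingest, and the read fails instead of silently returning altered data. The audit log gives a simple record of who touched what and when, which is the raw material for a chain of custody.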




What about the longevity of HCP itself? Software must be able to evolve over time without disrupting its users’ access to legacy data. HCP has been able to do this since it was announced 8 years ago as the Hitachi Content Archive Platform. The name change from HCAP to HCP expanded its direction from an active archive to an object storage system. The core of an object store is a search engine. When HCAP was developed, this search engine was based on FAST (Fast Search And Transfer), a product of a Norwegian company of the same name. Microsoft acquired FAST in 2008, and shortly thereafter the index platform was transitioned to Solr, which is an open source enterprise search platform. The conversion of the core search engine in HCP was done without disruption to HCAP/HCP customers. Another example of how HCP is evolving was the introduction of the HCP S10 storage node. Instead of FC SAN attached storage, the S10 is connected to HCP nodes through the S3 (Simple Storage Service) protocol. The S10 also uses erasure coding, which enables the use of low-cost, high-capacity commodity disks with the assurance of high availability.
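For readers who have not met erasure coding before, the sketch below shows the simplest possible instance: one XOR parity block that lets any single lost block be rebuilt from the survivors. This post does not say which code the S10 uses, so this is an illustration of the principle only; production systems generally use stronger codes that survive several simultaneous failures. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """XOR a list of equal-length byte blocks together, position by position."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list) -> list:
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe: list, lost_index: int) -> bytes:
    """Rebuild the block at lost_index from all the other blocks in the stripe."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

Each block in the stripe would live on a different disk. Because the XOR of all blocks in a stripe is zero, XOR-ing the surviving blocks yields exactly the missing one, so losing any one inexpensive disk does not lose data.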


HCP can also move data from cloud to cloud in the event that your current cloud vendor is not in business 100 years from now. Hitachi has been in business for 105 years, and we expect that Hitachi will continue to be in business for the next few hundred years and more. While our time here is short, our data will live on.