Hu Yoshida

Digital Debris and Information Governance

Blog post created by Hu Yoshida on Jan 26, 2017


Digital data is exploding all around us, with exponential growth ahead as more IoT applications come online. Many companies are storing petabytes of data, and some are looking at exabytes in the near future. In fact, we will not be able to store all the digital data that is being created. I blogged about this in April of last year, where I noted that today we can only store about a third of the digital data that is created. In three short years, by 2020, when there will be about 44 zettabytes of digital data, we will only be able to store 6.6% of it (roughly 2.9 zettabytes) because of limits on our ability to deliver storage hardware and software; the rest falls off the edge. (44 zettabytes is estimated to be more than all the stars in the universe.) So the question will be: what can we store, and what falls off the edge?


We also have to look at how much of the data we store is actually necessary for us to retain. A survey done by the Compliance, Governance and Oversight Council (CGOC) in 2012 showed that most companies retain significantly more information than they need for business or legal reasons. According to this survey:

  1. Only 1% of information being retained by the companies surveyed was subject to legal hold requirements (i.e., required to be preserved because it related to the subject matter of actual or reasonably anticipated litigation or regulatory proceeding).
  2. Only 5% was subject to regulatory retention requirements.
  3. Only 25% had temporary business value.

The remaining 69% of information being retained by the companies was, in effect, “data debris,” information having no current business or legal value. 
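
The 69% figure follows directly from the three survey percentages. A minimal sketch of the arithmetic, assuming the three categories do not overlap (the function names and the 1,000 TB estate are hypothetical, chosen only for illustration):

```python
# Shares of retained information reported in the 2012 CGOC survey, in percent.
retained_with_value = {
    "legal hold": 1,
    "regulatory retention": 5,
    "temporary business value": 25,
}


def debris_share(shares):
    """Percentage left once every category with value is subtracted.

    Assumes the categories are disjoint, which the survey summary implies.
    """
    return 100 - sum(shares.values())


def debris_volume(total_tb, shares):
    """Estimated debris, in terabytes, for a storage estate of total_tb."""
    return total_tb * debris_share(shares) / 100


print(debris_share(retained_with_value))          # 69
print(debris_volume(1000, retained_with_value))   # 690.0
```

In other words, if these proportions held for a hypothetical 1,000 TB estate, about 690 TB of it would be data debris.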


I have not been able to find a more recent survey, but I suspect that with the explosion of data since 2012 and the increasing use of mobile, social, cloud and analytics, the proportion of data debris is much higher today.


EDRM (Electronic Discovery Reference Model) is a coalition of consumers and providers that has been working together since 2005 to create practical resources to improve e-discovery and information governance. In 2014 EDRM published a white paper, Disposing of Digital Debris, which is still applicable today. In this white paper they provide recommendations on defining and identifying digital debris and propose a coherent Information Governance strategy for cleaning it up. “A digital disposal program must be defensible and requires:


  • People: Leadership and commitment to guide transformational change
  • Policies and Processes: Rules, regulations and procedures that link information duties and value to data assets; and information demand to infrastructure supply.
  • Technology: Tools that enable IT to implement and execute information governance policies and procedures.


With this three step approach, organizations can begin to reduce the risk and overhead costs associated with the risky retention of digital debris.”


This white paper emphasizes that while most organizations have records management policies in place and published on a website, those policies do no good unless technology is put in place to support them. The technology tools to implement and execute the policies are often missing. The paper recommends that organizations leverage technology that can “automate legal holds, records retention, de-duplication, storage tiering, and deletion of data with no business, legal or regulatory value. To simplify overall implementation, it is desirable to use technologies that support a number of these capabilities within a single platform.

Ideally, the chosen technology platform must also provide a central catalog itemizing the classes of and sources of data of end-users. Policy makers in legal, records, business and compliance must be able to view, understand and share this catalog.”

This description is exactly what can be done with the Hitachi Content Platform and Hitachi Content Intelligence. This solution helps you gain actionable business insights with intelligent exploration of all your data so you can:

  • Locate and identify the most relevant data regardless of its type or location
  • Identify data value with automated cataloging, transformation and augmentation
  • Access relevant data with richer context available where and when you need it

The combined solution’s WORM functionality, together with each object's unique identifier (or digital fingerprint), guarantees immutability and protects records from inadvertent or deliberate overwriting. Once fixed-content data is in the repository, it cannot be modified. Deletions or unintended changes before the retention period expires are prevented by object versioning protection. To modify an object, HCP allows a new, different object to be created from the original.


Since HCP can store multiple versions of an object, this provides a history of how the data has changed over time. Each version is an object in its own right, with system metadata and, optionally, custom metadata. If a copy of a record already in storage is sent for storage by the controlling application, the system will identify that it is a copy, because its fingerprint will be the same as the original, and will “block” the copy’s entry into storage. This means that the storage system cannot accidentally store duplicates of records.
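
The duplicate check described above can be sketched in a few lines. This is an illustration of the general fingerprinting idea, not HCP's implementation: the `FingerprintStore` class is hypothetical, and the use of SHA-256 as the fingerprint is an assumption, since the hash HCP uses is not stated here.

```python
import hashlib


class FingerprintStore:
    """Toy content store that refuses to keep two copies of the same bytes."""

    def __init__(self):
        self._objects = {}  # fingerprint -> stored bytes

    def put(self, data: bytes):
        """Store data unless an identical copy exists.

        Returns (fingerprint, stored), where stored is False when the
        copy was blocked because its fingerprint was already present.
        """
        # Assumption: SHA-256 stands in for the platform's fingerprint.
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint in self._objects:
            return fingerprint, False
        self._objects[fingerprint] = data
        return fingerprint, True

    def __len__(self):
        return len(self._objects)


store = FingerprintStore()
first_id, first_stored = store.put(b"contract v1")
copy_id, copy_stored = store.put(b"contract v1")
print(first_stored, copy_stored, len(store))  # True False 1
```

Because the fingerprint is derived from the content alone, the second submission is recognized as a copy no matter what name or path the controlling application gives it.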


The HCP shredding function ensures that no trace of a record is recoverable from disk after deletion. It overwrites deleted files with a random pattern, a technique that complies with the internationally recognized United States Department of Defense (DoD) specification 5220.22-M. Shredding can be performed on individual objects or configured to follow the deletion governance policies in place.
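
To make the overwrite-then-delete idea concrete, here is a minimal sketch for an ordinary file. It is not HCP's shredder and makes no claim of compliance with any specification: the function name `shred_file` and the default of three passes are assumptions for illustration. Note also that on SSDs and copy-on-write file systems an in-place overwrite may not reach the original physical blocks.

```python
import os


def shred_file(path, passes=3, chunk_size=1024 * 1024):
    """Overwrite a file with random bytes, then remove it.

    passes and chunk_size are illustrative defaults, not values taken
    from any standard.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                step = min(chunk_size, remaining)
                handle.write(os.urandom(step))  # random pattern
                remaining -= step
            handle.flush()
            os.fsync(handle.fileno())  # push this pass to the device
    os.remove(path)
```

Calling `shred_file("old-record.bin")` on a scratch file overwrites its contents in place and then unlinks it; a policy-driven system would invoke the same step automatically when a retention period ends.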


Some localities require that certain data be destroyed in response to changing circumstances. For example, companies may be required to destroy particular information about employees who leave. Privileged delete is an HCP feature that enables authorized users to delete objects even if they are under retention. With each privileged delete operation, the user is required to specify a reason. HCP logs all these operations, including the specified reasons, thereby creating an audit trail.
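
The interplay between retention, privileged delete and the audit trail can be sketched as follows. The `Repository` class, its method names and the log format are all hypothetical; they illustrate the behavior described above and are not the HCP API.

```python
from datetime import datetime, timezone


class RetentionError(Exception):
    """Raised when a normal delete hits an object still under retention."""


class Repository:
    def __init__(self, authorized_users):
        self._objects = {}  # name -> (data, retain_until)
        self._authorized = set(authorized_users)
        self.audit_log = []

    def __contains__(self, name):
        return name in self._objects

    def put(self, name, data, retain_until):
        self._objects[name] = (data, retain_until)

    def delete(self, name, now=None):
        """Ordinary delete: refused until the retention period has expired."""
        now = now or datetime.now(timezone.utc)
        _, retain_until = self._objects[name]
        if now < retain_until:
            raise RetentionError(f"{name} is retained until {retain_until}")
        del self._objects[name]
        self.audit_log.append({"action": "delete", "object": name})

    def privileged_delete(self, name, user, reason):
        """Delete despite retention; requires an authorized user and a reason."""
        if user not in self._authorized:
            raise PermissionError(f"{user} may not perform privileged delete")
        if not reason.strip():
            raise ValueError("a reason is required for privileged delete")
        del self._objects[name]
        self.audit_log.append({
            "action": "privileged_delete",
            "object": name,
            "user": user,
            "reason": reason,
        })
```

In this sketch the reason is validated before anything is removed, so an operation can never succeed without leaving a matching entry in `audit_log`.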


To support legal discovery, users and applications can place a hold on selected objects. While an object is on hold, it cannot be deleted through any mechanism, regardless of its retention setting. HCP facilitates complete and comprehensive monitoring and auditing of all events during the information life cycle. Object tracking and event logging are available for audit support. All delete actions are logged within HCP, and logs can be extracted using the system's auditing mechanisms.
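
A legal hold behaves like a flag that overrides every delete path. The sketch below shows that rule together with simple event logging; the `HeldObjectStore` class and its event names are hypothetical and only illustrate the behavior, not how HCP exposes it.

```python
class HoldError(Exception):
    """Raised when a delete is attempted on an object under legal hold."""


class HeldObjectStore:
    def __init__(self):
        self._objects = {}
        self._holds = set()
        self.events = []  # audit trail of every action, allowed or denied

    def __contains__(self, name):
        return name in self._objects

    def put(self, name, data):
        self._objects[name] = data
        self.events.append(("put", name))

    def place_hold(self, name):
        self._holds.add(name)
        self.events.append(("hold_placed", name))

    def release_hold(self, name):
        self._holds.discard(name)
        self.events.append(("hold_released", name))

    def delete(self, name, privileged=False):
        """Delete an object; a hold blocks both normal and privileged paths."""
        if name in self._holds:
            self.events.append(("delete_denied", name))
            raise HoldError(f"{name} is on legal hold")
        del self._objects[name]
        kind = "privileged_delete" if privileged else "delete"
        self.events.append((kind, name))
```

Denied attempts are recorded as well as successful ones, which is what lets the event list serve as evidence during discovery.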


I recommend that you read the EDRM white paper Disposing of Digital Debris and contact your Hitachi Data Systems representative to see how we can deliver the technology to execute information governance policies and procedures and clean up your digital debris based on your records management policies. The following is a quote from a customer who is using HCP with HCI.


Precision Discovery recognizes the value in HCP with Hitachi Content Intelligence in solving the parts of the litigation lifecycle that are the most difficult to manage. This platform allows us to build custom solutions that greatly reduce customer costs, increase visibility and manage risk. None of this is possible without the intelligence, extensibility and raw power of Hitachi Content Intelligence.

– Howard Holton, CIO, Precision Discovery