Hu Yoshida

Creating A Centralized Data Hub

Blog post by Hu Yoshida, April 10, 2017

In my trend predictions for 2017, I called out the movement to a centralized data hub for better management, protection and governance of an organization’s data.



“2017, the Year of the Rooster, teaches the lessons of order, scrutiny and strategic planning.”


Data is exploding, coming in from different sources as we integrate IT and OT (operational technology). It is also becoming more valuable as we find ways to correlate data from different sources to gain more insight, or repurpose old data for new revenue opportunities. But data can also be a liability if it is flawed, accessed by the wrong people, exposed, or lost, especially if we are holding that data in trust for our customers or partners. Data is our crown jewels, but how can we be good stewards of our data if we don’t know where it is: on someone’s mobile device, in an application silo, in an orphan copy, or somewhere in the cloud? How can we provide governance for that data without a way to prove immutability, show auditors who accessed it and when, and prove that the data was destroyed?


For these reasons, we see more organizations creating a centralized data hub for better management, protection and governance of their data. This centralized data hub will need to be an object store that can scale beyond the limitations of file systems, ingest data from different sources, cleanse that data, provide secure multitenancy, and offer extensible metadata that supports search and governance across public and private clouds and mobile devices. Scalability, security, data protection and long-term retention will be major considerations. Backups will become impractical and will need to be replaced by replication and versioning of updates. An additional layer of content intelligence can connect and aggregate data, transform and enrich it as it is processed, and centralize the results for authorized users to access. Hitachi Content Platform (HCP) with Hitachi Content Intelligence (HCI) can provide a centralized object data hub with a seamlessly integrated cloud-file gateway, enterprise file synchronization and sharing, and big data exploration and analytics.
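To illustrate why versioning plus replication can stand in for traditional backups, here is a minimal in-memory sketch. It is not HCP's implementation or API; the `VersionedBucket` class and its methods are hypothetical. Every write appends a new version instead of overwriting, so an earlier good copy can always be read back, and replication copies versions to a second store for availability.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    number: int
    data: bytes


@dataclass
class VersionedBucket:
    """Hypothetical in-memory bucket: every write adds a version, nothing is overwritten."""
    objects: Dict[str, List[Version]] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> int:
        # Append a new version rather than replacing the existing object
        versions = self.objects.setdefault(key, [])
        number = len(versions) + 1
        versions.append(Version(number, data))
        return number

    def get(self, key: str, version: Optional[int] = None) -> bytes:
        # Return the latest version by default, or a specific earlier one
        versions = self.objects[key]
        if version is None:
            return versions[-1].data
        if not 1 <= version <= len(versions):
            raise KeyError(f"{key} has no version {version}")
        return versions[version - 1].data

    def replicate_to(self, replica: "VersionedBucket") -> None:
        # One-way, append-only replication: copy any versions the replica lacks.
        # Assumes the replica is only ever written through replication.
        for key, versions in self.objects.items():
            target = replica.objects.setdefault(key, [])
            for v in versions[len(target):]:
                target.append(Version(v.number, v.data))
```

For example, if a file is later overwritten with corrupted content, `bucket.get("report.txt", version=1)` still returns the original bytes from either the primary or the replica, which is the recovery role a backup would otherwise play.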


Creating a centralized data hub starts with ingesting the data, which includes eliminating digital debris and cleansing flawed data. Studies have shown that 69% of the information retained by companies was, in effect, “data debris”: information with no current business or legal value. Other studies have shown that 76% of flaws in organizational data are due to poor data entry by employees. It is much better to move data quality upstream and embed it into the business process than to try to catch flawed data downstream and then resolve the flaw in all the different applications used by other people. Hitachi Content Intelligence can help you cleanse and correct the data you are ingesting and apply those corrections only to the aggregate index (leaving the source data in its original state), or apply the cleansing permanently when the intent of ingest and processing is to centralize the data on HCP through write operations.
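The two cleansing modes described above (correct only the aggregate index, or write the correction back) can be sketched in a few lines. This is a hypothetical illustration, not Hitachi Content Intelligence's API: the `cleanse` rules and the `write_back` flag are assumptions chosen to show the difference between the modes.

```python
import re
from typing import Dict, List


def cleanse(record: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical cleansing rules; returns a new dict, leaving the input untouched
    cleaned = {k: v.strip() for k, v in record.items()}
    if "email" in cleaned:
        cleaned["email"] = cleaned["email"].lower()
    if "phone" in cleaned:
        cleaned["phone"] = re.sub(r"\D", "", cleaned["phone"])  # keep digits only
    return cleaned


def ingest(source: List[Dict[str, str]], index: List[Dict[str, str]],
           write_back: bool = False) -> None:
    for i, record in enumerate(source):
        cleaned = cleanse(record)
        # The aggregate index always receives the cleansed record
        index.append(cleaned)
        if write_back:
            # Permanent correction, used when the data is being centralized
            source[i] = cleaned
```

With `write_back=False` the source keeps `" Alice@Example.COM "` while the index holds `"alice@example.com"`; with `write_back=True` both hold the corrected value.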


When data is written to Hitachi Content Platform, it is encrypted, stored as a single instance with secure multitenancy, tagged with system and custom metadata, and replicated for availability. The data is now centralized for ease of management and governance. RESTful interfaces enable connections to private and public clouds. HCP Anywhere and Hitachi Data Ingestor provide control over mobility and portability to mobile and edge devices. Hitachi Content Intelligence can explore, detect, and respond to data queries.
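Single-instance storage with per-tenant namespaces and custom metadata can be illustrated with a small content-addressed store. This is a sketch under stated assumptions, not how HCP is built: the `SingleInstanceStore` class is hypothetical, it uses a SHA-256 digest to detect identical content, and it omits encryption and replication entirely.

```python
import hashlib
from typing import Dict, Optional, Tuple


class SingleInstanceStore:
    """Hypothetical sketch: identical content is stored once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}               # digest -> content, stored once
        self.catalog: Dict[Tuple[str, str], dict] = {}  # (tenant, name) -> metadata

    def write(self, tenant: str, name: str, data: bytes,
              custom_metadata: Optional[dict] = None) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Only the first copy of identical content is actually kept
        self.blobs.setdefault(digest, data)
        self.catalog[(tenant, name)] = {
            "digest": digest,
            "size": len(data),                     # system metadata
            "custom": dict(custom_metadata or {}),  # custom metadata, e.g. retention class
        }
        return digest

    def read(self, tenant: str, name: str) -> bytes:
        # A tenant can only resolve names inside its own namespace
        entry = self.catalog[(tenant, name)]
        return self.blobs[entry["digest"]]
```

If two tenants write the same bytes, the content is stored once, yet each tenant sees only its own names and metadata; reading another tenant's name raises `KeyError`.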


[Figure: The HCP suite of products]


Scott Baker, our Senior Director for the Emerging Business Portfolio, recently presented a webcast on using this HCP suite of products to support the GDPR (General Data Protection Regulation), which takes effect on May 25, 2018 and will have a major impact on organizations that do business in EU countries. The transparency and privacy requirements of the GDPR cannot be met when data is spread across silos of technology and workflows. (You can view this webcast on BrightTalk.)


In the webcast, he described how Rabobank used this approach to consolidate multiple data sources and monitor communications for regulatory compliance.

[Figure: Rabobank architecture]

Rabobank is subject to a wide range of strict government regulations, with penalties for non-compliance, across various jurisdictions. It also had too many independently managed data silos, including email, voice and instant messaging, with some data stored on tape. The compliance team relied on IT for investigations, which limited its ability to respond and to run iterative queries. Regulatory costs were soaring because of the resources required to carry out data investigations across silos. The results of implementing the HCP suite of products are shown in the slide below.

[Figure: Rabobank results]

For more information on this centralized data hub use case, see this PDF.