Hu Yoshida

2015 Trends: Big Data, Internet of Things, Data Lakes, and Hybrid Cloud

Blog Post created by Hu Yoshida on Dec 8, 2014



This is the next-to-last post in my blog series on top trends for 2015. I will follow up with my final post after the webcast with George Crump of Storage Switzerland on December 10. The three trends in this post cover Big Data and the Internet of Things (IoT), Data Lakes, and the growing use of Hybrid Cloud, all of which support Business Defined IT.



8. Big Data And Internet of Things

IDC has predicted that big data will grow at a 27% CAGR to $32.4B through 2017, about six times the growth rate of the overall information and communication technology market. Other analysts, like Wikibon, are even more bullish, predicting revenues of $53.4B by 2017 as more businesses begin to realize real benefits from Big Data analytics. 2015 will continue to see solid growth in Big Data analytics tools like SAP HANA and Hadoop, which can deliver results in a matter of minutes or hours as opposed to days. Preconfigured converged and hyper-converged platforms will speed implementation of Big Data applications.
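To make the growth arithmetic concrete, here is a small sketch of how a compound annual growth rate (CAGR) relates a starting market size to a forecast. The five-year horizon is an assumption made only for illustration, since the forecast's base year is not given here, and the function names are hypothetical.

```python
def project(base, cagr, years):
    """Value after compounding `base` at rate `cagr` for `years` years."""
    return base * (1 + cagr) ** years


def implied_base(target, cagr, years):
    """Starting value that reaches `target` after `years` of compounding."""
    return target / (1 + cagr) ** years


# Assumption: a five-year forecast window (the base year is not stated above).
YEARS = 5
base = implied_base(32.4, 0.27, YEARS)  # figures in $B
print(f"Implied starting market size: ${base:.1f}B")
print(f"Growth multiple over {YEARS} years: {(1 + 0.27) ** YEARS:.2f}x")
```

Changing YEARS shows how sensitive the implied starting point is to the length of the forecast window.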




While the Big Data of today is more about business data with the addition of social sentiment, the Big Data of tomorrow will be more about the Internet of Things (IoT) and machine-to-machine communications, which will have a bigger impact on our lives. The Internet of Things will help us use information technology to solve problems in carbon footprint, transportation, energy, smart cities, public safety, and life sciences. The new world of IoT will create an explosion of new information, which can be used to create a better world. Batch analytics will give way to streaming analytics for real-time analysis of sensor data, and more intelligence will be incorporated in edge ingestors. Applications built around the Internet of Things will be introduced by companies that have expertise in sensor analysis and in verticals like surveillance and healthcare. In 2015, IT companies will be partnering with social infrastructure companies to realize the potential of an IoT world.
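The difference between batch and streaming analytics can be illustrated with a short sketch. Rather than collecting all sensor data and analyzing it later, a streaming component evaluates each reading as it arrives against a small window of recent values. The class name, window size, threshold, and readings below are all hypothetical and chosen only for illustration.

```python
from collections import deque


class SlidingWindowMonitor:
    """Keeps the last `size` readings and flags values far from their mean."""

    def __init__(self, size, threshold):
        self.window = deque(maxlen=size)
        self.threshold = threshold

    def ingest(self, reading):
        """Process one reading on arrival; return True if it looks anomalous."""
        anomalous = False
        # Only judge a reading once the window holds enough history.
        if len(self.window) == self.window.maxlen:
            mean = sum(self.window) / len(self.window)
            anomalous = abs(reading - mean) > self.threshold
        self.window.append(reading)
        return anomalous


monitor = SlidingWindowMonitor(size=3, threshold=5.0)
readings = [20.0, 21.0, 19.0, 20.5, 35.0, 21.0]  # made-up sensor values
alerts = [r for r in readings if monitor.ingest(r)]
print(alerts)
```

Because the monitor holds only a fixed number of readings, logic of this kind is small enough to run close to the sensor, which is one way to read "more intelligence in edge ingestors".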


Hitachi Data Systems has already started down this path by partnering with other divisions in Hitachi. For instance, HDS is partnering with Clarion, a member of the Hitachi Group and an in-vehicle information solution provider. We have announced a research and development partnership for the deployment of new data-driven solutions in the next generation of Clarion in-vehicle connectivity products. This collaboration will give drivers, insurance companies, and manufacturers usable insights that will lead to improved auto performance and safety, increasing value across the growing market for connected cars.



9. Data Lake for Big Data Analytics

While there will continue to be a high demand for scale-up enterprise storage and compute systems, the growth of unstructured data and the value it has for Big Data analysis will require new types of distributed, scale-out storage and compute systems. Pentaho CTO James Dixon is credited with coining the term “data lake”: “If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” These “data lake” systems will hold massive amounts of data and be accessible through file and web interfaces. Data protection for data lakes will consist of replicas and will not require backup, since the data is not updated. Erasure coding will be used to protect large data sets and enable fast recovery. Open source will be used to reduce licensing costs, and compute systems will be optimized for MapReduce analytics. Automated tiering will be employed to meet performance and long-term retention requirements. Cold storage, which does not require power for long-term retention, will be introduced in the form of tape or optical media.
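As a rough illustration of the idea behind erasure coding, the sketch below uses the simplest possible scheme: the data is split into k blocks and a single XOR parity block is added, so any one lost block can be rebuilt from the survivors. This is a deliberately simplified assumption. Production systems typically use stronger codes such as Reed-Solomon that tolerate several simultaneous losses, and the function names and sample data here are hypothetical.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def encode(data, k):
    """Split `data` into k equal blocks (zero-padded) plus one parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(blocks, missing):
    """Rebuild the block at index `missing` from the surviving blocks."""
    survivors = [b for i, b in enumerate(blocks) if i != missing]
    return xor_blocks(survivors)


data = b"sensor log 2015-01-01"  # made-up payload
stored = encode(data, k=4)       # 4 data blocks + 1 parity block
lost = 2                         # pretend the node holding block 2 failed
rebuilt = recover(stored, lost)
print(rebuilt == stored[lost])
```

With four data blocks and one parity block, the storage overhead is 25 percent, compared with 100 percent or more for keeping full replicas. That trade-off is why erasure coding is attractive for very large data sets.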



10. Hybrid Cloud Gains Traction

The adoption of hybrid clouds, the combination of private and public clouds, is gaining momentum. Analysts like Tech Pro Research suggest that 70% of organizations are either using or evaluating the use of hybrid clouds. With the growing competition among public cloud providers, the lower cost of WAN bandwidth, and the ability to control the movement of data to the public cloud with metadata that is retained within the firewalls of a private cloud, the hybrid cloud becomes a cost-effective platform for running enterprise workloads. Much of the data that is created for Big Data and IoT will not be frequently accessed and would be suitable for low-cost storage in a public cloud. The data could be sent to a public cloud using RESTful protocols while the active metadata remains in a private cloud, behind the firewall. Object storage systems like the Hitachi Content Platform (HCP) enable the automatic tiering of data into a public cloud while maintaining the encryption and metadata control within a private cloud. It is very inexpensive to store data in a public cloud as long as you do not access it. But while you store the data in the public cloud, you want to retain control.

Here are some considerations for using a hybrid cloud. You want to retain the metadata under your control so that you can search the data content and only retrieve the data objects from the public cloud when a match is found. The metadata should be extensible so that changes in the use of, or linkages to, the data can be appended to it. You also want to encrypt the data before you send it to the public cloud, since you don’t know where it physically resides at any time, and if it is moved within the cloud you want to ensure that your data is not left in the clear on the prior device. Data that is moved to the cloud should also be hashed so that when you retrieve it, you can compare the hash to ensure nothing has changed and you can prove immutability. You should also have the flexibility to migrate data in the background between public clouds to meet your business needs.
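These considerations can be sketched in a few lines. The example below is a minimal illustration and not how HCP or any particular product works. Two in-memory dictionaries stand in for the public cloud and the private metadata catalog, all function and object names are hypothetical, and the payload is assumed to have been encrypted already (encryption itself is omitted).

```python
import hashlib

public_cloud = {}     # stand-in for a remote object store
private_catalog = {}  # metadata kept behind the firewall


def store(object_id, payload, tags):
    """Record the hash and tags locally, then send the payload to the cloud."""
    private_catalog[object_id] = {
        "sha256": hashlib.sha256(payload).hexdigest(),
        "tags": set(tags),
    }
    public_cloud[object_id] = payload  # assumption: payload is already encrypted


def annotate(object_id, tag):
    """Extend the local metadata without touching the cloud copy."""
    private_catalog[object_id]["tags"].add(tag)


def search(tag):
    """Search local metadata only; no cloud access is needed."""
    return sorted(oid for oid, meta in private_catalog.items() if tag in meta["tags"])


def retrieve(object_id):
    """Fetch from the cloud and verify the payload against the stored hash."""
    payload = public_cloud[object_id]
    digest = hashlib.sha256(payload).hexdigest()
    if digest != private_catalog[object_id]["sha256"]:
        raise ValueError(f"integrity check failed for {object_id}")
    return payload


store("trip-0001", b"<encrypted bytes>", ["vehicle", "telemetry"])
annotate("trip-0001", "insurance-review")
matches = search("insurance-review")
fetched = retrieve(matches[0])
print(matches, len(fetched))
```

Because the search runs entirely against the private catalog, an object is only read back from the public cloud after a match is found, and the hash comparison on retrieval detects any change made to the object while it was outside your control.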


This concludes my series on top trends for 2015. Here is a list of the trends with links to my previous posts. I invite you to join me along with George Crump, analyst with Storage Switzerland, on December 10 in a web discussion of these trends and their impact. Registration and details are available here.


1. Business Defined IT

Convergence, Automation, and Integration

2. New Capabilities Accelerate Adoption of Converged and Hyper-converged Platforms

3. Management Automation

Continuous Infrastructure

4. Software Defined

5. Global Virtualization adds a new dimension to storage virtualization

6. A Greater Focus on Data Recovery and Management of Data Protection Copies

7. Increasing Intelligence in Enterprise Flash Modules

2015 Trends: Big Data, Internet of Things, Data Lakes, and Hybrid Cloud

8. Big Data And Internet of Things

9. Data Lake for Big Data Analytics

10. Hybrid Cloud Gains Traction