In my last blog post I commented on Hitachi Vantara’s selection by CRN (Computer Reseller News) as one of the “Coolest Business Analytics Vendors” and expanded on Hitachi Vantara’s business analytics capabilities. CRN’s report positions business analytics tools at the top of the big data pyramid, where they derive insight and value from the ever-growing volume of data. In this post I will expand on how we address the rest of the big data pyramid.
Other analysts and trade publications, such as Network World, refer to this as a big data fabric, a concept that is gaining attention as analytics becomes a driving force behind business outcomes. Whether you think of it as a big data pyramid or a big data fabric, the idea is the same: a converged platform that supports the storage, processing, analysis, governance, and management of data that is currently locked up in separate application silos. Those silos are the biggest hurdle to overcome in developing meaningful and accurate business analytics.
A comprehensive big data pyramid or big data fabric must provide features and functionality such as data access, data discovery, data cleansing, data transformation, data integration, data preparation, data enrichment, data security, data governance, and orchestration of data sources, along with support for a variety of big data fabric workloads and use cases. The solution must be able to ingest, process, and curate large amounts of structured, semi-structured, and unstructured data stored in big data platforms such as Apache Hadoop, massively parallel processing (MPP) databases, enterprise data warehouses (EDW), NoSQL stores, Apache Spark, in-memory technologies, and other related commercial and open source projects, and it must do so simply, efficiently, and cost effectively.
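To make the stages named above concrete, here is a minimal sketch of an ingest, cleanse, transform, and enrich flow in plain Python. This is not a Hitachi Vantara or Pentaho API; every function and field name is a hypothetical illustration of the pattern:

```python
import csv
import io

# Hypothetical raw feed: note the stray whitespace, a missing amount,
# and a value with a typo ("2.5O") -- the kinds of defects a curation
# pipeline must catch before analytics ever see the data.
RAW = "id,name,amount\n1, Alice ,100\n2,Bob,\n3,Carol,2.5O\n"

def ingest(text):
    """Data access: read structured rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def cleanse(rows):
    """Data cleansing: trim whitespace, drop rows with missing amounts."""
    cleaned = []
    for row in rows:
        row = {k: v.strip() for k, v in row.items()}
        if row["amount"]:
            cleaned.append(row)
    return cleaned

def transform(rows):
    """Data transformation: coerce types, rejecting unparsable values."""
    out = []
    for row in rows:
        try:
            row["amount"] = float(row["amount"])
            out.append(row)
        except ValueError:
            pass  # a real pipeline would quarantine these for review
    return out

def enrich(rows):
    """Data enrichment: derive a field that downstream analytics will need."""
    for row in rows:
        row["tier"] = "high" if row["amount"] >= 100 else "low"
    return rows

curated = enrich(transform(cleanse(ingest(RAW))))
```

Of the three raw rows, only the clean one survives curation; the defective rows are filtered out rather than polluting the fabric downstream.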
The strength of the information fabric you weave is directly affected by the quality of the data you stitch together. For any organization actively investing in anything resembling a data lake, the focus must be on data quality before data use. A key point that many organizations miss when determining the worth of their data is that collecting data is not the same as collecting the right data: they may be collecting very little of something very important, or not collecting the right data at all. In short, data quality directly affects business effectiveness.
Data quality efforts are most reliable when they occur as close as possible to the point of data creation, long before the data is stored in the data center or blended downstream for some other business purpose. This is where Hitachi Vantara really shines. Both Hitachi Content Intelligence and Pentaho can serve as “data quality gateways,” designed and implemented in the data stream to improve data veracity and strengthen the source of truth expected of the information fabric. Whether we are talking about discovery, orchestration, management, governance, control, or preparation, focusing on the quality and correctness of data is what makes the information fabric reliable and trustworthy. Just as important, when you perform these veracity activities is up to you; that is the power offered by Hitachi Vantara’s solutions. We would always suggest doing so well before data is stored in your data center, but we do not force that best practice on our customers.
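As a sketch of the quality-gateway idea, validation applied in the data stream so that only conforming records flow on to the repository, consider the following plain Python example. It is not the Pentaho or Content Intelligence API; the field names and rules are hypothetical:

```python
# Hypothetical in-stream quality gateway: each record is checked near the
# point of creation, and defects are flagged rather than silently stored.
REQUIRED_FIELDS = {"device_id", "timestamp", "reading"}

def gateway(stream):
    """Yield (record, ok, reason) for every record in the stream."""
    for record in stream:
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            yield record, False, f"missing fields: {sorted(missing)}"
        elif not isinstance(record["reading"], (int, float)):
            yield record, False, "non-numeric reading"
        else:
            yield record, True, ""

events = [
    {"device_id": "a1", "timestamp": 1700000000, "reading": 21.5},
    {"device_id": "a2", "timestamp": 1700000001},                  # missing reading
    {"device_id": "a3", "timestamp": 1700000002, "reading": "n/a"},  # bad type
]
accepted = [r for r, ok, _ in gateway(events) if ok]
rejected = [(r, why) for r, ok, why in gateway(events) if not ok]
```

The point of the pattern is that rejection happens with a stated reason, in the stream, rather than after bad data has already been blended into the lake.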
Additionally, our two products let you sub-segment your data sets based on the desired business outcome. For example, if you need to answer very specific questions against known data sets in a time-sensitive manner, Pentaho provides the right solution, with additional capabilities to blend and visualize that data. If the business outcome is more exploratory, spanning multiple data sets and allowing time to conduct the exploration, then Hitachi Content Intelligence provides the ideal solution. Both can process data in-stream and at contextual levels. Both offer the ability to leave the data where it resides, or to augment and migrate it to our data services platform, Hitachi Content Platform, where it can be stored for its lifespan in a compliant, self-protected, and secure manner.
Data is a reusable commodity: value may be gained from different data points, and the same data can continue to yield valuable insights that were not imagined in the first analysis. Hitachi Content Platform is the ideal repository for long-term retention of big data, with its geo-distributed, no-backup data protection, secure multitenancy, governance and security features, extensible metadata, self-healing reliability and availability, low-cost erasure-coded storage, cloud gateway, and speed and scalability that leverage the latest advances in infrastructure technology.
This is a very high-level view of what we provide for big data fabrics. In subsequent posts, I will expand on the concepts that differentiate us in the big data space. Hitachi Vantara has the most comprehensive set of big data and big data analytics tools, built around our integrated Hitachi Content Platform, Hitachi Content Intelligence, and Pentaho solution sets.