Project managers use reference architectures to outline technical discussions and define best practices. A reference architecture is a template we fit our ideas and debates into. Reference architectures run from high-level to detailed abstractions. They start with a management point of view of the concepts. [All hail Jerry Woolf, inventor of the dry erase whiteboard marker.] Next, IT architects craft a more technical view. This becomes “the plan”. Next comes the gut-wrenching discussion: “You want how much money?” Later, detailed workflows of data movement are defined. These are pages full of complex details that only a programmer could love. Finally, programmers turn workflows and components into code.
Recently, Hitachi Data Systems and Teradata formed a partnership focused on the Internet of Things. A management level reference architecture for the alliance is found here: Three Links in a Chain. It describes edge computing, the IoT Platform, and the analytic ecosystem. Let’s drill down one level into the Hitachi-Teradata reference architecture.
Figure 1: Reference Architecture for Hitachi Lumada and Teradata Analytics
Intro to Hitachi Lumada Architecture Components
Lumada is Hitachi’s IoT platform. It supports business solutions such as smart industry or smart energy applications. It begins with an edge SDK that is callable from Python or Java programs. MQTT and other protocols provide continuous streams of real-time sensor data. Hitachi’s data integration provides vital data preparation, but you can easily add to this. Data preparation is done in both batch and real-time processing. It also blends in external data.
Identity services provide the three A’s of security: Authentication, Authorization, and Audit. Studio enables creation of high-quality custom applications from pre-built components. Asset and user management are subsystems to administrate edge devices and people. Orchestration includes all the policies and controls placed on things, people, and processes. Not shown is a module library of reusable components. This includes dashboards, workflows, reports, metadata models, and analytics.
I am enamored with the sensor data storage, cluster services, and asset registry. Sensor data is stored in Apache Cassandra. Cassandra provides a scalable, high-performance cluster database for time series data. Underneath Lumada are scale-out cluster services. Lumada architects wisely built upon a scalable foundation for fast response time. Thus, Lumada can cope with small or huge amounts of sensor data. Why the excitement? Because for many IoT platforms, scalability is a problem all by itself. Unforeseen scalability complexity has defeated many implementations. Fortunately, Lumada rests on a scalable clustered foundation whether it’s on premises or in a cloud.
The part I am most excited by is the ‘Avatars’ inside the asset registry. An avatar is a digital model of a physical machine or device. It contains the device specifications, properties, and metadata. It provides KPIs, current and historical state, and events for 'things'. Lumada also lets you design avatars, make inquiries, and invoke visualizations. Tools such as Qlik, Tableau, and MicroStrategy can provide reports, dashboards, and visualizations. Avatars can include rules and algorithms. When the avatar model is robust, simulations can predict device behavior. Poke an avatar with too much digital heat and watch how it behaves without catching fire. Industrial designers do this to rethink end-to-end design, manufacturing processes, and device usage. Machine owners can more reliably predict maintenance cycles, yield quality, and costs. Sweet!
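To make the avatar idea concrete, here is a minimal Python sketch. It is not Lumada's API. The class name, fields, and the overheat rule are all hypothetical, chosen only to show a digital model that carries a specification, current and historical state, a rule, and events.

```python
from dataclasses import dataclass, field


@dataclass
class PumpAvatar:
    """Hypothetical digital model of one physical pump (not a Lumada class)."""
    device_id: str
    max_temp_c: float                              # device specification
    temp_c: float = 20.0                           # current state
    history: list = field(default_factory=list)    # historical state
    events: list = field(default_factory=list)     # events raised by rules

    def apply_heat(self, delta_c: float) -> None:
        """Simulate heating the device, then evaluate the overheat rule."""
        self.history.append(self.temp_c)
        self.temp_c += delta_c
        if self.temp_c > self.max_temp_c:
            self.events.append(f"OVERHEAT at {self.temp_c:.1f} C")


# Poke the avatar with digital heat; no real pump is harmed.
avatar = PumpAvatar(device_id="pump-001", max_temp_c=90.0)
for step in (30.0, 30.0, 30.0):
    avatar.apply_heat(step)
```

After the three steps the avatar holds its earlier temperatures in history and one overheat event, which is the kind of outcome a designer would inspect instead of burning out real hardware.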
Intro to Teradata Architecture Components
Teradata broke away from traditional data warehouses back in 2010. The result is the Teradata Unified Data Architecture (UDA). The UDA combines data lakes, data warehouses, data integration, analytics, and discovery zones into an analytic ecosystem. Where did this lead us?
The data warehouse is still an indispensable point of data gravity. It contains the corporate data model in thousands of tables. Feeding into the data warehouse are ERP, CRM, supply chain, risk, and other data. Pentaho and other components participate in the scalable ingest framework. Teradata Listener is a self-service ingest tool for real-time data streams. Listener works well with sensor data arriving in MQTT format. Kylo is open source software for clients that generate thousands of flat files daily all over the world. Manufacturers and transportation companies tend to need Kylo. Thus, the Teradata ingest framework connects to Lumada’s ingest brokers.
Originally, all data was well-behaved rows and columns stored in relational tables. For example, all the financial data a CFO needs is tabular like Excel spreadsheets. How can a CFO exist without spreadsheets? But then came data structures that didn’t fit into rows and columns. The first was XML, then JSON, then Avro. Next came IoT sensor data, which is a continuous time series. Sensor data often looks like a sawtooth pattern. Thus, Teradata added data types and functions to accommodate unstructured data. This enables parallel processing of any type of data. It’s not that hard, really. I mean creating new data types. Parallel processing is incredibly difficult to do right.
Back in 2007, Teradata entered a partnership with SAS®. We started embedding SAS algorithms "in-database." This means SAS procs now run in the parallel database engine. In 2006, data scientists had to extract 50GB of data to a server and run SAS procs for hours. In 2007, the SAS proc could run in parallel. Elapsed times of days became hours, hours became minutes. In-database parallel performance also applies to Fuzzy Logix, R, MicroStrategy, and Python scripts. Analytic tools went from automobile speed to rocket speed.
Geospatial data is also not a simple data structure. You can’t just add up longitude and latitude. (Hint: geospatial data comes from complex sensors.) Thus, Teradata implemented 2.5D geospatial storage and processing. This is vital for ‘things’ that move around. Do you know where your $150,000 tool is today? When and where was the airplane when it malfunctioned? Geospatial ANSI SQL makes it really easy for users to do location-aware analysis. Similarly, time-series data is easier to analyze with ANSI SQL temporal functions. Temporal functions can provide 80X data compression and faster analytics for sensor data. Add to this special time-series tables that deal with data gaps and sliding windows of time. This is perfect for analyzing sensor data at scale.
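Two of the points above can be sketched in plain Python. This is not Teradata's geospatial or time-series SQL. The function names, the Earth radius constant, and the trailing-window definition are assumptions made for illustration. The first function shows why longitude and latitude can't simply be added up. The second shows a sliding window that tolerates gaps in the readings.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sliding_mean(readings, window_s):
    """Trailing-window mean over (timestamp_s, value) pairs sorted by time.

    Each output value averages the readings whose timestamps fall in
    (t - window_s, t], so a gap simply leaves fewer readings in the window.
    """
    out = []
    start = 0      # index of the oldest reading still inside the window
    total = 0.0    # running sum of the values inside the window
    for i, (t, value) in enumerate(readings):
        total += value
        while readings[start][0] <= t - window_s:
            total -= readings[start][1]
            start += 1
        out.append((t, total / (i - start + 1)))
    return out


# One degree of longitude is shorter at 60 degrees north than at the equator.
at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
at_60_north = haversine_km(60.0, 0.0, 60.0, 1.0)

# Readings every 10 seconds, then a 40 second gap before the last one.
readings = [(0, 10.0), (10, 20.0), (20, 30.0), (60, 50.0)]
smoothed = sliding_mean(readings, window_s=30)
```

In a parallel database the same logic would be expressed in SQL and spread across many nodes. The sketch only shows the arithmetic that such functions hide from the user.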
Teradata App Center is a framework for encapsulating processes. Anything the developer or business user codes can be encapsulated into a module. That module then appears in App Center somewhat like your mobile phone apps. Imagine a sensor data scrubber and conditioner built from Python. Turn it into a reusable app. Apps can run scripts, software modules, submit SQL, or fire up a visualization. Both the programmer and business users use the apps. Bonus: shadow IT code is captured in one place.
Data Labs are storage pools and tools for ad hoc experimentation. Teradata ended up with multiple solutions in this category. The first was Teradata Data Lab where business users upload and process data in the data warehouse. That’s pure ad hoc experiments joined to production data. Then came Aster Database which is an amazing data scientist parallel workbench. Aster contains hundreds of built-in algorithms including machine learning. The UDA also includes Hadoop data lakes. Our Think Big experts often build data discovery zones there.
Teradata QueryGrid is the glue that pulls together the entire UDA. It is a federated SQL query service connecting two parallel systems together. That’s a huge differentiator for Teradata. QueryGrid can run a parallel query in Teradata Database doing a parallel join to Hadoop data. Or we can run a parallel query using Presto to access data in Cassandra in parallel. This provides superior performance plus data placement options.
Rivers of Data
Figure 2: IoT Data Work Flows
Let’s follow the sensor data through the Lumada-UDA infrastructure.
- Sensors at the edge emit time-series data in a wide variety of formats and frequencies.
- Lumada core uses message brokers to capture sensor data into Cassandra. Lumada administrates the IoT devices and people worldwide. Lumada also supplies the data to business applications.
- Pentaho Data Integration and Teradata Listener distribute the sensor data to subscribers.
- Teradata QueryGrid peers access the Asset Registry and Sensor Data Storage in parallel.
- Sensor data is prepared and stored in the data warehouse or data lake as needed. Data transformations are done here and throughout the ingest architecture.
- Corporate data is continuously added to the UDA. This is ERP, CRM, supply chain, financial, and external data.
- Sensor data can be joined together with corporate data at parallel speed.
- Teradata App Center and Business Analytics Applications access the UDA data. This provides deep analytics, machine learning, and visualizations.
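The step where sensor data is joined with corporate data can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The device IDs, field names, dates, and sites are made-up sample data, and a real deployment would run this as a parallel SQL join inside the UDA instead of in Python lists and dictionaries.

```python
# Hypothetical sensor readings, as they might look after ingest.
sensor_readings = [
    {"device_id": "pump-001", "temp_c": 71.5},
    {"device_id": "pump-001", "temp_c": 95.2},
    {"device_id": "pump-002", "temp_c": 64.0},
    {"device_id": "pump-999", "temp_c": 50.0},  # no corporate record exists
]

# Hypothetical corporate data keyed by device, e.g. service histories.
service_history = {
    "pump-001": {"last_service": "2017-01-15", "site": "Plant A"},
    "pump-002": {"last_service": "2017-03-02", "site": "Plant B"},
}


def join_readings(readings, corporate):
    """Inner join: keep readings that have a matching corporate record."""
    joined = []
    for reading in readings:
        record = corporate.get(reading["device_id"])
        if record is not None:
            joined.append({**reading, **record})
    return joined


def max_temp_by_site(rows):
    """Aggregate the joined rows: hottest reading seen at each site."""
    hottest = {}
    for row in rows:
        site = row["site"]
        hottest[site] = max(hottest.get(site, row["temp_c"]), row["temp_c"])
    return hottest


joined_rows = join_readings(sensor_readings, service_history)
site_summary = max_temp_by_site(joined_rows)
```

The sensor reading alone says a pump ran hot. Joined to the service history, it says which site to call and when that pump was last serviced.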
Final Observations on Hitachi Lumada/Teradata UDA
Dr. Michael Porter is a Harvard professor and author of 18 books on competitive strategy. He sums up the purpose of the Hitachi Lumada/Teradata Unified Data Architecture better than I can. In “How Smart, Connected Products Are Transforming Companies” he explains the value of sensor data:
“This new product data is valuable by itself, yet its value increases exponentially when it is integrated with other data, such as service histories, inventory locations, commodity prices, and traffic patterns...”
- “This new product data is valuable by itself.” Lumada is needed to collect and extract value from sensor data.
- “Yet its value increases exponentially when it is integrated with other data.” Joining sensor data to corporate data is the other half of the ROI.
So a control tower for sensors, devices, people, and avatars is vital. Don’t try this at home. Get a solid IoT platform like Hitachi Lumada. Then add the corporate data and deep machine learning.
With over 30 years in IT, Dan Graham has been a DBA, Product Manager for the IBM RS/6000 SP2, Strategy Director in IBM’s Global BI Solutions division, and General Manager of Teradata’s high-end 6700 servers. He is currently Technical Marketing Director for the Internet of Things at Teradata. Dan lives in Silly Con Valley, except when he doesn't.