Boosting Streaming Analytics with Machine Learning for Deeper Insights


We live in a world of increasingly smart and interconnected devices. From power grids to industrial control systems and vehicle fleets, IIoT devices are proliferating. Recent estimates project that there will be well over 24 billion IoT devices within the next four years, and this abundance of devices will generate a tsunami of data. How can our streaming analytics systems possibly track and analyze the billions of telemetry messages that these devices will generate? How can they analyze incoming messages with the speed and accuracy needed to gain key insights and take effective action in the moment? The combination of in-memory computing technology, digital twin software, and machine learning techniques can help address this challenge.

Consider, for example, a streaming analytics system that monitors a geographically distributed power grid looking for service interruptions or issues that could lead to outages and safety threats, such as wildfires. This system needs to be able to quickly sift through large volumes of data from sensors and devices distributed across the power network, analyze this data to spot new issues, and generate alerts that enable timely action. Big data tools that analyze data stored offline and query-based systems that examine log files cannot provide the fast and thorough introspection needed to identify emerging issues and react before small problems become large ones.

Because of its ability to seamlessly scale computing power to track millions of data sources while delivering results in milliseconds, in-memory computing technology can break the logjam in streaming analytics. Running on a cluster of servers hosted in the cloud or on-premises, an in-memory computing platform can host software components called “real-time digital twins.” These software twins ingest telemetry from specific devices, like nodes in a power grid, and analyze it in real time. By maintaining a continuously evolving model of each device’s dynamic state, the digital twin’s analytics algorithm can immediately spot issues and send out alerts when needed. Thousands (or even millions) of real-time digital twins simultaneously track all data sources and enable both highly granular and aggregate analysis of incoming telemetry. They tackle the dual challenges of handling large volumes of incoming data and providing actionable, real-time results.
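To make the idea concrete, here is a minimal sketch of a real-time digital twin for one power-grid node. The class name, fields, and 90°C alert threshold are illustrative assumptions, not a specific product API; production code would typically be written in Java or C# and hosted by the in-memory computing platform, with one twin instance per data source.

```python
class GridNodeTwin:
    """Hypothetical digital twin tracking one power-grid node's dynamic state."""

    def __init__(self, node_id, max_temp_c=90.0, window=10):
        self.node_id = node_id
        self.max_temp_c = max_temp_c      # alert threshold (assumed)
        self.window = window              # size of the sliding state window
        self.recent_temps = []            # continuously evolving state

    def process_message(self, temp_c):
        """Ingest one telemetry reading, update state, and alert if needed."""
        self.recent_temps.append(temp_c)
        if len(self.recent_temps) > self.window:
            self.recent_temps.pop(0)      # keep only recent readings
        if temp_c > self.max_temp_c:
            return f"ALERT {self.node_id}: temperature {temp_c}C exceeds limit"
        return None

# One twin per device; the platform would host thousands or millions of these.
twin = GridNodeTwin("node-17")
twin.process_message(72.5)    # within limits: returns None
twin.process_message(95.0)    # over threshold: returns an alert string
```

Because each twin holds only its own device's state, the platform can distribute twins across a server cluster and process their messages independently, which is what allows throughput to scale with the number of data sources.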

Streaming analytics code is typically implemented using popular, object-oriented programming languages, such as Java and C#. By refactoring this code so that algorithms focus on telemetry from a single data source and maintain state information about that source, real-time digital twins simplify development. At the same time, they give the in-memory computing platform a basis for scaling overall processing throughput while delivering fast results. However, this software architecture and in-memory orchestration framework do not, by themselves, solve the often daunting challenge of developing analytics algorithms that find the patterns in incoming data that signal the need for alerts and action.

Because the underlying processes which lead to anomalies and device failures are often not well understood, the streaming analytics algorithms needed to detect them may be unknown or difficult to develop. Machine learning (ML) techniques can help solve this problem. ML algorithms can be trained with sample data sets to recognize abnormal patterns based upon previously captured telemetry messages that have been classified as normal or abnormal. After training and testing, ML algorithms can then be put to work monitoring incoming telemetry in real time and signaling alerts when they observe suspected abnormal behavior. Rather than spending large amounts of time and resources crafting custom code to uncover insights in telemetry, developers can apply proven ML algorithms across many applications.
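As a minimal sketch of this train-then-monitor workflow, the toy detector below learns the mean and spread of readings previously labeled "normal," then flags new readings that deviate strongly. A z-score test with a 3-standard-deviation threshold stands in here for whatever ML model an application would actually use; the sample values and threshold are illustrative assumptions.

```python
from statistics import mean, stdev

def train(normal_readings):
    """Fit a simple model (mean, std dev) to telemetry labeled 'normal'."""
    return mean(normal_readings), stdev(normal_readings)

def is_abnormal(reading, model, threshold=3.0):
    """Flag a reading that falls more than `threshold` std devs from normal."""
    mu, sigma = model
    return abs(reading - mu) > threshold * sigma

# Training uses previously captured, labeled telemetry (values assumed here).
model = train([50.1, 49.8, 50.3, 50.0, 49.9, 50.2])

is_abnormal(50.1, model)   # False: within the learned normal range
is_abnormal(75.0, model)   # True: suspected anomaly, worth an alert
```

After training and testing offline, the fitted model is deployed so that each incoming telemetry message can be scored in real time.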

Combining ML with digital twins adds more capabilities. Running ML algorithms within real-time digital twins enables thousands of data streams to be automatically and independently analyzed in real time with fast, scalable performance. This combination gives the streaming analytics system highly granular results by separately identifying anomalies for each device. The ML algorithm tackles the development challenge, and the in-memory computing platform provides the scalable computing power.
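The two pieces can be combined by embedding a trained detector inside each twin, so every data stream is scored against its own model. The sketch below is illustrative only (the class and method names are hypothetical, not a product API), and it reuses a simple z-score detector as the stand-in ML model.

```python
from statistics import mean, stdev

class MLTwin:
    """Hypothetical digital twin with an embedded, pre-trained anomaly model."""

    def __init__(self, source_id, normal_samples, threshold=3.0):
        self.source_id = source_id
        self.mu = mean(normal_samples)     # model parameters trained offline
        self.sigma = stdev(normal_samples)
        self.threshold = threshold
        self.alerts = []                   # alerts recorded for this source

    def process_message(self, reading):
        # Score this source's telemetry against its own learned model.
        if abs(reading - self.mu) > self.threshold * self.sigma:
            self.alerts.append((self.source_id, reading))

# One twin (and one model) per device yields per-device anomaly detection.
twins = {i: MLTwin(i, [10.0, 10.2, 9.9, 10.1, 9.8]) for i in range(3)}
twins[1].process_message(10.0)   # normal reading: no alert recorded
twins[1].process_message(25.0)   # abnormal reading: alert recorded for source 1
```

Because each twin carries only its own model and state, the platform can run thousands of them in parallel, which is what delivers the granular, per-device results described above.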

With IIoT devices generating more data than ever before, the combination of in-memory computing technology, digital twin software, and ML techniques offers a powerful new way to enhance situational awareness and enable fast, informed decision-making. This combination of technologies gives operational managers and data professionals rapid insights with deeper introspection than previously possible. Now they have the ability to effectively act on the torrents of telemetry that their systems generate every day.


About The Author

Dr. William Bain is CEO and founder of ScaleOut Software, which has been developing software products since 2003 designed to enhance operational intelligence within live systems ...