Hybrid Architecture for Time Series Data Pipelines
The ability to efficiently manage and analyze time series data is crucial in many sectors and industries. This article explores a hybrid architecture designed to optimize the handling of such data, particularly from Industrial Internet of Things (IIoT) sources. By integrating local and centralized data management strategies, this architectural approach aims to enhance real-time data analysis capabilities, which are essential for industries facing dynamic and unpredictable conditions.
The scenario
Imagine extreme weather events. It’s not hard to do. Now, imagine these events knock out power in a region of your country. Authorities implement a stop-gap response to this crisis by deploying emergency generators to remote areas. However, it’s difficult to know when these machines will run out of fuel. This creates an unpredictable environment, and folks working in industry and manufacturing know that predictability is critical.
Now, imagine you can build a data pipeline that replicates data from local machines into a central hub and provides administrators with a real-time view of fuel consumption, energy production, and more. Using the TIG stack (Telegraf, InfluxDB, Grafana), you can do just that.
Making it work
The data pipeline
The data pipeline for this use case is straightforward. The generators publish their data over MQTT to an MQTT broker. Telegraf, an open source data collection agent, collects the data from the MQTT broker and writes it to an InfluxDB time series database instance at the edge.
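To make the collection step concrete, here is a minimal sketch of the kind of transformation that happens between the broker and the database: a JSON message is turned into an InfluxDB line protocol record. This is an illustration only, not Telegraf's implementation. The payload shape, the field names (generator_id, fuel_level, flow_rate, ts_ns), and the measurement name are all assumptions made for this example.

```python
import json


def _escape_tag(value: str) -> str:
    """Escape commas, equals signs, and spaces, which are special in tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement: str, payload: bytes) -> str:
    """Convert one JSON MQTT payload into a single line protocol record.

    Assumes a hypothetical payload with 'generator_id', 'ts_ns' (nanoseconds),
    and optional numeric 'fuel_level' and 'flow_rate' keys. The measurement
    name is assumed to contain no characters that need escaping.
    """
    msg = json.loads(payload)
    tags = f"generator_id={_escape_tag(str(msg['generator_id']))}"
    fields = ",".join(
        f"{key}={float(msg[key])}"
        for key in ("fuel_level", "flow_rate")
        if key in msg
    )
    if not fields:
        raise ValueError("payload contains no known fields")
    return f"{measurement},{tags} {fields} {int(msg['ts_ns'])}"


if __name__ == "__main__":
    sample = (
        b'{"generator_id": "gen-01", "fuel_level": 72.5, '
        b'"flow_rate": 1.2, "ts_ns": 1700000000000000000}'
    )
    print(to_line_protocol("generator_metrics", sample))
```

In a real deployment, Telegraf's MQTT input and its data format parsers would handle this step through configuration rather than custom code.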
The local instance downsamples the local data and replicates it to a cloud instance of InfluxDB that functions as the central hub. You can then use Grafana to query the cloud instance, analyze data, and create visualizations.
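As a rough illustration of what downsampling means here, the sketch below averages raw readings into fixed time windows so that fewer points need to travel to the cloud. It is a plain Python stand-in, not how InfluxDB performs the task, and the one-minute window and the (timestamp, value) tuple layout are assumptions.

```python
from collections import defaultdict
from statistics import fmean


def downsample_mean(points, window_s=60):
    """Average (timestamp_s, value) points into fixed windows of window_s seconds.

    Returns a list of (window_start_s, mean_value) tuples sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        window_start = ts - (ts % window_s)  # align to the window boundary
        buckets[window_start].append(value)
    return [(start, fmean(values)) for start, values in sorted(buckets.items())]


if __name__ == "__main__":
    raw = [(0, 10.0), (30, 20.0), (60, 30.0), (119, 50.0)]
    print(downsample_mean(raw))
```

Four raw readings collapse into two window averages, which is the trade the edge instance makes: less detail in exchange for less network traffic.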
The power of SQL and time series
When it comes to managing and analyzing time series data, you want to extract as much information as possible from your queries. To help with this, InfluxDB natively supports SQL and offers InfluxQL, an SQL-like query language that allows for advanced analytical queries. You can query data to return all kinds of critical information, like the minimum, maximum, and average fuel level for a specific generator. You can also set fuel level thresholds and attach alerts to them.
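The aggregate query described above might look like the following sketch. To keep the example runnable without a database server, it executes the SQL against an in-memory SQLite table as a stand-in for InfluxDB. The table name generator_metrics, its columns, and the 20.0 threshold are assumptions for illustration.

```python
import sqlite3

# Standard SQL aggregates; table and column names are hypothetical.
SUMMARY_SQL = """
SELECT generator_id,
       MIN(fuel_level) AS min_fuel,
       MAX(fuel_level) AS max_fuel,
       AVG(fuel_level) AS avg_fuel
FROM generator_metrics
GROUP BY generator_id
ORDER BY generator_id
"""


def fuel_summary(rows):
    """Load (time, generator_id, fuel_level) rows and return per-generator stats."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE generator_metrics '
            '("time" INTEGER, generator_id TEXT, fuel_level REAL)'
        )
        conn.executemany("INSERT INTO generator_metrics VALUES (?, ?, ?)", rows)
        return conn.execute(SUMMARY_SQL).fetchall()
    finally:
        conn.close()


def low_fuel(summary, threshold=20.0):
    """Return the IDs of generators whose minimum fuel level fell below threshold."""
    return [gid for gid, min_fuel, _max_fuel, _avg_fuel in summary if min_fuel < threshold]


if __name__ == "__main__":
    readings = [
        (1, "gen-01", 80.0), (2, "gen-01", 60.0),
        (1, "gen-02", 15.0), (2, "gen-02", 25.0),
    ]
    summary = fuel_summary(readings)
    print(summary)
    print(low_fuel(summary))
```

The threshold check is done in Python here only to keep the sketch short; in practice the alerting rule would more likely live in the visualization or alerting layer.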
For slightly more advanced queries, you can use the date_bin function to compute averages over fixed windows of time. Because time values are consistent across data, you can join on time and calculate the correlation between two sets of data. For example, you can track fuel level and flow rate together, then filter the data so that it only selects rows whose time falls within the last ten minutes relative to the current time.
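The join, filter, and correlation steps can be sketched in plain Python to show the logic that such a query expresses. This is not InfluxDB code; the dictionary-based series layout and the function names are assumptions chosen for the example, and Pearson's coefficient is used as one common choice of correlation measure.

```python
from math import sqrt


def join_on_time(fuel, flow):
    """Inner-join two {timestamp_s: value} series on matching timestamps."""
    return [(ts, fuel[ts], flow[ts]) for ts in sorted(fuel.keys() & flow.keys())]


def last_window(rows, now_s, window_s=600):
    """Keep rows whose timestamp falls within the last window_s seconds."""
    return [row for row in rows if now_s - window_s <= row[0] <= now_s]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    fuel = {0: 100.0, 300: 90.0, 600: 80.0, 900: 70.0}
    flow = {300: 1.0, 600: 2.0, 900: 3.0, 1200: 4.0}
    recent = last_window(join_on_time(fuel, flow), now_s=1000)
    print(recent)
    print(pearson([r[1] for r in recent], [r[2] for r in recent]))
```

Only timestamps present in both series survive the join, which is why consistent time values across measurements matter.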
Visualization and alerting with Grafana
InfluxDB natively supports Grafana, an open source platform for data visualization and alerting. Using InfluxDB and Grafana, you can create anything from basic to advanced visualizations.
For instance, Grafana’s adaptability allows you to create a map panel to visualize generator data. You can also partition data by generator IDs, set thresholds for fuel values, and create alerting rules. Overall, Grafana is an extremely dynamic and flexible option for managing and monitoring data, no matter how specific your needs are.
The importance of edge-to-cloud data replication
There are many different reasons why industrial operators would want to use edge data replication. Depending on the available resources at the edge, you can perform data processing and analysis at the edge for mission-critical processes. Or, you can simply downsample data at the edge and send the downsampled data to a central hub. If you have the networking resources, you could even replicate all your raw data to a central hub.
In our example, local operators may use local data to track fuel consumption, enabling them to refill the generator promptly when alerted. Simultaneously, administrators can use the same data to oversee the entire operation in real-time.
InfluxDB’s edge data replication (EDR) feature uses a durable queue to send data between an edge and a cloud instance of InfluxDB. This durable queue is critical because any data you put in the replication bucket gets sent to the cloud. Even if the network connection goes down, the queue holds that data until the connection is back up. At that point, it flushes the queue to the cloud so that there are no gaps in your data.
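The queue behavior described above can be sketched as follows. This is a simplified, in-memory illustration of the idea and not InfluxDB's implementation: a genuinely durable queue would persist pending records to disk so they survive a restart. The ReplicationQueue class and the send callable are hypothetical names introduced for this example.

```python
from collections import deque


class ReplicationQueue:
    """Buffer writes locally and forward them in order once the remote is reachable."""

    def __init__(self, send):
        # send: callable taking one record; expected to raise ConnectionError on failure
        self._send = send
        self._pending = deque()

    def write(self, record):
        """Queue a record, then try to forward everything that is pending."""
        self._pending.append(record)
        self.flush()

    def flush(self):
        """Send queued records oldest-first; stop at the first failure.

        Returns the number of records successfully sent.
        """
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # keep the record queued and retry on a later flush
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


if __name__ == "__main__":
    cloud, link = [], {"up": False}

    def send(record):
        if not link["up"]:
            raise ConnectionError("link down")
        cloud.append(record)

    queue = ReplicationQueue(send)
    queue.write("reading-1")   # link is down, so the record stays queued
    link["up"] = True
    queue.write("reading-2")   # link is back, so both records are forwarded
    print(cloud, len(queue))
```

A record is removed from the queue only after the send succeeds, so an outage delays delivery but does not drop data or change its order.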
Conclusion
The combination of edge and cloud environments in the same data pipeline is what makes this a hybrid architecture setup. This approach improves durability, provides local access to data, enables real-time visualizations and inspections, and ensures global access to the data.
To learn more about InfluxDB and how to build a hybrid architecture, visit www.influxdata.com.