Answers to your questions for the Data Lakes session at All Things IIoT Day

Data lakes are becoming increasingly popular as companies accumulate vast amounts of data, both structured and unstructured. Below are the questions raised after Hitachi Vantara’s session on Data Lakes during All Things IIoT Day, together with our answers.

What are some examples of unstructured time series data?

Video is one type. With computer vision (CV) techniques, anomalies can be detected and the corresponding video frame timestamped for reference. Our Pentaho software can help add the appropriate metadata in these cases.

Are IoT devices susceptible to cyber-attacks, and if so, is firmware updated to reduce the risk of attacks?

IoT devices are just as susceptible to cyber-attacks as any other device on the internet. Trusted Platform Modules (TPMs) are often used in hardware to ensure that only trusted software runs on a device, which reduces the risk of malware infection.

Which database is used for a data lake?

Data lakes typically comprise storage and commonly used file formats that reside on top of HDFS or object stores such as S3.

How do you synchronize various data in various DBs used in your data lake with respect to timestamp?

Syncing clocks with NTP servers and using a single time reference (GMT/UTC) everywhere are common ways to keep timestamps consistent across data sources.
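As a minimal sketch of the "one time zone everywhere" convention, the Python snippet below converts timestamps that carry a UTC offset into a single reference before they are stored. The helper name `to_utc` and the sample timestamps are illustrative assumptions, not part of any Hitachi or Pentaho API.

```python
from datetime import datetime, timezone


def to_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a UTC offset and convert it to UTC.

    `to_utc` is a hypothetical helper written for this illustration.
    """
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # A timestamp without an offset cannot be placed on a shared timeline.
        raise ValueError(f"timestamp has no UTC offset: {stamp!r}")
    return parsed.astimezone(timezone.utc)


# Two made-up records written by systems in different time zones.
plant_reading = to_utc("2023-03-01T09:30:00-06:00")
office_record = to_utc("2023-03-01T16:30:00+01:00")

# Once normalized, both refer to the same instant and compare directly.
print(plant_reading.isoformat(), plant_reading == office_record)
```

Rejecting timestamps without an offset is a design choice in this sketch: silently assuming a zone tends to hide exactly the misalignment the normalization is meant to remove.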

Is it possible to add context and relations to the raw data in Data Lake technology?

Yes, it is possible. Pentaho software is particularly useful in enhancing metadata for context and relation building. One way to do this would be to convert the data about a device’s ID and location into a geotag and augment it with that device ID’s OT data when it’s stored in the Data Lake.
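The enrichment idea can be sketched generically in Python: look up a device's location by its ID and attach it as a geotag to each OT record before it is written to the lake. This is not Pentaho code; the registry, field names, and sample values are hypothetical.

```python
# Hypothetical device registry mapping device IDs to locations.
DEVICE_LOCATIONS = {
    "pump-17": {"site": "Plant A", "lat": 41.88, "lon": -87.63},
}


def enrich(record: dict, registry: dict) -> dict:
    """Return a copy of an OT record with a geotag added from the registry."""
    enriched = dict(record)  # leave the raw record untouched
    location = registry.get(record["device_id"])
    # Unknown devices get an explicit None so gaps in the registry stay visible.
    enriched["geotag"] = dict(location) if location is not None else None
    return enriched


raw = {"device_id": "pump-17", "ts": "2023-03-01T15:30:00+00:00", "pressure_kpa": 412.5}
print(enrich(raw, DEVICE_LOCATIONS))
```

Keeping the raw record unchanged and writing the enriched copy alongside it preserves the original data while still adding context.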

How do you link the same timeline for different kinds of DBs being used in your Data Lake?

As with the synchronization question above, one method is to ensure the raw data is timestamped consistently wherever it resides, whether in a time-series DB or elsewhere.
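Once every source carries comparable timestamps, records from different stores can be placed on one timeline. The Python sketch below merges two already time-ordered result sets with the standard library's `heapq.merge`; the source names and rows are made up for illustration.

```python
import heapq
from datetime import datetime


def ts(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that includes a UTC offset."""
    return datetime.fromisoformat(stamp)


# Hypothetical rows from a time-series DB, already sorted by timestamp.
sensor_rows = [
    {"ts": ts("2023-03-01T15:30:00+00:00"), "source": "tsdb", "temp_c": 71.2},
    {"ts": ts("2023-03-01T15:32:00+00:00"), "source": "tsdb", "temp_c": 74.9},
]

# Hypothetical rows from a relational maintenance DB, also sorted.
maintenance_rows = [
    {"ts": ts("2023-03-01T15:31:00+00:00"), "source": "rdbms", "event": "filter replaced"},
]

# heapq.merge lazily interleaves sorted inputs by the given key.
timeline = list(heapq.merge(sensor_rows, maintenance_rows, key=lambda row: row["ts"]))

for row in timeline:
    print(row["ts"].isoformat(), row["source"])
```

The merge assumes each input is already ordered by time; unsorted inputs would need a sort first.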

Are you seeing the need for real-time alerts and automation that combine IT and OT data to drive greater business agility for customers, and how does Hitachi support these efforts currently? What is the role of Hitachi Lumada vs. Pentaho vs. other storage offerings in this regard?

There is undoubtedly a need for real-time alerts to minimize the impact of whatever triggered the alert. Beyond that, enriching OT alert data with relevant IT data adds context and can help rectify the issue faster. For example, where conditional choices exist, an asset's maintenance records (e.g. its asset health score) can be fed back to automation systems; this more relevant information may enable the system to take the optimal route to rectify the issue.
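To make the asset health score example concrete, here is a small Python sketch in which an OT alert is combined with an IT-side health score to pick a response. The action names, the 0.5 threshold, and the data shapes are all hypothetical; a real deployment would take them from its own maintenance and automation systems.

```python
def choose_action(alert: dict, health_scores: dict, threshold: float = 0.5) -> str:
    """Pick a response to an OT alert using an IT-side asset health score.

    Scores are assumed to range from 0.0 (poor) to 1.0 (healthy); the
    threshold and the action names are illustrative only.
    """
    score = health_scores.get(alert["asset_id"])
    if score is None:
        # No maintenance context available: leave the decision to a person.
        return "notify_operator"
    if score < threshold:
        # An asset already in poor health gets the conservative route.
        return "stop_and_schedule_maintenance"
    return "reduce_load_and_monitor"


health_scores = {"pump-17": 0.35, "pump-18": 0.90}  # made-up maintenance data
alert = {"asset_id": "pump-17", "type": "vibration_high"}
print(choose_action(alert, health_scores))
```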

Is it possible to extract a short time section from a video and synchronize it with a trend chart?

Yes, it is possible. Timestamp metadata can be extracted or calculated based on creation date and sample rates to sync with a trend chart.
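The calculation is simple arithmetic, sketched here in Python: a frame's wall-clock time is the video's creation time plus the frame index divided by the frame rate, and the inverse gives the frame range for a chart window. It assumes a constant frame rate and a known creation time; the function names and sample values are illustrative.

```python
import math
from datetime import datetime, timedelta, timezone


def frame_time(created: datetime, fps: float, frame_index: int) -> datetime:
    """Wall-clock time of a frame, assuming a constant frame rate."""
    return created + timedelta(seconds=frame_index / fps)


def frames_for_window(created: datetime, fps: float,
                      start: datetime, end: datetime) -> tuple[int, int]:
    """First and last frame indices that fall inside [start, end]."""
    first = math.ceil((start - created).total_seconds() * fps)
    last = math.floor((end - created).total_seconds() * fps)
    return max(first, 0), last


created = datetime(2023, 3, 1, 12, 0, 0, tzinfo=timezone.utc)  # made-up creation time
window_start = created + timedelta(seconds=10)
window_end = created + timedelta(seconds=12)

first, last = frames_for_window(created, 25.0, window_start, window_end)
print(first, last, frame_time(created, 25.0, first).isoformat())
```

The resulting frame range can then be cut from the video and plotted against the same time axis as the trend chart.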

What is an example of a small footprint OT deployment?

With Pentaho 9.4, our latest version, containers as small as 128 MB (with minimal functionality, growing with complexity) can be deployed to work with OT data.