Fixing the Oil & Gas Data Contango With Edge Machine Learning
It’s hard to imagine with today’s gas prices that just 2.5 years ago we were dealing with spot prices crashing relative to futures — the so-called contango — driven by storage and transportation costs. Oil & Gas (O&G) enterprises are in contango again — but this time with data.
Data: A Cost Today For (Hopefully) Value Tomorrow
McKinsey & Company estimates that 99% of IoT data generated in O&G is never used for decision making. But at the same time, this data incurs real costs to ingest, process, and store. For a sense of scale, at the upstream level alone, exploration and production generates about a terabyte of data per day per rig from a variety of sensors measuring everything from temperature and pressure to fluid viscosity and seismic activity.
The promise of AI is that it can examine these vast troves of data to generate actionable insights. Data scientists have already developed a variety of machine learning models to optimize consumption and costs, predict equipment failures and maintenance requirements, optimize remote field operations, and improve safety. And yet we continue to see O&G enterprises up and down the supply chain struggle to operationalize these models under real-world conditions. Why?
Why Does AI Value Remain Elusive in Oil & Gas?
In our experience working in IoT not just with energy companies but also manufacturers, distributors, and telco firms, the blocker isn’t developing models in the cloud, but rather deploying them in the field, close to the data source and, ultimately, the decision making.
That’s due to three main blockers that prevent AI from generating value from this data at the edge:
- The distance to consistent internet connectivity
- The inability to monitor the ongoing performance of models in real-world conditions
- The compute-constrained nature of the edge environment
1. Distance:
The sites where hydrocarbon exploration, production, transportation, and refinement happen are usually remote, which means:
- The location might not have an internet connection in order to deploy a model trained in the cloud or to relay sensor data back to the cloud.
- Connectivity that is available may not be reliable enough or have sufficient bandwidth for the application.
- Even if broadband connectivity is available, the round-trip latency of relaying data from the source to the cloud, running a model, and returning the results to those on the ground may be too high, particularly for control loops where latencies are measured in milliseconds.
Satellite internet services offer connectivity in remote locations, and next-generation LEO (Low-Earth Orbit) constellations such as Starlink and OneWeb offer improved bandwidth and latency. However, these services are still impacted by harsh weather, which can reduce uptime below what critical operations require. The solution is local model deployment, either on-device or on a local server in the field, which provides consistent availability and latency and transmits monitoring and observability data back when connectivity permits.
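As a minimal sketch of this store-and-forward pattern (all class and function names here are hypothetical, not any particular product’s API): inference runs locally against sensor readings as they arrive, while the observability records queue up and drain only when a link check succeeds.

```python
import json
import time
from collections import deque


class EdgeBuffer:
    """Store-and-forward buffer (hypothetical sketch): inference runs
    locally; observability records queue until a link is available."""

    def __init__(self, max_records=10_000):
        # Bounded queue: the oldest records are dropped if the link stays down.
        self.pending = deque(maxlen=max_records)

    def record(self, inputs, output):
        self.pending.append({"ts": time.time(), "inputs": inputs, "output": output})

    def flush(self, link_up, send):
        """Transmit queued records if the link is up; otherwise keep them."""
        if not link_up:
            return 0
        sent = 0
        while self.pending:
            send(json.dumps(self.pending.popleft()))
            sent += 1
        return sent


def predict(reading):
    # Stand-in for a locally deployed model: flag high-pressure readings.
    return reading["pressure_psi"] > 900


buffer = EdgeBuffer()
for psi in (850, 910, 875):
    reading = {"pressure_psi": psi}
    buffer.record(reading, predict(reading))

uplink = []
offline_sent = buffer.flush(link_up=False, send=uplink.append)  # link down: nothing leaves the site
online_sent = buffer.flush(link_up=True, send=uplink.append)    # link restored: backlog drains
```

The key design choice is that the model never blocks on the network: predictions are made and acted on locally, and the bounded queue decides what telemetry survives a long outage.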
2. Model Monitoring:
Data science teams can focus so much on deploying and running models at the edge that they forget to plan for the day after a model is deployed. The environment is continually changing, so the conditions on which a model was trained might no longer hold.
Consider a model for predicting when a certain piece of equipment might fail based on sensor data: as the ambient temperature changes, so might the significance of certain signals coming from the sensors, so a model tuned for summer might need to be updated for colder winter months. More broadly, do your edge ML operations have the ability to monitor performance, push updated models to your fleet (or a portion of your fleet), and return the observability data — inference inputs and outputs — for continuous analysis? Observability data allows automated tools to perform continuous statistical analysis, comparing current runs to prior behavior to detect data or model drift and find problems before they turn into failures.
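One way to make that statistical comparison concrete is a two-sample Kolmogorov–Smirnov check on logged inference inputs. The sketch below is stdlib-only Python; the threshold and the sensor values are illustrative assumptions, not numbers from any particular deployment.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of two samples (0 = identical, 1 = fully disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)

    def cdf(sorted_sample, x):
        # Fraction of the sample that is <= x.
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    return max(abs(cdf(a, x) - cdf(b, x)) for x in a + b)


# Baseline: ambient temperatures logged alongside summer inference runs (illustrative).
summer_temps = [28 + 0.1 * i for i in range(100)]   # roughly 28-38 C
# Current window: the same sensor during winter operation.
winter_temps = [2 + 0.1 * i for i in range(100)]    # roughly 2-12 C

DRIFT_THRESHOLD = 0.2  # tuning parameter; an assumption for this sketch
drift = ks_statistic(summer_temps, winter_temps)
needs_retraining = drift > DRIFT_THRESHOLD  # flag the model for review/retraining
```

In practice a drift monitor would run this comparison on a schedule over each feature the model consumes, using the returned observability data as the "current window" and the training distribution as the baseline.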
3. Edge environment compute constraints:
Edge devices are frequently resource-constrained in terms of CPU power, memory, and/or network bandwidth. Offloading inference to a remote data center is one solution, of course. But what happens if that introduces too much latency or requires more bandwidth than is available? ML teams need the flexibility to deploy model pipelines anywhere, from on-device to the cloud. Whether running on-device, on a local server, in a nearby near-prem micro-datacenter, or in a traditional data center, what’s needed is a specialized ML inference engine designed to run efficiently and perform consistently across this wide range of environments while providing the monitoring capabilities data scientists depend on.
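The latency side of that placement decision can be sketched as a simple budget check (the function name and the numbers below are illustrative assumptions, not a real engine’s API): a placement is only feasible if its end-to-end time fits the control loop’s budget.

```python
def choose_placement(local_ms, network_rtt_ms, remote_ms, budget_ms):
    """Pick the fastest placement whose end-to-end latency fits the budget.

    local_ms:       inference time on the edge device itself
    network_rtt_ms: round trip between the site and the data center
    remote_ms:      inference time on data-center hardware
    """
    candidates = {
        "on-device": local_ms,
        "datacenter": network_rtt_ms + remote_ms,
    }
    feasible = {name: ms for name, ms in candidates.items() if ms <= budget_ms}
    if not feasible:
        return None  # nothing fits: the model pipeline itself must shrink
    return min(feasible, key=feasible.get)


# A 20 ms control loop behind a 150 ms satellite round trip must stay local,
# even though data-center hardware runs the model far faster.
high_rtt_choice = choose_placement(local_ms=8, network_rtt_ms=150, remote_ms=2, budget_ms=20)
# With fiber on site (1 ms round trip), offloading becomes the better option.
low_rtt_choice = choose_placement(local_ms=8, network_rtt_ms=1, remote_ms=2, budget_ms=20)
```

The same pipeline can therefore land on different hardware at different sites, which is exactly why the inference engine needs to behave consistently everywhere it runs.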
***
In today’s dynamic world, oil and gas remains one of the most important industries in any economy. Each day the industry faces challenges such as equipment failures, leaks, safety incidents, and economic penalties. But by operationalizing machine learning at the edge, using a dependable deployment platform that works both in the cloud and in the field, the industry can start generating value from more of its data: early fault detection, proactive maintenance reminders, dynamic flow control, and leak detection.