Abstract

With the ubiquity of machine learning technologies, their adoption in the energy industry is bound to become mainstream. However, a major drawback of developing modern machine learning technologies is their heavy reliance on large labeled datasets for model training and development. Acquiring and labeling such datasets can be prohibitively expensive because, depending on the application, massive numbers of data points must be captured and labeled manually. Additionally, images in the energy industry can be hard to capture because of elevated safety and security standards and sensitive restricted areas. Therefore, an emerging technique is to generate synthetic datasets as an alternative to real field images for training machine learning models. Synthetic data are not limited to visual data; acoustic and time-series data, among others, can also be simulated. In this study, however, we limit our scope to computer vision applications and the generation of synthetic images.
We discuss 3D rendering software that can be used to generate synthetic visual data, such as Blender and Unity, and outline their respective strengths and weaknesses. Furthermore, we present a detailed account of common image augmentations and their effects on model accuracy. To ensure that models trained on synthetic data perform well when deployed, we validate them on a relatively small set of real labeled images.
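Common image augmentations of the kind referred to above can be sketched in a few lines. The snippet below is a minimal illustration, assuming images are NumPy arrays of shape H x W x C with values in [0, 1]. The particular augmentations (horizontal flip, brightness scaling, Gaussian noise, random crop), their parameter ranges, and all function names are hypothetical choices for illustration, not the set evaluated in this study.

```python
import numpy as np


def horizontal_flip(image: np.ndarray) -> np.ndarray:
    """Mirror an H x W x C image left-to-right."""
    return image[:, ::-1, :]


def adjust_brightness(image: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities by `factor`, clipped to the valid [0, 1] range."""
    return np.clip(image * factor, 0.0, 1.0)


def add_gaussian_noise(image: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation `sigma`, then clip."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


def random_crop(image: np.ndarray, crop_h: int, crop_w: int, rng: np.random.Generator) -> np.ndarray:
    """Cut a crop_h x crop_w window at a uniformly random position."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)   # upper bound is exclusive
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w, :]


def augment(image: np.ndarray, rng: np.random.Generator, crop_size=(56, 56)) -> np.ndarray:
    """Apply a random combination of the (hypothetical) augmentations above."""
    out = image
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    out = adjust_brightness(out, rng.uniform(0.7, 1.3))
    out = add_gaussian_noise(out, sigma=0.02, rng=rng)
    out = random_crop(out, *crop_size, rng=rng)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64, 3))  # stand-in for a rendered synthetic image
    print(augment(image, rng).shape)  # (56, 56, 3)
```

Whether a given augmentation preserves the label depends on the application, so each transform should be checked against the labeling scheme before it is added to a training pipeline.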
The difference between model accuracy on synthetic data and on real data is largely determined by how well the synthetic images represent real images. We therefore show that the simulated images look realistic enough, and we accordingly expect better transfer when predicting on real images. To validate our methodology, we use Convolutional Neural Networks (CNNs) to train multiple visual inspection models on synthetic data generated with the aforementioned 3D rendering software for the following applications:
- Flange integrity
- Classifying lever positions (open/closed)
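One simple way to quantify the difference in accuracy between synthetic and real data is to score the same trained model on a held-out synthetic set and on the small real labeled set, then take the difference. The sketch below assumes the model's predictions are already available as integer class labels (for the lever task, hypothetically 0 = closed and 1 = open). The function names are illustrative, and the toy predictions in the usage lines are made up, not results from this study.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def sim_to_real_gap(
    synthetic_preds: Sequence[int],
    synthetic_labels: Sequence[int],
    real_preds: Sequence[int],
    real_labels: Sequence[int],
) -> float:
    """Accuracy on held-out synthetic images minus accuracy on real images.

    A gap near zero suggests the synthetic images represent the real ones well;
    a large positive gap suggests features learned from the renders do not
    fully transfer to real images.
    """
    return accuracy(synthetic_preds, synthetic_labels) - accuracy(real_preds, real_labels)


if __name__ == "__main__":
    # Toy, made-up predictions for the hypothetical lever task (0 = closed, 1 = open).
    gap = sim_to_real_gap([1, 0, 1, 1, 0], [1, 0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 1, 1])
    print(gap)  # 0.25
```

Keeping the real set small but separate from training makes this gap a direct check on how well the synthetic data stand in for field images.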
As visual inspections by robotic technologies such as drones increase in facilities, so does the need for labeled data to train the machine learning models that enable automated asset integrity monitoring. We propose a procedure that streamlines model training by using synthetic images. This approach reduces cost and time and makes it possible to train models for features for which real data are sparse.