This project was done as part of a ConocoPhillips challenge at the 2019 Texas A&M University Datathon. The task is to predict, from sensor data, whether each piece of equipment will fail on the surface (coded 0) or underground (coded 1).
One of the biggest challenges in this project was class imbalance, with a 1:60 ratio of class 1 to class 0. We addressed it by bootstrapping the underrepresented class, i.e. resampling its examples with replacement.
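A minimal sketch of bootstrapping the minority class, assuming plain Python lists of feature rows and binary labels. The function name `bootstrap_minority` and the choice to oversample until both classes are exactly equal in size are illustrative assumptions, not necessarily what the original pipeline did.

```python
import random
from collections import Counter


def bootstrap_minority(rows, labels, minority_label=1, seed=0):
    """Balance a binary dataset by resampling minority-class rows with replacement.

    Hypothetical helper: oversamples the minority class until it matches
    the size of the other class.
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority = [(r, y) for r, y in zip(rows, labels) if y != minority_label]
    if not minority:
        raise ValueError("no rows of the minority class to resample")

    # Bootstrap step: draw extra minority rows with replacement until
    # both classes contain the same number of examples.
    n_extra = max(len(majority) - len(minority), 0)
    extra = rng.choices(minority, k=n_extra)

    balanced = majority + [(r, minority_label) for r in minority + extra]
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [y for _, y in balanced]


# Toy data with the same 1:60 imbalance (values are made up).
rows = [[float(i)] for i in range(610)]
labels = [0] * 600 + [1] * 10
X_bal, y_bal = bootstrap_minority(rows, labels)
counts = Counter(y_bal)
print(counts[0], counts[1])  # 600 600
```

Resampling like this is normally applied only to the training split; oversampling before splitting can place copies of the same minority example in both the training and evaluation sets and inflate the measured accuracy.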
Another challenge was the mix of data types: some sensor readings were recorded over time, while others were single measurements. We averaged each over-time series to produce a single value per sensor.
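A minimal sketch of collapsing over-time readings into single values, assuming each piece of equipment is represented as a dict that maps a sensor name to either a list of readings or a scalar. The sensor names and the function `flatten_sensors` are hypothetical, and skipping missing samples (`None`) before averaging is an added assumption.

```python
from statistics import fmean


def flatten_sensors(record):
    """Collapse each over-time sensor series to its mean; keep scalars as-is."""
    features = {}
    for name, value in record.items():
        if isinstance(value, (list, tuple)):
            # Over-time reading: ignore missing samples (assumed to be None),
            # then summarize the remaining series by its average.
            samples = [v for v in value if v is not None]
            features[name] = fmean(samples) if samples else None
        else:
            # Singular measurement: already one value per piece of equipment.
            features[name] = value
    return features


record = {"sensor_a_history": [10.0, 12.0, 14.0], "sensor_b": 3.5}
print(flatten_sensors(record))  # {'sensor_a_history': 12.0, 'sensor_b': 3.5}
```

The mean discards the shape of each series (trend, spread, spikes); other summaries such as the minimum, maximum, or standard deviation could be added as extra columns in the same way if they turn out to be informative.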
Our model achieved 98% accuracy in predicting underground failures on ConocoPhillips’ data.