Sometimes, the evidence confronts you with a reality that is completely different from what you want to see.
We recently wrote an article about getting to systemic (root) causes in your failure investigations. In this case study, we’ll look at an investigation where the evidence pointed to a conclusion very different from what the team was hoping to find, and show how identifying root causes can be confronting for the people involved.
A few years ago, we were asked to investigate the structural failure of a conveyor. The conveyor was down for a couple of weeks, impacting production the whole time, so it was obviously a major event that the client wanted to understand.
When we arrived at site, several of the maintenance team members informed us that the conveyor had failed from fatigue due to overloading. However, no failure analysis had been done on the structure – it was simply repaired and the conveyor put back into service. We had no evidence other than photographs taken at the time, which appeared to show a significant amount of corrosion.
Without knowing the failure mode for certain, we had to run the investigation a little differently. After collecting all the information we could, including discussions with maintenance team members, maintenance history, inspection sheets and operating data, two facts emerged:
These facts suggested that, although we didn’t know the failure mode for certain (corrosion fatigue was a good bet but not provable), it was possible for the team to have detected the cracks before failure. The next question to understand was why they hadn’t detected the cracks.
The investigation found the following indirect causes:
All of these causes were fixable, and would have prevented this failure happening again, but what about all the other structures? This is an example of where digging deeper to find systemic causes and turbo-charging your learning is essential.
The investigation found the following systemic causes for the failure:
Behind these causes was an over-arching organizational factor: the site’s arrangements for managing and maintaining structures were misaligned. The engineering team was charged with carrying out the inspections, whilst the maintenance team was charged with completing the repairs. The engineering team wasn’t included in the work management process, so their inspections weren’t being planned or executed properly, and defects found during inspection weren’t being raised as work orders. If the maintenance team did find defects themselves, they wouldn’t engage the engineering team for advice and input on designing and planning the repairs.
Each team’s goals were also misaligned – the engineering team didn’t have availability or reliability in their KPIs, so they had no incentive to ensure repairs were completed. Conversely, the maintenance team’s primary metrics were schedule completion and backlog, so they were happy not to have the structural work orders in the CMMS.
When we presented these findings to the site team, they were understandably taken aback, since the findings had nothing to do with overloading the conveyor. Our findings, particularly the root causes, went into areas they weren’t prepared for, and some team members were unwilling to accept them or otherwise reacted badly. In this instance, we’d compiled a fairly extensive body of evidence, so we were able to take them through each piece until they were willing (although not exactly happy) to accept the recommendations for change.
Had we accepted the initial information at face value, we would have completely missed the true cause of the failure. And if we’d stopped at the indirect causes, we would not have helped the site to bring about the organizational and process changes that they needed to improve all their structures, not just the one in question.
By Matthew Grant