Don’t stop your RCA half-way through. Turbo-charge your learning.
In our previous article, we went through the process of identifying indirect causes. For many people, their failure investigation stops there. How many times have you heard things like:
“The root cause was that the maintainer didn’t properly tighten the bolt.”
“The root cause was that the maintainer didn’t do the inspection properly.”
“The root cause was that there was no subsequent notice raised to manage the defect found in the condition monitoring report.”
There’s a major problem with these sorts of statements: none of them are root causes. At best, they’re indirect causes – factors that are specific to the failure being investigated. If you fix these causes, you might stop this specific failure from recurring. But what about other, similar failures: where’s the turbo-charged learning? We talked about this in an earlier case study: if we’ve missed one condition monitoring task, what else have we missed? And is there a deeper reason why we’re missing these condition monitoring tasks?
Asking these questions in an RCA is about looking for systemic (or root) causes. Root causes are the fundamental factors in an organisation that enable the indirect causes to occur. They are often deep-seated, sometimes difficult to measure or quantify, and can take a lot of work to correct. They’re also confronting when you find them (or alternatively, they’re used as opportunities to pass the blame to someone else and avoid responsibility yourself). But when you fix them, the benefits can flow through to many other parts of your business.
The process of identifying root causes is the same as for the other parts of our model (below). We’ve classified them into four systemic causes that you can work through: we’ll briefly explain each below.
Inadequate Individual Capabilities
Capabilities refers to the combination of knowledge, skills and attitude that a person brings to a task. A deficiency in any of these can lead to an action being performed incorrectly (or omitted completely). Key questions to ask are:
Knowledge - Did the people involved know what to do/what was expected of them? (For example, did they know they had to inspect the failed component, and did they know what the limits for “good” and “bad” were?)
Skills - Did the people involved have the technical skills to do what was expected of them? (For example, welding skills, aligning a drive train, etc.)
Attitude - Did the people involved care whether they did a good job or not? (This one’s a lot rarer than people think).
To be clear, this is not about blaming individuals for the failure. Deficiencies in individual capabilities can’t exist without a deficiency in the organisation itself. It’s important to look at the aspects of training, leadership, etc. that have contributed to the individual’s actions.
Inadequate Organisational Capabilities
This is the confronting part – looking at how the organisation itself works. There are certain fundamental factors that explain why individuals behave the way they do, and whether they have the capabilities, resources and information to do their job. We’ve learned to look for three key factors:
Culture – Is the team committed to a culture of quality and mutual accountability? Are they committed to the principles of proactive maintenance? Or are they reactive, building a culture of tolerating defects and using excuses like not having perfect service sheets?
Leadership – Are leaders (at all levels of the organisation) contributing to an effective culture by setting clear expectations and holding their people to account? Are they setting their teams up for success with clear communication, including effective pre-shift discussions? Or are they contributing to a reactive culture that tolerates defects and short cuts?
Alignment – Do the organisation’s structure and performance targets promote co-operation and sharing of information? We’ve found there is frequently misalignment between operations and maintenance, between maintenance and engineering/reliability teams, and between maintenance and supply/procurement. Different goals and priorities, along with processes that are often misunderstood between teams, make it difficult to align on a common approach.
Inadequate Systems of Work
Systems of work simply means the processes, information and business systems that describe how an organisation works. Key factors to look for include:
Processes and Task Instructions – We’ve found organisations go in one of two directions: their documents are either insufficiently detailed to explain what to do, or they go overboard and produce documents that are too detailed (frequently a problem with PM checklists).
Work Management and the CMMS – It’s critical to have effective planning and scheduling to enable maintenance execution, which in turn requires effective management of the CMMS. Master data needs to be kept up to date, and defects and subsequent work orders must be raised consistently, correctly, and with sufficient detail.
Documentation Management – Closely related to keeping your CMMS up to date is ensuring that your maintainers have accurate information on the equipment. This includes your manuals, engineering drawings, and especially any temporary changes like bridging out sensors.
Operational Readiness – Does the business ensure their assets are supported from Day 1? We’ve found this is a common problem across the industry – assets are supplied without information, maintenance strategies, spares etc.
Management of Change – Does the business plan, communicate, execute and record their changes and improvement tasks effectively? We’ve found two main problems here: either sites don’t track their improvement tasks at all, or they do them quickly without properly communicating to the team or documenting the changes so they’re sustainable.
Inadequate Resources
In our experience, looking at the available resources should be the last part of the failure investigation process, because it’s all too easy to shift blame onto your tools (just like the old proverb!). However, there are a couple of factors to look at:
Facilities, Tools and Equipment – In some cases, having the correct tools is essential. For example, the difference in aligning a drive train with a laser aligner as opposed to feeler gauges is significant. Likewise, a decent lubrication storage and distribution system makes life much easier by preventing contamination.
Technology – We’re seeing the growth of sensors, connected mobile devices such as tablets, and other technology that simplify and improve the way we inspect and maintain assets.
Again, a skilled workforce with an effective culture can often achieve great reliability without these resources. However, giving your maintainers the tools they need to do their job increases the likelihood of consistent quality by minimising the impact of variation in their skills and reducing the prospect of human error.
Using the Model
Getting to the root cause is never easy. However, we’ve learned to apply the following principles:
In many cases, root causes are about the absence of something that should be in place. It’s essential to stick to the evidence, but in some cases, you’ll find nothing. That’s normally a good indication you’ve found a problem, but ensure you’re thorough in talking to people to make sure that someone, somewhere, doesn’t have the piece of evidence you’re looking for.
Be respectful, but firm, in your conclusions. It is highly likely that someone will be upset when you identify your root causes. That’s why we tend to stay away from looking for them. However, making big improvements requires confronting these fundamental issues. If you lay out your conclusions objectively, sticking to the evidence and avoiding criticising individuals, most people will accept what you say, even if they don’t like it.
An RCA is only of value when it feeds your defect elimination process; there need to be improvements that address each of the indirect and root causes. Indirect causes are generally easy to fix – just change a tactic or service sheet, or update a drawing or master data. Process changes are also fairly straightforward. But it’s the organisational factors, especially culture, that are hard to change.
Changing a maintenance culture is never easy: it requires constant effort, especially from leaders. It’s also hard to measure, and hard to put in place actions that fit the “SMART” model. We’ve found that working agreements, tied to effective pre-starts that focus on quality as well as safety, can be successful in sustaining reliability improvements by building a culture of mutual accountability.
To illustrate the value of getting to the root cause, we have one more case study. Hopefully the lessons we’ve learned over the years in conducting RCAs are valuable, and we continue to learn every time we work with a client.