PRO-0025-BM-Reliability Playbook
TABLE OF CONTENTS
OBJECTIVE
SCOPE
APPLICATION
REFERENCES
DEFINITION
1.0 ROOT CAUSE ANALYSIS OVERVIEW
1.1 Raise RCA request
1.2 Approve/Reject RCA request
1.3 Decide RCA team leader and members
1.4 Organize and prioritize evidence collection
1.5 Collect data & take interviews
1.6 Develop a timeline
1.7 Perform the root cause analysis using a logic tree
1.8 Determine recommendations
1.9 Review RCA report & recommendation(s)
1.10 Implement recommendation(s) in ERP
2.0 GOVERNANCE
2.1 RACI Matrix
2.1.1 RACI Matrix to conduct RCA
2.1.2 RACI Matrix to review RCA process
2.2 ESCALATION MATRIX
3.0 TOOLS & TEMPLATES
4.0 KPIS
Root Cause Analysis
PGS-XXXXX, Rev. 00 13/09/2019
Director:
Department: Base Metals, Planning and Engineering, Asset Performance Management
OBJECTIVE
• Define the root cause analysis process and procedures
• Define the responsibility of the team members involved in root cause analysis
• Guide the site to collect incident information and plan the analysis with all resources and necessary
compliance
• Continuously monitor and evaluate the recommendations of root cause analyses
SCOPE
This document defines the common processes and practices to be adhered to by Base Metals to achieve a minimum
expected performance level for the requirements set out in VPS technical principle 8.10: “A process for identifying
gaps, analyzing and preventing failures is applied based on failure modes and defined triggers of losses and
impacts”.
APPLICATION
Vale Base Metals, Maintenance
REFERENCES
ISO 14224 - Collection and exchange of reliability and maintenance data for equipment
DEFINITION
Enterprise Resource Planning (ERP): a business process management software system that allows an organization to
use a system of integrated applications to manage the business and automate functions related to technology,
services, production, finance, and human resources. Examples: Ellipse, SAP, Oracle.
Urgent Work: work on equipment that has failed or is expected to fail within the next 7 days and that will negatively
impact health, safety, environment or a production/service activity. Due to its urgency, this work does not have
time to go through the formal planning and scheduling process.
VPS - Vale Production System: VPS is our management model that sustains Vale’s organizational culture through
engaging people in the pursuit of operational excellence.
Root Cause Analysis (RCA): a systematic process for identifying the root cause of a problem or an event and an
approach for responding to it.
1.0 ROOT CAUSE ANALYSIS OVERVIEW
1.1 Raise RCA request
Base Metals has specified guidance for conducting root cause analysis for breakdown or corrective maintenance
based on local conditions of frequency and duration of breakdowns/corrective maintenance, as well as the criticality
of the equipment. The following criteria should be used for creating an RCA:
• Plant stoppage and production loss
• Equipment breakdown resulting in reduced production
Select the Equipment or Functional Location record on which the RCA is based. Categorizing RCAs by the
severity of the event is a good practice for determining the RCA duration.
1.2 Approve/Reject RCA request
After an RCA has been requested by plant personnel, the request will be assessed by the reliability team
members. They will assess the criterion on which the RCA has been raised and confirm the validity of the
incident. Further, they will check whether a duplicate RCA request has already been approved.
After this assessment, the reliability team members will accept the RCA for team formation and investigation.
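The screening checks described above can be sketched in Python. This is an illustrative sketch only: the `RcaRequest` fields, the criterion labels, and the `screen_request` function are hypothetical names, not part of any ERP system or of this procedure. Confirming that the incident itself is valid remains a judgment made by the reliability team and is not modeled here.

```python
from dataclasses import dataclass

# Hypothetical labels for the two RCA trigger criteria listed in this procedure.
VALID_CRITERIA = {"plant_stoppage_production_loss", "breakdown_reduced_production"}


@dataclass(frozen=True)
class RcaRequest:
    """Illustrative RCA request; all field names are assumptions."""
    equipment_id: str  # Equipment or Functional Location the RCA is based on
    incident_id: str   # identifier of the incident that triggered the request
    criterion: str     # trigger criterion cited by the requester


def screen_request(request, approved_requests):
    """Return (accepted, reason) for a new RCA request."""
    # Check 1: the request must cite one of the recognized criteria.
    if request.criterion not in VALID_CRITERIA:
        return False, "criterion not recognized"
    # Check 2: reject if the same incident on the same equipment is already approved.
    for existing in approved_requests:
        same_event = (existing.equipment_id == request.equipment_id
                      and existing.incident_id == request.incident_id)
        if same_event:
            return False, "duplicate of an approved request"
    return True, "accepted"
```

Matching duplicates on equipment and incident identifiers is one possible rule; a site may prefer to match on functional location and date instead.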
1.3 Decide RCA team leader and members
There are a few key roles in any root cause analysis. One of the most important is that of the reliability team
members, who coordinate all the RCA activities. The reliability team decides to initiate the RCA upon a production
or maintenance incident and picks a team leader to convene and coordinate the RCA. The team leader and the
reliability team members usually determine who should be on the team to perform the RCA.
The RCA Team meets, assesses the circumstances and causes, and makes recommendations for how to best address
the issue.
The RCA team must include a member who should:
• Have knowledge of RCA methodology
• Remain neutral about the event, showing no bias or pre-determined opinions about the event
Role: Reliability team members
Responsibilities:
• Identify RCA opportunities
• Define the scope and breadth of the RCA effort
• Assign RCA roles and responsibilities
• Monitor the status of open RCAs
• Sponsor a human change management plan where needed to ensure support from affected positions and departments
• Determine duration of the RCA to be conducted
• Monitor and ensure timely implementation of approved RCA solutions
• Monitor the effectiveness of implemented solutions
1.4 Organize and prioritize evidence collection
The RCA team leader and team members will meet to plan the root cause analysis investigation. The team will
conduct a kick-off meeting to initiate the investigation and identify the evidence to be gathered. The team should
prioritize data collection before any evidence is lost, and specific team members should be assigned that
responsibility.
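One way to turn this prioritization into a working list is sketched below in Python. The `EvidenceItem` structure and its `hours_until_lost` estimate are assumptions made for illustration; the procedure itself does not prescribe how perishability is estimated.

```python
from dataclasses import dataclass


@dataclass
class EvidenceItem:
    """Illustrative evidence entry; all field names are assumptions."""
    name: str
    hours_until_lost: float  # team's estimate of how long the evidence stays available
    assignee: str = ""       # team member responsible for collecting it


def collection_order(items):
    """Return items with the most perishable evidence first.

    sorted() is stable, so items with equal estimates keep their original order.
    """
    return sorted(items, key=lambda item: item.hours_until_lost)
```

The ordered list can then be reviewed in the kick-off meeting and each item assigned to a named team member.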
1.5 Collect data & take interviews
Team members will start collecting data to capture initial observations, following the prioritization of evidence
agreed at the kick-off meeting. The collection sequence should ensure that evidence is gathered before it can be
lost.
Capture asset information:
• Asset ID
• Asset description
• Asset type
Conduct interviews:
• Identify the stakeholders to interview to get a detailed sequence of events
• Perform interviews of stakeholders to get accurate and complete information to understand the incident
• Record all the events along with their timing
To perform an RCA of a failure in a dump truck, the following information might be collected:
a) Dumper number: asset number, ERP tag
b) Model and specifications: specification sheet of the truck model
c) Failure History: date and time of breakdowns, breakdown period etc.
d) RCA History: history of the past RCAs conducted, their recommendations and implementation status
e) Maintenance work order history: history of work performed on the truck
f) Component replacement history: history of components replaced in the truck
g) Operator interview: operator version of events
h) Maintenance supervisor interview: maintenance supervisor version of events
1.6 Develop a timeline
Develop a timeline (a sequence of events) based on the data collected. Using the sequence of events, the team can
analyze the data, develop hypotheses, and verify those hypotheses for the RCA.
A timeline provides a way for the RCA team to view and organize a chain of events prior to the failure event and
identify possible work process issues.
For example (continued):
The event diagram below depicts the events leading to the breakdown of a dump truck because of a problem in the radiator.
Fig 3: Event diagram leading to breakdown of dump truck due to problem in radiator
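Building the timeline from collected records and interview notes amounts to ordering time-stamped events and keeping those that precede the failure. A minimal Python sketch follows; the `Event` structure and the `build_timeline` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One observation gathered from records or interviews (assumed shape)."""
    timestamp: datetime
    description: str
    source: str  # e.g. "operator interview", "work order history"


def build_timeline(events, failure_time):
    """Return the events at or before the failure, oldest first."""
    prior = [e for e in events if e.timestamp <= failure_time]
    return sorted(prior, key=lambda e: e.timestamp)
```

Keeping the `source` with each event lets the team see which entries rest on a single interview and may need corroboration.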
1.7 Perform the root cause analysis using a logic tree
After the sequence of events is known, the RCA team will start analyzing the failure using a logic tree analysis. A
logic tree depicts the start of the failure event, possible failure causes, and hypotheses related to the event itself. In
other words, a logic tree provides a way for the RCA team to organize and record discussion points on the possible
causes of failure events.
After all the failure modes are identified, the investigators will begin the root cause identification. The logic tree gives
structure to the reasoning process by helping the investigators answer questions about why particular factors existed
or occurred. The identification of root causes helps the investigator to determine the reasons the event occurred so
the problems surrounding the occurrence can be addressed.
For example (continued):
The logic tree below depicts possible causes of overheating in a dump truck.
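A logic tree can be represented as a simple nested structure in which each hypothesis is marked as verified, disproven, or not yet tested. The Python sketch below is one possible representation, not a prescribed tool; the `Node` class, its `verified` flag, and the `root_causes` function are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A logic tree node: the failure event, a failure mode, or a hypothesis."""
    label: str
    verified: Optional[bool] = None  # None = not yet tested; True/False = result of evidence
    children: List["Node"] = field(default_factory=list)


def root_causes(node):
    """Collect the deepest verified hypotheses along fully verified paths."""
    # A branch that is disproven or untested contributes nothing.
    if node.verified is not True:
        return []
    found = []
    for child in node.children:
        found.extend(root_causes(child))
    # A verified node with no verified descendants is the deepest cause found so far.
    return found or [node.label]
```

Under this representation, a branch is only followed while every step on it has been proven true, which mirrors the practice of verifying each hypothesis before drilling down further.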
1.8 Determine recommendations
RCA recommendation records can be linked to any equipment record and any functional location record. These may
or may not be the same equipment or functional location that was the subject of the RCA.
A recommendation is a suggested solution for preventing or lessening the impact of future failures for the piece of
equipment or location that the RCA team is analyzing, such as making changes to maintenance schedules or
providing additional training to operators. The RCA team should develop these solutions based on the evidence
that was collected and the conclusions that were drawn.
After a hypothesis is proven true and a root cause is identified, the RCA team discusses the possible solutions in order
to evaluate and consolidate the recommendations that will prevent the failure from recurring.
After the recommendations are recorded, they are consolidated and sent for review and approval.
For example (continued):
An RCA team might identify the root causes of frequent overheating of a dump truck as given in the figure below.
Fig 6: Sample Root Cause Findings for frequent overheating of dump truck
The RCA team might recommend the following actions to eliminate the root causes.
1.9 Review RCA report & recommendation(s)
Once the RCA is carried out, an RCA report is submitted for approval to the maintenance manager or maintenance
superintendent, who reviews the recommendations and evaluates the overall effectiveness of the RCA. The following
questions can be used when reviewing the RCA recommendations:
• Is the root cause clearly defined?
• Does the root cause make sense?
• Do actions address the root cause?
• Are the actions sustainable?
• Are recommendations feasible?
• When should the actions be done?
• Who will be the responsible person?
The reviewer may then approve the RCA recommendations for implementation in the ERP. Alternatively, the reviewer
may send the report back for revision if it is inadequate or information is missing.
1.10 Implement recommendation(s) in ERP
Implementing RCA recommendations through the ERP is considered best practice for tracking them effectively and
controlling the recurrence of failure. Once the recommendations are approved, they are to be entered in the ERP as
notifications for the respective equipment or functional locations, along with responsible persons and target dates.
The RCA is officially closed after the RCA recommendations are entered in the ERP for implementation.
Further, RCA recommendations can be used for developing equipment maintenance strategies.
2.0 GOVERNANCE
2.1 RACI Matrix
A responsibility assignment matrix, also known as a RACI matrix, describes the participation by various roles in
completing tasks or deliverables for a project or business process.
Responsible (R): The person who does the work to achieve the task. They have responsibility for getting the work
done or decision made.
Accountable (A): The person who is accountable for the correct and thorough completion of the task. This is the role
to which the Responsible person answers and which approves their work.
Consulted (C): The people who provide information for the project and with whom there is two-way communication.
This is usually several people, often subject matter experts.
Informed (I): The people kept informed of progress and with whom there is one-way communication. These are
people that are affected by the outcome of the tasks, so need to be kept up-to-date.
2.1.1 RACI Matrix to conduct RCA
S. No. | Process | Reliability Manager | Reliability team members | RCA Cross Functional Team Members | Operation & Maintenance Superintendent/Manager
a. | Supervise RCA process | A | R | C | I
b. | Train the team to perform RCA | A | R | - | C, I
c. | Document RCA outcomes | I | A | R | I
d. | Approve RCA recommendations | I | A | C | R
e. | Raise RCA recommendations in SAP | I | A | R | I
2.1.2 RACI Matrix to review RCA process
S. No. | Process | Reliability Manager | Reliability team members | RCA Cross Functional Team Members | Operation & Maintenance Superintendent/Manager
a. | Monitor change in corporate guidelines for RCA | A | R | - | C, I
b. | Monitor RCA effectiveness | A | R | - | C, I
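A RACI matrix can be checked mechanically for gaps. The Python sketch below transcribes the matrix for conducting RCA and applies a common RACI convention (exactly one Accountable and at least one Responsible per task). That convention is an assumption and is not stated in this procedure; the column order and the `check_raci` function name are likewise assumptions made for illustration.

```python
# Roles, in the assumed column order of the RACI tables above.
ROLES = [
    "Reliability Manager",
    "Reliability team members",
    "RCA Cross Functional Team Members",
    "Operation & Maintenance Superintendent/Manager",
]

# RACI matrix to conduct RCA, transcribed from the table above ("-" = no role).
CONDUCT_RCA = {
    "Supervise RCA process": ["A", "R", "C", "I"],
    "Train the team to perform RCA": ["A", "R", "-", "C, I"],
    "Document RCA outcomes": ["I", "A", "R", "I"],
    "Approve RCA recommendations": ["I", "A", "C", "R"],
    "Raise RCA recommendations in SAP": ["I", "A", "R", "I"],
}


def check_raci(matrix):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for task, cells in matrix.items():
        # A cell such as "C, I" holds two codes, so split on commas first.
        codes = [code.strip() for cell in cells for code in cell.split(",")]
        if codes.count("A") != 1:
            problems.append(f"{task}: expected exactly one Accountable")
        if codes.count("R") < 1:
            problems.append(f"{task}: expected at least one Responsible")
    return problems
```

Calling `check_raci(CONDUCT_RCA)` returns an empty list, since every task in the transcribed matrix has one Accountable role and one Responsible role.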
3.0 TOOLS & TEMPLATES
A standard RCA report template is to be used for conducting root cause analysis.
RCA report: ZBM-Reliability-RCA Report
A standard root cause analysis A3 template is to be used for reporting RCA outcomes on one A3 page.
RCA A3 Template: ZBM-Reliability-RCA-A3 Template
A standard RCA recommendation tracking template is to be used to track RCA status and RCA recommendation
status.
RCA & Recommendation Tracking template: ZBM-Reliability-RCA & Recommendation tracking
4.0 KPIS
A Key Performance Indicator (KPI) measures how well a company meets its operational and strategic goals.
KPIs are performance measurement tools that play a key role in assessing system efficiency in terms of cost or
quality.
The following KPIs should be monitored to measure RCA coverage and effectiveness:
KPI: Overdue RCA status
Type: Leading
UOM: %
Description: Effective asset management is one of the major tasks of the maintenance and reliability department. RCA helps in identifying the major cause of failure and the mitigation procedure. Hence, measurement of RCA coverage becomes very important.
Parameters: A = No. of RCAs in progress that have crossed their due date; B = Total no. of RCAs in progress
Formula: A/B
Data Source: For A: Reliability function’s RCA completion/status records. For B: Reliability function’s RCA completion/status records.

KPI: Overdue RCA recommendation status
Type: Leading
UOM: %
Description: RCA is a team-based task and requires support from both the reliability and execution teams. RCA helps to identify the major cause of failure and its mitigation steps. Hence, execution of RCA recommendations on time is very important.
Parameters: A = No. of RCA recommendations that have crossed their due date; B = Total no. of open RCA recommendations
Formula: A/B
Data Source: For A: Reliability function’s RCA completion/status records. For B: SAP (T-code IW28).

KPI: RCA completion status
Type: Leading
UOM: %
Description: A measure of the completion status of RCA recommendations to avert re-occurrence of failure, thereby ensuring risk is minimized.
Parameters: A = No. of RCA recommendations that have been closed; B = Total no. of RCA recommendations
Formula: A/B
Data Source: For A: Reliability function’s RCA completion/status records. For B: SAP (T-code IW28).

KPI: Equipment Availability
Type: Lagging
UOM: %
Description: A measure of equipment availability based on the proportion of planned/scheduled and unplanned/emergency downtime as compared to total operating hours available (for production, manufacturing, service etc.), typically a 36-month rolling average.
Parameters: A = (Total Operating Hours - Unscheduled Downtime Hours - Scheduled Downtime Hours); B = Total operating hours (To be calculated for equipment on which RCA ...
Formula: A/B
Data Source: For A: 1. SAP (T-code IW28); 2. Reliability function’s RCA completion/status records. For B: Reliability function’s ...
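The A/B formulas above can be computed as in the following Python sketch. Because the unit of measure is %, the ratio is multiplied by 100 here, which is an interpretation of the table and not something it states explicitly. The function names are hypothetical, and returning `None` when B is zero is an assumed convention for an undefined KPI.

```python
def ratio_percent(a, b):
    """Return A/B as a percentage, or None when B is zero (KPI undefined)."""
    if b == 0:
        return None
    return 100.0 * a / b


def overdue_rca_status(overdue_in_progress, total_in_progress):
    """A = RCAs in progress past their due date; B = all RCAs in progress."""
    return ratio_percent(overdue_in_progress, total_in_progress)


def equipment_availability(total_operating_hours,
                           unscheduled_downtime_hours,
                           scheduled_downtime_hours):
    """A = operating hours minus both kinds of downtime; B = total operating hours."""
    available = (total_operating_hours
                 - unscheduled_downtime_hours
                 - scheduled_downtime_hours)
    return ratio_percent(available, total_operating_hours)
```

With invented figures for illustration, 3 overdue RCAs out of 12 in progress give `overdue_rca_status(3, 12)` = 25.0, and 1000 operating hours with 50 hours each of unscheduled and scheduled downtime give `equipment_availability(1000, 50, 50)` = 90.0. The other two leading KPIs follow the same pattern through `ratio_percent`.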