PRO-0025-BM-Reliability Playbook

This document defines the root cause analysis process for a base metals company. It outlines raising an RCA request when key triggers are met, such as production losses or equipment failures. An RCA team is then formed, led by a team leader, to collect data, analyze timelines and potential causes, develop hypotheses, and determine recommendations. The goals are to identify root causes of failures in order to prevent reoccurrences and implement recommended solutions through monitoring and assessment of their effectiveness. The process aims to comply with industry standards for reliability and maintenance data collection and analysis.

Root Cause Analysis

PGS-XXXXX, Rev. 00 13/09/2019

TABLE OF CONTENTS
OBJECTIVE
SCOPE
APPLICATION
REFERENCES
DEFINITION
1.0 ROOT CAUSE ANALYSIS OVERVIEW
1.1 Raise RCA request
1.2 Approve/Reject RCA request
1.3 Decide RCA team leader and members
1.4 Organize and prioritize evidence collection
1.5 Collect data & take interviews
1.6 Develop a timeline
1.7 Perform the root cause analysis using a logic tree
1.8 Determine recommendations
1.9 Review RCA report & recommendation(s)
1.10 Implement recommendation(s) in ERP
2.0 GOVERNANCE
2.1 RACI Matrix
2.1.1 RACI Matrix to conduct RCA
2.1.2 RACI Matrix to review RCA process
2.2 ESCALATION MATRIX
3.0 TOOLS & TEMPLATES
4.0 KPIS


Director:
Department: Base Metals, Planning and Engineering, Asset Performance Management

OBJECTIVE
• Define the root cause analysis process and procedures
• Define the responsibility of the team members involved in root cause analysis
• Guide the site to collect incident information and plan the analysis with all resources and necessary
compliance
• Continuously monitor and evaluate the recommendations of root cause analyses

SCOPE
This document defines the common processes and practices to be adhered to by Base Metals to achieve the minimum
expected performance level for the requirements set out in VPS technical principle 8.10: “A process for identifying
gaps, analyzing and preventing failures is applied based on failure modes and defined triggers of losses and
impacts.”

APPLICATION
Vale Base Metals, Maintenance

REFERENCES
ISO 14224 - Collection and exchange of reliability and maintenance data for equipment

DEFINITION
Enterprise Resource Planning (ERP): a business process management software system that allows an organization to
use a system of integrated applications to manage the business and automate functions related to technology,
services, production, finance, human resources, etc. Examples: Ellipse, SAP, Oracle.
Urgent Work: work on equipment that has failed or is expected to fail within the next 7 days that will negatively
impact health, safety, environment or a production/service activity. Due to its urgency, this work does not have
time to go through the formal planning and scheduling process.
VPS - Vale Production System: VPS is our management model that sustains Vale’s organizational culture through
engaging people in the pursuit of operational excellence.
Root Cause Analysis (RCA): a systematic process for identifying the root cause of a problem or an event and an
approach for responding to it.


1.0 ROOT CAUSE ANALYSIS OVERVIEW


A Root Cause Analysis is the process of analyzing the circumstances associated with the design, manufacture,
installation, use and maintenance that have led to a failure. This can include, but is not limited to, deficiencies in
management systems, failures in equipment, or inappropriate human intervention.
When a piece of equipment fails, there could be any number of causes that led up to the failure. The investigation
might find that each failure stems from another higher-level cause. If the root causes remain unidentified and are not
addressed, the failure will likely reoccur.
A root cause analysis may lead to different conclusions, such as replacing a component, replacing the equipment,
changing the design, or changing the process.
Detailed process flow to perform root cause analysis:
• Collect data relevant to the failure
• Analyze the data
• Develop hypotheses about the root cause
• Validate or reject the hypotheses until root causes are identified
• Develop recommendations for preventing the failure from reoccurring
• Implement recommended solutions
• Track implemented solutions and assess their effectiveness in eliminating the failure

Fig 1: RCA process flow


1.1 Raise RCA request

Base Metals has specified guidance for conducting root cause analysis for breakdown or corrective maintenance,
based on local conditions of frequency and duration of breakdowns/corrective maintenance as well as criticality of
equipment. The following criteria should be used for raising an RCA:
• Plant stoppage and production loss
• Equipment breakdown resulting in reduced production
Select the Equipment or Functional Location record on which the RCA is based. Categorizing RCAs by the severity
of the event is a good practice for determining RCA duration.

Base metal RCA trigger points


Asset type: Mobile fleet
Chronic losses:
• More than 5 failures on a common component / equipment in 1 year
• Two or more unplanned failures on a single equipment within 500 hours of operation
Acute losses:
• A single unplanned failure of an equipment resulting in a production impact of $100k
• A failure of any major component under consideration for warranty

Asset type: Fixed plant and Utilities
Chronic losses:
• Two or more of the same unplanned failures on a single equipment within a year
Acute losses:
• A single unplanned failure of an equipment resulting in a production impact of $250k

Fig 2: Base metal RCA trigger points
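
The trigger points above can be expressed as a simple screening check. The following is a minimal sketch in Python, not an ERP feature: the class, field, and function names are hypothetical, and it assumes that "a production impact of $100k" (or $250k) means an impact at or above that amount.

```python
from dataclasses import dataclass
from typing import List

# Thresholds transcribed from the trigger-point table above.
# All names below are hypothetical and not part of any ERP.
ACUTE_LOSS_USD = {"mobile_fleet": 100_000, "fixed_plant_utilities": 250_000}


@dataclass
class FailureSummary:
    asset_type: str                        # "mobile_fleet" or "fixed_plant_utilities"
    failures_on_component_1y: int = 0      # failures on a common component / equipment in 1 year
    unplanned_failures_500h: int = 0       # unplanned failures within 500 operating hours
    same_unplanned_failures_1y: int = 0    # repeats of the same unplanned failure in a year
    single_failure_impact_usd: float = 0.0  # production impact of a single unplanned failure
    major_component_warranty_failure: bool = False


def rca_triggers(s: FailureSummary) -> List[str]:
    """Return a description of every trigger point that the summary meets."""
    if s.asset_type not in ACUTE_LOSS_USD:
        raise ValueError(f"unknown asset type: {s.asset_type}")
    hits: List[str] = []
    if s.asset_type == "mobile_fleet":
        if s.failures_on_component_1y > 5:
            hits.append("chronic: more than 5 failures in 1 year")
        if s.unplanned_failures_500h >= 2:
            hits.append("chronic: 2+ unplanned failures within 500 operating hours")
        if s.major_component_warranty_failure:
            hits.append("acute: major component under warranty consideration")
    else:
        if s.same_unplanned_failures_1y >= 2:
            hits.append("chronic: 2+ of the same unplanned failure within a year")
    # Assumption: the dollar figure in the table is read as "at least".
    if s.single_failure_impact_usd >= ACUTE_LOSS_USD[s.asset_type]:
        hits.append("acute: production impact at or above threshold")
    return hits
```

An empty list means no trigger is met; any non-empty result would justify raising an RCA request for assessment by the reliability team.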

1.2 Approve/Reject RCA request

After an RCA has been requested by plant personnel, the reliability team members assess the request. They review
the criterion on which the RCA was raised, confirm the validity of the incident, and check whether a duplicate RCA
request has already been approved. After this assessment, the reliability team members either reject the request or
accept it for team formation and RCA investigation.

1.3 Decide RCA team leader and members

There are a few key roles in any Root Cause Analysis. One of the most important is that of the reliability team
members, who coordinate all the RCA activities. The reliability team decides to initiate the RCA following a production
or maintenance incident and picks a team leader to convene and coordinate the RCA. The team leader and a reliability
team member usually determine who should be on the team to perform the RCA.
The RCA Team meets, assesses the circumstances and causes, and makes recommendations for how to best address
the issue.
The RCA team must include a member who should:
• Have knowledge of RCA methodology
• Remain neutral about the event, showing no bias or pre-determined opinions about the event
• Have no preconceived ideas of causal factors of the event
• Know the organizational history of the processes
• Have quality improvement knowledge and skills
• Not be directly involved in the incident
The team should also have enough members from other departments to allow for a broad perspective.
It is important to have a balanced team because too few members will not provide a comprehensive perspective.
Therefore, a team of four to eight members works for most instances.

Role and responsibilities

Reliability team members:
• Identify RCA opportunities
• Define the scope and breadth of the RCA effort
• Assign RCA roles and responsibilities
• Monitor the status of open RCAs
• Sponsor a human change management plan where needed to ensure support from affected positions and departments
• Determine duration of the RCA to be conducted
• Monitor and ensure timely implementation of approved RCA solutions
• Monitor the effectiveness of implemented solutions

RCA team leader:
• Organize the evidence
• Prioritize the data collection
• Interview stakeholders and other witnesses to the event
• Perform root cause analysis
• Review and identify recommendations
• Plan meetings
• Track the RCA progress status

RCA cross-functional team members (Reliability, Operation & Maintenance):
• Maintain a fundamental understanding of the RCA methodology employed
• Participate as an RCA team member when called upon
• Apply their own specific knowledge, skills, and expertise in identifying incident causes
• Assist in identifying and gathering incident evidence as directed
• Participate in the solution identification process
• Collect data to initiate analysis
• Investigate the incident
• Interview stakeholders and other witnesses to the event
• Perform root cause analysis
• Identify recommendations
• Implement recommendations in ERP

Operation and Maintenance Superintendent/Manager:
• Approve (or not) recommended RCA solutions for implementation, including prioritization and resource allocation
• Review and approve RCA

1.4 Organize and prioritize evidence collection

The RCA team leader and team members will meet to plan the root cause analysis investigation. The team will
conduct a kick-off meeting to initiate the investigation and identify the evidence to be gathered. The team should
prioritize data collection before any evidence is lost, and specific team members should be assigned that
responsibility.


1.5 Collect data & take interviews

Team members will start collecting data and capturing initial observations, following the evidence priorities agreed
at the kick-off meeting. Data collection should be sequenced so that evidence is gathered before it can be lost.
Capture asset information:
• Asset ID
• Asset description
• Asset type

Capture event information:


• Event start date
• Event end date
• Event narrative
• Event safety impact, environmental impact, and/or production loss
• Maintenance and Operational cost

Conduct site visit:


• Collect initial observations
• Take photos
• Record failure observations

Conduct interviews:
• Identify which stakeholders to interview to get a detailed sequence of events
• Perform interviews of stakeholders to get accurate and complete information to understand the incident
• Record all the events along with their timing

Collect historical data & documents:


• Collect historical equipment and location data from ERP
• Collect past root cause analyses conducted on the equipment
• Review any pending recommendations on equipment or functional location
• Gather current equipment performance records, data sheets, relevant PFDs & PIDs
The focus must be on collecting data to understand and record “what happened” before analyzing why it happened.
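
The asset and event fields listed above can be held in one record so that gaps in the collected data are easy to spot. The sketch below is illustrative only: the `EventRecord` class and its field names are hypothetical and do not correspond to any ERP schema. The sample values come from the dump truck example used in this section.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import List, Optional


@dataclass
class EventRecord:
    # Asset information
    asset_id: Optional[str] = None
    asset_description: Optional[str] = None
    asset_type: Optional[str] = None
    # Event information
    event_start: Optional[datetime] = None
    event_end: Optional[datetime] = None
    narrative: Optional[str] = None
    impact: Optional[str] = None  # safety, environmental and/or production loss
    maintenance_and_operational_cost: Optional[float] = None

    def missing_fields(self) -> List[str]:
        """List the fields that have not been captured yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


record = EventRecord(
    asset_id="Dump Truck 123",
    asset_type="Mobile fleet",
    narrative="Frequent overheating problems",
    impact="Loss of 1 kt of ore production in September 2019",
)
print(record.missing_fields())
```

The printed list shows what still has to be collected before the analysis starts, which keeps the team focused on "what happened" first.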
For example (continued):
A dump truck breakdown qualified for RCA because of a frequent overheating problem.

RCA to be performed because of frequent overheating problems on Dump Truck 123 causing a loss of 1 kt of ore
production in September 2019.


To perform the RCA of the dump truck failure, the following information might be collected:
a) Dumper number: asset number, ERP tag
b) Model and specifications: specification sheet of the truck model
c) Failure History: date and time of breakdowns, breakdown period etc.
d) RCA History: history of the past RCAs conducted, their recommendations and implementation status
e) Maintenance work order history: history of work performed on the truck
f) Component replacement history: history of components replaced in the truck
g) Operator interview: operator version of events
h) Maintenance supervisor interview: maintenance supervisor version of events

1.6 Develop a timeline

Develop a sequence of events, a timeline, based on the data collected. Using the sequence of events, one can
analyze the data, develop hypotheses, and verify hypotheses for the RCA.
A timeline provides a way for the RCA team to view and organize a chain of events prior to the failure event and
identify possible work process issues.
For example (continued):
The event diagram below depicts events leading to a breakdown of a dump truck because of a radiator problem.

Fig 3: Event diagram leading to breakdown of dump truck due to problem in radiator
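
A timeline of this kind can be assembled by ordering the recorded events by their timestamps. The following minimal Python sketch assumes each event carries a time, a description, and the source it came from. The class and function names are hypothetical, and the sample entries are invented for illustration; they are not taken from the event diagram.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TimelineEvent:
    when: datetime
    description: str
    source: str  # e.g. "operator interview" or "ERP work order"


def build_timeline(events: List[TimelineEvent]) -> List[TimelineEvent]:
    """Order events chronologically; events with equal times keep their input order."""
    return sorted(events, key=lambda e: e.when)


def format_timeline(timeline: List[TimelineEvent]) -> str:
    """Render one line per event: timestamp, description, and source."""
    return "\n".join(
        f"{e.when:%Y-%m-%d %H:%M}  {e.description}  [{e.source}]" for e in timeline
    )


# Invented sample entries for illustration only.
sample = [
    TimelineEvent(datetime(2019, 9, 3, 14, 20), "Truck stopped on haul road", "operator interview"),
    TimelineEvent(datetime(2019, 9, 3, 13, 55), "High coolant temperature alarm", "operator interview"),
    TimelineEvent(datetime(2019, 8, 28, 9, 0), "Radiator inspected during service", "ERP work order"),
]
print(format_timeline(build_timeline(sample)))
```

Keeping the source next to each entry makes it easier to cross-check interview accounts against ERP records when the sequence is disputed.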


1.7 Perform the root cause analysis using a logic tree

After the sequence of events is known, the RCA team will start analyzing the failure using a logic tree analysis. A
logic tree depicts the start of the failure event, possible failure causes, and hypotheses related to the event itself. In
other words, a logic tree provides a way for the RCA team to organize and record discussion points on the possible
causes of failure events.

DESCRIBE THE EVENT
The failure event is placed on top (the safety, environmental, production impact), followed by all failure modes and
possible causes of breakdowns.

DESCRIBE FAILURE MODES
Each of the causes is a hypothesis that needs to be verified so that the causes leading to the problem can be
identified.

DETERMINE ROOT CAUSE
The next step consists of determining and verifying the physical roots, human roots and system roots behind the
failure.

Fig 4: Sample logic tree

After all the failure modes are identified, the investigators will begin the root cause identification. The logic tree gives
structure to the reasoning process by helping the investigators answer questions about why particular factors existed
or occurred. The identification of root causes helps the investigator to determine the reasons the event occurred so
the problems surrounding the occurrence can be addressed.
For example (continued):
The logic tree below depicts possible causes of overheating in a dump truck.


Fig 5: Sample logic tree for overheating problem in Dump Truck
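
A logic tree can be represented as nested nodes, each hypothesis marked as verified, rejected, or not yet tested. The sketch below is a simplified illustration, not the method prescribed by this document: it treats the deepest verified hypothesis on each branch as a root cause candidate. The `Node` class, the `verified_roots` function, and the sample hypotheses are hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    kind: str = "hypothesis"         # "event", "failure_mode", or "hypothesis"
    verified: Optional[bool] = None  # None means not yet tested
    root_type: Optional[str] = None  # "physical", "human", or "system"
    children: List["Node"] = field(default_factory=list)


def verified_roots(node: Node) -> List[Node]:
    """Collect verified hypotheses that have no verified hypothesis beneath them."""
    if node.verified is False:
        return []  # rejected hypothesis: do not descend further
    found: List[Node] = []
    for child in node.children:
        found.extend(verified_roots(child))
    if not found and node.kind == "hypothesis" and node.verified:
        found.append(node)
    return found


# Invented sample tree for illustration only.
tree = Node("Dump truck breakdown", kind="event", children=[
    Node("Engine overheating", kind="failure_mode", children=[
        Node("Low coolant level", verified=False),
        Node("Radiator problem", verified=True, root_type="physical", children=[
            Node("Radiator cleaning missing from maintenance plan",
                 verified=True, root_type="system"),
            Node("Operator ignored temperature alarm"),  # not yet tested
        ]),
    ]),
])
print([n.label for n in verified_roots(tree)])
```

Untested hypotheses are never reported, which mirrors the requirement that each cause be verified before it is treated as a root cause.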

1.8 Determine recommendations

RCA recommendation records can be linked to any equipment record and any functional location record. These may
or may not be the same equipment or functional location that was subjected to the RCA.
A recommendation is a suggested solution for preventing or lessening the impact of future failures for the piece of
equipment or location that the RCA team is analyzing, such as making changes to maintenance schedules or
providing additional training to operators. The RCA team should develop these solutions based on the evidence
that was collected and the conclusions that were drawn.
After a hypothesis is proven true and a root cause is identified, the RCA team discusses the possible solutions in
order to evaluate and consolidate the recommendations that will prevent the failure from recurring.
After the recommendations are recorded, they are consolidated and sent for review and approval.
For example (continued):
An RCA team might identify the root causes of frequent overheating of a dump truck as given in the figure below.


Fig 6: Sample Root Cause Findings for frequent overheating of dump truck

The RCA team might recommend the following actions to eliminate the root causes.

Fig 7: Sample RCA recommendations for frequent overheating of dump truck


1.9 Review RCA report & recommendation(s)

Once the RCA is carried out, an RCA report is submitted to the maintenance manager or maintenance
superintendent, who reviews the recommendations and evaluates the overall effectiveness of the RCA. The following
questions can be used for reviewing the RCA recommendations:
• Is the root cause clearly defined?
• Does the root cause make sense?
• Do actions address the root cause?
• Are the actions sustainable?
• Are recommendations feasible?
• When should the actions be done?
• Who will be the responsible person?
The reviewer may then approve the RCA recommendations for implementation in ERP, or send the report back for
revision in the case of any inadequacy or missing information.

1.10 Implement recommendation(s) in ERP

Implementing RCA recommendations through the ERP is considered best practice for effectively tracking them and
controlling the recurrence of failure. Once the recommendations are approved, they are to be entered in the ERP as
notifications for the respective equipment or functional locations, along with responsible persons and target dates.
The RCA is officially closed after the RCA recommendations are entered in ERP for implementation.
Further, RCA recommendations can be used for developing equipment maintenance strategies.


2.0 GOVERNANCE

2.1 RACI Matrix

A responsibility assignment matrix, also known as RACI matrix, describes the participation by various roles in
completing tasks or deliverables for a project or business process.
Responsible (R): The person who does the work to achieve the task. They have responsibility for getting the work
done or decision made.
Accountable (A): The person who is accountable for the correct and thorough completion of the task. This is the role
to which the Responsible person answers and which approves their work.
Consulted (C): The people who provide information for the project and with whom there is two-way communication.
This is usually several people, often subject matter experts.
Informed (I): The people kept informed of progress and with whom there is one-way communication. These are
people that are affected by the outcome of the tasks, so need to be kept up-to-date.

2.1.1 RACI Matrix to conduct RCA

| S. No. | Process | Reliability Manager | Reliability team members | RCA Cross Functional Team Members | Operation & Maintenance Superintendent/Manager |
|---|---|---|---|---|---|
| a. | Supervise RCA process | A | R | C | I |
| b. | Train the team to perform RCA | A | R | - | C, I |
| c. | Document RCA outcomes | I | A | R | I |
| d. | Approve RCA recommendations | I | A | C | R |
| e. | Raise RCA recommendations in SAP | I | A | R | I |

2.1.2 RACI Matrix to review RCA process

| S. No. | Process | Reliability Manager | Reliability team members | RCA Cross Functional Team Members | Operation & Maintenance Superintendent/Manager |
|---|---|---|---|---|---|
| a. | Monitor change in corporate guidelines for RCA | A | R | - | C, I |
| b. | Monitor RCA effectiveness | A | R | - | C, I |


2.2 ESCALATION MATRIX

Escalation Matrix for Root Cause Analysis


Process Responsibility Level 1 Level 2
Delay in collection of data RCA Team Operation & Maintenance
Reliability Manager
related to RCA Leader Superintendent/Manager
RCA Team Operation & Maintenance
Reliability Manager
Delay in completion of RCA Leader Superintendent/Manager
Unavailability of member(s) in RCA Team Operation & Maintenance
Reliability Manager
RCA meeting Leader Superintendent/Manager
Reliability Team Operation & Maintenance
Reliability Manager
Requirement of special training Members Superintendent/Manager
Operation and
Operation & Maintenance
Execution of RCA Maintenance Reliability Manager
Superintendent/Manager
recommendation members

3.0 TOOLS & TEMPLATES


Root cause analysis is an exhaustive process to evaluate the underlying key factors of equipment failure and to
identify mitigation steps in order to increase the equipment availability. During root cause analysis, multiple
stakeholders are involved, failure causes are identified, and recommendations are suggested and recorded.
To ensure a consistent approach for conducting RCA across the facilities as per the guidelines defined in this
document, it is important to have a standard RCA report.

A standard RCA report template is to be used for conducting root cause analysis.
RCA report: ZBM-Reliability-RCA Report

A standard root cause analysis A3 template is to be used for reporting RCA outcomes on one A3 page.
RCA A3 Template: ZBM-Reliability-RCA-A3 Template

A standard RCA recommendation tracking template is to be used to track RCA status and RCA recommendation
status.
RCA & Recommendation Tracking template: ZBM-Reliability-RCA & Recommendation tracking


4.0 KPIS
A Key Performance Indicator (KPI) measures how well a company meets its operational and strategic goals. KPIs
are performance measurement tools that play a key role in assessing system efficiency from the standpoint of cost or
quality.
The following KPIs should be monitored for measuring RCA coverage and effectiveness:
KPI: Overdue RCA status
Type: Leading
UOM: %
Description: Effective asset management is one of the major tasks of the maintenance and reliability department. RCA helps in identifying the major cause of failure and the mitigation procedure. Hence, measurement of RCA coverage is very important.
Parameters: A = No. of RCAs in progress that have crossed their due date; B = Total no. of RCAs in progress
Formula: A/B
Data source: For A: Reliability function’s RCA completion/status records. For B: Reliability function’s RCA completion/status records.

KPI: Overdue RCA recommendation status
Type: Leading
UOM: %
Description: RCA is a team-based task and requires support from both the reliability and execution teams. RCA helps to identify the major cause of failure and its mitigation steps. Hence, execution of RCA recommendations on time is very important.
Parameters: A = No. of RCA recommendations that have crossed their due date; B = Total no. of open RCA recommendations
Formula: A/B
Data source: For A: Reliability function’s RCA completion/status records. For B: SAP (T-code IW28).

KPI: RCA completion status
Type: Leading
UOM: %
Description: A measure of the completion status of RCA recommendations to avert re-occurrence of failure, thereby ensuring risk is minimized.
Parameters: A = No. of RCA recommendations that have been closed; B = Total no. of RCA recommendations
Formula: A/B
Data source: For A: Reliability function’s RCA completion/status records. For B: SAP (T-code IW28).

KPI: Equipment Availability
Type: Lagging
UOM: %
Description: A measure of equipment availability based on the proportion of planned/scheduled and unplanned/emergency downtime as compared to total operating hours available (for production, manufacturing, service, etc.), typically a 36-month rolling average.
Parameters: A = (Total Operating Hours - Unscheduled Downtime Hours - Scheduled Downtime Hours); B = Total operating hours (to be calculated for equipment on which RCA has been conducted)
Formula: A/B
Data source: For A: 1. SAP (T-code IW28); 2. Reliability function’s RCA completion/status records. For B: Reliability function’s RCA completion/status records.
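
The four KPI formulae can be computed directly from the A and B parameters. The following is a minimal Python sketch with hypothetical function names. It assumes that the A/B ratio is multiplied by 100 to match the "%" unit of measure, and that a KPI is reported as 0.0 when B is zero. Neither assumption is stated in the table.

```python
def ratio_pct(a: float, b: float) -> float:
    """A/B expressed in percent; returns 0.0 when B is zero (an assumption)."""
    return 100.0 * a / b if b else 0.0


def overdue_rca_status(overdue_in_progress: int, total_in_progress: int) -> float:
    """A = RCAs in progress past their due date, B = all RCAs in progress."""
    return ratio_pct(overdue_in_progress, total_in_progress)


def overdue_recommendation_status(overdue: int, total_open: int) -> float:
    """A = recommendations past their due date, B = all open recommendations."""
    return ratio_pct(overdue, total_open)


def rca_completion_status(closed: int, total: int) -> float:
    """A = closed recommendations, B = all recommendations."""
    return ratio_pct(closed, total)


def equipment_availability(total_operating_hours: float,
                           unscheduled_downtime_hours: float,
                           scheduled_downtime_hours: float) -> float:
    """A = total operating hours minus both downtime types, B = total operating hours."""
    available = (total_operating_hours
                 - unscheduled_downtime_hours
                 - scheduled_downtime_hours)
    return ratio_pct(available, total_operating_hours)


# Invented sample figures for illustration only.
print(overdue_rca_status(3, 12))                       # 25.0
print(round(equipment_availability(720, 36, 24), 1))   # 91.7
```

In the second call, 720 operating hours with 36 unscheduled and 24 scheduled downtime hours leave 660 available hours, and 660/720 is about 91.7%.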
