WHY TREE AND ROOT CAUSE ANALYSIS
INCIDENT INVESTIGATION
WHY INVESTIGATE INCIDENTS
Determine Facts, Avoid Blame
Find Root Causes
Prevent Recurrence
Share Lessons
Meet Legal / Company requirements
Which Incidents?
An incident could be:
Injury
Fatality
Fire
Major loss
Business interruption
Release
Permit violations
Equipment breakdown
Legal liability
Media attention
Near miss
How to conduct investigations
Incident occurs
Collect facts
Form investigation team
Develop timeline
Identify protective systems
Determine root causes
Develop recommendations
Document investigation
Share lessons learned
When to start investigation
As soon as possible, within 48 hours
When it is safe to do so
Gather data as soon as possible
Interviews and written statements
Weather, job and process status
Written data: permits, JSAs, procedures, control data, log sheets
Physical data: parts, equipment, photos, videotapes, sketches
Interviews
Conduct interviews as soon as possible
Use a comfortable setting; try walking through the plant
Put the interviewee at ease before questioning
Explain that your purpose is fact finding, NOT fault finding
Ask open-ended, non-leading questions and listen
Avoid speculation or implying blame in your questions
Use the timeline and keep notes
Analyze what is said and obtain agreement
Close the interview and thank the interviewee
Incident investigation team
Facilitator
Specialists
Supervisor
Employee/contractor involved
Contractors
Process operatives
Develop timeline
Use to organise facts
- put all known facts on the timeline
- start timeline as far back as needed to identify all potential causes
- include all responses to the incident
Helps to prevent jumping to conclusions
Non-intimidating technique, helps focus on facts
Example timeline
date/time      activities
1/4 9:34am     received low flow alarm
1/4 9:35am     notified operator
1/4 9:37am     operator observed major leak
1/4 9:37am     operator activated ESD
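Facts rarely arrive in the order they happened, so it can help to record each one with its time and sort afterwards. The sketch below is only an illustration using the example timeline; the names `TimelineEntry` and `build_timeline` are hypothetical, the year is a placeholder because the example gives none, and "1/4" is read as day/month (1 April) to match the day/month dates used in the exercises.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEntry:
    when: datetime
    activity: str


def build_timeline(entries):
    """Return the entries in time order; equal times keep their input order."""
    return sorted(entries, key=lambda e: e.when)


YEAR = 2000  # assumption: the example timeline does not state a year

# Facts are listed in the order they were collected, not the order they happened.
facts = [
    TimelineEntry(datetime(YEAR, 4, 1, 9, 37), "operator observed major leak"),
    TimelineEntry(datetime(YEAR, 4, 1, 9, 34), "received low flow alarm"),
    TimelineEntry(datetime(YEAR, 4, 1, 9, 37), "operator activated ESD"),
    TimelineEntry(datetime(YEAR, 4, 1, 9, 35), "notified operator"),
]

timeline = build_timeline(facts)
for entry in timeline:
    print(entry.when.strftime("%d/%m %H:%M"), entry.activity)
```

Because the sort is stable, two facts logged at the same minute stay in the order the team entered them, so that order should reflect what witnesses reported.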
Identify protective systems
Definition: any management system or hardware system which reduces the potential for having the incident or the consequences of the incident
Identify protective systems (continued)
Shutdown/alarms
Inert gas purging
Fire suppression
Hazard detection
Procedures
Training
Preventative maintenance
Permit to work
Management of change
PPE
Emergency block valves
Root Causes
Root cause: the most basic cause(s) that can reasonably be identified and that we have the control to fix
Type of Causes
PHYSICAL - equipment or device changes or fails
HUMAN - human action or lack of action
SYSTEM - processes failed to support desirable human action
The Why Tree
Has many different forms, namely:
block diagrams
logic diagrams
spreadsheets
Why Tree Construction
Use timeline to determine primary event at the top of the tree
Identify the actions or conditions (and failed protections) which caused the primary event
Why Tree Construction (continued)
Brainstorm all of the PHYSICAL causes (or causal factors) which reasonably could have caused the initial actions or conditions
Systematically rule out possible physical causes
Why Tree Construction (continued)
Identify the HUMAN causes for each possible physical cause
Identify the SYSTEM causes for each human cause:
- why did the inappropriate action occur?
- what protective system failed to work to allow the action to occur?
Verify Each Causal factor
Visual
Test / data
Expert theory
Conventional wisdom
When to Stop Asking Why
When you reach a normal condition or
when a system cause which we can fix has been identified
Recommendations
Agree on how to eliminate the immediate hazard
Develop actionable recommendations for each root cause
Prioritize recommendations based on the potential for eliminating the incident in the future
Recommendations (continued)
Assign responsibility for each recommendation
Implement and track status to completion
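Tracking each recommendation against its root cause, owner and status can be as simple as a small list. The sketch below is an illustration only: the `Recommendation` fields, the 1-is-highest priority scale, the actions and the owner names are all hypothetical, and the root causes are borrowed from the Exercise 1 why tree.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    root_cause: str
    action: str          # hypothetical action text, for illustration only
    owner: str           # person responsible for the action
    priority: int        # assumed scale: 1 = highest potential to prevent recurrence
    done: bool = False


def open_items(recs):
    """Return recommendations not yet completed, highest priority first."""
    return sorted((r for r in recs if not r.done), key=lambda r: r.priority)


recs = [
    Recommendation("Design standards not understood",
                   "Brief the team on piping design standards", "Owner A", 2),
    Recommendation("MOC process not in place",
                   "Put an MOC procedure in place for field changes", "Owner B", 1),
    Recommendation("MOC was not followed",
                   "Review the field change against the MOC procedure", "Owner C", 3,
                   done=True),
]

for rec in open_items(recs):
    print(rec.priority, rec.owner, rec.action)
```

Every root cause has its own entry, so a cause with no recommendation, or a recommendation with no owner, is easy to spot when the list is reviewed.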
Protective System Review
Confirm that why tree includes protective systems identified
Document
Team membership
Investigation results
- summary of incident
- timeline
- root causes (including how they were determined)
Follow-up recommendations
Communicate lessons learned
Recommended Categories for Why Tree
Management commitment
Hazard analysis & risk based decision making
Procedures and safe work practices
Communications
Designs & reviews
Pre-startup safety review
Inspection / quality control
Training
Recommended Categories for Why Tree (continued)
Preventative maintenance & repeat failure
Human factors
Emergency response
Incident and near miss investigations
Contractor safety
Management of change
Audits
How to build an Events & Causal Factors Chart
1
Incident
Decide what is being investigated
Establish a sequence of "events"
What are causal factors & conditions for each event?
Very simple Events & Causal Factors chart example
1 - 4: causal factors/conditions (reasons for an event or amplifying information)
EVENTS: progress from left to right; actions that describe what happened during the incident
Reason for investigation
Events:
PERSON WALKS HOME -> STEPS IN A HOLE -> SPRAINS ANKLE
Causal factors/conditions:
1 LEAVES LATE AFTER DARK
2 DECIDES TO TAKE A NEW ROUTE (SHORTCUT)
3 NO BARRICADE OR MARKINGS FOR HOLE
4 STREETLIGHT BROKEN
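The same chart can be held as an ordered list of events, each carrying its causal factors. The sketch below is illustrative only: the function name `render` is hypothetical, and the assignment of factors 1 to 4 to particular events is an assumption, since the flattened chart does not show which box each factor connects to.

```python
# Events in left-to-right order, each with the causal factors attached to it.
# Assumption: factors 1 and 2 explain the walk home, 3 and 4 explain the hole.
chart = [
    ("PERSON WALKS HOME", ["1 LEAVES LATE AFTER DARK",
                           "2 DECIDES TO TAKE A NEW ROUTE (SHORTCUT)"]),
    ("STEPS IN A HOLE", ["3 NO BARRICADE OR MARKINGS FOR HOLE",
                         "4 STREETLIGHT BROKEN"]),
    ("SPRAINS ANKLE", []),
]


def render(chart):
    """Return the chart as text: the event sequence, then one line per factor."""
    lines = [" -> ".join(event for event, _ in chart)]
    for event, factors in chart:
        for factor in factors:
            lines.append(f"  {event}: {factor}")
    return "\n".join(lines)


text = render(chart)
print(text)
```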
Exercise 1 Lube oil spill
A spill of 18 gallons of lube oil resulted from an equipment failure, specifically the failure of a 3/4" pipe nipple. The failure occurred due to the excessive weight and size of the gauge installed on the nipple, combined with vibration when fluid flows through the line. Excessive stress was placed on the nipple, resulting in metal fatigue. The gauge was not part of the original design detail, and Management of Change was not used when the gauge was installed.
Lube oil spill
Lube oil spill occurs
  3/4" nipple failed
    Metal fatigue
      AND
        Excessive weight and inadequate nipple
          Piping detail was not designed or installed properly
            Design standards not understood
            Installed as a field change without proper review
              MOC was not followed
              MOC process not in place
        Vibration
          Normal condition
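A why tree can also be kept as a simple nested structure, which makes the stop rule easy to apply: keep asking why until a branch ends in a normal condition or in a system cause that can be fixed. The sketch below is one possible reading of the Exercise 1 tree, not a definitive model; the class name `Cause`, the function `root_causes`, the nesting of the branches and the PHYSICAL/HUMAN/SYSTEM label given to each box are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    text: str
    kind: str  # "physical", "human", "system" or "normal" (labels assumed here)
    children: list = field(default_factory=list)


def root_causes(node):
    """Collect the system causes found at the ends of the branches."""
    if not node.children:
        # A branch ends at a fixable system cause or at a normal condition.
        return [node.text] if node.kind == "system" else []
    found = []
    for child in node.children:
        found.extend(root_causes(child))
    return found


spill = Cause("Lube oil spill occurs", "physical", [
    Cause('3/4" nipple failed', "physical", [
        Cause("Metal fatigue", "physical", [
            Cause("Excessive weight and inadequate nipple", "physical", [
                Cause("Piping detail was not designed or installed properly", "human", [
                    Cause("Design standards not understood", "system"),
                    Cause("Installed as a field change without proper review", "human", [
                        Cause("MOC was not followed", "system"),
                        Cause("MOC process not in place", "system"),
                    ]),
                ]),
            ]),
            Cause("Vibration", "physical", [
                Cause("Normal condition", "normal"),
            ]),
        ]),
    ]),
])

for cause in root_causes(spill):
    print(cause)
```

The vibration branch contributes nothing to the result because it ends in a normal condition, which is one of the two stopping points.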
Exercise 2 lube oil fire
On 19/5, a new Waukesha engine was installed for the main generator as a high-priority job. The job was inspected and turned over to operations on 20/5. At 2:20 PM, operations started the new engine. At 2:25 PM, operations saw lube oil spraying out of a Dresser coupling on the oil line. The oil contacted the bare exhaust piping and ignited.
Exercise 2 lube oil fire (continued)
The fire was immediately extinguished with a handheld extinguisher kept in the area. The investigation indicated that the Dresser coupling had been on the original equipment. The new engine had a piping arrangement that was slightly shorter than that of the original engine. Consequently, a proper seal was not obtained when the piping was connected. This was not noticed by the mechanics, who were being pushed to complete the job.
Lube oil fire
Lube oil fire
AND
Lube oil leaked from Dresser coupling
Oil contacted hot exhaust piping
AND
Old coupling did not fit new engine and proper seal not obtained
New engine slightly different, not a "change in kind"
Normal condition
Mechanics did not notice
No testing done prior to turnover to operations
Management of change process did not discover problem, or no MOC process
OR
Production given greater emphasis than proper installation and testing
Mechanics not required to complete testing prior to turnover