CRC FOCUS SERIES
Reliability, Maintenance, and Safety Engineering: A Practical Field View on Getting Work Done Effectively

PRACTICAL ROOT CAUSE FAILURE ANALYSIS
Key Elements, Case Studies, and Common Equipment Failures

Randy Riddell

CRC Press

Practical Root Cause Failure Analysis

Root Cause Failure Analysis (RCFA) is a method used by maintenance and reliability industry professionals as one of the key tools to drive improvement. This book offers a quick guide to the applications involved in performing a successful RCFA by providing a foundational view of maintenance and reliability strategies. It also highlights the practical applications of RCFA and identifies how to achieve a successful RCFA, as well as discussing common equipment failures and how to solve them. Case studies on topics including pump system failure analysis and vibration analysis are included.

This book also:

• Suggests examples on how to solve common failures on many types of equipment, including fatigue, pumps, bearings and mechanical power transmission
• Highlights practical applications of RCFA
• Identifies key elements for how to achieve a successful RCFA
• Presents case studies on topics including pump system failure analysis and vibration analysis

This book is a must-read for any reliability engineer, particularly mechanical reliability professionals.

Reliability, Maintenance, and Safety Engineering: A Practical Field View on Getting Work Done Effectively

Series Editor: Robert J. Latino, Reliability Center, Inc., VA

This series will focus on the “been there, done that” concept in order to provide readers with experiences and related trade-off decisions that those in the field have to make daily, between production processes and costs no matter what the policy or procedure states. The books in this new series will offer tips and tricks from the field to help others navigate their work in the areas of Reliability, Maintenance and Safety.
Building on the concept of ‘Work as Imagined’ and ‘Work as Done’, as coined by Dr. Erik Hollnagel (an author of ours), this series will bridge the gap between the two perspectives. It will focus on books written by authors who work on the frontlines and describe the trade-off decisions that those in the field have to make daily, between production pressures and costs, no matter what the policy or procedure states. The topics covered will include Root Cause Analysis, Reliability, Maintenance, Safety, Digital Transformation, Asset Management, Asset Performance Management, Predictive Analytics, Artificial Intelligence, the Industrial Internet of Things, and Machine Learning.

Lubrication Degradation: Getting into the Root Causes
Sanya Mathura and Robert J. Latino

Practical Root Cause Failure Analysis: Key Elements, Case Studies, and Common Equipment Failures
Randy Riddell

For more information on this series, please visit: https://www.routledge.com/Reliability-Maintenance-and-Safety-Engineering-A-Practical-Field-View-on-Getting-Work-Done-Effectively/book-series/CRCRMSEGWDE

First edition published 2022
by CRC Press, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press, 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2022 Randy Riddell

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained.
If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact
[email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-16465-6 (hbk)
ISBN: 978-1-032-16466-3 (pbk)
ISBN: 978-1-003-24867-5 (ebk)

DOI: 10.1201/9781003248675

Typeset in Times LT Std by codeMantra

Contents

Foreword
Preface
Author Biography

1 Introduction to Successful Root Cause Failure Analysis (RCFA)
  1.1 Troubleshooting vs Failure Analysis
  1.2 Defining Failure - 4 Levels of Failure
  1.3 Reliability & RCFA
  1.4 Predicting Failure
  1.5 Maintenance & RCFA

2 5 Rights of Successful RCFA
  2.1 The Right Systems
    2.1.1 RCFA Process Infrastructure
    2.1.2 Type of RCFA Process
      2.1.2.1 Kepner Tregoe
      2.1.2.2 The 5 Whys
      2.1.2.3 Failure Mode Based
  2.2 Right Resources
    2.2.1 People
    2.2.2 Budget
    2.2.3 Tools
  2.3 Right Evidence
    2.3.1 Event Investigation
    2.3.2 Failure Scene
    2.3.3 Failure Type
  2.4 Right Analysis
    2.4.1 Failure Mode
    2.4.2 Possible Causes
    2.4.3 Physics of Failure
    2.4.4 Failure Patterns
    2.4.5 Ending Analysis
  2.5 Right Corrective Actions
    2.5.1 Type of Action Items
    2.5.2 Execution of Action Items
    2.5.3 Closing the Loop

3 Key Factors in Fatigue Failure
  3.1 Fundamentals of Fatigue Failure
  3.2 Case Study 1 - Chain Fatigue Failure
  3.3 Case Study 2 - Conveyor Shaft Failure

4 Key Factors in Bearing Failure
  4.1 Fundamentals of Bearing Failure
  4.2 Case Study 1 - Combustion Fan Bearing Failure
  4.3 Case Study 2 - Suction Roll Bearing Fracture
  4.4 Case Study 3 - Pulper Motor Bearing Failure
  4.5 Case Study 4 - Vacuum Pump Bearing Failure
  4.6 Case Study 5 - Suction Press Roll Internal Bearing Failure
  4.7 Case Study 6 - Exhaust Fan Bearing Failure

5 Key Factors in Pump System Failure
  5.1 Fundamentals of Pump Failure
  5.2 Case Study 1 - White Water Pump Failure
  5.3 Case Study 2 - Stock Dilution Pump Failure
  5.4 Case Study 3 - Loop 1 Dilution Pump Failure
  5.5 Case Study 4 - Rejects Pump System Failure
  5.6 Case Study 5 - Screen Feed Pump Failure

6 Key Factors in Mechanical Power Transmission Failure
  6.1 Fundamentals of Mechanical Power Transmission Failure
  6.2 Case Study 1 - Agitator Belt Drive Failure
  6.3 Case Study 2 - Screen Belt Drive Sheave Failure
  6.4 Case Study 3 - Paper Roll Gearbox Failure
  6.5 Case Study 4 - Coating Roll Gearbox Failure
  6.6 Case Study 5 - Press Roll Gearbox Failure

7 Key Factors in Bolted Joint Failure
  7.1 Fundamentals of Bolt Failure
  7.2 Case Study - Dryer Journal Bolt Failures

Index

Foreword

There are many books written about Root Cause Failure Analysis (RCFA), so what makes this one different? As its title conveys (as well as the book series), the difference is PRACTICALITY! This is not a book written by a researcher or an academic; it is written by a seasoned practitioner, Randy Riddell.
This means it is based on field experience, where an operational plant in a real-life working environment forms Randy’s “lab.” For those of us fellow RCFA practitioners, it is refreshing to read a perspective that doesn’t sugarcoat these realities but faces them head on and tells us how they have been overcome.

We all are acutely aware that we do not work in perfect settings and not everything we simulate in a lab can be reproduced in real life. For instance, a bearing may run forever in a lab setting, where operating conditions are always ideal for the design of that bearing, and the perfect amount of the correct lubricant is maintained. However, we know the reality is those bearings are going to be exposed to imperfect operating conditions, as well as to imperfect people and systems that will purchase, store, commission, operate and maintain them. This exposure to true reality will dramatically impact the bearing life. These are all variables which didn't exist in the lab. How long that bearing will really last is up to us!

Practical RCFA is about using evidence-based, analytical approaches to uncover these realities and implement effective corrective actions that will strengthen the supporting organizational systems to prevent recurrence. But the goal of RCFA itself should not just be to prevent recurrence, but to organically create and share knowledge across an organization. When this does NOT happen, think about the amount of RCFA re-work (and cost) that occurs simply because we did not know that a problem we have has already been solved.

To this end, Randy starts off by laying the firm groundwork for what a holistic RCFA approach looks like. This focuses on the broader systems view, like:

1. The four levels of failure and the appropriate depth of analyses
2. Proactive versus reactive analyses
3. Exploration of physical, human, and latent root causes
4. Where RCFA fits in an effective Reliability system
5. The 5 ‘Rights’ of a successful RCFA system
6. Using RCFA to create a knowledge management system (what I call “institutionalizing corporate memory”)

It’s one thing to understand principles, but it’s another to successfully apply them. From this point on, Randy practices what he preaches, and demonstrates the application of these principles (the Physics of Failure) across many types of failure, such as key factors in:

1. Fatigue Failure
2. Bearing Failure
3. Pump System Failure
4. Mechanical Powertrain System Failure and
5. Bolted Joint Failure

It becomes evident the author is not only an effective RCFA facilitator in the field, but also quite an adept Subject Matter Expert (SME) in the technical aspects of the failures he uses for his extensive case studies. This is a unique and valuable combination.

Those who are practitioners will find this to be an invaluable reference as they develop their RCFA facilitation skills. Those who support RCFA efforts will truly understand the effort it takes to conduct a proper RCFA, thus giving them a new appreciation for how best to support the analysts. Systems integration professionals will gain a unique perspective of how they can integrate the full logic from an RCFA into their knowledge management systems, making it easier to share lessons learned across the organization (and grow RCFA knowledge databases).

There is much to be gained from properly digesting this information, converting it to knowledge and eventually converting it to wisdom. But in the end, it is up to us as to how far along the spectrum we will go. I think this saying is appropriate here, as I view this as a paradigm that all of us in the RCFA business are united in our efforts to defeat:

We NEVER seem to have the time and budget to do things right, but we ALWAYS seem to have the time and budget to do them again!

Thanks for this significant contribution to the RCFA field, Randy!

Robert J.
Latino, Principal, Prelical Solutions, LLC

Preface

The reliability field is one of the most exciting fields to work in and can bring great satisfaction in solving some of the most difficult and chronic failures. If you work in maintenance, reliability or engineering for almost any industry, root cause failure analysis is a key activity to improve your operation. Modern day industry has molded our manufacturing organizations with roles and responsibilities which embrace and drive RCFA activities. As many have stated before, reliability and especially RCFA are the most powerful tools for uncovering the hidden plant within our current operations.

Over the decades, organizations and companies have committed lots of resources to install RCFA systems and to transform their culture into a root cause finding culture. We have purchased training programs around RCFA, built massive databases, collected failure data, enhanced our computerized maintenance management system for failure analysis, added positions in our organizations to lead and manage RCFA systems and processes, and the list goes on and on. All of these are great and necessary things to move our efforts to high levels of performance; however, it takes even more than all that to produce excellence in RCFA in our plants. Despite all our efforts and the popularity of RCFA, the results have not been a universal success.

For any mature reliability organization, RCFA is a core function. Additionally, anyone who invests in RCFA wants a successful RCFA, so why doesn’t everyone achieve it? What does a good RCFA look like? In simple terms, a good RCFA finds the root causes and makes the necessary corrections to prevent future failures or to insulate the operation from them. More importantly, what are the key elements to achieve a successful RCFA? Guessing or being lucky will work sometimes but there is no need to depend on luck or chance for being successful.
There are key elements that will greatly increase the chance for successful RCFA. The more key elements that are employed, the higher the chance that the desired outcome will result. This book will look at what successful root cause failure analysis looks like.

RCFA is a process, but mastering a process will not guarantee a successful RCFA. In three decades of using many different methods for RCFA, I've not seen a silver bullet in any of them. However, there are some good elements in each of them that can be key in helping to solve failures. This book does not attempt to teach a particular method or process but will utilize some elements of several methods to combine an overall RCFA effort to achieve RCFA success.

While the advancement of the IIoT and other condition based monitoring technology has simplified some of the human interaction with our industrial equipment, the need for human-driven failure analysis is still in high demand. Technology can determine the equipment conditions, give some fault indications, and provide good RCFA information but it can't determine root cause. Executing successful RCFA is still a human function so we must be equipped to execute it successfully.

Author Biography

A native of Kossuth, MS, Mr. Randy Riddell attended Mississippi State University where he received a Bachelor of Science in Mechanical Engineering. He has over 32 years of industrial experience with a career focused on maintenance and reliability in the paper industry.
His formal certifications include Certified Maintenance and Reliability Professional from the Society of Maintenance and Reliability Professionals, Machinery Lubrication Technician from the International Council of Machinery Lubrication, Certified Lubrication Specialist from the Society of Tribology and Lubrication Engineers, and Level 1 Vibration and Pump System Assessment Professional (PSAP) from the Hydraulic Institute. He has published many articles around industrial equipment reliability topics. He has also been a guest speaker at several industry and university engineering settings.

1 Introduction to Successful Root Cause Failure Analysis (RCFA)

Root Cause Failure Analysis (RCFA) is a reliability process used to determine the causes of failure of a piece of equipment, component, system or even process. The goal of RCFA is to find these root causes so that some action can be taken to eliminate reoccurrence or, more feasibly, to reduce the probability of reoccurrence. This goal is the focus of this book: to evaluate the key elements of successful RCFA, not only by looking at general concepts but also at specific examples of real-world industrial failures.

I suspect that learning from failure has been something that humanity has studied since the fall in the Garden. The methodologies have certainly changed over time but the search for making things better has always been out in front of humanity.

I am reminded of a story about a young man who was being mentored by a seasoned businessman. The young man asked what was the key to his success?
The businessman said, “Two words, good decisions.” The young man continued to probe and asked, “Well, how do I do that?” He answered, “One word, experience.” The young man thought, I can’t stop here, so he asked, “Well, how do I get that?” The businessman answered again, “Two words: bad decisions.” RCFA turns into success when we learn from not only our bad decisions but the bad decisions of everyone else as well.

It probably needs to be said at this point that RCFA has several obstacles that keep us from wanting to learn from failure. For many the focus is on troubleshooting and not RCFA (more on that later). For some, firefighting is fun, and firefighting is rewarded by many organizations more than any proactive tasks. Probably the biggest hurdle in learning from failure is to refrain from blaming others. RCFA should not involve blaming individuals or groups. Solutions will involve these two areas. While error is part of the human factor, it is common among all of us. It has been said, “to err is human and to blame someone else, well that shows management potential.”

DOI: 10.1201/9781003248675-1

I'm not certain as to when RCFA first started being used as a focused activity but I would guess that it became more mainstream following on the heels of the industrial revolution, like so many other engineering and technological advances. One thing is certain: it began out of necessity and after failure.

One such early failure occurred with the steam-powered riverboat, the Sultana. On April 27, 1865 on the Mississippi River near Memphis, TN, three of four boilers exploded. The Sultana burned to the water line in 15 minutes. Over 1500 people died in the accident. The Civil War had just ended, and the nation was at the start of reconstruction and healing, so any form of RCFA was not completed and the root cause was never determined.
In 1867-1868, there were 441 recorded boiler explosions. In 1880, there were 159 boiler explosions. Out of the need to prevent these repeating failures, a group of engineers met in 1880 and founded the American Society of Mechanical Engineers (ASME). ASME codes and standards began being developed for pressure vessels. An obvious lesson here is that unsolved failures will become chronic. Once failures become chronic then they often become more impactful to the business or society and a higher priority to solve.

One of the early lessons of the damaging effects of resonance came after a very well-known catastrophic structural failure of the Tacoma Narrows Bridge on November 7, 1940. The structural stiffness of the bridge was such that, as the wind blew across it, the bridge was sent into severe resonance until it collapsed. Much was learned about resonance from that failure. Resonance is not a source of vibration but a magnifier of it. Resonance is also very destructive and must be avoided in our machines and structures to avoid failure.

1.1 TROUBLESHOOTING VS FAILURE ANALYSIS

Is there a difference between troubleshooting and RCFA? We see troubleshooting tables or charts all the time from equipment suppliers. We also see Troubleshoot, Cause, and Correct charts (TCC) but is that really RCFA? The American Heritage Dictionary says a troubleshooter is a worker whose job is to locate and eliminate sources of trouble. Well, that doesn’t help much. While it has been debated, and I'm not sure there is an exact correct answer, here are some thoughts to consider. I believe there are two differences between troubleshooting and RCFA.

1. The end goal of the activity: Troubleshooting has an end goal of correcting the system or process so the line can resume production. The end goal of RCFA is to find the root causes of component failure and then to complete some action items to either eliminate or prevent future failure potential. Troubleshooting focuses on the cause of unscheduled machine downtime while RCFA focuses on reducing the probability of future failure events of the same failure mode. Once the machine is back up and running the troubleshooting process is over but the RCFA process on the failed component begins. With the fact that the machine is down, troubleshooting becomes a necessary activity which the organization does not debate or put off; it must get it done immediately. However, RCFA does not have the same urgency by default as troubleshooting, so only mature organizations will be successful in RCFA activities.

2. Analysis taken to the component level: RCFA eventually focuses in on failure at a component level while troubleshooting will only go deep enough to correct the machine malfunction. Troubleshooting is more of a reactive process, while RCFA is a proactive process for reliability improvement.

Here is an example of the two reliability tasks applied to a plant situation. The press section of a paper machine has a roll that will not load. The machine is down due to the issue so there is an urgency to commit resources to correct it and get the machine back up. The goal is to get the machine back on production. Troubleshooting the problem is the focus of the team. Troubleshooting leads them to several directional valves in the hydraulic system with no solenoid power. A blown fuse is found and replaced. After changing the fuse and powering up, it blows again. Now the directional valve is the focus of the troubleshooting and is found to be locked up, which is causing the overload.
The directional valve is changed out and the machine is started back up and in full production again. The troubleshooting task is now completed. However, the RCFA task has not begun yet. In a purely reactive organization, it will not begin. It is off to the next troubleshooting activity on an urgent failure.

The RCFA begins with some component analysis of the locked up hydraulic valve. Inspection of the valve shows abrasive wear. Looking at condition monitoring programs, the last oil analysis for the hydraulic system was over 6 months ago (having skipped three samples) and the trend was showing that the ISO particle count was well above target on the hydraulic system. Further investigation revealed that the oil filter had been substituted about 3 years earlier to save money and the filter, instead of being a 10µ, was a 25µ. Action items were put in place to systemize the oil analysis program and to change the filter element back to the correct filter. Additional training was also given for the technicians who operated and maintained the hydraulic system. Additional approvals were added to the process for changing store stock items to prevent technical mistakes with critical spare parts in the future.

Another example would be a suction roll vibration problem that suddenly occurred on a machine during one holiday weekend. The call came that the entire press section was vibrating severely. The vibration was so intense that it was vibrating loose many of the large (1.5”) machine screws holding key components in the machine. The first part of the troubleshooting process involved determining which part of the machine was causing the vibration. Our vibration analyst was called in to diagnose the vibration on the machine press section. It was determined that the source of the vibration was the suction roll. After slow turning the roll and everyone looking the roll over, we could not find an external defect.
However, the vibration showed a high 1X vibration and we suspected there could be a problem with the shell. The decision was made to change the roll. After the lengthy roll-change process, the machine was started back up and was running smooth again. This troubleshooting process was involved and very detailed. The RCFA on the roll would commence over the next several weeks and the shell was found to be cracked in the middle about halfway around. The shell failed due to fatigue. While in some failures, troubleshooting may be the first part of the process, failure analysis is a separate activity.

These examples show the differences in troubleshooting and RCFA. Therefore, it is important to have clarity on the goal of the problem-solving efforts. Both are solving problems but have different goals for the outcome. Both are necessary to any manufacturing organization. Troubleshooting is the floor and successful RCFA is the ceiling when it comes to uncovering the hidden plant in every [...]

1.2 DEFINING FAILURE - 4 LEVELS OF FAILURE

Failure could be thought of as a pyramid of categories where there are four levels of failure with Level 4 being the top and Level 1 the bottom as shown in Figure 1.1.

[FIGURE 1.1 Failure level pyramid. From top to bottom: Level 4 Functional Failure; Level 3 Component Failure; Level 2 Hidden Failure (Defects); Level 1 Pre-Failure Conditions. Figure note: following reliability and safety pyramids, as the failure level decreases there are more events at that level (for every Level 4 failure, there may be 1000 pre-failure conditions).]

Level 4 - Functional Failure

What is failure or functional failure? One definition is when the asset is unable to fulfill a function to a standard of performance which is acceptable to the user.
This is not an emotional approval of the user to decide failure, but it is a matter of the asset meeting design performance to a clear established standard. If a pump is to deliver 1000 gpm (3785 lpm) of fluid to the process as designed, then a functional failure would be when it can't deliver 1000 gpm to the process. If an operations management decision resulted in wanting to get 1400 gpm (5300 lpm) from the pump system, then this would require a system redesign. It is a failure when changing the pump out would restore the function of 1000 gpm. If the pump was only able to deliver 700 gpm, then it would be a functional failure. The pump may have wear, improper clearances, or some sort of line blockage.

A failure in the pump that caused the functional failure would be a component failure. A component failure would be the lowest level of failure which caused the asset to fail. For this example, it may be the impeller or suction wear plate. If the bearings or mechanical seal failed, these would be a component failure. Successful RCFA must focus the analysis on the component failure mode.

In addition to the functional failure example above, the pump could have a locked up bearing or sheared coupling where the pump doesn’t turn at all. This may be a breakdown failure where the asset has a broken part. All of these would be a functional failure, but the nature of the failure is more severe when a pump locks up than when wear has slowly reduced its performance.

Level 3 - Minor Failures

Minor failures would be failures where the loss of function of a component has occurred, but the function of the equipment or parent asset is still meeting performance standards. In the case of the pump, a minor failure may be that the seal is now leaking product. Here is a functional failure of the seal (component functional failure) but the pump is still producing 1000 gpm to meet its function at the asset level.
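The pump example above can be written as a small check. This is only an illustrative sketch: the `PumpStatus` fields, the `classify` function and the strict comparison with no measurement tolerance are assumptions made for the example, not a method prescribed in the text.

```python
from dataclasses import dataclass

# Rounded conversion factor: litres per US gallon
GPM_TO_LPM = 3.785


@dataclass
class PumpStatus:
    rated_gpm: float            # performance standard the pump was designed to meet
    delivered_gpm: float        # flow actually delivered to the process
    seal_leaking: bool = False  # a component defect that may not affect flow


def classify(status: PumpStatus) -> str:
    """Label a pump condition using the Level 4 / Level 3 distinction."""
    # Level 4: the asset no longer meets its performance standard.
    if status.delivered_gpm < status.rated_gpm:
        return "Level 4 - functional failure"
    # Level 3: a component has failed but the asset still meets its standard.
    if status.seal_leaking:
        return "Level 3 - minor failure"
    return "meets performance standard"


if __name__ == "__main__":
    worn = PumpStatus(rated_gpm=1000, delivered_gpm=700)
    leaking = PumpStatus(rated_gpm=1000, delivered_gpm=1000, seal_leaking=True)
    print(classify(worn))
    print(classify(leaking))
    print(round(1000 * GPM_TO_LPM), "lpm")
```

A demand for 1400 gpm from a pump designed for 1000 gpm is deliberately not modelled here: as the text notes, that is a system redesign question and not a failure of the asset.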
Level 2 - Hidden Failures

A hidden failure might be one that is a defect in the part, but it is still functioning, at least for the moment. The failure is hidden as far as the function of the asset is concerned but is a real defect on the part. An example here might be a bearing defect. The function of a bearing is to support the load while allowing the rotation of the shaft. That still happens with a bearing defect but eventually it will turn into a functional failure if allowed to go on for long enough. A lubrication breakdown that is not known is another hidden failure. A part with a fatigue crack growing is a hidden failure. These defects may or may not be known, or may be detectable by some technology. This would be the area of the failure curve where failure is occurring, but it has not been detected yet.

Level 1 - Pre-failure Conditions

A pre-failure condition may be any number of issues that will contribute to or cause failure if left unchecked. Examples include misalignment, imbalance, cavitation, leaks, mechanical looseness, poor lubrication, corrosion, a bent shaft, resonance, high vibration, high temperature operation, and any other condition that is not ideal for the best operation of the asset. Some other items might be operating outside the operating design parameters for the equipment, such as too much pressure, too much flow, too much load, too much speed or too little speed. All these types of things can be preconditions or symptoms which can be contributing causes or root causes of failure. A pre-failure condition is not an actual failure. There are always changing operating conditions of equipment and systems. The equipment must run through these conditions, most of which can be corrected before a functional failure occurs or even a Level 2 or 3 failure. All these failure levels lead to functional failure if not addressed.
RCFA can also be triggered at any level of failure. If you can solve why the pump has high vibration (Level 1 pre-failure condition), then you will likely solve some of the root causes of the pump failure. RCFA is typically thought to be a reactive-type reliability activity and for much of the time it is. However, RCFA completed on Level 1 pre-failure conditions can transform RCFA into a proactive activity. Chronic failure RCFA could also be a proactive activity as chronic failures reoccur unless corrective measures are taken on the root causes.

The level of failure that is identified on the equipment can help in understanding the urgency of a response to that failure, as well as the things to consider when a failure analysis is executed for the eventual failure. The more proactive an organization is, the lower the failure level at which its maintenance reacts. Conversely, the more reactive an organization is, the higher the level of failure it typically reacts to, which would be Level 4 or run to failure. Show me a plant with more Level 4 functional failures and I'll show you a reactive maintenance organization much of the time. The type of maintenance strategy may correspond as listed below.

• Level 4 Failure: Run to Failure Maintenance
• Level 3 Failure: Preventive Maintenance
• Level 2 Failure: Preventive & Predictive Maintenance
• Level 1 Pre-Failure Condition: Predictive Maintenance & Proactive Maintenance

Predictive maintenance is both reactive and proactive depending on the issue. Predictive maintenance that finds a bearing defect is reactive. The bearing defect has already occurred. Failure is not prevented; it is only managed when found early by vibration analysis. Vibration analysis did not prevent failure in this case; it only provided more time to plan for maintenance.
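The correspondence between failure level and maintenance strategy listed above can be held in a simple lookup. The sketch below is illustrative only; the names `FailureLevel`, `STRATEGY` and `strategies_for` are hypothetical and introduced for this example.

```python
from enum import IntEnum


class FailureLevel(IntEnum):
    """The four failure levels; a higher number means a more severe level."""
    PRE_FAILURE_CONDITION = 1
    HIDDEN_FAILURE = 2
    MINOR_FAILURE = 3
    FUNCTIONAL_FAILURE = 4


# Maintenance strategies that typically correspond to each level,
# copied from the list in the text.
STRATEGY = {
    FailureLevel.FUNCTIONAL_FAILURE: ("Run to Failure",),
    FailureLevel.MINOR_FAILURE: ("Preventive",),
    FailureLevel.HIDDEN_FAILURE: ("Preventive", "Predictive"),
    FailureLevel.PRE_FAILURE_CONDITION: ("Predictive", "Proactive"),
}


def strategies_for(level: int) -> tuple:
    """Return the maintenance strategies associated with a failure level."""
    return STRATEGY[FailureLevel(level)]


if __name__ == "__main__":
    # Print from the top of the pyramid (Level 4) down to Level 1.
    for level in sorted(FailureLevel, reverse=True):
        print(int(level), level.name, "->", ", ".join(strategies_for(level)))
```

An ordered enum is used so that levels can be compared directly, which mirrors the idea that a more reactive organization responds at a higher level.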
Vibration analysis that identifies a Level 1 pre-failure condition, such as misalignment, would be proactive in the sense that if the pre-failure condition is corrected, a higher-level failure may be avoided. Reactive predictive maintenance conditions would include all types of bearing defects, a bent shaft, and oil analysis wear metals. Proactive predictive maintenance conditions would include misalignment, imbalance, cavitation, looseness, resonance, and key fluid properties from oil analysis such as viscosity, additives, contamination, oxidation, etc.

1.3 RELIABILITY & RCFA

What is reliability? One definition is the probability that a product, system or service will perform its intended function adequately for a specified period, or will operate in a defined environment without failure (functional failure). For most rotating equipment, and even some static equipment, there is no such thing as 100% reliable. Industrial machines for the most part have a finite life. You could say they are designed to fail at some acceptable point in their service life.

Managers at NASA in the 1980s estimated that the space shuttle would have a reliability of 99.999%, which is a failure rate of 1 in every 100,000 flights. Engineers were a little more realistic and predicted a reliability of 99% (1 failure in every 100 flights). That doesn't sound so good if you are an astronaut. The first space shuttle accident, with the Challenger, occurred on the 25th flight, which was 96% reliability at that point. In 2003, the Columbia accident occurred on the 113th flight. At the end of the shuttle missions, overall reliability was about 98.5%. The engineers were not too far off. Despite the many safeguards that NASA had taken, the final product still had failures. Why is that? The human element is the powerful common denominator. Every failure has a human element to consider at every step along the way of the life of the asset.
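The shuttle reliability figures quoted in this section follow from simple arithmetic: observed reliability is the share of flights completed without loss. The short sketch below reproduces them. The function name `reliability_pct` is illustrative, and the total of 135 shuttle flights is a figure supplied here, not stated in the text.

```python
def reliability_pct(failures, flights):
    """Observed reliability as a percentage: successful flights / total flights."""
    if flights <= 0:
        raise ValueError("flights must be positive")
    return 100.0 * (flights - failures) / flights


# Management estimate: 1 failure in 100,000 flights.
print(round(reliability_pct(1, 100_000), 3))
# Engineering estimate: 1 failure in 100 flights.
print(round(reliability_pct(1, 100), 3))
# After Challenger: 1 loss in the first 25 flights.
print(round(reliability_pct(1, 25), 3))
# End of program: 2 losses in 135 flights (total assumed here).
print(round(reliability_pct(2, 135), 2))
```

Note that a failure rate of 1 in 100,000 corresponds to 99.999%, not 99.99%; each extra "nine" is a tenfold reduction in failure rate.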
The human element runs from managers' decisions through human resource management, purchasing, engineering and maintenance to operations. All human groups and their decisions affect the reliability and thus the failure of equipment.

To understand solving failures, it is helpful to understand a little about the life cycle of an asset. An asset goes from design, installation and operation until its performance or condition deteriorates and maintenance is executed. This life cycle is often shown in the P-F curve or the modified DIPF curve, as shown in Figure 1.2. Point P is the point where a failure begins to occur, or the point where the asset's condition begins to deteriorate. F is functional failure or Level 4 failure. Other stages of failure may happen further back up the P-F curve. Every type of failure mode has a different slope in the P-F region of the curve. Some may be steep, and some may be flatter. For example, a bearing that suddenly runs out of oil will have a steep and short P-F curve. A bearing outer race defect due to static corrosion may progress from a Level 2 to a Level 4 failure over many months, so it would have a flat curve compared to a lack-of-lube failure mode.

On the front end, the asset has been designed and installed to do a certain function, and reliability is also designed in. Design and installation determine an asset's foundational ability to perform at a certain reliability level. The length of the flat part of the DIPF curve will depend on how well the design and installation portions have been executed.
Level 1 pre-failure conditions shorten the I-P region and begin the P-F failure curve. Other factors after startup also affect how long the asset will run reliably. Once the equipment has entered a particular failure mode (P), the focus shifts toward managing the failure as it progresses. The main efforts to prevent failure must happen prior to P, in the design, installation, operation and maintenance portions of the equipment's life. The result of the RCFA will address the issues behind the root cause of failure, which may be something in design, installation, operation, maintenance, etc. The new or rebuilt asset is installed and the cycle repeats.

FIGURE 1.2 DIPF curve (asset condition versus time, marking D = design, I = installation, P = point where detection of condition deterioration begins, F = functional failure, with the regions covered by proactive, predictive and preventive maintenance).

1.4 PREDICTING FAILURE

Predictive maintenance utilizes many tools to help maintenance identify and manage early-stage failures before Level 4 functional failure occurs. Whether the condition is vibration, oil condition, ultrasound or temperature, there are some levels that are proven to lead to functional failure if left unaddressed. However, the area manager, once he hears about a failure condition or precondition on his equipment, wants to know how much remaining useful life (RUL) he has so that corrective plans can be put in place. How we respond will depend on several factors, such as the level of failure (1, 2, 3, or 4), equipment criticality, the P-F interval for the particular failure mode, and the cost of unreliability (functional failure cost).
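Where P-F intervals have been recorded for past occurrences of the same failure mode on the same equipment, a rough RUL range could be sketched as below. This is one possible approach under stated assumptions, not a method from the book: it assumes each historical interval was measured from detection at P to functional failure, and it uses the shortest and the median interval as a conservative and a typical guess. The function name `estimate_rul_days` and the sample numbers are hypothetical.

```python
import statistics


def estimate_rul_days(historical_pf_days, days_since_detection):
    """Conservative and typical remaining-life guesses from past P-F intervals."""
    if len(historical_pf_days) < 2:
        raise ValueError("need at least two historical P-F intervals")
    if days_since_detection < 0:
        raise ValueError("days_since_detection cannot be negative")
    shortest = min(historical_pf_days)               # worst case seen so far
    typical = statistics.median(historical_pf_days)  # middle of past experience
    return {
        "conservative": max(0.0, shortest - days_since_detection),
        "typical": max(0.0, typical - days_since_detection),
    }


# Hypothetical P-F intervals (days) for one failure mode on one pump type.
history = [90, 120, 150, 200, 240]
print(estimate_rul_days(history, days_since_detection=30))
```

The spread between the two numbers is a reminder of how uncertain the estimate is; a change in load, speed or lubrication would shift the curve, so the result is a planning aid rather than a prediction.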
Point P in the P-F curve is the point at which a failure condition can be identified. By the shape of the curve, the condition continues to deteriorate at some slope until functional failure. At this point, assume that the condition monitoring process is 100% accurate. The next question is how long until functional failure? To find the answer, we must dig a little deeper.

Every failure mode has a different shaped P-F curve. For example, an inner race defect from corrosion will have a different P-F curve than that of fatigue spalling. Also, any other variable that is different in the same failure mode will change the P-F curve shape. For example, the same inner race defect from corrosion, with a bearing that is lightly loaded compared to one heavily loaded, will have a different P-F interval. Consider the variables that can be different for each type of failure mode for a bearing, and there could be hundreds of P-F intervals that would affect the RUL.

So, putting good estimates on the RUL requires a very good understanding of not only the equipment but also the analysis of the multiple data streams that report on its condition: vibration, oil analysis, ultrasound, temperature, etc. Equipment items will be discussed in other chapters, so here are a few thoughts on condition monitoring tools.

Vibration has been called a science and an art. It is a science in that all the programs, software and algorithms are based on the science of motion. The frequencies generated by rotating equipment are based on the physics of the angular speed and geometry of the equipment design. When any of these thought-to-be-known parameters change due to wear or other physical changes, the science doesn't change, but we don't have the new parameters to input into the analysis, which results in flaws.
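To make the dependence on speed and geometry concrete, the sketch below uses the standard kinematic formulas for rolling element bearing defect frequencies (outer race stationary, inner race rotating with the shaft). The formulas are textbook relations rather than anything quoted from this chapter, and the two sets of bearing dimensions are hypothetical.

```python
import math


def bearing_defect_frequencies(shaft_hz, n_elements, element_d, pitch_d, contact_deg=0.0):
    """Kinematic defect frequencies in Hz for a bearing with a fixed outer race."""
    ratio = (element_d / pitch_d) * math.cos(math.radians(contact_deg))
    return {
        "FTF": 0.5 * shaft_hz * (1 - ratio),                 # cage (train) frequency
        "BPFO": 0.5 * n_elements * shaft_hz * (1 - ratio),   # outer race defect
        "BPFI": 0.5 * n_elements * shaft_hz * (1 + ratio),   # inner race defect
        "BSF": 0.5 * (pitch_d / element_d) * shaft_hz * (1 - ratio ** 2),  # element spin
    }


shaft_hz = 30.0  # hypothetical running speed (1800 rpm)

# Two hypothetical bearings that fit the same housing but differ internally.
brand_a = bearing_defect_frequencies(shaft_hz, n_elements=9, element_d=7.9, pitch_d=39.0)
brand_b = bearing_defect_frequencies(shaft_hz, n_elements=8, element_d=8.7, pitch_d=39.0)

for name in ("BPFO", "BPFI"):
    print(name, round(brand_a[name], 1), round(brand_b[name], 1))
```

If the analysis software still holds the element count and diameters of the old bearing, the alarm bands will sit at the wrong frequencies even though the physics has not changed.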
For example, a piece of equipment has a bearing changed and the new bearing, while it fits, is a different brand with different geometry. Well, the defect frequencies are not going to be the same.

The other challenge with vibration analysis can be in reporting what the vibration shows versus what is causing the vibration. For example, a high 1X vibration can many times indicate imbalance. However, a bent shaft can also show high 1X, as it gives the same view as an imbalance mass in rotation. The mass is not centered around the axis of rotation, but if the vibration report says you need to balance the rotor or clean the rotor, the root problem will not be addressed.

Vibration is a mixed bag of reactive and proactive. Reactive results from vibration are bearing defects. The failure has already begun, and nothing can prevent failure at that point. However, vibration analysis can also identify pre-failure conditions (imbalance, misalignment, vane pass) that, when corrected, may prevent premature failure.

Temperature monitoring can be tricky. Small changes in temperature may not mean very much, whereas large temperature changes obviously mean something more serious is occurring. Usually there is little reaction time when temperature changes reach certain levels. Understanding the source of heat is critical on each application. Is there steam used or only mechanical friction? Is there lubrication to control friction and heat? Is it oil or grease? Where is the temperature measured?

Oil analysis can provide several layers of condition monitoring. First, it provides Level 1 failure conditions when wear elements begin to show up in analysis. If the wear has already started to occur, then the failure has begun, which is reactive. It doesn't mean that wear can't soon return to normal levels once the condition is addressed.
A more proactive mode for oil analysis would be to closely monitor the lube condition, such as viscosity, contamination levels, additive levels, oxidation and Total Acid Number, which can reveal degradation in the lubricant. If lube issues are not addressed, they will lead to machine wear and ultimately functional failure.

To say that predicting failure is difficult, if not impossible, is an understatement; however, with an experienced professional some educated guesstimates are possible. If enough study has been done on particular failure modes on the same equipment, then some statistical data may be obtained on P-F curve intervals and RUL when failure conditions are discovered.

1.5 MAINTENANCE & RCFA

An asset has several stages in its life cycle: design, procurement, installation, operations, maintenance, and eventual failure. The failure and changeout should have an RCFA. If the RCFA finds a design issue or has an action item for a redesign, the cycle starts over again. Every asset and process in a manufacturing plant begins with design. Engineers, managers, and all the decision makers decide what will be installed, how it will be designed, and how much reliability will be built into every operating plant. There are so many parts of the asset life that are baked in up front with design choices. I have seen estimates that as many as 50-75 percent of reliability issues have some connection back to design. Design choices will affect all levels of failure from Level 1 through to Level 4. Is the baseplate or foundation sufficient to minimize pre-failure conditions? Is the equipment selection such that components reach full life?

Procurement really should be following the engineering specifications and industry standards. When going out for bids it is important to have these specifications closely reviewed and adhered to so that they are all satisfied. There is always pressure to find a lower bidder.
A good rule is never to include a supplier on the bid list that you do not feel can meet the specifications of the project. Don't use substitute materials outside of the specification. Be careful going to knockoff suppliers who offer a lower price but have cut corners. Price and value are two different things. Value is typically paid for over the long haul, while price is garnered one time, up front. Spare parts are another critical element on the front end of design and procurement.

Installation has many pieces, which may include choosing the right contractors on initial install or even during maintenance later, as the equipment is replaced by a contractor. Proper checkout and commissioning are critical in passing the baton to the operations group for successful asset function and reliability. Have the maintenance staff been involved in signing off on critical installation parameters that would lead to pre-failure conditions, such as misalignment, pipe strain and imbalance? Training for operators and maintenance is an important part of the installation and commissioning of the equipment.

Operations will only be as successful as the design, procurement and installation phases. Proper commissioning for the first time, as well as having established standard operating procedures (SOPs) for future startups, is important. Centerlines are needed for operators so each shift operates to the same standards. Training is needed for operators so new employees are brought up with the same knowledge standards as the initial startup crews. Do operators understand the design limits of their equipment and operate below them? Do they routinely swap spares on schedule? Do they make regular equipment basic care rounds looking for pre-failure conditions? The operators are typically the most available resources to the operating equipment and must take ownership of the equipment if full life is to be obtained.
Maintenance can have a similar impact on the asset as the original contractors who installed the equipment. Maintenance should be performed on the asset. Does maintenance have the correct procedures and specifications necessary to keep the equipment in this condition? Is the planning and scheduling function such that this information is available for them at the time of maintenance? If not, then this is an area where failure can be introduced into the equipment. For example, having the wrong gap, wrong spacing, wrong alignment, wrong assembly, wrong bolt torque, or substituting the wrong materials can all lead to equipment failure. Is lubrication being executed as and when it is required? Proactive maintenance will do more Level 1-type failure corrective work. When the equipment is misaligned, maintenance will do an alignment. When the seal starts to leak, maintenance will fix the seal before the entire asset reaches functional failure due to a seal leak.

Anytime maintenance changes parts, an RCFA may be initiated. Most of the time it will not be feasible, but the opportunity exists to ask the questions: did we get full life from the asset or component? If not, then why not? What caused it to fail before full life? When the RCFA is executed, it may lead to a redesign of the system or the area where the root cause of failure was determined, whether it be design, procurement, installation, operations or maintenance. The total reliability of the asset is the sum of all the people who had some hand in the asset as defined