It is Saturday night, 2 a.m. Your mobile phone rings, and caller ID reveals that it is work calling.
Something has gone wrong – or maybe not, but since you are the reliability engineer, you are the one expected to drop everything, go in, and figure it out. The operations staff who called know only that they heard or felt something unfamiliar. And that makes it your problem.
It has been like this, more or less, since the early days of the Industrial Revolution. The earliest approach to equipment maintenance was reactive maintenance: run the machine to failure and then fix it or replace it. But those failures, just like this bump in the night, too often came at inopportune moments or brought with them catastrophic collateral damage.
Eventually, reactive maintenance was replaced by strategies intended to avoid ill-timed and dramatic failures. Scheduled or preventive maintenance may be thought of as “fixing it before it breaks,” ideally on your schedule, with service intervals based on statistical models of expected equipment life. The most famous advocate of this philosophy was the American W. Edwards Deming, dispatched to help speed Japan’s recovery after World War II. Unfortunately, statistical models predict; they do not give guarantees. Some machines fail earlier than predicted, usually at night, on weekends, or during public holidays, while other machines might run much longer than expected.
Next came predictive maintenance: keep running, but watch machines closely using technology so that you can intervene before failure, while not sacrificing much operating life. That sounds great! And it is great! For the past 30–40 years, the best-in-class machinery reliability practitioners have been using vibration analysis, tribology (oil analysis), infrared thermography, ultrasonic testing, and other techniques to monitor equipment, most often on a monthly basis.
Corrective and proactive maintenance evolved along the way as well. It was a logical progression: if the predictive tools told you that a machine was out of balance, you would correct the balance immediately – not after more damage was done. If proactive testing revealed that a machine was being run at resonance and shaking itself to death as a result, then it simply made sense to perform the analysis necessary to correct the situation and return the system to a sustainable operating condition.
There are at least two main challenges to the successful application of these strategies:
- Collecting data at the moment something is happening
- Understanding what that data means
The machinery condition monitoring world can be divided, more or less, into two groups: periodic measurement and continuous monitoring. A very small subset of all machines is monitored continuously. Only those that are absolutely mission-critical, the machines that shut down the whole operation if they stop running, require continuous monitoring. For the rest of the equipment, often referred to as the “balance of plant,” continuous monitoring simply does not make sense. Why? Cost. Keep in mind that the cost per channel of monitoring (typically using accelerometers) continues to drop as technology matures, and wireless capabilities are starting to have a real impact on reducing installation costs (running conduit and wire is a really expensive undertaking). Despite these advancements, it is still true that as of today (early 2020) a relatively small minority of machines are continuously monitored. Even for those machines, in many cases the alarms produced do not provide adequate information to understand what is happening (only that something is happening), and so the event must still be investigated and studied further.
So what? Well, some faults develop in predictable ways. For example, a subsurface stress riser in a bearing gives way to a small crack that progresses into a larger spall, with each stage lasting months or longer, giving an analyst a fighting chance of finding, understanding, and trending the developing failure mode. It does work out that way – sometimes.
In other cases, a fault comes and goes, its intensity often determined by process conditions or environmental variations. Just like an intermittent electrical fault in a car, it never happens when you are with the mechanic at the garage. So, since the majority of machines are tested only once a month but measurable events can happen at any time, the likelihood of catching one is pretty small. This is what creates the call at 2 a.m. on Saturday night. You did not plan it this way.
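To put a rough number on “pretty small,” here is a back-of-the-envelope sketch. It assumes that each monthly reading is a brief snapshot, that the fault is active for a fixed number of hours per week, and that the visits are independent of when the fault appears. The function name and the 2-hours-per-week figure are illustrative assumptions, not measured data.

```python
HOURS_PER_WEEK = 168.0


def chance_of_catching(active_hours_per_week: float, visits: int) -> float:
    """Probability that at least one of `visits` spot checks lands while the fault is active.

    Assumes each check is a brief snapshot taken at a time unrelated to the fault.
    """
    p_single = active_hours_per_week / HOURS_PER_WEEK
    return 1.0 - (1.0 - p_single) ** visits


if __name__ == "__main__":
    # Hypothetical fault that shows itself about 2 hours per week.
    for visits in (1, 12):
        p = chance_of_catching(2.0, visits)
        print(f"{visits:2d} monthly visit(s): {p:.1%} chance of seeing the fault")
```

Under these assumptions, a single visit has a little over a 1% chance of coinciding with the fault, and even a full year of monthly routes catches it only about one time in seven or eight.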
This is the first way in which Iris CM will change the online monitoring market: not missing the data in the first place. It is not necessary to permanently install any part of the Iris CM system, and it is not necessary for anyone to physically start data collection (of course, permanent installation is also an option, whether to monitor a critical asset on an ongoing basis, to instrument a test cell, or for quality control).
This means that you can set up and configure the system on any asset, using either an external trigger (like an accelerometer), a non-contact video-based trigger or even a periodic, time-based recording schedule to acquire the data without the user needing to supervise the monitoring. The user can review the data either at the system or remotely (with a LAN or wireless connection) and when the analysis is done, the system may be easily redeployed to another troubleshooting task.
Now imagine getting that same 2 a.m. call, but instead of dressing and driving into the plant, you boot up your laptop and access the system remotely. You watch the alarming machine in real time, viewing motion-amplified video from each of several cameras strategically positioned to show not only the detail of specific, key components, but also the overall context of what is happening. Data buffering lets you look back in time up to 90 minutes to see what was happening leading up to the alarm. All while still in your robe and slippers.
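The look-back idea can be illustrated with a simple pre-trigger ring buffer: keep only the most recent window of frames, and freeze a copy when an alarm fires. This is a minimal sketch of the general technique, not a description of how the Iris CM implements its buffering; the class name, method names, and the 30 frames-per-second rate are all hypothetical.

```python
from collections import deque


class PreTriggerBuffer:
    """Keeps the most recent `lookback_seconds` of frames (hypothetical helper)."""

    def __init__(self, lookback_seconds: float, frames_per_second: float):
        capacity = max(1, int(lookback_seconds * frames_per_second))
        # A deque with maxlen silently discards the oldest item when full.
        self._frames = deque(maxlen=capacity)

    def push(self, timestamp: float, frame) -> None:
        """Store the newest frame, evicting the oldest one if the buffer is full."""
        self._frames.append((timestamp, frame))

    def snapshot(self) -> list:
        """Return a copy of everything currently buffered, oldest first."""
        return list(self._frames)


if __name__ == "__main__":
    # 90 minutes of look-back at an assumed 30 frames per second.
    buffer = PreTriggerBuffer(lookback_seconds=90 * 60, frames_per_second=30)
    for i in range(5):
        buffer.push(timestamp=i / 30, frame=f"frame-{i}")
    # On an alarm, the snapshot holds the lead-up to the event.
    print(len(buffer.snapshot()), "frames available for review")
```

The fixed capacity is the design point: memory use stays bounded no matter how long the system waits for an event.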
This still leaves us with the “understanding” problem: understanding what the data means. Some aspects of traditional vibration analysis that seem, on the surface, to be difficult are actually more straightforward than sorting out many of the “simpler faults.” For example, if a gear has 27 teeth, and the spectrum has a peak at the 27th harmonic of shaft turning speed, you can be pretty confident that you are looking at gear mesh frequency. There’s not much wiggle room there. If you have calculated the frequency of a fault on the inner race of a bearing and your spectrum shows a harmonic set of peaks matching that frequency and there are sidebands separated by shaft turning speed, then it is hard to imagine that you do not have damage on that bearing inner race. These are pretty clear “smoking guns”.
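The two “smoking gun” calculations above are simple enough to sketch. The gear mesh frequency is the tooth count times shaft speed, and the inner race fault frequency uses the standard ball pass frequency, inner race (BPFI) formula for a bearing whose outer race is stationary. The shaft speed and bearing geometry below are made-up values for illustration, and the function names are assumptions.

```python
import math


def gear_mesh_frequency(shaft_hz: float, teeth: int) -> float:
    """Gear mesh frequency: number of teeth times shaft turning speed."""
    return shaft_hz * teeth


def bpfi(shaft_hz: float, n_elements: int, element_diameter: float,
         pitch_diameter: float, contact_angle_deg: float = 0.0) -> float:
    """Ball pass frequency, inner race (stationary outer race assumed).

    Diameters may be in any unit as long as both use the same one.
    """
    ratio = (element_diameter / pitch_diameter) * math.cos(math.radians(contact_angle_deg))
    return 0.5 * n_elements * shaft_hz * (1.0 + ratio)


def sidebands(center_hz: float, spacing_hz: float, count: int = 2) -> list:
    """Frequencies spaced `spacing_hz` apart on both sides of `center_hz`."""
    return [center_hz + k * spacing_hz for k in range(-count, count + 1) if k != 0]


if __name__ == "__main__":
    shaft_hz = 1780 / 60  # hypothetical motor at 1780 rpm
    print(f"Gear mesh (27 teeth): {gear_mesh_frequency(shaft_hz, 27):.1f} Hz")
    # Hypothetical bearing: 9 rolling elements, 7.9 mm balls, 39 mm pitch diameter.
    fault_hz = bpfi(shaft_hz, 9, 7.9, 39.0)
    print(f"BPFI: {fault_hz:.1f} Hz")
    print("Expected sidebands:", [round(f, 1) for f in sidebands(fault_hz, shaft_hz)])
```

In practice the analyst compares these predicted frequencies, their harmonics, and the sidebands against the peaks in the measured spectrum.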
On the other hand, consider the first few harmonics of turning speed in a frequency spectrum. Those could represent one (or more) of literally dozens of failure modes. Imbalance? Misalignment? Loose fastener? Inadequate base? Cracked foundation? Is it directional? Could there be resonance? Traditional vibration analysis may tell you that there is a problem, but it normally does not spell out clearly what that problem is.
A skilled analyst may employ advanced techniques like phase analysis or operational deflection shape analysis to refine the diagnosis. But those take time and skill, and you need to be lucky enough to be there at the exact time that it is happening (2 a.m. on Saturday). Even then, being there may not be enough if the frequency of interest is quite low or the process is not running in a steady-state condition: the technique may require steady-state operation, or the frequencies of interest may produce such minute forces that piezoelectric accelerometers struggle to be excited enough to provide a usable signal.
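The low-frequency problem follows from basic sinusoidal motion: for a given displacement, peak acceleration scales with the square of frequency, a = (2πf)²·x. The sketch below compares the same displacement at two frequencies; the 0.1 mm displacement and the two frequencies are illustrative assumptions, and the function name is hypothetical.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def peak_acceleration_g(displacement_peak_m: float, frequency_hz: float) -> float:
    """Peak acceleration (in g) of sinusoidal motion with the given peak displacement."""
    omega = 2.0 * math.pi * frequency_hz
    return omega ** 2 * displacement_peak_m / STANDARD_GRAVITY


if __name__ == "__main__":
    displacement = 0.0001  # 0.1 mm peak, an assumed value
    for freq in (1.0, 30.0):
        g_level = peak_acceleration_g(displacement, freq)
        print(f"{freq:5.1f} Hz: {g_level:.6f} g peak")
```

With these numbers the 1 Hz motion produces about 0.0004 g, 900 times less than the same displacement at 30 Hz, which is why slow, large movements can be nearly invisible to an accelerometer yet easy to see as displacement.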
This is the second way in which Iris CM will change the online monitoring market: letting users understand what the data means. Motion Amplification® is a proprietary video processing technique that detects subtle displacement and then amplifies that movement to a level visible to the naked eye, enabling a dramatic visualization of the movement. When you can see how the machine is moving, the source of most issues becomes clear. That loose bolt is visible. The cracked foundation is obvious. You can see if the motor and pump are bouncing up and down together (in phase) or as if they were on opposite ends of a teeter-totter (out of phase).
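For readers who want the in-phase versus out-of-phase distinction as a number, the sketch below estimates the relative phase of two displacement signals at a chosen frequency using a single-bin discrete Fourier transform. It is a generic illustration, not the Motion Amplification® algorithm; the signals are synthetic, the function names are hypothetical, and the 30-degree tolerance is an arbitrary assumption.

```python
import cmath
import math


def phase_at(signal, sample_rate_hz: float, frequency_hz: float) -> float:
    """Phase (radians) of the signal component at `frequency_hz` (single-bin DFT)."""
    acc = 0j
    for n, x in enumerate(signal):
        acc += x * cmath.exp(-2j * math.pi * frequency_hz * n / sample_rate_hz)
    return cmath.phase(acc)


def relative_phase_deg(a, b, sample_rate_hz: float, frequency_hz: float) -> float:
    """Phase of `b` relative to `a`, wrapped to the range [-180, 180) degrees."""
    diff = phase_at(b, sample_rate_hz, frequency_hz) - phase_at(a, sample_rate_hz, frequency_hz)
    return (math.degrees(diff) + 180.0) % 360.0 - 180.0


def describe(phase_deg: float, tolerance_deg: float = 30.0) -> str:
    """Label a relative phase as in phase, out of phase, or in between."""
    if abs(phase_deg) <= tolerance_deg:
        return "in phase"
    if abs(phase_deg) >= 180.0 - tolerance_deg:
        return "out of phase"
    return "intermediate"


if __name__ == "__main__":
    rate, freq = 100.0, 5.0  # assumed sample rate and vibration frequency
    t = [n / rate for n in range(100)]
    motor = [math.sin(2 * math.pi * freq * x) for x in t]
    pump = [-0.5 * math.sin(2 * math.pi * freq * x) for x in t]  # teeter-totter motion
    print(describe(relative_phase_deg(motor, pump, rate, freq)))
```

The record length here spans a whole number of vibration cycles, which keeps the single-bin estimate clean; real measurements would normally need windowing or averaging.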
The Iris CM has three cameras. In many cases context is everything and, with multiple cameras deployed to capture both the overall view and the details, the user can understand the situation more completely. That could save you a trip to the asset.
Art Crawford used to tell a story about a fan that failed regularly – always in the middle of the night. He decided to baby-sit it and, sure enough, after a few nights of surveillance he caught someone dumping mop water down a drain directly over the fan. The cold water hitting the hot machine created enough shock to dislodge material from the blades and put it out of balance. I think that Art would have enjoyed being able to let technology watch the machine while he slept.
Imagine having a tireless colleague that can watch your problem machines 24/7. A colleague with an iron-vise memory and vision that can slow down motion, amplify it, and share that with your whole team. Now you never have to miss getting the right data and you don’t have to guess at what it means. And you don’t have to go to the plant at 2 a.m. on Saturday night.