Alarm Management Gone Wrong, and other Tales From the HUG...

June 14, 2006

More from the Honeywell User Group...

Having successfully escaped "death by chocolate" last night at the Companions and Children's event, themed on Charlie and the Chocolate Factory, I am bringing this blog up to date. It is an open secret that Mara Weber, the User Group Americas Director at Honeywell, is a confirmed chocoholic, and Tanya Tralka, the vice-chair of User Group Americas, actually works for Hershey (noted in this space on Monday). And I have to confess that the chocolate fondue strawberries were to die for, especially for a diabetic like me.

There were some fascinating talks yesterday afternoon, including a discussion of using PHD (and Visual Basic) to perform rigorous flow compensation, by Alan Davis of Chevron Phillips, Jason Baum of Vision Stone, and Pradeep Adarkar of CRA Engineering Group. Another talk, which I failed to get to but whose materials I am eagerly awaiting, came from John Halajko (FMC Corp.), who presented a DCS comparison checklist in spreadsheet form. One of the Honeywellers came up to him afterward, John said, and told him that he was going to use the spreadsheet on the Experion R300 and see how it came out, "and show it to the bosses."

Another excellent talk yesterday came from Dave Chappell of P and G, with Mark Burbine of Mettler-Toledo and Jamie Bohan of Honeywell. Their discussion of "Repeatable, accurate feeds for difficult or non-uniform material transfers" was fascinating. They provided a low-cost standard method for conducting feeds of these difficult material transfers, which will be a real salvation to all those people with weird stuff that they need to get from one bucket to another.

Of course, there was more...much more, as you can see from the enewsletter that is posted on the ControlGlobal home page...tune in tomorrow for today's newsletter on the home page.

While I was trying, unsuccessfully, to escape going to chocolateland, I ran into Eddie Habibi and Roland Heersink of PAS.
Eddie, of course, is an ex-Honeyweller, and PAS does a sizeable amount of business with Honeywell, which private-labels PAS products. PAS is here as part of the PKS Advantage program (Honeywell's knock-off of the highly successful Rockwell Encompass Partners program). Eddie and Roland wanted me to take a look at their work, which Honeywell is selling as Automation Data Manager. I am going to do that this afternoon after my "Lunch and Learn" speech.

Yes, your humble editor is speaking at this year's HUG. Not to mention becoming notorious for being part of both Jack Bolick's and Donnalee Scaggs' addresses on Monday via video clip. My topic is "Who's going to replace you when you retire?" Tickets for the speech are sold out! I am flattered. You can hear the podcast of my talk at "Who's going to replace you when you retire?"

This morning's first offering was by Lee Swindler, of Lyondell, who is a major Honeywell supporter and user group volunteer. Before I tell you the story, I want to note that it takes a remarkably confident company to allow even a known, really good friend to stand up and say things that are less than good about the company. Erik Rasmussen, Honeywell's chief of PR, told me, "We pride ourselves in listening to our end users, and we don't always do everything right." The fact is, Swindler's talk wasn't about bashing any company, but rather about what the Lyondell team should have done differently to effect a different outcome. It was surprisingly titled, "Alarm Management Gone Wrong."

Swindler gave a confessional report of the problems they encountered doing a TDC2000-to-Experion-PKS upgrade at the Bayport 2 PO/TBA unit. Part of the upgrade was an expansion to their Triconex SIS, with over 1600 hardwired I/O and 3000 serial I/O. "If I was with ExxonMobil," Swindler said, "this here would be my last slide and I'd have to just tell you the rest...but I'm not."
They had huge problems after the cutover, including large numbers of active and cycling alarms, over 30 system alarms, numerous BAD PV alarms, and major problems with their Ronan annunciator, including freezing and bad alarms. All of this, coupled with alarms being ignored and the volume on the DCS turned off, contributed to one Level 2 environmental event and a significant union grievance. "Sometimes it is fun to be a controls engineer," Swindler noted ruefully, "but that day all I wanted to do was hide in a corner."

The site's operator training and culture were based on the TDC2000, with a hardwired annunciator for alarms. The operators also had a Modcomp system running the plant APC, with settable alarms that were connected to the annunciator as well. The operators set the PV alarm levels "tight" and were accustomed to "operating to alarm." The control room had very poor ergonomics, and the TDC2000 offered limited alarm screens. "The operators believed that the only real alarms came from the annunciator," Swindler said.

And then there was the alarm rationalization project... "We didn't change the site culture," Swindler said, "and the engineering contractor was very weak on alarm management and rationalization skills. We didn't hire an alarm management professional to consult." He went on, "There was also heavy operations bias on the rationalization team, with the unit superintendent and six operators on the team. They recommended setpoints set too tight, as they had been running for years, and actually increased the dependence on the annunciator and hardwired alarms."

Another real problem was the Triconex-to-Experion link. There is no native way for Triconex to talk directly to Experion, so the decision was made to use a Matrikon OPC server in a redundant (1oo2) configuration. "This was the solution recommended by both Honeywell and Triconex," Swindler said, "but if we had to do it over again, we'd have gone with Modbus.
OPC is a complex and fragile technology at best." The Matrikon system gave them random distortion of data, for both digital and analog values. "We wound up loading the Matrikon software on the Experion servers themselves, as Matrikon recommended. This made us very nervous, having third-party software resident on the heart of our control system," Swindler said. "What would happen when we upgraded the R210 Experion system? We had no way of knowing."

Swindler went on, "We went through seven or eight software revisions from Matrikon, all of which broke on installation. We appear to have been the development system for Matrikon," he said, "and they don't seem to have test beds where they can test their software. Being the beta test site for OPC software while trying to run an operating plant is not any fun."

Eventually, and quite recently, he said, they implemented an alternate solution to get rid of the Matrikon software that was resident on the Experion servers. They deployed Honeywell Redirection Manager, and the problems went away. They still have the OPC servers, which are now working well. Swindler pointed out Kerry Sartain, from Georgia Pacific, sitting in the audience as he noted, "And Kerry Sartain reported yesterday that he tried Redirection Manager with his OPC problem, and it didn't work so he went back to Matrikon. All this proves is that OPC technology is very fragile."

The next problem was with Ronan's serial annunciator. Communications loss caused the annunciator to freeze, and then, when it came back up, there would be sporadic re-flash of existing alarms. Since the operators were overdependent on the annunciator, this caused much distress. Diagnosis showed that the problems were in the "A" link, and they went away after a firmware upgrade in the annunciator.

And now the whole sorry story comes to the Experion DCS issues. "Working with Experion needs a different and better skillset than working with other DCSes," Swindler noted. "Our operators didn't have the background to get the best out of the training they received. They were uncomfortable with working on the system, and they still are."

"Honeywell really dropped the ball on this," he said, describing the fact that there is no way to disable HART alarms that appear after instrument hookup in R210 (it wasn't clear if this has been amended in R300). The outputs can't be switched from HART to regular without deleting the tag and rewriting it. "Needless to say, the operators don't think too highly of doing that while they are trying to operate the plant."

Then there were the project management issues. The project was behind schedule, and the contractor performed poorly. They decided not to do a FAT, didn't finish the SAT or do it as thoroughly as they should have, and therefore didn't see the problems until they started to cut over.

They had real issues with BAD PV thresholds, because the issue is not well understood. Analog devices work differently from smart transmitters in the way they report BAD PV, and you must set up your threshold settings based on the type of device you are getting a signal from, he told us. He showed us a slide based on NAMUR standard NE-43. "We incorrectly diagnose root cause fairly often because we don't understand this issue," he said. The problems they had applied to both the DCS and SIS, he said.

They also didn't do a field survey, and they didn't have a functioning MOC system to deal with changes. "If we had done all our cutover hot, we would have discovered all these problems earlier and had an easier time fixing them," Swindler said, "but we only did about 50% of our loops as hot cutover. If we had done them all hot, we would have discovered the problems loop by loop, instead of all at once."

So what lessons can be learned from this project? "First, change the culture and provide effective training," Swindler said. "Second, do your alarm rationalization right.
We didn't hire an alarm management consultant and we should have. Eventually we brought in PAS to do it right. Third, exercise caution with new technology. You may hear wonderful things from lots of other people about OPC or other new technologies, but until you've actually used it...beware. Fourth, a Site Acceptance Test is critical, and shouldn't be shortened or done without, regardless of the impact on the schedule," he said ruefully. "Fifth, properly engineered PV settings and alarm bandwidth are important and should not be minimized," he continued, "and sixth, I recommend you maximize hot cutover regardless of what the experts say."
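
For readers wondering what the BAD PV threshold point looks like in practice, here is a minimal sketch of how a 4-20 mA signal might be classified along NE-43 lines. It is not from Swindler's slide, which I don't have. The current limits are the commonly cited NE-43 values (failure at or below 3.6 mA or at or above 21.0 mA, valid measurement between 3.8 and 20.5 mA) and should be checked against the recommendation itself. The function and status names are hypothetical and don't correspond to anything in Experion or Triconex.

```python
from enum import Enum

# Commonly cited NAMUR NE-43 current limits, in mA.
# Assumption: verify against the recommendation before using these anywhere real.
FAIL_LOW_MA = 3.6    # at or below: transmitter is signalling a failure
SAT_LOW_MA = 3.8     # lower end of the valid (saturated) measurement range
SAT_HIGH_MA = 20.5   # upper end of the valid (saturated) measurement range
FAIL_HIGH_MA = 21.0  # at or above: transmitter is signalling a failure


class PvStatus(Enum):
    """Hypothetical status names, for illustration only."""
    GOOD = "good"
    OUT_OF_RANGE = "process out of range"  # signal valid, process beyond 4-20 mA span
    DEVICE_FAILURE = "device failure"      # the BAD PV case
    INDETERMINATE = "indeterminate"        # falls in a gap between the bands


def classify_ne43(current_ma: float) -> PvStatus:
    """Classify a loop current reading using the NE-43 style bands above."""
    if current_ma <= FAIL_LOW_MA or current_ma >= FAIL_HIGH_MA:
        return PvStatus.DEVICE_FAILURE
    if 4.0 <= current_ma <= 20.0:
        return PvStatus.GOOD
    if SAT_LOW_MA <= current_ma < 4.0 or 20.0 < current_ma <= SAT_HIGH_MA:
        return PvStatus.OUT_OF_RANGE
    return PvStatus.INDETERMINATE


if __name__ == "__main__":
    for reading in (12.0, 3.9, 20.3, 3.5, 21.5, 3.7):
        print(f"{reading:5.1f} mA -> {classify_ne43(reading).value}")
```

The point of separating "process out of range" from "device failure" is the one Swindler made: a BAD PV threshold set inside the 3.8 to 20.5 mA band will flag a healthy smart transmitter as failed, while a device that doesn't follow NE-43 may need different limits altogether.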
