
How Do We Minimize Murphy Consequences in Automation?

April 5, 2011
Murphy is Alive and Well! Minimizing Risks Is Essential to Create Solutions That Minimize Downtime

The most popular rendition of Murphy's Law is, "What can go wrong will, and at the worst possible time…"  In today's automation world, we are building ever more complicated automation and management systems, designed to eke the last bit of quality and production performance out of our processes.  We are creating some fertile ground for future Murphy's Law crops.  Minimizing these risks from all perspectives – Automation Vendor, System Integrator, and End User – is essential to create solutions that will degrade gracefully and minimize downtime.

Why am I writing about this now?  Two words – Hard Drive.  As the hypothetical commencement speech that Baz Luhrmann turned into the "Sunscreen Song" put it, "The real troubles in your life are apt to be things that never crossed your worried mind; the kind that blindside you at 4pm on some idle Tuesday."  OK, in my case it was 3:15 on a Monday and I had to do a hard boot.  That was it – "Drive not recognized."  New drive in hand, some software upgrades at the same time, backups that were out of date, and after a day of loading, copying and recovering I was 90% whole again (not what a plant manager would want to hear, right?).  A few shortcuts to get back online quickly – antivirus software can wait, documentation of the new system setup can wait, and I won't forget to update that temporary license I got from my software vendor to get back up and running…  By now you're likely shaking your head, saying, "Yup, I've been there…"  Oh, but it gets better.  The next morning I woke up to a message saying there was a problem with the operating system, and the system couldn't boot.  Recover with a boot disk, scan the drive, and there are bad clusters.  The Bathtub Curve still exists…

The bright side – some aspects of my technical life continued to work.  Webmail gave me access to email.  I could also go to another machine that had parallel email access, thanks to IMAP rather than POP technology.  I happen to use a combination of Web Based "Hosted or Cloud Based" solutions, and they kept working.  This reminded me of the term "Graceful Degradation" – systems that encounter failures yet continue to deliver value.

How do you design for Graceful Degradation?  And, how do you design for a quick recovery?

Let's explore these issues from the three perspectives outlined earlier – Automation Product Vendors, System Integrators and the End User perspective.

Product Vendors should design products with Graceful Degradation in mind.  Products should be modular, to provide the flexibility to meet the task, but if a module fails, the entire product shouldn't stop delivering value.  For example, if security authentication is not available as a service, can you still authenticate locally?  What's the backup?  Consider licensing of technology: will licensing survive disk mirroring?  If not, are your update procedures documented and readily available?  Do they require a product expert?  Are your application files neatly stored in one place?  Do you rely on Registry settings, or is the entire configuration stored within a set of application files?  Are the resources your product needs, such as ports or operating system services, well documented?  Focus on delivering solutions, not toolsets.  In today's world, we need to deliver fish, not deliver tools and teach people to fish.  There is no time for learning and becoming product experts today, and the variability in the resulting applications extends development and troubleshooting times and complicates documentation.
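
To make the authentication question concrete, here is a minimal sketch of that kind of fallback in Python.  The names here (auth.plant.local, vendor_authenticate, LOCAL_CREDENTIAL_CACHE) are hypothetical; a real product would use its own security service and a properly protected credential store.

```python
import hashlib
import socket

# Hypothetical cache of credential hashes, refreshed while the central
# security service is reachable, so operators can still log in during an outage.
LOCAL_CREDENTIAL_CACHE = {
    "operator1": hashlib.sha256(b"s3cret").hexdigest(),
}


def central_service_available(host="auth.plant.local", port=443, timeout=2.0):
    """Cheap reachability check for the central authentication service."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def vendor_authenticate(user, password):
    """Placeholder for the product's real, service-based authentication call."""
    raise NotImplementedError("wire in the vendor's authentication API here")


def authenticate(user, password):
    """Authenticate centrally when possible; degrade gracefully to the local cache."""
    if central_service_available():
        return vendor_authenticate(user, password)
    # Graceful degradation: cached credentials keep the HMI usable,
    # perhaps with reduced privileges, until the service returns.
    cached = LOCAL_CREDENTIAL_CACHE.get(user)
    return cached == hashlib.sha256(password.encode()).hexdigest()
```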

Consider following an "Appliance Paradigm."  There is a range of new "Purpose Built," commercial off-the-shelf (COTS) products that are applied rather than engineered for an application.  Examples include Enterprise Transaction Modules, Panel HMIs, and the Universal Protocol Gateway.  Appliances are quickly replaced with an off-the-shelf spare, and their program is restored through Flash Memory cards or some simple configuration settings.

System Integrators need to resist their tendency to engineer everything.  In the past, when vendors provided toolkits, there was a heavy reliance on system integrators, who had the ability to amortize a development tool's learning curve over many projects.  Integrators would then develop end user applications efficiently.  That was a time when end users had engineers on staff who would be the process experts, guiding the developments of their integration partner.  Today, with rightsizing in all industries, the process expertise lies in the heads of both integration partners and a smaller number of process engineers.  The System Integrator should really be changing focus from engineering custom solutions to configuring purpose built products to solve problems.  By engineering, I mean software development and scripting – using development tools rather than purpose built solutions.  Development tools require high levels of QA and documentation, items that can quickly eat into the profits of a job.  IEEE Computer Society statistics show that development makes up 40% of overall software costs; the other 60% is taken up with QA, bug fixing and enhancement.  QA really needs to be separate from development in order to objectively test an application, and it takes as long to QA a development as it does to develop it in the first place.  World class companies have QA-to-developer ratios of 1:1 or higher.  Going forward, System Integrators will deliver more value to customers by delivering process expertise and value in product selection, choosing Purpose Built products to solve a customer problem.  It is the process expertise that is the new value of an Integrator, not expertise in a specific vendor tool.  Vendors creating purpose built "Appliance" solutions benefit from high deployment volumes and can deliver unmatched levels of QA, situational testing, and both user and application documentation.

Who was Murphy anyway?

We attribute sayings like "If it can fail, it will" and "At the worst time possible" to Murphy.  But who was Murphy?

Sayings like this stretch way back to 1877 when Alfred Holt wrote, "It is found that anything that can go wrong at sea generally does go wrong sooner or later."

British stage magician Nevil Maskelyne wrote in 1908, "It is an experience common to all men to find that, on any special occasion, such as the production of a magical effect for the first time in public, everything that can go wrong will go wrong."

The Yale Book of Quotations cites a 1952 quotation from an unnamed physicist: "Murphy's Law says if anything can go wrong, it will."

Well, in 1949, a Wright Field Aircraft Lab development engineer named Captain Ed Murphy made the remark, "If there is any way to do it wrong, he will," referring to a technician who was incorrectly wiring strain gages.  The comment and its attribution were picked up in a letter sent to Arthur Bloch and used in his 1977 book, "Murphy's Law and Other Reasons Why Things Go Wrong."

End Customers must learn that there are new solutions on the market, and they need to work with System Integrators to design for Graceful Degradation.  That means modular systems and parallel systems.  Don't layer systems such that one failure at the foundation will bring down the entire solution.  In the past, "One Version of the Truth" drove designers toward tiered solutions, layering higher level analytics on a singular set of input data.  While this provides accountability and the ability to know the contribution of each layered system, it also propagates the "Garbage In, Garbage Out" scenario, setting up the "House of Cards."

Instead, consider parallel paths that can offer validation of your primary solution.  Brought down to basics, that means Panel HMIs communicating with controllers alongside SCADA systems communicating with those same controllers.  Allow those communications to run in parallel, if bandwidth permits.  If one system fails, the other can act as a backup.  Think of this as poor-man's redundancy – a simple, logical architecture that gives you more than one way to operate your systems, as sketched below.
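
As a rough illustration of that poor-man's redundancy, here is a minimal Python sketch.  The function names and the 72.4 reading are purely illustrative stand-ins for a real SCADA read and a real panel HMI read.

```python
def read_tank_level_via_scada():
    """Hypothetical read over the primary SCADA communication path."""
    raise ConnectionError("SCADA server unreachable")  # simulate a failed path


def read_tank_level_via_panel_hmi():
    """Hypothetical read over the parallel panel HMI path."""
    return 72.4  # illustrative value


def read_tank_level():
    """Try each parallel path in order; succeed if any one of them is healthy."""
    for reader in (read_tank_level_via_scada, read_tank_level_via_panel_hmi):
        try:
            return reader()
        except (ConnectionError, TimeoutError):
            continue  # this path is down; fall back to the next one
    raise RuntimeError("all communication paths are down")


if __name__ == "__main__":
    # With the SCADA path simulated as down, the HMI path still delivers data.
    print(read_tank_level())  # prints 72.4
```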

Avoid solutions that require a great deal of "Tribal Knowledge."  This can happen when an integrator uses "Tools" to develop your solutions rather than selecting Purpose Built products to solve a problem.  The drawbacks of engineered solutions are higher long term costs of ownership due to the higher costs to tweak solutions, and both QA and document the final outcome.  Heaven forbid you need to transition the maintenance of your solution from one developer to another, that has not had familiarity with the solution.

Rely on open and widely adopted standards, such as OPC, for interoperability between products and systems, rather than custom-developed "Glue Code."  OPC is supported by virtually every vendor, and there is an abundance of toolkits enabling rapid development when needed.
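
For example, reading a single tag over OPC UA takes only a few lines with an off-the-shelf toolkit.  This sketch assumes the open-source python-opcua package and a hypothetical endpoint and node id; any vendor's OPC UA server could stand in without glue code, which is the point of relying on the standard.

```python
from opcua import Client  # open-source python-opcua toolkit (assumed installed)

# Hypothetical OPC UA endpoint and node id for a tank level tag.
client = Client("opc.tcp://192.168.1.10:4840")
client.connect()
try:
    tank_level = client.get_node("ns=2;s=Line1.TankLevel").get_value()
    print("Tank level:", tank_level)
finally:
    client.disconnect()
```
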
None of this is rocket science.  It is simply looking at the problem with a new perspective.  Perhaps I should be thankful for my failed Hard Drive.

As for Murphy, don't kid yourself.  He is alive and well, and when you are at your most vulnerable, he'll be knocking at your door. 

About the Author

Roy Kok is a 30+ year veteran of the automation industry.  Today, he owns and operates AutomationSMX, a Marketing and Sales Consultancy.  He is also an OPC Foundation Evangelist and is President of a new startup, Aware Technology.  You can reach Roy at [email protected]