Safety-Critical Systems

by  William Yeager, Marty Leibham, Jardyn Bartman, Jose Delatorreleal, Ty Bivens

Introduction

    Safety-critical systems, also called life-critical systems, are systems whose failure or malfunction can result in injury or loss of life.  A failure can also damage other equipment or harm the environment.  People encounter safety-critical systems every day, for example in phones, cars, computers, and even traffic lights.  Safety engineering is the discipline that makes sure these systems still behave acceptably when something fails.  Because failure can be dangerous, these systems are designed to be as close to flawless as possible.  Safety-critical systems are closely tied to engineering: they are a part of systems engineering and industrial engineering, but they are becoming increasingly computer-based. [4, 6]

    A main topic in system safety is the avoidance of hazards, meaning any condition that threatens the safety of users.  The rate of occurrence and the severity of a hazard together determine how much risk can be tolerated.  A hazard can be anything that can lead to or develop into an accident, or anything likely to become dangerous when interacted with.  If the severity or frequency of a particular hazard creates significant risk, then risk reduction measures must be implemented until the risk becomes tolerable. [5]
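
    The relationship between severity, rate of occurrence, and tolerable risk can be illustrated with a small scoring sketch.  The category names, the multiplication rule, and the threshold value below are assumptions chosen for illustration; they are not taken from the cited sources or from any particular standard.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity categories, least to most severe."""
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Frequency(IntEnum):
    """Hypothetical rate-of-occurrence categories, rarest to most common."""
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    FREQUENT = 4


# Assumed threshold: a score above this value is treated as intolerable.
TOLERABLE_LIMIT = 6


def risk_score(severity: Severity, frequency: Frequency) -> int:
    """Combine severity and frequency into a single illustrative score."""
    return int(severity) * int(frequency)


def needs_risk_reduction(severity: Severity, frequency: Frequency,
                         limit: int = TOLERABLE_LIMIT) -> bool:
    """Return True when the hazard's score exceeds the tolerable limit."""
    return risk_score(severity, frequency) > limit


if __name__ == "__main__":
    # A rare but catastrophic hazard scores 4 * 2 = 8, above the limit.
    print(needs_risk_reduction(Severity.CATASTROPHIC, Frequency.REMOTE))
    # A marginal, occasional hazard scores 2 * 3 = 6, at the limit.
    print(needs_risk_reduction(Severity.MARGINAL, Frequency.OCCASIONAL))
```

    In this sketch a rare hazard can still demand risk reduction if its consequences are severe enough, which is the point of weighing both factors together.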

    Safety engineering emerged in the 1950s and 1960s to help control the hazards of potentially dangerous missile and rocketry projects, and it has only grown as more technologies have come to rely on computers.  It is important that safety is designed into these systems from the start rather than added as an afterthought.  Simplifying a design by removing safeguards is harmful, because it increases the opportunity for a single component’s malfunction to cause a system-wide failure.  Small errors can rapidly develop into system-wide failures that create hazards.  In many ways it is difficult for people to decide when these systems have become safe enough for widespread use. [2, 3]


Reliability: Five Types of Safety Systems

  1. Fail-operational systems - These types of systems will continue to operate even if their control systems fail. An automatic landing system is fail-operational if, in the event of a failure, the approach, flare and landing can be completed by the remaining part of the automatic system.
  2. Fail-safe systems - These types of systems become safe if they fail.  When faults are detected, these systems switch to a safe mode and usually inform an operator. One example is a motorized gate that can be pushed open by hand, with no crank or key required, when a power outage occurs. Another is the air brake system on trucks: the brakes are held in the 'off' position by air pressure in the brake system, so if a brake line splits, the pressure is lost and the brakes are applied. It is impossible to drive a truck with a serious leak in the air brake system.
  3. Fail-secure systems - These types of systems stay secure when they fail, usually by locking. For example, where a fail-safe electronic door unlocks during a power failure, a fail-secure door locks and keeps the area secure.
  4. Fail-passive systems - These systems respond to a failure by becoming passive and handing control over to an operator. An automatic landing system is fail-passive if, in the event of a failure, there is no significant out-of-trim condition or deviation of flight path or attitude, but the landing is not completed automatically.
  5. Fault-tolerant systems - These systems continue to operate correctly when faults occur, usually by detecting failing components and switching to replacements before the faults can cause harm.  A real-world example is the Transmission Control Protocol (TCP), which is designed to allow reliable two-way communication over a packet-switched network, even when communication links are imperfect or overloaded. [4]
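
    The fail-safe behavior in item 2 (switch to a safe mode when a fault is detected, and inform an operator) can be sketched as a small state machine.  This is a minimal illustration, not a real control system: the class name, the zero-output safe state, and the operator callback are all hypothetical.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    NORMAL = "normal"
    SAFE = "safe"


class FailSafeController:
    """Hypothetical controller that drops to a safe mode on any fault."""

    def __init__(self, notify_operator: Callable[[str], None]) -> None:
        self.mode = Mode.NORMAL
        # Callback that receives a message string for the operator.
        self._notify_operator = notify_operator

    def report_fault(self, description: str) -> None:
        """Switch to safe mode once and tell the operator why."""
        if self.mode is Mode.NORMAL:
            self.mode = Mode.SAFE
            self._notify_operator(
                f"Fault detected: {description}; entering safe mode"
            )

    def command_output(self, requested: float) -> float:
        """Pass the request through normally; force zero in safe mode."""
        return requested if self.mode is Mode.NORMAL else 0.0


if __name__ == "__main__":
    messages: list[str] = []
    controller = FailSafeController(messages.append)
    print(controller.command_output(0.8))  # request passes through
    controller.report_fault("sensor timeout")
    print(controller.command_output(0.8))  # output forced to the safe value
    print(messages)
```

    The controller never leaves safe mode on its own in this sketch; returning to normal operation is left to an explicit action that is not modeled here.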


Developing Safety-Critical Systems

    Safety-critical systems are more complicated and more difficult to design than other systems or software.  The goal is to create systems that are intrinsically safe and that minimize hazards, control hazards, and reduce the impact of hazards.  Creating these systems can take a long time and cost vast amounts of money.  The development process is broadly similar to that of other systems.  However, for these systems it is crucial that the final product performs exactly as intended and can demonstrate its dependability.  Each phase of development is more carefully structured and documented so that problems are resolved quickly and the system performs appropriately.

    The first step in development is gathering the system requirements, usually those specified by the target consumers of the system.  A functional requirements document must be written that specifies exactly what the system is meant to accomplish.  Afterwards the requirements are analyzed to identify risks and potential hazards related to the system.  This analysis also outlines what the system must do or must not do for the sake of safety.  At this stage designers try to anticipate every situation the system may encounter.  These documents must be precise about how the system will completely fulfill the requirements so that the programmers can clearly understand what is needed.  This can be a difficult process, as specifications are often misinterpreted.  Ideally, specifications should be correct, complete, consistent, and unambiguous.  Faults in these documents are one of the greatest problems during development: the documents might not be adequate, or they might not effectively address the customer’s desired requirements. [1]

    When the documents are completed and approved, design begins.  Before coding starts, the project is outlined and divided into program subdivisions.  Each subdivision is designed to implement a particular program behavior, and each must be coded.  Programmers write the code that they believe will achieve the outlined behavior for each subdivision.  When completed, the code is compiled and assembled.  When all of the completed subdivisions are linked together into a working program, the true behavior of the system emerges.  This process can take years.  While most programs have thousands of lines of code, these systems are often composed of hundreds of thousands or millions of lines.  Programmers of these systems need a different set of skills: communication and organization are needed to divide up such a large task.  This ensures that workloads are reasonable and that the subdivisions are consistent with one another. [3]

    While creating the system, everything must be done carefully.  The coding, inspecting, documenting, testing, verifying, and analyzing must all be done with the utmost care.  Safety-critical systems need the best quality software because lives depend on them working correctly.  These systems are tested extensively to ensure that there are no errors.  People make mistakes, and mistakes in these systems are potentially life-threatening, so many different people are involved in their development and testing.  In many cases, a system is produced that seems to work but then fails unexpectedly.  It is easy to build a computer system that works 90% of the time; it is extremely difficult to build one that works 100% of the time.  Typically a regular computer program will have five or fewer errors per thousand lines of code, but applying these typical programming practices to safety-critical systems can result in the loss of life.  Safety-critical systems need to be near perfect.  There is no room to put these systems on the market and let customers experience the consequences of errors before they have been corrected.  Errors should be eliminated before these systems are commercially produced.  Systems with failures are often completely recalled, and negligent designs can result in criminal penalties or lawsuits. [3, 4]
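
    To see why a typical defect rate is unacceptable at this scale, the rate quoted above can be multiplied out for a large code base.  The following is a back-of-the-envelope sketch; the one-million-line system size is an assumed round number and the function name is hypothetical.

```python
def expected_defects(lines_of_code: int, defects_per_kloc: float) -> float:
    """Estimate residual defects from code size and defects per 1,000 lines."""
    return lines_of_code / 1000 * defects_per_kloc


if __name__ == "__main__":
    # Assumed size: one million lines, at five defects per thousand lines.
    print(expected_defects(1_000_000, 5))  # 5000.0
```

    Under these assumptions a system built with ordinary practices would ship with thousands of latent errors, any one of which might contribute to a hazard.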


Notable Applications 

Medicine
    • Heart-lung machines
    • Insulin pumps
    • Infusion pumps
    • Radiation therapy
    • Robotic Surgery
    • Defibrillator machines and artificial cardiac pacemakers
Transportation
    • Aviation
      1. Air traffic control
      2. Engine control systems
      3. Flight planning
      4. Life support
    • Automotive
      1. Airbags
      2. Braking
      3. Steering
    • Railway
      1. Railway signaling
      2. Traffic control
    • Spaceflight
      1. Launch safety
      2. Vehicle safety
Power
    • Emergency shutdowns in plants and factories
    • Nuclear reactor control systems
Weapons
    • Weapons and defense systems
    • Warning systems
    • Arming and detonating explosives
Infrastructure
    • Fire alarms
    • Automatic doors
    • Emergency dispatch services
Recreation
    • Amusement rides
    • Parachutes
    • SCUBA gear  
[3, 4]


Conclusion

    Fortunately, regulations, better development techniques, and cautious usage of computers have prevented many accidents in recent years.  There are still difficulties in this field.  Consumers want systems that are safe and easy to use, and developers want systems that are easier to design, create, and repair.  It can be difficult to find a happy medium, and the systems produced are often poorly matched to the user’s needs.  In many situations the systems are complicated and difficult to use for anyone who does not understand how the technology works.  The people who depend on these systems have many other things to worry about, and a complex machine can add more stress to these situations.  Today, many people work to create more effective and efficient safety-critical systems.  Lives depend not only on doctors and pilots, but on engineers and programmers as well. [3]


Works Cited
  1. “Developing Safety-Critical Systems.”  Slideserve.  Slideserve.  21 Sept. 2012.  Web.  3 Dec. 2012.  <http://www.slideserve.com/haley/developing-safety-critical-systems>.
  2. Kalinsky, Dave.  “Architecture of Safety-Critical Systems.”  Embedded.  UBM Tech.  23 Aug. 2005.  Web.  3 Dec. 2012.  <http://www.embedded.com/design/prototyping-and-development/4006464/Architecture-of-safety-critical-systems>.
  3. Jacky, Jonathan.  “Safety-Critical Computing:  Hazards, Practices, Standards, and Regulation.”  University of Washington.  1994.  Web.  3 Dec. 2012.  <http://staff.washington.edu/jon/pubs/safety-critical.html>.
  4. “Life-Critical Systems.” Wikipedia.  Wikimedia Foundation, Inc.  25 Oct. 2012.  Web.  3 Dec. 2012.  <http://en.wikipedia.org/wiki/Life-critical_system>.
  5. “Safety-Critical Computer Systems - Open Questions and Approaches.”  Slideserve.   Slideserve.  17 July 2012.  Web.  3 Dec. 2012.  <http://www.slideserve.com/brygid/safety-critical-computer-systems-open-questions-and-approaches>.
  6. “Safety Engineering.” Wikipedia. Wikimedia Foundation, Inc.  23 Oct. 2012.  Web.  3 Dec. 2012.  <http://en.wikipedia.org/wiki/Safety_engineering>.
