
chore: add more chapters and replacements

xuu 2025-03-15 19:07:36 -06:00
parent c1804744bf
commit ff069b52c4
Signed by: xuu
GPG Key ID: 8B3B0604F164E04F
7 changed files with 2770 additions and 26 deletions

1
.gitignore vendored

@@ -1,4 +1,5 @@
piper/
*.wav
*.ogg
*.mp3
*.onnx*

@@ -2,20 +2,20 @@
PATH:=./piper:$(PATH)
WAV_FILES := $(patsubst %.txt,%.wav,$(wildcard *.txt))
OGG_FILES := $(patsubst %.txt,%.ogg,$(wildcard *.txt))
MP3_FILES := $(patsubst %.txt,%.mp3,$(wildcard *.txt))
MODEL=en_GB-alan-medium.onnx
CONFIG=en_GB-alan-medium.onnx.json
complete: $(OGG_FILES)
complete: $(MP3_FILES)
	echo $@ $^
$(WAV_FILES): %.wav: %.txt
	cat $^ | piper -m $(MODEL) -c $(CONFIG) -f $@
$(OGG_FILES): %.ogg: %.wav
	ffmpeg -i $^ $@
$(MP3_FILES): %.mp3: %.wav
	ffmpeg -y -i $^ $@
install:

@@ -311,14 +311,14 @@ started to slow down as the most obvious hazards were eliminated. The emphasis
then shifted to unsafe acts. Accidents began to be regarded as someones fault rather
than as an event that could have been prevented by some change in the plant
or product.
Heinrichs Domino Model, published in 1931, was one of the first published
Heinrichs Domino Model, published in 19 31, was one of the first published
general accident models and was very influential in shifting the emphasis in safety
to human error. Heinrich compared the general sequence of accidents to five domi
noes standing on end in a line (figure 2 3). When the first domino falls, it automati
cally knocks down its neighbor and so on until the injury occurs. In any accident
sequence, according to this model, ancestry or social environment leads to a fault
of a person, which is the proximate reason for an unsafe act or condition (mechani
cal or physical), which results in an accident, which leads to an injury. In 1976, Bird
cal or physical), which results in an accident, which leads to an injury. In 19 76, Bird
and Loftus extended the basic Domino Model to include management decisions as
a factor in accidents.
1. Lack of control by management, permitting.
@@ -439,7 +439,7 @@ able as the identified cause. Other events or explanations may be excluded or no
examined in depth because they raise issues that are embarrassing to the organiza
tion or its contractors or are politically unacceptable.
The accident report on a friendly fire shootdown of a U.S. Army helicopter over
the Iraqi nofly zone in 1994, for example, describes the chain of events leading to
the Iraqi nofly zone in 19 94, for example, describes the chain of events leading to
the shootdown. Included in these events is the fact that the helicopter pilots did not
change to the radio frequency required in the nofly zone when they entered it (they
stayed on the enroute frequency). Stopping at this event in the chain (which the
@@ -459,14 +459,14 @@ more basis for this distinction than the selection of a root cause.
Making such distinctions between causes or limiting the factors considered
can be a hindrance in learning from and preventing future accidents. Consider the
following aircraft examples.
In the crash of an American Airlines D C 10 at Chicagos OHare Airport in 1979,
In the crash of an American Airlines D C 10 at Chicagos OHare Airport in 19 79,
the U.S. National Transportation Safety Board (N T S B) blamed only a “mainte
nanceinduced crack,” and not also a design error that allowed the slats to retract
if the wing was punctured. Because of this omission, McDonnell Douglas was not
required to change the design, leading to future accidents related to the same design
flaw.
Similar omissions of causal factors in aircraft accidents have occurred more
recently. One example is the crash of a China Airlines A300 on April 26, 1994, while
recently. One example is the crash of a China Airlines A300 on April 26, 19 94, while
approaching the Nagoya, Japan, airport. One of the factors involved in the accident
was the design of the flight control computer software. Previous incidents with the
same type of aircraft had led to a Service Bulletin being issued for a modification
@@ -480,7 +480,7 @@ that delay, 264 passengers and crew died.
In another D C 10 saga, explosive decompression played a critical role in a near
miss over Windsor, Ontario. An American Airlines D C 10 lost part of its passenger
floor, and thus all of the control cables that ran through it, when a cargo door opened
in flight in June 1972. Thanks to the extraordinary skill and poise of the pilot, Bryce
in flight in June 19 72. Thanks to the extraordinary skill and poise of the pilot, Bryce
McCormick, the plane landed safely. In a remarkable coincidence, McCormick had
trained himself to fly the plane using only the engines because he had been con
cerned about a decompressioncaused collapse of the floor. After this close call,
@@ -545,14 +545,14 @@ exceptional case when every life was saved through a combination of crew skill a
sheer luck that the plane was so lightly loaded. If there had been more passengers and
thus more weight, damage to the control cables would undoubtedly have been more
severe, and it is highly questionable if any amount of skill could have saved the plane .
Almost two years later, in March 1974, a fully loaded Turkish Airlines D C 10 crashed
Almost two years later, in March 19 74, a fully loaded Turkish Airlines D C 10 crashed
near Paris, resulting in 346 deaths.one of the worst accidents in aviation history.
Once again, the cargo door had opened in flight, causing the cabin floor to collapse,
severing the flight control cables. Immediately after the accident, Sanford McDon
nell stated the official McDonnellDouglas position that once again placed the
blame on the baggage handler and the ground crew. This time, however, the FAA
finally ordered modifications to all D C 10s that eliminated the hazard. In addition,
an FAA regulation issued in July 1975 required all widebodied jets to be able to
an FAA regulation issued in July 19 75 required all widebodied jets to be able to
tolerate a hole in the fuselage of twenty square feet. By labeling the root cause in
the event chain as baggage handler error and attempting only to eliminate that event
or link in the chain rather than the basic engineering design flaws, fixes that could
@@ -575,7 +575,7 @@ different types of links according to the mental representations the analyst has
the production of this event. When several types of rules are possible, the analyst
will apply those that agree with his or her mental model of the situation .
Consider, for example, the loss of an American Airlines B757 near Cali,
Colombia, in 1995 . Two significant events in this loss were
Colombia, in 19 95 . Two significant events in this loss were
(1.) Pilot asks for clearance to take the R O Z O. approach
followed later by
(2.) Pilot types R into the F M S. 5.
@@ -630,7 +630,7 @@ often laid years before. One event simply triggers the loss, but if that event h
happened, another one would have led to a loss. The Bhopal disaster provides a
good example.
The release of methyl isocyanate. (M I C.) from the Union Carbide chemical plant
in Bhopal, India, in December 1984 has been called the worst industrial accident
in Bhopal, India, in December 19 84 has been called the worst industrial accident
in history. Conservative estimates point to 2,000 fatalities, 10,000 permanent dis
abilities (including blindness), and 200,000 injuries . The Indian government
blamed the accident on human error.the improper cleaning of a pipe at the plant.
@@ -733,7 +733,7 @@ their face and closing their eyes. If the community had been alerted and provide
with this simple information, many (if not most) lives would have been saved and
injuries prevented .
Some of the reasons why the poor conditions in the plant were allowed to persist
are financial. Demand for M I C had dropped sharply after 1981, leading to reduc
are financial. Demand for M I C had dropped sharply after 19 81, leading to reduc
tions in production and pressure on the company to cut costs. The plant was operat
ing at less than half capacity when the accident occurred. Union Carbide put pressure
on the Indian management to reduce losses, but gave no specific details on how
@@ -776,7 +776,7 @@ time and without any particular single decision to do so but simply as a series
decisions that moved the plant slowly toward a situation where any slight error
would lead to a major accident. Given the overall state of the Bhopal Union Carbide
plant and its operation, if the action of inserting the slip disk had not been left out
of the pipe washing operation that December day in 1984, something else would
of the pipe washing operation that December day in 19 84, something else would
have triggered an accident. In fact, a similar leak had occurred the year before, but
did not have the same catastrophic consequences and the true root causes of that
incident were neither identified nor fixed.
@@ -822,7 +822,7 @@ Without understanding the purpose, goals, and decision criteria used to construc
and operate systems, it is not possible to completely understand and most effectively
prevent accidents.
Awareness of the importance of social and organizational aspects of safety goes
back to the early days of System Safety.7 In 1968, Jerome Lederer, then the director
back to the early days of System Safety.7 In 19 68, Jerome Lederer, then the director
of the NASA Manned Flight Safety Program for Apollo, wrote.
System safety covers the total spectrum of risk management. It goes beyond the hardware
and associated procedures of system safety engineering. It involves. attitudes and motiva
@@ -876,7 +876,7 @@ be evaluated? Was a maintenance plan provided before startup? Was all relevant
information provided to planners and managers? Was it used? Was concern for
safety displayed by vigorous, visible personal action by top executives? And so forth.
Johnson originally provided hundreds of such questions, and additions have been
made to his checklist since Johnson created it in the 1970s so it is now even larger.
made to his checklist since Johnson created it in the 19 70s so it is now even larger.
The use of the MORT checklist is feasible because the items are so general, but that
same generality also limits its usefulness. Something more effective than checklists
is needed.
@@ -1090,9 +1090,9 @@ rate has dropped by 35 per cent.
section 2 4 1. Do Operators Cause Most Accidents?
The tendency to blame the operator is not simply a nineteenth century problem,
but persists today. During and after World War 2, the Air Force had serious prob
lems with aircraft accidents. From 1952 to 1966, for example, 7,715 aircraft were lost
lems with aircraft accidents. From 19 52 to 19 66, for example, 7,715 aircraft were lost
and 8,547 people killed .. Most of these accidents were blamed on pilots. Some
aerospace engineers in the 1950s did not believe the cause was so simple and
aerospace engineers in the 19 50s did not believe the cause was so simple and
argued that safety must be designed and built into aircraft just as are performance,
stability, and structural integrity. Although a few seminars were conducted and
papers written about this approach, the Air Force did not take it seriously until

387
chapter03.txt Normal file

@@ -0,0 +1,387 @@
chapter 3.
Systems Theory and Its Relationship to Safety.
To achieve the goals set at the end of the last chapter, a new theoretical underpinning is needed for system safety. Systems theory provides that foundation. This
chapter introduces some basic concepts in systems theory, how this theory is reflected
in system engineering, and how all of this relates to system safety.
section 3 1.
An Introduction to Systems Theory.
Systems theory dates from the 19 30s and 19 40s and was a response to limitations of
the classic analysis techniques in coping with the increasingly complex systems starting to be built at that time . Norbert Wiener applied the approach to control
and communications engineering , while Ludwig von Bertalanffy developed
similar ideas for biology . Bertalanffy suggested that the emerging ideas in
various fields could be combined into a general theory of systems.
In the traditional scientific method, sometimes referred to as divide and conquer,
systems are broken into distinct parts so that the parts can be examined separately.
Physical aspects of systems are decomposed into separate physical components,
while behavior is decomposed into discrete events over time.
This decomposition .(formally called analytic reduction).assumes that the separation
is feasible. that is, each component or subsystem operates independently, and analysis results are not distorted when these components are considered separately. This
assumption in turn implies that the components or events are not subject to feedback loops and other nonlinear interactions and that the behavior of the components is the same when examined singly as when they are playing their part in the
whole. A third fundamental assumption is that the principles governing the assembling of the components into the whole are straightforward, that is, the interactions
among the subsystems are simple enough that they can be considered separate from
the behavior of the subsystems themselves.
These are reasonable assumptions, it turns out, for many of the physical
regularities of the universe. System theorists have described these systems as
displaying organized simplicity .(figure 3 1.).. Such systems can be separated
into non-interacting subsystems for analysis purposes. the precise nature of the
component interactions is known and interactions can be examined pairwise. Analytic reduction has been highly effective in physics and is embodied in structural
mechanics.
Other types of systems display what systems theorists have labeled unorganized
complexity.that is, they lack the underlying structure that allows reductionism to
be effective. They can, however, often be treated as aggregates. They are complex,
but regular and random enough in their behavior that they can be studied statistically. This study is simplified by treating them as a structureless mass with interchangeable parts and then describing them in terms of averages. The basis of this
approach is the law of large numbers. The larger the population, the more likely that
observed values are close to the predicted average values. In physics, this approach
is embodied in statistical mechanics.
A third type of system lies between these two, displaying what systems theorists have called organized complexity. These systems are too complex for complete analysis and too organized for statistics;
the averages are deranged by the underlying structure . Many of the complex
engineered systems of the postWorld War 2 era, as well as biological systems and
social systems, fit into this category. Organized complexity also represents particularly well the problems that are faced by those attempting to build complex software,
and it explains the difficulty computer scientists have had in attempting to apply
analysis and statistics to software.
Systems theory was developed for this third type of system. The systems approach
focuses on systems taken as a whole, not on the parts taken separately. It assumes
that some properties of systems can be treated adequately only in their entirety,
taking into account all facets relating the social to the technical aspects . These
system properties derive from the relationships between the parts of systems. how
the parts interact and fit together . Concentrating on the analysis and design of
the whole as distinct from the components or parts provides a means for studying
systems exhibiting organized complexity.
The foundation of systems theory rests on two pairs of ideas. .(1).emergence and
hierarchy and .(2).communication and control .
section 3 2. Emergence and Hierarchy.
A general model of complex systems can be expressed in terms of a hierarchy of
levels of organization, each more complex than the one below, where a level is characterized by having emergent properties. Emergent properties do not exist at lower
levels; they are meaningless in the language appropriate to those levels. The shape of
an apple, although eventually explainable in terms of the cells of the apple, has no
meaning at that lower level of description. The operation of the processes at the
lower levels of the hierarchy results in a higher level of complexity.that of the whole
apple itself.that has emergent properties, one of them being the apples shape .
The concept of emergence is the idea that at a given level of complexity, some properties characteristic of that level .(emergent at that level).are irreducible.
Hierarchy theory deals with the fundamental differences between one level of
complexity and another. Its ultimate aim is to explain the relationships between
different levels. what generates the levels, what separates them, and what links
them. Emergent properties associated with a set of components at one level in a
hierarchy are related to constraints upon the degree of freedom of those components.
Describing the emergent properties resulting from the imposition of constraints
requires a language at a higher level .(a metalevel).different than that describing the
components themselves. Thus, different languages of description are appropriate at
different levels.
Reliability is a component property.1 Conclusions can be reached about the
reliability of a valve in isolation, where reliability is defined as the probability that
the behavior of the valve will satisfy its specification over time and under given
conditions.
Safety, on the other hand, is clearly an emergent property of systems. Safety can
be determined only in the context of the whole. Determining whether a plant is
acceptably safe is not possible, for example, by examining a single valve in the plant.
In fact, statements about the “safety of the valve” without information about the
context in which that valve is used are meaningless. Safety is determined by the
relationship between the valve and the other plant components. As another example,
pilot procedures to execute a landing might be safe in one aircraft or in one set of
circumstances but unsafe in another.
Although they are often confused, reliability and safety are different properties.
The pilots may reliably execute the landing procedures on a plane or at an airport
in which those procedures are unsafe. A gun when discharged out on a desert with
no other humans or animals for hundreds of miles may be both safe and reliable.
When discharged in a crowded mall, the reliability will not have changed, but the
safety most assuredly has.
Because safety is an emergent property, it is not possible to take a single system
component, like a software module or a single human action, in isolation and assess
its safety. A component that is perfectly safe in one system or in one environment
may not be when used in another.
The new model of accidents introduced in part 2 of this book incorporates the
basic systems theory idea of hierarchical levels, where constraints or lack of constraints at the higher levels control or allow lower-level behavior. Safety is treated
as an emergent property at each of these levels. Safety depends on the enforcement
of constraints on the behavior of the components in the system, including constraints
on their potential interactions. Safety in the batch chemical reactor in the previous
chapter, for example, depends on the enforcement of a constraint on the relationship
between the state of the catalyst valve and the water valve.
footnote. 1. This statement is somewhat of an oversimplification, because the reliability of a system component
can, under some conditions .(e.g., magnetic interference or excessive heat).be impacted by its environment. The basic reliability of the component, however, can be defined and measured in isolation, whereas
the safety of an individual component is undefined except in a specific environment.
section 3 3.
Communication and Control.
The second major pair of ideas in systems theory is communication and control. An
example of regulatory or control action is the imposition of constraints upon the
activity at one level of a hierarchy, which define the “laws of behavior” at that level.
Those laws of behavior yield activity meaningful at a higher level. Hierarchies are
characterized by control processes operating at the interfaces between levels .
The link between control mechanisms studied in natural systems and those engineered in man-made systems was provided by a part of systems theory known as
cybernetics. Checkland writes.
Control is always associated with the imposition of constraints, and an account of a control
process necessarily requires our taking into account at least two hierarchical levels. At a
given level, it is often possible to describe the level by writing dynamical equations, on the
assumption that one particle is representative of the collection and that the forces at other
levels do not interfere. But any description of a control process entails an upper level
imposing constraints upon the lower. The upper level is a source of an alternative .(simpler)
description of the lower level in terms of specific functions that are emergent as a result
of the imposition of constraints .
Note Checklands statement about control always being associated with the
imposition of constraints. Imposing safety constraints plays a fundamental role in
the approach to safety presented in this book. The limited focus on avoiding failures,
which is common in safety engineering today, is replaced by the larger concept of
imposing constraints on system behavior to avoid unsafe events or conditions, that
is, hazards.
Control in open systems .(those that have inputs and outputs from their environment).implies the need for communication. Bertalanffy distinguished between
closed systems, in which unchanging components settle into a state of equilibrium,
and open systems, which can be thrown out of equilibrium by exchanges with their
environment.
In control theory, open systems are viewed as interrelated components that are
kept in a state of dynamic equilibrium by feedback loops of information and control.
The plants overall performance has to be controlled in order to produce the desired
product while satisfying cost, safety, and general quality constraints.
In order to control a process, four conditions are required .
•Goal Condition. The controller must have a goal or goals .(for example, to
maintain the setpoint).
•Action Condition. The controller must be able to affect the state of the system.
In engineering, control actions are implemented by actuators.
•Model Condition. The controller must be .(or contain).a model of the system
(see section 4.3).
•Observability Condition. The controller must be able to ascertain the state of
the system. In engineering terminology, observation of the state of the system
is provided by sensors.
Figure 3 2. shows a typical control loop. The plant controller obtains information
about .(observes).the process state from measured variables .(feedback).and uses this
information to initiate action by manipulating controlled variables to keep the
process operating within predefined limits or set points .(the goal).despite disturbances to the process. In general, the maintenance of any open-system hierarchy
(either biological or man-made).will require a set of processes in which there is
communication of information for regulation or control .
Control actions will generally lag in their effects on the process because of delays
in signal propagation around the control loop. an actuator may not respond immediately to an external command signal .(called dead time); the process may have
delays in responding to manipulated variables .(time constants); and the sensors
may obtain values only at certain sampling intervals .(feedback delays). Time lags
restrict the speed and extent with which the effects of disturbances, both within the
process itself and externally derived, can be reduced. They also impose extra requirements on the controller, for example, the need to infer delays that are not directly
observable.
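To make the four conditions and the loop in figure 3 2 concrete, here is a minimal sketch in Python (an illustration only, not from the book; the process dynamics, the gain of 0.4, the set point of 100, and the noise and disturbance ranges are all invented). The controller has a goal (the set point), observes the process through a noisy sensor, keeps a simple process model, and acts through a rate-limited actuator; a one-step feedback delay stands in for the time lags just described.

import random

SET_POINT = 100.0          # goal condition: the value the controller tries to hold

def sensor(true_value):
    # observability condition: the measured variable, with some measurement noise
    return true_value + random.uniform(-0.5, 0.5)

def actuator(command):
    # action condition: the manipulated variable, limited by actuator capacity
    return max(-5.0, min(5.0, command))

process_value = 90.0       # actual state of the controlled process
model_estimate = 90.0      # model condition: the controller's belief about that state
delayed_measurement = sensor(process_value)   # feedback arrives one step late

for step in range(20):
    model_estimate = delayed_measurement                      # update the process model from feedback
    command = actuator(0.4 * (SET_POINT - model_estimate))    # simple proportional control action
    disturbance = random.uniform(-1.0, 1.0)                   # external disturbance on the process
    delayed_measurement = sensor(process_value)               # sample taken before the process moves
    process_value += command + disturbance                    # process responds to control and disturbance
    print(f"step {step:2d}: estimate={model_estimate:6.2f} command={command:5.2f}")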
The model condition plays an important role in accidents and safety. In order to
create effective control actions, the controller must know the current state of the
controlled process and be able to estimate the effect of various control actions on
that state. As discussed further in section 4.3, many accidents have been caused by
the controller incorrectly assuming the controlled system was in a particular state
and imposing a control action .(or not providing one).that led to a loss. the Mars
Polar Lander descent engine controller, for example, assumed that the spacecraft
was on the surface of the planet and shut down the descent engines. The captain
of the Herald of Free Enterprise thought the car deck doors were shut and left
the mooring.
section 3 4.
Using Systems Theory to Understand Accidents.
Safety approaches based on systems theory consider accidents as arising from the
interactions among system components and usually do not specify single causal
variables or factors . Whereas industrial .(occupational).safety models and
event chain models focus on unsafe acts or conditions, classic system safety models
instead look at what went wrong with the systems operation or organization to
allow the accident to take place.
This systems approach treats safety as an emergent property that arises when
the system components interact within an environment. Emergent properties like
safety are controlled or enforced by a set of constraints .(control laws).related to
the behavior of the system components. For example, the spacecraft descent engines
must remain on until the spacecraft reaches the surface of the planet and the car
deck doors on the ferry must be closed before leaving port. Accidents result from
interactions among components that violate these constraints.in other words,
from a lack of appropriate constraints on the interactions. Component interaction
accidents, as well as component failure accidents, can be explained using these
concepts.
Safety then can be viewed as a control problem. Accidents occur when component failures, external disturbances, and/or dysfunctional interactions among system
components are not adequately controlled. In the space shuttle Challenger loss, the
O-rings did not adequately control propellant gas release by sealing a tiny gap in
the field joint. In the Mars Polar Lander loss, the software did not adequately control
the descent speed of the spacecraft.it misinterpreted noise from a Hall effect
sensor .(feedback of a measured variable).as an indication the spacecraft had reached
the surface of the planet. Accidents such as these, involving engineering design
errors, may in turn stem from inadequate control over the development process. A
Milstar satellite was lost when a typo in the software load tape was not detected
during the development and testing. Control is also imposed by the management
functions in an organization.the Challenger and Columbia losses, for example,
involved inadequate controls in the launch-decision process.
While events reflect the effects of dysfunctional interactions and inadequate
enforcement of safety constraints, the inadequate control itself is only indirectly
reflected by the events.the events are the result of the inadequate control. The
control structure itself must be examined to determine why it was inadequate to
maintain the constraints on safe behavior and why the events occurred.
As an example, the unsafe behavior .(hazard).in the Challenger loss was the
release of hot propellant gases from the field joint. The miscreant O-ring was used
to control the hazard.that is, its role was to seal a tiny gap in the field joint created
by pressure at ignition. The loss occurred because the system design, including the
O-ring, did not effectively impose the required constraint on the propellant gas
release. Starting from here, there are then several questions that need to be answered
to understand why the accident occurred and to obtain the information necessary
to prevent future accidents. Why was this particular design unsuccessful in imposing
the constraint, why was it chosen .(what was the decision process), why was the
flaw not found during development, and was there a different design that might
have been more successful? These questions and others consider the original
design process.
Understanding the accident also requires examining the contribution of the
operations process. Why were management decisions made to launch despite warnings that it might not be safe to do so? One constraint that was violated during
operations was the requirement to correctly handle feedback about any potential
violation of the safety design constraints, in this case, feedback during operations
that the control by the O-rings of the release of hot propellant gases from the field
joints was not being adequately enforced by the design. There were several instances
of feedback that was not adequately handled, such as data about O-ring blowby and
erosion during previous shuttle launches and feedback by engineers who were concerned about the behavior of the O-rings in cold weather. Although the lack of
redundancy provided by the second O-ring was known long before the loss of Challenger, that information was never incorporated into the NASA Marshall Space
Flight Center database and was unknown by those making the launch decision.
In addition, there was missing feedback about changes in the design and testing
procedures during operations, such as the use of a new type of putty and the introduction of new O-ring leak checks without adequate verification that they satisfied
system safety constraints on the field joints. As a final example, the control processes
that ensured unresolved safety concerns were fully considered before each flight,
that is, the flight readiness reviews and other feedback channels to project management making flight decisions, were flawed.
Systems theory provides a much better foundation for safety engineering than
the classic analytic reduction approach underlying event-based models of accidents.
It provides a way forward to much more powerful and effective safety and risk
analysis and management procedures that handle the inadequacies and needed
extensions to current practice described in chapter 2.
Combining a systems-theoretic approach to safety with system engineering
processes will allow designing safety into the system as it is being developed or
reengineered. System engineering provides an appropriate vehicle for this process
because it rests on the same systems theory foundation and involves engineering
the system as a whole.
section 3 5.
Systems Engineering and Safety.
The emerging theory of systems, along with many of the historical forces noted in
chapter 1, gave rise after World War 2 to a new emphasis in engineering, eventually
called systems engineering. During and after the war, technology expanded rapidly
and engineers were faced with designing and building more complex systems than
had been attempted previously. Much of the impetus for the creation of this new
discipline came from military programs in the 19 50s and 19 60s, particularly intercontinental ballistic missile .(ICBM).systems. Apollo was the first nonmilitary government program in which systems engineering was recognized from the beginning
as an essential function .
System Safety, as defined in MIL-STD-882, is a subdiscipline of system engineering. It was created at the same time and for the same reasons. The defense community tried using the standard safety engineering techniques on their complex
new systems, but the limitations became clear when interface and component interaction problems went unnoticed until it was too late, resulting in many losses and
near misses. When these early aerospace accidents were investigated, the causes of
a large percentage of them were traced to deficiencies in design, operations, and
management. Clearly, big changes were needed. System engineering along with its
subdiscipline, System Safety, were developed to tackle these problems.
Systems theory provides the theoretical foundation for systems engineering,
which views each system as an integrated whole even though it is composed of
diverse, specialized components. The objective is to integrate the subsystems into
the most effective system possible to achieve the overall objectives, given a prioritized set of design criteria. Optimizing the system design often requires making
tradeoffs between these design criteria .(goals).
The development of systems engineering as a discipline enabled the solution of
enormously more complex and difficult technological problems than previously
. Many of the elements of systems engineering can be viewed merely as good
engineering. It represents more a shift in emphasis than a change in content. In
addition, while much of engineering is based on technology and science, systems
engineering is equally concerned with overall management of the engineering
process.
A systems engineering approach to safety starts with the basic assumption that
some properties of systems, in this case safety, can only be treated adequately in the
context of the social and technical system as a whole. A basic assumption of systems
engineering is that optimization of individual components or subsystems will not in
general lead to a system optimum; in fact, improvement of a particular subsystem
may actually worsen the overall system performance because of complex, nonlinear
interactions among the components. When each aircraft tries to optimize its path
from its departure point to its destination, for example, the overall air transportation
system throughput may not be optimized when they all arrive at a popular hub at
the same time. One goal of the air traffic control system is to optimize the overall
air transportation system throughput while, at the same time, trying to allow as much
flexibility for the individual aircraft and airlines to achieve their goals. In the end,
if system engineering is successful, everyone gains. Similarly, each pharmaceutical
company acting to optimize its profits, which is a legitimate and reasonable company
goal, will not necessarily optimize the larger societal system goal of producing safe
and effective pharmaceutical and biological products to enhance public health.
These system engineering principles are applicable even to systems beyond those
traditionally thought of as in the engineering realm. The financial system and its
meltdown starting in 2007 is an example of a social system that could benefit from
system engineering concepts.
Another assumption of system engineering is that individual component behavior .(including events or actions).cannot be understood without considering the
components role and interaction within the system as a whole. This basis for systems
engineering has been stated as the principle that a system is more than the sum of
its parts. Attempts to improve long-term safety in complex systems by analyzing and
changing individual components have often proven to be unsuccessful over the long
term. For example, Rasmussen notes that over many years of working in the field
of nuclear power plant safety, he found that attempts to improve safety from models
of local features were compensated for by people adapting to the change in an
unpredicted way .
Approaches used to enhance safety in complex systems must take these basic
systems engineering principles into account. Otherwise, our safety engineering
approaches will be limited in the types of accidents and systems they can handle.
At the same time, approaches that include them, such as those described in this
book, have the potential to greatly improve our ability to engineer safer and more
complex systems.
section 3 6.
Building Safety into the System Design.
System Safety, as practiced by the U.S. defense and aerospace communities as well
as the new approach outlined in this book, fit naturally within the general systems
engineering process and the problem-solving approach that a system view provides.
This problem-solving process entails several steps. First, a need or problem is specified in terms of objectives that the system must satisfy along with criteria that can
be used to rank alternative designs. For a system that has potential hazards, the
objectives will include safety objectives and criteria along with high-level requirements and safety design constraints. The hazards for an automated train system, for
example, might include the train doors closing while a passenger is in the doorway.
The safety-related design constraint might be that obstructions in the path of a
closing door must be detected and the door closing motion reversed.
After the high-level requirements and constraints on the system design are identified, a process of system synthesis takes place that results in a set of alternative
designs. Each of these alternatives is analyzed and evaluated in terms of the stated
objectives and design criteria, and one alternative is selected to be implemented. In
practice, the process is highly iterative. The results from later stages are fed back to
early stages to modify objectives, criteria, design alternatives, and so on. Of course,
the process described here is highly simplified and idealized.
The following are some examples of basic systems engineering activities and the
role of safety within them.
•Needs analysis. The starting point of any system design project is a perceived
need. This need must first be established with enough confidence to justify the
commitment of resources to satisfy it and understood well enough to allow
appropriate solutions to be generated. Criteria must be established to provide
a means to evaluate both the evolving and final system. If there are hazards
associated with the operation of the system, safety should be included in the
needs analysis.
•Feasibility studies. The goal of this step in the design process is to generate a
set of realistic designs. This goal is accomplished by identifying the principal
constraints and design criteria.including safety constraints and safety design
criteria.for the specific problem being addressed and then generating plausible solutions to the problem that satisfy the requirements and constraints and
are physically and economically feasible.
•Trade studies. In trade studies, the alternative feasible designs are evaluated
with respect to the identified design criteria. A hazard might be controlled by
any one of several safeguards. A trade study would determine the relative
desirability of each safeguard with respect to effectiveness, cost, weight, size,
safety, and any other relevant criteria. For example, substitution of one material
for another may reduce the risk of fire or explosion, but may also reduce reliability or efficiency. Each alternative design may have its own set of safety
constraints .(derived from the system hazards).as well as other performance
goals and constraints that need to be assessed. Although decisions ideally should
be based upon mathematical analysis, quantification of many of the key factors
is often difficult, if not impossible, and subjective judgment often has to be used.
•System architecture development and analysis. In this step, the system engineers break down the system into a set of subsystems, together with the functions and constraints, including safety constraints, imposed upon the individual
subsystem designs, the major system interfaces, and the subsystem interface
topology. These aspects are analyzed with respect to desired system performance characteristics and constraints .(again including safety constraints).and
the process is iterated until an acceptable system design results. The preliminary
design at the end of this process must be described in sufficient detail that
subsystem implementation can proceed independently.
•Interface analysis. The interfaces define the functional boundaries of the
system components. From a management standpoint, interfaces must .(1).optimize visibility and control and .(2).isolate components that can be implemented
independently and for which authority and responsibility can be delegated
. From an engineering standpoint, interfaces must be designed to separate
independent functions and to facilitate the integration, testing, and operation
of the overall system. One important factor in designing the interfaces is safety,
and safety analysis should be a part of the system interface analysis. Because
interfaces tend to be particularly susceptible to design error and are implicated
in the majority of accidents, a paramount goal of interface design is simplicity.
Simplicity aids in ensuring that the interface can be adequately designed, analyzed, and tested prior to integration and that interface responsibilities can be
clearly understood.
Any specific realization of this general systems engineering process depends on
the engineering models used for the system components and the desired system
qualities. For safety, the models commonly used to understand why and how accidents occur have been based on events, particularly failure events, and the use of
reliability engineering techniques to prevent them. Part 2 of this book further
details the alternative systems approach to safety introduced in this chapter, while
part 3 provides techniques to perform many of these safety and system engineering
activities.

890
chapter04.txt Normal file

@@ -0,0 +1,890 @@
PART 2.
STAMP. AN ACCIDENT MODEL BASED ON
SYSTEMS THEORY.
Part 2 introduces an expanded accident causality model based on the new assumptions in chapter 2 and satisfying the goals stemming from them. The theoretical
foundation for the new model is systems theory, as introduced in chapter 3. Using
this new causality model, called STAMP .(Systems-Theoretic Accident Model and
Processes), changes the emphasis in system safety from preventing failures to enforcing behavioral safety constraints. Component failure accidents are still included, but
our conception of causality is extended to include component interaction accidents.
Safety is reformulated as a control problem rather than a reliability problem. This
change leads to much more powerful and effective ways to engineer safer systems,
including the complex sociotechnical systems of most concern today.
The three main concepts in this model.safety constraints, hierarchical control
structures, and process models.are introduced first in chapter 4. Then the STAMP
causality model is described, along with a classification of accident causes implied
by the new model.
To provide additional understanding of STAMP, it is used to describe the causes
of several very different types of losses.a friendly fire shootdown of a U.S. Army
helicopter by a U.S. Air Force fighter jet over northern Iraq, the contamination of
a public water system with E. coli bacteria in a small town in Canada, and the loss
of a Milstar satellite. Chapter 5 presents the friendly fire accident analysis. The other
accident analyses are contained in appendixes B and C.
chapter 4.
A Systems-Theoretic View of Causality.
In the traditional causality models, accidents are considered to be caused by chains
of failure events, each failure directly causing the next one in the chain. Part I
explained why these simple models are no longer adequate for the more complex
sociotechnical systems we are attempting to build today. The definition of accident
causation needs to be expanded beyond failure events so that it includes component
interaction accidents and indirect or systemic causal mechanisms.
The first step is to generalize the definition of an accident.1 An accident is an
unplanned and undesired loss event. That loss may involve human death and injury,
but it may also involve other major losses, including mission, equipment, financial,
and information losses.
Losses result from component failures, disturbances external to the system, interactions among system components, and behavior of individual system components
that lead to hazardous system states. Examples of hazards include the release of
toxic chemicals from an oil refinery, a patient receiving a lethal dose of medicine,
two aircraft violating minimum separation requirements, and commuter train doors
opening between stations.
In systems theory, emergent properties, such as safety, arise from the interactions
among the system components. The emergent properties are controlled by imposing
constraints on the behavior of and interactions among the components. Safety then
becomes a control problem where the goal of the control is to enforce the safety
constraints. Accidents result from inadequate control or enforcement of safetyrelated constraints on the development, design, and operation of the system.
At Bhopal, the safety constraint that was violated was that the MIC must not
come in contact with water. In the Mars Polar Lander, the safety constraint was that
the spacecraft must not impact the planet surface with more than a maximum force.
In the batch chemical reactor accident described in chapter 2, one safety constraint
is a limitation on the temperature of the contents of the reactor.
The problem then becomes one of control where the goal is to control the behavior of the system by enforcing the safety constraints in its design and operation.
Controls must be established to accomplish this goal. These controls need not necessarily involve a human or automated controller. Component behavior .(including
failures). and unsafe interactions may be controlled through physical design, through
process .(such as manufacturing processes and procedures, maintenance processes,
and operations), or through social controls. Social controls include organizational
(management), governmental, and regulatory structures, but they may also be cultural, policy, or individual .(such as self-interest). As an example of the latter, one
explanation that has been given for the 2 thousand 9 financial crisis is that when investment
banks went public, individual controls to reduce personal risk and long-term profits
were eliminated and risk shifted to shareholders and others who had few and weak
controls over those taking the risks.
In this framework, understanding why an accident occurred requires determining
why the control was ineffective. Preventing future accidents requires shifting from
a focus on preventing failures to the broader goal of designing and implementing
controls that will enforce the necessary constraints.
The STAMP .(System-Theoretic Accident Model and Processes). accident model
is based on these principles. Three basic constructs underlie STAMP. safety constraints, hierarchical safety control structures, and process models.
section 4 1.
Safety Constraints.
The most basic concept in STAMP is not an event, but a constraint. Events leading
to losses occur only because safety constraints were not successfully enforced.
The difficulty in identifying and enforcing safety constraints in design and operations has increased from the past. In many of our older and less automated systems,
physical and operational constraints were often imposed by the limitations of technology and of the operational environments. Physical laws and the limits of our
materials imposed natural constraints on the complexity of physical designs and
allowed the use of passive controls.
In engineering, passive controls are those that maintain safety by their presence.
basically, the system fails into a safe state or simple interlocks are used to limit
the interactions among system components to safe ones. Some examples of passive
controls that maintain safety by their presence are shields or barriers such as
containment vessels, safety harnesses, hardhats, passive restraint systems in vehicles,
and fences. Passive controls may also rely on physical principles, such as gravity,
to fail into a safe state. An example is an old railway semaphore that used weights
to ensure that if the cable .(controlling the semaphore). broke, the arm would automatically drop into the stop position. Other examples include mechanical relays
designed to fail with their contacts open, and retractable landing gear for aircraft in
which the wheels drop and lock in the landing position if the pressure system that
raises and lowers them fails. For the batch chemical reactor example in chapter 2,
where the order valves are opened is crucial, designers might have used a physical
interlock that did not allow the catalyst valve to be opened while the water valve
was closed.
In contrast, active controls require some action(s). to provide protection. .(1). detection of a hazardous event or condition .(monitoring), .(2). measurement of some
variable(s), .(3). interpretation of the measurement .(diagnosis), and .(4). response
(recovery or fail-safe procedures), all of which must be completed before a loss
occurs. These actions are usually implemented by a control system, which now commonly includes a computer.
Consider the simple passive safety control where the circuit for a high-power
outlet is run through a door that shields the power outlet. When the door is opened,
the circuit is broken and the power disabled. When the door is closed and the power
enabled, humans cannot touch the high power outlet. Such a design is simple and
foolproof. An active safety control design for the same high power source requires
some type of sensor to detect when the access door to the power outlet is opened
and an active controller to issue a control command to cut the power. The failure
modes for the active control system are greatly increased over the passive design,
as is the complexity of the system component interactions. In the railway semaphore
example, there must be a way to detect that the cable has broken .(probably now a
digital system is used instead of a cable so the failure of the digital signaling system
must be detected). and some type of active controls used to warn operators to stop
the train. The design of the batch chemical reactor described in chapter 2 used a
computer to control the valve opening and closing order instead of a simple mechanical interlock.
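The difference can be sketched in a few lines of Python (a hypothetical illustration, not any real design discussed here; the sensor reading, diagnosis, and relay interfaces are invented). Where the passive design removes the hazard by construction, the active version must carry out each of the four steps listed above, and every step, the sensor reading, the interpretation, and the commanded response, is something that can fail.

# Hypothetical active control for the high-power outlet example. The steps
# correspond loosely to the four conditions for active protection:
# monitoring, measurement, diagnosis, and response.

def read_door_switch():
    # (1) and (2) detection and measurement: poll the door sensor; in a real
    # system this reading could be stale, stuck, or simply wrong.
    return {"door_open": True, "valid": True}

def diagnose(reading):
    # (3) interpretation of the measurement
    if not reading["valid"]:
        return "unknown"
    return "hazard" if reading["door_open"] else "safe"

def respond(relay, state):
    # (4) response: cut power on a hazard, and also on doubt (fail toward safety)
    if state in ("hazard", "unknown"):
        relay["energized"] = False

relay = {"energized": True}
reading = read_door_switch()
respond(relay, diagnose(reading))
print("power enabled:", relay["energized"])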
While simple examples are used here for practical reasons, the complexity of our
designs is reaching and exceeding the limits of our intellectual manageability with
a resulting increase in component interaction accidents and lack of enforcement of
the system safety constraints. Even the relatively simple computer-based batch
chemical reactor valve control design resulted in a component interaction accident.
There are often very good reasons to use active controls instead of passive ones,
including increased functionality, more flexibility in design, ability to operate over
large distances, weight reduction, and so on. But the difficulty of the engineering
problem is increased and more potential for design error is introduced.
A similar argument can be made for the interactions between operators and
the processes they control. Cook suggests that when controls were primarily
mechanical and were operated by people located close to the operating process,
proximity allowed sensory perception of the status of the process via direct physical
feedback such as vibration, sound, and temperature .(figure 4.1). Displays were
directly linked to the process and were essentially a physical extension of it. For
example, the flicker of a gauge needle in the cab of a train indicated that .(1). the
engine valves were opening and closing in response to slight pressure fluctuations,
(2). the gauge was connected to the engine, .(3). the pointing indicator was free, and
so on. In this way, the displays provided a rich source of information about the
controlled process and the state of the displays themselves.
The introduction of electromechanical controls allowed operators to control
processes from a greater distance .(both physical and conceptual). than possible with
pure mechanically linked controls .(figure 4.2). That distance, however, meant that
operators lost a lot of direct information about the process.they could no longer
sense the process state directly and the control and display surfaces no longer provided as rich a source of information about the process or the state of the controls
themselves. The system designers had to synthesize and provide an image of the
process state to the operators. An important new source of design errors was introduced by the need for the designers to determine beforehand what information the
operator would need under all conditions to safely control the process. If the designers had not anticipated a particular situation could occur and provided for it in the
original system design, they might also not anticipate the need of the operators for
information about it during operations.
Designers also had to provide feedback on the actions of the operators and on
any failures that might have occurred. The controls could now be operated without
the desired effect on the process, and the operators might not know about it. Accidents started to occur due to incorrect feedback. For example, major accidents
(including Three Mile Island). have involved the operators commanding a valve to
open and receiving feedback that the valve had opened, when in reality it had not.
In this case and others, the valves were wired to provide feedback indicating that
power had been applied to the valve, but not that the valve had actually opened.
Not only could the design of the feedback about success and failures of control
actions be misleading in these systems, but the return links were also subject
to failure.
Electromechanical controls relaxed constraints on the system design allowing
greater functionality .(figure 4.3). At the same time, they created new possibilities
for designer and operator error that had not existed or were much less likely in
mechanically controlled systems. The later introduction of computer and digital
controls afforded additional advantages and removed even more constraints on the
control system design.and introduced more possibility for error. Proximity in our
old mechanical systems provided rich sources of feedback that involved almost all
of the senses, enabling early detection of potential problems. We are finding it hard
to capture and provide these same qualities in new systems that use automated
controls and displays.
It is the freedom from constraints that makes the design of such systems so difficult. Physical constraints enforced discipline and limited complexity in system
design, construction, and modification. The physical constraints also shaped system
design in ways that efficiently transmitted valuable physical component and process
information to operators and supported their cognitive processes.
The same argument applies to the increasing complexity in organizational and
social controls and in the interactions among the components of sociotechnical
systems. Some engineering projects today employ thousands of engineers. The Joint
Strike Fighter, for example, has eight thousand engineers spread over most of the
United States. Corporate operations have become global, with greatly increased
interdependencies and producing a large variety of products. A new holistic approach
to safety, based on control and enforcing safety constraints in the entire sociotechnical system, is needed to ensure safety.
To accomplish this goal, system-level constraints must be identified, and responsibility for enforcing them must be divided up and allocated to appropriate groups.
For example, the members of one group might be responsible for performing hazard
analyses. The manager of this group might be assigned responsibility for ensuring
that the group has the resources, skills, and authority to perform such analyses and
for ensuring that high-quality analyses result. Higher levels of management might
have responsibility for budgets, for establishing corporate safety policies, and for
providing oversight to ensure that safety policies and activities are being carried out
successfully and that the information provided by the hazard analyses is used in
design and operations.
During system and product design and development, the safety constraints will
be broken down and sub-requirements or constraints allocated to the components
of the design as it evolves. In the batch chemical reactor, for example, the system
safety requirement is that the temperature in the reactor must always remain below
a particular level. A design decision may be made to control this temperature using
a reflux condenser. This decision leads to a new constraint. “Water must be flowing
into the reflux condenser whenever catalyst is added to the reactor.” After a decision
is made about what component(s). will be responsible for operating the catalyst and
water valves, additional requirements will be generated. If, for example, a decision
is made to use software rather than .(or in addition to). a physical interlock, the
software must be assigned the responsibility for enforcing the constraint. “The
water valve must always be open when the catalyst valve is open.”
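
The following short Python sketch illustrates how such a component-level constraint might be enforced by a software interlock. It is a minimal illustration only; the valve names and the controller interface are hypothetical and are not taken from any real plant design.

# Minimal sketch of a software interlock enforcing the component-level
# constraint derived above: the water valve must be open whenever the
# catalyst valve is open. Valve names and the controller interface are
# hypothetical, for illustration only.

class ReactorInterlock:
    def __init__(self):
        self.water_valve_open = False
        self.catalyst_valve_open = False

    def command_water_valve(self, open_valve: bool) -> None:
        # Closing the water valve is refused while catalyst is flowing.
        if not open_valve and self.catalyst_valve_open:
            raise RuntimeError("constraint violation: water must flow while catalyst valve is open")
        self.water_valve_open = open_valve

    def command_catalyst_valve(self, open_valve: bool) -> None:
        # Opening the catalyst valve is refused unless water is already flowing.
        if open_valve and not self.water_valve_open:
            raise RuntimeError("constraint violation: open the water valve before the catalyst valve")
        self.catalyst_valve_open = open_valve
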
In order to provide the level of safety demanded by society today, we first need
to identify the safety constraints to enforce and then to design effective controls to
enforce them. This process is much more difficult for todays complex and often
high-tech systems than in the past and new techniques, such as those described in
part THREE, are going to be required to solve it, for example, methods to assist in generating the component safety constraints from the system safety constraints.
The alternative.building only the simple electromechanical systems of the past or
living with higher levels of risk.is for the most part not going to be considered an
acceptable solution.
section 4 2.
The Hierarchical Safety Control Structure.
In systems theory .(see section 3 3.), systems are viewed as hierarchical structures,
where each level imposes constraints on the activity of the level beneath it.that is,
constraints or lack of constraints at a higher level allow or control lower-level
behavior.
Control processes operate between levels to control the processes at lower levels
in the hierarchy. These control processes enforce the safety constraints for which
the control process is responsible. Accidents occur when these processes provide
inadequate control and the safety constraints are violated in the behavior of the
lower-level components.
By describing accidents in terms of a hierarchy of control based on adaptive
feedback mechanisms, adaptation plays a central role in the understanding and
prevention of accidents.
At each level of the hierarchical structure, inadequate control may result from
missing constraints .(unassigned responsibility for safety), inadequate safety control
commands, commands that were not executed correctly at a lower level, or inadequately communicated or processed feedback about constraint enforcement. For
example, an operations manager may provide unsafe work instructions or procedures to the operators, or the manager may provide instructions that enforce the
safety constraints, but the operators may ignore them. The operations manager may
not have the feedback channels established to determine that unsafe instructions
were provided or that his or her safety-related instructions are not being followed.
Figure 4.4 shows a typical sociotechnical hierarchical safety control structure
common in a regulated, safety-critical industry in the United States, such as air
transportation. Each system, of course, must be modeled to include its specific
features. Figure 4.4 has two basic hierarchical control structures.one for system
development .(on the left). and one for system operation .(on the right).with interactions between them. An aircraft manufacturer, for example, might have only
system development under its immediate control, but safety involves both development and operational use of the aircraft, and neither can be accomplished successfully in isolation. Safety during operation depends partly on the original design and
development and partly on effective control over operations. Communication channels may be needed between the two structures. For example, aircraft manufacturers must communicate to their customers the assumptions about the operational
environment upon which the safety analysis was based, as well as information about
safe operating procedures. The operational environment .(e.g., the commercial airline
industry), in turn, provides feedback to the manufacturer about the performance of
the system over its lifetime.
Between the hierarchical levels of each safety control structure, effective communication channels are needed, both a downward reference channel providing the
information necessary to impose safety constraints on the level below and an upward
measuring channel to provide feedback about how effectively the constraints are
being satisfied .(figure 4.5). Feedback is critical in any open system in order to
provide adaptive control. The controller uses the feedback to adapt future control
commands to more readily achieve its goals.
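
A minimal Python sketch of these two channels may make the structure concrete. The class layout and the example constraint and report are illustrative assumptions, not a prescribed modeling notation.

# Minimal sketch of the two channels described above between adjacent
# levels of a safety control structure: a downward reference channel
# carrying constraints and an upward measuring channel carrying feedback.
# The class layout and example strings are illustrative only.

from dataclasses import dataclass, field

@dataclass
class ControlLevel:
    name: str
    constraints_imposed: list[str] = field(default_factory=list)   # reference channel, downward
    feedback_received: list[str] = field(default_factory=list)     # measuring channel, upward

    def impose_constraint(self, lower: "ControlLevel", constraint: str) -> None:
        lower.constraints_imposed.append(constraint)

    def report_feedback(self, upper: "ControlLevel", report: str) -> None:
        upper.feedback_received.append(report)

company = ControlLevel("company management")
project = ControlLevel("project management")
company.impose_constraint(project, "perform a hazard analysis before design freeze")
project.report_feedback(company, "hazard analysis status report")
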
Government, general industry groups, and the court system occupy the top two
levels of each of the generic control structures shown in figure 4.4. The government
control structure in place to control development may differ from that controlling
operations.responsibility for certifying the aircraft developed by aircraft manufacturers is assigned to one group at the FAA, while responsibility for supervising
airline operations is assigned to a different group. The appropriate constraints in
each control structure and at each level will vary but in general may include technical design and process constraints, management constraints, manufacturing constraints, and operational constraints.
At the highest level in both the system development and system operation hierarchies are Congress and state legislatures. Congress controls safety by passing laws
and by establishing and funding government regulatory structures. Feedback as to
the success of these controls or the need for additional ones comes in the form of
government reports, congressional hearings and testimony, lobbying by various
interest groups, and, of course, accidents.
The next level contains government regulatory agencies, industry associations,
user associations, insurance companies, and the court system. Unions have always
played an important role in ensuring safe operations, such as the air traffic controllers union in the air transportation system, or in ensuring worker safety in
manufacturing. The legal system tends to be used when there is no regulatory
authority and the public has no other means to encourage a desired level of concern
for safety in company management. The constraints generated at this level and
imposed on companies are usually in the form of policy, regulations, certification,
standards .(by trade or user associations), or threat of litigation. Where there is a
union, safety-related constraints on operations or manufacturing may result from
union demands and collective bargaining.
Company management takes the standards, regulations, and other general controls on its behavior and translates them into specific policy and standards for the
company. Many companies have a general safety policy .(it is required by law in
Great Britain). as well as more detailed standards documents. Feedback may come
in the form of status reports, risk assessments, and incident reports.
In the development control structure .(shown on the left of figure 4.4), company
policies and standards are usually tailored and perhaps augmented by each engineering project to fit the needs of the particular project. The higher-level control
process may provide only general goals and constraints and the lower levels may
then add many details to operationalize the general goals and constraints given the
immediate conditions and local goals. For example, while government or company
standards may require a hazard analysis be performed, the system designers and
documenters .(including those designing the operational procedures and writing user
manuals). may have control over the actual hazard analysis process used to identify
specific safety constraints on the design and operation of the system. These detailed
procedures may need to be approved by the level above.
The design constraints identified as necessary to control system hazards are
passed to the implementers and assurers of the individual system components
along with standards and other requirements. Success is determined through feedback provided by test reports, reviews, and various additional hazard analyses. At
the end of the development process, the results of the hazard analyses as well
as documentation of the safety-related design features and design rationale should
be passed on to the maintenance group to be used in the system evolution and
sustainment process.
A similar process involving layers of control is found in the system operation
control structure. In addition, there will be .(or at least should be). interactions
between the two structures. For example, the safety design constraints used during
development should form the basis for operating procedures and for performance
and process auditing.
As in any control loop, time lags may affect the flow of control actions and feedback and may impact the effectiveness of the control loop in enforcing the safety
constraints. For example, standards can take years to develop or change.a time
scale that may keep them behind current technology and practice. At the physical
level, new technology may be introduced in different parts of the system at different
rates, which may result in asynchronous evolution of the control structure. In the
accidental shootdown of two U.S. Army Black Hawk helicopters by two U.S. Air
Force F-15s in the no-fly zone over northern Iraq in 1994, for example, the fighter
jet aircraft and the helicopters were inhibited in communicating by radio because
the F-15 pilots used newer jam-resistant radios that could not communicate with
the older-technology Army helicopter radios. Hazard analysis needs to include the
influence of these time lags and potential changes over time.
A common way to deal with such time lags is to delegate responsibility to lower levels that are not subject to as great a delay in obtaining information
or feedback from the measuring channels. In periods of quickly changing technology,
time lags may make it necessary for the lower levels to augment the control processes passed down from above or to modify them to fit the current situation. Time
lags at the lowest levels, as in the Black Hawk shootdown example, may require the
use of feedforward control to overcome lack of feedback or may require temporary
controls on behavior. Communication between the F-15s and the Black Hawks
would have been possible if the F-15 pilots had been told to use an older radio
technology available to them, as they were commanded to do for other types of
friendly aircraft.
More generally, control structures always change over time, particularly those
that include humans and organizational components. Physical devices also change
with time, but usually much more slowly and in more predictable ways. If we are to handle
social and human aspects of safety, then our accident causality models must include
the concept of change. In addition, controls and assurance that the safety control
structure remains effective in enforcing the constraints over time are required.
Control does not necessarily imply rigidity and authoritarian management
styles. Rasmussen notes that control at each level may be enforced in a very prescriptive command and control structure or it may be loosely implemented as performance objectives with many degrees of freedom in how the objectives are met. Recent trends from management by oversight to management by insight
reflect differing levels of feedback control that are exerted over the lower levels and
a change from prescriptive management control to management by objectives,
where the objectives are interpreted and satisfied according to the local context.
Management insight, however, does not mean abdication of safety-related responsibility. In the Milstar satellite and Mars Polar Lander losses, for example, the accident reports note that a poor transition from oversight to insight was a factor in the losses. Attempts to delegate decisions and to manage by objectives require an explicit formulation of the value
criteria to be used and an effective means for communicating the values down
through society and organizations. In addition, the impact of specific decisions at
each level on the objectives and values passed down needs to be adequately and
formally evaluated. Feedback is required to measure how successfully the functions
are being performed.
Although regulatory agencies are included in the figure 4.4 example, there is no
implication that government regulation is required for safety. The only requirement
is that responsibility for safety is distributed in an appropriate way throughout
the sociotechnical system. In aircraft safety, for example, manufacturers play the
major role, while the FAA type certification authority simply provides oversight to ensure that
safety is being successfully engineered into aircraft at the lower levels of the hierarchy. If companies or industries are unwilling or incapable of performing their
public safety responsibilities, then government has to step in to achieve the overall
public safety goals. But a much better solution is for company management to take
responsibility, as it has direct control over the system design and manufacturing and
over operations.
The safety-control structure will differ among industries and examples are spread
among the following chapters. Figure C.1 in appendix C shows the control structure
and safety constraints for the hierarchical water safety control system in Ontario,
Canada. The structure is drawn on its side .(as is more common for control diagrams)
so that the top of the hierarchy is on the left side of the figure. The system hazard
is exposure of the public to E. coli or other health-related contaminants through the
public drinking water system; therefore, the goal of the safety control structure is to
prevent such exposure. This goal leads to two system safety constraints.
1. Water quality must not be compromised.
2. Public health measures must reduce the risk of exposure if water quality is
somehow compromised .(such as notification and procedures to follow).
The physical processes being controlled by this control structure .(shown at the
right of the figure). are the water system, the wells used by the local public utilities,
and public health. Details of the control structure are discussed in appendix C, but
appropriate responsibility, authority, and accountability must be assigned to each
component with respect to the role it plays in the overall control structure. For
example, the responsibility of the Canadian federal government is to establish a
nationwide public health system and ensure that it is operating effectively. The
provincial government must establish regulatory bodies and codes, provide resources
to the regulatory bodies, provide oversight and feedback loops to ensure that the
regulators are doing their job adequately, and ensure that adequate risk assessment
is conducted and effective risk management plans are in place. Local public utility
operations must apply adequate doses of chlorine to kill bacteria, measure the
chlorine residuals, and take further steps if evidence of bacterial contamination is
found. While chlorine residuals are a quick way to get feedback about possible
contamination, more accurate feedback is provided by analyzing water samples but
takes longer .(it has a greater time lag). Both have their uses in the overall safety
control structure of the public water supply.
Safety control structures may be very complex. Abstracting and concentrating on
parts of the overall structure may be useful in understanding and communicating
about the controls. In examining different hazards, only subsets of the overall structure may be relevant and need to be considered in detail and the rest can be treated
as the inputs to or the environment of the substructure. The only critical part is that
the hazards must first be identified at the system level and the process must then
proceed top-down and not bottom-up to identify the safety constraints for the parts
of the overall control structure.
The operation of sociotechnical safety control structures at all levels is facing the
stresses noted in chapter 1, such as rapidly changing technology, competitive and
time-to-market pressures, and changing public and regulatory views of responsibility
for safety. These pressures can lead to a need for new procedures or new controls
to ensure that required safety constraints are not ignored.
section 4 3.
Process Models.
The third concept used in STAMP, along with safety constraints and hierarchical
safety control structures, is process models. Process models are an important part of
control theory. The four conditions required to control a process are described in
chapter 3. The first is a goal, which in STAMP is the safety constraints that must
be enforced by each controller in the hierarchical safety control structure. The
action condition is implemented in the .(downward). control channels and the observability condition is embodied in the .(upward). feedback or measuring channels. The
final condition is the model condition. Any controller.human or automated.
needs a model of the process being controlled to control it effectively .(figure 4.6).
At one extreme, this process model may contain only one or two variables, such
as the model required for a simple thermostat, which contains the current temperature and the setpoint and perhaps a few control laws about how temperature is
changed. At the other extreme, effective control may require a very complex model
with a large number of state variables and transitions, such as the model needed to
control air traffic.
Whether the model is embedded in the control logic of an automated controller
or in the mental model maintained by a human controller, it must contain the same
type of information. the required relationship among the system variables .(the
control laws), the current state .(the current values of the system variables), and the
ways the process can change state. This model is used to determine what control
actions are needed, and it is updated through various forms of feedback. If the model
of the room temperature shows that the ambient temperature is less than the setpoint, then the thermostat issues a control command to start a heating element.
Temperature sensors provide feedback about the .(hopefully rising). temperature.
This feedback is used to update the thermostats model of the current room temperature. When the setpoint is reached, the thermostat turns off the heating element.
In the same way, human operators also require accurate process or mental models
to provide safe control actions.
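
The thermostat example can be written out as a few lines of Python to show how the goal, action, observability, and model conditions fit together. This is a minimal sketch under the assumptions stated in the text; the names are illustrative.

# Minimal sketch of the thermostat example: the process model holds only
# the current temperature estimate and the setpoint, and is updated from
# sensor feedback before each control decision. Names are illustrative,
# not drawn from any particular product.

class Thermostat:
    def __init__(self, setpoint: float):
        self.setpoint = setpoint          # goal condition
        self.modeled_temperature = None   # process model: current state estimate

    def update_model(self, measured_temperature: float) -> None:
        # Observability condition: feedback updates the process model.
        self.modeled_temperature = measured_temperature

    def control_action(self) -> str:
        # Action condition: the command is chosen by comparing the model to the goal.
        if self.modeled_temperature is None:
            return "no_action"            # no feedback yet, model not initialized
        if self.modeled_temperature < self.setpoint:
            return "heater_on"
        return "heater_off"
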
Component interaction accidents can usually be explained in terms of incorrect
process models. For example, the Mars Polar Lander software thought the spacecraft
had landed and issued a control instruction to shut down the descent engines. The
captain of the Herald of Free Enterprise thought the ferry doors were closed and
ordered the ship to leave the mooring. The pilots in the Cali Colombia B757 crash
thought R was the symbol denoting the radio beacon near Cali.
In general, accidents often occur, particularly component interaction accidents
and accidents involving complex digital technology or human error, when the
process model used by the controller .(automated or human). does not match the
process and, as a result.
1. Incorrect or unsafe control commands are given
2. Required control actions .(for safety). are not provided
3. Potentially correct control commands are provided at the wrong time .(too
early or too late), or
4. Control is stopped too soon or applied too long.
These four types of inadequate control actions are used in the new hazard analysis technique described in chapter 8.
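
As a minimal sketch of how these four types might be recorded during a hazard analysis, the following Python fragment defines them as an enumeration and captures one illustrative entry based on the Mars Polar Lander example mentioned earlier in the chapter. The data layout is an assumption for illustration, not a prescribed format.

# Minimal sketch of recording the four types of inadequate control
# actions listed above. The record layout is hypothetical.

from dataclasses import dataclass
from enum import Enum

class UnsafeControlActionType(Enum):
    UNSAFE_PROVIDED = "incorrect or unsafe command given"
    NOT_PROVIDED = "required control action not provided"
    WRONG_TIMING = "correct command given too early or too late"
    WRONG_DURATION = "control stopped too soon or applied too long"

@dataclass
class UnsafeControlAction:
    controller: str
    control_action: str
    uca_type: UnsafeControlActionType
    hazard: str

example = UnsafeControlAction(
    controller="descent engine controller",
    control_action="shut down descent engines",
    uca_type=UnsafeControlActionType.WRONG_TIMING,
    hazard="engines shut down before the lander reaches the surface",
)
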
A model of the process being controlled is required not just at the lower physical
levels of the hierarchical control structure, but at all levels. In order to make proper
decisions, the manager of an oil refinery may need to have a model of the current
maintenance level of the safety equipment of the refinery, the state of safety training
of the workforce, and the degree to which safety requirements are being followed
or are effective, among other things. The CEO of the global oil conglomerate has a
much less detailed model of the state of the refineries he controls but at the same
time requires a broader view of the state of safety of all the corporate assets in order
to make appropriate corporate-level decisions impacting safety.
Process models are not only used during operations but also during system development activities. Designers use both models of the system being designed and
models of the development process itself. The developers may have an incorrect
model of the system or software behavior necessary for safety or the physical laws
controlling the system. Safety may also be impacted by developers incorrect models
of the development process itself.
As an example of the latter, a Titan/Centaur satellite launch system, along with
the Milstar satellite it was transporting into orbit, was lost due to a typo in a load
tape used by the computer to determine the attitude change instructions to issue to
the engines. The information on the load tape was essentially part of the process
model used by the attitude control software. The typo was not caught during the
development process partly because of flaws in the developers models of the testing
process.each thought someone else was testing the software using the actual load
tape when, in fact, nobody was .(see appendix B).
In summary, process models play an important role .(1). in understanding why
accidents occur and why humans provide inadequate control over safety-critical
systems and .(2). in designing safer systems.
section 4.4.
STAMP.
The STAMP .(Systems-Theoretic Accident Model and Process). model of accident
causation is built on these three basic concepts.safety constraints, a hierarchical
safety control structure, and process models.along with basic systems theory concepts. All the pieces for a new causation model have been presented. It is now simply
a matter of putting them together.
In STAMP, systems are viewed as interrelated components kept in a state of
dynamic equilibrium by feedback control loops. Systems are not treated as static
but as dynamic processes that are continually adapting to achieve their ends and to
react to changes in themselves and their environment.
Safety is an emergent property of the system that is achieved when appropriate
constraints on the behavior of the system and its components are satisfied. The
original design of the system must not only enforce appropriate constraints on
behavior to ensure safe operation, but the system must continue to enforce the
safety constraints as changes and adaptations to the system design occur over time.
Accidents are the result of flawed processes involving interactions among people,
societal and organizational structures, engineering activities, and physical system
components that lead to violating the system safety constraints. The process leading
up to an accident is described in STAMP in terms of an adaptive feedback function
that fails to maintain safety as system performance changes over time to meet a
complex set of goals and values.
Instead of defining safety management in terms of preventing component
failures, it is defined as creating a safety control structure that will enforce the
behavioral safety constraints and ensure its continued effectiveness as changes
and adaptations occur over time. Effective safety .(and risk). management may
require limiting the types of changes that occur but the goal is to allow as much
flexibility and performance enhancement as possible while enforcing the safety
constraints.
Accidents can be understood, using STAMP, by identifying the safety constraints
that were violated and determining why the controls were inadequate in enforcing
them. For example, understanding the Bhopal accident requires determining not
simply why the maintenance personnel did not insert the slip blind, but also why
the controls that had been designed into the system to prevent the release of hazardous chemicals and to mitigate the consequences of such occurrences.including
maintenance procedures and oversight of maintenance processes, refrigeration units,
gauges and other monitoring units, a vent scrubber, water spouts, a flare tower,
safety audits, alarms and practice alerts, emergency procedures and equipment, and
others.were not successful.
STAMP not only allows consideration of more accident causes than simple component failures, but it also allows more sophisticated analysis of failures and component failure accidents. Component failures may result from inadequate constraints
on the manufacturing process; inadequate engineering design such as missing or
incorrectly implemented fault tolerance; lack of correspondence between individual
component capacity .(including human capacity). and task requirements; unhandled
environmental disturbances .(e.g., electromagnetic interference or EMI); inadequate
maintenance; physical degradation .(wearout); and so on.
Component failures may be prevented by increasing the integrity or resistance
of the component to internal or external influences or by building in safety margins
or safety factors. They may also be avoided by operational controls, such as
operating the component within its design envelope and by periodic inspections and
preventive maintenance. Manufacturing controls can reduce deficiencies or flaws
introduced during the manufacturing process. The effects of physical component
failure on system behavior may be eliminated or reduced by using redundancy. The
important difference from other causality models is that STAMP goes beyond
simply blaming component failure for accidents by requiring that the reasons be
identified for why those failures occurred .(including systemic factors). and led to an
accident, that is, why the controls instituted for preventing such failures or for minimizing their impact on safety were missing or inadequate. And it includes other
types of accident causes, such as component interaction accidents, which are becoming more frequent with the introduction of new technology and new roles for
humans in system control.
STAMP does not lend itself to a simple graphic representation of accident causality .(see figure 4.7). While dominoes, event chains, and holes in Swiss cheese are very
compelling because they are easy to grasp, they oversimplify causality and thus the
approaches used to prevent accidents.
section 4.5.
A General Classification of Accident Causes.
Starting from the basic definitions in STAMP, the general causes of accidents can
be identified using basic systems and control theory. The resulting classification is
useful in accident analysis and accident prevention activities.
Accidents in STAMP are the result of a complex process that results in the system
behavior violating the safety constraints. The safety constraints are enforced by the
control loops between the various levels of the hierarchical control structure that
are in place during design, development, manufacturing, and operations.
Using the STAMP causality model, if there is an accident, one or more of the
following must have occurred.
1. The safety constraints were not enforced by the controller.
a. The control actions necessary to enforce the associated safety constraint at
each level of the sociotechnical control structure for the system were not
provided.
b. The necessary control actions were provided but at the wrong time .(too
early or too late). or stopped too soon.
c. Unsafe control actions were provided that caused a violation of the safety
constraints.
2. Appropriate control actions were provided but not followed.
These same general factors apply at each level of the sociotechnical control structure, but the interpretation .(application). of the factor at each level may differ.
Classification of accident causal factors starts by examining each of the basic
components of a control loop .(see figure 3.2). and determining how their improper
operation may contribute to the general types of inadequate control.
Figure 4.8 shows the classification. The causal factors in accidents can be divided
into three general categories. .(1). the controller operation, .(2). the behavior of actuators and controlled processes, and .(3). communication and coordination among
controllers and decision makers. When humans are involved in the control structure, context and behavior-shaping mechanisms also play an important role in
causality.
4.5.1 Controller Operation
Controller operation has three primary parts. control inputs and other relevant
external information sources, the control algorithms, and the process model. Inadequate, ineffective, or missing control actions necessary to enforce the safety constraints and ensure safety can stem from flaws in each of these parts. For human
controllers and actuators, context is also an important factor.
Unsafe Inputs .(① in figure 4.8).
Each controller in the hierarchical control structure is itself controlled by higher-level controllers. The control actions and other information provided by the higher
level and required for safe behavior may be missing or wrong. Using the Black Hawk
friendly fire example again, the F-15 pilots patrolling the no-fly zone were given
instructions to switch to a non-jammed radio mode for a list of aircraft types that
did not have the ability to interpret jammed broadcasts. Black Hawk helicopters
had not been upgraded with new anti-jamming technology but were omitted from
the list and so could not hear the F-15 radio broadcasts. Other types of missing or
wrong noncontrol inputs may also affect the operation of the controller.
Unsafe Control Algorithms .(② in figure 4.8).
Algorithms in this sense are both the procedures designed by engineers for hardware controllers and the procedures that human controllers use. Control algorithms
may not enforce safety constraints because the algorithms are inadequately designed
originally, the process may change and the algorithms become unsafe, or the control
algorithms may be inadequately modified by maintainers if the algorithms are automated or through various types of natural adaptation if they are implemented by
humans. Human control algorithms are affected by initial training, by the procedures
provided to the operators to follow, and by feedback and experimentation over time
(see figure 2.9).
Time delays are an important consideration in designing control algorithms. Any
control loop includes time lags, such as the time between the measurement of
process parameters and receiving those measurements or between issuing a
command and the time the process state actually changes. For example, pilot
response delays are important time lags that must be considered in designing the
control function for TCAS or other aircraft systems, as are time lags in the controlled process.the aircraft trajectory, for example.caused by aircraft performance limitations.
Delays may not be directly observable, but may need to be inferred. Depending
on where in the feedback loop the delay occurs, different control algorithms are
required to cope with the delays. Dead time and time constants require an algorithm that makes it possible to predict when an action will be needed before the need arises. Feedback delays generate requirements to predict when a prior control action
has taken effect and when resources will be available again. Such requirements may
impose the need for some type of open loop or feedforward strategy to cope with
delays. When time delays are not adequately considered in the control algorithm,
accidents can result.
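
A small simulation can make the point concrete. In the following Python sketch, a bang-bang heater controller acts on temperature feedback that arrives several steps late and therefore overshoots its setpoint. The dynamics and numbers are arbitrary illustrative assumptions.

# Minimal sketch showing how a feedback time lag can defeat a control
# algorithm that ignores it. The controller acts on a temperature reading
# that is several steps old, so it keeps heating past the setpoint.

def simulate(delay_steps: int, setpoint: float = 20.0, steps: int = 40) -> float:
    temperature = 10.0
    history = [temperature] * (delay_steps + 1)   # measurements still in transit
    peak = temperature
    for _ in range(steps):
        observed = history[0]                     # stale feedback
        heater_on = observed < setpoint           # control law ignores the lag
        temperature += 1.0 if heater_on else -0.5
        peak = max(peak, temperature)
        history = history[1:] + [temperature]
    return peak

print("peak temperature with no lag:    ", simulate(0))
print("peak temperature with 4-step lag:", simulate(4))
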
Leplat has noted that many accidents relate to asynchronous evolution,
where one part of a system .(in this case the hierarchical safety control structure)
changes without the related necessary changes in other parts. Changes to subsystems
may be carefully designed, but consideration of their effects on other parts of the
system, including the safety control aspects, may be neglected or inadequate. Asynchronous evolution may also occur when one part of a properly designed system
deteriorates.
In both these cases, the erroneous expectations of users or system components
about the behavior of the changed or degraded subsystem may lead to accidents.
The Ariane 5 trajectory changed from that of the Ariane 4, but the inertial reference
system software was not changed. As a result, an assumption of the inertial reference
software was violated and the spacecraft was lost shortly after launch. One factor
in the loss of contact with SOHO .(SOlar Heliospheric Observatory), a scientific
spacecraft, in 19 98 was the failure to communicate to operators that a functional
change had been made in a procedure to perform gyro spin down. The Black Hawk
friendly fire accident .(analyzed in chapter 5). had several examples of asynchronous
evolution, for example the mission changed and an individual key to communication
between the Air Force and Army left, leaving the safety control structure without
an important component.
Communication is a critical factor here as well as monitoring for changes that
may occur and feeding back this information to the higher-level control. For example,
the safety analysis process that generates constraints always involves some basic
assumptions about the operating environment of the process. When the environment changes such that those assumptions are no longer true, as in the Ariane 5 and
SOHO examples, the controls in place may become inadequate. Embedded pacemakers provide another example. These devices were originally assumed to be used
only in adults, who would lie quietly in the doctors office while the pacemaker was
being “programmed.” Later these devices began to be used in children, and the
assumptions under which the hazard analysis was conducted and the controls were
designed no longer held and needed to be revisited. A requirement for effective
updating of the control algorithms is that the assumptions of the original .(and subsequent). analysis are recorded and retrievable.
Inconsistent, Incomplete, or Incorrect Process Models .(③ in figure 4.8)
Section 4.3 stated that effective control is based on a model of the process state.
Accidents, particularly component interaction accidents, most often result from
inconsistencies between the models of the process used by the controllers .(both
human and automated). and the actual process state. When the controllers model of
the process .(either the human mental model or the software or hardware model)
diverges from the process state, erroneous control commands .(based on the incorrect model). can lead to an accident. for example, .(1). the software does not know that
the plane is on the ground and raises the landing gear, or .(2). the controller .(automated or human). does not identify an object as friendly and shoots a missile at it, or
(3). the pilot thinks the aircraft controls are in speed mode but the computer has
changed the mode to open descent and the pilot behaves inappropriately for that
mode, or .(4). the computer does not think the aircraft has landed and overrides the
pilots attempts to operate the braking system. All of these examples have actually
occurred.
The mental models of the system developers are also important. During software
development, for example, the programmers models of required behavior may not
match the engineers models .(commonly referred to as a software requirements
error), or the software may be executed on computer hardware or may control
physical systems during operations that differ from what was assumed by the programmer and used during testing. The situation becomes even more complicated
when there are multiple controllers .(both human and automated). because each of
their process models must also be kept consistent.
The most common form of inconsistency occurs when one or more process
models is incomplete in terms of not defining appropriate behavior for all possible
process states or all possible disturbances, including unhandled or incorrectly
handled component failures. Of course, no models are complete in the absolute
sense. The goal is to make them complete enough that no safety constraints are
violated when they are used. Criteria for completeness in this sense are presented
in Safeware, and completeness analysis is integrated into the new hazard analysis
method as described in chapter 9.
How does the process model become inconsistent with the actual process state?
The process model designed into the system .(or provided by training if the controller is human). may be wrong from the beginning, there may be missing or incorrect
feedback for updating the process model as the controlled process changes state,
the process model may be updated incorrectly .(an error in the algorithm of the
controller), or time lags may not be accounted for. The result can be uncontrolled
disturbances, unhandled process states, inadvertent commanding of the system into
a hazardous state, unhandled or incorrectly handled controlled process component
failures, and so forth.
Feedback is critically important to the safe operation of the controller. A basic
principle of system theory is that no control system will perform better than its
measuring channel. Feedback may be missing or inadequate because such feedback
is not included in the system design, flaws exist in the monitoring or feedback
communication channel, the feedback is not timely, or the measuring instrument
operates inadequately.
A contributing factor cited in the Cali B757 accident report, for example, was the
omission of the waypoints behind the aircraft from cockpit displays, which contributed to the crew not realizing that the waypoint for which they were searching was
behind them .(missing feedback). The model of the Ariane 501 attitude used by the
attitude control software became inconsistent with the launcher attitude when an
error message sent by the inertial reference system was interpreted by the attitude
control system as data .(incorrect processing of feedback), causing the spacecraft
onboard computer to issue an incorrect and unsafe command to the booster and
main engine nozzles.
Other reasons for the process models to diverge from the true system state may
be more subtle. Information about the process state has to be inferred from measurements. For example, in the TCAS TWO aircraft collision avoidance system, relative
range positions of other aircraft are computed based on round-trip message propagation time. The theoretical control function .(control law). uses the true values of
the controlled variables or component states .(e.g., true aircraft positions). However,
at any time, the controller has only measured values, which may be subject to time
lags or inaccuracies. The controller must use these measured values to infer the true
conditions in the process and, if necessary, to derive corrective actions to maintain
the required process state. In the TCAS example, sensors include on-board devices
such as altimeters that provide measured altitude .(not necessarily true altitude). and
antennas for communicating with other aircraft. The primary TCAS actuator is the
pilot, who may or may not respond to system advisories. The mapping between the
measured or assumed values and the true values can be flawed.
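
The following Python sketch illustrates the general idea of inferring range from round-trip message time and how a small timing error propagates into the process model of the controller. The turnaround delay and the timing error used here are illustrative assumptions, not values taken from the TCAS specification.

# Minimal sketch: relative range is inferred from round-trip message
# propagation time, so the controller holds a measured value, not the
# true one. Turnaround delay and timing error are illustrative only.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def estimated_range_m(round_trip_s: float, turnaround_s: float) -> float:
    # One-way propagation time is half the round trip once the fixed
    # transponder reply delay has been removed.
    return SPEED_OF_LIGHT_M_PER_S * (round_trip_s - turnaround_s) / 2.0

true_range_m = 10_000.0
turnaround_s = 128e-6     # assumed fixed reply delay
true_round_trip_s = 2.0 * true_range_m / SPEED_OF_LIGHT_M_PER_S + turnaround_s

timing_error_s = 0.2e-6   # assumed clock and measurement error
modeled = estimated_range_m(true_round_trip_s + timing_error_s, turnaround_s)
print(f"true range {true_range_m:.0f} m, modeled range {modeled:.0f} m")
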
To summarize, process models can be incorrect from the beginning.where
correct is defined in terms of consistency with the current process state and with
the models being used by other controllers.or they can become incorrect due to
erroneous or missing feedback or measurement inaccuracies. They may also be
incorrect only for short periods of time due to time lags in the process loop.
4.5.2. Actuators and Controlled Processes .(④ in figure 4.8)
The factors discussed so far have involved inadequate control. The other case occurs
when the control commands maintain the safety constraints, but the controlled
process may not implement these commands. One reason might be a failure or flaw
in the reference channel, that is, in the transmission of control commands. Another
reason might be an actuator or controlled component fault or failure. A third is that
the safety of the controlled process may depend on inputs from other system components, such as power, for the execution of the control actions provided. If these
process inputs are missing or inadequate in some way, the controlled process may
be unable to execute the control commands and accidents may result. Finally, there
may be external disturbances that are not handled by the controller.
In a hierarchical control structure, the actuators and controlled process may
themselves be a controller of a lower-level process. In this case, the flaws in executing the control are the same as those described earlier for a controller.
Once again, these types of flaws do not simply apply to operations or to the
technical system but also to system design and development. For example, a common
flaw in system development is that the safety information gathered or created by
the system safety engineers .(the hazards and the necessary design constraints to
control them). is inadequately communicated to the system designers and testers, or
that flaws exist in the use of this information in the system development process.
section 4.5.3. Coordination and Communication among Controllers and Decision Makers.
When there are multiple controllers .(human and/or automated), control actions
may be inadequately coordinated, including unexpected side effects of decisions
or actions or conflicting control actions. Communication flaws play an important
role here.
Leplat suggests that accidents are most likely in boundary areas or in overlap areas, where two or more controllers .(human or automated). control the same process or processes with common boundaries .(figure 4.9). In both boundary
and overlap areas, the potential exists for ambiguity and for conflicts among
independent decisions.
Responsibility for the control functions in boundary areas is often poorly defined.
For example, Leplat cites an iron and steel plant where frequent accidents occurred
at the boundary of the blast furnace department and the transport department. One
conflict arose when a signal informing transport workers of the state of the blast
furnace did not work and was not repaired because each department was waiting
for the other to fix it. Faverge suggests that such dysfunction can be related to the
number of management levels separating the workers in the departments from a
common manager. The greater the distance, the more difficult the communication,
and thus the greater the uncertainty and risk.
Coordination problems in the control of boundary areas are rife. As mentioned
earlier, a Milstar satellite was lost due to inadequate attitude control of the Titan/
Centaur launch vehicle, which used an incorrect process model based on erroneous
inputs on a software load tape. After the accident, it was discovered that nobody
had tested the software using the actual load tape.each group involved in testing
and assurance had assumed some other group was doing so. In the system development process, system engineering and mission assurance activities were missing or
ineffective, and a common control or management function was quite distant from
the individual development and assurance groups .(see appendix B). One factor
in the loss of the Black Hawk helicopters to friendly fire over northern Iraq was
that the helicopters normally flew only in the boundary areas of the no-fly zone and
procedures for handling aircraft in those areas were ill defined. Another factor was
that an Army base controlled the flights of the Black Hawks, while an Air Force
base controlled all the other components of the airspace. A common control point
once again was high above where the accident occurred in the control structure. In
addition, communication problems existed between the Army and Air Force bases
at the intermediate control levels.
Overlap areas exist when a function is achieved by the cooperation of two controllers or when two controllers exert influence on the same object. Such overlap
creates the potential for conflicting control actions .(dysfunctional interactions
among control actions). Leplat cites a study of the steel industry that found 67
percent of technical incidents with material damage occurred in areas of co-activity,
although these represented only a small percentage of the total activity areas. In an
A320 accident in Bangalore, India, the pilot had disconnected his flight director
during approach and assumed that the copilot would do the same. The result would
have been a mode configuration in which airspeed is automatically controlled by
the autothrottle .(the speed mode), which is the recommended procedure for the
approach phase. However, the copilot had not turned off his flight director, which
meant that open descent mode became active when a lower altitude was selected
instead of speed mode, eventually contributing to the crash of the aircraft short of
the runway. In the Black Hawks shootdown by friendly fire, the aircraft surveillance officer .(A S O). thought she was responsible only for identifying and tracking aircraft south of the 36th Parallel, while the air traffic controller for the area
north of the 36th Parallel thought the A S O was also tracking and identifying aircraft
in his area and acted accordingly.
In 2002, two aircraft collided over southern Germany. An important factor in the
accident was the lack of coordination between the airborne TCAS .(collision avoidance). system and the ground air traffic controller. They each gave different and
conflicting advisories on how to avoid a collision. If both pilots had followed one
or the other, the loss would have been avoided, but one followed the TCAS advisory
and the other followed the ground air traffic control advisory.
section 4.5.4. Context and Environment.
Flawed human decision making can result from incorrect information and inaccurate process models, as described earlier. But human behavior is also greatly
impacted by the context and environment in which the human is working. These
factors have been called “behavior shaping mechanisms.” While value systems and
other influences on decision making can be considered to be inputs to the controller,
describing them in this way oversimplifies their role and origin. A classification of
the contextual and behavior-shaping mechanisms is premature at this point, but
relevant principles and heuristics are elucidated throughout the rest of the book.
section 4.6.
Applying the New Model.
To summarize, STAMP focuses particular attention on the role of constraints in
safety management. Accidents are seen as resulting from inadequate control or
enforcement of constraints on safety-related behavior at each level of the system
development and system operations control structures. Accidents can be understood
in terms of why the controls that were in place did not prevent or detect maladaptive changes.
Accident causal analysis based on STAMP starts with identifying the safety constraints that were violated and then determines why the controls designed to enforce
the safety constraints were inadequate or, if they were potentially adequate, why
the system was unable to exert appropriate control over their enforcement.
In this conception of safety, there is no “root cause.” Instead, the accident “cause”
consists of an inadequate safety control structure that under some circumstances
leads to the violation of a behavioral safety constraint. Preventing future accidents
requires reengineering or designing the safety control structure to be more effective.
Because the safety control structure and the behavior of the individuals in it, like
any physical or social system, change over time, accidents must be viewed as
dynamic processes. Looking only at the time of the proximal loss events distorts and
omits from view the most important aspects of the larger accident process that are
needed to prevent reoccurrences of losses from the same causes in the future.
Without that view, we see and fix only the symptoms, that is, the results of the flawed
processes and inadequate safety control structure without getting to the sources of
those symptoms.
To understand the dynamic aspects of accidents, the process leading to the loss
can be viewed as an adaptive feedback function where the safety control system
performance degrades over time as the system attempts to meet a complex set of
goals and values. Adaptation is critical in understanding accidents, and the adaptive
feedback mechanism inherent in the model allows a STAMP analysis to incorporate
adaptation as a fundamental system property.
We have found in practice that using this model helps us to separate factual
data from the interpretations of that data. While the events and physical data
involved in accidents may be clear, their importance and the explanations for why
the factors were present are often subjective as is the selection of the events to
consider.
STAMP models are also more complete than most accident reports and other
models. Each of the explanations for the incorrect
FMS input of R in the Cali American Airlines accident described in chapter 2, for
example, appears in the STAMP analysis of that accident at the appropriate levels
of the control structure where they operated. The use of STAMP helps not only to
identify the factors but also to understand the relationships among them.
While STAMP models will probably not be useful in lawsuits, as they do not
assign blame for the accident to a specific person or group, they do provide more
help in understanding accidents by forcing examination of each part of the sociotechnical system to see how it contributed to the loss.and there will usually be
contributions at each level. Such understanding should help in learning how to
engineer safer systems, including the technical, managerial, organizational, and regulatory aspects.
To accomplish this goal, a framework for classifying the factors that lead to accidents was derived from the basic underlying conceptual accident model .(see figure
4.8). This classification can be used in identifying the factors involved in a particular
accident and in understanding their role in the process leading to the loss. The accident investigation after the Black Hawk shootdown .(analyzed in detail in the next
chapter). identified 130 different factors involved in the accident. In the end, only
the AWACS senior director was court-martialed, and he was acquitted. The more
one knows about an accident process, the more difficult it is to find one person or
part of the system responsible, but the easier it is to find effective ways to prevent
similar occurrences in the future.
STAMP is useful not only in analyzing accidents that have occurred but in developing new and potentially more effective system engineering methodologies to
prevent accidents. Hazard analysis can be thought of as investigating an accident
before it occurs. Traditional hazard analysis techniques, such as fault tree analysis
and various types of failure analysis techniques, do not work well for very complex
systems, for software errors, human errors, and system design errors. Nor do they
usually include organizational and management flaws. The problem is that these
hazard analysis techniques are limited by a focus on failure events and the role of
component failures in accidents; they do not account for component interaction
accidents, the complex roles that software and humans are assuming in high-tech
systems, the organizational factors in accidents, and the indirect relationships
between events and actions required to understand why accidents occur.
STAMP provides a direction to take in creating these new hazard analysis and
prevention techniques. Because in a system accident model everything starts from
constraints, the new approach focuses on identifying the constraints required to
maintain safety; identifying the flaws in the control structure that can lead to an
accident .(inadequate enforcement of the safety constraints); and then designing
a control structure, physical system, and operating conditions that enforce the
constraints.
Such hazard analysis techniques augment the typical failure-based design focus
and encourage a wider variety of risk reduction measures than simply adding redundancy and overdesign to deal with component failures. The new techniques also
provide a way to implement safety-guided design so that safety analysis guides the
design generation rather than waiting until a design is complete to discover it is
unsafe. Part THREE describes ways to use techniques based on STAMP to prevent accidents through system design, including design of the operating conditions and the
safety management control structure.
STAMP can also be used to improve performance analysis. Performance monitoring of complex systems has created some dilemmas. Computers allow the collection
of massive amounts of data, but analyzing that data to determine whether the system
is moving toward the boundaries of safe behavior is difficult. The use of an accident
model based on system theory and the basic concept of safety constraints may
provide directions for identifying appropriate safety metrics and leading indicators;
determining whether control over the safety constraints is adequate; evaluating the
assumptions about the technical failures and potential design errors, organizational
structure, and human behavior underlying the hazard analysis; detecting errors in
the operational and environmental assumptions underlying the design and the organizational culture; and identifying any maladaptive changes over time that could
increase risk of accidents to unacceptable levels.
Finally, STAMP points the way to very different approaches to risk assessment.
Currently, risk assessment is firmly rooted in the probabilistic analysis of failure
events. Attempts to extend current P R A techniques to software and other new
technology, to management, and to cognitively complex human control activities
have been disappointing. This way forward may lead to a dead end. Significant
progress in risk assessment for complex systems will require innovative approaches
starting from a completely different theoretical foundation.
1425
chapter05.raw Normal file
File diff suppressed because it is too large
@ -1,6 +1,47 @@
: .
— .
\[.+\]
-\n
HMO H M O
MIC M I C
DC-10 D C 10.
19(\d\d) 19 $1
200(\d) 2 thousand $1
20(\d\d) 20 $1
\( .(
\) ).
III 3
II 2
IV 4
ASO A S O
PRA P R A
HMO H M O
MIC M I C
DC-10 D C 10
OPC O P C
TAOR T A O R
AAI A A I
ACO A C O
AFB A F B
AI A I
ATO A T O
BH B H
BSD B S D
CTF C T F
CFAC C FACK
DO D O
GAO GAOW
HQ-II H Q-2
IFF I F F
JOIC J O I C
JSOC J SOCK
JTIDS J tides
MCC M C C
MD M D
NCA N C A
NFZ N F Z
OPC O P C
ROE R O E
SD S D
SITREP SIT Rep
TACSAT Tack sat
TAOR T A O R
USCINCEUR U S C in E U R
WD W D