
chore: add ch 5-6

xuu 2025-03-15 21:11:29 -06:00
parent ff069b52c4
commit cbb5561fc6
Signed by: xuu
GPG Key ID: 8B3B0604F164E04F
7 changed files with 3861 additions and 51 deletions

View File

@@ -1,13 +1,14 @@
 PATH:=./piper:$(PATH)
+TXT_FILES := $(patsubst %.raw,%.txt,$(wildcard *.raw))
 WAV_FILES := $(patsubst %.txt,%.wav,$(wildcard *.txt))
 MP3_FILES := $(patsubst %.txt,%.mp3,$(wildcard *.txt))
 MODEL=en_GB-alan-medium.onnx
 CONFIG=en_GB-alan-medium.onnx.json
-complete: $(MP3_FILES)
+complete: $(TXT_FILES) $(MP3_FILES)
 	echo $@ $^
 $(WAV_FILES): %.wav: %.txt
@@ -17,6 +18,9 @@ $(WAV_FILES): %.wav: %.txt
 $(MP3_FILES): %.mp3: %.wav
 	ffmpeg -y -i $^ $@
+$(TXT_FILES): %.txt: %.raw
+	./cleanfile $^ $@
 install:
 	wget -O piper.tar "https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz"
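A note on the updated pipeline (a sketch of assumed usage, not part of the commit): the new rule chains each chapter from a raw dump through cleaning, speech synthesis, and encoding (%.raw to %.txt via ./cleanfile; %.txt to %.wav, presumably via piper with the MODEL voice, since that rule's body is outside these hunks; %.wav to %.mp3 via ffmpeg). The chapter file names below are hypothetical.

# one-time setup: fetch the piper 1.2.0 release into ./piper
make install
# place raw chapter dumps (hypothetical names) next to the Makefile, e.g. ch05.raw, ch06.raw
# first pass runs ./cleanfile to produce the .txt files
make complete
# the WAV/MP3 lists are expanded with $(wildcard *.txt) when make starts, so a
# second pass is needed to pick up the new .txt files and produce .wav and .mp3
make complete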

View File

@@ -427,12 +427,12 @@ responsibility for a specific task:
-The enroute controller controlled the flow of OPC aircraft to and from the
+1.The enroute controller controlled the flow of OPC aircraft to and from the
 TAOR. This person also conducted radio and IFF checks on friendly aircraft
 outside the TAOR.
-The TAOR controller provided threat warning and tactical control for all
+2.The TAOR controller provided threat warning and tactical control for all
 OPC aircraft within the TAOR.
-The tanker controller coordinated all air refueling operations (and played no
+3.The tanker controller coordinated all air refueling operations (and played no
 part in the accident so is not mentioned further).
 To facilitate communication and coordination, the SD's console was physically
 located in the “pit” right between the MCC and the ACE (Airborne Command
@@ -1422,4 +1422,978 @@ with the AWACS crew that helicopter activities were not an integral part of OPC
 air operations. In testimony after the accident, the ACE commented, “The way I
 understand it, only as a courtesy does the AWACS track Eagle Flight.”
The Mission Director and ACE also did not have the information necessary to
exercise their responsibility. The ACE had an inaccurate model of where the Black
Hawks were located in the airspace. He testified that he presumed the Black Hawks
were conducting standard operations in the Security Zone and had landed [159].
He also testified that, although he had a radarscope, he had no knowledge of
AWACS radar symbology: “I have no idea what those little blips mean.” The Mission
Director, on the ground, was dependent on the information about the current air-
space state sent down from the AWACS via JTIDS (the Joint Tactical Information
Distribution System).
The ACE testified that he assumed the F-15 pilots would ask him for guidance
in any situation involving a potentially hostile aircraft, as required by the ROE. The
ACE's and F-15 pilots' mental models of the ROE clearly did not match with respect
to who had the authority to initiate the engagement of unidentified aircraft. The
rules of engagement stated that the ACE was responsible, but some pilots believed
they had authority when an imminent threat was involved. Because of security
concerns, the actual ROE used were not disclosed during the accident investigation,
but, as argued earlier, the slow, low-flying Black Hawks posed no serious threat
to an F-15.
Although the F-15 pilot never contacted the ACE about the engagement, the
ACE did hear the call of the F-15 lead pilot to the TAOR controller. The ACE
testified to the Accident Investigation Board that he did not intervene because
he believed the F-15 pilots were not committed to anything at the visual identi-
fication point, and he had no idea they were going to react so quickly. Since being
assigned to OPC, he said the procedure had been that when the F-15s or other
fighters were investigating aircraft, they would ask for feedback from the ACE.
The ACE and AWACS crew would then try to rummage around and find
out whose aircraft it was and identify it specifically. If they were unsuccessful, the
ACE would then ask the pilots for a visual identification [159]. Thus, the ACE
probably assumed that the F-15 pilots would not fire at the helicopters without
reporting to him first, which they had not done yet. At this point, they had simply
requested an identification by the AWACS traffic controller. According to his
understanding of the ROE, the F-15 pilots would not fire without his approval
unless there was an immediate threat, which there was not. The ACE testified that
he expected to be queried by the F-15 pilots as to what their course of action
should be.
The ACE also testified at one of the hearings:
I really did not know what the radio call “engaged” meant until this morning. I did
not think the pilots were going to pull the trigger and kill those guys. As a previous right
seater in an F-111, I thought “engaged” meant the pilots were going down to do a visual
intercept. [159]
Coordination among Multiple Controllers: Not applicable.
Feedback from Controlled Process: The F-15 lead pilot did not follow the ROE
and report the identified aircraft to the ACE and ask for guidance, although the
ACE did learn about it from the questions the F-15 pilots posed to the controllers
on the AWACS aircraft. The Mission Director got incorrect feedback about the state
of the airspace from JTIDS.
Time Lags: An unusual time lag occurred where the lag was in the controller and
not in one of the other parts of the control loop. The F-15 pilots responded faster
than the ACE (in the AWACS) and Mission Director (on the ground) could issue
appropriate control instructions (as required by the ROE) with regard to the
engagement.
Changes after the Accident.
There were no changes after the accident, although roles were clarified.
section 5.3.5. The AWACS Operators.
This level of the control structure contains more examples of inconsistent mental
models and asynchronous evolution. In addition, this control level provides interest-
ing examples of the adaptation over time of specified procedures to accepted prac-
tice and of coordination problems. There were multiple controllers with confused
and overlapping responsibilities for enforcing different aspects of the safety require-
ments and constraints (figure 5.8). The overlaps and boundary areas in the con-
trolled processes led to serious coordination problems among those responsible for
controlling aircraft in the TAOR.
Context in Which Decisions and Actions Took Place
Safety Requirements and Constraints: The general safety constraint involved in
the accident at this level was to prevent misidentification of aircraft by the pilots
and any friendly fire that might result. More specific requirements and constraints
are shown in figure 5.8.
Controls: Controls included procedures for identifying and tracking aircraft, train-
ing (including simulator missions), briefings, staff controllers, and communication
channels. The senior director and surveillance officer (ASO) provided real-time
oversight of the crew's activities, while the mission crew commander (MCC) coor-
dinated all the activities aboard the AWACS aircraft.
footnote. A similar type of time lag led to the loss of an F-18 when a mechanical failure resulted in inputs
arriving at the computer interface faster than the computer was able to process them.
The Delta Point system, used since the inception of OPC, provided standard code
names for real locations. These code names were used to prevent the enemy, who
might be listening to radio transmissions, from knowing the helicopters' flight plans.
Roles and Responsibilities: The AWACS crew were responsible for identifying,
tracking, and controlling all aircraft enroute to and from the TAOR; for coordinating
air refueling; for providing airborne threat warning and control in the TAOR; and
for providing surveillance, detection and identification of all unknown aircraft.
Individual responsibilities are described in section 5.2.
The staff weapons director (instructor) was permanently assigned to Incirlik. He
did all incoming briefings for new AWACS crews rotating into Incirlik and accom-
panied them on their first mission in the TAOR. The OPC leadership recognized
the potential for some distance to develop between stateside spin-up training and
continuously evolving practice in the TAOR. Therefore, as mentioned earlier, per-
manent staff or instructor personnel flew with each new AWACS crew on their
maiden flight in Turkey. Two of these staff controllers were on the AWACS the day
of the accident to answer any questions that the new crew might have about local
procedures and, as described earlier, to inform them about adaptation of accepted
practice from specified procedures.
The SD had worked as an AWACS controller for five years. This was his fourth
deployment to OPC, his second as an SD, and his sixtieth mission over the Iraqi
TAOR [159]. He worked as an SD more than two hundred days a year and had logged
more than 2,383 hours flying time [191].
The enroute controller, who was responsible for aircraft outside the TAOR, was
a first lieutenant with four years in the Air Force. He had finished AWACS training
two years earlier (May 1992) and had served in the Iraqi TAOR previously [191].
The TAOR controller, who was responsible for controlling all air traffic flying
within the TAOR, was a second lieutenant with more than nine years of service in
the Air Force, but he had just finished controllers' school and had had no previous
deployments outside the continental United States. In fact, he had become mission
ready only two months prior to the incident. This tour was his first in OPC and his
first time as a TAOR controller. He had only controlled as a mission-ready weapons
director on three previous training flights [191] and never in the role of TAOR
controller. AWACS guidance at the time suggested that the most inexperienced
controller be placed in the TAOR position: None of the reports on the accident
provided the reasoning behind this practice.
The air surveillance officer (ASO) was a captain at the time of the shootdown. She
had been mission-ready since October 1992 and was rated as an instructor ASO.
Because the crew's originally assigned ASO was upgrading and could not make it to
Turkey on time, she volunteered to fill in for him. She had already served for five and
a half weeks in OPC at the time of the accident and was completing her third assign-
ment to OPC. She worked as an ASO approximately two hundred days a year [191].
Environmental and Behavior-Shaping Factors: At the time of the shootdown,
shrinking defense budgets were leading to base closings and cuts in the size of the
military. At the same time, a changing political climate, brought about by the fall of
the Soviet Union, demanded significant U.S. military involvement in a series of
operations. The military (including the AWACS crews) were working at a greater
pace than they had ever experienced due to budget cuts, early retirements, force
outs, slowed promotions, deferred maintenance, and delayed fielding of new equip-
ment. All of these factors contributed to poor morale, inadequate training, and high
personnel turnover.
AWACS crews are stationed and trained at Tinker Air Force Base in Oklahoma
and then deployed to locations around the world for rotations lasting approximately
thirty days. Although all but one of the AWACS controllers on the day of the acci-
dent had served previously in the Iraqi no-fly zone, this was their first day working
together and, except for the surveillance officer, the first day of their current rota-
tion. Due to last minute orders, the team got only minimal training, including one
simulator session instead of the two full three-hour sessions required prior to
deploying. In the only session they did have, some of the members of the team were
missing—the ASO, ACE, and MCC were unable to attend—and one was later
replaced: As noted, the ASO originally designated and trained to deploy with this
crew was instead shipped off to a career school at the last minute, and another ASO,
who was just completing a rotation in Turkey, filled in.
The one simulator session they did receive was less than effective, partly because
the computer tape provided by Boeing to drive the exercise was not current (another
instance of asynchronous evolution). For example, the maps were out of date,
and the rules of engagement used were different and much more restrictive than
those currently in force in OPC. No Mode I codes were listed. The list of friendly
participants in OPC did not include UH-60s (Black Hawks) and so on. The second
simulation session was canceled because of a wing exercise.
Because the TAOR area had not yet been sanitized, it was a period of low activ-
ity: At the time, there were still only four aircraft over the no-fly zone—the two
F-15s and the two Black Hawks. AWACS crews are trained and equipped to track
literally hundreds of enemy and friendly aircraft during a high-intensity conflict.
Many accidents occur during periods of low activity when vigilance is reduced com-
pared to periods of higher activity.
The MCC sits with the other two key supervisors (SD and ACE) toward the front
of the aircraft in a three-seat arrangement named the “Pit,” where each has his own
radarscope. The SD is seated to the MCC's left. Surveillance is seated in the rear.
Violations of the no-fly zone had been rare and threats few during the past three
years, so that days flight was expected to be an average one, and the supervisors in
the Pit anticipated just another routine mission [159].
During the initial orbit of the AWACS, the technicians determined that one
of the radar consoles was not operating. According to Snook, this type of problem
was not uncommon, and the AWACS is therefore designed with extra crew positions.
When the enroute controller realized his assigned console was not working properly,
he moved from his normal position between the TAOR and tanker controllers,
to a spare seat directly behind the senior director. This position kept him out of
the view of his supervisor and also eliminated physical contact with the TAOR
controller.
Dysfunctional Interactions among the Controllers
According to the formal procedures, control of aircraft was supposed to be handed
off from the enroute controller to the TAOR controller when the aircraft entered
the TAOR. This handoff did not occur for the Black Hawks, and the TAOR control-
ler was not made aware of the Black Hawks' flight within the TAOR. Snook explains
this communication error as resulting from the radar console failure, which inter-
fered with communication between the TAOR and enroute controllers. But this
explanation does not jibe with the fact that the normal procedure of the enroute
controller was to continue to control helicopters without handing them off to the
TAOR controller, even when the enroute and TAOR controllers were seated in their
usual places next to each other. There may usually have been more informal interac-
tion about aircraft in the area when they were seated next to each other, but there
is no guarantee that such interaction would have occurred even with a different
seating arrangement. Note that the helicopters had been dropped from the radar
screens and the enroute controller had an incorrect mental model of where they
were: He thought they were close to the boundary of the TAOR and was unaware
they had gone deep within it. The enroute controller, therefore, could not have told
the TAOR controller about the true location of the Black Hawks even if they had
been sitting next to each other.
The interaction between the surveillance officer and the senior weapons director
with respect to tracking the helicopter flight on the radar screen involved many dys-
functional interactions. For example, the surveillance officer put an attention arrow
on the senior director's radarscope in an attempt to query him about the lost heli-
copter symbol that was floating, at one point, unattached to any track. The senior
director did not respond to the attention arrow, and it automatically dropped off the
screen after sixty seconds. The helicopter symbol (H) dropped off the radar screen
when the radar and IFF returns from the Black Hawks faded and did not return until
just before the engagement, removing any visual reminder to the AWACS crew that
there were Black Hawks inside the TAOR. The accident investigation did not include
an analysis of the design of the AWACS human-computer interface or how it might
have contributed to the accident, although such an analysis is important in fully
understanding why it made sense for the controllers to act the way they did.
During his court-martial for negligent homicide, the senior director argued that
his radarscope did not identify the helicopters as friendly and that therefore he was
not responsible. When asked why the Black Hawk identification was dropped from
the radarscope, he gave two reasons. First, because it was no longer attached to any
active signal, they assumed the helicopter had landed somewhere. Second, because
the symbol displayed on their scopes was being relayed in real time through a JTIDS
downlink to commanders on the ground, they were very concerned about sending
out an inaccurate picture of the TAOR.
Even if we suspended it, it would not be an accurate picture, because we wouldn't know
for sure if that is where he landed. Or if he landed several minutes earlier, and where
that would be. So, the most accurate thing for us to do at that time, was to drop the
symbology [sic].
Flawed or Inadequate Decision Making and Control Actions.
There were myriad inadequate control actions in this accident, involving each of the
controllers in the AWACS. The AWACS crew work as a team so it is sometimes hard
to trace incorrect decisions to one individual. While from each individual's stand-
point the actions and decisions may have been correct, when put together as a whole
the decisions were incorrect.
The enroute controller never told the Black Hawk pilots to change to the TAOR
frequency that was being monitored by the TAOR controller and did not hand off
control of the Black Hawks to the TAOR controller. The established practice of not
handing off the helicopters had probably evolved over time as a more efficient way
of handling traffic—another instance of asynchronous evolution. Because the heli-
copters were usually only at the very border of the TAOR and spent very little time
there, the overhead of handing them off twice within a short time period was con-
sidered inefficient by the AWACS crews. As a result, the procedures used had
changed over time to the more efficient procedure of keeping them under the
control of the enroute controller. The AWACS crews were not provided with written
guidance or training regarding the control of helicopters within the TAOR, and, in
its absence, they adapted their normal practices for fixed-wing aircraft as best they
could to apply them to helicopters.
In addition to not handing off the helicopters, the enroute controller did not
monitor the course of the Black Hawks while they were in the TAOR (after leaving
Zakhu), did not take note of the flight plan (from Whiskey to Lima), did not alert
the F-15 pilots there were friendly helicopters in the area, did not alert the F-15
pilots before they fired that the helicopters they were targeting were friendly, and
did not tell the Black Hawk pilots that they were on the wrong frequency and were
squawking the wrong IFF Mode I code.
The TAOR controller did not monitor the course of the Black Hawks in the
TAOR and did not alert the F-15 pilots before they fired that the helicopters they
were targeting were friendly. None of the controllers warned the F-15 pilots at any
time that there were friendly helicopters in the area nor did they try to stop the
engagement. The accident investigation board found that because Army helicopter
activities were not normally known at the time of the fighter pilots' daily briefings,
normal procedures were for the AWACS crews to receive real-time information
about their activities from the helicopter crews and to relay that information on to
the other aircraft in the area. If this truly was established practice, it clearly did not
occur on that day.
The controllers were supposed to be tracking the helicopters using the Delta
Point system, and the Black Hawk pilots had reported to the enroute controller that
they were traveling from Whiskey to Lima. The enroute controller testified, however,
that he had no idea of the towns to which the code names Whiskey and Lima
referred. After the shootdown, he went in search of the card defining the call signs
and finally found it in the Surveillance Section [159]. Clearly, tracking helicopters
using call signs was not a common practice or the charts would have been closer at
hand. In fact, during the court-martial of the senior director, the defense was unable
to locate any AWACS crewmember at Tinker AFB (where AWACS crews were
stationed and trained) who could testify that he or she had ever used the Delta Point
system [159] although clearly the Black Hawk pilots thought it was being used
because they provided their flight plan using Delta Points.
None of the controllers in the AWACS told the Black Hawk helicopters that
they were squawking the wrong IFF code for the TAOR. Snook cites testimony
from the court-martial of the senior director that posits three related explanations
for this lack of warning: (1) the minimum communication (min comm) policy, (2) a
belief by the AWACS crew that the Black Hawks should know what they were
doing, and (3) pilots not liking to be told what to do. None of these explanations
provided during the trial is very satisfactory; they appear to be after-the-fact ratio-
nalizations for the controllers not doing their job when faced with possible court-
martial and jail terms. Given that the controllers acknowledged that the Army
helicopters never squawked the right codes and had not done so for months, there
must have been other communication channels that could have been used besides
real-time radio communication to remedy this situation, so the min comm policy is
not an adequate explanation. Arguing that the pilots should know what they were
doing is simply an abdication of responsibility, as is the argument that pilots did not
like being told what to do. A different perspective, and one that likely applies to all
the controllers, was provided by the staff weapons director, who testified, “For a
helicopter, if he's going to Zakhu, I'm not that concerned about him going beyond
that. So, I'm not really concerned about having an F-15 needing to identify this
guy.” [159]
The mission crew commander had provided the crew's morning briefing. He
spent some time going over the activity flowsheet, which listed all the friendly air-
craft flying in the OPC that day, their call signs, and the times they were scheduled
to enter the TAOR. According to Piper (but nobody else mentions it), he failed to
note the helicopters, even though their call signs and their IFF information had been
written on the margin of his flowsheet.
The shadow crew always flew with new crews on their first day in OPC, but the
task of these instructors does not seem to have been well defined. At the time of
the shootdown, one was in the galley “taking a break,” and the other went back to
the crew rest area, read a book, and took a nap. The staff weapons director, who was
asleep in the back of the AWACS, testified during the court-martial of the senior
director that his purpose on the mission was to be the “answer man,” just to answer
any questions they might have. This was a period of very little activity in the area
(only the two F-15s were supposed to be in the TAOR), and the shadow crew
members may have thought their advice was not needed at that time.
When the staff weapons director went back to the rest area, the only symbol
displayed on the scopes of the AWACS controllers was the one for the helicopters
(EE01), which they thought were going to Zakhu only.
Because many of the dysfunctional actions of the crew did conform to the estab-
lished practice (e.g., not handing off helicopters to the TAOR controller), it is
unclear what different result might have occurred if the shadow crew had been in
place. For example, the staff weapons director testified during the hearings and trial
that he had seen helicopters out in the TAOR before, past Zakhu, but he really did
not feel it was necessary to brief crews about the Delta Point system to determine
a helicopter's destination [159].
Reasons for the Flawed Control.
Inadequate Control Algorithms: This level of the accident analysis provides an
interesting example of the difference between prescribed procedures and estab-
lished practice, the adaptation of procedures over time, and migration toward the
boundaries of safe behavior. Because of the many helicopter missions that ran from
Diyarbakir to Zakhu and back, the controllers testified that it did not seem worth
handing them off and switching them over to the TAOR frequency for only a few
minutes. Established practice (keeping the helicopters under the control of the
enroute controller instead of handing them off to the TAOR controller) appeared
to be safe until the day the helicopters' behavior differed from normal, that is, they
stayed longer in the TAOR and ventured beyond a few miles inside the boundaries.
Established practice no longer assured safety under these conditions. A complicat-
ing factor in the accident was the universal misunderstanding of each of the control-
lers' responsibilities with respect to tracking Army helicopters.
Snook suggests that the min comm norm contributed to the AWACS crews'
general reluctance to enforce rules, contributed to AWACS not correcting Eagle
Flight's improper Mode I code, and discouraged controllers from pushing helicopter
pilots to the TAOR frequency when they entered Iraq because they were reluctant
to say more than absolutely necessary.
According to Snook, there were also no explicit or written procedures regarding
the control of helicopters. He states that radio contact with helicopters was lost
frequently, but there were no procedures to follow when this occurred. In contrast,
Piper claims the AWACS operations manual says:
Helicopters are a high interest track and should be hard copied every five minutes in
Turkey and every two minutes in Iraq. These coordinates should be recorded in a special
log book, because radar contact with helicopters is lost and the radar symbology [sic] can
be suspended. [159]
There is no information in the publicly available parts of the accident report about
any special logbook or whether such a procedure was normally followed.
footnote. Even if the actions of the shadow crew did not contribute to this particular accident, we can take
advantage of the accident investigation to perform a safety audit on the operation of the system and
identify potential improvements.
Inaccurate and Inconsistent Mental Models: In general, the AWACS crew (and
the ACE) shared the common view that helicopter activities were not an integral
part of OPC air operations. There was also a misunderstanding about which provi-
sions of the ATO applied to Army helicopter activities.
Most of the people involved in the control of the F-15s were unaware of the
presence of the Black Hawks in the TAOR that day, the lone exception perhaps
being the enroute controller who knew they were there but apparently thought
they would stay at the boundaries of the TAOR and thus were far from their actual
location deep within it. The TAOR controller testified that he had never talked to
the Black Hawks: Following their two check-ins with the enroute controller, the
helicopters had remained on the enroute frequency (as was the usual, accepted
practice), even as they flew deep into the TAOR.
The enroute controller, who had been in contact with the Black Hawks, had an
inaccurate model of where the helicopters were. When the Black Hawk pilots origi-
nally reported their takeoff from the Army Military Coordination Center at Zakhu,
they contacted the enroute controller and said they were bound for Lima. The
enroute controller did not know to what city the call sign Lima referred and did not
try to look up this information. Other members of the crew also had inaccurate
models of their responsibilities, as described in the next section. The Black Hawk
pilots clearly thought the AWACS was tracking them and also thought the con-
trollers were using the Delta Point system—otherwise helicopter pilots would not
have provided the route names in that way.
The AWACS crews did not appear to have accurate models of the Black Hawks'
mission and role in OPC. Some of the flawed control actions seem to have resulted
from a mental model that helicopters only went to Zakhu and therefore did not
need to be tracked or to follow the standard TAOR procedures.
As with the pilots and their visual recognition training, the incorrect mental
models may have been at least partially the result of the inadequate AWACS train-
ing the team received.
Coordination among Multiple Controllers: As mentioned earlier, coordination
problems are pervasive in this accident due to overlapping control responsibilities
and confusion about responsibilities in the boundary areas of the controlled process.
Most notably, the helicopters usually operated close to the boundary of the TAOR,
resulting in confusion over who was or should be controlling them.
The official accident report noted a significant amount of confusion within the
AWACS mission crew regarding the tracking responsibilities for helicopters [5]. The
mission crew commander testified that nobody was specifically assigned responsibil-
ity for monitoring helicopter traffic in the no-fly zone and that his crew believed
the helicopters were not included in their orders [159]. The staff weapons director
made a point of not knowing what the Black Hawks do: “It was some kind of a
squirrely mission” [159]. During the court-martial of the senior director, the AWACS
tanker controller testified that in the briefing the crew received upon arrival at
Incirlik, the staff weapons director had said about helicopters flying in the no-fly
zone, “They're there, but don't pay any attention to them.” The enroute controller
testified that the handoff procedures applied only to fighters. “We generally have
no set procedures for any of the helicopters. . . . We never had any [verbal] guidance
[or training] at all on helicopters” [159].
Coordination problems also existed between the activities of the surveillance
personnel and the other controllers. During the investigation of the accident, the
ASO testified that surveillance's responsibility was south of the 36th Parallel, and
the other controllers were responsible for tracking and identifying all aircraft north
of the 36th Parallel. The other controllers suggested that surveillance was respon-
sible for tracking and identifying all unknown aircraft, regardless of location. In fact,
Air Force regulations say that surveillance had tracking responsibility for unknown
and unidentified tracks throughout the TAOR. It is not possible through the
testimony alone, again because of the threat of court-martial, to piece together exactly
what the problem was here, or whether it was simply a migration of normal operations from
specified procedures. At the least, it is clear that there was confusion about who was
in control of what.
One possible explanation for the lack of coordination among controllers at this
level of the hierarchical control structure is that, as suggested by Snook, this particu-
lar group had never trained together as a team [191]. But given the lack of proce-
dures for handling helicopters and the confusion even by experienced controllers
and the staff instructors about responsibilities for handling helicopters, Snook's
explanation is not very convincing. A more plausible explanation is simply a lack of
guidance and delineation of responsibilities by the management level above. And
even if the roles of everyone in such a structure had been well defined originally,
uncontrolled local adaptation to more efficient procedures and asynchronous evolu-
tion of the different parts of the control structure created dysfunctionalities as time
passed. The helicopters and fixed-wing aircraft had separate control structures that
only joined fairly high up on the hierarchy and, as is described in the next section,
there were communication problems between the components at the higher levels
of the control hierarchy, particularly between the Army Military Coordination
Center (MCC) and the Combined Forces Air Component (CFAC) headquarters.
Feedback from the Controlled Process: Signals to the AWACS from the Black
Hawks were inconsistent due to line-of-sight limitations and the mountainous terrain
in which the Black Hawks were flying. The helicopters used the terrain to mask them-
selves from air defense radars, but this terrain masking also caused the radar returns
from the Black Hawks to the AWACS (and to the fighters) to fade at various times.
Time Lags: Important time lags contributed to the accident, such as the delay of
radio reports from the Black Hawk helicopters due to radio signal transmission
problems and their inability to use the TACSAT radios until they had landed. As
with the ACE, the speed with which the F-15 pilots acted also provided the control-
lers with little time to evaluate the situation and respond appropriately.
Changes after the Accident.
Many changes were instituted with respect to AWACS operations after the
accident:
1. Confirmation of a positive IFF Mode IV check was required for all OPC air-
craft prior to their entry into the TAOR.
2. The responsibilities for coordination of air operations were better defined.
3. All AWACS aircrews went through a one-time retraining and recertification
program, and every AWACS crewmember had to be recertified.
4. A plan was produced to reduce the temporary duty of AWACS crews to 120
days a year. In the end, it was decreased from 166 to 135 days per year from
January 1995 to July 1995. The Air Combat Command planned to increase the
number of AWACS crews.
5. AWACS control was required for all TAOR flights.
6. In addition to normal responsibilities, AWACS controllers were required to
specifically maintain radar surveillance of all TAOR airspace and to issue advi-
sory/deconflicting assistance on all operations, including helicopters.
7. The AWACS controllers were required to periodically broadcast friendly heli-
copter locations operating in the TAOR to all aircraft.
Although not mentioned anywhere in the available documentation on the accident,
it seems reasonable that either the AWACS crews started to use the Delta Point
system or the Black Hawk pilots were told not to use it and an alternative means
for transmitting flight plans was mandated.
section 5.3.6. The Higher Levels of Control.
Fully understanding the behavior at any level of the sociotechnical control structure
requires understanding how and why the control at the next higher level allowed
or contributed to the inadequate control at the current level. In this accident, many
of the erroneous decisions and control actions at the lower levels can only be fully
understood by examining this level of control.
Context in Which Decisions and Actions Took Place
Safety Requirements and Constraints Violated: There were many safety con-
straints violated at the higher levels of the control structure—the Military Coordina-
tion Center, Combined Forces Air Component, and CTF commander—and several
people were investigated for potential court-martial and received official letters of
reprimand. These safety constraints include: (1) procedures must be instituted that
delegate appropriate responsibility, specify tasks, and provide effective training
to all those responsible for tracking aircraft and conducting combat operations;
(2) procedures must be consistent or at least complementary for everyone involved
in TAOR airspace operations; (3) performance must be monitored (feedback chan-
nels established) to ensure that safety-critical activities are being carried out cor-
rectly and that local adaptations have not moved operations beyond safe limits;
(4) equipment and procedures must be coordinated between the Air Force and
Army to make sure that communication channels are effective and that asynchro-
nous evolution has not occurred; (5) accurate information about scheduled flights
must be provided to the pilots and the AWACS crews.
Controls: The controls in place included operational orders and plans to designate
roles and responsibilities as well as a management structure, the ACO, coordination
meetings and briefings, a chain of command (OPC commander to mission director
to ACE to pilots), disciplinary actions for those not following the written rules, and
a group (the Joint Operations and Intelligence Center or JOIC) responsible for
ensuring effective communication occurred.
Roles and Responsibilities: The MCC had operational control over the Army
helicopters while the CFAC had operational control over fixed-wing aircraft and
tactical control over all aircraft in the TAOR. The Combined Task Force commander
general (who was above both the CFAC and MCC) had ultimate responsibility for
the coordination of fixed-wing aircraft flights with Army helicopters.
While specific responsibilities of individuals might be considered here in an offi-
cial accident analysis, treating the CFAC and MCC as entities is sufficient for the
purposes of this analysis.
Environmental and Behavior-Shaping Factors: The Air Force operated on a pre-
dictable, well-planned, and tightly executed schedule. Detailed mission packages
were organized weeks and months in advance. Rigid schedules were published and
executed in preplanned packages. In contrast, Army aviators had to react to con-
stantly changing local demands, and they prided themselves on their flexibility [191].
Because of the nature of their missions, exact takeoff times and detailed flight plans
for helicopters were virtually impossible to schedule in advance. They were even
more difficult to execute with much rigor. The Black Hawks' flight plan contained
their scheduled takeoff time, transit routes between Diyarbakir through Gate 1 to
Zakhu, and their return time. Because the Army helicopter crews rarely knew
exactly where they would be going within the TAOR until after they were briefed
at the Military Coordination Center at Zakhu, most flight plans only indicated that
Eagle Flight would be “operating in and around the TAOR.”
The physical separation of the Army Eagle Flight pilots from the CFAC opera-
tions and Air Force pilots at Incirlik contributed to the communication difficulties
that already existed between the services.
Dysfunctional Interactions among Controllers.
Dysfunctional communication at this level of the control structure played a critical
role in the accident. These communication flaws contributed to the coordination
flaws at this level and at the lower levels.
A critical safety constraint to prevent friendly fire requires that the pilots of the
fighter aircraft know who is in the no-fly zone and whether they are supposed
to be there. However, neither the CTF staff nor the Combined Forces Air Compo-
nent staff requested or received timely, detailed flight information on planned
MCC helicopter activities in the TAOR. Consequently, the OPC daily Air Tasking
Order was published with little detailed information regarding U.S. helicopter flight
activities over northern Iraq.
According to the official accident report, specific information on routes of flight
and times of MCC helicopter activity in the TAOR was normally available to the
other OPC participants only when AWACS received it from the helicopter crews
by radio and relayed the information on to the pilots [5]. While those at the higher
levels of control may have thought this relaying of flight information was occurring,
that does not seem to be the case given that the Delta Point system (wherein the
helicopter crews provided the AWACS controllers with their flight plan) was not
used by the AWACS controllers: When the helicopters went beyond Zakhu, the
AWACS controllers did not know their flight plans and therefore could not relay
that information to the fighter pilots and other OPC participants.
The weekly flight schedules the MCC provided to the CFAC staff were not com-
plete enough for planning purposes. While the Air Force could plan their missions
in advance, the different type of Army helicopter missions had to be flexible to react
to daily needs. The MCC daily mission requirements were generally based on the
events of the previous day. A weekly flight schedule was developed and provided
to the CTF staff, but a firm itinerary was usually not available until after the next
day's ATO was published. The weekly schedule was briefed at the CTF staff meet-
ings on Mondays, Wednesdays, and Fridays, but the information was neither detailed
nor firm enough for effective rotary-wing and fixed-wing aircraft coordination and
scheduling purposes [5].
Each daily ATO was published showing several Black Hawk helicopter lines. Of
these, two helicopter lines (two flights of two helicopters each) were listed with call
signs (Eagle 01/02 and Eagle 03/04), mission numbers, IFF Mode II codes, and a
route of flight described only as LLTC (the identifier for Diyarbakir) to TAOR to
LLTC. No information regarding route or duration of flight time within the TAOR
was given on the ATO. Information concerning takeoff time and entry time into the
TAOR was listed as A/R (as required).
Every evening, the MCC at Zakhu provided a situation report (SITREP) to the
JOIC (located at Incirlik), listing the helicopter flights for the following day. The
SITREP did not contain complete flight details and arrived too late to be included
in the next day's ATO. The MCC would call the JOIC the night prior to the sched-
uled mission to “activate” the ATO line. There were, however, no procedures in
place to get the SITREP information from the JOIC to those needing to know it
in CFAC.
After receiving the SITREP, a duty officer in the JOIC would send takeoff times
and gate times (the times the helicopters would enter northern Iraq) to Turkish
operations for approval. Meanwhile, an intelligence representative to the JOIC
consolidated the MCC weekly schedule with the SITREP and used secure intelli-
gence channels to pass this updated information to some of his counterparts in
operational squadrons who had requested it. No procedures existed to pass this
information from the JOIC to those in CFAC with tactical responsibility for the
helicopters (through the ACE and Mission Director) [5]. Because CFAC normally
determined who would fly when, the information channels were designed primarily
for one-way communications outward and downward.
In the specific instance involved in the shootdown, the MCC weekly schedule
was provided on April 8 to the JOIC and thence to the appropriate person in CFAC.
That schedule showed a two-ship, MCC helicopter administrative flight scheduled
for April 14. According to the official accident report, two days before (April 12)
the MCC Commander had requested approval for an April 14 flight outside the
Security Zone from Zakhu to the towns of Irbil and Salah ad Din. The OPC com-
manding general approved the written request on April 13, and the JOIC transmit-
ted the approval to the MCC but apparently the information was not provided to
those responsible for producing the ATO. The April 13 SITREP from MCC listed
the flight as “mission support,” but contained no other details. Note that more informa-
tion was available earlier than normal in this instance, and it could have been
included in the ATO, but the established communication channels and procedures
did not exist to get it to the right places. The MCC weekly schedule update, received
by the JOIC on the evening of April 13 along with the MCC SITREP, gave the
destinations for the mission as Salah ad Din and Irbil. This information was not
passed to CFAC.
Late in the afternoon on April 13, MCC contacted the JOIC duty officer and
activated the ATO line for the mission. A takeoff time of 0520 and a gate time of
0625 were requested. No takeoff time or route of flight beyond Zakhu was specified.
The April 13 SITREP, the weekly flying schedule update, and the ATO-line activa-
tion request were received by the JOIC too late to be briefed during the Wednesday
(April 13) staff meetings. None of the information was passed to the CFAC schedul-
ing shop (which was responsible for distributing last minute changes to the ATO
through various sources such as the Battle Staff Directives, morning briefings, and
so on), to the ground-based Mission Director, or to the ACE on board the AWACS
[5]. Note that this flight was not a routine food and medical supply run, but instead
it carried sixteen high-ranking VIPs and required the personal attention and approval
of the CTF Commander. Yet information about the flight was never communicated
to the people who needed to know about it [191]. That is, the information went up
from the MCC to the CTF staff, but not across from MCC to CFAC nor down from
the CTF staff to CFAC (see figure 5.3).
A second example of a major dysfunctional communication involved the com-
munication of the proper radio frequencies and IFF codes to be used in the TAOR.
About two years before the shootdown, someone in the CFAC staff decided to
change the instructions pertaining to IFF modes and codes. According to Snook, no
one recalled exactly how or why this change occurred. Before the change, all aircraft
squawked a single Mode I code everywhere they flew. After the change, all aircraft
were required to switch to a different Mode I code while flying in the no-fly zone. The
change was communicated through the daily ATO. However, after the accident it was
discovered that the Air Force's version of the ATO was not exactly the same as the
one received electronically by the Army aviators—another instance of asynchronous
evolution and lack of linkup between system components. For at least two years,
there existed two versions of the daily ATO: one printed out directly by the Incirlik
Frag Shop and distributed locally by messenger to all units at Incirlik Air Base, and
a second one transmitted electronically through an Air Force communications center
(the JOIC) to Army helicopter operations at Diyarbakir. The one received by the
Army aviators was identical in all respects to the one distributed by the Frag Shop,
except for the changed Mode I code information contained in the SPINS. The ATO
that Eagle Flight received contained no mention of two Mode I codes [191].
What about the confusion about the proper radio frequency to be used by the
Black Hawks in the TAOR? Piper notes that the Black Hawk pilots were told
to use the enroute frequency while flying in the TAOR. The commander of OPC
testified after the accident that the use by the Black Hawks of the enroute radio
frequency rather than the TAOR frequency had been briefed to him as a safety
measure because the Black Hawk helicopters were not equipped with HAVE
QUICK technology. The ACO (Aircraft Control Order) required the F-15s to use
non-HAVE QUICK mode when talking to specific types of aircraft (such as F-1s)
that, like the Black Hawks, did not have the new technology. The list of non-HQ
aircraft provided to the F-15 pilots, however, for some reason did not include
UH-60s. Apparently the decision was made to have the Black Hawks use the
enroute radio frequency, but this decision was never communicated to those respon-
sible for the F-15 procedures specified in the ACO. Note that a thorough investiga-
tion of the higher levels of control, as is required in a STAMP-based analysis, is
necessary to explain properly the use of the enroute radio frequency by the Black
Hawks. Of the various reports on the shootdown, only Piper notes the fact that an
exception had been made for Army helicopters for safety reasons—the official
accident report, Snook's detailed book on the accident, and the GAO report do not
mention this fact! Piper found out about it from her attendance at the public hear-
ings and trial. This omission of important information from the accident reports is
an interesting example of how incomplete investigation of the higher levels of
control can lead to incorrect causal analysis. In her book, Piper questions why the
Accident Investigation Board, while producing twenty-one volumes of evidence,
never asked the commander of OPC about the radio frequency and other problems
found during the investigation.
Other official exceptions were made for the helicopter operations, such as
allowing them in the Security Zone without AWACS coverage. Using STAMP,
the accident can be understood as a dynamic process where the operations of the
Army and Air Force adapted and diverged without effective communication and
coordination.
Many of the dysfunctional communications and interactions stem from asynchro-
nous evolution of the mission and the operations plan. In response to the evolving
mission in northern Iraq, air assets were increased in September 1991 and a signifi-
cant portion of the ground forces were withdrawn. Although the original organiza-
tional structure of the CTF was modified at this time, the operations plan was not.
In particular, the position of the person who was in charge of communication and
coordination between the MCC and CFAC was eliminated without establishing an
alternative communication channel.
Unsafe asynchronous evolution of the safety control structure can be prevented
by proper documentation of safety constraints, assumptions, and their controls
during system design and checking before changes are made to determine if the
constraints and assumptions are violated by the design. Unintentional changes and
migration of behavior outside the boundaries of safety can be prevented by various
means, including education, identifying and checking leading indicators, and tar-
geted audits. Part III describes ways to prevent asynchronous evolution from leading
to accidents.
Flawed or Inadequate Control Actions.
There were many flawed or missing control actions at this level, including:
1. The Black Hawk pilots were allowed to enter the TAOR without AWACS cover-
age and the F-15 pilots and AWACS crews were not informed about this excep-
tion to the policy. This control problem is an example of the problems of
distributed decision making with other decision makers not being aware of the
decisions of others (see the Zeebrugge example in figure 2.2).
Prior to September 1993, Eagle Flight helicopters flew any time required,
before the fighter sweeps and without fighter coverage, if necessary. After
September 1993, helicopter flights were restricted to the security zone if
AWACS and fighter coverage were not on station. But for the mission on April
14, Eagle Flight requested and received permission to execute their flight
outside the security zone. A CTF policy letter dated September 1993 imple-
mented the following policy for UH-60 helicopter flights supporting the MCC:
“All UH-60 flights into Iraq outside of the security zone require AWACS cover-
age.” Helicopter flights had routinely been flown within the TAOR security
zone without AWACS or fighter coverage and CTF personnel at various levels
were aware of this. MCC personnel were aware of the requirement to have
AWACS coverage for flights outside the security zone and complied with that
requirement. However, the F-15 pilots involved in the accident, relying on the
written guidance in the ACO, believed that no OPC aircraft, fixed or rotary
wing, were allowed to enter the TAOR prior to a fighter sweep [5].
At the same time, the Black Hawks also thought they were operating cor-
rectly. The Army Commander at Zakhu had called the Commander of Opera-
tions, Plans, and Policy for OPC the night before the shootdown and asked to
be able to fly the mission without AWACS coverage. He was told that they must
have AWACS coverage. From the view of the Black Hawk pilots (who had
reported in to the AWACS during the flight and provided their flight plan and
destinations) they were complying and were under AWACS control.
2. Helicopters were not required to file detailed flight plans and follow them.
Effective procedures were not established for communicating last minute
changes or updates to the Army flight plans that had been filed.
3. F-15 pilots were not told to use non-HQ mode for helicopters.
4. No procedures were specified to pass SITREP information to CFAC. Helicop-
ter flight plans were not distributed to CFAC and the F-15 pilots, but they were
given to the F-16 squadrons. Why was one squadron informed, while another
one, located right across the street, was not? F-15s are designed primarily for
air superiority—high altitude aerial combat missions. F-16s, on the other hand,
are all-purpose fighters. Unlike F-15s, which rarely flew low-level missions, it
was common for F-16s to fly low-level missions where they might encounter
the low-flying Army helicopters. As a result, to avoid low-altitude midair colli-
sions, staff officers in F-16 squadrons requested details concerning helicopter
operations from the JOIC, picked them up from the mail pickup point on the
post, and passed them on to the pilots during their daily briefings; F-15 planners
did not [191].
5. Inadequate training on the ROE was provided for new rotators. Piper claims
that OPC personnel did not receive consistent, comprehensive training to
ensure they had a thorough understanding of the rules of engagement and that
many of the aircrews new to OPC questioned the need for the less aggressive
rules of engagement in what had been designated a combat zone [159]. Judging
from these complaints (details can be found in [159]) and incidents involving
F-15 pilots, it appears that the pilots did not fully understand the ROE purpose
or need.
6. Inadequate training was provided to the F-15 pilots on visual identification.
7. Inadequate simulator and spin-up training was provided to the AWACS crews.
Asynchronous evolution occurred between the changes in the training materi-
als and the actual situation in the no-fly zone. In addition, there were no
controls to ensure the required simulator sessions were provided and that all
members of the crew participated.
8. Handoff procedures were never established for helicopters. In fact, no explicit
or written procedures, verbal guidance, or training of any kind were provided
to the AWACS crews regarding the control of helicopters within the TAOR
[191]. The AWACS crews testified during the investigation that they lost contact
with helicopters all the time, but there were no procedures to follow when that
occurred.
9. Inadequate procedures were specified and enforced for how the shadow crew
would instruct the new crews.
10. The rules and procedures established for the operation did not provide adequate
control over unsafe F-15 pilot behavior, adequate enforcement of discipline, or
adequate handling of safety violations. The CFAC Assistant Director of Oper-
ations told the GAO investigators that there was very little F-15 oversight in
OPC at the time of the shootdown. There had been so many flight discipline
incidents leading to close calls that a group safety meeting had been held a
week before the shootdown to discuss it. The flight discipline and safety issues
included midair close calls, unsafe incidents when refueling, and unsafe takeoffs.
The fixes (including the meeting) obviously were not effective. But the fact that
there were a lot of close calls indicates serious safety problems existed and were
not handled adequately.
The CFAC Assistant Director of Operations also told the GAO that con-
tentious issues involving F-15 actions had become common topics of discus-
sion at Detachment Commander meetings. No F-15 pilots were on the CTF
staff to communicate with the F-15 group about these problems. The OPC
Commander testified that there was no tolerance for mistakes or unprofes-
sional flying at OPC and that he had regularly sent people home for violation
of the rules—the majority of those he sent home were F-15 pilots, suggesting
that there were serious problems in discipline and attitude among this group
[159].
11.•The Army pilots were given the wrong information about the IFF codes and
radio frequencies to use in the TAOR. As described above, this mismatch
resulted from asynchronous evolution and lack of linkup (consistency) between
process controls, that is, the two different ATOs. It provides yet another example
of the danger involved in distributed decision making (again see figure 2.2).
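This kind of mismatch is, at bottom, a failed consistency check between two controlled documents that were supposed to evolve together. Purely as an illustration of the missing linkup (the field names and values below are invented, not taken from the actual ATOs), the absent cross-check amounts to something as simple as:

# Illustrative sketch only: hypothetical field names and values, not the real ATO contents.
def find_ato_mismatches(air_force_ato: dict, army_ato: dict) -> list[str]:
    """Return the safety-critical fields on which two versions of an ATO disagree."""
    safety_critical_fields = ("taor_iff_code", "taor_radio_frequency")
    mismatches = []
    for name in safety_critical_fields:
        if air_force_ato.get(name) != army_ato.get(name):
            mismatches.append(
                f"{name}: Air Force version has {air_force_ato.get(name)!r}, "
                f"Army version has {army_ato.get(name)!r}"
            )
    return mismatches

# Hypothetical usage: hold distribution of either version until the two agree.
problems = find_ato_mismatches(
    {"taor_iff_code": "alpha", "taor_radio_frequency": "secure"},
    {"taor_iff_code": "bravo", "taor_radio_frequency": "clear"},
)
assert problems  # the two versions have drifted apart and must be reconciled

No comparable check, manual or automated, connected the two versions of the ATO, which is exactly the lack of linkup between process controls noted above.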
Reasons for the Flawed Control.
Ineffective Control Algorithms: Almost all of the control flaws at this level relate
to the existence and use of ineffective control algorithms. Equipment and
procedures were not coordinated between the Air Force and the Army to make sure
that communication channels were effective and that asynchronous evolution had
not occurred. The last CTF staff member who appears to have actively coordinated
rotary-wing flying activities with the CFAC organization departed in January 1994.
No representative of the MCC was specifically assigned to the CFAC for coordina-
tion purposes. Since December 1993, no MCC helicopter detachment representative
had attended the CFAC weekly scheduling meetings. The Army liaison officer,
attached to the MCC helicopter detachment at Zakhu and assigned to Incirlik AB,
was new on station (he arrived in April 1994) and was not fully aware of the rela-
tionship of the MCC to the OPC mission [5].
Performance was not monitored to ensure that safety-critical activities were
carried out correctly, that local adaptations had not moved operations beyond safe
limits, and that information was being effectively transmitted and procedures fol-
lowed. Effective controls were not established to prevent unsafe adaptations.
The feedback that was provided about the problems at the lower levels was
ignored. For example, the Piper account of the accident includes a reference to
helicopter pilots' testimony that six months before the shootdown, in October 1993,
they had complained that the fighter aircraft were using their radar to lock onto the
Black Hawks an unacceptable number of times. The Army helicopter pilots had
argued there was an urgent need for the Black Hawk pilots to be able to commu-
nicate with the fixed-wing aircraft, but nothing was changed until after the accident,
when new radios were installed in the Black Hawks.
Inaccurate Mental Models: The commander of the Combined Task Force thought
that the appropriate control and coordination was occurring. This incorrect mental
model was supported by the feedback he received flying as a regular passenger on
board the Army helicopter flights, where it was his perception that the AWACS was
monitoring their flight effectively. The Army helicopter pilots were using the Delta
Point system to report their location and flight plans, and there was no indication
from the AWACS that the messages were being ignored. The CTF Commander
testified that he believed the Delta Point system was standard on all AWACS mis-
sions. When asked at the court-martial of the AWACS senior director whether the
AWACS crew were tracking Army helicopters, the OPC Commander replied:
Well, my experience from flying dozens of times on Eagle Flight, which that—for some
eleven hundred and nine days prior to this event, that was—that was normal procedures
for them to flight follow. So, I don't know that they had something written about it, but I
know that it seemed very obvious and clear to me as a passenger on Eagle Flight numer-
ous times that that was occurring. [159]
The commander was also an active F-16 pilot who attended the F-16 briefings. At
these briefings he observed that Black Hawk times were part of the daily ATOs
received by the F-16 pilots and assumed that all squadrons were receiving the same
information. However, as noted, the head of the squadron with which the com-
mander flew had gone out of his way to procure the Black Hawk flight information,
while the F-15 squadron leader had not.
Many of those involved at this level were also under the impression that the
ATOs provided to the F-15 pilots and to the Black Hawks pilots were consistent,
that required information had been distributed to everyone, that official procedures
were understood and being followed, and so on.
Coordination among Multiple Controllers: There were clearly problems with over-
lapping and boundary areas of control between the Army and the Air Force. Coor-
dination problems between the services are legendary and were not handled
adequately here. For example, two different versions of the ATO were provided to
the Air Force and the Army pilots. The Air Force F-15s and the Army helicopters
had separate control structures, with a common control point fairly high above the
physical process. The problems were complicated by the differing importance of
flexibility in flight plans between the two services. One symptom of the problem
was that there was no requirement for helicopters to file detailed flight plans and
follow them and no procedures established to deal with last minute changes. These
deficiencies were also related to the shared control of helicopters by MCC and
CFAC and complicated by the physical separation of the two headquarters.
During the accident investigation, a question was raised about whether the Com-
bined Task Force Chief of Staff was responsible for the breakdown in staff com-
munication. After reviewing the evidence, the hearing officer recommended that no
adverse action be taken against the Chief of Staff because he (1) had focused his
attention according to the CTF Commander's direction, (2) had neither specific
direction nor specific reason to inquire into the transmission of information between his
Director of Operations for Plans and Policy and the CFAC, (3) had been the most
recent arrival and the only senior Army member of a predominantly Air Force staff
and therefore generally unfamiliar with air operations, and (4) had relied on expe-
rienced colonels under whom the deficiencies had occurred [200]. This conclusion
was obviously influenced by the goal of trying to establish blame. Ignoring the blame
aspects, the conclusion gives the impression that nobody was in charge and everyone
thought someone else was.
According to the official accident report, the contents of the ACO largely reflected
the guidance given in the operations plan dated September 7, 1991. But that was the
plan provided before the mission had changed. The accident report concludes that
key CTF personnel at the time of the accident were either unaware of the existence
of this particular plan or considered it too outdated to be applicable. The accident
report states, “Most key personnel within the CFAC and CTF staff did not consider
coordination of MCC helicopter activities to be part of their respective CFAC/CTF
responsibilities” [5].
Because of the breakdown of clear guidance from the Combined Task Force staff
to its component organizations (CFAC and MCC), they did not have a clear under-
standing of their respective responsibilities. Consequently, MCC helicopter activities
were not fully integrated with other OPC air operations in the TAOR.
section 5.4.
Conclusions from the Friendly Fire Example.
When looking only at the proximate events and the behavior of the immediate
participants in the accidental shootdown, the reasons for this accident appear to be
gross mistakes by the technical system operators (the pilots and AWACS crew). In
fact, a special Air Force task force composed of more than 120 people in six com-
mands concluded that two breakdowns in individual performance contributed to
the shootdown: (1) the AWACS mission crew did not provide the F-15 pilots an
accurate picture of the situation and (2) the F-15 pilots misidentified the target.
From the twenty-one-volume accident report produced by the Accident Investiga-
tion Board, Secretary of Defense William Perry summarized the “errors, omissions,
and failures” in the “chain of events” leading to the loss as:
1.• The F-15 pilots misidentified the helicopters as Iraqi Hinds.
2.• The AWACS crew failed to intervene.
3.• The helicopters and their operations were not integrated into the Task Force
running the no-fly zone operations.
4.• The Identity Friend or Foe (IFF) systems failed.
According to Snook, the military community has generally accepted these four
“causes” as the explanation for the shootdown.
While there certainly were mistakes made at the pilot and AWACS levels, the
use of the STAMP analysis provides a much more complete explanation of the role of
the environment and other factors that influenced their behavior, including: incon-
sistent, missing, or inaccurate information; incompatible technology; inadequate
coordination; overlapping areas of control and confusion about who was responsible
for what; a migration toward more efficient but less safe operational procedures
over time without any controls and checks on the potential adaptations; inadequate
training; and in general a control structure that did not enforce the safety constraints.
Boiling down this very complex accident to four “causes” and assigning blame in
this way inhibits learning from the events. The more complete STAMP analysis was
possible only because individuals outside the military, some of whom were relatives
of the victims, did not accept the simple analysis provided in the accident report and
did their own uncovering of the facts.
STAMP views an accident as a dynamic process. In this case, Army and Air Force
operations adapted and diverged without communication and coordination. OPC
had operated incident-free for over three years at the time of the shootdown. During
that time, local adaptations to compensate for inadequate control from above had
managed to mask the ongoing problems until a situation occurred where local
adaptations did not work. A lack of awareness at the highest levels of command of
the severity of the coordination, communication, and other problems is a key factor
in this accident.
Nearly all the types of causal factors identified in section 4.5 can be found in this
accident. This fact is not an anomaly: Most accidents involve a large number of these
factors. Concentrating on an event chain focuses attention on the proximate events
associated with the accident and thus on the principal local actors, in this case, the
pilots and the AWACS personnel. Treating an accident as a control problem using
STAMP clearly identifies other organizational factors and actors and the role they
played. Most important, without this broader view of the accident, only the symp-
toms of the organizational problems may be identified and eliminated without
significantly reducing the risk of a future accident caused by the same systemic factors
but involving different symptoms at the lower technical and operational levels of
the control structure.
More information on how to build multiple views of an accident using STAMP
in order to aid understanding can be found in chapter 11. More examples of STAMP
accident analyses can be found in the appendixes.

2157
chapter05.txt Normal file

File diff suppressed because it is too large Load Diff

349
chapter06.raw Normal file
View File

@@ -0,0 +1,349 @@
part 3. USING STAMP.
STAMP provides a new theoretical foundation for system safety on which new, more
powerful techniques and tools for system safety can be constructed. Part III presents
some practical methods for engineering safer systems. All the techniques described
in part III have been used successfully on real systems. The surprise to those trying
them has been how well they work on enormously complex systems and how eco-
nomical they are to use. Improvements and even more applications of the theory to
practice will undoubtedly be created in the future.
chapter 6.
Engineering and Operating Safer Systems Using
STAMP.
Part III of this book is for those who want to build safer systems without incurring
enormous and perhaps impractical financial, time, and performance costs. The belief
that building and operating safer systems requires such penalties is widespread and
arises from the way safety engineering is usually done today. It need not be the case.
The use of top-down system safety engineering and safety-guided design based on
STAMP can not only enhance the safety of these systems but also potentially reduce
the costs associated with engineering for safety. This chapter provides an overview,
while the chapters following it provide details about how to implement this cost-
effective safety process.
section 6.1.
Why Are Safety Efforts Sometimes Not Cost-Effective?
While there are certainly some very effective safety engineering programs, too
many expend a large amount of resources with little return on the investment in
terms of improved safety. To fix a problem, we first need to understand it. Why are
safety efforts sometimes not cost-effective? There are five general answers to this
question:
1. Safety efforts may be superficial, isolated, or misdirected.
2. Safety activities often start too late.
3. The techniques used are not appropriate for the systems we are building today
and for new technology.
4. Efforts may be narrowly focused on the technical components.
5. Systems are usually assumed to be static throughout their lifetime.
Superficial, isolated, or misdirected safety engineering activities: Often, safety
engineering consists of performing a lot of very costly and tedious activities of
limited usefulness in improving safety in the final system design. Childs calls this
“cosmetic system safety” [37]. Detailed hazard logs are created and analyses
performed, but these have limited impact on the actual system design. Numbers are
associated with unquantifiable properties. These numbers always seem to support
whatever numerical requirement is the goal, and all involved feel as if they have
done their jobs. The safety analyses provide the answer the customer or designer
wants—that the system is safe—and everyone is happy. Haddon-Cave, in the 2009
Nimrod MR2 accident report, called such efforts compliance-only exercises [78]. The
results impact certification of the system or acceptance by management, but despite
all the activity and large amounts of money spent, the safety of the system has been
unaffected.
A variant of this problem is that safety activities may be isolated from the engi-
neers and developers building the system. Too often, safety professionals are sepa-
rated from engineering design and placed within a mission assurance organization.
Safety cannot be assured without its already being part of the design; systems must
be constructed to be safe from the beginning. Separating safety engineering from
design engineering is almost guaranteed to make the effort and resources expended
a poor investment. Safety engineering is effective when it participates in and pro-
vides input to the design process, not when it focuses on making arguments about
the artifacts created after the major safety-related decisions have been made.
Sometimes the major focus of the safety engineering efforts is on creating a safety
case that proves the completed design is safe, often by showing that a particular
process was followed during development. Simply following a process does not
mean that the process was effective, which is the basic limitation of many process
assurance activities. In other cases the arguments go beyond the process, but they
start from the assumption that the system is safe and then focus on showing the
conclusion is true. Most of the effort is spent in seeking evidence that shows the
system is safe while not looking for evidence that the system is not safe. The basic
mindset is wrong, so the conclusions are biased.
One of the reasons System Safety has been so successful is that it takes the oppo-
site approach: an attempt is made to show that the system is unsafe and to identify
hazardous scenarios. By using this alternative perspective, paths to hazards are often
identified that were missed by the engineers, who tend to focus on what they want
to happen, not what they do not want to happen.
If safety-guided design, as defined in part III of this book, is used, the “safety
case” is created along with the design. Developing the certification argument
becomes trivial and consists primarily of simply gathering the documentation that
has been created during the development process.
Safety efforts start too late: Unlike the examples of ineffective safety activities
above, the safety efforts may involve potentially useful activities, but they may start
too late. Frola and Miller claim that 70 to 80 percent of the most critical decisions
related to the safety of the completed system are made during early concept devel-
opment [70]. Unless the safety engineering effort impacts these decisions, it is
unlikely to have much effect on safety. Too often, safety engineers are busy doing
safety analyses, while the system engineers are in parallel making critical decisions
about system design and concepts of operation that are not based on that hazard
analysis. By the time the system engineers get the information generated by the
safety engineers, it is too late to have a significant impact on design decisions.
Of course, engineers normally do try to consider safety early, but the information
commonly available is only whether a particular function is safety-critical or not.
They are told that the function they are designing can contribute to an accident,
with perhaps some letter or numerical “score” of how critical it is, but not much else.
Armed only with this very limited information, they have no choice but to focus
safety design efforts on increasing the component's reliability by adding redundancy
or safety margins. These features are often added without careful analysis of whether
they are needed or will be effective for the specific hazards related to that system
function. The design then becomes expensive to build and maintain without neces-
sarily having the maximum possible (or sometimes any) impact on eliminating
or reducing hazards. As argued earlier, redundancy and overdesign, such as building
in safety margins, are effective primarily for purely electromechanical components
and component failure accidents. They do not apply to software and miss component
interaction accidents entirely. In some cases, such design techniques can even
contribute to component interaction accidents when they add to the complexity of
the design.
Most of our current safety engineering techniques start from detailed designs. So
even if they are conscientiously applied, they are useful only in evaluating the safety
of a completed design, not in guiding the decisions made early in the design creation
process. One of the results of evaluating designs after they are created is that engi-
neers are confronted with important safety concerns only after it is too late or too
expensive to make significant changes. If and when the system and component
design engineers get the results of the safety activities, often in the form of a critique
of the design late in the development process, the safety concerns are frequently
ignored or argued away because changing the design at that time is too costly.
Design reviews then turn into contentious exercises where one side argues that the
system has serious safety limitations while the other side argues that those limita-
tions do not exist, they are not serious, or the safety analysis is wrong.
The problem is not a lack of concern by designers; it's simply that safety concerns
about their design are raised at a time when major design changes are not possible—
the design engineers have no other option than to defend the design they have.
If they lose that argument, then they must try to patch the current design; starting
over with a safer design is, in almost all cases, impractical. If the designers had the
information necessary to factor safety into their early decision making, then the
process of creating safer designs need cost no more and, in fact, will cost less due
to two factors: (1) reduced rework after the decisions made are found to be flawed
or to provide inadequate safety and (2) less unnecessary overdesign and unneeded
protection.
The key to having a cost-effective safety effort is to embed it into a system
engineering process starting from early concept development and then to design
safety into the system as the design decisions are made. Costs are much less when
safety is built into the system design from the beginning rather than added on or
retrofitted later.
The techniques used are not appropriate for today's systems and new technol-
ogy: The assumptions of the major safety engineering techniques currently used,
almost all of which stem from decades past, do not match the assumptions underlying
the technology and complexity of the systems being built today or the new emerging
causes of accidents: They do not apply to human or software errors or flawed man-
agement decision making, and they certainly do not apply to weaknesses in the
organizational structure or social infrastructure systems. These contributors to acci-
dents do not “fail” in the same way assumed by the current safety analysis tools.
But with no other tools to use, safety engineers attempt to force square pegs into
round holes, hoping this will be sufficient. As a result, nothing much is accomplished
beyond expending time, money, and other resources. It's time we face up to the fact
that new safety engineering techniques are needed to handle those aspects of
systems that go beyond the analog hardware components and the relatively simple
designs of the past for which the current techniques were invented. Chapter 8
describes a new hazard analysis technique based on STAMP, called STPA, but others
are possible. The important thing is to confront these problems head on and not
ignore them and waste our time misapplying or futilely trying to extend techniques
that do not apply to today's systems.
The safety efforts are focused on the technical components of the system: Many
safety engineering (and system engineering, for that matter) efforts focus on the
technical system details. Little effort is made to consider the social, organizational,
and human components of the system in the design process. Assumptions are made
that operators will be trained to do the right things and that they will adapt to
whatever design they are given. Sophisticated human factors and system analysis
input is lacking, and when accidents inevitably result, they are blamed on the opera-
tors for not behaving the way the designers thought they would. To give just one
example (although most accident reports contain such examples), one of the four
causes, all of which cited pilot error, identified in the loss of the American Airlines
B757 near Cali, Colombia (see chapter 2), was “Failure of the flight crew to revert
to basic radio navigation when the FMS-assisted navigation became confusing and
demanded an excessive workload in a critical phase of the flight.” A more useful
alternative statement of the cause might have been “An FMS system that confused
the operators and demanded an excessive workload in a critical phase of flight.”
Virtually all systems contain humans, but engineers are often not taught much
about human factors and draw convenient boundaries around the technical com-
ponents, focusing their attention inside these artificial boundaries. Human factors
experts have complained about the resulting technology-centered automation [208],
where the designers focus on technical issues and not on supporting operator tasks.
The result is what has been called “clumsy” automation that increases the chance
of human error [183, 22, 208]. One of the new assumptions for safety in chapter 2
is that operator “error” is a product of the environment in which it occurs.
A variant of the problem is common in systems using information technology.
Many medical information systems, for example, have not been as successful as they
might have been in increasing safety and have even led to new types of hazards and
losses [104, 140]. Often, little effort is invested during development in considering
the usability of the system by medical professionals or of the impact, not always
positive, that the information system design will have on workflow and on the
practice of medicine.
Automation is commonly assumed to be safer than manual systems because
the hazards associated with the manual systems are eliminated. Inadequate con-
sideration is given to whether new, and maybe even worse, hazards are introduced
by the automated system and how to prevent or minimize these new hazards. The
aviation industry has, for the most part, learned this lesson for cockpit and flight
control design, where eliminating errors of commission simply created new errors
of omission [181, 182] (see chapter 9), but most other industries are far behind in
this respect.
Like other safety-related system properties that are ignored until too late, opera-
tors and human-factors experts often are not brought into the early design process
or they work in isolation from the designers until changes are extremely expensive
to make. Sometimes, human factors design is not considered until after an accident,
and occasionally not even then, almost guaranteeing that more accidents will occur.
To provide cost-effective safety engineering, the system and safety analysis
and design process needs to consider the humans in systems—including those that
are not directly controlling the physical processes—not separately or after the fact
but starting at concept development and continuing throughout the life cycle of
the system.
Systems are assumed to be static throughout their lifetimes: It is rare for engi-
neers to consider how the system will evolve and change over time. While designing
for maintainability may be considered, unintended changes are often ignored.
Change is a constant for all systems: physical equipment ages and degrades over
its lifetime and may not be maintained properly; human behavior and priorities
usually change over time; organizations change and evolve, which means the safety
control structure itself will evolve. Change may also occur in the physical and social
environment within which the system operates and with which it interacts. To be
effective, controls need to be designed that will reduce the risk associated with all
these types of changes. Not only are accidents expensive, but once again planning
for system change can reduce the costs associated with the change itself. In addition,
much of the effort in operations needs to be focused on managing and reacting
to change.
section 6.2.
The Role of System Engineering in Safety.
As the systems we build and operate increase in size and complexity, the use of
sophisticated system engineering approaches becomes more critical. Important
system-level (emergent) properties, such as safety, must be built into the design of
these systems; they cannot be effectively added on or simply measured afterward.
While system engineering was developed originally for technical systems, the
approach is just as important and applicable to social systems or the social compo-
nents of systems that are usually not thought of as “engineered.” All systems are
engineered in the sense that they are designed to achieve specific goals, namely to
satisfy requirements and constraints. So ensuring hospital safety or pharmaceutical
safety, for example, while not normally thought of as engineering problems, falls
within the broad definition of engineering. The goal of the system engineering
process is to create a system that satisfies the mission while maintaining the con-
straints on how the mission is achieved.
Engineering is a way of organizing that design process to achieve the most
cost-effective results. Social systems may not have been “designed” in the sense of
a purposeful design process but may have evolved over time. Any effort to change
such systems in order to improve them, however, can be thought of as a redesign or
reengineering process and can again benefit from a system engineering approach.
When using STAMP as the underlying causality model, engineering or reengineer-
ing safer systems means designing (or redesigning) the safety-control structure and
the controls designed into it to ensure the system operates safely, that is, without
unacceptable losses. What is being controlled—chemical manufacturing processes,
spacecraft or aircraft, public health, safety of the food supply, corporate fraud, risks
in the financial system—is irrelevant in terms of the general process, although
significant differences will exist in the types of controls applicable and the design
of those controls. The process, however, is very similar to a regular system engineer-
ing process.
The problem is that most engineering and even many system engineering tech-
niques were developed under conditions and assumptions that do not hold for
complex social systems, as discussed in part I. But STAMP and new system-theoretic
approaches to safety can point the way forward for both complex technical and
social processes. The general engineering and reengineering process described in
part III applies to all systems.
section 6.3.
A System Safety Engineering Process.
In STAMP, accidents and losses result from not enforcing safety constraints on
behavior. Not only must the original system design incorporate appropriate con-
straints to ensure safe operations, but the safety constraints must continue to be
enforced as changes and adaptations to the system design occur over time. This goal
forms the basis for safe management, development, and operations.
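A minimal sketch may make "enforcing a safety constraint on behavior" more concrete. Everything in it is invented for illustration (the constraint, the state variable, and the limit are assumptions, not drawn from any particular system); the point is only that the constraint is checked against every control action rather than documented once and forgotten:

# Hypothetical example of a controller enforcing one safety constraint.
# The constraint, state variable, and limit are invented for illustration.
from dataclasses import dataclass

MAX_SAFE_PRESSURE_KPA = 800.0  # assumed system-level safety constraint

@dataclass
class ProcessState:
    pressure_kpa: float

def issue_inflow_command(state: ProcessState, requested_inflow: float) -> float:
    """Pass the requested action through only if the safety constraint stays enforced."""
    if state.pressure_kpa >= MAX_SAFE_PRESSURE_KPA and requested_inflow > 0.0:
        return 0.0  # the safety constraint overrides the mission request
    return requested_inflow

print(issue_inflow_command(ProcessState(pressure_kpa=825.0), requested_inflow=5.0))  # prints 0.0

The same discipline applies whether the controller is software, a human procedure, or a management review, and it must survive the changes and adaptations described above.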
There is no agreed upon best system engineering process and probably cannot
be one—the process needs to match the specific problem and environment in which
it is being used. What is described in part III of this book is how to integrate system
safety into any reasonable system engineering process. Figure 6.1 shows the three
major components of a cost-effective system safety process: management, develop-
ment, and operations.
section 6.3.1. Management.
Safety starts with management leadership and commitment. Without these, the
efforts of others in the organization are almost doomed to failure. Leadership
creates culture, which drives behavior.
Besides setting the culture through their own behavior, managers need to estab-
lish the organizational safety policy and create a safety control structure with appro-
priate responsibilities, accountability and authority, safety controls, and feedback
channels. Management must also establish a safety management plan and ensure
that a safety information system and continual learning and improvement processes
are in place and effective.
Chapter 13 discusses management's role and responsibilities in safety.
section 6.3.2. Engineering Development.
The key to having a cost-effective safety effort is to embed it into a system engineer-
ing process from the very beginning and to design safety into the system as the
design decisions are made. All viewpoints and system components must be included
in the process and information used and documented in a way that is accessible,
understandable, and helpful.
System engineering starts with first determining the goals of the system. Potential
hazards to be avoided are then identified. From the goals and system hazards, a set
of system functional and safety requirements and constraints are identified that set
the foundation for design, operations, and management. Chapter 7 describes how
to establish these fundamentals.
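As a rough sketch of the bookkeeping this implies (the identifiers and wording below are placeholders, not drawn from any real analysis), each identified hazard can be recorded together with the system-level safety constraint derived from it, and later design decisions can be linked back to both:

# Placeholder example of hazard-to-constraint-to-design traceability.
# All identifiers and text are invented; only the structure matters.
from dataclasses import dataclass, field

@dataclass
class Hazard:
    ident: str
    description: str

@dataclass
class SafetyConstraint:
    ident: str
    text: str
    derived_from: str                       # hazard this constraint traces to
    design_decisions: list[str] = field(default_factory=list)

h1 = Hazard("H-1", "Placeholder hazard: the controlled process violates its minimum safe margin")
sc1 = SafetyConstraint(
    ident="SC-1",
    text="Placeholder constraint: the minimum safe margin must be maintained at all times",
    derived_from=h1.ident,
)

# During design, each decision that helps enforce the constraint is linked to it.
sc1.design_decisions.append("DD-3: placeholder interlock in the control loop")
print(f"{sc1.ident} (from {sc1.derived_from}) implemented by {sc1.design_decisions}")

Keeping these links as data from the start is what later makes the traceability from requirements and constraints down to detailed design features straightforward to document.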
To start safety engineering early enough to be cost-effective, safety must be con-
sidered from the early concept formation stages of development and continue
throughout the life cycle of the system. Design decisions should be guided by safety
considerations while at the same time taking other system requirements and con-
straints into account and resolving conflicts. The hazard analysis techniques used
must not require a completed design and must include all the factors involved
in accidents. Chapter 8 describes a new hazard analysis technique, based on the
STAMP model of causation, that provides the information necessary to design
safety into the system, and chapter 9 shows how to use it in a safety-guided design
process. Chapter 9 also presents general principles for safe design including how to
design systems and system components used by humans that do not contribute to
human error.
Documentation is critical not only for communication in the design and develop-
ment process but also because of inevitable changes over time. That documentation
must include the rationale for the design decisions and traceability from high-level
requirements and constraints down to detailed design features. After the original
system development is finished, the information necessary to operate and maintain
it safely must be passed in a usable form to operators and maintainers. Chapter 10
describes how to integrate safety considerations into specifications and the general
system engineering process.
Engineers have often concentrated more on the technological aspects of system
development while assuming that humans in the system will either adapt to what-
ever is given to them or will be trained to do the “right thing.” When an accident
occurs, it is blamed on the operator. This approach to safety, as argued above, is
one of the reasons safety engineering is not as effective as it could be. The system
design process needs to start by considering the human controller and continuing
that perspective throughout development. The best way to reach that goal is to
involve operators in the design decisions and safety analyses. Operators are
sometimes left out of the conceptual design stages and only brought in later in
development. To design safer systems, operators and maintainers must be included
in the design process starting from the conceptual development stage and con-
siderations of human error and preventing it should be at the forefront of the
design effort.
Many companies, particularly in aerospace, use integrated product teams that
include, among others, design engineers, safety engineers, human factors experts,
potential users of the system (operators), and maintainers. But the development
process used may not necessarily take maximum advantage of this potential for
collaboration. The process outlined in part III tries to do that.
section 6.3.3. Operations.
Once the system is built, it must be operated safely. System engineering creates the
basic information needed to do this in the form of the safety constraints and operat-
ing assumptions upon which the safety of the design was based. These constraints
and assumptions must be passed to operations in a form that they can understand
and use.
Because changes in the physical components, human behavior, and the organiza-
tional safety control structure are almost guaranteed to occur over the life of the
system, operations must manage change in order to ensure that the safety con-
straints are not violated. The requirements for safe operations are discussed in
chapter 12.
It's now time to look at the changes in system engineering, operations, and man-
agement, based on STAMP, that can assist in engineering a safer world.

312
chapter06.txt Normal file
View File

@@ -0,0 +1,312 @@
part 3. USING STAMP.
STAMP provides a new theoretical foundation for system safety on which new, more
powerful techniques and tools for system safety can be constructed. Part 3 presents
some practical methods for engineering safer systems. All the techniques described
in part 3 have been used successfully on real systems. The surprise to those trying
them has been how well they work on enormously complex systems and how economical they are to use. Improvements and even more applications of the theory to
practice will undoubtedly be created in the future.
chapter 6.
Engineering and Operating Safer Systems Using
STAMP.
Part 3 of this book is for those who want to build safer systems without incurring
enormous and perhaps impractical financial, time, and performance costs. The belief
that building and operating safer systems requires such penalties is widespread and
arises from the way safety engineering is usually done today. It need not be the case.
The use of top-down system safety engineering and safety-guided design based on
STAMP can not only enhance the safety of these systems but also potentially reduce
the costs associated with engineering for safety. This chapter provides an overview,
while the chapters following it provide details about how to implement this costeffective safety process.
section 6.1.
Why Are Safety Efforts Sometimes Not Cost-Effective?
While there are certainly some very effective safety engineering programs, too
many expend a large amount of resources with little return on the investment in
terms of improved safety. To fix a problem, we first need to understand it. Why are
safety efforts sometimes not cost-effective? There are five general answers to this
question.
1. Safety efforts may be superficial, isolated, or misdirected.
2. Safety activities often start too late.
3. The techniques used are not appropriate for the systems we are building today
and for new technology.
4. Efforts may be narrowly focused on the technical components.
5. Systems are usually assumed to be static throughout their lifetime.
Superficial, isolated, or misdirected safety engineering activities. Often, safety
engineering consists of performing a lot of very costly and tedious activities of
limited usefulness in improving safety in the final system design. Childs calls this
“cosmetic system safety” . Detailed hazard logs are created and analyses
performed, but these have limited impact on the actual system design. Numbers are
associated with unquantifiable properties. These numbers always seem to support
whatever numerical requirement is the goal, and all involved feel as if they have
done their jobs. The safety analyses provide the answer the customer or designer
wants.that the system is safe.and everyone is happy. Haddon-Cave, in the 2 thousand 9
Nimrod MR2 accident report, called such efforts compliance only exercises . The
results impact certification of the system or acceptance by management, but despite
all the activity and large amounts of money spent, the safety of the system has been
unaffected.
A variant of this problem is that safety activities may be isolated from the engineers and developers building the system. Too often, safety professionals are separated from engineering design and placed within a mission assurance organization.
Safety cannot be assured without its already being part of the design; systems must
be constructed to be safe from the beginning. Separating safety engineering from
design engineering is almost guaranteed to make the effort and resources expended
a poor investment. Safety engineering is effective when it participates in and provides input to the design process, not when it focuses on making arguments about
the artifacts created after the major safety-related decisions have been made.
Sometimes the major focus of the safety engineering efforts is on creating a safety
case that proves the completed design is safe, often by showing that a particular
process was followed during development. Simply following a process does not
mean that the process was effective, which is the basic limitation of many process
assurance activities. In other cases the arguments go beyond the process, but they
start from the assumption that the system is safe and then focus on showing the
conclusion is true. Most of the effort is spent in seeking evidence that shows the
system is safe while not looking for evidence that the system is not safe. The basic
mindset is wrong, so the conclusions are biased.
One of the reasons System Safety has been so successful is that it takes the opposite approach. an attempt is made to show that the system is unsafe and to identify
hazardous scenarios. By using this alternative perspective, paths to hazards are often
identified that were missed by the engineers, who tend to focus on what they want
to happen, not what they do not want to happen.
If safety-guided design, as defined in part 3 of this book, is used, the “safety
case” is created along with the design. Developing the certification argument
becomes trivial and consists primarily of simply gathering the documentation that
has been created during the development process.
Safety efforts start too late. Unlike the examples of ineffective safety activities
above, the safety efforts may involve potentially useful activities, but they may start
too late. Frola and Miller claim that 70 to 80 percent of the most critical decisions
related to the safety of the completed system are made during early concept development . Unless the safety engineering effort impacts these decisions, it is
unlikely to have much effect on safety. Too often, safety engineers are busy doing
safety analyses, while the system engineers are in parallel making critical decisions
about system design and concepts of operation that are not based on that hazard
analysis. By the time the system engineers get the information generated by the
safety engineers, it is too late to have a significant impact on design decisions.
Of course, engineers normally do try to consider safety early, but the information
commonly available is only whether a particular function is safety-critical or not.
They are told that the function they are designing can contribute to an accident,
with perhaps some letter or numerical “score” of how critical it is, but not much else.
Armed only with this very limited information, they have no choice but to focus
safety design efforts on increasing the components reliability by adding redundancy
or safety margins. These features are often added without careful analysis of whether
they are needed or will be effective for the specific hazards related to that system
function. The design then becomes expensive to build and maintain without necessarily having the maximum possible .(or sometimes any). impact on eliminating
or reducing hazards. As argued earlier, redundancy and overdesign, such as building
in safety margins, are effective primarily for purely electromechanical components
and component failure accidents. They do not apply to software and miss component
interaction accidents entirely. In some cases, such design techniques can even
contribute to component interaction accidents when they add to the complexity of
the design.
Most of our current safety engineering techniques start from detailed designs. So
even if they are conscientiously applied, they are useful only in evaluating the safety
of a completed design, not in guiding the decisions made early in the design creation
process. One of the results of evaluating designs after they are created is that engineers are confronted with important safety concerns only after it is too late or too
expensive to make significant changes. If and when the system and component
design engineers get the results of the safety activities, often in the form of a critique
of the design late in the development process, the safety concerns are frequently
ignored or argued away because changing the design at that time is too costly.
Design reviews then turn into contentious exercises where one side argues that the
system has serious safety limitations while the other side argues that those limitations do not exist, they are not serious, or the safety analysis is wrong.
The problem is not a lack of concern by designers; its simply that safety concerns
about their design are raised at a time when major design changes are not possible.
the design engineers have no other option than to defend the design they have.
If they lose that argument, then they must try to patch the current design; starting
over with a safer design is, in almost all cases, impractical. If the designers had the
information necessary to factor safety into their early decision making, then the
process of creating safer designs need cost no more and, in fact, will cost less due
to two factors. .(1). reduced rework after the decisions made are found to be flawed
or to provide inadequate safety and .(2). less unnecessary overdesign and unneeded
protection.
The key to having a cost-effective safety effort is to embed it into a system
engineering process starting from early concept development and then to design
safety into the system as the design decisions are made. Costs are much less when
safety is built into the system design from the beginning rather than added on or
retrofitted later.
The techniques used are not appropriate for todays systems and new technology. The assumptions of the major safety engineering techniques currently used,
almost all of which stem from decades past, do not match the assumptions underlying
the technology and complexity of the systems being built today or the new emerging
causes of accidents. They do not apply to human or software errors or flawed management decision making, and they certainly do not apply to weaknesses in the
organizational structure or social infrastructure systems. These contributors to accidents do not “fail” in the same way assumed by the current safety analysis tools.
But with no other tools to use, safety engineers attempt to force square pegs into
round holes, hoping this will be sufficient. As a result, nothing much is accomplished
beyond expending time, money, and other resources. Its time we face up to the fact
that new safety engineering techniques are needed to handle those aspects of
systems that go beyond the analog hardware components and the relatively simple
designs of the past for which the current techniques were invented. Chapter 8
describes a new hazard analysis technique based on STAMP, called STPA, but others
are possible. The important thing is to confront these problems head on and not
ignore them and waste our time misapplying or futilely trying to extend techniques
that do not apply to todays systems.
The safety efforts are focused on the technical components of the system. Many
safety engineering .(and system engineering, for that matter). efforts focus on the
technical system details. Little effort is made to consider the social, organizational,
and human components of the system in the design process. Assumptions are made
that operators will be trained to do the right things and that they will adapt to
whatever design they are given. Sophisticated human factors and system analysis
input is lacking, and when accidents inevitably result, they are blamed on the operators for not behaving the way the designers thought they would. To give just one
example .(although most accident reports contain such examples), one of the four
causes, all of which cited pilot error, identified in the loss of the American Airlines
B757 near Cali, Colombia .(see chapter 2), was “Failure of the flight crew to revert
to basic radio navigation when the FMS-assisted navigation became confusing and
demanded an excessive workload in a critical phase of the flight.” A more useful
alternative statement of the cause might have been “An FMS system that confused
the operators and demanded an excessive workload in a critical phase of flight.”
Virtually all systems contain humans, but engineers are often not taught much
about human factors and draw convenient boundaries around the technical components, focusing their attention inside these artificial boundaries. Human factors
experts have complained about the resulting technology-centered automation ,
where the designers focus on technical issues and not on supporting operator tasks.
The result is what has been called “clumsy” automation that increases the chance
of human error . One of the new assumptions for safety in chapter 2
is that operator “error” is a product of the environment in which it occurs.
A variant of the problem is common in systems using information technology.
Many medical information systems, for example, have not been as successful as they
might have been in increasing safety and have even led to new types of hazards and
losses . Often, little effort is invested during development in considering
the usability of the system by medical professionals or of the impact, not always
positive, that the information system design will have on workflow and on the
practice of medicine.
Automation is commonly assumed to be safer than manual systems because
the hazards associated with the manual systems are eliminated. Inadequate consideration is given to whether new, and maybe even worse, hazards are introduced
by the automated system and how to prevent or minimize these new hazards. The
aviation industry has, for the most part, learned this lesson for cockpit and flight
control design, where eliminating errors of commission simply created new errors
of omission .(see chapter 9), but most other industries are far behind in
this respect.
Like other safety-related system properties that are ignored until too late, operators and human-factors experts often are not brought into the early design process
or they work in isolation from the designers until changes are extremely expensive
to make. Sometimes, human factors design is not considered until after an accident,
and occasionally not even then, almost guaranteeing that more accidents will occur.
To provide cost-effective safety engineering, the system and safety analysis
and design process needs to consider the humans in systems.including those that
are not directly controlling the physical processes.not separately or after the fact
but starting at concept development and continuing throughout the life cycle of
the system.
Systems are assumed to be static throughout their lifetimes. It is rare for engineers to consider how the system will evolve and change over time. While designing
for maintainability may be considered, unintended changes are often ignored.
Change is a constant for all systems. physical equipment ages and degrades over
its lifetime and may not be maintained properly; human behavior and priorities
usually change over time; organizations change and evolve, which means the safety
control structure itself will evolve. Change may also occur in the physical and social
environment within which the system operates and with which it interacts. To be
effective, controls need to be designed that will reduce the risk associated with all
these types of changes. Not only are accidents expensive, but once again planning
for system change can reduce the costs associated with the change itself. In addition,
much of the effort in operations needs to be focused on managing and reacting
to change.
section 6.2.
The Role of System Engineering in Safety.
As the systems we build and operate increase in size and complexity, the use of
sophisticated system engineering approaches becomes more critical. Important
system-level .(emergent). properties, such as safety, must be built into the design of
these systems; they cannot be effectively added on or simply measured afterward.
While system engineering was developed originally for technical systems, the
approach is just as important and applicable to social systems or the social components of systems that are usually not thought of as “engineered.” All systems are
engineered in the sense that they are designed to achieve specific goals, namely to
satisfy requirements and constraints. So ensuring hospital safety or pharmaceutical
safety, for example, while not normally thought of as engineering problems, falls
within the broad definition of engineering. The goal of the system engineering
process is to create a system that satisfies the mission while maintaining the constraints on how the mission is achieved.
Engineering is a way of organizing that design process to achieve the most
cost-effective results. Social systems may not have been “designed” in the sense of
a purposeful design process but may have evolved over time. Any effort to change
such systems in order to improve them, however, can be thought of as a redesign or
reengineering process and can again benefit from a system engineering approach.
When using STAMP as the underlying causality model, engineering or reengineering safer systems means designing .(or redesigning). the safety-control structure and
the controls designed into it to ensure the system operates safely, that is, without
unacceptable losses. What is being controlled.chemical manufacturing processes,
spacecraft or aircraft, public health, safety of the food supply, corporate fraud, risks
in the financial system.is irrelevant in terms of the general process, although
significant differences will exist in the types of controls applicable and the design
of those controls. The process, however, is very similar to a regular system engineering process.
The problem is that most engineering and even many system engineering techniques were developed under conditions and assumptions that do not hold for
complex social systems, as discussed in part I. But STAMP and new system-theoretic
approaches to safety can point the way forward for both complex technical and
social processes. The general engineering and reengineering process described in
part 3 applies to all systems.
section 6.3.
A System Safety Engineering Process.
In STAMP, accidents and losses result from not enforcing safety constraints on
behavior. Not only must the original system design incorporate appropriate constraints to ensure safe operations, but the safety constraints must continue to be
enforced as changes and adaptations to the system design occur over time. This goal
forms the basis for safe management, development, and operations.
There is no agreed-upon best system engineering process, and there probably cannot be one; the process needs to match the specific problem and environment in which it is being used. What is described in part III of this book is how to integrate system safety into any reasonable system engineering process. Figure 6.1 shows the three major components of a cost-effective system safety process: management, development, and operations.
section 6.3.1. Management.
Safety starts with management leadership and commitment. Without these, the
efforts of others in the organization are almost doomed to failure. Leadership
creates culture, which drives behavior.
Besides setting the culture through their own behavior, managers need to establish the organizational safety policy and create a safety control structure with appropriate responsibilities, accountability and authority, safety controls, and feedback
channels. Management must also establish a safety management plan and ensure
that a safety information system and continual learning and improvement processes
are in place and effective.
Chapter 13 discusses management's role and responsibilities in safety.
section 6.3.2. Engineering Development.
The key to having a cost-effective safety effort is to embed it into a system engineering process from the very beginning and to design safety into the system as the
design decisions are made. All viewpoints and system components must be included in the process, and the information must be used and documented in a way that is accessible,
understandable, and helpful.
System engineering starts by determining the goals of the system. Potential hazards to be avoided are then identified. From the goals and system hazards, a set of system functional and safety requirements and constraints is identified that sets
the foundation for design, operations, and management. Chapter 7 describes how
to establish these fundamentals.
To start safety engineering early enough to be cost-effective, safety must be considered from the early concept formation stages of development and continue
throughout the life cycle of the system. Design decisions should be guided by safety
considerations while at the same time taking other system requirements and constraints into account and resolving conflicts. The hazard analysis techniques used
must not require a completed design and must include all the factors involved
in accidents. Chapter 8 describes a new hazard analysis technique, based on the
STAMP model of causation, that provides the information necessary to design
safety into the system, and chapter 9 shows how to use it in a safety-guided design
process. Chapter 9 also presents general principles for safe design including how to
design systems and system components used by humans that do not contribute to
human error.
Documentation is critical not only for communication in the design and development process but also because of inevitable changes over time. That documentation
must include the rationale for the design decisions and traceability from high-level
requirements and constraints down to detailed design features. After the original
system development is finished, the information necessary to operate and maintain
it safely must be passed in a usable form to operators and maintainers. Chapter 10
describes how to integrate safety considerations into specifications and the general
system engineering process.
Engineers have often concentrated more on the technological aspects of system
development while assuming that humans in the system will either adapt to whatever is given to them or will be trained to do the “right thing.” When an accident
occurs, it is blamed on the operator. This approach to safety, as argued above, is
one of the reasons safety engineering is not as effective as it could be. The system
design process needs to start by considering the human controller and to maintain that perspective throughout development. The best way to reach that goal is to
involve operators in the design decisions and safety analyses. Operators are
sometimes left out of the conceptual design stages and only brought in later in
development. To design safer systems, operators and maintainers must be included
in the design process starting from the conceptual development stage, and considerations of human error and how to prevent it should be at the forefront of the
design effort.
Many companies, particularly in aerospace, use integrated product teams that
include, among others, design engineers, safety engineers, human factors experts,
potential users of the system (operators), and maintainers. But the development
process used may not necessarily take maximum advantage of this potential for
collaboration. The process outlined in part III tries to do that.
section 6.3.3. Operations.
Once the system is built, it must be operated safely. System engineering creates the
basic information needed to do this in the form of the safety constraints and operating assumptions upon which the safety of the design was based. These constraints
and assumptions must be passed to operations in a form that they can understand
and use.
Because changes in the physical components, human behavior, and the organizational safety control structure are almost guaranteed to occur over the life of the
system, operations must manage change in order to ensure that the safety constraints are not violated. The requirements for safe operations are discussed in
chapter 12.
It's now time to look at the changes in system engineering, operations, and management, based on STAMP, that can assist in engineering a safer world.
cleanfile Executable file
View File

@@ -0,0 +1,13 @@
#!/bin/bash
# Build a sed program from the tab-separated pattern/replacement pairs in the
# "replacements" file, then apply it to the input file ($1) and write to $2.
SED=$(
while IFS=$'\t' read -r -a myArray
do
	echo -ne "s_${myArray[0]}_${myArray[1]}_g;\n"
done < replacements
)
# Echo the generated sed program for debugging, then run it, also joining
# words that were hyphenated across a line break before writing the output.
echo sed -e "$SED"
sed -e "$SED" "$1" | sed -z 's_-\n__g' > "$2"
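
For reference, a minimal usage sketch (the chapter filenames below are hypothetical; the script itself only assumes a tab-separated replacements file in the working directory and an input/output pair on the command line):

# Given replacements entries such as (<TAB> marks a literal tab):
#   AWACS<TAB>A Wacks
#   B757<TAB>B 7 57
# the while loop above generates a sed program of the form:
#   s_AWACS_A Wacks_g;
#   s_B757_B 7 57_g;
# which is applied to the raw chapter text before hyphenated line breaks are
# joined and the result written to the output file:
./cleanfile 06.raw 06.txt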

View File

@@ -1,15 +1,13 @@
 :	.
 —	.
-\[.+\]
--\n
-19(\d\d)	19 $1
-200(\d)	2 thousand $1
-20(\d\d)	20 $1
-\(	.(
-\)	).
+\[.\\+\]
+(	.(
+)	).
+HQ-II	H Q-2
 III	3
 II	2
 IV	4
+AWACS	A Wacks
 ASO	A S O
 PRA	P R A
 HMO	H M O
@@ -28,7 +26,6 @@
 CFAC	C FACK
 DO	D O
 GAO	GAOW
-HQ-II	H Q-2
 IFF	I F F
 JOIC	J O I C
 JSOC	J SOCK
@@ -45,3 +42,7 @@
 TAOR	T A O R
 USCINCEUR	U S C in E U R
 WD	W D
+19\\([[:digit:]][[:digit:]]\\)	19 \\1
+200\\([[:digit:]]\\)	2 thousand \1
+20\\([[:digit:]][[:digit:]]\\)	20 \1
+B757	B 7 57
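
The year rules moved to the end of the file are rewritten with POSIX bracket expressions and backslash-escaped groups, presumably because the sed program generated by cleanfile uses basic regular expressions, where Perl-style \d and $1 have no special meaning (the echo -ne in cleanfile reduces the doubled backslashes to single ones before sed sees them). A quick sanity check of the new form, shown here only as an illustration and not part of the commit:

echo "in 1994 the" | sed -e 's_19\([[:digit:]][[:digit:]]\)_19 \1_g'
# prints: in 19 94 the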