
chore: add ch 5-6

xuu 2025-03-15 21:11:29 -06:00
parent ff069b52c4
commit cbb5561fc6
Signed by: xuu
GPG Key ID: 8B3B0604F164E04F
7 changed files with 3861 additions and 51 deletions

View File

@@ -1,13 +1,14 @@
 PATH:=./piper:$(PATH)
+TXT_FILES := $(patsubst %.raw,%.txt,$(wildcard *.raw))
 WAV_FILES := $(patsubst %.txt,%.wav,$(wildcard *.txt))
 MP3_FILES := $(patsubst %.txt,%.mp3,$(wildcard *.txt))
 MODEL=en_GB-alan-medium.onnx
 CONFIG=en_GB-alan-medium.onnx.json
-complete: $(MP3_FILES)
+complete: $(TXT_FILES) $(MP3_FILES)
 	echo $@ $^
 $(WAV_FILES): %.wav: %.txt
@@ -17,6 +18,9 @@ $(WAV_FILES): %.wav: %.txt
 $(MP3_FILES): %.mp3: %.wav
 	ffmpeg -y -i $^ $@
+$(TXT_FILES): %.txt: %.raw
+	./cleanfile $^ $@
 install:
 	wget -O piper.tar "https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz"
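A note on the updated pipeline (a sketch of assumed usage, not part of the commit): the new rule chains each chapter from a raw dump through cleaning, speech synthesis, and encoding (%.raw to %.txt via ./cleanfile; %.txt to %.wav, presumably via piper with the MODEL voice, since that rule's body is outside these hunks; %.wav to %.mp3 via ffmpeg). The chapter file names below are hypothetical.

# one-time setup: fetch the piper 1.2.0 release into ./piper
make install
# place raw chapter dumps (hypothetical names) next to the Makefile, e.g. ch05.raw, ch06.raw
# first pass runs ./cleanfile to produce the .txt files
make complete
# the WAV/MP3 lists are expanded with $(wildcard *.txt) when make starts, so a
# second pass is needed to pick up the new .txt files and produce .wav and .mp3
make complete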

View File

@@ -427,12 +427,12 @@ responsibility for a specific task:
-The enroute controller controlled the flow of OPC aircraft to and from the
+1.The enroute controller controlled the flow of OPC aircraft to and from the
 TAOR. This person also conducted radio and IFF checks on friendly aircraft
 outside the TAOR.
-The TAOR controller provided threat warning and tactical control for all
+2.The TAOR controller provided threat warning and tactical control for all
 OPC aircraft within the TAOR.
-The tanker controller coordinated all air refueling operations (and played no
+3.The tanker controller coordinated all air refueling operations (and played no
 part in the accident so is not mentioned further).
 To facilitate communication and coordination, the SD's console was physically
 located in the “pit” right between the MCC and the ACE (Airborne Command
@@ -1422,4 +1422,978 @@ with the AWACS crew that helicopter activities were not an integral part of OPC
 air operations. In testimony after the accident, the ACE commented, “The way I
 understand it, only as a courtesy does the AWACS track Eagle Flight.”
The Mission Director and ACE also did not have the information necessary to
exercise their responsibility. The ACE had an inaccurate model of where the Black
Hawks were located in the airspace. He testified that he presumed the Black Hawks
were conducting standard operations in the Security Zone and had landed [159].
He also testified that, although he had a radarscope, he had no knowledge of
AWACS radar symbology: “I have no idea what those little blips mean.” The Mission
Director, on the ground, was dependent on the information about the current air-
space state sent down from the AWACS via JTIDS (the Joint Tactical Information
Distribution System).
The ACE testified that he assumed the F-15 pilots would ask him for guidance
in any situation involving a potentially hostile aircraft, as required by the ROE. The
ACE's and F-15 pilots' mental models of the ROE clearly did not match with respect
to who had the authority to initiate the engagement of unidentified aircraft. The
rules of engagement stated that the ACE was responsible, but some pilots believed
they had authority when an imminent threat was involved. Because of security
concerns, the actual ROE used were not disclosed during the accident investigation,
but, as argued earlier, the slow, low-flying Black Hawks posed no serious threat
to an F-15.
Although the F-15 pilot never contacted the ACE about the engagement, the
ACE did hear the call of the F-15 lead pilot to the TAOR controller. The ACE
testified to the Accident Investigation Board that he did not intervene because
he believed the F-15 pilots were not committed to anything at the visual identi-
fication point, and he had no idea they were going to react so quickly. Since being
assigned to OPC, he said the procedure had been that when the F-15s or other
fighters were investigating aircraft, they would ask for feedback from the ACE.
The ACE and AWACS crew would then try to rummage around and find
out whose aircraft it was and identify it specifically. If they were unsuccessful, the
ACE would then ask the pilots for a visual identification [159]. Thus, the ACE
probably assumed that the F-15 pilots would not fire at the helicopters without
reporting to him first, which they had not done yet. At this point, they had simply
requested an identification by the AWACS traffic controller. According to his
understanding of the ROE, the F-15 pilots would not fire without his approval
unless there was an immediate threat, which there was not. The ACE testified that
he expected to be queried by the F-15 pilots as to what their course of action
should be.
The ACE also testified at one of the hearings:
I really did not know what the radio call “engaged” meant until this morning. I did
not think the pilots were going to pull the trigger and kill those guys. As a previous right
seater in an F-111, I thought “engaged” meant the pilots were going down to do a visual
intercept. [159]
Coordination among Multiple Controllers: Not applicable.
Feedback from Controlled Process: The F-15 lead pilot did not follow the ROE
and report the identified aircraft to the ACE and ask for guidance, although the
ACE did learn about it from the questions the F-15 pilots posed to the controllers
on the AWACS aircraft. The Mission Director got incorrect feedback about the state
of the airspace from JTIDS.
Time Lags: An unusual time lag occurred where the lag was in the controller and
not in one of the other parts of the control loop. The F-15 pilots responded faster
than the ACE (in the AWACS) and Mission Director (on the ground) could issue
appropriate control instructions (as required by the ROE) with regard to the
engagement.
Changes after the Accident.
There were no changes after the accident, although roles were clarified.
section 5.3.5. The AWACS Operators.
This level of the control structure contains more examples of inconsistent mental
models and asynchronous evolution. In addition, this control level provides interest-
ing examples of the adaptation over time of specified procedures to accepted prac-
tice and of coordination problems. There were multiple controllers with confused
and overlapping responsibilities for enforcing different aspects of the safety require-
ments and constraints (figure 5.8). The overlaps and boundary areas in the con-
trolled processes led to serious coordination problems among those responsible for
controlling aircraft in the TAOR.
Context in Which Decisions and Actions Took Place
Safety Requirements and Constraints: The general safety constraint involved in
the accident at this level was to prevent misidentification of aircraft by the pilots
and any friendly fire that might result. More specific requirements and constraints
are shown in figure 5.8.
Controls: Controls included procedures for identifying and tracking aircraft, train-
ing (including simulator missions), briefings, staff controllers, and communication
channels. The senior director and surveillance officer (ASO) provided real-time
oversight of the crew's activities, while the mission crew commander (MCC) coor-
dinated all the activities aboard the AWACS aircraft.
footnote. A similar type of time lag led to the loss of an F-18 when a mechanical failure resulted in inputs
arriving at the computer interface faster than the computer was able to process them.
The Delta Point system, used since the inception of OPC, provided standard code
names for real locations. These code names were used to prevent the enemy, who
might be listening to radio transmissions, from knowing the helicopters' flight plans.
Roles and Responsibilities: The AWACS crew were responsible for identifying,
tracking, and controlling all aircraft enroute to and from the TAOR; for coordinating
air refueling; for providing airborne threat warning and control in the TAOR; and
for providing surveillance, detection and identification of all unknown aircraft.
Individual responsibilities are described in section 5.2.
The staff weapons director (instructor) was permanently assigned to Incirlik. He
did all incoming briefings for new AWACS crews rotating into Incirlik and accom-
panied them on their first mission in the TAOR. The OPC leadership recognized
the potential for some distance to develop between stateside spin-up training and
continuously evolving practice in the TAOR. Therefore, as mentioned earlier, per-
manent staff or instructor personnel flew with each new AWACS crew on their
maiden flight in Turkey. Two of these staff controllers were on the AWACS the day
of the accident to answer any questions that the new crew might have about local
procedures and, as described earlier, to inform them about adaptation of accepted
practice from specified procedures.
The SD had worked as an AWACS controller for five years. This was his fourth
deployment to OPC, his second as an SD, and his sixtieth mission over the Iraqi
TAOR [159]. He worked as an SD more than two hundred days a year and had logged
more than 2,383 hours flying time [191].
The enroute controller, who was responsible for aircraft outside the TAOR, was
a first lieutenant with four years in the Air Force. He had finished AWACS training
two years earlier (May 1992) and had served in the Iraqi TAOR previously [191].
The TAOR controller, who was responsible for controlling all air traffic flying
within the TAOR, was a second lieutenant with more than nine years of service in
the Air Force, but he had just finished controllers' school and had had no previous
deployments outside the continental United States. In fact, he had become mission
ready only two months prior to the incident. This tour was his first in OPC and his
first time as a TAOR controller. He had only controlled as a mission-ready weapons
director on three previous training flights [191] and never in the role of TAOR
controller. AWACS guidance at the time suggested that the most inexperienced
controller be placed in the TAOR position: None of the reports on the accident
provided the reasoning behind this practice.
The air surveillance officer (ASO) was a captain at the time of the shootdown. She
had been mission-ready since October 1992 and was rated as an instructor ASO.
Because the crew's originally assigned ASO was upgrading and could not make it to
Turkey on time, she volunteered to fill in for him. She had already served for five and
a half weeks in OPC at the time of the accident and was completing her third assign-
ment to OPC. She worked as an ASO approximately two hundred days a year [191].
Environmental and Behavior-Shaping Factors: At the time of the shootdown,
shrinking defense budgets were leading to base closings and cuts in the size of the
military. At the same time, a changing political climate, brought about by the fall of
the Soviet Union, demanded significant U.S. military involvement in a series of
operations. The military (including the AWACS crews) were working at a greater
pace than they had ever experienced due to budget cuts, early retirements, force
outs, slowed promotions, deferred maintenance, and delayed fielding of new equip-
ment. All of these factors contributed to poor morale, inadequate training, and high
personnel turnover.
AWACS crews are stationed and trained at Tinker Air Force Base in Oklahoma
and then deployed to locations around the world for rotations lasting approximately
thirty days. Although all but one of the AWACS controllers on the day of the acci-
dent had served previously in the Iraqi no-fly zone, this was their first day working
together and, except for the surveillance officer, the first day of their current rota-
tion. Due to last minute orders, the team got only minimal training, including one
simulator session instead of the two full three-hour sessions required prior to
deploying. In the only session they did have, some of the members of the team were
missing—the ASO, ACE, and MCC were unable to attend—and one was later
replaced: As noted, the ASO originally designated and trained to deploy with this
crew was instead shipped off to a career school at the last minute, and another ASO,
who was just completing a rotation in Turkey, filled in.
The one simulator session they did receive was less than effective, partly because
the computer tape provided by Boeing to drive the exercise was not current (another
instance of asynchronous evolution). For example, the maps were out of date,
and the rules of engagement used were different and much more restrictive than
those currently in force in OPC. No Mode I codes were listed. The list of friendly
participants in OPC did not include UH-60s (Black Hawks) and so on. The second
simulation session was canceled because of a wing exercise.
Because the TAOR area had not yet been sanitized, it was a period of low activ-
ity: At the time, there were still only four aircraft over the no-fly zone—the two
F-15s and the two Black Hawks. AWACS crews are trained and equipped to track
literally hundreds of enemy and friendly aircraft during a high-intensity conflict.
Many accidents occur during periods of low activity when vigilance is reduced com-
pared to periods of higher activity.
The MCC sits with the other two key supervisors (SD and ACE) toward the front
of the aircraft in a three-seat arrangement named the “Pit,” where each has his own
radarscope. The SD is seated to the MCC's left. Surveillance is seated in the rear.
Violations of the no-fly zone had been rare and threats few during the past three
years, so that days flight was expected to be an average one, and the supervisors in
the Pit anticipated just another routine mission [159].
During the initial orbit of the AWACS, the technicians determined that one
of the radar consoles was not operating. According to Snook, this type of problem
was not uncommon, and the AWACS is therefore designed with extra crew positions.
When the enroute controller realized his assigned console was not working properly,
he moved from his normal position between the TAOR and tanker controllers,
to a spare seat directly behind the senior director. This position kept him out of
the view of his supervisor and also eliminated physical contact with the TAOR
controller.
Dysfunctional Interactions among the Controllers
According to the formal procedures, control of aircraft was supposed to be handed
off from the enroute controller to the TAOR controller when the aircraft entered
the TAOR. This handoff did not occur for the Black Hawks, and the TAOR control-
ler was not made aware of the Black Hawks' flight within the TAOR. Snook explains
this communication error as resulting from the radar console failure, which inter-
fered with communication between the TAOR and enroute controllers. But this
explanation does not jibe with the fact that the normal procedure of the enroute
controller was to continue to control helicopters without handing them off to the
TAOR controller, even when the enroute and TAOR controllers were seated in their
usual places next to each other. There may usually have been more informal interac-
tion about aircraft in the area when they were seated next to each other, but there
is no guarantee that such interaction would have occurred even with a different
seating arrangement. Note that the helicopters had been dropped from the radar
screens and the enroute controller had an incorrect mental model of where they
were: He thought they were close to the boundary of the TAOR and was unaware
they had gone deep within it. The enroute controller, therefore, could not have told
the TAOR controller about the true location of the Black Hawks even if they had
been sitting next to each other.
The interaction between the surveillance officer and the senior weapons director
with respect to tracking the helicopter flight on the radar screen involved many dys-
functional interactions. For example, the surveillance officer put an attention arrow
on the senior director's radarscope in an attempt to query him about the lost heli-
copter symbol that was floating, at one point, unattached to any track. The senior
director did not respond to the attention arrow, and it automatically dropped off the
screen after sixty seconds. The helicopter symbol (H) dropped off the radar screen
when the radar and IFF returns from the Black Hawks faded and did not return until
just before the engagement, removing any visual reminder to the AWACS crew that
there were Black Hawks inside the TAOR. The accident investigation did not include
an analysis of the design of the AWACS human-computer interface or how it might
have contributed to the accident, although such an analysis is important in fully
understanding why it made sense for the controllers to act the way they did.
During his court-martial for negligent homicide, the senior director argued that
his radarscope did not identify the helicopters as friendly and that therefore he was
not responsible. When asked why the Black Hawk identification was dropped from
the radarscope, he gave two reasons. First, because it was no longer attached to any
active signal, they assumed the helicopter had landed somewhere. Second, because
the symbol displayed on their scopes was being relayed in real time through a JTIDS
downlink to commanders on the ground, they were very concerned about sending
out an inaccurate picture of the TAOR.
Even if we suspended it, it would not be an accurate picture, because we wouldn't know
for sure if that is where he landed. Or if he landed several minutes earlier, and where
that would be. So, the most accurate thing for us to do at that time, was to drop the
symbology [sic].
Flawed or Inadequate Decision Making and Control Actions.
There were myriad inadequate control actions in this accident, involving each of the
controllers in the AWACS. The AWACS crew work as a team so it is sometimes hard
to trace incorrect decisions to one individual. While from each individual's stand-
point the actions and decisions may have been correct, when put together as a whole
the decisions were incorrect.
The enroute controller never told the Black Hawk pilots to change to the TAOR
frequency that was being monitored by the TAOR controller and did not hand off
control of the Black Hawks to the TAOR controller. The established practice of not
handing off the helicopters had probably evolved over time as a more efficient way
of handling traffic—another instance of asynchronous evolution. Because the heli-
copters were usually only at the very border of the TAOR and spent very little time
there, the overhead of handing them off twice within a short time period was con-
sidered inefficient by the AWACS crews. As a result, the procedures used had
changed over time to the more efficient procedure of keeping them under the
control of the enroute controller. The AWACS crews were not provided with written
guidance or training regarding the control of helicopters within the TAOR, and, in
its absence, they adapted their normal practices for fixed-wing aircraft as best they
could to apply them to helicopters.
In addition to not handing off the helicopters, the enroute controller did not
monitor the course of the Black Hawks while they were in the TAOR (after leaving
Zakhu), did not take note of the flight plan (from Whiskey to Lima), did not alert
the F-15 pilots there were friendly helicopters in the area, did not alert the F-15
pilots before they fired that the helicopters they were targeting were friendly, and
did not tell the Black Hawk pilots that they were on the wrong frequency and were
squawking the wrong IFF Mode I code.
The TAOR controller did not monitor the course of the Black Hawks in the
TAOR and did not alert the F-15 pilots before they fired that the helicopters they
were targeting were friendly. None of the controllers warned the F-15 pilots at any
time that there were friendly helicopters in the area nor did they try to stop the
engagement. The accident investigation board found that because Army helicopter
activities were not normally known at the time of the fighter pilots' daily briefings,
normal procedures were for the AWACS crews to receive real-time information
about their activities from the helicopter crews and to relay that information on to
the other aircraft in the area. If this truly was established practice, it clearly did not
occur on that day.
The controllers were supposed to be tracking the helicopters using the Delta
Point system, and the Black Hawk pilots had reported to the enroute controller that
they were traveling from Whiskey to Lima. The enroute controller testified, however,
that he had no idea of the towns to which the code names Whiskey and Lima
referred. After the shootdown, he went in search of the card defining the call signs
and finally found it in the Surveillance Section [159]. Clearly, tracking helicopters
using call signs was not a common practice or the charts would have been closer at
hand. In fact, during the court-martial of the senior director, the defense was unable
to locate any AWACS crewmember at Tinker AFB (where AWACS crews were
stationed and trained) who could testify that he or she had ever used the Delta Point
system [159] although clearly the Black Hawk pilots thought it was being used
because they provided their flight plan using Delta Points.
None of the controllers in the AWACS told the Black Hawk helicopters that
they were squawking the wrong IFF code for the TAOR. Snook cites testimony
from the court-martial of the senior director that posits three related explanations
for this lack of warning: (1) the minimum communication (min comm) policy, (2) a
belief by the AWACS crew that the Black Hawks should know what they were
doing, and (3) pilots not liking to be told what to do. None of these explanations
provided during the trial is very satisfactory; they appear to be after-the-fact ratio-
nalizations for the controllers not doing their job when faced with possible court-
martial and jail terms. Given that the controllers acknowledged that the Army
helicopters never squawked the right codes and had not done so for months, there
must have been other communication channels that could have been used besides
real-time radio communication to remedy this situation, so the min comm policy is
not an adequate explanation. Arguing that the pilots should know what they were
doing is simply an abdication of responsibility, as is the argument that pilots did not
like being told what to do. A different perspective, and one that likely applies to all
the controllers, was provided by the staff weapons director, who testified, “For a
helicopter, if he's going to Zakhu, I'm not that concerned about him going beyond
that. So, I'm not really concerned about having an F-15 needing to identify this
guy.” [159]
The mission crew commander had provided the crew's morning briefing. He
spent some time going over the activity flowsheet, which listed all the friendly air-
craft flying in the OPC that day, their call signs, and the times they were scheduled
to enter the TAOR. According to Piper (but nobody else mentions it), he failed to
note the helicopters, even though their call signs and their IFF information had been
written on the margin of his flowsheet.
The shadow crew always flew with new crews on their first day in OPC, but the
task of these instructors does not seem to have been well defined. At the time of
the shootdown, one was in the galley “taking a break,” and the other went back to
the crew rest area, read a book, and took a nap. The staff weapons director, who was
asleep in the back of the AWACS, testified during the court-martial of the senior
director that his purpose on the mission was to be the “answer man,” just to answer
any questions they might have. This was a period of very little activity in the area
(only the two F-15s were supposed to be in the TAOR), and the shadow crew
members may have thought their advice was not needed at that time.
When the staff weapons director went back to the rest area, the only symbol
displayed on the scopes of the AWACS controllers was the one for the helicopters
(EE01), which they thought were going to Zakhu only.
Because many of the dysfunctional actions of the crew did conform to the estab-
lished practice (e.g., not handing off helicopters to the TAOR controller), it is
unclear what different result might have occurred if the shadow crew had been in
place. For example, the staff weapons director testified during the hearings and trial
that he had seen helicopters out in the TAOR before, past Zakhu, but he really did
not feel it was necessary to brief crews about the Delta Point system to determine
a helicopter's destination [159].
Reasons for the Flawed Control.
Inadequate Control Algorithms: This level of the accident analysis provides an
interesting example of the difference between prescribed procedures and estab-
lished practice, the adaptation of procedures over time, and migration toward the
boundaries of safe behavior. Because of the many helicopter missions that ran from
Diyarbakir to Zakhu and back, the controllers testified that it did not seem worth
handing them off and switching them over to the TAOR frequency for only a few
minutes. Established practice (keeping the helicopters under the control of the
enroute controller instead of handing them off to the TAOR controller) appeared
to be safe until the day the helicopters' behavior differed from normal, that is, they
stayed longer in the TAOR and ventured beyond a few miles inside the boundaries.
Established practice no longer assured safety under these conditions. A complicat-
ing factor in the accident was the universal misunderstanding of each of the control-
lers' responsibilities with respect to tracking Army helicopters.
Snook suggests that the min comm norm contributed to the AWACS crews'
general reluctance to enforce rules, contributed to AWACS not correcting Eagle
Flight's improper Mode I code, and discouraged controllers from pushing helicopter
pilots to the TAOR frequency when they entered Iraq because they were reluctant
to say more than absolutely necessary.
According to Snook, there were also no explicit or written procedures regarding
the control of helicopters. He states that radio contact with helicopters was lost
frequently, but there were no procedures to follow when this occurred. In contrast,
Piper claims the AWACS operations manual says:
Helicopters are a high interest track and should be hard copied every five minutes in
Turkey and every two minutes in Iraq. These coordinates should be recorded in a special
log book, because radar contact with helicopters is lost and the radar symbology [sic] can
be suspended. [159]
There is no information in the publicly available parts of the accident report about
any special logbook or whether such a procedure was normally followed.
footnote. Even if the actions of the shadow crew did not contribute to this particular accident, we can take
advantage of the accident investigation to perform a safety audit on the operation of the system and
identify potential improvements.
Inaccurate and Inconsistent Mental Models: In general, the AWACS crew (and
the ACE) shared the common view that helicopter activities were not an integral
part of OPC air operations. There was also a misunderstanding about which provi-
sions of the ATO applied to Army helicopter activities.
Most of the people involved in the control of the F-15s were unaware of the
presence of the Black Hawks in the TAOR that day, the lone exception perhaps
being the enroute controller who knew they were there but apparently thought
they would stay at the boundaries of the TAOR and thus were far from their actual
location deep within it. The TAOR controller testified that he had never talked to
the Black Hawks: Following their two check-ins with the enroute controller, the
helicopters had remained on the enroute frequency (as was the usual, accepted
practice), even as they flew deep into the TAOR.
The enroute controller, who had been in contact with the Black Hawks, had an
inaccurate model of where the helicopters were. When the Black Hawk pilots origi-
nally reported their takeoff from the Army Military Coordination Center at Zakhu,
they contacted the enroute controller and said they were bound for Lima. The
enroute controller did not know to what city the call sign Lima referred and did not
try to look up this information. Other members of the crew also had inaccurate
models of their responsibilities, as described in the next section. The Black Hawk
pilots clearly thought the AWACS was tracking them and also thought the con-
trollers were using the Delta Point system—otherwise helicopter pilots would not
have provided the route names in that way.
The AWACS crews did not appear to have accurate models of the Black Hawks'
mission and role in OPC. Some of the flawed control actions seem to have resulted
from a mental model that helicopters only went to Zakhu and therefore did not
need to be tracked or to follow the standard TAOR procedures.
As with the pilots and their visual recognition training, the incorrect mental
models may have been at least partially the result of the inadequate AWACS train-
ing the team received.
Coordination among Multiple Controllers: As mentioned earlier, coordination
problems are pervasive in this accident due to overlapping control responsibilities
and confusion about responsibilities in the boundary areas of the controlled process.
Most notably, the helicopters usually operated close to the boundary of the TAOR,
resulting in confusion over who was or should be controlling them.
The official accident report noted a significant amount of confusion within the
AWACS mission crew regarding the tracking responsibilities for helicopters [5]. The
mission crew commander testified that nobody was specifically assigned responsibil-
ity for monitoring helicopter traffic in the no-fly zone and that his crew believed
the helicopters were not included in their orders [159]. The staff weapons director
made a point of not knowing what the Black Hawks do: “It was some kind of a
squirrely mission” [159]. During the court-martial of the senior director, the AWACS
tanker controller testified that in the briefing the crew received upon arrival at
Incirlik, the staff weapons director had said about helicopters flying in the no-fly
zone, “They're there, but don't pay any attention to them.” The enroute controller
testified that the handoff procedures applied only to fighters. “We generally have
no set procedures for any of the helicopters. . . . We never had any [verbal] guidance
[or training] at all on helicopters” [159].
Coordination problems also existed between the activities of the surveillance
personnel and the other controllers. During the investigation of the accident, the
ASO testified that surveillance's responsibility was south of the 36th Parallel, and
the other controllers were responsible for tracking and identifying all aircraft north
of the 36th Parallel. The other controllers suggested that surveillance was respon-
sible for tracking and identifying all unknown aircraft, regardless of location. In fact,
Air Force regulations say that surveillance had tracking responsibility for unknown
and unidentified tracks throughout the TAOR. It is not possible through the
testimony alone, again because of the threat of court-martial, to piece together exactly
what the problem was here, or whether it was simply a migration of normal operations from
specified procedures. At the least, it is clear that there was confusion about who was
in control of what.
One possible explanation for the lack of coordination among controllers at this
level of the hierarchical control structure is that, as suggested by Snook, this particu-
lar group had never trained together as a team [191]. But given the lack of proce-
dures for handling helicopters and the confusion even by experienced controllers
and the staff instructors about responsibilities for handling helicopters, Snook's
explanation is not very convincing. A more plausible explanation is simply a lack of
guidance and delineation of responsibilities by the management level above. And
even if the roles of everyone in such a structure had been well defined originally,
uncontrolled local adaptation to more efficient procedures and asynchronous evolu-
tion of the different parts of the control structure created dysfunctionalities as time
passed. The helicopters and fixed-wing aircraft had separate control structures that
only joined fairly high up on the hierarchy and, as is described in the next section,
there were communication problems between the components at the higher levels
of the control hierarchy, particularly between the Army Military Coordination
Center (MCC) and the Combined Forces Air Component (CFAC) headquarters.
Feedback from the Controlled Process: Signals to the AWACS from the Black
Hawks were inconsistent due to line-of-sight limitations and the mountainous terrain
in which the Black Hawks were flying. The helicopters used the terrain to mask them-
selves from air defense radars, but this terrain masking also caused the radar returns
from the Black Hawks to the AWACS (and to the fighters) to fade at various times.
Time Lags: Important time lags contributed to the accident, such as the delay of
radio reports from the Black Hawk helicopters due to radio signal transmission
problems and their inability to use the TACSAT radios until they had landed. As
with the ACE, the speed with which the F-15 pilots acted also provided the control-
lers with little time to evaluate the situation and respond appropriately.
Changes after the Accident.
Many changes were instituted with respect to AWACS operations after the
accident:
1. Confirmation of a positive IFF Mode IV check was required for all OPC air-
craft prior to their entry into the TAOR.
2. The responsibilities for coordination of air operations were better defined.
3. All AWACS aircrews went through a one-time retraining and recertification
program, and every AWACS crewmember had to be recertified.
4. A plan was produced to reduce the temporary duty of AWACS crews to 120
days a year. In the end, it was decreased from 166 to 135 days per year from
January 1995 to July 1995. The Air Combat Command planned to increase the
number of AWACS crews.
5. AWACS control was required for all TAOR flights.
6. In addition to normal responsibilities, AWACS controllers were required to
specifically maintain radar surveillance of all TAOR airspace and to issue advi-
sory/deconflicting assistance on all operations, including helicopters.
7. The AWACS controllers were required to periodically broadcast friendly heli-
copter locations operating in the TAOR to all aircraft.
Although not mentioned anywhere in the available documentation on the accident,
it seems reasonable that either the AWACS crews started to use the Delta Point
system or the Black Hawk pilots were told not to use it and an alternative means
for transmitting flight plans was mandated.
section 5.3.6. The Higher Levels of Control.
Fully understanding the behavior at any level of the sociotechnical control structure
requires understanding how and why the control at the next higher level allowed
or contributed to the inadequate control at the current level. In this accident, many
of the erroneous decisions and control actions at the lower levels can only be fully
understood by examining this level of control.
Context in Which Decisions and Actions Took Place
Safety Requirements and Constraints Violated: There were many safety con-
straints violated at the higher levels of the control structure—the Military Coordina-
tion Center, Combined Forces Air Component, and CTF commander—and several
people were investigated for potential court-martial and received official letters of
reprimand. These safety constraints include: (1) procedures must be instituted that
delegate appropriate responsibility, specify tasks, and provide effective training
to all those responsible for tracking aircraft and conducting combat operations;
(2) procedures must be consistent or at least complementary for everyone involved
in TAOR airspace operations; (3) performance must be monitored (feedback chan-
nels established) to ensure that safety-critical activities are being carried out cor-
rectly and that local adaptations have not moved operations beyond safe limits;
(4) equipment and procedures must be coordinated between the Air Force and
Army to make sure that communication channels are effective and that asynchro-
nous evolution has not occurred; (5) accurate information about scheduled flights
must be provided to the pilots and the AWACS crews.
Controls: The controls in place included operational orders and plans to designate
roles and responsibilities as well as a management structure, the ACO, coordination
meetings and briefings, a chain of command (OPC commander to mission director
to ACE to pilots), disciplinary actions for those not following the written rules, and
a group (the Joint Operations and Intelligence Center or JOIC) responsible for
ensuring effective communication occurred.
Roles and Responsibilities: The MCC had operational control over the Army
helicopters while the CFAC had operational control over fixed-wing aircraft and
tactical control over all aircraft in the TAOR. The Combined Task Force commander
general (who was above both the CFAC and MCC) had ultimate responsibility for
the coordination of fixed-wing aircraft flights with Army helicopters.
While specific responsibilities of individuals might be considered here in an offi-
cial accident analysis, treating the CFAC and MCC as entities is sufficient for the
purposes of this analysis.
Environmental and Behavior-Shaping Factors: The Air Force operated on a pre-
dictable, well-planned, and tightly executed schedule. Detailed mission packages
were organized weeks and months in advance. Rigid schedules were published and
executed in preplanned packages. In contrast, Army aviators had to react to con-
stantly changing local demands, and they prided themselves on their flexibility [191].
Because of the nature of their missions, exact takeoff times and detailed flight plans
for helicopters were virtually impossible to schedule in advance. They were even
more difficult to execute with much rigor. The Black Hawks' flight plan contained
their scheduled takeoff time, transit routes between Diyarbakir through Gate 1 to
Zakhu, and their return time. Because the Army helicopter crews rarely knew
exactly where they would be going within the TAOR until after they were briefed
at the Military Coordination Center at Zakhu, most flight plans only indicated that
Eagle Flight would be “operating in and around the TAOR.”
The physical separation of the Army Eagle Flight pilots from the CFAC opera-
tions and Air Force pilots at Incirlik contributed to the communication difficulties
that already existed between the services.
Dysfunctional Interactions among Controllers.
Dysfunctional communication at this level of the control structure played a critical
role in the accident. These communication flaws contributed to the coordination
flaws at this level and at the lower levels.
A critical safety constraint to prevent friendly fire requires that the pilots of the
fighter aircraft know who is in the no-fly zone and whether they are supposed
to be there. However, neither the CTF staff nor the Combined Forces Air Compo-
nent staff requested or received timely, detailed flight information on planned
MCC helicopter activities in the TAOR. Consequently, the OPC daily Air Tasking
Order was published with little detailed information regarding U.S. helicopter flight
activities over northern Iraq.
According to the official accident report, specific information on routes of flight
and times of MCC helicopter activity in the TAOR was normally available to the
other OPC participants only when AWACS received it from the helicopter crews
by radio and relayed the information on to the pilots [5]. While those at the higher
levels of control may have thought this relaying of flight information was occurring,
that does not seem to be the case given that the Delta Point system (wherein the
helicopter crews provided the AWACS controllers with their flight plan) was not
used by the AWACS controllers: When the helicopters went beyond Zakhu, the
AWACS controllers did not know their flight plans and therefore could not relay
that information to the fighter pilots and other OPC participants.
The weekly flight schedules the MCC provided to the CFAC staff were not com-
plete enough for planning purposes. While the Air Force could plan their missions
in advance, the different type of Army helicopter missions had to be flexible to react
to daily needs. The MCC daily mission requirements were generally based on the
events of the previous day. A weekly flight schedule was developed and provided
to the CTF staff, but a firm itinerary was usually not available until after the next
day's ATO was published. The weekly schedule was briefed at the CTF staff meet-
ings on Mondays, Wednesdays, and Fridays, but the information was neither detailed
nor firm enough for effective rotary-wing and fixed-wing aircraft coordination and
scheduling purposes [5].
Each daily ATO was published showing several Black Hawk helicopter lines. Of
these, two helicopter lines (two flights of two helicopters each) were listed with call
signs (Eagle 01/02 and Eagle 03/04), mission numbers, IFF Mode II codes, and a
route of flight described only as LLTC (the identifier for Diyarbakir) to TAOR to
LLTC. No information regarding route or duration of flight time within the TAOR
was given on the ATO. Information concerning takeoff time and entry time into the
TAOR was listed as A/R (as required).
Every evening, the MCC at Zakhu provided a situation report (SITREP) to the
JOIC (located at Incirlik), listing the helicopter flights for the following day. The
SITREP did not contain complete flight details and arrived too late to be included
in the next day's ATO. The MCC would call the JOIC the night prior to the sched-
uled mission to “activate” the ATO line. There were, however, no procedures in
place to get the SITREP information from the JOIC to those needing to know it
in CFAC.
After receiving the SITREP, a duty officer in the JOIC would send takeoff times
and gate times (the times the helicopters would enter northern Iraq) to Turkish
operations for approval. Meanwhile, an intelligence representative to the JOIC
consolidated the MCC weekly schedule with the SITREP and used secure intelli-
gence channels to pass this updated information to some of his counterparts in
operational squadrons who had requested it. No procedures existed to pass this
information from the JOIC to those in CFAC with tactical responsibility for the
helicopters (through the ACE and Mission Director) [5]. Because CFAC normally
determined who would fly when, the information channels were designed primarily
for one-way communications outward and downward.
In the specific instance involved in the shootdown, the MCC weekly schedule
was provided on April 8 to the JOIC and thence to the appropriate person in CFAC.
That schedule showed a two-ship, MCC helicopter administrative flight scheduled
for April 14. According to the official accident report, two days before (April 12)
the MCC Commander had requested approval for an April 14 flight outside the
Security Zone from Zakhu to the towns of Irbil and Salah ad Din. The OPC com-
manding general approved the written request on April 13, and the JOIC transmit-
ted the approval to the MCC but apparently the information was not provided to
those responsible for producing the ATO. The April 13 SITREP from MCC listed
the flight as “mission support,” but contained no other details. Note that more informa-
tion was available earlier than normal in this instance, and it could have been
included in the ATO, but the established communication channels and procedures
did not exist to get it to the right places. The MCC weekly schedule update, received
by the JOIC on the evening of April 13 along with the MCC SITREP, gave the
destinations for the mission as Salah ad Din and Irbil. This information was not
passed to CFAC.
Late in the afternoon on April 13, MCC contacted the JOIC duty officer and
activated the ATO line for the mission. A takeoff time of 0520 and a gate time of
0625 were requested. No takeoff time or route of flight beyond Zakhu was specified.
The April 13 SITREP, the weekly flying schedule update, and the ATO-line activa-
tion request were received by the JOIC too late to be briefed during the Wednesday
(April 13) staff meetings. None of the information was passed to the CFAC schedul-
ing shop (which was responsible for distributing last minute changes to the ATO
through various sources such as the Battle Staff Directives, morning briefings, and
so on), to the ground-based Mission Director, or to the ACE on board the AWACS
[5]. Note that this flight was not a routine food and medical supply run, but instead
it carried sixteen high-ranking VIPs and required the personal attention and approval
of the CTF Commander. Yet information about the flight was never communicated
to the people who needed to know about it [191]. That is, the information went up
from the MCC to the CTF staff, but not across from MCC to CFAC nor down from
the CTF staff to CFAC (see figure 5.3).
A second example of a major dysfunctional communication involved the com-
munication of the proper radio frequencies and IFF codes to be used in the TAOR.
About two years before the shootdown, someone in the CFAC staff decided to
change the instructions pertaining to IFF modes and codes. According to Snook, no
one recalled exactly how or why this change occurred. Before the change, all aircraft
squawked a single Mode I code everywhere they flew. After the change, all aircraft
were required to switch to a different Mode I code while flying in the no-fly zone. The
change was communicated through the daily ATO. However, after the accident it was
discovered that the Air Force's version of the ATO was not exactly the same as the
one received electronically by the Army aviators—another instance of asynchronous
evolution and lack of linkup between system components. For at least two years,
there existed two versions of the daily ATO: one printed out directly by the Incirlik
Frag Shop and distributed locally by messenger to all units at Incirlik Air Base, and
a second one transmitted electronically through an Air Force communications center
(the JOIC) to Army helicopter operations at Diyarbakir. The one received by the
Army aviators was identical in all respects to the one distributed by the Frag Shop,
except for the changed Mode I code information contained in the SPINS. The ATO
that Eagle Flight received contained no mention of two Mode I codes [191].
What about the confusion about the proper radio frequency to be used by the
Black Hawks in the TAOR? Piper notes that the Black Hawk pilots were told
to use the enroute frequency while flying in the TAOR. The commander of OPC
testified after the accident that the use by the Black Hawks of the enroute radio
frequency rather than the TAOR frequency had been briefed to him as a safety
measure because the Black Hawk helicopters were not equipped with HAVE
QUICK technology. The ACO (Aircraft Control Order) required the F-15s to use
non-HAVE QUICK mode when talking to specific types of aircraft (such as F-1s)
that, like the Black Hawks, did not have the new technology. The list of non-HQ
aircraft provided to the F-15 pilots, however, for some reason did not include
UH-60s. Apparently the decision was made to have the Black Hawks use the
enroute radio frequency, but this decision was never communicated to those respon-
sible for the F-15 procedures specified in the ACO. Note that a thorough investiga-
tion of the higher levels of control, as is required in a STAMP-based analysis, is
necessary to explain properly the use of the enroute radio frequency by the Black
Hawks. Of the various reports on the shootdown, only Piper notes the fact that an
exception had been made for Army helicopters for safety reasons—the official
accident report, Snook's detailed book on the accident, and the GAO report do not
mention this fact! Piper found out about it from her attendance at the public hear-
ings and trial. This omission of important information from the accident reports is
an interesting example of how incomplete investigation of the higher levels of
control can lead to incorrect causal analysis. In her book, Piper questions why the
Accident Investigation Board, while producing twenty-one volumes of evidence,
never asked the commander of OPC about the radio frequency and other problems
found during the investigation.
Other official exceptions were made for the helicopter operations, such as
allowing them in the Security Zone without AWACS coverage. Using STAMP,
the accident can be understood as a dynamic process where the operations of the
Army and Air Force adapted and diverged without effective communication and
coordination.
Many of the dysfunctional communications and interactions stem from asynchro-
nous evolution of the mission and the operations plan. In response to the evolving
mission in northern Iraq, air assets were increased in September 1991 and a signifi-
cant portion of the ground forces were withdrawn. Although the original organiza-
tional structure of the CTF was modified at this time, the operations plan was not.
In particular, the position of the person who was in charge of communication and
coordination between the MCC and CFAC was eliminated without establishing an
alternative communication channel.
Unsafe asynchronous evolution of the safety control structure can be prevented
by proper documentation of safety constraints, assumptions, and their controls
during system design and checking before changes are made to determine if the
constraints and assumptions are violated by the design. Unintentional changes and
migration of behavior outside the boundaries of safety can be prevented by various
means, including education, identifying and checking leading indicators, and tar-
geted audits. Part III describes ways to prevent asynchronous evolution from leading
to accidents.
Flawed or Inadequate Control Actions.
There were many flawed or missing control actions at this level, including:
1. The Black Hawk pilots were allowed to enter the TAOR without AWACS cover-
age and the F-15 pilots and AWACS crews were not informed about this excep-
tion to the policy. This control problem is an example of the problems of
distributed decision making with other decision makers not being aware of the
decisions of others (see the Zeebrugge example in figure 2.2).
Prior to September 1993, Eagle Flight helicopters flew any time required,
before the fighter sweeps and without fighter coverage, if necessary. After
September 1993, helicopter flights were restricted to the security zone if
AWACS and fighter coverage were not on station. But for the mission on April
14, Eagle Flight requested and received permission to execute their flight
outside the security zone. A CTF policy letter dated September 1993 imple-
mented the following policy for UH-60 helicopter flights supporting the MCC:
“All UH-60 flights into Iraq outside of the security zone require AWACS cover-
age.” Helicopter flights had routinely been flown within the TAOR security
zone without AWACS or fighter coverage and CTF personnel at various levels
were aware of this. MCC personnel were aware of the requirement to have
AWACS coverage for flights outside the security zone and complied with that
requirement. However, the F-15 pilots involved in the accident, relying on the
written guidance in the ACO, believed that no OPC aircraft, fixed or rotary
wing, were allowed to enter the TAOR prior to a fighter sweep [5].
At the same time, the Black Hawks also thought they were operating cor-
rectly. The Army Commander at Zakhu had called the Commander of Opera-
tions, Plans, and Policy for OPC the night before the shootdown and asked to
be able to fly the mission without AWACS coverage. He was told that they must
have AWACS coverage. From the view of the Black Hawk pilots (who had
reported in to the AWACS during the flight and provided their flight plan and
destinations) they were complying and were under AWACS control.
2. Helicopters were not required to file detailed flight plans and follow them.
Effective procedures were not established for communicating last minute
changes or updates to the Army flight plans that had been filed.
3. F-15 pilots were not told to use non-HQ mode for helicopters.
4. No procedures were specified to pass SITREP information to CFAC. Helicop-
ter flight plans were not distributed to CFAC and the F-15 pilots, but they were
given to the F-16 squadrons. Why was one squadron informed, while another
one, located right across the street, was not? F-15s are designed primarily for
air superiority—high altitude aerial combat missions. F-16s, on the other hand,
are all-purpose fighters. Unlike F-15s, which rarely flew low-level missions, it
was common for F-16s to fly low-level missions where they might encounter
the low-flying Army helicopters. As a result, to avoid low-altitude midair colli-
sions, staff officers in F-16 squadrons requested details concerning helicopter
operations from the JOIC, picked them up from the mail pickup point on the
post, and passed them on to the pilots during their daily briefings; F-15 planners
did not [191].
5. Inadequate training on the ROE was provided for new rotators. Piper claims
that OPC personnel did not receive consistent, comprehensive training to
ensure they had a thorough understanding of the rules of engagement and that
many of the aircrews new to OPC questioned the need for the less aggressive
rules of engagement in what had been designated a combat zone [159]. Judging
from these complaints (details can be found in [159]) and incidents involving
F-15 pilots, it appears that the pilots did not fully understand the ROE purpose
or need.
6. Inadequate training was provided to the F-15 pilots on visual identification.
7. Inadequate simulator and spin-up training was provided to the AWACS crews.
Asynchronous evolution occurred between the changes in the training materi-
als and the actual situation in the no-fly zone. In addition, there were no
controls to ensure the required simulator sessions were provided and that all
members of the crew participated.
8. Handoff procedures were never established for helicopters. In fact, no explicit
or written procedures, verbal guidance, or training of any kind were provided
to the AWACS crews regarding the control of helicopters within the TAOR
[191]. The AWACS crews testified during the investigation that they lost contact
with helicopters all the time, but there were no procedures to follow when that
occurred.
9. Inadequate procedures were specified and enforced for how the shadow crew
would instruct the new crews.
10. The rules and procedures established for the operation did not provide adequate
control over unsafe F-15 pilot behavior, adequate enforcement of discipline, or
adequate handling of safety violations. The CFAC Assistant Director of Oper-
ations told the GAO investigators that there was very little F-15 oversight in
OPC at the time of the shootdown. There had been so many flight discipline
incidents leading to close calls that a group safety meeting had been held a
week before the shootdown to discuss it. The flight discipline and safety issues
included midair close calls, unsafe incidents when refueling, and unsafe takeoffs.
The fixes (including the meeting) obviously were not effective. But the fact that
there were a lot of close calls indicates serious safety problems existed and were
not handled adequately.
The CFAC Assistant Director of Operations also told the GAO that con-
tentious issues involving F-15 actions had become common topics of discus-
sion at Detachment Commander meetings. No F-15 pilots were on the CTF
staff to communicate with the F-15 group about these problems. The OPC
Commander testified that there was no tolerance for mistakes or unprofes-
sional flying at OPC and that he had regularly sent people home for violation
of the rules—the majority of those he sent home were F-15 pilots, suggesting
that there were serious problems in discipline and attitude among this group
[159].
11.•The Army pilots were given the wrong information about the IFF codes and
radio frequencies to use in the TAOR. As described above, this mismatch
resulted from asynchronous evolution and lack of linkup (consistency) between
process controls, that is, the two different ATOs. It provides yet another example
of the danger involved in distributed decision making (again see figure 2.2).
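This kind of mismatch is, at bottom, a failed consistency check between two controlled documents that were supposed to evolve together. Purely as an illustration of the missing linkup (the field names and values below are invented, not taken from the actual ATOs), the absent cross-check amounts to something as simple as:

# Illustrative sketch only: hypothetical field names and values, not the real ATO contents.
def find_ato_mismatches(air_force_ato: dict, army_ato: dict) -> list[str]:
    """Return the safety-critical fields on which two versions of an ATO disagree."""
    safety_critical_fields = ("taor_iff_code", "taor_radio_frequency")
    mismatches = []
    for name in safety_critical_fields:
        if air_force_ato.get(name) != army_ato.get(name):
            mismatches.append(
                f"{name}: Air Force version has {air_force_ato.get(name)!r}, "
                f"Army version has {army_ato.get(name)!r}"
            )
    return mismatches

# Hypothetical usage: hold distribution of either version until the two agree.
problems = find_ato_mismatches(
    {"taor_iff_code": "alpha", "taor_radio_frequency": "secure"},
    {"taor_iff_code": "bravo", "taor_radio_frequency": "clear"},
)
assert problems  # the two versions have drifted apart and must be reconciled

No comparable check, manual or automated, connected the two versions of the ATO, which is exactly the lack of linkup between process controls noted above.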
Reasons for the Flawed Control.
Ineffective Control Algorithms: Almost all of the control flaws at this level relate
to the existence and use of ineffective control algorithms. Equipment and
procedures were not coordinated between the Air Force and the Army to make sure
that communication channels were effective and that asynchronous evolution had
not occurred. The last CTF staff member who appears to have actively coordinated
rotary-wing flying activities with the CFAC organization departed in January 1994.
No representative of the MCC was specifically assigned to the CFAC for coordina-
tion purposes. Since December 1993, no MCC helicopter detachment representative
had attended the CFAC weekly scheduling meetings. The Army liaison officer,
attached to the MCC helicopter detachment at Zakhu and assigned to Incirlik AB,
was new on station (he arrived in April 1994) and was not fully aware of the rela-
tionship of the MCC to the OPC mission [5].
Performance was not monitored to ensure that safety-critical activities were
carried out correctly, that local adaptations had not moved operations beyond safe
limits, and that information was being effectively transmitted and procedures fol-
lowed. Effective controls were not established to prevent unsafe adaptations.
The feedback that was provided about the problems at the lower levels was
ignored. For example, the Piper account of the accident includes a reference to
helicopter pilots' testimony that six months before the shootdown, in October 1993,
they had complained that the fighter aircraft were using their radar to lock onto the
Black Hawks an unacceptable number of times. The Army helicopter pilots had
argued there was an urgent need for the Black Hawk pilots to be able to commu-
nicate with the fixed-wing aircraft, but nothing was changed until after the accident,
when new radios were installed in the Black Hawks.
Inaccurate Mental Models: The commander of the Combined Task Force thought
that the appropriate control and coordination was occurring. This incorrect mental
model was supported by the feedback he received flying as a regular passenger on
board the Army helicopter flights, where it was his perception that the AWACS was
monitoring their flight effectively. The Army helicopter pilots were using the Delta
Point system to report their location and flight plans, and there was no indication
from the AWACS that the messages were being ignored. The CTF Commander
testified that he believed the Delta Point system was standard on all AWACS mis-
sions. When asked at the court-martial of the AWACS senior director whether the
AWACS crew were tracking Army helicopters, the OPC Commander replied:
Well, my experience from flying dozens of times on Eagle Flight, which that—for some
eleven hundred and nine days prior to this event, that was—that was normal procedures
for them to flight follow. So, I don't know that they had something written about it, but I
know that it seemed very obvious and clear to me as a passenger on Eagle Flight numer-
ous times that that was occurring. [159]
The commander was also an active F-16 pilot who attended the F-16 briefings. At
these briefings he observed that Black Hawk times were part of the daily ATOs
received by the F-16 pilots and assumed that all squadrons were receiving the same
information. However, as noted, the head of the squadron with which the com-
mander flew had gone out of his way to procure the Black Hawk flight information,
while the F-15 squadron leader had not.
Many of those involved at this level were also under the impression that the
ATOs provided to the F-15 pilots and to the Black Hawks pilots were consistent,
that required information had been distributed to everyone, that official procedures
were understood and being followed, and so on.
Coordination among Multiple Controllers: There were clearly problems with over-
lapping and boundary areas of control between the Army and the Air Force. Coor-
dination problems between the services are legendary and were not handled
adequately here. For example, two different versions of the ATO were provided to
the Air Force and the Army pilots. The Air Force F-15s and the Army helicopters
had separate control structures, with a common control point fairly high above the
physical process. The problems were complicated by the differing importance of
flexibility in flight plans between the two services. One symptom of the problem
was that there was no requirement for helicopters to file detailed flight plans and
follow them and no procedures established to deal with last minute changes. These
deficiencies were also related to the shared control of helicopters by MCC and
CFAC and complicated by the physical separation of the two headquarters.
During the accident investigation, a question was raised about whether the Com-
bined Task Force Chief of Staff was responsible for the breakdown in staff com-
munication. After reviewing the evidence, the hearing officer recommended that no
adverse action be taken against the Chief of Staff because he (1) had focused his
attention according to the CTF Commander's direction, (2) had neither specific
direction nor specific reason to inquire into the transmission of information between his
Director of Operations for Plans and Policy and the CFAC, (3) had been the most
recent arrival and the only senior Army member of a predominantly Air Force staff
and therefore generally unfamiliar with air operations, and (4) had relied on expe-
rienced colonels under whom the deficiencies had occurred [200]. This conclusion
was obviously influenced by the goal of trying to establish blame. Ignoring the blame
aspects, the conclusion gives the impression that nobody was in charge and everyone
thought someone else was.
According to the official accident report, the contents of the ACO largely reflected
the guidance given in the operations plan dated September 7, 1991. But that was the
plan provided before the mission had changed. The accident report concludes that
key CTF personnel at the time of the accident were either unaware of the existence
of this particular plan or considered it too outdated to be applicable. The accident
report states, “Most key personnel within the CFAC and CTF staff did not consider
coordination of MCC helicopter activities to be part of their respective CFAC/CTF
responsibilities” [5].
Because of the breakdown of clear guidance from the Combined Task Force staff
to its component organizations (CFAC and MCC), they did not have a clear under-
standing of their respective responsibilities. Consequently, MCC helicopter activities
were not fully integrated with other OPC air operations in the TAOR.
section 5.4.
Conclusions from the Friendly Fire Example.
When looking only at the proximate events and the behavior of the immediate
participants in the accidental shootdown, the reasons for this accident appear to be
gross mistakes by the technical system operators (the pilots and AWACS crew). In
fact, a special Air Force task force composed of more than 120 people in six com-
mands concluded that two breakdowns in individual performance contributed to
the shootdown: (1) the AWACS mission crew did not provide the F-15 pilots an
accurate picture of the situation and (2) the F-15 pilots misidentified the target.
From the twenty-one-volume accident report produced by the Accident Investiga-
tion Board, Secretary of Defense William Perry summarized the “errors, omissions,
and failures” in the “chain of events” leading to the loss as:
1.• The F-15 pilots misidentified the helicopters as Iraqi Hinds.
2.• The AWACS crew failed to intervene.
3.• The helicopters and their operations were not integrated into the Task Force
running the no-fly zone operations.
4.• The Identity Friend or Foe (IFF) systems failed.
According to Snook, the military community has generally accepted these four
“causes” as the explanation for the shootdown.
While there certainly were mistakes made at the pilot and AWACS levels, the
use of the STAMP analysis provides a much more complete explanation of the role of
the environment and other factors that influenced their behavior, including: incon-
sistent, missing, or inaccurate information; incompatible technology; inadequate
coordination; overlapping areas of control and confusion about who was responsible
for what; a migration toward more efficient but less safe operational procedures
over time without any controls and checks on the potential adaptations; inadequate
training; and in general a control structure that did not enforce the safety constraints.
Boiling down this very complex accident to four “causes” and assigning blame in
this way inhibits learning from the events. The more complete STAMP analysis was
possible only because individuals outside the military, some of whom were relatives
of the victims, did not accept the simple analysis provided in the accident report and
did their own uncovering of the facts.
STAMP views an accident as a dynamic process. In this case, Army and Air Force
operations adapted and diverged without communication and coordination. OPC
had operated incident-free for over three years at the time of the shootdown. During
that time, local adaptations to compensate for inadequate control from above had
managed to mask the ongoing problems until a situation occurred where local
adaptations did not work. A lack of awareness at the highest levels of command of
the severity of the coordination, communication, and other problems is a key factor
in this accident.
Nearly all the types of causal factors identified in section 4.5 can be found in this
accident. This fact is not an anomaly: Most accidents involve a large number of these
factors. Concentrating on an event chain focuses attention on the proximate events
associated with the accident and thus on the principal local actors, in this case, the
pilots and the AWACS personnel. Treating an accident as a control problem using
STAMP clearly identifies other organizational factors and actors and the role they
played. Most important, without this broader view of the accident, only the symp-
toms of the organizational problems may be identified and eliminated without
significantly reducing the risk of a future accident caused by the same systemic factors
but involving different symptoms at the lower technical and operational levels of
the control structure.
More information on how to build multiple views of an accident using STAMP
in order to aid understanding can be found in chapter 11. More examples of STAMP
accident analyses can be found in the appendixes.

2157
chapter05.txt Normal file

File diff suppressed because it is too large Load Diff

349
chapter06.raw Normal file
View File

@@ -0,0 +1,349 @@
part 3. USING STAMP.
STAMP provides a new theoretical foundation for system safety on which new, more
powerful techniques and tools for system safety can be constructed. Part III presents
some practical methods for engineering safer systems. All the techniques described
in part III have been used successfully on real systems. The surprise to those trying
them has been how well they work on enormously complex systems and how eco-
nomical they are to use. Improvements and even more applications of the theory to
practice will undoubtedly be created in the future.
chapter 6.
Engineering and Operating Safer Systems Using
STAMP.
Part III of this book is for those who want to build safer systems without incurring
enormous and perhaps impractical financial, time, and performance costs. The belief
that building and operating safer systems requires such penalties is widespread and
arises from the way safety engineering is usually done today. It need not be the case.
The use of top-down system safety engineering and safety-guided design based on
STAMP can not only enhance the safety of these systems but also potentially reduce
the costs associated with engineering for safety. This chapter provides an overview,
while the chapters following it provide details about how to implement this cost-
effective safety process.
section 6.1.
Why Are Safety Efforts Sometimes Not Cost-Effective?
While there are certainly some very effective safety engineering programs, too
many expend a large amount of resources with little return on the investment in
terms of improved safety. To fix a problem, we first need to understand it. Why are
safety efforts sometimes not cost-effective? There are five general answers to this
question:
1. Safety efforts may be superficial, isolated, or misdirected.
2. Safety activities often start too late.
3. The techniques used are not appropriate for the systems we are building today
and for new technology.
4. Efforts may be narrowly focused on the technical components.
5. Systems are usually assumed to be static throughout their lifetime.
Superficial, isolated, or misdirected safety engineering activities: Often, safety
engineering consists of performing a lot of very costly and tedious activities of
limited usefulness in improving safety in the final system design. Childs calls this
“cosmetic system safety” [37]. Detailed hazard logs are created and analyses
performed, but these have limited impact on the actual system design. Numbers are
associated with unquantifiable properties. These numbers always seem to support
whatever numerical requirement is the goal, and all involved feel as if they have
done their jobs. The safety analyses provide the answer the customer or designer
wants—that the system is safe—and everyone is happy. Haddon-Cave, in the 2009
Nimrod MR2 accident report, called such efforts compliance-only exercises [78]. The
results impact certification of the system or acceptance by management, but despite
all the activity and large amounts of money spent, the safety of the system has been
unaffected.
A variant of this problem is that safety activities may be isolated from the engi-
neers and developers building the system. Too often, safety professionals are sepa-
rated from engineering design and placed within a mission assurance organization.
Safety cannot be assured without its already being part of the design; systems must
be constructed to be safe from the beginning. Separating safety engineering from
design engineering is almost guaranteed to make the effort and resources expended
a poor investment. Safety engineering is effective when it participates in and pro-
vides input to the design process, not when it focuses on making arguments about
the artifacts created after the major safety-related decisions have been made.
Sometimes the major focus of the safety engineering efforts is on creating a safety
case that proves the completed design is safe, often by showing that a particular
process was followed during development. Simply following a process does not
mean that the process was effective, which is the basic limitation of many process
assurance activities. In other cases the arguments go beyond the process, but they
start from the assumption that the system is safe and then focus on showing the
conclusion is true. Most of the effort is spent in seeking evidence that shows the
system is safe while not looking for evidence that the system is not safe. The basic
mindset is wrong, so the conclusions are biased.
One of the reasons System Safety has been so successful is that it takes the oppo-
site approach: an attempt is made to show that the system is unsafe and to identify
hazardous scenarios. By using this alternative perspective, paths to hazards are often
identified that were missed by the engineers, who tend to focus on what they want
to happen, not what they do not want to happen.
If safety-guided design, as defined in part III of this book, is used, the “safety
case” is created along with the design. Developing the certification argument
becomes trivial and consists primarily of simply gathering the documentation that
has been created during the development process.
Safety efforts start too late: Unlike the examples of ineffective safety activities
above, the safety efforts may involve potentially useful activities, but they may start
too late. Frola and Miller claim that 70 to 80 percent of the most critical decisions
related to the safety of the completed system are made during early concept devel-
opment [70]. Unless the safety engineering effort impacts these decisions, it is
unlikely to have much effect on safety. Too often, safety engineers are busy doing
safety analyses, while the system engineers are in parallel making critical decisions
about system design and concepts of operation that are not based on that hazard
analysis. By the time the system engineers get the information generated by the
safety engineers, it is too late to have a significant impact on design decisions.
Of course, engineers normally do try to consider safety early, but the information
commonly available is only whether a particular function is safety-critical or not.
They are told that the function they are designing can contribute to an accident,
with perhaps some letter or numerical “score” of how critical it is, but not much else.
Armed only with this very limited information, they have no choice but to focus
safety design efforts on increasing the component's reliability by adding redundancy
or safety margins. These features are often added without careful analysis of whether
they are needed or will be effective for the specific hazards related to that system
function. The design then becomes expensive to build and maintain without neces-
sarily having the maximum possible (or sometimes any) impact on eliminating
or reducing hazards. As argued earlier, redundancy and overdesign, such as building
in safety margins, are effective primarily for purely electromechanical components
and component failure accidents. They do not apply to software and miss component
interaction accidents entirely. In some cases, such design techniques can even
contribute to component interaction accidents when they add to the complexity of
the design.
Most of our current safety engineering techniques start from detailed designs. So
even if they are conscientiously applied, they are useful only in evaluating the safety
of a completed design, not in guiding the decisions made early in the design creation
process. One of the results of evaluating designs after they are created is that engi-
neers are confronted with important safety concerns only after it is too late or too
expensive to make significant changes. If and when the system and component
design engineers get the results of the safety activities, often in the form of a critique
of the design late in the development process, the safety concerns are frequently
ignored or argued away because changing the design at that time is too costly.
Design reviews then turn into contentious exercises where one side argues that the
system has serious safety limitations while the other side argues that those limita-
tions do not exist, they are not serious, or the safety analysis is wrong.
The problem is not a lack of concern by designers; it's simply that safety concerns
about their design are raised at a time when major design changes are not possible—
the design engineers have no other option than to defend the design they have.
If they lose that argument, then they must try to patch the current design; starting
over with a safer design is, in almost all cases, impractical. If the designers had the
information necessary to factor safety into their early decision making, then the
process of creating safer designs need cost no more and, in fact, will cost less due
to two factors: (1) reduced rework after the decisions made are found to be flawed
or to provide inadequate safety and (2) less unnecessary overdesign and unneeded
protection.
The key to having a cost-effective safety effort is to embed it into a system
engineering process starting from early concept development and then to design
safety into the system as the design decisions are made. Costs are much less when
safety is built into the system design from the beginning rather than added on or
retrofitted later.
The techniques used are not appropriate for today's systems and new technol-
ogy: The assumptions of the major safety engineering techniques currently used,
almost all of which stem from decades past, do not match the assumptions underlying
the technology and complexity of the systems being built today or the new emerging
causes of accidents: They do not apply to human or software errors or flawed man-
agement decision making, and they certainly do not apply to weaknesses in the
organizational structure or social infrastructure systems. These contributors to acci-
dents do not “fail” in the same way assumed by the current safety analysis tools.
But with no other tools to use, safety engineers attempt to force square pegs into
round holes, hoping this will be sufficient. As a result, nothing much is accomplished
beyond expending time, money, and other resources. It's time we face up to the fact
that new safety engineering techniques are needed to handle those aspects of
systems that go beyond the analog hardware components and the relatively simple
designs of the past for which the current techniques were invented. Chapter 8
describes a new hazard analysis technique based on STAMP, called STPA, but others
are possible. The important thing is to confront these problems head on and not
ignore them and waste our time misapplying or futilely trying to extend techniques
that do not apply to today's systems.
The safety efforts are focused on the technical components of the system: Many
safety engineering (and system engineering, for that matter) efforts focus on the
technical system details. Little effort is made to consider the social, organizational,
and human components of the system in the design process. Assumptions are made
that operators will be trained to do the right things and that they will adapt to
whatever design they are given. Sophisticated human factors and system analysis
input is lacking, and when accidents inevitably result, they are blamed on the opera-
tors for not behaving the way the designers thought they would. To give just one
example (although most accident reports contain such examples), one of the four
causes, all of which cited pilot error, identified in the loss of the American Airlines
B757 near Cali, Colombia (see chapter 2), was “Failure of the flight crew to revert
to basic radio navigation when the FMS-assisted navigation became confusing and
demanded an excessive workload in a critical phase of the flight.” A more useful
alternative statement of the cause might have been “An FMS system that confused
the operators and demanded an excessive workload in a critical phase of flight.”
Virtually all systems contain humans, but engineers are often not taught much
about human factors and draw convenient boundaries around the technical com-
ponents, focusing their attention inside these artificial boundaries. Human factors
experts have complained about the resulting technology-centered automation [208],
where the designers focus on technical issues and not on supporting operator tasks.
The result is what has been called “clumsy” automation that increases the chance
of human error [183, 22, 208]. One of the new assumptions for safety in chapter 2
is that operator “error” is a product of the environment in which it occurs.
A variant of the problem is common in systems using information technology.
Many medical information systems, for example, have not been as successful as they
might have been in increasing safety and have even led to new types of hazards and
losses [104, 140]. Often, little effort is invested during development in considering
the usability of the system by medical professionals or of the impact, not always
positive, that the information system design will have on workflow and on the
practice of medicine.
Automation is commonly assumed to be safer than manual systems because
the hazards associated with the manual systems are eliminated. Inadequate con-
sideration is given to whether new, and maybe even worse, hazards are introduced
by the automated system and how to prevent or minimize these new hazards. The
aviation industry has, for the most part, learned this lesson for cockpit and flight
control design, where eliminating errors of commission simply created new errors
of omission [181, 182] (see chapter 9), but most other industries are far behind in
this respect.
Like other safety-related system properties that are ignored until too late, opera-
tors and human-factors experts often are not brought into the early design process
or they work in isolation from the designers until changes are extremely expensive
to make. Sometimes, human factors design is not considered until after an accident,
and occasionally not even then, almost guaranteeing that more accidents will occur.
To provide cost-effective safety engineering, the system and safety analysis
and design process needs to consider the humans in systems—including those that
are not directly controlling the physical processes—not separately or after the fact
but starting at concept development and continuing throughout the life cycle of
the system.
Systems are assumed to be static throughout their lifetimes: It is rare for engi-
neers to consider how the system will evolve and change over time. While designing
for maintainability may be considered, unintended changes are often ignored.
Change is a constant for all systems: physical equipment ages and degrades over
its lifetime and may not be maintained properly; human behavior and priorities
usually change over time; organizations change and evolve, which means the safety
control structure itself will evolve. Change may also occur in the physical and social
environment within which the system operates and with which it interacts. To be
effective, controls need to be designed that will reduce the risk associated with all
these types of changes. Not only are accidents expensive, but once again planning
for system change can reduce the costs associated with the change itself. In addition,
much of the effort in operations needs to be focused on managing and reacting
to change.
section 6.2.
The Role of System Engineering in Safety.
As the systems we build and operate increase in size and complexity, the use of
sophisticated system engineering approaches becomes more critical. Important
system-level (emergent) properties, such as safety, must be built into the design of
these systems; they cannot be effectively added on or simply measured afterward.
While system engineering was developed originally for technical systems, the
approach is just as important and applicable to social systems or the social compo-
nents of systems that are usually not thought of as “engineered.” All systems are
engineered in the sense that they are designed to achieve specific goals, namely to
satisfy requirements and constraints. So ensuring hospital safety or pharmaceutical
safety, for example, while not normally thought of as engineering problems, falls
within the broad definition of engineering. The goal of the system engineering
process is to create a system that satisfies the mission while maintaining the con-
straints on how the mission is achieved.
Engineering is a way of organizing that design process to achieve the most
cost-effective results. Social systems may not have been “designed” in the sense of
a purposeful design process but may have evolved over time. Any effort to change
such systems in order to improve them, however, can be thought of as a redesign or
reengineering process and can again benefit from a system engineering approach.
When using STAMP as the underlying causality model, engineering or reengineer-
ing safer systems means designing (or redesigning) the safety-control structure and
the controls designed into it to ensure the system operates safely, that is, without
unacceptable losses. What is being controlled—chemical manufacturing processes,
spacecraft or aircraft, public health, safety of the food supply, corporate fraud, risks
in the financial system—is irrelevant in terms of the general process, although
significant differences will exist in the types of controls applicable and the design
of those controls. The process, however, is very similar to a regular system engineer-
ing process.
The problem is that most engineering and even many system engineering tech-
niques were developed under conditions and assumptions that do not hold for
complex social systems, as discussed in part I. But STAMP and new system-theoretic
approaches to safety can point the way forward for both complex technical and
social processes. The general engineering and reengineering process described in
part III applies to all systems.
section 6.3.
A System Safety Engineering Process.
In STAMP, accidents and losses result from not enforcing safety constraints on
behavior. Not only must the original system design incorporate appropriate con-
straints to ensure safe operations, but the safety constraints must continue to be
enforced as changes and adaptations to the system design occur over time. This goal
forms the basis for safe management, development, and operations.
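A minimal sketch may make "enforcing a safety constraint on behavior" more concrete. Everything in it is invented for illustration (the constraint, the state variable, and the limit are assumptions, not drawn from any particular system); the point is only that the constraint is checked against every control action rather than documented once and forgotten:

# Hypothetical example of a controller enforcing one safety constraint.
# The constraint, state variable, and limit are invented for illustration.
from dataclasses import dataclass

MAX_SAFE_PRESSURE_KPA = 800.0  # assumed system-level safety constraint

@dataclass
class ProcessState:
    pressure_kpa: float

def issue_inflow_command(state: ProcessState, requested_inflow: float) -> float:
    """Pass the requested action through only if the safety constraint stays enforced."""
    if state.pressure_kpa >= MAX_SAFE_PRESSURE_KPA and requested_inflow > 0.0:
        return 0.0  # the safety constraint overrides the mission request
    return requested_inflow

print(issue_inflow_command(ProcessState(pressure_kpa=825.0), requested_inflow=5.0))  # prints 0.0

The same discipline applies whether the controller is software, a human procedure, or a management review, and it must survive the changes and adaptations described above.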
There is no agreed upon best system engineering process and probably cannot
be one—the process needs to match the specific problem and environment in which
it is being used. What is described in part III of this book is how to integrate system
safety into any reasonable system engineering process. Figure 6.1 shows the three
major components of a cost-effective system safety process: management, develop-
ment, and operations.
section 6.3.1. Management.
Safety starts with management leadership and commitment. Without these, the
efforts of others in the organization are almost doomed to failure. Leadership
creates culture, which drives behavior.
Besides setting the culture through their own behavior, managers need to estab-
lish the organizational safety policy and create a safety control structure with appro-
priate responsibilities, accountability and authority, safety controls, and feedback
channels. Management must also establish a safety management plan and ensure
that a safety information system and continual learning and improvement processes
are in place and effective.
Chapter 13 discusses management's role and responsibilities in safety.
section 6.3.2. Engineering Development.
The key to having a cost-effective safety effort is to embed it into a system engineer-
ing process from the very beginning and to design safety into the system as the
design decisions are made. All viewpoints and system components must be included
in the process and information used and documented in a way that is accessible,
understandable, and helpful.
System engineering starts with first determining the goals of the system. Potential
hazards to be avoided are then identified. From the goals and system hazards, a set
of system functional and safety requirements and constraints are identified that set
the foundation for design, operations, and management. Chapter 7 describes how
to establish these fundamentals.
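As a rough sketch of the bookkeeping this implies (the identifiers and wording below are placeholders, not drawn from any real analysis), each identified hazard can be recorded together with the system-level safety constraint derived from it, and later design decisions can be linked back to both:

# Placeholder example of hazard-to-constraint-to-design traceability.
# All identifiers and text are invented; only the structure matters.
from dataclasses import dataclass, field

@dataclass
class Hazard:
    ident: str
    description: str

@dataclass
class SafetyConstraint:
    ident: str
    text: str
    derived_from: str                       # hazard this constraint traces to
    design_decisions: list[str] = field(default_factory=list)

h1 = Hazard("H-1", "Placeholder hazard: the controlled process violates its minimum safe margin")
sc1 = SafetyConstraint(
    ident="SC-1",
    text="Placeholder constraint: the minimum safe margin must be maintained at all times",
    derived_from=h1.ident,
)

# During design, each decision that helps enforce the constraint is linked to it.
sc1.design_decisions.append("DD-3: placeholder interlock in the control loop")
print(f"{sc1.ident} (from {sc1.derived_from}) implemented by {sc1.design_decisions}")

Keeping these links as data from the start is what later makes the traceability from requirements and constraints down to detailed design features straightforward to document.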
To start safety engineering early enough to be cost-effective, safety must be con-
sidered from the early concept formation stages of development and continue
throughout the life cycle of the system. Design decisions should be guided by safety
considerations while at the same time taking other system requirements and con-
straints into account and resolving conflicts. The hazard analysis techniques used
must not require a completed design and must include all the factors involved
in accidents. Chapter 8 describes a new hazard analysis technique, based on the
STAMP model of causation, that provides the information necessary to design
safety into the system, and chapter 9 shows how to use it in a safety-guided design
process. Chapter 9 also presents general principles for safe design including how to
design systems and system components used by humans that do not contribute to
human error.
Documentation is critical not only for communication in the design and develop-
ment process but also because of inevitable changes over time. That documentation
must include the rationale for the design decisions and traceability from high-level
requirements and constraints down to detailed design features. After the original
system development is finished, the information necessary to operate and maintain
it safely must be passed in a usable form to operators and maintainers. Chapter 10
describes how to integrate safety considerations into specifications and the general
system engineering process.
Engineers have often concentrated more on the technological aspects of system
development while assuming that humans in the system will either adapt to what-
ever is given to them or will be trained to do the “right thing.” When an accident
occurs, it is blamed on the operator. This approach to safety, as argued above, is
one of the reasons safety engineering is not as effective as it could be. The system
design process needs to start by considering the human controller and continuing
that perspective throughout development. The best way to reach that goal is to
involve operators in the design decisions and safety analyses. Operators are
sometimes left out of the conceptual design stages and only brought in later in
development. To design safer systems, operators and maintainers must be included
in the design process starting from the conceptual development stage and con-
siderations of human error and preventing it should be at the forefront of the
design effort.
Many companies, particularly in aerospace, use integrated product teams that
include, among others, design engineers, safety engineers, human factors experts,
potential users of the system (operators), and maintainers. But the development
process used may not necessarily take maximum advantage of this potential for
collaboration. The process outlined in part III tries to do that.
section 6.3.3. Operations.
Once the system is built, it must be operated safely. System engineering creates the
basic information needed to do this in the form of the safety constraints and operat-
ing assumptions upon which the safety of the design was based. These constraints
and assumptions must be passed to operations in a form that they can understand
and use.
Because changes in the physical components, human behavior, and the organiza-
tional safety control structure are almost guaranteed to occur over the life of the
system, operations must manage change in order to ensure that the safety con-
straints are not violated. The requirements for safe operations are discussed in
chapter 12.
It's now time to look at the changes in system engineering, operations, and man-
agement, based on STAMP, that can assist in engineering a safer world.

312
chapter06.txt Normal file
View File

@@ -0,0 +1,312 @@
part 3. USING STAMP.
STAMP provides a new theoretical foundation for system safety on which new, more
powerful techniques and tools for system safety can be constructed. Part 3 presents
some practical methods for engineering safer systems. All the techniques described
in part 3 have been used successfully on real systems. The surprise to those trying
them has been how well they work on enormously complex systems and how economical they are to use. Improvements and even more applications of the theory to
practice will undoubtedly be created in the future.
chapter 6.
Engineering and Operating Safer Systems Using
STAMP.
Part 3 of this book is for those who want to build safer systems without incurring
enormous and perhaps impractical financial, time, and performance costs. The belief
that building and operating safer systems requires such penalties is widespread and
arises from the way safety engineering is usually done today. It need not be the case.
The use of top-down system safety engineering and safety-guided design based on
STAMP can not only enhance the safety of these systems but also potentially reduce
the costs associated with engineering for safety. This chapter provides an overview,
while the chapters following it provide details about how to implement this costeffective safety process.
section 6.1.
Why Are Safety Efforts Sometimes Not Cost-Effective?
While there are certainly some very effective safety engineering programs, too
many expend a large amount of resources with little return on the investment in
terms of improved safety. To fix a problem, we first need to understand it. Why are
safety efforts sometimes not cost-effective? There are five general answers to this
question.
1. Safety efforts may be superficial, isolated, or misdirected.
2. Safety activities often start too late.
3. The techniques used are not appropriate for the systems we are building today
and for new technology.
4. Efforts may be narrowly focused on the technical components.
5. Systems are usually assumed to be static throughout their lifetime.
Superficial, isolated, or misdirected safety engineering activities. Often, safety
engineering consists of performing a lot of very costly and tedious activities of
limited usefulness in improving safety in the final system design. Childs calls this
“cosmetic system safety” . Detailed hazard logs are created and analyses
performed, but these have limited impact on the actual system design. Numbers are
associated with unquantifiable properties. These numbers always seem to support
whatever numerical requirement is the goal, and all involved feel as if they have
done their jobs. The safety analyses provide the answer the customer or designer
wants.that the system is safe.and everyone is happy. Haddon-Cave, in the 2 thousand 9
Nimrod MR2 accident report, called such efforts compliance only exercises . The
results impact certification of the system or acceptance by management, but despite
all the activity and large amounts of money spent, the safety of the system has been
unaffected.
A variant of this problem is that safety activities may be isolated from the engineers and developers building the system. Too often, safety professionals are separated from engineering design and placed within a mission assurance organization.
Safety cannot be assured without its already being part of the design; systems must
be constructed to be safe from the beginning. Separating safety engineering from
design engineering is almost guaranteed to make the effort and resources expended
a poor investment. Safety engineering is effective when it participates in and provides input to the design process, not when it focuses on making arguments about
the artifacts created after the major safety-related decisions have been made.
Sometimes the major focus of the safety engineering efforts is on creating a safety
case that proves the completed design is safe, often by showing that a particular
process was followed during development. Simply following a process does not
mean that the process was effective, which is the basic limitation of many process
assurance activities. In other cases the arguments go beyond the process, but they
start from the assumption that the system is safe and then focus on showing the
conclusion is true. Most of the effort is spent in seeking evidence that shows the
system is safe while not looking for evidence that the system is not safe. The basic
mindset is wrong, so the conclusions are biased.
One of the reasons System Safety has been so successful is that it takes the opposite approach. an attempt is made to show that the system is unsafe and to identify
hazardous scenarios. By using this alternative perspective, paths to hazards are often
identified that were missed by the engineers, who tend to focus on what they want
to happen, not what they do not want to happen.
If safety-guided design, as defined in part 3 of this book, is used, the “safety
case” is created along with the design. Developing the certification argument
becomes trivial and consists primarily of simply gathering the documentation that
has been created during the development process.
Safety efforts start too late. Unlike the examples of ineffective safety activities
above, the safety efforts may involve potentially useful activities, but they may start
too late. Frola and Miller claim that 70 to 80 percent of the most critical decisions
related to the safety of the completed system are made during early concept development . Unless the safety engineering effort impacts these decisions, it is
unlikely to have much effect on safety. Too often, safety engineers are busy doing
safety analyses, while the system engineers are in parallel making critical decisions
about system design and concepts of operation that are not based on that hazard
analysis. By the time the system engineers get the information generated by the
safety engineers, it is too late to have a significant impact on design decisions.
Of course, engineers normally do try to consider safety early, but the information
commonly available is only whether a particular function is safety-critical or not.
They are told that the function they are designing can contribute to an accident,
with perhaps some letter or numerical “score” of how critical it is, but not much else.
Armed only with this very limited information, they have no choice but to focus
safety design efforts on increasing the components reliability by adding redundancy
or safety margins. These features are often added without careful analysis of whether
they are needed or will be effective for the specific hazards related to that system
function. The design then becomes expensive to build and maintain without necessarily having the maximum possible .(or sometimes any). impact on eliminating
or reducing hazards. As argued earlier, redundancy and overdesign, such as building
in safety margins, are effective primarily for purely electromechanical components
and component failure accidents. They do not apply to software and miss component
interaction accidents entirely. In some cases, such design techniques can even
contribute to component interaction accidents when they add to the complexity of
the design.
Most of our current safety engineering techniques start from detailed designs. So
even if they are conscientiously applied, they are useful only in evaluating the safety
of a completed design, not in guiding the decisions made early in the design creation
process. One of the results of evaluating designs after they are created is that engineers are confronted with important safety concerns only after it is too late or too
expensive to make significant changes. If and when the system and component
design engineers get the results of the safety activities, often in the form of a critique
of the design late in the development process, the safety concerns are frequently
ignored or argued away because changing the design at that time is too costly.
Design reviews then turn into contentious exercises where one side argues that the
system has serious safety limitations while the other side argues that those limitations do not exist, they are not serious, or the safety analysis is wrong.
The problem is not a lack of concern by designers; its simply that safety concerns
about their design are raised at a time when major design changes are not possible.
the design engineers have no other option than to defend the design they have.
If they lose that argument, then they must try to patch the current design; starting
over with a safer design is, in almost all cases, impractical. If the designers had the
information necessary to factor safety into their early decision making, then the
process of creating safer designs need cost no more and, in fact, will cost less due
to two factors. .(1). reduced rework after the decisions made are found to be flawed
or to provide inadequate safety and .(2). less unnecessary overdesign and unneeded
protection.
The key to having a cost-effective safety effort is to embed it into a system
engineering process starting from early concept development and then to design
safety into the system as the design decisions are made. Costs are much less when
safety is built into the system design from the beginning rather than added on or
retrofitted later.
The techniques used are not appropriate for todays systems and new technology. The assumptions of the major safety engineering techniques currently used,
almost all of which stem from decades past, do not match the assumptions underlying
the technology and complexity of the systems being built today or the new emerging
causes of accidents. They do not apply to human or software errors or flawed management decision making, and they certainly do not apply to weaknesses in the
organizational structure or social infrastructure systems. These contributors to accidents do not “fail” in the same way assumed by the current safety analysis tools.
But with no other tools to use, safety engineers attempt to force square pegs into
round holes, hoping this will be sufficient. As a result, nothing much is accomplished
beyond expending time, money, and other resources. Its time we face up to the fact
that new safety engineering techniques are needed to handle those aspects of
systems that go beyond the analog hardware components and the relatively simple
designs of the past for which the current techniques were invented. Chapter 8
describes a new hazard analysis technique based on STAMP, called STPA, but others
are possible. The important thing is to confront these problems head on and not
ignore them and waste our time misapplying or futilely trying to extend techniques
that do not apply to todays systems.
The safety efforts are focused on the technical components of the system. Many
safety engineering .(and system engineering, for that matter). efforts focus on the
technical system details. Little effort is made to consider the social, organizational,
and human components of the system in the design process. Assumptions are made
that operators will be trained to do the right things and that they will adapt to
whatever design they are given. Sophisticated human factors and system analysis
input is lacking, and when accidents inevitably result, they are blamed on the operators for not behaving the way the designers thought they would. To give just one
example .(although most accident reports contain such examples), one of the four
causes, all of which cited pilot error, identified in the loss of the American Airlines
B757 near Cali, Colombia .(see chapter 2), was “Failure of the flight crew to revert
to basic radio navigation when the FMS-assisted navigation became confusing and
demanded an excessive workload in a critical phase of the flight.” A more useful
alternative statement of the cause might have been “An FMS system that confused
the operators and demanded an excessive workload in a critical phase of flight.”
Virtually all systems contain humans, but engineers are often not taught much
about human factors and draw convenient boundaries around the technical components, focusing their attention inside these artificial boundaries. Human factors
experts have complained about the resulting technology-centered automation ,
where the designers focus on technical issues and not on supporting operator tasks.
The result is what has been called “clumsy” automation that increases the chance
of human error . One of the new assumptions for safety in chapter 2
is that operator “error” is a product of the environment in which it occurs.
A variant of the problem is common in systems using information technology.
Many medical information systems, for example, have not been as successful as they
might have been in increasing safety and have even led to new types of hazards and
losses . Often, little effort is invested during development in considering
the usability of the system by medical professionals or of the impact, not always
positive, that the information system design will have on workflow and on the
practice of medicine.
Automation is commonly assumed to be safer than manual systems because
the hazards associated with the manual systems are eliminated. Inadequate consideration is given to whether new, and maybe even worse, hazards are introduced
by the automated system and how to prevent or minimize these new hazards. The
aviation industry has, for the most part, learned this lesson for cockpit and flight
control design, where eliminating errors of commission simply created new errors
of omission .(see chapter 9), but most other industries are far behind in
this respect.
Like other safety-related system properties that are ignored until too late, operators and human-factors experts often are not brought into the early design process
or they work in isolation from the designers until changes are extremely expensive
to make. Sometimes, human factors design is not considered until after an accident,
and occasionally not even then, almost guaranteeing that more accidents will occur.
To provide cost-effective safety engineering, the system and safety analysis
and design process needs to consider the humans in systems.including those that
are not directly controlling the physical processes.not separately or after the fact
but starting at concept development and continuing throughout the life cycle of
the system.
Systems are assumed to be static throughout their lifetimes. It is rare for engineers to consider how the system will evolve and change over time. While designing
for maintainability may be considered, unintended changes are often ignored.
Change is a constant for all systems. physical equipment ages and degrades over
its lifetime and may not be maintained properly; human behavior and priorities
usually change over time; organizations change and evolve, which means the safety
control structure itself will evolve. Change may also occur in the physical and social
environment within which the system operates and with which it interacts. To be
effective, controls need to be designed that will reduce the risk associated with all
these types of changes. Not only are accidents expensive, but once again planning
for system change can reduce the costs associated with the change itself. In addition,
much of the effort in operations needs to be focused on managing and reacting
to change.
section 6.2.
The Role of System Engineering in Safety.
As the systems we build and operate increase in size and complexity, the use of
sophisticated system engineering approaches becomes more critical. Important
system-level .(emergent). properties, such as safety, must be built into the design of
these systems; they cannot be effectively added on or simply measured afterward.
While system engineering was developed originally for technical systems, the
approach is just as important and applicable to social systems or the social components of systems that are usually not thought of as “engineered.” All systems are
engineered in the sense that they are designed to achieve specific goals, namely to
satisfy requirements and constraints. So ensuring hospital safety or pharmaceutical
safety, for example, while not normally thought of as engineering problems, falls
within the broad definition of engineering. The goal of the system engineering
process is to create a system that satisfies the mission while maintaining the constraints on how the mission is achieved.
Engineering is a way of organizing that design process to achieve the most
cost-effective results. Social systems may not have been “designed” in the sense of
a purposeful design process but may have evolved over time. Any effort to change
such systems in order to improve them, however, can be thought of as a redesign or
reengineering process and can again benefit from a system engineering approach.
When using STAMP as the underlying causality model, engineering or reengineering safer systems means designing .(or redesigning). the safety-control structure and
the controls designed into it to ensure the system operates safely, that is, without
unacceptable losses. What is being controlled.chemical manufacturing processes,
spacecraft or aircraft, public health, safety of the food supply, corporate fraud, risks
in the financial system.is irrelevant in terms of the general process, although
significant differences will exist in the types of controls applicable and the design
of those controls. The process, however, is very similar to a regular system engineering process.
The problem is that most engineering and even many system engineering techniques were developed under conditions and assumptions that do not hold for
complex social systems, as discussed in part I. But STAMP and new system-theoretic
approaches to safety can point the way forward for both complex technical and
social processes. The general engineering and reengineering process described in
part 3 applies to all systems.
section 6.3.
A System Safety Engineering Process.
In STAMP, accidents and losses result from not enforcing safety constraints on
behavior. Not only must the original system design incorporate appropriate constraints to ensure safe operations, but the safety constraints must continue to be
enforced as changes and adaptations to the system design occur over time. This goal
forms the basis for safe management, development, and operations.
There is no agreed-upon best system engineering process, and there probably cannot be one; the process needs to match the specific problem and environment in which it is being used. What is described in part III of this book is how to integrate system safety into any reasonable system engineering process. Figure 6.1 shows the three major components of a cost-effective system safety process: management, development, and operations.
section 6.3.1. Management.
Safety starts with management leadership and commitment. Without these, the
efforts of others in the organization are almost doomed to failure. Leadership
creates culture, which drives behavior.
Besides setting the culture through their own behavior, managers need to establish the organizational safety policy and create a safety control structure with appropriate responsibilities, accountability and authority, safety controls, and feedback
channels. Management must also establish a safety management plan and ensure
that a safety information system and continual learning and improvement processes
are in place and effective.
Chapter 13 discusses management's role and responsibilities in safety.
section 6.3.2. Engineering Development.
The key to having a cost-effective safety effort is to embed it into a system engineering process from the very beginning and to design safety into the system as the
design decisions are made. All viewpoints and system components must be included in the process, and the information must be used and documented in a way that is accessible,
understandable, and helpful.
System engineering starts by determining the goals of the system. Potential hazards to be avoided are then identified. From the goals and system hazards, a set of system functional and safety requirements and constraints is identified that sets
the foundation for design, operations, and management. Chapter 7 describes how
to establish these fundamentals.
To start safety engineering early enough to be cost-effective, safety must be considered from the early concept formation stages of development and continue
throughout the life cycle of the system. Design decisions should be guided by safety
considerations while at the same time taking other system requirements and constraints into account and resolving conflicts. The hazard analysis techniques used
must not require a completed design and must include all the factors involved
in accidents. Chapter 8 describes a new hazard analysis technique, based on the
STAMP model of causation, that provides the information necessary to design
safety into the system, and chapter 9 shows how to use it in a safety-guided design
process. Chapter 9 also presents general principles for safe design including how to
design systems and system components used by humans that do not contribute to
human error.
Documentation is critical not only for communication in the design and development process but also because of inevitable changes over time. That documentation
must include the rationale for the design decisions and traceability from high-level
requirements and constraints down to detailed design features. After the original
system development is finished, the information necessary to operate and maintain
it safely must be passed in a usable form to operators and maintainers. Chapter 10
describes how to integrate safety considerations into specifications and the general
system engineering process.
Engineers have often concentrated more on the technological aspects of system
development while assuming that humans in the system will either adapt to whatever is given to them or will be trained to do the “right thing.” When an accident
occurs, it is blamed on the operator. This approach to safety, as argued above, is
one of the reasons safety engineering is not as effective as it could be. The system
design process needs to start by considering the human controller and to maintain that perspective throughout development. The best way to reach that goal is to
involve operators in the design decisions and safety analyses. Operators are
sometimes left out of the conceptual design stages and only brought in later in
development. To design safer systems, operators and maintainers must be included
in the design process starting from the conceptual development stage, and considerations of human error and how to prevent it should be at the forefront of the
design effort.
Many companies, particularly in aerospace, use integrated product teams that
include, among others, design engineers, safety engineers, human factors experts,
potential users of the system (operators), and maintainers. But the development
process used may not necessarily take maximum advantage of this potential for
collaboration. The process outlined in part III tries to do that.
section 6.3.3. Operations.
Once the system is built, it must be operated safely. System engineering creates the
basic information needed to do this in the form of the safety constraints and operating assumptions upon which the safety of the design was based. These constraints
and assumptions must be passed to operations in a form that they can understand
and use.
Because changes in the physical components, human behavior, and the organizational safety control structure are almost guaranteed to occur over the life of the
system, operations must manage change in order to ensure that the safety constraints are not violated. The requirements for safe operations are discussed in
chapter 12.
It's now time to look at the changes in system engineering, operations, and management, based on STAMP, that can assist in engineering a safer world.
cleanfile Executable file
View File

@@ -0,0 +1,13 @@
#!/bin/bash
# Build a sed program from the tab-separated pattern/replacement pairs in the
# "replacements" file, then apply it to the input file ($1) and write to $2.
SED=$(
while IFS=$'\t' read -r -a myArray
do
	echo -ne "s_${myArray[0]}_${myArray[1]}_g;\n"
done < replacements
)
# Echo the generated sed program for debugging, then run it, also joining
# words that were hyphenated across a line break before writing the output.
echo sed -e "$SED"
sed -e "$SED" "$1" | sed -z 's_-\n__g' > "$2"
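
For reference, a minimal usage sketch (the chapter filenames below are hypothetical; the script itself only assumes a tab-separated replacements file in the working directory and an input/output pair on the command line):

# Given replacements entries such as (<TAB> marks a literal tab):
#   AWACS<TAB>A Wacks
#   B757<TAB>B 7 57
# the while loop above generates a sed program of the form:
#   s_AWACS_A Wacks_g;
#   s_B757_B 7 57_g;
# which is applied to the raw chapter text before hyphenated line breaks are
# joined and the result written to the output file:
./cleanfile 06.raw 06.txt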

View File

@@ -1,15 +1,13 @@
 :	.
 —	.
-\[.+\]
--\n
-19(\d\d)	19 $1
-200(\d)	2 thousand $1
-20(\d\d)	20 $1
-\(	.(
-\)	).
+\[.\\+\]
+(	.(
+)	).
+HQ-II	H Q-2
 III	3
 II	2
 IV	4
+AWACS	A Wacks
 ASO	A S O
 PRA	P R A
 HMO	H M O
@@ -28,7 +26,6 @@
 CFAC	C FACK
 DO	D O
 GAO	GAOW
-HQ-II	H Q-2
 IFF	I F F
 JOIC	J O I C
 JSOC	J SOCK
@@ -45,3 +42,7 @@
 TAOR	T A O R
 USCINCEUR	U S C in E U R
 WD	W D
+19\\([[:digit:]][[:digit:]]\\)	19 \\1
+200\\([[:digit:]]\\)	2 thousand \1
+20\\([[:digit:]][[:digit:]]\\)	20 \1
+B757	B 7 57
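
The year rules moved to the end of the file are rewritten with POSIX bracket expressions and backslash-escaped groups, presumably because the sed program generated by cleanfile uses basic regular expressions, where Perl-style \d and $1 have no special meaning (the echo -ne in cleanfile reduces the doubled backslashes to single ones before sed sees them). A quick sanity check of the new form, shown here only as an illustration and not part of the commit:

echo "in 1994 the" | sed -e 's_19\([[:digit:]][[:digit:]]\)_19 \1_g'
# prints: in 19 94 the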