chore: add ch 5-6
parent ff069b52c4
commit cbb5561fc6

Makefile (6 lines changed)
@@ -1,13 +1,14 @@
 
 PATH:=./piper:$(PATH)
 
+TXT_FILES := $(patsubst %.raw,%.txt,$(wildcard *.raw))
 WAV_FILES := $(patsubst %.txt,%.wav,$(wildcard *.txt))
 MP3_FILES := $(patsubst %.txt,%.mp3,$(wildcard *.txt))
 
 MODEL=en_GB-alan-medium.onnx
 CONFIG=en_GB-alan-medium.onnx.json
 
-complete: $(MP3_FILES)
+complete: $(TXT_FILES) $(MP3_FILES)
 	echo $@ $^
 
 $(WAV_FILES): %.wav: %.txt
@@ -17,6 +18,9 @@ $(WAV_FILES): %.wav: %.txt
 
 $(MP3_FILES): %.mp3: %.wav
 	ffmpeg -y -i $^ $@
 
+$(TXT_FILES): %.txt: %.raw
+	./cleanfile $^ $@
+
 install:
 	wget -O piper.tar "https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz"
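Not part of the commit, but a minimal usage sketch of the pipeline the Makefile above implies; the unpacking of piper.tar and the piper invocation inside the %.wav rule sit outside the visible hunks, so those steps are assumptions.

    # Fetch piper (the visible install recipe only downloads piper.tar;
    # unpacking it into ./piper is assumed to happen in lines cut from the hunk).
    make install
    # Because TXT_FILES/WAV_FILES/MP3_FILES are assigned with := from
    # $(wildcard ...) at parse time, a first run starting from only .raw files
    # produces just the .txt output; a second run then sees the .txt files and
    # drives the .wav (piper) and .mp3 (ffmpeg) rules.
    make complete
    make complete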
chapter05.raw (982 lines changed)
@@ -427,12 +427,12 @@ responsibility for a specific task:
 •
 •
 •
-The enroute controller controlled the flow of OPC aircraft to and from the
+1.The enroute controller controlled the flow of OPC aircraft to and from the
 TAOR. This person also conducted radio and IFF checks on friendly aircraft
 outside the TAOR.
-The TAOR controller provided threat warning and tactical control for all
+2.The TAOR controller provided threat warning and tactical control for all
 OPC aircraft within the TAOR.
-The tanker controller coordinated all air refueling operations (and played no
+3.The tanker controller coordinated all air refueling operations (and played no
 part in the accident so is not mentioned further).
 To facilitate communication and coordination, the SD’s console was physically
 located in the “pit” right between the MCC and the ACE (Airborne Command
@@ -1422,4 +1422,978 @@ with the AWACS crew that helicopter activities were not an integral part of OPC
air operations. In testimony after the accident, the ACE commented, “The way I understand it, only as a courtesy does the AWACS track Eagle Flight.”
The Mission Director and ACE also did not have the information necessary to exercise their responsibility. The ACE had an inaccurate model of where the Black Hawks were located in the airspace. He testified that he presumed the Black Hawks were conducting standard operations in the Security Zone and had landed [159]. He also testified that, although he had a radarscope, he had no knowledge of AWACS radar symbology: “I have no idea what those little blips mean.” The Mission Director, on the ground, was dependent on the information about the current airspace state sent down from the AWACS via JTIDS (the Joint Tactical Information Distribution System).

The ACE testified that he assumed the F-15 pilots would ask him for guidance in any situation involving a potentially hostile aircraft, as required by the ROE. The ACE’s and F-15 pilots’ mental models of the ROE clearly did not match with respect to who had the authority to initiate the engagement of unidentified aircraft. The rules of engagement stated that the ACE was responsible, but some pilots believed they had authority when an imminent threat was involved. Because of security concerns, the actual ROE used were not disclosed during the accident investigation, but, as argued earlier, the slow, low-flying Black Hawks posed no serious threat to an F-15.

Although the F-15 pilot never contacted the ACE about the engagement, the ACE did hear the call of the F-15 lead pilot to the TAOR controller. The ACE testified to the Accident Investigation Board that he did not intervene because he believed the F-15 pilots were not committed to anything at the visual identification point, and he had no idea they were going to react so quickly. Since being assigned to OPC, he said the procedure had been that when the F-15s or other fighters were investigating aircraft, they would ask for feedback from the ACE. The ACE and AWACS crew would then try to rummage around and find out whose aircraft it was and identify it specifically. If they were unsuccessful, the ACE would then ask the pilots for a visual identification [159]. Thus, the ACE probably assumed that the F-15 pilots would not fire at the helicopters without reporting to him first, which they had not done yet. At this point, they had simply requested an identification by the AWACS traffic controller. According to his understanding of the ROE, the F-15 pilots would not fire without his approval unless there was an immediate threat, which there was not. The ACE testified that he expected to be queried by the F-15 pilots as to what their course of action should be.

The ACE also testified at one of the hearings:

I really did not know what the radio call “engaged” meant until this morning. I did not think the pilots were going to pull the trigger and kill those guys. As a previous right seater in an F-111, I thought “engaged” meant the pilots were going down to do a visual intercept. [159]

Coordination among Multiple Controllers: Not applicable.

Feedback from Controlled Process: The F-15 lead pilot did not follow the ROE and report the identified aircraft to the ACE and ask for guidance, although the ACE did learn about it from the questions the F-15 pilots posed to the controllers on the AWACS aircraft. The Mission Director got incorrect feedback about the state of the airspace from JTIDS.
Time Lags: An unusual time lag occurred where the lag was in the controller and not in one of the other parts of the control loop (see the footnote below). The F-15 pilots responded faster than the ACE (in the AWACS) and Mission Director (on the ground) could issue appropriate control instructions (as required by the ROE) with regard to the engagement.
Changes after the Accident.
No changes were made at this level after the accident, although roles were clarified.
section 5.3.5. The AWACS Operators.
This level of the control structure contains more examples of inconsistent mental models and asynchronous evolution. In addition, this control level provides interesting examples of the adaptation over time of specified procedures to accepted practice and of coordination problems. There were multiple controllers with confused and overlapping responsibilities for enforcing different aspects of the safety requirements and constraints (figure 5.8). The overlaps and boundary areas in the controlled processes led to serious coordination problems among those responsible for controlling aircraft in the TAOR.
Context in Which Decisions and Actions Took Place
Safety Requirements and Constraints: The general safety constraint involved in the accident at this level was to prevent misidentification of aircraft by the pilots and any friendly fire that might result. More specific requirements and constraints are shown in figure 5.8.

Controls: Controls included procedures for identifying and tracking aircraft, training (including simulator missions), briefings, staff controllers, and communication channels. The senior director and surveillance officer (ASO) provided real-time oversight of the crew’s activities, while the mission crew commander (MCC) coordinated all the activities aboard the AWACS aircraft.
Footnote: A similar type of time lag led to the loss of an F-18 when a mechanical failure resulted in inputs arriving at the computer interface faster than the computer was able to process them.
The Delta Point system, used since the inception of OPC, provided standard code names for real locations. These code names were used to prevent the enemy, who might be listening to radio transmissions, from knowing the helicopters’ flight plans.

Roles and Responsibilities: The AWACS crew were responsible for identifying, tracking, and controlling all aircraft enroute to and from the TAOR; for coordinating air refueling; for providing airborne threat warning and control in the TAOR; and for providing surveillance, detection and identification of all unknown aircraft. Individual responsibilities are described in section 5.2.

The staff weapons director (instructor) was permanently assigned to Incirlik. He did all incoming briefings for new AWACS crews rotating into Incirlik and accompanied them on their first mission in the TAOR. The OPC leadership recognized the potential for some distance to develop between stateside spin-up training and continuously evolving practice in the TAOR. Therefore, as mentioned earlier, permanent staff or instructor personnel flew with each new AWACS crew on their maiden flight in Turkey. Two of these staff controllers were on the AWACS the day of the accident to answer any questions that the new crew might have about local procedures and, as described earlier, to inform them about adaptation of accepted practice from specified procedures.
The SD had worked as an AWACS controller for five years. This was his fourth deployment to OPC, his second as an SD, and his sixtieth mission over the Iraqi TAOR [159]. He worked as an SD more than two hundred days a year and had logged more than 2,383 hours flying time [191].
The enroute controller, who was responsible for aircraft outside the TAOR, was a first lieutenant with four years in the Air Force. He had finished AWACS training two years earlier (May 1992) and had served in the Iraqi TAOR previously [191].

The TAOR controller, who was responsible for controlling all air traffic flying within the TAOR, was a second lieutenant with more than nine years of service in the Air Force, but he had just finished controller’s school and had had no previous deployments outside the continental United States. In fact, he had become mission ready only two months prior to the incident. This tour was his first in OPC and his first time as a TAOR controller. He had only controlled as a mission-ready weapons director on three previous training flights [191] and never in the role of TAOR controller. AWACS guidance at the time suggested that the most inexperienced controller be placed in the TAOR position: None of the reports on the accident provided the reasoning behind this practice.

The air surveillance officer (ASO) was a captain at the time of the shootdown. She had been mission-ready since October 1992 and was rated as an instructor ASO. Because the crew’s originally assigned ASO was upgrading and could not make it to Turkey on time, she volunteered to fill in for him. She had already served for five and a half weeks in OPC at the time of the accident and was completing her third assignment to OPC. She worked as an ASO approximately two hundred days a year [191].

Environmental and Behavior-Shaping Factors: At the time of the shootdown, shrinking defense budgets were leading to base closings and cuts in the size of the military. At the same time, a changing political climate, brought about by the fall of the Soviet Union, demanded significant U.S. military involvement in a series of operations. The military (including the AWACS crews) were working at a greater pace than they had ever experienced due to budget cuts, early retirements, force outs, slowed promotions, deferred maintenance, and delayed fielding of new equipment. All of these factors contributed to poor morale, inadequate training, and high personnel turnover.
AWACS crews are stationed and trained at Tinker Air Force Base in Oklahoma and then deployed to locations around the world for rotations lasting approximately thirty days. Although all but one of the AWACS controllers on the day of the accident had served previously in the Iraqi no-fly zone, this was their first day working together and, except for the surveillance officer, the first day of their current rotation. Due to last minute orders, the team got only minimal training, including one simulator session instead of the two full three-hour sessions required prior to deploying. In the only session they did have, some of the members of the team were missing—the ASO, ACE, and MCC were unable to attend—and one was later replaced: As noted, the ASO originally designated and trained to deploy with this crew was instead shipped off to a career school at the last minute, and another ASO, who was just completing a rotation in Turkey, filled in.

The one simulator session they did receive was less than effective, partly because the computer tape provided by Boeing to drive the exercise was not current (another instance of asynchronous evolution). For example, the maps were out of date, and the rules of engagement used were different and much more restrictive than those currently in force in OPC. No Mode I codes were listed. The list of friendly participants in OPC did not include UH-60s (Black Hawks) and so on. The second simulation session was canceled because of a wing exercise.

Because the TAOR area had not yet been sanitized, it was a period of low activity: At the time, there were still only four aircraft over the no-fly zone—the two F-15s and the two Black Hawks. AWACS crews are trained and equipped to track literally hundreds of enemy and friendly aircraft during a high-intensity conflict. Many accidents occur during periods of low activity when vigilance is reduced compared to periods of higher activity.

The MCC sits with the other two key supervisors (SD and ACE) toward the front of the aircraft in a three-seat arrangement named the “Pit,” where each has his own radarscope. The SD is seated to the MCC’s left. Surveillance is seated in the rear.

Violations of the no-fly zone had been rare and threats few during the past three years, so that day’s flight was expected to be an average one, and the supervisors in the Pit anticipated just another routine mission [159].
During the initial orbit of the AWACS, the technicians determined that one of the radar consoles was not operating. According to Snook, this type of problem was not uncommon, and the AWACS is therefore designed with extra crew positions. When the enroute controller realized his assigned console was not working properly, he moved from his normal position between the TAOR and tanker controllers to a spare seat directly behind the senior director. This position kept him out of the view of his supervisor and also eliminated physical contact with the TAOR controller.
Dysfunctional Interactions among the Controllers
According to the formal procedures, control of aircraft was supposed to be handed off from the enroute controller to the TAOR controller when the aircraft entered the TAOR. This handoff did not occur for the Black Hawks, and the TAOR controller was not made aware of the Black Hawks’ flight within the TAOR. Snook explains this communication error as resulting from the radar console failure, which interfered with communication between the TAOR and enroute controllers. But this explanation does not gibe with the fact that the normal procedure of the enroute controller was to continue to control helicopters without handing them off to the TAOR controller, even when the enroute and TAOR controllers were seated in their usual places next to each other. There may usually have been more informal interaction about aircraft in the area when they were seated next to each other, but there is no guarantee that such interaction would have occurred even with a different seating arrangement. Note that the helicopters had been dropped from the radar screens and the enroute controller had an incorrect mental model of where they were: He thought they were close to the boundary of the TAOR and was unaware they had gone deep within it. The enroute controller, therefore, could not have told the TAOR controller about the true location of the Black Hawks even if they had been sitting next to each other.

The interaction between the surveillance officer and the senior weapons director with respect to tracking the helicopter flight on the radar screen involved many dysfunctional interactions. For example, the surveillance officer put an attention arrow on the senior director’s radarscope in an attempt to query him about the lost helicopter symbol that was floating, at one point, unattached to any track. The senior director did not respond to the attention arrow, and it automatically dropped off the screen after sixty seconds. The helicopter symbol (H) dropped off the radar screen when the radar and IFF returns from the Black Hawks faded and did not return until just before the engagement, removing any visual reminder to the AWACS crew that there were Black Hawks inside the TAOR. The accident investigation did not include an analysis of the design of the AWACS human–computer interface or how it might have contributed to the accident, although such an analysis is important in fully understanding why it made sense for the controllers to act the way they did.

During his court-martial for negligent homicide, the senior director argued that his radarscope did not identify the helicopters as friendly and that therefore he was not responsible. When asked why the Black Hawk identification was dropped from the radarscope, he gave two reasons. First, because it was no longer attached to any active signal, they assumed the helicopter had landed somewhere. Second, because the symbol displayed on their scopes was being relayed in real time through a JTIDS downlink to commanders on the ground, they were very concerned about sending out an inaccurate picture of the TAOR.

Even if we suspended it, it would not be an accurate picture, because we wouldn’t know for sure if that is where he landed. Or if he landed several minutes earlier, and where that would be. So, the most accurate thing for us to do at that time, was to drop the symbology [sic].
Flawed or Inadequate Decision Making and Control Actions.
There were myriad inadequate control actions in this accident, involving each of the controllers in the AWACS. The AWACS crew works as a team, so it is sometimes hard to trace incorrect decisions to one individual. While from each individual’s standpoint the actions and decisions may have been correct, when put together as a whole the decisions were incorrect.
The enroute controller never told the Black Hawk pilots to change to the TAOR frequency that was being monitored by the TAOR controller and did not hand off control of the Black Hawks to the TAOR controller. The established practice of not handing off the helicopters had probably evolved over time as a more efficient way of handling traffic—another instance of asynchronous evolution. Because the helicopters were usually only at the very border of the TAOR and spent very little time there, the overhead of handing them off twice within a short time period was considered inefficient by the AWACS crews. As a result, the procedures used had changed over time to the more efficient procedure of keeping them under the control of the enroute controller. The AWACS crews were not provided with written guidance or training regarding the control of helicopters within the TAOR, and, in its absence, they adapted their normal practices for fixed-wing aircraft as best they could to apply them to helicopters.
In addition to not handing off the helicopters, the enroute controller did not monitor the course of the Black Hawks while they were in the TAOR (after leaving Zakhu), did not take note of the flight plan (from Whiskey to Lima), did not alert the F-15 pilots that there were friendly helicopters in the area, did not alert the F-15 pilots before they fired that the helicopters they were targeting were friendly, and did not tell the Black Hawk pilots that they were on the wrong frequency and were squawking the wrong IFF Mode I code.
The TAOR controller did not monitor the course of the Black Hawks in the TAOR and did not alert the F-15 pilots before they fired that the helicopters they were targeting were friendly. None of the controllers warned the F-15 pilots at any time that there were friendly helicopters in the area nor did they try to stop the engagement. The accident investigation board found that because Army helicopter activities were not normally known at the time of the fighter pilots’ daily briefings, normal procedures were for the AWACS crews to receive real-time information about their activities from the helicopter crews and to relay that information on to the other aircraft in the area. If this truly was established practice, it clearly did not occur on that day.
The controllers were supposed to be tracking the helicopters using the Delta Point system, and the Black Hawk pilots had reported to the enroute controller that they were traveling from Whiskey to Lima. The enroute controller testified, however, that he had no idea of the towns to which the code names Whiskey and Lima referred. After the shootdown, he went in search of the card defining the call signs and finally found it in the Surveillance Section [159]. Clearly, tracking helicopters using call signs was not a common practice or the charts would have been closer at hand. In fact, during the court-martial of the senior director, the defense was unable to locate any AWACS crewmember at Tinker AFB (where AWACS crews were stationed and trained) who could testify that he or she had ever used the Delta Point system [159], although clearly the Black Hawk pilots thought it was being used because they provided their flight plan using Delta Points.
None of the controllers in the AWACS told the Black Hawk helicopters that they were squawking the wrong IFF code for the TAOR. Snook cites testimony from the court-martial of the senior director that posits three related explanations for this lack of warning: (1) the minimum communication (min comm) policy, (2) a belief by the AWACS crew that the Black Hawks should know what they were doing, and (3) pilots not liking to be told what to do. None of the explanations provided during the trial is very satisfactory; they appear to be after-the-fact rationalizations for the controllers not doing their job when faced with possible court-martial and jail terms. Given that the controllers acknowledged that the Army helicopters never squawked the right codes and had not done so for months, there must have been other communication channels that could have been used besides real-time radio communication to remedy this situation, so the min comm policy is not an adequate explanation. Arguing that the pilots should know what they were doing is simply an abdication of responsibility, as is the argument that pilots did not like being told what to do. A different perspective, and one that likely applies to all the controllers, was provided by the staff weapons director, who testified, “For a helicopter, if he’s going to Zakhu, I’m not that concerned about him going beyond that. So, I’m not really concerned about having an F-15 needing to identify this guy.” [159]
The mission crew commander had provided the crew’s morning briefing. He spent some time going over the activity flowsheet, which listed all the friendly aircraft flying in the OPC that day, their call signs, and the times they were scheduled to enter the TAOR. According to Piper (but nobody else mentions it), he failed to note the helicopters, even though their call signs and their IFF information had been written on the margin of his flowsheet.
The shadow crew always flew with new crews on their first day in OPC, but the task of these instructors does not seem to have been well defined. At the time of the shootdown, one was in the galley “taking a break,” and the other went back to the crew rest area, read a book, and took a nap. The staff weapons director, who was asleep in the back of the AWACS, testified during the court-martial of the senior director that his purpose on the mission was to be the “answer man,” just to answer any questions they might have. This was a period of very little activity in the area (only the two F-15s were supposed to be in the TAOR), and the shadow crew members may have thought their advice was not needed at that time.
When the staff weapons director went back to the rest area, the only symbol displayed on the scopes of the AWACS controllers was the one for the helicopters (EE01), which they thought were going to Zakhu only.

Because many of the dysfunctional actions of the crew did conform to the established practice (e.g., not handing off helicopters to the TAOR controller), it is unclear what different result might have occurred if the shadow crew had been in place. For example, the staff weapons director testified during the hearings and trial that he had seen helicopters out in the TAOR before, past Zakhu, but he really did not feel it was necessary to brief crews about the Delta Point system to determine a helicopter’s destination [159].
Reasons for the Flawed Control.
Inadequate Control Algorithms: This level of the accident analysis provides an interesting example of the difference between prescribed procedures and established practice, the adaptation of procedures over time, and migration toward the boundaries of safe behavior. Because of the many helicopter missions that ran from Diyarbakir to Zakhu and back, the controllers testified that it did not seem worth handing them off and switching them over to the TAOR frequency for only a few minutes. Established practice (keeping the helicopters under the control of the enroute controller instead of handing them off to the TAOR controller) appeared to be safe until the day the helicopters’ behavior differed from normal, that is, they stayed longer in the TAOR and ventured beyond a few miles inside the boundaries. Established practice no longer assured safety under these conditions. A complicating factor in the accident was the universal misunderstanding of each of the controllers’ responsibilities with respect to tracking Army helicopters.
Snook suggests that the min comm norm contributed to the AWACS crew’s general reluctance to enforce rules, contributed to AWACS not correcting Eagle Flight’s improper Mode I code, and discouraged controllers from pushing helicopter pilots to the TAOR frequency when they entered Iraq because they were reluctant to say more than absolutely necessary.
According to Snook, there were also no explicit or written procedures regarding the control of helicopters. He states that radio contact with helicopters was lost frequently, but there were no procedures to follow when this occurred. In contrast, Piper claims the AWACS operations manual says:

Helicopters are a high interest track and should be hard copied every five minutes in Turkey and every two minutes in Iraq. These coordinates should be recorded in a special log book, because radar contact with helicopters is lost and the radar symbology [sic] can be suspended. [159]
There is no information in the publicly available parts of the accident report about any special logbook or whether such a procedure was normally followed.

Footnote: Even if the actions of the shadow crew did not contribute to this particular accident, we can take advantage of the accident investigation to perform a safety audit on the operation of the system and identify potential improvements.
Inaccurate and Inconsistent Mental Models: In general, the AWACS crew (and the ACE) shared the common view that helicopter activities were not an integral part of OPC air operations. There was also a misunderstanding about which provisions of the ATO applied to Army helicopter activities.

Most of the people involved in the control of the F-15s were unaware of the presence of the Black Hawks in the TAOR that day, the lone exception perhaps being the enroute controller who knew they were there but apparently thought they would stay at the boundaries of the TAOR and thus were far from their actual location deep within it. The TAOR controller testified that he had never talked to the Black Hawks: Following their two check-ins with the enroute controller, the helicopters had remained on the enroute frequency (as was the usual, accepted practice), even as they flew deep into the TAOR.

The enroute controller, who had been in contact with the Black Hawks, had an inaccurate model of where the helicopters were. When the Black Hawk pilots originally reported their takeoff from the Army Military Coordination Center at Zakhu, they contacted the enroute controller and said they were bound for Lima. The enroute controller did not know to what city the call sign Lima referred and did not try to look up this information. Other members of the crew also had inaccurate models of their responsibilities, as described in the next section. The Black Hawk pilots clearly thought the AWACS was tracking them and also thought the controllers were using the Delta Point system—otherwise helicopter pilots would not have provided the route names in that way.
The AWACS crews did not appear to have accurate models of the Black Hawks’ mission and role in OPC. Some of the flawed control actions seem to have resulted from a mental model that helicopters only went to Zakhu and therefore did not need to be tracked or to follow the standard TAOR procedures.
As with the pilots and their visual recognition training, the incorrect mental models may have been at least partially the result of the inadequate AWACS training the team received.

Coordination among Multiple Controllers: As mentioned earlier, coordination problems are pervasive in this accident due to overlapping control responsibilities and confusion about responsibilities in the boundary areas of the controlled process. Most notably, the helicopters usually operated close to the boundary of the TAOR, resulting in confusion over who was or should be controlling them.

The official accident report noted a significant amount of confusion within the AWACS mission crew regarding the tracking responsibilities for helicopters [5]. The mission crew commander testified that nobody was specifically assigned responsibility for monitoring helicopter traffic in the no-fly zone and that his crew believed the helicopters were not included in their orders [159]. The staff weapons director made a point of not knowing what the Black Hawks do: “It was some kind of a squirrely mission” [159]. During the court-martial of the senior director, the AWACS tanker controller testified that in the briefing the crew received upon arrival at Incirlik, the staff weapons director had said about helicopters flying in the no-fly zone, “They’re there, but don’t pay any attention to them.” The enroute controller testified that the handoff procedures applied only to fighters. “We generally have no set procedures for any of the helicopters. . . . We never had any [verbal] guidance [or training] at all on helicopters” [159].
Coordination problems also existed between the activities of the surveillance personnel and the other controllers. During the investigation of the accident, the ASO testified that surveillance’s responsibility was south of the 36th Parallel, and the other controllers were responsible for tracking and identifying all aircraft north of the 36th Parallel. The other controllers suggested that surveillance was responsible for tracking and identifying all unknown aircraft, regardless of location. In fact, Air Force regulations say that surveillance had tracking responsibility for unknown and unidentified tracks throughout the TAOR. It is not possible through the testimony alone, again because of the threat of court-martial, to piece together exactly what the problem was here, including whether it was simply a migration of normal operations away from specified operations. At the least, it is clear that there was confusion about who was in control of what.
One possible explanation for the lack of coordination among controllers at this level of the hierarchical control structure is that, as suggested by Snook, this particular group had never trained together as a team [191]. But given the lack of procedures for handling helicopters and the confusion even by experienced controllers and the staff instructors about responsibilities for handling helicopters, Snook’s explanation is not very convincing. A more plausible explanation is simply a lack of guidance and delineation of responsibilities by the management level above. And even if the roles of everyone in such a structure had been well defined originally, uncontrolled local adaptation to more efficient procedures and asynchronous evolution of the different parts of the control structure created dysfunctionalities as time passed. The helicopters and fixed wing aircraft had separate control structures that only joined fairly high up on the hierarchy and, as is described in the next section, there were communication problems between the components at the higher levels of the control hierarchy, particularly between the Army Military Coordination Center (MCC) and the Combined Forces Air Component (CFAC) headquarters.

Feedback from the Controlled Process: Signals to the AWACS from the Black Hawks were inconsistent due to line-of-sight limitations and the mountainous terrain in which the Black Hawks were flying. The helicopters used the terrain to mask themselves from air defense radars, but this terrain masking also caused the radar returns from the Black Hawks to the AWACS (and to the fighters) to fade at various times.

Time Lags: Important time lags contributed to the accident, such as the delay of radio reports from the Black Hawk helicopters due to radio signal transmission problems and their inability to use the TACSAT radios until they had landed. As with the ACE, the speed with which the F-15 pilots acted also provided the controllers with little time to evaluate the situation and respond appropriately.
Changes after the Accident.
Many changes were instituted with respect to AWACS operations after the accident:
1. Confirmation of a positive IFF Mode IV check was required for all OPC aircraft prior to their entry into the TAOR.
2. The responsibilities for coordination of air operations were better defined.
3. All AWACS aircrews went through a one-time retraining and recertification program, and every AWACS crewmember had to be recertified.
4. A plan was produced to reduce the temporary duty of AWACS crews to 120 days a year. In the end, it was decreased from 166 to 135 days per year from January 1995 to July 1995. The Air Combat Command planned to increase the number of AWACS crews.
5. AWACS control was required for all TAOR flights.
6. In addition to normal responsibilities, AWACS controllers were required to specifically maintain radar surveillance of all TAOR airspace and to issue advisory/deconflicting assistance on all operations, including helicopters.
7. The AWACS controllers were required to periodically broadcast friendly helicopter locations operating in the TAOR to all aircraft.
Although not mentioned anywhere in the available documentation on the accident, it seems reasonable that either the AWACS crews started to use the Delta Point system or the Black Hawk pilots were told not to use it and an alternative means for transmitting flight plans was mandated.

section 5.3.6. The Higher Levels of Control.
Fully understanding the behavior at any level of the sociotechnical control structure requires understanding how and why the control at the next higher level allowed or contributed to the inadequate control at the current level. In this accident, many of the erroneous decisions and control actions at the lower levels can only be fully understood by examining this level of control.
Context in Which Decisions and Actions Took Place
Safety Requirements and Constraints Violated: There were many safety constraints violated at the higher levels of the control structure—the Military Coordination Center, Combined Forces Air Component, and CTF commander—and several people were investigated for potential court-martial and received official letters of reprimand. These safety constraints include: (1) procedures must be instituted that delegate appropriate responsibility, specify tasks, and provide effective training to all those responsible for tracking aircraft and conducting combat operations; (2) procedures must be consistent or at least complementary for everyone involved in TAOR airspace operations; (3) performance must be monitored (feedback channels established) to ensure that safety-critical activities are being carried out correctly and that local adaptations have not moved operations beyond safe limits; (4) equipment and procedures must be coordinated between the Air Force and Army to make sure that communication channels are effective and that asynchronous evolution has not occurred; (5) accurate information about scheduled flights must be provided to the pilots and the AWACS crews.

Controls: The controls in place included operational orders and plans to designate roles and responsibilities as well as a management structure, the ACO, coordination meetings and briefings, a chain of command (OPC commander to mission director to ACE to pilots), disciplinary actions for those not following the written rules, and a group (the Joint Operations and Intelligence Center or JOIC) responsible for ensuring effective communication occurred.

Roles and Responsibilities: The MCC had operational control over the Army helicopters while the CFAC had operational control over fixed-wing aircraft and tactical control over all aircraft in the TAOR. The Combined Task Force commander general (who was above both the CFAC and MCC) had ultimate responsibility for the coordination of fixed-wing aircraft flights with Army helicopters.

While specific responsibilities of individuals might be considered here in an official accident analysis, treating the CFAC and MCC as entities is sufficient for the purposes of this analysis.
Environmental and Behavior-Shaping Factors: The Air Force operated on a predictable, well-planned, and tightly executed schedule. Detailed mission packages were organized weeks and months in advance. Rigid schedules were published and executed in preplanned packages. In contrast, Army aviators had to react to constantly changing local demands, and they prided themselves on their flexibility [191]. Because of the nature of their missions, exact takeoff times and detailed flight plans for helicopters were virtually impossible to schedule in advance. They were even more difficult to execute with much rigor. The Black Hawks’ flight plan contained their scheduled takeoff time, transit routes from Diyarbakir through Gate 1 to Zakhu, and their return time. Because the Army helicopter crews rarely knew exactly where they would be going within the TAOR until after they were briefed at the Military Coordination Center at Zakhu, most flight plans only indicated that Eagle Flight would be “operating in and around the TAOR.”
The physical separation of the Army Eagle Flight pilots from the CFAC operations and Air Force pilots at Incirlik contributed to the communication difficulties that already existed between the services.

Dysfunctional Interactions among Controllers.
Dysfunctional communication at this level of the control structure played a critical role in the accident. These communication flaws contributed to the coordination flaws at this level and at the lower levels.

A critical safety constraint to prevent friendly fire requires that the pilots of the fighter aircraft know who is in the no-fly zone and whether they are supposed to be there. However, neither the CTF staff nor the Combined Forces Air Component staff requested nor received timely, detailed flight information on planned MCC helicopter activities in the TAOR. Consequently, the OPC daily Air Tasking Order was published with little detailed information regarding U.S. helicopter flight activities over northern Iraq.
According to the official accident report, specific information on routes of flight and times of MCC helicopter activity in the TAOR was normally available to the other OPC participants only when AWACS received it from the helicopter crews by radio and relayed the information on to the pilots [5]. While those at the higher levels of control may have thought this relaying of flight information was occurring, that does not seem to be the case given that the Delta Point system (wherein the helicopter crews provided the AWACS controllers with their flight plan) was not used by the AWACS controllers: When the helicopters went beyond Zakhu, the AWACS controllers did not know their flight plans and therefore could not relay that information to the fighter pilots and other OPC participants.
The weekly flight schedules the MCC provided to the CFAC staff were not complete enough for planning purposes. While the Air Force could plan their missions in advance, the different types of Army helicopter missions had to be flexible to react to daily needs. The MCC daily mission requirements were generally based on the events of the previous day. A weekly flight schedule was developed and provided to the CTF staff, but a firm itinerary was usually not available until after the next day’s ATO was published. The weekly schedule was briefed at the CTF staff meetings on Mondays, Wednesdays, and Fridays, but the information was neither detailed nor firm enough for effective rotary-wing and fixed-wing aircraft coordination and scheduling purposes [5].
Each daily ATO was published showing several Black Hawk helicopter lines. Of these, two helicopter lines (two flights of two helicopters each) were listed with call signs (Eagle 01/02 and Eagle 03/04), mission numbers, IFF Mode II codes, and a route of flight described only as LLTC (the identifier for Diyarbakir) to TAOR to LLTC. No information regarding route or duration of flight time within the TAOR was given on the ATO. Information concerning takeoff time and entry time into the TAOR was listed as A/R (as required).

Every evening, the MCC at Zakhu provided a situation report (SITREP) to the JOIC (located at Incirlik), listing the helicopter flights for the following day. The SITREP did not contain complete flight details and arrived too late to be included in the next day’s ATO. The MCC would call the JOIC the night prior to the scheduled mission to “activate” the ATO line. There were, however, no procedures in place to get the SITREP information from the JOIC to those needing to know it in CFAC.

After receiving the SITREP, a duty officer in the JOIC would send takeoff times and gate times (the times the helicopters would enter northern Iraq) to Turkish operations for approval. Meanwhile, an intelligence representative to the JOIC consolidated the MCC weekly schedule with the SITREP and used secure intelligence channels to pass this updated information to some of his counterparts in operational squadrons who had requested it. No procedures existed to pass this information from the JOIC to those in CFAC with tactical responsibility for the helicopters (through the ACE and Mission Director) [5]. Because CFAC normally determined who would fly when, the information channels were designed primarily for one-way communications outward and downward.
In the specific instance involved in the shootdown, the MCC weekly schedule was provided on April 8 to the JOIC and thence to the appropriate person in CFAC. That schedule showed a two-ship, MCC helicopter administrative flight scheduled for April 14. According to the official accident report, two days before (April 12) the MCC Commander had requested approval for an April 14 flight outside the Security Zone from Zakhu to the towns of Irbil and Salah ad Din. The OPC commanding general approved the written request on April 13, and the JOIC transmitted the approval to the MCC, but apparently the information was not provided to those responsible for producing the ATO. The April 13 SITREP from MCC listed the flight as “mission support,” but contained no other details. Note that more information was available earlier than normal in this instance, and it could have been included in the ATO, but the established communication channels and procedures did not exist to get it to the right places. The MCC weekly schedule update, received by the JOIC on the evening of April 13 along with the MCC SITREP, gave the destinations for the mission as Salah ad Din and Irbil. This information was not passed to CFAC.
Late in the afternoon on April 13, MCC contacted the JOIC duty officer and activated the ATO line for the mission. A takeoff time of 0520 and a gate time of 0625 were requested. No takeoff time or route of flight beyond Zakhu was specified. The April 13 SITREP, the weekly flying schedule update, and the ATO-line activation request were received by the JOIC too late to be briefed during the Wednesday (April 13) staff meetings. None of the information was passed to the CFAC scheduling shop (which was responsible for distributing last minute changes to the ATO through various sources such as the Battle Staff Directives, morning briefings, and so on), to the ground-based Mission Director, nor to the ACE on board the AWACS [5]. Note that this flight was not a routine food and medical supply run, but instead it carried sixteen high-ranking VIPs and required the personal attention and approval of the CTF Commander. Yet information about the flight was never communicated to the people who needed to know about it [191]. That is, the information went up from the MCC to the CTF staff, but not across from MCC to CFAC nor down from the CTF staff to CFAC (see figure 5.3).

A second example of a major dysfunctional communication involved the communication of the proper radio frequencies and IFF codes to be used in the TAOR.

About two years before the shootdown, someone in the CFAC staff decided to change the instructions pertaining to IFF modes and codes. According to Snook, no one recalled exactly how or why this change occurred. Before the change, all aircraft squawked a single Mode I code everywhere they flew. After the change, all aircraft were required to switch to a different Mode I code while flying in the no-fly zone. The change was communicated through the daily ATO. However, after the accident it was discovered that the Air Force’s version of the ATO was not exactly the same as the one received electronically by the Army aviators—another instance of asynchronous evolution and lack of linkup between system components. For at least two years, there existed two versions of the daily ATO: one printed out directly by the Incirlik Frag Shop and distributed locally by messenger to all units at Incirlik Air Base, and a second one transmitted electronically through an Air Force communications center (the JOIC) to Army helicopter operations at Diyarbakir. The one received by the Army aviators was identical in all respects to the one distributed by the Frag Shop, except for the changed Mode I code information contained in the SPINS. The ATO that Eagle Flight received contained no mention of two Mode I codes [191].
What about the confusion about the proper radio frequency to be used by the Black Hawks in the TAOR? Piper notes that the Black Hawk pilots were told to use the enroute frequency while flying in the TAOR. The commander of OPC testified after the accident that the use by the Black Hawks of the enroute radio frequency rather than the TAOR frequency had been briefed to him as a safety measure because the Black Hawk helicopters were not equipped with HAVE QUICK technology. The ACO (Aircraft Control Order) required the F-15s to use non–HAVE QUICK mode when talking to specific types of aircraft (such as F-1s) that, like the Black Hawks, did not have the new technology. The list of non-HQ aircraft provided to the F-15 pilots, however, for some reason did not include UH-60s. Apparently the decision was made to have the Black Hawks use the enroute radio frequency but this decision was never communicated to those responsible for the F-15 procedures specified in the ACO. Note that a thorough investigation of the higher levels of control, as is required in a STAMP-based analysis, is necessary to explain properly the use of the enroute radio frequency by the Black Hawks. Of the various reports on the shootdown, only Piper notes the fact that an exception had been made for Army helicopters for safety reasons—the official accident report, Snook’s detailed book on the accident, and the GAO report do not mention this fact! Piper found out about it from her attendance at the public hearings and trial. This omission of important information from the accident reports is an interesting example of how incomplete investigation of the higher levels of control can lead to incorrect causal analysis. In her book, Piper questions why the Accident Investigation Board, while producing twenty-one volumes of evidence, never asked the commander of OPC about the radio frequency and other problems found during the investigation.

Other official exceptions were made for the helicopter operations, such as allowing them in the Security Zone without AWACS coverage. Using STAMP, the accident can be understood as a dynamic process where the operations of the Army and Air Force adapted and diverged without effective communication and coordination.

Many of the dysfunctional communications and interactions stem from asynchronous evolution of the mission and the operations plan. In response to the evolving mission in northern Iraq, air assets were increased in September 1991 and a significant portion of the ground forces were withdrawn. Although the original organizational structure of the CTF was modified at this time, the operations plan was not. In particular, the position of the person who was in charge of communication and coordination between the MCC and CFAC was eliminated without establishing an alternative communication channel.

Unsafe asynchronous evolution of the safety control structure can be prevented by proper documentation of safety constraints, assumptions, and their controls during system design and checking before changes are made to determine if the constraints and assumptions are violated by the design. Unintentional changes and migration of behavior outside the boundaries of safety can be prevented by various means, including education, identifying and checking leading indicators, and targeted audits. Part III describes ways to prevent asynchronous evolution from leading to accidents.
Flawed or Inadequate Control Actions.
There were many flawed or missing control actions at this level, including:
1. The Black Hawk pilots were allowed to enter the TAOR without AWACS coverage and the F-15 pilots and AWACS crews were not informed about this exception to the policy. This control problem is an example of the problems of distributed decision making with other decision makers not being aware of the decisions of others (see the Zeebrugge example in figure 2.2).
Prior to September 1993, Eagle Flight helicopters flew any time required, before the fighter sweeps and without fighter coverage, if necessary. After September 1993, helicopter flights were restricted to the security zone if AWACS and fighter coverage were not on station. But for the mission on April 14, Eagle Flight requested and received permission to execute their flight outside the security zone. A CTF policy letter dated September 1993 implemented the following policy for UH-60 helicopter flights supporting the MCC: “All UH-60 flights into Iraq outside of the security zone require AWACS coverage.” Helicopter flights had routinely been flown within the TAOR security zone without AWACS or fighter coverage and CTF personnel at various levels were aware of this. MCC personnel were aware of the requirement to have AWACS coverage for flights outside the security zone and complied with that requirement. However, the F-15 pilots involved in the accident, relying on the written guidance in the ACO, believed that no OPC aircraft, fixed or rotary wing, were allowed to enter the TAOR prior to a fighter sweep [5].
At the same time, the Black Hawks also thought they were operating correctly. The Army Commander at Zakhu had called the Commander of Operations, Plans, and Policy for OPC the night before the shootdown and asked to be able to fly the mission without AWACS coverage. He was told that they must have AWACS coverage. From the view of the Black Hawk pilots (who had reported in to the AWACS during the flight and provided their flight plan and destinations) they were complying and were under AWACS control.
2. Helicopters were not required to file detailed flight plans and follow them. Effective procedures were not established for communicating last minute changes or updates to the Army flight plans that had been filed.
3. F-15 pilots were not told to use non-HQ mode for helicopters.
4. No procedures were specified to pass SITREP information to CFAC. Helicopter flight plans were not distributed to CFAC and the F-15 pilots, but they were given to the F-16 squadrons. Why was one squadron informed, while another one, located right across the street, was not? F-15s are designed primarily for air superiority—high altitude aerial combat missions. F-16s, on the other hand, are all-purpose fighters. Unlike F-15s, which rarely flew low-level missions, it was common for F-16s to fly low-level missions where they might encounter the low-flying Army helicopters. As a result, to avoid low-altitude midair collisions, staff officers in F-16 squadrons requested details concerning helicopter operations from the JOIC, went to pick them up from the mail pickup point on the post, and passed them on to the pilots during their daily briefings; F-15 planners did not [191].
5. Inadequate training on the ROE was provided for new rotators. Piper claims that OPC personnel did not receive consistent, comprehensive training to ensure they had a thorough understanding of the rules of engagement and that many of the aircrews new to OPC questioned the need for the less aggressive rules of engagement in what had been designated a combat zone [159]. Judging from these complaints (details can be found in [159]) and incidents involving F-15 pilots, it appears that the pilots did not fully understand the ROE purpose or need.
6. Inadequate training was provided to the F-15 pilots on visual identification.
7. Inadequate simulator and spin-up training was provided to the AWACS crews. Asynchronous evolution occurred between the changes in the training materials and the actual situation in the no-fly zone. In addition, there were no controls to ensure the required simulator sessions were provided and that all members of the crew participated.
8. Handoff procedures were never established for helicopters. In fact, no explicit or written procedures, verbal guidance, or training of any kind were provided to the AWACS crews regarding the control of helicopters within the TAOR [191]. The AWACS crews testified during the investigation that they lost contact with helicopters all the time, but there were no procedures to follow when that occurred.
9. Inadequate procedures were specified and enforced for how the shadow crew would instruct the new crews.
10. The rules and procedures established for the operation did not provide adequate control over unsafe F-15 pilot behavior, adequate enforcement of discipline, or adequate handling of safety violations. The CFAC Assistant Director of Operations told the GAO investigators that there was very little F-15 oversight in OPC at the time of the shootdown. There had been so many flight discipline incidents leading to close calls that a group safety meeting had been held a week before the shootdown to discuss them. The flight discipline and safety issues included midair close calls, unsafe incidents when refueling, and unsafe takeoffs. The fixes (including the meeting) obviously were not effective. But the fact that there were a lot of close calls indicates serious safety problems existed and were not handled adequately.
The CFAC Assistant Director of Operations also told the GAO that contentious issues involving F-15 actions had become common topics of discussion at Detachment Commander meetings. No F-15 pilots were on the CTF staff to communicate with the F-15 group about these problems. The OPC Commander testified that there was no tolerance for mistakes or unprofessional flying at OPC and that he had regularly sent people home for violation of the rules—the majority of those he sent home were F-15 pilots, suggesting that there were serious problems in discipline and attitude among this group [159].
11. The Army pilots were given the wrong information about the IFF codes and radio frequencies to use in the TAOR. As described above, this mismatch resulted from asynchronous evolution and lack of linkup (consistency) between process controls, that is, the two different ATOs. It provides yet another example of the danger involved in distributed decision making (again see figure 2.2).
Reasons for the Flawed Control.
Ineffective Control Algorithms: Almost all of the control flaws at this level relate to the existence and use of ineffective control algorithms. Equipment and procedures were not coordinated between the Air Force and the Army to make sure that communication channels were effective and that asynchronous evolution had not occurred. The last CTF staff member who appears to have actively coordinated rotary-wing flying activities with the CFAC organization departed in January 1994. No representative of the MCC was specifically assigned to the CFAC for coordination purposes. Since December 1993, no MCC helicopter detachment representative had attended the CFAC weekly scheduling meetings. The Army liaison officer, attached to the MCC helicopter detachment at Zakhu and assigned to Incirlik AB, was new on station (he arrived in April 1994) and was not fully aware of the relationship of the MCC to the OPC mission [5].

Performance was not monitored to ensure that safety-critical activities were carried out correctly, that local adaptations had not moved operations beyond safe limits, and that information was being effectively transmitted and procedures followed. Effective controls were not established to prevent unsafe adaptations.

The feedback that was provided about the problems at the lower levels was ignored. For example, the Piper account of the accident includes a reference to helicopter pilots’ testimony that six months before the shootdown, in October 1993, they had complained that the fighter aircraft were using their radar to lock onto the Black Hawks an unacceptable number of times. The Army helicopter pilots had argued there was an urgent need for the Black Hawk pilots to be able to communicate with the fixed-wing aircraft, but nothing was changed until after the accident, when new radios were installed in the Black Hawks.

Inaccurate Mental Models: The commander of the Combined Task Force thought that the appropriate control and coordination was occurring. This incorrect mental model was supported by the feedback he received flying as a regular passenger on board the Army helicopter flights, where it was his perception that the AWACS was monitoring their flight effectively. The Army helicopter pilots were using the Delta Point system to report their location and flight plans, and there was no indication from the AWACS that the messages were being ignored. The CTF Commander testified that he believed the Delta Point system was standard on all AWACS missions. When asked at the court-martial of the AWACS senior director whether the AWACS crew were tracking Army helicopters, the OPC Commander replied:

Well, my experience from flying dozens of times on Eagle Flight, which that—for some eleven hundred and nine days prior to this event, that was—that was normal procedures for them to flight follow. So, I don’t know that they had something written about it, but I know that it seemed very obvious and clear to me as a passenger on Eagle Flight numerous times that that was occurring. [159]

The commander was also an active F-16 pilot who attended the F-16 briefings. At these briefings he observed that Black Hawk times were part of the daily ATOs
received by the F-16 pilots and assumed that all squadrons were receiving the same information. However, as noted, the head of the squadron with which the commander flew had gone out of his way to procure the Black Hawk flight information, while the F-15 squadron leader had not.

Many of those involved at this level were also under the impression that the ATOs provided to the F-15 pilots and to the Black Hawk pilots were consistent, that required information had been distributed to everyone, that official procedures were understood and being followed, and so on.
Coordination among Multiple Controllers: There were clearly problems with overlapping and boundary areas of control between the Army and the Air Force. Coordination problems between the services are legendary and were not handled adequately here. For example, two different versions of the ATO were provided to the Air Force and the Army pilots. The Air Force F-15s and the Army helicopters had separate control structures, with a common control point fairly high above the physical process. The problems were complicated by the differing importance of flexibility in flight plans between the two services. One symptom of the problem was that there was no requirement for helicopters to file detailed flight plans and follow them and no procedures established to deal with last minute changes. These deficiencies were also related to the shared control of helicopters by MCC and CFAC and complicated by the physical separation of the two headquarters.
During the accident investigation, a question was raised about whether the Combined Task Force Chief of Staff was responsible for the breakdown in staff communication. After reviewing the evidence, the hearing officer recommended that no adverse action be taken against the Chief of Staff because he (1) had focused his attention according to the CTF Commander’s direction, (2) had neither specific direction nor specific reason to inquire into the transmission of information between his Director of Operations for Plans and Policy and the CFAC, (3) had been the most recent arrival and the only senior Army member of a predominantly Air Force staff and therefore generally unfamiliar with air operations, and (4) had relied on experienced colonels under whom the deficiencies had occurred [200]. This conclusion was obviously influenced by the goal of trying to establish blame. Ignoring the blame aspects, the conclusion gives the impression that nobody was in charge and everyone thought someone else was.
According to the official accident report, the contents of the ACO largely reflected the guidance given in the operations plan dated September 7, 1991. But that was the plan provided before the mission had changed. The accident report concludes that key CTF personnel at the time of the accident were either unaware of the existence of this particular plan or considered it too outdated to be applicable. The accident report states, “Most key personnel within the CFAC and CTF staff did not consider coordination of MCC helicopter activities to be part of their respective CFAC/CTF responsibilities” [5].

Because of the breakdown of clear guidance from the Combined Task Force staff to its component organizations (CFAC and MCC), they did not have a clear understanding of their respective responsibilities. Consequently, MCC helicopter activities were not fully integrated with other OPC air operations in the TAOR.
section 5.4.
Conclusions from the Friendly Fire Example.
When looking only at the proximate events and the behavior of the immediate participants in the accidental shootdown, the reasons for this accident appear to be gross mistakes by the technical system operators (the pilots and AWACS crew). In fact, a special Air Force task force composed of more than 120 people in six commands concluded that two breakdowns in individual performance contributed to the shootdown: (1) the AWACS mission crew did not provide the F-15 pilots an accurate picture of the situation and (2) the F-15 pilots misidentified the target. From the twenty-one-volume accident report produced by the Accident Investigation Board, Secretary of Defense William Perry summarized the “errors, omissions, and failures” in the “chain of events” leading to the loss as:

1. The F-15 pilots misidentified the helicopters as Iraqi Hinds.
2. The AWACS crew failed to intervene.
3. The helicopters and their operations were not integrated into the Task Force running the no-fly zone operations.
4. The Identity Friend or Foe (IFF) systems failed.

According to Snook, the military community has generally accepted these four “causes” as the explanation for the shootdown.
While there certainly were mistakes made at the pilot and AWACS levels, the use of the STAMP analysis provides a much more complete explanation of the role of the environment and other factors that influenced their behavior, including: inconsistent, missing, or inaccurate information; incompatible technology; inadequate coordination; overlapping areas of control and confusion about who was responsible for what; a migration toward more efficient but less safe operational procedures over time without any controls and checks on the potential adaptations; inadequate training; and in general a control structure that did not enforce the safety constraints.

Boiling down this very complex accident to four “causes” and assigning blame in this way inhibits learning from the events. The more complete STAMP analysis was possible only because individuals outside the military, some of whom were relatives of the victims, did not accept the simple analysis provided in the accident report and did their own uncovering of the facts.
STAMP views an accident as a dynamic process. In this case, Army and Air Force operations adapted and diverged without communication and coordination. OPC had operated incident-free for over three years at the time of the shootdown. During that time, local adaptations to compensate for inadequate control from above had managed to mask the ongoing problems until a situation occurred where local adaptations did not work. A lack of awareness at the highest levels of command of the severity of the coordination, communication, and other problems is a key factor in this accident.
Nearly all the types of causal factors identified in section 4.5 can be found in this accident. This fact is not an anomaly: Most accidents involve a large number of these factors. Concentrating on an event chain focuses attention on the proximate events associated with the accident and thus on the principal local actors, in this case, the pilots and the AWACS personnel. Treating an accident as a control problem using STAMP clearly identifies other organizational factors and actors and the role they played. Most important, without this broader view of the accident, only the symptoms of the organizational problems may be identified and eliminated without significantly reducing risk of a future accident caused by the same systemic factors but involving different symptoms at the lower technical and operational levels of the control structure.
More information on how to build multiple views of an accident using STAMP in order to aid understanding can be found in chapter 11. More examples of STAMP accident analyses can be found in the appendixes.
2157
chapter05.txt
Normal file
File diff suppressed because it is too large
349
chapter06.raw
Normal file
@ -0,0 +1,349 @@
part 3. USING STAMP.

STAMP provides a new theoretical foundation for system safety on which new, more powerful techniques and tools for system safety can be constructed. Part III presents some practical methods for engineering safer systems. All the techniques described in part III have been used successfully on real systems. The surprise to those trying them has been how well they work on enormously complex systems and how economical they are to use. Improvements and even more applications of the theory to practice will undoubtedly be created in the future.
chapter 6.
Engineering and Operating Safer Systems Using STAMP.
Part III of this book is for those who want to build safer systems without incurring enormous and perhaps impractical financial, time, and performance costs. The belief that building and operating safer systems requires such penalties is widespread and arises from the way safety engineering is usually done today. It need not be the case. The use of top-down system safety engineering and safety-guided design based on STAMP can not only enhance the safety of these systems but also potentially reduce the costs associated with engineering for safety. This chapter provides an overview, while the chapters following it provide details about how to implement this cost-effective safety process.
section 6.1.
Why Are Safety Efforts Sometimes Not Cost-Effective?
While there are certainly some very effective safety engineering programs, too many expend a large amount of resources with little return on the investment in terms of improved safety. To fix a problem, we first need to understand it. Why are safety efforts sometimes not cost-effective? There are five general answers to this question:

1. Safety efforts may be superficial, isolated, or misdirected.
2. Safety activities often start too late.
3. The techniques used are not appropriate for the systems we are building today and for new technology.
4. Efforts may be narrowly focused on the technical components.
5. Systems are usually assumed to be static throughout their lifetime.
Superficial, isolated, or misdirected safety engineering activities: Often, safety engineering consists of performing a lot of very costly and tedious activities of limited usefulness in improving safety in the final system design. Childs calls this “cosmetic system safety” [37]. Detailed hazard logs are created and analyses performed, but these have limited impact on the actual system design. Numbers are associated with unquantifiable properties. These numbers always seem to support whatever numerical requirement is the goal, and all involved feel as if they have done their jobs. The safety analyses provide the answer the customer or designer wants—that the system is safe—and everyone is happy. Haddon-Cave, in the 2009 Nimrod MR2 accident report, called such efforts compliance only exercises [78]. The results impact certification of the system or acceptance by management, but despite all the activity and large amounts of money spent, the safety of the system has been unaffected.
A variant of this problem is that safety activities may be isolated from the engineers and developers building the system. Too often, safety professionals are separated from engineering design and placed within a mission assurance organization. Safety cannot be assured without its already being part of the design; systems must be constructed to be safe from the beginning. Separating safety engineering from design engineering is almost guaranteed to make the effort and resources expended a poor investment. Safety engineering is effective when it participates in and provides input to the design process, not when it focuses on making arguments about the artifacts created after the major safety-related decisions have been made.
Sometimes the major focus of the safety engineering efforts is on creating a safety case that proves the completed design is safe, often by showing that a particular process was followed during development. Simply following a process does not mean that the process was effective, which is the basic limitation of many process assurance activities. In other cases the arguments go beyond the process, but they start from the assumption that the system is safe and then focus on showing the conclusion is true. Most of the effort is spent in seeking evidence that shows the system is safe while not looking for evidence that the system is not safe. The basic mindset is wrong, so the conclusions are biased.
One of the reasons System Safety has been so successful is that it takes the opposite approach: an attempt is made to show that the system is unsafe and to identify hazardous scenarios. By using this alternative perspective, paths to hazards are often identified that were missed by the engineers, who tend to focus on what they want to happen, not what they do not want to happen.
If safety-guided design, as defined in part III of this book, is used, the “safety case” is created along with the design. Developing the certification argument becomes trivial and consists primarily of simply gathering the documentation that has been created during the development process.
Safety efforts start too late: Unlike the examples of ineffective safety activities above, the safety efforts may involve potentially useful activities, but they may start too late. Frola and Miller claim that 70–80 percent of the most critical decisions related to the safety of the completed system are made during early concept development [70]. Unless the safety engineering effort impacts these decisions, it is unlikely to have much effect on safety. Too often, safety engineers are busy doing safety analyses, while the system engineers are in parallel making critical decisions about system design and concepts of operation that are not based on that hazard analysis. By the time the system engineers get the information generated by the safety engineers, it is too late to have a significant impact on design decisions.
Of course, engineers normally do try to consider safety early, but the information commonly available is only whether a particular function is safety-critical or not. They are told that the function they are designing can contribute to an accident, with perhaps some letter or numerical “score” of how critical it is, but not much else. Armed only with this very limited information, they have no choice but to focus safety design efforts on increasing the component’s reliability by adding redundancy or safety margins. These features are often added without careful analysis of whether they are needed or will be effective for the specific hazards related to that system function. The design then becomes expensive to build and maintain without necessarily having the maximum possible (or sometimes any) impact on eliminating or reducing hazards. As argued earlier, redundancy and overdesign, such as building in safety margins, are effective primarily for purely electromechanical components and component failure accidents. They do not apply to software and miss component interaction accidents entirely. In some cases, such design techniques can even contribute to component interaction accidents when they add to the complexity of the design.
Most of our current safety engineering techniques start from detailed designs. So even if they are conscientiously applied, they are useful only in evaluating the safety of a completed design, not in guiding the decisions made early in the design creation process. One of the results of evaluating designs after they are created is that engineers are confronted with important safety concerns only after it is too late or too expensive to make significant changes. If and when the system and component design engineers get the results of the safety activities, often in the form of a critique of the design late in the development process, the safety concerns are frequently ignored or argued away because changing the design at that time is too costly. Design reviews then turn into contentious exercises where one side argues that the system has serious safety limitations while the other side argues that those limitations do not exist, they are not serious, or the safety analysis is wrong.
The problem is not a lack of concern by designers; it’s simply that safety concerns about their design are raised at a time when major design changes are not possible—the design engineers have no other option than to defend the design they have. If they lose that argument, then they must try to patch the current design; starting over with a safer design is, in almost all cases, impractical. If the designers had the information necessary to factor safety into their early decision making, then the process of creating safer designs need cost no more and, in fact, will cost less due to two factors: (1) reduced rework after the decisions made are found to be flawed or to provide inadequate safety and (2) less unnecessary overdesign and unneeded protection.
The key to having a cost-effective safety effort is to embed it into a system engineering process starting from early concept development and then to design safety into the system as the design decisions are made. Costs are much less when safety is built into the system design from the beginning rather than added on or retrofitted later.
The techniques used are not appropriate for today’s systems and new technology: The assumptions of the major safety engineering techniques currently used, almost all of which stem from decades past, do not match the assumptions underlying the technology and complexity of the systems being built today or the new emerging causes of accidents: They do not apply to human or software errors or flawed management decision making, and they certainly do not apply to weaknesses in the organizational structure or social infrastructure systems. These contributors to accidents do not “fail” in the same way assumed by the current safety analysis tools. But with no other tools to use, safety engineers attempt to force square pegs into round holes, hoping this will be sufficient. As a result, nothing much is accomplished beyond expending time, money, and other resources. It’s time we face up to the fact that new safety engineering techniques are needed to handle those aspects of systems that go beyond the analog hardware components and the relatively simple designs of the past for which the current techniques were invented. Chapter 8 describes a new hazard analysis technique based on STAMP, called STPA, but others are possible. The important thing is to confront these problems head on and not ignore them and waste our time misapplying or futilely trying to extend techniques that do not apply to today’s systems.
The safety efforts are focused on the technical components of the system: Many safety engineering (and system engineering, for that matter) efforts focus on the technical system details. Little effort is made to consider the social, organizational, and human components of the system in the design process. Assumptions are made that operators will be trained to do the right things and that they will adapt to whatever design they are given. Sophisticated human factors and system analysis input is lacking, and when accidents inevitably result, they are blamed on the operators for not behaving the way the designers thought they would. To give just one example (although most accident reports contain such examples), one of the four causes, all of which cited pilot error, identified in the loss of the American Airlines B757 near Cali, Colombia (see chapter 2), was “Failure of the flight crew to revert to basic radio navigation when the FMS-assisted navigation became confusing and demanded an excessive workload in a critical phase of the flight.” A more useful alternative statement of the cause might have been “An FMS system that confused the operators and demanded an excessive workload in a critical phase of flight.”
Virtually all systems contain humans, but engineers are often not taught much about human factors and draw convenient boundaries around the technical components, focusing their attention inside these artificial boundaries. Human factors experts have complained about the resulting technology-centered automation [208], where the designers focus on technical issues and not on supporting operator tasks. The result is what has been called “clumsy” automation that increases the chance of human error [183, 22, 208]. One of the new assumptions for safety in chapter 2 is that operator “error” is a product of the environment in which it occurs.
A variant of the problem is common in systems using information technology. Many medical information systems, for example, have not been as successful as they might have been in increasing safety and have even led to new types of hazards and losses [104, 140]. Often, little effort is invested during development in considering the usability of the system by medical professionals or of the impact, not always positive, that the information system design will have on workflow and on the practice of medicine.
Automation is commonly assumed to be safer than manual systems because the hazards associated with the manual systems are eliminated. Inadequate consideration is given to whether new, and maybe even worse, hazards are introduced by the automated system and how to prevent or minimize these new hazards. The aviation industry has, for the most part, learned this lesson for cockpit and flight control design, where eliminating errors of commission simply created new errors of omission [181, 182] (see chapter 9), but most other industries are far behind in this respect.
Like other safety-related system properties that are ignored until too late, operators and human-factors experts often are not brought into the early design process or they work in isolation from the designers until changes are extremely expensive to make. Sometimes, human factors design is not considered until after an accident, and occasionally not even then, almost guaranteeing that more accidents will occur. To provide cost-effective safety engineering, the system and safety analysis and design process needs to consider the humans in systems—including those that are not directly controlling the physical processes—not separately or after the fact but starting at concept development and continuing throughout the life cycle of the system.
Systems are assumed to be static throughout their lifetimes: It is rare for engineers to consider how the system will evolve and change over time. While designing for maintainability may be considered, unintended changes are often ignored. Change is a constant for all systems: physical equipment ages and degrades over its lifetime and may not be maintained properly; human behavior and priorities usually change over time; organizations change and evolve, which means the safety control structure itself will evolve. Change may also occur in the physical and social environment within which the system operates and with which it interacts. To be effective, controls need to be designed that will reduce the risk associated with all these types of changes. Not only are accidents expensive, but once again planning for system change can reduce the costs associated with the change itself. In addition, much of the effort in operations needs to be focused on managing and reacting to change.
section 6.2.
The Role of System Engineering in Safety.
As the systems we build and operate increase in size and complexity, the use of sophisticated system engineering approaches becomes more critical. Important system-level (emergent) properties, such as safety, must be built into the design of these systems; they cannot be effectively added on or simply measured afterward. While system engineering was developed originally for technical systems, the approach is just as important and applicable to social systems or the social components of systems that are usually not thought of as “engineered.” All systems are engineered in the sense that they are designed to achieve specific goals, namely to satisfy requirements and constraints. So ensuring hospital safety or pharmaceutical safety, for example, while not normally thought of as engineering problems, falls within the broad definition of engineering. The goal of the system engineering process is to create a system that satisfies the mission while maintaining the constraints on how the mission is achieved.
Engineering is a way of organizing that design process to achieve the most cost-effective results. Social systems may not have been “designed” in the sense of a purposeful design process but may have evolved over time. Any effort to change such systems in order to improve them, however, can be thought of as a redesign or reengineering process and can again benefit from a system engineering approach. When using STAMP as the underlying causality model, engineering or reengineering safer systems means designing (or redesigning) the safety-control structure and the controls designed into it to ensure the system operates safely, that is, without unacceptable losses. What is being controlled—chemical manufacturing processes, spacecraft or aircraft, public health, safety of the food supply, corporate fraud, risks in the financial system—is irrelevant in terms of the general process, although significant differences will exist in the types of controls applicable and the design of those controls. The process, however, is very similar to a regular system engineering process.

The problem is that most engineering and even many system engineering techniques were developed under conditions and assumptions that do not hold for complex social systems, as discussed in part I. But STAMP and new system-theoretic approaches to safety can point the way forward for both complex technical and social processes. The general engineering and reengineering process described in part III applies to all systems.
section 6.3.
A System Safety Engineering Process.
In STAMP, accidents and losses result from not enforcing safety constraints on behavior. Not only must the original system design incorporate appropriate constraints to ensure safe operations, but the safety constraints must continue to be enforced as changes and adaptations to the system design occur over time. This goal forms the basis for safe management, development, and operations.

There is no agreed upon best system engineering process and probably cannot be one—the process needs to match the specific problem and environment in which it is being used. What is described in part III of this book is how to integrate system safety into any reasonable system engineering process. Figure 6.1 shows the three major components of a cost-effective system safety process: management, development, and operations.
section 6.3.1. Management.

Safety starts with management leadership and commitment. Without these, the efforts of others in the organization are almost doomed to failure. Leadership creates culture, which drives behavior.

Besides setting the culture through their own behavior, managers need to establish the organizational safety policy and create a safety control structure with appropriate responsibilities, accountability and authority, safety controls, and feedback channels. Management must also establish a safety management plan and ensure that a safety information system and continual learning and improvement processes are in place and effective.

Chapter 13 discusses management’s role and responsibilities in safety.
section 6.3.2. Engineering Development.

The key to having a cost-effective safety effort is to embed it into a system engineering process from the very beginning and to design safety into the system as the design decisions are made. All viewpoints and system components must be included in the process and information used and documented in a way that is accessible, understandable, and helpful.

System engineering starts with determining the goals of the system. Potential hazards to be avoided are then identified. From the goals and system hazards, a set of system functional and safety requirements and constraints are identified that set the foundation for design, operations, and management. Chapter 7 describes how to establish these fundamentals.
To start safety engineering early enough to be cost-effective, safety must be considered from the early concept formation stages of development and continue throughout the life cycle of the system. Design decisions should be guided by safety considerations while at the same time taking other system requirements and constraints into account and resolving conflicts. The hazard analysis techniques used must not require a completed design and must include all the factors involved in accidents. Chapter 8 describes a new hazard analysis technique, based on the STAMP model of causation, that provides the information necessary to design safety into the system, and chapter 9 shows how to use it in a safety-guided design process. Chapter 9 also presents general principles for safe design including how to design systems and system components used by humans that do not contribute to human error.
Documentation is critical not only for communication in the design and development process but also because of inevitable changes over time. That documentation must include the rationale for the design decisions and traceability from high-level requirements and constraints down to detailed design features. After the original system development is finished, the information necessary to operate and maintain it safely must be passed in a usable form to operators and maintainers. Chapter 10 describes how to integrate safety considerations into specifications and the general system engineering process.
Engineers have often concentrated more on the technological aspects of system development while assuming that humans in the system will either adapt to whatever is given to them or will be trained to do the “right thing.” When an accident occurs, it is blamed on the operator. This approach to safety, as argued above, is one of the reasons safety engineering is not as effective as it could be. The system design process needs to start by considering the human controller and continue that perspective throughout development. The best way to reach that goal is to involve operators in the design decisions and safety analyses. Operators are sometimes left out of the conceptual design stages and only brought in later in development. To design safer systems, operators and maintainers must be included in the design process starting from the conceptual development stage, and considerations of human error and preventing it should be at the forefront of the design effort.
Many companies, particularly in aerospace, use integrated product teams that include, among others, design engineers, safety engineers, human factors experts, potential users of the system (operators), and maintainers. But the development process used may not necessarily take maximum advantage of this potential for collaboration. The process outlined in part III tries to do that.
section 6.3.3. Operations.

Once the system is built, it must be operated safely. System engineering creates the basic information needed to do this in the form of the safety constraints and operating assumptions upon which the safety of the design was based. These constraints and assumptions must be passed to operations in a form that they can understand and use.

Because changes in the physical components, human behavior, and the organizational safety control structure are almost guaranteed to occur over the life of the system, operations must manage change in order to ensure that the safety constraints are not violated. The requirements for safe operations are discussed in chapter 12.

It’s now time to look at the changes in system engineering, operations, and management, based on STAMP, that can assist in engineering a safer world.
312
chapter06.txt
Normal file
@ -0,0 +1,312 @@
part 3. USING STAMP.
STAMP provides a new theoretical foundation for system safety on which new, more
powerful techniques and tools for system safety can be constructed. Part 3 presents
some practical methods for engineering safer systems. All the techniques described
in part 3 have been used successfully on real systems. The surprise to those trying
them has been how well they work on enormously complex systems and how economical they are to use. Improvements and even more applications of the theory to
practice will undoubtedly be created in the future.
chapter 6.
Engineering and Operating Safer Systems Using
STAMP.
Part 3 of this book is for those who want to build safer systems without incurring
enormous and perhaps impractical financial, time, and performance costs. The belief
that building and operating safer systems requires such penalties is widespread and
arises from the way safety engineering is usually done today. It need not be the case.
The use of top-down system safety engineering and safety-guided design based on
STAMP can not only enhance the safety of these systems but also potentially reduce
the costs associated with engineering for safety. This chapter provides an overview,
while the chapters following it provide details about how to implement this costeffective safety process.
section 6.1.
Why Are Safety Efforts Sometimes Not Cost-Effective?
While there are certainly some very effective safety engineering programs, too
many expend a large amount of resources with little return on the investment in
terms of improved safety. To fix a problem, we first need to understand it. Why are
safety efforts sometimes not cost-effective? There are five general answers to this
question.
1. Safety efforts may be superficial, isolated, or misdirected.
2. Safety activities often start too late.
3. The techniques used are not appropriate for the systems we are building today
and for new technology.
4. Efforts may be narrowly focused on the technical components.
5. Systems are usually assumed to be static throughout their lifetime.
Superficial, isolated, or misdirected safety engineering activities. Often, safety
engineering consists of performing a lot of very costly and tedious activities of
limited usefulness in improving safety in the final system design. Childs calls this
“cosmetic system safety” . Detailed hazard logs are created and analyses
performed, but these have limited impact on the actual system design. Numbers are
associated with unquantifiable properties. These numbers always seem to support
whatever numerical requirement is the goal, and all involved feel as if they have
done their jobs. The safety analyses provide the answer the customer or designer
wants.that the system is safe.and everyone is happy. Haddon-Cave, in the 2 thousand 9
Nimrod MR2 accident report, called such efforts compliance only exercises . The
results impact certification of the system or acceptance by management, but despite
all the activity and large amounts of money spent, the safety of the system has been
unaffected.
A variant of this problem is that safety activities may be isolated from the engineers and developers building the system. Too often, safety professionals are separated from engineering design and placed within a mission assurance organization.
Safety cannot be assured without its already being part of the design; systems must
be constructed to be safe from the beginning. Separating safety engineering from
design engineering is almost guaranteed to make the effort and resources expended
a poor investment. Safety engineering is effective when it participates in and provides input to the design process, not when it focuses on making arguments about
the artifacts created after the major safety-related decisions have been made.
Sometimes the major focus of the safety engineering efforts is on creating a safety
case that proves the completed design is safe, often by showing that a particular
process was followed during development. Simply following a process does not
mean that the process was effective, which is the basic limitation of many process
assurance activities. In other cases the arguments go beyond the process, but they
start from the assumption that the system is safe and then focus on showing the
conclusion is true. Most of the effort is spent in seeking evidence that shows the
system is safe while not looking for evidence that the system is not safe. The basic
mindset is wrong, so the conclusions are biased.
One of the reasons System Safety has been so successful is that it takes the opposite approach. an attempt is made to show that the system is unsafe and to identify
hazardous scenarios. By using this alternative perspective, paths to hazards are often
identified that were missed by the engineers, who tend to focus on what they want
to happen, not what they do not want to happen.
If safety-guided design, as defined in part 3 of this book, is used, the “safety
case” is created along with the design. Developing the certification argument
becomes trivial and consists primarily of simply gathering the documentation that
has been created during the development process.
Safety efforts start too late. Unlike the examples of ineffective safety activities
above, the safety efforts may involve potentially useful activities, but they may start
too late. Frola and Miller claim that 70–80 percent of the most critical decisions
related to the safety of the completed system are made during early concept development . Unless the safety engineering effort impacts these decisions, it is
unlikely to have much effect on safety. Too often, safety engineers are busy doing
safety analyses, while the system engineers are in parallel making critical decisions
about system design and concepts of operation that are not based on that hazard
analysis. By the time the system engineers get the information generated by the
safety engineers, it is too late to have a significant impact on design decisions.
Of course, engineers normally do try to consider safety early, but the information
commonly available is only whether a particular function is safety-critical or not.
They are told that the function they are designing can contribute to an accident,
with perhaps some letter or numerical “score” of how critical it is, but not much else.
Armed only with this very limited information, they have no choice but to focus
safety design efforts on increasing the component’s reliability by adding redundancy
or safety margins. These features are often added without careful analysis of whether
they are needed or will be effective for the specific hazards related to that system
function. The design then becomes expensive to build and maintain without necessarily having the maximum possible .(or sometimes any). impact on eliminating
or reducing hazards. As argued earlier, redundancy and overdesign, such as building
in safety margins, are effective primarily for purely electromechanical components
and component failure accidents. They do not apply to software and miss component
interaction accidents entirely. In some cases, such design techniques can even
contribute to component interaction accidents when they add to the complexity of
the design.
Most of our current safety engineering techniques start from detailed designs. So
even if they are conscientiously applied, they are useful only in evaluating the safety
of a completed design, not in guiding the decisions made early in the design creation
process. One of the results of evaluating designs after they are created is that engineers are confronted with important safety concerns only after it is too late or too
expensive to make significant changes. If and when the system and component
design engineers get the results of the safety activities, often in the form of a critique
of the design late in the development process, the safety concerns are frequently
ignored or argued away because changing the design at that time is too costly.
Design reviews then turn into contentious exercises where one side argues that the
system has serious safety limitations while the other side argues that those limitations do not exist, they are not serious, or the safety analysis is wrong.
The problem is not a lack of concern by designers; it’s simply that safety concerns
about their design are raised at a time when major design changes are not possible.
the design engineers have no other option than to defend the design they have.
If they lose that argument, then they must try to patch the current design; starting
over with a safer design is, in almost all cases, impractical. If the designers had the
information necessary to factor safety into their early decision making, then the
process of creating safer designs need cost no more and, in fact, will cost less due
to two factors. .(1). reduced rework after the decisions made are found to be flawed
or to provide inadequate safety and .(2). less unnecessary overdesign and unneeded
protection.
The key to having a cost-effective safety effort is to embed it into a system engineering process starting from early concept development and then to design safety into the system as the design decisions are made. Costs are much less when safety is built into the system design from the beginning rather than added on or retrofitted later.
The techniques used are not appropriate for today's systems and new technology. The assumptions of the major safety engineering techniques currently used, almost all of which stem from decades past, do not match the assumptions underlying the technology and complexity of the systems being built today or the new emerging causes of accidents. They do not apply to human or software errors or flawed management decision making, and they certainly do not apply to weaknesses in the organizational structure or social infrastructure. These contributors to accidents do not "fail" in the same way assumed by the current safety analysis tools. But with no other tools to use, safety engineers attempt to force square pegs into round holes, hoping this will be sufficient. As a result, nothing much is accomplished beyond expending time, money, and other resources. It's time we face up to the fact that new safety engineering techniques are needed to handle those aspects of systems that go beyond the analog hardware components and the relatively simple designs of the past for which the current techniques were invented. Chapter 8 describes a new hazard analysis technique based on STAMP, called STPA, but others are possible. The important thing is to confront these problems head-on and not ignore them or waste our time misapplying, or futilely trying to extend, techniques that do not apply to today's systems.
The safety efforts are focused on the technical components of the system. Many safety engineering (and system engineering, for that matter) efforts focus on the technical system details. Little effort is made to consider the social, organizational, and human components of the system in the design process. Assumptions are made that operators will be trained to do the right things and that they will adapt to whatever design they are given. Sophisticated human factors and system analysis input is lacking, and when accidents inevitably result, they are blamed on the operators for not behaving the way the designers thought they would. To give just one example (although most accident reports contain such examples), one of the four causes, all of which cited pilot error, identified in the loss of the American Airlines B757 near Cali, Colombia (see chapter 2), was "Failure of the flight crew to revert to basic radio navigation when the FMS-assisted navigation became confusing and demanded an excessive workload in a critical phase of the flight." A more useful alternative statement of the cause might have been "An FMS system that confused the operators and demanded an excessive workload in a critical phase of flight."
Virtually all systems contain humans, but engineers are often not taught much about human factors and draw convenient boundaries around the technical components, focusing their attention inside these artificial boundaries. Human factors experts have complained about the resulting technology-centered automation, where the designers focus on technical issues and not on supporting operator tasks. The result is what has been called "clumsy" automation that increases the chance of human error. One of the new assumptions for safety in chapter 2 is that operator "error" is a product of the environment in which it occurs.
A variant of the problem is common in systems using information technology. Many medical information systems, for example, have not been as successful as they might have been in increasing safety and have even led to new types of hazards and losses. Often, little effort is invested during development in considering the usability of the system by medical professionals or the impact, not always positive, that the information system design will have on workflow and on the practice of medicine.
Automation is commonly assumed to be safer than manual systems because the hazards associated with the manual systems are eliminated. Inadequate consideration is given to whether new, and maybe even worse, hazards are introduced by the automated system and how to prevent or minimize these new hazards. The aviation industry has, for the most part, learned this lesson for cockpit and flight control design, where eliminating errors of commission simply created new errors of omission (see chapter 9), but most other industries are far behind in this respect.
Like other safety-related system properties that are ignored until too late, operators and human factors experts often are not brought into the early design process, or they work in isolation from the designers until changes are extremely expensive to make. Sometimes, human factors design is not considered until after an accident, and occasionally not even then, almost guaranteeing that more accidents will occur. To provide cost-effective safety engineering, the system and safety analysis and design process needs to consider the humans in systems, including those who are not directly controlling the physical processes, not separately or after the fact but starting at concept development and continuing throughout the life cycle of the system.
Systems are assumed to be static throughout their lifetimes. It is rare for engineers to consider how the system will evolve and change over time. While designing for maintainability may be considered, unintended changes are often ignored. Change is a constant for all systems: physical equipment ages and degrades over its lifetime and may not be maintained properly; human behavior and priorities usually change over time; organizations change and evolve, which means the safety control structure itself will evolve. Change may also occur in the physical and social environment within which the system operates and with which it interacts. To be effective, controls need to be designed that will reduce the risk associated with all these types of changes. Not only are accidents expensive, but once again planning for system change can reduce the costs associated with the change itself. In addition, much of the effort in operations needs to be focused on managing and reacting to change.
section 6.2. The Role of System Engineering in Safety.
As the systems we build and operate increase in size and complexity, the use of sophisticated system engineering approaches becomes more critical. Important system-level (emergent) properties, such as safety, must be built into the design of these systems; they cannot be effectively added on or simply measured afterward. While system engineering was developed originally for technical systems, the approach is just as important and applicable to social systems or the social components of systems that are usually not thought of as "engineered." All systems are engineered in the sense that they are designed to achieve specific goals, namely to satisfy requirements and constraints. So ensuring hospital safety or pharmaceutical safety, for example, while not normally thought of as engineering problems, falls within the broad definition of engineering. The goal of the system engineering process is to create a system that satisfies the mission while maintaining the constraints on how the mission is achieved.
Engineering is a way of organizing that design process to achieve the most cost-effective results. Social systems may not have been "designed" in the sense of a purposeful design process but may have evolved over time. Any effort to change such systems in order to improve them, however, can be thought of as a redesign or reengineering process and can again benefit from a system engineering approach.
When using STAMP as the underlying causality model, engineering or reengineering safer systems means designing (or redesigning) the safety control structure and the controls designed into it to ensure the system operates safely, that is, without unacceptable losses. What is being controlled (chemical manufacturing processes, spacecraft or aircraft, public health, safety of the food supply, corporate fraud, risks in the financial system) is irrelevant in terms of the general process, although significant differences will exist in the types of controls applicable and the design of those controls. The process, however, is very similar to a regular system engineering process.
The problem is that most engineering and even many system engineering techniques were developed under conditions and assumptions that do not hold for complex social systems, as discussed in part 1. But STAMP and new system-theoretic approaches to safety can point the way forward for both complex technical and social processes. The general engineering and reengineering process described in part 3 applies to all systems.
section 6.3. A System Safety Engineering Process.
In STAMP, accidents and losses result from not enforcing safety constraints on behavior. Not only must the original system design incorporate appropriate constraints to ensure safe operations, but the safety constraints must continue to be enforced as changes and adaptations to the system design occur over time. This goal forms the basis for safe management, development, and operations.
There is no agreed-upon best system engineering process, and there probably cannot be one; the process needs to match the specific problem and environment in which it is being used. What is described in part 3 of this book is how to integrate system safety into any reasonable system engineering process. Figure 6.1 shows the three major components of a cost-effective system safety process: management, development, and operations.
section 6.3.1. Management.
Safety starts with management leadership and commitment. Without these, the efforts of others in the organization are almost doomed to failure. Leadership creates culture, which drives behavior.
Besides setting the culture through their own behavior, managers need to establish the organizational safety policy and create a safety control structure with appropriate responsibilities, accountability and authority, safety controls, and feedback channels. Management must also establish a safety management plan and ensure that a safety information system and continual learning and improvement processes are in place and effective.
Chapter 13 discusses management's role and responsibilities in safety.
section 6.3.2. Engineering Development.
The key to having a cost-effective safety effort is to embed it into a system engineering process from the very beginning and to design safety into the system as the design decisions are made. All viewpoints and system components must be included in the process, and information must be used and documented in a way that is accessible, understandable, and helpful.
System engineering starts with determining the goals of the system. Potential hazards to be avoided are then identified. From the goals and system hazards, a set of system functional and safety requirements and constraints is identified that sets the foundation for design, operations, and management. Chapter 7 describes how to establish these fundamentals.
To start safety engineering early enough to be cost-effective, safety must be considered from the early concept formation stages of development and continue to be considered throughout the life cycle of the system. Design decisions should be guided by safety considerations while at the same time taking other system requirements and constraints into account and resolving conflicts. The hazard analysis techniques used must not require a completed design and must include all the factors involved in accidents. Chapter 8 describes a new hazard analysis technique, based on the STAMP model of causation, that provides the information necessary to design safety into the system, and chapter 9 shows how to use it in a safety-guided design process. Chapter 9 also presents general principles for safe design, including how to design systems and system components used by humans that do not contribute to human error.
Documentation is critical not only for communication in the design and development process but also because of inevitable changes over time. That documentation must include the rationale for the design decisions and traceability from high-level requirements and constraints down to detailed design features. After the original system development is finished, the information necessary to operate and maintain it safely must be passed in a usable form to operators and maintainers. Chapter 10 describes how to integrate safety considerations into specifications and the general system engineering process.
Engineers have often concentrated more on the technological aspects of system development while assuming that humans in the system will either adapt to whatever is given to them or will be trained to do the "right thing." When an accident occurs, it is blamed on the operator. This approach to safety, as argued above, is one of the reasons safety engineering is not as effective as it could be. The system design process needs to start by considering the human controller and continue that perspective throughout development. The best way to reach that goal is to involve operators in the design decisions and safety analyses. Operators are sometimes left out of the conceptual design stages and only brought in later in development. To design safer systems, operators and maintainers must be included in the design process starting from the conceptual development stage, and considerations of human error, and how to prevent it, should be at the forefront of the design effort.
Many companies, particularly in aerospace, use integrated product teams that include, among others, design engineers, safety engineers, human factors experts, potential users of the system (operators), and maintainers. But the development process used may not necessarily take maximum advantage of this potential for collaboration. The process outlined in part 3 tries to do that.
section 6.3.3. Operations.
Once the system is built, it must be operated safely. System engineering creates the basic information needed to do this in the form of the safety constraints and operating assumptions upon which the safety of the design was based. These constraints and assumptions must be passed to operations in a form that they can understand and use.
Because changes in the physical components, human behavior, and the organizational safety control structure are almost guaranteed to occur over the life of the system, operations must manage change in order to ensure that the safety constraints are not violated. The requirements for safe operations are discussed in chapter 12.
It's now time to look at the changes in system engineering, operations, and management, based on STAMP, that can assist in engineering a safer world.
13
cleanfile
Executable file
13
cleanfile
Executable file
@ -0,0 +1,13 @@
#!/bin/bash
# Build a sed program from the tab-separated "replacements" file: each line
# holds a pattern and its replacement, and every pair becomes an expression
# of the form "s_PATTERN_REPLACEMENT_g;".

SED=$(
while IFS=$'\t' read -r -a myArray
do
  echo -ne "s_${myArray[0]}_${myArray[1]}_g;\n"
done < replacements
)

# Print the generated command so it can be inspected in the build output.
echo sed -e "$SED"

# Apply the replacements to the input file ($1), join words that were
# hyphenated across line breaks, and write the cleaned text to the output
# file ($2).
sed -e "$SED" "$1" | sed -z 's_-\n__g' > "$2"
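A minimal usage sketch (the file names here are hypothetical, used only for illustration): given a replacements entry that maps AWACS to "A Wacks", the loop above emits the sed expression s_AWACS_A Wacks_g; and an invocation such as

./cleanfile chapter.raw chapter.txt

would replace every occurrence of "AWACS" in chapter.raw with "A Wacks", apply the other substitutions, join words hyphenated across line breaks, and write the cleaned text to chapter.txt.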
91
replacements
91
replacements
@ -1,47 +1,48 @@
: .
— .
\[.+\]
-\n
19(\d\d) 19 $1
200(\d) 2 thousand $1
20(\d\d) 20 $1
\( .(
\) ).
III 3
II 2
IV 4
ASO A S O
PRA P R A
HMO H M O
MIC M I C
DC-10 D C 10
OPC O P C
TAOR T A O R
AAI A A I
ACO A C O
AFB A F B
AI A I
ATO A T O
BH B H
BSD B S D
CTF C T F
CFAC C FACK
DO D O
GAO GAOW
HQ-II H Q-2
IFF I F F
JOIC J O I C
JSOC J SOCK
JTIDS J tides
MCC M C C
MD M D
NCA N C A
NFZ N F Z
OPC O P C
ROE R O E
SD S D
SITREP SIT Rep
TACSAT Tack sat
TAOR T A O R
USCINCEUR U S C in E U R
WD W D
\[.\\+\]
( .(
) ).
HQ-II H Q-2
III 3
II 2
IV 4
AWACS A Wacks
ASO A S O
PRA P R A
HMO H M O
MIC M I C
DC-10 D C 10
OPC O P C
TAOR T A O R
AAI A A I
ACO A C O
AFB A F B
AI A I
ATO A T O
BH B H
BSD B S D
CTF C T F
CFAC C FACK
DO D O
GAO GAOW
IFF I F F
JOIC J O I C
JSOC J SOCK
JTIDS J tides
MCC M C C
MD M D
NCA N C A
NFZ N F Z
OPC O P C
ROE R O E
SD S D
SITREP SIT Rep
TACSAT Tack sat
TAOR T A O R
USCINCEUR U S C in E U R
WD W D
19\\([[:digit:]][[:digit:]]\\) 19 \\1
200\\([[:digit:]]\\) 2 thousand \1
20\\([[:digit:]][[:digit:]]\\) 20 \1
B757 B 7 57