chapter 10.
Integrating Safety into System Engineering.

Previous chapters have provided the individual pieces of the solution to engineering a safer world. This chapter demonstrates how to put these pieces together to integrate safety into a system engineering process. No one process is being proposed: Safety must be part of any system engineering process.

The glue that integrates the activities of engineering and operating complex systems is specifications and the safety information system. Communication is critical in handling any emergent property in a complex system. Our systems today are designed and built by hundreds and often thousands of engineers and then operated by thousands and even tens of thousands more people. Enforcing safety constraints on system behavior requires that the information needed for decision making is available to the right people at the right time, whether during system development, operations, maintenance, or reengineering.

This chapter starts with a discussion of the role of specifications and how systems theory can be used as the foundation for the specification of complex systems. Then an example of how to put the components together in system design and development is presented. Chapters 11 and 12 cover how to maximize learning from accidents and incidents and how to enforce safety constraints during operations. The design of safety information systems is discussed in chapter 13.
section 10.1. The Role of Specifications and the Safety Information System.

While engineers may have been able to get away with minimal specifications during development of the simpler electromechanical systems of the past, specifications are critical to the successful engineering of systems of the size and complexity we are attempting to build today. Specifications are no longer simply a means of archiving information; they need to play an active role in the system engineering process. They are a critical tool in stretching our intellectual capabilities to deal with increasing complexity.

Our specifications must reflect and support the system safety engineering process and the safe operation, evolution, and change of the system over time. Specifications should support the use of notations and techniques for reasoning about hazards and safety, designing the system to eliminate or control hazards, and validating—at each step, starting from the very beginning of system development—that the evolving system has the desired safety level. Later, specifications must support operations and change over time.

Specification languages can help (or hinder) human performance of the various problem-solving activities involved in system requirements analysis, hazard analysis, design, review, verification and validation, debugging, operational use, and maintenance and evolution (sustainment). They do this by including notations and tools that enhance our ability to: (1) reason about particular properties, (2) construct the system and the software in it to achieve them, and (3) validate—at each step, starting from the very beginning of system development—that the evolving system has the desired qualities. In addition, systems and particularly the software components are continually changing and evolving; they must be designed to be changeable, and the specifications must support evolution without compromising the confidence in the properties that were initially verified.

Documenting and tracking hazards and their resolution are basic requirements for any effective safety program. But simply having the safety engineer track them and maintain a hazard log is not enough—information must be derived from the hazards to inform the system engineering process, and that information needs to be specified and recorded in a way that has an impact on the decisions made during system design and operations. To have such an impact, the safety-related information required by the engineers needs to be integrated into the environment in which safety-related engineering decisions are made. Engineers are unlikely to be able to read through volumes of hazard analysis information and relate it easily to the specific component upon which they are working. The information the system safety engineer has generated must be presented to the system designers, implementers, maintainers, and operators in such a way that they can easily find what they need to make safer decisions.
Safety information is not only important during system design; it also needs to be presented in a form that people can learn from, apply to their daily jobs, and use throughout the life cycle of projects. Too often, preventable accidents have occurred due to changes that were made after the initial design period. Accidents are frequently the result of safe designs becoming unsafe over time, when changes in the system itself or in its environment violate the basic assumptions of the original hazard analysis. Clearly, these assumptions must be recorded and easily retrievable when changes occur. Good documentation is most important in complex systems, where nobody is able to keep all the information necessary to make safe decisions in their head.
What types of specifications are needed to support humans in system safety engineering and operations? Design decisions at each stage must be mapped into the goals and constraints they are derived to satisfy, with earlier decisions mapped or traced to later stages of the process. The result should be a seamless and gapless record of the progression from high-level requirements down to component requirements and designs or operational procedures. The rationale behind the design decisions needs to be recorded in a way that is easily retrievable by those reviewing or changing the system design. The specifications must also support the various types of formal and informal analysis used to decide between alternative designs and to verify the results of the design process. Finally, specifications must assist in the coordinated design of the component functions and the interfaces between them.

The notations used in specification languages must be easily readable and learnable. Usability is enhanced by using notations and models that are close to the mental models created by the users of the specification and the standard notations in their fields of expertise.

The structure of the specification is also important for usability. The structure will enhance or limit the ability to retrieve needed information at the appropriate times.

Finally, specifications should not limit the problem-solving strategies of the users of the specification. Not only do different people prefer different strategies for solving problems, but the most effective problem solvers have been found to change strategies frequently [167, 58]. Experts switch problem-solving strategy when they run into difficulties following a particular strategy and as new information is obtained that changes the objectives or subgoals or the mental workload needed to use a particular strategy. Tools often limit the strategies that can be used, usually implementing the favorite strategy of the tool designer, and therefore limiting the problem-solving strategies supported by the specification.

One way to implement these principles is to use intent specifications [120].
section 10.2. Intent Specifications.

Intent specifications are based on systems theory, system engineering principles, and psychological research on human problem solving and how to enhance it. The goal is to assist humans in dealing with complexity. While commercial tools exist that implement intent specifications directly, any specification languages and tools that allow implementing the properties of an intent specification can be used.

An intent specification differs from a standard specification primarily in its structure, not its content: no extra information is involved that is not commonly found in detailed specifications—the information is simply organized in a way that has been found to assist in its location and use. Most complex systems have voluminous documentation, much of it redundant or inconsistent, and it degrades quickly as changes are made over time. Sometimes important information is missing, particularly information about why something was done the way it was—the intent or design rationale. Trying to determine whether a change might have a negative impact on safety, if possible at all, is usually enormously expensive and often involves regenerating analyses and work that was already done but either not recorded or not easily located when needed. Intent specifications were designed to help with these problems: design rationale, safety analysis results, and the assumptions upon which the system design and validation are based are integrated directly into the system specification and its structure, rather than stored in separate documents, so the information is at hand when needed for decision making.

The structure of an intent specification is based on the fundamental concept of hierarchy in systems theory (see chapter 3), where complex systems are modeled in terms of a hierarchy of levels of organization, each level imposing constraints on the degree of freedom of the components at the lower level. Different description languages may be appropriate at the different levels. Figure 10.1 shows the seven levels of an intent specification.

Intent specifications are organized along three dimensions: intent abstraction, part-whole abstraction, and refinement. These dimensions constitute the problem space in which the human navigates. Part-whole abstraction (along the horizontal dimension) and refinement (within each level) allow users to change their focus of attention to more or less detailed views within each level or model. The vertical dimension specifies the level of intent at which the problem is being considered.

Each intent level contains information about the characteristics of the environment, human operators or users, the physical and functional system components, and requirements for and results of verification and validation activities for that level. The safety information is embedded in each level, instead of being maintained in a separate safety log, but linked together so that it can easily be located and reviewed.

The vertical intent dimension has seven levels. Each level represents a different model of the system from a different perspective and supports a different type of reasoning about it. Refinement and decomposition occur within each level of the specification, rather than between levels. Each level provides information not just about what and how, but why, that is, the design rationale and reasons behind the design decisions, including safety considerations.

Figure 10.2 shows an example of the information that might be contained in each level of the intent specification.
The top level (level 0) provides a project management view and insight into the relationship between the plans and the project development status through links to the other parts of the intent specification. This level might contain the project management plans, the safety plan, status information, and so on.

Level 1 is the customer view and assists system engineers and customers in agreeing on what should be built and, later, whether that has been accomplished. It includes goals, high-level requirements and constraints (both physical and operator), environmental assumptions, definitions of accidents, hazard information, and system limitations.

Level 2 is the system engineering view and helps system engineers record and reason about the system in terms of the physical principles and system-level design principles upon which the system design is based.

Level 3 specifies the system architecture and serves as an unambiguous interface between system engineers and component engineers or contractors. At level 3, the system functions defined at level 2 are decomposed, allocated to components, and specified rigorously and completely. Black-box behavioral component models may be used to specify and reason about the logical design of the system as a whole and the interactions among individual system components without being distracted by implementation details.

If the language used at level 3 is formal (rigorously defined), then it can play an important role in system validation. For example, the models can be executed in system simulation environments to identify system requirements and design errors early in development. They can also be used to automate the generation of system and component test data, various types of mathematical analyses, and so forth. It is important, however, that the black-box (that is, transfer function) models be easily reviewed by domain experts—most of the safety-related errors in specifications will be found by expert review, not by automated tools or formal proofs.

A readable but formal and executable black-box requirements specification language was developed by the author and her students while helping the FAA specify the TCAS (Traffic Alert and Collision Avoidance System) requirements [123]. Reviewers can learn to read the specifications with a few minutes of instruction about the notation. Improvements have been made over the years, and it is being used successfully on real systems. This language provides an existence proof that a readable and easily learnable but formal specification language is possible. Other languages with the same properties, of course, can also be used effectively.
The next two levels, Design Representation and Physical Representation, provide the information necessary to reason about individual component design and implementation issues. Some parts of level 4 may not be needed if at least portions of the physical design can be generated automatically from the models at level 3.

The final level, Operations, provides a view of the operational system and acts as the interface between development and operations. It assists in designing and performing system safety activities during system operations. It may contain required or suggested operational audit procedures, user manuals, training materials, maintenance requirements, error reports and change requests, historical usage information, and so on.

Each level of an intent specification supports a different type of reasoning about the system, with the highest level assisting systems engineers in their reasoning about system-level goals, constraints, priorities, and tradeoffs. The second level, System Design Principles, allows engineers to reason about the system in terms of the physical principles and laws upon which the design is based. The Architecture level enhances reasoning about the logical design of the system as a whole, the interactions between the components, and the functions computed by the components without being distracted by implementation issues. The lowest two levels provide the information necessary to reason about individual component design and implementation issues. The mappings between levels provide the relational information that allows reasoning across hierarchical levels and traceability of requirements to design.

Hyperlinks are used to provide the relational information that allows reasoning within and across levels, including the tracing from high-level requirements down to implementation and vice versa. Examples can be found in the rest of this chapter.
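To make the idea concrete, here is a minimal sketch of how hyperlinked intent-specification items might be represented and traced. The level names follow the descriptions above, but the data structure, field names, and trace function are illustrative assumptions, not part of any published intent specification format.

    # Minimal illustrative sketch of hyperlinked intent-specification items (format is an assumption).
    from dataclasses import dataclass, field

    LEVELS = ["Project Management", "Customer View", "System Design Principles",
              "System Architecture", "Design Representation",
              "Physical Representation", "Operations"]

    @dataclass
    class SpecItem:
        ident: str                    # id of a requirement, constraint, or design decision
        level: int                    # 0-6, index into LEVELS
        text: str                     # the item itself
        rationale: str = ""           # why the decision was made (intent)
        links_down: list = field(default_factory=list)   # ids of items that refine or implement it

    def trace_down(ident, spec):
        """Follow hyperlinks from a high-level item to everything that implements it."""
        result = []
        for child_id in spec[ident].links_down:
            result.append(child_id)
            result.extend(trace_down(child_id, spec))
        return result

Given a dictionary mapping identifiers to SpecItem records, following links_down recovers everything that implements a requirement, and storing the rationale with each item keeps the "why" retrievable when a change is proposed.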
The structure of an intent specification does not imply that the development must proceed from the top levels down to the bottom levels in that order, only that at the end of the development process, all levels are complete. Almost all development involves work at all of the levels at the same time.

When the system changes, the environment in which the system operates changes, or components are reused in a different system, a new or updated safety analysis is required. Intent specifications can make that process feasible and practical.

Examples of intent specifications are available [121, 151], as are commercial tools to support them. But most of the principles can be implemented without special tools beyond a text editor and hyperlinking facilities. The rest of this chapter assumes only these very limited facilities are available.
section 10.3. An Integrated System and Safety Engineering Process.

There is no agreed-upon best system engineering process, and there probably cannot be one—the process needs to match the specific problem and environment in which it is being used. What is described in this section is how to integrate safety engineering into any reasonable system engineering process.

The system engineering process provides a logical structure for problem solving. Briefly, first a need or problem is specified in terms of objectives that the system must satisfy and criteria that can be used to rank alternative designs. Then a process of system synthesis takes place that usually involves considering alternative designs. Each of the alternatives is analyzed and evaluated in terms of the stated objectives and design criteria, and one alternative is selected. In practice, the process is highly iterative: the results from later stages are fed back to early stages to modify objectives, criteria, design decisions, and so on.

Design alternatives are generated through a process of system architecture development and analysis. The system engineers first develop requirements and design constraints for the system as a whole and then break the system into subsystems and design the subsystem interfaces and the subsystem interface topology. System functions and constraints are refined and allocated to the individual subsystems. The emerging design is analyzed with respect to desired system performance characteristics and constraints, and the process is iterated until an acceptable system design results.

The difference in safety-guided design is that hazard analysis is used throughout the process to generate the safety constraints that are factored into the design decisions as they are made. The preliminary design at the end of this process must be described in sufficient detail that subsystem implementation can proceed independently. The subsystem requirements and design processes are subsets of the larger system engineering process.

This general system engineering process has some particularly important aspects. One of these is the focus on interfaces. System engineering views each system as an integrated whole even though it is composed of diverse, specialized components, which may be physical, logical (software), or human. The objective is to design subsystems that, when integrated into the whole, provide the most effective system possible to achieve the overall objectives. The most challenging problems in building complex systems today arise in the interfaces between components. One example is the new highly automated aircraft, where most incidents and accidents have been blamed on human error but more properly reflect difficulties in the collateral design of the aircraft, the avionics systems, the cockpit displays and controls, and the demands placed on the pilots.

A second critical factor is the integration of humans and nonhuman system components. As with safety, a separate group traditionally does human factors design and analysis. Building safety-critical systems requires integrating both system safety and human factors into the basic system engineering process, which in turn has important implications for engineering education. Unfortunately, neither safety nor human factors plays an important role in most engineering education today.

During program and project planning, a system safety plan, standards, and a project development safety control structure need to be designed, including policies, procedures, the safety management and control structure, and communication channels. More about safety management plans can be found in chapters 12 and 13.

Figure 10.3 shows the types of activities that need to be performed in such an integrated process and the system safety and human factors inputs and products. Standard validation and verification activities are not shown, since they should be included throughout the entire process.

The rest of this chapter provides an example using TCAS II. Other examples are interspersed where TCAS is not appropriate or does not provide an interesting enough example.
section 10.3.1. Establishing the Goals for the System.

The first step in any system engineering process is to identify the goals of the effort. Without agreeing on where you are going, it is not possible to determine how to get there or when you have arrived.

TCAS II is a box required on most commercial and some general aviation aircraft that assists in avoiding midair collisions. The goals for TCAS II are to:

G1: Provide affordable and compatible collision avoidance system options for a broad spectrum of National Airspace System users.

G2: Detect potential midair collisions with other aircraft in all meteorological conditions; throughout navigable airspace, including airspace not covered by ATC primary or secondary radar systems; and in the absence of ground equipment.

TCAS was intended to be an independent backup to the normal Air Traffic Control (ATC) system and the pilot's "see and avoid" responsibilities. It interrogates air traffic control transponders on aircraft in its vicinity and listens for the transponder replies. By analyzing these replies with respect to slant range and relative altitude, TCAS determines which aircraft represent potential collision threats and provides appropriate display indications, called advisories, to the flight crew to assure proper separation. Two types of advisories can be issued. Resolution advisories (RAs) provide instructions to the pilots to ensure safe separation from nearby traffic in the vertical plane. Traffic advisories (TAs) indicate the positions of intruding aircraft that may later cause resolution advisories to be displayed.

TCAS is an example of a system created to directly impact safety, where the goals are all directly related to safety. But system safety engineering and safety-driven design can be applied to systems where maintaining safety is not the only goal and, in fact, human safety is not even a factor. The example of an outer planets explorer spacecraft was shown in chapter 7. Another example is the air traffic control system, which has both safety and nonsafety (throughput) goals.

footnote. Horizontal advisories were originally planned for later versions of TCAS but have not yet been implemented.
section 10.3.2. Defining Accidents.

Before any safety-related activities can start, the definition of an accident needs to be agreed upon by the system customer and other stakeholders. This definition, in essence, establishes the goals for the safety effort.

Defining accidents in TCAS is straightforward—only one is relevant, a midair collision. Other more interesting examples are shown in chapter 7.

Basically, the criterion for specifying events as accidents is that the losses are so important that they need to play a central role in the design and tradeoff process. In the outer planets explorer example in chapter 7, some of the losses involve the mission goals themselves while others involve losses to other missions or a negative impact on our solar system ecology.

Priorities and evaluation criteria may be assigned to the accidents to indicate how conflicts are to be resolved, such as conflicts between safety goals or conflicts between mission goals and safety goals, and to guide design choices at lower levels. The priorities are then inherited by the hazards related to each of the accidents and traced down to the safety-related design features.
section 10.3.3. Identifying the System Hazards.

Once the set of accidents has been agreed upon, hazards can be derived from them. This process is part of what is called Preliminary Hazard Analysis (PHA) in System Safety. The hazard log is usually started as soon as the hazards to be considered are identified. While much of the information in the hazard log will be filled in later, some information is available at this time.

There is no right or wrong list of hazards—only an agreement by all involved on what hazards will be considered. Some hazards that were considered during the design of TCAS are listed in chapter 7 and are repeated here for convenience:

1. TCAS causes or contributes to a near midair collision (NMAC), defined as a pair of controlled aircraft violating minimum separation standards.

2. TCAS causes or contributes to a controlled maneuver into the ground.

3. TCAS causes or contributes to the pilot losing control over the aircraft.

4. TCAS interferes with other safety-related aircraft systems (for example, ground proximity warning).

5. TCAS interferes with the ground-based air traffic control system (e.g., transponder transmissions to the ground or radar or radio services).

6. TCAS interferes with an ATC advisory that is safety-related (e.g., avoiding a restricted area or adverse weather conditions).
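To illustrate the point about starting the hazard log early, the following minimal sketch shows how entries for the hazards above might be recorded at PHA time, with most fields deliberately left to be filled in as analysis and design proceed. The field names and representation are assumptions made for illustration, not a prescribed hazard log format.

    # Illustrative sketch: a hazard log started at PHA time (field names are assumptions).
    hazard_log = [
        {"id": "H-1",
         "hazard": "TCAS causes or contributes to a near midair collision (NMAC)",
         "severity": None,           # evaluated later in the process
         "causal_factors": [],       # filled in by later hazard analysis (e.g., STPA)
         "safety_constraints": [],   # derived and linked as the design proceeds
         "status": "open"},
        {"id": "H-2",
         "hazard": "TCAS causes or contributes to a controlled maneuver into the ground",
         "severity": None, "causal_factors": [], "safety_constraints": [], "status": "open"},
        # ... entries H-3 through H-6 follow the same pattern
    ]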
Once accidents and hazards have been identified, early concept formation (sometimes called high-level architecture development) can be started for the integrated system and safety engineering process.
section 10.3.4. Integrating Safety into Architecture Selection and System Trade Studies.

An early activity in the system engineering of complex systems is the selection of an overall architecture for the system or, as it is sometimes called, system concept formation. For example, an architecture for manned space exploration might include a transportation system with parameters and options for each possible architectural feature related to technology, policy, and operations. Decisions will need to be made early, for example, about the number and type of vehicles and modules, the destinations for the vehicles, the roles and activities for each vehicle including dockings and undockings, trajectories, assembly of the vehicles (in space or on Earth), discarding of vehicles, prepositioning of vehicles in orbit and on the planet surface, and so on. Technology options include type of propulsion, level of autonomy, support systems (water and oxygen if the vehicle is used to transport humans), and many others. Policy and operational options may include crew size, level of international investment, types of missions and their duration, landing sites, and so on. Decisions about these overall system concepts clearly must precede the actual implementation of the system.

How are these decisions made? The selection process usually involves extensive tradeoff analysis that compares the different feasible architectures with respect to some important system property or properties. Cost, not surprisingly, usually plays a large role in the selection process, while other properties, including system safety, are usually left as a problem to be addressed later in the development lifecycle.

Many of the early architectural decisions, however, have a significant and lasting impact on safety and may not be reversible after the basic architectural decisions have been made. For example, the decision not to include a crew escape system on the Space Shuttle was an early architectural decision and has been impacting Shuttle safety for more than thirty years [74, 136]. After the Challenger accident and again after the Columbia loss, the idea resurfaced, but there was no cost-effective way to add crew escape at that time.

The primary reason why safety is rarely factored in during the early architectural tradeoff process, except perhaps informally, is that practical methods for analyzing safety, that is, hazard analysis methods that can be applied at that time, do not exist. But if information about safety were available early, it could be used in the selection process, and hazards could be eliminated by the selection of appropriate architectural options or mitigated early, when the cost of doing so is much less than later in the system lifecycle. Making basic design changes downstream becomes increasingly costly and disruptive as development progresses and, often, compromises in safety must be accepted that could have been eliminated if safety had been considered in the early architectural evaluation process.
While it is relatively easy to identify hazards at system conception, performing a hazard or risk assessment before a design is available is more problematic. At best, only a very rough estimate is possible. Risk is usually defined as a combination of severity and likelihood. Because these two different qualities (severity and likelihood) cannot be combined mathematically, they are commonly qualitatively combined using a risk matrix. Figure 10.4 shows a fairly standard form for such a matrix.

High-level hazards are first identified and, for each identified hazard, a qualitative evaluation is performed by classifying the hazard according to its severity and likelihood.
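As a concrete illustration of the kind of qualitative classification such a matrix encodes, consider the sketch below. The severity and likelihood categories are typical ones, but the particular cell values are assumptions made for illustration and are not the contents of figure 10.4.

    # Illustrative qualitative risk-matrix lookup (category names and cell values are assumptions).
    SEVERITY = ["Catastrophic", "Critical", "Marginal", "Negligible"]
    LIKELIHOOD = ["Frequent", "Probable", "Occasional", "Remote", "Improbable"]

    RISK_MATRIX = {
        ("Catastrophic", "Frequent"): "High",
        ("Catastrophic", "Remote"): "Medium",
        ("Critical", "Occasional"): "Medium",
        ("Marginal", "Improbable"): "Low",
        # ... one qualitative risk level per severity/likelihood pair
    }

    def classify(severity, likelihood):
        """Qualitatively combine severity and likelihood for one identified hazard."""
        return RISK_MATRIX.get((severity, likelihood), "Unassessed")

The difficulty, as discussed next, is that the likelihood input to such a lookup is rarely knowable this early in development.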
While severity can usually be evaluated using the worst possible consequences of that hazard, likelihood is almost always unknown and, arguably, unknowable for complex systems before any system design decisions have been made. The problem is even worse before a system architecture has been selected. Some probabilistic information is usually available about physical events, of course, and historical information may theoretically be available. But new systems are usually being created because existing systems and designs are not adequate to achieve the system goals, and the new systems will probably use new technology and design features that limit the accuracy of historical information. For example, historical information about the likelihood of propulsion-related losses may not be accurate for new spacecraft designs using nuclear propulsion. Similarly, historical information about the errors air traffic controllers make has no relevance for new air traffic control systems, where the type of errors may change dramatically.

The increasing use of software in most complex systems complicates the situation further. Much or even most of the software in the system will be new and have no historical usage information. In addition, statistical techniques that assume randomness are not applicable to software design flaws. Software and digital systems also introduce new ways for hazards to occur, including new types of component interaction accidents. Safety is a system property, and, as argued in part I, combining the probability of failure of the system components to be used has little or no relationship to the safety of the system as a whole.

There are no known or accepted rigorous or scientific ways to obtain probabilistic or even subjective likelihood information using historical data or analysis in the case of non-random failures and system design errors, including unsafe software behavior. When forced to come up with such evaluations, engineering judgment is usually used, which in most cases amounts to pulling numbers out of the air, often influenced by political and other nontechnical factors. Selection of a system architecture and early architectural trade evaluations on such a basis is questionable and perhaps one reason why risk usually does not play a primary role in the early architectural trade process.

Alternatives to the standard risk matrix are possible, but they tend to be application specific and so must be constructed for each new system. For many systems, the use of severity alone is often adequate to categorize the hazards in trade studies. Two examples of other alternatives are presented here, one created for augmented air traffic control technology and the other created and used in the early architectural trade study of NASA's Project Constellation, the program to return to the moon and later go on to Mars. The reader is encouraged to come up with their own methods appropriate for their particular application. The examples are not meant to be definitive, but simply illustrative of what is possible.
Example 1: A Human-Intensive System: Air Traffic Control Enhancements

Enhancements to the air traffic control (ATC) system are unique in that the problem is not to create a new or safer system but to maintain the very high level of safety built into the current system: the goal is to not degrade safety. The risk likelihood estimate can be restated, in this case, as the likelihood that safety will be degraded by the proposed changes and new tools. To tackle this problem, we created a set of criteria to be used in the evaluation of likelihood. The criteria ranked various high-level architectural design features of the proposed set of ATC tools on a variety of factors related to risk in these systems. The ranking was qualitative, and most criteria were ranked as having low, medium, or high impact on the likelihood of safety being degraded from the current level. For the majority of factors, "low" meant insignificant or no change in safety with respect to that factor in the new versus the current system, "medium" denoted the potential for a minor change, and "high" signified potential for a significant change in safety. Many of the criteria involve human-automation interaction, since ATC is a very human-intensive system and the new features being proposed involved primarily new automation to assist human air traffic controllers. Here are examples of the likelihood level criteria used:

1. Safety margins: Does the new feature have the potential for (1) an insignificant or no change to the existing safety margins, (2) a minor change, or (3) a significant change?

2. Situation awareness: What is the level of change in the potential for reducing situation awareness?

3. Skills currently used and those necessary to back up and monitor the new decision-support tools: Is there an insignificant or no change in the controller skills, a minor change, or a significant change?

4. Introduction of new failure modes and hazard causes: Do the new tools have the same function and failure modes as the system components they are replacing, are new failure modes and hazards introduced but well understood and effective mitigation measures can be designed, or are the new failure modes and hazard causes difficult to control?

5. Effect of the new software functions on the current system hazard mitigation measures: Can the new features render the current safety measures ineffective, or are they unrelated to current safety features?

6. Need for new system hazard mitigation measures: Will the proposed changes require new hazard mitigation measures?

These criteria and others were converted into a numerical scheme so they could be combined and used in an early risk assessment of the changes being contemplated and their potential likelihood for introducing significant new risk into the system. The criteria were weighted to reflect their relative importance in the risk analysis.

footnote. These criteria were developed for a NASA contract by the author and have not been published previously.
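As an illustration only, a weighted numerical combination of such rankings might look like the sketch below. The scores, weights, and criterion names are invented for the example, since the actual scheme used on that contract has not been published.

    # Illustrative weighted combination of qualitative likelihood criteria (values are assumptions).
    IMPACT_SCORE = {"low": 1, "medium": 2, "high": 3}

    # Hypothetical weights reflecting the relative importance of each criterion.
    WEIGHTS = {
        "safety_margins": 3,
        "situation_awareness": 3,
        "controller_skills": 2,
        "new_failure_modes": 3,
        "effect_on_existing_mitigations": 2,
        "need_for_new_mitigations": 1,
    }

    def likelihood_score(rankings):
        """rankings maps each criterion to 'low', 'medium', or 'high'."""
        total = sum(WEIGHTS[c] * IMPACT_SCORE[r] for c, r in rankings.items())
        return total / sum(WEIGHTS.values())   # normalized back to the 1-3 impact scale

    # Example: a proposed ATC tool judged 'medium' or worse on several factors.
    print(likelihood_score({
        "safety_margins": "medium", "situation_awareness": "high",
        "controller_skills": "medium", "new_failure_modes": "medium",
        "effect_on_existing_mitigations": "low", "need_for_new_mitigations": "low",
    }))

The output is a single qualitative-scale score per proposed change that can be compared across candidate tools during the early risk assessment.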
Example 2: Early Risk Analysis of Manned Space Exploration

A second example was created by Nicolas Dulac and others as part of an MIT and Draper Labs contract with NASA to perform an architectural tradeoff analysis for future human space exploration [59]. The system engineers wanted to include safety along with the usual factors, such as mass, to evaluate the candidate architectures, but once again little information was available at this early stage of system engineering. It was not possible to evaluate likelihood using historical information; all of the potential architectures involved new technology, new missions, and significant amounts of software.

In the procedure developed to achieve the goal, the hazards were first identified as shown in figure 10.5. As is the case at the beginning of any project, identifying system hazards involved ten percent creativity and ninety percent experience. Hazards were identified for each mission phase by domain experts under the guidance of the safety experts. Some hazards, such as fire, explosion, or loss of life support, span multiple (if not all) mission phases and were grouped as General Hazards. The control strategies used to mitigate them, however, may depend on the mission phase in which they occur.

Once the hazards were identified, the severity of each hazard was evaluated by considering the worst-case loss associated with the hazard. In the example, the losses are evaluated for each of three categories: humans (H), mission (M), and equipment (E). Initially, potential damage to the Earth and planet surface environment was included in the hazard log. In the end, the environment component was left out of the analysis because project managers decided to replace the analysis with mandatory compliance with NASA's planetary protection standards. A risk analysis can be replaced by a customer policy on how the hazards are to be treated. A more complete example for a different system, however, would normally include environmental hazards.

A severity scale was created to account for the losses associated with each of the three categories. The scale used is shown in figure 10.6, but obviously a different scale could easily be created to match the specific policies or standard practice in different industries and companies.

As usual, severity was relatively easy to handle, but the likelihood of the potential hazard occurring was unknowable at this early stage of system engineering. In addition, space exploration is the polar opposite of the ATC example above: the system did not already exist, and the architectures and missions would involve things never attempted before, which created a need for a different approach to estimating likelihood.

We decided to use the mitigation potential of the hazard in the candidate architecture as an estimator of, or surrogate for, likelihood. Hazards that are more easily mitigated in the design and operations are less likely to lead to accidents. Similarly, hazards that have been eliminated during system design, and thus are not part of that candidate architecture or can easily be eliminated in the detailed design process, cannot lead to an accident.

The safety goal of the architectural analysis process was to assist in selecting the architecture with the fewest serious hazards and the highest mitigation potential for those hazards that were not eliminated. Not all hazards will be eliminated even if they can be. One reason for not eliminating hazards might be that it would reduce the potential for achieving other important system goals or constraints. Obviously, safety is not the only consideration in the architecture selection process, but it is important enough in this case to be a criterion in the selection process.

Mitigation potential was chosen as a surrogate for likelihood for two reasons: (1) the potential for eliminating or controlling the hazard in the design or operations has a direct and important bearing on the likelihood of the hazard occurring (whether traditional or new designs and technology are used), and (2) mitigatibility of the hazard can be determined before an architecture or design is selected—indeed, it assists in the selection process.
Figure 10.7 shows an example from the hazard log created during the PHA effort. The example hazard shown is nuclear reactor overheating. Nuclear power generation and use, particularly during planetary surface operations, was considered to be an important option in the architectural tradeoffs. The potential accident and its effects are described in the hazard log as:

Nuclear core meltdown would cause loss of power, and possibly radiation exposure. Surface operations must abort mission and evacuate. If abort is unsuccessful or unavailable at the time, the crew and surface equipment could be lost. There would be no environmental impact on Earth.

The hazard is defined as the nuclear reactor operating at temperatures above the design limits.

Although some causal factors can be hypothesized early, a hazard analysis using STPA can be used to generate a more complete list of causal factors later in the development process to guide the design process after an architecture is chosen.

Like severity, mitigatibility was evaluated by domain experts under the guidance of safety experts. Both the cost of the potential mitigation strategy and its effectiveness were evaluated. For the nuclear power example, two strategies were identified; the first is not to use nuclear power generation at all. The cost of this option was evaluated as medium (on a low, medium, high scale), but the mitigation potential was rated as high because it eliminates the hazard completely. The mitigation priority scale used is shown in figure 10.8. The second mitigation potential identified by the engineers was to provide a backup power generation system for surface operations. The difficulty and cost were rated high, and the mitigation rating was 1, which was the lowest possible level, because at best it would only reduce the damage if an accident occurred; potential serious losses would still occur. Other mitigation strategies are also possible but have been omitted from the sample hazard log entry shown.
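A sketch of how the fields just described might appear in such a hazard log entry is shown below. The content follows the nuclear reactor example, but the record format, field names, and placeholder values are illustrative assumptions rather than the actual project format.

    # Illustrative hazard log entry for the nuclear reactor example (record format is an assumption).
    entry = {
        "hazard": "Nuclear reactor operating at temperatures above the design limits",
        "mission_phase": "Planetary surface operations",
        "potential_accident": ("Core meltdown causing loss of power and possible radiation "
                               "exposure; surface operations must abort and evacuate; possible "
                               "loss of crew and surface equipment; no environmental impact on Earth"),
        # Severity is rated separately for humans (H), mission (M), and equipment (E)
        # by domain experts using the figure 10.6 scale (values omitted here).
        "severity": {"H": None, "M": None, "E": None},
        "mitigation_strategies": [
            {"strategy": "Do not use nuclear power generation",
             "cost": "medium", "mitigation_potential": "high (eliminates the hazard)"},
            {"strategy": "Provide backup power generation for surface operations",
             "cost": "high", "mitigation_potential": "1 (lowest level; reduces damage only)"},
            # other strategies omitted, as in the sample log entry
        ],
    }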
None of the effort expended here is wasted. The information included in the hazard log about the mitigation strategies will be useful later in the design process if the final architecture selected uses surface nuclear power generation. NASA might also be able to use the information in future projects, and the creation of such early risk analysis information might be common to companies or industries and not have to be created for each project. As new technologies are introduced to an industry, new hazards or mitigation possibilities could be added to the previously stored information.

The final step in the process is to create safety risk metrics for each candidate architecture. Because the system engineers on the project created hundreds of feasible architectures, the evaluation process was automated. The actual details of the mathematical procedures used are of limited general interest and are available elsewhere [59]. Weighted averages were used to combine mitigation factors and severity factors to come up with a final Overall Residual Safety-Risk Metric. This metric was then used in the evaluation and ranking of the potential manned space exploration architectures.
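The actual mathematical procedure is documented in [59]; purely as a schematic of the kind of weighted-average combination described, a metric of this general form could be computed as follows. The scales, example values, and the exact way severity and mitigation are combined here are assumptions made for illustration.

    # Schematic sketch of an overall residual safety-risk metric (scales and values are assumptions).

    def overall_residual_safety_risk(hazards):
        """hazards: list of (severity, mitigation_potential) pairs for one candidate
        architecture, each assumed to be on a 1-5 scale. Hazards with high severity and
        low mitigation potential contribute the most residual risk."""
        residuals = [severity * (6 - mitigation) for severity, mitigation in hazards]
        return sum(residuals) / len(residuals)

    # Rank candidate architectures: lower residual safety risk is better.
    architectures = {
        "A (no surface nuclear power)": [(5, 4), (3, 5)],
        "B (surface nuclear power, backup only)": [(5, 1), (4, 2)],
    }
    ranking = sorted(architectures,
                     key=lambda name: overall_residual_safety_risk(architectures[name]))
    print(ranking)

Recomputing such a metric with individual architectural options switched on and off also suggests how the first-order sensitivity assessment described next could be carried out.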
By selecting and deselecting options in the architecture description, it was also possible to perform a first-order assessment of the relative importance of each architectural option in determining the Overall Residual Safety-Risk Metric.

While hundreds of parameters were considered in the risk analysis, the process allowed the identification of major contributors to the hazard mitigation potential of selected architectures and thus informed the architecture selection process and the tradeoff analysis. For example, important contributors to increased safety were determined to include the use of heavy module and equipment prepositioning on the surface of Mars and the use of minimal rendezvous and docking maneuvers.

Prepositioning modules allows for pretesting and mitigates the hazards associated with loss of life support, equipment damage, and so on. On the other hand, prepositioning modules increases the reliance on precision landing to ensure that all landed modules are within range of each other. Consequently, using heavy prepositioning may require additional mitigation strategies and technology development to reduce the risk associated with landing in the wrong location. All of this information must be considered in selecting the best architecture. As another example, on one hand, a transportation architecture requiring no docking at Mars orbit or upon return to Earth inherently mitigates hazards associated with collisions or failed rendezvous and docking maneuvers. On the other hand, having the capability to dock during an emergency, even though it is not required during nominal operations, provides additional mitigation potential for loss of life support, especially in Earth orbit.

Reducing these considerations to a number is clearly not ideal, but with hundreds of potential architectures it was necessary in this case in order to pare down the choices to a smaller number. More careful tradeoff analysis is then possible on the reduced set of choices.

While mitigatibility is widely applicable as a surrogate for likelihood in many types of domains, the actual process used above is just one example of how it might be used. Engineers will need to adapt the scales and other features of the process to the customary practices in their own industry. Other types of surrogates or ways to handle likelihood estimates in early phases of projects are possible beyond the two examples provided in this section. While none of these approaches is ideal, they are much better than ignoring safety in decision making or selecting likelihood estimates based solely on wishful thinking or the politics that often surround the preliminary hazard analysis process.

After a conceptual design is chosen, development begins.
section 10.3.5. Documenting Environmental Assumptions.

An important part of the system development process is to determine and document the assumptions under which the system requirements and design features are derived and upon which the hazard analysis is based. Assumptions will be identified and specified throughout the system engineering process and the engineering specifications to explain decisions or to record fundamental information upon which the design is based. If the assumptions change over time, or the system changes and the assumptions are no longer true, then the requirements and the safety constraints and design features based on those assumptions need to be revisited to ensure safety has not been compromised by the change.

Because operational safety depends on the accuracy of the assumptions and models underlying the design and hazard analysis processes, the operational system should be monitored to ensure that:

1. The system is constructed, operated, and maintained in the manner assumed by the designers.

2. The models and assumptions used during initial decision making and design are correct.

3. The models and assumptions are not violated by changes in the system, such as workarounds or unauthorized changes in procedures, or by changes in the environment.

Operational feedback on trends, incidents, and accidents should trigger reanalysis when appropriate. Linking the assumptions throughout the document with the parts of the hazard analysis based on that assumption will assist in performing safety maintenance activities.

Several types of assumptions are relevant. One is the assumptions under which the system will be used and the environment in which the system will operate. Not only will these assumptions play an important role in system development, but they also provide part of the basis for creating the operational safety control structure and other operational safety controls, such as creating feedback loops to ensure the assumptions underlying the system design and the safety analyses are not violated during operations as the system and its environment change over time.

While many of the assumptions that originate in the existing environment into which the new system will be integrated can be identified at the beginning of development, additional assumptions will be identified as the design process continues and new requirements and design decisions and features are identified. In addition, assumptions that the emerging system design imposes on the surrounding environment will become clear only after detailed decisions are made in the design and safety analyses.
Examples of important environment assumptions for TCAS II are that:

EA1: High-integrity communications exist between aircraft.

EA2: The TCAS-equipped aircraft carries a Mode-S air traffic control transponder.

EA3: All aircraft have operating transponders.

EA4: All aircraft have legal identification numbers.

EA5: Altitude information is available from intruding targets with a minimum precision of 100 feet.

EA6: The altimetry system that provides own aircraft pressure altitude to the TCAS equipment will satisfy the requirements in RTCA Standard . . .

EA7: Threat aircraft will not make an abrupt maneuver that thwarts the TCAS escape maneuver.

footnote. An aircraft transponder sends information to help air traffic control maintain aircraft separation. Primary radar generally provides bearing and range position information, but lacks altitude information. Mode A transponders transmit only an identification signal, while Mode C and Mode S transponders also report pressure altitude. Mode S is newer and has more capabilities than Mode C, some of which are required for the collision avoidance functions in TCAS.
As noted, these assumptions must be enforced in the overall safety control structure. With respect to assumption EA4, for example, identification numbers are usually provided by the aviation authorities in each country, and that requirement will need to be ensured by international agreement or by some international agency. The assumption that aircraft have operating transponders (EA3) may be enforced by the airspace rules in a particular country and, again, must be ensured by some group. Clearly, these assumptions play an important role in the construction of the safety control structure and assignments of responsibilities for the final system. For TCAS, some of these assumptions will already be imposed by the existing air transportation safety control structure, while others may need to be added to the responsibilities of some group(s) in the control structure. The last assumption, EA7, imposes constraints on pilots and the air traffic control system.

Environment requirements and constraints may lead to restrictions on the use of the new system (in this case, TCAS) or may indicate the need for system safety and other analyses to determine the constraints that must be imposed on the system being created (TCAS again) or the larger encompassing system to ensure safety. The requirements for the integration of the new subsystem safely into the larger system must be determined early. Examples for TCAS include:

E1: The behavior or interaction of non-TCAS equipment with TCAS must not degrade the performance of the TCAS equipment or the performance of the equipment with which TCAS interacts.

E2: Among the aircraft environmental alerts, the hierarchy shall be: Windshear has first priority, then the Ground Proximity Warning System (GPWS), then TCAS.

E3: The TCAS alerts and advisories must be independent of those using the master caution and warning system.
section 10.3.6. System-Level Requirements Generation.

Once the goals and hazards have been identified and a conceptual system architecture has been selected, system-level requirements generation can begin. Usually, in the early stages of a project, goals are stated in very general terms, as shown in G1 and G2. One of the first steps in the design process is to refine the goals into testable and achievable high-level requirements (the "shall" statements). Examples of high-level functional requirements implementing the goals for TCAS are:

1.18: TCAS shall provide collision avoidance protection for any two aircraft closing horizontally at any rate up to 1200 knots and vertically up to 10,000 feet per minute.

Assumption: This requirement is derived from the assumption that commercial aircraft can operate up to 600 knots and 5000 fpm during vertical climb or controlled descent (and therefore two planes can close horizontally up to 1200 knots and vertically up to 10,000 fpm).

1.19.1: TCAS shall operate in enroute and terminal areas with traffic densities up to 0.3 aircraft per square nautical mile (i.e., 24 aircraft within 5 nmi).

Assumption: Traffic density may increase to this level by 1990, and this will be the maximum density over the next 20 years.

As stated earlier, assumptions should continue to be specified when appropriate to explain a decision or to record fundamental information on which the design is based. Assumptions are an important component of the documentation of design rationale and form the basis for safety audits during operations. Consider the above requirement labeled 1.18, for example. In the future, if aircraft performance limits change or there are proposed changes in airspace management, the origin of the specific numbers in the requirement (1,200 and 10,000) can be determined and evaluated for their continued relevance. In the absence of the documentation of such assumptions and how they impact the detailed design decisions, numbers tend to become "gospel," and everyone is afraid to change them.
Requirements (and constraints) must also be included for the human operator
|
||
and for the human–computer interface. These requirements will in part be derived
|
||
from the concept of operations, which should in turn include a human task analysis
|
||
[48, 47], to determine how TCAS is expected to be used by pilots (which, again,
|
||
should be checked in safety audits during operations). These analyses use infor-
|
||
mation about the goals of the system, the constraints on how the goals are achieved,
|
||
including safety constraints, how the automation will be used, how humans now
|
||
control the system and work in the system without automation, and the tasks
|
||
humans need to perform and how the automation will support them in performing
|
||
these tasks. The task analysis must also consider workload and its impact on opera-
|
||
tor performance. Note that a low workload may be more dangerous than a high one.
|
||
Requirements on the operator (in this case, the pilot) are used to guide the design
|
||
of the TCAS-pilot interface, the design of the automation logic, flight-crew tasks
|
||
|
||
and procedures, aircraft flight manuals, and training plans and program. Traceability
|
||
links should be provided to show the relationships. Links should also be provided
|
||
to the parts of the hazard analysis from which safety-related requirements are
|
||
derived. Examples of TCAS II operator safety requirements and constraints are:
|
||
OP.4: After the threat is resolved, the pilot shall return promptly and smoothly to
|
||
his/her previously assigned fight path (→ HA-560, ↓3.3).
|
||
OP.9: The pilot must not maneuver on the basis of a Traffic Advisory only (→
|
||
HA-630, ↓2.71.3).
|
||
The requirements and constraints include links to the hazard analysis that produced
|
||
the information and to design documents and decisions to show where the require-
|
||
ments are applied. These two examples have links to the parts of the hazard analysis
|
||
from which they were derived, links to the system design and operator procedures
|
||
where they are enforced, and links to the user manuals (in this case, the pilot
|
||
manuals) to explain why certain activities or behaviors are required.
|
||
The links not only provide traceability from requirements to implementation and
|
||
vice versa to assist in review activities, but they also embed the design rationale
|
||
information into the specification. If changes need to be made to the system, it is
|
||
easy to follow the links and determine why and how particular design decisions
|
||
were made.
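One lightweight way to make such two-way links usable is to treat every requirement, hazard-analysis result, and design feature as a record whose link fields can be followed in either direction. The sketch below is purely illustrative: the identifiers OP.4, HA-560, and 3.3 are the ones used in the examples above, but the record structure, the placeholder summaries, and the query functions are assumptions of this sketch rather than part of any intent-specification tool.

    from dataclasses import dataclass, field

    @dataclass
    class SpecItem:
        """One traceable item in a specification (requirement, hazard analysis result, design feature)."""
        ident: str
        text: str
        derived_from: list[str] = field(default_factory=list)    # upward links, e.g. to hazard analysis
        implemented_by: list[str] = field(default_factory=list)  # downward links, e.g. to design features

    items = {
        "HA-560": SpecItem("HA-560", "placeholder summary of the relevant hazard analysis result"),
        "3.3": SpecItem("3.3", "placeholder summary of the level 2 design feature or procedure"),
        "OP.4": SpecItem("OP.4",
                         "After the threat is resolved, the pilot shall return promptly and smoothly "
                         "to his/her previously assigned flight path",
                         derived_from=["HA-560"], implemented_by=["3.3"]),
    }

    def why(ident: str) -> list[str]:
        """Follow the upward links: where did this requirement come from?"""
        return items[ident].derived_from

    def where_enforced(ident: str) -> list[str]:
        """Follow the downward links: where is this requirement implemented?"""
        return items[ident].implemented_by

    print(why("OP.4"))             # ['HA-560']
    print(where_enforced("OP.4"))  # ['3.3']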
section 10.3.7. Identifying High-Level Design and Safety Constraints.

Design constraints are restrictions on how the system can achieve its purpose. For example, TCAS is not allowed to interfere with the ground-level air traffic control system while it is trying to maintain adequate separation between aircraft. Avoiding interference is not a goal or purpose of TCAS—the best way to achieve the goal is not to build the system at all. It is instead a constraint on how the system can achieve its purpose, that is, a constraint on the potential system designs. Because of the need to evaluate and clarify tradeoffs among alternative designs, separating these two types of intent information (goals and design constraints) is important.

For safety-critical systems, constraints should be further separated into safety-related and not safety-related. One nonsafety constraint identified for TCAS, for example, was that requirements for new hardware and equipment on the aircraft be minimized or the airlines would not be able to afford this new collision avoidance system. Examples of nonsafety constraints for TCAS II are:

C.1: The system must use the transponders routinely carried by aircraft for ground ATC purposes (↓2.3, 2.6).

Rationale: To be acceptable to airlines, TCAS must minimize the amount of new hardware needed.

C.4: TCAS must comply with all applicable FAA and FCC policies, rules, and philosophies (↓2.30, 2.79).
The physical environment with which TCAS interacts is shown in figure 10.9. The constraints imposed by these existing environmental components must also be identified before system design can begin.

Safety-related constraints should have two-way links to the system hazard log and to any analysis results that led to that constraint being identified as well as links to the design features (usually level 2) included to eliminate or control them. Hazard analyses are linked to level 1 requirements and constraints, to design features on level 2, and to system limitations (or accepted risks). An example of a level 1 safety constraint derived to prevent hazards is:

SC.3: TCAS must generate advisories that require as little deviation as possible from ATC clearances (→ H6, HA-550, ↓2.30).

The link in SC.3 to 2.30 points to the level 2 system design feature that implements this safety constraint. The other links provide traceability to the hazard (H6) from which the constraint was derived and to the parts of the hazard analysis involved, in this case the part of the hazard analysis labeled HA-550.
The following is another example of a safety constraint for TCAS II and some constraints refined from it, all of which stem from a high-level environmental constraint derived from safety considerations in the encompassing system into which TCAS will be integrated. The refinement will occur as safety-related decisions are made and guided by an STPA hazard analysis:

SC.2: TCAS must not interfere with the ground ATC system or other aircraft transmissions to the ground ATC system (→ H5).

SC.2.1: The system design must limit interference with ground-based secondary surveillance radar, distance-measuring equipment channels, and with other radio services that operate in the 1030/1090 MHz frequency band (↓2.5.1).

SC.2.1.1: The design of the Mode S waveforms used by TCAS must provide compatibility with Modes A and C of the ground-based secondary surveillance radar system (↓2.6).

SC.2.1.2: The frequency spectrum of Mode S transmissions must be controlled to protect adjacent distance-measuring equipment channels (↓2.13).

SC.2.1.3: The design must ensure electromagnetic compatibility between TCAS and [...] (↓2.14).

SC.2.2: Multiple TCAS units within detection range of one another (approximately 30 nmi) must be designed to limit their own transmissions. As the number of such TCAS units within this region increases, the interrogation rate and power allocation for each of them must decrease in order to prevent undesired interference with ATC (↓2.13).
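SC.2.2 is the kind of constraint that is naturally refined into a quantitative rule at level 2: each unit's share of the interrogation budget must shrink as the number of TCAS units it can hear grows. The following is a toy illustration of that idea only; the real interference-limiting algorithm in the TCAS standard is considerably more elaborate, and the budget figures and function name used here are invented for the sketch.

    def allowed_interrogation_power(n_tcas_in_range: int,
                                    regional_budget: float = 250.0,
                                    per_unit_cap: float = 50.0) -> float:
        """Toy interference-limiting rule: divide a hypothetical regional power budget
        evenly among all TCAS units within detection range, never exceeding a per-unit
        cap. The numeric values are placeholders, not figures from the TCAS specification."""
        n = max(1, n_tcas_in_range)
        return min(per_unit_cap, regional_budget / n)

    # As more TCAS-equipped aircraft enter the roughly 30 nmi region,
    # each unit must interrogate less.
    for n in (1, 5, 10, 25):
        print(n, allowed_interrogation_power(n))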
Assumptions are also associated with safety constraints. As an example of such an assumption, consider:

SC.6: TCAS must not disrupt the pilot and ATC operations during critical phases of flight nor disrupt aircraft operation (→ H3, ↓2.2.3, 2.19, 2.24.2).

SC.6.1: The pilot of a TCAS-equipped aircraft must have the option to switch to the Traffic-Advisory-Only mode where TAs are displayed but display of resolution advisories is inhibited (↓2.2.3).

Assumption: This feature will be used during final approach to parallel runways, when two aircraft are projected to come close to each other and TCAS would call for an evasive maneuver (↓6.17).

The specified assumption is critical for evaluating safety during operations. Humans tend to change their behavior over time and use automation in different ways than originally intended by the designers. Sometimes, these new uses are dangerous. The hyperlink at the end of the assumption (↓6.17) points to the required auditing procedures for safety during operations and to where the procedures for auditing this assumption are specified.

Where do these safety constraints come from? Is the system engineer required to simply make them up? While domain knowledge and expertise are always going to be required, there are procedures that can be used to guide this process.

The highest-level safety constraints come directly from the identified hazards for the system. For example, TCAS must not cause or contribute to a near miss (H1), TCAS must not cause or contribute to a controlled maneuver into the ground (H2), and TCAS must not interfere with the ground-based ATC system. STPA can be used to refine these high-level design constraints into more detailed design constraints as described in chapter 8.
The first step in STPA is to create the high-level TCAS operational safety control structure. For TCAS, this structure is shown in figure 10.10. For simplicity, much of the structure above ATC operations management has been omitted and the roles and responsibilities have been simplified here. In a real design project, roles and responsibilities will be augmented and refined as development proceeds, analyses are performed, and design decisions are made. Early in the system concept formation, specific roles may not all have been determined, and more will be added as the design concepts are refined. One thing to note is that there are three groups with potential responsibilities over the pilot’s response to a potential NMAC: TCAS, the ground ATC, and the airline operations center which provides the airline procedures for responding to TCAS alerts. Clearly any potential conflicts and coordination problems between these three controllers will need to be resolved in the overall air traffic management system design. In the case of TCAS, the designers decided that because there was no practical way, at that time, to downlink information to the ground controllers about any TCAS advisories that might have been issued for the crew, the pilot was to immediately implement the TCAS advisory and the co-pilot would transmit the TCAS alert information by radio to ground ATC. The airline would provide the appropriate procedures and training to implement this protocol.

Part of defining this control structure involves identifying the responsibilities of each of the components related to the goal of the system, in this case collision avoidance. For TCAS, these responsibilities include:
1. Aircraft Components (e.g., transponders, antennas): Execute control maneuvers, read and send messages to other aircraft, etc.

2. TCAS: Receive information about its own and other aircraft, analyze the information received and provide the pilot with (1) information about where other aircraft in the vicinity are located and (2) an escape maneuver to avoid potential NMAC threats.

3. Aircraft Components (e.g., transponders, antennas): Execute pilot-generated TCAS control maneuvers, read and send messages to and from other aircraft, etc.

4. Pilot: Maintain separation between own and other aircraft, monitor the TCAS displays, and implement TCAS escape maneuvers. The pilot must also follow ATC advisories.

5. Air Traffic Control: Maintain separation between aircraft in the controlled airspace by providing advisories (control actions) for the pilot to follow. TCAS is designed to be independent of and a backup for the air traffic controller so ATC does not have a direct role in the TCAS safety control structure but clearly has an indirect one.

6. Airline Operations Management: Provide procedures for using TCAS and following TCAS advisories, train pilots, and audit pilot performance.

7. ATC Operations Management: Provide procedures, train controllers, audit performance of controllers and of the overall collision avoidance system.

8. ICAO: Provide worldwide procedures and policies for the use of TCAS and provide oversight that each country is implementing them.
After the general control structure has been defined (or alternative candidate control structures identified), the next step is to determine how the controlled system (the two aircraft) can get into a hazardous state. That information will be used to generate safety constraints for the designers. STAMP assumes that hazardous states (states that violate the safety constraints) are the result of ineffective control. Step 1 of STPA is to identify the potentially inadequate control actions.

Control actions in TCAS are called resolution advisories or RAs. An RA is an aircraft escape maneuver created by TCAS for the pilots to follow. Example resolution advisories are descend, increase rate of climb to 2,500 fpm, and don’t descend. Consider the TCAS component of the control structure (see figure 10.10) and the NMAC hazard. The four types of control flaws for this example translate into:

1. The aircraft are on a near collision course, and TCAS does not provide an RA that avoids it (that is, does not provide an RA, or provides an RA that does not avoid the NMAC).

2. The aircraft are in close proximity and TCAS provides an RA that degrades vertical separation (causes an NMAC).

3. The aircraft are on a near collision course and TCAS provides a maneuver too late to avoid an NMAC.

4. TCAS removes an RA too soon.

These inadequate control actions can be restated as high-level constraints on the behavior of TCAS:

1. TCAS must provide resolution advisories that avoid near midair collisions.

2. TCAS must not provide resolution advisories that degrade vertical separation between two aircraft (that is, cause an NMAC).

3. TCAS must provide the resolution advisory while enough time remains for the pilot to avoid an NMAC. (A human factors and aerodynamic analysis should be performed at this point to determine exactly how much time that implies.)

4. TCAS must not remove the resolution advisory before the NMAC is resolved.

Similarly, for the pilot, the inadequate control actions are:

1. The pilot does not provide a control action to avoid a near midair collision.

2. The pilot provides a control action that does not avoid the NMAC.

3. The pilot provides a control action that causes an NMAC that would not otherwise have occurred.

4. The pilot provides a control action that could have avoided the NMAC but it was too late.

5. The pilot starts a control action to avoid an NMAC but stops it too soon.

Again, these inadequate pilot control actions can be restated as safety constraints that can be used to generate pilot procedures. Similar hazardous control actions and constraints must be identified for each of the other system components. In addition, inadequate control actions must be identified for the other functions provided by TCAS (beyond RAs) such as traffic advisories.
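Because the same four types of inadequate control apply to every control action, step 1 can be organized as simple bookkeeping: list each controller's control actions and ask, for each, whether not providing it, providing it, providing it too early or too late, or stopping it too soon could be hazardous. The sketch below records that enumeration for the two TCAS examples just discussed; the guideword phrasing and the table structure are one possible way to keep the records, not a prescribed STPA format.

    from itertools import product

    # The four ways a control action can be inadequate (STPA step 1).
    GUIDEWORDS = [
        "not provided when needed",
        "provided and causes a hazard",
        "provided too early or too late",
        "stopped too soon or applied too long",
    ]

    # Controllers and collision-avoidance control actions from the TCAS control structure.
    CONTROL_ACTIONS = {
        "TCAS": ["resolution advisory (RA)"],
        "Pilot": ["escape maneuver in response to an RA"],
    }

    # Each candidate must then be examined against the hazards (here, the NMAC hazard)
    # and either discarded or turned into a safety constraint on that controller.
    candidates = [
        (controller, action, guideword)
        for controller, actions in CONTROL_ACTIONS.items()
        for action, guideword in product(actions, GUIDEWORDS)
    ]

    for controller, action, guideword in candidates:
        print(f"{controller}: '{action}' {guideword} -> assess against the NMAC hazard")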
Once the high-level design constraints have been identified, they must be refined into more detailed design constraints to guide the system design and then augmented with new constraints as design decisions are made, creating a seamless integrated and iterative process of system design and hazard analysis.

Refinement of the constraints involves determining how they could be violated. The refined constraints will be used to guide attempts to eliminate or control the hazards in the system design or, if that is not possible, to prevent or control them in the system or component design. This process of scenario development is exactly the goal of hazard analysis and STPA. As an example of how the results of the analysis are used to refine the high-level safety constraints, consider the second high-level TCAS constraint: that TCAS must not provide resolution advisories that degrade vertical separation between two aircraft (cause an NMAC):

SC.7: TCAS must not create near misses (result in a hazardous level of vertical separation that would not have occurred had the aircraft not carried TCAS).

SC.7.1: Crossing maneuvers must be avoided if possible.

SC.7.2: The reversal of a displayed advisory must be extremely rare.

SC.7.3: TCAS must not reverse an advisory if the pilot will have insufficient time to respond to the RA before the closest point of approach (four seconds or less) or if own and intruder aircraft are separated by less than 200 feet vertically when ten seconds or less remain to closest point of approach.

Note again that pointers are used to trace these constraints into the design features used to implement them.

footnote. This requirement (SC.7.2) is clearly vague and untestable. Unfortunately, I could find no definition of “extremely rare” in any of the TCAS documentation to which I had access.
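Unlike SC.7.2, SC.7.3 is stated quantitatively and can be restated directly as a check on the advisory logic. The sketch below is only a paraphrase of the constraint as worded above; the function and parameter names are illustrative and are not taken from the TCAS specification.

    def reversal_permitted(time_to_cpa_s: float, vertical_sep_ft: float) -> bool:
        """Return False when SC.7.3 forbids reversing a displayed resolution advisory:
        either the pilot has insufficient time to respond before the closest point of
        approach (four seconds or less), or the aircraft are separated by less than
        200 feet vertically with ten seconds or less remaining to the closest point
        of approach."""
        if time_to_cpa_s <= 4:
            return False
        if vertical_sep_ft < 200 and time_to_cpa_s <= 10:
            return False
        return True

    assert reversal_permitted(20, 500)       # ample time and separation
    assert not reversal_permitted(3, 1000)   # too close to CPA for the pilot to respond
    assert not reversal_permitted(8, 150)    # low vertical separation with little time left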
section 10.3.8. System Design and Analysis.

Once the basic requirements and design constraints have been at least partially specified, the system design features that will be used to implement them must be created. A strict top-down design process is, of course, not usually feasible. As design decisions are made and the system behavior becomes better understood, additions and changes will likely be made in the requirements and constraints. The specification of assumptions and the inclusion of traceability links will assist in this process and in ensuring that safety is not compromised by later decisions and changes. It is surprising how quickly the rationale behind the decisions that were made earlier is forgotten.

Once the system design features are determined, (1) an internal control structure for the system itself is constructed along with the interfaces between the components and (2) functional requirements and design constraints, derived from the system-level requirements and constraints, are allocated to the individual system components.

System Design

What has been presented so far in this chapter would appear in level 1 of an intent specification. The second level of an intent specification contains System Design Principles—the basic system design and scientific and engineering principles needed to achieve the behavior specified in the top level, as well as any derived requirements and design features not related to the level 1 requirements.

While traditional design processes can be used, STAMP and STPA provide the potential for safety-driven design. In safety-driven design, the refinement of the high-level hazard analysis is intertwined with the refinement of the system design to guide the development of the system design and system architecture. STPA can be used to generate safe design alternatives or applied to the design alternatives generated in some other way to continually evaluate safety as the design progresses and to assist in eliminating or controlling hazards in the emerging design, as described in chapter 9.
For TCAS, this level of the intent specification includes such general principles as the basic tau concept, which is related to all the high-level alerting goals and constraints:

2.2: Each TCAS-equipped aircraft is surrounded by a protected volume of airspace. The boundaries of this volume are shaped by the tau and DMOD criteria.

2.2.1: TAU: In collision avoidance, time-to-go to the closest point of approach (CPA) is more important than distance-to-go to the CPA. Tau is an approximation of the time in seconds to CPA. Tau equals 3600 times the slant range in nmi, divided by the closing speed in knots.

2.2.2: DMOD: If the rate of closure is very low, a target could slip in very close without crossing the tau boundaries and triggering an advisory. In order to provide added protection against a possible maneuver or speed change by either aircraft, the tau boundaries are modified (called DMOD). DMOD varies depending on own aircraft’s altitude regime.
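The tau relationship in 2.2.1 is simple enough to state directly in code, and doing so also makes the role of DMOD in 2.2.2 concrete: a distance floor that still triggers an alert when closure is slow. In the sketch below, only the tau formula comes from the text above; the threshold and DMOD values, and the shape of the alerting test, are illustrative assumptions.

    def tau_seconds(slant_range_nmi: float, closing_speed_kt: float) -> float:
        """Tau from 2.2.1: approximate time in seconds to the closest point of approach.
        The factor 3600 converts hours (nmi divided by knots) to seconds."""
        return 3600.0 * slant_range_nmi / closing_speed_kt

    def range_test(slant_range_nmi: float, closing_speed_kt: float,
                   tau_threshold_s: float, dmod_nmi: float) -> bool:
        """Schematic alerting test: alert when tau drops below a threshold, or when the
        intruder is already inside the DMOD distance even though closure is slow."""
        if closing_speed_kt <= 0:                  # diverging or non-closing geometry
            return slant_range_nmi <= dmod_nmi
        return (tau_seconds(slant_range_nmi, closing_speed_kt) <= tau_threshold_s
                or slant_range_nmi <= dmod_nmi)

    # Two aircraft 5 nmi apart closing at 600 knots are 30 seconds from the CPA.
    print(tau_seconds(5.0, 600.0))                                      # 30.0
    print(range_test(5.0, 600.0, tau_threshold_s=35.0, dmod_nmi=1.1))   # True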
The principles are linked to the related higher-level requirements, constraints, assumptions, limitations, and hazard analysis as well as to lower-level system design and documentation and to other information at the same level. Assumptions used in the formulation of the design principles should also be specified at this level.

For example, design principle 2.51 (related to safety constraint SC.7.2 shown in the previous section) describes how sense reversals are handled:

2.51: Sense Reversals: (↓ Reversal-Provides-More-Separation) In most encounter situations, the resolution advisory will be maintained for the duration of an encounter with a threat aircraft. However, under certain circumstances, it may be necessary for that sense to be reversed. For example, a conflict between two TCAS-equipped aircraft will, with very high probability, result in selection of complementary advisory senses because of the coordination protocol between the two aircraft. However, if coordination communication between the two aircraft is disrupted at a critical time of sense selection, both aircraft may choose their advisories independently (↑HA-130). This could possibly result in selection of incompatible senses.

footnote. The sense is the direction of the advisory, such as descend or climb.

2.51.1: . . . information about how incompatibilities are handled.

Design principle 2.51 describes the conditions under which reversals of TCAS advisories can result in incompatible senses and lead to the creation of a hazard by TCAS. The pointer labeled HA-395 points to the part of the hazard analysis analyzing that problem. The hazard analysis portion labeled HA-395 would have a complementary pointer to section 2.51. The design decisions made to handle such incompatibilities are described in 2.51.1, but that part of the specification is omitted here. 2.51 also contains a hyperlink (↓Reversal-Provides-More-Separation) to the detailed functional level 3 logic (component black-box requirements specification) used to implement the design decision.
Information about the allocation of these design decisions to individual system components and the logic involved is located in level 3, which in turn has links to the implementation of the logic in lower levels. If a change has to be made to a system component (such as a change to a software module), it is possible to trace the function computed by that module upward in the intent specification levels to determine whether the module is safety critical and if (and how) the change might affect system safety.

As another example, the TCAS design has a built-in bias against generating advisories that would result in the aircraft crossing paths (called altitude crossing advisories).

2.36.2: A bias against altitude crossing RAs is also used in situations involving intruder level-offs at least 600 feet above or below the TCAS aircraft. In such a situation, an altitude-crossing advisory is deferred if an intruder aircraft that is projected to cross own aircraft’s altitude is more than 600 feet away vertically.

Assumption: In most cases, the intruder will begin a level-off maneuver when it is more than 600 feet away and so should have a greatly reduced vertical rate by the time it is within 200 feet of its altitude clearance (thereby either not requiring an RA if it levels off more than ZTHR feet away or requiring a non-crossing advisory for level-offs begun after ZTHR is crossed but before the 600 foot threshold is reached).

footnote. The vertical dimension, called ZTHR, used to determine whether advisories should be issued varies from 750 to 950 feet, depending on the TCAS aircraft’s altitude.

Again, the example above includes a pointer down to the part of the black-box component requirements (functional) specification (Alt_Separation_Test) that embodies the design principle. Links could also be provided to detailed mathematical analyses used to support and validate the design decisions.
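The deferral rule in 2.36.2 can be restated as a simple predicate over the encounter geometry, which is essentially what the Alt_Separation_Test logic at level 3 has to encode. The sketch below paraphrases only the rule as worded above; the function name and inputs are illustrative and are not the actual level 3 specification.

    def defer_altitude_crossing_ra(projected_to_cross_own_altitude: bool,
                                   vertical_separation_ft: float) -> bool:
        """Paraphrase of design principle 2.36.2: defer an altitude-crossing RA when the
        intruder is projected to cross own altitude but is still more than 600 feet away
        vertically, on the assumption that it will level off before reaching own altitude."""
        return projected_to_cross_own_altitude and vertical_separation_ft > 600

    print(defer_altitude_crossing_ra(True, 900))   # True: defer the crossing advisory
    print(defer_altitude_crossing_ra(True, 400))   # False: intruder too close vertically to defer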
As another example of using links to embed design rationale in the specification and of specifying limitations (defined later) and potential hazardous behavior that could not be controlled in the design, consider the following. TCAS II advisories may need to be inhibited because of an inadequate climb performance for the particular aircraft on which TCAS is installed. The collision avoidance maneuvers posted as advisories (called RAs or resolution advisories) by TCAS assume an aircraft’s ability to safely achieve them. If it is likely they are beyond the capability of the aircraft, then TCAS must know beforehand so it can change its strategy and issue an alternative advisory. The performance characteristics are provided to TCAS through the aircraft interface (via what are called aircraft discretes). In some cases, no feasible solutions to the problem could be found. An example design principle related to this problem found at level 2 of the TCAS intent specification is:

2.39: Because of the limited number of inputs to TCAS for aircraft performance inhibits, in some instances where inhibiting RAs would be appropriate it is not possible to do so (↑L6). In these cases, TCAS may command maneuvers that may significantly reduce stall margins or result in stall warning (↑SC9.1). Conditions where this may occur include . . . The aircraft flight manual or flight manual supplement should provide information concerning this aspect of TCAS so that flight crews may take appropriate action (↓ pointers to pilot procedures on level 3 and the Aircraft Flight Manual on level 6).
Finally, design principles may reflect tradeoffs between higher-level goals and constraints. As examples:

2.2.3: Tradeoffs must be made between necessary protection (↑1.18) and unnecessary advisories (↑SC.5, SC.6). This is accomplished by controlling the sensitivity level, which controls the tau, and therefore the dimensions of the protected airspace around each TCAS-equipped aircraft. The greater the sensitivity level, the more protection is provided but the higher is the incidence of unnecessary alerts. Sensitivity level is determined by . . .

2.38: The need to inhibit climb RAs because of inadequate aircraft climb performance will increase the likelihood of TCAS II (a) issuing crossing maneuvers, which in turn increases the possibility that an RA may be thwarted by the intruder maneuvering (↑SC7.1, HA-115), (b) causing an increase in descend RAs at low altitude (↑SC8.1), and (c) providing no RAs if below the descend inhibit level (1200 feet above ground level on takeoff and 1000 feet above ground level on approach).
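The tradeoff in 2.2.3 is mediated by a single parameter, the sensitivity level, which selects the tau thresholds and hence the size of the protected volume. The sketch below illustrates the shape of that relationship with an invented lookup table; the actual thresholds are defined in the TCAS specification and vary with altitude, so none of the numbers here should be taken as the real values.

    # Illustrative (invented) mapping from sensitivity level to alerting thresholds.
    # Larger tau thresholds mean a larger protected volume: more protection (1.18)
    # but more unnecessary advisories (SC.5, SC.6).
    SENSITIVITY_TABLE = {
        # level: (traffic-advisory tau threshold, resolution-advisory tau threshold) in seconds
        3: (25, 15),
        5: (40, 25),
        7: (48, 35),
    }

    def ra_tau_threshold_s(sensitivity_level: int) -> int:
        """Return the illustrative resolution-advisory tau threshold for a sensitivity level."""
        return SENSITIVITY_TABLE[sensitivity_level][1]

    for level in sorted(SENSITIVITY_TABLE):
        ta, ra = SENSITIVITY_TABLE[level]
        print(f"sensitivity level {level}: TA at tau <= {ta} s, RA at tau <= {ra} s")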
Architectural Design, Functional Allocation, and Component Implementation (Level 3)

Once the general system design concepts are agreed upon, the next step usually involves developing the design architecture and allocating behavioral requirements and constraints to the subsystems and components. Once again, two-way tracing should exist between the component requirements and the system design principles and requirements. These links will be available to the subsystem developers to be used in their implementation and development activities and in verification (testing and reviews). Finally, during field testing and operations, the links and recorded assumptions and design rationale can be used in safety change analysis, incident and accident analysis, periodic audits, and performance monitoring as required to ensure that the operational system is and remains safe.

Level 3 of an intent specification contains the system architecture, that is, the allocation of functions to components and the designed communication paths among those components (including human operators). At this point, a black-box functional requirements specification language becomes useful, particularly a formal language that is executable. SpecTRM-RL is used as the example specification language in this section [85, 86]. An early version of the language was developed in 1990 to specify the requirements for TCAS II and has been refined and improved since that time. SpecTRM-RL is part of a larger specification management system called SpecTRM (Specification Tools and Requirements Methodology). Other languages, of course, can be used.
One of the first steps in low-level architectural design is to break the system into a set of components. For TCAS, only three components were used: surveillance, collision avoidance, and performance monitoring.

The environment description at level 3 includes the assumed behavior of the external components (such as the altimeters and transponders for TCAS), including perhaps failure behavior, upon which the correctness of the system design is predicated, along with a description of the interfaces between the TCAS system and its environment. Figure 10.11 shows part of a SpecTRM-RL description of an environment component, in this case an altimeter. The dividing line between the system and its environment can be drawn wherever is convenient for the purposes of the specifier. In this example, the environment includes any component that was already on the aircraft or in the airspace control system and was not newly designed or built as part of the TCAS effort.
All communications between the system and external components need to be described in detail, including the designed interfaces. The black-box behavior of each component also needs to be specified. This specification serves as the functional requirements for the components. What is included in the component specification will depend on whether the component is part of the environment or part of the system being constructed. Figure 10.12 shows part of the SpecTRM-RL description of the behavior of the CAS (collision avoidance system) subcomponent. SpecTRM-RL specifications are intended to be both easily readable with minimum instruction and formally analyzable. They are also executable and can be used in a system simulation environment. Readability was a primary goal in the design of SpecTRM-RL, as was completeness with regard to safety. Most of the requirements completeness criteria described in Safeware and rewritten as functional design principles in chapter 9 of this book are included in the syntax of the language to assist in system safety reviews of the requirements.
SpecTRM-RL explicitly shows the process model used by the controller and describes the required behavior in terms of this model. A state machine model is used to describe the system component’s process model, in this case the state of the aircraft and the airspace around it, and the ways the process model can change state.

Logical behavior is specified in SpecTRM-RL using and/or tables. Figure 10.12 shows a small part of the specification of the TCAS collision avoidance logic. For TCAS, an important state variable is the status of the other aircraft around the TCAS aircraft, called intruders. Intruders are classified into four groups: Other Traffic, Proximate Traffic, Potential Threat, and Threat. The figure shows the logic for classifying an intruder as Other Traffic using an and/or table. The information in the tables can be visualized in additional ways.

The rows of the table represent and relationships, while the columns represent or. The state variable takes the specified value (in this case, Other Traffic) if any of the columns evaluate to true. A column evaluates to true if all the rows have the value specified for that row in the column. A dot in the table indicates that the value for the row is irrelevant. Underlined variables represent hyperlinks. For example, clicking on “Alt Reporting” would show how the Alt Reporting variable is defined: In our TCAS intent specification [121], the altitude report for an aircraft is defined as Lost if no valid altitude report has been received in the past six seconds. Bearing Valid, Range Valid, Proximate Traffic Condition, and Proximate Threat Condition are macros, which simply means that they are defined using separate logic tables. The additional logic for the macros could have been inserted here, but sometimes the logic gets very complex and it is easier for specifiers and reviewers if, in those cases, the tables are broken up into smaller pieces (a form of refinement abstraction). This decision is, of course, up to the creator of the table.
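The evaluation rule for an and/or table (columns are ORed together, and within a column every row must either match or be a don't-care) is easy to state directly in code, which is one reason the notation is both reviewable and executable. The sketch below implements just that rule; the rows and the classification criteria in the example are simplified placeholders, not the actual Other Traffic table from the TCAS specification.

    DONT_CARE = "."

    def and_or_table(row_values: list, columns: list) -> bool:
        """Evaluate an and/or table: true if ANY column is satisfied, where a column is
        satisfied when EVERY row cell either matches the current value of that row's
        condition or is a don't-care."""
        return any(
            all(cell == DONT_CARE or cell == value
                for value, cell in zip(row_values, column))
            for column in columns
        )

    # Simplified, made-up fragment of an intruder-classification table.
    # Row conditions: [altitude report is Lost, bearing is valid, proximate-traffic condition holds]
    current_values = [True, False, False]
    other_traffic_columns = [
        [True, DONT_CARE, DONT_CARE],   # column 1: the altitude report has been lost
        [False, True, False],           # column 2: valid data, but the intruder is not proximate
    ]

    print(and_or_table(current_values, other_traffic_columns))   # True (column 1 is satisfied)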
The behavioral descriptions at this level are purely black-box: They describe the inputs and outputs of each component and their relationships only in terms of externally visible behavior. Essentially it represents the transfer function across the component. Any of these components (except the humans, of course) could be implemented either in hardware or software. Some of the TCAS surveillance functions are, in fact, implemented using analog devices by some vendors and digital by others. Decisions about physical implementation, software design, internal variables, and so on are limited to levels of the specification below this one. Thus, this level serves as a rugged interface between the system designers and the component designers and implementers (including subcontractors).

Software need not be treated any differently than the other parts of the system. Most safety-related software problems stem from requirements flaws. The system requirements and system hazard analysis should be used to determine the behavioral safety constraints that must be enforced on software behavior and that the software must enforce on the controlled system. Once that is accomplished, those requirements and constraints are passed to the software developers (through the black-box requirements specifications), and they use them to generate and validate their designs just as the hardware developers do.

Other information at this level might include flight crew requirements such as description of tasks and operational procedures, interface requirements, and the testing requirements for the functionality described on this level. If the black-box requirements specification is executable, system testing can be performed early to validate requirements using system and environment simulators or hardware-in-the-loop simulation. Including a visual operator task-modeling language permits integrated simulation and analysis of the entire system, including human–computer interactions [15, 177].
Models at this level are reusable, and we have found that these models provide the best place to provide component reuse and build component libraries [119]. Reuse of application software at the code level has been problematic at best, contributing to a surprising number of accidents [116]. Level 3 black-box behavioral specifications provide a way to make the changes almost always necessary to reuse software in a format that is both reviewable and verifiable. In addition, the black-box models can be used to maintain the system and to specify and validate changes before they are made in the various manufacturers’ products. Once the changed level 3 specifications have been validated, the links to the modules implementing the modeled behavior can be used to determine which modules need to be changed and how. Libraries of component models can also be developed and used in a plug-and-play fashion, making changes as required, in order to develop product families [211].

The rest of the development process, involving the implementation of the component requirements and constraints and documented at levels 4 and 5 of intent specifications, is straightforward and differs little from what is normally done today.

footnote. A SpecTRM-RL model of TCAS was created by the author and her students Jon Reese, Mats Heimdahl, and Holly Hildreth to assist in the certification of TCAS II. Later, as an experiment to show the feasibility of creating intent specifications, the author created the level 1 and level 2 intent specification for TCAS. Jon Reese rewrote the level 3 collision avoidance system logic from the early version of the language into SpecTRM-RL.
section 10.3.9. Documenting System Limitations.

When the system is completed, the system limitations need to be identified and documented. Some of the identification will, of course, be done throughout the development. This information is used by management and stakeholders to determine whether the system is adequately safe to use, along with information about each of the identified hazards and how they were handled.

Limitations should be included in level 1 of the intent specification, because they properly belong in the customer view of the system and will affect both acceptance and certification.
Some limitations may be related to the basic functional requirements, such as these:

L4: TCAS does not currently indicate horizontal escape maneuvers and therefore does not (and is not intended to) increase horizontal separation.

Limitations may also relate to environment assumptions. For example:

L1: TCAS provides no protection against aircraft without transponders or with nonoperational transponders (→EA3, HA-430).

L6: Aircraft performance limitations constrain the magnitude of the escape maneuver that the flight crew can safely execute in response to a resolution advisory. It is possible for these limitations to preclude a successful resolution of the conflict (→H3, ↓2.38, 2.39).

L4: TCAS is dependent on the accuracy of the threat aircraft’s reported altitude. Separation assurance may be degraded by errors in intruder pressure altitude as reported by the transponder of the intruder aircraft (→EA5).

Assumption: This limitation holds for the airspace existing at the time of the initial TCAS deployment, where many aircraft use pressure altimeters rather than GPS. As more aircraft install GPS systems with greater accuracy than current pressure altimeters, this limitation will be reduced or eliminated.
Limitations are often associated with hazards or hazard causal factors that could not be completely eliminated or controlled in the design. Thus they represent accepted risks. For example,

L3: TCAS will not issue an advisory if it is turned on or enabled to issue resolution advisories in the middle of a conflict (→ HA-405).

L5: If only one of two aircraft is TCAS equipped while the other has only ATCRBS altitude-reporting capability, the assurance of safe separation may be reduced (→ HA-290).

In the specification, both of these system limitations would have pointers to the relevant parts of the hazard analysis along with an explanation of why they could not be eliminated or adequately controlled in the system design. Decisions about deployment and certification of the system will need to be based partially on these limitations and their impact on the safety analysis and safety assumptions of the encompassing system, which, in the case of TCAS, is the overall air traffic system.

A final type of limitation is related to problems encountered or tradeoffs made during system design. For example, TCAS has a high-level performance-monitoring requirement that led to the inclusion of a self-test function in the system design to determine whether TCAS is operating correctly. The following system limitation relates to this self-test facility:

L9: Use by the pilot of the self-test function in flight will inhibit TCAS operation for up to 20 seconds depending upon the number of targets being tracked. The ATC transponder will not function during some portion of the self-test sequence (↓6.52).

These limitations should be linked to the relevant parts of the development and, most important, operational specifications. For example, L9 may be linked to the pilot operations manual.
section 10.3.10. System Certification, Maintenance, and Evolution.

At this point in development, the safety requirements and constraints are documented and traced to the design features used to implement them. A hazard log contains the hazard information (or links to it) generated during the development process and the results of the hazard analysis performed. The log will contain embedded links to the resolution of each hazard, such as functional requirements, design constraints, system design features, operational procedures, and system limitations. The information documented should be easy to collect into a form that can be used for the final safety assessment and certification of the system.

Whenever changes are made in safety-critical systems or software (during development or during maintenance and evolution), the safety of the change needs to be reevaluated. This process can be difficult and expensive if it has to start from scratch each time. By providing links throughout the specification, it should be easy to assess whether a particular design decision or piece of code was based on the original safety analysis or a safety-related design constraint, so that only that part of the safety analysis process need be repeated or reevaluated.
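In practice, providing links throughout the specification means the hazard log becomes a linked data set, so the safety impact of a proposed change can be assessed by walking from the touched artifact back to any hazards and constraints that depended on it. The sketch below shows the shape of that query; the entry names are invented for illustration, and a real hazard log would carry far more information per entry.

    # Hypothetical linked hazard-log fragment: each artifact lists what it traces up to.
    TRACES_TO = {
        "cas_logic_module": ["design_principle_2.51"],   # code module -> level 2 design principle
        "design_principle_2.51": ["SC.7.2", "HA-395"],   # design principle -> safety constraint, hazard analysis
        "SC.7.2": ["H1"],                                # safety constraint -> system-level hazard
    }

    def safety_impact(artifact: str) -> set:
        """Follow trace links upward from a changed artifact and collect everything it
        ultimately depends on. If a hazard or safety constraint appears in the result,
        the change is safety related and that part of the analysis must be revisited."""
        seen = set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            for parent in TRACES_TO.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    # The change touches design_principle_2.51, SC.7.2, HA-395, and H1, so it is safety critical.
    print(safety_impact("cas_logic_module"))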