AI Safety

Below is a sampling of high-level areas covered by our work products. This list is neither a complete inventory of what we will address in the CLAIS Knowledge Graph nor a comprehensive survey of topics in the field of AI safety.

AI Safety Foundations

Foundational ideas, characteristics, and problems related to agency and AI safety need special consideration from a theoretical perspective. This includes concepts such as uncertainty, generality, self-modeling, intrinsic motivation, goal stability, and intent alignment, as well as characteristics such as autonomy levels, safety criticality, and types of human-machine, machine-machine, and environment-machine interaction.

Specification and Modelling

Safety-critical systems need to be described from different technical perspectives and at different abstraction levels, covering their needs, intents, designs, and understood operating dynamics. This includes the specification and modelling of risk management properties (e.g., hazards, failure modes, mitigation measures), as well as safety-related requirements, training, behavior, and quality metrics in AI-based systems.
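
As a minimal sketch of what a machine-readable risk-management specification might look like, the example below models a hazard, its failure mode, and its mitigation measures as plain data structures. All identifiers, field names, and the severity scale are illustrative assumptions, not elements of any particular standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    """Illustrative severity scale; a real project would follow a domain standard."""
    MINOR = 1
    MAJOR = 2
    CATASTROPHIC = 3


@dataclass
class Mitigation:
    identifier: str
    description: str
    verified: bool = False  # whether evidence exists that the measure works


@dataclass
class Hazard:
    identifier: str
    description: str
    failure_mode: str
    severity: Severity
    mitigations: list[Mitigation] = field(default_factory=list)

    def unmitigated(self) -> bool:
        """A hazard counts as open if no verified mitigation covers it."""
        return not any(m.verified for m in self.mitigations)


# Example: a hazard record for a hypothetical perception subsystem.
h = Hazard(
    identifier="HAZ-001",
    description="Pedestrian not detected in low light",
    failure_mode="False negative in object detector",
    severity=Severity.CATASTROPHIC,
    mitigations=[Mitigation("MIT-001", "Redundant thermal sensor fusion")],
)
print(h.unmitigated())  # True until MIT-001 is marked as verified
```

Records of this kind make traceability queries (e.g., "which catastrophic hazards lack a verified mitigation?") straightforward to automate.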

Verification and Validation

This category concerns design-time approaches to ensure that an AI-based system meets its requirements, i.e., verification, and behaves as expected and intended, i.e., validation. The range of techniques covers any formal, mathematical, model-based, simulation, or testing approach that provides evidence that an AI-based system satisfies its defined (safety) requirements and neither deviates from its intended behavior nor causes unintended (negative) consequences.
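
As a toy illustration of a testing-based approach, the sketch below randomly samples operating conditions and checks a stated safety requirement against a hypothetical braking controller; both the controller and the requirement are invented for illustration. Sampling of this kind yields falsification evidence, not proof.

```python
import random


def controller(speed: float, distance: float) -> float:
    """Hypothetical braking controller: returns a brake force in [0, 1]."""
    return min(1.0, max(0.0, (speed / max(distance, 1e-6)) * 0.5))


def safety_property(speed: float, distance: float) -> bool:
    """Assumed requirement: close obstacles at speed must trigger hard braking."""
    brake = controller(speed, distance)
    if distance < 5.0 and speed > 10.0:
        return brake >= 0.9
    return 0.0 <= brake <= 1.0


# Falsification by random sampling over the assumed operating domain.
random.seed(0)
for _ in range(10_000):
    s = random.uniform(0.0, 40.0)   # speed in m/s
    d = random.uniform(0.1, 100.0)  # obstacle distance in m
    assert safety_property(s, d), f"counterexample: speed={s:.2f}, distance={d:.2f}"
print("no counterexample found in 10,000 samples")
```

In practice, naive sampling would be replaced or complemented by property-based testing tools, coverage-guided search, or formal verification where the system model permits it.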

Runtime Monitoring and Enforcement

The increasing autonomy and learning nature of AI-based systems can challenge their verification and validation (V&V), since it may be impossible to collect an epistemologically sufficient body of evidence to ensure correctness at design time. Runtime monitoring covers the gaps left by design-time V&V by observing the internal states of a system and its interactions with external entities, with the aim of determining the correctness of system behavior or predicting potential risks. Enforcement deals with runtime mechanisms that self-adapt, optimize, or reconfigure system behavior, with the aim of supporting fallback to a safe system state from any anomalous state.
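
A minimal sketch of this monitor-plus-enforcement pattern, assuming a hypothetical system whose observable state is a single bounded sensor reading; the bounds and the fallback action are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SafetyMonitor:
    """Watches observed states and enforces a fallback when a bound is violated."""
    lower: float
    upper: float
    fallback: Callable[[], None]  # enforcement action, e.g. a safe-stop routine
    violations: int = 0

    def observe(self, value: float) -> None:
        # Monitoring: check the runtime invariant lower <= value <= upper.
        if not (self.lower <= value <= self.upper):
            self.violations += 1
            # Enforcement: steer the system back to a safe state.
            self.fallback()


def safe_stop() -> None:
    print("anomaly detected: switching to safe-stop mode")


monitor = SafetyMonitor(lower=0.0, upper=100.0, fallback=safe_stop)
for reading in [42.0, 87.5, 130.2]:  # simulated sensor stream; last value is anomalous
    monitor.observe(reading)
print(f"violations observed: {monitor.violations}")
```

Real monitors check far richer properties (temporal logic formulas, out-of-distribution scores, confidence envelopes), but the separation between observing behavior and enforcing a safe fallback is the same.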

Human-Machine Interaction

As autonomous systems progressively take over cognitive tasks once performed by humans, certain human-machine interaction issues become more critical, such as loss of situational awareness and overconfidence. Other relevant topics include: collaborative missions that need unambiguous communication to manage the initiative to start or transfer tasks; safety-critical situations in which earning and maintaining trust is essential during operational phases; and cooperative human-machine decision tasks in which understanding machine decisions is crucial to validating safe autonomous actions.

Process Assurance and Certification

Process Assurance is the set of planned and systematic activities that ensure a system's lifecycle processes conform to its requirements (including safety) and quality procedures. In our context, it covers the management of the different phases of AI-based systems (including training and operational phases), the traceability of data and artefacts, and best practices for the related human activities. Certification implies recognition that a system or process complies with industry standards, regulations, or an auditing checklist to ensure it delivers its intended functions safely. Certification is sometimes challenged by the opacity of AI-based systems and the inability to ensure functional safety a priori under uncertain or exceptional situations.

Safety-related Ethics, Security, and Privacy

While ethics, security, and privacy are large fields in their own right, our scope is their intersection with, and dependencies on, safety issues. Ethics becomes increasingly important for trustworthy systems because autonomy, replete with learning and adaptive abilities, involves the transfer of safety risks, responsibility, and liability. AI-specific security and privacy issues must be considered with regard to their impact on safety; for example, malicious adversarial attacks can be studied with a focus on how they can push systems into dangerous states.
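
As a concrete illustration of the kind of attack meant here, the sketch below applies the fast gradient sign method (FGSM) to a toy logistic-regression model; the weights, input, and perturbation budget are invented for illustration, and a real attack would target a trained perception or decision model.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical trained linear classifier and a benign input of class 1.
w = np.array([1.0, -1.0, 0.5])
b = 0.0
x = np.array([0.4, -0.3, 0.2])
y = 1.0

p = sigmoid(w @ x + b)

# FGSM: perturb the input in the direction that increases the loss.
# For a logistic model, the cross-entropy gradient w.r.t. x is (p - y) * w.
grad_x = (p - y) * w
eps = 0.5
x_adv = x + eps * np.sign(grad_x)

# The small perturbation flips the predicted class across the 0.5 threshold.
print(f"clean confidence for class 1:       {p:.3f}")
print(f"adversarial confidence for class 1: {sigmoid(w @ x_adv + b):.3f}")
```

Studied through a safety lens, the relevant question is not only whether such perturbations exist, but whether they can drive a system into a hazardous state that its runtime safeguards fail to catch.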