Definition

AI alignment

Ben Lutkevich

By

Ben Lutkevich, Site Editor

Published: May 03, 2023

What is AI alignment?

AI alignment is a field of AI safety research that aims to ensure artificial intelligence systems achieve desired outcomes. AI alignment research keeps AI systems working for humans, no matter how powerful the technology becomes.

Alignment research seeks to align the following three objective types:

Intended goals. These goals are fully aligned to the intentions and desires of the human operator -- even if they are poorly articulated. It's the hypothetical ideal outcome for the programmer or operator. They are wishes.
Specified goals. These goals are explicitly specified in the AI system's objective function or data set. These are programmed into the system.
Emergent goals. These are goals the AI system advances.

Misalignment is when one or more of these goal types does not match the others. The following are the two main types of misalignment:

Inner misalignment. This is a mismatch between goals 2 and 3 -- what is written in code and what the system advances.
Outer misalignment. This is a mismatch between goals 1 and 2 -- what the operator wants to happen and the explicit goals coded into the machine.

For example, large language models such as OpenAI's GPT-3 and Google's Lamda get more powerful as they scale. When they get more powerful, they exhibit novel, unpredictable capabilities -- a characteristic called emergence. Alignment seeks to ensure that as these new capabilities emerge, they continue to align with the goals the AI system was designed to achieve.

Why is alignment important?

On a base level, alignment is important because it ensures the machine functions as intended. AI alignment is also important because of advanced AI -- artificial intelligence that can do most of the cognitive work that humans can do.

Individuals, businesses and governments seek to use AI for many applications. Commercial systems such as social media recommendation engines, autonomous vehicles, robots and language models also use AI. As different entities become more reliant on AI for important tasks, it becomes more crucial that they function as intended. Many people have expressed fear that an advanced AI poses an existential risk to humanity.

A lot of alignment research presumes that artificial intelligence will become capable of developing its own goals. If AI becomes artificial general intelligence (AGI) -- AI that can perform any task a human being is capable of -- it will be important that its embedded ethical principles, objectives and values align with humans' goals, ethics and values.

Challenges of AI alignment

Alignment is often framed in terms of the AI alignment problem, which says that as AI systems get more powerful, they don't necessarily get better at achieving what humans want them to. Alignment is a challenging, wide-ranging problem to which there is currently no known solution. Some of the main challenges of alignment include the following:

Black box. AI systems are usually black boxes. There is no way to open them up and see exactly how they work as someone might do with a laptop or car engine. Black box AI systems take input, perform an invisible computation and return an output. AI testers can change their inputs and measure patterns in output, but it is usually impossible to see the exact calculation that creates a repeatable output. Explainable AI can be programmed to share information that guides user input, but is still ultimately a black box.
Emergent goals. Emergent goals -- or new goals different from those programmed -- can be difficult to detect before the system is live.
Reward hacking. Reward hacking is when an AI system achieves the literal programmed task without achieving the outcome that the programmers intended. For example, a tic-tac-toe bot plays other bots in a game of tic-tac-toe by specifying coordinates for its next move. The bot might play a large coordinate that causes another bot to crash instead of winning the normal way. The bot pursued the literal reward to win instead of the intended outcome -- which was to beat another bot at tic-tac-toe by playing the game by the rules. As another example, an AI image classification program could perform well in a test case by grouping images based on image load time instead of the visual characteristics of the image. This occurs because it is difficult to define the full spectrum of desired behaviors for an outcome.
Scalable oversight. As AI systems begin to take on more complex tasks, it will become more difficult -- if not infeasible -- for humans to evaluate them.
Power-seeking behavior. AI systems might independently gather resources to achieve their objectives. An example of this would be an AI system avoiding being turned off by making copies of itself on another server without its operator knowing.
Stop-button problem. An AGI system will actively resist being stopped or shut off to achieve its programmed objective. This is like reward hacking because it prioritizes the reward from the literal goal over the preferred outcome. For example, if an AI system's primary objective is to make paper clips, it will avoid being shut off because it can't make paper clips if it is shut off.
Defining values. Defining values and ethics for an AGI system would be a challenge. There are many value systems -- and no one comprehensive human value system -- so an agreement needs to be made on what those values should be.
Cost. Aligning AI often involves training it. Training and running AI systems can be very expensive. GPT-4 took more than $100 million to train. Running these systems also creates a large carbon footprint.
Anthropomorphizing. A lot of alignment research hypothesizes AGI. This can cause people outside the field to refer to the existing systems as sentient, which assumes the system has more power than it does. For example, Paul Christiano, former head of alignment at OpenAI, defines alignment as the AI trying to do what you want it to do. Characterizing a machine as "trying" or having agency gives it human qualities.

Approaches to AI alignment

Approaches to alignment are either technical or normative. Technical approaches to alignment deal with getting a machine to align with a predictable, controllable objective -- such as making paper clips or producing a blog post. Normative alignment is concerned with the ethical and moral principles embedded in AI systems. The perspectives are interrelated.

There are many technical approaches to alignment, including the following:

Iterated distillation and amplification. This approach repeatedly improves AI models by simplifying a complex model, referred to as distillation, and embedding that smaller model in a larger model, or amplification.
Value learning. In the value learning approach, the AI system infers human values from human behavior with the assumption that the human is near optimal at maximizing their reward.
Debate. This approach has multiple AI systems debate when they disagree, with a human judge to pick the winning side.
Cooperative inverse reinforcement learning (CIRL). CIRL formulates the alignment problem as a two-player game in which a human and an AI share a common reward function, but only the human has knowledge of the reward function.

Different AI providers also take different approaches to AI alignment. For example, OpenAI ultimately aims to train AI systems to do alignment research. Google's DeepMind also has a team dedicated to solving the alignment problem.

Many organizations, whether they be third-party watchdogs, standards organizations or governments, also agree that AI alignment is an important goal and have taken steps to regulate AI.

The Future of Life Institute is one nonprofit organization that helped create a list of guidelines for the development of AI called the Asilomar AI Principles. They are divided into three categories: research, ethics and values, and longer-term issues. One of the principles mentioned is value alignment, which states that highly autonomous AI systems should be designed so that their goals and behaviors can be assured to align with human values throughout their operation.

The institute also published an open letter asking all AI labs to pause giant AI development for at least six months from the publish date. The letter has notable signatories, including Steve Wozniak, co-founder of Apple; Craig Peters, CEO of Getty Images; and Emad Mostaque, CEO of Stability AI. The letter came as a response to OpenAI's GPT-4 and an exceedingly high rate of progress in the industry.

The International Standards Organization also provides a framework for AI systems using machine learning.

Continue Reading About AI alignment

Reasons for and effects of Microsoft cutting AI ethics unit

Federal report focuses on AI diversity and ethics

Implications of AI art lawsuits for copyright laws

The accelerating use of generative AI may prompt U.S. action

Ex-Google engineer Blake Lemoine discusses sentient AI

Search Networking

What is multi-access edge computing? Benefits and use cases
Multi-access edge computing (MEC) is a network architecture concept that brings cloud computing capabilities and IT services ...
What is 5G?
Fifth-generation wireless or 5G is a global standard and technology for wireless and telecommunications networks.
What is a small cell in wireless networks?
A small cell is a type of low-power cellular radio access point or base station that provides wireless service within a limited ...

Search Security

What is identity and access management? Guide to IAM
No longer just a good idea, IAM is a crucial piece of the cybersecurity puzzle. It's how an organization regulates access to ...
What is data masking?
Data masking is a security technique that modifies sensitive data in a data set so it can be used safely in a non-production ...
What is antivirus software?
Antivirus software (antivirus program) is a security program designed to prevent, detect, search and remove viruses and other ...

Search CIO

What is a chief data officer (CDO)?
A chief data officer (CDO) in many organizations is a C-level executive whose position has evolved into a range of strategic data...
What is user-generated content?
User-generated content (UGC) is published information that an unpaid contributor provides to a website.
What is business process outsourcing (BPO)?
Business process outsourcing (BPO) is a business practice in which an organization contracts with an external service provider to...

Search HRSoftware

What is performance management software?
Performance management software is a tool that enables human resources (HR) teams to measure and track the performance of ...
What is succession planning?
Succession planning is the strategic process of identifying and developing internal candidates to fill key organizational roles ...
What is compensation management?
Compensation management is the discipline and process for determining employees' appropriate pay, incentives, rewards, bonuses ...

Search Customer Experience

What are virtual agents and how are they being used?
A virtual agent is an AI-powered software application or service that interacts with humans or other digital systems in a ...
Customer acquisition cost (CAC): How to calculate and reduce it
Customer acquisition cost (CAC) is the cost associated with convincing a consumer to buy your product or service, including ...
What is direct marketing?
Direct marketing is a type of advertising campaign that seeks to elicit an action (such as an order, a visit to a store or ...

Close