What is chain-of-thought prompting (CoT)? Examples and benefits

By Cameron Hashemi-Pour, Former Site Editor; Lev Craig

Published: Jan 13, 2025

Chain-of-thought (CoT) prompting is a prompt engineering technique that aims to improve language models' performance on tasks requiring logic, calculation and decision-making by structuring the input prompt in a way that mimics human reasoning.

To construct a CoT prompt, a user typically appends an instruction such as "Describe your reasoning in steps" or "Explain your answer step by step" to the end of their query to a large language model (LLM). In essence, this prompting technique asks the LLM to not only generate a result, but also detail the series of intermediate steps that led to that answer.
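
As a minimal sketch of that construction, the Python snippet below appends a reasoning instruction to a query. The function name `make_cot_prompt` and the sample question are illustrative assumptions, not part of any library; the resulting string would then be sent to whichever LLM the user works with.

```python
def make_cot_prompt(question: str,
                    instruction: str = "Explain your answer step by step.") -> str:
    """Append a reasoning instruction to the end of a query (hypothetical helper)."""
    return f"{question.strip()} {instruction}"


# Illustrative question, chosen only for this example.
prompt = make_cot_prompt(
    "A train travels 60 miles in 1.5 hours. What is its average speed?"
)
print(prompt)
```

Keeping the instruction as a parameter makes it easy to swap in other phrasings, such as "Describe your reasoning in steps."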

Guiding the model to articulate these intermediate steps has shown promising results. In "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," a seminal paper presented at the 2022 NeurIPS conference, researchers from the Google Brain team -- now part of Google DeepMind -- found that CoT prompting outperformed standard prompting techniques on a range of arithmetic, commonsense and symbolic reasoning benchmarks.

How does CoT prompting work?

CoT prompting takes advantage of a core LLM capability, the ability to generate fluent language, to simulate techniques from human cognitive processing, such as planning and sequential reasoning.

When people are confronted with a challenging problem, they often break it down into smaller, more manageable pieces. For example, solving a complex math equation typically involves several substeps, each of which is essential to arriving at the final correct answer. CoT prompting asks an LLM to mimic this process of decomposing a problem and working through it step by step -- essentially, asking the model to "think out loud," rather than simply providing a solution.

The screenshot below shows an example of how chain-of-thought prompting works. The user presents OpenAI's ChatGPT with a classic river-crossing logic puzzle, adding the phrase "Describe your reasoning step by step" at the end of the prompt. When the chatbot responds, it sequentially works through the problem, describing each crossing leading up to the final answer.

[Screenshot: GPT-4 provides a step-by-step solution to a logic puzzle in response to a chain-of-thought prompt.]

The following are other examples of CoT prompts:

"John has one pizza, cut into eight equal slices. John eats three slices, and his friend eats two slices. How many slices are left? Explain your reasoning step by step."
"Alice left a glass of water outside overnight when the temperature was below freezing. The next morning, she found the glass cracked. Explain step by step why the glass cracked."
"If all roses are flowers, and some flowers fade quickly, can we conclude that some roses fade quickly? Explain your reasoning in steps."
"A classroom has two blue chairs for every three red chairs. If there are a total of 30 chairs in the classroom, how many blue chairs are there? Describe your reasoning step by step."

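For reference, the three example prompts above that have a definite answer can be checked directly. The Python sketch below works through the pizza and chair arithmetic and builds a small counterexample for the roses question; the variable names and the sample sets are made up for illustration. It gives a reader a ground truth for judging whether a model's step-by-step response ends in the right place.

```python
from fractions import Fraction

# Pizza: 8 slices, John eats 3, his friend eats 2.
slices_left = 8 - 3 - 2

# Chairs: blue and red come in a 2:3 ratio, so blue chairs are 2/5 of the total.
blue_chairs = Fraction(2, 2 + 3) * 30

# Roses: a counterexample in which both premises hold but the conclusion fails.
flowers = {"rose", "tulip", "poppy"}
roses = {"rose"}
fades_quickly = {"poppy"}

all_roses_are_flowers = roses <= flowers             # premise 1 holds
some_flowers_fade = bool(flowers & fades_quickly)    # premise 2 holds
some_roses_fade = bool(roses & fades_quickly)        # conclusion does not follow

print(slices_left)      # 3
print(blue_chairs)      # 12
print(some_roses_fade)  # False
```
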
Different approaches to CoT prompting

CoT prompting has multiple variants, each of which uses a different approach to getting LLMs to explain their outputs:

Auto-CoT. In automatic CoT, the worked examples are generated rather than handwritten. A set of representative questions is selected, the LLM produces a step-by-step reasoning chain for each one using a zero-shot instruction, and those question-and-reasoning pairs are then included as demonstrations in later prompts. This spares the user from crafting the intermediate steps manually.
Multimodal CoT. LLMs that are capable of processing inputs besides text -- such as audio, image and video -- are multimodal AI. An example of multimodal CoT would be asking an LLM to examine images when explaining and justifying outputs.
Zero-shot CoT. With this approach, the user doesn't provide an LLM with any examples for it to reference, instead asking it to "show its work" and explain how it achieved its output. This process is efficient, but not as effective for complex inputs; zero-shot chain of thought is best suited for simpler problems.
Least-to-most CoT. With this approach, a large problem is broken into smaller subproblems -- by the user or by the LLM itself -- that are then sent to the LLM sequentially. The LLM can then solve each subsequent subproblem more easily using the answers to previous subproblems for reference.
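
The least-to-most flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: `least_to_most` is a hypothetical helper, `fake_llm` stands in for a real model call by returning canned answers, and the subproblems are one possible hand-made decomposition of the classroom chairs example.

```python
from typing import Callable, List


def least_to_most(subproblems: List[str], ask: Callable[[str], str]) -> List[str]:
    """Send subproblems in order, carrying earlier Q&A pairs forward as context."""
    answers: List[str] = []
    context = ""
    for subproblem in subproblems:
        prompt = f"{context}Q: {subproblem}\nA:"
        answer = ask(prompt)
        answers.append(answer)
        # Later prompts include every earlier question and answer.
        context += f"Q: {subproblem}\nA: {answer}\n"
    return answers


# Stand-in for a real model: records each prompt and returns canned answers.
seen_prompts: List[str] = []
canned_answers = iter(["5 chairs", "6 groups", "12 blue chairs"])


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return next(canned_answers)


subproblems = [
    "With two blue chairs for every three red chairs, how many chairs are in one group?",
    "How many such groups fit into 30 chairs?",
    "With two blue chairs per group, how many blue chairs are there in total?",
]
answers = least_to_most(subproblems, fake_llm)
print(seen_prompts[-1])  # the final prompt contains both earlier answers
```

Swapping `fake_llm` for a function that calls an actual model would leave the loop unchanged.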

Standard prompting and plain few-shot prompting are similar to CoT, but aren't considered CoT on their own. Standard prompting doesn't require LLMs to provide complex reasoning and justify their outputs; producing an output is all that matters in the standard approach. Few-shot prompting means a user provides examples of desired outputs to similar inputs -- such as answers to similar math problems -- to help guide the LLM. When those examples show only final answers, the prompt doesn't count as CoT; when they also spell out the intermediate reasoning, the approach becomes few-shot CoT, the setup used in the original Google paper.
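
The difference between these prompt styles comes down to what text the model receives. The sketch below builds four prompts for the same question; the `Q:`/`A:` layout, the variable names and the handwritten reasoning text are assumptions made for illustration. Only the last two prompts ask for or demonstrate intermediate steps.

```python
question = (
    "A classroom has two blue chairs for every three red chairs. "
    "If there are a total of 30 chairs in the classroom, how many blue chairs are there?"
)

# A solved example to include in the few-shot prompts.
example_question = (
    "John has one pizza, cut into eight equal slices. John eats three slices, "
    "and his friend eats two slices. How many slices are left?"
)
example_answer = "3"
example_reasoning = (
    "The pizza has 8 slices. John eats 3, leaving 8 - 3 = 5. "
    "His friend eats 2, leaving 5 - 2 = 3."
)

# Standard prompting: the question alone.
standard_prompt = f"Q: {question}\nA:"

# Few-shot prompting: an example with its answer, but no reasoning.
few_shot_prompt = f"Q: {example_question}\nA: {example_answer}\n\nQ: {question}\nA:"

# Few-shot CoT: the example also shows the intermediate steps.
few_shot_cot_prompt = (
    f"Q: {example_question}\n"
    f"A: {example_reasoning} The answer is {example_answer}.\n\n"
    f"Q: {question}\nA:"
)

# Zero-shot CoT: no examples, only an instruction to reason step by step.
zero_shot_cot_prompt = f"Q: {question} Describe your reasoning step by step.\nA:"

print(few_shot_cot_prompt)
```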

Advantages of CoT prompting

CoT prompting offers several advantages:

Better responses. LLMs can only take in a limited amount of information at one time. Breaking down complex problems into simpler subtasks helps mitigate this issue. It lets LLMs process those smaller components individually, leading to more accurate and precise model responses.
Expanded knowledge base. CoT prompting takes advantage of LLMs' extensive pool of general knowledge. LLMs are exposed to a wide array of explanations, definitions and problem-solving examples during their training on vast textual data sets, encompassing books, articles and much of the open internet. CoT prompts tap into this reservoir of stored knowledge by triggering the model to call on and apply relevant information.
Logical reasoning. The technique directly targets a common limitation of LLMs: difficulty with logical reasoning. Although LLMs excel at generating coherent, relevant text, they weren't primarily designed to provide information or solve problems. Consequently, they often struggle with logic, especially in multistep problems. CoT prompting addresses this issue by guiding the model to take a structured reasoning approach. It directs the model to construct a logical pathway from the original prompt or problem statement to the final answer, reducing the likelihood of logical missteps and oversights.
Debugging. CoT prompting assists with model debugging and improvement by providing transparency in the process by which a model arrives at its answer. Because the prompts ask the model to explicitly delineate a reasoning process, they give model testers and developers better insight into how the model reached a particular conclusion. This, in turn, makes it easier to identify and correct errors when refining the model.
Fine-tuning. Developers can combine CoT prompting with fine-tuning to enhance LLM reasoning capabilities. For example, fine-tuning a model on a training data set containing curated examples of step-by-step reasoning and logical deduction can improve the effectiveness of CoT prompting.
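
To illustrate the debugging benefit, the sketch below splits a step-by-step response into individual steps and checks any simple arithmetic the steps contain. The response text is handwritten for this example rather than real model output, and the `Step N:` and `Final answer:` labels are an assumed format that a prompt would have to request; the helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Handwritten sample response in an assumed "Step N:" format.
sample_response = """Step 1: The pizza starts with 8 slices.
Step 2: John eats 3 slices, so 8 - 3 = 5 remain.
Step 3: His friend eats 2 slices, so 5 - 2 = 3 remain.
Final answer: 3"""


def split_steps(response: str) -> Tuple[List[str], Optional[str]]:
    """Return the text of each labeled step and the final answer, if present."""
    steps = re.findall(r"^Step \d+:\s*(.+)$", response, flags=re.MULTILINE)
    match = re.search(r"^Final answer:\s*(.+)$", response, flags=re.MULTILINE)
    final = match.group(1).strip() if match else None
    return steps, final


def bad_equations(step: str) -> List[str]:
    """List any 'a + b = c' or 'a - b = c' claims in a step that don't add up."""
    bad = []
    for a, op, b, c in re.findall(r"(\d+)\s*([+-])\s*(\d+)\s*=\s*(\d+)", step):
        expected = int(a) + int(b) if op == "+" else int(a) - int(b)
        if expected != int(c):
            bad.append(f"{a} {op} {b} = {c}")
    return bad


steps, final = split_steps(sample_response)
for number, step in enumerate(steps, start=1):
    print(number, bad_equations(step) or "ok")
print("final:", final)
```

A check like this only catches arithmetic slips; a step can still be well-formed and logically wrong, so the output is an aid to manual review rather than a replacement for it.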

Limitations of CoT prompting

Importantly, as the Google research team highlighted in its paper, the semblance of reasoning that CoT prompts elicit from LLMs doesn't mean the model is thinking. It's essential to remember that the model is a deep learning neural network trained to predict text sequences based on probability. There's no evidence to suggest that LLMs are capable of reasoning as people do. This distinction is crucial for users to understand the limitations of LLMs and maintain realistic expectations about their capabilities.

LLMs lack consciousness and metacognition, and their general knowledge derives solely from their training data -- reflecting that data set's errors, gaps and biases. Although an LLM can accurately mimic the structure of logical reasoning, this doesn't mean its conclusions are always accurate. CoT prompts serve as a valuable organizing mechanism for LLM output, but an LLM could nevertheless present a coherent, well-structured output that contains logical errors and oversights.

Techniques such as retrieval-augmented generation (RAG) show promise for mitigating this limitation. RAG lets an LLM access an external source -- such as a vetted database or the internet -- in real time when asked to deliver information. In this way, RAG reduces the LLM's reliance on the internal knowledge gleaned from its training data, which might be flawed or incomplete.

However, while RAG can improve the accuracy and timeliness of an LLM's outputs, it doesn't inherently address the problem of logical reasoning. Deduction and reasoning require more than just factual recall; they also involve the ability to derive conclusions through logic and analysis. These are aspects of AI performance that are more closely related to the algorithmic architecture and training of the LLM itself.
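
The division of labor can be seen in a toy example: retrieval supplies facts, while the CoT instruction still has to ask for the reasoning. The sketch below is a deliberately simplified assumption, ranking documents by shared words where real RAG systems typically use search indexes or embeddings; the documents, function names and prompt wording are all made up for illustration.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents sharing the most words with the question."""
    query_words = tokenize(question)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words & tokenize(doc)),
                    reverse=True)
    return ranked[:k]


def build_rag_cot_prompt(question: str, documents: List[str], k: int = 2) -> str:
    """Combine retrieved context with a step-by-step reasoning instruction."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Use the context below to answer the question.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question} Explain your reasoning step by step."
    )


# Made-up document store for the example.
documents = [
    "Roses are flowering plants in the genus Rosa.",
    "A rigid glass can crack under pressure from inside.",
    "Water expands by roughly 9 percent when it freezes.",
]
question = "Why did the glass of water crack when it froze overnight?"
print(build_rag_cot_prompt(question, documents))
```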

Also, the scalability of CoT prompting remains in question. Although the underlying principle of sequential, multistep reasoning is applicable across AI and machine learning, CoT prompting itself depends on the sophisticated language abilities of LLMs, so it's currently limited to those models.

LLMs' large size requires significant data, compute and infrastructure, which raises issues around accessibility, efficiency and sustainability. In response to this problem, AI researchers have developed small language models, which -- while less powerful than LLMs -- perform competitively on various language tasks and require fewer computational resources. However, the original Google paper reported that CoT prompting improved results mainly in models with roughly 100 billion parameters or more, and it remains to be seen whether the benefits transfer to smaller models, whose reduced capabilities might limit their ability to work through multistep problems.

It's important to keep in mind that CoT prompting is a technique for using an existing model more effectively, not a training method. While these prompts can help users elicit better results from pretrained LLMs, prompt engineering isn't a cure-all and can't fix model limitations that should have been handled during the training stage.

CoT prompting vs. prompt chaining

Chain-of-thought prompting and prompt chaining sound similar and are both prompt engineering techniques, but they differ in some important ways.

CoT prompting asks the model to lay out, within one response, the intermediate steps that lead to its final answer. This is useful for complex tasks that require detailed explanation, planning and reasoning, such as math problems and logic puzzles, where explaining the thought process is essential to fully understanding the solution.

In contrast, prompt chaining involves an iterative sequence of prompts and responses, in which each subsequent prompt is formulated based on the model's output in response to the previous one. This makes prompt chaining a useful technique for more creative, exploratory tasks that involve gradual refinement, such as generating detailed narratives and brainstorming ideas.

The fundamental difference between CoT prompting and prompt chaining lies in iteration and interactivity. CoT prompting presents the reasoning process within a single detailed, self-contained response. Prompt chaining takes a more dynamic approach, with multiple rounds of interaction that enable users to develop an idea over time.
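
In code terms, the contrast is one model call versus several. The sketch below is a minimal illustration: `cot_single_call` and `prompt_chain` are hypothetical helpers, the way earlier output is pasted into the next prompt is just one possible convention, and `fake_llm` is a stand-in that counts calls instead of contacting a real model.

```python
from typing import Callable, List


def cot_single_call(question: str, ask: Callable[[str], str]) -> str:
    """CoT prompting: one prompt, one self-contained response."""
    return ask(f"{question} Describe your reasoning step by step.")


def prompt_chain(first_prompt: str, follow_ups: List[str],
                 ask: Callable[[str], str]) -> List[str]:
    """Prompt chaining: each follow-up prompt includes the previous output."""
    outputs = [ask(first_prompt)]
    for follow_up in follow_ups:
        outputs.append(ask(f"{follow_up}\n\nPrevious output:\n{outputs[-1]}"))
    return outputs


# Stand-in for a real model: records prompts and returns numbered replies.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"<reply {len(calls)}>"


cot_single_call("How many blue chairs are there?", fake_llm)
calls_after_cot = len(calls)  # a single call

drafts = prompt_chain(
    "Brainstorm three story ideas.",
    ["Expand the strongest idea into an outline.", "Tighten the outline."],
    fake_llm,
)
print(calls_after_cot, len(calls) - calls_after_cot)  # 1 3
```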

Use cases of CoT prompting

CoT is more than just an AI technique for LLM users and tech enthusiasts. There are real-world uses for CoT that help organizations perform tasks such as the following:

Understanding regulations. Legal experts can use chain-of-thought prompting to direct an LLM to explain new or existing regulations -- such as laws surrounding data privacy -- and how those apply to their organization. This approach can also apply to writing new internal policies.
Educating new employees. An LLM can teach an organization's new hires about its internal policies. For example, a new hire can use CoT prompting to ask an LLM which policies would apply to a specific circumstance and why.
Answering customer queries. AI-powered chatbots are commonly used for customer interactions across industries. For example, when a customer needs to complete a complex troubleshooting process, a chatbot can explain which actions to perform and why.
Managing logistics and supply chains. A logistics or transportation company could rely on this technique when asking an LLM to craft a better logistics strategy. The LLM would have to explain its answers and how they optimize logistics operations.
Creating original content. Generative AI tools can draft and organize text in a way that's easy for readers to understand and explain the choices they made. Long-form content, such as complex scientific research papers, could benefit from this approach.

CoT prompting is one of multiple advanced strategies involved in prompt engineering.
