Building AI Agents with the Vercel AI SDK
If a chatbot is like a helpful person at an information desk, an AI agent is like a personal assistant who can leave the desk, go out into the world, and get things done for you.
An agent is an autonomous system that can understand a high-level goal, break it down into steps, use tools to execute those steps, and adapt its plan based on the results. This is the leap from simply answering questions to actively solving problems.
How Do Agents Work?
At its core, an agent operates in a loop: it assesses a goal, chooses a tool, executes it, observes the result, and then decides what to do next. This process, often called a ReAct (Reason + Act) loop, allows the agent to tackle complex tasks that require multiple steps or access to external information.
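To make the loop concrete, here is a minimal sketch of it with no SDK involved. Everything in it is hypothetical and exists only for illustration: the `runAgent` function, the `Decision` and `Policy` types, the `double` tool, and the scripted policy that stands in for a real LLM.

```typescript
// Hypothetical, SDK-free sketch of an agent loop. None of these names come from the AI SDK.
type ToolCall = { tool: string; input: string };
type Decision =
  | { kind: 'call'; call: ToolCall }
  | { kind: 'finish'; answer: string };

// Each tool maps a string input to a string observation.
type Tools = Record<string, (input: string) => string>;

// The "model" sees the goal and all observations so far, then picks the next action.
type Policy = (goal: string, observations: string[]) => Decision;

function runAgent(goal: string, tools: Tools, decide: Policy, maxSteps: number): string {
  const observations: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    // Reason: decide what to do next based on the goal and what has been observed.
    const decision = decide(goal, observations);
    if (decision.kind === 'finish') return decision.answer;

    // Act: run the chosen tool, or report that it does not exist.
    const toolFn = tools[decision.call.tool];
    const observation = toolFn
      ? toolFn(decision.call.input)
      : `Unknown tool: ${decision.call.tool}`;

    // Observe: record the result so the next decision can use it.
    observations.push(observation);
  }
  return 'Stopped: step limit reached';
}

// A stand-in tool and a scripted policy in place of a real LLM.
const tools: Tools = {
  double: (input) => String(Number(input) * 2),
};

const scriptedPolicy: Policy = (_goal, observations) =>
  observations.length === 0
    ? { kind: 'call', call: { tool: 'double', input: '21' } }
    : { kind: 'finish', answer: `The result is ${observations[0]}` };

const result = runAgent('Double 21', tools, scriptedPolicy, 5);
console.log(result); // prints "The result is 42"
```

The step limit matters: without it, a policy that never finishes would loop forever, which is why agent frameworks usually expose a cap of this kind.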
The Vercel AI SDK makes building these agentic systems surprisingly simple. Instead of writing this loop by hand, you can let the SDK's generateText function manage it for you through its tool-calling and maxSteps features.
Building a Simple Multi-Step Agent
Let's build a simple math agent that can solve a multi-step word problem. The agent will have two tools: a calculator tool and an answer tool to deliver the final result.
Notice how we're not telling it when to use the calculator, just that it's available. The magic of the agent is that it will reason about the problem and decide for itself when a tool is necessary.
// lib/agents/math-agent.ts
// Written against the AI SDK 4.x API; newer major versions rename some of these options.
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
import * as mathjs from 'mathjs';
// Ensure you have a .env file with your OPENAI_API_KEY
import 'dotenv/config';

async function main() {
  const { text, toolCalls, finishReason } = await generateText({
    model: openai('gpt-4o'),
    // `maxSteps` caps how many sequential steps (model generations) the SDK runs.
    // Tool results from one step are fed back to the model in the next.
    maxSteps: 5,
    tools: {
      // A calculator tool that the model can use to evaluate math expressions.
      calculator: tool({
        description: 'A tool to evaluate mathematical expressions.',
        parameters: z.object({
          expression: z.string().describe('The mathematical expression to evaluate.'),
        }),
        execute: async ({ expression }) => mathjs.evaluate(expression),
      }),
      // An answer tool that the model uses to provide the final answer.
      answer: tool({
        description: 'A tool to provide the final answer to the user.',
        parameters: z.object({
          finalAnswer: z.string().describe('The final answer.'),
          explanation: z.string().describe('A step-by-step explanation of how the answer was derived.'),
        }),
        // No `execute` function: when the model calls this tool, the agent loop ends.
      }),
    },
    prompt: `You are a helpful math assistant.
A user is asking a question about a word problem.
Reason step-by-step and use the calculator tool when needed.
When you have the final answer, use the "answer" tool to provide it.
Question: A taxi driver earns $9461 in a 1-hour shift.
If they work 12 hours a day and use 14 liters of petrol per hour at a price of $134 per liter,
how much money does the driver earn in one day after deducting petrol costs?`,
  });

  console.log('---');
  // `toolCalls` holds the tool calls from the last step only.
  if (finishReason === 'tool-calls') {
    const answerToolCall = toolCalls.find(call => call.toolName === 'answer');
    if (answerToolCall) {
      console.log('Final Answer:', answerToolCall.args.finalAnswer);
      console.log('\nExplanation:\n', answerToolCall.args.explanation);
    } else {
      console.log('The agent did not provide a final answer.');
    }
  } else {
    // The model replied in plain text instead of calling the answer tool.
    console.log('Final Text Response:', text);
  }
  console.log('---');
}

// Surface API or network errors instead of leaving an unhandled rejection.
main().catch(console.error);
In this example:
- We give the model a prompt and a set of tools.
- We set maxSteps: 5, which lets the AI SDK run up to 5 sequential steps. Each step is one model generation, and a step may include one or more tool calls.
- The model first reasons that it needs to calculate the total earnings and total fuel cost. It calls the calculator tool, typically more than once, to do this.
- After each tool call, the result is fed back into the model's context, allowing it to plan its next step.
- Once it has all the information it needs, it calls the answer tool to present the final, structured result. Because the answer tool has no execute function, there is no tool result to feed back to the model, so the AI SDK treats the call as a terminal step and stops the loop.
This is the power of agentic design. We provided the goal and the tools, and the agent figured out the rest.
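Because the model's output is not guaranteed to be correct, it helps to know what the answer should be. The arithmetic from the prompt can be worked out directly as a sanity check. The variable names below are illustrative only; the figures come from the question in the example.

```typescript
// Figures taken from the word problem in the prompt.
const hourlyEarnings = 9461;  // dollars earned per hour
const hoursPerDay = 12;       // hours worked per day
const litersPerHour = 14;     // petrol used per hour
const pricePerLiter = 134;    // dollars per liter

// Gross earnings for the day: 9461 * 12 = 113532
const grossEarnings = hourlyEarnings * hoursPerDay;

// Petrol cost for the day: 14 * 134 * 12 = 22512
const petrolCost = litersPerHour * pricePerLiter * hoursPerDay;

// Net earnings after petrol: 113532 - 22512 = 91020
const netEarnings = grossEarnings - petrolCost;

console.log({ grossEarnings, petrolCost, netEarnings });
```

If the agent's finalAnswer differs from 91020, the explanation field is the first place to look for the step where its reasoning went wrong.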
Key Concepts for Building Agents
- Tool Definition: Tools are the actions your agent can take. They should be well-described so the LLM knows when and how to use them. Defining parameters with zod lets the arguments the model generates be validated against a schema, so your tool code receives correctly structured data.
- State Management: For more complex agents, you'll need a way to persist information between interactions. This could be a simple object in memory, or a more robust solution like a database.
- Planning: An agent depends on the LLM's ability to break down a large goal into smaller, manageable steps. This is often guided by a strong system prompt that encourages the model to "think step-by-step."
- Multi-Agent Systems: For very complex workflows, you can even have multiple specialized agents collaborate. For example, a "researcher" agent could find information, and a "writer" agent could use that information to compose a report.
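As a rough illustration of the simplest state management option mentioned above, an object in memory, here is a sketch of a per-session message store. The `SessionStore` class, the `Message` type, and the session ID are hypothetical names chosen for this example, not AI SDK APIs, and a real deployment would likely swap the Map for a database.

```typescript
// Hypothetical in-memory session store; names are illustrative, not AI SDK APIs.
type Message = { role: 'user' | 'assistant'; content: string };

class SessionStore {
  private sessions = new Map<string, Message[]>();

  // Return a copy of the history so callers cannot mutate stored state.
  history(sessionId: string): Message[] {
    return [...(this.sessions.get(sessionId) ?? [])];
  }

  // Append one message to a session, creating the session if needed.
  append(sessionId: string, message: Message): void {
    const existing = this.sessions.get(sessionId) ?? [];
    this.sessions.set(sessionId, [...existing, message]);
  }
}

const store = new SessionStore();
store.append('session-1', { role: 'user', content: 'What is 2 + 2?' });
store.append('session-1', { role: 'assistant', content: '4' });

// Each session keeps its own history; unknown sessions start empty.
console.log(store.history('session-1').length); // prints 2
console.log(store.history('session-2').length); // prints 0
```

Keeping the store behind a small interface like this means the in-memory version can be replaced later without touching the agent code that reads and writes history.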
Building AI agents is one of the most exciting frontiers in software development. With a framework like the Vercel AI SDK, you can move from simple Q&A bots to creating truly autonomous systems that can reason, plan, and solve real-world problems.