The complete developer guide — from understanding AI agents to production-ready implementations with React, Next.js, tool calling, streaming, memory, and security best practices.
G-Tech Blog | 2026 | 25 min read
AI agents can make your web applications smarter, more interactive, and dramatically more useful. Whether you want to build a customer support chatbot, a code assistant, a document analyzer, or a fully autonomous workflow agent that calls external APIs and takes actions on behalf of your users — this guide covers everything you need to know. We start with the fundamentals of what AI agents actually are, then walk through progressively more advanced integrations with production-ready code examples in React and Next.js.
A Large Language Model (LLM) like GPT-4 or Claude is a powerful text-in, text-out system. You send it a message and it generates a response. That's useful — but it is passive. An AI agent takes this further: it can perceive its environment, decide what to do, take actions (calling tools, APIs, or functions), observe the results, and continue reasoning until it achieves a goal. The key distinction is autonomy and action.
User: "What's the weather in Nairobi?"
LLM: "I don't have access to real-time data, but Nairobi typically has..."
The LLM can only respond with what it already knows. It can't check current weather.
User: "What's the weather in Nairobi?"
Agent: [Calls weather API tool] → [Gets current data] → "It's currently 22°C in Nairobi with partly cloudy skies."
The agent acts — it chooses to call a tool, processes the result, and gives a grounded answer.
In a web application context, an AI agent typically consists of three components working together: an LLM (the brain that reasons and decides), a set of tools (functions the agent can call — databases, APIs, file systems, web search), and an orchestration layer (the code that manages the loop of reasoning, action, observation, and response). Modern frameworks like the Vercel AI SDK and LangChain handle most of this orchestration for you.
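To make that loop concrete, here is a minimal, framework-free sketch of the orchestration cycle in TypeScript. It is illustrative only: callLLM and the tools registry are hypothetical placeholders standing in for whatever provider SDK and tool implementations you actually use.

// Hypothetical orchestration loop: reason, act, observe, repeat until the model answers.
// `callLLM` and the `tools` registry are placeholders, not a real SDK API.
type ToolCall = { name: string; args: Record<string, unknown> };
type LLMReply = { text?: string; toolCall?: ToolCall };
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };

declare function callLLM(messages: Message[]): Promise<LLMReply>;

const tools: Record<string, (args: Record<string, unknown>) => Promise<string>> = {
  // e.g. get_weather: async ({ city }) => fetchWeatherFor(String(city)),
};

export async function runAgent(userMessage: string, maxSteps = 5): Promise<string> {
  const messages: Message[] = [{ role: 'user', content: userMessage }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callLLM(messages);                  // reason: the model decides what to do next
    if (!reply.toolCall) return reply.text ?? '';           // no action requested: this is the final answer
    const run = tools[reply.toolCall.name];
    const observation = run
      ? await run(reply.toolCall.args)                      // act: execute the chosen tool
      : `Unknown tool: ${reply.toolCall.name}`;
    messages.push({ role: 'tool', content: observation });  // observe: feed the result back to the model
  }
  return 'Stopped after reaching the step limit.';
}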
Before writing any code, understanding how AI agents fit into a typical Next.js application architecture saves a lot of debugging time later. The key architectural principle is: LLM API calls must always happen server-side, never in the browser. Your API keys would be exposed if you called OpenAI directly from frontend JavaScript.
Browser (React Client Component)
  ↓ HTTP / WebSocket / Server-Sent Events
Next.js API Route (app/api/chat/route.ts) ← your server-side agent logic lives here
  ↓ HTTPS
AI Provider API (OpenAI / Anthropic / Google)
  ↓ (optional)
External Tools (Database, Weather API, Web Search, Email, etc.)
The React component handles UI — displaying messages, capturing user input, showing loading states. The API route handles everything AI-related — calling the LLM, executing tools, managing conversation history. The user never has direct access to your API keys or tool logic.
Non-streaming flow: User sends message → API route receives it → calls the LLM → LLM may call tools → returns the complete response → UI displays it. Simpler to implement, but the user sees a blank screen until the full response is ready.
Streaming flow: User sends message → API route receives it → calls the LLM with streaming → tokens stream back to the client in real time → UI displays text as it appears. Better UX — users see the response forming immediately instead of waiting 5–10 seconds.
The choice of AI provider affects your agent's capabilities, cost, latency, and the complexity of your integration. Here is a practical comparison for web app developers in 2026.
| Provider | Best Models | Strengths | Free Tier | Best For |
|---|---|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini | Best ecosystem, most tutorials, function calling support | $5 credit on signup | Most web app use cases; chatbots, code assistants |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Haiku | Excellent for long documents and nuanced reasoning; large context window | Limited free API access | Document analysis, long conversations, research assistants |
| Google Gemini | Gemini 1.5 Pro, Gemini Flash | Multimodal (text + images + video); generous free tier | Generous — 15 RPM free | Image analysis, multimodal features, cost-conscious projects |
| Mistral AI | Mistral Large, Mistral 7B | Open-weight models; can self-host; European data residency | Free API trial | Privacy-sensitive apps, European compliance, self-hosting |
| Groq | Llama 3, Mixtral | Extremely fast inference (10x+ faster than OpenAI); free tier | Generous free tier | Speed-critical applications; real-time voice; prototyping |
The Vercel AI SDK is the most ergonomic way to add AI to a Next.js application. It provides first-class support for streaming, tool calling, and multi-step agent loops, and it works with all major AI providers through a unified API. You write the same code regardless of which AI provider you choose — swapping providers is a one-line change.
# Install the AI SDK core and your chosen provider
npm install ai @ai-sdk/openai
# Or for Anthropic
npm install ai @ai-sdk/anthropic
# Or for Google
npm install ai @ai-sdk/google
Add your API key to .env.local:
OPENAI_API_KEY=sk-your-key-here
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
export async function POST(req: Request) {
const { messages } = await req.json();
const result = await streamText({
model: openai('gpt-4o'),
system: 'You are a helpful assistant for a tech blog.',
messages,
});
return result.toDataStreamResponse();
}
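The "one-line change" mentioned above is literal: only the model line differs between providers. A sketch of the same route switched to Claude, assuming @ai-sdk/anthropic is installed and ANTHROPIC_API_KEY is set in .env.local:

// Same route as above, switched to Anthropic; only the `model:` line changes
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = await streamText({
    model: anthropic('claude-sonnet-4-6'), // model id as used later in this guide
    system: 'You are a helpful assistant for a tech blog.',
    messages,
  });
  return result.toDataStreamResponse();
}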
'use client';
import { useChat } from 'ai/react';
export default function Chat() {
const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat({
api: '/api/chat',
onError: (err) => console.error('Chat error:', err),
});
return (
<div style={{ maxWidth: 700, margin: '0 auto', padding: 24 }}>
{/* Messages */}
<div style={{ minHeight: 400, marginBottom: 16 }}>
{messages.map((m) => (
<div
key={m.id}
style={{
padding: '12px 16px',
marginBottom: 12,
borderRadius: 12,
background: m.role === 'user' ? '#EDE9FE' : '#F8FAFF',
borderLeft: m.role === 'assistant' ? '4px solid #4F46E5' : '4px solid #EC4899',
}}
>
<strong style={{ color: m.role === 'user' ? '#7C3AED' : '#4F46E5' }}>
{m.role === 'user' ? 'You' : 'Assistant'}:
</strong>
<p style={{ margin: '6px 0 0', whiteSpace: 'pre-wrap' }}>{m.content}</p>
</div>
))}
{isLoading && (
<div style={{ padding: 16, color: '#4F46E5', fontStyle: 'italic' }}>
Assistant is thinking...
</div>
)}
{error && (
<div style={{ padding: 16, color: '#DC2626', background: '#FEF2F2', borderRadius: 8 }}>
Error: {error.message}
</div>
)}
</div>
{/* Input */}
<form onSubmit={handleSubmit} style={{ display: 'flex', gap: 8 }}>
<input
value={input}
onChange={handleInputChange}
placeholder="Ask me anything..."
disabled={isLoading}
style={{
flex: 1, padding: '12px 16px', borderRadius: 8,
border: '1px solid #E2E8F0', fontSize: 16,
}}
/>
<button
type="submit"
disabled={isLoading || !input.trim()}
style={{
padding: '12px 24px', background: '#4F46E5', color: '#fff',
border: 'none', borderRadius: 8, cursor: 'pointer', fontWeight: 600,
}}
>
{isLoading ? '...' : 'Send'}
</button>
</form>
</div>
);
}
The useChat hook manages all conversation state automatically — it tracks the message history, sends the full history with each request (so the AI has context), and handles the streaming response. You do not need to manage any of this state yourself.

LangChain is a framework specifically designed for building complex AI agent workflows. Where the Vercel AI SDK is optimized for simple to moderately complex chat interfaces, LangChain shines when you need chains of AI calls, agents with multiple tools, complex memory systems, or retrieval-augmented generation (RAG) pipelines. It supports every major AI provider and has a rich library of pre-built tools and integrations.
npm install langchain @langchain/openai @langchain/core
import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage, SystemMessage, AIMessage } from '@langchain/core/messages';
const model = new ChatOpenAI({
modelName: 'gpt-4o',
temperature: 0.7,
streaming: true,
});
export async function POST(req: Request) {
const { messages } = await req.json();
// Convert messages to LangChain format
const langchainMessages = messages.map((m: { role: string; content: string }) => {
if (m.role === 'system') return new SystemMessage(m.content);
if (m.role === 'user') return new HumanMessage(m.content);
return new AIMessage(m.content);
});
// Stream response
const encoder = new TextEncoder();
const stream = new ReadableStream({
async start(controller) {
for await (const chunk of await model.stream(langchainMessages)) {
controller.enqueue(encoder.encode(chunk.content as string));
}
controller.close();
},
});
return new Response(stream, {
headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
}
import { ChatOpenAI } from '@langchain/openai';
import { createOpenAIFunctionsAgent, AgentExecutor } from 'langchain/agents';
import { DynamicTool } from '@langchain/core/tools';
import { ChatPromptTemplate } from '@langchain/core/prompts';
// Define custom tools
const weatherTool = new DynamicTool({
name: 'get_weather',
description: 'Get current weather for a given city',
func: async (city: string) => {
// In production, call a real weather API here
return `The weather in ${city} is currently 22°C with partly cloudy skies.`;
},
});
const calculatorTool = new DynamicTool({
name: 'calculate',
description: 'Perform mathematical calculations. Input: a mathematical expression.',
func: async (expression: string) => {
try {
// Use a safe math evaluator in production
return String(eval(expression));
} catch {
return 'Could not evaluate the expression.';
}
},
});
const tools = [weatherTool, calculatorTool];
export async function createAgent() {
const llm = new ChatOpenAI({ modelName: 'gpt-4o', temperature: 0 });
const prompt = ChatPromptTemplate.fromMessages([
['system', 'You are a helpful assistant with access to weather and calculation tools.'],
['human', '{input}'],
['placeholder', '{agent_scratchpad}'],
]);
const agent = await createOpenAIFunctionsAgent({ llm, tools, prompt });
return new AgentExecutor({ agent, tools, verbose: false });
}
For cases where you need maximum control or want to minimize dependencies, calling the OpenAI API directly using their official Node.js SDK is a clean option. This gives you full access to every OpenAI feature including function calling, vision, and the Assistants API without any abstraction layer on top.
npm install openai
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
export async function POST(req: Request) {
const { messages } = await req.json();
const encoder = new TextEncoder();
const stream = new ReadableStream({
async start(controller) {
const streamResponse = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
...messages,
],
stream: true,
});
for await (const chunk of streamResponse) {
const text = chunk.choices[0]?.delta?.content || '';
if (text) {
controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
}
}
controller.enqueue(encoder.encode('data: [DONE]\n\n'));
controller.close();
},
});
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
},
});
}
Anthropic's Claude models are excellent for applications requiring nuanced understanding, document analysis, or long conversations. Claude 3.5 Sonnet in particular is competitive with GPT-4o on most benchmarks while being more cost-effective for high-volume applications. The official Anthropic SDK makes integration straightforward.
npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
export async function POST(req: Request) {
const { messages } = await req.json();
const encoder = new TextEncoder();
const stream = new ReadableStream({
async start(controller) {
const streamResponse = anthropic.messages.stream({
model: 'claude-sonnet-4-6',
max_tokens: 1024,
system: 'You are a helpful assistant.',
messages,
});
for await (const chunk of streamResponse) {
if (
chunk.type === 'content_block_delta' &&
chunk.delta.type === 'text_delta'
) {
controller.enqueue(encoder.encode(chunk.delta.text));
}
}
controller.close();
},
});
return new Response(stream, {
headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
}
The following is a complete, production-ready chatbot implementation using the Vercel AI SDK. It includes proper error handling, a loading indicator, auto-scrolling to the latest message, and a cancel button to stop generation mid-stream — all the details that separate a demo from a real feature.
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
// Allow streaming responses up to 30 seconds
export const maxDuration = 30;
export async function POST(req: Request) {
try {
const { messages } = await req.json();
if (!messages || !Array.isArray(messages)) {
return new Response('Invalid messages format', { status: 400 });
}
const result = await streamText({
model: openai('gpt-4o-mini'), // cheaper model for production; upgrade as needed
system: `You are a helpful assistant for G-Tech Blog.
You specialize in technology, web development, and AI topics.
Keep responses concise and practical. Use code examples when helpful.`,
messages,
maxTokens: 1000,
temperature: 0.7,
});
return result.toDataStreamResponse();
} catch (error) {
console.error('Chat API error:', error);
return new Response('Internal server error', { status: 500 });
}
}
'use client';
import { useChat } from 'ai/react';
import { useEffect, useRef } from 'react';
export default function ProductionChat() {
const messagesEndRef = useRef<HTMLDivElement>(null);
const {
messages,
input,
handleInputChange,
handleSubmit,
isLoading,
error,
stop,
reload,
setMessages,
} = useChat({
api: '/api/chat',
onError: (err) => console.error('Chat error:', err),
});
// Auto-scroll to latest message
useEffect(() => {
messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
}, [messages]);
const handleKeyDown = (e: React.KeyboardEvent) => {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
if (!isLoading && input.trim()) {
handleSubmit(e as unknown as React.FormEvent);
}
}
};
return (
<div style={{ display: 'flex', flexDirection: 'column', height: '600px', border: '1px solid #E2E8F0', borderRadius: 16, overflow: 'hidden', background: '#fff' }}>
{/* Header */}
<div style={{ padding: '16px 20px', background: '#4F46E5', color: '#fff', display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<span style={{ fontWeight: 700 }}>🤖 AI Assistant</span>
<button onClick={() => setMessages([])} style={{ background: 'rgba(255,255,255,0.2)', border: 'none', color: '#fff', padding: '4px 12px', borderRadius: 20, cursor: 'pointer', fontSize: 12 }}>
Clear
</button>
</div>
{/* Messages */}
<div style={{ flex: 1, overflowY: 'auto', padding: 20 }}>
{messages.length === 0 && (
<div style={{ textAlign: 'center', color: '#94A3B8', marginTop: 80 }}>
<p style={{ fontSize: 48 }}>🤖</p>
<p>Start a conversation!</p>
</div>
)}
{messages.map((m) => (
<div key={m.id} style={{ marginBottom: 16, display: 'flex', justifyContent: m.role === 'user' ? 'flex-end' : 'flex-start' }}>
<div style={{
maxWidth: '75%',
padding: '10px 16px',
borderRadius: m.role === 'user' ? '18px 18px 4px 18px' : '18px 18px 18px 4px',
background: m.role === 'user' ? '#4F46E5' : '#F1F5F9',
color: m.role === 'user' ? '#fff' : '#0F172A',
fontSize: 15,
lineHeight: 1.6,
whiteSpace: 'pre-wrap',
}}>
{m.content}
</div>
</div>
))}
{isLoading && (
<div style={{ display: 'flex', gap: 6, padding: '10px 16px' }}>
{/* Typing indicator; assumes a matching @keyframes bounce rule exists in your global CSS */}
{[0,1,2].map(i => (
<div key={i} style={{ width: 8, height: 8, borderRadius: '50%', background: '#4F46E5', animation: `bounce 1.2s ${i * 0.2}s infinite` }} />
))}
</div>
)}
{error && (
<div style={{ padding: 12, background: '#FEF2F2', borderRadius: 8, color: '#DC2626', fontSize: 14, display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<span>{error.message}</span>
<button onClick={() => reload()} style={{ background: '#DC2626', color: '#fff', border: 'none', borderRadius: 6, padding: '4px 10px', cursor: 'pointer', fontSize: 12 }}>Retry</button>
</div>
)}
<div ref={messagesEndRef} />
</div>
{/* Input */}
<div style={{ padding: '12px 16px', borderTop: '1px solid #E2E8F0', display: 'flex', gap: 8 }}>
<textarea
value={input}
onChange={handleInputChange}
onKeyDown={handleKeyDown}
placeholder="Type a message... (Enter to send, Shift+Enter for new line)"
disabled={isLoading}
rows={1}
style={{ flex: 1, padding: '10px 14px', borderRadius: 10, border: '1px solid #E2E8F0', fontSize: 15, resize: 'none', fontFamily: 'inherit', outline: 'none' }}
/>
{isLoading ? (
<button onClick={stop} style={{ padding: '10px 18px', background: '#EF4444', color: '#fff', border: 'none', borderRadius: 10, cursor: 'pointer', fontWeight: 600 }}>Stop</button>
) : (
<button onClick={(e) => handleSubmit(e as unknown as React.FormEvent)} disabled={!input.trim()} style={{ padding: '10px 18px', background: '#4F46E5', color: '#fff', border: 'none', borderRadius: 10, cursor: 'pointer', fontWeight: 600, opacity: input.trim() ? 1 : 0.5 }}>Send</button>
)}
</div>
</div>
);
}
Tool calling is what transforms a passive LLM into an actual agent. You define a set of functions (tools) that
the AI is allowed to call, describe what they do in natural language, and the model decides when and how to use
them based on the user's request. The Vercel AI SDK makes this elegant with the tool utility from
the ai package.
import { openai } from '@ai-sdk/openai';
import { streamText, tool } from 'ai';
import { z } from 'zod';
export const maxDuration = 60;
export async function POST(req: Request) {
const { messages } = await req.json();
const result = await streamText({
model: openai('gpt-4o'),
system: 'You are a helpful assistant. Use tools when appropriate.',
messages,
tools: {
// Tool 1: Get current weather
getWeather: tool({
description: 'Get the current weather for a city',
parameters: z.object({
city: z.string().describe('The city name, e.g. Nairobi'),
unit: z.enum(['celsius', 'fahrenheit']).describe('Temperature unit'),
}),
execute: async ({ city, unit }) => {
// In production, call a real weather API (e.g. OpenWeatherMap)
const temp = unit === 'celsius' ? 22 : 72;
return {
city,
temperature: temp,
unit,
condition: 'Partly cloudy',
humidity: '65%',
};
},
}),
// Tool 2: Search the database
searchProducts: tool({
description: 'Search for products in the database',
parameters: z.object({
query: z.string().describe('Search query'),
maxResults: z.number().optional().describe('Maximum results to return'),
}),
execute: async ({ query, maxResults = 5 }) => {
// In production, query your actual database here
return {
results: [
{ id: 1, name: `Product matching "${query}"`, price: 2999 },
],
total: 1,
};
},
}),
// Tool 3: Calculate
calculate: tool({
description: 'Perform a mathematical calculation',
parameters: z.object({
expression: z.string().describe('The mathematical expression to evaluate'),
}),
execute: async ({ expression }) => {
// Use a safe math library in production
try {
return { result: String(Function(`"use strict"; return (${expression})`)()) };
} catch {
return { error: 'Could not evaluate expression' };
}
},
}),
},
// Allow multiple tool-call rounds before the final response
maxSteps: 5,
});
return result.toDataStreamResponse();
}
The maxSteps parameter controls how many tool-call + observe + respond cycles the agent can
run before returning to the user. Setting it to 5 means the agent can call up to 5 tools in a single user turn
— for example, calling the weather tool for 3 cities and then combining the results into a comparison. Without
maxSteps, the agent makes only one tool call per turn.
By default, LLMs are stateless — they have no memory of previous conversations. The useChat hook
handles in-session memory automatically by sending the full message history with every request. But for
persistent memory across browser sessions or user accounts, you need to save and load conversation history from
a database.
// lib/conversations.ts — Database operations (example using Prisma)
import { prisma } from './prisma';
export async function loadConversation(userId: string, conversationId: string) {
const messages = await prisma.message.findMany({
where: { userId, conversationId },
orderBy: { createdAt: 'asc' },
});
return messages.map(m => ({ role: m.role, content: m.content }));
}
export async function saveMessage(
userId: string,
conversationId: string,
role: 'user' | 'assistant',
content: string
) {
return prisma.message.create({
data: { userId, conversationId, role, content },
});
}
// app/api/chat/route.ts — Load history and persist new messages
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
import { loadConversation, saveMessage } from '@/lib/conversations';
export async function POST(req: Request) {
const { messages, userId, conversationId } = await req.json();
// Load the persisted history from the database (everything up to the previous turn)
const history = await loadConversation(userId, conversationId);
// Save the new user message
const latestUserMessage = messages[messages.length - 1];
await saveMessage(userId, conversationId, 'user', latestUserMessage.content);
const result = await streamText({
model: openai('gpt-4o'),
system: 'You are a helpful assistant with memory of past conversations.',
// Append only the new message; spreading all of `messages` here would duplicate
// turns that are already present in the loaded history
messages: [...history, latestUserMessage],
onFinish: async ({ text }) => {
// Save the AI response once streaming is complete
await saveMessage(userId, conversationId, 'assistant', text);
},
});
return result.toDataStreamResponse();
}
Retrieval-Augmented Generation (RAG) is one of the most powerful patterns in AI web apps. It lets your agent answer questions based on your specific documents, database, or knowledge base — not just its training data. The pattern works by converting your documents into vector embeddings, storing them in a vector database, and at query time retrieving the most relevant chunks to include in the AI's context.
// Step 1: Index your documents (run once)
// embeddings = openai.embeddings.create(document_chunks)
// store in vector DB (Pinecone, Supabase pgvector, Qdrant, etc.)
// Step 2: At query time, retrieve relevant chunks
// query_embedding = openai.embeddings.create(user_question)
// relevant_chunks = vectorDB.similarity_search(query_embedding, top_k=5)
// Step 3: Inject retrieved context into the prompt
// system = `Answer based on this context:\n${relevant_chunks.join('\n')}`
// messages = [{ role: 'user', content: user_question }]
// response = openai.chat.completions.create(system, messages)
import { openai } from '@ai-sdk/openai';
import { streamText, embed } from 'ai';
import { createClient } from '@supabase/supabase-js';
const supabase = createClient(
process.env.SUPABASE_URL!,
process.env.SUPABASE_SERVICE_KEY!
);
async function getRelevantContext(query: string): Promise<string> {
// Create embedding for the user's query
const { embedding } = await embed({
model: openai.embedding('text-embedding-3-small'),
value: query,
});
// Search for semantically similar document chunks
const { data } = await supabase.rpc('match_documents', {
query_embedding: embedding,
match_threshold: 0.78,
match_count: 5,
});
if (!data || data.length === 0) return '';
return data.map((d: { content: string }) => d.content).join('\n\n');
}
export async function POST(req: Request) {
const { messages } = await req.json();
const userQuery = messages[messages.length - 1].content;
// Retrieve relevant context
const context = await getRelevantContext(userQuery);
const result = await streamText({
model: openai('gpt-4o'),
system: context
? `You are a helpful assistant. Answer based on the following context:\n\n${context}\n\nIf the context does not contain the answer, say so honestly.`
: 'You are a helpful assistant.',
messages,
});
return result.toDataStreamResponse();
}
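The route above assumes your documents are already embedded and stored. Below is a minimal indexing sketch under that assumption: a Supabase documents table with content (text) and embedding (pgvector) columns that the match_documents RPC searches. The table schema and chunking strategy are yours to define.

// One-off indexing script (run whenever documents change). Schema is an assumption:
// a `documents` table with `content` text and `embedding` vector columns.
import { openai } from '@ai-sdk/openai';
import { embedMany } from 'ai';
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

export async function indexDocuments(chunks: string[]) {
  // Embed every chunk with the same model used at query time
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: chunks,
  });
  // Store each chunk next to its embedding for later similarity search
  const rows = chunks.map((content, i) => ({ content, embedding: embeddings[i] }));
  const { error } = await supabase.from('documents').insert(rows);
  if (error) throw error;
}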
AI integrations introduce specific security risks that differ from standard web security. A compromised AI agent can leak private data, execute unauthorized actions, or be manipulated by malicious users into behaving in unintended ways. Take these risks seriously before deploying to production.
- Keep API keys in .env.local — never in client-side code
- Use the NEXT_PUBLIC_ prefix only for variables that are safe to expose to the browser — never for API keys
- Use @upstash/ratelimit with Redis for serverless-friendly rate limiting
- Set maxTokens limits on every AI call to prevent runaway responses
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
const ratelimit = new Ratelimit({
redis: Redis.fromEnv(),
limiter: Ratelimit.slidingWindow(10, '1 m'), // 10 requests per minute per IP
analytics: true,
});
export async function POST(req: Request) {
// Get client IP for rate limiting
const ip = req.headers.get('x-forwarded-for') ?? '127.0.0.1';
const { success, remaining } = await ratelimit.limit(ip);
if (!success) {
return new Response('Too many requests. Please wait a moment.', {
status: 429,
headers: { 'Retry-After': '60' },
});
}
const { messages } = await req.json();
// Validate messages
if (!Array.isArray(messages) || messages.length === 0) {
return new Response('Invalid request', { status: 400 });
}
// Limit conversation history length to prevent token abuse
const recentMessages = messages.slice(-20);
const result = await streamText({
model: openai('gpt-4o-mini'),
messages: recentMessages,
maxTokens: 500, // Prevent runaway responses
});
return result.toDataStreamResponse({
headers: { 'X-RateLimit-Remaining': String(remaining) },
});
}
AI API calls are expensive relative to regular API calls, and the cost adds up quickly at scale. A single GPT-4o call can cost $0.005–$0.05 depending on token count — multiply that by thousands of users and the bill becomes significant fast. Smart optimization can reduce costs by 50–90% without meaningfully degrading quality. The table below summarizes the main levers; one of them (response caching) is sketched in code after the table.
| Optimization | Cost Reduction | Implementation |
|---|---|---|
| Use a smaller model for simple tasks | 70–90% | GPT-4o-mini instead of GPT-4o for classification, summarization, simple Q&A |
| Limit conversation history | 30–60% | Only send the last 10–20 messages instead of the full history |
| Set maxTokens explicitly | 20–40% | Prevent unnecessarily long responses for tasks that need short answers |
| Cache common responses | Variable | Cache AI responses for identical or near-identical queries using Redis |
| Compress system prompts | 10–20% | Write concise system prompts; every token in the system prompt is charged on every request |
| Batch similar requests | 30–50% | Use OpenAI's Batch API for non-real-time tasks (document processing, classification) |
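As an example of the caching row, here is a hedged sketch of response caching with Upstash Redis. The cache key scheme (a hash of the normalized question) and the 24-hour TTL are assumptions; tune them to how repetitive your traffic actually is.

// Cache answers to identical questions so repeat queries never hit the AI provider
import { Redis } from '@upstash/redis';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
import { createHash } from 'crypto';

const redis = Redis.fromEnv();

export async function cachedAnswer(question: string): Promise<string> {
  const key = 'ai-cache:' + createHash('sha256').update(question.trim().toLowerCase()).digest('hex');
  const cached = await redis.get<string>(key);
  if (cached) return cached; // identical question seen before: skip the AI call entirely

  const { text } = await generateText({
    model: openai('gpt-4o-mini'), // smaller model for simple, cacheable Q&A
    prompt: question,
    maxTokens: 300,
  });
  await redis.set(key, text, { ex: 60 * 60 * 24 }); // keep for 24 hours
  return text;
}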
Load your FAQ, product documentation, and support policies into a RAG system. The agent answers questions
from this knowledge base and escalates to human support when it can't find an answer. Key tools:
search_knowledge_base, create_support_ticket, check_order_status.
Patterns: RAG, Tool Calling, Streaming
Users paste code and ask for review, optimization suggestions, or bug detection. Use Claude or GPT-4o with a specialized system prompt for the language. Add a tool to look up documentation for specific libraries. Stream the response for long code files.
Patterns: Streaming, System Prompt, Claude/GPT-4o
Users upload PDFs or paste long text. The agent extracts key information, answers questions about the document, or generates summaries. Use Claude for its large context window (200K tokens). Combine with file upload handling via Vercel Blob or AWS S3.
Patterns: Large Context, Claude, File Upload
Users ask questions about their data in plain English ("Show me sales trends for Q3"). The agent translates
to SQL queries, executes them via a query_database tool, and returns structured results or
chart data. Requires careful SQL injection prevention.
Patterns: Tool Calling, Text-to-SQL, HITL
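For this analytics use case, here is a hedged sketch of what a guarded query_database tool could look like. The db client import and the exact guard rules are assumptions; in production you would also run against a read-only database role with row limits and statement timeouts.

// Hypothetical query_database tool with a read-only guard
import { tool } from 'ai';
import { z } from 'zod';
// import { db } from '@/lib/db'; // your database client (assumption)

export const queryDatabase = tool({
  description: 'Run a read-only SQL query and return the rows',
  parameters: z.object({
    sql: z.string().describe('A single SELECT statement'),
  }),
  execute: async ({ sql }) => {
    // Reject anything that is not a single plain SELECT; never let the model write data
    const isSelect = /^\s*select\b/i.test(sql);
    const hasWriteOrChain = /;|\b(insert|update|delete|drop|alter|grant)\b/i.test(sql);
    if (!isSelect || hasWriteOrChain) {
      return { error: 'Only single SELECT statements are allowed.' };
    }
    // In production: const rows = await db.query(sql); return { rows };
    return { rows: [] }; // placeholder result for the sketch
  },
});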
A conversational shopping assistant that understands natural language product queries, filters inventory in
real time, compares products, and guides purchase decisions. Tools: search_catalog,
get_product_details, check_stock, add_to_cart.
Patterns: Tool Calling, Memory, Streaming
A multi-step agent that researches a topic (web search tool), outlines an article, writes each section, and
self-reviews the result. Uses maxSteps for multi-round generation. Best combined with a human
review step before publishing.
Patterns: Multi-step, Tool Calling, HITL
The Vercel AI SDK is optimized for Next.js and React web applications — it provides excellent streaming
support, clean React hooks (useChat, useCompletion), and works with any AI provider
through a unified API. It's the right choice for most web chat and assistant features. LangChain is a more
general-purpose agent framework designed for complex multi-step workflows, chains of AI calls, sophisticated
memory systems, and deep tool integrations. If your use case involves complex reasoning pipelines, document
processing workflows, or agent orchestration beyond simple chat, LangChain offers more flexibility. Many
production applications use both — the Vercel AI SDK for the UI layer and LangChain for complex backend agent
logic.
GPT-4o-mini is 30x cheaper than GPT-4o and surprisingly capable for most tasks. Use GPT-4o-mini for classification, summarization, simple Q&A, and first-pass generation. Reserve GPT-4o for tasks requiring complex reasoning, nuanced understanding, or multi-step tool use where quality is critical. A common production pattern is to route simple queries to GPT-4o-mini and complex queries (detected by length, complexity keywords, or topic classification) to GPT-4o — this can reduce costs by 60–80% while maintaining quality for complex tasks.
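A minimal sketch of that routing pattern follows, assuming a crude heuristic (prompt length plus a few example keywords) is good enough for your traffic; real deployments often use a cheap classifier call instead.

// Route cheap requests to gpt-4o-mini and demanding ones to gpt-4o
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

type ChatMessage = { role: string; content: string };

function pickModel(messages: ChatMessage[]) {
  const last = messages[messages.length - 1]?.content ?? '';
  const looksComplex =
    last.length > 600 || // long prompts tend to need more reasoning
    /\b(debug|refactor|architecture|compare|plan|multi-step)\b/i.test(last); // example keywords (assumption)
  return looksComplex ? openai('gpt-4o') : openai('gpt-4o-mini');
}

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = await streamText({
    model: pickModel(messages),
    messages,
    maxTokens: 800,
  });
  return result.toDataStreamResponse();
}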
The three key protections are: rate limiting (limit requests per IP or per user account), token limits
(set maxTokens on every call), and authentication (require login for AI features and tie usage to
specific user accounts). For public-facing AI features, also implement content filtering on inputs, monitor for
unusual usage patterns (very long prompts, many requests per minute), and set billing alerts with your AI
provider so you are notified before costs exceed your budget.
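To show the authentication piece alongside the earlier IP-based rate limiter, here is a sketch of gating AI routes behind a login and limiting usage per account. getSession is a placeholder for whatever auth helper your app uses (NextAuth, Clerk, or a custom session lookup), not a specific library API, and the per-user quota is an assumption.

// Reusable guard for AI routes: require a signed-in user, then rate-limit by user ID
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const perUserLimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(50, '1 h'), // 50 AI requests per user per hour (assumption)
});

declare function getSession(req: Request): Promise<{ userId: string } | null>; // placeholder auth helper

export async function requireAIAccess(req: Request): Promise<Response | string> {
  const session = await getSession(req);
  if (!session) return new Response('Sign in to use AI features', { status: 401 });

  // Limits follow the account rather than the network address
  const { success } = await perUserLimit.limit(`ai:${session.userId}`);
  if (!success) return new Response('AI usage limit reached for this hour', { status: 429 });

  return session.userId; // caller continues with the AI request for this user
}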
Yes — several free alternatives exist. Google's Gemini API has a generous free tier (15 requests per minute). Groq offers free API access to open-source models (Llama 3, Mixtral) with very fast inference. Hugging Face Inference API provides free access to thousands of open-source models. For development and prototyping, these free tiers are typically sufficient. For production applications with significant traffic, you will likely need a paid plan from one of the major providers, though Groq and Gemini remain significantly cheaper than OpenAI for equivalent capability.
Integrating AI agents into web applications has never been more accessible. The Vercel AI SDK, LangChain, and the official SDKs from OpenAI and Anthropic give you the building blocks to go from a simple streaming chatbot to a fully autonomous agent with tool calling, persistent memory, and retrieval-augmented generation — all within a Next.js application.
Start with the Vercel AI SDK and a basic streaming chat interface. Once that is working, add tool calling to give your agent the ability to take actions. Then layer in persistent memory for multi-session context, RAG for grounding responses in your own data, and rate limiting for production safety. Each addition opens up a new class of user experiences that were simply impossible to build without AI a few years ago.
The developers who build fluency with AI agent integration now are positioning themselves for the most significant wave of software development since the mobile revolution. The patterns in this guide are the foundation — build on them, experiment, and ship something your users have never seen before.