There are two reasonable ways to add AI to a Next.js app on AppEngine. The recommended path is the metered agent endpoint — /ai/agent/stream — which proxies Anthropic, OpenAI, DeepSeek, and Gemini behind a single bearer-token interface and meters tokens per org. The escape hatch is calling the vendor SDK directly. This tutorial covers both.
Why the agent endpoint
- One bearer token, swap models with one config flag
- Token usage counted against the org's quota — no surprise bills
- Prompt caching on Anthropic and DeepSeek without you wiring it
- The same endpoint works from server, browser, and mobile
- Conversation history persists in the chat module if you want it
You give up direct vendor-feature access (e.g., OpenAI's structured outputs aren't currently exposed). For 90% of cases the agent endpoint is what you want.
The endpoint shape
The endpoint is /ai/agent/stream, authenticated with your API key. Two-stage flow:
- POST a request body. AppEngine returns { streamId }.
- GET /ai/agent/stream/:streamId over Server-Sent Events. AppEngine pipes vendor chunks down.
This split exists so you can reconnect to a stream after a network blip — the streamId is good for the duration of the response.
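In code, the two stages look like this. A minimal server-side sketch, assuming messages and model are in scope and using the same APPENGINE_URL and APPENGINE_BEARER env vars as the proxy route in step 3:

// stage one: start the stream, get a streamId back
const start = await fetch(`${process.env.APPENGINE_URL}/ai/agent/stream`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.APPENGINE_BEARER}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ messages, model }),
});
const { streamId } = await start.json();

// stage two: attach to the SSE stream; after a network blip, issue the same GET again
const events = await fetch(`${process.env.APPENGINE_URL}/ai/agent/stream/${streamId}`, {
  headers: {
    Authorization: `Bearer ${process.env.APPENGINE_BEARER}`,
    Accept: 'text/event-stream',
  },
});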
Prerequisites
- AppEngine instance with at least one of ANTHROPIC_API_KEY, OPENAI_API_KEY, DEEPSEEK_API_KEY, or GEMINI_API_KEY configured
- Bearer token issued by AppEngine (admin → API keys → new key, scope ai:agent:stream)
- Next.js app with the AppEngine client wired
Step-by-step
1. Create a server action to start a stream

The browser shouldn't hold the bearer token. Wrap the POST in a server action.

// src/app/actions/ai-stream.ts
'use server';

import { getAppEngineClient } from '@/lib/appmint-client';

// Adjust to your app's shared chat message shape.
type Message = { role: 'user' | 'assistant' | 'system'; content: string };

export async function startAiStream(messages: Message[], model: string) {
  const client = getAppEngineClient();
  const { streamId } = await client.processRequest('post', 'ai/agent/stream', {
    messages,
    model,
    max_tokens: 2048,
    temperature: 0.7,
  });
  return streamId;
}

getAppEngineClient() reads the bearer from server-side env. Don't expose it in NEXT_PUBLIC_*.
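The hook in step 2 posts to /api/ai/start. A minimal sketch of that route, assuming it does nothing beyond forwarding to startAiStream():

// src/app/api/ai/start/route.ts (sketch: add input validation to taste)
import { startAiStream } from '@/app/actions/ai-stream';

export async function POST(req: Request) {
  const { messages, model } = await req.json();
  const streamId = await startAiStream(messages, model);
  return Response.json({ streamId });
}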
2. Build a streaming hook

Server-Sent Events are well supported in modern browsers via EventSource. The agent endpoint streams OpenAI-shaped deltas:

data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: {"usage":{"prompt_tokens":12,"completion_tokens":2}}
data: [DONE]

The hook:

// src/hooks/use-ai-stream.ts
'use client';

import { useState, useCallback } from 'react';

type Message = { role: 'user' | 'assistant' | 'system'; content: string };
type Usage = { prompt_tokens: number; completion_tokens: number };

export function useAiStream() {
  const [content, setContent] = useState('');
  const [loading, setLoading] = useState(false);
  const [usage, setUsage] = useState<Usage | null>(null);

  const send = useCallback(async (messages: Message[], model: string) => {
    setLoading(true);
    setContent('');
    setUsage(null);

    // /api/ai/start is a server route that calls startAiStream() above
    const res = await fetch('/api/ai/start', {
      method: 'POST',
      body: JSON.stringify({ messages, model }),
    });
    const { streamId } = await res.json();

    const source = new EventSource(`/api/ai/stream/${streamId}`);
    source.onmessage = (e) => {
      if (e.data === '[DONE]') {
        source.close();
        setLoading(false);
        return;
      }
      try {
        const chunk = JSON.parse(e.data);
        if (chunk.choices?.[0]?.delta?.content) {
          setContent((c) => c + chunk.choices[0].delta.content);
        }
        if (chunk.usage) setUsage(chunk.usage);
      } catch {
        /* ignore non-JSON keepalives */
      }
    };
    source.onerror = () => {
      source.close();
      setLoading(false);
    };
  }, []);

  return { content, loading, usage, send };
}

The /api/ai/stream/[id] Next.js route proxies the AppEngine SSE so the browser doesn't hit AppEngine directly with a bearer token.
3. Build the proxy route

// src/app/api/ai/stream/[id]/route.ts
import { NextRequest } from 'next/server';

export async function GET(
  req: NextRequest,
  { params }: { params: Promise<{ id: string }> }
) {
  const { id } = await params;
  const upstream = await fetch(
    `${process.env.APPENGINE_URL}/ai/agent/stream/${id}`,
    {
      headers: {
        Authorization: `Bearer ${process.env.APPENGINE_BEARER!}`,
        Accept: 'text/event-stream',
      },
    }
  );
  return new Response(upstream.body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache, no-transform',
      Connection: 'keep-alive',
    },
  });
}

This pipes the upstream body verbatim, with no buffering: chunks reach the browser as fast as the model produces them.
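Depending on your Next.js version and host, you may also want to opt this route out of static optimization so the response actually streams. The segment config below is standard Next.js; whether you need it here is deployment-specific:

// optional: force dynamic rendering so the SSE response is never cached
export const dynamic = 'force-dynamic';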
4. Render the stream

The hook accumulates content. Render with whitespace-pre-wrap (so newlines look right) or pipe through a markdown renderer for richer output.

// src/components/AiChat.tsx
'use client';

import { useState } from 'react';
import { useAiStream } from '@/hooks/use-ai-stream';
// <Message> (the bubble component) and the Message type are assumed to be defined in your app.

export function AiChat() {
  const [history, setHistory] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const { content, loading, usage, send } = useAiStream();

  function submit() {
    const next = [...history, { role: 'user' as const, content: input }];
    setHistory(next);
    setInput('');
    send(next, 'claude-3-5-sonnet-20241022');
  }

  return (
    <div className="flex flex-col gap-4">
      {history.map((m, i) => (
        <Message key={i} role={m.role} content={m.content} />
      ))}
      {loading && <Message role="assistant" content={content} streaming />}
      <textarea
        value={input}
        onChange={(e) => setInput(e.target.value)}
        className="w-full rounded border p-2"
      />
      <button
        onClick={submit}
        disabled={loading}
        className="rounded bg-blue-600 px-4 py-2 text-white"
      >
        Send
      </button>
      {usage && (
        <p className="text-xs text-gray-500">
          {usage.prompt_tokens} in / {usage.completion_tokens} out tokens
        </p>
      )}
    </div>
  );
}

When the stream finishes, append the assistant message to history (the hook already clears its buffer on the next send):

useEffect(() => {
  // fires when loading flips to false; content is read from the closure on purpose
  if (!loading && content) {
    setHistory((h) => [...h, { role: 'assistant', content }]);
  }
}, [loading]);
5. Pick a model

The agent endpoint supports models from any configured provider. Common picks:

| Model | Best for |
| --- | --- |
| claude-3-5-sonnet-20241022 | General reasoning, coding, long context |
| claude-3-5-haiku-20241022 | Fast and cheap; summarization, classification |
| gpt-4-turbo-preview | Function calling, structured tasks |
| gpt-3.5-turbo | Cheap, fast, simple tasks |
| deepseek-reasoner | Math, logic, transparent chain-of-thought |
| gemini-2.0-flash | Multimodal, fast, vision |

GET /ai/agent/models returns the list with context_window, supports_images, and supports_reasoning flags. Render a model picker off this rather than hardcoding, as in the sketch below.
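A minimal sketch of that lookup using the same client as in step 1; the endpoint and flag names come from above, but the exact response shape (an array of objects with an id field) is an assumption:

// drive the model picker from the configured providers instead of hardcoding
const models = await client.processRequest('get', 'ai/agent/models');
// assumed shape: [{ id, context_window, supports_images, supports_reasoning }, ...]
const visionModels = models.filter((m: { supports_images: boolean }) => m.supports_images);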
6. Quotas and metering

Each org has an AI usage budget (set in Settings → Plan). The usage module charges 5 units per /ai/agent/stream call plus a per-token rate. When the budget runs out, AppEngine returns 402 from the start endpoint.

To check remaining budget:

const usage = await client.processRequest('get', 'usage/current');
// { aiTokensUsed: 1234567, aiTokensLimit: 5000000, billingPeriod: '2026-04' }

For an admin view, expose this on a settings page. For end customers, hide it, but consider rate-limiting per customer in your own code so one user can't burn the whole org budget; a sketch follows.
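A minimal sketch of such a limiter, assuming a single server instance (swap the Map for Redis or the data module when you scale out); the window and cap are arbitrary:

// naive sliding-window limiter keyed by your customer identifier
const WINDOW_MS = 60_000;
const MAX_STARTS_PER_WINDOW = 10;
const recentStarts = new Map<string, number[]>();

export function allowStart(customerId: string): boolean {
  const now = Date.now();
  const recent = (recentStarts.get(customerId) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_STARTS_PER_WINDOW) return false;
  recent.push(now);
  recentStarts.set(customerId, recent);
  return true;
}

Call it in /api/ai/start before starting a stream, and return 429 when it refuses.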
7. Vision and multimodal

Models that support images take messages with image content blocks:

const messages = [
  {
    role: 'user',
    content: [
      { type: 'text', text: 'What is in this image?' },
      { type: 'image_url', image_url: { url: 'https://...' } },
    ],
  },
];

The endpoint returns 400 if the model doesn't support images. Check supports_images from /ai/agent/models before rendering an image upload affordance.
Vendor LLM fallback
Sometimes you need a feature only the vendor exposes — OpenAI's structured outputs, Anthropic's prompt caching headers, Gemini's grounding. Drop down to the vendor SDK in a server route. Yugo does this for Gemini in src/app/api/ai/route.ts.
// src/app/api/gemini/route.ts
import { GoogleGenerativeAI } from '@google/generative-ai';

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const genai = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
  const model = genai.getGenerativeModel({ model: 'gemini-2.0-flash' });
  const result = await model.generateContentStream(prompt);

  // Re-expose the SDK's async iterator as a plain text stream.
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of result.stream) {
        controller.enqueue(encoder.encode(chunk.text()));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}
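The browser can consume that plain-text stream incrementally with the standard reader API; a minimal sketch:

const res = await fetch('/api/gemini', {
  method: 'POST',
  body: JSON.stringify({ prompt: 'Hello' }),
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let text = '';
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  text += decoder.decode(value, { stream: true }); // append each chunk as it arrives
}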
You lose the metering. Add your own counter (write to data/ai-usage) and enforce a per-org budget in the route. Or use the agent endpoint for normal traffic and reserve the direct path for the one feature that needs it.
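One way to keep that counter, sketched with the same AppEngine client as in step 1; data/ai-usage is the collection named above, while the record shape and orgId are assumptions:

// log one record per direct vendor call; aggregate per org when enforcing the budget
await client.processRequest('post', 'data/ai-usage', {
  org: orgId, // assumed: your org identifier
  model: 'gemini-2.0-flash',
  promptChars: prompt.length, // or real token counts if the SDK reports them
  at: new Date().toISOString(),
});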
Never put OPENAI_API_KEY or GEMINI_API_KEY in NEXT_PUBLIC_*. The agent endpoint exists so client code never sees them — keep it that way.
What's next
- Vibe Studio — AppEngine's site-builder agent that uses these endpoints behind the scenes.
- Embed the chat widget — drop in a pre-built AI chat UI instead of rolling your own.
- Activity tracking — log AI interactions to spot abuse patterns.