Documentation

Add AI features

Stream AI completions in a Next.js app via AppEngine's metered agent endpoint, with vendor-LLM fallback.

There are two reasonable ways to add AI to a Next.js app on AppEngine. The recommended path is the metered agent endpoint — /ai/agent/stream — which proxies Anthropic, OpenAI, DeepSeek, and Gemini behind a single bearer-token interface and meters tokens per org. The escape hatch is calling the vendor SDK directly. This tutorial covers both.

Why the agent endpoint

  • One bearer token, swap models with one config flag
  • Token usage counted against the org's quota — no surprise bills
  • Caching on Anthropic + DeepSeek without you wiring it
  • The same endpoint works from server, browser, and mobile
  • Conversation history persists in the chat module if you want it

You give up direct vendor-feature access (e.g., OpenAI's structured outputs aren't currently exposed). For 90% of cases the agent endpoint is what you want.

The endpoint shape

POST /ai/agent/stream   (API key required: bearer token)

Two-stage flow:

  1. POST a request body. AppEngine returns { streamId }.
  2. GET /ai/agent/stream/:streamId over Server-Sent Events. AppEngine pipes vendor chunks down.

This split exists so you can reconnect to a stream after a network blip — the streamId is good for the duration of the response.
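
A minimal sketch of both stages from server-side code, using the same APPENGINE_URL and APPENGINE_BEARER environment variables as the proxy route later in this tutorial:

// Stage 1: POST the request body; the response carries a streamId
const start = await fetch(`${process.env.APPENGINE_URL}/ai/agent/stream`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.APPENGINE_BEARER}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-3-5-haiku-20241022',
    messages: [{ role: 'user', content: 'Hello' }],
  }),
});
const { streamId } = await start.json();

// Stage 2: attach to the SSE stream; reconnecting with the same
// streamId after a dropped connection resumes the same response
const events = await fetch(`${process.env.APPENGINE_URL}/ai/agent/stream/${streamId}`, {
  headers: {
    Authorization: `Bearer ${process.env.APPENGINE_BEARER}`,
    Accept: 'text/event-stream',
  },
});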

Prerequisites

  • AppEngine instance with at least one of ANTHROPIC_API_KEY, OPENAI_API_KEY, DEEPSEEK_API_KEY, or GEMINI_API_KEY configured
  • Bearer token issued by AppEngine (admin → API keys → new key, scope ai:agent:stream)
  • Next.js app with the AppEngine client wired

Step-by-step

  1. Create a server action to start a stream

    The browser shouldn't hold the bearer token. Wrap the POST in a server action.

    // src/app/actions/ai-stream.ts
    'use server';
    import { getAppEngineClient } from '@/lib/appmint-client';

    // Chat message shape the agent endpoint accepts (string content;
    // image content blocks are covered in step 7)
    export type Message = { role: 'user' | 'assistant' | 'system'; content: string };

    export async function startAiStream(messages: Message[], model: string) {
      const client = getAppEngineClient();
      // Stage 1 of the flow: POST the body, get a streamId back
      const { streamId } = await client.processRequest('post', 'ai/agent/stream', {
        messages,
        model,
        max_tokens: 2048,
        temperature: 0.7,
      });
      return streamId;
    }
    

    getAppEngineClient() reads the bearer from server-side env. Don't expose it in NEXT_PUBLIC_*.
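
    If the client helper isn't wired yet, a minimal sketch follows; the processRequest signature matches the calls in this tutorial, but your real @/lib/appmint-client may differ:

    // src/lib/appmint-client.ts -- minimal sketch, not the canonical client
    const BASE = process.env.APPENGINE_URL!;      // server-side env only
    const BEARER = process.env.APPENGINE_BEARER!; // never NEXT_PUBLIC_*

    export function getAppEngineClient() {
      return {
        async processRequest(method: 'get' | 'post', path: string, body?: unknown) {
          const res = await fetch(`${BASE}/${path}`, {
            method: method.toUpperCase(),
            headers: {
              Authorization: `Bearer ${BEARER}`,
              'Content-Type': 'application/json',
            },
            body: body === undefined ? undefined : JSON.stringify(body),
          });
          if (!res.ok) throw new Error(`AppEngine ${res.status}: ${await res.text()}`);
          return res.json();
        },
      };
    }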

  2. Build a streaming hook

    Server-Sent Events are well-supported in modern browsers via EventSource. The agent endpoint streams OpenAI-shaped deltas:

    data: {"choices":[{"delta":{"content":"Hello"}}]}
    data: {"choices":[{"delta":{"content":" world"}}]}
    data: {"usage":{"prompt_tokens":12,"completion_tokens":2}}
    data: [DONE]
    

    The hook:

    // src/hooks/use-ai-stream.ts
    'use client';
    import { useState, useCallback } from 'react';
    import type { Message } from '@/app/actions/ai-stream';

    // Token counts reported in the final usage chunk
    type Usage = { prompt_tokens: number; completion_tokens: number };

    export function useAiStream() {
      const [content, setContent] = useState('');
      const [loading, setLoading] = useState(false);
      const [usage, setUsage] = useState<Usage | null>(null);

      const send = useCallback(async (messages: Message[], model: string) => {
        setLoading(true);
        setContent('');
        setUsage(null);

        // /api/ai/start is a server route that calls startAiStream() above
        const res = await fetch('/api/ai/start', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ messages, model }),
        });
        const { streamId } = await res.json();

        const source = new EventSource(`/api/ai/stream/${streamId}`);
        source.onmessage = (e) => {
          if (e.data === '[DONE]') {
            source.close();
            setLoading(false);
            return;
          }
          try {
            const chunk = JSON.parse(e.data);
            if (chunk.choices?.[0]?.delta?.content) {
              setContent((c) => c + chunk.choices[0].delta.content);
            }
            if (chunk.usage) setUsage(chunk.usage);
          } catch {
            /* ignore non-JSON keepalives */
          }
        };
        source.onerror = () => {
          source.close();
          setLoading(false);
        };
      }, []);

      return { content, loading, usage, send };
    }
    

    The /api/ai/stream/[id] Next.js route proxies the AppEngine SSE so the browser doesn't hit AppEngine directly with a bearer token.
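
    The /api/ai/start half isn't shown elsewhere in this tutorial; a minimal sketch:

    // src/app/api/ai/start/route.ts -- sketch of the route the hook posts to
    import { NextRequest, NextResponse } from 'next/server';
    import { startAiStream } from '@/app/actions/ai-stream';

    export async function POST(req: NextRequest) {
      const { messages, model } = await req.json();
      const streamId = await startAiStream(messages, model);
      return NextResponse.json({ streamId });
    }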

  3. Build the proxy route

    // src/app/api/ai/stream/[id]/route.ts
    import { NextRequest } from 'next/server';
    
    export async function GET(req: NextRequest, { params }: { params: Promise<{ id: string }> }) {
      const { id } = await params;
      const upstream = await fetch(
        `${process.env.APPENGINE_URL}/ai/agent/stream/${id}`,
        {
          headers: {
            Authorization: `Bearer ${process.env.APPENGINE_BEARER!}`,
            Accept: 'text/event-stream',
          },
        }
      );
    
      return new Response(upstream.body, {
        headers: {
          'Content-Type': 'text/event-stream',
          'Cache-Control': 'no-cache, no-transform',
          Connection: 'keep-alive',
        },
      });
    }
    

    Pipes the upstream body verbatim. No buffering — chunks reach the browser as fast as the model produces them.
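
    One hardening step worth adding: if the streamId is stale or the upstream request was rejected, the body will be an error payload rather than SSE. A sketch that propagates non-OK statuses instead of masking them:

    // in the same GET handler, right after the upstream fetch
    if (!upstream.ok || !upstream.body) {
      // surfaces auth (401), quota (402), and unknown-stream errors to the caller
      return new Response(await upstream.text(), { status: upstream.status });
    }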

  4. Render the stream

    The hook accumulates content. Render with whitespace-pre-wrap (so newlines look right) or pipe through a markdown renderer for richer output.

    // src/components/AiChat.tsx
    'use client';
    import { useState } from 'react';
    import { useAiStream } from '@/hooks/use-ai-stream';
    import type { Message } from '@/app/actions/ai-stream';
    // Your own message-bubble component; named ChatMessage here so it
    // doesn't collide with the Message type
    import { ChatMessage } from '@/components/ChatMessage';

    export function AiChat() {
      const [history, setHistory] = useState<Message[]>([]);
      const [input, setInput] = useState('');
      const { content, loading, usage, send } = useAiStream();

      function submit() {
        if (!input.trim()) return; // ignore empty submissions
        const next = [...history, { role: 'user' as const, content: input }];
        setHistory(next);
        setInput('');
        send(next, 'claude-3-5-sonnet-20241022');
      }

      return (
        <div className="flex flex-col gap-4">
          {history.map((m, i) => (
            <ChatMessage key={i} role={m.role} content={m.content} />
          ))}
          {loading && <ChatMessage role="assistant" content={content} streaming />}
          <textarea
            value={input}
            onChange={(e) => setInput(e.target.value)}
            className="w-full rounded border p-2"
          />
          <button onClick={submit} disabled={loading} className="rounded bg-blue-600 px-4 py-2 text-white">
            Send
          </button>
          {usage && (
            <p className="text-xs text-gray-500">
              {usage.prompt_tokens} in / {usage.completion_tokens} out tokens
            </p>
          )}
        </div>
      );
    }
    

    When the stream finishes, append the assistant message to history (add useEffect to the react import; the hook resets the streaming buffer on the next send):

    useEffect(() => {
      if (!loading && content) {
        setHistory((h) => [...h, { role: 'assistant', content }]);
      }
      // deps intentionally omit `content`: commit the buffer only when
      // `loading` flips to false, not on every streamed chunk
    }, [loading]); // eslint-disable-line react-hooks/exhaustive-deps
    
  5. Pick a model

    The agent endpoint supports models from any configured provider. Common picks:

    Model                          Best for
    claude-3-5-sonnet-20241022     General reasoning, coding, long context
    claude-3-5-haiku-20241022      Fast, cheap, summarization, classification
    gpt-4-turbo-preview            Function calling, structured tasks
    gpt-3.5-turbo                  Cheap, fast, simple tasks
    deepseek-reasoner              Math, logic, transparent chain-of-thought
    gemini-2.0-flash               Multimodal, fast, vision

    GET /ai/agent/models returns the list with context_window, supports_images, supports_reasoning flags. Render a model picker off this rather than hardcoding.
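
    A server action that loads that list for a picker, as a sketch; the ModelInfo field names are taken from the flags above, and the rest of the response shape is assumed:

    // src/app/actions/ai-models.ts -- sketch; response shape partly assumed
    'use server';
    import { getAppEngineClient } from '@/lib/appmint-client';

    export type ModelInfo = {
      id: string;
      context_window: number;
      supports_images: boolean;
      supports_reasoning: boolean;
    };

    export async function listModels(): Promise<ModelInfo[]> {
      const client = getAppEngineClient();
      return client.processRequest('get', 'ai/agent/models');
    }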

  6. Quotas and metering

    Each org has an AI usage budget (set in Settings → Plan). The usage module charges 5 units per /ai/agent/stream call plus a per-token rate. When the budget runs out, AppEngine returns 402 from the start endpoint.
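
    Catch that in the server action so the UI can show something useful; a sketch, assuming your client helper surfaces the HTTP status in the error it throws:

    // inside startAiStream, wrapping the processRequest call from step 1
    try {
      const { streamId } = await client.processRequest('post', 'ai/agent/stream', {
        messages,
        model,
      });
      return streamId;
    } catch (err) {
      if (err instanceof Error && err.message.includes('402')) {
        throw new Error('AI budget exhausted for this billing period');
      }
      throw err;
    }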

    To check remaining budget:

    const usage = await client.processRequest('get', 'usage/current');
    // { aiTokensUsed: 1234567, aiTokensLimit: 5000000, billingPeriod: '2026-04' }
    

    For an admin view, expose this on a settings page. For end-customers, hide it — but consider rate-limiting per-customer in your own code so one user can't burn the whole org budget.
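
    A minimal per-user limiter as a sketch; it is in-memory, so counts reset per server instance, and a shared store is needed if you run several:

    // src/lib/ai-rate-limit.ts -- illustrative sketch, not an AppEngine API
    const WINDOW_MS = 60_000; // one-minute sliding window
    const MAX_CALLS = 10;     // per user per window
    const calls = new Map<string, number[]>();

    export function allowAiCall(userId: string): boolean {
      const now = Date.now();
      const recent = (calls.get(userId) ?? []).filter((t) => now - t < WINDOW_MS);
      if (recent.length >= MAX_CALLS) return false;
      recent.push(now);
      calls.set(userId, recent);
      return true;
    }

    Call allowAiCall() at the top of startAiStream() and refuse to start a stream when it returns false.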

  7. Vision and multimodal

    Models that support images take messages with image content blocks:

    const messages = [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'What is in this image?' },
          { type: 'image_url', image_url: { url: 'https://...' } },
        ],
      },
    ];
    

    The endpoint returns 400 if the model doesn't support images. Check supports_images from /ai/agent/models before rendering an image upload affordance.
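
    A sketch of that gate, reusing the listModels() sketch from step 5 (modelId stands for whatever your picker currently holds):

    const models = await listModels();
    const selected = models.find((m) => m.id === modelId);
    // only render the upload affordance when the model accepts images
    const showImageUpload = selected?.supports_images ?? false;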

Vendor LLM fallback

Sometimes you need a feature only the vendor exposes — OpenAI's structured outputs, Anthropic's prompt caching headers, Gemini's grounding. Drop down to the vendor SDK in a server route. Yugo does this for Gemini in src/app/api/ai/route.ts.

// src/app/api/gemini/route.ts
import { GoogleGenerativeAI } from '@google/generative-ai';

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const genai = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
  const model = genai.getGenerativeModel({ model: 'gemini-2.0-flash' });
  const result = await model.generateContentStream(prompt);

  const encoder = new TextEncoder(); // reuse one encoder across chunks
  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of result.stream) {
        controller.enqueue(encoder.encode(chunk.text()));
      }
      controller.close();
    },
  });

  return new Response(stream, { headers: { 'Content-Type': 'text/plain; charset=utf-8' } });
}

You lose the metering. Add your own counter (write to data/ai-usage) and enforce a per-org budget in the route. Or use the agent endpoint for normal traffic and reserve the direct path for the one feature that needs it.
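
A sketch of that counter: the Gemini SDK reports token counts via usageMetadata on the final response, which resolves once the stream has drained, so record it inside start() after the for-await loop. The orgId and the data/ai-usage record shape are illustrative:

// inside start(), after the for-await loop and before controller.close()
const final = await result.response; // resolves once streaming is done
const meta = final.usageMetadata;
await getAppEngineClient().processRequest('post', 'data/ai-usage', {
  org: orgId, // however this route resolves the calling org
  model: 'gemini-2.0-flash',
  promptTokens: meta?.promptTokenCount ?? 0,
  outputTokens: meta?.candidatesTokenCount ?? 0,
  at: new Date().toISOString(),
});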

API keys, server side only

Never put OPENAI_API_KEY or GEMINI_API_KEY in NEXT_PUBLIC_*. The agent endpoint exists so client code never sees them — keep it that way.

What's next

  • Vibe Studio — AppEngine's site-builder agent that uses these endpoints behind the scenes.
  • Embed the chat widget — drop in a pre-built AI chat UI instead of rolling your own.
  • Activity tracking — log AI interactions to spot abuse patterns.