Documentation

Architecture

The agent loop, context, compaction, tool registry, multi-agent — how a turn actually runs.

agent-zero's design is small once you see the moving parts. A turn is: build context → stream from the provider → execute tool calls → loop. Everything else — memory, skills, sub-agents, compaction — feeds into one of those steps. This page traces the path through a-mini's source because it's the easiest to read; the same shape lives in a-full.

The loop

a-mini/agent.py defines run(user_message, state, config, system_prompt, depth=0, cancel_check=None). It's a generator that yields events the UI can render — text chunks, thinking chunks, tool starts, tool ends, turn-done markers, permission requests.

def run(user_message, state, config, system_prompt, depth=0, cancel_check=None):
    state.messages.append({"role": "user", "content": user_message})
    config = {**config, "_depth": depth, "_system_prompt": system_prompt}

    while True:
        if cancel_check and cancel_check():
            return
        state.turn_count += 1
        assistant_turn = None

        maybe_compact(state, config)

        for event in stream(model=config["model"],
                            system=system_prompt,
                            messages=state.messages,
                            tool_schemas=get_tool_schemas(),
                            config=config):
            if isinstance(event, (TextChunk, ThinkingChunk)):
                yield event
            elif isinstance(event, AssistantTurn):
                assistant_turn = event

        if assistant_turn is None:
            break

        state.messages.append({
            "role": "assistant",
            "content": assistant_turn.text,
            "tool_calls": assistant_turn.tool_calls,
        })
        yield TurnDone(assistant_turn.in_tokens, assistant_turn.out_tokens)

        if not assistant_turn.tool_calls:
            break

        # ── Execute tools ──
        for tc in assistant_turn.tool_calls:
            yield ToolStart(tc["name"], tc["input"])
            result = execute_tool(tc["name"], tc["input"], config)
            yield ToolEnd(tc["name"], result)
            state.messages.append({
                "role": "tool",
                "tool_call_id": tc["id"],
                "content": result,
            })
        # then loop again so the model can read tool results

The shape: stream → if tool calls → execute → append results → stream again. Stop when the model produces a turn with no tool calls.

State

AgentState is a small dataclass:

@dataclass
class AgentState:
    messages: list = field(default_factory=list)
    total_input_tokens:  int = 0
    total_output_tokens: int = 0
    turn_count: int = 0

Messages use a neutral provider-independent format: {role, content, tool_calls?, tool_call_id?}. Provider adapters in providers.py map this to and from each LLM's specific API shape.
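For example, state.messages after one tool-using exchange might look like this (the roles and keys match the loop above; the tool name, ids, and text are illustrative):

messages = [
    {"role": "user", "content": "What does agent.py do?"},
    {"role": "assistant",
     "content": "Let me read it first.",
     "tool_calls": [{"id": "tc_1", "name": "Read", "input": {"path": "agent.py"}}]},
    {"role": "tool", "tool_call_id": "tc_1", "content": "def run(user_message, state, ...)"},
    {"role": "assistant", "content": "agent.py defines the main agent loop.", "tool_calls": []},
]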

Context

context.py builds the system prompt at startup and on demand. It assembles:

  • Base system prompt (the agent's own instructions).
  • The user's CLAUDE.md (if present, walked up from cwd or ~/.claude/).
  • Persistent memory (the MEMORY.md index from the memory package).
  • Git status and current working directory.
  • Optional skill listings.

The result is a single string passed as system to the provider on every turn. It changes between turns only when memory or files change — there's no per-turn rebuilding of expensive context.
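A minimal sketch of that assembly, with the helper names as assumptions (the real code is in context.py):

def build_system_prompt(config):
    # Static instructions first, then user conventions, then dynamic state.
    parts = [
        BASE_SYSTEM_PROMPT,          # the agent's own instructions
        read_claude_md(),            # CLAUDE.md, walked up from cwd, or ~/.claude/
        read_memory_index(),         # the MEMORY.md index from the memory package
        environment_summary(),       # git status + current working directory
    ]
    if config.get("skills"):
        parts.append(list_skills())  # optional skill listings
    return "\n\n".join(p for p in parts if p)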

Provider mux

providers.py exports stream(model, system, messages, tool_schemas, config) and auto-detects the provider from the model name prefix:

  • claude- → Anthropic
  • gpt-, o1, o3 → OpenAI
  • gemini- → Gemini
  • moonshot-, kimi- → Moonshot (Kimi)
  • qwen, qwq- → Alibaba (Qwen)
  • glm- → Zhipu
  • deepseek- → DeepSeek
  • llama, mistral, phi, gemma, mixtral, codellama → Ollama
  • explicit <provider>/<model> → forces that provider

OpenAI-compatible servers (Ollama, LM Studio, vLLM, custom) work via the same interface — provide a base URL and key.
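A sketch of the prefix dispatch; the function name and provider identifiers are assumptions, and the real table lives in providers.py:

def detect_provider(model: str) -> str:
    if "/" in model:                                   # explicit <provider>/<model> wins
        return model.split("/", 1)[0]
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith(("gpt-", "o1", "o3")):
        return "openai"
    if model.startswith("gemini-"):
        return "gemini"
    if model.startswith(("moonshot-", "kimi-")):
        return "moonshot"
    if model.startswith(("qwen", "qwq-")):
        return "alibaba"
    if model.startswith("glm-"):
        return "zhipu"
    if model.startswith("deepseek-"):
        return "deepseek"
    if model.startswith(("llama", "mistral", "phi", "gemma", "mixtral", "codellama")):
        return "ollama"
    return "openai-compatible"                         # fall back to a configured base URL + key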

a-full has its own provider layer; the same mux concept, different code.

Tool registry

tool_registry.py is a global dict mapping name → ToolDef. A ToolDef is:

@dataclass
class ToolDef:
    name: str
    schema: Dict[str, Any]              # JSON schema sent to the model
    func: Callable[[dict, dict], str]   # (params, config) -> result string
    read_only: bool = False
    concurrent_safe: bool = False

get_tool_schemas() returns the schema list for the API call. execute_tool(name, params, config) dispatches with output truncation (default 32k chars) so a runaway tool doesn't blow the context window.

tools.py registers the 18 built-in tools on import. Custom tools register from any module imported by the entry point. See Tools.
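A minimal sketch of a custom registration, assuming a register() helper (the registry itself is a plain dict; the Tools page documents the real API):

from tool_registry import ToolDef, register   # register() is assumed here

def word_count(params: dict, config: dict) -> str:
    # (params, config) -> result string, per the ToolDef contract above
    with open(params["path"]) as f:
        return str(len(f.read().split()))

register(ToolDef(
    name="WordCount",
    schema={
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
    func=word_count,
    read_only=True,           # runs without prompting in auto mode
    concurrent_safe=True,
))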

Compaction

compaction.py runs at the start of each turn (maybe_compact(state, config)). Two layers:

  • Snip — old tool outputs (file reads, bash results) get truncated to a header + tail after a few turns. Cheap, no API cost.
  • Auto-compact — when total_input_tokens crosses ~70% of the model's context limit, the model itself summarises older messages into a recap. The recap replaces the original messages in state.

Compaction is transparent — the calling code doesn't know it happened. The result: long sessions don't fall over when the conversation grows.
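A sketch of the two layers; the constants, the recap placement, and the helper names are assumptions (compaction.py holds the real logic):

KEEP_RECENT = 8        # most recent messages left untouched (value assumed)
SNIP_LIMIT = 2_000     # characters before an old tool output gets snipped (value assumed)

def maybe_compact(state, config):
    # Layer 1: snip. Old tool outputs shrink to a head + tail. No API call.
    for msg in state.messages[:-KEEP_RECENT]:
        if msg["role"] == "tool" and len(msg["content"]) > SNIP_LIMIT:
            msg["content"] = msg["content"][:500] + "\n[snipped]\n" + msg["content"][-500:]

    # Layer 2: auto-compact. Near ~70% of the context limit, have the model write a
    # recap of the older messages and replace them with it.
    if state.total_input_tokens > 0.7 * context_limit(config["model"]):
        recap = summarise_with_model(state.messages[:-KEEP_RECENT], config)
        state.messages = [{"role": "user", "content": recap}] + state.messages[-KEEP_RECENT:]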

Skills

skill/loader.py parses ~/.nano_claude/skills/*.md and ./.nano_claude/skills/*.md into SkillDef objects. skill/executor.py runs them either inline (in the current conversation) or forked (as a sub-agent with fresh history). Built-in /commit and /review ship in skill/builtin.py. See Skills.

Multi-agent

multi_agent/subagent.py defines AgentDefinition (name, description, system prompt, model, allowed tools) and SubAgentManager. The Agent tool lets the main agent spawn a sub-agent for a specific task. Sub-agents have their own conversation history, share the file system, optionally run in a git worktree, and are limited to 3 levels of nesting. Built-in types: general-purpose, coder, reviewer, researcher, tester. See Multi-agent.
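A sketch of one definition; the attribute names are assumptions based on the description above:

from multi_agent.subagent import AgentDefinition

reviewer = AgentDefinition(
    name="reviewer",
    description="Reads a diff and reports problems without editing files.",
    system_prompt="You are a code reviewer. Inspect the changes; do not modify anything.",
    model="claude-sonnet-4-5",          # any model the provider mux recognises
    allowed_tools=["Read", "Grep"],     # tool names are illustrative
)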

Memory

memory/store.py saves markdown files under ~/.nano_claude/memory/ (user scope) or ./.nano_claude/memory/ (project scope). memory/scan.py produces the MEMORY.md index that context.py injects. Memory tools (MemorySave, MemoryDelete, MemorySearch, MemoryList) live in memory/tools.py. See Memory.

Permission system

config.py carries a permission_mode: auto, accept-all, or manual. In auto, tools marked read_only=True run without prompting and everything else prompts the user (or yields a PermissionRequest to the caller); accept-all skips all prompts; manual prompts before everything.

The CLI (nano_claude.py) renders prompts; embedders (Vibe Studio, custom hosts) handle the PermissionRequest event themselves.
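A minimal sketch of the decision, assuming a hypothetical helper name; the prompt itself is either rendered by the CLI or forwarded as a PermissionRequest event:

def needs_permission(tool: ToolDef, mode: str) -> bool:
    # mode comes from config.py's permission_mode
    if mode == "accept-all":
        return False                  # never prompt
    if mode == "manual":
        return True                   # prompt before everything
    return not tool.read_only         # auto: read-only runs freely, mutations prompt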

Cancellation

The cancel_check callable is checked at the start of each iteration. The Vibe Studio chat panel passes a check that returns true when the user clicks Stop. The loop returns mid-turn cleanly — no half-applied tool results, but the partial assistant message is preserved.
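A sketch of how a host might wire cancellation, using the run() signature above; render() is a stand-in for the host's UI update:

import threading

stop = threading.Event()                      # the host sets this when the user clicks Stop

for event in run(user_message,
                 state=state,
                 config=config,
                 system_prompt=system_prompt,
                 cancel_check=stop.is_set):   # checked at the start of each iteration
    render(event)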

Sessions

nano_claude.py (the CLI) saves sessions to ~/.nano_claude/sessions/ as JSON: state.messages, token counts, and a config snapshot. Resume via /load <name>. a-full has its own session storage in session-manager (conversationService).
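A sketch of what a saved session might contain; the field names and file name are assumptions, and the real schema is whatever nano_claude.py writes:

import json
from pathlib import Path

session = {
    "messages": state.messages,
    "total_input_tokens": state.total_input_tokens,
    "total_output_tokens": state.total_output_tokens,
    "config": {k: v for k, v in config.items() if not k.startswith("_")},  # drop loop-internal keys
}
path = Path.home() / ".nano_claude" / "sessions" / "refactor-auth.json"
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(session, indent=2))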

How it ties together

A typical Vibe Studio turn:

  1. The builder types a message in the chat panel.
  2. session-manager forwards it over the Socket.IO bridge to the running a-full process.
  3. a-full builds context (CLAUDE.md, memory, planner cards, file tree).
  4. a-full streams from Anthropic with the conversation history and tool schemas.
  5. Tool calls execute (reads from disk, writes to disk, AppMint API calls, planner updates).
  6. Each event flows back over the bridge to Vibe Studio.
  7. Studio re-renders the editor, the preview, and the chat panel.

The same flow works in a-mini for a CLI session — fewer renderers, same loop.

Reading on

  • Tools — register a custom tool.
  • Skills — markdown-driven skills.
  • Memory — long-running context.
  • Multi-agent — sub-agent composition.