agent-zero's design is small once you see the moving parts. A turn is: build context → stream from the provider → execute tool calls → loop. Everything else — memory, skills, sub-agents, compaction — feeds into one of those steps. This page traces the path through a-mini's source because it's the easiest to read; the same shape lives in a-full.
## The loop

`a-mini/agent.py` defines `run(user_message, state, config, system_prompt, depth=0, cancel_check=None)`. It's a generator that yields events the UI can render — text chunks, thinking chunks, tool starts, tool ends, turn-done markers, permission requests.
```python
def run(user_message, state, config, system_prompt, depth=0, cancel_check=None):
    state.messages.append({"role": "user", "content": user_message})
    config = {**config, "_depth": depth, "_system_prompt": system_prompt}

    while True:
        if cancel_check and cancel_check():
            return
        state.turn_count += 1
        assistant_turn = None
        maybe_compact(state, config)

        for event in stream(model=config["model"],
                            system=system_prompt,
                            messages=state.messages,
                            tool_schemas=get_tool_schemas(),
                            config=config):
            if isinstance(event, (TextChunk, ThinkingChunk)):
                yield event
            elif isinstance(event, AssistantTurn):
                assistant_turn = event

        if assistant_turn is None:
            break

        state.messages.append({
            "role": "assistant",
            "content": assistant_turn.text,
            "tool_calls": assistant_turn.tool_calls,
        })
        yield TurnDone(assistant_turn.in_tokens, assistant_turn.out_tokens)

        if not assistant_turn.tool_calls:
            break

        # ── Execute tools ──
        for tc in assistant_turn.tool_calls:
            yield ToolStart(tc["name"], tc["input"])
            result = execute_tool(tc["name"], tc["input"], config)
            yield ToolEnd(tc["name"], result)
            state.messages.append({
                "role": "tool",
                "tool_call_id": tc["id"],
                "content": result,
            })
        # then loop again so the model can read tool results
```
The shape: stream → if tool calls → execute → append results → stream again. Stop when the model produces a turn with no tool calls.
## State

`AgentState` is a small dataclass:
```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    messages: list = field(default_factory=list)
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    turn_count: int = 0
```
Messages use a neutral, provider-independent format: `{role, content, tool_calls?, tool_call_id?}`. Provider adapters in `providers.py` map this to and from each LLM's specific API shape.
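For a concrete picture, here is what a short exchange might look like in that neutral format (the tool name and field values here are invented for illustration):

```python
# Hypothetical two-turn exchange in the neutral message format.
messages = [
    {"role": "user", "content": "What does config.py contain?"},
    {"role": "assistant", "content": "Let me check.", "tool_calls": [
        {"id": "call_1", "name": "Read", "input": {"path": "config.py"}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "permission_mode = 'auto'\n..."},
    {"role": "assistant", "content": "It holds the permission mode and model settings."},
]
```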
## Context

`context.py` builds the system prompt at startup and on demand. It assembles:
- Base system prompt (the agent's own instructions).
- The user's `CLAUDE.md` (if present, walked up from cwd or `~/.claude/`).
- Persistent memory (the `MEMORY.md` index from the memory package).
- Git status and current working directory.
- Optional skill listings.
The result is a single string passed as `system` to the provider on every turn. It changes between turns only when memory or files change — there's no per-turn rebuilding of expensive context.
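A minimal sketch of that assembly, assuming a hypothetical `build_system_prompt` helper (paths and ordering are inferred from the list above, not read from `context.py`):

```python
import os
import subprocess

def build_system_prompt(base_prompt: str) -> str:
    """Concatenate the context pieces into one system string."""
    parts = [base_prompt]

    # Project instructions (the real code walks up from cwd).
    for candidate in (os.path.join(os.getcwd(), "CLAUDE.md"),
                      os.path.expanduser("~/.claude/CLAUDE.md")):
        if os.path.exists(candidate):
            parts.append(open(candidate).read())
            break

    # Persistent memory: the MEMORY.md index from the memory package.
    index = os.path.expanduser("~/.nano_claude/memory/MEMORY.md")
    if os.path.exists(index):
        parts.append(open(index).read())

    # Ambient state: cwd and git status.
    parts.append(f"cwd: {os.getcwd()}")
    git = subprocess.run(["git", "status", "--short"],
                         capture_output=True, text=True)
    if git.returncode == 0:
        parts.append("git status:\n" + git.stdout)

    return "\n\n".join(parts)
```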
## Provider mux

`providers.py` exports `stream(model, system, messages, tool_schemas, config)`. It auto-detects the provider from the model name prefix:
| Prefix | Provider |
|---|---|
| `claude-` | Anthropic |
| `gpt-`, `o1`, `o3` | OpenAI |
| `gemini-` | Gemini |
| `moonshot-`, `kimi-` | Moonshot (Kimi) |
| `qwen`, `qwq-` | Alibaba (Qwen) |
| `glm-` | Zhipu |
| `deepseek-` | DeepSeek |
| `llama`, `mistral`, `phi`, `gemma`, `mixtral`, `codellama` | Ollama |
| explicit `<provider>/<model>` | force provider |
OpenAI-compatible servers (Ollama, LM Studio, vLLM, custom) work via the same interface — provide a base URL and key.
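Prefix detection can be a first-match scan over that table. A sketch, with `detect_provider` and `PREFIX_TABLE` as illustrative names (the real mapping lives in `providers.py`):

```python
PREFIX_TABLE = [
    (("claude-",), "anthropic"),
    (("gpt-", "o1", "o3"), "openai"),
    (("gemini-",), "gemini"),
    (("moonshot-", "kimi-"), "moonshot"),
    (("qwen", "qwq-"), "alibaba"),
    (("glm-",), "zhipu"),
    (("deepseek-",), "deepseek"),
    (("llama", "mistral", "phi", "gemma", "mixtral", "codellama"), "ollama"),
]

def detect_provider(model: str) -> str:
    if "/" in model:                    # explicit <provider>/<model> wins
        return model.split("/", 1)[0]
    for prefixes, provider in PREFIX_TABLE:
        if model.startswith(prefixes):  # str.startswith accepts a tuple
            return provider
    raise ValueError(f"no provider matches model name: {model}")
```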
a-full has its own provider layer; the same mux concept, different code.
## Tool registry

`tool_registry.py` is a global dict mapping name → `ToolDef`. A `ToolDef` is:
```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class ToolDef:
    name: str
    schema: Dict[str, Any]             # JSON schema sent to the model
    func: Callable[[dict, dict], str]  # (params, config) -> result string
    read_only: bool = False
    concurrent_safe: bool = False
```
`get_tool_schemas()` returns the schema list for the API call. `execute_tool(name, params, config)` dispatches with output truncation (default 32k chars) so a runaway tool doesn't blow the context window.

`tools.py` registers the 18 built-in tools on import. Custom tools register from any module imported by the entry point. See Tools.
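A registration sketch, assuming the registry exposes a `register()` entry point (the actual name is whatever `tool_registry.py` exports):

```python
from tool_registry import ToolDef, register  # assumed import path

def line_count(params: dict, config: dict) -> str:
    """Return the number of lines in a file as a string result."""
    with open(params["path"]) as f:
        return str(sum(1 for _ in f))

register(ToolDef(
    name="LineCount",
    schema={
        "type": "object",
        "properties": {"path": {"type": "string", "description": "file to count"}},
        "required": ["path"],
    },
    func=line_count,
    read_only=True,         # runs without a permission prompt
    concurrent_safe=True,   # no shared state, safe to run in parallel
))
```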
## Compaction

`compaction.py` runs at the start of each turn (`maybe_compact(state, config)`). Two layers:
- Snip — old tool outputs (file reads, bash results) get truncated to a header + tail after a few turns. Cheap, no API cost.
- Auto-compact — when `total_input_tokens` crosses ~70% of the model's context limit, the model itself summarises older messages into a recap. The recap replaces the original messages in `state`.
Compaction is transparent — the calling code doesn't know it happened. The result: long sessions don't fall over when the conversation grows.
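In outline, with illustrative constants and a stub standing in for the summarisation model call (the real logic is in `compaction.py`):

```python
SNIP_KEEP_HEAD, SNIP_KEEP_TAIL = 500, 200  # illustrative sizes
COMPACT_AT = 0.70                          # ~70% of the context limit

def summarise(messages, config) -> str:
    # Stub: the real auto-compact makes a model call to write the recap.
    return f"[recap of {len(messages)} earlier messages]"

def maybe_compact(state, config):
    # Layer 1: snip — truncate stale tool outputs to a header + tail.
    for msg in state.messages[:-6]:
        if msg["role"] == "tool" and len(msg["content"]) > SNIP_KEEP_HEAD + SNIP_KEEP_TAIL:
            msg["content"] = (msg["content"][:SNIP_KEEP_HEAD]
                              + "\n...[snipped]...\n"
                              + msg["content"][-SNIP_KEEP_TAIL:])

    # Layer 2: replace older messages with a recap near the limit.
    # "context_limit" is an assumed config key for this sketch.
    if state.total_input_tokens > COMPACT_AT * config["context_limit"]:
        old, recent = state.messages[:-6], state.messages[-6:]
        state.messages = [{"role": "user", "content": summarise(old, config)}] + recent
```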
## Skills

`skill/loader.py` parses `~/.nano_claude/skills/*.md` and `./.nano_claude/skills/*.md` into `SkillDef` objects. `skill/executor.py` runs them either inline (in the current conversation) or forked (as a sub-agent with fresh history). Built-in `/commit` and `/review` ship in `skill/builtin.py`. See Skills.
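In terms of the loop above, the two modes differ only in which state the skill's prompt lands in. A sketch, assuming `SkillDef` carries `mode`, `prompt`, and `system_prompt` fields (the actual field names are in `skill/loader.py`):

```python
def run_skill(skill, state, config, system_prompt, depth):
    if skill.mode == "inline":
        # Inline: continue the current conversation with the skill's prompt.
        yield from run(skill.prompt, state, config, system_prompt, depth=depth)
    else:
        # Forked: fresh history, so the skill can't pollute the session.
        fork_state = AgentState()
        yield from run(skill.prompt, fork_state, config,
                       skill.system_prompt, depth=depth + 1)
```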
## Multi-agent

`multi_agent/subagent.py` defines `AgentDefinition` (name, description, system prompt, model, allowed tools) and `SubAgentManager`. The `Agent` tool lets the main agent spawn a sub-agent for a specific task. Sub-agents have their own conversation history, share the file system, optionally run in a git worktree, and are limited to 3 levels of nesting. Built-in types: general-purpose, coder, reviewer, researcher, tester. See Multi-agent.
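Reduced to the pieces already on this page, spawning looks roughly like this (a sketch; `spawn` is an illustrative name, and the real orchestration is `SubAgentManager`):

```python
from dataclasses import dataclass, field

@dataclass
class AgentDefinition:          # the fields listed above
    name: str
    description: str
    system_prompt: str
    model: str
    allowed_tools: list = field(default_factory=list)

def spawn(defn, task, config, depth):
    if depth >= 3:              # the 3-level nesting limit
        raise RuntimeError("sub-agent nesting limit reached")
    sub_state = AgentState()    # own conversation history
    sub_config = {**config, "model": defn.model,
                  "allowed_tools": defn.allowed_tools}
    yield from run(task, sub_state, sub_config,
                   defn.system_prompt, depth=depth + 1)
```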
## Memory

`memory/store.py` saves markdown files under `~/.nano_claude/memory/` (user scope) or `./.nano_claude/memory/` (project scope). `memory/scan.py` produces the `MEMORY.md` index that `context.py` injects. Memory tools (`MemorySave`, `MemoryDelete`, `MemorySearch`, `MemoryList`) live in `memory/tools.py`. See Memory.
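What a save might look like through the registry; the parameter names are guesses, only the tool name comes from the text (the actual schema is in `memory/tools.py`):

```python
# Parameter names here are illustrative, not the real MemorySave schema.
execute_tool("MemorySave", {
    "scope": "project",   # "user" → ~/.nano_claude/memory/, "project" → ./.nano_claude/memory/
    "title": "deploy-process",
    "content": "Releases go through `make release`; never push straight to main.",
}, config)
# The store writes a markdown file; memory/scan.py refreshes the MEMORY.md index.
```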
## Permission system

`config.py` carries a `permission_mode`: `auto`, `accept-all`, or `manual`. `auto` runs tools marked `read_only=True` without asking and prompts before mutations; `accept-all` skips all prompts; `manual` prompts before everything. Tools that need approval either prompt the user directly or yield a `PermissionRequest` to the caller.
The CLI (`nano_claude.py`) renders prompts; embedders (Vibe Studio, custom hosts) handle the `PermissionRequest` event themselves.
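The gate reduces to a few lines. A sketch of the decision, with `ask_user` as an illustrative stand-in for the CLI prompt (not the actual `config.py` code):

```python
def allowed(tool: ToolDef, mode: str) -> bool:
    """Decide whether a tool call may run without asking (sketch)."""
    if mode == "accept-all":
        return True                     # skip every prompt
    if mode == "auto" and tool.read_only:
        return True                     # reads are safe, run silently
    # "manual", or a mutating tool under "auto": ask.
    # The CLI prompts inline; embedders get a PermissionRequest event.
    return ask_user(f"Allow {tool.name}?")

def ask_user(question: str) -> bool:
    # Minimal CLI stand-in; hosts replace this with their own UI.
    return input(f"{question} [y/N] ").strip().lower() == "y"
```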
## Cancellation

The `cancel_check` callable is checked at the start of each loop iteration. The Vibe Studio chat panel passes a check that returns true when the user clicks Stop. The loop returns mid-turn cleanly — no half-applied tool results, and the partial assistant message is preserved.
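Hooking this up from a host is just a callable. For example, with a `threading.Event` (illustrative; a real host renders events instead of printing them):

```python
import threading

stop = threading.Event()   # the UI calls stop.set() when the user clicks Stop

for event in run("tidy up utils.py", state, config, system_prompt,
                 cancel_check=stop.is_set):
    print(event)           # a real host renders these instead
```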
## Sessions

`nano_claude.py` (the CLI) saves sessions to `~/.nano_claude/sessions/` as JSON: `state.messages`, token counts, and a config snapshot. Resume via `/load <name>`. a-full has its own session storage in session-manager (`conversationService`).
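A sketch of the save/load round trip under those paths (the exact JSON layout is the CLI's business and may differ):

```python
import json
import os
from dataclasses import asdict

SESSIONS = os.path.expanduser("~/.nano_claude/sessions")

def save_session(name: str, state: AgentState, config: dict) -> None:
    os.makedirs(SESSIONS, exist_ok=True)
    with open(os.path.join(SESSIONS, f"{name}.json"), "w") as f:
        json.dump({"state": asdict(state), "config": config}, f, indent=2)

def load_session(name: str) -> AgentState:
    with open(os.path.join(SESSIONS, f"{name}.json")) as f:
        data = json.load(f)
    return AgentState(**data["state"])
```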
## How it ties together
A typical Vibe Studio turn:
- The builder types in the chat panel.
- session-manager forwards the message over the Socket.IO bridge to the running a-full process.
- a-full builds context (`CLAUDE.md`, memory, planner cards, file tree).
- a-full streams from Anthropic with the conversation history and tool schemas.
- Tool calls execute (reads from disk, writes to disk, AppMint API calls, planner updates).
- Each event flows back over the bridge to Vibe Studio.
- Studio re-renders the editor, the preview, and the chat panel.
The same flow works in a-mini for a CLI session — fewer renderers, same loop.
## Reading on
- Tools — register a custom tool.
- Skills — markdown-driven skills.
- Memory — long-running context.
- Multi-agent — sub-agent composition.