A More Useful Classification Framework
Common ways to classify agent architectures are “personal vs commercial,” “lightweight vs heavyweight,” and “minimal vs complex.” These are intuitive, but they offer little explanatory power for concrete design decisions.
A more discriminative dimension is: what kind of result guarantee the scenario requires.
- Exploration-tolerant scenarios: user is present, willing to interact, and can correct direction at any time. Uncertainty is acceptable, sometimes even valuable. Example: developer-driven coding exploration and iterative debugging.
- Deterministic-convergence scenarios: result must converge quickly and stably to predictable output. Uncertainty is cost. Example: CI/CD auto-fix, customer-facing product functions, bulk migration pipelines.
These two scenario types require completely different choices for action space, feedback mechanisms, and stopping conditions.
This article uses pi-mono as a source-based case study and compares it with engineering choices from OpenAI Agents SDK, Kimi-CLI, and Codex.
What pi-mono Actually Does — Source-grounded Analysis
Many online articles wrap pi-mono in metaphor-heavy “philosophy.” Here we skip metaphors and read the code.
Core loop is standard ReAct
The agent loop is in packages/agent/src/agent-loop.ts, function runLoop:
async function runLoop(currentContext, newMessages, config, signal, stream, streamFn) {
let pendingMessages = (await config.getSteeringMessages?.()) || [];
// Outer loop: process follow-up messages
while (true) {
let hasMoreToolCalls = true;
let steeringAfterTools = null;
// Inner loop: process tool calls and steering
while (hasMoreToolCalls || pendingMessages.length > 0) {
// 1. Inject pending messages into context
if (pendingMessages.length > 0) {
for (const message of pendingMessages) {
currentContext.messages.push(message);
newMessages.push(message);
}
pendingMessages = [];
}
// 2. Call LLM
const message = await streamAssistantResponse(currentContext, config, signal, stream, streamFn);
// 3. Check toolCalls
const toolCalls = message.content.filter((c) => c.type === "toolCall");
hasMoreToolCalls = toolCalls.length > 0;
// 4. Execute tools + steering check
if (hasMoreToolCalls) {
const toolExecution = await executeToolCalls(/* ... */);
steeringAfterTools = toolExecution.steeringMessages ?? null;
// write results back to context
for (const result of toolExecution.toolResults) {
currentContext.messages.push(result);
}
}
// 5. Fetch new steering messages
pendingMessages = steeringAfterTools || (await config.getSteeringMessages?.()) || [];
}
// Agent is about to stop -> check follow-up
const followUpMessages = (await config.getFollowUpMessages?.()) || [];
if (followUpMessages.length > 0) {
pendingMessages = followUpMessages;
continue; // continue outer loop
}
break; // exit
}
}
This is ReAct (Reasoning + Acting): generate reasoning/tool calls, execute tools, write tool results back, reason again from updated context, as introduced in Yao et al., 2022.
streamAssistantResponse is the boundary adapter between internal AgentMessage[] and model-facing Message[]:
async function streamAssistantResponse(context, config, signal, stream, streamFn) {
// 1. AgentMessage[] -> AgentMessage[] context transform (optional)
let messages = context.messages;
if (config.transformContext) {
messages = await config.transformContext(messages, signal);
}
// 2. AgentMessage[] -> Message[] conversion for LLM
const llmMessages = await config.convertToLlm(messages);
// 3. Build LLM context and request
const llmContext = {
systemPrompt: context.systemPrompt,
messages: llmMessages,
tools: context.tools,
};
const response = await streamFn(config.model, llmContext, { ...config, signal });
// ...streaming handling
}
pi-mono keeps a richer internal AgentMessage type system (bashExecution, branchSummary, compactionSummary, etc.) and converts to standard model messages only at the call boundary. Internally rich, externally standard: a good separation.
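To make that boundary concrete, here is a minimal sketch of such a conversion. The types and mappings below are illustrative assumptions, not pi-mono's actual definitions:

```typescript
// Hypothetical internal message kinds, loosely modeled on pi-mono's AgentMessage.
type AgentMessage =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string }
  | { role: "bashExecution"; command: string; output: string }
  | { role: "compactionSummary"; summary: string };

// Standard model-facing message shape.
interface LlmMessage { role: "user" | "assistant"; content: string }

// Collapse rich internal kinds to standard roles only at the call boundary.
function convertToLlm(messages: AgentMessage[]): LlmMessage[] {
  return messages.map((m): LlmMessage => {
    switch (m.role) {
      case "user":
        return { role: "user", content: m.text };
      case "assistant":
        return { role: "assistant", content: m.text };
      case "bashExecution":
        // Tool activity re-enters the transcript as an assistant-side record.
        return { role: "assistant", content: `$ ${m.command}\n${m.output}` };
      case "compactionSummary":
        // Summaries re-enter as user-visible context.
        return { role: "user", content: `[summary]\n${m.summary}` };
    }
  });
}
```

The payoff of this pattern is that internal features (branch summaries, compaction records) can evolve freely without leaking provider-specific message formats into the core loop.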
Toolset: not 4 tools, but 7
Some posts claim pi-mono has only 4 primitives (read/edit/write/bash). Real code in packages/coding-agent/src/core/tools/index.ts:
// Default tools for full access mode (using process.cwd())
export const codingTools: Tool[] = [readTool, bashTool, editTool, writeTool];
// Read-only tools for exploration without modification (using process.cwd())
export const readOnlyTools: Tool[] = [readTool, grepTool, findTool, lsTool];
// All available tools (using process.cwd())
export const allTools = {
read: readTool, bash: bashTool, edit: editTool, write: writeTool,
grep: grepTool, find: findTool, ls: lsTool,
};
There are 7 tools in total. codingTools exposes 4 by default; readOnlyTools includes grep/find/ls.
System prompt generation (system-prompt.ts) also adds dynamic guidelines based on loaded tools. For example, when grep/find/ls and bash coexist:
if (hasBash && (hasGrep || hasFind || hasLs)) {
guidelinesList.push(
"Prefer grep/find/ls tools over bash for file exploration (faster, respects .gitignore)"
);
}
So the “minimal four primitives” framing is selective presentation, not the implementation reality.
bash tool: no execution policy
bash.ts essentially performs direct spawn:
const child = spawn(shell, [...args, command], {
cwd,
detached: true,
env: env ?? getShellEnv(),
stdio: ["ignore", "pipe", "pipe"],
});
Whatever command the model generates is executed. No allow/deny list, no pre-execution review, no sandbox by default. The only control is an optional timeout (in seconds), after which killProcessTree is invoked. Output larger than DEFAULT_MAX_BYTES (~30KB) is tail-truncated, with the full output written to a temp file.
This can be reasonable in user-present terminal workflows: the user watches execution and can interrupt. In unattended scenarios, it means full system-access execution with no automated safety constraints.
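To illustrate what such a constraint could look like, here is a hypothetical pre-execution gate one could place in front of the spawn call. pi-mono ships nothing like this; the lists and patterns are invented for the example:

```typescript
type Decision = "allow" | "prompt" | "forbidden";

// Hypothetical allow/deny lists; pi-mono itself has no such gate.
const ALLOWED_PREFIXES = ["ls", "cat", "grep", "git status", "git diff"];
const FORBIDDEN_PATTERNS = [/\brm\s+-rf\b/, /\bsudo\b/, /\|\s*sh\b/];

// Evaluate a command before spawn: explicit denials win, explicit allowances
// skip review, everything else requires a human decision.
function evaluateCommand(command: string): Decision {
  if (FORBIDDEN_PATTERNS.some((p) => p.test(command))) return "forbidden";
  if (ALLOWED_PREFIXES.some((p) => command === p || command.startsWith(p + " "))) return "allow";
  return "prompt";
}
```

Even a gate this crude changes the failure mode in unattended runs: unknown commands stall for review instead of executing silently.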
Steering: an interruption mechanism in about 10 lines
Steering is often narrated as deep philosophy. In code (executeToolCalls), it is a practical checkpoint after each tool execution:
async function executeToolCalls(tools, assistantMessage, signal, stream, getSteeringMessages) {
const toolCalls = assistantMessage.content.filter((c) => c.type === "toolCall");
const results = [];
let steeringMessages;
for (let index = 0; index < toolCalls.length; index++) {
const toolCall = toolCalls[index];
const tool = tools?.find((t) => t.name === toolCall.name);
// validate args -> execute tool
const validatedArgs = validateToolArguments(tool, toolCall);
const result = await tool.execute(toolCall.id, validatedArgs, signal, onUpdate);
results.push(/* toolResultMessage */);
// key: check steering after each tool execution
if (getSteeringMessages) {
const steering = await getSteeringMessages();
if (steering.length > 0) {
steeringMessages = steering;
// skip all remaining tool calls
const remainingCalls = toolCalls.slice(index + 1);
for (const skipped of remainingCalls) {
results.push(skipToolCall(skipped, stream));
}
break;
}
}
}
return { toolResults: results, steeringMessages };
}
Skipped calls receive isError: true with message "Skipped due to queued user message.".
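A plausible shape for that skip helper, sketched from the behavior described above (field names here are assumptions, not pi-mono's actual types):

```typescript
interface ToolCall { id: string; name: string }

interface ToolResultMessage {
  role: "toolResult";
  toolCallId: string;
  isError: boolean;
  content: string;
}

// Synthesize an error result for a tool call that was never executed, so the
// model still sees a complete call/result pair in context.
function skipToolCall(call: ToolCall): ToolResultMessage {
  return {
    role: "toolResult",
    toolCallId: call.id,
    isError: true,
    content: "Skipped due to queued user message.",
  };
}
```

Keeping the call/result pairing intact matters: most model APIs reject a transcript where a tool call has no matching result.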
This is practical engineering, not mysticism.
Compaction: Goal-first summarization, implementation details
Compaction is triggered near context limit. Core logic is in packages/coding-agent/src/core/compaction/compaction.ts.
Token estimation uses a chars / 4 heuristic, accumulated block by block:
export function estimateTokens(message: AgentMessage): number {
let chars = 0;
switch (message.role) {
case "assistant": {
for (const block of message.content) {
if (block.type === "text") chars += block.text.length;
else if (block.type === "thinking") chars += block.thinking.length;
else if (block.type === "toolCall")
chars += block.name.length + JSON.stringify(block.arguments).length;
}
return Math.ceil(chars / 4);
}
// ... similar handling for other roles
// image estimated as 4800 chars (≈1200 tokens)
}
}
Default compaction settings:
export const DEFAULT_COMPACTION_SETTINGS: CompactionSettings = {
enabled: true,
reserveTokens: 16384, // reserve token space for future turns
keepRecentTokens: 20000, // target recent token amount after compaction
};
shouldCompact condition is contextTokens > contextWindow - reserveTokens.
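Spelled out as code, the trigger is a one-liner over the settings above (a sketch, not the actual source):

```typescript
interface CompactionSettings {
  enabled: boolean;
  reserveTokens: number;    // headroom reserved for future turns
  keepRecentTokens: number; // recent-context target after compaction
}

const DEFAULTS: CompactionSettings = {
  enabled: true,
  reserveTokens: 16384,
  keepRecentTokens: 20000,
};

// Compact once the context no longer leaves reserveTokens of headroom.
function shouldCompact(
  contextTokens: number,
  contextWindow: number,
  s: CompactionSettings = DEFAULTS
): boolean {
  return s.enabled && contextTokens > contextWindow - s.reserveTokens;
}
```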
findCutPoint is not a naive tail trim. It walks backward from newest messages, accumulates token estimates, and after exceeding keepRecentTokens, finds nearest legal cut point. Legal cut roles are user, assistant, custom, bashExecution — never cut at toolResult, because result must remain adjacent to its toolCall.
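The backward walk can be sketched as follows. This is a simplification with stand-in types; the real implementation works over AgentMessage and the chars / 4 estimator:

```typescript
interface Msg {
  role: "user" | "assistant" | "custom" | "bashExecution" | "toolResult";
  tokens: number; // pre-computed estimate for the sketch
}

// Roles where a cut is legal; never cut at toolResult, which must stay
// adjacent to its toolCall.
const LEGAL_CUT_ROLES = new Set<string>(["user", "assistant", "custom", "bashExecution"]);

// Returns the index of the first message to KEEP; everything before it gets
// summarized. Walk backward accumulating tokens, then move forward to the
// nearest legal cut once the budget is exceeded.
function findCutPoint(messages: Msg[], keepRecentTokens: number): number {
  let acc = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    acc += messages[i].tokens;
    if (acc > keepRecentTokens) {
      for (let j = i; j < messages.length; j++) {
        if (LEGAL_CUT_ROLES.has(messages[j].role)) return j;
      }
      return messages.length; // no legal cut found; summarize everything
    }
  }
  return 0; // under budget: keep everything
}
```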
When a cut lands in the middle of a turn, pi-mono generates two summaries in parallel: a full-history summary of everything before the cut, and a split-turn prefix summary (TURN_PREFIX_SUMMARIZATION_PROMPT), then merges them:
if (isSplitTurn && turnPrefixMessages.length > 0) {
const [historyResult, turnPrefixResult] = await Promise.all([
generateSummary(messagesToSummarize, model, ...),
generateTurnPrefixSummary(turnPrefixMessages, model, ...),
]);
summary = `${historyResult}\n\n---\n\n**Turn Context (split turn):**\n\n${turnPrefixResult}`;
}
The summary template is:
## Goal
## Constraints & Preferences
## Progress (Done / In Progress / Blocked)
## Key Decisions
## Next Steps
## Critical Context
Follow-up compaction uses UPDATE_SUMMARIZATION_PROMPT to incrementally update instead of full rewrite. It asks to preserve existing information and update progress transitions. Generation uses reasoning: "high" and maxTokens = reserveTokens * 0.8.
Compaction also tracks file operations. extractFileOpsFromMessage traverses assistant tool calls and extracts paths from read/write/edit, appending this to summary.
What is not done also matters: there is no summary-quality verification pass, no rollback after summary replacement, and no user confirmation checkpoint.
chars / 4 is a rough heuristic: different languages and tokenizers have very different chars-per-token ratios. The code comments acknowledge that the estimate is conservative (it overestimates tokens). pi-mono does include one pragmatic optimization: when the model returns real usage token counts, those are preferred, and the heuristic is applied only to messages added after the last usage checkpoint.
Session tree: append-only branching on JSONL
Sessions are stored as JSONL. Each entry is a SessionEntry with id (8-char hex short UUID) and parentId, forming a tree:
export interface SessionEntryBase {
type: string;
id: string; // 8-char hex from randomUUID().slice(0, 8)
parentId: string | null; // null means root
timestamp: string;
}
SessionManager maintains leafId pointer to current path tail. All append operations are append-only: new entry uses current leafId as parentId, then updates leafId.
Branching is simple pointer rewind:
branch(branchFromId: string): void {
if (!this.byId.has(branchFromId)) {
throw new Error(`Entry ${branchFromId} not found`);
}
this.leafId = branchFromId;
}
New entries after this naturally form a new branch. Old branch data stays in file.
branchWithSummary() adds branch_summary entry for abandoned path context. forkFrom() creates a new session file (new header and session ID) with parentSession linking to source session.
Context reconstruction (buildSessionContext) walks from leafId to root via parentId chain:
// Walk from leaf to root, collecting path
const path: SessionEntry[] = [];
let current: SessionEntry | undefined = leaf;
while (current) {
path.unshift(current);
current = current.parentId ? byId.get(current.parentId) : undefined;
}
This is operationally elegant: append-only file + pointer rewinding simulates branching without DB dependency. Trade-off: file grows as branches accumulate, but usually acceptable for coding-agent lifecycles.
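The whole mechanism is small enough to sketch end to end. The types below are simplified stand-ins (real entries carry timestamps and typed payloads):

```typescript
interface Entry { id: string; parentId: string | null; text: string }

class SessionTree {
  private byId = new Map<string, Entry>();
  private leafId: string | null = null;

  // Append-only: the new entry's parent is the current leaf.
  append(id: string, text: string): void {
    this.byId.set(id, { id, parentId: this.leafId, text });
    this.leafId = id;
  }

  // Branching is just a pointer rewind; old entries stay in the map/file.
  branch(fromId: string): void {
    if (!this.byId.has(fromId)) throw new Error(`Entry ${fromId} not found`);
    this.leafId = fromId;
  }

  // Reconstruct the active path by walking parentId links from leaf to root.
  activePath(): string[] {
    const path: string[] = [];
    let cur = this.leafId ? this.byId.get(this.leafId) : undefined;
    while (cur) {
      path.unshift(cur.text);
      cur = cur.parentId ? this.byId.get(cur.parentId) : undefined;
    }
    return path;
  }
}
```

Appending after a branch() automatically forks: the new entry's parentId points at the rewound leaf, and context reconstruction simply never visits the abandoned branch.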
Design Tradeoffs
Putting these modules together reveals a shared assumption: the user is present.
bash spawns directly because the user can interrupt. Steering depends on the user sending a correction message. Termination is “LLM stops -> loop stops.” Compaction has no quality verification because the user can re-steer. Session branching is manual.
When this assumption holds, many of these choices are reasonable, sometimes optimal: no approval popups breaking flow, no rigid policies slowing execution, no strict output schemas reducing flexibility.
When the assumption fails (unattended mode), the same advantages become blind spots: steering may never trigger, bash runs with full authority and no supervision, the LLM may stop prematurely, and compaction loss may go unnoticed.
This is not a bug in itself: pi-mono simply was not designed for unattended, deterministic-convergence scenarios.
Action space: open vs constrained
pi-mono's action space is fixed at startup (codingTools or readOnlyTools) and is not dynamically constrained at runtime. bash has no pre-execution policy gate.
Codex makes the opposite choice for shell execution, with multi-level pre-execution evaluation:
pub(crate) async fn create_exec_approval_requirement_for_command(&self, req: ExecApprovalRequest<'_>)
-> ExecApprovalRequirement
{
let exec_policy = self.current();
let commands = parse_shell_lc_plain_commands(command)
.unwrap_or_else(|| vec![command.to_vec()]);
let evaluation = exec_policy.check_multiple(commands.iter(), &exec_policy_fallback);
match evaluation.decision {
Decision::Forbidden => ExecApprovalRequirement::Forbidden {
reason: derive_forbidden_reason(command, &evaluation),
},
Decision::Prompt => {
if matches!(approval_policy, AskForApproval::Never) {
ExecApprovalRequirement::Forbidden { reason: PROMPT_CONFLICT_REASON.to_string() }
} else {
ExecApprovalRequirement::NeedsApproval {
reason: derive_prompt_reason(command, &evaluation),
proposed_execpolicy_amendment: /* proposed rule */,
}
}
},
Decision::Allow => ExecApprovalRequirement::Skip {
bypass_sandbox: /* may bypass sandbox if explicitly allowed */,
},
}
}
Kimi-CLI constrains at role/tool-scope granularity with YAML declarations.
Main Agent config (agent.yaml)
agent:
tools:
- "kimi_cli.tools.multiagent:Task"
- "kimi_cli.tools.todo:SetTodoList"
- "kimi_cli.tools.shell:Shell"
- "kimi_cli.tools.file:ReadFile"
- "kimi_cli.tools.file:WriteFile"
- "kimi_cli.tools.file:StrReplaceFile"
- "kimi_cli.tools.web:SearchWeb"
subagents:
coder:
path: ./sub.yaml
description: "Good at general software engineering tasks."
Sub Agent config (sub.yaml)
agent:
extend: ./agent.yaml
exclude_tools:
- "kimi_cli.tools.multiagent:Task"
- "kimi_cli.tools.multiagent:CreateSubagent"
- "kimi_cli.tools.dmail:SendDMail"
- "kimi_cli.tools.todo:SetTodoList"
subagents:
OpenAI Agents SDK constrains at the per-agent tool layer, with runtime enable/disable patterns and handoffs.
@dataclass
class Agent(AgentBase, Generic[TContext]):
tools: list[Tool] = field(default_factory=list) # per-agent dedicated tool set
handoffs: list[Agent | Handoff] = field(default_factory=list) # delegatable sub-agents
output_type: type[Any] | None = None # structured output contract
Comparison of constraint granularity:
| Constraint Granularity | Implementation | Characteristics |
|---|---|---|
| Tool-level | OpenAI Agents SDK (per-agent tools, is_enabled) | Flexible, suitable for multi-agent orchestration |
| Role-level | Kimi-CLI (AgentSpec YAML, exclude_tools) | Declarative inheritance + pruning |
| Command-level | Codex (exec policy rules + sandbox) | Finest granularity, highest safety ceiling |
The AskForApproval::Never branch is important: if the policy is configured as “never ask,” commands requiring approval become Forbidden directly rather than being silently allowed. This is the safest default for unattended execution.
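That decision-to-requirement mapping can be paraphrased compactly in TypeScript (a sketch of the logic above, not Codex's actual API):

```typescript
type Decision = "allow" | "prompt" | "forbidden";
type ApprovalPolicy = "never" | "on-request";
type Requirement = "skip" | "needs-approval" | "forbidden";

// Under a "never ask" policy, anything that would prompt becomes forbidden
// outright rather than silently allowed.
function requirementFor(decision: Decision, policy: ApprovalPolicy): Requirement {
  switch (decision) {
    case "allow":
      return "skip";
    case "forbidden":
      return "forbidden";
    case "prompt":
      return policy === "never" ? "forbidden" : "needs-approval";
  }
}
```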
Correction: who intercepts
pi-mono steering relies on the user noticing issues and intervening. Another strategy is active system interception. Kimi-CLI's Approval implements tiered behavior:
async def request(self, sender, action, description, display=None) -> bool:
# fast path 1: yolo mode -> always allow
if self._state.yolo:
return True
# fast path 2: action pre-marked as auto-approved for session
if action in self._state.auto_approve_actions:
return True
# slow path: create approval request, await user response
request = Request(id=str(uuid.uuid4()), tool_call_id=tool_call.id, ...)
approved_future = asyncio.Future[bool]()
self._request_queue.put_nowait(request)
self._requests[request.id] = (request, approved_future)
return await approved_future
def resolve_request(self, request_id, response):
match response:
case "approve": # approve this call
future.set_result(True)
case "approve_for_session": # auto-approve same action for current session
self._state.auto_approve_actions.add(request.action)
future.set_result(True)
case "reject": # reject
future.set_result(False)
Steering and approval solve different problems. Steering is direction correction (“you are going wrong, change direction”). Approval is operation admission (“is this operation allowed”).
approve_for_session is a useful balance: approve once, then auto-approve the same action type for the current session, reducing repeated friction while keeping explicit control. yolo mode bypasses all approvals, but it is an explicit user choice, not a default. Approval also supports share() so sub-agents can inherit approval state from the main agent.
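The tiered pattern translates directly to TypeScript. The following is a hypothetical sketch, not Kimi-CLI's API: fast paths resolve immediately, while the slow path parks the caller on a promise that a UI resolves later:

```typescript
type ApprovalResponse = "approve" | "approve_for_session" | "reject";

class ApprovalGate {
  private autoApproved = new Set<string>();
  private pending = new Map<string, (ok: boolean) => void>();

  constructor(private yolo = false) {}

  request(id: string, action: string): Promise<boolean> {
    if (this.yolo) return Promise.resolve(true);                     // fast path 1: yolo mode
    if (this.autoApproved.has(action)) return Promise.resolve(true); // fast path 2: session auto-approval
    // Slow path: park the caller until a UI resolves this request.
    return new Promise<boolean>((resolve) => this.pending.set(id, resolve));
  }

  resolve(id: string, response: ApprovalResponse, action?: string): void {
    const settle = this.pending.get(id);
    if (!settle) return;
    this.pending.delete(id);
    if (response === "approve_for_session" && action) this.autoApproved.add(action);
    settle(response !== "reject");
  }

  isAutoApproved(action: string): boolean {
    return this.autoApproved.has(action);
  }
}
```

The promise-based slow path is what lets tool execution suspend mid-loop without blocking the event loop or losing the pending tool call.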
When to stop, and how to validate correctness
pi-mono's stop condition: the LLM generates no more tool calls. No hard step cap, no structured output validation, no timeout governance by default.
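For contrast, a deterministic step cap is cheap to add on top of such a loop. A sketch with hypothetical names (synchronous here for brevity; a real agent step would be async):

```typescript
class MaxStepsReached extends Error {
  constructor(public readonly nSteps: number) {
    super(`Max steps ${nSteps} reached`);
  }
}

// Drive the agent one "step" at a time; step() returns true when the model
// produced no further tool calls. Fails deterministically instead of allowing
// an open-ended run.
function runWithStepCap(step: () => boolean, maxSteps: number): number {
  for (let n = 1; n <= maxSteps; n++) {
    if (step()) return n; // natural stop: model is done
  }
  throw new MaxStepsReached(maxSteps);
}
```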
OpenAI Agents SDK provides deterministic stopping/validation via output_type, StopAtTools, OutputGuardrail.
@dataclass
class OutputGuardrail(Generic[TContext]):
guardrail_function: Callable[..., GuardrailFunctionOutput]
Kimi-CLI uses explicit deterministic checks in sub-agent execution:
async def _run_subagent(self, agent, prompt):
# 1. run subagent
try:
await run_soul(soul, prompt, _ui_loop_fn, asyncio.Event())
except MaxStepsReached as e:
# hard step cap
return ToolError(
message=f"Max steps {e.n_steps} reached when running subagent. "
"Please try splitting the task into smaller subtasks.",
)
# 2. validate output existence/role
if len(context.history) == 0 or context.history[-1].role != "assistant":
return ToolError(message="The subagent seemed not to run properly. "
"Maybe you have to do the task yourself.")
# 3. validate output length and request continuation if too short
final_response = context.history[-1].extract_text(sep="\n")
if len(final_response) < 200 and n_attempts_remaining > 0:
await run_soul(soul, CONTINUE_PROMPT, _ui_loop_fn, asyncio.Event())
final_response = context.history[-1].extract_text(sep="\n")
return ToolOk(output=final_response)
For failure handling, pi-mono writes tool errors back as isError: true toolResult messages and lets the LLM recover, with no structured retry logic. Session branching supports manual rollback, not automatic recovery. Kimi-CLI returns a structured ToolError; the OpenAI Agents SDK can route via handoff to agents with different capabilities.
Final Note
pi-mono is a well-built ReAct coding agent: a clean loop, practical compaction details (split-turn handling plus file-op tracking), and JSONL append-only session branching with no database dependency. All of it is coherent under one assumption: the user is present.
But it is not mystical philosophy. It is a standard ReAct loop, 7 tools (4 exposed in the default set), a callback-based steering checkpoint, a goal-first compaction template, and a JSONL branch structure. Every part is source-verifiable and aligns with existing industry patterns.
Where does your target scenario sit on the spectrum between exploration tolerance and deterministic convergence? Does your architecture actually match that point?
Neither scenario type is superior. The real risk is mixing their requirements and using success in one to claim feasibility in the other.