Introduction
When building efficient, reliable, and scalable AI agents, success depends heavily on how we design and manage model context. Context is not only the interface between an agent and the outside world; it is also the foundation of memory, reasoning, and decision-making.
Recently, the Manus team shared practical lessons from building their own agent in the post Context Engineering for AI Agents: Lessons from Building Manus. This article extracts and organizes those ideas into an actionable guide for teams building agent systems.
Six Core Context Engineering Practices
After extensive experimentation and four framework rewrites, the Manus team summarized six key principles. These are not universal truths, but they are battle-tested local optima in real-world systems:
- Design Around the KV Cache
- Mask, Don’t Remove
- Use the File System as Context
- Manipulate Attention Through Recitation
- Keep the Wrong Stuff In
- Don’t Get Few-Shotted
Principle Breakdown
1. Design Around the KV Cache
- Problem: Agent runs are iterative. Each turn appends new information, so input context quickly becomes much larger than output actions. Cost and latency become the bottleneck.
- Solution: Maximize KV-cache reuse. Cache hits can dramatically reduce latency and cost (for example, cached vs non-cached token cost can differ by an order of magnitude).
- Implementation points:
- Keep prefixes stable: Avoid putting highly dynamic content (for example, second-level timestamps) at the beginning of the prompt. Even a single token change can invalidate downstream cache.
- Append-only context: Prefer appending new observations over rewriting historical actions/observations.
- Deterministic serialization: Keep key ordering stable when serializing objects (for example JSON), or you silently break cacheability.
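The deterministic-serialization point can be sketched in a few lines. This is a minimal illustration, not Manus code; the function name `serialize_observation` is hypothetical:

```python
import json

def serialize_observation(obs: dict) -> str:
    """Serialize an observation with stable key ordering so identical
    data always produces identical tokens (KV-cache friendly)."""
    return json.dumps(obs, sort_keys=True, separators=(",", ":"))

# The same dict built in different insertion orders serializes identically,
# so appending it to the context never changes the bytes of earlier turns.
a = serialize_observation({"tool": "shell", "status": "ok"})
b = serialize_observation({"status": "ok", "tool": "shell"})
assert a == b
```

Without `sort_keys=True`, two runs could emit the same observation with different key orders, silently changing the prefix and invalidating the cache.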
2. Mask, Don’t Remove
- Problem: As agents grow more capable, the tool/action space becomes huge. Dynamically adding/removing tool definitions seems natural, but it damages KV-cache reuse because tool definitions usually sit near the front.
- Solution: Use masking instead of removal. Keep full tool definitions in context, but constrain decoding by manipulating logits so only a valid subset can be chosen.
- Implementation points:
- Use response prefill or equivalent capabilities from your inference stack to constrain output ranges.
- Adopt tool name prefixes (for example `browser_*`, `shell_*`) so you can mask by tool family.
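Prefix-based masking can be sketched as follows. The registry and helper below are hypothetical; in a real stack the boolean mask would translate into logit biases on the tokens that begin each tool name, rather than removing definitions from context:

```python
# Hypothetical tool registry; all definitions stay in context permanently.
TOOLS = ["browser_open", "browser_click", "shell_exec", "shell_read", "file_write"]

def build_tool_mask(allowed_prefixes: list[str]) -> list[bool]:
    """Per-tool mask: True = the tool may be selected this turn.
    Definitions are never added or removed, so the cached prefix
    containing them stays valid across turns."""
    return [any(t.startswith(p) for p in allowed_prefixes) for t in TOOLS]

# Restrict this turn to the browser family without touching the context.
mask = build_tool_mask(["browser_"])
```

Because the mask lives in the decoding step rather than in the prompt, switching the allowed tool family between turns costs nothing in cache reuse.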
3. Use the File System as Context
- Problem: Even with long context windows (128K+), unstructured inputs such as webpages and PDFs still overflow limits. Long contexts are expensive and can degrade reasoning quality (“needle in a haystack”).
- Solution: Treat the file system as effectively unbounded, persistent external memory that agents can read/write directly. Externalize large information and load it on demand.
- Implementation points:
- Build recoverable compression strategies. Keep URLs or file paths in prompt context rather than full payloads.
- When needed, rehydrate full content from those pointers. This is effectively lossless compression for context windows.
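The pointer-and-rehydrate pattern can be sketched as below. The helper names `externalize` and `rehydrate` are illustrative, not from the Manus post:

```python
import hashlib
import pathlib
import tempfile

WORKDIR = pathlib.Path(tempfile.mkdtemp())

def externalize(content: str) -> str:
    """Write a large payload to disk and return a short pointer
    (the file path) to keep in prompt context instead."""
    name = hashlib.sha256(content.encode()).hexdigest()[:12] + ".txt"
    path = WORKDIR / name
    path.write_text(content)
    return str(path)

def rehydrate(pointer: str) -> str:
    """Load the full payload back on demand from its pointer."""
    return pathlib.Path(pointer).read_text()

page = "very long webpage text ... " * 1000
ptr = externalize(page)
assert rehydrate(ptr) == page  # lossless round trip
assert len(ptr) < len(page)    # context only carries the pointer
```

The compression is "recoverable" because nothing is discarded: the context shrinks to a path, and the full content is always one read away.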
4. Manipulate Attention Through Recitation
- Problem: In long tasks with dozens of steps, agents drift and forget original goals (“lost in the middle”).
- Solution: Use recitation to steer attention. Let the agent maintain a task checklist in an external file (for example `todo.md`) and update it after each step.
- Implementation points:
- Rewriting the checklist continuously pushes global goals and current progress to the tail of context.
- Since attention is biased toward recent tokens, this acts as a natural reminder loop that reduces goal drift.
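The recitation loop can be sketched as a small helper that re-renders the checklist at the end of the context after each step. The `recite` function is hypothetical:

```python
def recite(context: list[str], todo: list[tuple[str, bool]]) -> list[str]:
    """Append a freshly rendered checklist to the context tail so the
    global goals sit inside the model's recent-attention window."""
    rendered = "\n".join(
        f"- [{'x' if done else ' '}] {item}" for item, done in todo
    )
    return context + ["## todo.md\n" + rendered]

ctx = recite(
    ["...earlier actions and observations..."],
    [("fetch data", True), ("write report", False)],
)
# The last context entry now restates the full plan and its progress.
```

Re-emitting the list every step looks redundant, but that redundancy is the point: the goals keep reappearing near the most recent tokens.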
5. Keep the Wrong Stuff In
- Problem: Agents will make mistakes. Developers often hide errors and retry, but that removes the model’s chance to learn from failure.
- Solution: Preserve failed actions, error messages, and stack traces in context.
- Implementation points:
- Seeing failure records helps the model update its internal beliefs and lowers the probability of repeating the same error.
- Error recovery is a core intelligence signal. Exposing failures to the agent is one of the most effective ways to improve robustness.
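A minimal sketch of keeping failures in context, assuming a generic tool-execution wrapper (the `run_tool` helper is hypothetical):

```python
import traceback

def run_tool(context: list[str], name: str, fn, *args) -> list[str]:
    """Execute a tool call, but record failures as observations in
    context instead of silently hiding them and retrying."""
    try:
        result = fn(*args)
        context.append(f"[{name}] ok: {result}")
    except Exception:
        # Preserve the full stack trace so the model can see what broke.
        context.append(f"[{name}] error:\n{traceback.format_exc()}")
    return context

ctx = run_tool([], "divide", lambda a, b: a / b, 1, 0)
# ctx now holds the ZeroDivisionError trace as an observation.
```

On the next turn the model reads its own failure, which is exactly the signal it needs to choose a different action rather than repeat the bad one.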
6. Don’t Get Few-Shotted
- Problem: LLMs are excellent imitators. If context contains many repetitive action-observation pairs, the model can fall into rigid pattern copying even when the situation changes.
- Solution: Introduce structured diversity in context representations.
- Implementation points:
- Add small, controlled variations in serialized action/observation format (templates, wording, structure).
- This controlled randomness breaks monotony, rebalances attention, and reduces brittle imitation loops.
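Controlled variation in serialization can be sketched with a few equivalent templates; the template strings and `render_step` helper are illustrative assumptions, not from the post:

```python
import random

# Equivalent surface forms for the same action-observation pair;
# the information is identical, only the phrasing varies.
TEMPLATES = [
    "Action: {action} -> Observation: {obs}",
    "{action} was executed; observed: {obs}",
    "step({action}) returned {obs}",
]

def render_step(action: str, obs: str, rng: random.Random) -> str:
    """Render a step with a randomly chosen template so long histories
    do not become a monotone pattern the model blindly imitates."""
    return rng.choice(TEMPLATES).format(action=action, obs=obs)

rng = random.Random(0)  # seeded for reproducibility
history = [render_step("ls", "3 files", rng) for _ in range(3)]
```

The key constraint is that the variation stays structural: every template carries the same fields, so diversity never comes at the cost of information.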
Conclusion
Context engineering is an emerging and experimental discipline, but it is essential for building truly useful AI agents. Raw model capability matters, but the way we shape memory, environment, and feedback loops determines how fast an agent runs, how well it recovers, and how far it scales.
The Manus experience shows that carefully designed context is not optional infrastructure; it is the path to stronger and more reliable agents.