Status: 🚧 In progress
A design-philosophy, technical-implementation, and compatibility analysis of Gemini Interactions API vs OpenAI Responses API
1. Introduction: The Shift from Stateless Chat to Stateful Agent Architectures
In the evolution of generative intelligence, the period from 2023 to late 2024 marked a clear watershed: application paradigms moved from Text Completion and Stateless Chat to more complex Agentic Interactions and Deep Reasoning.
This shift is not only about stronger model capability. It also forces a redesign of underlying software architecture and API philosophy.
For a long time, RESTful Chat Completions APIs (with OpenAI's /v1/chat/completions as the de facto standard) dominated developer integration. Their core characteristic is statelessness: every request is independent, and clients must maintain the full history and resend it on every turn. This is simple and scalable, but its limits become obvious with models that support long context, multimodal understanding, and CoT-like reasoning behavior.
- Thinking data is hard to handle: for reasoning models (for example OpenAI o1/o3 and Google Gemini 2.5/3.0), large internal “CoT-like” traces are generated before the final answer. These traces are both costly and central to model intelligence. In a stateless architecture, if the server does not return these traces, the client cannot let the model “remember” its prior reasoning on the next turn; if the server does return them, bandwidth costs and IP-leakage risks increase. So model vendors usually return a thinking summary, not raw CoT content. 🙂
- Task time becomes much longer: vendors are no longer offering only base models. Agent-as-model execution means a single API call can trigger workflows that run for minutes or hours (a typical case: deep research). Traditional synchronous HTTP request-response, with its timeout-constrained long connections, cannot satisfy this need.
The Emergence of Next-generation APIs
To address the above problems, two major vendors launched new interfaces: OpenAI Responses API and Google Interactions API. The following sections provide a technical breakdown of both APIs, compare their design choices in state management, multimodal handling, long-running task scheduling, and reasoning transparency, and then discuss compatibility integration strategies for existing agent frameworks such as ADK.
2. OpenAI Responses API: Containerized Reasoning-as-a-Service
OpenAI’s Responses API is a substantial redesign of Chat Completions. Its core philosophy is to move conversation state and the reasoning process into the server, and to use a clearer action type system (Item Ontology) to normalize complex multi-turn interactions.
A quick comparison:
```json
{
  "message": {
    "role": "assistant",
    "content": "I'm going to use the get_weather tool to find the weather.",
    "tool_calls": [
      {
        "id": "call_88O3ElkW2RrSdRTNeeP1PZkm",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"New York, NY\",\"unit\":\"f\"}"
        }
      }
    ],
    "refusal": null,
    "annotations": []
  }
}
```
In Chat Completions, a single request usually emits one message object 👆.
```json
[
  {
    "id": "rs_6888f6d0606c819aa8205ecee386963f0e683233d39188e7",
    "type": "reasoning",
    "summary": [
      {
        "type": "summary_text",
        "text": "**Determining weather response**\n\nI need to answer the user's question about the weather in San Francisco. ...."
      }
    ]
  },
  {
    "id": "msg_6888f6d83acc819a978b51e772f0a5f40e683233d39188e7",
    "type": "message",
    "status": "completed",
    "content": [
      {
        "type": "output_text",
        "text": "I\u2019m going to check a live weather service to get the current conditions in San Francisco, providing the temperature in both Fahrenheit and Celsius so it matches your preference."
      }
    ],
    "role": "assistant"
  },
  {
    "id": "fc_6888f6d86e28819aaaa1ba69cca766b70e683233d39188e7",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"location\":\"San Francisco, CA\",\"unit\":\"f\"}",
    "call_id": "call_XOnF4B9DvB8EJVB3JvWnGg83",
    "name": "get_weather"
  }
]
```
In the Responses API, what you get back is a full behavior chain; what to display, persist, or ignore is decided by the developer.
2.1 Design Philosophy: Opaque Reasoning with Hosted State
The Responses API largely exists to solve a commercialization paradox for reasoning models. OpenAI's o-series models (o1, o3) improve capability through long internal CoT-like reasoning. This reasoning is core IP and is not intended to be exposed directly.
In the old Chat APIs, if the model did not return its thought traces, multi-turn follow-ups often caused sharp intelligence drops because prior reasoning context could not be preserved client-side. The Responses API solves this through server statefulness: reasoning state is retained on the server (encrypted and hidden) and continued safely via previous_response_id (or reasoning items). The client only needs the prior response ID to continue internal reasoning, without ever receiving raw CoT.
This philosophy can be summarized as: “trust server memory.” It frees developers from prompt-caching and truncation housekeeping, while increasing infrastructure dependence on OpenAI.
2.2 Core Data Model: Rise of Item Ontology
The most significant technical change is replacing loose Message objects with strict Item union types. Multimodal complexity outgrew single text-field structures.
2.2.1 Input Item
Inputs are no longer just messages; they are input arrays that can contain multiple InputItem types:
| Type | Description and Use | Design Intent |
|---|---|---|
| `input_text` | plain text input | base interaction unit replacing the legacy content string |
| `input_image` | image input (URL or Base64) | native multimodal understanding instead of attachment-style extension |
| `input_audio` | audio input | make listen/speak first-class in multimodal interaction |
2.2.2 Output Item
Outputs are also structured as Item sequences, so tool calls, code-execution traces, and text replies are clearly separated:
| Type | Description and Use | Design Intent |
|---|---|---|
| `message` | model reply item (can contain `output_text` parts) | avoid forcing plain text and tool calls into one mixed structure |
| `function_call` | structured tool/function call (name, arguments, call_id) | provide an auditable tool-call receipt for UI/logging |
| `reasoning` | structured reasoning output (for example a summary) | retain server reasoning state while supporting safe continuation without exposing raw CoT |
The move from `Message` to `Item` marks the shift from “text exchange” to “object operation.” Developers no longer manipulate a continuous text blob; they operate on structured multimodal objects.
2.3 State Management: previous_response_id and reasoning state
Responses is designed as stateful-by-default: session and tool state are tracked server-side; reasoning state is retained across turns (encrypted, hidden), so reasoning models do not “forget how they were thinking.”
ID-based safe continuation (previous_response_id)
- Mechanism: client sends prior response ID; server continues context and reasoning state internally.
- Use cases: multi-turn chat, tool chains, stateful agent workflows.
- Notes: client must persist at least prior response ID; reasoning state itself is not returned.
OpenAI also mentions reasoning items for continuation assistance, but raw CoT still remains hidden.
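The ID-based continuation above can be sketched as two request bodies. Plain dicts are used for illustration; the model name and response ID are placeholders:

```python
# Sketch of previous_response_id continuation: the second request carries
# only the new user turn plus the prior response ID; the server restores
# context and hidden reasoning state. Names/IDs are placeholders.
def first_turn(model: str, text: str) -> dict:
    return {"model": model, "input": text}

def follow_up(model: str, text: str, prev_id: str) -> dict:
    # The client persists only the prior response ID, not a history blob.
    return {"model": model, "input": text, "previous_response_id": prev_id}

req1 = first_turn("o3", "What's the weather in SF?")
req2 = follow_up("o3", "And tomorrow?", "resp_abc123")
```

Note how the follow-up body contains no prior messages at all: the continuity lives entirely on the server side.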
2.4 Streaming and semantic events
Responses streaming is no longer token-delta-only. It is semantic streaming events over response lifecycle and item-level generation (message, function_call, reasoning, etc.), enabling finer UI states (for example showing tool status when call starts).
SDK helpers such as output_text also avoid manual extraction from legacy paths like choices[0].message.content.
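A consumer of this semantic stream folds events into UI state rather than concatenating token deltas. The sketch below uses a small illustrative subset of event type names (modeled on the item types above), not an exhaustive or authoritative list:

```python
# Fold a semantic event stream into simple UI status updates. The event
# type strings below are an illustrative subset, not a complete list.
def ui_states(events: list[dict]) -> list[str]:
    states = []
    for ev in events:
        t = ev.get("type", "")
        if t == "response.output_item.added" and ev.get("item", {}).get("type") == "function_call":
            # A tool call started: the UI can show a "running tool" badge.
            states.append("calling tool: " + ev["item"].get("name", "?"))
        elif t == "response.output_text.delta":
            states.append("streaming text")
        elif t == "response.completed":
            states.append("done")
    return states

demo = ui_states([
    {"type": "response.output_item.added", "item": {"type": "function_call", "name": "get_weather"}},
    {"type": "response.output_text.delta", "delta": "Sunny"},
    {"type": "response.completed"},
])
# demo == ["calling tool: get_weather", "streaming text", "done"]
```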
3. Google Gemini Interactions API: Operating System for Asynchronous Agents
If OpenAI Responses optimizes “thinking,” Google Interactions optimizes “doing.” Google positions Interactions as a unified interface for autonomous agents, especially for long-running, multi-source tasks.
3.1 Design Philosophy: Freeing the Time Dimension
Google’s philosophy points in a similar direction: deep research and complex tasks often exceed standard HTTP timeout windows (typically 30-60 seconds). Interactions is therefore designed as a task-scheduling system whose jobs can run in the background for tens of minutes.
3.2 Core Architecture: Interaction and Content
The Interactions endpoint is `generativelanguage.googleapis.com/v1beta/interactions`, and its shape differs significantly from OpenAI’s.
3.2.1 Interaction object
An Interaction is not just a chat turn. It is a full lifecycle object containing input, execution state, and outputs:
| Key Field | Type | Deep Interpretation |
|---|---|---|
| `id` | string | unique interaction ID; used for `previous_interaction_id` continuation and `GET /interactions/{id}` status/result query |
| `model` / `agent` | string | polymorphic: can be a base model (`gemini-3-pro-preview`) or a preset agent (`deep-research-pro-preview-12-2025`), reducing switching cost |
| `input` | string or Content[] | structured multimodal input including `function_result` parts |
| `previous_interaction_id` | string | optional server-side continuation via a prior interaction |
| `background` | bool | async background execution; docs indicate `background=true` is agent-focused |
| `status` | string | execution state like `completed`, `in_progress`, `requires_action`, `failed` |
| `store` | bool | default `true`; `store=false` disables persistence but also blocks `previous_interaction_id` and conflicts with `background=true` |
If you are familiar with old Gemini endpoints, Interactions input changes are less drastic than OpenAI’s message→item migration. This likely reflects Gemini’s long-standing multimodal-first design.
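Based on the fields in the table above (`background`, `status`, `id`), the lifecycle of a background interaction can be sketched as create-then-poll. Transport is abstracted behind injected `post`/`get` callables so the flow stays self-contained; the endpoint paths are assumptions taken from the table:

```python
import time

# Sketch of the background-execution lifecycle implied by the Interaction
# fields above. HTTP transport is injected as callables; paths and body
# shapes are assumptions for illustration.
def create_interaction(post, model: str, prompt: str) -> str:
    body = {"model": model, "input": prompt, "background": True}
    return post("/v1beta/interactions", body)["id"]

def wait_for_result(get, interaction_id: str, poll_seconds: float = 2.0) -> dict:
    # Poll GET /interactions/{id} until a terminal (or action-required) status.
    while True:
        interaction = get(f"/v1beta/interactions/{interaction_id}")
        if interaction["status"] in ("completed", "failed", "requires_action"):
            return interaction
        time.sleep(poll_seconds)
```

The key architectural point is that the initial HTTP call returns immediately with an ID; the minutes-long work happens outside any single connection's timeout window.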
3.2.2 State management: optional server-side state
Interactions is explicit about whether state is hosted server-side:
- Stateful: send `previous_interaction_id`; the server loads the full context internally, reducing client-side history replay and often improving cache hit rates.
- Stateless: send the full history each time; the client owns context assembly.
- Storage/compliance: default `store=true`; `store=false` is possible but disables `previous_interaction_id` and conflicts with `background=true`.
- Comparison: OpenAI emphasizes stateful-by-default with server-managed reasoning/tool traces; Google explicitly exposes dual-mode optional hosting.
This does not mean OpenAI Responses rejects stateless full-history submission. Both can emulate legacy stateless behavior through constructed input history; but once you do that, much of the new API value is lost.
The Gemini Deep Research Agent, which is already widely known, is the flagship Interactions use case.
3.3 Native support for MCP (Model Context Protocol)
Google also explicitly integrates MCP support into Interactions API.
4. Comparative Analysis
Overall, the two APIs are closer than many assume. The differences lie mainly in specific design details.
4.1 State management and context handling
| Dimension | OpenAI Responses API | Gemini Interactions API | Analysis |
|---|---|---|---|
| State carrier | previous_response_id (and reasoning items; server auto-tracks session/tool state) | previous_interaction_id (optional server state; can also be fully stateless with full history) | Both reduce client-side history burden via prior-ID continuation. Difference: OpenAI emphasizes hosted reasoning/tool state; Google emphasizes explicit dual mode. |
| Data retention | default 30 days; server-side reasoning state retained (encrypted/hidden) | default store=true; paid 55 days / free 1 day; store=false disables previous_interaction_id and conflicts with background=true | Both involve compliance trade-offs in hosted mode. Google provides explicit retention windows and opt-out switch with capability sacrifices. |
5. Compatibility and Migration Guide
For existing developers, the core questions are: Can my OpenAI Chat-based code still run? How should I migrate?
5.1 Gemini OpenAI compatibility layer: truth and traps
Google claims a “three-line migration.” This is largely true, but there is a significant scope trap.
Compatibility scope
The Gemini compatibility layer targets OpenAI's legacy Chat Completions (`/v1/chat/completions`), not the newer Responses API (`/v1/responses`).
- If you use the standard `openai.chat.completions.create`, changing `base_url` and `api_key` is often enough.
- Google also maps OpenAI's `reasoning_effort` to Gemini's `thinking_level`/`thinking_budget`, enabling low-friction comparative testing.
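The often-cited change looks like the configuration sketch below: keep your existing Chat Completions calls and repoint the client. This requires the `openai` package and a Gemini API key; it is a config fragment, not a full program:

```python
# Configuration sketch of the "three-line migration": point the OpenAI SDK
# at the Gemini compatibility endpoint. Requires the openai package and a
# Gemini API key (placeholder below).
from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",  # a Gemini key, not an OpenAI key
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
# client.chat.completions.create(model="gemini-3-pro-preview", ...) now
# routes to Gemini; Responses/Interactions features are NOT covered.
```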
Non-compatible scope
You cannot access Gemini Interactions API features through OpenAI SDK compatibility mode:
- You cannot pass `agent='deep-research'` in the OpenAI SDK and expect Interactions semantics;
- you cannot pass `background=true` for async execution;
- you cannot use `previous_interaction_id` chain-state behavior.
Conclusion: Gemini compatibility exists mainly to capture legacy chat workloads. If you need Interactions-specific capabilities, you must adopt Google genai SDK and rewrite integration paths.
5.2 Migrating from Chat Completions to Responses API (within OpenAI ecosystem)
Even inside OpenAI ecosystem, migration is one-way and non-trivial:
1. Refactor data model
`messages` must become structured input items; simple string concatenation no longer matches the model's expected input shape.
2. Stop full client-side history persistence
Instead of storing large history blobs, store at least prior response IDs and continue via previous_response_id. Database pressure drops, but state dependence on server increases.
3. Rewrite stream parser
Front-end stream parser must handle semantic event streams and multi-item outputs, not legacy token deltas such as choices[0].delta.content.
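Step 1 above can be sketched as a small converter from a legacy `messages` list to structured input items. This is a deliberately partial mapping (tool messages, images, and other modalities need their own item types); the field names follow the item ontology described earlier:

```python
# Sketch for migration step 1: map legacy Chat Completions "messages" onto
# Responses-style structured input items. Partial mapping for illustration;
# tool results and multimodal parts need their own item types.
def messages_to_input(messages: list[dict]) -> list[dict]:
    items = []
    for m in messages:
        # User turns become input_text parts; assistant turns output_text.
        part_type = "input_text" if m["role"] == "user" else "output_text"
        items.append({
            "role": m["role"],
            "content": [{"type": part_type, "text": m["content"]}],
        })
    return items

legacy = [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]
structured = messages_to_input(legacy)
```

After migration, a converter like this is only needed for importing old history; new turns are continued via `previous_response_id` instead.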
5.3 Integration adaptation strategy
Because architecture divergence between Interactions and Chat is significant, a single universal adapter that fully supports both is often unrealistic. A dual-stack strategy is usually more practical.
Synchronous interaction layer
For chatbot/realtime QA low-latency scenarios, continue using standard Chat Completions abstraction. This remains compatible with OpenAI (legacy), Gemini (compat layer), Anthropic, and open-source runtimes (for example vLLM), and helps preserve provider-neutrality.
Google’s own blog reflects this:
although the Interactions API supports most generateContent capabilities and improves developer experience, it is still in public preview and may change significantly; for production-grade workloads, generateContent remains the primary path.
Asynchronous task layer
For Deep Research and agent kernel workloads, Interactions API can be introduced as a dedicated async path.
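The dual-stack routing decision can be made explicit in a small dispatcher. The task kinds and the 60-second threshold below are illustrative assumptions, not values from either vendor:

```python
# Sketch of the dual-stack routing decision: latency-sensitive chat goes to
# the synchronous Chat Completions abstraction; long-running research/agent
# jobs go to the async Interactions path. Thresholds are illustrative.
def pick_stack(task_kind: str, expected_seconds: float) -> str:
    if task_kind in ("deep_research", "agent_workflow") or expected_seconds > 60:
        return "interactions"      # background execution + status polling
    return "chat_completions"      # synchronous, provider-neutral path
```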
6. Usage Constraints
6.1 State Lock-in
As APIs become stateful, vendor lock-in risk becomes stronger.
In stateless times, migration often meant changing one URL line. In stateful times, migration can mean moving large session/tool-run histories and remapping them into another vendor’s state model (if supported). Hidden server-side reasoning state is usually not exportable, making migration costs much higher once deep workflows are running.
6.2 Agent as Infrastructure
Gemini Deep Research demonstrates a broader trend: agent as infrastructure. Future API shape may shift from completion(prompt) to hire(agent, goal). Vendors provide not only models, but also runtime, tool ecosystems, and memory layers. Interactions is an early shape of this trajectory.
6.3 Vertex AI Not Yet Ready
Per Google's official blog, the Interactions API and Gemini deep research capabilities are coming to Vertex AI: https://blog.google/technology/developers/interactions-api/
7. Summary
OpenAI Responses API and Google Interactions API both target next-generation AI application challenges. For developers, the choice is no longer only about model benchmark scores; it is now an architecture decision: build an instant-response chat surface, or build an async task-delivery system. Understanding this difference is key to next-generation AI product engineering.
Appendix
This appendix provides a lightweight integration approach for teams running their own model-service gateway within the Google ecosystem.
Current state: many ADK-based agents use ADK’s LiteLLM compatibility layer to route through custom model services.
Core requirement: keep native Gemini capabilities in Google ADK (google-genai type system, tool calling, cachedContents, files/resumable upload, SSE, etc.) while centralizing upstream switching/governance/metrics in our own gateway.
Under the current google-adk, this implies that the gateway must expose a Gemini Developer API (AI Studio / `v1beta`) compatible surface. Otherwise you either connect to Google directly, or fall back to an OpenAI endpoint + LiteLLM with capability drift.
ADK’s three current integration paths (and constraints)
- ADK `Gemini` (default)
  - Uses the `google-genai` SDK (best native Gemini experience).
  - But ADK's `google.genai.Client(...)` does not pass `base_url` explicitly in the default path; only tracking headers are injected (so gateway routing depends on `google-genai` base-url behavior/environment variables).
  - See `google/adk/models/google_llm.py` (`Gemini.api_client`).
- ADK `ApigeeLlm` (named Apigee but effectively a proxy client)
  - Also uses `google-genai`, but explicitly supports `proxy_url`/`base_url` and `custom_headers`.
  - See `google/adk/models/apigee_llm.py` with `HttpOptions(base_url=proxy_url)`.
  - Limitation: this does not reduce the required gateway protocol compatibility. In `vertex_ai` mode it often requires `GOOGLE_CLOUD_PROJECT`/`LOCATION` and may introduce caller-side GCP credential paths (against the "centralize creds/switching in gateway" goal).
- ADK `LiteLlm` (OpenAI endpoint/provider-style)
  - Good for quick adoption when an OpenAI-compatible gateway already exists.
  - But if native Gemini semantics are required (especially `cachedContents`, files/resumable upload, and some tool/stream behaviors), Chat Completions compatibility needs extra translation and is not guaranteed 1:1. Complexity still returns to the gateway.
Recommended practical approach
If goal is “keep native Gemini invocation,” the most stable ADK approach is:
- Call the gateway using Gemini Developer API (AI Studio / `v1beta`) conventions:
  - set `GOOGLE_GEMINI_BASE_URL=https://<gateway>` so `google-genai` points to the gateway;
  - use `GOOGLE_API_KEY=<gateway_token>` for gateway auth.
- Do not rely on the default ADK `Gemini` + `GOOGLE_GENAI_USE_VERTEXAI=1` + `GOOGLE_VERTEX_BASE_URL=...` to emulate Vertex-through-gateway:
  - during client initialization, `google.genai.Client` decides whether to read `GOOGLE_GEMINI_BASE_URL` or `GOOGLE_VERTEX_BASE_URL` based on the `vertexai` argument (`vertexai` or `False`), and the ADK default path does not always pass `vertexai=True` explicitly;
  - result: even if the lower `BaseApiClient` switches mode via env vars later, the `Client` layer may not read `GOOGLE_VERTEX_BASE_URL` as expected; fixed `/v1beta` (AI Studio shape) routing is often more stable, with the gateway internally routing to Vertex.
- Let the gateway decide the upstream (Vertex / AI Studio / other providers):
  - recommended: route and attribute metrics by token or `X-Client-Id`;
  - optional: support headers like `X-LLM-Upstream` for gray release/switching (but check the ADK header-propagation limits below).
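Step 1 of the approach above reduces to setting two environment variables before the ADK process constructs its client. A minimal sketch, with placeholder gateway URL and token:

```python
import os

# Configuration sketch for step 1 above: point google-genai at the gateway
# via environment variables before ADK creates its client. The URL and
# token below are placeholders, not real endpoints.
os.environ["GOOGLE_GEMINI_BASE_URL"] = "https://gateway.example.com"
os.environ["GOOGLE_API_KEY"] = "gateway-token-placeholder"
```

Because these are process-level settings, they must be in place before any `google.genai.Client` is instantiated; setting them afterwards does not rebind an existing client.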
A subtle but critical limitation: ADK context caching does not carry per-request custom routing headers.
- ADK `GeminiContextCacheManager` uses the shared `self.api_client` for cache operations and does not use `GenerateContentConfig.http_options` (so headers set there do not propagate).
- Therefore, if `cachedContents` also needs per-request upstream switching, the default ADK `Gemini` path is insufficient. More stable options are token/client-based mapping in the gateway, or caller-side methods that inject global headers (for example `ApigeeLlm(custom_headers=...)` or a custom `model_code`).