API Standards
LLM-Rosetta supports 5 API standards across 4 LLM providers. Each standard defines its own request/response structure, authentication mechanism, and streaming format. This page describes each standard and highlights the key differences.
Overview
| ProviderType | API Standard | Endpoint | Converter Class |
|---|---|---|---|
| `openai_chat` | OpenAI Chat Completions | `POST /v1/chat/completions` | `OpenAIChatConverter` |
| `openai_responses` | OpenAI Responses | `POST /v1/responses` | `OpenAIResponsesConverter` |
| `open_responses` | Open Responses | `POST /v1/responses` | `OpenAIResponsesConverter` |
| `anthropic` | Anthropic Messages | `POST /v1/messages` | `AnthropicConverter` |
| `google` | Google GenAI | `POST /v1beta/models/{model}:generateContent` | `GoogleConverter` |
OpenAI Chat Completions (openai_chat)
The most widely adopted LLM API standard. Uses a role-based message array and returns responses in a choices[] array.
Request shape:
```json
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "max_completion_tokens": 1000,
  "temperature": 0.7
}
```
Response shape:
```json
{
  "id": "chatcmpl-123",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hi there!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 5, "total_tokens": 25}
}
```
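The shapes above carry plain text only. When the model calls a tool, the call and its result look roughly like the sketch below, which uses only the Python standard library. The tool name `get_weather`, the call ID `call_1`, and the tool payload are made-up illustration values; the point is that `arguments` has to be decoded with `json.loads` because it arrives as a string.

```python
import json

# Assistant message as it would appear in choices[0].message after a tool call.
# "call_1" and "get_weather" are made-up example values.
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}

call = assistant_msg["tool_calls"][0]
# Arguments arrive as a JSON string, not a dict, so decode them before use.
args = json.loads(call["function"]["arguments"])
print(args)  # {'city': 'Paris'}

# The result goes back as a separate message with role "tool",
# linked to the call through tool_call_id. The payload is a stand-in.
tool_msg = {
    "role": "tool",
    "tool_call_id": call["id"],
    "content": json.dumps({"city": args["city"], "forecast": "sunny"}),
}
```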
Key characteristics:
- System instructions as a message with `"role": "system"`
- Tool call arguments are JSON strings (not dicts)
- Tool results are separate messages with `"role": "tool"`
- `top_k` is not supported
- Streaming via `"stream": true` with SSE `data: {...}` chunks
OpenAI Responses (openai_responses)
OpenAI's newer API format (2025). Uses a flat list of typed items instead of nested messages, and supports stateful server-side conversations.
Request shape:
```json
{
  "model": "gpt-4o",
  "instructions": "You are helpful.",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello!"}]
    }
  ],
  "max_output_tokens": 1000
}
```
Response shape:
```json
{
  "id": "resp_123",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "Hi there!"}]
    }
  ],
  "usage": {"input_tokens": 20, "output_tokens": 5, "total_tokens": 25}
}
```
Key characteristics:
- System instructions via the top-level `"instructions"` field
- Flat item list: messages, `function_call`, and `function_call_output` are siblings
- Tool call arguments are JSON strings
- Content parts have explicit types (`input_text`, `output_text`, `input_image`, etc.)
- Reasoning config as a nested object: `"reasoning": {"effort": "high"}`
- Streaming via SSE with typed events (`response.output_item.added`, `response.output_text.delta`, etc.)
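Converting a Chat Completions request into this item-based shape mostly means moving system messages into `instructions` and flattening tool calls and results into sibling items. The function below is a minimal sketch of that mapping; `chat_to_responses` is a hypothetical name, not an LLM-Rosetta API, and it handles only string content, tool calls, and tool results (no images, reasoning, or streaming).

```python
from typing import Any


def chat_to_responses(req: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: map a Chat Completions request to a Responses request.

    Handles string content, tool calls, and tool results only.
    """
    instructions: list[str] = []
    items: list[dict[str, Any]] = []
    for msg in req["messages"]:
        role = msg["role"]
        if role == "system":
            # System messages move to the top-level "instructions" string.
            instructions.append(msg["content"])
        elif role == "tool":
            # A tool result becomes a sibling function_call_output item.
            items.append({"type": "function_call_output",
                          "call_id": msg["tool_call_id"],
                          "output": msg["content"]})
        else:
            if msg.get("content"):
                # Content parts are typed: input_text for user, output_text for assistant.
                part_type = "input_text" if role == "user" else "output_text"
                items.append({"type": "message", "role": role,
                              "content": [{"type": part_type, "text": msg["content"]}]})
            for tc in msg.get("tool_calls", []):
                # Each tool call becomes a sibling function_call item;
                # arguments stay a JSON string in both standards.
                items.append({"type": "function_call",
                              "call_id": tc["id"],
                              "name": tc["function"]["name"],
                              "arguments": tc["function"]["arguments"]})

    out: dict[str, Any] = {"model": req["model"], "input": items}
    if instructions:
        out["instructions"] = "\n\n".join(instructions)
    if "max_completion_tokens" in req:
        out["max_output_tokens"] = req["max_completion_tokens"]
    if "temperature" in req:
        out["temperature"] = req["temperature"]
    return out


chat_req = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_completion_tokens": 1000,
    "temperature": 0.7,
}
responses_req = chat_to_responses(chat_req)
```

Applied to the Chat Completions request example, this yields the Responses request shape shown above, with `temperature` carried over.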
Open Responses (open_responses)
Open Responses is an open-source, vendor-neutral specification (Apache 2.0) that extends the OpenAI Responses API. Initiated by OpenAI in January 2026, it adds formal extensibility rules while maintaining full backward compatibility.
In LLM-Rosetta, open_responses is an alias for openai_responses — the same OpenAIResponsesConverter handles both formats.
Differences from OpenAI Responses:
| Feature | Description |
|---|---|
| `OpenResponses-Version` header | Spec versioning mechanism; the gateway forwards this header to upstream |
| Slug-prefixed extensions | `implementor:type_name` items, tools, and events (e.g., `openai:web_search_call`) |
| Reasoning `content` field | Raw reasoning traces from open-weight models |
| `allowed_tools` field | Cache-preserving tool restriction |
| Stateless default | No server-side state assumption |
Adopters: OpenRouter, Hugging Face, Vercel, LM Studio, Ollama, vLLM.
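Two of these differences are easy to handle mechanically in a gateway. The sketch below assumes that an extension type has the form `slug:name` with the implementor slug before the first colon, and that the version header is passed through unchanged; `split_extension_type` and `forwarded_headers` are hypothetical helpers, not part of LLM-Rosetta or the specification.

```python
from typing import Optional


def split_extension_type(type_name: str) -> tuple[Optional[str], str]:
    """Split a slug-prefixed type such as 'openai:web_search_call'.

    Assumes the implementor slug is everything before the first ':';
    unprefixed core types return None as the implementor.
    """
    slug, sep, name = type_name.partition(":")
    if not sep:
        return None, type_name
    return slug, name


def forwarded_headers(incoming: dict[str, str]) -> dict[str, str]:
    """Keep only the OpenResponses-Version header (matched case-insensitively)."""
    return {k: v for k, v in incoming.items() if k.lower() == "openresponses-version"}


print(split_extension_type("openai:web_search_call"))  # ('openai', 'web_search_call')
print(split_extension_type("function_call"))           # (None, 'function_call')
```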
Anthropic Messages (anthropic)
Anthropic's native API for Claude models. Notable for requiring max_tokens and supporting top_k and extended thinking.
Request shape:
```json
{
  "model": "claude-sonnet-4-20250514",
  "system": "You are helpful.",
  "messages": [
    {"role": "user", "content": [{"type": "text", "text": "Hello!"}]}
  ],
  "max_tokens": 4096
}
```
Response shape:
```json
{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hi there!"}],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 20, "output_tokens": 5}
}
```
Key characteristics:
- System instructions via the top-level `"system"` field (not in the `messages` array)
- `max_tokens` is required; LLM-Rosetta defaults to 4096 if it is not provided
- Temperature is clamped to 0.0–1.0 (OpenAI allows up to 2.0)
- `top_k` is supported
- Tool calls are `"tool_use"` blocks; arguments are dicts (not JSON strings)
- Tool results are `"tool_result"` blocks inside a user message
- Single response (no `choices[]`/`candidates[]` array)
- Extended thinking: `"thinking": {"type": "enabled", "budget_tokens": 10000}`
- Auth via the `x-api-key` header (not `Authorization: Bearer`)
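Several of the points above (system extraction, the `max_tokens` default, temperature clamping, and the tool-call reshaping) combine into one request mapping. The following is a hedged sketch, not LLM-Rosetta's `AnthropicConverter`: `chat_to_anthropic` is a hypothetical name, it takes an OpenAI Chat request as input, and it assumes string message content.

```python
import json
from typing import Any

DEFAULT_MAX_TOKENS = 4096  # default this page says LLM-Rosetta applies


def chat_to_anthropic(req: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: map an OpenAI Chat request to an Anthropic Messages request."""
    system_parts: list[str] = []
    messages: list[dict[str, Any]] = []

    def add_block(role: str, block: dict[str, Any]) -> None:
        # Merge consecutive same-role blocks so turns keep alternating.
        if messages and messages[-1]["role"] == role:
            messages[-1]["content"].append(block)
        else:
            messages.append({"role": role, "content": [block]})

    for msg in req["messages"]:
        role = msg["role"]
        if role == "system":
            # System text moves to the top-level "system" field.
            system_parts.append(msg["content"])
        elif role == "tool":
            # Tool results become tool_result blocks inside a user message.
            add_block("user", {"type": "tool_result",
                               "tool_use_id": msg["tool_call_id"],
                               "content": msg["content"]})
        else:
            if msg.get("content"):
                add_block(role, {"type": "text", "text": msg["content"]})
            for tc in msg.get("tool_calls", []):
                # JSON-string arguments are decoded into a dict under "input".
                add_block("assistant", {"type": "tool_use",
                                        "id": tc["id"],
                                        "name": tc["function"]["name"],
                                        "input": json.loads(tc["function"]["arguments"])})

    out: dict[str, Any] = {
        "model": req["model"],
        "messages": messages,
        # max_tokens is required, so fall back to the default when absent.
        "max_tokens": req.get("max_completion_tokens",
                              req.get("max_tokens", DEFAULT_MAX_TOKENS)),
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    if "temperature" in req:
        # Clamp OpenAI's 0-2 range into Anthropic's 0-1 range.
        out["temperature"] = min(max(float(req["temperature"]), 0.0), 1.0)
    return out


anthropic_req = chat_to_anthropic({
    "model": "claude-sonnet-4-20250514",
    "messages": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 1.5,
})
# anthropic_req["max_tokens"] is 4096 and anthropic_req["temperature"] is 1.0
```

Merging consecutive same-role blocks means that results from several parallel tool calls end up together in a single user message.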
Google GenAI (google)
Google's Generative AI API for Gemini models. Uses contents[] instead of messages[] and parts[] instead of content.
Request shape:
```json
{
  "contents": [
    {
      "role": "user",
      "parts": [{"text": "Hello!"}]
    }
  ],
  "system_instruction": {"parts": [{"text": "You are helpful."}]},
  "generationConfig": {
    "maxOutputTokens": 1000,
    "temperature": 0.7
  }
}
```
Response shape:
```json
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [{"text": "Hi there!"}]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 20,
    "candidatesTokenCount": 5
  }
}
```
Key characteristics:
- System instructions via the top-level `"system_instruction"` field
- Messages are `"contents"`, content blocks are `"parts"`
- Assistant role is `"model"` (not `"assistant"`)
- Tool calls are `functionCall` parts; arguments are dicts
- Tool results are `functionResponse` parts in a user turn
- Response format via separate fields: `responseMimeType` + `responseSchema`
- `top_k` is supported
- REST API uses camelCase; Python SDK uses snake_case. The converter handles both transparently.
- Auth via the `x-goog-api-key` header or the `key` query parameter
- Streaming via a separate endpoint: `streamGenerateContent`
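The same Chat-style request maps onto Gemini's shape roughly as follows. This is a minimal sketch with a hypothetical function name (`chat_to_gemini`); it returns the REST path as well as the body, because the model name lives in the URL and streaming switches the method to `streamGenerateContent`. Wrapping a tool's string output as `{"content": ...}` inside `functionResponse.response` is an assumption of the sketch, made because that field must be a JSON object.

```python
import json
from typing import Any


def chat_to_gemini(req: dict[str, Any], stream: bool = False) -> tuple[str, dict[str, Any]]:
    """Hypothetical sketch: return (REST path, request body) for a Chat-style request."""
    contents: list[dict[str, Any]] = []
    system_texts: list[str] = []
    call_names: dict[str, str] = {}  # tool_call_id -> function name

    for msg in req["messages"]:
        role = msg["role"]
        if role == "system":
            system_texts.append(msg["content"])
            continue
        parts: list[dict[str, Any]] = []
        if role == "tool":
            # functionResponse needs the function name, which Chat tool messages
            # lack, so recover it from the earlier call.
            parts.append({"functionResponse": {
                "name": call_names[msg["tool_call_id"]],
                "response": {"content": msg["content"]},  # assumed wrapping
            }})
        else:
            if msg.get("content"):
                parts.append({"text": msg["content"]})
            for tc in msg.get("tool_calls", []):
                call_names[tc["id"]] = tc["function"]["name"]
                parts.append({"functionCall": {
                    "name": tc["function"]["name"],
                    "args": json.loads(tc["function"]["arguments"]),  # dict, not string
                }})
        # The assistant is called "model"; tool results ride in a user turn.
        gem_role = "model" if role == "assistant" else "user"
        contents.append({"role": gem_role, "parts": parts})

    body: dict[str, Any] = {"contents": contents}
    if system_texts:
        body["system_instruction"] = {"parts": [{"text": t} for t in system_texts]}
    gen_cfg: dict[str, Any] = {}
    if "max_completion_tokens" in req:
        gen_cfg["maxOutputTokens"] = req["max_completion_tokens"]
    if "temperature" in req:
        gen_cfg["temperature"] = req["temperature"]
    if gen_cfg:
        body["generationConfig"] = gen_cfg

    # The model goes in the URL path; streaming uses a different method name.
    method = "streamGenerateContent" if stream else "generateContent"
    return f"/v1beta/models/{req['model']}:{method}", body


gemini_path, gemini_body = chat_to_gemini({
    "model": "gemini-2.0-flash",
    "messages": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_completion_tokens": 1000,
    "temperature": 0.7,
})
# gemini_path is "/v1beta/models/gemini-2.0-flash:generateContent"
```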
Comparison
| Feature | OpenAI Chat | OpenAI Responses | Anthropic | Google GenAI |
|---|---|---|---|---|
| System instructions | Message role | `instructions` | `system` | `system_instruction` |
| Message container | `messages` | `input` items | `messages` | `contents` |
| Tool call args | JSON string | JSON string | Dict | Dict |
| Tool result delivery | `tool` message | `function_call_output` item | `tool_result` in user msg | `functionResponse` part |
| Response wrapper | `choices[]` | `output[]` | Single message | `candidates[]` |
| `max_tokens` field | `max_completion_tokens` | `max_output_tokens` | `max_tokens` (required) | `maxOutputTokens` |
| `top_k` | No | No | Yes | Yes |
| Temperature range | 0–2 | 0–2 | 0–1 | 0–2 |
| Streaming | `stream: true` | `stream: true` | `stream: true` | Separate endpoint |
| Auth header | `Authorization: Bearer` | `Authorization: Bearer` | `x-api-key` | `x-goog-api-key` |
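The "Response wrapper" row is where client code differs most. Below is a hedged sketch that reads the assistant text and token counts from each of the four response shapes shown on this page; `extract_text_and_usage` is a hypothetical helper, and it handles text content only (no tool calls or reasoning).

```python
from typing import Any


def extract_text_and_usage(provider_type: str, resp: dict[str, Any]) -> tuple[str, int, int]:
    """Hypothetical sketch: return (text, input tokens, output tokens)."""
    if provider_type == "openai_chat":
        # Text sits in choices[0].message.content.
        text = resp["choices"][0]["message"]["content"] or ""
        u = resp["usage"]
        return text, u["prompt_tokens"], u["completion_tokens"]
    if provider_type in ("openai_responses", "open_responses"):
        # Text is spread over output_text parts of message items in output[].
        text = "".join(
            part["text"]
            for item in resp["output"] if item["type"] == "message"
            for part in item["content"] if part["type"] == "output_text"
        )
        u = resp["usage"]
        return text, u["input_tokens"], u["output_tokens"]
    if provider_type == "anthropic":
        # No wrapper: the response is the single assistant message.
        text = "".join(b["text"] for b in resp["content"] if b["type"] == "text")
        u = resp["usage"]
        return text, u["input_tokens"], u["output_tokens"]
    if provider_type == "google":
        parts = resp["candidates"][0]["content"]["parts"]
        text = "".join(p.get("text", "") for p in parts)
        u = resp["usageMetadata"]
        return text, u["promptTokenCount"], u["candidatesTokenCount"]
    raise ValueError(f"unknown provider type: {provider_type}")


sample = {
    "content": [{"type": "text", "text": "Hi there!"}],
    "usage": {"input_tokens": 20, "output_tokens": 5},
}
print(extract_text_and_usage("anthropic", sample))  # ('Hi there!', 20, 5)
```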
In LLM-Rosetta
Each API standard has a corresponding converter that translates between the provider format and the IR (Intermediate Representation). The gateway uses the `type` field in the provider config to select the right converter:

```json
"providers": {
  "my-openai": { "type": "openai_chat", "api_key": "...", "base_url": "..." },
  "my-anthropic": { "type": "anthropic", "api_key": "...", "base_url": "..." },
  "my-google": { "type": "google", "api_key": "...", "base_url": "..." }
}
```
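The dispatch on `type` can be pictured as a lookup table. In the sketch below the converter names come from the Overview table but are plain strings rather than imports, and `resolve` and `auth_headers` are hypothetical helpers, not LLM-Rosetta's actual gateway code. The header choices follow the Comparison table and cover authentication only, not any other headers a provider may require.

```python
import json

# Converter class names from the Overview table; open_responses reuses the
# Responses converter. Names only: this sketch does not import LLM-Rosetta.
CONVERTERS = {
    "openai_chat": "OpenAIChatConverter",
    "openai_responses": "OpenAIResponsesConverter",
    "open_responses": "OpenAIResponsesConverter",
    "anthropic": "AnthropicConverter",
    "google": "GoogleConverter",
}


def auth_headers(provider_type: str, api_key: str) -> dict[str, str]:
    """Auth header per standard, following the Comparison table."""
    if provider_type == "anthropic":
        return {"x-api-key": api_key}
    if provider_type == "google":
        return {"x-goog-api-key": api_key}
    return {"Authorization": f"Bearer {api_key}"}


def resolve(config_text: str) -> dict[str, tuple[str, dict[str, str]]]:
    """Map each configured provider name to (converter class name, auth headers)."""
    providers = json.loads(config_text)["providers"]
    resolved = {}
    for name, cfg in providers.items():
        ptype = cfg["type"]
        if ptype not in CONVERTERS:
            raise ValueError(f"{name}: unsupported type {ptype!r}")
        resolved[name] = (CONVERTERS[ptype], auth_headers(ptype, cfg["api_key"]))
    return resolved


config_text = '{"providers": {"my-anthropic": {"type": "anthropic", "api_key": "sk-test"}}}'
print(resolve(config_text))
# {'my-anthropic': ('AnthropicConverter', {'x-api-key': 'sk-test'})}
```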
For programmatic use, see API Layers for the import guide.