Overview
When stream: true is set, the gateway streams the response as Server-Sent Events (SSE). Mako’s streaming goes beyond standard OpenAI streaming: alongside the usual content chunks, it emits custom events for tool execution, giving your frontend real-time visibility into the agent’s actions.
Event types
content_delta
Standard OpenAI-compatible content chunk. Your OpenAI SDK handles these automatically.
{
  "id": "chatcmpl-1718464968543",
  "object": "chat.completion.chunk",
  "model": "mako",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "eth is " },
      "finish_reason": null
    }
  ]
}
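As a rough illustration of how these chunks combine, the sketch below joins the delta.content fragments from a list of already-decoded chunks into the full reply. The helper name collect_content and the second sample fragment are invented for the example; only the chunk shape comes from the event above.

```python
def collect_content(chunks):
    """Join the delta.content fragments found in a sequence of decoded chunks."""
    parts = []
    for chunk in chunks:
        # Custom events (tool_start, tool_trace, ...) have no "choices" key.
        for choice in chunk.get("choices") or []:
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Sample data: the first chunk mirrors the example above, the rest is made up.
sample = [
    {"choices": [{"index": 0, "delta": {"content": "eth is "}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "up today"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

print(collect_content(sample))
```

The final chunk has an empty delta, so it contributes nothing to the text.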
tool_start
Emitted when the agent begins executing a tool.
{
  "type": "tool_start",
  "tool": "get_eth_balance",
  "args": { "address": "0xd8dA...", "chain": "ethereum" }
}
tool_trace
Emitted when a tool completes. Contains the result or error; the ok field indicates whether the call succeeded.
{
  "type": "tool_trace",
  "tool": "get_eth_balance",
  "ok": true,
  "result": {
    "address": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
    "balance_eth": "5.688914350696581971",
    "chain": "Ethereum"
  }
}
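A frontend will usually want to branch on ok before showing a result. The sketch below is one possible way to do that. The name summarize_tool_trace is hypothetical, and the "error" key is an assumption: this page only shows the success shape, so check a real failed trace before relying on that field name.

```python
def summarize_tool_trace(event):
    """Return a one-line, human-readable summary of a tool_trace event."""
    tool = event.get("tool", "unknown tool")
    if event.get("ok"):
        return f"{tool} succeeded: {event.get('result')}"
    # Assumption: a failed trace carries its details under "error".
    # Fall back to "result" in case the gateway reports errors there instead.
    detail = event.get("error", event.get("result"))
    return f"{tool} failed: {detail}"


ok_event = {
    "type": "tool_trace",
    "tool": "get_eth_balance",
    "ok": True,
    "result": {"balance_eth": "5.688914350696581971", "chain": "Ethereum"},
}
print(summarize_tool_trace(ok_event))
```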
agent_text
Intermediate text from the model during tool rounds (e.g., “let me check that for you”). Not the final answer.
{
  "type": "agent_text",
  "content": "let me look that up"
}
done
Final event. The last two SSE messages are always:
data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
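Putting the event types together, a consumer has to tell apart the [DONE] sentinel, the custom events (which carry a type field), and the OpenAI-style chunks (which carry choices). The following is a minimal sketch of that dispatch for a single SSE line. The function name classify_sse_line and the "finish" label for the closing chunk are inventions for this example, not part of the gateway's API.

```python
import json


def classify_sse_line(line):
    """Return (kind, payload) for one SSE line, or None for lines to ignore."""
    if not line.startswith("data:"):
        return None  # blank separators, comments, other SSE fields
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return ("done", None)
    event = json.loads(data)
    if "type" in event:
        # Custom events: tool_start, tool_trace, agent_text
        return (event["type"], event)
    choices = event.get("choices") or [{}]
    if choices[0].get("finish_reason") is not None:
        return ("finish", event)  # the closing chunk with an empty delta
    return ("content_delta", event)


lines = [
    'data: {"type": "agent_text", "content": "let me look that up"}',
    'data: {"choices": [{"index": 0, "delta": {"content": "eth is "}, "finish_reason": null}]}',
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
for raw in lines:
    kind, _payload = classify_sse_line(raw)
    print(kind)
```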
Consuming the stream
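// JavaScript (Node 18+, run as an ES module for top-level await)
const response = await fetch(
  "https://gateway.deepmako.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-wallet-address": "0xYourAddress",
    },
    body: JSON.stringify({
      model: "operator",
      messages: [{ role: "user", content: "what is gas on base?" }],
      stream: true,
    }),
  }
);
if (!response.ok) throw new Error(`Gateway returned HTTP ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // holds a partial line until the rest of it arrives
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // the last piece may be an incomplete line

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice(6).trim();
    if (payload === "[DONE]") {
      finished = true; // also stops the outer read loop
      break;
    }
    const event = JSON.parse(payload);
    if (event.type === "tool_start") {
      console.log(`🔧 Calling ${event.tool}...`);
    } else if (event.type === "tool_trace") {
      const mark = event.ok ? "✅" : "❌";
      console.log(`${mark} ${event.tool}: ${JSON.stringify(event.result)}`);
    } else if (event.type === "agent_text") {
      console.log(`💬 ${event.content}`);
    } else if (event.choices?.[0]?.delta?.content) {
      process.stdout.write(event.choices[0].delta.content);
    }
  }
}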
# Python (OpenAI SDK): the SDK parses the SSE stream for you
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.deepmako.com/v1",
    api_key="not-needed",
    default_headers={"x-wallet-address": "0xYourAddress"},
)

stream = client.chat.completions.create(
    model="operator",
    messages=[{"role": "user", "content": "what is gas on base?"}],
    stream=True,
)

for chunk in stream:
    # Defensive guard: skip any chunk that carries no choices.
    choices = getattr(chunk, "choices", None)
    if choices and choices[0].delta.content:
        print(choices[0].delta.content, end="", flush=True)
The 32B Conductor model does not support streaming. If you request model: "conductor" with stream: true, the gateway will route to the 8B Operator model instead.
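If your client logs or bills by model, it can help to mirror this routing rule locally. The sketch below assumes the 8B Operator model is the one addressed as "operator" in the requests above. The helper effective_model is hypothetical and only restates the rule in this note; the gateway itself remains the source of truth.

```python
def effective_model(requested, stream):
    """Predict which model serves a request under the streaming fallback rule."""
    # Per the note above, "conductor" is not available for streaming,
    # so streaming requests for it are served by the Operator model.
    if stream and requested == "conductor":
        return "operator"
    return requested


for requested, stream in [("conductor", True), ("conductor", False), ("operator", True)]:
    print(requested, stream, "->", effective_model(requested, stream))
```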