Overview

When a request sets stream: true, the gateway returns the response as Server-Sent Events (SSE). In addition to standard OpenAI-style content chunks, Mako emits custom events for tool execution, giving your frontend real-time visibility into the agent’s actions.

Event types

content_delta

Standard OpenAI-compatible content chunk. Your OpenAI SDK handles these automatically.
{
  "id": "chatcmpl-1718464968543",
  "object": "chat.completion.chunk",
  "model": "mako",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "eth is " },
      "finish_reason": null
    }
  ]
}
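
As a rough illustration of how these chunks add up to the final answer, the sketch below concatenates delta.content across a few sample chunks. The helper name appendDelta and the second chunk's text are made up for the example; only the chunk shape comes from the sample above.

```javascript
// Collect the text carried by content_delta chunks into one string.
// Chunks without delta.content (for example the final stop chunk) add nothing.
function appendDelta(text, chunk) {
  const piece = chunk.choices?.[0]?.delta?.content;
  return typeof piece === "string" ? text + piece : text;
}

// Sample chunks; the wording of the second one is illustrative only.
const chunks = [
  { choices: [{ index: 0, delta: { content: "eth is " }, finish_reason: null }] },
  { choices: [{ index: 0, delta: { content: "up today" }, finish_reason: null }] },
  { choices: [{ index: 0, delta: {}, finish_reason: "stop" }] },
];

const answer = chunks.reduce(appendDelta, "");
console.log(answer);
```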

tool_start

Emitted when the agent begins executing a tool.
{
  "type": "tool_start",
  "tool": "get_eth_balance",
  "args": { "address": "0xd8dA...", "chain": "ethereum" }
}

tool_trace

Emitted when a tool completes. Contains the result or error.
{
  "type": "tool_trace",
  "tool": "get_eth_balance",
  "ok": true,
  "result": {
    "address": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
    "balance_eth": "5.688914350696581971",
    "chain": "Ethereum"
  }
}
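
One way a frontend might pair tool_start with tool_trace is a small tracker like the sketch below. It rests on two assumptions that the events shown here do not confirm: calls are matched by tool name (no call ID appears in the samples), and a failed trace carries its message in an error field. createToolTracker is a hypothetical helper, not part of any SDK.

```javascript
// Track which tools are currently running, keyed by tool name.
// ASSUMPTION: at most one call per tool name at a time (the events carry no call ID).
// ASSUMPTION: failed traces expose their message in an `error` field.
function createToolTracker() {
  const running = new Map();
  const log = [];
  return {
    handle(event) {
      if (event.type === "tool_start") {
        running.set(event.tool, event.args);
        log.push(`started ${event.tool}`);
      } else if (event.type === "tool_trace") {
        running.delete(event.tool);
        log.push(
          event.ok
            ? `finished ${event.tool}`
            : `failed ${event.tool}: ${event.error ?? "unknown error"}`
        );
      }
    },
    running: () => [...running.keys()],
    log: () => [...log],
  };
}

const tracker = createToolTracker();
tracker.handle({
  type: "tool_start",
  tool: "get_eth_balance",
  args: { address: "0xd8dA...", chain: "ethereum" },
});
const duringCall = tracker.running(); // snapshot while the tool is in flight
tracker.handle({
  type: "tool_trace",
  tool: "get_eth_balance",
  ok: true,
  result: { balance_eth: "5.688914350696581971" },
});
console.log(duringCall, tracker.running(), tracker.log());
```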

agent_text

Intermediate text from the model during tool rounds (e.g., “let me check that for you”). Not the final answer.
{
  "type": "agent_text",
  "content": "let me look that up"
}
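
Because agent_text is not part of the final answer, a UI will probably want to keep it in a separate place, such as a status line. The sketch below shows one possible split; routeText and the status/answer state shape are illustrative names, not part of the API.

```javascript
// Keep intermediate agent_text apart from the final answer text.
// agent_text replaces the status line; content deltas extend the answer.
function routeText(state, event) {
  if (event.type === "agent_text") {
    return { ...state, status: event.content };
  }
  const piece = event.choices?.[0]?.delta?.content;
  if (typeof piece === "string") {
    return { ...state, answer: state.answer + piece };
  }
  return state; // tool events and empty deltas leave both fields untouched
}

const events = [
  { type: "agent_text", content: "let me look that up" },
  { choices: [{ index: 0, delta: { content: "eth is " }, finish_reason: null }] },
];

const view = events.reduce(routeText, { status: "", answer: "" });
console.log(view);
```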

done

Marks the end of the stream. The last two SSE messages are always a chunk with finish_reason "stop", followed by the [DONE] sentinel:
data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
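
A client can recognize both closing messages with a small line parser. The following is a minimal sketch under the assumption that every data line holds either the literal [DONE] or one complete JSON object; readSseLine is a hypothetical helper name.

```javascript
// Interpret one SSE line. Returns null for lines that carry no data.
function readSseLine(line) {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice(6).trim();
  if (payload === "[DONE]") return { done: true };
  const event = JSON.parse(payload);
  // The chunk just before [DONE] has an empty delta and finish_reason "stop".
  const stopped = event.choices?.[0]?.finish_reason === "stop";
  return { done: false, stopped, event };
}

const stopLine =
  'data: {"id":"chatcmpl-...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}';
const stopInfo = readSseLine(stopLine);
const doneInfo = readSseLine("data: [DONE]");
console.log(stopInfo.stopped, doneInfo.done, readSseLine(""));
```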

Consuming the stream

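const response = await fetch(
  "https://gateway.deepmako.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-wallet-address": "0xYourAddress",
    },
    body: JSON.stringify({
      model: "operator",
      messages: [{ role: "user", content: "what is gas on base?" }],
      stream: true,
    }),
  }
);

if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // holds a partial line that spans two network chunks
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // the last piece may be incomplete; keep it for the next read

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice(6).trim();
    if (payload === "[DONE]") {
      finished = true; // leave the outer read loop as well
      break;
    }

    const event = JSON.parse(payload);

    if (event.type === "tool_start") {
      console.log(`🔧 Calling ${event.tool}...`);
    } else if (event.type === "tool_trace") {
      // result is only meaningful when ok is true; log the whole event otherwise
      const detail = event.ok ? event.result : event;
      console.log(`${event.ok ? "✅" : "❌"} ${event.tool}: ${JSON.stringify(detail)}`);
    } else if (event.type === "agent_text") {
      console.log(`💬 ${event.content}`);
    } else if (event.choices?.[0]?.delta?.content) {
      // Node.js only; in a browser, append the text to the page instead
      process.stdout.write(event.choices[0].delta.content);
    }
  }
}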

The 32B Conductor model does not support streaming. If you request model: "conductor" with stream: true, the gateway routes the request to the 8B Operator model instead.
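
If a client wants to show which model will actually answer, it could mirror this routing rule locally. The helper below is a hypothetical client-side sketch; the gateway performs the real routing, and the identifier "operator" is taken from the request example above.

```javascript
// Predict which model serves a request, following the fallback rule above.
// HYPOTHETICAL client-side helper; it does not change what the gateway does.
function effectiveModel(model, stream) {
  if (stream && model === "conductor") return "operator";
  return model;
}

console.log(effectiveModel("conductor", true));
console.log(effectiveModel("conductor", false));
console.log(effectiveModel("operator", true));
```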