Skip to main content

Overview

PropertyValue
Base modelQwen3-32B
Parameters32 billion
StreamingNot supported
BackendRunPod Serverless (A100/H100)
Request formatmodel: "conductor", stream: false
The Conductor is Mako’s primary model. It excels at multi-step tool orchestration — resolving ENS names, fetching balances, calculating USD values, and synthesizing web search results into natural responses.

Inference flow

When a request arrives with model: "conductor" and stream: false, the gateway routes it to RunPod Serverless:
  1. The gateway calls RunPod’s /runsync endpoint with the full message history, system prompt, and tool definitions.
  2. If the worker is cold, RunPod returns IN_QUEUE. The gateway polls /status/{jobId} every 5 seconds for up to 6 minutes.
  3. Once the model responds, the gateway checks for tool calls and enters the agentic loop (up to 8 rounds).
  4. If the model returns an empty output error, the gateway retries the request without tool definitions as a fallback.

Temperature defaults

The Conductor uses conservative sampling to ensure reliable tool calling:
  • Temperature: 0.4
  • Top-p: 0.9
These values are set in the gateway and are not currently configurable per-request.

Cold starts

RunPod Serverless workers may take 30–90 seconds to cold start. The gateway handles this automatically with polling. If you’re building a frontend, the 503 status code from the gateway indicates a cold start — display a loading state and retry.

Limitations

  • No streaming. If you request model: "conductor" with stream: true, the gateway silently routes to the 8B Operator model instead.
  • Latency. Expect 3–15 seconds per response depending on tool chain complexity, plus cold start time if the worker is idle.
  • Context window. The full system prompt, knowledge context, tool definitions, and conversation history all count against the context window.