Mako-32B Conductor

Overview

Property	Value
Base model	Qwen3-32B
Parameters	32 billion
Streaming	Not supported
Backend	RunPod Serverless (A100/H100)
Request format	`model: "conductor"`, `stream: false`

The Conductor is Mako’s primary model. It excels at multi-step tool orchestration — resolving ENS names, fetching balances, calculating USD values, and synthesizing web search results into natural responses.

Inference flow

When a request arrives with model: "conductor" and stream: false, the gateway routes it to RunPod Serverless:

The gateway calls RunPod’s /runsync endpoint with the full message history, system prompt, and tool definitions.
If the worker is cold, RunPod returns IN_QUEUE. The gateway polls /status/{jobId} every 5 seconds for up to 6 minutes.
Once the model responds, the gateway checks for tool calls and enters the agentic loop (up to 8 rounds).
If the model returns an empty output error, the gateway retries the request without tool definitions as a fallback.

Temperature defaults

The Conductor uses conservative sampling to ensure reliable tool calling:

Temperature: 0.4
Top-p: 0.9

These values are set in the gateway and are not currently configurable per-request.

Cold starts

RunPod Serverless workers may take 30–90 seconds to cold start. The gateway handles this automatically with polling. If you’re building a frontend, the 503 status code from the gateway indicates a cold start — display a loading state and retry.

Limitations

No streaming. If you request model: "conductor" with stream: true, the gateway silently routes to the 8B Operator model instead.
Latency. Expect 3–15 seconds per response depending on tool chain complexity, plus cold start time if the worker is idle.
Context window. The full system prompt, knowledge context, tool definitions, and conversation history all count against the context window.

​Overview

​Inference flow

​Temperature defaults

​Cold starts

​Limitations

Overview

Inference flow

Temperature defaults

Cold starts

Limitations