Mako-8B Operator

Overview

The Operator is Mako’s fast model, optimized for streaming responses and general conversation. It supports both streaming and non-streaming requests.

Requests to the Operator model are routed to an Ollama instance at MODEL_HOST_URL:

The gateway sends the message history, system prompt, and tool definitions to Ollama’s /api/chat endpoint.
For streaming requests, the gateway converts Ollama’s streaming format to OpenAI-compatible SSE chunks on the fly.
Tool calls are parsed from both structured Ollama tool call format and text-embedded formats (JSON blocks, XML tags).
The agentic tool loop runs for up to 8 rounds, same as the Conductor model.

When streaming, the Operator emits additional custom SSE events beyond standard content deltas:

See Streaming for details on consuming these events.

The 8B model sometimes embeds tool calls in its text response rather than using structured tool calling format. The gateway handles this by parsing:

These are automatically extracted, executed, and the results fed back to the model.

Scenario	Recommended model
Real-time chat with streaming	Operator
Complex multi-tool queries	Conductor
Development and testing	Operator
Production accuracy-critical tasks	Conductor
Low-latency responses	Operator