Skip to main content

Overview

PropertyValue
Parameters8 billion
StreamingSupported
BackendOllama (local or remote)
Request formatmodel: "operator"
The Operator is Mako’s fast model, optimized for streaming responses and general conversation. It supports both streaming and non-streaming requests.

Inference flow

Requests to the Operator model are routed to an Ollama instance at MODEL_HOST_URL:
  1. The gateway sends the message history, system prompt, and tool definitions to Ollama’s /api/chat endpoint.
  2. For streaming requests, the gateway converts Ollama’s streaming format to OpenAI-compatible SSE chunks on the fly.
  3. Tool calls are parsed from both structured Ollama tool call format and text-embedded formats (JSON blocks, XML tags).
  4. The agentic tool loop runs for up to 8 rounds, same as the Conductor model.

Streaming events

When streaming, the Operator emits additional custom SSE events beyond standard content deltas:
  • tool_start — emitted when a tool begins execution
  • tool_trace — emitted when a tool completes, with the result
  • agent_text — intermediate model text during tool rounds
See Streaming for details on consuming these events.

Text-embedded tool call parsing

The 8B model sometimes embeds tool calls in its text response rather than using structured tool calling format. The gateway handles this by parsing:
  • JSON blocks{"name": "tool_name", "arguments": {...}}
  • Markdown code blocks```json {...} ``` containing tool calls
  • XML tags<tool_call>...</tool_call> or <tool_calls>...</tool_calls>
These are automatically extracted, executed, and the results fed back to the model.

When to use Operator vs. Conductor

ScenarioRecommended model
Real-time chat with streamingOperator
Complex multi-tool queriesConductor
Development and testingOperator
Production accuracy-critical tasksConductor
Low-latency responsesOperator