Overview
| Property | Value |
|---|---|
| Parameters | 8 billion |
| Streaming | Supported |
| Backend | Ollama (local or remote) |
| Request format | `model: "operator"` |
Inference flow
Requests to the Operator model are routed to an Ollama instance at `MODEL_HOST_URL`:
- The gateway sends the message history, system prompt, and tool definitions to Ollama’s `/api/chat` endpoint.
- For streaming requests, the gateway converts Ollama’s streaming format to OpenAI-compatible SSE chunks on the fly.
- Tool calls are parsed from both structured Ollama tool call format and text-embedded formats (JSON blocks, XML tags).
- The agentic tool loop runs for up to 8 rounds, same as the Conductor model.
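The streaming conversion step can be illustrated with a minimal sketch. It assumes Ollama's documented streaming shape for `/api/chat` (one JSON object per line, with a `message.content` fragment and a `done` flag) and the usual OpenAI `chat.completion.chunk` layout. The function name `ollama_to_openai_sse` and the placeholder chunk id are hypothetical; the gateway's actual implementation may differ, for example in how it forwards tool calls.

```python
import json
import time
from typing import Iterable, Iterator


def ollama_to_openai_sse(
    lines: Iterable[str],
    model: str = "operator",
    chunk_id: str = "chatcmpl-example",  # hypothetical placeholder id
) -> Iterator[str]:
    """Translate Ollama NDJSON chat lines into OpenAI-style SSE frames."""
    created = int(time.time())
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines between JSON objects
        event = json.loads(raw)
        done = bool(event.get("done"))
        content = event.get("message", {}).get("content", "")
        chunk = {
            "id": chunk_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [
                {
                    "index": 0,
                    # Empty delta when the line carries no text (e.g. the final line).
                    "delta": {"content": content} if content else {},
                    "finish_reason": "stop" if done else None,
                }
            ],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
        if done:
            break
    # OpenAI-compatible streams end with a literal [DONE] sentinel.
    yield "data: [DONE]\n\n"
```

Because the function is a generator over lines, it can be fed directly from a streaming HTTP response, so each Ollama fragment is forwarded as soon as it arrives rather than after the full reply is buffered.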
Streaming events
When streaming, the Operator emits additional custom SSE events beyond standard content deltas:
- `tool_start`: emitted when a tool begins execution
- `tool_trace`: emitted when a tool completes, with the result
- `agent_text`: intermediate model text during tool rounds
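A client might separate these custom events from ordinary content deltas along the following lines. This is a sketch under assumptions: it supposes the custom events are labelled with the standard SSE `event:` field and carry JSON in `data:`, which this page does not confirm. The payload fields in the usage example (`name`, `result`) and the helper names `iter_sse_events` and `summarize_stream` are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator, List, Tuple

CUSTOM_EVENTS = {"tool_start", "tool_trace", "agent_text"}


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from raw SSE lines."""
    event, data = "message", []  # "message" is the SSE default event name
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if data:
        yield event, "\n".join(data)


def summarize_stream(lines: Iterable[str]) -> List[Tuple[str, Any]]:
    """Collect custom events and content deltas in arrival order."""
    out: List[Tuple[str, Any]] = []
    for event, data in iter_sse_events(lines):
        if data == "[DONE]":
            break
        payload = json.loads(data)
        if event in CUSTOM_EVENTS:
            out.append((event, payload))  # assumed: JSON payload per custom event
        else:
            delta = payload["choices"][0]["delta"]
            out.append(("content", delta.get("content", "")))
    return out
```

For example, a stream containing a `tool_start` event, a `tool_trace` event, and one content chunk would come back as three tuples in that order, which lets a UI render tool activity separately from the assistant's text.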
Text-embedded tool call parsing
The 8B model sometimes embeds tool calls in its text response rather than using the structured tool-calling format. The gateway handles this by parsing:
- JSON blocks: `{"name": "tool_name", "arguments": {...}}`
- Markdown code blocks: ```json {...} ``` containing tool calls
- XML tags: `<tool_call>...</tool_call>` or `<tool_calls>...</tool_calls>`
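One way such a fallback parser could work is sketched below. It is an illustration only, not the gateway's actual code: the function names are hypothetical, and it assumes that any JSON object with both a `name` and an `arguments` key counts as a tool call. XML-tagged and fenced regions are scanned first and then removed, so the same call is not counted again by the bare-JSON pass.

```python
import json
import re
from typing import Any, Dict, List

FENCE = "`" * 3  # Markdown code fence marker
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)
XML_RE = re.compile(r"<tool_calls?>(.*?)</tool_calls?>", re.DOTALL)


def _scan_json_calls(text: str) -> List[Dict[str, Any]]:
    """Find JSON objects in free text that look like tool calls."""
    decoder = json.JSONDecoder()
    calls: List[Dict[str, Any]] = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return calls
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; try the next brace
            continue
        # Assumed shape: {"name": ..., "arguments": {...}}
        if isinstance(obj, dict) and "name" in obj and "arguments" in obj:
            calls.append(obj)
        pos = end


def extract_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract tool calls from XML tags, fenced blocks, then bare JSON."""
    calls: List[Dict[str, Any]] = []
    for pattern in (XML_RE, FENCE_RE):
        for match in pattern.finditer(text):
            calls.extend(_scan_json_calls(match.group(1)))
        text = pattern.sub(" ", text)  # avoid re-parsing the same region
    calls.extend(_scan_json_calls(text))
    return calls
```

Note that this sketch returns calls grouped by format (XML, then fenced, then bare JSON) rather than in document order; a real parser that needs to preserve ordering would have to track match positions.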
When to use Operator vs. Conductor
| Scenario | Recommended model |
|---|---|
| Real-time chat with streaming | Operator |
| Complex multi-tool queries | Conductor |
| Development and testing | Operator |
| Production accuracy-critical tasks | Conductor |
| Low-latency responses | Operator |