System diagram
Request lifecycle
Auth + credits
The gateway validates the x-wallet-address header and checks the Authorization bearer token against the API key database. If credits are enabled, it verifies the user has at least 1 credit remaining.
Model routing
Based on the model and stream fields in the request body:
- "conductor" + stream: false → RunPod Serverless (32B model, handles cold starts with polling)
- Anything else → Ollama (8B model, local or remote)
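The routing rule can be sketched as a small pure function. This is a minimal illustration, not the gateway's actual code: the function name selectBackend and the backend labels are hypothetical, and treating a missing stream field as "not false" (so the request goes to Ollama) is an assumption.

```javascript
// Hypothetical backend labels; the gateway's real identifiers may differ.
const BACKENDS = { RUNPOD: "runpod-serverless", OLLAMA: "ollama" };

// Pick a backend from the request body's model and stream fields.
function selectBackend(body) {
  // Only non-streaming "conductor" requests go to the 32B model on RunPod.
  if (body.model === "conductor" && body.stream === false) {
    return BACKENDS.RUNPOD;
  }
  // Everything else, including streaming "conductor" requests, goes to Ollama.
  return BACKENDS.OLLAMA;
}
```

Keeping the decision in one function makes it straightforward to unit test the rule separately from the HTTP handler.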
Knowledge injection
Before inference, the gateway scans the user’s message against 85+ keyword triggers. If a match is found (e.g., the user mentions “aerodrome” or “virtuals”), relevant project context is injected as a system message — up to 2 projects, ~500 tokens.
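A rough sketch of keyword-triggered injection follows. The entry shape ({ name, keywords, summary }), the function name buildContextMessage, and the placeholder summaries are assumptions for illustration; the ~500 token budget is not enforced here and would need a tokenizer or a length heuristic.

```javascript
// Hypothetical project entries; summaries are placeholders, not real content.
const PROJECTS = [
  { name: "Aerodrome", keywords: ["aerodrome"], summary: "(project summary goes here)" },
  { name: "Virtuals", keywords: ["virtuals"], summary: "(project summary goes here)" },
  { name: "Example", keywords: ["exampleproject"], summary: "(project summary goes here)" },
];

const MAX_PROJECTS = 2; // the text caps injection at 2 projects

// Return a system message with context for matched projects, or null if none match.
function buildContextMessage(userMessage, projects, maxProjects = MAX_PROJECTS) {
  const text = userMessage.toLowerCase();
  const matches = projects
    .filter((p) => p.keywords.some((k) => text.includes(k.toLowerCase())))
    .slice(0, maxProjects);
  if (matches.length === 0) return null;
  const content = matches.map((p) => `${p.name}: ${p.summary}`).join("\n");
  return { role: "system", content };
}
```

A caller would prepend the returned message to the conversation before sending it to the model, and skip injection when the result is null.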
Agentic tool loop
The model’s response is checked for tool calls. If found, the gateway executes the tool, feeds the result back to the model, and loops — up to 8 rounds. The gateway also detects when the model is “stalling” (saying “let me check…” without actually calling a tool) and nudges it.
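The loop described above might look roughly like the sketch below. Everything here is illustrative: callModel is an injected function standing in for the inference call, the reply shape follows Ollama-style messages (tool_calls with function.name and function.arguments), and the stall pattern and nudge wording are assumptions rather than the gateway's real heuristics. Each model call counts as one round.

```javascript
const MAX_ROUNDS = 8; // the text caps the loop at 8 rounds

// Assumed stall heuristic: the model promises an action but calls no tool.
const STALL_PATTERN = /\blet me (check|look|search)\b/i;

// Run the model, execute any requested tools, and loop until a final answer.
async function runToolLoop(messages, callModel, tools, maxRounds = MAX_ROUNDS) {
  const history = [...messages];
  let reply = null;
  for (let round = 0; round < maxRounds; round++) {
    reply = await callModel(history);
    history.push(reply);
    const calls = reply.tool_calls ?? [];
    if (calls.length > 0) {
      for (const call of calls) {
        const { name, arguments: args } = call.function;
        const tool = tools[name];
        const result = tool ? await tool.run(args) : { error: `unknown tool: ${name}` };
        // Feed the tool result back so the next round can use it.
        history.push({ role: "tool", content: JSON.stringify(result) });
      }
      continue;
    }
    if (STALL_PATTERN.test(reply.content ?? "")) {
      // Nudge the model to actually call a tool instead of describing the call.
      history.push({ role: "user", content: "Call the tool now instead of describing it." });
      continue;
    }
    return reply; // plain answer with no tool call and no stall
  }
  return reply; // round limit reached; return the last reply as-is
}
```

Injecting callModel and the tool registry keeps the loop independent of any particular backend, which also makes the round limit easy to test with fakes.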
Auto-chaining
When a web_search returns results, the gateway automatically reads any tweet URLs via read_tweet and extracts the top non-tweet page via web_extract, all without additional model round-trips.
Tool system
The gateway has 16 built-in tools, all read-only:
| Category | Tools |
|---|---|
| On-chain | get_eth_balance, get_token_balance, get_token_info, get_gas_price, get_block, get_tx_count, is_contract, resolve_ens, get_transaction |
| Market | get_crypto_price |
| Web | web_search, web_extract, read_tweet, find_music |
| Knowledge | knowledge_search |
| Meta | list_tools |
Tools are defined in gateway/src/tools.js as a TOOLS object. Each tool has a description, params array, and async run function. The gateway generates Ollama-format tool definitions from this registry and sends them with every inference call.
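As an illustration of how a registry entry could map to an Ollama-format tool definition, consider the sketch below. The shape of each params item ({ name, type, description, required }), the stubbed run function, and the helper name toOllamaTools are assumptions; only the description / params / run structure comes from the text. The output follows the function-tool schema Ollama accepts in the tools field of a chat request.

```javascript
// Hypothetical registry entry: description, params array, async run function.
const TOOLS = {
  get_eth_balance: {
    description: "Get the ETH balance of an address",
    params: [
      { name: "address", type: "string", description: "Wallet address", required: true },
    ],
    // Stub only: a real tool would query a chain RPC here.
    run: async ({ address }) => ({ address, balance: null }),
  },
};

// Convert the registry into an array of function-tool definitions.
function toOllamaTools(registry) {
  return Object.entries(registry).map(([name, tool]) => ({
    type: "function",
    function: {
      name,
      description: tool.description,
      parameters: {
        type: "object",
        properties: Object.fromEntries(
          tool.params.map((p) => [p.name, { type: p.type, description: p.description }])
        ),
        required: tool.params.filter((p) => p.required).map((p) => p.name),
      },
    },
  }));
}
```

Generating the definitions from one registry means the schema sent to the model and the code that executes a call cannot drift apart.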
Knowledge base (RAG)
The gateway/src/knowledgeBase.js module loads JSON files from gateway/src/knowledge/ at startup. Each file describes a Base ecosystem project. The knowledge base is used in two ways:
- Auto-injection (getContextForMessage): keywords in the user’s message trigger automatic context injection before inference. Top 2 matches, ~500 tokens.
- Explicit search (knowledge_search tool): the model can search the knowledge base directly. Returns up to 5 results with fuzzy scoring.
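The explicit search path could be approximated as below. The text does not specify how fuzzy scoring works, so this sketch swaps in a simple substring score (2 points for a query term found in the project name, 1 point for a term found in keywords or the summary). The entry shape and the names scoreEntry and searchKnowledge are hypothetical.

```javascript
const MAX_RESULTS = 5; // the text says up to 5 results are returned

// Stand-in scoring: name hits count double, keyword or summary hits count once.
function scoreEntry(entry, terms) {
  const name = entry.name.toLowerCase();
  const rest = [...entry.keywords, entry.summary].join(" ").toLowerCase();
  let score = 0;
  for (const term of terms) {
    if (name.includes(term)) score += 2;
    else if (rest.includes(term)) score += 1;
  }
  return score;
}

// Rank entries against the query and keep the best matches.
function searchKnowledge(query, entries, limit = MAX_RESULTS) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return entries
    .map((entry) => ({ entry, score: scoreEntry(entry, terms) }))
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

A production implementation would likely tolerate typos (for example with edit distance), which this substring approach does not.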
Inference backends
RunPod Serverless (32B Conductor)
The gateway calls RunPod’s /runsync endpoint and handles cold starts by polling /status/{jobId} for up to 6 minutes. If the model returns an empty output error, the gateway retries without tools.
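The cold-start polling can be sketched as a generic loop with a deadline. In this illustration fetchStatus is an injected function assumed to wrap a GET to /status/{jobId} and return the parsed JSON; the 2 second interval is an assumption, and the terminal status names reflect RunPod's job states as commonly documented, so they should be checked against the current RunPod API reference.

```javascript
// Job states treated as final in this sketch (verify against RunPod's docs).
const TERMINAL = new Set(["COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"]);

// Poll a job until it reaches a terminal state or the deadline passes.
async function pollJob(fetchStatus, jobId, opts = {}) {
  const {
    timeoutMs = 6 * 60 * 1000, // 6 minutes, per the text
    intervalMs = 2000,         // assumed polling interval
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    now = Date.now,
  } = opts;
  const deadline = now() + timeoutMs;
  while (now() < deadline) {
    const job = await fetchStatus(jobId);
    if (TERMINAL.has(job.status)) return job;
    await sleep(intervalMs);
  }
  throw new Error(`job ${jobId} did not finish within ${timeoutMs} ms`);
}
```

The sleep and now options exist so the loop can be tested without real timers. The retry without tools on an empty output error would sit one level above this function, around the initial /runsync call.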
Ollama (8B Operator)
The gateway connects to an Ollama instance at MODEL_HOST_URL (default: http://localhost:11434/api/chat). Streaming responses are converted from Ollama’s format to OpenAI SSE chunks on the fly via ollamaChunkToOpenAI().
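A conversion of this kind might look like the sketch below. It is not the gateway's ollamaChunkToOpenAI() but a hypothetical stand-in named ollamaChunkToSSE, written against the public shapes of the two formats: Ollama's streamed /api/chat objects (message.content plus a done flag) and OpenAI's chat.completion.chunk events. The id and created values are passed in by the caller, and mapping every final chunk to finish_reason "stop" is a simplification.

```javascript
// Convert one parsed Ollama stream object into an OpenAI-style SSE frame.
function ollamaChunkToSSE(chunk, id, created) {
  const payload = {
    id,
    object: "chat.completion.chunk",
    created,
    model: chunk.model,
    choices: [
      {
        index: 0,
        // The final Ollama chunk carries no new text, so send an empty delta.
        delta: chunk.done ? {} : { content: chunk.message?.content ?? "" },
        finish_reason: chunk.done ? "stop" : null,
      },
    ],
  };
  const frame = `data: ${JSON.stringify(payload)}\n\n`;
  // OpenAI-compatible clients expect a [DONE] sentinel after the last chunk.
  return chunk.done ? frame + "data: [DONE]\n\n" : frame;
}
```

Each returned string can be written directly to the HTTP response, provided the response was opened with a text/event-stream content type.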