Architecture

System diagram

┌──────────────┐     ┌─────────────────────────────────────┐
│   Frontend   │────▶│          Gateway Server              │
│  (Next.js)   │◀────│          (Express.js)                │
└──────────────┘     │                                     │
                     │  ┌─────────┐  ┌──────────────────┐  │
                     │  │  Auth   │  │  Credit Manager   │  │
                     │  │  Layer  │  │  (SQLite/Postgres) │  │
                     │  └─────────┘  └──────────────────┘  │
                     │                                     │
                     │  ┌──────────────────────────────┐   │
                     │  │     Inference Engine          │   │
                     │  │  ┌────────────────────────┐   │   │
                     │  │  │  Agentic Tool Loop     │   │   │
                     │  │  │  (up to 8 rounds)      │   │   │
                     │  │  └────────────────────────┘   │   │
                     │  └──────────┬───────────────────┘   │
                     │             │                        │
                     └─────────────┼────────────────────────┘
                                   │
                    ┌──────────────┼──────────────┐
                    │              │              │
              ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
              │  RunPod    │ │  Ollama   │ │  16 Tools │
              │  32B       │ │  8B       │ │  (viem,   │
              │  Conductor │ │  Operator │ │   APIs)   │
              └───────────┘ └───────────┘ └───────────┘

Request lifecycle

Auth + credits

The gateway validates the x-wallet-address header and checks the Authorization bearer token against the API key database. If credits are enabled, it verifies the user has at least 1 credit remaining.
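
The check above can be sketched as Express-style middleware. This is a minimal illustration, not the gateway's actual code: the in-memory `apiKeys` and `creditBalances` stores, the `makeAuthMiddleware` name, the wallet format check, and the HTTP status codes are all assumptions made for the example.

```javascript
// Hypothetical stand-ins for the API key database and the credit manager.
const DEMO_WALLET = "0x" + "ab".repeat(20);
const apiKeys = new Set(["sk-demo"]);
const creditBalances = new Map([[DEMO_WALLET, 3]]);

// Assumed validation rule: a 20-byte hex address with a 0x prefix.
const WALLET_RE = /^0x[0-9a-fA-F]{40}$/;

// Returns Express-style middleware; creditsEnabled mirrors "if credits are enabled".
function makeAuthMiddleware({ creditsEnabled }) {
  return function auth(req, res, next) {
    // Node lowercases incoming header names.
    const wallet = req.headers["x-wallet-address"];
    if (typeof wallet !== "string" || !WALLET_RE.test(wallet)) {
      return res.status(400).json({ error: "missing or malformed x-wallet-address" });
    }

    // Extract the bearer token and look it up in the key store.
    const header = req.headers["authorization"] || "";
    const token = header.startsWith("Bearer ") ? header.slice("Bearer ".length) : "";
    if (!apiKeys.has(token)) {
      return res.status(401).json({ error: "invalid API key" });
    }

    // Only enforce the one-credit minimum when credits are switched on.
    const balance = creditBalances.get(wallet.toLowerCase()) ?? 0;
    if (creditsEnabled && balance < 1) {
      return res.status(402).json({ error: "insufficient credits" });
    }
    return next();
  };
}
```

In an Express app this would be mounted with something like `app.use(makeAuthMiddleware({ creditsEnabled: true }))` ahead of the inference route.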

Model routing

The gateway selects an inference backend based on the model and stream fields in the request body:

"conductor" + stream: false → RunPod Serverless (32B model, handles cold starts with polling)
Anything else → Ollama (8B model, local or remote)
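
The routing rule is small enough to express as a single function. The sketch below uses a hypothetical `pickBackend` helper and treats an omitted `stream` field as non-streaming, which is an assumption; the gateway may require an explicit `stream: false`.

```javascript
// Hypothetical helper: returns which backend should serve the request.
function pickBackend(body) {
  // Assumption: a missing stream field counts as stream: false.
  const nonStreaming = body.stream !== true;
  if (body.model === "conductor" && nonStreaming) {
    return "runpod"; // 32B Conductor on RunPod Serverless
  }
  return "ollama"; // 8B Operator on Ollama, for every other combination
}

console.log(pickBackend({ model: "conductor", stream: false }));
console.log(pickBackend({ model: "conductor", stream: true }));
```

One consequence of this rule is that streaming requests are always served by the 8B model, even when the request names "conductor".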

Knowledge injection

Before inference, the gateway scans the user’s message against 85+ keyword triggers. If a match is found (e.g., the user mentions “aerodrome” or “virtuals”), relevant project context is injected as a system message — up to 2 projects, ~500 tokens.
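
A rough sketch of the trigger scan and injection follows. The two project entries, the whole-word matching rule, the `injectContext` helper, and the 4-characters-per-token budget heuristic are assumptions for illustration; only the limits (2 projects, roughly 500 tokens) come from the description above.

```javascript
// Hypothetical project entries; the real ones live in the knowledge base.
const projects = [
  { name: "Aerodrome", keywords: ["aerodrome", "aero", "dex", "liquidity"],
    summary: "The central DEX and liquidity hub on Base..." },
  { name: "Virtuals", keywords: ["virtuals"],
    summary: "Placeholder summary for illustration." },
];

const MAX_PROJECTS = 2;
const MAX_CONTEXT_CHARS = 500 * 4; // crude stand-in for a ~500-token budget

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns a system message for the best-matching projects, or null if none match.
function getContextForMessage(message) {
  const scored = [];
  for (const project of projects) {
    // Count keywords that appear as whole words, case-insensitively.
    const hits = project.keywords.filter((k) =>
      new RegExp(`\\b${escapeRegExp(k)}\\b`, "i").test(message)
    ).length;
    if (hits > 0) scored.push({ project, hits });
  }
  if (scored.length === 0) return null;

  scored.sort((a, b) => b.hits - a.hits);
  const text = scored
    .slice(0, MAX_PROJECTS)
    .map(({ project }) => `${project.name}: ${project.summary}`)
    .join("\n");
  return {
    role: "system",
    content: `Relevant project context:\n${text}`.slice(0, MAX_CONTEXT_CHARS),
  };
}

// Prepends the context message when the latest user message triggers a match.
function injectContext(messages) {
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  const context = lastUser ? getContextForMessage(lastUser.content) : null;
  return context ? [context, ...messages] : messages;
}
```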

Agentic tool loop

The model’s response is checked for tool calls. If found, the gateway executes the tool, feeds the result back to the model, and loops — up to 8 rounds. The gateway also detects when the model is “stalling” (saying “let me check…” without actually calling a tool) and nudges it.
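
The loop can be sketched as follows. Everything beyond the 8-round cap is an assumption: the `runToolLoop` name, the injected `callModel` function, the stall-detection regex, the nudge wording, and the choice to count a nudge as a round. Message shapes loosely follow Ollama's chat format (`tool_calls[].function.name` and `.arguments`).

```javascript
const MAX_ROUNDS = 8;

// Assumed stall heuristic: the model promises an action but emits no tool call.
const STALL_RE = /\b(let me (check|look|search)|one moment|i'll (check|look))\b/i;

// callModel(history) resolves to an assistant message; tools maps name -> { run }.
async function runToolLoop({ messages, callModel, tools }) {
  const history = [...messages];

  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const reply = await callModel(history);
    history.push(reply);

    const calls = reply.tool_calls || [];
    if (calls.length > 0) {
      // Execute every requested tool and feed the results back to the model.
      for (const call of calls) {
        const { name, arguments: args } = call.function;
        const tool = tools[name];
        const result = tool ? await tool.run(args) : { error: `unknown tool: ${name}` };
        history.push({ role: "tool", content: JSON.stringify(result) });
      }
      continue;
    }

    if (STALL_RE.test(reply.content || "")) {
      // Nudge the model to act instead of narrating.
      history.push({ role: "user", content: "Call the appropriate tool now instead of describing it." });
      continue;
    }

    return { text: reply.content, rounds: round };
  }

  // Round budget exhausted: fall back to the last assistant text.
  const last = [...history].reverse().find((m) => m.role === "assistant");
  return { text: last?.content ?? "", rounds: MAX_ROUNDS };
}
```

Passing `callModel` in as a parameter keeps the loop independent of the backend, so the same logic could in principle wrap either RunPod or Ollama.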

Auto-chaining

When a web_search returns results, the gateway automatically reads any tweet URLs via read_tweet and extracts the top non-tweet page via web_extract — all without additional model round-trips.
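
A minimal sketch of this chaining step is shown below. The `autoChain` name, the result shape (`{ url }` objects), the tweet URL pattern, and the injected `readTweet` and `webExtract` functions are assumptions; in the gateway these would correspond to the read_tweet and web_extract tools.

```javascript
// Assumed pattern for tweet permalinks on twitter.com and x.com.
const TWEET_RE = /^https?:\/\/(www\.)?(twitter\.com|x\.com)\/[^\/]+\/status\/\d+/i;

// Enriches search results without another model round-trip.
async function autoChain(searchResults, { readTweet, webExtract }) {
  const urls = searchResults.map((r) => r.url);
  const tweetUrls = urls.filter((u) => TWEET_RE.test(u));
  const topPage = urls.find((u) => !TWEET_RE.test(u)); // first non-tweet result

  // Fetch all tweets and the top page concurrently.
  const [tweets, page] = await Promise.all([
    Promise.all(tweetUrls.map((u) => readTweet(u))),
    topPage ? webExtract(topPage) : null,
  ]);

  return { results: searchResults, tweets, page };
}
```

The combined object would then be handed to the model as a single tool result, which is what saves the extra rounds.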

Response + billing

The final text response is returned in OpenAI chat completion format. Credits are deducted based on total token usage summed across all tool rounds, not just the final model call.
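
To illustrate the accounting, the sketch below sums per-round usage and wraps the final text in an OpenAI-style chat completion object. The helper names, the `TOKENS_PER_CREDIT` rate, and the one-credit minimum are hypothetical; the document does not state the actual pricing.

```javascript
// Hypothetical pricing: one credit per 1000 tokens, minimum one credit per request.
const TOKENS_PER_CREDIT = 1000;

// Adds up usage reported by each model call in the tool loop.
function sumUsage(rounds) {
  const total = { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 };
  for (const u of rounds) {
    total.prompt_tokens += u.prompt_tokens;
    total.completion_tokens += u.completion_tokens;
    total.total_tokens += u.prompt_tokens + u.completion_tokens;
  }
  return total;
}

function creditsFor(usage) {
  return Math.max(1, Math.ceil(usage.total_tokens / TOKENS_PER_CREDIT));
}

// Shapes the final answer like an OpenAI chat completion response.
function toOpenAIResponse({ model, text, usage }) {
  return {
    id: `chatcmpl-${Date.now()}`,
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [
      { index: 0, message: { role: "assistant", content: text }, finish_reason: "stop" },
    ],
    usage,
  };
}
```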

Tool system

The gateway has 16 built-in tools, all read-only:

Category	Tools
On-chain	`get_eth_balance`, `get_token_balance`, `get_token_info`, `get_gas_price`, `get_block`, `get_tx_count`, `is_contract`, `resolve_ens`, `get_transaction`
Market	`get_crypto_price`
Web	`web_search`, `web_extract`, `read_tweet`, `find_music`
Knowledge	`knowledge_search`
Meta	`list_tools`

Tools are defined in gateway/src/tools.js as a TOOLS object. Each tool has a description, params array, and async run function. The gateway generates Ollama-format tool definitions from this registry and sends them with every inference call.
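
As an illustration of that registry shape, here is a sketch with two stubbed entries and a converter to function-style tool definitions of the kind Ollama's chat API accepts. The fields inside each `params` item (`name`, `type`, `description`, `required`), the tool descriptions, the stub return values, and the `toOllamaTools` name are assumptions, not the contents of gateway/src/tools.js.

```javascript
// Two example entries; the run functions are stubs rather than real chain calls.
const TOOLS = {
  get_gas_price: {
    description: "Get the current gas price.",
    params: [],
    run: async () => ({ gwei: "0.01" }),
  },
  get_eth_balance: {
    description: "Get the ETH balance of an address.",
    params: [
      { name: "address", type: "string", description: "Wallet address", required: true },
    ],
    run: async ({ address }) => ({ address, balance: "0" }),
  },
};

// Converts the registry into JSON-schema style function definitions.
function toOllamaTools(registry) {
  return Object.entries(registry).map(([name, tool]) => ({
    type: "function",
    function: {
      name,
      description: tool.description,
      parameters: {
        type: "object",
        properties: Object.fromEntries(
          tool.params.map((p) => [p.name, { type: p.type, description: p.description }])
        ),
        required: tool.params.filter((p) => p.required).map((p) => p.name),
      },
    },
  }));
}

console.log(JSON.stringify(toOllamaTools(TOOLS), null, 2));
```

Keeping a single registry as the source of truth means the executable `run` function and the schema the model sees cannot drift apart.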

Knowledge base (RAG)

The gateway/src/knowledgeBase.js module loads JSON files from gateway/src/knowledge/ at startup. Each file describes a Base ecosystem project:

{
  "name": "Aerodrome",
  "category": "DeFi",
  "summary": "The central DEX and liquidity hub on Base...",
  "keywords": ["aerodrome", "aero", "dex", "liquidity"],
  "token": { "symbol": "AERO", "chain": "base" },
  "links": { "website": "https://aerodrome.finance" },
  "details": "Aerodrome is a ve(3,3) DEX..."
}

The knowledge base supports two retrieval modes:

Auto-injection (getContextForMessage) — keywords in the user’s message trigger automatic context injection before inference. Top 2 matches, ~500 tokens.
Explicit search (knowledge_search tool) — the model can search the knowledge base directly. Returns up to 5 results with fuzzy scoring.
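
The explicit search mode might look like the sketch below. The scoring here (exact keyword match, substring match, summary hit) is a simple stand-in chosen for the example, not the module's actual fuzzy scoring, and the `knowledgeSearch` and `scoreProject` names are hypothetical. Only the 5-result limit comes from the description above.

```javascript
// Stand-in scoring: rewards name and keyword matches more than summary mentions.
function scoreProject(project, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  let score = 0;
  for (const term of terms) {
    if (project.name.toLowerCase() === term) score += 5;
    if (project.keywords.includes(term)) {
      score += 3; // exact keyword match
    } else if (project.keywords.some((k) => k.includes(term) || term.includes(k))) {
      score += 1; // loose substring match
    }
    if (project.summary.toLowerCase().includes(term)) score += 1;
  }
  return score;
}

// Returns up to `limit` projects ordered by descending score.
function knowledgeSearch(projects, query, limit = 5) {
  return projects
    .map((project) => ({ project, score: scoreProject(project, query) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```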

Inference backends

RunPod Serverless (32B Conductor)

The gateway calls RunPod’s /runsync endpoint and handles cold starts by polling /status/{jobId} for up to 6 minutes. If the model returns an empty output error, the gateway retries without tools.
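
A sketch of that call-then-poll flow follows. It assumes `endpointUrl` has the form `https://api.runpod.ai/v2/<endpoint id>`, that a job still warming up reports `IN_QUEUE` or `IN_PROGRESS`, and that the empty-output failure surfaces as an error message containing "empty output". The function names, the poll interval, and the injectable `fetchImpl` are illustrative choices.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls /runsync, then polls /status/{jobId} until the job finishes or 6 minutes pass.
async function runConductor({
  endpointUrl, apiKey, input,
  fetchImpl = fetch, pollMs = 3000, maxWaitMs = 6 * 60 * 1000,
}) {
  const headers = { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` };
  const first = await fetchImpl(`${endpointUrl}/runsync`, {
    method: "POST", headers, body: JSON.stringify({ input }),
  });
  let job = await first.json();

  const deadline = Date.now() + maxWaitMs;
  while (job.status === "IN_QUEUE" || job.status === "IN_PROGRESS") {
    if (Date.now() > deadline) throw new Error("RunPod job timed out");
    await sleep(pollMs);
    const poll = await fetchImpl(`${endpointUrl}/status/${job.id}`, { headers });
    job = await poll.json();
  }

  if (job.status !== "COMPLETED") {
    throw new Error(String(job.error || `RunPod job ended with status ${job.status}`));
  }
  return job.output;
}

// Retries once without tool definitions if the model produced an empty output.
async function runWithToolFallback(args) {
  try {
    return await runConductor(args);
  } catch (err) {
    if (!/empty output/i.test(err.message) || !args.input.tools) throw err;
    const { tools, ...inputWithoutTools } = args.input;
    return runConductor({ ...args, input: inputWithoutTools });
  }
}
```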

Ollama (8B Operator)

The gateway connects to an Ollama instance at MODEL_HOST_URL (default: http://localhost:11434/api/chat). Streaming responses are converted from Ollama’s format to OpenAI SSE chunks on the fly via ollamaChunkToOpenAI().
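
The conversion can be sketched as below. The function name comes from the text above, but its signature and body are assumptions; the input shape follows Ollama's streaming /api/chat chunks (`message.content`, `done`) and the output follows OpenAI's `chat.completion.chunk` objects sent as server-sent events.

```javascript
// Maps one parsed Ollama stream chunk to an OpenAI-style streaming chunk.
function ollamaChunkToOpenAI(chunk, id) {
  const content = chunk.message?.content ?? "";
  return {
    id,
    object: "chat.completion.chunk",
    created: Math.floor(Date.now() / 1000),
    model: chunk.model,
    choices: [
      {
        index: 0,
        delta: content ? { content } : {},
        finish_reason: chunk.done ? "stop" : null,
      },
    ],
  };
}

// Frames a payload as a single SSE event.
function toSSE(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Converts newline-delimited Ollama JSON lines into SSE events, ending with [DONE].
function* convertStream(lines, id) {
  for (const line of lines) {
    if (!line.trim()) continue;
    yield toSSE(ollamaChunkToOpenAI(JSON.parse(line), id));
  }
  yield "data: [DONE]\n\n";
}
```

In an Express handler, each yielded string would be passed to `res.write()` after setting `Content-Type: text/event-stream`.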
