System diagram

┌──────────────┐     ┌─────────────────────────────────────┐
│   Frontend   │────▶│          Gateway Server             │
│  (Next.js)   │◀────│          (Express.js)               │
└──────────────┘     │                                     │
                     │  ┌─────────┐  ┌──────────────────┐  │
                     │  │  Auth   │  │  Credit Manager  │  │
                     │  │  Layer  │  │ (SQLite/Postgres)│  │
                     │  └─────────┘  └──────────────────┘  │
                     │                                     │
                     │  ┌──────────────────────────────┐   │
                     │  │     Inference Engine         │   │
                     │  │  ┌────────────────────────┐  │   │
                     │  │  │  Agentic Tool Loop     │  │   │
                     │  │  │  (up to 8 rounds)      │  │   │
                     │  │  └────────────────────────┘  │   │
                     │  └──────────┬───────────────────┘   │
                     │             │                       │
                     └─────────────┼───────────────────────┘
                                   │
                    ┌──────────────┼──────────────┐
                    │              │              │
              ┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
              │  RunPod   │  │  Ollama   │  │  16 Tools │
              │  32B      │  │  8B       │  │  (viem,   │
              │ Conductor │  │  Operator │  │   APIs)   │
              └───────────┘  └───────────┘  └───────────┘

Request lifecycle

1. Auth + credits

The gateway validates the x-wallet-address header and checks the Authorization bearer token against the API key database. If credits are enabled, it verifies the user has at least 1 credit remaining.
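
As a rough sketch of this step, the check can be written as a pure function that an Express middleware would call. The names `checkRequest`, `apiKeys`, and `credits` are hypothetical stand-ins for the gateway's key database and Credit Manager, and the address regex and HTTP status codes are assumptions, not taken from the gateway source.

```javascript
// Hypothetical sketch: none of these names come from the gateway source.
const WALLET_RE = /^0x[0-9a-fA-F]{40}$/; // assumed format: 20-byte hex address

// headers: lower-cased request headers, as Node/Express expose them
// apiKeys: Set of valid API keys (stand-in for the API key database)
// credits: Map of lower-cased wallet -> remaining credits (stand-in for the Credit Manager)
function checkRequest(headers, { apiKeys, credits, creditsEnabled }) {
  const wallet = headers["x-wallet-address"];
  if (typeof wallet !== "string" || !WALLET_RE.test(wallet)) {
    return { ok: false, status: 400, error: "invalid x-wallet-address" };
  }
  const auth = headers["authorization"] || "";
  const token = auth.startsWith("Bearer ") ? auth.slice(7) : null;
  if (!token || !apiKeys.has(token)) {
    return { ok: false, status: 401, error: "invalid API key" };
  }
  // Only enforce the 1-credit minimum when credits are enabled.
  if (creditsEnabled && (credits.get(wallet.toLowerCase()) ?? 0) < 1) {
    return { ok: false, status: 402, error: "no credits remaining" };
  }
  return { ok: true, wallet };
}
```

A middleware would return the `status` and `error` to the client when `ok` is false and call `next()` otherwise.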

2. Model routing

Based on the model and stream fields in the request body:
  • "conductor" + stream: false → RunPod Serverless (32B model, handles cold starts with polling)
  • Anything else → Ollama (8B model, local or remote)
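
The routing rule above fits in a few lines. `pickBackend` and the labels `"runpod"` and `"ollama"` are hypothetical. This sketch also assumes `stream` must be exactly `false`, so a request that omits the field falls through to Ollama; the gateway may treat that case differently.

```javascript
// Hypothetical helper; the backend labels are illustrative only.
function pickBackend(body) {
  // Only non-streaming "conductor" requests go to RunPod Serverless.
  if (body.model === "conductor" && body.stream === false) return "runpod";
  // Everything else, including streaming "conductor" requests, goes to Ollama.
  return "ollama";
}
```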
3

Knowledge injection

Before inference, the gateway scans the user’s message against 85+ keyword triggers. If a match is found (e.g., the user mentions “aerodrome” or “virtuals”), relevant project context is injected as a system message — up to 2 projects, ~500 tokens.
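
A minimal sketch of keyword-triggered injection might look like the following. It assumes each project carries `name`, `summary`, and `keywords` fields. `injectContext` is a hypothetical name, the matching is naive substring search, and the ~500-token cap is left out for brevity.

```javascript
// Hypothetical sketch of keyword-triggered context injection.
function injectContext(messages, projects, maxProjects = 2) {
  // Scan only the most recent user message.
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  if (!lastUser) return messages;
  const text = lastUser.content.toLowerCase();
  const hits = projects
    .filter((p) => p.keywords.some((k) => text.includes(k.toLowerCase())))
    .slice(0, maxProjects);
  if (hits.length === 0) return messages;
  const context = hits.map((p) => `${p.name}: ${p.summary}`).join("\n");
  // Prepend the context as a system message; the input array is not mutated.
  return [{ role: "system", content: `Project context:\n${context}` }, ...messages];
}
```

Substring matching will also fire on words that merely contain a keyword, so a real implementation would probably match on word boundaries.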

4. Agentic tool loop

The model’s response is checked for tool calls. If found, the gateway executes the tool, feeds the result back to the model, and loops — up to 8 rounds. The gateway also detects when the model is “stalling” (saying “let me check…” without actually calling a tool) and nudges it.
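
The loop can be sketched roughly as below. `runToolLoop`, `callModel`, and `runTool` are hypothetical; the stall regex and the nudge wording are assumptions, and this version nudges at most once and counts the nudge as a round.

```javascript
// Hypothetical sketch; callModel and runTool are injected stand-ins.
const MAX_ROUNDS = 8;
const STALL_RE = /\blet me (check|look|search)\b/i; // assumed stall phrases

async function runToolLoop(messages, callModel, runTool) {
  const history = [...messages];
  let nudged = false;
  let lastContent = "";
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const reply = await callModel(history); // resolves to { content, tool_calls? }
    lastContent = reply.content;
    history.push({ role: "assistant", content: reply.content, tool_calls: reply.tool_calls });
    const calls = reply.tool_calls || [];
    if (calls.length > 0) {
      for (const call of calls) {
        const result = await runTool(call.function.name, call.function.arguments);
        history.push({ role: "tool", content: JSON.stringify(result) });
      }
      continue; // feed the tool results back to the model
    }
    if (!nudged && STALL_RE.test(reply.content)) {
      nudged = true;
      history.push({ role: "user", content: "Call the tool now instead of describing it." });
      continue;
    }
    return { content: reply.content, rounds: round + 1, truncated: false };
  }
  // Round limit reached: return the last assistant text seen.
  return { content: lastContent, rounds: MAX_ROUNDS, truncated: true };
}
```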

5. Auto-chaining

When a web_search returns results, the gateway automatically reads any tweet URLs via read_tweet and extracts the top non-tweet page via web_extract — all without additional model round-trips.
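
One way to picture the chaining decision is a small planner that turns search results into follow-up tool calls. The result shape (`{ url }`), the host list, and the names `isTweetUrl` and `planAutoChain` are assumptions made for illustration.

```javascript
// Hypothetical sketch; the result shape and host list are assumptions.
const TWEET_HOSTS = new Set(["twitter.com", "x.com"]);

function isTweetUrl(url) {
  try {
    const u = new URL(url);
    const host = u.hostname.replace(/^(www\.|mobile\.)/, "");
    return TWEET_HOSTS.has(host) && /\/status\/\d+/.test(u.pathname);
  } catch {
    return false; // not a parseable URL
  }
}

// Returns the follow-up tool calls to run after a web_search, in order.
function planAutoChain(results) {
  const urls = results.map((r) => r.url);
  // Every tweet URL is read via read_tweet.
  const plan = urls.filter(isTweetUrl).map((url) => ({ tool: "read_tweet", url }));
  // Only the first (top-ranked) non-tweet page is extracted.
  const topPage = urls.find((url) => !isTweetUrl(url));
  if (topPage) plan.push({ tool: "web_extract", url: topPage });
  return plan;
}
```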

6. Response + billing

The final text response is returned in OpenAI format. Credits are deducted based on total token usage across all tool rounds.
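
The sketch below shows how usage from several rounds could be summed into a single OpenAI-style `chat.completion` object. `buildResponse` and the per-round usage shape are hypothetical. The mapping from tokens to credits is not specified here, so the deduction itself is left out.

```javascript
// Hypothetical sketch; only the OpenAI response shape is standard.
// rounds: [{ prompt_tokens, completion_tokens }], one entry per model call
function buildResponse(model, text, rounds) {
  const usage = rounds.reduce(
    (sum, r) => ({
      prompt_tokens: sum.prompt_tokens + r.prompt_tokens,
      completion_tokens: sum.completion_tokens + r.completion_tokens,
    }),
    { prompt_tokens: 0, completion_tokens: 0 }
  );
  usage.total_tokens = usage.prompt_tokens + usage.completion_tokens;
  return {
    id: `chatcmpl-${Date.now()}`, // illustrative id scheme
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [
      { index: 0, message: { role: "assistant", content: text }, finish_reason: "stop" },
    ],
    usage, // billing would read usage.total_tokens
  };
}
```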

Tool system

The gateway has 16 built-in tools, all read-only:
Category    Tools
On-chain    get_eth_balance, get_token_balance, get_token_info, get_gas_price, get_block, get_tx_count, is_contract, resolve_ens, get_transaction
Market      get_crypto_price
Web         web_search, web_extract, read_tweet, find_music
Knowledge   knowledge_search
Meta        list_tools

Tools are defined in gateway/src/tools.js as a TOOLS object. Each tool has a description, params array, and async run function. The gateway generates Ollama-format tool definitions from this registry and sends them with every inference call.
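
To illustrate the registry-to-definition step, here is a sketch with a single entry. The shape of each `params` item (`name`, `type`, `description`, `required`) and the helper name `toOllamaTools` are assumptions, and the `run` body returns placeholder data. The output follows the function-tool format that Ollama's chat API accepts.

```javascript
// Hypothetical registry entry; the params item shape is an assumption.
const TOOLS = {
  get_eth_balance: {
    description: "Get the ETH balance of an address",
    params: [
      { name: "address", type: "string", description: "Wallet address", required: true },
    ],
    run: async ({ address }) => ({ address, balance: "0" }), // placeholder, no chain call
  },
};

// Convert the registry into Ollama-format tool definitions.
function toOllamaTools(tools) {
  return Object.entries(tools).map(([name, tool]) => ({
    type: "function",
    function: {
      name,
      description: tool.description,
      parameters: {
        type: "object",
        properties: Object.fromEntries(
          tool.params.map((p) => [p.name, { type: p.type, description: p.description }])
        ),
        required: tool.params.filter((p) => p.required).map((p) => p.name),
      },
    },
  }));
}
```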

Knowledge base (RAG)

The gateway/src/knowledgeBase.js module loads JSON files from gateway/src/knowledge/ at startup. Each file describes a Base ecosystem project:
{
  "name": "Aerodrome",
  "category": "DeFi",
  "summary": "The central DEX and liquidity hub on Base...",
  "keywords": ["aerodrome", "aero", "dex", "liquidity"],
  "token": { "symbol": "AERO", "chain": "base" },
  "links": { "website": "https://aerodrome.finance" },
  "details": "Aerodrome is a ve(3,3) DEX..."
}
Two retrieval modes:
  1. Auto-injection (getContextForMessage) — keywords in the user’s message trigger automatic context injection before inference. Top 2 matches, ~500 tokens.
  2. Explicit search (knowledge_search tool) — the model can search the knowledge base directly. Returns up to 5 results with fuzzy scoring.
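
The explicit search mode could be approximated as follows. The fuzzy scorer is not described here, so the weights in this sketch are invented for illustration, and `searchKnowledge` is a hypothetical name. Entries are assumed to follow the JSON shape shown above.

```javascript
// Hypothetical scoring; the weights are illustrative, not the gateway's.
function searchKnowledge(query, projects, limit = 5) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return projects
    .map((p) => {
      const name = p.name.toLowerCase();
      const keywords = p.keywords.map((k) => k.toLowerCase());
      const haystack = `${p.summary} ${p.details || ""}`.toLowerCase();
      let score = 0;
      for (const t of terms) {
        if (name === t) score += 5; // exact name match
        else if (keywords.includes(t)) score += 3; // exact keyword match
        else if (keywords.some((k) => k.includes(t))) score += 2; // partial keyword match
        else if (haystack.includes(t)) score += 1; // mentioned in summary or details
      }
      return { project: p, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```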

Inference backends

RunPod Serverless (32B Conductor)

The gateway calls RunPod’s /runsync endpoint and handles cold starts by polling /status/{jobId} for up to 6 minutes. If the model returns an empty output error, the gateway retries without tools.
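
The cold-start handling can be sketched as a submit-then-poll loop. `runConductor` and the injected `http(method, path, body)` helper are hypothetical, the 2-second poll interval is an assumption, and the status strings follow RunPod's job states. The retry without tools is omitted because how the empty output error is reported is not stated here.

```javascript
// Hypothetical sketch. http(method, path, body) is an injected helper that
// calls the RunPod endpoint and resolves to the parsed JSON response.
const POLL_TIMEOUT_MS = 6 * 60 * 1000; // poll for up to 6 minutes

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function runConductor(http, input, { pollMs = 2000, sleep = defaultSleep, now = Date.now } = {}) {
  let job = await http("POST", "/runsync", { input });
  const jobId = job.id;
  const deadline = now() + POLL_TIMEOUT_MS;
  // A cold worker can leave the job queued or running after /runsync returns.
  while (job.status === "IN_QUEUE" || job.status === "IN_PROGRESS") {
    if (now() >= deadline) throw new Error("RunPod job timed out");
    await sleep(pollMs);
    job = await http("GET", `/status/${jobId}`);
  }
  if (job.status !== "COMPLETED") throw new Error(`RunPod job ended with status ${job.status}`);
  return job.output;
}
```

Injecting `sleep` and `now` keeps the loop testable without waiting on real timers.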

Ollama (8B Operator)

The gateway connects to an Ollama instance at MODEL_HOST_URL (default: http://localhost:11434/api/chat). Streaming responses are converted from Ollama’s format to OpenAI SSE chunks on the fly via ollamaChunkToOpenAI().
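
For a sense of what the conversion involves, here is a guess at the shape of such a function. `toOpenAIChunk` is a hypothetical stand-in and may differ from the gateway's own ollamaChunkToOpenAI(). It assumes Ollama's streaming chunk fields (`model`, `created_at`, `message.content`, `done`) and emits one OpenAI-style SSE event per chunk.

```javascript
// Hypothetical stand-in; the gateway's ollamaChunkToOpenAI() may differ.
function toOpenAIChunk(chunk, id) {
  const payload = {
    id,
    object: "chat.completion.chunk",
    created: Math.floor(Date.parse(chunk.created_at) / 1000),
    model: chunk.model,
    choices: [
      {
        index: 0,
        // The final Ollama chunk (done: true) carries no new text.
        delta: chunk.done ? {} : { content: chunk.message?.content ?? "" },
        finish_reason: chunk.done ? "stop" : null,
      },
    ],
  };
  // One SSE event: a "data:" line followed by a blank line.
  return `data: ${JSON.stringify(payload)}\n\n`;
}
```

OpenAI-compatible clients also expect a closing `data: [DONE]` event after the last chunk, which the caller would write once the Ollama stream ends.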