OpenSquilla — Token-Efficient AI Agent with Smart Model Routing
LLM API costs add up fast. Every agent turn hits your wallet, and most frameworks use the same expensive model for trivial lookups and complex reasoning alike. OpenSquilla takes a different approach: a microkernel AI agent with a local model router that scores each turn and dispatches it to the cheapest capable model.
Launched in early May 2026, OpenSquilla already has over 2,000 stars on GitHub. It's Apache 2.0 licensed, written in Python 3.12+, and works on Windows, macOS, and Linux. In PinchBench benchmarks, it scored 0.9251 (nearly identical to Claude Opus 4.7 at 0.9255) while using about 44% fewer input tokens, costing $0.69 instead of $6.23 for the same workload.
Why It's Trending
Three things make OpenSquilla stand out in the crowded AI agent space:
- Token-efficient routing — SquillaRouter, a local LightGBM + ONNX classifier, evaluates each turn on length, language, code presence, keywords, and semantic embeddings, then routes to one of four tiers (T0–T3). The prompt never leaves your machine for this decision.
- 20+ LLM providers — OpenRouter, OpenAI, Anthropic, Ollama, DeepSeek, Gemini, Groq, Mistral, vLLM, LM Studio, and more. Primary-plus-fallback selection keeps your agent running even when one provider is down.
- Unified gateway — Web UI, CLI, Telegram, Slack, Discord, Matrix, and 10+ other channels all share the same turn loop. Write one config, deploy everywhere.
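The routing idea in the first bullet can be illustrated with a toy scorer. This is a minimal sketch, not SquillaRouter: the real router is a trained LightGBM + ONNX classifier that also uses language detection and semantic embeddings, whereas here a hand-written weighted score over three of the listed features (length, code presence, keywords) stands in for it. The tier-to-model table, keyword list, weights, and thresholds are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical tier -> model table; real tiers and models come from OpenSquilla's config.
TIER_MODELS = {
    0: "cheap-small-model",
    1: "mid-model",
    2: "strong-model",
    3: "frontier-model",
}

# Hypothetical keywords that hint at harder reasoning.
HARD_KEYWORDS = {"prove", "refactor", "architecture", "debug", "optimize"}


@dataclass
class TurnFeatures:
    length: int
    has_code: bool
    hard_keywords: int


def extract_features(prompt: str) -> TurnFeatures:
    words = re.findall(r"[a-z]+", prompt.lower())
    return TurnFeatures(
        length=len(prompt),
        # Crude code detection: a fenced block, a def/class keyword, or a line ending in ';'.
        has_code="```" in prompt or bool(re.search(r"\bdef |\bclass |;\s*$", prompt, re.M)),
        hard_keywords=sum(w in HARD_KEYWORDS for w in words),
    )


def score(f: TurnFeatures) -> float:
    # Hand-tuned linear score standing in for a trained classifier's output.
    return min(f.length / 2000, 1.0) + (1.0 if f.has_code else 0.0) + 0.75 * f.hard_keywords


def route(prompt: str) -> tuple[int, str]:
    s = score(extract_features(prompt))
    # Hypothetical cut-offs mapping the score onto tiers T0..T3.
    tier = 0 if s < 0.5 else 1 if s < 1.25 else 2 if s < 2.25 else 3
    return tier, TIER_MODELS[tier]
```

With these made-up weights, a weather lookup lands in T0 while a request to refactor and debug a code snippet lands in T3. The point of the design is that this decision is cheap and local, so it can run on every turn before any paid API call is made.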
Architecture Overview
The architecture follows a microkernel design. A central TurnRunner orchestrates all interactions, with pluggable components for routing, memory, tools, security, and channels.
- Gateway Layer — ASGI server (Starlette) on 127.0.0.1:18791 accepts WebSocket RPC and HTTP connections. CLI, Web UI (/control/), and all messaging channels connect here.
- TurnRunner — The shared turn loop. Every entry point — chat, agent one-shot, cron job, channel message — runs through this same loop. Tool dispatch, retries, decision logging, and subagent spawning all follow identical paths.
- SquillaRouter — Local on-device classifier (LightGBM + ONNX). Scores each turn across four tiers (T0–T3) and picks the cheapest model that can handle it. Runs entirely on your machine — no data leaves for routing decisions.
- Provider Registry — Pluggable adapter layer for 20+ LLM backends. Each provider has primary and fallback models configured.
- Memory System — Persistent local storage via a curated MEMORY.md plus dated Markdown notes. SQLite full-text search + sqlite-vec for semantic recall. Embeddings run on-device via bundled ONNX or via OpenAI/Ollama.
- Layered Security Sandbox — Three policy tiers (Standard / Strict / Locked) with a permission matrix. Bubblewrap isolates code execution on Linux. A denial ledger auto-pauses autonomous runs after repeated denials.
- Scheduler Engine — Built-in cron parser for recurring jobs. The opensquilla cron command manages scheduled tasks.
- Skill System — 15 bundled skills (coding, GitHub, cron, document authoring, summarization, weather, and more) load on demand. OpenSquilla is also an MCP client and can run as an MCP server.
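The Provider Registry's primary-plus-fallback behaviour boils down to trying an ordered list of backends until one succeeds. A minimal sketch, assuming each provider is simply a callable that raises on failure; the ProviderError class, the provider names, and the call signature are hypothetical and are not OpenSquilla's adapter interface.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a (hypothetical) provider adapter when a call fails."""


# A provider is modelled as (name, function that turns a prompt into a reply).
Provider = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order (primary first); return (provider_name, reply)."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Record the failure and fall through to the next configured provider.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))
```

Collecting every error before giving up makes an outage across all configured providers easy to diagnose from a single exception message.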
Prerequisites
- Python 3.12+ (bundled in Windows portable)
- uv (recommended) — curl -LsSf https://astral.sh/uv/install.sh | sh
- Git + Git LFS (only for source install)
- An LLM provider API key (OpenRouter, OpenAI, Anthropic, etc.)
Installation
OpenSquilla offers four installation paths; three are covered here. The quick terminal install is recommended for most users.
Quick Terminal Install (Recommended)
# Step 1: Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
. "$HOME/.local/bin/env"
# Step 2: Install OpenSquilla
uv tool install --python 3.12 "opensquilla[recommended] @ https://github.com/opensquilla/opensquilla/releases/download/v0.2.1/opensquilla-0.2.1-py3-none-any.whl"
# Step 3: Configure and run
opensquilla onboard
opensquilla gateway run
Open http://127.0.0.1:18791/control/ in your browser to access the Web UI.
Windows Portable (No Python Required)
Download the portable zip from the releases page, extract it, and run Start OpenSquilla.cmd as administrator.
Docker
git clone https://github.com/opensquilla/opensquilla.git
cd opensquilla
git lfs pull --include="src/opensquilla/squilla_router/models/**"
docker build -t opensquilla:local .
./start.sh
Configuration
The first-run wizard (opensquilla onboard) walks you through provider setup, router configuration, channels, and security policies.
Non-Interactive Setup (SSH / CI)
export OPENROUTER_API_KEY="sk-..."
opensquilla onboard --provider openrouter --api-key-env OPENROUTER_API_KEY
Reconfigure Individual Sections
opensquilla configure provider --provider openai --model gpt-4o --api-key-env OPENAI_API_KEY
opensquilla configure router --router recommended
opensquilla configure search --search-provider brave --api-key-env BRAVE_SEARCH_API_KEY
Config Load Order
OPENSQUILLA_GATEWAY_CONFIG_PATH → ./opensquilla.toml → ~/.opensquilla/config.toml → built-in defaults. Environment variables always win over file values.
Usage
Start the Gateway
opensquilla gateway run # foreground, 127.0.0.1:18791
opensquilla gateway start --json # background + health wait
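If you launch the gateway some other way (for example, gateway run under a process manager) and a script needs to wait for it, a generic readiness check is easy to write. This is a minimal sketch, not what gateway start does internally: it only confirms that something accepts TCP connections on 127.0.0.1:18791 and does not call any OpenSquilla-specific health endpoint.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 18791,
                  timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening on the gateway address
        except OSError:
            time.sleep(interval)  # not up yet (e.g. connection refused); retry
    return False
```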
Interact
opensquilla chat # interactive REPL
opensquilla agent -m "your prompt" # one-shot, automation-friendly
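Because agent -m is a one-shot command, it can be driven from a Python script with the standard subprocess module. A minimal sketch, assuming opensquilla is on PATH and that the reply is written to stdout; the article does not document the output format, so treat the returned text as raw CLI output.

```python
import subprocess


def agent_argv(prompt: str, exe: str = "opensquilla") -> list[str]:
    """Build the argv for a one-shot `opensquilla agent -m` call."""
    # Passing a list (no shell) means the prompt needs no quoting or escaping.
    return [exe, "agent", "-m", prompt]


def run_agent(prompt: str, exe: str = "opensquilla", timeout: float = 300.0) -> str:
    """Run one agent turn and return whatever the CLI printed to stdout."""
    result = subprocess.run(
        agent_argv(prompt, exe),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()
```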
Check Cost
opensquilla cost
Benchmark Results
PinchBench 1.2.1 average results across 25 tasks:
- OpenSquilla, model router (Opus 4.7, GLM5.1, DS4 Flash): score 0.9251, 1,721,328 input tokens, 61,475 output tokens, $0.688
- OpenClaw (baseline), Claude Opus 4.7: score 0.9255, 3,066,243 input tokens, 50,890 output tokens, $6.233
OpenSquilla achieves a nearly identical score (0.9251 vs. 0.9255) while consuming about 44% fewer input tokens and costing 89% less, although it emits about 21% more output tokens.
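The percentages can be checked directly from the raw PinchBench figures above. This is plain arithmetic with no OpenSquilla code involved: input tokens fall by about 44% (OpenSquilla uses roughly 56% of the baseline's input tokens), output tokens rise by about 21%, and cost falls by about 89%.

```python
# Figures from the PinchBench 1.2.1 results above.
opensquilla = {"score": 0.9251, "input": 1_721_328, "output": 61_475, "cost": 0.688}
baseline = {"score": 0.9255, "input": 3_066_243, "output": 50_890, "cost": 6.233}


def pct_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent (negative = reduction)."""
    return (new - old) / old * 100


for metric in ("score", "input", "output", "cost"):
    print(f"{metric:>6}: {pct_change(opensquilla[metric], baseline[metric]):+.1f}%")
```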
Key Features
- SquillaRouter — Local LightGBM + ONNX classifier routes each turn across four tiers (T0–T3) to the cheapest capable model. Classification runs on-device.
- Adaptive reasoning — Extended reasoning only for complex turns. System prompt scales with task complexity.
- 20+ providers — OpenRouter, OpenAI, Anthropic, Ollama, DeepSeek, Gemini, Groq, Mistral, vLLM, LM Studio, and more, with primary-plus-fallback.
- 15 bundled skills — Load only when needed. Also MCP client + MCP server.
- Persistent memory — SQLite full-text + semantic recall via sqlite-vec. On-device embeddings.
- 3-tier sandbox — Standard / Strict / Locked. Bubblewrap isolation on Linux.
- 10+ channels — Terminal, Web UI, Slack, Telegram, Discord, Feishu, Matrix, and more.
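The three sandbox tiers and the denial ledger described earlier combine naturally: each tier maps to a set of allowed actions, and the ledger pauses an autonomous run once too many requests have been denied. A minimal sketch; the action names, the contents of the permission matrix, and the threshold of three total denials are hypothetical, and OpenSquilla's real ledger may count denials differently (for example, only consecutive ones).

```python
from enum import Enum


class Policy(Enum):
    STANDARD = "standard"
    STRICT = "strict"
    LOCKED = "locked"


# Hypothetical permission matrix: which actions each policy tier allows.
PERMISSIONS: dict[Policy, frozenset[str]] = {
    Policy.STANDARD: frozenset({"read_file", "write_file", "run_code", "network"}),
    Policy.STRICT: frozenset({"read_file", "run_code"}),
    Policy.LOCKED: frozenset({"read_file"}),
}


class DenialLedger:
    """Counts denied actions and pauses an autonomous run past a threshold."""

    def __init__(self, max_denials: int = 3) -> None:
        self.max_denials = max_denials  # hypothetical threshold
        self.denials: list[str] = []
        self.paused = False

    def check(self, policy: Policy, action: str) -> bool:
        """Return True if the action may proceed under the given policy tier."""
        if self.paused:
            return False  # once paused, nothing runs until a human resumes the run
        if action in PERMISSIONS[policy]:
            return True
        self.denials.append(action)
        if len(self.denials) >= self.max_denials:
            self.paused = True
        return False
```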
Resources
- GitHub Repository
- Website
- Releases
- License (Apache 2.0)