Overview
Local Model Quantization Router recommends the optimal local LLM, quantization level, and routing strategy for your OpenClaw workloads. Supply your hardware profile, task complexity, and privacy requirements, and the router tells you exactly which Ollama model to use, at which quant level, and whether to route local-only, local-first, hybrid, or cloud-required. No guesswork. No model downloads. No config changes.
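The decision flow above can be sketched as a small Python function. The thresholds, tier names, and model/quant choices below are illustrative assumptions, not the router's actual tables:

```python
def pick_route(privacy: str, complexity: str, vram_gb: float) -> dict:
    """Illustrative routing sketch -- not the router's real logic.

    privacy: "standard", "high", or "regulated"
    complexity: "simple", "moderate", "complex", or "critical"
    """
    # High/regulated privacy forces local-only routing.
    if privacy in ("high", "regulated"):
        route = "local-only"
    elif complexity == "critical":
        # Critical tasks keep a cloud fallback available.
        route = "hybrid"
    elif complexity == "simple":
        route = "local-first"
    else:
        # Assumed cutoff: complex work without enough VRAM goes to the cloud.
        route = "local-first" if vram_gb >= 8 else "cloud-required"

    # Hypothetical model/quant table keyed by available VRAM.
    if vram_gb >= 16:
        model, quant = "llama3.1:8b", "Q8_0"
    elif vram_gb >= 8:
        model, quant = "llama3.1:8b", "Q4_K_M"
    else:
        model, quant = "phi3:mini", "Q4_K_M"

    return {"route": route, "model": model, "quantization": quant}
```

The real router adds a fallback entry and human-readable reasons to this output; the sketch only covers the route/model/quant core.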
Key Features
- Four route types: local-only, local-first, hybrid, cloud-required
- Hardware-aware: detects VRAM, RAM, and CPU-only setups to right-size model selection
- Privacy enforcement: high/regulated privacy forces local-only routing
- Complexity tiers: simple to critical with appropriate model selection
- Clean JSON output with route, model, quantization, fallback, and reasons
- Accepts hardware profiles as JSON files, with clear errors on malformed input
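A hardware profile might look like the JSON string below. The field names (`vram_gb`, `ram_gb`, `cpu_only`) are assumptions for illustration, and the loader shows the style of clean malformed-input handling the feature list describes:

```python
import json

# Hypothetical hardware profile -- field names are illustrative assumptions.
PROFILE = '{"vram_gb": 12, "ram_gb": 32, "cpu_only": false}'

def load_hardware(text: str) -> dict:
    """Parse a hardware JSON profile, raising a clear error on bad input."""
    try:
        profile = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed hardware JSON: {exc}") from None
    if not isinstance(profile, dict):
        raise ValueError("hardware JSON must be a top-level object")
    return profile

hw = load_hardware(PROFILE)
```

Wrapping `JSONDecodeError` in a `ValueError` with a plain message is one way to surface a readable error instead of a raw traceback.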
Use Cases
- Choosing the right Ollama model and quant level for your hardware
- Enforcing local-only routing for high-privacy or air-gapped deployments
- Selecting hybrid routing for critical tasks needing cloud fallback
- Cost optimization by routing simple tasks to small local models
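For the cost-optimization case, the idea is to send each task to the smallest local model judged adequate for its tier, falling back to the cloud only when the hardware cannot host it. A minimal sketch with an assumed tier-to-model table and assumed VRAM requirements:

```python
# Hypothetical tier table: (model, quant, minimum VRAM in GB).
# These entries are illustrative assumptions, not the router's real mapping.
TIER_MODELS = {
    "simple": ("qwen2.5:0.5b", "Q4_K_M", 1),
    "moderate": ("llama3.2:3b", "Q4_K_M", 4),
    "complex": ("llama3.1:8b", "Q5_K_M", 8),
    "critical": ("llama3.1:70b", "Q4_K_M", 48),
}

def cheapest_local(tier: str, vram_gb: float) -> dict:
    """Pick the smallest adequate local model, or punt to the cloud."""
    model, quant, needed = TIER_MODELS[tier]
    if vram_gb < needed:
        # Model will not fit: cloud routing is the only option left.
        return {"route": "cloud-required", "model": None, "quantization": None}
    return {"route": "local-only", "model": model, "quantization": quant}
```

Keeping simple tasks on a sub-1B model is where the cost savings come from; only the rare critical task pays for cloud inference.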
Requirements
- OpenClaw v2026.3.23 or later
- Python 3.8+ (stdlib only — no pip installs required)