llmfit — Find the Perfect LLM for Your Hardware in One Command
Running local LLMs is exciting — until you spend hours downloading models that don't fit your GPU, crash your system RAM, or run at 2 tokens per second. The ecosystem has hundreds of models across dozens of providers, and figuring out which ones actually work on your hardware is a painful trial-and-error process.
llmfit solves this. It's a Rust-powered CLI (and TUI) that detects your hardware, scores every model against your GPU, RAM, and CPU, and tells you — in one command — exactly which models will run well, how fast they'll go, and how well they fit. With 26,000+ GitHub stars and growing fast, it's becoming the standard tool for local LLM model selection.
What is llmfit?
Created by AlexsJones and released in February 2026, llmfit is an open-source terminal tool that right-sizes LLMs to your system's hardware. Written in Rust, it ships with an interactive TUI (Terminal User Interface), a classic CLI mode for scripting, a web dashboard, and a REST API.
Key capabilities:
- Hardware detection — reads total/available RAM, CPU cores, and probes for NVIDIA, AMD, Intel Arc, Apple Silicon, and Ascend GPUs
- Model database — hundreds of models sourced from HuggingFace, embedded at compile time, covering Meta Llama, Mistral, Qwen, DeepSeek, Google Gemma, Microsoft Phi, and more
- Multi-dimensional scoring — each model scored across Quality, Speed, Fit, and Context dimensions (0–100 each), with use-case-specific weights
- Dynamic quantization — tries the best quality quantization that fits your hardware, walking from Q8_0 down to Q2_K
- MoE support — correctly detects Mixture-of-Experts architectures with expert offloading, so Mixtral and DeepSeek get realistic VRAM estimates
- Provider integration — connects to Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio for one-click model downloads
- Multi-platform — Linux (full GPU support), macOS (Apple Silicon + Intel), Windows, and even Android/Termux
Why is it Trending?
llmfit hit 26,000+ stars in just three months for several reasons:
- The local LLM boom — everyone wants to run models locally, but nobody wants to guess which models work
- Rust performance — blazing fast, single binary, no dependency hell
- Beautiful TUI — interactive terminal interface with filtering, sorting, and one-click downloads
- Practical value — saves hours of trial-and-error per user, especially for people with mid-range hardware
- OpenClaw/Hermes Agent skill — ships with an agent skill so AI coding assistants can auto-configure model providers
Prerequisites
- A Linux, macOS, or Windows machine
- Optional: NVIDIA GPU with drivers, AMD GPU with ROCm, or Apple Silicon
- Optional: Ollama, llama.cpp, MLX, Docker Model Runner, or LM Studio for model downloads
- For Docker: Docker Engine or Podman
Installation
Homebrew (macOS / Linux)
brew install AlexsJones/llmfit/llmfit
Scoop (Windows)
scoop install llmfit
Quick install script (Linux / macOS)
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
Cargo (if you have Rust installed)
cargo install llmfit
Docker
docker run ghcr.io/alexsjones/llmfit
Nix flake
nix run github:AlexsJones/llmfit
Quick Start
Just run llmfit — that's it. The TUI launches automatically, detects your hardware, scores every model, and ranks them by fit:
llmfit
For non-interactive output, use CLI mode:
llmfit --cli
This prints a table of all models ranked by composite score, with columns for Quality, Speed, Fit, Context, total parameters, and hardware requirements.
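As a rough illustration of how four 0–100 dimension scores could be folded into one composite ranking, here is a minimal Python sketch. The weight values, the `ModelScores` type, and the model names are hypothetical; the article does not give llmfit's actual weights or formula, only that weights vary by use case.

```python
from dataclasses import dataclass

# Hypothetical per-use-case weights (each row sums to 1.0).
# llmfit's real weights are not stated in this article.
WEIGHTS = {
    "general": {"quality": 0.35, "speed": 0.25, "fit": 0.25, "context": 0.15},
    "coding": {"quality": 0.45, "speed": 0.20, "fit": 0.20, "context": 0.15},
}


@dataclass
class ModelScores:
    name: str
    quality: float  # each dimension is on a 0-100 scale
    speed: float
    fit: float
    context: float


def composite(m: ModelScores, use_case: str = "general") -> float:
    """Weighted sum of the four dimension scores for one use case."""
    w = WEIGHTS[use_case]
    return (
        w["quality"] * m.quality
        + w["speed"] * m.speed
        + w["fit"] * m.fit
        + w["context"] * m.context
    )


def rank(models, use_case: str = "general"):
    """Return models sorted from highest to lowest composite score."""
    return sorted(models, key=lambda m: composite(m, use_case), reverse=True)
```

With weights like these, a smaller model that fits comfortably and runs fast can outrank a larger one that barely fits, which matches the behavior the table is meant to surface.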
See what you're working with
llmfit system
This shows detected hardware: CPU cores, total RAM, GPU model, VRAM, and the acceleration backend (CUDA, Metal, ROCm, etc.).
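To cross-check the VRAM figure on an NVIDIA machine, you can query nvidia-smi yourself. The sketch below sums memory across all GPUs using the standard `--query-gpu=memory.total` option, which reports MiB. It is an independent check, not llmfit's own detection code, and the function names are made up for this example.

```python
import subprocess
from typing import List, Optional


def parse_vram_mib(csv_output: str) -> List[int]:
    """Parse one MiB value per line, as printed with csv,noheader,nounits."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]


def total_vram_gib() -> Optional[float]:
    """Sum VRAM across all NVIDIA GPUs; None if nvidia-smi is missing or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return None
    return sum(parse_vram_mib(out)) / 1024


if __name__ == "__main__":
    print(total_vram_gib())
```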
Find the best models for coding
llmfit recommend --use-case coding --limit 5
Find models that fit perfectly
llmfit fit --perfect -n 10
Output as JSON (for scripting or AI agents)
llmfit recommend --json --limit 5
Architecture Overview
llmfit's architecture follows a clean pipeline: hardware detection → model database query → multi-dimensional scoring → ranked output.
The pipeline works as follows:
- Hardware Detection: The hardware.rs module reads system specs: total/available RAM via sysinfo, CPU core count, and GPU probing. For NVIDIA, it uses nvidia-smi with multi-GPU aggregation. AMD GPUs are detected via rocm-smi, Intel Arc via sysfs, and Apple Silicon via system_profiler. A backend identifier (CUDA, Metal, ROCm, SYCL, CPU) is attached for speed estimation.
- Model Database: Over 200 models are stored in data/hf_models.json, embedded at compile time via include_str!. Each entry includes parameter count, context length, quantization hierarchy, and use-case categories. Mixture-of-Experts models are flagged with their active expert ratio.
- Quantization Engine: Instead of a fixed quantization, llmfit walks a hierarchy from Q8_0 (best quality) down to Q2_K (most compressed). It picks the highest quality that fits available memory. If nothing fits at full context, it retries at half context.
- Scoring Engine: Each model receives a 0–100 score across four dimensions:
  - Quality: parameter count, family reputation, quantization penalty, task alignment
  - Speed: estimated tokens/sec based on backend bandwidth × efficiency factor
  - Fit: memory utilization efficiency (sweet spot: 50–80% of available memory)
  - Context: context window capability vs target use case
- Runtime Providers: llmfit integrates with Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio. When you press d in the TUI, it pulls/downloads the model to your chosen provider. Installed models get a green checkmark.
- Output Modes: Interactive TUI (ratatui), classic CLI table, JSON output, web dashboard (auto-starts on :8787), and REST API (llmfit serve).
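The quantization walk and the Fit sweet spot described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not llmfit's implementation: the intermediate quantization levels, the bits-per-weight figures, the KV-cache cost per 1k tokens, and the falloff outside the 50–80% band are all approximations chosen for the example.

```python
from typing import Optional, Tuple

# Assumed ladder from best quality to most compressed. Only Q8_0 and Q2_K
# are named in the article; the bits-per-weight values are approximate.
QUANT_LADDER = [
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8),
    ("Q3_K_M", 3.9),
    ("Q2_K", 2.6),
]


def estimate_gb(params_billion: float, bits_per_weight: float, context: int,
                kv_gb_per_1k_tokens: float = 0.125) -> float:
    """Weights plus a crude KV-cache term (hypothetical cost per 1k tokens)."""
    weights_gb = params_billion * bits_per_weight / 8
    kv_gb = context / 1000 * kv_gb_per_1k_tokens
    return weights_gb + kv_gb


def pick_quant(params_billion: float, available_gb: float,
               context: int) -> Optional[Tuple[str, int, float]]:
    """Try full context first, then half; return the first level that fits."""
    for ctx in (context, context // 2):
        for name, bpw in QUANT_LADDER:
            need = estimate_gb(params_billion, bpw, ctx)
            if need <= available_gb:
                return name, ctx, need
    return None


def fit_score(need_gb: float, available_gb: float) -> float:
    """100 inside the 50-80% utilization band; the falloff is an assumption."""
    u = need_gb / available_gb
    if u > 1.0:
        return 0.0  # does not fit at all
    if u < 0.5:
        return 100.0 * u / 0.5  # hardware is under-used
    if u <= 0.8:
        return 100.0
    return 50.0 + 50.0 * (1.0 - u) / 0.2  # tight fit, little headroom
```

For instance, `pick_quant(7, 8.0, 8192)` skips Q8_0 because the estimate exceeds 8 GB and settles on the next level down at full context, while a 3 GB budget only fits after the context is halved.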
Configuration
llmfit is zero-config by design — just run it. But you can tune several aspects.
Hardware overrides
If autodetection fails (broken nvidia-smi, VMs), override manually:
llmfit --memory=24G --ram=64G --cpu-cores=8
Context length cap
Override the context length used for memory estimation:
llmfit --max-context 4096 --cli
Theme selection
Press t in the TUI to cycle through 10 built-in themes (Dracula, Nord, Gruvbox, Catppuccin variants, etc.). Your selection persists across sessions.
Environment variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_HOST | http://localhost:11434 | Remote Ollama instance |
| LLMFIT_DASHBOARD_PORT | 8787 | Web dashboard port |
| DOCKER_MODEL_RUNNER_HOST | http://localhost:12434 | Remote Docker Model Runner |
| LMSTUDIO_HOST | http://127.0.0.1:1234 | Remote LM Studio instance |
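If you wrap llmfit in a script, the same variables can be resolved with their documented defaults before launching it. The helper below is a hypothetical convenience for your own tooling, not part of llmfit. The variable names and defaults are taken from the table above.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults copied from the environment variable table.
DEFAULTS = {
    "OLLAMA_HOST": "http://localhost:11434",
    "LLMFIT_DASHBOARD_PORT": "8787",
    "DOCKER_MODEL_RUNNER_HOST": "http://localhost:12434",
    "LMSTUDIO_HOST": "http://127.0.0.1:1234",
}


def resolve(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the effective value of each variable, falling back to its default."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    for name, value in resolve().items():
        print(f"{name}={value}")
```

To point llmfit at a remote Ollama box for a single run, set the variable in the child process environment, for example by passing `env={**os.environ, "OLLAMA_HOST": "http://gpu-box:11434"}` to `subprocess.run` (the host name here is a placeholder).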
Use Cases
1. Model Discovery
You have a mid-range GPU (8 GB VRAM) and want to run a local coding assistant. Run:
llmfit recommend --use-case coding --limit 10
llmfit will recommend models like Qwen2.5-Coder-7B at Q4_K_M, DeepSeek-Coder-V2-Lite, or CodeLlama-7B, with estimated tokens/sec for each.
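For intuition about where a tokens/sec figure can come from, here is a back-of-the-envelope Python sketch. It assumes decoding is memory-bandwidth bound, so each generated token reads roughly the whole set of weights once. Dividing by model size is this example's assumption; the article only says the estimate is based on backend bandwidth and an efficiency factor. The 300 GB/s bandwidth, 0.6 efficiency, and 4.8 bits per weight for Q4_K_M are illustrative values.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size: 1e9 params * bits / 8 bytes is about that many GB."""
    return params_billion * bits_per_weight / 8


def estimate_tokens_per_sec(params_billion: float, bits_per_weight: float,
                            bandwidth_gb_s: float, efficiency: float) -> float:
    """Usable bandwidth divided by the bytes read per generated token."""
    return bandwidth_gb_s * efficiency / model_size_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # A 7B model at about 4.8 bits per weight on a hypothetical 300 GB/s GPU.
    size = model_size_gb(7, 4.8)
    speed = estimate_tokens_per_sec(7, 4.8, bandwidth_gb_s=300, efficiency=0.6)
    print(f"{size:.1f} GB of weights, roughly {speed:.0f} tokens/sec")
```

The weights come to about 4.2 GB, which leaves room for context on an 8 GB card and explains why 7B models at 4-bit quantization are a common recommendation for that class of GPU.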
2. Hardware Planning
You're building a new workstation and want to know what GPU to buy. Use hardware overrides:
llmfit --memory=24G recommend --json
Simulate different GPU configurations and see which models unlock.
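A small Python wrapper can run that comparison across several candidate cards. The command is the one shown above. The shape of the JSON output is not documented in this article, so the sketch only parses it and leaves interpretation to you. The function names are made up for the example.

```python
import json
import subprocess
from typing import Dict, Iterable, List


def sweep_command(vram_gb: int) -> List[str]:
    """Build the override command for one simulated VRAM size."""
    return ["llmfit", f"--memory={vram_gb}G", "recommend", "--json"]


def run_sweep(vram_options: Iterable[int]) -> Dict[int, object]:
    """Run llmfit once per VRAM size and collect the parsed JSON results."""
    results = {}
    for gb in vram_options:
        proc = subprocess.run(
            sweep_command(gb), capture_output=True, text=True, check=True
        )
        results[gb] = json.loads(proc.stdout)  # schema is not assumed here
    return results


if __name__ == "__main__":
    for gb, data in run_sweep([8, 12, 16, 24]).items():
        print(gb, "GB:", json.dumps(data)[:120])
```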
3. CI/CD Integration
Use the REST API as a node-level scheduler for a cluster:
llmfit serve --host 0.0.0.0 --port 8787
curl "http://localhost:8787/api/v1/models/top?limit=5&use_case=coding"
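The same request can be made from a script using only the Python standard library. The endpoint path and query parameters are those in the curl command. The sketch assumes the response body is JSON, which the article does not confirm, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def top_models_url(base: str, limit: int, use_case: str) -> str:
    """Build the top-models URL with properly encoded query parameters."""
    query = urlencode({"limit": limit, "use_case": use_case})
    return f"{base.rstrip('/')}/api/v1/models/top?{query}"


def fetch_top_models(base: str = "http://localhost:8787", limit: int = 5,
                     use_case: str = "coding"):
    """GET the endpoint and decode the body, assuming it is JSON."""
    with urlopen(top_models_url(base, limit, use_case), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(fetch_top_models())
```

Building the query with `urlencode` avoids the shell pitfall where an unquoted `&` splits the command in two.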
4. AI Agent Integration
llmfit ships with an OpenClaw skill. Install it:
cp -r skills/llmfit-advisor ~/.openclaw/skills/
Then ask your agent: "Recommend a coding model for my hardware and configure Ollama with it."
Resources
- GitHub: github.com/AlexsJones/llmfit
- Documentation: built-in via llmfit --help and the extensive README
- Community Leaderboard: localmaxxing.com (real-world performance data from users)
- License: MIT
- Install script: llmfit.axjns.dev/install.sh