llmfit — Find the Perfect LLM for Your Hardware in One Command

2026-05-27


Running local LLMs is exciting, until you spend hours downloading models that don't fit in your GPU's memory, exhaust your system RAM, or crawl along at 2 tokens per second. The ecosystem has hundreds of models across dozens of providers, and working out which ones actually run on your hardware is painful trial and error.

llmfit solves this. It's a Rust-powered CLI (and TUI) that detects your hardware, scores every model against your GPU, RAM, and CPU, and tells you — in one command — exactly which models will run well, how fast they'll go, and how well they fit. With 26,000+ GitHub stars and growing fast, it's becoming the standard tool for local LLM model selection.

What is llmfit?

Created by AlexsJones and released in February 2026, llmfit is an open-source terminal tool that right-sizes LLM models to your system's hardware. Written in Rust, it ships with an interactive TUI (Terminal User Interface), a classic CLI mode for scripting, a web dashboard, and a REST API.

Key capabilities:

Hardware detection — reads total/available RAM, CPU cores, and probes for NVIDIA, AMD, Intel Arc, Apple Silicon, and Ascend GPUs
Model database — hundreds of models sourced from HuggingFace, embedded at compile time, covering Meta Llama, Mistral, Qwen, DeepSeek, Google Gemma, Microsoft Phi, and more
Multi-dimensional scoring — each model scored across Quality, Speed, Fit, and Context dimensions (0–100 each), with use-case-specific weights
Dynamic quantization — tries the best quality quantization that fits your hardware, walking from Q8_0 down to Q2_K
MoE support — correctly detects Mixture-of-Experts architectures with expert offloading, so Mixtral and DeepSeek get realistic VRAM estimates
Provider integration — connects to Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio for one-click model downloads
Multi-platform — Linux (full GPU support), macOS (Apple Silicon + Intel), Windows, and even Android/Termux

Why is it Trending?

llmfit hit 26,000+ stars in just three months for several reasons:

The local LLM boom — everyone wants to run models locally, but nobody wants to guess which models work
Rust performance — blazing fast, single binary, no dependency hell
Beautiful TUI — interactive terminal interface with filtering, sorting, and one-click downloads
Practical value — saves hours of trial-and-error per user, especially for people with mid-range hardware
OpenClaw/Hermes Agent skill — ships with an agent skill so AI coding assistants can auto-configure model providers

Prerequisites

A Linux, macOS, or Windows machine
Optional: NVIDIA GPU with drivers, AMD GPU with ROCm, or Apple Silicon
Optional: Ollama, llama.cpp, MLX, Docker Model Runner, or LM Studio for model downloads
For Docker: Docker Engine or Podman

Installation

Homebrew (macOS / Linux)

brew install AlexsJones/llmfit/llmfit

Scoop (Windows)

scoop install llmfit

Quick install script (Linux / macOS)

curl -fsSL https://llmfit.axjns.dev/install.sh | sh

Cargo (if you have Rust installed)

cargo install llmfit

Docker

docker run ghcr.io/alexsjones/llmfit

Nix flake

nix run github:AlexsJones/llmfit

Quick Start

Just run llmfit — that's it. The TUI launches automatically, detects your hardware, scores every model, and ranks them by fit:

llmfit

For non-interactive output, use CLI mode:

llmfit --cli

This prints a table of all models ranked by composite score, with columns for Quality, Speed, Fit, Context, total parameters, and hardware requirements.
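The exact formula behind the composite score is not given here. As a minimal sketch, assume it is a weighted average of the four 0–100 dimension scores, with weights that depend on the use case. The weight values and the names `USE_CASE_WEIGHTS`, `Scores`, `composite_score`, and `rank` are illustrative assumptions, not llmfit's actual numbers or API:

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical per-use-case weights (assumed values; each row sums to 1.0).
USE_CASE_WEIGHTS = {
    "general": {"quality": 0.40, "speed": 0.25, "fit": 0.20, "context": 0.15},
    "coding": {"quality": 0.45, "speed": 0.20, "fit": 0.15, "context": 0.20},
}


@dataclass
class Scores:
    """The four per-model dimension scores, each on a 0-100 scale."""
    quality: float
    speed: float
    fit: float
    context: float


def composite_score(scores: Scores, use_case: str = "general") -> float:
    """Weighted average of the four dimensions for the given use case."""
    weights = USE_CASE_WEIGHTS[use_case]
    return sum(weight * getattr(scores, name) for name, weight in weights.items())


def rank(models: Dict[str, Scores], use_case: str = "general") -> List[str]:
    """Return model names ordered from best to worst composite score."""
    return sorted(
        models,
        key=lambda name: composite_score(models[name], use_case),
        reverse=True,
    )
```

With weights like these, the same two models can swap places between use cases: a large but slow model may lead for coding while a smaller, faster one leads for general use.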

See what you're working with

llmfit system

This shows detected hardware: CPU cores, total RAM, GPU model, VRAM, and the acceleration backend (CUDA, Metal, ROCm, etc.).
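For NVIDIA cards, the GPU model and VRAM figures can be read from nvidia-smi's CSV query interface. The sketch below shows one way to do that and is not llmfit's actual code: the function names are hypothetical, and summing VRAM across cards is an assumption about how multiple GPUs might be aggregated.

```python
import subprocess
from typing import List, Tuple

# nvidia-smi query that prints one "name, memory_in_MiB" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpus(csv_text: str) -> List[Tuple[str, int]]:
    """Parse nvidia-smi CSV output into (gpu_name, vram_mib) pairs."""
    gpus = []
    for line in csv_text.splitlines():
        if "," not in line:
            continue  # skip blank or malformed lines
        name, mib = line.rsplit(",", 1)
        if mib.strip().isdigit():  # skip values such as "[N/A]"
            gpus.append((name.strip(), int(mib.strip())))
    return gpus


def total_vram_gib(gpus: List[Tuple[str, int]]) -> float:
    """Sum VRAM over all GPUs (assumed aggregation) and convert MiB to GiB."""
    return sum(mib for _, mib in gpus) / 1024


def detect_nvidia() -> List[Tuple[str, int]]:
    """Run nvidia-smi; return an empty list if it is missing or fails."""
    try:
        result = subprocess.run(
            QUERY, capture_output=True, text=True, check=True, timeout=5
        )
    except (OSError, subprocess.SubprocessError):
        return []
    return parse_gpus(result.stdout)
```

Returning an empty list on failure lets a caller fall back to CPU-only estimates or to manual overrides.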

Find the best models for coding

llmfit recommend --use-case coding --limit 5

Find models that fit perfectly

llmfit fit --perfect -n 10

Output as JSON (for scripting or AI agents)

llmfit recommend --json --limit 5
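To consume that JSON from a script, a thin wrapper is usually enough. This is a sketch under two assumptions: the helper names `build_recommend_cmd` and `recommend` are made up for illustration, and `--json` is assumed to combine with `--use-case` (the article shows each flag with `recommend`, but not together). The shape of the returned JSON is not documented here, so the code returns it unmodified.

```python
import json
import subprocess
from typing import Any, List, Optional


def build_recommend_cmd(limit: int = 5, use_case: Optional[str] = None) -> List[str]:
    """Build the argv for `llmfit recommend --json`."""
    cmd = ["llmfit", "recommend", "--json", "--limit", str(limit)]
    if use_case:
        # Assumption: --use-case can be combined with --json.
        cmd += ["--use-case", use_case]
    return cmd


def recommend(limit: int = 5, use_case: Optional[str] = None) -> Any:
    """Run llmfit and return its parsed JSON output as-is."""
    result = subprocess.run(
        build_recommend_cmd(limit, use_case),
        capture_output=True,
        text=True,
        check=True,  # raise if llmfit exits with a non-zero status
    )
    return json.loads(result.stdout)
```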

Architecture Overview

llmfit's architecture follows a clean pipeline: hardware detection → model database query → multi-dimensional scoring → ranked output.

[Figure: llmfit architecture]

The pipeline works as follows:

Hardware Detection — The hardware.rs module reads system specs: total/available RAM via sysinfo, CPU core count, and GPU probing. For NVIDIA, it uses nvidia-smi with multi-GPU aggregation. AMD GPUs are detected via rocm-smi, Intel Arc via sysfs, and Apple Silicon via system_profiler. A backend identifier (CUDA, Metal, ROCm, SYCL, CPU) is attached for speed estimation.
Model Database — Over 200 models are stored in data/hf_models.json, embedded at compile time via include_str!. Each entry includes parameter count, context length, quantization hierarchy, and use-case categories. Mixture-of-Experts models are flagged with their active expert ratio.
Quantization Engine — Instead of a fixed quantization, llmfit walks a hierarchy from Q8_0 (best quality) down to Q2_K (most compressed). It picks the highest quality that fits available memory. If nothing fits at full context, it retries at half context.
Scoring Engine — Each model receives a 0–100 score across four dimensions:
- Quality: parameter count, family reputation, quantization penalty, task alignment
- Speed: estimated tokens/sec based on backend bandwidth × efficiency factor
- Fit: memory utilization efficiency (sweet spot: 50–80% of available memory)
- Context: context window capability vs target use case
Runtime Providers — llmfit integrates with Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio. When you press d in the TUI, it pulls/downloads the model to your chosen provider. Installed models get a green checkmark.
Output Modes — Interactive TUI (ratatui), classic CLI table, JSON output, web dashboard (auto-starts on :8787), and REST API (llmfit serve).
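The quantization walk, the speed estimate, and the Fit sweet spot described above can be sketched together. Everything numeric here is an assumption made for illustration: the intermediate rungs between Q8_0 and Q2_K, the approximate bits-per-weight values, the KV-cache and overhead constants, the memory-bound rule that divides bandwidth by model size, and the shape of the fit curve outside the 50–80% band. It is not llmfit's actual implementation.

```python
from typing import Optional, Tuple

# Quantization ladder, best quality first. Bits per weight are rough,
# rounded figures for GGUF formats (assumed values).
QUANT_LADDER = [
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8),
    ("Q3_K_M", 3.9),
    ("Q2_K", 2.6),
]


def estimate_gb(params_b: float, bits_per_weight: float, context: int,
                kv_gb_per_1k: float = 0.125, overhead_gb: float = 0.5) -> float:
    """Weights + KV cache + fixed overhead, in GB (params_b is in billions)."""
    weights_gb = params_b * bits_per_weight / 8
    kv_cache_gb = kv_gb_per_1k * context / 1024
    return weights_gb + kv_cache_gb + overhead_gb


def pick_quant(params_b: float, available_gb: float,
               context: int) -> Optional[Tuple[str, int, float]]:
    """Walk the ladder at full context, then retry once at half context."""
    for ctx in (context, context // 2):
        for name, bpw in QUANT_LADDER:
            needed = estimate_gb(params_b, bpw, ctx)
            if needed <= available_gb:
                return name, ctx, needed
    return None  # nothing fits, even at half context


def estimate_tps(bandwidth_gb_s: float, weights_gb: float,
                 efficiency: float = 0.6) -> float:
    """Memory-bound estimate: each token reads all weights once."""
    return bandwidth_gb_s / weights_gb * efficiency


def fit_score(used_gb: float, available_gb: float) -> float:
    """100 inside the 50-80% sweet spot, lower outside it, 0 if it overflows."""
    utilization = used_gb / available_gb
    if utilization > 1.0:
        return 0.0
    if utilization < 0.5:
        return 100.0 * utilization / 0.5
    if utilization <= 0.8:
        return 100.0
    return 100.0 - 50.0 * (utilization - 0.8) / 0.2
```

Under these assumed constants, a 7B model with 8 GB available and an 8192-token context skips Q8_0 and lands on Q6_K at full context; with only 3.5 GB it falls through the whole ladder and then fits at Q2_K with the context halved to 4096.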

Configuration

llmfit is zero-config by design — just run it. But you can tune several aspects.

Hardware overrides

If autodetection fails, for example because nvidia-smi is broken or llmfit is running inside a VM, override the detected values manually:

llmfit --memory=24G --ram=64G --cpu-cores=8

Context length cap

Override the context length used for memory estimation:

llmfit --max-context 4096 --cli

Theme selection

Press t in the TUI to cycle through 10 built-in themes (Dracula, Nord, Gruvbox, Catppuccin variants, etc.). Your selection persists across sessions.

Environment variables

Variable	Default	Description
`OLLAMA_HOST`	`http://localhost:11434`	Remote Ollama instance
`LLMFIT_DASHBOARD_PORT`	`8787`	Web dashboard port
`DOCKER_MODEL_RUNNER_HOST`	`http://localhost:12434`	Remote Docker Model Runner
`LMSTUDIO_HOST`	`http://127.0.0.1:1234`	Remote LM Studio instance
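A wrapper script that talks to the same providers can mirror these defaults. The sketch below only reproduces the table above; `resolve_settings` is a hypothetical helper, and treating an empty value as unset is an assumption, not documented llmfit behavior.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults copied from the environment variable table above.
DEFAULTS = {
    "OLLAMA_HOST": "http://localhost:11434",
    "LLMFIT_DASHBOARD_PORT": "8787",
    "DOCKER_MODEL_RUNNER_HOST": "http://localhost:12434",
    "LMSTUDIO_HOST": "http://127.0.0.1:1234",
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return each setting from the environment, falling back to its default."""
    env = os.environ if env is None else env
    # `or` treats an empty string the same as an unset variable (assumption).
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}
```

Passing a plain dict instead of `os.environ` makes the lookup easy to exercise in tests.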

Use Cases

1. Model Discovery

You have a mid-range GPU (8 GB VRAM) and want to run a local coding assistant. Run:

llmfit recommend --use-case coding --limit 10

llmfit will recommend models like Qwen2.5-Coder-7B at Q4_K_M, DeepSeek-Coder-V2-Lite, or CodeLlama-7B, with estimated tokens/sec for each.

2. Hardware Planning

You're building a new workstation and want to know what GPU to buy. Use hardware overrides:

llmfit --memory=24G recommend --json

Simulate different GPU configurations and see which models unlock.
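One way to run that comparison is to loop over candidate VRAM sizes and collect the JSON for each. This is a sketch that reuses the exact command shown above; `planning_cmd` and `sweep` are hypothetical helper names, and the JSON is stored unparsed beyond `json.loads` because its schema is not described here.

```python
import json
import subprocess
from typing import Any, Dict, Iterable, List


def planning_cmd(vram_gb: int) -> List[str]:
    """argv for `llmfit --memory=<N>G recommend --json`."""
    return ["llmfit", f"--memory={vram_gb}G", "recommend", "--json"]


def sweep(vram_sizes_gb: Iterable[int]) -> Dict[int, Any]:
    """Run llmfit once per simulated VRAM size and collect the parsed JSON."""
    results = {}
    for size in vram_sizes_gb:
        completed = subprocess.run(
            planning_cmd(size), capture_output=True, text=True, check=True
        )
        results[size] = json.loads(completed.stdout)
    return results
```

Calling `sweep([8, 12, 16, 24])` would give one result set per candidate card, which can then be compared to see which models each size unlocks.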

3. CI/CD Integration

Expose the REST API on each node so that a CI job or cluster scheduler can ask which models that node can run:

llmfit serve --host 0.0.0.0 --port 8787
curl "http://localhost:8787/api/v1/models/top?limit=5&use_case=coding"
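The same request can be made from a Python job using only the standard library. The endpoint path and query parameters are the ones in the curl command above; the helper names and the assumption that the response body is JSON are illustrative.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen


def top_models_url(base: str = "http://localhost:8787",
                   limit: int = 5, use_case: str = "coding") -> str:
    """Build the /api/v1/models/top URL with properly encoded parameters."""
    query = urlencode({"limit": limit, "use_case": use_case})
    return f"{base}/api/v1/models/top?{query}"


def fetch_top_models(base: str = "http://localhost:8787", limit: int = 5,
                     use_case: str = "coding", timeout: float = 10.0) -> Any:
    """GET the top models from a running `llmfit serve` instance."""
    with urlopen(top_models_url(base, limit, use_case), timeout=timeout) as resp:
        return json.load(resp)  # assumes the endpoint returns JSON
```

Building the query with `urlencode` avoids the shell-quoting problem that an unquoted `&` causes on the command line.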

4. AI Agent Integration

llmfit ships with an OpenClaw skill. Install it:

cp -r skills/llmfit-advisor ~/.openclaw/skills/

Then ask your agent: "Recommend a coding model for my hardware and configure Ollama with it."

Resources

GitHub: github.com/AlexsJones/llmfit
Documentation: built-in via llmfit --help and the extensive README
Community Leaderboard: localmaxxing.com — real-world performance data from users
License: MIT
Install script: llmfit.axjns.dev/install.sh

llmfit: Find the Perfect LLM for Your Hardware in One Command

2026-05-27

Running LLMs locally is exciting, until you spend hours downloading models that don't fit on your GPU, exhaust your system RAM, or run at 2 tokens per second. The ecosystem has hundreds of models across dozens of providers, and working out which ones actually run on your hardware is a painful process of trial and error.

llmfit solves this problem. It is a CLI (and TUI) written in Rust that detects your hardware, scores every model against your GPU, RAM, and CPU, and tells you in a single command exactly which models will run well, how fast, and how well they fit. With more than 26,000 GitHub stars and rapid growth, it is becoming the standard tool for local LLM model selection.

What is llmfit?

Created by AlexsJones and released in February 2026, llmfit is an open-source terminal tool that matches LLM models to your hardware. Written in Rust, it offers an interactive TUI (Terminal User Interface), a classic CLI mode for scripting, a web dashboard, and a REST API.

Key features:

Hardware detection: reads total/available RAM and CPU cores, and probes for NVIDIA, AMD, Intel Arc, Apple Silicon, and Ascend GPUs
Model database: hundreds of models sourced from HuggingFace, embedded at compile time, covering Meta Llama, Mistral, Qwen, DeepSeek, Google Gemma, Microsoft Phi, and more
Multi-dimensional scoring: each model is scored on the Quality, Speed, Fit, and Context dimensions (0–100 each), with use-case-specific weights
Dynamic quantization: tries the best quantization that fits your hardware, walking from Q8_0 down to Q2_K
MoE support: correctly detects Mixture-of-Experts architectures with expert offloading
Provider integration: connects to Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio
Multi-platform: Linux, macOS (Apple Silicon + Intel), Windows, and Android/Termux

Why is it Trending?

llmfit reached 26,000+ stars in just three months for several reasons:

The local LLM boom: everyone wants to run models locally, but nobody wants to guess
Rust performance: fast, single binary, no complex dependencies
Beautiful TUI: interactive terminal interface with filtering, sorting, and one-click downloads
Practical value: saves each user hours of trial and error
OpenClaw/Hermes Agent skill: includes an agent skill for automatic configuration

Prerequisites

A Linux, macOS, or Windows machine
Optional: NVIDIA GPU with drivers, AMD GPU with ROCm, or Apple Silicon
Optional: Ollama, llama.cpp, MLX, Docker Model Runner, or LM Studio for downloads
For Docker: Docker Engine or Podman

Installation

Homebrew (macOS / Linux)

brew install AlexsJones/llmfit/llmfit

Scoop (Windows)

scoop install llmfit

Quick install script (Linux / macOS)

curl -fsSL https://llmfit.axjns.dev/install.sh | sh

Cargo (if Rust is installed)

cargo install llmfit

Docker

docker run ghcr.io/alexsjones/llmfit

Nix flake

nix run github:AlexsJones/llmfit

Quick Start

Just run llmfit, and that's it. The TUI launches automatically, detects your hardware, scores every model, and ranks them by fit:

llmfit

For non-interactive output, use CLI mode:

llmfit --cli

See your hardware configuration

llmfit system

Best models for coding

llmfit recommend --use-case coding --limit 5

Models that fit perfectly

llmfit fit --perfect -n 10

JSON output

llmfit recommend --json --limit 5

Architecture

llmfit's architecture follows a clear pipeline: hardware detection → model database query → multi-dimensional scoring → ranking.

[Figure: llmfit architecture]

The pipeline works as follows:

Hardware Detection: the hardware.rs module reads system specs: RAM via sysinfo, CPU cores, and GPUs probed via nvidia-smi, rocm-smi, system_profiler, etc.
Model Database: over 200 models in data/hf_models.json, embedded at compile time. Each entry includes the parameter count, the context length, and the quantization hierarchy.
Quantization Engine: walks a hierarchy from Q8_0 to Q2_K to pick the best quality possible.
Scoring Engine: each model receives a 0–100 score on four dimensions: Quality, Speed, Fit, and Context.
Runtime Providers: integration with Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio.
Output Modes: interactive TUI, classic CLI table, JSON, web dashboard, and REST API.

Configuration

llmfit is designed to need no configuration: just run it. But you can tune several aspects.

Hardware overrides

llmfit --memory=24G --ram=64G --cpu-cores=8

Context length cap

llmfit --max-context 4096 --cli

Theme selection

Press t in the TUI to cycle through 10 built-in themes (Dracula, Nord, Gruvbox, Catppuccin, etc.).

Use Cases

1. Model Discovery

llmfit recommend --use-case coding --limit 10

2. Hardware Planning

llmfit --memory=24G recommend --json

3. CI/CD Integration

llmfit serve --host 0.0.0.0 --port 8787

4. AI Agent Integration

cp -r skills/llmfit-advisor ~/.openclaw/skills/

Resources

GitHub: github.com/AlexsJones/llmfit
Documentation: via llmfit --help and the detailed README
Community Leaderboard: localmaxxing.com
License: MIT
Install script: llmfit.axjns.dev/install.sh