Run AI Locally
A no-nonsense guide to running LLMs on your own hardware. Full privacy, zero API costs, works offline. Here's exactly what you need and how to set it up.
Who this is for
I've been running local models for over a year on everything from a budget Linux box to an M3 Max MacBook. This guide covers the practical side: what hardware you actually need, which software to use, and which models are worth downloading. If you're tired of API costs, privacy concerns, or rate limits, this is your starting point.
Why run locally
Privacy
Your data never leaves your machine. No third-party API logging, no training on your inputs. Critical for handling sensitive documents, client data, or classified material.
Cost
After the hardware investment, every inference is free. Run thousands of queries per day without watching a billing dashboard. Pays for itself within months for heavy users.
Speed
No network latency, no rate limits, no queue times. Local inference on a good GPU delivers responses in seconds. Perfect for batch processing and tooling integration.
Offline Access
Works on planes, in air-gapped environments, and during outages. Your AI assistant doesn't need the internet. Essential for field work and secure facilities.
Hardware requirements
You do not need a data center. As a rough rule of thumb, a Q4-quantized model needs a little over half a gigabyte of RAM or VRAM per billion parameters, plus headroom for context: an 8B model fits comfortably in 8GB, while 70B-class models want 40GB or more.
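The sizes in the model table below follow directly from this arithmetic. A minimal sketch (the 4.8 bits-per-weight figure approximates Q4_K_M; the fixed overhead allowance for context and buffers is my assumption):

```python
# Rough memory estimate for a quantized model.
# 4.8 bits/weight approximates Q4_K_M; overhead_gb is a
# coarse allowance for context and runtime buffers.
def estimate_gb(params_billion: float, bits_per_weight: float = 4.8,
                overhead_gb: float = 1.5) -> float:
    """Approximate RAM/VRAM needed to load a model, in gigabytes."""
    weights_gb = params_billion * bits_per_weight / 8  # bits -> bytes
    return round(weights_gb + overhead_gb, 1)

print(estimate_gb(8))   # Llama 3.1 8B  -> 6.3
print(estimate_gb(70))  # Llama 3.1 70B -> 43.5
```

These line up with the 4.7GB and 40GB download sizes in the table once you account for the runtime overhead on top of the file size.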
Software options
Several tools cover this space, each good at different things. If you are starting from scratch, go with Ollama: it handles model downloads, quantization variants, and serving through a single command-line tool.
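Once Ollama is running it also exposes an HTTP API on localhost:11434, which is what makes local tooling integration practical. A minimal Python client with no third-party dependencies, sketched against the `/api/generate` endpoint (the `generate` helper name is mine):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    # stream=False makes Ollama return one JSON object
    # instead of newline-delimited streaming chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires a running Ollama instance with llama3.1 pulled.
    print(generate("llama3.1", "Explain XSS in 3 sentences"))
```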
Model recommendations
Matched to use case. Sizes listed are for the quantized (Q4_K_M) variant you will actually download.
| Use Case | Model | Size | Why |
|---|---|---|---|
| General chat | Llama 3.1 8B | 4.7GB | Fast, capable, great instruction following |
| Code assistance | CodeLlama 34B | 19GB | Trained on code, good at generation and review |
| Security analysis | Mixtral 8x7B | 26GB | Strong reasoning, handles technical analysis well |
| Document Q&A | Llama 3.1 70B (Q4) | 40GB | Best open-source quality, needs good GPU |
| Quick tasks | Phi-3 Mini | 2.3GB | Tiny but surprisingly capable for simple tasks |
| Privacy-critical | Mistral 7B | 4.1GB | Good balance of size and capability, runs anywhere |
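The table can double as a lookup. A sketch of picking the largest model that fits your free RAM/VRAM, using the sizes listed above (the Ollama tag names and the 1.2x headroom factor are my assumptions; confirm exact tags before pulling):

```python
# Q4_K_M download sizes from the table above, best model first.
# Tag names are the common Ollama tags -- verify locally.
MODELS = [
    ("llama3.1:70b", 40.0),
    ("mixtral:8x7b", 26.0),
    ("codellama:34b", 19.0),
    ("llama3.1", 4.7),
    ("mistral", 4.1),
    ("phi3:mini", 2.3),
]

def pick_model(available_gb: float, headroom: float = 1.2) -> str:
    """Return the most capable model that fits with some headroom."""
    for name, size_gb in MODELS:
        if size_gb * headroom <= available_gb:
            return name
    return "phi3:mini"  # smallest fallback

print(pick_model(16))  # 16GB machine -> "llama3.1"
```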
Security considerations
Running locally removes third-party exposure, but it is not zero-risk. Pull models only from registries you trust, treat model output as untrusted input in any automated pipeline, and keep the API bound to localhost (Ollama's default) unless you deliberately expose and firewall it.
Quick start
From zero to chatting with a local LLM in under five minutes. Copy and paste these commands.
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model (Llama 3.1 8B, ~4.7GB download)
ollama pull llama3.1

# Start chatting
ollama run llama3.1

# Or use the API
curl http://localhost:11434/api/generate -d '{"model":"llama3.1","prompt":"Explain XSS in 3 sentences"}'
```

Want more hands-on guides?
I write about AI, security tooling, and practical infrastructure on this blog. Real setups, real numbers, no vendor pitches.
Read the blog