How to Install Ollama on Windows, Mac, and Linux
Ollama has quickly become one of the most popular tools for running large language models (LLMs) locally on your own machine — no cloud subscription, no API key, no data leaving your device. If you've heard the buzz and want to get started, the installation process is more straightforward than you might expect. But a few variables — your operating system, hardware specs, and intended use — will shape how smoothly things go.
What Is Ollama and Why Does Installation Matter?
Ollama is an open-source runtime that lets you download, manage, and run AI language models locally. Think of it like a package manager, but specifically for LLMs. Once installed, you can pull models like Llama 3, Mistral, Gemma, or Phi directly from the command line and run them entirely offline.
Installation isn't complicated, but it does involve a few steps that vary by platform. Getting it right the first time means understanding what Ollama actually installs and what your system needs to support it.
System Requirements Before You Begin 🖥️
Ollama's core requirement is a reasonably modern computer with enough RAM to load a model into memory. Here's what generally applies:
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB or more |
| Storage | 5 GB free | 20+ GB (for multiple models) |
| CPU | 64-bit, modern | Multi-core, recent generation |
| GPU | Not required | NVIDIA (CUDA) or Apple Silicon for speed |
A key point: Ollama can run on CPU alone, but performance is significantly slower. If you have an NVIDIA GPU, Ollama will automatically use CUDA for acceleration. On Apple Silicon Macs (M1, M2, M3, M4), it uses Metal for GPU acceleration natively. AMD GPU support exists but varies by platform and driver version.
Your available RAM is arguably the most important factor. Smaller models (around 3–7 billion parameters) typically need 4–8 GB of RAM. Larger models (13B+) benefit from 16 GB or more.
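Before picking a model, it helps to confirm how much memory you actually have. One way to check from a terminal (the macOS and Windows commands are commented out; uncomment the one for your platform):

```shell
# Linux: show total and available memory in human-readable units
free -h

# macOS: total physical RAM in bytes (divide by 1073741824 for GiB)
# sysctl -n hw.memsize

# Windows (PowerShell): total physical RAM in GiB
# (Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1GB
```

As a rough rule of thumb, leave a few GB of headroom beyond the model's size for the operating system and the model's context window.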
How to Install Ollama on macOS
macOS installation is the most streamlined experience:
- Visit ollama.com and download the macOS installer (a .zip file).
- Open the downloaded file and drag the Ollama app to your Applications folder.
- Launch Ollama from Applications. It will appear as a menu bar icon.
- Open Terminal and verify the installation by typing:

```
ollama --version
```

Ollama runs as a background service once launched. Apple Silicon Macs get hardware-accelerated inference automatically, with no extra configuration needed.
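If you prefer a command-line install, Homebrew also packages Ollama. A sketch of that route, assuming Homebrew is already set up on your Mac:

```shell
# Install the Ollama CLI and server via Homebrew
brew install ollama

# Run Ollama as a background service that starts at login
brew services start ollama

# Confirm it responds
ollama --version
```

Note that the Homebrew formula installs the command-line tool and server; the menu bar app ships with the installer from ollama.com.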
How to Install Ollama on Windows
Windows support is mature and straightforward:
- Go to ollama.com and download the Windows installer (.exe).
- Run the installer and follow the prompts. Ollama installs and starts as a background service automatically.
- Open Command Prompt or PowerShell and confirm with:

```
ollama --version
```

If you have an NVIDIA GPU, ensure your drivers are up to date before installing; Ollama will detect CUDA automatically. Windows users without a GPU can still run models, just at CPU speeds.
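Before installing on a machine with an NVIDIA card, it's worth confirming the driver is in place. In PowerShell or Command Prompt:

```shell
# Prints driver version, CUDA version, and detected GPUs;
# if this command is missing, install or update the NVIDIA driver first
nvidia-smi
```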
Note for WSL users: Ollama can also run inside Windows Subsystem for Linux (WSL 2), which some developers prefer for a Linux-native workflow. This requires a slightly different setup than the native Windows installer.
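A sketch of the WSL 2 route, assuming an Ubuntu distribution: inside the WSL shell (not PowerShell), the standard Linux install script applies.

```shell
# Run inside the WSL 2 distribution
curl -fsSL https://ollama.com/install.sh | sh

# On WSL setups without systemd, start the server manually
ollama serve &

ollama --version
```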
How to Install Ollama on Linux
Linux installation uses a single shell command:
```
curl -fsSL https://ollama.com/install.sh | sh
```

This script handles everything: downloading the binary, setting up a systemd service, and configuring Ollama to start automatically. After the script completes, verify with:

```
ollama --version
```

For NVIDIA GPU acceleration on Linux, you'll need the NVIDIA container toolkit or appropriate CUDA drivers installed beforehand. The installer script will attempt to detect your GPU setup, but driver configuration is a separate step that depends on your Linux distribution.
AMD GPU (ROCm) support on Linux is available but requires compatible hardware and ROCm drivers. It's functional but involves more manual setup than the NVIDIA path.
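On systemd-based distributions, the usual service commands apply if something seems off after the install script finishes:

```shell
# Check that the Ollama service is active
systemctl status ollama

# Inspect recent logs, e.g. for GPU-detection messages
journalctl -u ollama --no-pager -n 50

# Restart the service after updating GPU drivers
sudo systemctl restart ollama
```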
Running Your First Model After Installation
Once Ollama is installed, pulling and running a model is a single command:
```
ollama run llama3
```

The first run downloads the model files (sizes range from a few hundred MB to tens of GB depending on the model). After that, the model is cached locally and loads much faster on subsequent runs.
You can list available models with `ollama list` and pull specific model variants using `ollama pull modelname:tag`. Ollama's model library covers dozens of open-source LLMs across different sizes and specializations.
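Beyond the interactive CLI, the background service exposes an HTTP API on port 11434, which is how other tools integrate with Ollama. A minimal request, assuming the llama3 model has already been pulled:

```shell
# Ask the local Ollama server for a one-shot completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain RAM in one sentence.",
  "stream": false
}'
```

With "stream": false the server returns a single JSON object rather than streaming the response token by token.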
What Can Go Wrong — and What to Check 🔧
A few common friction points appear across all platforms:
- Port conflicts: Ollama runs a local server on port 11434 by default. If something else is using that port, you'll need to configure an alternative port.
- Firewall or antivirus flags: Some security software flags Ollama's installer or service. Whitelisting the application usually resolves this.
- Insufficient RAM: If a model fails to load or crashes mid-conversation, the model may be too large for available memory. Try a smaller variant (e.g., a 3B or 7B parameter model instead of 13B+).
- Driver issues on Windows/Linux: GPU acceleration failing is almost always a driver version problem, not an Ollama bug.
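For the port-conflict case above, the server's bind address can be changed through the OLLAMA_HOST environment variable. A sketch, using 11500 as an arbitrary free port:

```shell
# Stop the default service first, then serve on the alternative port
OLLAMA_HOST=127.0.0.1:11500 ollama serve

# In another terminal, point the CLI at the same address
OLLAMA_HOST=127.0.0.1:11500 ollama list
```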
The Variables That Shape Your Experience
Installation itself is the easy part. What happens after depends on factors that differ from machine to machine:
- Hardware generation: A 3-year-old laptop with 8 GB RAM will have a meaningfully different experience than a desktop workstation with an NVIDIA RTX-series GPU.
- Which models you want to run: A small model for quick Q&A has very different requirements than a larger model for code generation or document analysis.
- Your workflow: Running Ollama standalone via terminal is different from integrating it with tools like Open WebUI, VS Code extensions, or custom API applications — each adds its own setup steps.
- Operating system quirks: Linux users with non-standard distributions or older kernels may encounter dependency issues the installer script doesn't handle automatically.
Someone running Ollama on a modern MacBook Pro with M3 will have an almost plug-and-play experience. Someone on a Linux server with an older AMD GPU will navigate a more involved process. Both paths lead to the same destination — but the road looks different depending on where you're starting from.