How to Install WhisperX: A Complete Setup Guide
WhisperX is a powerful speech recognition tool built on top of OpenAI's Whisper model. It adds word-level timestamps, speaker diarization, and faster processing — making it a popular choice for transcription workflows, subtitle generation, and research. But installing it correctly requires navigating a few dependencies that trip up many users.
Here's what you need to know before you begin.
What WhisperX Actually Is
WhisperX is an open-source Python library that extends the base Whisper model with:
- Faster inference via the faster-whisper backend
- Word-level timestamps using forced alignment
- Speaker diarization (identifying who said what) via PyAnnote
Because it layers multiple libraries together, the installation is more involved than a typical pip install. Understanding what each component does helps you troubleshoot if something breaks.
Core Prerequisites Before You Install
Before running any install commands, your system needs to meet several requirements. Skipping this step is the most common reason installations fail.
Python Version
WhisperX requires Python 3.8 or higher. Python 3.10 and 3.11 are generally the most stable choices. You can check your version by running:
    python --version
If you're on an older version, install a newer one via python.org or use a tool like pyenv to manage multiple versions.
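The same check can be run from inside Python, which helps when you aren't sure which interpreter a script will actually use. This is a minimal sketch: the 3.8 floor is taken from the requirement stated above, and the function name is only illustrative.

```python
import sys

# Minimum version taken from the requirement stated in this guide.
MIN_PYTHON = (3, 8)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor, ...) version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {current}: {status}")
```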
CUDA and GPU Support 🖥️
WhisperX is designed to run significantly faster on a CUDA-compatible NVIDIA GPU. If you're running on CPU only, it will still work — but processing will be considerably slower.
| Setup | Speed | Requirements |
|---|---|---|
| NVIDIA GPU (CUDA) | Fast | CUDA Toolkit + cuDNN |
| Apple Silicon (MPS) | Moderate | macOS 12.3+, PyTorch nightly |
| CPU only | Slow | None beyond Python |
If you're using GPU acceleration, install the appropriate version of PyTorch for your CUDA version before installing WhisperX. Visit pytorch.org and use the official selector to get the correct install command.
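Once PyTorch is installed, a quick way to confirm that it can see your GPU is to ask it directly. The sketch below assumes nothing beyond `torch.cuda.is_available()`. The function name is hypothetical, and pairing `float16` with CUDA and `int8` with CPU is a common default rather than a requirement.

```python
def pick_device_and_compute_type():
    """Return (device, compute_type), falling back to CPU when CUDA is unavailable."""
    try:
        # Imported lazily so this check also runs before PyTorch is installed.
        import torch
    except ImportError:
        return "cpu", "int8"
    if torch.cuda.is_available():
        return "cuda", "float16"
    return "cpu", "int8"


if __name__ == "__main__":
    device, compute_type = pick_device_and_compute_type()
    print(f"device={device} compute_type={compute_type}")
```

If this reports `cpu` on a machine with an NVIDIA card, the installed PyTorch build most likely does not match your CUDA setup.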
FFmpeg
WhisperX uses FFmpeg for audio processing. This must be installed at the system level, not via pip.
- Windows: Download from ffmpeg.org and add it to your PATH
- macOS: brew install ffmpeg
- Linux (Debian/Ubuntu): sudo apt install ffmpeg
Skipping FFmpeg causes audio loading errors that aren't always obvious from the error message.
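Because the resulting errors can be cryptic, it may be worth checking for FFmpeg explicitly before the first transcription. The helper below is an illustrative sketch using only the standard library; its name is hypothetical.

```python
import shutil
import subprocess


def ffmpeg_version_line(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not
    found on PATH or prints nothing."""
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version_line() or "FFmpeg was not found on PATH")
```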
Installing WhisperX Step by Step
Step 1: Create a Virtual Environment
Isolating your installation in a virtual environment prevents dependency conflicts with other Python projects.
    python -m venv whisperx-env
    source whisperx-env/bin/activate  # On Windows: whisperx-env\Scripts\activate
Step 2: Install PyTorch First
Install PyTorch before WhisperX, using the version matched to your hardware. Example for CUDA 11.8:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
For CPU-only systems:
    pip install torch torchvision torchaudio
Step 3: Install WhisperX
    pip install whisperx
This pulls in faster-whisper, transformers, and other core dependencies automatically.
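To confirm that the packages landed in the environment you expect, you can inspect the running interpreter. This sketch uses only the standard library, and the helper names are hypothetical. The module names `torch`, `faster_whisper`, and `whisperx` are the import names of the packages installed above.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
    missing = missing_packages(["torch", "faster_whisper", "whisperx"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the interpreter path points outside `whisperx-env`, the install probably went into a different Python.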
Step 4: Set Up Speaker Diarization (Optional)
If you need speaker diarization — the ability to label which speaker said what — you'll need to:
- Create a free account at Hugging Face
- Accept the user agreements for the PyAnnote speaker diarization models
- Generate an access token from your Hugging Face account settings
Without completing this step, diarization will fail with an authentication error. The base transcription and alignment features work without it.
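One way to keep the token out of your shell history and scripts is to read it from an environment variable and build the diarization arguments only when it is present. This is a sketch under assumptions: the variable name `HF_TOKEN` and both helper names are choices made here for illustration, while `--diarize` and `--hf_token` are the WhisperX command-line flags.

```python
import os


def get_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token from the given environment mapping, or None if unset or blank."""
    if env is None:
        env = os.environ
    token = env.get(var_name, "").strip()
    return token or None


def diarization_args(token):
    """Return the extra CLI arguments for diarization, or [] when no token is set."""
    if not token:
        return []
    return ["--diarize", "--hf_token", token]


if __name__ == "__main__":
    args = diarization_args(get_hf_token())
    print("Diarization enabled" if args else "No token found; diarization skipped")
```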
Running WhisperX After Installation 🎙️
Once installed, you can run WhisperX from the command line:
    whisperx audio.mp3 --model large-v2 --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --language en
You can also use it in Python scripts by importing it as a library, which gives you more control over output format, batch size, and compute type.
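As a rough sketch of the library route, the function below follows the transcribe-then-align flow shown in the WhisperX README. The default values and the `format_segments` helper are illustrative assumptions, and call signatures may differ between releases, so check the README for the version you installed.

```python
def transcribe_and_align(audio_path, model_name="large-v2", device="cuda",
                         compute_type="float16", batch_size=16):
    """Transcribe a file, then add word-level timestamps via forced alignment."""
    # Imported here so format_segments below works without WhisperX installed.
    import whisperx

    model = whisperx.load_model(model_name, device, compute_type=compute_type)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=batch_size)

    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    return whisperx.align(
        result["segments"], align_model, metadata, audio, device,
        return_char_alignments=False,
    )


def format_segments(segments):
    """Render segments as '[start -> end] text' lines, one per segment."""
    return [
        f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text'].strip()}"
        for seg in segments
    ]
```

Calling `format_segments(transcribe_and_align("audio.mp3")["segments"])` would then give one printable line per segment. On a CPU-only machine, pass `device="cpu"` and `compute_type="int8"`.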
Common Installation Problems
No module named whisperx — Usually means you installed into the wrong Python environment. Confirm your virtual environment is activated.
CUDA out of memory — Reduce the --batch_size parameter or switch to a smaller model like base or small.
ffmpeg not found — FFmpeg isn't in your system PATH. Reinstall it and verify with ffmpeg -version in your terminal.
PyAnnote authentication errors — You haven't accepted the model terms on Hugging Face, or your token isn't correctly passed as --hf_token.
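For the CUDA out of memory case, the advice to reduce the batch size can be applied automatically when you drive WhisperX from Python. The wrapper below is a hypothetical sketch, not part of WhisperX. It assumes the out-of-memory condition surfaces as a `RuntimeError` whose message contains "out of memory", which is how PyTorch reports it.

```python
def run_with_batch_backoff(transcribe, batch_size=16, min_batch_size=1):
    """Call transcribe(batch_size), halving batch_size after each OOM error.

    Returns (result, batch_size_used). Any other error, or an OOM at the
    minimum batch size, is re-raised unchanged.
    """
    while True:
        try:
            return transcribe(batch_size), batch_size
        except RuntimeError as error:
            is_oom = "out of memory" in str(error).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```

Here `transcribe` is any callable you supply that takes a batch size, for example `lambda bs: model.transcribe(audio, batch_size=bs)`. If memory still runs out at the smallest batch size, switching to a smaller model is the remaining option.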
What Shapes Your Experience
WhisperX behaves differently depending on factors specific to your setup:
- Your GPU's VRAM determines which model sizes you can run without memory errors
- Your audio quality affects transcription accuracy — background noise, overlapping speakers, and low bitrate recordings all introduce errors
- Your operating system and CUDA version affect which PyTorch build you need
- Whether you need diarization adds a meaningful layer of setup complexity and a Hugging Face dependency
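To see where your own machine sits on the first of these factors, you can ask PyTorch how much memory the GPU has. This sketch only reports the number and deliberately makes no claim about which model sizes fit; the function name is illustrative.

```python
def gpu_vram_gib(device_index=0):
    """Return the total memory of a CUDA device in GiB, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    total_bytes = torch.cuda.get_device_properties(device_index).total_memory
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    vram = gpu_vram_gib()
    print("No CUDA GPU detected" if vram is None else f"GPU memory: {vram:.1f} GiB")
```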
The base installation is straightforward for users comfortable with Python environments. The complexity scales up as you add GPU acceleration and diarization — and exactly how that plays out depends on what your machine is running and what your workflow actually requires.