How to Install WhisperX: A Complete Setup Guide

WhisperX is a powerful speech recognition tool built on top of OpenAI's Whisper model. It adds word-level timestamps, speaker diarization, and faster processing — making it a popular choice for transcription workflows, subtitle generation, and research. But installing it correctly requires navigating a few dependencies that trip up many users.

Here's what you need to know before you begin.

What WhisperX Actually Is

WhisperX is an open-source Python library that extends the base Whisper model with:

  • Faster inference via the faster-whisper backend
  • Word-level timestamps using forced alignment
  • Speaker diarization (identifying who said what) via PyAnnote

Because it layers multiple libraries together, the installation is more involved than a typical pip install. Understanding what each component does helps you troubleshoot if something breaks.

Core Prerequisites Before You Install

Before running any install commands, your system needs to meet several requirements. Skipping this step is a common reason installations fail.

Python Version

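WhisperX requires Python 3.8 or higher, though newer releases may raise that minimum, so check the project's stated requirements for the version you install. Python 3.10 and 3.11 are generally the most stable choices. You can check your version by running: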

python --version 

If you're on an older version, install a newer one via python.org or use a tool like pyenv to manage multiple versions.
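
The same check can be done from inside Python, which is handy in setup scripts. This is a minimal sketch: the helper name meets_minimum is illustrative, and the (3, 8) floor is simply the figure quoted above, not something read from the package itself.

```python
import sys

# Minimum version taken from the text above (an assumption; the project may raise it).
MIN_VERSION = (3, 8)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so patch releases do not affect the result.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```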

CUDA and GPU Support 🖥️

WhisperX is designed to run significantly faster on a CUDA-compatible NVIDIA GPU. If you're running on CPU only, it will still work — but processing will be considerably slower.

Setup                  Speed      Requirements
NVIDIA GPU (CUDA)      Fast       CUDA Toolkit + cuDNN
Apple Silicon (MPS)    Moderate   macOS 12.3+, PyTorch nightly
CPU only               Slow       None beyond Python

If you're using GPU acceleration, install the appropriate version of PyTorch for your CUDA version before installing WhisperX. Visit pytorch.org and use the official selector to get the correct install command.
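
Once PyTorch is installed, you can confirm whether it actually sees your GPU. The sketch below uses torch.cuda.is_available(), which is part of PyTorch. The helper name pick_device is illustrative, and falling back to "cpu" when PyTorch is missing is a choice made for this example.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable NVIDIA GPU, otherwise "cpu"."""
    try:
        # Imported lazily so the check also runs before PyTorch is installed.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Selected device:", pick_device())
```

If this prints "cpu" on a machine with an NVIDIA card, the installed PyTorch build probably does not match your CUDA setup.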

FFmpeg

WhisperX uses FFmpeg for audio processing. This must be installed at the system level, not via pip.

  • Windows: Download from ffmpeg.org and add it to your PATH
  • macOS: brew install ffmpeg
  • Linux (Debian/Ubuntu): sudo apt install ffmpeg

Skipping FFmpeg causes audio loading errors that aren't always obvious from the error message.
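
Because those errors can be cryptic, it may help to test for FFmpeg directly before running a transcription. This is a small sketch using only the standard library. The function names are illustrative.

```python
import shutil
import subprocess


def ffmpeg_path():
    """Return the full path of the ffmpeg executable on PATH, or None if absent."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is unavailable."""
    path = ffmpeg_path()
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version_line() or "ffmpeg not found on PATH")
```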

Installing WhisperX Step by Step

Step 1: Create a Virtual Environment

Isolating your installation in a virtual environment prevents dependency conflicts with other Python projects.

python -m venv whisperx-env
source whisperx-env/bin/activate  # On Windows: whisperx-env\Scripts\activate
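
To confirm that activation worked, you can ask the interpreter itself. The sketch below compares sys.prefix with sys.base_prefix, which differ inside environments created by venv. The helper name is illustrative, and this check does not detect conda environments.

```python
import sys


def in_virtual_environment():
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```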

Step 2: Install PyTorch First

Install PyTorch before WhisperX, using the version matched to your hardware. Example for CUDA 11.8:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 

For CPU-only systems:

pip install torch torchvision torchaudio 

Step 3: Install WhisperX

pip install whisperx 

This pulls in faster-whisper, transformers, and other core dependencies automatically.
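
A quick way to verify that the install landed in the active environment is to look the package up by name. This is a sketch using the standard library only. The helper names are illustrative, and it assumes the distribution and the import name are both "whisperx", as in the pip command above.

```python
import importlib.util
from importlib import metadata


def installed_version(package="whisperx"):
    """Return the installed distribution's version string, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module="whisperx"):
    """Return True if `module` can be found by the current interpreter."""
    return importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    print("whisperx version:", installed_version() or "not installed")
    print("whisperx importable:", is_importable())
```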

Step 4: Set Up Speaker Diarization (Optional)

If you need speaker diarization — the ability to label which speaker said what — you'll need to:

  1. Create a free account at Hugging Face
  2. Accept the user agreements for the PyAnnote speaker diarization models
  3. Generate an access token from your Hugging Face account settings

Without completing this step, diarization will fail with an authentication error. The base transcription and alignment features work without it.
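
Rather than pasting the token into scripts, one common approach is to read it from an environment variable. The sketch below assumes a variable named HF_TOKEN. That name and the helper read_hf_token are choices made for this example, not something WhisperX requires.

```python
import os


def read_hf_token(env_var="HF_TOKEN", environ=None):
    """Return the Hugging Face token stored in `env_var`, stripped of whitespace.

    Raises RuntimeError with a pointer to the setup steps if it is missing.
    """
    environ = os.environ if environ is None else environ
    token = (environ.get(env_var) or "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set. Generate an access token in your Hugging Face "
            "account settings and export it before enabling diarization."
        )
    return token
```

The returned value can then be supplied wherever a token is expected, for example through the --hf_token option on the command line.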

Running WhisperX After Installation 🎙️

Once installed, you can run WhisperX from the command line:

whisperx audio.mp3 --model large-v2 --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --language en 

Or use it in Python scripts by importing it as a library, which gives you more control over output format, batch size, and compute type.
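
As a rough illustration of library use, the sketch below follows the usage pattern shown in the project's README: load a model, transcribe, then align. Treat it as an assumption-laden example, not a reference. Function names and arguments can change between WhisperX releases, and the wrapper transcribe_with_alignment and its default values are made up for this example.

```python
def transcribe_with_alignment(
    audio_path, device="cuda", batch_size=16, compute_type="float16"
):
    """Transcribe `audio_path` and return segments with word-level timestamps."""
    # Imported here so whisperx only needs to be installed when the function runs.
    import whisperx

    # 1. Transcribe with the batched Whisper backend.
    model = whisperx.load_model("large-v2", device, compute_type=compute_type)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=batch_size)

    # 2. Align the transcript to the audio to obtain word-level timestamps.
    align_model, align_metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    aligned = whisperx.align(
        result["segments"],
        align_model,
        align_metadata,
        audio,
        device,
        return_char_alignments=False,
    )
    return aligned["segments"]
```

On a CPU-only machine you would likely call it with device="cpu" and compute_type="int8", since float16 generally requires a GPU.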

Common Installation Problems

No module named whisperx — Usually means you installed into the wrong Python environment. Confirm your virtual environment is activated.

CUDA out of memory — Reduce the --batch_size parameter or switch to a smaller model like base or small.

ffmpeg not found — FFmpeg isn't in your system PATH. Reinstall it and verify with ffmpeg -version in your terminal.

PyAnnote authentication errors — You haven't accepted the model terms on Hugging Face, or your token isn't correctly passed as --hf_token.
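
For the "CUDA out of memory" case, the adjustments can be captured in a small command builder. This is a sketch: --model and --batch_size appear in the text above, --compute_type is an additional CLI option assumed to be available in your WhisperX version, and the default values are illustrative starting points, not recommendations from the project.

```python
import shlex


def build_whisperx_command(audio_path, model="small", batch_size=4, compute_type="int8"):
    """Build a whisperx CLI invocation with memory-friendly settings."""
    return [
        "whisperx",
        audio_path,
        "--model", model,                 # smaller model, lower VRAM use
        "--batch_size", str(batch_size),  # fewer chunks processed at once
        "--compute_type", compute_type,   # lighter numeric precision
    ]


command = build_whisperx_command("audio.mp3")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run, or the printed line can be pasted into a terminal.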

What Shapes Your Experience

WhisperX behaves differently depending on factors specific to your setup:

  • Your GPU's VRAM determines which model sizes you can run without memory errors
  • Your audio quality affects transcription accuracy — background noise, overlapping speakers, and low bitrate recordings all introduce errors
  • Your operating system and CUDA version affect which PyTorch build you need
  • Whether you need diarization adds a meaningful layer of setup complexity and a Hugging Face dependency

The base installation is straightforward for users comfortable with Python environments. The complexity scales up as you add GPU acceleration and diarization — and exactly how that plays out depends on what your machine is running and what your workflow actually requires.