How to Install RDKit in JupyterLab: A Complete Setup Guide
RDKit is one of the most widely used open-source cheminformatics libraries, built for working with molecular structures, chemical data, and computational chemistry workflows. Getting it running inside JupyterLab takes a few deliberate steps, and the method you use matters more than most people realize.
What RDKit Actually Is (And Why Installation Is Trickier Than Usual)
RDKit isn't a pure Python package. It includes compiled C++ extensions, which means a simple pip install rdkit historically failed or produced broken installs. While this has improved in recent years with official PyPI support, the recommended and most reliable installation path still runs through conda, specifically the conda-forge channel.
This distinction matters because JupyterLab itself can be installed multiple ways, and the method you used to set it up directly affects how you should install RDKit alongside it.
Understanding the Environment Before You Install
Before running any install command, you need to know two things:
- Which Python environment is JupyterLab running in?
- Are you using conda, pip, or a virtual environment manager?
Installing RDKit into the wrong environment is the most common reason it installs without errors but then can't be imported inside a notebook. JupyterLab's kernel and your terminal's active environment are not always the same thing.
Check Your Active Environment
In your terminal:
    which python  # or on Windows: where python
In a JupyterLab cell:
    import sys
    print(sys.executable)
If these two paths don't match, you're working in different environments. That mismatch is what causes the dreaded ModuleNotFoundError: No module named 'rdkit' even after a successful install.
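The same check can be extended to ask whether the kernel can see RDKit at all. A minimal sketch using only the standard library; the helper name `describe_kernel_environment` is made up for illustration and is not part of RDKit or Jupyter:

```python
import importlib.util
import sys


def describe_kernel_environment(package="rdkit"):
    """Report which interpreter this kernel uses and whether it can see `package`."""
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,  # interpreter running this kernel
        "prefix": sys.prefix,          # root of the active environment
        "package_found": spec is not None,
        # Where the package would be imported from, if it is visible at all
        "package_location": spec.origin if spec is not None else None,
    }


info = describe_kernel_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `package_found` is False while your terminal reports a successful install, the install most likely went into a different environment than the one listed under `executable`.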
Method 1: Installing RDKit via Conda (Recommended)
If you're using Anaconda or Miniconda, this is the cleanest path.
Step 1: Create a Dedicated Conda Environment
Working in a dedicated environment keeps your RDKit dependencies isolated:
    conda create -n chem_env python=3.10
    conda activate chem_env
Step 2: Install RDKit from conda-forge
    conda install -c conda-forge rdkit
This pulls in all required compiled dependencies automatically.
Step 3: Install JupyterLab Into the Same Environment
    conda install -c conda-forge jupyterlab
Then launch it:
    jupyter lab
Step 4: Verify the Install
Inside a JupyterLab notebook cell:
    from rdkit import Chem
    mol = Chem.MolFromSmiles('CCO')
    print(mol)
If this prints a molecule object (something like <rdkit.Chem.rdchem.Mol object at 0x...>) rather than None or an error, RDKit is working correctly.
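A slightly more defensive version of this check reports what went wrong instead of raising. This is a sketch, not an official RDKit test; the function name `verify_rdkit` is hypothetical, and it uses only `Chem.MolFromSmiles`, `Chem.MolToSmiles`, `GetNumAtoms`, and `rdkit.__version__`:

```python
def verify_rdkit(smiles="CCO"):
    """Return (ok, message) describing whether RDKit can parse `smiles`."""
    try:
        import rdkit
        from rdkit import Chem
    except ImportError as exc:  # also covers ModuleNotFoundError
        return False, f"RDKit could not be imported: {exc}"

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False, f"RDKit imported, but could not parse {smiles!r}"

    # Atom count and canonical SMILES exercise the compiled core
    return True, (
        f"RDKit {rdkit.__version__}: {mol.GetNumAtoms()} atoms, "
        f"canonical SMILES {Chem.MolToSmiles(mol)}"
    )


ok, message = verify_rdkit()
print(ok, message)
```

A False result with an import message points back to the environment checks; a False result with a parse message means RDKit loaded but rejected the input string.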
Method 2: Installing RDKit via pip
Since RDKit 2022.03, official pip support has improved significantly. If your JupyterLab setup is pip-based, this route is viable:
    pip install rdkit
Or, in a notebook cell using the recommended in-notebook install pattern:
    import sys
    !{sys.executable} -m pip install rdkit
Using sys.executable instead of a plain !pip install ensures the package installs into the Python environment behind the kernel your notebook is running, not a different environment on your system.
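The `!` syntax only works inside a notebook. The same idea can be written in plain Python with `subprocess`, which also makes it easy to inspect the command before running it. A minimal sketch; `pip_install_command` and `install` are illustrative names, and the `dry_run` flag is an assumption added so the example does not change your environment by default:

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package, dry_run=True):
    """Run the install, or just return the command when dry_run is True."""
    command = pip_install_command(package)
    if dry_run:
        return command
    # check=True raises CalledProcessError if pip exits with an error
    subprocess.run(command, check=True)
    return command


print(" ".join(install("rdkit")))  # dry run: prints the command only
```

Calling `install("rdkit", dry_run=False)` would perform the actual install into the kernel's own environment.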
When pip Works Well vs. When It Doesn't
| Scenario | pip Reliability |
|---|---|
| Clean virtual environment (venv) | Generally reliable |
| Base system Python | May conflict with system packages |
| Conda environment | Use conda instead to avoid conflicts |
| Google Colab / cloud notebooks | pip works; rdkit installs cleanly |
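The scenarios in this table can be roughly detected from inside a running kernel. The sketch below relies on common heuristics rather than any official API: a `conda-meta` directory under `sys.prefix` for conda, `sys.prefix != sys.base_prefix` for a venv, and a loaded `google.colab` module for Colab. The names `detect_environment` and `SUGGESTED_INSTALL` are made up for this example:

```python
import os
import sys


def detect_environment():
    """Guess the kind of Python environment this interpreter belongs to."""
    # Heuristic: Colab kernels have the google.colab module loaded
    if "google.colab" in sys.modules:
        return "colab"
    # Conda environments keep package metadata in a conda-meta directory
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    # Inside a venv, sys.prefix differs from the base interpreter's prefix
    if sys.prefix != sys.base_prefix:
        return "venv"
    return "system"


# Suggestions mirror the table above; they are not hard rules
SUGGESTED_INSTALL = {
    "colab": "pip install rdkit",
    "conda": "conda install -c conda-forge rdkit",
    "venv": "pip install rdkit",
    "system": "pip install rdkit (may conflict with system packages)",
}

kind = detect_environment()
print(f"{kind}: {SUGGESTED_INSTALL[kind]}")
```

Treat the result as a hint only; unusual setups, such as a venv created on top of a conda interpreter, can be misclassified.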
Method 3: Using RDKit in Cloud-Based JupyterLab (Google Colab, etc.)
If you're using Google Colab or similar hosted notebook environments, installation is simpler:
    !pip install rdkit
No environment management needed: the install applies to the current session. Note that cloud sessions are ephemeral, so you'll need to reinstall each time you start a new session.
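Because the install disappears with the session, it can help to start a notebook with a cell that checks whether RDKit is present before doing anything else. A small sketch using the standard library; `needs_install` is a hypothetical helper name:

```python
import importlib.util


def needs_install(package="rdkit"):
    """True when `package` is not importable in the current session."""
    return importlib.util.find_spec(package) is None


if needs_install():
    print("rdkit is missing; run the install cell before continuing")
else:
    print("rdkit is already available in this session")
```

This avoids re-running the install on every execution of the notebook within the same session.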
Registering a Conda Environment as a JupyterLab Kernel
A common scenario: you've installed RDKit into a conda environment, but JupyterLab (launched separately) doesn't show that environment as a kernel option. Fix this by registering it:
    conda activate chem_env
    pip install ipykernel
    python -m ipykernel install --user --name chem_env --display-name "Python (chem_env)"
After restarting JupyterLab, the Python (chem_env) kernel will appear in the launcher. Switching to it gives your notebooks access to RDKit.
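Registering a kernel writes a small kernel.json file whose `argv` list starts with the interpreter the kernel will launch. Reading that entry is a quick way to confirm a kernel points at the environment you expect. The sketch below uses a made-up sample spec and path so it runs anywhere; on a real system, `jupyter kernelspec list` shows where the registered spec directories live:

```python
import json
import tempfile
from pathlib import Path


def kernel_interpreter(kernel_json_path):
    """Return the interpreter path recorded in a kernel spec file."""
    spec = json.loads(Path(kernel_json_path).read_text())
    return spec["argv"][0]


# Illustrative kernel spec; the interpreter path is invented for this example
sample_spec = {
    "argv": [
        "/opt/conda/envs/chem_env/bin/python",
        "-m", "ipykernel_launcher",
        "-f", "{connection_file}",
    ],
    "display_name": "Python (chem_env)",
    "language": "python",
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "kernel.json"
    path.write_text(json.dumps(sample_spec))
    interpreter = kernel_interpreter(path)

print(interpreter)
```

If the path printed for your real kernel.json is not inside the environment where RDKit is installed, the kernel will not find it regardless of what the display name says.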
Common Errors and What They Usually Mean
ModuleNotFoundError: No module named 'rdkit'
This is almost always an environment mismatch. Confirm that sys.executable in your notebook points to the environment where RDKit is installed.
ImportError with compiled extension messages
This suggests a broken or incomplete install, most often seen with pip installs on certain OS/architecture combinations. Switching to conda typically resolves it.
Kernel crashes on import
This can indicate a version conflict between RDKit and another installed package (NumPy version mismatches are frequent). Creating a fresh environment and installing RDKit first, before adding other packages, usually avoids it.
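These three failure modes can be told apart without risking the notebook kernel by attempting the import in a child process. A rough sketch: the exit codes 10 and 11 are arbitrary values chosen for this example, `diagnose_import` is a hypothetical helper, and the labels are guesses that follow the descriptions above, not definitive diagnoses:

```python
import subprocess
import sys

# Code run in the child process; the module name arrives as sys.argv[1]
_CHILD_CODE = """
import importlib, sys
try:
    importlib.import_module(sys.argv[1])
except ModuleNotFoundError:
    sys.exit(10)
except ImportError:
    sys.exit(11)
"""


def diagnose_import(module="rdkit"):
    """Import `module` in a child process and classify the outcome."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD_CODE, module],
        capture_output=True,
        text=True,
    )
    outcomes = {
        0: "ok",
        10: "missing: likely an environment mismatch",
        11: "import error: likely a broken or incomplete install",
    }
    # Any other exit code means the child died some other way
    return outcomes.get(
        result.returncode,
        f"child exited with code {result.returncode}: crash or other failure",
    )


print(diagnose_import())
```

Because the child uses `sys.executable`, it tests the same environment as the kernel. A "missing" result can also come from a missing dependency of the module rather than the module itself, so read it alongside the error message from a direct import.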
Variables That Affect Which Method Works Best for You
The right installation path isn't universal; it depends on several factors specific to your setup:
- Operating system: macOS (especially Apple Silicon), Windows, and Linux have different quirks with compiled packages
- Python version: RDKit's pip wheels don't cover every Python version equally; conda-forge tends to have broader coverage
- Existing environment setup: Whether you're already deep into a conda or pip-based workflow affects which method introduces fewer conflicts
- JupyterLab version: Newer JupyterLab versions handle kernel environments differently than older ones
- Use case complexity: Simple molecule rendering needs differ from full cheminformatics pipelines that also require pandas, numpy, and visualization libraries alongside RDKit
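Most of the factors listed above can be collected in one place before choosing a method or asking for help. A minimal sketch with the standard library; `setup_report` and `installed_version` are illustrative names, and the default package list is an assumption based on the libraries mentioned here:

```python
import platform
import sys
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def setup_report(distributions=("rdkit", "numpy", "pandas", "jupyterlab")):
    """Summarize OS, CPU architecture, Python version, and package versions."""
    report = {
        "os": platform.system(),        # e.g. Linux, Darwin, Windows
        "machine": platform.machine(),  # CPU architecture string
        "python": platform.python_version(),
    }
    for name in distributions:
        report[name] = installed_version(name)
    return report


for key, value in setup_report().items():
    print(f"{key}: {value}")
```

A value of None only means no distribution metadata was found in the current environment; it is worth confirming with a direct import before concluding the package is absent.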
Getting RDKit running in JupyterLab is straightforward once the environment relationship is clear, but what "straightforward" looks like in practice depends almost entirely on how your Python environment is already structured and what you're building with it.