# MorphoCLIP
Text-supervised contrastive learning for perturbation matching in Cell Painting images.
MorphoCLIP connects microscopy images of cells with text descriptions of biological treatments. Given an image of cells treated with a drug or genetic modification, MorphoCLIP can identify which treatment was applied, because it has learned to match visual patterns with textual descriptions.
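At inference time, this matching reduces to a nearest-neighbor search: embed the image, embed each candidate treatment description, and pick the description with the highest cosine similarity. The sketch below illustrates the idea with random NumPy vectors; the function name and the toy data are illustrative, not the actual MorphoCLIP API.

```python
import numpy as np

def identify_treatment(image_emb: np.ndarray, text_embs: np.ndarray) -> int:
    """Return the index of the treatment description whose embedding
    has the highest cosine similarity to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(txt @ img))

# Toy example with random embeddings (hypothetical data, 8 dimensions).
rng = np.random.default_rng(0)
texts = rng.normal(size=(3, 8))               # 3 candidate treatment descriptions
image = texts[1] + 0.1 * rng.normal(size=8)   # image embedding near treatment 1
predicted = identify_treatment(image, texts)
```

Because both encoders map into the same shared space, the same search works for treatments never seen during training, as long as a text description is available.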
## How It Works
MorphoCLIP combines two pretrained models with a contrastive training objective:
- A vision model (DINOv3) looks at microscopy images and extracts visual features — patterns in how cells look after treatment.
- A language model (BioClinical ModernBERT) reads text descriptions of biological treatments and extracts their meaning.
- Contrastive learning trains the system to place matching image-text pairs close together in a shared space, so similar treatments cluster together.
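The contrastive step typically uses a symmetric InfoNCE (CLIP-style) loss: within a batch of matched image-text pairs, each image should score highest against its own text and vice versa. This NumPy sketch shows the objective under that assumption; the function name, temperature value, and batch layout are illustrative, not taken from the MorphoCLIP codebase.

```python
import numpy as np

def clip_loss(img_embs: np.ndarray, txt_embs: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for a batch where row i of img_embs and
    row i of txt_embs describe the same treatment."""
    # L2-normalize so the dot product is cosine similarity.
    img = img_embs / np.linalg.norm(img_embs, axis=1, keepdims=True)
    txt = txt_embs / np.linalg.norm(txt_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # pairwise similarity matrix
    labels = np.arange(len(img))              # matching pairs lie on the diagonal

    def cross_entropy(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Minimizing this loss pulls each matched image-text pair together while pushing apart mismatched pairs in the batch, which is what makes similar treatments cluster in the shared space.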
## Key Numbers

| Item | Value |
|---|---|
| Dataset | CPJUMP1 — 51 plates, 3M+ Cell Painting images |
| Compounds tested | 303 chemical compounds |
| Genes tested | 160 genes (CRISPR knockouts + ORF overexpressions) |
| Image encoder | Frozen DINOv3 ViT-L/16 (300M parameters) |
| Text encoder | Frozen BioClinical ModernBERT (150M parameters) |
## Get Started
- Installation — Set up MorphoCLIP on your machine
- Quick Start — Run the training pipeline end-to-end
- Training Pipeline — Understand the model architecture and training process
- Glossary — Plain-language definitions of technical terms used throughout these docs