# MorphoCLIP
Text-supervised contrastive learning for perturbation matching in Cell Painting images.
MorphoCLIP connects microscopy images of cells with text descriptions of biological treatments. Given an image of cells treated with a drug or genetic modification, MorphoCLIP can identify which treatment was applied, because it has learned to match visual patterns with textual descriptions.
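At inference time, this matching reduces to a nearest-neighbor search: embed the image, embed each candidate treatment description, and pick the description with the highest cosine similarity. The sketch below illustrates the idea with random NumPy vectors; the function name and the toy data are illustrative, not the actual MorphoCLIP API.

```python
import numpy as np

def identify_treatment(image_emb: np.ndarray, text_embs: np.ndarray) -> int:
    """Return the index of the treatment description whose embedding
    has the highest cosine similarity to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(txt @ img))

# Toy example with random embeddings (hypothetical data, 8 dimensions).
rng = np.random.default_rng(0)
texts = rng.normal(size=(3, 8))               # 3 candidate treatment descriptions
image = texts[1] + 0.1 * rng.normal(size=8)   # image embedding near treatment 1
predicted = identify_treatment(image, texts)
```

Because both encoders map into the same shared space, the same search works for treatments never seen during training, as long as a text description is available.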
## How It Works
MorphoCLIP combines two pretrained models with a contrastive training objective:
- A vision model (DINOv3) looks at microscopy images and extracts visual features — patterns in how cells look after treatment.
- A language model (BioClinical ModernBERT) reads text descriptions of biological treatments and extracts their meaning.
- Contrastive learning trains the system to place matching image-text pairs close together in a shared space, so similar treatments cluster together.
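The contrastive step typically uses a symmetric InfoNCE (CLIP-style) loss: within a batch of matched image-text pairs, each image should score highest against its own text and vice versa. This NumPy sketch shows the objective under that assumption; the function name, temperature value, and batch layout are illustrative, not taken from the MorphoCLIP codebase.

```python
import numpy as np

def clip_loss(img_embs: np.ndarray, txt_embs: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for a batch where row i of img_embs and
    row i of txt_embs describe the same treatment."""
    # L2-normalize so the dot product is cosine similarity.
    img = img_embs / np.linalg.norm(img_embs, axis=1, keepdims=True)
    txt = txt_embs / np.linalg.norm(txt_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # pairwise similarity matrix
    labels = np.arange(len(img))              # matching pairs lie on the diagonal

    def cross_entropy(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Minimizing this loss pulls each matched image-text pair together while pushing apart mismatched pairs in the batch, which is what makes similar treatments cluster in the shared space.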
## Key Numbers

| Item | Value |
|---|---|
| Dataset | CPJUMP1 — 51 plates, 3M+ Cell Painting images |
| Compounds tested | 303 chemical compounds |
| Genes tested | 160 genes (CRISPR knockouts + ORF overexpressions) |
| Image encoder | Frozen DINOv3 ViT-L/16 (300M parameters) |
| Text encoder | Frozen BioClinical ModernBERT (150M parameters) |
## Get Started
- Installation — Set up MorphoCLIP on your machine
- Quick Start — Run the training pipeline end-to-end
- Training Pipeline — Understand the model architecture and training process
- Glossary — Plain-language definitions of technical terms used throughout these docs