
MorphoCLIP

Text-supervised contrastive learning for perturbation matching in Cell Painting images.

MorphoCLIP connects microscopy images of cells with text descriptions of biological treatments. Given an image of cells treated with a drug or genetic modification, MorphoCLIP can identify which treatment was applied, because it has learned to match visual patterns with their textual descriptions.
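At inference time, this matching reduces to a nearest-neighbor lookup: embed the image, embed a set of candidate treatment descriptions, and rank candidates by cosine similarity. A minimal sketch of that idea, assuming embeddings are plain NumPy vectors; `identify_treatment` and its arguments are illustrative names, not part of the MorphoCLIP API:

```python
import numpy as np

def identify_treatment(image_emb, text_embs, treatment_names):
    """Rank candidate treatments by cosine similarity to one image embedding.

    image_emb:       (d,)  vector from the vision encoder (hypothetical input)
    text_embs:       (n, d) vectors for n candidate treatment descriptions
    treatment_names: list of n human-readable labels
    Returns (name, score) pairs, best match first.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = txt @ img                 # cosine similarity per candidate
    order = np.argsort(-scores)        # descending similarity
    return [(treatment_names[i], float(scores[i])) for i in order]
```

With real MorphoCLIP embeddings, the top-ranked description would be the model's guess at the applied perturbation.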

How It Works

[Figure: MorphoCLIP architecture]

MorphoCLIP combines two pretrained encoders with a contrastive training objective:

  1. A vision encoder (DINOv3) looks at microscopy images and extracts visual features: the patterns in how cells look after treatment.
  2. A language encoder (BioClinical ModernBERT) reads text descriptions of biological treatments and extracts their meaning.
  3. Contrastive learning trains the system to place matching image-text pairs close together in a shared embedding space, so similar treatments cluster together.
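Step 3 is typically implemented as a CLIP-style symmetric InfoNCE loss: within a batch of matched pairs, each image should score highest against its own text and vice versa. A minimal NumPy sketch, assuming unit-normalized embeddings and a fixed temperature (the hyperparameters here are illustrative, not MorphoCLIP's actual settings):

```python
import numpy as np

def clip_loss(img_embs, txt_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image-text pairs.

    img_embs, txt_embs: (n, d) arrays where row i of each is a matched pair.
    """
    img = img_embs / np.linalg.norm(img_embs, axis=1, keepdims=True)
    txt = txt_embs / np.linalg.norm(txt_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (n, n) pairwise similarities
    labels = np.arange(len(logits))      # pair i matches pair i (the diagonal)

    def xent(l):
        # cross-entropy with the diagonal as the correct class
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return (xent(logits) + xent(logits.T)) / 2
```

Minimizing this loss pulls each matched pair together on the diagonal while pushing apart all mismatched pairs in the batch.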

Key Numbers

  • Dataset: CPJUMP1 — 51 plates, 3M+ Cell Painting images
  • Compounds tested: 303 chemical compounds
  • Genes tested: 160 genes (CRISPR knockouts + ORF overexpression)
  • Image encoder: frozen DINOv3 ViT-L/16 (300M parameters)
  • Text encoder: frozen BioClinical ModernBERT (150M parameters)
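Because both encoders are frozen, the trainable part of such a setup is typically a pair of projection heads that map each encoder's features into the shared embedding space. A sketch under that assumption, with linear projections and placeholder dimensions (the actual head design and sizes are not specified in these docs):

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_SHARED = 1024, 768, 256   # illustrative sizes, not from the docs

# Frozen backbones emit fixed features; only these matrices would be trained.
W_img = rng.normal(scale=0.02, size=(D_IMG, D_SHARED))
W_txt = rng.normal(scale=0.02, size=(D_TXT, D_SHARED))

def project(features, W):
    """Map frozen-encoder features into the shared space, unit-normalized."""
    z = features @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)
```

Unit-normalizing the projected vectors makes dot products equal to cosine similarities, which is what the contrastive objective compares.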

Get Started

  • Installation — Set up MorphoCLIP on your machine
  • Quick Start — Run the training pipeline end-to-end
  • Training Pipeline — Understand the model architecture and training process
  • Glossary — Plain-language definitions of technical terms used throughout these docs