Benchmark Guide

This repo uses a local, stable CPJUMP1 benchmark workflow built around scripts/benchmark/benchmark_stable.py.

The benchmark measures three tasks on CPJUMP1:

Replicability: whether replicate wells from the same perturbation match each other.
Within-modality target matching: whether perturbations in the same modality that hit the same target match each other.
Cross-modality matching: whether compounds and genetic perturbations that share a target match each other.
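
To make the three task definitions concrete, here is a minimal sketch of how a pair of wells could be assigned to a task. The metadata keys (perturbation, modality, target) and the function name are hypothetical and are not taken from benchmark_stable.py; the sketch also assumes that "genetic perturbations" means ORF and CRISPR.

```python
# Hypothetical well metadata: dicts with "perturbation", "modality", "target".
GENETIC = {"ORF", "CRISPR"}  # assumption: the genetic modalities in CPJUMP1


def positive_pair_task(a, b):
    """Return the task for which wells a and b form a positive pair, or None."""
    # Replicability: replicate wells of the same perturbation.
    if a["perturbation"] == b["perturbation"] and a["modality"] == b["modality"]:
        return "replicability"
    # Both matching tasks require a shared target.
    if a["target"] != b["target"]:
        return None
    # Within-modality: different perturbations, same modality, same target.
    if a["modality"] == b["modality"]:
        return "within_modality"
    # Cross-modality: a compound paired with a genetic perturbation.
    modalities = {a["modality"], b["modality"]}
    if "compound" in modalities and modalities & GENETIC:
        return "cross_modality"
    # ORF vs CRISPR pairs are not covered by the definitions in this guide.
    return None


cpd = {"perturbation": "cpd_1", "modality": "compound", "target": "GENE_A"}
orf = {"perturbation": "orf_1", "modality": "ORF", "target": "GENE_A"}
print(positive_pair_task(cpd, orf))
```

The actual benchmark may define pairs differently (for example, by plate or cell type); treat this only as a reading aid for the definitions above.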

The stable workflow filters to batch 2020_11_04_CPJUMP1, density 100, and antibiotics absent, and uses two timeline labels:

short: compound 24h, ORF 48h, CRISPR 96h
long: compound 48h, ORF 96h, CRISPR 144h
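
The two timeline labels can be written down as a small lookup table. This is only an illustrative sketch: the names TIMELINES and timepoint_hours are hypothetical, and the hours are copied from the list above.

```python
# Timepoint in hours for each modality, per timeline label.
TIMELINES = {
    "short": {"compound": 24, "ORF": 48, "CRISPR": 96},
    "long": {"compound": 48, "ORF": 96, "CRISPR": 144},
}


def timepoint_hours(timeline, modality):
    """Look up the timepoint (in hours) used for a modality under a timeline label."""
    return TIMELINES[timeline][modality]


for label, hours in TIMELINES.items():
    print(label, hours)
```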

Inputs and Outputs

Baseline benchmark input:

Profiles under data/profiles/<batch>/<plate>/*_normalized_feature_select_negcon_batch.csv.gz
Metadata under output/benchmark/input/
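
Before a run, it can help to check which baseline profile files are actually present. The sketch below globs the path pattern given above; find_profiles and PROFILE_SUFFIX are hypothetical names, not part of the repo.

```python
from pathlib import Path

# File suffix taken from the path pattern in this guide.
PROFILE_SUFFIX = "_normalized_feature_select_negcon_batch.csv.gz"


def find_profiles(profiles_dir, batch):
    """Return {plate_name: [profile paths]} for one batch directory."""
    batch_dir = Path(profiles_dir) / batch
    if not batch_dir.is_dir():
        return {}
    found = {}
    # Layout: <profiles_dir>/<batch>/<plate>/*<PROFILE_SUFFIX>
    for path in sorted(batch_dir.glob(f"*/*{PROFILE_SUFFIX}")):
        found.setdefault(path.parent.name, []).append(path)
    return found


profiles = find_profiles("data/profiles", "2020_11_04_CPJUMP1")
print(f"{len(profiles)} plates with baseline profiles")
```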

CellCLIP benchmark input:

Benchmark-format CellCLIP profile CSVs under data/profiles_cellclip_hf/<batch>/<plate>/
These are produced from per-site feature tensors with scripts/cellclip/export_cellclip_profiles.py

Benchmark outputs:

CSVs in output/<run_name>/
Summary tables in output/<run_name>/tables/
Figures in output/<run_name>/figures/
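
A quick way to inspect a finished run is to list what landed in each output location. This sketch assumes the top-level CSVs use a plain .csv extension, which the guide does not state; summarize_run is a hypothetical helper.

```python
from pathlib import Path


def _file_names(directory, pattern="*"):
    """Sorted names of files matching pattern; empty list if the directory is missing."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob(pattern) if p.is_file())


def summarize_run(run_dir):
    """Group a run's outputs by the three locations described above."""
    run_dir = Path(run_dir)
    return {
        "csvs": _file_names(run_dir, "*.csv"),  # assumption: .csv extension
        "tables": _file_names(run_dir / "tables"),
        "figures": _file_names(run_dir / "figures"),
    }


print(summarize_run("output/benchmark"))
```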

Baseline Benchmark

Run the baseline CellProfiler-style benchmark on the default profile directory:


.venv/bin/python scripts/benchmark/benchmark_stable.py \
  --profiles-dir data/profiles \
  --output-dir output/benchmark \
  --batch 2020_11_04_CPJUMP1 \
  --timelines short

Useful variants:

Replace --timelines short with --timelines short long to run the full benchmark (both timelines).
Add --cell-filter A549 or --cell-filter U2OS to restrict to one cell type.
Add --test-mode to run a smaller smoke test.
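
If you launch runs from Python, the variants above map onto an argument list like the one below. build_benchmark_cmd is a hypothetical helper; it uses only the flags shown in this guide and does not execute anything.

```python
import shlex


def build_benchmark_cmd(timelines=("short",), cell_filter=None, test_mode=False):
    """Assemble the baseline benchmark command with optional variants."""
    cmd = [
        ".venv/bin/python", "scripts/benchmark/benchmark_stable.py",
        "--profiles-dir", "data/profiles",
        "--output-dir", "output/benchmark",
        "--batch", "2020_11_04_CPJUMP1",
        "--timelines", *timelines,
    ]
    if cell_filter is not None:
        cmd += ["--cell-filter", cell_filter]  # A549 or U2OS
    if test_mode:
        cmd.append("--test-mode")  # smaller smoke test
    return cmd


# Print the command; pass the list to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(build_benchmark_cmd(timelines=("short", "long"), cell_filter="U2OS")))
```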

CellCLIP Benchmark

The local CellCLIP workflow has two stages:

Export benchmark-format well profiles from a pretrained CellCLIP visual encoder.
Run the same stable benchmark on those exported profiles.

1. Export CellCLIP Profiles


.venv/bin/python scripts/cellclip/export_cellclip_profiles.py \
  --ckpt-path /path/to/cellclip_checkpoint.pt \
  --feature-root data/features_cellclip_base \
  --output-profiles-root data/profiles_cellclip_hf \
  --batch 2020_11_04_CPJUMP1 \
  --timelines short

Notes:

If --ckpt-path is omitted, the script falls back to the checkpoint settings in configs/benchmark.yml.
The default output root is data/profiles_cellclip_hf.
This step expects precomputed per-site tensors in data/features_cellclip_base unless you override --feature-root.

2. Benchmark the Exported CellCLIP Profiles


.venv/bin/python scripts/benchmark/benchmark_stable.py \
  --profiles-dir data/profiles_cellclip_hf \
  --output-dir output/benchmark_cellclip_hf \
  --batch 2020_11_04_CPJUMP1 \
  --timelines short \
  --batch-correction \
  --pca-kernel linear \
  --pca-n-components 500

Notes:

--batch-correction enables a local, evaluation-time correction (KernelPCA + StandardScaler) that is fit on pooled negative controls from the selected plates.
Drop --batch-correction if you want the raw exported CellCLIP profile benchmark.
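
The key idea behind the correction is "fit on negative controls, apply to every well". The dependency-free sketch below illustrates only the standardization half of that idea; it omits the KernelPCA step entirely and is not the code used by benchmark_stable.py. The function names and toy values are hypothetical.

```python
from statistics import fmean, pstdev


def fit_control_scaler(control_rows):
    """Per-feature mean and population std, computed from negative-control rows only."""
    columns = list(zip(*control_rows))
    means = [fmean(col) for col in columns]
    # A zero std is replaced by 1.0 so constant features do not divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, means, stds):
    """Standardize any rows (controls or treated wells) with the control statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy data: three pooled negative-control wells and one treated well, two features.
controls = [[1.0, 10.0], [3.0, 14.0], [5.0, 18.0]]
treated = [[7.0, 14.0]]

means, stds = fit_control_scaler(controls)
scaled_controls = apply_scaler(controls, means, stds)
scaled_treated = apply_scaler(treated, means, stds)
print(scaled_treated)
```

After scaling, the controls have zero mean and unit variance per feature, and treated wells are expressed as deviations from the control distribution.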

Optional: One-Step CellCLIP Pipeline

If you need to extract features, export CellCLIP profiles, and clean caches plate by plate in one run, use:


.venv/bin/python scripts/cellclip/run_cellclip_pipeline.py \
  --ckpt-path /path/to/cellclip_checkpoint.pt \
  --batch 2020_11_04_CPJUMP1 \
  --timelines short

This is the heavier end-to-end path. If you already have per-site feature tensors, export_cellclip_profiles.py is the simpler entrypoint.

Optional: Compare Baseline vs CellCLIP

After both runs finish, generate comparison tables and plots with:


.venv/bin/python scripts/benchmark/compare_benchmark.py \
  --baseline-dir output/benchmark \
  --candidate-dir output/benchmark_cellclip_hf \
  --output-dir output/benchmark_comparison

Recommended Workflow

For a normal local comparison run:

Run the baseline benchmark on data/profiles.
Export CellCLIP profiles to data/profiles_cellclip_hf.
Run the benchmark on data/profiles_cellclip_hf.
Optionally run compare_benchmark.py to summarize the delta.
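
The four steps can also be driven from one small script. The sketch below only assembles and prints the commands, copied from the sections above; workflow_commands is a hypothetical helper, and passing the same --timelines values to every script is an assumption.

```python
import shlex

PY = ".venv/bin/python"
BATCH = "2020_11_04_CPJUMP1"


def workflow_commands(ckpt_path, timelines=("short",)):
    """Return the baseline, export, CellCLIP benchmark, and comparison commands in order."""
    common = ["--batch", BATCH, "--timelines", *timelines]
    return [
        [PY, "scripts/benchmark/benchmark_stable.py",
         "--profiles-dir", "data/profiles",
         "--output-dir", "output/benchmark", *common],
        [PY, "scripts/cellclip/export_cellclip_profiles.py",
         "--ckpt-path", ckpt_path,
         "--feature-root", "data/features_cellclip_base",
         "--output-profiles-root", "data/profiles_cellclip_hf", *common],
        [PY, "scripts/benchmark/benchmark_stable.py",
         "--profiles-dir", "data/profiles_cellclip_hf",
         "--output-dir", "output/benchmark_cellclip_hf", *common,
         "--batch-correction", "--pca-kernel", "linear", "--pca-n-components", "500"],
        [PY, "scripts/benchmark/compare_benchmark.py",
         "--baseline-dir", "output/benchmark",
         "--candidate-dir", "output/benchmark_cellclip_hf",
         "--output-dir", "output/benchmark_comparison"],
    ]


for cmd in workflow_commands("/path/to/cellclip_checkpoint.pt"):
    # Swap print for subprocess.run(cmd, check=True) to execute each step in order.
    print(shlex.join(cmd))
```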