Benchmark Guide

This repo uses a local, stable CPJUMP1 benchmark workflow built around scripts/benchmark/benchmark_stable.py.

The benchmark measures three tasks on CPJUMP1:

  • Replicability: whether replicate wells from the same perturbation match each other.
  • Within-modality target matching: whether perturbations in the same modality that hit the same target match each other.
  • Cross-modality matching: whether compounds and genetic perturbations that share a target match each other.
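The three task definitions above can be read as rules for when two wells count as a positive pair. The sketch below encodes those rules only; it does not show how the benchmark scores similarity. The `Well` fields and modality labels are hypothetical names chosen for illustration, and it assumes that within-modality matching pairs distinct perturbations.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed labels: ORF and CRISPR are treated as the "genetic" modalities.
GENETIC = {"orf", "crispr"}


@dataclass(frozen=True)
class Well:
    # Hypothetical field names, not the benchmark's actual metadata columns.
    well_id: str
    perturbation: str
    modality: str  # "compound", "orf", or "crispr"
    target: str


def positive_pair_task(a: Well, b: Well) -> Optional[str]:
    """Return the task for which wells a and b form a positive pair, if any."""
    if a.well_id == b.well_id:
        return None  # a well is never paired with itself
    if a.modality == b.modality and a.perturbation == b.perturbation:
        return "replicability"  # replicate wells of the same perturbation
    if a.target != b.target:
        return None
    if a.modality == b.modality:
        return "within_modality"  # different perturbations, same modality and target
    modalities = {a.modality, b.modality}
    if "compound" in modalities and modalities & GENETIC:
        return "cross_modality"  # compound vs. genetic perturbation, same target
    return None  # e.g. ORF vs. CRISPR is not covered by the task list above
```

Under these rules, an ORF well and a CRISPR well that share a target return `None`, because the task list only names compound-to-genetic matching.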

The stable workflow filters the data to batch 2020_11_04_CPJUMP1, density 100, and antibiotics absent, then groups time points under two timeline labels:

  • short: compound 24h, ORF 48h, CRISPR 96h
  • long: compound 48h, ORF 96h, CRISPR 144h
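The same number of hours maps to a different label depending on the modality (48h is "long" for compounds but "short" for ORFs), so a lookup table helps. The sketch below encodes the filter and the labels listed above. The record keys (`batch`, `density`, `antibiotics`, `modality`, `hours`) are hypothetical, not the workflow's actual metadata columns.

```python
# Hours per modality for each timeline label, as listed above.
TIMELINES = {
    "short": {"compound": 24, "orf": 48, "crispr": 96},
    "long": {"compound": 48, "orf": 96, "crispr": 144},
}

# Fixed filter of the stable workflow (key names are assumptions).
STABLE_FILTER = {
    "batch": "2020_11_04_CPJUMP1",
    "density": 100,
    "antibiotics": "absent",
}


def timeline_label(modality, hours):
    """Return "short" or "long" for a modality/time-point pair, else None."""
    for label, hours_by_modality in TIMELINES.items():
        if hours_by_modality.get(modality) == hours:
            return label
    return None


def keep_record(record, timelines=("short",)):
    """True if a metadata record passes the stable filter and a requested timeline."""
    if any(record.get(key) != value for key, value in STABLE_FILTER.items()):
        return False
    return timeline_label(record["modality"], record["hours"]) in timelines
```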

Inputs and Outputs

Baseline benchmark input:

  • Profiles under data/profiles/<batch>/<plate>/*_normalized_feature_select_negcon_batch.csv.gz
  • Metadata under output/benchmark/input/
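To check that the profile directory matches this layout before starting a run, a small discovery helper can be handy. The following is a sketch, not part of the repo: `find_profiles` is a hypothetical helper that only relies on the path pattern given above.

```python
from pathlib import Path

# File suffix taken from the profile path pattern above.
PROFILE_SUFFIX = "_normalized_feature_select_negcon_batch.csv.gz"


def find_profiles(profiles_dir, batch):
    """Map each plate directory name to its matching profile files.

    Looks for <profiles_dir>/<batch>/<plate>/*<PROFILE_SUFFIX>.
    """
    batch_dir = Path(profiles_dir) / batch
    found = {}
    for path in sorted(batch_dir.glob(f"*/*{PROFILE_SUFFIX}")):
        found.setdefault(path.parent.name, []).append(path)
    return found


if __name__ == "__main__":
    profiles = find_profiles("data/profiles", "2020_11_04_CPJUMP1")
    for plate, paths in profiles.items():
        print(plate, len(paths))
```

An empty result usually means the batch name or the profiles root does not match the directory layout.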

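CellCLIP benchmark input:

  • Exported profiles under data/profiles_cellclip_hf (the default output root of export_cellclip_profiles.py)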

Benchmark outputs:

  • CSVs in output/<run_name>/
  • Summary tables in output/<run_name>/tables/
  • Figures in output/<run_name>/figures/

Baseline Benchmark

Run the baseline CellProfiler-style benchmark on the default profile directory:

.venv/bin/python scripts/benchmark/benchmark_stable.py \
    --profiles-dir data/profiles \
    --output-dir output/benchmark \
    --batch 2020_11_04_CPJUMP1 \
    --timelines short

Useful variants:

  • Pass --timelines short long instead of --timelines short to run both timelines (the full benchmark).
  • Add --cell-filter A549 or --cell-filter U2OS to restrict to one cell type.
  • Add --test-mode to run a smaller smoke test.
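When scripting several of these variants, it can be easier to build the argument list in Python than to edit shell commands by hand. The sketch below uses only the flags shown in this guide. `build_benchmark_command` is a hypothetical helper, and its defaults mirror the baseline command above.

```python
import shlex


def build_benchmark_command(
    profiles_dir="data/profiles",
    output_dir="output/benchmark",
    batch="2020_11_04_CPJUMP1",
    timelines=("short",),
    cell_filter=None,
    test_mode=False,
    python=".venv/bin/python",
):
    """Return the benchmark_stable.py invocation as an argument list."""
    cmd = [
        python,
        "scripts/benchmark/benchmark_stable.py",
        "--profiles-dir", profiles_dir,
        "--output-dir", output_dir,
        "--batch", batch,
        "--timelines", *timelines,
    ]
    if cell_filter is not None:
        cmd += ["--cell-filter", cell_filter]  # e.g. "A549" or "U2OS"
    if test_mode:
        cmd.append("--test-mode")  # smaller smoke test
    return cmd


if __name__ == "__main__":
    cmd = build_benchmark_command(timelines=("short", "long"), cell_filter="U2OS")
    print(shlex.join(cmd))  # inspect the command before running it
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repo root once the printed command looks right.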

CellCLIP Benchmark

The local CellCLIP workflow has two stages:

  1. Export benchmark-format well profiles from a pretrained CellCLIP visual encoder.
  2. Run the same stable benchmark on those exported profiles.

1. Export CellCLIP Profiles

.venv/bin/python scripts/cellclip/export_cellclip_profiles.py \
    --ckpt-path /path/to/cellclip_checkpoint.pt \
    --feature-root data/features_cellclip_base \
    --output-profiles-root data/profiles_cellclip_hf \
    --batch 2020_11_04_CPJUMP1 \
    --timelines short

Notes:

  • If --ckpt-path is omitted, the script falls back to the checkpoint settings in configs/benchmark.yml.
  • The default output root is data/profiles_cellclip_hf.
  • This step expects precomputed per-site tensors in data/features_cellclip_base unless you override --feature-root.
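Going from per-site tensors to well profiles implies some aggregation across the sites of each well. This guide does not say which aggregation the export script uses, so the sketch below assumes plain mean pooling, purely for illustration. `aggregate_sites_to_wells` and its input format are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_sites_to_wells(site_embeddings):
    """Mean-pool per-site vectors into one vector per well.

    site_embeddings: iterable of (well_id, vector) pairs, where each vector
    is a sequence of floats of equal length. Mean pooling is an assumption,
    not necessarily what export_cellclip_profiles.py does.
    """
    by_well = defaultdict(list)
    for well_id, vector in site_embeddings:
        by_well[well_id].append(vector)
    return {
        well_id: [fmean(column) for column in zip(*vectors)]
        for well_id, vectors in by_well.items()
    }
```

For example, two sites of one well with vectors `[1.0, 2.0]` and `[3.0, 4.0]` pool to `[2.0, 3.0]`.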

2. Benchmark the Exported CellCLIP Profiles

.venv/bin/python scripts/benchmark/benchmark_stable.py \
    --profiles-dir data/profiles_cellclip_hf \
    --output-dir output/benchmark_cellclip_hf \
    --batch 2020_11_04_CPJUMP1 \
    --timelines short \
    --batch-correction \
    --pca-kernel linear \
    --pca-n-components 500

Notes:

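  • --batch-correction enables a local, evaluation-time correction (KernelPCA + StandardScaler) that is fit on the pooled negative controls from the selected plates.
  • --pca-kernel and --pca-n-components set the kernel and the number of components for the KernelPCA step.
  • Drop --batch-correction to benchmark the raw exported CellCLIP profiles.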
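To make the idea of a correction fit on negative controls concrete, here is a minimal sketch using scikit-learn. It assumes the KernelPCA is fit on the pooled negative controls, that the StandardScaler is then fit on their projections, and that both are applied to every profile. The actual order and details in benchmark_stable.py may differ, and the function names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler


def fit_negcon_correction(negcon_profiles, kernel="linear", n_components=500):
    """Fit KernelPCA, then StandardScaler, on negative-control profiles only."""
    # scikit-learn caps n_components at the number of negative-control rows.
    kpca = KernelPCA(n_components=n_components, kernel=kernel, remove_zero_eig=True)
    kpca.fit(negcon_profiles)
    scaler = StandardScaler().fit(kpca.transform(negcon_profiles))
    return kpca, scaler


def apply_correction(profiles, kpca, scaler):
    """Project profiles with the fitted KernelPCA and rescale them."""
    return scaler.transform(kpca.transform(profiles))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    negcon = rng.normal(size=(40, 20))   # synthetic negative-control wells
    treated = rng.normal(size=(10, 20))  # synthetic perturbed wells
    kpca, scaler = fit_negcon_correction(negcon, n_components=5)
    print(apply_correction(treated, kpca, scaler).shape)
```

Fitting on negative controls only means the correction is estimated from wells that carry no perturbation signal, so the perturbed wells do not influence the transform that is later applied to them.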

Optional: One-Step CellCLIP Pipeline

If you need to extract features, export CellCLIP profiles, and clean caches plate-by-plate in one run, use:

.venv/bin/python scripts/cellclip/run_cellclip_pipeline.py \
    --ckpt-path /path/to/cellclip_checkpoint.pt \
    --batch 2020_11_04_CPJUMP1 \
    --timelines short

This is the heavier end-to-end path. If you already have per-site feature tensors, export_cellclip_profiles.py is the simpler entrypoint.

Optional: Compare Baseline vs CellCLIP

After both runs finish, generate comparison tables and plots with:

.venv/bin/python scripts/benchmark/compare_benchmark.py \
    --baseline-dir output/benchmark \
    --candidate-dir output/benchmark_cellclip_hf \
    --output-dir output/benchmark_comparison

A typical local comparison run follows these steps:

  1. Run the baseline benchmark on data/profiles.
  2. Export CellCLIP profiles to data/profiles_cellclip_hf.
  3. Run the benchmark on data/profiles_cellclip_hf.
  4. Optionally run compare_benchmark.py to summarize the delta.
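The four steps above can be collected into a single dry-run plan. The sketch below only assembles the commands already shown in this guide and prints them. `comparison_plan` is a hypothetical helper, and the checkpoint path is a placeholder.

```python
import shlex

PY = ".venv/bin/python"
BATCH = "2020_11_04_CPJUMP1"


def comparison_plan(ckpt_path, timelines=("short",), compare=True):
    """Return the commands for a local baseline vs. CellCLIP comparison, in order."""
    common = ["--batch", BATCH, "--timelines", *timelines]
    steps = [
        # 1. Baseline benchmark on data/profiles.
        [PY, "scripts/benchmark/benchmark_stable.py",
         "--profiles-dir", "data/profiles",
         "--output-dir", "output/benchmark", *common],
        # 2. Export CellCLIP profiles.
        [PY, "scripts/cellclip/export_cellclip_profiles.py",
         "--ckpt-path", ckpt_path,
         "--feature-root", "data/features_cellclip_base",
         "--output-profiles-root", "data/profiles_cellclip_hf", *common],
        # 3. Benchmark the exported profiles with batch correction.
        [PY, "scripts/benchmark/benchmark_stable.py",
         "--profiles-dir", "data/profiles_cellclip_hf",
         "--output-dir", "output/benchmark_cellclip_hf", *common,
         "--batch-correction", "--pca-kernel", "linear",
         "--pca-n-components", "500"],
    ]
    if compare:
        # 4. Optional comparison tables and plots.
        steps.append(
            [PY, "scripts/benchmark/compare_benchmark.py",
             "--baseline-dir", "output/benchmark",
             "--candidate-dir", "output/benchmark_cellclip_hf",
             "--output-dir", "output/benchmark_comparison"]
        )
    return steps


if __name__ == "__main__":
    for step in comparison_plan("/path/to/cellclip_checkpoint.pt"):
        print(shlex.join(step))
```

To execute the plan instead of printing it, each step can be passed to `subprocess.run(step, check=True)` from the repo root, so that a failing step stops the sequence.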