Benchmark Guide
This repo uses a local, stable CPJUMP1 benchmark workflow built around
scripts/benchmark/benchmark_stable.py.
The benchmark measures three tasks on CPJUMP1:
- Replicability: whether replicate wells from the same perturbation match each other.
- Within-modality target matching: whether perturbations in the same modality that hit the same target match each other.
- Cross-modality matching: whether compounds and genetic perturbations that share a target match each other.
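The three tasks differ only in which pairs of wells count as positive matches. The following is a minimal sketch of that pairing logic, not the code in benchmark_stable.py; the record fields (well, perturbation, modality, target) and the example values are hypothetical.

```python
from itertools import combinations

# Hypothetical well records; field names and values are assumptions.
wells = [
    {"well": "A01", "perturbation": "cpd_1", "modality": "compound", "target": "GENE_A"},
    {"well": "A02", "perturbation": "cpd_1", "modality": "compound", "target": "GENE_A"},
    {"well": "B01", "perturbation": "cpd_2", "modality": "compound", "target": "GENE_A"},
    {"well": "C01", "perturbation": "guide_1", "modality": "CRISPR", "target": "GENE_A"},
    {"well": "D01", "perturbation": "orf_1", "modality": "ORF", "target": "GENE_B"},
]


def pair_type(a, b):
    """Return the task that treats this pair as a positive match, or None."""
    if a["perturbation"] == b["perturbation"]:
        return "replicability"  # replicate wells of one perturbation
    if a["target"] != b["target"]:
        return None  # different targets never match
    if a["modality"] == b["modality"]:
        return "within_modality"  # same modality, same target
    # Cross-modality: one compound paired with one genetic perturbation.
    if "compound" in (a["modality"], b["modality"]):
        return "cross_modality"
    return None  # e.g. ORF vs CRISPR is not covered by the three tasks


positives = {}
for a, b in combinations(wells, 2):
    task = pair_type(a, b)
    if task is not None:
        positives.setdefault(task, []).append((a["well"], b["well"]))

for task, pairs in positives.items():
    print(task, pairs)
```

How the real benchmark scores these pairs (similarity metric, aggregation) is not specified here and is left out of the sketch.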
The stable workflow filters to batch 2020_11_04_CPJUMP1, density 100, antibiotics absent, and uses the timeline labels:
- short: compound 24h, ORF 48h, CRISPR 96h
- long: compound 48h, ORF 96h, CRISPR 144h
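The filter and timeline labels above can be expressed as a small lookup plus a predicate. This is an illustrative sketch only: the plate record fields (batch, density, antibiotics, modality, hours) are assumed names, not the columns the benchmark script actually reads.

```python
# Timeline labels from the guide: hours per modality.
TIMELINES = {
    "short": {"compound": 24, "ORF": 48, "CRISPR": 96},
    "long": {"compound": 48, "ORF": 96, "CRISPR": 144},
}


def keep_plate(plate, timeline, batch="2020_11_04_CPJUMP1"):
    """Return True if a plate record passes the stable-workflow filter."""
    return (
        plate["batch"] == batch
        and plate["density"] == 100
        and plate["antibiotics"] == "absent"
        and TIMELINES[timeline].get(plate["modality"]) == plate["hours"]
    )


# Hypothetical plate records for illustration.
plates = [
    {"id": "P1", "batch": "2020_11_04_CPJUMP1", "density": 100,
     "antibiotics": "absent", "modality": "compound", "hours": 24},
    {"id": "P2", "batch": "2020_11_04_CPJUMP1", "density": 100,
     "antibiotics": "absent", "modality": "compound", "hours": 48},
    {"id": "P3", "batch": "2020_11_04_CPJUMP1", "density": 80,
     "antibiotics": "absent", "modality": "CRISPR", "hours": 96},
    {"id": "P4", "batch": "2020_11_04_CPJUMP1", "density": 100,
     "antibiotics": "absent", "modality": "CRISPR", "hours": 144},
]

short_ids = [p["id"] for p in plates if keep_plate(p, "short")]
long_ids = [p["id"] for p in plates if keep_plate(p, "long")]
print(short_ids, long_ids)
```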
Inputs and Outputs
Baseline benchmark input:
- Profiles under data/profiles/<batch>/<plate>/*_normalized_feature_select_negcon_batch.csv.gz
- Metadata under output/benchmark/input/
CellCLIP benchmark input:
- Benchmark-format CellCLIP profile CSVs under data/profiles_cellclip_hf/<batch>/<plate>/
- These are produced from per-site feature tensors with scripts/cellclip/export_cellclip_profiles.py
Benchmark outputs:
- CSVs in output/<run_name>/
- Summary tables in output/<run_name>/tables/
- Figures in output/<run_name>/figures/
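To check that a profile directory follows the expected layout before launching a run, something like the sketch below may help. The helper names find_profiles and output_dirs and the plate name plate_X are hypothetical; only the path patterns come from this guide.

```python
import tempfile
from pathlib import Path

PROFILE_SUFFIX = "_normalized_feature_select_negcon_batch.csv.gz"


def find_profiles(profiles_root, batch):
    """List profile files matching <root>/<batch>/<plate>/*<suffix>."""
    root = Path(profiles_root) / batch
    return sorted(root.glob(f"*/*{PROFILE_SUFFIX}"))


def output_dirs(run_name, output_root="output"):
    """Return the directories where a run's CSVs, tables, and figures go."""
    run = Path(output_root) / run_name
    return {"csv": run, "tables": run / "tables", "figures": run / "figures"}


# Demo on a throwaway directory tree with one hypothetical plate.
with tempfile.TemporaryDirectory() as tmp:
    plate_dir = Path(tmp) / "2020_11_04_CPJUMP1" / "plate_X"
    plate_dir.mkdir(parents=True)
    (plate_dir / f"plate_X{PROFILE_SUFFIX}").touch()
    (plate_dir / "plate_X_other.csv.gz").touch()  # ignored: wrong suffix
    found_names = [p.name for p in find_profiles(tmp, "2020_11_04_CPJUMP1")]

print(found_names)
print(output_dirs("benchmark")["tables"])
```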
Baseline Benchmark
Run the baseline CellProfiler-style benchmark on the default profile directory:
.venv/bin/python scripts/benchmark/benchmark_stable.py \
--profiles-dir data/profiles \
--output-dir output/benchmark \
--batch 2020_11_04_CPJUMP1 \
--timelines short
Useful variants:
- Add --timelines short long to run the full benchmark.
- Add --cell-filter A549 or --cell-filter U2OS to restrict to one cell type.
- Add --test-mode to run a smaller smoke test.
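If you launch the benchmark from Python, the variants above map naturally onto optional arguments. The wrapper below is a hypothetical convenience, not part of the repo; it only assembles the flags documented in this guide and prints the resulting command without executing it.

```python
import shlex


def build_benchmark_cmd(
    profiles_dir="data/profiles",
    output_dir="output/benchmark",
    batch="2020_11_04_CPJUMP1",
    timelines=("short",),
    cell_filter=None,
    test_mode=False,
):
    """Assemble the benchmark_stable.py argument list from the documented flags."""
    cmd = [
        ".venv/bin/python", "scripts/benchmark/benchmark_stable.py",
        "--profiles-dir", profiles_dir,
        "--output-dir", output_dir,
        "--batch", batch,
        "--timelines", *timelines,
    ]
    if cell_filter is not None:
        cmd += ["--cell-filter", cell_filter]  # A549 or U2OS
    if test_mode:
        cmd.append("--test-mode")  # smaller smoke test
    return cmd


full = build_benchmark_cmd(timelines=("short", "long"))
smoke = build_benchmark_cmd(cell_filter="U2OS", test_mode=True)
print(shlex.join(full))
print(shlex.join(smoke))
```

Pass the returned list to subprocess.run(cmd, check=True) to actually execute it.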
CellCLIP Benchmark
The local CellCLIP workflow has two stages:
- Export benchmark-format well profiles from a pretrained CellCLIP visual encoder.
- Run the same stable benchmark on those exported profiles.
1. Export CellCLIP Profiles
.venv/bin/python scripts/cellclip/export_cellclip_profiles.py \
--ckpt-path /path/to/cellclip_checkpoint.pt \
--feature-root data/features_cellclip_base \
--output-profiles-root data/profiles_cellclip_hf \
--batch 2020_11_04_CPJUMP1 \
--timelines short
Notes:
- If --ckpt-path is omitted, the script falls back to the checkpoint settings in configs/benchmark.yml.
- The default output root is data/profiles_cellclip_hf.
- This step expects precomputed per-site tensors in data/features_cellclip_base unless you override --feature-root.
2. Benchmark the Exported CellCLIP Profiles
.venv/bin/python scripts/benchmark/benchmark_stable.py \
--profiles-dir data/profiles_cellclip_hf \
--output-dir output/benchmark_cellclip_hf \
--batch 2020_11_04_CPJUMP1 \
--timelines short \
--batch-correction \
--pca-kernel linear \
--pca-n-components 500
Notes:
- --batch-correction enables the local eval-time KernelPCA + StandardScaler correction, fit on pooled negative controls from the selected plates.
- Drop --batch-correction if you want the raw exported CellCLIP profile benchmark.
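For intuition, the correction described in the notes can be sketched with scikit-learn. This is not the code in benchmark_stable.py: the function name negcon_correct, the order of operations (KernelPCA first, then StandardScaler), and fitting both transforms on negative controls only are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler


def negcon_correct(features, is_negcon, n_components=500, kernel="linear"):
    """Project all wells with a KernelPCA fit on negative controls, then rescale.

    Assumption: both transforms are fit on the pooled negative-control rows
    and then applied to every well.
    """
    negcon = features[is_negcon]
    kpca = KernelPCA(n_components=n_components, kernel=kernel).fit(negcon)
    scaler = StandardScaler().fit(kpca.transform(negcon))
    return scaler.transform(kpca.transform(features))


# Synthetic data: 30 wells, 8 features, the first 20 wells are negative controls.
rng = np.random.default_rng(0)
features = rng.normal(size=(30, 8))
is_negcon = np.zeros(30, dtype=bool)
is_negcon[:20] = True

# A small n_components keeps the toy example well-conditioned.
corrected = negcon_correct(features, is_negcon, n_components=3)
print(corrected.shape)
```

After this transform the negative controls have zero mean and unit variance in each retained component, so treated wells are expressed relative to the control distribution.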
Optional: One-Step CellCLIP Pipeline
If you need to extract features, export CellCLIP profiles, and clean caches plate-by-plate in one run, use:
.venv/bin/python scripts/cellclip/run_cellclip_pipeline.py \
--ckpt-path /path/to/cellclip_checkpoint.pt \
--batch 2020_11_04_CPJUMP1 \
--timelines short
This is the heavier end-to-end path. If you already have per-site feature tensors, export_cellclip_profiles.py is the simpler entrypoint.
Optional: Compare Baseline vs CellCLIP
After both runs finish, generate comparison tables and plots with:
.venv/bin/python scripts/benchmark/compare_benchmark.py \
--baseline-dir output/benchmark \
--candidate-dir output/benchmark_cellclip_hf \
--output-dir output/benchmark_comparison
Recommended Workflow
For a normal local comparison run:
- Run the baseline benchmark on data/profiles.
- Export CellCLIP profiles to data/profiles_cellclip_hf.
- Run the benchmark on data/profiles_cellclip_hf.
- Optionally run compare_benchmark.py to summarize the delta.
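The four steps can be chained from a small driver so that a failure in one step stops the rest. The driver below is a hypothetical sketch, not a script shipped with the repo; it reuses the commands from this guide, omits --ckpt-path on the assumption that the fallback in configs/benchmark.yml is configured, and defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

PY = ".venv/bin/python"
BATCH = "2020_11_04_CPJUMP1"

STEPS = [
    # 1. Baseline benchmark.
    [PY, "scripts/benchmark/benchmark_stable.py",
     "--profiles-dir", "data/profiles", "--output-dir", "output/benchmark",
     "--batch", BATCH, "--timelines", "short"],
    # 2. Export CellCLIP profiles (checkpoint taken from configs/benchmark.yml).
    [PY, "scripts/cellclip/export_cellclip_profiles.py",
     "--feature-root", "data/features_cellclip_base",
     "--output-profiles-root", "data/profiles_cellclip_hf",
     "--batch", BATCH, "--timelines", "short"],
    # 3. Benchmark the exported profiles with batch correction.
    [PY, "scripts/benchmark/benchmark_stable.py",
     "--profiles-dir", "data/profiles_cellclip_hf",
     "--output-dir", "output/benchmark_cellclip_hf",
     "--batch", BATCH, "--timelines", "short",
     "--batch-correction", "--pca-kernel", "linear", "--pca-n-components", "500"],
    # 4. Optional comparison.
    [PY, "scripts/benchmark/compare_benchmark.py",
     "--baseline-dir", "output/benchmark",
     "--candidate-dir", "output/benchmark_cellclip_hf",
     "--output-dir", "output/benchmark_comparison"],
]


def run_steps(steps, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on a non-zero exit code


run_steps(STEPS)  # dry run: prints the four commands in order
```

Call run_steps(STEPS, dry_run=False) from the repo root to execute the sequence.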