CPJUMP1 Overview

CPJUMP Dataset Structure

Endpoint: s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/

The following relative paths are relevant to model training, using 2020_11_04_CPJUMP1 as the batch:

  • images/2020_11_04_CPJUMP1/images/
  • workspace/metadata/platemaps/2020_11_04_CPJUMP1/
  • workspace/metadata/external_metadata/

Example Plate S3 URI: s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/
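As a minimal sketch, the endpoint and relative paths above can be joined into full S3 URIs. The helper names `batch_uris` and `plate_uri` are hypothetical and introduced only for illustration. The image prefix uses the lowercase `images/` spelling from the example URI, since S3 keys are case-sensitive.

```python
# Endpoint and batch name are taken from the text above.
ENDPOINT = "s3://cellpainting-gallery/cpg0000-jump-pilot/source_4"
BATCH = "2020_11_04_CPJUMP1"


def batch_uris(endpoint: str = ENDPOINT, batch: str = BATCH) -> dict[str, str]:
    """Return the three training-relevant prefixes as full S3 URIs."""
    return {
        "images": f"{endpoint}/images/{batch}/images/",
        "platemaps": f"{endpoint}/workspace/metadata/platemaps/{batch}/",
        "external_metadata": f"{endpoint}/workspace/metadata/external_metadata/",
    }


def plate_uri(plate_folder: str) -> str:
    """Return the URI of one plate folder under the batch's image prefix."""
    return batch_uris()["images"] + plate_folder + "/"
```

For example, `plate_uri("BR00116991__2020-11-05T19_51_35-Measurement1")` yields the prefix for the first plate listed in the size summary.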

The folder for each 384-well plate typically contains images from nine sites per well (for some wells, 7, 8, or 16 sites were imaged). The (x, y) coordinates of each site are available in the Metadata_PositionX and Metadata_PositionY columns of the load_data.csv.gz files in the load_data_csv folder. There are eight images per site (five fluorescence channels and three brightfield z-planes). Image file names follow the convention rXXcXXfXXp01-chXXsk1fk1fl1.tiff, where:

  • rXX is the row number of the well that was imaged. rXX ranges from r01 to r16.
  • cXX is the column number of the well that was imaged. cXX ranges from c01 to c24.
  • fXX is the site that was imaged. fXX ranges from f01 to f16.
  • chXX is the channel that was imaged. chXX ranges from ch01 to ch08:
      • ch01: Alexa 647 (Mitochondria / MitoTracker)
      • ch02: Alexa 568 (Actin / Phalloidin)
      • ch03: Alexa 488 long (Golgi / WGA)
      • ch04: Alexa 488 (ER / Concanavalin A)
      • ch05: Hoechst 33342 (DNA / Nucleus)
      • ch06 to ch08: three brightfield z-planes
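The naming convention above can be parsed with a regular expression. The following is a sketch rather than the project's own parser. `parse_image_name` and `well_id` are hypothetical helpers, the letter-plus-column well label (A01 to P24) is an assumed convention for 384-well plates, and the pattern accepts one- or two-digit channel numbers in case files on disk are not zero-padded.

```python
import re
from typing import NamedTuple

# Channel meanings as listed in the text above.
CHANNELS = {
    1: "Alexa 647 (Mitochondria / MitoTracker)",
    2: "Alexa 568 (Actin / Phalloidin)",
    3: "Alexa 488 long (Golgi / WGA)",
    4: "Alexa 488 (ER / Concanavalin A)",
    5: "Hoechst 33342 (DNA / Nucleus)",
    6: "Brightfield z-plane 1",
    7: "Brightfield z-plane 2",
    8: "Brightfield z-plane 3",
}

_IMAGE_NAME = re.compile(
    r"^r(?P<row>\d{2})c(?P<col>\d{2})f(?P<site>\d{2})p01"
    r"-ch(?P<channel>\d{1,2})sk1fk1fl1\.tiff$"
)


class ImageName(NamedTuple):
    row: int
    col: int
    site: int
    channel: int


def parse_image_name(name: str) -> ImageName:
    """Split an image file name into row, column, site, and channel."""
    match = _IMAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected image file name: {name!r}")
    return ImageName(*(int(match[g]) for g in ("row", "col", "site", "channel")))


def well_id(row: int, col: int) -> str:
    """Assumed convention: row 1 -> 'A', ..., row 16 -> 'P', column zero-padded."""
    return f"{chr(ord('A') + row - 1)}{col:02d}"
```

Grouping files by `(row, col, site)` then gives the eight channel images that belong to one field of view.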

Cell bounding boxes and segmentation masks have not been provided.

Plate Size Summary

Generated via pdm run check-plates (51 plates, 3.31 TiB total).

Plate | Size | Files
--- | --- | ---
BR00116991__2020-11-05T19_51_35-Measurement1 | 62.72 GiB | 27651
BR00116992__2020-11-05T21_31_31-Measurement1 | 63.13 GiB | 27651
BR00116993__2020-11-05T23_11_39-Measurement1 | 62.45 GiB | 27651
BR00116994__2020-11-06T00_59_44-Measurement1 | 62.84 GiB | 27651
BR00116995__2020-11-06T02_41_05-Measurement1 | 57.56 GiB | 27651
BR00116996__2020-11-09T15_32_10-Measurement3 | 60.47 GiB | 27651
BR00116997__2020-11-06T09_19_16-Measurement1 | 62.19 GiB | 27651
BR00116998__2020-11-06T20_06_24-Measurement1 | 62.29 GiB | 27651
BR00116999__2020-11-06T21_45_58-Measurement1 | 62.64 GiB | 27651
BR00117000__2020-11-03T07_28_24-Measurement1 | 64.70 GiB | 27651
BR00117001__2020-11-03T09_07_42-Measurement1 | 64.23 GiB | 27651
BR00117002__2020-11-05T00_04_13-Measurement1 | 58.99 GiB | 27651
BR00117003__2020-11-05T05_07_49-Measurement1 | 63.65 GiB | 27651
BR00117004__2020-11-05T01_44_16-Measurement1 | 64.00 GiB | 27651
BR00117005__2020-11-05T12_53_34-Measurement2 | 62.00 GiB | 27652
BR00117006__2020-11-02T19_54_45-Measurement1 | 64.34 GiB | 27651
BR00117008__2020-11-09T00_55_40-Measurement1 | 59.37 GiB | 27651
BR00117009__2020-11-09T02_35_08-Measurement1 | 59.50 GiB | 27651
BR00117010__2020-11-08T18_18_00-Measurement1 | 58.55 GiB | 27651
BR00117011__2020-11-08T19_57_47-Measurement1 | 58.86 GiB | 27651
BR00117012__2020-11-08T14_58_34-Measurement1 | 57.80 GiB | 27651
BR00117013__2020-11-08T16_38_19-Measurement1 | 58.67 GiB | 27651
BR00117015__2020-11-10T23_51_39-Measurement1 | 107.71 GiB | 49155
BR00117016__2020-11-11T02_32_26-Measurement1 | 107.88 GiB | 49155
BR00117017__2020-11-10T18_25_46-Measurement1 | 107.91 GiB | 49155
BR00117019__2020-11-10T21_10_40-Measurement1 | 107.59 GiB | 49155
BR00117020__2020-11-04T20_45_03-Measurement1 | 62.36 GiB | 27651
BR00117021__2020-11-04T19_05_14-Measurement2 | 61.85 GiB | 27651
BR00117022__2020-11-05T14_43_33-Measurement1 | 63.32 GiB | 27651
BR00117023__2020-11-05T16_32_08-Measurement1 | 62.63 GiB | 27651
BR00117024__2020-11-06T04_20_37-Measurement1 | 60.32 GiB | 27651
BR00117025__2020-11-06T06_00_19-Measurement1 | 60.26 GiB | 27651
BR00117026__2020-11-06T07_39_45-Measurement1 | 60.31 GiB | 27651
BR00117050__2020-11-09T07_34_11-Measurement1 | 61.48 GiB | 27651
BR00117051__2020-11-09T09_14_02-Measurement1 | 63.08 GiB | 27651
BR00117052__2020-11-09T04_14_50-Measurement1 | 63.31 GiB | 27651
BR00117053__2020-11-09T05_54_45-Measurement1 | 63.49 GiB | 27651
BR00117054__2020-11-08T21_37_22-Measurement1 | 63.52 GiB | 27651
BR00117055__2020-11-08T23_16_47-Measurement1 | 63.67 GiB | 27651
BR00118039__2020-11-02T18_16_01-Measurement1 | 61.91 GiB | 27651
BR00118040__2020-11-02T23_13_02-Measurement1 | 62.12 GiB | 27651
BR00118041__2020-11-05T03_24_00-Measurement1 | 64.02 GiB | 27651
BR00118042__2020-11-04T22_24_50-Measurement1 | 118.47 GiB | 55302
BR00118043__2020-11-05T18_12_02-Measurement2 | 64.73 GiB | 27651
BR00118044__2020-11-03T23_15_33-Measurement1 | 64.27 GiB | 27651
BR00118045__2020-11-03T05_49_34-Measurement1 | 61.73 GiB | 27651
BR00118046__2020-11-03T00_51_55-Measurement1 | 61.26 GiB | 27651
BR00118047__2020-11-03T02_31_55-Measurement1 | 60.13 GiB | 27651
BR00118048__2020-11-03T04_10_30-Measurement1 | 60.39 GiB | 27651
BR00118049__2020-11-02T16_37_05-Measurement1 | 58.17 GiB | 27651
BR00118050__2020-11-02T21_33_56-Measurement1 | 63.04 GiB | 27651
TOTAL | 3391.89 GiB (3.31 TiB) | 1,523,869
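Each plate folder name in the table combines a plate barcode, a timestamp, and a measurement index. The sketch below splits a folder name into those three parts. `parse_plate_folder` is a hypothetical helper, and the pattern is inferred only from the names listed here, so it may not cover other batches.

```python
import re
from datetime import datetime

# Pattern inferred from names such as
# "BR00116996__2020-11-09T15_32_10-Measurement3".
_PLATE_FOLDER = re.compile(
    r"^(?P<barcode>BR\d+)__"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}_\d{2}_\d{2})"
    r"-Measurement(?P<measurement>\d+)$"
)


def parse_plate_folder(name: str) -> tuple[str, datetime, int]:
    """Return (barcode, timestamp, measurement index) for a plate folder name."""
    match = _PLATE_FOLDER.match(name)
    if match is None:
        raise ValueError(f"unexpected plate folder name: {name!r}")
    stamp = datetime.strptime(match["stamp"], "%Y-%m-%dT%H_%M_%S")
    return match["barcode"], stamp, int(match["measurement"])
```

Keying on the barcode alone makes it easier to join folder names against platemap metadata, while the measurement index distinguishes plates such as BR00116996 whose folder ends in a value other than Measurement1.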

Notable observations:

  • Most plates are ~58-65 GiB with 27,651 files each (8 channels x 9 sites x 384 wells = 27,648 images, plus 3 other files)
  • 4 plates (BR00117015, BR00117016, BR00117017, BR00117019) are roughly 1.7x larger (~108 GiB, 49,155 files), consistent with 16 sites per well (8 x 16 x 384 = 49,152 images, plus 3 other files)
  • BR00118042 is the largest single plate at 118.47 GiB (55,302 files, exactly twice the usual 27,651)
  • BR00117005 has one extra file (27,652 vs 27,651)
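The file counts in these observations can be checked with a little arithmetic. This sketch assumes every well on a plate has the same number of sites, which the dataset description says is only typically true. `expected_images` and `extra_files` are hypothetical helper names.

```python
CHANNELS_PER_SITE = 8    # 5 fluorescent + 3 brightfield, per the text above
WELLS_PER_PLATE = 384


def expected_images(sites_per_well: int) -> int:
    """Image count for a plate with a uniform number of sites per well."""
    return CHANNELS_PER_SITE * sites_per_well * WELLS_PER_PLATE


def extra_files(observed_files: int, sites_per_well: int) -> int:
    """Files beyond the expected image count (positive means surplus)."""
    return observed_files - expected_images(sites_per_well)


# Standard plates: 9 sites per well gives 27,648 images, 3 fewer than observed.
standard_surplus = extra_files(27651, sites_per_well=9)

# Larger plates: 16 sites per well gives 49,152 images, again 3 fewer than observed.
large_surplus = extra_files(49155, sites_per_well=16)
```

Both plate types show the same surplus of three files over the computed image count, and the 55,302 files in BR00118042 equal exactly 2 x 27,651.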

Access Methods

rclone

List directories

rclone lsd :s3,provider=AWS,region=us-east-1,no_check_bucket=true:cellpainting-gallery/cpg0000-jump-pilot/source_4/
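The same listing can be driven from Python by shelling out to rclone. This is a sketch that assumes `rclone` is installed and on the PATH. `rclone_lsd_command` and `list_dirs` are hypothetical helpers, and the output parsing relies on `rclone lsd` printing the directory name as the last of five whitespace-separated columns.

```python
import subprocess

# Connection string and bucket path copied from the rclone command above.
REMOTE = ":s3,provider=AWS,region=us-east-1,no_check_bucket=true:"
ROOT = "cellpainting-gallery/cpg0000-jump-pilot/source_4/"


def rclone_lsd_command(relative_path: str = "") -> list[str]:
    """Build the argument list for `rclone lsd` on a path under ROOT."""
    return ["rclone", "lsd", REMOTE + ROOT + relative_path]


def list_dirs(relative_path: str = "") -> list[str]:
    """Run rclone and return the directory names it reports (needs network)."""
    result = subprocess.run(
        rclone_lsd_command(relative_path),
        check=True,
        capture_output=True,
        text=True,
    )
    # Each line is: size, date, time, object count, name.
    return [
        line.split(maxsplit=4)[-1]
        for line in result.stdout.splitlines()
        if line.strip()
    ]
```

Calling `list_dirs("images/2020_11_04_CPJUMP1/images/")` would list the plate folders for the batch. Passing the arguments as a list avoids shell quoting issues with the commas and colons in the remote string.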