Tutorial

Welcome to the EuBI-Bridge conversion tutorial. Here we demonstrate how to convert batches of image datasets to OME-Zarr using the EuBI-Bridge CLI.

EuBI-Bridge supports two different conversion modes: unary (one-to-one) and aggregative (multiple-to-one) conversion. Unary conversion converts each input file to a single OME-Zarr container, whereas aggregative conversion concatenates input images along specified dimensions. Below we explain each of these modes with examples.

Unary Conversion

Given a dataset structured as follows:

multichannel_timeseries
├── Channel1-T0001.tif
├── Channel1-T0002.tif
├── Channel1-T0003.tif
├── Channel1-T0004.tif
├── Channel2-T0001.tif
├── Channel2-T0002.tif
├── Channel2-T0003.tif
└── Channel2-T0004.tif

To convert each TIFF into a separate OME-Zarr container (unary conversion):

eubi to_zarr multichannel_timeseries multichannel_timeseries_zarr

To write OME-Zarr version 0.5, which is based on Zarr format 3, add the --zarr_format 3 argument to the command:

eubi to_zarr multichannel_timeseries multichannel_timeseries_zarr --zarr_format 3

This produces:

multichannel_timeseries_zarr
├── Channel1-T0001.zarr
├── Channel1-T0002.zarr
├── Channel1-T0003.zarr
├── Channel1-T0004.zarr
├── Channel2-T0001.zarr
├── Channel2-T0002.zarr
├── Channel2-T0003.zarr
└── Channel2-T0004.zarr
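
After a unary conversion, each input file should have a same-named .zarr counterpart, as in the listing above. The following is a minimal shell sketch for checking this. The function name missing_outputs is hypothetical and not part of EuBI-Bridge, and the sketch assumes that inputs end in .tif and that each output container is a directory named as shown in this tutorial.

```shell
# missing_outputs INPUT_DIR OUTPUT_DIR (hypothetical helper, not part of EuBI-Bridge)
# Prints the base name of every .tif in INPUT_DIR that has no
# matching <name>.zarr directory in OUTPUT_DIR.
missing_outputs() {
  for f in "$1"/*.tif; do
    [ -e "$f" ] || continue   # pattern matched nothing: skip the literal string
    name=$(basename "$f" .tif)
    [ -d "$2/$name.zarr" ] || echo "$name"
  done
}
```

For the example above, calling missing_outputs multichannel_timeseries multichannel_timeseries_zarr should print nothing once all eight containers exist.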

Use wildcards to convert only the images belonging to Channel1:

eubi to_zarr "multichannel_timeseries/Channel1*" multichannel_timeseries_channel1_zarr

This produces:

multichannel_timeseries_channel1_zarr
├── Channel1-T0001.zarr
├── Channel1-T0002.zarr
├── Channel1-T0003.zarr
└── Channel1-T0004.zarr
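
Before launching a conversion, it can help to preview which files a wildcard selects. The following is a minimal sketch using only standard shell tools. It recreates the tutorial's file names as empty placeholder files in a scratch directory (an assumption for illustration; they are not valid TIFFs) and lists what Channel1* matches. EuBI-Bridge's own pattern matching may differ in details.

```shell
# Scratch directory with empty placeholders named like the tutorial dataset
# (hypothetical stand-ins, not real TIFF images).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/multichannel_timeseries"
for c in 1 2; do
  for t in 1 2 3 4; do
    touch "$demo_dir/multichannel_timeseries/Channel${c}-T000${t}.tif"
  done
done

# Let the shell expand the pattern and print one matching path per line.
matches=$(cd "$demo_dir" && printf '%s\n' multichannel_timeseries/Channel1*)
echo "$matches"
```

Only the Channel1 files should be listed; the Channel2 files are left out.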

Aggregative Conversion (Concatenation Along Dimensions)

To concatenate images along specific dimensions, EuBI-Bridge must be told which filename patterns (tags) identify each dimension. In this example, the tag for the channel dimension is Channel, which is followed by the channel index, and the tag for the time dimension is T, which is followed by the time index.

To concatenate along the time dimension:

eubi to_zarr multichannel_timeseries multichannel_timeseries_concat_zarr --channel_tag Channel --time_tag T --concatenation_axes t

Output:

multichannel_timeseries_concat_zarr
├── Channel1-T_tset.zarr
└── Channel2-T_tset.zarr

Important note: if --channel_tag were not provided, the tool would not be aware of the multiple channels in the dataset and would try to concatenate all images into a single one-channel OME-Zarr. Therefore, when an aggregative conversion is performed, every dimension present in the input file names must be specified via its respective tag, even if it is not a concatenation axis.
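
The effect of the tags on grouping can be modelled with standard shell tools. The sketch below is only a rough illustration of the behaviour described above, not EuBI-Bridge's implementation: it uses empty placeholder files named like the tutorial dataset and masks the indices with sed to show which files would share an output.

```shell
# Empty placeholders named like the tutorial dataset (not real TIFF images).
demo_dir=$(mktemp -d)
for c in 1 2; do
  for t in 1 2 3 4; do
    touch "$demo_dir/Channel${c}-T000${t}.tif"
  done
done

# Bare file names, one per line.
names=$(cd "$demo_dir" && printf '%s\n' *.tif)

# Channel and time both known: masking only the time index
# leaves one group per channel.
groups_by_channel=$(echo "$names" | sed -E 's/T[0-9]+/T/' | sort -u)

# Channel unknown: modelled here by masking the channel index as well,
# so every file lands in the same group.
groups_no_channel=$(echo "$names" | sed -E 's/Channel[0-9]+/Channel/; s/T[0-9]+/T/' | sort -u)

echo "$groups_by_channel"
echo "$groups_no_channel"
```

In this model, the first listing has one entry per channel, matching the two outputs shown above, while the second collapses to a single entry.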

For multidimensional concatenation (channel + time):

eubi to_zarr multichannel_timeseries multichannel_timeseries_concat_zarr --channel_tag Channel --time_tag T --concatenation_axes ct

Note that both axes are specified via the argument --concatenation_axes ct.

Output:

multichannel_timeseries_concat_zarr
└── Channel_cset-T_tset.zarr

Handling Nested Directories

For datasets stored in nested directories such as:

multichannel_timeseries_nested
├── Channel1
│   ├── T0001.tif
│   ├── T0002.tif
│   ├── T0003.tif
│   └── T0004.tif
└── Channel2
    ├── T0001.tif
    ├── T0002.tif
    ├── T0003.tif
    └── T0004.tif

EuBI-Bridge automatically detects the nested structure. To concatenate along both channel and time dimensions:

eubi to_zarr multichannel_timeseries_nested multichannel_timeseries_nested_concat_zarr --channel_tag Channel --time_tag T --concatenation_axes ct

Output:

multichannel_timeseries_nested_concat_zarr
└── Channel_cset-T_tset.zarr

To concatenate along the channel dimension only:

eubi to_zarr multichannel_timeseries_nested multichannel_timeseries_nested_concat_zarr --channel_tag Channel --time_tag T --concatenation_axes c

Output:

multichannel_timeseries_nested_concat_zarr
├── Channel_cset-T0001.zarr
├── Channel_cset-T0002.zarr
├── Channel_cset-T0003.zarr
└── Channel_cset-T0004.zarr

Selective Data Conversion

To recursively select specific files for conversion, wildcard patterns can be used. For example, to concatenate only timepoint 3 along the channel dimension:

eubi to_zarr "multichannel_timeseries_nested/**/*T0003*" multichannel_timeseries_nested_concat_zarr --channel_tag Channel --time_tag T --concatenation_axes c

Output:

multichannel_timeseries_nested_concat_zarr
└── Channel_cset-T0003.zarr

Note: When using wildcards, the input path must be enclosed in quotes, as in the example above, so that the shell does not expand the pattern before EuBI-Bridge receives it.
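
The difference that quoting makes can be observed directly in the shell. The following sketch assumes a scratch copy of the nested layout made of empty placeholder files, and a hypothetical helper count_args that reports how many arguments a command would receive.

```shell
# Nested scratch layout with empty placeholders (not real TIFF images).
demo_dir=$(mktemp -d)
for ch in Channel1 Channel2; do
  mkdir "$demo_dir/$ch"
  for t in 1 2 3 4; do
    touch "$demo_dir/$ch/T000${t}.tif"
  done
done

# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

# Unquoted: the shell replaces the pattern with the matching paths.
unquoted=$(count_args "$demo_dir"/**/*T0003*)
# Quoted: the pattern is passed through as one literal string.
quoted=$(count_args "$demo_dir/**/*T0003*")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

Without quotes, the command receives one argument per matching file, whereas the quoted form hands over the pattern itself as a single argument.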

Handling Categorical Dimension Patterns

For datasets in which channels are identified by categorical names rather than indices, such as:

blueredchannels_timeseries
├── Blue-T0001.tif
├── Blue-T0002.tif
├── Blue-T0003.tif
├── Blue-T0004.tif
├── Red-T0001.tif
├── Red-T0002.tif
├── Red-T0003.tif
└── Red-T0004.tif

Specify categorical names as a comma-separated list:

eubi to_zarr blueredchannels_timeseries blueredchannels_timeseries_concat_zarr --channel_tag Blue,Red --time_tag T --concatenation_axes ct

Output:

blueredchannels_timeseries_concat_zarr
└── BlueRed_cset-T_tset.zarr

Note that the categorical names are joined together (here, BlueRed) in the name of the output OME-Zarr.

With a nested input structure such as:

blueredchannels_timeseries_nested
├── Blue
│   ├── T0001.tif
│   ├── T0002.tif
│   ├── T0003.tif
│   └── T0004.tif
└── Red
    ├── T0001.tif
    ├── T0002.tif
    ├── T0003.tif
    └── T0004.tif

The same command can be used, changing only the input and output paths:

eubi to_zarr blueredchannels_timeseries_nested blueredchannels_timeseries_nested_concat_zarr --channel_tag Blue,Red --time_tag T --concatenation_axes ct

Output:

blueredchannels_timeseries_nested_concat_zarr
└── BlueRed_cset-T_tset.zarr