Dataset

Class for loading datasets stored as separate files in a nested folder structure.

class registration_tools.dataset.Dataset(data, axis_data, axis_files, scale=None, h5_key=None)[source]

Bases: object

A class to represent an image dataset of files saved in several folders.

Parameters:
  • data (nested list or str) – The data of the dataset, which can be a list of paths with regex patterns or a numpy array.

  • axis_data (str) – A string representing the format of the folder nested structure. Each character must appear only once.

  • axis_files (str) – A string representing the format of the file dimensions. Each character must appear only once.

  • scale (tuple, optional) – A tuple representing the scale for the spatial dimensions. Must be the same length as the number of spatial dimensions. If None, the scale is set to (1, 1, …, 1). Defaults to None.

  • h5_key (str, optional) – The key to read data from an h5 file, if applicable.
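The rules for axis_data, axis_files and scale can be illustrated with a small standalone sketch. It does not call registration_tools: the helper name describe_axes and the assumption that the spatial axes are the characters X, Y and Z are hypothetical, chosen only to mirror the constraints listed above.

```python
def describe_axes(axis_data, axis_files, scale=None, spatial_chars="XYZ"):
    """Hypothetical helper mirroring the documented parameter rules."""
    # Each character must appear only once within each axis string.
    for name, value in (("axis_data", axis_data), ("axis_files", axis_files)):
        if len(set(value)) != len(value):
            raise ValueError(f"{name} contains a repeated character: {value!r}")

    # Combined format: folder (data) axes first, then file axes.
    axis = axis_data + axis_files

    # Assumption: spatial axes are whichever of X, Y, Z appear in the format.
    axis_spatial = "".join(c for c in axis if c in spatial_chars)

    if scale is None:
        # Documented default: one unit per spatial dimension.
        scale = (1,) * len(axis_spatial)
    elif len(scale) != len(axis_spatial):
        raise ValueError(
            f"scale has {len(scale)} entries but there are "
            f"{len(axis_spatial)} spatial dimensions"
        )
    return {"axis": axis, "axis_spatial": axis_spatial, "scale": tuple(scale)}


info = describe_axes("CT", "ZYX", scale=(2.0, 1.0, 1.0))
print(info)
```

Passing a scale with the wrong number of entries, or an axis string such as 'ZZY', raises ValueError in this sketch; how the class itself reports such errors is not stated here.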

Examples

Given a dataset with the following structure:

  • main_folder
    • ch1
      • file_t1.tif

      • file_t2.tif

    • ch2
      • file_t1.tif

      • file_t2.tif

where each file is a 3D image saved in 'ZYX' format with scale (2., 1., 1.), you can create a Dataset object with the following code:

dataset = Dataset(data=['ch1/file{0:03d}.tif', 'ch2/file{0:03d}.tif'], axis_data='CT', axis_files='ZYX', scale=(2., 1., 1.))
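The placeholder {0:03d} in the paths above is written in Python's format-string syntax, where it stands for an integer zero-padded to three digits. The following sketch only shows what str.format produces for such patterns. The time indices are hypothetical, and how Dataset expands the patterns internally is not shown in this reference.

```python
patterns = ["ch1/file{0:03d}.tif", "ch2/file{0:03d}.tif"]
time_points = [1, 2]  # hypothetical time indices, for illustration only

# Outer list follows the channel axis 'C', inner list the time axis 'T',
# matching axis_data='CT'.
expanded = [[pattern.format(t) for t in time_points] for pattern in patterns]

for channel_paths in expanded:
    print(channel_paths)
```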

shape

The shape of the dataset including both data and file dimensions.

Type:

tuple

scale

The scale of the spatial dimensions.

Type:

tuple

_data

The data of the dataset, expanded from regex patterns or file paths.

Type:

np.ndarray

_axis

The combined format of data and file dimensions.

Type:

str

_n_axis

The total number of dimensions.

Type:

int

_axis_data

The format of the folder structure.

Type:

str

_n_axis_data

The number of data dimensions.

Type:

int

_axis_files

The format of the file dimensions.

Type:

str

_n_axis_files

The number of file dimensions.

Type:

int

_axis_spatial

The spatial dimensions in the dataset.

Type:

str

_n_axis_spatial

The number of spatial dimensions.

Type:

int

dtype

The data type of the images.

Type:

np.dtype

_h5_key

The key to read data from an h5 file, if applicable.

Type:

str

to_zarr(file, flavor='dask', **kwargs)[source]

Save the dataset to a Zarr file.

Parameters:
  • file (str or store) – The file path or store to save the Zarr file.

  • flavor (str, optional) – The method to use for saving. Options are ‘legacy’ or ‘dask’. Defaults to ‘dask’.

  • **kwargs – Additional keyword arguments to pass to the respective saving method.

Raises:
  • ValueError – If an invalid flavor is provided.
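The flavor argument selects between two saving methods and rejects anything else. A minimal sketch of that dispatch pattern is given below. The functions save_legacy, save_dask and to_zarr_sketch are hypothetical stand-ins that write nothing; they only show how a flavor string can be mapped to a method and how an unknown value can raise ValueError.

```python
def save_legacy(file, **kwargs):
    """Hypothetical stand-in for the sequential saving method."""
    return ("legacy", file, kwargs)


def save_dask(file, **kwargs):
    """Hypothetical stand-in for the Dask-based saving method."""
    return ("dask", file, kwargs)


def to_zarr_sketch(file, flavor="dask", **kwargs):
    """Dispatch on flavor; extra keyword arguments are forwarded unchanged."""
    savers = {"legacy": save_legacy, "dask": save_dask}
    try:
        saver = savers[flavor]
    except KeyError:
        raise ValueError(
            f"Invalid flavor {flavor!r}; expected one of {sorted(savers)}"
        ) from None
    return saver(file, **kwargs)


print(to_zarr_sketch("out.zarr"))
print(to_zarr_sketch("out.zarr", flavor="legacy", chunks=(1,)))
```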

to_zarr_dask(file, **kwargs)[source]

Save the dataset to a Zarr file using Dask for parallel processing.

Parameters:

  • file (str or MutableMapping) – The file path or MutableMapping to save the Zarr file.

  • **kwargs (dict) – Additional keyword arguments to pass to the to_zarr method.

Returns:

None

Notes:

This method uses Dask to parallelize the reading and saving of images. The _read_image function is delayed to allow for parallel execution. A progress bar is displayed using _TqdmCallback to show the saving progress.

to_zarr_legacy(file, **kwargs)[source]

Save the dataset to a Zarr array using the legacy method.

Parameters:

  • file (str or zarr.storage.Store) – The file path or Zarr store where the array will be saved.

  • **kwargs (dict) – Additional keyword arguments to pass to zarr.create.

Returns:

None

Notes:

This method saves the dataset to a Zarr array without parallel processing. It iterates over the dataset and saves each image sequentially.
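The sequential iteration described in these notes can be sketched without Zarr. In the example below, a plain dict stands in for the store, and read_image is a hypothetical reader that returns a placeholder string instead of pixel data. The sketch visits every index of the folder axes one at a time, which is the access pattern the legacy method is described as using.

```python
from itertools import product


def read_image(path):
    """Hypothetical reader; returns a placeholder instead of pixel data."""
    return f"pixels of {path}"


def save_sequentially(paths, data_shape):
    """Visit every index of the folder axes in order, one image at a time."""
    store = {}  # stand-in for a Zarr store
    for index in product(*(range(n) for n in data_shape)):
        # Walk down the nested list to the path at this index.
        path = paths
        for i in index:
            path = path[i]
        store[index] = read_image(path)
    return store


paths = [
    ["ch1/file001.tif", "ch1/file002.tif"],
    ["ch2/file001.tif", "ch2/file002.tif"],
]
store = save_sequentially(paths, data_shape=(2, 2))
print(sorted(store))
```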

registration_tools.dataset.check_dataset_structure(data)[source]

Check and print the structure of a dataset.

This function prints the shape of the dataset and checks for the presence of specific attributes (‘axis’ and ‘scale’). If these attributes are found, their values are printed. If not, a message indicating their absence is printed.

Parameters:
  • data (object) – The dataset to be checked. It is expected to have a ‘shape’ attribute and, optionally, an ‘attrs’ attribute, which is a dictionary containing ‘axis’ and ‘scale’ keys.

Returns: None
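The check described above can be approximated with a standalone sketch. The function describe_structure is hypothetical and returns its report as a list of lines rather than printing directly; the exact wording of the library's messages is not documented here, so the strings below are assumptions.

```python
from types import SimpleNamespace


def describe_structure(data):
    """Report the shape, then 'axis' and 'scale' if present in data.attrs."""
    lines = [f"shape: {data.shape}"]
    # 'attrs' is optional; treat a missing attribute as an empty mapping.
    attrs = getattr(data, "attrs", None) or {}
    for key in ("axis", "scale"):
        if key in attrs:
            lines.append(f"{key}: {attrs[key]}")
        else:
            lines.append(f"{key}: not found")
    return lines


with_attrs = SimpleNamespace(
    shape=(2, 2, 10, 64, 64),
    attrs={"axis": "CTZYX", "scale": (2.0, 1.0, 1.0)},
)
without_attrs = SimpleNamespace(shape=(10, 64, 64))

for line in describe_structure(with_attrs) + describe_structure(without_attrs):
    print(line)
```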

registration_tools.dataset.load_dataset(path, axis=None, scale=None, h5_key=None)[source]

Load a dataset from a given file path.

Parameters:
  • path (str) – The file path to the dataset. Supported formats are .npy, .h5, .hdf5, and image files.

  • axis (optional) – The axis attribute of the dataset. If not provided, it will be read from the dataset attributes.

  • scale (optional) – The scale attribute of the dataset. If not provided, it will be read from the dataset attributes.

  • h5_key (str, optional) – The key to read from an h5 file. Required if the file is in .h5 or .hdf5 format.

Returns:

The loaded dataset as a zarr array.

Return type:

zarr.core.Array
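The supported formats suggest a choice of reader based on the file extension. The sketch below illustrates that idea only: pick_loader is a hypothetical function that returns a label rather than loading anything, and raising ValueError when h5_key is missing is an assumption about how the requirement might be enforced.

```python
from pathlib import Path


def pick_loader(path, h5_key=None):
    """Return a label for the reader that would handle this path."""
    suffix = Path(path).suffix.lower()
    if suffix == ".npy":
        return "numpy"
    if suffix in (".h5", ".hdf5"):
        # The key is documented as required for HDF5 files.
        if h5_key is None:
            raise ValueError("h5_key is required for .h5 and .hdf5 files")
        return f"hdf5[{h5_key}]"
    # Anything else is treated as an image file in this sketch.
    return "image"


print(pick_loader("volume.npy"))
print(pick_loader("volume.h5", h5_key="raw"))
print(pick_loader("volume.tif"))
```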

registration_tools.dataset.show_dataset_structure(folder_path, indent=0, max_files=6)[source]

Recursively prints the folder structure.

Parameters:
  • folder_path (str) – The path to the folder to display.

  • indent (int) – The indentation level for nested folders.

  • max_files (int) – The maximum number of files to display per folder.

Returns: None
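A recursive listing with a per-folder file limit can be sketched with the standard library. The function tree_lines below is a hypothetical approximation: it returns the lines instead of printing them, and the two-space indentation step and the "..." marker for omitted files are assumptions, not the library's documented output.

```python
import os
import tempfile


def tree_lines(folder_path, indent=0, max_files=6):
    """Collect an indented listing of folder_path, recursing into subfolders."""
    name = os.path.basename(os.path.normpath(folder_path))
    lines = [" " * indent + name + "/"]
    entries = sorted(os.listdir(folder_path))
    folders = [e for e in entries if os.path.isdir(os.path.join(folder_path, e))]
    files = [e for e in entries if not os.path.isdir(os.path.join(folder_path, e))]
    for sub in folders:
        lines.extend(tree_lines(os.path.join(folder_path, sub), indent + 2, max_files))
    # Show at most max_files files, then mark that the rest were omitted.
    for file_name in files[:max_files]:
        lines.append(" " * (indent + 2) + file_name)
    if len(files) > max_files:
        lines.append(" " * (indent + 2) + "...")
    return lines


# Build a small temporary folder tree and list it with a limit of two files.
with tempfile.TemporaryDirectory() as root:
    main = os.path.join(root, "main_folder")
    for channel in ("ch1", "ch2"):
        os.makedirs(os.path.join(main, channel))
        for t in (1, 2, 3):
            open(os.path.join(main, channel, f"file_t{t}.tif"), "w").close()
    demo = tree_lines(main, max_files=2)

print("\n".join(demo))
```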