Dataset
Class to load datasets saved as separate files in a folder structure.
- class registration_tools.dataset.Dataset(data, axis_data, axis_files, scale=None, h5_key=None)
Bases: object
A class representing an image dataset whose files are saved in several folders.
- Parameters:
data (nested list or str) – The data of the dataset, which can be a list of paths with regex patterns or a numpy array.
axis_data (str) – A string representing the format of the folder nested structure. Each character must appear only once.
axis_files (str) – A string representing the format of the file dimensions. Each character must appear only once.
scale (tuple, optional) – A tuple representing the scale for the spatial dimensions. Must be the same length as the number of spatial dimensions. If None, the scale is set to (1, 1, …, 1). Defaults to None.
h5_key (str, optional) – The key to read data from an h5 file, if applicable.
Examples
Given a dataset with the following structure:
- main_folder
  - ch1
    file_t1.tif
    file_t2.tif
    …
  - ch2
    file_t1.tif
    file_t2.tif
    …
Each file is a 3D image saved in 'ZYX' order with scale (2., 1., 1.). You can create a Dataset object with the following code:
dataset = Dataset(data=['ch1/file{0:03d}.tif', 'ch2/file{0:03d}.tif'], axis_data='CT', axis_files='ZYX', scale=(2., 1., 1.))
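The pattern `{0:03d}` in the example is Python format-string syntax for a zero-padded index. As a rough illustration of how such patterns could map onto the 'CT' folder axes, here is a minimal sketch. `expand_patterns` is a hypothetical helper, not part of registration_tools, and how the library itself discovers files is not specified here.

```python
# Hypothetical helper (not part of registration_tools): fills each
# format-style pattern with a running time index.
def expand_patterns(patterns, n_timepoints, start=0):
    """Return one list of paths per pattern, one path per time point."""
    return [
        [pattern.format(t) for t in range(start, start + n_timepoints)]
        for pattern in patterns
    ]


patterns = ["ch1/file{0:03d}.tif", "ch2/file{0:03d}.tif"]
paths = expand_patterns(patterns, n_timepoints=3)

# The outer index follows 'C' (channel), the inner index follows 'T' (time).
for channel_paths in paths:
    print(channel_paths)
```

With `axis_data='CT'`, the nested list has one level per character: two channels, each holding three time points in this sketch.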
- shape
The shape of the dataset including both data and file dimensions.
- Type:
tuple
- scale
The scale of the spatial dimensions.
- Type:
tuple
- _data
The data of the dataset, expanded from regex patterns or file paths.
- Type:
np.ndarray
- _axis
The combined format of data and file dimensions.
- Type:
str
- _n_axis
The total number of dimensions.
- Type:
int
- _axis_data
The format of the folder structure.
- Type:
str
- _n_axis_data
The number of data dimensions.
- Type:
int
- _axis_files
The format of the file dimensions.
- Type:
str
- _n_axis_files
The number of file dimensions.
- Type:
int
- _axis_spatial
The spatial dimensions in the dataset.
- Type:
str
- _n_axis_spatial
The number of spatial dimensions.
- Type:
int
- dtype
The data type of the images.
- Type:
np.dtype
- _h5_key
The key to read data from an h5 file, if applicable.
- Type:
str
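The axis attributes above can be derived from the two format strings. The sketch below shows one plausible bookkeeping scheme. It assumes that the combined axis is the concatenation of `axis_data` and `axis_files`, and that the spatial axes are the characters X, Y and Z. Both are assumptions for illustration, and `describe_axes` is a hypothetical function, not the library's code.

```python
# Hypothetical sketch of the axis bookkeeping; not the library's implementation.
def describe_axes(axis_data, axis_files, scale=None, spatial="XYZ"):
    # Each character must appear only once in each format string.
    for name, axes in (("axis_data", axis_data), ("axis_files", axis_files)):
        if len(set(axes)) != len(axes):
            raise ValueError(f"{name} has repeated characters: {axes!r}")

    axis = axis_data + axis_files  # assumption: simple concatenation
    axis_spatial = "".join(a for a in axis if a in spatial)  # assumption: X, Y, Z

    if scale is None:
        # Default documented above: (1, 1, ..., 1), one entry per spatial axis.
        scale = (1,) * len(axis_spatial)
    elif len(scale) != len(axis_spatial):
        raise ValueError("scale must have one entry per spatial dimension")

    return {
        "axis": axis,
        "n_axis": len(axis),
        "n_axis_data": len(axis_data),
        "n_axis_files": len(axis_files),
        "axis_spatial": axis_spatial,
        "n_axis_spatial": len(axis_spatial),
        "scale": tuple(scale),
    }


info = describe_axes("CT", "ZYX", scale=(2.0, 1.0, 1.0))
print(info)
```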
- to_zarr(file, flavor='dask', **kwargs)
Save the dataset to a Zarr file.
- Parameters:
file (str or store) – The file path or store to save the Zarr file to.
flavor (str, optional) – The method to use for saving. Options are 'legacy' or 'dask'. Defaults to 'dask'.
**kwargs – Additional keyword arguments to pass to the respective saving method.
- Raises:
ValueError – If an invalid flavor is provided.
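The flavor selection described above amounts to a small dispatch on a string. A minimal sketch follows, with stand-in writer functions (`_save_dask`, `_save_legacy` and `save` are hypothetical names used only for illustration, not the library's methods):

```python
# Stand-in writers; the real methods would write a Zarr array.
def _save_dask(file, **kwargs):
    return f"dask:{file}"


def _save_legacy(file, **kwargs):
    return f"legacy:{file}"


def save(file, flavor="dask", **kwargs):
    """Dispatch to a writer by flavor; unknown flavors raise ValueError."""
    writers = {"dask": _save_dask, "legacy": _save_legacy}
    if flavor not in writers:
        raise ValueError(
            f"Invalid flavor {flavor!r}; expected one of {sorted(writers)}"
        )
    return writers[flavor](file, **kwargs)


print(save("out.zarr"))
print(save("out.zarr", flavor="legacy"))
```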
- to_zarr_dask(file, **kwargs)
Save the dataset to a Zarr file using Dask for parallel processing.
- Parameters:
file (str or MutableMapping) – The file path or MutableMapping to save the Zarr file to.
**kwargs (dict) – Additional keyword arguments to pass to the to_zarr method.
- Returns:
None
- Notes:
This method uses Dask to parallelize the reading and saving of images. The _read_image function is delayed to allow for parallel execution. A progress bar is displayed using _TqdmCallback to show the saving progress.
- to_zarr_legacy(file, **kwargs)
Save the dataset to a Zarr array using the legacy method.
- Parameters:
file (str or zarr.storage.Store) – The file path or Zarr store where the array will be saved.
**kwargs (dict) – Additional keyword arguments to pass to zarr.create.
- Returns:
None
- Notes:
This method saves the dataset to a Zarr array without parallel processing. It iterates over the dataset and saves each image sequentially.
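As an illustration of sequential saving, the sketch below walks every index of the folder (data) axes in order. Reading the image and writing it into the output array are replaced by a stand-in so the example stays self-contained. The variable names are assumptions and this is not the library's code.

```python
from itertools import product

# Nested list of paths laid out as (C, T), as in the Dataset example.
paths = [
    ["ch1/file000.tif", "ch1/file001.tif"],
    ["ch2/file000.tif", "ch2/file001.tif"],
]
data_shape = (len(paths), len(paths[0]))  # (C, T)

written = []
# Visit every (c, t) index one after another, with no parallelism.
for index in product(*(range(n) for n in data_shape)):
    c, t = index
    # Stand-in for: read the image at paths[c][t], then store it at array[c, t, ...].
    written.append((index, paths[c][t]))

for entry in written:
    print(entry)
```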
- registration_tools.dataset.check_dataset_structure(data)
Check and print the structure of a dataset.
This function prints the shape of the dataset and checks for the presence of specific attributes ('axis' and 'scale'). If these attributes are found, their values are printed. If not, a message indicating their absence is printed.
- Parameters:
data (object) – The dataset to be checked. It is expected to have a 'shape' attribute and, optionally, an 'attrs' attribute, which is a dictionary containing 'axis' and 'scale' keys.
- Returns:
None
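The check described above can be sketched with a stand-in object that exposes `shape` and `attrs`. `describe_structure` is a hypothetical function that returns the report lines instead of printing them, which makes the behaviour easier to inspect. The exact messages printed by the library are not documented here, so the wording below is an assumption.

```python
from types import SimpleNamespace


def describe_structure(data):
    """Build a short report of the shape and the 'axis'/'scale' attributes."""
    lines = [f"shape: {data.shape}"]
    attrs = getattr(data, "attrs", {})  # 'attrs' is optional
    for key in ("axis", "scale"):
        if key in attrs:
            lines.append(f"{key}: {attrs[key]}")
        else:
            lines.append(f"{key}: not found")
    return lines


# Stand-in for a zarr-like array carrying only the 'axis' attribute.
fake = SimpleNamespace(shape=(2, 3, 10, 64, 64), attrs={"axis": "CTZYX"})
report = describe_structure(fake)
print("\n".join(report))
```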
- registration_tools.dataset.load_dataset(path, axis=None, scale=None, h5_key=None)
Load a dataset from a given file path.
- Parameters:
path (str) – The file path to the dataset. Supported formats are .npy, .h5, .hdf5, and image files.
axis (optional) – The axis attribute of the dataset. If not provided, it will be read from the dataset attributes.
scale (optional) – The scale attribute of the dataset. If not provided, it will be read from the dataset attributes.
h5_key (str, optional) – The key to read from an h5 file. Required if the file is in .h5 or .hdf5 format.
- Returns:
The loaded dataset as a zarr array.
- Return type:
zarr.core.Array
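The supported formats suggest a dispatch on the file extension. The sketch below shows that idea only. `pick_loader` is a hypothetical function, and raising ValueError when `h5_key` is missing is an assumption about how "required" might be enforced, not documented behaviour.

```python
import os


def pick_loader(path, h5_key=None):
    """Return a label for the reader a given path would need."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".npy":
        return "npy"
    if ext in (".h5", ".hdf5"):
        if h5_key is None:
            # Assumption: a missing key is reported as an error.
            raise ValueError("h5_key is required for .h5 and .hdf5 files")
        return "h5"
    # Everything else is treated as an image file.
    return "image"


print(pick_loader("volume.npy"))
print(pick_loader("volume.h5", h5_key="data"))
print(pick_loader("volume.tif"))
```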
- registration_tools.dataset.show_dataset_structure(folder_path, indent=0, max_files=6)
Recursively prints the folder structure.
- Parameters:
folder_path (str) – The path to the folder to display.
indent (int) – The indentation level for nested folders.
max_files (int) – The maximum number of files to display per folder.
- Returns:
None
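A recursive listing with a per-folder file cap can be sketched with the standard library alone. `format_tree` is a hypothetical function that mirrors the documented parameters but returns the lines rather than printing them. The output layout is an assumption, not the library's exact format.

```python
import os
import tempfile


def format_tree(folder_path, indent=0, max_files=6):
    """Return the folder structure as indented lines, truncating long file lists."""
    lines = []
    pad = "  " * indent
    entries = sorted(os.listdir(folder_path))
    dirs = [e for e in entries if os.path.isdir(os.path.join(folder_path, e))]
    files = [e for e in entries if not os.path.isdir(os.path.join(folder_path, e))]

    for d in dirs:
        lines.append(f"{pad}{d}/")
        # Recurse one indentation level deeper for each subfolder.
        lines.extend(format_tree(os.path.join(folder_path, d), indent + 1, max_files))
    for f in files[:max_files]:
        lines.append(f"{pad}{f}")
    if len(files) > max_files:
        lines.append(f"{pad}... ({len(files) - max_files} more files)")
    return lines


# Build a small throwaway dataset layout to demonstrate the listing.
with tempfile.TemporaryDirectory() as root:
    for ch in ("ch1", "ch2"):
        os.makedirs(os.path.join(root, ch))
        for t in range(3):
            open(os.path.join(root, ch, f"file_t{t}.tif"), "w").close()
    tree = format_tree(root, max_files=2)

print("\n".join(tree))
```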