Helper functions and utilities#

This section documents modules that are used within the package but are not part of the core algorithm or functionality.

File IO#

Helper classes and functions for I/O functionality.

cebra.io.device()#

The preferred compute device.

Return type:

str

Returns:

cuda, if available, otherwise cpu.
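
Example

A minimal usage sketch; the return value depends on whether CUDA is available on the running machine:

>>> import cebra.io
>>> cebra.io.device() in ("cuda", "cpu")
True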

class cebra.io.HasDevice(device=None)#

Bases: object

Base class for classes that use CPU/CUDA processing via PyTorch.

If implementing this class, any derived instance will track any attribute of type torch.Tensor (this includes torch.nn.Parameter) or torch.nn.Module, as well as any instance of a class subclassing HasDevice itself.

When calling to(), all of these attributes will themselves be moved to the specified device.

Any instance of this class needs to be initialized. This can happen explicitly through the constructor (by specifying a device during initialization) or implicitly by assigning the first tracked attribute, whose device is then assigned to the whole instance.

Every subsequent assignment will result in tensors, parameters, modules and other instances being moved to the device of the instance.

Parameters:

device (Optional[str]) – The device name, typically cpu or cuda, optionally combined with a device id, e.g. cuda:0.

Note

Do not subclass this class when a dependency on a compute device is already implemented in another way, e.g. via PyTorch's torch.nn.Module.
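
Example

A minimal sketch of the intended usage (the Encoder class is illustrative, not part of CEBRA); it stays on CPU so the snippet runs on any machine:

>>> import torch
>>> import cebra.io
>>> class Encoder(cebra.io.HasDevice):
...     def __init__(self):
...         super().__init__("cpu")
...         # tracked attribute: will follow the instance's device
...         self.layer = torch.nn.Linear(2, 2)
>>> encoder = Encoder()
>>> encoder.device
'cpu'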

property device: str#

Returns the name of the current compute device.

Return type:

str

to(device)#

Moves the instance to the specified device.

Parameters:

device (str) – The device (cpu or cuda) to move this instance to.

Return type:

HasDevice

Returns:

the instance itself.
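
Example

A sketch that moves an instance (and its tracked tensor) to a device; cpu is used here so the snippet runs anywhere, but cuda or cuda:0 would work the same way on a GPU machine:

>>> import torch
>>> import cebra.io
>>> instance = cebra.io.HasDevice("cpu")
>>> instance.tensor = torch.zeros(5)
>>> instance = instance.to("cpu")
>>> instance.tensor.device.type
'cpu'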

cebra.io.reduce(data, *, ratio=None, num_components=None)#

Map the specified data to its principal components.

Specify either an explained variance ratio between 0 and 1, or a number of principal components to use.

Parameters:
  • ratio – The ratio (needs to be between 0 and 1) of explained variance required by the returned number of components. Note that the dimension of the output will vary based on the provided input data.

  • num_components – The number of principal components to return

Returns:

An (N, d) array, where the dimension d is either limited by the specified number of components, or is chosen to explain the specified variance in the data.
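
Example

A minimal sketch, assuming a numpy array as input; the output dimension is fixed here by num_components:

>>> import numpy as np
>>> import cebra.io
>>> data = np.random.uniform(0, 1, (500, 10))
>>> cebra.io.reduce(data, num_components=2).shape
(500, 2)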

class cebra.io.FileKeyValueDataset(path)#

Bases: object

Load datasets from HDF, torch, numpy or joblib files.

The data is directly accessible through attributes of instances of this class.

Parameters:

path (str) – The filepath to load the data from. It should point to a file in a valid format (hdf, torch, numpy or joblib). Valid extensions are jl, joblib, h5, hdf, hdf5, pth, pt and npz.

Example

>>> import cebra.io
>>> import joblib
>>> import tempfile
>>> from pathlib import Path
>>> tmp_file = Path(tempfile.gettempdir(), 'test.jl')
>>> _ = joblib.dump({'foo' : 42}, tmp_file)
>>> data = cebra.io.FileKeyValueDataset(tmp_file)
>>> data.foo
42

Registry#

A simple registry for python modules.

This module only exposes a single public function, add_helper_functions, which takes a python module or module name (or package) as its argument and defines the decorator functions

  • register

  • parametrize

and the functions

  • init and

  • get_options

within this module. It also (implicitly and lazily) initializes a singleton registry object which holds all registered classes. Typically, the helper functions should be added in the first lines of a package's __init__.py module.

Note that all classes carrying the respective decorators need to be discovered by the import system, otherwise they will not be available when calling get_options or init.
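
Example

A sketch of a hypothetical package __init__.py (the package and class names are illustrative, not part of CEBRA):

# mypackage/__init__.py
import cebra.registry

# Defines register, parametrize, init and get_options in this module.
cebra.registry.add_helper_functions(__name__)

@register("my-option")
class MyOption:
    pass

After the package is imported, mypackage.get_options() lists "my-option" and mypackage.init("my-option") returns a MyOption instance.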

cebra.registry.add_helper_functions(module)#

Add registry functionality to the given module.

Call this function within a python module to add the three functions register, init and get_options to the module.

  • register is a decorator for classes within the module. Each class will be added by a (unique) name and can be initialized with the init function.

  • init takes a name as its argument and returns an instance of the specified class, with optional arguments.

  • get_options returns a list of all registered names within the module.

Parameters:

module (Union[module, str]) – The module for adding registry functions. This can be the name of a module as returned by __name__ within the module, or the module object itself, passed directly.

cebra.registry.add_docstring(module)#

Apply additional information about configuration options to registry modules.

Parameters:

module (Union[module, str]) – Name of the module, or the module itself. If a string is given, it needs to match the representation in sys.modules.
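
Example

A sketch of typical usage: call the function at the end of a registry module's __init__.py, after all classes are registered, so the module docstring lists the available options (mypackage is a hypothetical name):

# at the bottom of mypackage/__init__.py
import cebra.registry

cebra.registry.add_docstring(__name__)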

cebra.registry.is_registry(module, check_docs=False)#

Check if the given module implements all registry functions.

Parameters:
  • module (Union[module, str]) – Name of the module, or the module itself. If a string is given, it needs to match the representation in sys.modules.

  • check_docs (bool) – Optionally check whether the module docstring was adapted to list all default options.

Return type:

bool

Returns:

True if the module is a registry and implements the register, init and get_options functions. If check_docs is set to True, then the documentation needs to match in addition. False if at least one function is missing.
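
Example

A sketch, assuming cebra.models (which registers CEBRA's model architectures via this mechanism) as the module under test:

>>> import cebra.registry
>>> import cebra.models
>>> cebra.registry.is_registry(cebra.models)
True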

Data helpers#

cebra.data.helper.get_loader_options(dataset)#

Return all possible dataloaders for the given dataset.

Return type:

List[str]

class cebra.data.helper.OrthogonalProcrustesAlignment(top_k=5, subsample=None)#

Bases: object

Align two datasets by solving the orthogonal Procrustes problem.

Tip

In linear algebra, the orthogonal Procrustes problem is a matrix approximation problem. Considering two matrices A and B, it consists in finding the orthogonal matrix R which most closely maps A to B, so that it minimizes the Frobenius norm of (A @ R) - B subject to R.T @ R = I. See scipy.linalg.orthogonal_procrustes() for more information.
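
Example

To illustrate the underlying problem (independent of CEBRA), here is a small self-contained sketch where the optimal orthogonal map is known by construction:

>>> import numpy as np
>>> from scipy.linalg import orthogonal_procrustes
>>> A = np.random.uniform(-1, 1, (100, 3))
>>> Q, _ = np.linalg.qr(np.random.uniform(-1, 1, (3, 3)))  # random orthogonal matrix
>>> B = A @ Q  # B is an exact rotation of A
>>> R, _ = orthogonal_procrustes(A, B)  # recover the rotation
>>> np.allclose(A @ R, B)
True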

For each dataset, both the data and the labels to align the data on are provided.

  1. The top_k indexes of the labels to align (label) that are the closest to the labels of the reference dataset (ref_label) are selected and used to sample from the dataset to align (data).

  2. data and ref_data (the reference dataset) are subsampled to the same number of samples subsample.

  3. The orthogonal mapping is computed, using scipy.linalg.orthogonal_procrustes(), on those subsampled datasets.

  4. The resulting orthogonal matrix _transform can be used to map the original data to the ref_data.

Note

data and ref_data can be of different sample size (axis 0) but must have the same number of features (axis 1) to be aligned.

top_k#

Number of indexes in the labels of the matrix to align (label) to consider for alignment. The selected indexes are the top_k indexes closest to the reference labels (ref_label).

Type:

int

subsample#

Number of samples to which data and ref_data are subsampled before solving the orthogonal Procrustes problem.

Type:

int

fit(ref_data, data, ref_label=None, label=None)#

Compute the matrix solution of the orthogonal Procrustes problem.

The obtained matrix is used to align a dataset to a reference dataset.

Parameters:
  • ref_data – Reference data matrix on which the data is aligned.

  • data – Data matrix to align on the reference dataset.

  • ref_label – Optional labels associated with ref_data, used to select the samples for alignment.

  • label – Optional labels associated with data.

Return type:

OrthogonalProcrustesAlignment

Returns:

self, for chaining operations.

Example

>>> import cebra.data.helper
>>> import numpy as np
>>> ref_embedding = np.random.uniform(0, 1, (1000, 30))
>>> aux_embedding = np.random.uniform(0, 1, (800, 30))
>>> ref_label = np.random.uniform(0, 1, (1000, 1))
>>> aux_label = np.random.uniform(0, 1, (800, 1))
>>> orthogonal_procrustes = cebra.data.helper.OrthogonalProcrustesAlignment()
>>> orthogonal_procrustes = orthogonal_procrustes.fit(ref_data=ref_embedding,
...                                                   data=aux_embedding,
...                                                   ref_label=ref_label,
...                                                   label=aux_label)
transform(data)#

Transform the data using the matrix solution computed in fit().

Parameters:

data (Union[numpy.ndarray, torch.Tensor]) – The 2D data matrix to align.

Return type:

numpy.ndarray

Returns:

The aligned input matrix.

Example

>>> import cebra.data.helper
>>> import numpy as np
>>> ref_embedding = np.random.uniform(0, 1, (1000, 30))
>>> aux_embedding = np.random.uniform(0, 1, (800, 30))
>>> ref_label = np.random.uniform(0, 1, (1000, 1))
>>> aux_label = np.random.uniform(0, 1, (800, 1))
>>> orthogonal_procrustes = cebra.data.helper.OrthogonalProcrustesAlignment()
>>> orthogonal_procrustes = orthogonal_procrustes.fit(ref_data=ref_embedding,
...                                                   data=aux_embedding,
...                                                   ref_label=ref_label,
...                                                   label=aux_label)
>>> aligned_aux_embedding = orthogonal_procrustes.transform(data=aux_embedding)
>>> assert aligned_aux_embedding.shape == aux_embedding.shape
fit_transform(ref_data, data, ref_label=None, label=None)#

Compute the matrix solution to align a data array to a reference matrix.

Note

Uses a combination of fit() and transform().

Parameters:
  • ref_data – Reference data matrix on which the data is aligned.

  • data – Data matrix to align on the reference dataset.

  • ref_label – Optional labels associated with ref_data, used to select the samples for alignment.

  • label – Optional labels associated with data.

Return type:

numpy.ndarray

Returns:

The data matrix aligned onto the reference data matrix.

Example

>>> import cebra.data.helper
>>> import numpy as np
>>> ref_embedding = np.random.uniform(0, 1, (1000, 30))
>>> aux_embedding = np.random.uniform(0, 1, (800, 30))
>>> ref_label = np.random.uniform(0, 1, (1000, 1))
>>> aux_label = np.random.uniform(0, 1, (800, 1))
>>> orthogonal_procrustes = cebra.data.helper.OrthogonalProcrustesAlignment(top_k=10,
...                                                                         subsample=700)
>>> aligned_aux_embedding = orthogonal_procrustes.fit_transform(ref_data=ref_embedding,
...                                                             data=aux_embedding,
...                                                             ref_label=ref_label,
...                                                             label=aux_label)
>>> assert aligned_aux_embedding.shape == aux_embedding.shape
cebra.data.helper.ensemble_embeddings(embeddings, labels=None, post_norm=False, n_jobs=0)#

Ensemble aligned embeddings together.

The embeddings contained in embeddings are aligned onto the same embedding, using OrthogonalProcrustesAlignment. Then, they are averaged and the resulting averaged embedding is the ensemble embedding.

Tip

By ensembling embeddings coming from the same dataset but obtained from different models, the resulting joint embedding usually shows an increase in performance compared to the individual embeddings.

Note

The embeddings in embeddings must be the same shape, i.e., the same number of samples and same number of features (axis 1).

Parameters:
  • embeddings (List[Union[numpy.ndarray, torch.Tensor]]) – List of embeddings to align and ensemble.

  • labels (Optional[List[Union[numpy.ndarray, torch.Tensor]]]) – Optional list of indexes associated with the embeddings in embeddings, to align the embeddings on. To be ensembled, the embeddings should already be aligned on time, and consequently do not require extra labels for alignment.

  • post_norm (bool) – If True, the resulting joint embedding is normalized (divided by its norm across the features - axis 1).

  • n_jobs (int) – The maximum number of concurrently running jobs to compute embedding alignment in a parallel manner using joblib.Parallel. Specify 0 to iterate naively over the embeddings for ensembling without using joblib.Parallel. Specify -1 to use all cores. Using more than a single core can considerably speed up the computation of ensembled embeddings for large datasets, but will also require more memory.

Return type:

numpy.ndarray

Returns:

A numpy.ndarray corresponding to the joint embedding.

Example

>>> import cebra.data.helper
>>> import numpy as np
>>> embedding1 = np.random.uniform(0, 1, (100, 4))
>>> embedding2 = np.random.uniform(0, 1, (100, 4))
>>> embedding3 = np.random.uniform(0, 1, (100, 4))
>>> joint_embedding = cebra.data.helper.ensemble_embeddings(embeddings=[embedding1, embedding2, embedding3])
>>> assert joint_embedding.shape == embedding1.shape