Metrics#
- cebra.integrations.sklearn.metrics.infonce_loss(cebra_model, X, *y, session_id=None, num_batches=500, correct_by_batchsize=False)#
Compute the model's InfoNCE loss on a single-session dataset.
- Parameters:
  - cebra_model (CEBRA) – The model used to compute the InfoNCE loss on the samples.
  - X (Union[ndarray, Tensor]) – A 2D data matrix, corresponding to a single session recording.
  - y – An arbitrary number of continuous indices passed as 2D matrices, and up to one discrete index passed as a 1D array. Each index has to match the length of X.
  - session_id (Optional[int]) – The session ID, an int between 0 and cebra.CEBRA.num_sessions for multisession; set to None for single session.
  - num_batches (int) – The number of iterations over which to evaluate the model on the new data. Higher values give a more accurate estimate; set it to at least 500 iterations.
  - correct_by_batchsize (bool) – If True, the loss is corrected by the batch size.
- Return type:
  float
- Returns:
The average InfoNCE loss estimated over num_batches batches from the data distribution.
Example
>>> import cebra
>>> import numpy as np
>>> neural_data = np.random.uniform(0, 1, (1000, 20))
>>> cebra_model = cebra.CEBRA(max_iterations=10)
>>> cebra_model.fit(neural_data)
CEBRA(max_iterations=10)
>>> loss = cebra.sklearn.metrics.infonce_loss(cebra_model,
...                                           neural_data,
...                                           num_batches=5)
- cebra.integrations.sklearn.metrics.goodness_of_fit_score(cebra_model, X, *y, session_id=None, num_batches=500)#
Compute the model's goodness of fit score on a single-session dataset.
This function uses infonce_loss() to compute the InfoNCE loss for a given cebra_model, and infonce_to_goodness_of_fit() to derive the goodness of fit from that loss.
- Parameters:
  - cebra_model (CEBRA) – The model used to compute the InfoNCE loss on the samples.
  - X (Union[ndarray, Tensor]) – A 2D data matrix, corresponding to a single session recording.
  - y – An arbitrary number of continuous indices passed as 2D matrices, and up to one discrete index passed as a 1D array. Each index has to match the length of X.
  - session_id (Optional[int]) – The session ID, an int between 0 and cebra.CEBRA.num_sessions for multisession; set to None for single session.
  - num_batches (int) – The number of iterations over which to evaluate the model on the new data. Higher values give a more accurate estimate; set it to at least 500 iterations.
- Return type:
  float
- Returns:
The average goodness of fit score estimated over num_batches batches from the data distribution.
Example
>>> import cebra
>>> import numpy as np
>>> neural_data = np.random.uniform(0, 1, (1000, 20))
>>> cebra_model = cebra.CEBRA(max_iterations=10, batch_size=512)
>>> cebra_model.fit(neural_data)
CEBRA(batch_size=512, max_iterations=10)
>>> gof = cebra.sklearn.metrics.goodness_of_fit_score(cebra_model, neural_data)
- cebra.integrations.sklearn.metrics.goodness_of_fit_history(model)#
Return the history of the goodness of fit score.
- Parameters:
  - model (CEBRA) – A trained CEBRA model.
- Return type:
  ndarray
- Returns:
A numpy array containing the goodness of fit values, measured in bits.
Example
>>> import cebra
>>> import numpy as np
>>> neural_data = np.random.uniform(0, 1, (1000, 20))
>>> cebra_model = cebra.CEBRA(max_iterations=10, batch_size=512)
>>> cebra_model.fit(neural_data)
CEBRA(batch_size=512, max_iterations=10)
>>> gof_history = cebra.sklearn.metrics.goodness_of_fit_history(cebra_model)
- cebra.integrations.sklearn.metrics.infonce_to_goodness_of_fit(infonce, model=None, batch_size=None, num_sessions=None)#
Given a trained CEBRA model, return the goodness of fit metric.
The goodness of fit ranges from 0 (lowest meaningful value) to a positive number with the unit “bits”, the higher the better.
Values lower than 0 bits are possible, but these only occur due to numerical effects. A perfectly collapsed embedding (e.g., because the data cannot be fit with the provided auxiliary variables) will have a goodness of fit of 0.
The conversion between the generalized InfoNCE metric that CEBRA is trained with and the goodness of fit computed with this function is
\[S = \log N - \text{InfoNCE}\]
To use this function, either provide a trained CEBRA model or the batch size and number of sessions.
- Parameters:
  - infonce – The InfoNCE loss value(s) to convert.
  - model (Optional[CEBRA]) – A trained CEBRA model; provide either this, or batch_size and num_sessions.
  - batch_size (Optional[int]) – The batch size used to train the model.
  - num_sessions (Optional[int]) – The number of sessions on which the model was trained.
- Return type:
  ndarray
- Returns:
  Numpy array containing the goodness of fit values, measured in bits.
- Raises:
  - RuntimeError – If the provided model is not fit to data.
  - ValueError – If both model and (batch_size, num_sessions) are provided.
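The conversion above can be sketched in plain numpy. This is a minimal illustration, not the library implementation: the helper name infonce_to_bits is hypothetical, it assumes N is the product of batch size and number of sessions, and it reports the result in bits (nats converted via log2(e)) to match the units stated in the Returns section.

```python
import numpy as np

def infonce_to_bits(infonce, batch_size, num_sessions=1):
    """Hypothetical sketch of S = log N - InfoNCE, reported in bits.

    Assumes N = batch_size * num_sessions; the difference is computed
    in nats and converted to bits by multiplying with log2(e).
    """
    nats = np.log(batch_size * num_sessions) - np.asarray(infonce, dtype=float)
    return nats * np.log2(np.e)

# A fully collapsed embedding has InfoNCE == log N, i.e. a score of 0 bits:
print(infonce_to_bits(np.log(512), batch_size=512))  # prints 0.0
```

Note how a loss one bit below chance level (InfoNCE = log N − log 2 nats) maps to a score of exactly 1 bit under this sketch.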
- cebra.integrations.sklearn.metrics.consistency_score(embeddings, between=None, labels=None, dataset_ids=None, num_discretization_bins=100)#
Compute the consistency score between embeddings, either between runs or between datasets.
- Parameters:
  - embeddings (List[Union[ndarray, Tensor]]) – List of embedding matrices.
  - labels (Optional[List[Union[ndarray, Tensor]]]) – List of labels corresponding to each embedding, used for alignment between them. They are only required for a between-datasets comparison.
  - dataset_ids (Optional[List[Union[int, str, float]]]) – List of dataset IDs associated with each embedding. Multiple embeddings can be associated with the same dataset. In both modes (runs or datasets), if no dataset_ids is provided, all the provided embeddings are compared one-to-one: internally, the function treats them all as different runs on the same dataset in between-runs mode, and as computed on different datasets in between-datasets mode.
  - between (Optional[Literal["datasets", "runs"]]) – A string describing the type of comparison to perform between the embeddings, either between datasets or between runs. Consistency between runs means the consistency between embeddings obtained from multiple models trained on the same dataset. Consistency between datasets means the consistency between embeddings obtained from models trained on different datasets, such as different animals, sessions, etc.
  - num_discretization_bins (int) – Number of values for the discretized common labels. The discretized labels are used for embedding alignment. Also see the n_bins argument in cebra.integrations.sklearn.helpers.align_embeddings for more information on how this parameter is used internally. This argument is only used if labels is not None, alignment between datasets is used (between="datasets"), and the given labels are continuous and not already discrete.
- Return type:
  Tuple[ndarray, ndarray, ndarray]
- Returns:
  The scores computed between the embeddings (first return), the pairs corresponding to each computed score (second return), and the IDs of the entities present in the comparison, either the datasets in the between-datasets comparison or the runs in the between-runs comparison (third return).
Example
>>> import cebra
>>> import numpy as np
>>> embedding1 = np.random.uniform(0, 1, (1000, 5))
>>> embedding2 = np.random.uniform(0, 1, (1000, 8))
>>> labels1 = np.random.uniform(0, 1, (1000, ))
>>> labels2 = np.random.uniform(0, 1, (1000, ))
>>> # Between-runs consistency
>>> scores, pairs, ids_runs = cebra.sklearn.metrics.consistency_score(embeddings=[embedding1, embedding2],
...                                                                   between="runs")
>>> # Between-datasets consistency, by aligning on the labels
>>> scores, pairs, ids_datasets = cebra.sklearn.metrics.consistency_score(embeddings=[embedding1, embedding2],
...                                                                       labels=[labels1, labels2],
...                                                                       dataset_ids=["achilles", "buddy"],
...                                                                       between="datasets")
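To inspect the pairwise results, it can help to arrange the scores into a square matrix indexed by the returned IDs. The snippet below is a generic post-processing sketch, not part of the CEBRA API: the `ids`, `pairs`, and `scores` arrays are simulated stand-ins for the three return values of consistency_score.

```python
import numpy as np

# Simulated stand-ins for the three return values of consistency_score:
ids = ["achilles", "buddy"]
pairs = [("achilles", "buddy"), ("buddy", "achilles")]
scores = [0.72, 0.68]

# Arrange the pairwise scores into a square matrix indexed by dataset ID;
# the diagonal (self-comparison) is left as NaN.
index = {dataset: i for i, dataset in enumerate(ids)}
matrix = np.full((len(ids), len(ids)), np.nan)
for (a, b), score in zip(pairs, scores):
    matrix[index[a], index[b]] = score

print(matrix)
```

Because the score is not necessarily symmetric in the comparison direction, both orderings of each pair are kept as separate off-diagonal entries.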