Metrics#
- cebra.integrations.sklearn.metrics.infonce_loss(cebra_model, X, *y, session_id=None, num_batches=500, correct_by_batchsize=False)#
Compute the model's InfoNCE loss on a single-session dataset.
- Parameters:
  - cebra_model (CEBRA) – The model used to compute the InfoNCE loss on the samples.
  - X (Union[ndarray, Tensor]) – A 2D data matrix, corresponding to a single session recording.
  - y – An arbitrary number of continuous indices passed as 2D matrices, and up to one discrete index passed as a 1D array. Each index has to match the length of X.
  - session_id (Optional[int]) – The session ID, an int between 0 and cebra.CEBRA.num_sessions for multisession; set to None for single session.
  - num_batches (int) – The number of iterations over which to evaluate the model on the new data. Higher values give a more accurate estimate; set it to at least 500 iterations.
  - correct_by_batchsize (bool) – If True, the loss is corrected by the batch size.
- Return type:
  float
- Returns:
The average InfoNCE loss estimated over num_batches batches from the data distribution.
Example
>>> import cebra
>>> import numpy as np
>>> neural_data = np.random.uniform(0, 1, (1000, 20))
>>> cebra_model = cebra.CEBRA(max_iterations=10)
>>> cebra_model.fit(neural_data)
CEBRA(max_iterations=10)
>>> loss = cebra.sklearn.metrics.infonce_loss(cebra_model,
...                                           neural_data,
...                                           num_batches=5)
- cebra.integrations.sklearn.metrics.goodness_of_fit_score(cebra_model, X, *y, session_id=None, num_batches=500)#
Compute the model's goodness of fit score on a single-session dataset.
This function uses infonce_loss() to compute the InfoNCE loss for a given cebra_model, and infonce_to_goodness_of_fit() to derive the goodness of fit from that loss.
- Parameters:
  - cebra_model (CEBRA) – The model used to compute the InfoNCE loss on the samples.
  - X (Union[ndarray, Tensor]) – A 2D data matrix, corresponding to a single session recording.
  - y – An arbitrary number of continuous indices passed as 2D matrices, and up to one discrete index passed as a 1D array. Each index has to match the length of X.
  - session_id (Optional[int]) – The session ID, an int between 0 and cebra.CEBRA.num_sessions for multisession; set to None for single session.
  - num_batches (int) – The number of iterations over which to evaluate the model on the new data. Higher values give a more accurate estimate; set it to at least 500 iterations.
- Return type:
  float
- Returns:
The average goodness of fit score estimated over num_batches batches from the data distribution.
Example
>>> import cebra
>>> import numpy as np
>>> neural_data = np.random.uniform(0, 1, (1000, 20))
>>> cebra_model = cebra.CEBRA(max_iterations=10, batch_size=512)
>>> cebra_model.fit(neural_data)
CEBRA(batch_size=512, max_iterations=10)
>>> gof = cebra.sklearn.metrics.goodness_of_fit_score(cebra_model, neural_data)
- cebra.integrations.sklearn.metrics.goodness_of_fit_history(model)#
Return the history of the goodness of fit score.
- Parameters:
  - model (CEBRA) – A trained CEBRA model.
- Return type:
  ndarray
- Returns:
A numpy array containing the goodness of fit values, measured in bits.
Example
>>> import cebra
>>> import numpy as np
>>> neural_data = np.random.uniform(0, 1, (1000, 20))
>>> cebra_model = cebra.CEBRA(max_iterations=10, batch_size=512)
>>> cebra_model.fit(neural_data)
CEBRA(batch_size=512, max_iterations=10)
>>> gof_history = cebra.sklearn.metrics.goodness_of_fit_history(cebra_model)
- cebra.integrations.sklearn.metrics.infonce_to_goodness_of_fit(infonce, model=None, batch_size=None, num_sessions=None)#
Given a trained CEBRA model, return the goodness of fit metric.
The goodness of fit ranges from 0 (lowest meaningful value) to a positive number with the unit “bits”, the higher the better.
Values lower than 0 bits are possible, but these only occur due to numerical effects. A perfectly collapsed embedding (e.g., because the data cannot be fit with the provided auxiliary variables) will have a goodness of fit of 0.
The conversion between the generalized InfoNCE metric that CEBRA is trained with and the goodness of fit computed with this function is
\[S = \log N - \text{InfoNCE}\]
To use this function, either provide a trained CEBRA model or the batch size and number of sessions.
- Parameters:
  - infonce – The InfoNCE loss value(s) to convert.
  - model (Optional[CEBRA]) – A trained CEBRA model; provide either this, or batch_size and num_sessions.
  - batch_size (Optional[int]) – The batch size used to train the model.
  - num_sessions (Optional[int]) – The number of sessions on which the model was trained.
- Return type:
  ndarray
- Returns:
  Numpy array containing the goodness of fit values, measured in bits.
- Raises:
  - RuntimeError – If the provided model is not fit to data.
  - ValueError – If both model and (batch_size, num_sessions) are provided.
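The conversion above can be sketched in plain numpy. This is a minimal illustration, not the library implementation: the helper name infonce_to_bits is hypothetical, it assumes N is the product of batch size and number of sessions, and it reports the result in bits (nats converted via log2(e)) to match the units stated in the Returns section.

```python
import numpy as np

def infonce_to_bits(infonce, batch_size, num_sessions=1):
    """Hypothetical sketch of S = log N - InfoNCE, reported in bits.

    Assumes N = batch_size * num_sessions; the difference is computed
    in nats and converted to bits by multiplying with log2(e).
    """
    nats = np.log(batch_size * num_sessions) - np.asarray(infonce, dtype=float)
    return nats * np.log2(np.e)

# A fully collapsed embedding has InfoNCE == log N, i.e. a score of 0 bits:
print(infonce_to_bits(np.log(512), batch_size=512))  # prints 0.0
```

Note how a loss one bit below chance level (InfoNCE = log N − log 2 nats) maps to a score of exactly 1 bit under this sketch.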
- cebra.integrations.sklearn.metrics.consistency_score(embeddings, between=None, labels=None, dataset_ids=None, num_discretization_bins=100)#
Compute the consistency score between embeddings, either between runs or between datasets.
- Parameters:
  - embeddings (List[Union[ndarray, Tensor]]) – List of embedding matrices.
  - labels (Optional[List[Union[ndarray, Tensor]]]) – List of labels corresponding to each embedding, used for alignment between them. They are only required for a between-datasets comparison.
  - dataset_ids (Optional[List[Union[int, str, float]]]) – List of dataset IDs associated with each embedding. Multiple embeddings can be associated with the same dataset. In both modes (runs or datasets), if no dataset_ids is provided, all the provided embeddings are compared one-to-one: internally, the function treats them all as different runs on the same dataset in between-runs mode, and as computed on different datasets in between-datasets mode.
  - between (Optional[Literal["datasets", "runs"]]) – A string describing the type of comparison to perform between the embeddings, either between datasets or between runs. Consistency between runs means the consistency between embeddings obtained from multiple models trained on the same dataset. Consistency between datasets means the consistency between embeddings obtained from models trained on different datasets, such as different animals, sessions, etc.
  - num_discretization_bins (int) – Number of values for the discretized common labels. The discretized labels are used for embedding alignment. Also see the n_bins argument in cebra.integrations.sklearn.helpers.align_embeddings for more information on how this parameter is used internally. This argument is only used if labels is not None, alignment between datasets is used (between="datasets"), and the given labels are continuous and not already discrete.
- Return type:
  Tuple[ndarray, ndarray, ndarray]
- Returns:
  The scores computed between the embeddings (first return), the pairs corresponding to each computed score (second return), and the IDs of the entities present in the comparison, either the datasets in the between-datasets comparison or the runs in the between-runs comparison (third return).
Example
>>> import cebra
>>> import numpy as np
>>> embedding1 = np.random.uniform(0, 1, (1000, 5))
>>> embedding2 = np.random.uniform(0, 1, (1000, 8))
>>> labels1 = np.random.uniform(0, 1, (1000, ))
>>> labels2 = np.random.uniform(0, 1, (1000, ))
>>> # Between-runs consistency
>>> scores, pairs, ids_runs = cebra.sklearn.metrics.consistency_score(embeddings=[embedding1, embedding2],
...                                                                   between="runs")
>>> # Between-datasets consistency, by aligning on the labels
>>> scores, pairs, ids_datasets = cebra.sklearn.metrics.consistency_score(embeddings=[embedding1, embedding2],
...                                                                       labels=[labels1, labels2],
...                                                                       dataset_ids=["achilles", "buddy"],
...                                                                       between="datasets")
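To inspect the pairwise results, it can help to arrange the scores into a square matrix indexed by the returned IDs. The snippet below is a generic post-processing sketch, not part of the CEBRA API: the `ids`, `pairs`, and `scores` arrays are simulated stand-ins for the three return values of consistency_score.

```python
import numpy as np

# Simulated stand-ins for the three return values of consistency_score:
ids = ["achilles", "buddy"]
pairs = [("achilles", "buddy"), ("buddy", "achilles")]
scores = [0.72, 0.68]

# Arrange the pairwise scores into a square matrix indexed by dataset ID;
# the diagonal (self-comparison) is left as NaN.
index = {dataset: i for i, dataset in enumerate(ids)}
matrix = np.full((len(ids), len(ids)), np.nan)
for (a, b), score in zip(pairs, scores):
    matrix[index[a], index[b]] = score

print(matrix)
```

Because the score is not necessarily symmetric in the comparison direction, both orderings of each pair are kept as separate off-diagonal entries.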