LouisJalouzot/MLEM


Overview

MLEM is a multivariate encoding framework that learns a metric over theoretical features to match neural distances. It:

  • Optimizes Spearman correlation between feature-based and neural distances.
  • Constrains the weight matrix W to be SPD (valid metric), improving convergence speed, accuracy and robustness.
  • Quantifies contributions of features (and optionally of their interactions) via permutation feature importance (PFI).
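
MLEM's learned metric over feature differences can be pictured as a Mahalanobis-style quadratic form, d_W(x_i, x_j) = sqrt((x_i - x_j)^T W (x_i - x_j)), which is a valid distance whenever W is SPD. A minimal NumPy sketch of this construction (illustrative only, not MLEM's internal code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 30, 4
X = rng.normal(size=(n_samples, n_features))

# Any matrix of the form A @ A.T + eps * I is SPD, hence a valid metric.
A = rng.normal(size=(n_features, n_features))
W = A @ A.T + 1e-6 * np.eye(n_features)

# Pairwise feature-based distances d_W(x_i, x_j) = sqrt(d^T W d).
diffs = X[:, None, :] - X[None, :, :]  # shape (n_samples, n_samples, n_features)
D_feat = np.sqrt(np.einsum("ijk,kl,ijl->ij", diffs, W, diffs))
```

Training then adjusts W so that the Spearman correlation between these feature-based distances and the neural distances is maximal.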

Installation

You can install MLEM with pip (or uv pip) directly from GitHub:

pip install git+https://github.com/LouisJalouzot/MLEM

Usage

from mlem import MLEM

X = ...  # Your stimuli features as a pandas DataFrame (which may contain categorical features), NumPy array, or PyTorch tensor of shape (n_samples, n_features)
Y = ...  # Your neural representations of the stimuli as a NumPy array or PyTorch tensor of shape (n_samples, hidden_size)
mlem = MLEM()
mlem.fit(X, Y) # Train the model
feature_importances, scores = mlem.score() # Compute feature importances on the same data

It is recommended to use a pandas.DataFrame for X to better handle categorical/nominal features. In this case, columns of type object or str will be treated as categorical/nominal, and the others (int, float, or any subtype of np.number) will be treated as ordered. If X is a NumPy array or a PyTorch tensor, it is assumed to contain only numerical features, which will be treated as ordered. Y will be flattened to a 2D tensor of shape (n_samples, -1).
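
The dtype-based rule above can be illustrated with plain pandas (hypothetical column names; this mirrors the documented behaviour rather than MLEM's internal code):

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({
    "voice": ["active", "passive", "active"],  # object dtype -> categorical/nominal
    "length": [5, 8, 6],                       # int -> ordered
    "surprisal": [1.2, 0.7, 2.4],              # float -> ordered
})

categorical = [c for c in X.columns if X[c].dtype == object]
ordered = [c for c in X.columns if np.issubdtype(X[c].dtype, np.number)]
```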

The output feature_importances is a pandas DataFrame containing the importance of each feature (columns) across all n_permutations permutations (rows). The output scores is a pandas Series of all the Spearman scores computed during the feature importance computation (number of features x n_permutations values).

Options and defaults

  • interactions: bool = False. Disabled by default for interpretability; when True, enables off-diagonal terms in W to model feature interactions.
  • memory: {'auto', 'low', 'high'} trades memory for speed during PFI (default: 'auto').
  • random_seed: int | None for reproducibility.
  • distance: {'euclidean', 'manhattan', 'cosine', 'norm_diff', 'precomputed'} for neural distances (default: 'euclidean'); 'norm_diff' computes the absolute difference between vector norms.
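
For reference, the non-precomputed options reduce to the following NumPy expressions for a single pair of neural vectors (assuming 'cosine' means one minus cosine similarity, the usual convention):

```python
import numpy as np

y_i = np.array([3.0, 4.0])
y_j = np.array([0.0, 1.0])

euclidean = np.linalg.norm(y_i - y_j)                                  # sqrt(18)
manhattan = np.abs(y_i - y_j).sum()                                    # 6.0
cosine = 1 - (y_i @ y_j) / (np.linalg.norm(y_i) * np.linalg.norm(y_j))  # 1 - 4/5
norm_diff = abs(np.linalg.norm(y_i) - np.linalg.norm(y_j))             # |5 - 1|
```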

Train-test split

With a simple train-test split:

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
mlem.fit(X_train, Y_train) # Train the model
feature_importances, scores = mlem.score(X_test, Y_test) # Compute feature importances on the test set

With cross-validation:

from sklearn.model_selection import KFold
import pandas as pd

all_importances = []
all_scores = []

kf = KFold(shuffle=True, random_state=0)
for i, (train_index, test_index) in enumerate(kf.split(X)):
    # Use X.iloc[train_index] instead if X is a pandas DataFrame
    mlem.fit(X[train_index], Y[train_index])
    fi, s = mlem.score(X[test_index], Y[test_index])
    fi["split"] = i
    s = s.to_frame("score")  # keep scores and split labels in separate columns
    s["split"] = i
    all_importances.append(fi)
    all_scores.append(s)

all_importances = pd.concat(all_importances)
all_scores = pd.concat(all_scores)

Precomputed distances

You can use MLEM on matrices of precomputed feature and neural distances. In this case X and Y are not preprocessed.

X = ...  # Your precomputed feature distance matrices of shape (n_samples, n_samples, n_features)
Y = ...  # Your precomputed matrix of pairwise neural distances of shape (n_samples, n_samples)
mlem = MLEM(distance='precomputed')
mlem.fit(X, Y)
fi, s = mlem.score()

Note: Distance matrices should be symmetric with zeros on the diagonal.
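
One way to build inputs of the required shapes, here using absolute per-feature differences and Euclidean neural distances (both choices are illustrative; any valid distances of these shapes work):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, hidden_size = 20, 3, 8
feats = rng.normal(size=(n_samples, n_features))
reprs = rng.normal(size=(n_samples, hidden_size))

# Per-feature absolute-difference matrices, stacked to (n_samples, n_samples, n_features).
X = np.abs(feats[:, None, :] - feats[None, :, :])
# Pairwise Euclidean neural distances, shape (n_samples, n_samples).
Y = np.linalg.norm(reprs[:, None, :] - reprs[None, :, :], axis=-1)
```

Both matrices are symmetric with zeros on the diagonal by construction.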

Interactions

You can enable the modelling of feature interactions by setting interactions=True when initializing MLEM. This can improve fit performance, in particular if some interactions between features are represented in the embeddings Y, and the feature importances returned by mlem.score will then also include interaction terms. Note that this increases memory usage and computation time. Moreover, correlations between features and their interactions are often very high, and since permutation feature importance is sensitive to correlated features, the resulting importance values should be interpreted with caution. By default, interactions are disabled (diagonal W) for better interpretability.
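
To see what the off-diagonal terms add, compare a diagonal and a full W on a single pair of feature differences (a toy sketch assuming the quadratic-form metric, not MLEM's code):

```python
import numpy as np

d = np.array([1.0, 2.0])  # feature differences for one stimulus pair

W_diag = np.diag([3.0, 4.0])                  # interactions=False: diagonal W
W_full = np.array([[3.0, 1.0], [1.0, 4.0]])  # interactions=True: off-diagonal terms

sq_diag = d @ W_diag @ d  # per-feature terms only: 3*1**2 + 4*2**2 = 19
sq_full = d @ W_full @ d  # adds the cross term 2 * W[0,1] * d[0] * d[1] = 4
```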

Batch size estimation

The first step of the pipeline is to estimate a batch_size to use during training. This is done automatically in .fit() if batch_size is not provided. Since this estimation only depends on the feature data X, you can estimate it once and reuse it for different Ys.

batch_size = mlem.estimate_batch_size(X)
mlem_1 = MLEM(batch_size=batch_size)
mlem_1.fit(X, Y_1)
mlem_2 = MLEM(batch_size=batch_size)
mlem_2.fit(X, Y_2)

Why MLEM?

  • More accurate and more robust weight recovery than FR-RSA with interactions (SPD constraint regularizes learning).
  • Faster convergence and more data efficient.
  • Comparable encoding performance (Spearman) while providing stable, interpretable feature importance profiles.

Troubleshooting

High variability across runs

If you observe high variability in feature importances or scores across runs or seeds (set by random_seed), in particular when modelling interactions (interactions=True), the model has likely not converged during training. To mitigate this, you can decrease threshold (e.g. to 0.005 instead of the default 0.01) so that the estimated batch_size is larger, or override batch_size directly when initializing MLEM; note that a larger batch_size increases memory usage and computation time. You can also increase the patience parameter (e.g. to 100 instead of the default 50). Finally, setting random_seed ensures reproducibility across runs.

Out of memory errors

During feature importance computation

If you encounter out of memory errors when computing feature importances you can try setting the parameter memory to 'low' when initializing MLEM. This will reduce memory usage at the cost of increased computation time. On the other hand, if you have a lot of memory available, you can set memory to 'high' to speed up computation.

During batch size estimation or during training

If you encounter out of memory errors during batch size estimation or during training, you can try increasing the threshold parameter (e.g. to 0.02 instead of the default 0.01) so that the estimated batch_size is smaller. Or directly set the batch_size parameter to a smaller value. Note that this will decrease the precision of the method and induce more variability across runs.

If both the dimension of the neural representations (the second dimension of Y, hidden_size) and the estimated batch size are very large, a lot of memory is required (hidden_size x batch_size x 4 bytes for float32). If the number of samples is not too large, you can use precomputed distances instead. Furthermore, batch size estimation requires n_trials times this amount of memory (n_trials defaults to 16); you can decrease n_trials or set the batch_size parameter directly.
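
A back-of-the-envelope calculation of that formula, with illustrative sizes:

```python
hidden_size = 10_000
batch_size = 50_000
n_trials = 16

train_bytes = hidden_size * batch_size * 4       # float32 activations during training
estimate_bytes = train_bytes * n_trials          # during batch size estimation

print(train_bytes / 1e9)     # 2.0 (GB)
print(estimate_bytes / 1e9)  # 32.0 (GB)
```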

Citation

If you use MLEM, please cite:

  • Jalouzot et al. (2024). Metric Learning Encoding Models: A Multivariate Framework for Interpreting Neural Representations. arXiv:2402.11608 — https://arxiv.org/abs/2402.11608
