Utils_downstream-checkpoint

Module Contents

Functions

rank_features_by_arch(X, inference_result, var_names)

correlation_by_archetype(matrix, inference_result[, ...])

Compute correlations between archetype cell scores and variables in a matrix.

compute_feature_importance(model, input_matrix[, ...])

Compute feature importance in a MIDAA model by measuring the change in the latent space reconstruction when each feature is held out.

Utils_downstream-checkpoint.rank_features_by_arch(X, inference_result, var_names, scale=True, plot=True)

Utils_downstream-checkpoint.correlation_by_archetype(matrix, inference_result, correlation_type='spearman', mt_correction_method='fdr_bh', variable_names=None)

Compute correlations between archetype cell scores and variables in a matrix.

This function calculates the correlation coefficient between each archetype’s cell scores and each variable (column) in the provided matrix. It supports Pearson, Spearman, and Kendall correlations and applies multiple testing correction to the p-values.

Parameters:
  • matrix (array-like, shape (n_cells, n_variables)) – A 2D array or list of lists representing the input matrix used for fitting the model. Each row corresponds to a cell, and each column corresponds to a variable.

  • inference_result (dict) – A dictionary containing inference results with the key “inferred_quantities” that maps to another dictionary containing “A”. “A” should be a 2D array-like structure with shape (n_cells, n_scores), where each column represents the scores for a particular archetype.

  • correlation_type (str, optional (default="spearman")) –

    The type of correlation to compute. Supported options are:
    • 'pearson' : Pearson correlation coefficient

    • 'spearman' : Spearman rank correlation

    • 'kendall' : Kendall tau correlation

  • mt_correction_method (str, optional (default='fdr_bh')) – The method to use for multiple testing correction of p-values. Default is Benjamini-Hochberg (‘fdr_bh’). Other methods supported by statsmodels.stats.multitest.multipletests can be used.

  • variable_names (list of str, optional (default=None)) – A list of names for the variables (columns) in the matrix. If not provided, variables will be named as ‘Variable_1’, ‘Variable_2’, etc.
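As an illustration of the inputs described above, the following minimal sketch builds toy data in the expected layout. The sizes, the random data, and the use of a Dirichlet draw for the archetype scores are assumptions made for the example only; just the nested keys "inferred_quantities" and "A" and the default 'Variable_1', 'Variable_2', ... naming come from the parameter descriptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_variables, n_archetypes = 50, 4, 3  # toy sizes (assumption)

# Input matrix: one row per cell, one column per variable.
matrix = rng.normal(size=(n_cells, n_variables))

# Archetype scores "A": one row per cell, one column per archetype.
# A Dirichlet draw is used here purely as placeholder data.
A = rng.dirichlet(np.ones(n_archetypes), size=n_cells)

# Nested dictionary layout described for `inference_result`.
inference_result = {"inferred_quantities": {"A": A}}

# Names that would be used when `variable_names` is None.
variable_names = [f"Variable_{j + 1}" for j in range(matrix.shape[1])]
```

The only structural requirement visible in the descriptions is that `matrix` and "A" share the same number of rows (cells).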

Returns:

A DataFrame containing the correlation results with the following columns:
  • 'Variable' : Name of the variable.

  • 'Archetype' : Identifier for the archetype score.

  • 'Correlation' : Correlation coefficient between the archetype score and the variable.

  • 'P-value' : P-value for the correlation.

  • 'Corrected P-value' : P-value after multiple testing correction.

Return type:

pandas.DataFrame
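
The computation described above can be approximated with SciPy and statsmodels. The sketch below is a hypothetical re-implementation for illustration, not the module's actual code: the function name `correlation_sketch`, the 'Archetype_1', 'Archetype_2', ... labels, and the choice to correct all p-values together in a single pass are assumptions. It takes the score array "A" directly rather than the `inference_result` dictionary.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Map the documented option names to SciPy correlation functions.
_CORR_FUNCS = {
    "pearson": stats.pearsonr,
    "spearman": stats.spearmanr,
    "kendall": stats.kendalltau,
}

def correlation_sketch(matrix, A, correlation_type="spearman",
                       mt_correction_method="fdr_bh", variable_names=None):
    """Hypothetical sketch: correlate each archetype score column with each variable."""
    matrix = np.asarray(matrix, dtype=float)
    A = np.asarray(A, dtype=float)
    if variable_names is None:
        variable_names = [f"Variable_{j + 1}" for j in range(matrix.shape[1])]
    corr = _CORR_FUNCS[correlation_type]

    rows = []
    for k in range(A.shape[1]):
        for j, name in enumerate(variable_names):
            r, p = corr(A[:, k], matrix[:, j])
            rows.append({
                "Variable": name,
                "Archetype": f"Archetype_{k + 1}",  # label format is an assumption
                "Correlation": r,
                "P-value": p,
            })
    df = pd.DataFrame(rows)
    # Assumption: all tests are corrected together, not per archetype.
    corrected = multipletests(df["P-value"].to_numpy(), method=mt_correction_method)[1]
    df["Corrected P-value"] = corrected
    return df

# Toy usage: make the first variable track the first archetype's scores.
rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(3), size=100)
matrix = rng.normal(size=(100, 4))
matrix[:, 0] = A[:, 0] + 0.01 * rng.normal(size=100)
result = correlation_sketch(matrix, A)
```

Sorting `result` by 'Corrected P-value' or by the absolute value of 'Correlation' is one way to list the variables most associated with each archetype.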

Utils_downstream-checkpoint.compute_feature_importance(model, input_matrix, feature_names=None, device='cpu')

Compute feature importance in a MIDAA model by measuring, with the Frobenius norm, how much the latent space reconstruction changes when each feature is held out.

Parameters:
  • model (torch.nn.Module) – The trained PyTorch MIDAA model.

  • input_matrix (array-like or torch.Tensor, shape (n_samples, n_features)) – The input data matrix.

  • feature_names (list of str, optional (default=None)) – A list of names for the features (columns) in the input matrix. If not provided, features will be named as ‘Feature_1’, ‘Feature_2’, etc.

  • device (str, optional (default='cpu')) – The device to run the computations on. Options are ‘cpu’ or ‘cuda’.

Returns:

A DataFrame containing the feature names and their importance scores with columns:
  • 'Feature': Name of the feature.

  • 'ImportanceScore': Importance score computed using the Frobenius norm.

Return type:

pandas.DataFrame

Notes

  • The input_matrix will be converted to a torch.Tensor if it is not already one.

  • A feature is held out by setting its values to zero across all samples.

  • The function does not modify the original input_matrix.
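
The hold-out procedure in these notes can be sketched as follows. This is a hypothetical illustration, not the module's implementation: how a MIDAA model exposes its latent representation is not stated here, so the sketch takes a generic `encode` callable (any function mapping an input tensor to a 2D latent tensor) in place of `model`, and the name `holdout_importance_sketch` is invented for the example. A small linear layer stands in for a trained model.

```python
import numpy as np
import pandas as pd
import torch

def holdout_importance_sketch(encode, input_matrix, feature_names=None, device="cpu"):
    """Hypothetical sketch: score each feature by the Frobenius norm of the
    change in `encode`'s output when that feature is zeroed out."""
    # Convert array-like input to a tensor (a tensor input is used as-is).
    X = torch.as_tensor(input_matrix, dtype=torch.float32, device=device)
    if feature_names is None:
        feature_names = [f"Feature_{j + 1}" for j in range(X.shape[1])]

    scores = []
    with torch.no_grad():
        baseline = encode(X)
        for j in range(X.shape[1]):
            X_held = X.clone()   # work on a copy so the input is never modified
            X_held[:, j] = 0.0   # hold out feature j across all samples
            diff = baseline - encode(X_held)
            scores.append(torch.linalg.norm(diff, ord="fro").item())
    return pd.DataFrame({"Feature": feature_names, "ImportanceScore": scores})

# Toy stand-in for a trained model (assumption): a fixed linear encoder.
toy_encoder = torch.nn.Linear(3, 2, bias=False)
with torch.no_grad():
    toy_encoder.weight.copy_(torch.tensor([[2.0, 0.0, 0.0],
                                           [0.0, 0.5, 0.0]]))
X = np.ones((4, 3))
scores = holdout_importance_sketch(toy_encoder, X)
```

In this toy case the third feature has zero weight in the encoder, so zeroing it leaves the latent output unchanged and its score is 0, while the first feature, which carries the largest weight, gets the largest score. When running on 'cuda', the model behind `encode` would need to be on the same device as the input.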