Model Specific Functions

This module contains wrappers that load models, tokenize inputs, run inference, and format outputs for the models included by default in the model_types list, so that they can all be called through a uniform API.

This is where new models can be included in the code.

model_types

This script reads the model types from model_types.csv.

load_models

This file contains functions to load models, tokenizers, etc. Because not all models come from Hugging Face and not all of them may be installed, the required imports are placed directly inside the corresponding loading functions.
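The import-inside-the-loader pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the registry `LOADERS`, the loader functions, the model type labels `type_a` and `type_b`, and the package name `some_uninstalled_backend_xyz` are all hypothetical, and `json` stands in for a real model library.

```python
import importlib


def _load_with_json(model_name):
    # The dependency is imported only when this loader runs.
    # `json` is a stand-in for a real model library.
    import json
    return {"backend": json.__name__, "model_name": model_name}


def _load_with_missing_backend(model_name):
    # Hypothetical backend that is not installed. The import fails only
    # if this specific loader is requested.
    backend = importlib.import_module("some_uninstalled_backend_xyz")
    return {"backend": backend.__name__, "model_name": model_name}


# Hypothetical registry mapping a model type to its loader.
LOADERS = {
    "type_a": _load_with_json,
    "type_b": _load_with_missing_backend,
}


def load_model_sketch(model_name, model_type):
    """Dispatch to the loader for `model_type`, importing lazily."""
    try:
        loader = LOADERS[model_type]
    except KeyError:
        raise ValueError(f"Unknown model type: {model_type!r}") from None
    try:
        return loader(model_name)
    except ImportError as err:
        raise ImportError(
            f"Model type {model_type!r} needs a package that is not "
            f"installed: {err.name}"
        ) from err
```

With this layout, a missing optional package only affects the model types that depend on it; every other loader keeps working.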

activation_extractor.model_functions.load_models.load_model(model_name, model_type, **kwargs)[source]

Loads a PyTorch model according to the passed model name. For sequence models, it also loads the corresponding tokenizer; for image models, it loads the image processor.

Parameters:
  • model_name (str) – the name of the model to load

  • model_type (str) – A model type (see list of included models).

Returns:

tuple with (model, tokenizer) or (model, processor).

activation_extractor.model_functions.load_models.load_tokenizer(model_name, tokenizer_type, **kwargs)[source]

Load a tokenizer type for a model. This function is called inside load_model() for sequence type models.

Parameters:
  • model_name (str) – model name (for huggingface models it should be the same as the loaded model)

  • tokenizer_type (str) – the type of tokenizer (valid types: AutoTokenizer and T5Tokenizer)

Returns:

the tokenizer object
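One way to select a tokenizer class by name is sketched below. It is an assumption-laden illustration rather than the function's real body: the name `load_tokenizer_sketch` and the `library` parameter are hypothetical (the latter exists only so the sketch can run without `transformers` installed). The `from_pretrained` class method is the standard entry point of `AutoTokenizer` and `T5Tokenizer` in the `transformers` package.

```python
from types import SimpleNamespace

VALID_TOKENIZER_TYPES = ("AutoTokenizer", "T5Tokenizer")


def load_tokenizer_sketch(model_name, tokenizer_type, library=None, **kwargs):
    """Look up a tokenizer class by name and load it for `model_name`."""
    if tokenizer_type not in VALID_TOKENIZER_TYPES:
        raise ValueError(
            f"tokenizer_type must be one of {VALID_TOKENIZER_TYPES}, "
            f"got {tokenizer_type!r}"
        )
    if library is None:
        # Lazy import: only needed when no stand-in library is passed.
        import transformers
        library = transformers
    tokenizer_class = getattr(library, tokenizer_type)
    return tokenizer_class.from_pretrained(model_name, **kwargs)


# Stand-in classes so the sketch runs offline (hypothetical, for illustration).
class _FakeTokenizer:
    @classmethod
    def from_pretrained(cls, model_name, **kwargs):
        return (cls.__name__, model_name, kwargs)


fake_library = SimpleNamespace(
    AutoTokenizer=_FakeTokenizer, T5Tokenizer=_FakeTokenizer
)
result = load_tokenizer_sketch("some-model", "T5Tokenizer", library=fake_library)
```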

tokenize_funs

Defines a tokenizer wrapper function for the models included by default.

activation_extractor.model_functions.tokenize_funs.define_tokenize_function(model_type, tokenizer, device=None)[source]

Define the right function to tokenize the inputs based on the model type. This function is called inside inferencer.tokenizer().

Parameters:
  • model_type (str) – the model type (from the list in activation_extractor.model_functions.model_types)

  • tokenizer – the loaded tokenizer object

Returns:

the function used to tokenize the inputs
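The idea of returning a model-specific tokenize function can be sketched as a small factory. Everything model-specific here is an assumption made for illustration: the function name, the per-type preprocessing (space-separated residues for `prot_t5`), and the `toy_tokenizer` stand-in are hypothetical, and the `device` argument is left out of the sketch.

```python
def define_tokenize_function_sketch(model_type, tokenizer):
    """Return a function that tokenizes a batch of sequences."""
    if model_type == "prot_t5":
        def tokenize(sequences):
            # Assumed preprocessing for illustration: separate residues
            # with spaces before calling the tokenizer.
            spaced = [" ".join(seq) for seq in sequences]
            return tokenizer(spaced)
    elif model_type in ("esm", "ankh"):
        def tokenize(sequences):
            # Assumed: these types take the raw sequences as-is.
            return tokenizer(list(sequences))
    else:
        raise ValueError(f"Unsupported model type: {model_type!r}")
    return tokenize


def toy_tokenizer(batch):
    # Stand-in tokenizer that simply echoes the text it receives.
    return {"input_texts": batch}


tokenize_t5 = define_tokenize_function_sketch("prot_t5", toy_tokenizer)
tokenize_esm = define_tokenize_function_sketch("esm", toy_tokenizer)
```

Callers then use the returned function the same way regardless of model type, which is what makes a uniform API possible.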

inference_funs

This file defines an inferencer wrapper for the included models.

activation_extractor.model_functions.inference_funs.define_inference_function(model_type, model, tokenizer, device)[source]

Define the right function to do inference based on the model type. The resulting function is called as inferencer.inference(). The returned function moves the tokenized input to the device before performing inference.

Parameters:
  • model_type (str) – the model type (from the list in activation_extractor.model_functions.model_types)

  • model – the loaded PyTorch model

  • tokenizer – the loaded tokenizer object

  • device (str) – the device (cpu, cuda…)

Returns:

the function used to do the inference
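The "move to device, then run the model" behaviour can be sketched as a closure. This is a duck-typed illustration that runs without PyTorch: `FakeTensor` and `fake_model` are hypothetical stand-ins, the tokenized input is assumed to be a dict of tensor-like values, and the dispatch on `model_type` is omitted.

```python
def define_inference_function_sketch(model_type, model, tokenizer, device):
    """Return a function that moves inputs to `device` and calls `model`.

    `model_type` and `tokenizer` mirror the documented signature but are
    not used in this simplified sketch.
    """
    def inference(tokenized):
        # Move every tensor-like value to the target device first.
        moved = {key: value.to(device) for key, value in tokenized.items()}
        return model(**moved)
    return inference


class FakeTensor:
    # Stand-in exposing only the `.to(device)` method used by the sketch.
    def __init__(self, values, device="cpu"):
        self.values = values
        self.device = device

    def to(self, device):
        return FakeTensor(self.values, device)


def fake_model(input_ids):
    # Stand-in model that reports where its input lives.
    return {"device_seen": input_ids.device, "n_tokens": len(input_ids.values)}


inference = define_inference_function_sketch("esm", fake_model, None, "cuda:0")
output = inference({"input_ids": FakeTensor([1, 2, 3])})
```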

default_hooked_layers

This file contains functions to get relevant layer (module) names to hook from the models included by default.

activation_extractor.model_functions.default_hooked_layers.get_layers_to_hook(model, model_type, modality='sequence', return_structure=False)[source]

Get a list of default layers to hook (extract activations from) for each model type.

Parameters:
  • model – the PyTorch model object

  • model_type (str) – A model type (protein - esm, prot_t5, ankh; dna - nucleotide-transformer, hyenadna, evo, caduceus).

Returns:

the list of layer/module names

Return type:

list
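Selecting module names to hook can be sketched by filtering the `(name, module)` pairs that a PyTorch model yields from `named_modules()`. The sketch below is an assumption, not the function's actual selection logic: the helper name, the regular expression, and the `ToyModel` stand-in (used so the example runs without PyTorch) are hypothetical.

```python
import re


def select_layer_names_sketch(model, pattern):
    """Return names of modules whose full name matches `pattern`."""
    regex = re.compile(pattern)
    return [
        name
        for name, _module in model.named_modules()
        if regex.fullmatch(name)
    ]


class ToyModel:
    # Stand-in that mimics the (name, module) pairs of named_modules().
    def named_modules(self):
        names = [
            "",  # the root module has an empty name
            "embeddings",
            "encoder.layer.0",
            "encoder.layer.0.attention",
            "encoder.layer.1",
            "encoder.layer.1.attention",
        ]
        for name in names:
            yield name, object()


layers = select_layer_names_sketch(ToyModel(), r"encoder\.layer\.\d+")
```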

embedding_to_numpy

activation_extractor.model_functions.embedding_to_numpy.embedding_to_numpy(embeddings)[source]

Converts different types of module outputs to a numpy array. Handles the different output formats of the different models. Additionally, moves the data from GPU to CPU.

Parameters:

embeddings – Intermediate output object from a PyTorch model layer/module.

Returns:

intermediate output as a numpy array

Return type:

numpy array
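A conversion of this kind might look like the sketch below. The cases it handles (a tuple whose first element is the tensor, a tensor-like object, anything else array-like) are assumptions chosen for illustration and may not match the cases the real function covers. `FakeTensor` is a hypothetical stand-in so the example runs without PyTorch; `detach()`, `cpu()`, and `numpy()` are the usual PyTorch tensor methods for this conversion.

```python
import numpy as np


def to_numpy_sketch(output):
    """Convert a module output to a numpy array (simplified)."""
    # Assumed case: the module returns a tuple and the first element
    # holds the activations of interest.
    if isinstance(output, tuple):
        output = output[0]
    # Tensor-like objects: drop gradient tracking, move to CPU, convert.
    if hasattr(output, "detach"):
        return output.detach().cpu().numpy()
    # Fallback for lists, scalars, or arrays.
    return np.asarray(output)


class FakeTensor:
    # Stand-in exposing the three tensor methods used above.
    def __init__(self, data):
        self._data = np.array(data)

    def detach(self):
        return self

    def cpu(self):
        return self

    def numpy(self):
        return self._data


from_tuple = to_numpy_sketch((FakeTensor([[1.0, 2.0]]), "extra"))
from_list = to_numpy_sketch([[1, 2], [3, 4]])
```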