Extractors

IntermediateExtractorBase

class activation_extractor.extractors.intermediateExtractorBase.IntermediateExtractorBase(model, layer_list)

Extractor for intermediate model outputs. Extracts the intermediate outputs of a specified list of modules from a PyTorch model during inference.

Parameters:

model – the PyTorch model object
layer_list (list of strings) – list of module names to get outputs from

clear_all_hooks(): Clears all the forward hooks registered to the model.

create_hook(layer_name)

Creates a PyTorch forward hook that saves the output of a given module/layer in the model. A forward hook is a function that PyTorch calls after the module's forward pass, passing it the module, its inputs, and its output.

Parameters: layer_name (str) – name of the module/layer
Returns: the corresponding hook function
Return type: function
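
The hook mechanism described above can be sketched with plain PyTorch. This is a minimal illustration, not the library's implementation: the module-level outputs dictionary, the toy model, and the use of detach() are assumptions made for the example, while named_modules(), register_forward_hook() and the handle's remove() are standard PyTorch calls.

```python
import torch
from torch import nn

# Hypothetical store for captured outputs; the extractor's real
# attribute names are not shown in this reference.
outputs = {}

def create_hook(layer_name):
    """Return a forward hook that stores the module's output under layer_name."""
    def hook(module, inputs, output):
        # Detaching is an assumption of this sketch: it keeps the stored
        # tensor from holding on to the autograd graph.
        outputs[layer_name] = output.detach()
    return hook

# Toy model; nn.Sequential names its children "0", "1", "2".
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
layer_list = ["0", "2"]

# Register one hook per requested module and keep the handles for cleanup.
modules = dict(model.named_modules())
hook_handles = [
    modules[name].register_forward_hook(create_hook(name))
    for name in layer_list
]

with torch.no_grad():
    model(torch.randn(3, 4))

# Removing each handle detaches the corresponding hook.
for handle in hook_handles:
    handle.remove()
```

After the forward pass, outputs holds one tensor per name in layer_list. Keeping the handles is what makes a later cleanup possible without touching hooks registered by other code.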

detach_hooks(): Detaches all the registered hooks saved in the hook_handles attribute.

get_outputs()

Returns the intermediate activation outputs.

Returns: dictionary with the intermediate outputs for each specified module/layer.
Return type: dictionary

register_hooks(): Registers the hooks for all the specified layers and saves the hook handles to the hook_handles attribute.

IntermediateExtractor

class activation_extractor.extractors.intermediateExtractor.IntermediateExtractor(model, layer_list)

Extends the functionality of the IntermediateExtractorBase to automatically save activations.

emb_reformatting(outputs, emb_format='full', sequence_axis=1, custom_position=None): Takes an output to save and reformats it according to emb_format:

full: nothing
mean: mean along sequence axis
LT: last token in sequence
FT: first token in sequence
custom: custom token position in sequence
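
The effect of each emb_format value can be illustrated on a NumPy array. The function below, reformat_embedding, is a hypothetical stand-in written for this example and is not the library's code; it assumes the input is an array of shape (batch, sequence, hidden) and that custom selects the position given by custom_position.

```python
import numpy as np

def reformat_embedding(outputs, emb_format="full", sequence_axis=1, custom_position=None):
    """Hypothetical stand-in for emb_reformatting, operating on a NumPy array."""
    if emb_format == "full":
        return outputs  # keep the whole sequence
    if emb_format == "mean":
        return outputs.mean(axis=sequence_axis)
    if emb_format == "LT":
        return np.take(outputs, -1, axis=sequence_axis)  # last token
    if emb_format == "FT":
        return np.take(outputs, 0, axis=sequence_axis)  # first token
    if emb_format == "custom":
        if custom_position is None:
            raise ValueError("custom_position is required when emb_format='custom'")
        return np.take(outputs, custom_position, axis=sequence_axis)
    raise ValueError(f"unknown emb_format: {emb_format!r}")

# Example activations with shape (batch=2, sequence=3, hidden=4).
acts = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

full = reformat_embedding(acts, "full")
mean = reformat_embedding(acts, "mean")
last = reformat_embedding(acts, "LT")
first = reformat_embedding(acts, "FT")
middle = reformat_embedding(acts, "custom", custom_position=1)
```

Every format except full removes the sequence axis, so an array of shape (2, 3, 4) becomes (2, 4).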

gpu_to_cpu(): Takes all the intermediate activations stored in the extractor object, moves them to the CPU (if they are on the GPU), and converts them to NumPy arrays.

save_outputs(output_folder, output_id, reset=False, move_to_cpu=True, save_method='numpy_compressed', emb_formats=['LT', 'FT'], sequence_axis=1, custom_position=None)

Saves the intermediate activation dictionary to the output folder. You can choose:

save_method: the saving function (numpy_compressed or numpy)
emb_formats: a list of embedding formats, as accepted by emb_reformatting (for example full, mean, LT: last token)
sequence_axis: the sequence length axis to take the mean or a token from
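
The difference between the two saving functions can be sketched as follows. This is an illustration under stated assumptions rather than the library's behavior: save_activations is a hypothetical helper, the mapping of numpy_compressed to np.savez_compressed and of numpy to np.savez is a guess, and so is the output_id.npz file name.

```python
import os
import tempfile

import numpy as np

def save_activations(activations, output_folder, output_id, save_method="numpy_compressed"):
    """Hypothetical helper: write a {layer_name: array} dict to a single .npz file."""
    os.makedirs(output_folder, exist_ok=True)
    path = os.path.join(output_folder, f"{output_id}.npz")
    if save_method == "numpy_compressed":
        np.savez_compressed(path, **activations)  # zip archive with compression
    elif save_method == "numpy":
        np.savez(path, **activations)  # zip archive without compression
    else:
        raise ValueError(f"unknown save_method: {save_method!r}")
    return path

# Example activations, already reduced to one vector per batch item.
activations = {
    "layer_0": np.ones((2, 4), dtype=np.float32),
    "layer_2": np.zeros((2, 8), dtype=np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    path = save_activations(activations, tmp, output_id="batch_0")
    # Load the arrays back into memory before the temporary folder is removed.
    with np.load(path) as loaded:
        restored = {name: loaded[name] for name in loaded.files}
```

Both variants produce an archive that np.load reads back the same way; the compressed one trades some write time for smaller files.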