Reference

SaturationTracker

SaturationTracker provides a hook for PyTorch and extracts metrics during model training.

class delve.SaturationTracker(savefile: str, save_to: typing.Union[str, delve.writers.AbstractWriter], modules: torch.nn.modules.module.Module, layer_filter: typing.Callable[[typing.Dict[str, torch.nn.modules.module.Module]], typing.Dict[str, torch.nn.modules.module.Module]] = <function SaturationTracker.<lambda>>, writer_args: typing.Optional[typing.Dict[str, typing.Any]] = None, log_interval=1, max_samples=None, stats: list = ['lsat'], layerwise_sat: bool = True, reset_covariance: bool = True, average_sat: bool = False, ignore_layer_names: typing.List[str] = [], include_conv: bool = True, conv_method: str = 'channelwise', timeseries_method: str = 'last_timestep', sat_threshold: str = 0.99, nosave=False, verbose: bool = False, device='cuda:0', initial_epoch: int = 0, interpolation_strategy: typing.Optional[str] = None, interpolation_downsampling: int = 32)

Takes a PyTorch module and records layer saturation, intrinsic dimensionality and other scalars.

Parameters:

savefile (str) – destination for summaries
save_to (Union[str, List[Union[str, delve.writers.AbstractWriter]]]) –

Specify one or multiple save strategies.
You can use preimplemented save strategies or inherit from the AbstractWriter in order to implement your own preferred saving strategy.

Pre-existing saving strategies are:

csv : stores all stats in a csv-file with one row for each epoch.
plot : produces plots from intrinsic dimensionality and / or layer saturation
tensorboard : saves all stats to tensorboard
print : prints all metrics on console as soon as they are logged
npy : creates a folder-structure with npy-files containing the logged values. This is the only save strategy that can save the full covariance matrix. This strategy is useful if you want to reproduce intrinsic dimensionality and saturation values with other thresholds without re-evaluating model checkpoints.
modules (torch modules or list of modules) – layer-containing object. By default, only Conv2d, Linear and LSTM cells are recorded.
layer_filter (func) – A filter function used to exclude layers from tracking. The function receives a dictionary whose string keys map to torch.nn.Module objects and returns it with the undesired entries removed. Default: identity function.
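As an illustration of this dictionary-in, dictionary-out contract, a layer filter might look like the sketch below. The function name and the layer names are hypothetical, and plain object() placeholders stand in for torch.nn.Module instances so that the snippet runs without PyTorch.

```python
from typing import Any, Dict


def drop_classifier_layers(layers: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``layers`` without entries named 'classifier...'.

    Hypothetical filter: the 'classifier' prefix is an assumption made
    for this example, not a naming convention required by delve.
    """
    return {
        name: module
        for name, module in layers.items()
        if not name.startswith("classifier")
    }


# Placeholder objects stand in for torch.nn.Module instances.
layers = {
    "features.0": object(),
    "features.3": object(),
    "classifier.1": object(),
}
kept = drop_classifier_layers(layers)
print(sorted(kept))  # ['features.0', 'features.3']
```

A function of this shape would be passed as layer_filter=drop_classifier_layers when constructing the tracker. Returning a new dictionary rather than mutating the input keeps the filter free of side effects.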
writer_args (dict) – additional arguments passed on to the writers. This is only used when a writer is initialized through a string key.
log_interval (int) – distance between two batches used for updating the covariance matrix. Default value is 1, which means that all data is used for computing intrinsic dimensionality and saturation. Increasing the log interval is useful on very large datasets to reduce numeric instability.
max_samples (int) – (optional) the covariance matrix in each layer stops updating once max_samples samples have been processed. The use case is similar to log_interval: very large datasets.
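The interplay of log_interval and max_samples can be pictured with a small running-covariance accumulator. This is a sketch under stated assumptions, not delve's internal implementation: the class RunningCovariance and its attributes are hypothetical, and the exact point at which delve stops counting samples is not specified in this reference.

```python
import numpy as np


class RunningCovariance:
    """Accumulates a covariance matrix batch by batch (illustrative only)."""

    def __init__(self, num_features, log_interval=1, max_samples=None):
        self.log_interval = log_interval
        self.max_samples = max_samples
        self.batches_seen = 0
        self.n = 0  # number of samples actually used
        self.sum = np.zeros(num_features)
        self.sum_outer = np.zeros((num_features, num_features))

    def update(self, batch):
        """Add a (batch_size, num_features) array, honoring both limits."""
        use_batch = self.batches_seen % self.log_interval == 0
        self.batches_seen += 1
        if not use_batch:
            return  # skipped because of log_interval
        if self.max_samples is not None and self.n >= self.max_samples:
            return  # halted because max_samples was reached
        self.n += batch.shape[0]
        self.sum += batch.sum(axis=0)
        self.sum_outer += batch.T @ batch

    def covariance(self):
        """Biased covariance estimate E[xx^T] - E[x]E[x]^T."""
        mean = self.sum / self.n
        return self.sum_outer / self.n - np.outer(mean, mean)


rng = np.random.default_rng(0)
running = RunningCovariance(num_features=3, log_interval=2, max_samples=20)
for _ in range(10):
    running.update(rng.normal(size=(8, 3)))
print(running.n)  # 24: every second batch is used until 20 samples are exceeded
```

In this sketch the limit is checked before a batch is added, so the final count can overshoot max_samples by up to one batch.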
stats (list of str) –
list of stats to compute

supported stats are:

idim : intrinsic dimensionality
lsat : layer saturation (intrinsic dimensionality divided by feature space dimensionality)
cov : the covariance matrix (only saveable using the ‘npy’ save strategy)
det : the determinant of the covariance matrix (also known as generalized variance)
trc : the trace of the covariance matrix, generally a more useful metric than det for determining the total variance of the data. Note, however, that this does not take the correlation between features into account. On the other hand, in most cases the determinant will be zero, since there will be very strongly correlated features, so the trace might be the better option.
dtrc : the trace of the diagonal matrix, another way of measuring the dispersion of the data.
embed : samples embedded in the eigenspace of dimension 2
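The remark about det collapsing to zero under strongly correlated features can be checked with a few lines of NumPy. The data here is synthetic and chosen for illustration: the third feature is an exact linear combination of the first two, which makes the covariance matrix singular.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = rng.normal(size=1000)

# Third feature is fully determined by the first two.
features = np.column_stack([a, b, a + b])
cov = np.cov(features, rowvar=False)

det = np.linalg.det(cov)  # generalized variance, numerically ~0 here
trc = np.trace(cov)       # total variance, stays clearly positive

print(f"det = {det:.3e}")
print(f"trc = {trc:.3f}")
```

The determinant carries no usable signal in this situation, while the trace still reports the summed per-feature variance.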
layerwise_sat (bool) – whether or not to include layerwise saturation when saving
reset_covariance (bool) – True by default, resets the covariance every time the stats are computed. Disabling this option will strongly bias covariance since the gradient will influence the model. We recommend computing saturation at the end of training and testing.
include_conv (bool) – setting this to False includes only linear layers
conv_method (str) –

how to subsample convolutional layers. Default is channelwise, which means that each position of the filter tensor is considered a datapoint, effectively yielding a data matrix of shape (height*width*batch_size, num_filters)

supported methods are:

channelwise : treats every depth vector of the tensor as a datapoint, effectively reshaping the data tensor from shape (batch_size, height, width, channel) into (batch_size*height*width, channel).
mean : applies global average pooling on each feature map
max : applies global max pooling on each feature map
median : applies global median pooling on each feature map
flatten : flattens the entire feature map to a vector, reshaping the data tensor into a data matrix of shape (batch_size, height*width*channel). This strategy for dealing with convolutions is extremely memory intensive and will likely cause memory and performance problems for any non-toy problem
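The resulting data-matrix shapes for each conv_method can be reproduced with NumPy. This is a shape-only sketch, not delve's code: it assumes the (batch_size, height, width, channel) layout used in the descriptions above, whereas PyTorch convolution layers emit (batch_size, channel, height, width), so a permutation would come first in practice. The tensor sizes are arbitrary.

```python
import numpy as np

batch_size, height, width, channel = 4, 6, 6, 16
rng = np.random.default_rng(0)
# Channels-last layout, as in the parameter description (an assumption here).
activations = rng.normal(size=(batch_size, height, width, channel))

data_matrices = {
    # Every spatial position becomes its own datapoint.
    "channelwise": activations.reshape(-1, channel),
    # Global pooling over the spatial axes: one datapoint per sample.
    "mean": activations.mean(axis=(1, 2)),
    "max": activations.max(axis=(1, 2)),
    "median": np.median(activations, axis=(1, 2)),
    # Whole feature map as one long vector per sample.
    "flatten": activations.reshape(batch_size, -1),
}

for name, matrix in data_matrices.items():
    print(f"{name:12s} {matrix.shape}")
```

The feature dimension of flatten grows with height*width*channel, and the covariance matrix grows with its square, which is where the memory cost comes from.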
timeseries_method (str) –

how to subsample timeseries outputs. Default is last_timestep.

supported methods are:

timestepwise : stacks each sample timestep-by-timestep
last_timestep : selects the last timestep’s output
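The difference between the two timeseries methods comes down to how many rows end up in the data matrix. The sketch below assumes a (batch_size, timesteps, features) layout, which this reference does not confirm, and uses arbitrary sizes.

```python
import numpy as np

batch_size, timesteps, features = 4, 10, 8
# Assumed layout: (batch_size, timesteps, features).
outputs = np.arange(
    batch_size * timesteps * features, dtype=float
).reshape(batch_size, timesteps, features)

# timestepwise: every timestep of every sample becomes a datapoint.
timestepwise = outputs.reshape(-1, features)

# last_timestep: only the final output of each sequence is kept.
last_timestep = outputs[:, -1, :]

print(timestepwise.shape)   # (40, 8)
print(last_timestep.shape)  # (4, 8)
```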
nosave (bool) – If True, disables saving artifacts (images), default is False
verbose (bool) – print saturation for every layer during training
sat_threshold (float) – threshold used to determine the number of eigendirections belonging to the latent space. In effect, this is the threshold determining the intrinsic dimensionality. Default value is 0.99 (99% of the explained variance), which is a compromise between a good and interpretable approximation. From experience, the threshold should be between 0.97 and 0.9995 for meaningful results.
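To make the role of the threshold concrete, the sketch below counts how many eigendirections are needed to reach a given share of the explained variance and divides by the feature space dimensionality to obtain saturation. The function name is hypothetical and the covariance matrix is a hand-picked diagonal example; delve's own computation may differ in detail.

```python
import numpy as np


def intrinsic_dimensionality(cov, sat_threshold=0.99):
    """Smallest number of eigendirections explaining ``sat_threshold``
    of the total variance (illustrative helper, not delve's API)."""
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # sort descending
    eigvals = np.clip(eigvals, 0.0, None)     # guard against tiny negatives
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, sat_threshold)) + 1
    return min(k, len(eigvals))


# Variances 60, 30, 8, 1.5, 0.5: cumulative shares 0.6, 0.9, 0.98, 0.995, 1.0
cov = np.diag([60.0, 30.0, 8.0, 1.5, 0.5])

idim = intrinsic_dimensionality(cov, sat_threshold=0.99)
lsat = idim / cov.shape[0]
print(idim, lsat)  # 4 0.8
```

Lowering the threshold to 0.97 yields 3 directions for this matrix, and raising it to 0.9995 yields all 5, which shows how strongly the reported dimensionality depends on the chosen value.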
verbose – Change verbosity level (default is 0)
device (str) – Device to do the computations on. Default is cuda:0. Generally it is recommended to do the computations on the GPU in order to get maximum performance. Using the CPU is generally slower, but it lets delve use regular RAM instead of the generally more limited VRAM of the GPU. Not having delve run on the same device as the network causes a slight performance decrease due to copying memory between devices during each forward pass. Delve can handle models distributed on multiple GPUs; however, delve itself will always run on a single device.
initial_epoch (int) – The initial epoch to start with. Default is 0, which corresponds to a new run. If initial_epoch != 0 the writers will look for save states that they can resume. If set to zero, all existing states will be overwritten. If set to a lower epoch than actually recorded the behavior of the writers is undefined and may result in crashes, loss of data or corrupted data.
interpolation_strategy (str) – Default is None (disabled). If set to a string key accepted by the mode argument of torch.nn.functional.interpolate, the feature map will be resized to match the interpolated size. This is useful if you work with large resolutions and want to save on computation time. No interpolation is done if the resolution is smaller.
interpolation_downsampling (int) – Default is 32. The target resolution if downsampling is enabled.

API Pages

SaturationTracker(savefile, save_to, ...[, ...])

Takes a PyTorch module and records layer saturation, intrinsic dimensionality and other scalars.