pretrained.hubert
Defines a simple API for using Meta’s pretrained HuBERT model.
import torch

from pretrained.hubert import pretrained_hubert

model = pretrained_hubert("base")
predictor = model.predictor()

# Gets HuBERT embeddings for a waveform.
predictor.predict(torch.randn(1, 16_000), 16_000, output_layer=None)

# Gets HuBERT embeddings for a long waveform, in chunks.
predictor.predict_in_chunks(torch.randn(1, 160_000), 16_000, output_layer=None)
To get HuBERT cluster tokens, you can use:
from pretrained.hubert import pretrained_hubert_with_kmeans
model, kmeans = pretrained_hubert_with_kmeans("base-l7-c100")
predictor = model.predictor(kmeans)
# Get the HuBERT tokens for a waveform.
predictor.predict(torch.randn(1, 16_000), 16_000)
The choices for the model key are:
"base"
- 12 layers, 768 hidden size, 12 attention heads."large"
- 24 layers, 1024 hidden size, 16 attention heads."extra_large"
- 48 layers, 1280 hidden size, 16 attention heads.
- pretrained.hubert.cast_pretrained_hubert_size(s: str) Literal['base', 'large', 'extra_large'] [source]
- pretrained.hubert.cast_pretrained_hubert_kmeans_size(s: str) Literal['base-l7-c100', 'base-l7-c200', 'base-l7-c500', 'base-l7-c1000', 'base-l8-c100', 'base-l8-c200', 'base-l8-c500', 'base-l8-c1000', 'base-l10-c100', 'base-l10-c200'] [source]
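These casts are useful when the model size comes from user input such as a CLI flag, since they narrow a plain str to the accepted literal keys. A minimal sketch, assuming the cast raises on an unknown key:

from pretrained.hubert import cast_pretrained_hubert_size, pretrained_hubert

size = cast_pretrained_hubert_size("base")  # validates and narrows the string
model = pretrained_hubert(size)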
- pretrained.hubert.normalize_output_layer(output_layer: int | float | None, num_layers: int) int | None [source]
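Based on the output_layer semantics documented on HubertPredictor.predict below, this helper resolves an int, float, or None against the number of layers. An illustrative sketch (the exact rounding behavior is an assumption, not part of the documented API):

from pretrained.hubert import normalize_output_layer

normalize_output_layer(None, 12)  # None - use the final layer
normalize_output_layer(0.5, 12)   # a float selects a layer at that fraction of the stack
normalize_output_layer(-1, 12)    # negative values wrap around toward the last layers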
- class pretrained.hubert.HubertSamePadLayer(num_conv_pos_embeddings: int = 128)[source]
Bases: Module
- forward(hidden_states: Tensor) Tensor [source]
- class pretrained.hubert.PositionalConvEmbedding(hidden_size: int, num_conv_pos_embeddings: int = 128, num_conv_pos_embedding_groups: int = 16, feat_extract_activation: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu')[source]
Bases: Module
- forward(hidden_states: Tensor) Tensor [source]
- class pretrained.hubert.Attention(embed_dim: int, num_heads: int, bias: bool = True)[source]
Bases: Module
- class pretrained.hubert.FeedForward(hidden_size: int, intermediate_size: int, hidden_act: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu', hidden_dropout: float = 0.1, activation_dropout: float = 0.1)[source]
Bases: Module
- forward(hidden_states: Tensor) Tensor [source]
- class pretrained.hubert.HubertEncoderLayer(hidden_size: int, intermediate_size: int, num_attention_heads: int, hidden_act: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu', layer_norm_eps: float = 1e-05, hidden_dropout: float = 0.1, activation_dropout: float = 0.1)[source]
Bases: Module
- forward(hidden_states: Tensor, causal: bool = False) Tensor [source]
- class pretrained.hubert.HubertEncoder(hidden_size: int, intermediate_size: int, num_attention_heads: int, num_hidden_layers: int, num_conv_pos_embeddings: int = 128, num_conv_pos_embedding_groups: int = 16, feat_extract_activation: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu', hidden_act: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu', layer_norm_eps: float = 1e-05, hidden_dropout: float = 0.1, activation_dropout: float = 0.1)[source]
Bases: Module
- forward(hidden_states: Tensor, causal: bool = False, output_layer: int | float | None = None) Tensor [source]
- class pretrained.hubert.GroupNormConvLayer(in_channels: int, out_channels: int, stride: int, kernel: int, bias: bool = True, feat_extract_activation: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu')[source]
Bases: Module
- forward(hidden_states: Tensor) Tensor [source]
- class pretrained.hubert.NoLayerNormConvLayer(in_channels: int, out_channels: int, stride: int, kernel: int, bias: bool = True, feat_extract_activation: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu')[source]
Bases: Module
- forward(hidden_states: Tensor) Tensor [source]
- class pretrained.hubert.LayerNormConvLayer(in_channels: int, out_channels: int, stride: int, kernel: int, bias: bool = True, feat_extract_activation: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu')[source]
Bases: Module
- forward(hidden_states: Tensor) Tensor [source]
- class pretrained.hubert.HubertFeatureEncoder(conv_dim: tuple[int, ...] = (512, 512, 512, 512, 512, 512, 512), conv_stride: tuple[int, ...] = (5, 2, 2, 2, 2, 2, 2), conv_kernel: tuple[int, ...] = (10, 3, 3, 3, 3, 2, 2), conv_bias: bool = True, feat_extract_norm: Literal['group', 'layer'] = 'layer', feat_extract_activation: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu')[source]
Bases: Module
- forward(input_values: Tensor) Tensor [source]
- class pretrained.hubert.HubertFeatureProjection(input_size: int, hidden_size: int, layer_norm_eps: float = 1e-05, feat_proj_dropout: float = 0.0, feat_proj_layer_norm: bool = True)[source]
Bases: Module
- forward(hidden_states: Tensor) Tensor [source]
- class pretrained.hubert.HubertEncoderLayerStableLayerNorm(hidden_size: int, intermediate_size: int, num_attention_heads: int, layer_norm_eps: float = 1e-05, hidden_act: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu', hidden_dropout: float = 0.1, activation_dropout: float = 0.1)[source]
Bases: Module
- forward(hidden_states: Tensor, causal: bool = False) Tensor [source]
- class pretrained.hubert.HubertEncoderStableLayerNorm(hidden_size: int, intermediate_size: int, num_attention_heads: int, num_hidden_layers: int, num_conv_pos_embeddings: int = 128, num_conv_pos_embedding_groups: int = 16, hidden_act: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu', feat_extract_activation: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu', layer_norm_eps: float = 1e-05, hidden_dropout: float = 0.1, activation_dropout: float = 0.1)[source]
Bases: Module
- forward(hidden_states: Tensor, causal: bool = False, output_layer: int | float | None = None) Tensor [source]
- class pretrained.hubert.Hubert(hidden_size: int, intermediate_size: int, num_hidden_layers: int, num_attention_heads: int, conv_dim: tuple[int, ...] = (512, 512, 512, 512, 512, 512, 512), conv_stride: tuple[int, ...] = (5, 2, 2, 2, 2, 2, 2), conv_kernel: tuple[int, ...] = (10, 3, 3, 3, 3, 2, 2), conv_bias: bool = True, num_conv_pos_embeddings: int = 128, num_conv_pos_embedding_groups: int = 16, do_stable_layer_norm: bool = True, pre_normalize: bool = True, feat_extract_norm: Literal['group', 'layer'] = 'layer', feat_extract_activation: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu', feat_proj_layer_norm: bool = True, hidden_act: Literal['no_act', 'relu', 'relu6', 'relu2', 'clamp6', 'leaky_relu', 'elu', 'celu', 'selu', 'gelu', 'gelu_fast', 'sigmoid', 'log_sigmoid', 'hard_sigomid', 'tanh', 'softsign', 'softplus', 'silu', 'mish', 'swish', 'hard_swish', 'soft_shrink', 'hard_shrink', 'tanh_shrink', 'soft_sign', 'relu_squared', 'laplace'] = 'gelu', layer_norm_eps: float = 1e-05, hidden_dropout: float = 0.1, activation_dropout: float = 0.1, feat_proj_dropout: float = 0.0)[source]
Bases: Module
- forward(input_values: Tensor, sample_rate: int, causal: bool = False, output_layer: int | float | None = None) Tensor [source]
- extract_all_features(input_values: Tensor, sample_rate: int, causal: bool = False, output_layer: int | float | None = None) list[torch.Tensor] [source]
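A short sketch of calling the model directly; input_values is a 16 kHz waveform of shape (B, T), and the predictor API below is usually the more convenient entry point:

import torch
from pretrained.hubert import pretrained_hubert

model = pretrained_hubert("base")
wav = torch.randn(1, 16_000)

hidden = model(wav, 16_000)                           # final-layer hidden states
all_hidden = model.extract_all_features(wav, 16_000)  # list of hidden states, presumably one per layer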
- predictor(kmeans: KMeans | None = None, *, device: base_device | None = None) HubertPredictor [source]
- class pretrained.hubert.HubertPredictor(hubert_model: Hubert, kmeans: KMeans | None = None, *, device: base_device | None = None)[source]
Bases: object
Provides an API for making predictions with a HuBERT model.
Note that this class is not an nn.Module, so you can use it inside your own module without accidentally storing all of the HuBERT weights in that module's state.
- Parameters:
hubert_model – The HuBERT model to use for predictions.
kmeans – The kmeans model to use for quantization. If None, don’t quantize.
device – The device to use for predictions. If None, will use the device returned by detect_device().
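Because HubertPredictor is a plain Python object rather than an nn.Module, it can be kept as an attribute of a training module without the HuBERT weights ending up in that module's parameters. A minimal sketch, assuming a downstream embedding over the k-means cluster ids (the Embedding layer and class name are hypothetical):

import torch.nn as nn

from pretrained.hubert import pretrained_hubert_with_kmeans

class TokenEncoder(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        hubert, kmeans = pretrained_hubert_with_kmeans("base-l7-c100")
        # Plain attribute: the HuBERT weights are not registered as parameters here.
        self.hubert_predictor = hubert.predictor(kmeans)
        # Hypothetical layer over the 100 cluster ids produced by the predictor.
        self.embed = nn.Embedding(100, 256)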
- predict(waveform: ndarray | Tensor, sample_rate: int, output_layer: int | float | None = None, causal: bool = False) Tensor [source]
Gets the hidden states for the given waveform.
- Parameters:
waveform – The waveform to get hidden states for, with shape (B, T)
sample_rate – The waveform’s sampling rate; this is only used to verify that it is 16 kHz, since it is easy for downstream applications to forget.
output_layer – The layer to get hidden states from. If None, will return the hidden states from the last layer. If an int, will return the hidden states from that layer. If a float, will return the hidden states from the layer at that percentage of the model. For example, 0.5 will return the hidden states from the middle layer. Negative values will wrap around.
causal – If set, use a causal attention mask.
- Returns:
The hidden states for the given waveform, with shape (B, T, D)
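For example, the output_layer and causal arguments can be combined as follows (all values illustrative):

import torch
from pretrained.hubert import pretrained_hubert

predictor = pretrained_hubert("base").predictor()
wav = torch.randn(1, 16_000)

predictor.predict(wav, 16_000)                    # hidden states from the last layer
predictor.predict(wav, 16_000, output_layer=7)    # hidden states from layer 7
predictor.predict(wav, 16_000, output_layer=0.5)  # roughly the middle layer
predictor.predict(wav, 16_000, causal=True)       # use a causal attention mask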
- predict_in_chunks(waveform: Tensor | ndarray, sample_rate: int, chunk_size: int = 160000, output_layer: int | float | None = None, causal: bool = False) Tensor [source]
Gets the hidden states for the given waveform, in chunks.
This is useful for processing very long waveforms, as it allows you to process the waveform in chunks, rather than loading the entire waveform into memory at once.
- Parameters:
waveform – The waveform to get hidden states for, with shape (B, T)
sample_rate – The waveform’s sampling rate; this is only used to verify that it is 16 kHz, since it is easy for downstream applications to forget.
chunk_size – The size of each chunk to process, in frames.
output_layer – The layer to get hidden states from. If None, will return the hidden states from the last layer. If an int, will return the hidden states from that layer. If a float, will return the hidden states from the layer at that percentage of the model. For example, 0.5 will return the hidden states from the middle layer. Negative values will wrap around.
causal – If set, use a causal attention mask.
- Returns:
The hidden states for the given waveform, with shape (B, T, D)
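For instance, a one-minute waveform can be processed in five-second chunks; chunk_size is given in frames, so five seconds at 16 kHz is 80,000 frames:

import torch
from pretrained.hubert import pretrained_hubert

predictor = pretrained_hubert("base").predictor()
long_wav = torch.randn(1, 60 * 16_000)  # one minute of audio at 16 kHz

feats = predictor.predict_in_chunks(long_wav, 16_000, chunk_size=5 * 16_000)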
- predict_file(path: str | Path, chunk_length_sec: float = 10.0, output_layer: int | float | None = None, causal: bool = False) Tensor [source]
Gets the hidden states for the given audio file, in chunks.
- Parameters:
path – The path to the audio file to process.
chunk_length_sec – The length of each chunk to process, in seconds.
output_layer – The layer to get hidden states from. If None, will return the hidden states from the last layer. If an int, will return the hidden states from that layer. If a float, will return the hidden states from the layer at that percentage of the model. For example, 0.5 will return the hidden states from the middle layer. Negative values will wrap around.
causal – If set, use a causal attention mask.
- Returns:
The hidden states for the given audio file, with shape (B, T, D)
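A minimal sketch; the file path and chunk length are illustrative, and the file is assumed to contain 16 kHz audio:

from pretrained.hubert import pretrained_hubert

predictor = pretrained_hubert("base").predictor()

# "speech.wav" is a hypothetical 16 kHz audio file on disk.
feats = predictor.predict_file("speech.wav", chunk_length_sec=10.0)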
- pretrained.hubert.pretrained_hubert(size: Literal['base', 'large', 'extra_large'], load_weights: bool = True) Hubert [source]
- pretrained.hubert.pretrained_kmeans_clusters(size: Literal['base-l7-c100', 'base-l7-c200', 'base-l7-c500', 'base-l7-c1000', 'base-l8-c100', 'base-l8-c200', 'base-l8-c500', 'base-l8-c1000', 'base-l10-c100', 'base-l10-c200']) KMeans [source]
- pretrained.hubert.pretrained_hubert_with_kmeans(size: Literal['base-l7-c100', 'base-l7-c200', 'base-l7-c500', 'base-l7-c1000', 'base-l8-c100', 'base-l8-c200', 'base-l8-c500', 'base-l8-c1000', 'base-l10-c100', 'base-l10-c200'], load_weights: bool = True) tuple[pretrained.hubert.Hubert, ml.models.kmeans.KMeans] [source]
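These factories can also be used on their own, for example to build the architecture without loading pretrained weights or to fetch only the k-means clusters; the chosen keys below are illustrative:

from pretrained.hubert import pretrained_hubert, pretrained_kmeans_clusters

arch_only = pretrained_hubert("large", load_weights=False)  # architecture only, no pretrained weights loaded
kmeans = pretrained_kmeans_clusters("base-l10-c200")        # just the k-means clustering model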